Iceberg-go Panic: Map Column Append Issue

Alex Johnson

Iceberg-go Panic: Append of Map Column with Multiple Entries

Hey guys! I've been wrestling with a peculiar issue in iceberg-go, the Go implementation of Apache Iceberg, and I wanted to share my findings. There seems to be a bug when you append data to a table that has a map<string, string?> column (string keys, nullable string values). The really weird part? It only happens when the map in some row has more than one entry. Let's dive into the details, shall we?

The Bug: A Panic in the Append Process

The core problem arises during the append operation. When the records you're adding have a map<string, string?> column and any row's map holds more than one key-value pair, things go sideways. Instead of a smooth append, the program panics with arrow/array: index out of range. The crash typically happens inside the Arrow library while the data is being written to Parquet files.

Now, I'm not a total Iceberg or Arrow guru, so I'm hoping I'm not doing something obviously wrong here! I've tried to keep things as simple as possible in my test case, but the error persists. It's a bit of a head-scratcher, because lists with varying lengths append just fine, and it’s only maps with multiple entries per row that cause the problem.

Reproducing the Issue

I've put together a minimal Go program to illustrate the issue. The program does the following:

  1. Creates an Iceberg table: This table has two columns: id (a string) and attrs (a map<string, string?>).
  2. Defines the Arrow Schema: It uses table.SchemaToArrowSchema to get the Arrow schema, making sure to include field IDs.
  3. Builds an Arrow record: This record has two rows. The first row's attrs map has one entry, and the second row's attrs map has two entries (which triggers the error).
  4. Uses array.NewRecordReader: The record is wrapped in a reader built from the same schema, which preserves the PARQUET:field_id metadata.
  5. Calls tbl.Append: This is where the code crashes.

If you run this code with the lines that add the second key-value pair to the second row's map commented out, the append works perfectly. With both entries in place, tbl.Append panics.

Code Snippet

Here’s a snippet from the reproduction program that highlights the problematic part where you build the Arrow record:

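// keyBuilder and valueBuilder are assumed to be the map builder's child
// builders, i.e. mapBuilder.KeyBuilder() and mapBuilder.ItemBuilder().

// row 0: {"a":"1"}
idStringBuilder.Append("row-0")
mapBuilder.Append(true) // start a new, non-null map for this row
keyBuilder.Append("a")
valueBuilder.Append("1")

// row 1: {"x":"9","y":"z"}
idStringBuilder.Append("row-1")
mapBuilder.Append(true) // start a new, non-null map for this row
keyBuilder.Append("x")
valueBuilder.Append("9")
// Second entry in the same row. Commenting out the next two lines
// works around the panic.
keyBuilder.Append("y")
valueBuilder.Append("z")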
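
To make it clearer what those builder calls produce, here is a small library-free model of how a map column is laid out: one offsets slice plus flattened key and value children. It is only a sketch for illustration. The mapColumn type and its methods are hypothetical names and are not part of arrow-go or iceberg-go.

```go
package main

import "fmt"

// mapColumn is a hypothetical, simplified model of an Arrow map column.
// Row i owns the entries in the half-open range [offsets[i], offsets[i+1]).
type mapColumn struct {
	offsets []int32  // always has one more element than there are rows
	keys    []string // keys of all rows, flattened
	values  []string // values of all rows, flattened
}

func newMapColumn() *mapColumn {
	return &mapColumn{offsets: []int32{0}}
}

// appendRow adds one row whose map contains the given key/value pairs.
func (c *mapColumn) appendRow(entries ...[2]string) {
	for _, e := range entries {
		c.keys = append(c.keys, e[0])
		c.values = append(c.values, e[1])
	}
	// The new end offset is the total number of entries written so far.
	c.offsets = append(c.offsets, int32(len(c.keys)))
}

// row rebuilds the map stored in row i from the flattened children.
func (c *mapColumn) row(i int) map[string]string {
	out := map[string]string{}
	for j := c.offsets[i]; j < c.offsets[i+1]; j++ {
		out[c.keys[j]] = c.values[j]
	}
	return out
}

func main() {
	col := newMapColumn()
	col.appendRow([2]string{"a", "1"})                         // row 0
	col.appendRow([2]string{"x", "9"}, [2]string{"y", "z"}) // row 1
	fmt.Println(col.offsets, col.keys, col.values)
	fmt.Println(col.row(1))
}
```

Running it prints [0 1 3] [a x y] [1 9 z] followed by map[x:9 y:z]. The point to notice is that the record has two rows but the child arrays have three elements, and that only happens once some row holds more than one entry.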

Environment Details

For reference, here are the environment details where I encountered this issue:

  • Go Version: go version reports 1.25.1
  • iceberg-go: v0.4.0-rc1 (also tried v0.3.0)
  • arrow-go/v18: tried v18.3.0 and v18.4.1
  • Catalog: REST against a local Nessie endpoint
  • Table Properties: Defaults; writing PARQUET

What I Suspect

My gut feeling is that there might be an issue in the Arrow-to-Parquet write path when dealing with map child arrays. It could be related to how the keys and values are sliced during writing. Since the error only appears when a map row has multiple entries, it makes me think it’s tied to the handling of those child arrays during the Parquet encoding.
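
To show the kind of mistake I have in mind, here is a purely hypothetical, library-free sketch. It is not taken from the arrow-go or iceberg-go source, and readRow is a made-up helper. It assumes a writer that sizes the key child by the number of rows instead of by the last offset. With one entry per row the two sizes are equal and nothing goes wrong; with a two-entry row the child is too short and indexing fails.

```go
package main

import "fmt"

// readRow collects the keys of row i from a flattened key child.
// A runtime panic (such as an out-of-range index) is converted to an error
// so that both cases below can run in the same program.
func readRow(offsets []int32, keys []string, i int) (out []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("row %d: %v", i, r)
		}
	}()
	for j := offsets[i]; j < offsets[i+1]; j++ {
		out = append(out, keys[j])
	}
	return out, nil
}

func main() {
	// Layout for rows {"a":"1"} and {"x":"9","y":"z"}.
	offsets := []int32{0, 1, 3}
	keys := []string{"a", "x", "y"}
	rows := len(offsets) - 1

	// Child sized by the last offset: every entry is reachable.
	byOffset := keys[:offsets[rows]]
	// Hypothetical mistake: child sized by the row count (2 instead of 3).
	byRowCount := keys[:rows]

	for i := 0; i < rows; i++ {
		got, err := readRow(offsets, byOffset, i)
		fmt.Println("sized by offset:   ", got, err)
		got, err = readRow(offsets, byRowCount, i)
		fmt.Println("sized by row count:", got, err)
	}
}
```

In this model, row 0 reads fine either way, while row 1 returns an error only for the child sized by row count. That matches the symptom of the panic appearing only when a row has multiple entries, but it is a guess at the failure mode and not a confirmed diagnosis.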

What Would Help

If you have any insights, here's what would be incredibly helpful:

  • Confirmation: Is this a known issue, or am I missing something?
  • Guidance: If it's a bug, a pointer to where I can start digging in the code would be awesome.
  • Example: If I'm using the libraries incorrectly, a simple example that demonstrates the correct way to append a map<string, string?> with more than one entry per row would be amazing.

I really appreciate you taking the time to read through this. I’m hoping we can get to the bottom of this together!

For more information on Apache Iceberg and its implementations, check out the official documentation on the Apache Iceberg website.
