Index name is not respected properly in staged writes #2124

Open
G-D-Petrov opened this issue Jan 16, 2025 · 1 comment

> The field count will also be `0` for range indexes. I just checked and range indexes can have a name in Pandas, but I don't know if we persist them. Can you check?

Originally posted by @alexowens90 in #2116 (comment)

@G-D-Petrov
Collaborator Author

```python
import pandas as pd
import pytest

# SchemaException, get_append_keys and the lmdb_version_store_tiny_segment_dynamic
# fixture are expected to come from the project's own test suite.


@pytest.mark.parametrize("append", (True, False))
@pytest.mark.parametrize("delete_staged_data_on_failure", [True, False])
def test_parallel_dynamic_schema_named_range_index(
    lmdb_version_store_tiny_segment_dynamic, append, delete_staged_data_on_failure
):
    lib = lmdb_version_store_tiny_segment_dynamic
    sym = "test_parallel_dynamic_schema_named_range_index"
    # Two frames with range indexes that differ only in the index name.
    df_0 = pd.DataFrame(
        {"col_0": [0], "col_1": [0.5]}, index=pd.RangeIndex(1)
    )
    df_0.index.name = "date"
    df_1 = pd.DataFrame({"col_0": [1]}, index=pd.RangeIndex(1))
    df_1.index.name = "index"
    if append:
        lib.write(sym, df_0)
        lib.append(sym, df_1, incomplete=True)
    else:
        lib.write(sym, df_0, parallel=True)
        lib.write(sym, df_1, parallel=True)

    # Compaction should reject the mismatched index names.
    with pytest.raises(SchemaException) as exception_info:
        lib.compact_incomplete(
            sym,
            append,
            False,
            delete_staged_data_on_failure=delete_staged_data_on_failure,
        )

    assert "date" in str(exception_info.value)
    staged_keys = 1 if append else 2
    expected_key_count = 0 if delete_staged_data_on_failure else staged_keys
    assert len(get_append_keys(lib, sym)) == expected_key_count
```

This test should pass: the two DataFrames to be compacted have different index names, so `compact_incomplete` should raise a `SchemaException`.
Currently compaction does not raise, because the index names of range indexes are not passed through to the C++ layer and only the name of the first one is stored in the normalization metadata.
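
To make the expected behaviour concrete, here is a minimal pure-Python sketch of the check described above. It is not ArcticDB's implementation: `StagedSegment`, `IndexNameMismatch` and `check_index_names` are hypothetical names invented for illustration, and the real check would live in the C++ compaction path. The only point it shows is that the index name of every staged segment takes part in the comparison, including segments with a range index.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class IndexNameMismatch(Exception):
    """Hypothetical error; stands in for the SchemaException the test expects."""


@dataclass(frozen=True)
class StagedSegment:
    """Hypothetical stand-in for one staged write and its index metadata."""

    index_name: Optional[str]
    is_range_index: bool


def check_index_names(segments: Sequence[StagedSegment]) -> Optional[str]:
    """Return the index name shared by all segments, or raise if they disagree."""
    if not segments:
        return None
    expected = segments[0].index_name
    for position, segment in enumerate(segments[1:], start=1):
        # Range indexes are compared too: they add no index column,
        # but they can still carry a name.
        if segment.index_name != expected:
            raise IndexNameMismatch(
                f"staged segment {position} has index name "
                f"{segment.index_name!r}, expected {expected!r}"
            )
    return expected


# Mirrors the two frames in the test: both range-indexed, different names.
staged = [StagedSegment("date", True), StagedSegment("index", True)]
try:
    check_index_names(staged)
except IndexNameMismatch as error:
    print(error)
```

With a check of this shape, the two frames from the test are rejected and the error message mentions `date`, which is what the test asserts on.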
