[Bug]: [null & default] The searched results number is larger than expected when search with expression "field_name == 0" on nullable field with None data without flush #37734

binbinlv · 2024-11-16T02:25:09Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: master-20241115-d1596297-amd64
- Deployment mode(standalone or cluster):both
- MQ type(rocksmq, pulsar or kafka):    all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc121
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The number of search results is larger than expected when searching with the expression "field_name == 0" on a nullable field that contains None data, without flushing first.

search_results_check: limit(topK) searched (10) is not equal with expected (1) (func_check.py:346)

Expected Behavior

The search should return 1 result, equal to the expected limit (1), so that search_results_check (func_check.py:346) passes.

Steps To Reproduce

    @pytest.mark.tags(CaseLabel.L1)
    # @pytest.mark.skip(reason="issue #37547")
    def test_search_none_data_expr_cache(self, is_flush):
        """
        target: test search case with none data to test expr cache
        method: 1. create collection with double datatype as nullable field
                2. search with expr "nullableFid == 0"
                3. drop this collection
                4. create collection with same collection name and same field name but modify the type of nullable field
                   as varchar datatype
                5. search with expr "nullableFid == 0" again
        expected: 1. search successfully with limit(topK) for the first collection
                  2. report error for the second collection with the same name
        """
        # 1. initialize with data
        collection_w, _, _, insert_ids, time_stamp = \
            self.init_collection_general(prefix, True, is_flush=is_flush)[0:5]
        collection_name = collection_w.name
        # 2. generate search data
        vectors = cf.gen_vectors_based_on_vector_type(default_nq, default_dim)
        # 3. search with expr "nullableFid == 0"
        search_exp = f"{ct.default_float_field_name} == 0"
        output_fields = [default_int64_field_name, default_float_field_name]
        collection_w.search(vectors[:default_nq], default_search_field,
                            default_search_params, default_limit,
                            search_exp,
                            output_fields=output_fields,
                            check_task=CheckTasks.check_search_results,
                            check_items={"nq": default_nq,
                                         "ids": insert_ids,
                                         "limit": 1,
                                         "output_fields": output_fields})
        # 4. drop collection
        collection_w.drop()
        # 5. create the same collection name with same field name but varchar field type
        int64_field = cf.gen_int64_field(is_primary=True)
        string_field = cf.gen_string_field(ct.default_float_field_name)
        json_field = cf.gen_json_field()
        float_vector_field = cf.gen_float_vec_field()
        fields = [int64_field, string_field, json_field, float_vector_field]
        schema = cf.gen_collection_schema(fields)
        collection_w = self.init_collection_wrap(name=collection_name, schema=schema)
        int64_values = pd.Series(data=[i for i in range(default_nb)])
        string_values = pd.Series(data=[str(i) for i in range(default_nb)], dtype="string")
        json_values = [{"number": i, "string": str(i), "bool": bool(i),
                        "list": [j for j in range(i, i + ct.default_json_list_length)]} for i in range(default_nb)]
        float_vec_values = cf.gen_vectors(default_nb, default_dim)
        df = pd.DataFrame({
            ct.default_int64_field_name: int64_values,
            ct.default_float_field_name: string_values,
            ct.default_json_field_name: json_values,
            ct.default_float_vec_field_name: float_vec_values
        })
        collection_w.insert(df)
        collection_w.create_index(ct.default_float_vec_field_name, ct.default_flat_index)
        collection_w.load()
        collection_w.flush()
        collection_w.search(vectors[:default_nq], default_search_field,
                            default_search_params, default_limit,
                            search_exp,
                            output_fields=output_fields,
                            check_task=CheckTasks.err_res,
                            check_items={"err_code": 1100,
                                         "err_msg": "failed to create query plan: cannot parse expression: float == 0, "
                                                    "error: comparisons between VarChar and Int64 are not supported: "
                                                    "invalid parameter"})

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&left=%7B%22datasource%22:%22Loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22test-null-master-cjqsw.*%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D

Anything else?

collection name: search_collection_euwUGGzx


binbinlv · 2024-11-16T02:26:02Z

If the search runs after a flush, the result is correct: 1 result is returned, not 10.
If the search uses "field_name == 1", the result is also correct: 1 result is returned, not 10.
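
One possible reading of the two observations above (correct after flush, correct for `== 1`) is that, before flush, null rows are compared through a zero placeholder without their null flag being checked. This is only a hypothesis for illustration and is not confirmed anywhere in this issue. The pure-Python sketch below does not use Milvus. `count_matches`, `respect_nulls`, and the sample data are hypothetical names and values, chosen only so that the counts resemble the reported 1 versus 10.

```python
from typing import Optional, Sequence


def count_matches(values: Sequence[Optional[float]], target: float,
                  respect_nulls: bool) -> int:
    """Count rows whose value equals `target`.

    Assumption for this sketch: a null row is stored as a zero placeholder
    plus a separate validity flag. With respect_nulls=False the flag is
    ignored, which models the hypothesized faulty evaluation.
    """
    count = 0
    for value in values:
        is_valid = value is not None
        stored = value if is_valid else 0.0  # placeholder for null rows
        if respect_nulls and not is_valid:
            continue  # a null row never satisfies an equality filter
        if stored == target:
            count += 1
    return count


# Hypothetical data: one real 0.0, one real 1.0, nine nulls, two other values.
values = [0.0, 1.0] + [None] * 9 + [2.0, 3.0]

expected_zero = count_matches(values, 0.0, respect_nulls=True)
faulty_zero = count_matches(values, 0.0, respect_nulls=False)
expected_one = count_matches(values, 1.0, respect_nulls=True)
faulty_one = count_matches(values, 1.0, respect_nulls=False)
```

Under this assumption only the `== 0` filter is inflated by null rows, while `== 1` gives the same count either way, which would be consistent with the behavior described in this comment.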

binbinlv · 2024-11-16T02:27:01Z

This issue was exposed by the same test case while verifying the crash issue #37547 (the crash itself is now fixed).

binbinlv added the kind/bug, triage/accepted, and 2.5-features labels Nov 16, 2024

binbinlv added this to the 2.5.0 milestone Nov 16, 2024

binbinlv assigned smellthemoon Nov 16, 2024
