
Can't infer Qwen2-1.5B with a lora #1186

Open
busishengui opened this issue Jan 15, 2025 · 2 comments

@busishengui

I am using the Qwen2-1.5B model with a LoRA adapter:

  • Convert Qwen2-1.5B:
python3 builder.py -m Qwen2-1.5B-Instruct -o Qwen2-1.5B-Instruct-onnx-int4 -p int4 -e cpu --extra_options int4_block_size=128 int4_accuracy_level=4 int4_op_types_to_quantize=MatMul/Gather
  • Convert the LoRA adapter:
python -m olive convert-adapters -a ./release --adapter_format onnx_adapter -o ./release --log_level 4

Inference code:

auto model = OgaModel::Create(path_model_dir.c_str());

// Load the converted adapter file and register it under the name "best_lora"
auto lora = OgaAdapters::Create(*model);
lora->LoadAdapter("release.onnx_adapter", "best_lora");

auto tokenizer = OgaTokenizer::Create(*model);
auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer);

auto params = OgaGeneratorParams::Create(*model);
params->SetSearchOption("max_length", 128);

// Tokenize the prompt and set it as the input sequence
auto seq = OgaSequences::Create();
tokenizer->Encode(query.c_str(), *seq);
params->SetInputSequences(*seq);

// Create the generator and activate the adapter for this generation
auto generator = OgaGenerator::Create(*model, *params);
generator->SetActiveAdapter(*lora, "best_lora");

// Decode token by token and accumulate the result
std::stringstream result_ss;
while (!generator->IsDone())
{
    generator->ComputeLogits();
    generator->GenerateNextToken();
    const auto num_tokens = generator->GetSequenceCount(0);
    const auto new_token = generator->GetSequenceData(0)[num_tokens - 1];
    result_ss << tokenizer_stream->Decode(new_token);
}

but it aborts and I don't know why:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid input name: model.layers.9.self_attn.v_proj.lora_B.weight
  • Versions
  1. onnxruntime-genai: 0.5.2
  2. olive-ai: 0.7.1.1
@ambroser53

The documentation for this is pretty bad, so I had the same issue. You have to convert the base model with Olive as well; otherwise it won't contain the empty LoRA nodes that expect the adapter parameters as graph inputs. Qwen is even more finicky: convert-adapters doesn't work with it at all, so you have to use auto-opt and re-export the base model every time, together with each set of adapters you want to convert. See my issue here to see what I mean.
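
As a sanity check, the exported model's graph inputs can be listed with the onnxruntime C++ API (a minimal sketch; the model file path below is an assumption — adjust it to whatever builder.py or auto-opt actually wrote). A base model exported without the adapter hooks will show no *.lora_A/*.lora_B.weight inputs, which is exactly what the "Invalid input name" error above is complaining about:

#include <iostream>
#include "onnxruntime_cxx_api.h"

int main() {
    // Assumed path: the output directory from builder.py / olive; adjust as needed.
    const char* model_path = "Qwen2-1.5B-Instruct-onnx-int4/model.onnx";

    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "check_lora_inputs");
    Ort::SessionOptions options;
    Ort::Session session(env, model_path, options);

    // Print every graph input; a LoRA-enabled export should list entries such as
    // model.layers.<n>.self_attn.v_proj.lora_B.weight alongside the usual input_ids.
    Ort::AllocatorWithDefaultOptions allocator;
    for (size_t i = 0; i < session.GetInputCount(); ++i) {
        auto name = session.GetInputNameAllocated(i, allocator);
        std::cout << name.get() << "\n";
    }
    return 0;
}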

Try this command:

olive auto-opt \
   --model_name_or_path Qwen2-1.5B-Instruct \
   --adapter_path ./release \
   --device cpu \
   --provider CPUExecutionProvider \
   --use_ort_genai \
   --output_path ./release \
   --log_level 4 --precision int4 --use_model_builder
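
If that export goes through, the inference code from the original post should work as written: point OgaModel::Create at the model olive writes under ./release and pass the generated .onnx_adapter file to OgaAdapters::LoadAdapter (the exact file names depend on what olive emits, so check the output directory first).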

@busishengui (Author)

(quoting @ambroser53's reply and the suggested auto-opt command above)

I have the same problem.
