
Can't infer Qwen2-1.5B with a lora #1186

Open
busishengui opened this issue Jan 15, 2025 · 2 comments

@busishengui

I am using the Qwen2-1.5B model with a LoRA adapter:

  • Convert Qwen2-1.5B:
python3 builder.py -m Qwen2-1.5B-Instruct -o Qwen2-1.5B-Instruct-onnx-int4 -p int4 -e cpu --extra_options int4_block_size=128 int4_accuracy_level=4 int4_op_types_to_quantize=MatMul/Gather
  • Convert the LoRA adapter:
python -m olive convert-adapters -a ./release --adapter_format onnx_adapter -o ./release --log_level 4

Inference code:

auto model = OgaModel::Create(path_model_dir.c_str());

// Load the converted adapter file and register it under the name "best_lora"
auto lora = OgaAdapters::Create(*model);
lora->LoadAdapter("release.onnx_adapter", "best_lora");

auto tokenizer = OgaTokenizer::Create(*model);
auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer);

auto params = OgaGeneratorParams::Create(*model);
params->SetSearchOption("max_length", 128);

// Tokenize the prompt and set it as the input sequence
auto seq = OgaSequences::Create();
tokenizer->Encode(query.c_str(), *seq);
params->SetInputSequences(*seq);

// Create the generator and activate the adapter for this generation
auto generator = OgaGenerator::Create(*model, *params);
generator->SetActiveAdapter(*lora, "best_lora");

// Decode token by token and accumulate the result
std::stringstream result_ss;
while (!generator->IsDone())
{
    generator->ComputeLogits();
    generator->GenerateNextToken();
    const auto num_tokens = generator->GetSequenceCount(0);
    const auto new_token = generator->GetSequenceData(0)[num_tokens - 1];
    result_ss << tokenizer_stream->Decode(new_token);
}

but it aborts and I don't know why:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid input name: model.layers.9.self_attn.v_proj.lora_B.weight
  • Versions
  1. onnxruntime-genai: 0.5.2
  2. olive-ai: 0.7.1.1
@ambroser53

The documentation for this is pretty bad, so I had the same issue. You have to convert the base model with Olive as well; otherwise it won't contain the empty LoRA nodes that expect the adapter parameters as graph inputs. Qwen is even more finicky: convert-adapters doesn't work with it at all, so you have to use auto-opt and re-export the base model every time, together with each set of adapters you want to convert. See my issue here to see what I mean.
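
As a sanity check, the exported model's graph inputs can be listed with the onnxruntime C++ API (a minimal sketch; the model file path below is an assumption — adjust it to whatever builder.py or auto-opt actually wrote). A base model exported without the adapter hooks will show no *.lora_A/*.lora_B.weight inputs, which is exactly what the "Invalid input name" error above is complaining about:

#include <iostream>
#include "onnxruntime_cxx_api.h"

int main() {
    // Assumed path: the output directory from builder.py / olive; adjust as needed.
    const char* model_path = "Qwen2-1.5B-Instruct-onnx-int4/model.onnx";

    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "check_lora_inputs");
    Ort::SessionOptions options;
    Ort::Session session(env, model_path, options);

    // Print every graph input; a LoRA-enabled export should list entries such as
    // model.layers.<n>.self_attn.v_proj.lora_B.weight alongside the usual input_ids.
    Ort::AllocatorWithDefaultOptions allocator;
    for (size_t i = 0; i < session.GetInputCount(); ++i) {
        auto name = session.GetInputNameAllocated(i, allocator);
        std::cout << name.get() << "\n";
    }
    return 0;
}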

Try this command:

olive auto-opt \
   --model_name_or_path Qwen2-1.5B-Instruct \
   --adapter_path ./release \
   --device cpu \
   --provider CPUExecutionProvider \
   --use_ort_genai \
   --output_path ./release \
   --log_level 4 --precision int4 --use_model_builder
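
If that export goes through, the inference code from the original post should work as written: point OgaModel::Create at the model olive writes under ./release and pass the generated .onnx_adapter file to OgaAdapters::LoadAdapter (the exact file names depend on what olive emits, so check the output directory first).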

@busishengui (Author)

(quoting @ambroser53's reply and the suggested auto-opt command above)

I have the same problem.
