# qmd-query-expansion-1.7B-ONNX
ONNX version of tobil/qmd-query-expansion-1.7B for use with Transformers.js v4 and WebGPU.
This is a fine-tuned Qwen3-1.7B model for query expansion in QMD (Query Markup Documents). Given a search query, it generates three types of expanded queries:
- `lex` → BM25-optimized keywords
- `vec` → dense-retrieval sentences for semantic search
- `hyde` → a Hypothetical Document Embedding passage (a synthetic document)
## Usage with Transformers.js
```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "shreyask/qmd-query-expansion-1.7B-ONNX",
  { dtype: "q4", device: "webgpu" },
);

const output = await generator("/no_think Expand this search query: API versioning", {
  max_new_tokens: 256,
  do_sample: false,
});

console.log(output[0].generated_text);
// Expected output:
// lex: API versioning best practices REST version management
// vec: How to implement API versioning strategies in REST APIs
// hyde: This document discusses API versioning approaches including URL path versioning...
```
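The three labeled output lines are easy to split back into structured fields. A minimal sketch (the `parseExpansion` helper below is an illustration, not part of the model or library):

```js
// Split the model's "lex:/vec:/hyde:" output lines into an object.
function parseExpansion(text) {
  const result = { lex: "", vec: "", hyde: "" };
  for (const line of text.split("\n")) {
    const match = line.match(/^(lex|vec|hyde):\s*(.*)$/);
    if (match) result[match[1]] = match[2];
  }
  return result;
}

const parsed = parseExpansion(
  "lex: API versioning best practices REST version management\n" +
  "vec: How to implement API versioning strategies in REST APIs\n" +
  "hyde: This document discusses API versioning approaches..."
);
console.log(parsed.lex); // "API versioning best practices REST version management"
```

Each field can then be routed to the matching retriever: `lex` to BM25, `vec` to the dense encoder, and `hyde` embedded as a pseudo-document.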
## Model Details
- Base model: Qwen/Qwen3-1.7B (fine-tuned by tobil)
- ONNX variant: Q4 (4-bit MatMulNBits, block size 32)
- Size: ~2.2 GB
- Exported with: optimum + onnxruntime MatMulNBitsQuantizer
- Intended use: In-browser query expansion for search pipelines
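A back-of-envelope check of the ~2.2 GB figure, assuming Qwen3-1.7B's published shapes (vocab 151936, hidden size 2048, roughly 1.4B non-embedding parameters) and that the embedding and lm_head matrices stay at fp16 while everything else is packed to 4 bits with one fp16 scale per 32-weight block. These are estimates, not exact file sizes:

```js
// Rough size estimate for the Q4 export (symmetric quantization, block size 32).
const nonEmbeddingParams = 1.4e9;        // assumed: Qwen3-1.7B weights that get quantized
const embeddingParams = 151936 * 2048;   // assumed: token embedding matrix shape

const packedWeights = nonEmbeddingParams * 0.5;   // 4 bits per weight = 0.5 bytes
const scales = (nonEmbeddingParams / 32) * 2;     // one fp16 scale per 32-weight block
const embeddings = 2 * embeddingParams * 2;       // embed + lm_head copies kept at fp16

const totalGB = (packedWeights + scales + embeddings) / 1e9;
console.log(totalGB.toFixed(2)); // roughly 2 GB, consistent with the ~2.2 GB checkpoint
```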
## Conversion
Exported from the Hugging Face Transformers checkpoint using `optimum-cli`, then quantized to Q4 with onnxruntime's `MatMulNBitsQuantizer`:

```bash
optimum-cli export onnx -m tobil/qmd-query-expansion-1.7B --task text-generation output/
```

```python
import onnx
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

# Quantize MatMul weights to 4-bit blocks of 32, symmetric.
quant = MatMulNBitsQuantizer(
    model=onnx.load("output/model.onnx"),
    bits=4,
    block_size=32,
    is_symmetric=True,
    accuracy_level=4,
)
quant.process()
quant.model.save_model_to_file("onnx/model_q4.onnx", use_external_data_format=True)
```
## Credits
- QMD by Tobi Lutke
- ONNX conversion by shreyask