qmd-query-expansion-1.7B-ONNX

ONNX version of tobil/qmd-query-expansion-1.7B for use with Transformers.js v4 and WebGPU.

This is a fine-tuned Qwen3-1.7B model for query expansion in QMD (Query Markup Documents). Given a search query, it generates three types of expanded queries:

  • lex: BM25-optimized keywords
  • vec: dense retrieval sentences for semantic search
  • hyde: Hypothetical Document Embedding (a synthetic passage)

Usage with Transformers.js

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "shreyask/qmd-query-expansion-1.7B-ONNX",
  { dtype: "q4", device: "webgpu" }
);

const output = await generator("/no_think Expand this search query: API versioning", {
  max_new_tokens: 256,
  do_sample: false,
});

console.log(output[0].generated_text);
// Expected output:
// lex: API versioning best practices REST version management
// vec: How to implement API versioning strategies in REST APIs
// hyde: This document discusses API versioning approaches including URL path versioning...
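The model emits its three expansions as labeled lines, so downstream code typically needs to split them apart. A minimal sketch of such a parser (the `parseExpansion` helper is hypothetical, not part of the model repo):

```javascript
// Split the model's labeled output lines into lex/vec/hyde fields.
// Lines that don't match a known label are ignored.
function parseExpansion(text) {
  const result = { lex: "", vec: "", hyde: "" };
  for (const line of text.split("\n")) {
    const match = line.match(/^(lex|vec|hyde):\s*(.*)$/);
    if (match) result[match[1]] = match[2];
  }
  return result;
}

const parsed = parseExpansion(
  "lex: API versioning best practices\n" +
  "vec: How to implement API versioning strategies\n" +
  "hyde: This document discusses API versioning approaches..."
);
console.log(parsed.lex); // "API versioning best practices"
```

Each field can then be routed to the matching retriever (lex to BM25, vec and hyde to the embedding index).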

Model Details

  • Base model: Qwen/Qwen3-1.7B (fine-tuned by tobil)
  • ONNX variant: Q4 (4-bit MatMulNBits, block size 32)
  • Size: ~2.2 GB
  • Exported with: optimum + onnxruntime MatMulNBitsQuantizer
  • Intended use: In-browser query expansion for search pipelines

Conversion

Exported from the Hugging Face Transformers checkpoint using optimum-cli:

optimum-cli export onnx -m tobil/qmd-query-expansion-1.7B --task text-generation output/

Then quantized to Q4 with onnxruntime:

from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

quant = MatMulNBitsQuantizer(
    model="output/model.onnx", bits=4, block_size=32,
    is_symmetric=True, accuracy_level=4,
)
quant.process()
quant.model.save_model_to_file("onnx/model_q4.onnx", use_external_data_format=True)
