# qmd-query-expansion-1.7B-ONNX
ONNX version of tobil/qmd-query-expansion-1.7B for use with Transformers.js v4 and WebGPU.
This is a fine-tuned Qwen3-1.7B model for query expansion in QMD (Query Markup Documents). Given a search query, it generates three types of expanded queries:
- `lex` → BM25-optimized keywords
- `vec` → dense-retrieval sentences for semantic search
- `hyde` → a Hypothetical Document Embedding passage (a synthetic document)
## Usage with Transformers.js
```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "shreyask/qmd-query-expansion-1.7B-ONNX",
  { dtype: "q4", device: "webgpu" },
);

const output = await generator("/no_think Expand this search query: API versioning", {
  max_new_tokens: 256,
  do_sample: false,
});

console.log(output[0].generated_text);
// Expected output:
// lex: API versioning best practices REST version management
// vec: How to implement API versioning strategies in REST APIs
// hyde: This document discusses API versioning approaches including URL path versioning...
```
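The three labeled output lines are easy to split back into structured fields. A minimal sketch (the `parseExpansion` helper below is an illustration, not part of the model or library):

```js
// Split the model's "lex:/vec:/hyde:" output lines into an object.
function parseExpansion(text) {
  const result = { lex: "", vec: "", hyde: "" };
  for (const line of text.split("\n")) {
    const match = line.match(/^(lex|vec|hyde):\s*(.*)$/);
    if (match) result[match[1]] = match[2];
  }
  return result;
}

const parsed = parseExpansion(
  "lex: API versioning best practices REST version management\n" +
  "vec: How to implement API versioning strategies in REST APIs\n" +
  "hyde: This document discusses API versioning approaches..."
);
console.log(parsed.lex); // "API versioning best practices REST version management"
```

Each field can then be routed to the matching retriever: `lex` to BM25, `vec` to the dense encoder, and `hyde` embedded as a pseudo-document.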
## Model Details
- Base model: Qwen/Qwen3-1.7B (fine-tuned by tobil)
- ONNX variant: Q4 (4-bit MatMulNBits, block size 32)
- Size: ~2.2 GB
- Exported with: optimum + onnxruntime MatMulNBitsQuantizer
- Intended use: In-browser query expansion for search pipelines
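A back-of-envelope check of the ~2.2 GB figure, assuming Qwen3-1.7B's published shapes (vocab 151936, hidden size 2048, roughly 1.4B non-embedding parameters) and that the embedding and lm_head matrices stay at fp16 while everything else is packed to 4 bits with one fp16 scale per 32-weight block. These are estimates, not exact file sizes:

```js
// Rough size estimate for the Q4 export (symmetric quantization, block size 32).
const nonEmbeddingParams = 1.4e9;        // assumed: Qwen3-1.7B weights that get quantized
const embeddingParams = 151936 * 2048;   // assumed: token embedding matrix shape

const packedWeights = nonEmbeddingParams * 0.5;   // 4 bits per weight = 0.5 bytes
const scales = (nonEmbeddingParams / 32) * 2;     // one fp16 scale per 32-weight block
const embeddings = 2 * embeddingParams * 2;       // embed + lm_head copies kept at fp16

const totalGB = (packedWeights + scales + embeddings) / 1e9;
console.log(totalGB.toFixed(2)); // roughly 2 GB, consistent with the ~2.2 GB checkpoint
```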
## Conversion
Exported from the Hugging Face Transformers checkpoint using `optimum-cli`, then quantized to Q4 with onnxruntime's `MatMulNBitsQuantizer`:

```bash
optimum-cli export onnx -m tobil/qmd-query-expansion-1.7B --task text-generation output/
```

```python
import onnx
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

# Quantize MatMul weights to 4-bit blocks of 32, symmetric.
quant = MatMulNBitsQuantizer(
    model=onnx.load("output/model.onnx"),
    bits=4,
    block_size=32,
    is_symmetric=True,
    accuracy_level=4,
)
quant.process()
quant.model.save_model_to_file("onnx/model_q4.onnx", use_external_data_format=True)
```
## Credits
- QMD by Tobi Lutke
- ONNX conversion by shreyask