Initial commit

Files changed (10)

.gitattributes +1 -0
README.md +611 -3
added_tokens.json +28 -0
config.json +71 -0
merges.txt +0 -0
model.safetensors +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +239 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,611 @@
----
-license: apache-2.0
----

+---
+tags:
+- sentence-transformers
+- cross-encoder
+- reranker
+- generated_from_trainer
+- dataset_size:1792739
+- loss:CachedMultipleNegativesRankingLoss
+base_model: tomaarsen/Qwen3-Reranker-0.6B-seq-cls
+pipeline_tag: text-ranking
+library_name: sentence-transformers
+---
+# CrossEncoder based on tomaarsen/Qwen3-Reranker-0.6B-seq-cls
+This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [tomaarsen/Qwen3-Reranker-0.6B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls) on the json dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+## Model Details
+### Model Description
+- **Model Type:** Cross Encoder
+- **Base model:** [tomaarsen/Qwen3-Reranker-0.6B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls) <!-- at revision 6a5829f5079c66e78d911e06fe21931cc00232f7 -->
+- **Maximum Sequence Length:** 40960 tokens
+- **Number of Output Labels:** 1 label
+- **Training Dataset:**
+    - json
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import CrossEncoder
+# Download from the 🤗 Hub
+model = CrossEncoder("cross_encoder_model_id")
+# Get scores for pairs of texts
+pairs = [
+    ['<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: ATP란?\n', '<Document>: 아데노신 삼인산 아데노신 삼인산(, ATP)은 생명체의 주된 에너지원이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'],
+    ['<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: 난촨구와 둥촨구는 어느 나라에 위치해 있습니까?\n', '<Document>: 난촨구(南川区)는 중국 충칭의 구이자 이전의 현이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'],
+    ['<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: 그저우와 헤이룽장성 동닝은 어떤 나라와 접경하고 있습니까?\n', '<Document>: 허주(贺州)는 중화인민공화국 광시 좡족 자치구 북동부에 위치한 지급시이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'],
+    ['<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: 가짜대나무(Pseudosasa)와 별꽃(Cerastium)은 모두 자생 식물과 관련이 있습니까?\n', '<Document>: 가짜사사(Pseudosasa)는 풀과에 속하는 동아시아 대나무의 속입니다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'],
+    ['<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: 샤허(Shahhe), 허베이(河北)와 조청(邹城)은 모두 현급 도시인가요?\n', '<Document>: 샤허(Shahe)는 중국 허베이성의 남부에 위치한 싱타이(Xingtai) 지구의 군급 도시입니다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'],
+]
+scores = model.predict(pairs)
+print(scores.shape)
+# (5,)
+# Or rank different texts based on similarity to a single text
+ranks = model.rank(
+    '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: ATP란?\n',
+    [
+        '<Document>: 아데노신 삼인산 아데노신 삼인산(, ATP)은 생명체의 주된 에너지원이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n',
+        '<Document>: 난촨구(南川区)는 중국 충칭의 구이자 이전의 현이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n',
+        '<Document>: 허주(贺州)는 중화인민공화국 광시 좡족 자치구 북동부에 위치한 지급시이다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n',
+        '<Document>: 가짜사사(Pseudosasa)는 풀과에 속하는 동아시아 대나무의 속입니다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n',
+        '<Document>: 샤허(Shahe)는 중국 허베이성의 남부에 위치한 싱타이(Xingtai) 지구의 군급 도시입니다.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n',
+    ]
+)
+# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### json
+* Dataset: json
+* Size: 1,792,739 training samples
+* Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, and <code>negative_3</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | query                                                                                             | positive                                                                                         | negative_1                                                                                       | negative_2                                                                                      | negative_3                                                                                       |
+  |:--------|:--------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
+  | type    | string                                                                                            | string                                                                                           | string                                                                                           | string                                                                                          | string                                                                                           |
+  | details | <ul><li>min: 289 characters</li><li>mean: 317.46 characters</li><li>max: 406 characters</li></ul> | <ul><li>min: 90 characters</li><li>mean: 154.19 characters</li><li>max: 184 characters</li></ul> | <ul><li>min: 72 characters</li><li>mean: 149.13 characters</li><li>max: 184 characters</li></ul> | <ul><li>min: 79 characters</li><li>mean: 148.5 characters</li><li>max: 184 characters</li></ul> | <ul><li>min: 70 characters</li><li>mean: 149.09 characters</li><li>max: 184 characters</li></ul> |
+* Samples:
+  | query                                                                                                                                                                                                                                                                                                                                                       | positive                                                                                                                                 | negative_1                                                                                                                                  | negative_2                                                                                                                                       | negative_3                                                                                                                                        |
+  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code><|im_start|>system<br>Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|><br><|im_start|>user<br><Instruct>: Given a web search query, retrieve relevant passages that answer the query<br><Query>: ATP란?<br></code>                            | <code><Document>: 아데노신 삼인산 아데노신 삼인산(, ATP)은 생명체의 주된 에너지원이다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code> | <code><Document>: ATP ATP는 다음 뜻의 약자이다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>                         | <code><Document>: 해당 실제로 ADP는 ADPMg로, ATP는 ATPMg로 존재한다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>             | <code><Document>: ATE ATE는 다음을 가리킨다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>                                 |
+  | <code><|im_start|>system<br>Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|><br><|im_start|>user<br><Instruct>: Given a web search query, retrieve relevant passages that answer the query<br><Query>: 난촨구와 둥촨구는 어느 나라에 위치해 있습니까?<br></code>       | <code><Document>: 난촨구(南川区)는 중국 충칭의 구이자 이전의 현이다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>             | <code><Document>: 남풍현(南丰县)은 중국 장시성(江西省) 푸저우(福州)에 위치한 군이다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>      | <code><Document>: 도교, 광둥 도교(道滘)는 중국 남부 광둥성 동관 시의 관할 하에 있는 도시입니다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>    | <code><Document>: 동포구 동포구는 중국 쓰촨성의 구역입니다. 이곳은 메이산시의 관할 하에 있습니다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code>      |
+  | <code><|im_start|>system<br>Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|><br><|im_start|>user<br><Instruct>: Given a web search query, retrieve relevant passages that answer the query<br><Query>: 그저우와 헤이룽장성 동닝은 어떤 나라와 접경하고 있습니까?<br></code> | <code><Document>: 허주(贺州)는 중화인민공화국 광시 좡족 자치구 북동부에 위치한 지급시이다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code> | <code><Document>: 지관구(지관구)는 중국 인민공화국 헤이룽장성 지시시의 구이자 시청 소재지입니다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code> | <code><Document>: 헤동 가도(河东街道)는 중국 광시(广西) 리우저우(柳州) 청중 구(城中区)의 가도입니다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code> | <code><Document>: 화닝현 (华宁县; 병음: Huáníng Xiàn)은 중국 윈난성 유시시에 위치해 있습니다.<|im_end|><br><|im_start|>assistant<br><think><br><br></think><br><br></code> |
+* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 15,
+      "num_negatives": 61,
+      "activation_fn": "torch.nn.modules.activation.Sigmoid",
+      "mini_batch_size": 4
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `per_device_train_batch_size`: 1024
+- `per_device_eval_batch_size`: 32
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.05
+- `bf16`: True
+- `ddp_find_unused_parameters`: True
+- `ddp_timeout`: 7200
+- `batch_sampler`: no_duplicates
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 1024
+- `per_device_eval_batch_size`: 32
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.05
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: True
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: True
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 7200
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+<details><summary>Click to expand</summary>
+| Epoch  | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 0.0034 | 1    | 1.2714        |
+| 0.0069 | 2    | 1.3902        |
+| 0.0103 | 3    | 1.3308        |
+| 0.0137 | 4    | 1.2726        |
+| 0.0172 | 5    | 1.2519        |
+| 0.0206 | 6    | 1.1254        |
+| 0.0241 | 7    | 0.9001        |
+| 0.0275 | 8    | 0.7529        |
+| 0.0309 | 9    | 0.9942        |
+| 0.0344 | 10   | 0.8769        |
+| 0.0378 | 11   | 0.6895        |
+| 0.0412 | 12   | 0.6813        |
+| 0.0447 | 13   | 0.6841        |
+| 0.0481 | 14   | 0.6025        |
+| 0.0515 | 15   | 0.619         |
+| 0.0550 | 16   | 0.6005        |
+| 0.0584 | 17   | 0.5917        |
+| 0.0619 | 18   | 0.5658        |
+| 0.0653 | 19   | 0.5571        |
+| 0.0687 | 20   | 0.5411        |
+| 0.0722 | 21   | 0.5374        |
+| 0.0756 | 22   | 0.5304        |
+| 0.0790 | 23   | 0.5103        |
+| 0.0825 | 24   | 0.5184        |
+| 0.0859 | 25   | 0.5036        |
+| 0.0893 | 26   | 0.5213        |
+| 0.0928 | 27   | 0.5399        |
+| 0.0962 | 28   | 0.5414        |
+| 0.0997 | 29   | 0.5177        |
+| 0.1031 | 30   | 0.5248        |
+| 0.1065 | 31   | 0.5196        |
+| 0.1100 | 32   | 0.499         |
+| 0.1134 | 33   | 0.514         |
+| 0.1168 | 34   | 0.5154        |
+| 0.1203 | 35   | 0.5114        |
+| 0.1237 | 36   | 0.508         |
+| 0.1271 | 37   | 0.5117        |
+| 0.1306 | 38   | 0.495         |
+| 0.1340 | 39   | 0.5304        |
+| 0.1375 | 40   | 0.4956        |
+| 0.1409 | 41   | 0.5274        |
+| 0.1443 | 42   | 0.5181        |
+| 0.1478 | 43   | 0.5103        |
+| 0.1512 | 44   | 0.5116        |
+| 0.1546 | 45   | 0.499         |
+| 0.1581 | 46   | 0.5072        |
+| 0.1615 | 47   | 0.5044        |
+| 0.1649 | 48   | 0.5071        |
+| 0.1684 | 49   | 0.5129        |
+| 0.1718 | 50   | 0.5095        |
+| 0.1753 | 51   | 0.5174        |
+| 0.1787 | 52   | 0.4748        |
+| 0.1821 | 53   | 0.4507        |
+| 0.1856 | 54   | 0.4927        |
+| 0.1890 | 55   | 0.452         |
+| 0.1924 | 56   | 0.4999        |
+| 0.1959 | 57   | 0.4744        |
+| 0.1993 | 58   | 0.4486        |
+| 0.2027 | 59   | 0.4725        |
+| 0.2062 | 60   | 0.4723        |
+| 0.2096 | 61   | 0.4747        |
+| 0.2131 | 62   | 0.4317        |
+| 0.2165 | 63   | 0.4668        |
+| 0.2199 | 64   | 0.453         |
+| 0.2234 | 65   | 0.4457        |
+| 0.2268 | 66   | 0.4179        |
+| 0.2302 | 67   | 0.4124        |
+| 0.2337 | 68   | 0.4454        |
+| 0.2371 | 69   | 0.4222        |
+| 0.2405 | 70   | 0.4151        |
+| 0.2440 | 71   | 0.4172        |
+| 0.2474 | 72   | 0.422         |
+| 0.2509 | 73   | 0.4088        |
+| 0.2543 | 74   | 0.4107        |
+| 0.2577 | 75   | 0.3977        |
+| 0.2612 | 76   | 0.4141        |
+| 0.2646 | 77   | 0.3991        |
+| 0.2680 | 78   | 0.3955        |
+| 0.2715 | 79   | 0.3864        |
+| 0.2749 | 80   | 0.4147        |
+| 0.2784 | 81   | 0.4084        |
+| 0.2818 | 82   | 0.4139        |
+| 0.2852 | 83   | 0.3999        |
+| 0.2887 | 84   | 0.4305        |
+| 0.2921 | 85   | 0.4188        |
+| 0.2955 | 86   | 0.4171        |
+| 0.2990 | 87   | 0.407         |
+| 0.3024 | 88   | 0.3871        |
+| 0.3058 | 89   | 0.389         |
+| 0.3093 | 90   | 0.3813        |
+| 0.3127 | 91   | 0.3814        |
+| 0.3162 | 92   | 0.3732        |
+| 0.3196 | 93   | 0.3899        |
+| 0.3230 | 94   | 0.3655        |
+| 0.3265 | 95   | 0.3638        |
+| 0.3299 | 96   | 0.3784        |
+| 0.3333 | 97   | 0.3729        |
+| 0.3368 | 98   | 0.3665        |
+| 0.3402 | 99   | 0.3579        |
+| 0.3436 | 100  | 0.3414        |
+| 0.3471 | 101  | 0.3304        |
+| 0.3505 | 102  | 0.347         |
+| 0.3540 | 103  | 0.3076        |
+| 0.3574 | 104  | 0.3111        |
+| 0.3608 | 105  | 0.3121        |
+| 0.3643 | 106  | 0.3272        |
+| 0.3677 | 107  | 0.3108        |
+| 0.3711 | 108  | 0.3092        |
+| 0.3746 | 109  | 0.2951        |
+| 0.3780 | 110  | 0.3195        |
+| 0.3814 | 111  | 0.2915        |
+| 0.3849 | 112  | 0.2855        |
+| 0.3883 | 113  | 0.2904        |
+| 0.3918 | 114  | 0.2873        |
+| 0.3952 | 115  | 0.273         |
+| 0.3986 | 116  | 0.2779        |
+| 0.4021 | 117  | 0.2939        |
+| 0.4055 | 118  | 0.276         |
+| 0.4089 | 119  | 0.2535        |
+| 0.4124 | 120  | 0.2774        |
+| 0.4158 | 121  | 0.2597        |
+| 0.4192 | 122  | 0.2541        |
+| 0.4227 | 123  | 0.2587        |
+| 0.4261 | 124  | 0.27          |
+| 0.4296 | 125  | 0.2724        |
+| 0.4330 | 126  | 0.2446        |
+| 0.4364 | 127  | 0.2747        |
+| 0.4399 | 128  | 0.268         |
+| 0.4433 | 129  | 0.2585        |
+| 0.4467 | 130  | 0.2652        |
+| 0.4502 | 131  | 0.2685        |
+| 0.4536 | 132  | 0.2565        |
+| 0.4570 | 133  | 0.2503        |
+| 0.4605 | 134  | 0.2634        |
+| 0.4639 | 135  | 0.2501        |
+| 0.4674 | 136  | 0.2479        |
+| 0.4708 | 137  | 0.2628        |
+| 0.4742 | 138  | 0.2505        |
+| 0.4777 | 139  | 0.2468        |
+| 0.4811 | 140  | 0.2365        |
+| 0.4845 | 141  | 0.2496        |
+| 0.4880 | 142  | 0.248         |
+| 0.4914 | 143  | 0.2604        |
+| 0.4948 | 144  | 0.2477        |
+| 0.4983 | 145  | 0.259         |
+| 0.5017 | 146  | 0.2556        |
+| 0.5052 | 147  | 0.2618        |
+| 0.5086 | 148  | 0.2583        |
+| 0.5120 | 149  | 0.2588        |
+| 0.5155 | 150  | 0.2468        |
+| 0.5189 | 151  | 0.2437        |
+| 0.5223 | 152  | 0.2595        |
+| 0.5258 | 153  | 0.2647        |
+| 0.5292 | 154  | 0.2699        |
+| 0.5326 | 155  | 0.2529        |
+| 0.5361 | 156  | 0.2339        |
+| 0.5395 | 157  | 0.2557        |
+| 0.5430 | 158  | 0.2402        |
+| 0.5464 | 159  | 0.2583        |
+| 0.5498 | 160  | 0.2688        |
+| 0.5533 | 161  | 0.2567        |
+| 0.5567 | 162  | 0.2702        |
+| 0.5601 | 163  | 0.2669        |
+| 0.5636 | 164  | 0.2699        |
+| 0.5670 | 165  | 0.2561        |
+| 0.5704 | 166  | 0.2406        |
+| 0.5739 | 167  | 0.2438        |
+| 0.5773 | 168  | 0.2523        |
+| 0.5808 | 169  | 0.2535        |
+| 0.5842 | 170  | 0.2533        |
+| 0.5876 | 171  | 0.2643        |
+| 0.5911 | 172  | 0.2684        |
+| 0.5945 | 173  | 0.2503        |
+| 0.5979 | 174  | 0.2735        |
+| 0.6014 | 175  | 0.2612        |
+| 0.6048 | 176  | 0.2721        |
+| 0.6082 | 177  | 0.2533        |
+| 0.6117 | 178  | 0.2704        |
+| 0.6151 | 179  | 0.2609        |
+| 0.6186 | 180  | 0.2605        |
+| 0.6220 | 181  | 0.2664        |
+| 0.6254 | 182  | 0.2516        |
+| 0.6289 | 183  | 0.2513        |
+| 0.6323 | 184  | 0.2439        |
+| 0.6357 | 185  | 0.258         |
+| 0.6392 | 186  | 0.2534        |
+| 0.6426 | 187  | 0.2638        |
+| 0.6460 | 188  | 0.2535        |
+| 0.6495 | 189  | 0.2481        |
+| 0.6529 | 190  | 0.264         |
+| 0.6564 | 191  | 0.2418        |
+| 0.6598 | 192  | 0.2326        |
+| 0.6632 | 193  | 0.2476        |
+| 0.6667 | 194  | 0.2271        |
+| 0.6701 | 195  | 0.229         |
+| 0.6735 | 196  | 0.2303        |
+| 0.6770 | 197  | 0.2272        |
+| 0.6804 | 198  | 0.2309        |
+| 0.6838 | 199  | 0.2159        |
+| 0.6873 | 200  | 0.2178        |
+| 0.6907 | 201  | 0.208         |
+| 0.6942 | 202  | 0.2257        |
+| 0.6976 | 203  | 0.2032        |
+| 0.7010 | 204  | 0.2047        |
+| 0.7045 | 205  | 0.2223        |
+| 0.7079 | 206  | 0.1964        |
+| 0.7113 | 207  | 0.1846        |
+| 0.7148 | 208  | 0.1899        |
+| 0.7182 | 209  | 0.1986        |
+| 0.7216 | 210  | 0.1898        |
+| 0.7251 | 211  | 0.1999        |
+| 0.7285 | 212  | 0.1754        |
+| 0.7320 | 213  | 0.1912        |
+| 0.7354 | 214  | 0.1702        |
+| 0.7388 | 215  | 0.17          |
+| 0.7423 | 216  | 0.1768        |
+| 0.7457 | 217  | 0.1647        |
+| 0.7491 | 218  | 0.1711        |
+| 0.7526 | 219  | 0.1507        |
+| 0.7560 | 220  | 0.1657        |
+| 0.7595 | 221  | 0.1498        |
+| 0.7629 | 222  | 0.1557        |
+| 0.7663 | 223  | 0.1651        |
+| 0.7698 | 224  | 0.1446        |
+| 0.7732 | 225  | 0.1519        |
+| 0.7766 | 226  | 0.1453        |
+| 0.7801 | 227  | 0.1561        |
+| 0.7835 | 228  | 0.1557        |
+| 0.7869 | 229  | 0.1493        |
+| 0.7904 | 230  | 0.1476        |
+| 0.7938 | 231  | 0.1453        |
+| 0.7973 | 232  | 0.1312        |
+| 0.8007 | 233  | 0.1531        |
+| 0.8041 | 234  | 0.1498        |
+| 0.8076 | 235  | 0.134         |
+| 0.8110 | 236  | 0.1361        |
+| 0.8144 | 237  | 0.1461        |
+| 0.8179 | 238  | 0.148         |
+| 0.8213 | 239  | 0.1465        |
+| 0.8247 | 240  | 0.1452        |
+| 0.8282 | 241  | 0.1399        |
+| 0.8316 | 242  | 0.1291        |
+| 0.8351 | 243  | 0.1354        |
+| 0.8385 | 244  | 0.1719        |
+| 0.8419 | 245  | 0.1555        |
+| 0.8454 | 246  | 0.1472        |
+| 0.8488 | 247  | 0.1516        |
+| 0.8522 | 248  | 0.1579        |
+| 0.8557 | 249  | 0.161         |
+| 0.8591 | 250  | 0.1661        |
+| 0.8625 | 251  | 0.155         |
+| 0.8660 | 252  | 0.1706        |
+| 0.8694 | 253  | 0.1527        |
+| 0.8729 | 254  | 0.1695        |
+| 0.8763 | 255  | 0.1904        |
+| 0.8797 | 256  | 0.186         |
+| 0.8832 | 257  | 0.1723        |
+| 0.8866 | 258  | 0.1881        |
+| 0.8900 | 259  | 0.1915        |
+| 0.8935 | 260  | 0.1969        |
+| 0.8969 | 261  | 0.1967        |
+| 0.9003 | 262  | 0.2038        |
+| 0.9038 | 263  | 0.1917        |
+| 0.9072 | 264  | 0.19          |
+| 0.9107 | 265  | 0.2161        |
+| 0.9141 | 266  | 0.222         |
+| 0.9175 | 267  | 0.2361        |
+| 0.9210 | 268  | 0.2538        |
+| 0.9244 | 269  | 0.2408        |
+| 0.9278 | 270  | 0.2372        |
+| 0.9313 | 271  | 0.2292        |
+| 0.9347 | 272  | 0.238         |
+| 0.9381 | 273  | 0.2243        |
+| 0.9416 | 274  | 0.2443        |
+| 0.9450 | 275  | 0.2435        |
+| 0.9485 | 276  | 0.2476        |
+| 0.9519 | 277  | 0.2259        |
+| 0.9553 | 278  | 0.2327        |
+| 0.9588 | 279  | 0.2345        |
+| 0.9622 | 280  | 0.2413        |
+</details>
+### Framework Versions
+- Python: 3.11.12
+- Sentence Transformers: 5.0.0
+- Transformers: 4.53.1
+- PyTorch: 2.8.0+cu128
+- Accelerate: 1.5.2
+- Datasets: 2.21.0
+- Tokenizers: 0.21.1
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
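
The README usage example above passes each pair as two pre-formatted strings: a query half (system turn, instruction, and query) and a document half (document text followed by an empty assistant `<think>` block). A minimal sketch of building those halves from plain text follows. The helper and constant names are hypothetical and are not part of sentence-transformers or this repository; the template text is copied from the README example.

```python
# Hypothetical helpers: names are illustrative, not part of any library.
# The template strings are copied from the README usage example.
SYSTEM_PROMPT = (
    "Judge whether the Document meets the requirements based on the Query "
    'and the Instruct provided. Note that the answer can only be "yes" or "no".'
)
DEFAULT_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def format_query(query: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Build the first element of a pair: system turn plus the start of the user turn."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n<Instruct>: {instruction}\n<Query>: {query}\n"
    )


def format_document(document: str) -> str:
    """Build the second element: the document, then an empty assistant think block."""
    return (
        f"<Document>: {document}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n\n</think>\n\n"
    )


def build_pairs(query: str, documents: list[str]) -> list[list[str]]:
    """Pair one formatted query with each formatted document."""
    formatted_query = format_query(query)
    return [[formatted_query, format_document(doc)] for doc in documents]


pairs = build_pairs("ATP란?", ["아데노신 삼인산 아데노신 삼인산(, ATP)은 생명체의 주된 에너지원이다."])
```

The resulting `pairs` list has the same shape as the one passed to `model.predict(pairs)` in the README, so it should be usable there, assuming the model expects exactly this template.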

added_tokens.json ADDED

@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
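
The IDs in added_tokens.json can be cross-checked against config.json from the same commit. The sketch below is a sanity check under the assumption that the added tokens occupy the top of the vocabulary; the dictionary is copied from the file above, and the variable names are illustrative.

```python
# Token-to-ID mapping copied from added_tokens.json in this commit.
added_tokens = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

# Values copied from config.json in this commit.
config_vocab_size = 151669
config_pad_token_id = 151643
config_eos_token_id = 151645

ids = sorted(added_tokens.values())

# The added tokens should form one unbroken ID range with no gaps or repeats.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))

# If the added tokens sit at the top of the vocabulary, the highest ID is
# vocab_size - 1.
matches_vocab_size = ids[-1] + 1 == config_vocab_size

# pad and eos IDs in config.json should map back to the expected token strings.
id_to_token = {token_id: token for token, token_id in added_tokens.items()}
pad_token = id_to_token[config_pad_token_id]
eos_token = id_to_token[config_eos_token_id]

print(is_contiguous, matches_vocab_size, pad_token, eos_token)
```

The pad and eos tokens recovered this way agree with the `pad_token` and `eos_token` entries in special_tokens_map.json.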

config.json ADDED

@@ -0,0 +1,71 @@

+{
+  "architectures": [
+    "Qwen3ForSequenceClassification"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 40960,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token_id": 151643,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sentence_transformers": {
+    "activation_fn": "torch.nn.modules.activation.Sigmoid",
+    "version": "5.0.0"
+  },
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.53.1",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151669
+}

merges.txt ADDED

The diff for this file is too large to render.

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8768cb13c91eca7dcf4b21741856c9a012b382634206149299a2625398beea76
+size 2383145520
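
The three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and derives a rough upper bound on the parameter count. It assumes every tensor is stored as float32 (as `torch_dtype` in config.json suggests) and ignores the safetensors header, so the real count is slightly lower. `parse_lfs_pointer` is a hypothetical helper written for this example.

```python
# Pointer text copied from the diff above (without the leading "+").
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8768cb13c91eca7dcf4b21741856c9a012b382634206149299a2625398beea76
size 2383145520
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


pointer = parse_lfs_pointer(pointer_text)

# Assumption: 4 bytes per value (float32); header bytes are not subtracted.
BYTES_PER_FLOAT32 = 4
approx_params = pointer["size"] // BYTES_PER_FLOAT32

print(f"{pointer['size'] / 1e9:.2f} GB, at most ~{approx_params / 1e6:.0f}M float32 values")
```

The estimate lands near 0.6B values, in line with the base model name.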

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0bc04542e8e8fa70d398aea108486408a0320c9d5b460b448358363cd06382ac
+size 11422922

tokenizer_config.json ADDED

	@@ -0,0 +1,239 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 40960,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
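
A small cross-check between this tokenizer config and config.json, as a sketch using only values visible in the two diffs. The variable names are illustrative. It confirms that the added-token IDs form one contiguous block ending at `vocab_size - 1`, and that the pad and EOS IDs in config.json point at the tokens named here.

```python
# ID range of "added_tokens_decoder" in tokenizer_config.json.
first_id, last_id = 151643, 151668
# First entry marked "special": false, i.e. "<tool_call>".
first_non_special = 151657

# Values from config.json.
vocab_size = 151669
pad_token_id = 151643
eos_token_id = 151645

# Relevant entries from "added_tokens_decoder".
id_to_token = {151643: "<|endoftext|>", 151645: "<|im_end|>"}

added_ids = range(first_id, last_id + 1)
special_ids = [i for i in added_ids if i < first_non_special]
non_special_ids = [i for i in added_ids if i >= first_non_special]

print(len(added_ids), len(special_ids), len(non_special_ids))
print(id_to_token[pad_token_id], id_to_token[eos_token_id])
```

One difference worth knowing: config.json sets `bos_token_id` to 151643, while the tokenizer config declares `bos_token` as null with `add_bos_token` false, so no BOS token is prepended at tokenization time.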

vocab.json ADDED

The diff for this file is too large to render. See raw diff