---
base_model:
- Qwen/Qwen2.5-3B-Instruct
tags:
- gguf
- q4
- text-generation-inference
- transformers
- qwen2
- trl
- grpo
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# saishshinde15/Clyrai_Base_Reasoning_GGUF (GGUF - Q4) (formerly known as TBH.AI Base Reasoning)

- **Developed by:** Clyrai
- **License:** apache-2.0
- **Fine-tuned from:** Qwen/Qwen2.5-3B-Instruct
- **GGUF format:** 4-bit quantized (Q4) for optimized inference

## **Model Description**

Clyrai Base Reasoning (GGUF - Q4) is a **4-bit GGUF-quantized** version of `saishshinde15/Clyrai_Base_Reasoning`, a fine-tuned model based on **Qwen 2.5**. This version is designed for **high-efficiency inference on CPU or GPU with minimal memory usage**, making it well suited to on-device applications and low-latency AI systems.

Trained using **GRPO (Group Relative Policy Optimization)**, the model excels at **self-reasoning, logical deduction, and structured problem-solving**, comparable to **DeepSeek-R1**. Q4 quantization significantly lowers memory requirements while maintaining strong reasoning performance.

## **Features**

- **4-bit quantization (Q4 GGUF):** Optimized for low-memory, high-speed inference on compatible backends.
- **Self-reasoning AI:** Processes complex queries autonomously, generating logical, structured responses.
- **GRPO fine-tuning:** Uses policy optimization for improved logical consistency and step-by-step reasoning.
- **Efficient on-device deployment:** Works seamlessly with **llama.cpp, KoboldCpp, GPT4All, and ctransformers**.
- **Ideal for logical tasks:** Best suited for **research, coding logic, structured Q&A, and decision-making applications**.

## **Limitations**

- This **Q4 GGUF version is inference-only** and does not support further fine-tuning.
- Quantization may slightly reduce response accuracy compared to FP16/full-precision models.
- Performance depends on the execution environment and the GGUF-compatible runtime used.

## **Usage**

Use this system prompt for more detailed and personalized results. It is the recommended prompt, as the model was tuned on it:

```text
You are a reasoning model made by researchers at Clyrai and your role is to respond in the following format only and in detail:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
```
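
Because the model is tuned to wrap its output in these tags, the two parts can be separated with a small helper. This is an illustrative sketch, not part of the original card; the function name and regex approach are my own:

```python
import re

def parse_response(text: str):
    """Split a tagged model reply into (reasoning, answer).

    Either element is None when its tag pair is missing, e.g. if the
    model's output was truncated before </answer>.
    """
    def grab(tag: str):
        # DOTALL lets .*? span the newlines inside a tag pair
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else None
    return grab("reasoning"), grab("answer")

reply = "<reasoning>\n2 + 2 = 4\n</reasoning>\n<answer>\n4\n</answer>"
print(parse_response(reply))  # → ('2 + 2 = 4', '4')
```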

Use this system prompt for a more concise presentation of answers:

```python
SYSTEM_PROMPT = """
Respond in the following format:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
```
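
One way to run the Q4 file locally is llama-cpp-python, one of the GGUF runtimes listed above. A minimal sketch, assuming the package is installed (`pip install llama-cpp-python`) and that the repo's Q4 file matches the `*Q4*.gguf` glob; check the actual filename on the Hub before use:

```python
# Sketch: local inference with llama-cpp-python. The filename glob and
# context size below are assumptions; adjust them to the repo's real files.
SYSTEM_PROMPT = """
Respond in the following format:

<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

def build_messages(question: str) -> list:
    """Pair the recommended system prompt with a user question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT.strip()},
        {"role": "user", "content": question},
    ]

def ask(question: str, max_tokens: int = 512) -> str:
    """Download the Q4 GGUF from the Hub (cached after the first call)
    and run a single chat completion."""
    # Imported lazily so build_messages works without the package installed.
    from llama_cpp import Llama
    llm = Llama.from_pretrained(
        repo_id="saishshinde15/Clyrai_Base_Reasoning_GGUF",
        filename="*Q4*.gguf",  # assumed glob for the Q4 quant
        n_ctx=4096,
    )
    out = llm.create_chat_completion(
        messages=build_messages(question),
        max_tokens=max_tokens,
    )
    return out["choices"][0]["message"]["content"]
```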