Charlie 1.5

Charlie 1.5 is a high-performance, 12B-parameter large language model (LLM) built on the Mistral architecture. It is designed for long-context reasoning, complex enterprise workflows, and structured decision-making.

With a 131,072-token (128k) context window, Charlie 1.5 natively processes large inputs such as financial filings, legal contracts, and technical reports, without requiring retrieval-augmented generation (RAG) pipelines or external chunking systems.


Model Summary

| Attribute       | Description                            |
|-----------------|----------------------------------------|
| Architecture    | Mistral-based decoder-only transformer |
| Parameters      | ~12B                                   |
| Layers          | 40                                     |
| Hidden Size     | 5,120                                  |
| Context Window  | 131,072 tokens                         |
| Vocabulary Size | 131,072                                |
| Precision       | bfloat16 (BF16)                        |
| License         | Apache License 2.0                     |

Model Highlights

  • Extended Context: Native support for 131k-token sequences using RoPE (theta: 1,000,000)
  • Efficient Attention: Grouped Query Attention (32 attention heads, 8 KV heads)
  • Broad Coverage: Large vocabulary supporting multilingual, technical, and domain-specific text
  • Deployment-Friendly: Optimized for mid-range GPUs such as NVIDIA A10G
  • Long-Form Reasoning: Particularly effective on large-document and multi-step reasoning tasks
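The first two highlights can be made concrete with a few lines of arithmetic. This is an illustrative sketch, not the model's actual implementation: it computes the RoPE inverse frequencies for theta = 1,000,000 and the query-to-KV-head mapping implied by 32 query heads sharing 8 KV heads.

```python
# Illustrative sketch (not the model's implementation).
# RoPE inverse frequencies: a larger theta (1,000,000 vs. the common 10,000)
# slows the positional rotation so positions up to 131,072 stay
# distinguishable at long range.
hidden_size, n_heads, n_kv_heads = 5120, 32, 8
head_dim = hidden_size // n_heads  # 160, assuming head_dim = hidden // heads
theta = 1_000_000.0
inv_freq = [theta ** (-i / head_dim) for i in range(0, head_dim, 2)]

# Grouped Query Attention: each of the 8 KV heads is shared by
# 32 // 8 = 4 query heads, shrinking the KV cache 4x.
group_size = n_heads // n_kv_heads
kv_head_for_query = [q // group_size for q in range(n_heads)]
```

The KV-cache reduction is what makes 131k-token sequences practical on a single mid-range GPU.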

Performance & Benchmarks

| Benchmark     | Score |
|---------------|-------|
| MMLU          | 68    |
| MMLU-Pro      | 39    |
| ARC-Challenge | 60    |

Inference Performance (NVIDIA A10G)

  • Time to First Token (TTFT): ~80 ms
  • Throughput: ~146 tokens/sec
  • Precision: bfloat16 (BF16)

Benchmark results are indicative and may vary depending on hardware, prompt length, and configuration.
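If you want to reproduce the TTFT and throughput figures on your own hardware, a generic timing harness looks like the sketch below. `generate` here is a hypothetical callable that streams one token at a time; adapt it to whatever streaming interface your serving stack exposes.

```python
import time

def measure_throughput(generate, prompt, max_new_tokens=256):
    """Measure time-to-first-token (TTFT) and decode-phase tokens/sec.

    `generate` is any callable yielding one token at a time
    (hypothetical here; not a specific transformers API).
    """
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in generate(prompt, max_new_tokens):
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start
    # Throughput over the decode phase, i.e. tokens after the first one.
    if count > 1 and total > ttft:
        tps = (count - 1) / (total - ttft)
    else:
        tps = float("inf")
    return ttft, tps
```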


Intended Use & Scope

Charlie 1.5 is intended for:

  • Long-context document analysis
  • Enterprise decision-support systems
  • Research and experimentation
  • Commercial and non-commercial applications
  • Fine-tuning and derivative model development

The model is provided as-is and should be independently evaluated before use in high-risk or safety-critical applications.


Usage

Charlie 1.5 can be used with the Hugging Face transformers library:

import torch
from transformers import pipeline

# Load the model in bfloat16 and let accelerate place it on available GPUs.
pipe = pipeline(
    "text-generation",
    model="your-username/charlie-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """
Analyze the impact of a 15% tariff increase on lithium-ion components from the Asia-Pacific region.

1. Identify the top 3 Tier 2 suppliers most at risk based on current lead times.
2. Propose a diversification strategy for our European assembly plant.
3. Calculate the projected shift in COGS if we pivot 40% of sourcing to Mexico.
"""
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt},
]

# A low temperature keeps the analysis focused; return_full_text=False
# drops the prompt from the returned text.
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    use_cache=True,
    return_full_text=False,
    num_return_sequences=1,
)
for output in outputs:
    print(output["generated_text"])
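Since the model processes long documents without a RAG pipeline, the caller is responsible for staying inside the 131,072-token window. A minimal guard is sketched below; `CONTEXT_WINDOW` and `fits_context` are illustrative names, and the prompt token count would come from the model's tokenizer (e.g. `len(tokenizer(text)["input_ids"])`).

```python
# The full window must hold both the prompt and the generated tokens.
CONTEXT_WINDOW = 131_072

def fits_context(n_prompt_tokens: int, max_new_tokens: int = 512) -> bool:
    """Return True if the prompt plus the generation budget fit the window."""
    return n_prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
```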

Technical Specifications

  • Hidden Size: 5,120
  • Intermediate Size: 14,336
  • Attention Heads: 32 (8 KV heads using Grouped Query Attention)
  • Activation Function: SiLU
  • Normalization: RMSNorm (epsilon: 1e-05)
  • Max Position Embeddings: 131,072
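As a sanity check, the listed dimensions roughly reproduce the ~12B parameter count. The arithmetic below assumes head_dim = hidden // heads = 160, tied input/output embeddings, and no biases; none of these are stated on the card, so treat the result as an estimate.

```python
# Back-of-the-envelope parameter count from the listed dimensions.
hidden, layers, inter, vocab = 5120, 40, 14336, 131_072
heads, kv_heads = 32, 8
head_dim = hidden // heads       # 160 (assumed)
kv_dim = kv_heads * head_dim     # 1,280

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o + k/v projections
mlp = 3 * hidden * inter                          # gate, up, down
total = layers * (attn + mlp) + vocab * hidden    # + embedding table (tied)

print(f"{total / 1e9:.1f}B")  # prints "12.1B", consistent with the ~12B figure
```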

License

Charlie 1.5 is released under the Apache License 2.0. There are no restrictions on downstream use beyond those stated in the license.

Citation & Attribution

If you use Charlie 1.5 in research or commercial applications, please attribute it to the original Gaudium AI development team.
