Charlie 1.5

Charlie 1.5 is a high-performance, 12B-parameter large language model (LLM) built on the Mistral architecture. It is designed for long-context reasoning, complex enterprise workflows, and structured decision-making.

With a 131,072-token (128k) context window, Charlie 1.5 natively processes large inputs such as financial filings, legal contracts, and technical reports, without requiring retrieval-augmented generation (RAG) pipelines or external chunking systems.


Model Summary

| Attribute       | Description                            |
|-----------------|----------------------------------------|
| Architecture    | Mistral-based decoder-only transformer |
| Parameters      | ~12B                                   |
| Layers          | 40                                     |
| Hidden Size     | 5,120                                  |
| Context Window  | 131,072 tokens                         |
| Vocabulary Size | 131,072                                |
| Precision       | bfloat16 (BF16)                        |
| License         | Apache License 2.0                     |

Model Highlights

  • Extended Context: Native support for 131k-token sequences using RoPE (theta: 1,000,000)
  • Efficient Attention: Grouped Query Attention (32 attention heads, 8 KV heads)
  • Broad Coverage: Large vocabulary supporting multilingual, technical, and domain-specific text
  • Deployment-Friendly: Optimized for mid-range GPUs such as NVIDIA A10G
  • Long-Form Reasoning: Particularly effective on large-document and multi-step reasoning tasks
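The first two highlights can be made concrete with a few lines of arithmetic. This is an illustrative sketch, not the model's actual implementation: it computes the RoPE inverse frequencies for theta = 1,000,000 and the query-to-KV-head mapping implied by 32 query heads sharing 8 KV heads.

```python
# Illustrative sketch (not the model's implementation).
# RoPE inverse frequencies: a larger theta (1,000,000 vs. the common 10,000)
# slows the positional rotation so positions up to 131,072 stay
# distinguishable at long range.
hidden_size, n_heads, n_kv_heads = 5120, 32, 8
head_dim = hidden_size // n_heads  # 160, assuming head_dim = hidden // heads
theta = 1_000_000.0
inv_freq = [theta ** (-i / head_dim) for i in range(0, head_dim, 2)]

# Grouped Query Attention: each of the 8 KV heads is shared by
# 32 // 8 = 4 query heads, shrinking the KV cache 4x.
group_size = n_heads // n_kv_heads
kv_head_for_query = [q // group_size for q in range(n_heads)]
```

The KV-cache reduction is what makes 131k-token sequences practical on a single mid-range GPU.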

Performance & Benchmarks

| Benchmark     | Score |
|---------------|-------|
| MMLU          | 68    |
| MMLU-Pro      | 39    |
| ARC-Challenge | 60    |

Inference Performance (NVIDIA A10G)

  • Time to First Token (TTFT): ~80 ms
  • Throughput: ~146 tokens/sec
  • Precision: bfloat16 (BF16)

Benchmark results are indicative and may vary depending on hardware, prompt length, and configuration.
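If you want to reproduce the TTFT and throughput figures on your own hardware, a generic timing harness looks like the sketch below. `generate` here is a hypothetical callable that streams one token at a time; adapt it to whatever streaming interface your serving stack exposes.

```python
import time

def measure_throughput(generate, prompt, max_new_tokens=256):
    """Measure time-to-first-token (TTFT) and decode-phase tokens/sec.

    `generate` is any callable yielding one token at a time
    (hypothetical here; not a specific transformers API).
    """
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in generate(prompt, max_new_tokens):
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start
    # Throughput over the decode phase, i.e. tokens after the first one.
    if count > 1 and total > ttft:
        tps = (count - 1) / (total - ttft)
    else:
        tps = float("inf")
    return ttft, tps
```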


Intended Use & Scope

Charlie 1.5 is intended for:

  • Long-context document analysis
  • Enterprise decision-support systems
  • Research and experimentation
  • Commercial and non-commercial applications
  • Fine-tuning and derivative model development

The model is provided as-is and should be independently evaluated before use in high-risk or safety-critical applications.


Usage

Charlie 1.5 can be used with the Hugging Face transformers library:

import torch
from transformers import pipeline

# Load the model in bfloat16 and let accelerate place it on available GPUs.
pipe = pipeline(
    "text-generation",
    model="your-username/charlie-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """
Analyze the impact of a 15% tariff increase on lithium-ion components from the Asia-Pacific region.

1. Identify the top 3 Tier 2 suppliers most at risk based on current lead times.
2. Propose a diversification strategy for our European assembly plant.
3. Calculate the projected shift in COGS if we pivot 40% of sourcing to Mexico.
"""
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt},
]

# A low temperature keeps the analysis focused; return_full_text=False
# drops the prompt from the returned text.
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    use_cache=True,
    return_full_text=False,
    num_return_sequences=1,
)
for output in outputs:
    print(output["generated_text"])
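Since the model processes long documents without a RAG pipeline, the caller is responsible for staying inside the 131,072-token window. A minimal guard is sketched below; `CONTEXT_WINDOW` and `fits_context` are illustrative names, and the prompt token count would come from the model's tokenizer (e.g. `len(tokenizer(text)["input_ids"])`).

```python
# The full window must hold both the prompt and the generated tokens.
CONTEXT_WINDOW = 131_072

def fits_context(n_prompt_tokens: int, max_new_tokens: int = 512) -> bool:
    """Return True if the prompt plus the generation budget fit the window."""
    return n_prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
```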

Technical Specifications

  • Hidden Size: 5,120
  • Intermediate Size: 14,336
  • Attention Heads: 32 (8 KV heads using Grouped Query Attention)
  • Activation Function: SiLU
  • Normalization: RMSNorm (epsilon: 1e-05)
  • Max Position Embeddings: 131,072
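As a sanity check, the listed dimensions roughly reproduce the ~12B parameter count. The arithmetic below assumes head_dim = hidden // heads = 160, tied input/output embeddings, and no biases; none of these are stated on the card, so treat the result as an estimate.

```python
# Back-of-the-envelope parameter count from the listed dimensions.
hidden, layers, inter, vocab = 5120, 40, 14336, 131_072
heads, kv_heads = 32, 8
head_dim = hidden // heads       # 160 (assumed)
kv_dim = kv_heads * head_dim     # 1,280

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o + k/v projections
mlp = 3 * hidden * inter                          # gate, up, down
total = layers * (attn + mlp) + vocab * hidden    # + embedding table (tied)

print(f"{total / 1e9:.1f}B")  # prints "12.1B", consistent with the ~12B figure
```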

License

Charlie 1.5 is released under the Apache License 2.0. There are no restrictions on downstream use beyond those stated in the license.

Citation & Attribution

If you use Charlie 1.5 in research or commercial applications, please attribute it to the original Gaudium AI development team.
