chandra-bf16-mlx

I am making this available for a limited time for testing; it will be removed in a week.

Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information.

You can try Chandra in the free playground here, or at a hosted API here.

Features

  • Convert documents to markdown, HTML, or JSON with detailed layout information
  • Good handwriting support
  • Reconstructs forms accurately, including checkboxes
  • Good support for tables, math, and complex layouts
  • Extracts images and diagrams, with captions and structured data
  • Support for 40+ languages

See the original model card for details on usage from the command line.

This model, chandra-bf16-mlx, was converted to MLX format from datalab-to/chandra using mlx-vlm version 0.3.4.

Using MLX tools

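pip install mlx-lm

from mlx_lm import load, generate

# Hub repo ID; a path to a local copy of the weights also works
model, tokenizer = load("nightmedia/chandra-bf16-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# verbose=True streams the generated text to stdout
response = generate(model, tokenizer, prompt=prompt, verbose=True)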
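
The snippet above sends text only, while OCR needs an image as input. Since the weights were converted with mlx-vlm, a rough sketch of passing a page image through that library might look like the following. The prompt strings, the `build_prompt` and `ocr_image` helpers, and the `max_tokens` value are illustrative assumptions and not the prompts Chandra was trained with; see the original model card for those. It also assumes `pip install mlx-vlm` on Apple silicon, and has not been verified against this particular checkpoint.

```python
# Hypothetical prompt wording, for illustration only. The prompts the
# original Chandra package uses may differ.
PROMPTS = {
    "markdown": "Convert this page to markdown.",
    "html": "Convert this page to HTML.",
    "json": "Convert this page to JSON with layout information.",
}


def build_prompt(output_format: str) -> str:
    """Return the (assumed) instruction for the requested output format."""
    key = output_format.lower()
    if key not in PROMPTS:
        raise ValueError(f"unsupported format: {output_format!r}")
    return PROMPTS[key]


def ocr_image(
    image_path: str,
    output_format: str = "markdown",
    model_path: str = "nightmedia/chandra-bf16-mlx",
):
    """Run one page image through the model with mlx-vlm."""
    # Imported lazily so build_prompt works without mlx-vlm installed.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(model_path)
    config = load_config(model_path)

    # Insert the image placeholder and chat formatting the model expects.
    prompt = apply_chat_template(
        processor, config, build_prompt(output_format), num_images=1
    )

    # max_tokens is an assumed budget; dense pages may need more.
    return generate(
        model, processor, prompt, [image_path], max_tokens=2048, verbose=False
    )
```

Calling `ocr_image("page.png", "html")` (with `page.png` standing in for your own file) should return the generated output; depending on the mlx-vlm version this is either a plain string or a result object whose text is in a `.text` attribute.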

Model size: 9B params (Safetensors). Tensor type: BF16.