deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
DeepSeek-OCR is a new open-source, vision-language OCR model from DeepSeek-AI (the same lab behind the DeepSeek-V and DeepSeek-R series). It’s built to read complex, real-world documents — screenshots, PDFs, forms, tables, and handwritten or noisy text — and output clean, structured Markdown.
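For orientation, here is a minimal usage sketch following the Hugging Face `trust_remote_code` loading pattern from the model card. Treat the prompt string, the `infer(...)` arguments (`image_file`, `output_path`, `base_size`, `image_size`, `crop_mode`), and the file names as illustrative assumptions and check the repo for the exact API.

```python
# Minimal sketch, not the official recipe: load DeepSeek-OCR with
# trust_remote_code and convert one document image to Markdown.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# Prompt and infer() signature follow the model card; the input image and
# output directory names are hypothetical.
prompt = "<image>\n<|grounding|>Convert the document to markdown."
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="report_page.png",
    output_path="./ocr_output",
    base_size=1024,
    image_size=640,
    crop_mode=True,   # adaptive tiling ("Gundam mode") for dense pages
    save_results=True,
)
```

The model card also shows a plain-text prompt variant ("Free OCR.") for cases where you want raw text rather than structured Markdown.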
---
⚙️ Core capabilities
Multimodal (Vision + Language): Uses a hybrid vision encoder + causal text decoder to “see” layouts and generate text like a language model rather than just classifying characters.
Markdown output: Instead of raw text, it structures output with Markdown syntax — headings, bullet lists, tables, and inline formatting — which makes the results ideal for direct use in notebooks or LLM pipelines.
PDF-aware: Includes a built-in PDF runner that automatically slices pages into tiles, processes each region, and re-assembles multi-page outputs.
Adaptive tiling (“crop_mode”): Automatically splits large pages into overlapping tiles for better recognition of dense, small fonts (the “Gundam mode” mentioned in their docs); see the tiling sketch after this list.
Vision backbone: A hybrid encoder that concatenates SAM and CLIP features (matching the point above), trained on massive document + scene-text corpora; the overall model weighs in at roughly 3 B parameters. Handles resolutions up to 1280 × 1280 px and dynamically scales lower.
Language head: Uses the same causal decoder family as DeepSeek-V2, fine-tuned for text reconstruction, so it can reason about table alignment, code blocks, and list structures.
Open and MIT-licensed: Weights and inference code are fully open under the MIT license, allowing integration into other projects or retraining for domain-specific OCR.
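To make the PDF-runner and crop-mode ideas above concrete, here is an illustrative sketch of the overall flow: render PDF pages to images, then cover each page with overlapping tiles before OCR. It uses pdf2image as a stand-in renderer, and the tile size, overlap, and helper names are my assumptions; DeepSeek-OCR ships its own PDF script and tiler, which differ in detail.

```python
# Illustrative only: render PDF pages and split them into overlapping tiles,
# mimicking the "crop_mode" idea. Sizes and helpers here are assumptions,
# not DeepSeek-OCR's actual implementation.
from pdf2image import convert_from_path   # external helper, requires poppler
from PIL import Image


def split_into_tiles(page: Image.Image, tile: int = 1024, overlap: int = 128):
    """Yield (left, top, crop) tiles that cover the page with some overlap."""
    step = tile - overlap
    width, height = page.size
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            box = (left, top, min(left + tile, width), min(top + tile, height))
            yield left, top, page.crop(box)


pages = convert_from_path("contract.pdf", dpi=200)   # hypothetical input PDF
for page_num, page in enumerate(pages, start=1):
    tiles = list(split_into_tiles(page))
    # Each tile would be sent to the model, and the per-tile results merged
    # back by (left, top) position into one Markdown document per page.
    print(f"page {page_num}: {len(tiles)} tiles")
```

The re-assembly step is where the model's layout awareness pays off: overlapping regions have to be deduplicated and merged back into a single reading order.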
---
🆕 What’s new about its approach
Traditional OCR (e.g., Tesseract, PaddleOCR) → detects and classifies glyphs. DeepSeek-OCR → interprets the entire document as a multimodal sequence: the vision encoder takes in the whole layout, and the causal decoder generates structured text (headings, tables, lists) the way a language model generates any other text.