JiRack GPT-2 Initial Weights
This repository strictly contains the initial weights (checkpoints) of the JiRack GPT model.
The model is "clean": it contains no data and has never undergone any pre-training.
- Powered by CMS Manhattan's cutting-edge Vision-BERT architecture.
It is engineered as a maximally safe and robust base for training specialized, smaller models from scratch, such as:
- SPAM Detection Systems
- FRAUD Detection Models
- Background Check (BG Check) Models
A product of CMS Manhattan.
Tokenizer Choices
- For English: GPT-2 Hugging Face tokenizer
- For multilingual use: BERT tokenizer from the Hugging Face library
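As a quick sketch, both tokenizers can be loaded with the Hugging Face transformers library. The checkpoint names "gpt2" and "bert-base-multilingual-cased" below are assumptions; the card names only the tokenizer families, not exact checkpoints.

```python
# Minimal sketch: loading the two tokenizer choices via Hugging Face transformers.
from transformers import AutoTokenizer

# English: GPT-2 byte-level BPE tokenizer (vocab size 50257, matching VOCAB_SIZE below)
en_tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Multilingual: BERT WordPiece tokenizer (assumed multilingual BERT checkpoint)
multi_tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

print(en_tokenizer("Hello, world!").input_ids)
print(multi_tokenizer("Hello, world!").input_ids)
```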
Model Architecture Details
GPT-2 Architecture (Classic, Transformer-like)
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
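For reference, here is a minimal PyTorch sketch of the [TransformerBlock] sublayers listed above. The pre-norm residual wiring is an assumption (the tree lists the sublayers, not their exact wiring), and CustomEmbedding / FrozenSignatureLayer are omitted because their internals are not documented on this card.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One block from the tree above: MultiHeadAttention, two LayerNorms,
    and a Linear -> GELU -> Linear FFN."""

    def __init__(self, model_dim: int, num_heads: int, ffn_hidden_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(model_dim)
        self.ln2 = nn.LayerNorm(model_dim)
        self.ffn = nn.Sequential(
            nn.Linear(model_dim, ffn_hidden_dim),
            nn.GELU(),
            nn.Linear(ffn_hidden_dim, model_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GPT-2-style pre-norm residuals (assumed ordering)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ln2(x))
        return x
```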
Model Checkpoint File Explanations
12-head Attention Model
Parameters:
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 12
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
File: JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt
6-head Attention Model
Parameters:
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 6
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
File: JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt
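A minimal loading sketch follows, assuming each .pt file holds a plain PyTorch state dict (as the notes below indicate); JiRackGPT is a hypothetical class name, since this card does not ship the model code.

```python
import torch

# Load the 12-head checkpoint as a plain state dict (CPU is fine for inspection).
# Assumption: the .pt file stores a state dict, per the state-dict note below.
state_dict = torch.load(
    "JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt",
    map_location="cpu",
)

# Inspect stored tensors and confirm shapes match the parameters above
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))

# Hypothetical model class; build it with the matching hyperparameters, then:
# model = JiRackGPT(vocab_size=50257, model_dim=768, num_heads=12,
#                   num_layers=6, max_seq_len=8192)
# model.load_state_dict(state_dict)
```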
- About TorchScript: you can use a TorchScript (JIT) version of the model for AI classification tasks (see the sketch below).
- Do not use JIT for chatbot tasks. For GPT (chatbot) tasks, load the plain PyTorch state dict instead.
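Here is a minimal sketch of the TorchScript (JIT) path for classification tasks. The placeholder classifier head and file names are hypothetical; for GPT/chatbot use, load the plain state dict as shown above instead.

```python
import torch
import torch.nn as nn

# Hypothetical classification head built on top of the 768-dim base features
classifier = nn.Sequential(nn.Linear(768, 2))

scripted = torch.jit.script(classifier)          # compile to TorchScript
scripted.save("jirack_classifier_scripted.pt")   # Python-free, deployable artifact

restored = torch.jit.load("jirack_classifier_scripted.pt", map_location="cpu")
logits = restored(torch.randn(1, 768))           # run the scripted module
print(logits.shape)
```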
See our other models that follow the same naming pattern to read off their parameters.
You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.
CMS Manhattan
Copyright © 2002–2026