JiRack GPT-2 Initial Weights

This file is strictly intended to store the initial weights (checkpoint) of the JiRack GPT model.
The model is "clean": it contains no data and has never undergone any pre-training.

  • Powered by CMS Manhattan’s cutting-edge Vision-BERT architecture.

It is engineered as a maximally safe and robust base for training specialized, smaller models from scratch, such as:

  • SPAM Detection Systems
  • FRAUD Detection Models
  • Background Check (BG Check) Models

A product of CMS Manhattan.


Tokenizer Choices

  • For English: GPT-2 Hugging Face tokenizer
  • For multilingual use: BERT tokenizer from the Hugging Face library

Model Architecture Details

GPT-2 Architecture (Classic, Transformer-like)

CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
    β”œβ”€β”€ MultiHeadAttention
    β”œβ”€β”€ LayerNorm
    β”œβ”€β”€ LayerNorm
    └── FFN
        β”œβ”€β”€ Linear
        β”œβ”€β”€ Activation: GELU
        └── Linear
LayerNorm
Linear
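The block diagram above can be sketched in PyTorch. This is a minimal illustrative reading of the diagram under a pre-LayerNorm interpretation; the class name and exact layer ordering are assumptions, not the actual JiRack implementation.

```python
# Minimal sketch of one TransformerBlock from the diagram above
# (illustrative; not the actual JiRack code).
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, model_dim: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(model_dim)
        self.attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(model_dim)
        # FFN: Linear -> GELU -> Linear, hidden dim = 4 * MODEL_DIM
        self.ffn = nn.Sequential(
            nn.Linear(model_dim, 4 * model_dim),
            nn.GELU(),
            nn.Linear(4 * model_dim, model_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)   # self-attention
        x = x + a                   # residual connection
        x = x + self.ffn(self.ln2(x))
        return x


block = TransformerBlock(768, 12)
y = block(torch.randn(1, 16, 768))
print(y.shape)  # torch.Size([1, 16, 768])
```

The block preserves the input shape, so NUM_LAYERS of these blocks can be stacked before the final LayerNorm and Linear projection.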

Model Checkpoint File Explanations

12-head Attention Model

Parameters:

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 768
  • NUM_HEADS = 12
  • NUM_LAYERS = 6
  • MAX_SEQ_LEN = 8192
  • FFN_HIDDEN_DIM = 4 * MODEL_DIM
  • HEAD_DIM = MODEL_DIM // NUM_HEADS

File:
JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt


6-head Attention Model

Parameters:

  • VOCAB_SIZE = 50257
  • MODEL_DIM = 768
  • NUM_HEADS = 6
  • NUM_LAYERS = 6
  • MAX_SEQ_LEN = 8192
  • FFN_HIDDEN_DIM = 4 * MODEL_DIM
  • HEAD_DIM = MODEL_DIM // NUM_HEADS

File:
JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt

  • So About PyTorch script . You can use Pytorch script for AI classification task .
  • Do not Jit for Chatbot task . Use just state dict PyTorch for GPT (Chatbot) tasks
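The two deployment paths above can be sketched as follows. This is an illustrative example using a stand-in `nn.Linear` module, not the JiRack model itself:

```python
# Sketch of the two deployment paths: TorchScript export vs. plain
# state dict. `model` here is a stand-in module for illustration.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a classification head

# Classification tasks: a TorchScript (torch.jit) export is fine.
scripted = torch.jit.script(model)
scripted.save("classifier_scripted.pt")

# GPT / chatbot tasks: save and restore a plain state dict instead.
torch.save(model.state_dict(), "gpt_state_dict.pt")
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("gpt_state_dict.pt"))
```

After `load_state_dict`, `restored` carries the same weights as `model`, which is the recommended round-trip for the checkpoints listed above.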

See the other models that follow the same naming pattern to read off their parameters.


You are welcome to ask us to design a corporate model with 33B, 70B, or more parameters.

CMS Manhattan
Copyright Β© 2002–2026
