JiRack GPT-2 Initial Weights
This repository strictly contains the initial weights (checkpoints) of the JiRack GPT model.
The model is "clean": it contains no data and has never undergone any pre-training.
- Powered by CMS Manhattan's cutting-edge Vision-BERT architecture.
It is engineered as a maximally safe and robust base for training specialized, smaller models from scratch, such as:
- SPAM Detection Systems
- FRAUD Detection Models
- Background Check (BG Check) Models
A product of CMS Manhattan.
Tokenizer Choices
- For English: GPT-2 Hugging Face tokenizer
- For multilingual use: BERT tokenizer from the Hugging Face library
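As a quick sketch, both tokenizers can be loaded with the Hugging Face transformers library. The checkpoint names "gpt2" and "bert-base-multilingual-cased" below are assumptions; the card names only the tokenizer families, not exact checkpoints.

```python
# Minimal sketch: loading the two tokenizer choices via Hugging Face transformers.
from transformers import AutoTokenizer

# English: GPT-2 byte-level BPE tokenizer (vocab size 50257, matching VOCAB_SIZE below)
en_tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Multilingual: BERT WordPiece tokenizer (assumed multilingual BERT checkpoint)
multi_tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

print(en_tokenizer("Hello, world!").input_ids)
print(multi_tokenizer("Hello, world!").input_ids)
```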
Model Architecture Details
GPT-2 Architecture (Classic, Transformer-like)
CustomEmbedding
FrozenSignatureLayer
LearnedPositionalEmbedding
[TransformerBlock]
├── MultiHeadAttention
├── LayerNorm
├── LayerNorm
└── FFN
    ├── Linear
    ├── Activation: GELU
    └── Linear
LayerNorm
Linear
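For reference, here is a minimal PyTorch sketch of the [TransformerBlock] sublayers listed above. The pre-norm residual wiring is an assumption (the tree lists the sublayers, not their exact wiring), and CustomEmbedding / FrozenSignatureLayer are omitted because their internals are not documented on this card.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One block from the tree above: MultiHeadAttention, two LayerNorms,
    and a Linear -> GELU -> Linear FFN."""

    def __init__(self, model_dim: int, num_heads: int, ffn_hidden_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(model_dim)
        self.ln2 = nn.LayerNorm(model_dim)
        self.ffn = nn.Sequential(
            nn.Linear(model_dim, ffn_hidden_dim),
            nn.GELU(),
            nn.Linear(ffn_hidden_dim, model_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GPT-2-style pre-norm residuals (assumed ordering)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ln2(x))
        return x
```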
Model Checkpoint File Explanations
12-head Attention Model
Parameters:
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 12
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
File: JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt
6-head Attention Model
Parameters:
VOCAB_SIZE = 50257
MODEL_DIM = 768
NUM_HEADS = 6
NUM_LAYERS = 6
MAX_SEQ_LEN = 8192
FFN_HIDDEN_DIM = 4 * MODEL_DIM
HEAD_DIM = MODEL_DIM // NUM_HEADS
File: JiRack_H6_L6_V50257_D768_MSL8192_FF768x4.pt
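A minimal loading sketch follows, assuming each .pt file holds a plain PyTorch state dict (as the notes below indicate); JiRackGPT is a hypothetical class name, since this card does not ship the model code.

```python
import torch

# Load the 12-head checkpoint as a plain state dict (CPU is fine for inspection).
# Assumption: the .pt file stores a state dict, per the state-dict note below.
state_dict = torch.load(
    "JiRack_H12_L6_V50257_D768_MSL8192_FF768x4.pt",
    map_location="cpu",
)

# Inspect stored tensors and confirm shapes match the parameters above
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))

# Hypothetical model class; build it with the matching hyperparameters, then:
# model = JiRackGPT(vocab_size=50257, model_dim=768, num_heads=12,
#                   num_layers=6, max_seq_len=8192)
# model.load_state_dict(state_dict)
```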
- About TorchScript: you can use a TorchScript (JIT) version of the model for AI classification tasks (see the sketch below).
- Do not use JIT for chatbot tasks. For GPT (chatbot) tasks, load the plain PyTorch state dict instead.
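Here is a minimal sketch of the TorchScript (JIT) path for classification tasks. The placeholder classifier head and file names are hypothetical; for GPT/chatbot use, load the plain state dict as shown above instead.

```python
import torch
import torch.nn as nn

# Hypothetical classification head built on top of the 768-dim base features
classifier = nn.Sequential(nn.Linear(768, 2))

scripted = torch.jit.script(classifier)          # compile to TorchScript
scripted.save("jirack_classifier_scripted.pt")   # Python-free, deployable artifact

restored = torch.jit.load("jirack_classifier_scripted.pt", map_location="cpu")
logits = restored(torch.randn(1, 768))           # run the scripted module
print(logits.shape)
```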
See our other models that follow the same naming pattern to read off their parameters.
You are welcome to ask us to design your corporate model with 33B, 70B, or more parameters.
CMS Manhattan
Copyright © 2002–2026