ANWGPT3 (anwgpt3-355m)

Conversational version of ANWGPT2 (anwgpt2-355m).

  • Developed by: Subhrajit Sain, aka, ANW
  • Funded by: no one
  • Contributors: FlameF0X
  • Model type: text generator / question answering
  • Language (NLP): English
  • License: MIT
  • Finetuned from model: SubhrajitSain/anwgpt2-355m

Requirements (Python)

torch==2.3.1
torchvision==0.18.1
torchaudio==2.3.1
transformers==4.41.2
peft==0.10.0
accelerate==0.29.3
datasets==2.19.0
trl==0.8.6
bitsandbytes==0.43.1

Custom Inference Code

# =================================================================================================
# ANWGPT3 Inference Code - by ANW
# =================================================================================================

import torch
import gc
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate.utils import load_checkpoint_in_model 
from huggingface_hub import snapshot_download

model_hub_id = "SubhrajitSain/anwgpt3-355m"
base_model_name = "SubhrajitSain/anwgpt2-355m"

if torch.cuda.is_available():
    device = "cuda"
    print("Using GPU. Loading model structure (FP16).")
    load_kwargs = {} 
else:
    device = "cpu"
    print("Using CPU. Inference will be slow.")
    load_kwargs = {}

print(f"Downloading checkpoint files for {model_hub_id} to local cache...")
local_checkpoint_path = snapshot_download(repo_id=model_hub_id)

print(f"Loading tokenizer from: {model_hub_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_hub_id, use_fast=False) 
vocab_size = len(tokenizer)

clean_template = (
    "{% for message in messages %}"
    "{{ message['content'] | trim }}\n"
    "{% endfor %}"
)
tokenizer.chat_template = clean_template
print("Applied chat template modification: Removed role tags in input.")

terminators = [
    tokenizer.eos_token_id, 
    tokenizer.convert_tokens_to_ids("<|im_end|>")
]

print(f"Loading base model structure from: {base_model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16, 
    **load_kwargs 
)

print(f"Resizing model embeddings from {model.config.vocab_size} to {vocab_size} tokens.")
model.resize_token_embeddings(vocab_size)
model.config.vocab_size = vocab_size

print("Loading final merged weights onto the resized model structure from local cache...")
load_checkpoint_in_model(
    model, 
    checkpoint=local_checkpoint_path 
)

gc.collect()
torch.cuda.empty_cache()

model = model.to(device)
model.eval() 

sys_prompt = "You are ANWGPT3, a large language model meticulously crafted by ANW. Your primary purpose is to be a helpful, harmless, and knowledgeable conversational partner. Engage users in a supportive and informative manner, striving for accuracy, clarity, and kindness in all your responses. Always be honest about your nature as an AI. If you do not know the answer to a question, admit it rather than inventing information. Your goal is to assist users thoughtfully and make every interaction a positive and productive one."

print("\n--- Starting Interactive Chat with ANWGPT3 ---")
print("Type 'quit' or 'exit' to stop. Type 'clear' to reset history. uwu")

conversation_history = [
    {"role": "system", "content": sys_prompt}
]

while True:
    user_input = input("You: ")
    
    if user_input.lower() in ["quit", "exit"]:
        print("Exit inference.")
        break
    
    if user_input.lower() == "clear":
        print("\n--- Conversation history reset! ---")
        conversation_history = [
            {"role": "system", "content": sys_prompt}
        ]
        continue

    conversation_history.append({"role": "user", "content": user_input})

    input_text = tokenizer.apply_chat_template(
        conversation_history, 
        tokenize=False, 
        add_generation_prompt=True
    )

    input_ids = tokenizer(
        input_text, 
        return_tensors="pt", 
        truncation=True
    ).input_ids.to(model.device) 

    start_time = time.time()
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids,
            max_new_tokens=512, 
            do_sample=True,
            temperature=0.7,   
            top_p=0.9,         
            eos_token_id=terminators,
            pad_token_id=tokenizer.eos_token_id 
        )
    end_time = time.time()

    new_tokens = generated_ids[0][len(input_ids[0]):]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    
    final_response = response.split("<|im_end|>")[0].strip()
    
    final_response = final_response.replace("assistant", "").replace("[INST]", "").strip()
    
    print(f"ANWGPT3: {final_response}") 
    print(f"(Time: {end_time - start_time:.2f}s)")
    
    conversation_history.append({"role": "assistant", "content": final_response})

print("\n--- Interactive session ended ---")

del model
gc.collect()
torch.cuda.empty_cache()
Downloads last month
816
Safetensors
Model size
0.4B params
Tensor type
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SubhrajitSain/anwgpt3-355m

Finetuned
(1)
this model
Quantizations
2 models

Datasets used to train SubhrajitSain/anwgpt3-355m