ANWGPT3 (anwgpt3-355m)

Conversational version of ANWGPT2 (anwgpt2-355m).

Developed by: Subhrajit Sain, aka, ANW
Funded by: no one
Contributors: FlameF0X
Model type: text generator / question answering
Language (NLP): English
License: MIT
Finetuned from model: SubhrajitSain/anwgpt2-355m

Requirements (Python)

torch==2.3.1
torchvision==0.18.1
torchaudio==2.3.1
transformers==4.41.2
peft==0.10.0
accelerate==0.29.3
datasets==2.19.0
trl==0.8.6
bitsandbytes==0.43.1

Custom Inference Code

# =================================================================================================
# ANWGPT3 Inference Code - by ANW
# =================================================================================================

import torch
import gc
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate.utils import load_checkpoint_in_model 
from huggingface_hub import snapshot_download

model_hub_id = "SubhrajitSain/anwgpt3-355m"
base_model_name = "SubhrajitSain/anwgpt2-355m"

if torch.cuda.is_available():
    device = "cuda"
    print("Using GPU. Loading model structure (FP16).")
    load_kwargs = {} 
else:
    device = "cpu"
    print("Using CPU. Inference will be slow.")
    load_kwargs = {}

print(f"Downloading checkpoint files for {model_hub_id} to local cache...")
local_checkpoint_path = snapshot_download(repo_id=model_hub_id)

print(f"Loading tokenizer from: {model_hub_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_hub_id, use_fast=False) 
vocab_size = len(tokenizer)

clean_template = (
    "{% for message in messages %}"
    "{{ message['content'] | trim }}\n"
    "{% endfor %}"
)
tokenizer.chat_template = clean_template
print("Applied chat template modification: Removed role tags in input.")

terminators = [
    tokenizer.eos_token_id, 
    tokenizer.convert_tokens_to_ids("<|im_end|>")
]

print(f"Loading base model structure from: {base_model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16, 
    **load_kwargs 
)

print(f"Resizing model embeddings from {model.config.vocab_size} to {vocab_size} tokens.")
model.resize_token_embeddings(vocab_size)
model.config.vocab_size = vocab_size

print("Loading final merged weights onto the resized model structure from local cache...")
load_checkpoint_in_model(
    model, 
    checkpoint=local_checkpoint_path 
)

gc.collect()
torch.cuda.empty_cache()

model = model.to(device)
model.eval() 

sys_prompt = "You are ANWGPT3, a large language model meticulously crafted by ANW. Your primary purpose is to be a helpful, harmless, and knowledgeable conversational partner. Engage users in a supportive and informative manner, striving for accuracy, clarity, and kindness in all your responses. Always be honest about your nature as an AI. If you do not know the answer to a question, admit it rather than inventing information. Your goal is to assist users thoughtfully and make every interaction a positive and productive one."

print("\n--- Starting Interactive Chat with ANWGPT3 ---")
print("Type 'quit' or 'exit' to stop. Type 'clear' to reset history. uwu")

conversation_history = [
    {"role": "system", "content": sys_prompt}
]

while True:
    user_input = input("You: ")
    
    if user_input.lower() in ["quit", "exit"]:
        print("Exit inference.")
        break
    
    if user_input.lower() == "clear":
        print("\n--- Conversation history reset! ---")
        conversation_history = [
            {"role": "system", "content": sys_prompt}
        ]
        continue

    conversation_history.append({"role": "user", "content": user_input})

    input_text = tokenizer.apply_chat_template(
        conversation_history, 
        tokenize=False, 
        add_generation_prompt=True
    )

    input_ids = tokenizer(
        input_text, 
        return_tensors="pt", 
        truncation=True
    ).input_ids.to(model.device) 

    start_time = time.time()
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids,
            max_new_tokens=512, 
            do_sample=True,
            temperature=0.7,   
            top_p=0.9,         
            eos_token_id=terminators,
            pad_token_id=tokenizer.eos_token_id 
        )
    end_time = time.time()

    new_tokens = generated_ids[0][len(input_ids[0]):]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    
    final_response = response.split("<|im_end|>")[0].strip()
    
    final_response = final_response.replace("assistant", "").replace("[INST]", "").strip()
    
    print(f"ANWGPT3: {final_response}") 
    print(f"(Time: {end_time - start_time:.2f}s)")
    
    conversation_history.append({"role": "assistant", "content": final_response})

print("\n--- Interactive session ended ---")

del model
gc.collect()
torch.cuda.empty_cache()

Downloads last month: 816

Safetensors

Model size

0.4B params

Tensor type

F16

Model tree for SubhrajitSain/anwgpt3-355m

Base model

SubhrajitSain/anwgpt2-345m

Finetuned

SubhrajitSain/anwgpt2-355m

Finetuned

(1)

this model

Quantizations

2 models

SubhrajitSain
/

anwgpt3-355m

ANWGPT3 (anwgpt3-355m)

Requirements (Python)

Custom Inference Code

Model tree for SubhrajitSain/anwgpt3-355m

Datasets used to train SubhrajitSain/anwgpt3-355m