Instructions for using ahans1/control-llama-1B with libraries and local apps.

Libraries

How to use ahans1/control-llama-1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ahans1/control-llama-1B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ahans1/control-llama-1B")
model = AutoModelForCausalLM.from_pretrained("ahans1/control-llama-1B")

Local Apps

vLLM

How to use ahans1/control-llama-1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ahans1/control-llama-1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ahans1/control-llama-1B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
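The same OpenAI-compatible endpoint can also be called from Python. The sketch below uses only the standard library. It assumes the server started above is listening on `localhost:8000` and that the response follows the OpenAI completions schema, with the generated text in `choices[0]["text"]`. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "ahans1/control-llama-1B",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout: float = 120.0, **kwargs) -> str:
    """Send the request and return the first choice's generated text.

    The 120-second timeout is an arbitrary default for this sketch.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `print(complete("Once upon a time,"))` once the vLLM server is running. Passing `base_url="http://localhost:30000"` should target the SGLang server described below, since it exposes the same route.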

Use Docker Model Runner

docker model run hf.co/ahans1/control-llama-1B

SGLang

How to use ahans1/control-llama-1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ahans1/control-llama-1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ahans1/control-llama-1B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ahans1/control-llama-1B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ahans1/control-llama-1B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.


Quick Links

GitHub Repository: https://github.com/ahans30/goldfish-loss
arXiv: https://arxiv.org/abs/2406.10209

Goldfish Loss

We introduce goldfish loss, a new language modeling loss function that mitigates memorization of training data. Specifically, goldfish loss pseudorandomly drops $1/k$ of the tokens seen in the forward pass from the loss computation (i.e., no loss is computed for these tokens), where $k$ is a hyperparameter. We show that the resulting model has difficulty regurgitating training data verbatim, even after 100 epochs. Please read our paper linked below for more details.
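A minimal pure-Python sketch of the loss side of this idea: positions flagged as dropped contribute neither to the summed negative log-likelihood nor to its normalizer. For simplicity it uses a static mask that drops every k-th position; the checkpoints in this card use a hash-based mask instead, and the actual training code in the GitHub repository operates on batched tensors. Averaging over kept positions and the function names `static_goldfish_mask` and `masked_nll` are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def static_goldfish_mask(seq_len: int, k: int) -> List[bool]:
    """Keep-flags per position; every k-th position ((i + 1) % k == 0) is dropped."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return [(i + 1) % k != 0 for i in range(seq_len)]


def masked_nll(token_log_probs: Sequence[float], keep: Sequence[bool]) -> float:
    """Mean negative log-likelihood over kept positions only.

    token_log_probs[i] is the model's log-probability of the true token at position i.
    Dropped positions are excluded from both the sum and the count.
    """
    kept = [lp for lp, flag in zip(token_log_probs, keep) if flag]
    if not kept:
        raise ValueError("all positions were dropped")
    return -sum(kept) / len(kept)


if __name__ == "__main__":
    keep = static_goldfish_mask(seq_len=8, k=4)
    print(keep)  # [True, True, True, False, True, True, True, False]
    log_probs = [math.log(0.5)] * 8
    print(round(masked_nll(log_probs, keep), 4))  # 0.6931
```

With $k = 4$, two of the eight positions are excluded, so the model never receives a gradient signal for those tokens.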

Overview

The following checkpoints are from our paper, Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (https://arxiv.org/abs/2406.10209).

| Checkpoint Name | k (goldfish loss) | Token Drop Strategy | Pretrain Tokens | Primary Dataset | Canaries Dataset for Memorization |
|---|---|---|---|---|---|
| tomg-group-umd/3-goldfish-loss-llama-1B | 3 | Hash (width = 13) | 20B | RedPajama | Wikipedia |
| tomg-group-umd/4-goldfish-loss-llama-1B | 4 | Hash (width = 13) | 20B | RedPajama | Wikipedia |
| tomg-group-umd/8-goldfish-loss-llama-1B | 8 | Hash (width = 13) | 20B | RedPajama | Wikipedia |
| tomg-group-umd/32-goldfish-loss-llama-1B | 32 | Hash (width = 13) | 20B | RedPajama | Wikipedia |
| tomg-group-umd/128-goldfish-loss-llama-1B | 128 | Hash (width = 13) | 20B | RedPajama | Wikipedia |
| tomg-group-umd/control-llama-1B | - | No tokens dropped | 20B | RedPajama | None |
| tomg-group-umd/standard-loss-llama-1B | - | No tokens dropped | 20B | RedPajama | Wikipedia |
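The "Hash (width = 13)" strategy decides whether to drop a token from the 13 tokens that precede it. The intuition is that the decision depends only on local context, so a passage repeated many times (like the canaries) has the same tokens masked on every repetition, whereas a fresh random mask per pass would eventually expose every token to the loss. The sketch below is one plausible way to implement such a mask; the SHA-256 hash, the "divisible by k" rule, and always keeping the first 13 positions are assumptions for illustration, not the repository's exact code.

```python
import hashlib
import random
from typing import List, Sequence


def hashed_goldfish_mask(token_ids: Sequence[int], k: int, context_width: int = 13) -> List[bool]:
    """Return one keep-flag per position of `token_ids` (False = excluded from the loss).

    Position i is dropped when a hash of the `context_width` preceding token ids is
    divisible by k, so roughly 1/k of positions are dropped. The hash function and the
    divisibility rule are illustrative assumptions. Positions with fewer than
    `context_width` predecessors are always kept (also an assumption).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    keep: List[bool] = []
    for i in range(len(token_ids)):
        if i < context_width:
            keep.append(True)
            continue
        window = token_ids[i - context_width:i]
        digest = hashlib.sha256(",".join(map(str, window)).encode("utf-8")).digest()
        keep.append(int.from_bytes(digest[:8], "big") % k != 0)
    return keep


if __name__ == "__main__":
    rng = random.Random(0)
    passage = [rng.randrange(32000) for _ in range(200)]
    # The same passage appears after two different prefixes (e.g., a duplicated document).
    doc_a = [1, 2, 3] + passage
    doc_b = [9] * 40 + passage
    # Compare only positions whose whole 13-token window lies inside the passage.
    mask_a = hashed_goldfish_mask(doc_a, k=4)[3 + 13:]
    mask_b = hashed_goldfish_mask(doc_b, k=4)[40 + 13:]
    print(mask_a == mask_b)  # True
    print(mask_a.count(False) / len(mask_a))  # close to 1/k = 0.25; exact value depends on the data
```

The width trades off two effects: a wider window makes the mask harder to align across different documents, while a narrower one makes identical short phrases share drop decisions more often.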

Description

standard-loss-llama-1B and control-llama-1B are trained with the standard causal language modeling loss, using exactly the same training specifications as the goldfish models.
The control model differs from standard-loss-llama-1B only in that it was not trained on the canaries dataset; it was pretrained solely on 20B RedPajama tokens.
The canaries dataset, which contains 2000 Wikipedia documents, is repeated 50 times throughout pretraining, for a total of roughly 204M tokens (including padding).
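The ~204M figure can be sanity-checked with a little arithmetic. Assuming each canary document is padded (or truncated) to 2048 tokens, TinyLlama's context length (an assumption not stated on this card), 2000 documents × 50 repetitions × 2048 tokens gives 204.8M tokens, about 1% of the 20B pretraining tokens listed in the table.

```python
NUM_CANARY_DOCS = 2000
REPETITIONS = 50
SEQ_LEN = 2048  # assumption: each document padded/truncated to TinyLlama's 2048-token context
PRETRAIN_TOKENS = 20_000_000_000  # "20B" from the table above

canary_tokens = NUM_CANARY_DOCS * REPETITIONS * SEQ_LEN
print(canary_tokens)  # 204800000
print(f"{canary_tokens / PRETRAIN_TOKENS:.2%}")  # 1.02%
```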

Technical Specification

Each checkpoint listed above uses the TinyLlama-1.1B architecture, trained from random initialization. For pretraining details, please see our GitHub repository.
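For orientation, the sketch below counts the parameters of a Llama-style model and estimates its F32 checkpoint size. It assumes the publicly released TinyLlama-1.1B configuration (hidden size 2048, 22 layers, 32 query heads with 4 key/value heads, MLP width 5632, vocabulary 32000, untied input/output embeddings, no projection biases). These values come from the TinyLlama release rather than from this card, and `LlamaShape` and `count_params` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaShape:
    vocab_size: int = 32000
    hidden_size: int = 2048
    intermediate_size: int = 5632
    num_layers: int = 22
    num_heads: int = 32
    num_kv_heads: int = 4
    tie_embeddings: bool = False


def count_params(cfg: LlamaShape) -> int:
    head_dim = cfg.hidden_size // cfg.num_heads
    kv_dim = cfg.num_kv_heads * head_dim
    # Attention: q and o are hidden x hidden; k and v are hidden x kv_dim (grouped-query attention).
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * cfg.hidden_size
    per_layer = attn + mlp + norms
    embed = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_embeddings else embed
    final_norm = cfg.hidden_size
    return cfg.num_layers * per_layer + embed + lm_head + final_norm


if __name__ == "__main__":
    n = count_params(LlamaShape())
    print(n)  # 1100048384
    print(f"{n * 4 / 1e9:.1f} GB in F32")  # 4.4 GB in F32
```

At 4 bytes per F32 weight, this is consistent with the roughly 1B-parameter, F32 safetensors checkpoint listed for this model.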

Cite our work

If you find our model, codebase or dataset beneficial, please consider citing our work:

@misc{hans2024like,
      title={Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs}, 
      author={Abhimanyu Hans and Yuxin Wen and Neel Jain and John Kirchenbauer and Hamid Kazemi and Prajwal Singhania and Siddharth Singh and Gowthami Somepalli and Jonas Geiping and Abhinav Bhatele and Tom Goldstein},
      year={2024},
      eprint={2406.10209},
      archivePrefix={arXiv},
}

Safetensors: model size 1B params, tensor type F32

Model tree for ahans1/control-llama-1B: 1 quantized model

Paper for ahans1/control-llama-1B: Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (arXiv 2406.10209, published Jun 14, 2024)