Update README.md
Browse files
README.md
CHANGED
|
@@ -30,6 +30,7 @@ You will need a machine with:
|
|
| 30 |
- 40GB+ of GPU Memory
|
| 31 |
- Python 3.10 (tested)
|
| 32 |
|
|
|
|
| 33 |
- Create a Python Environment
|
| 34 |
- `pip install unsloth`
|
| 35 |
|
|
@@ -46,7 +47,6 @@ model, tokenizer = FastLanguageModel.from_pretrained(
|
|
| 46 |
max_seq_length = 50000,
|
| 47 |
dtype = None,
|
| 48 |
load_in_4bit = True,
|
| 49 |
-
token = HF_TOKEN, # use one if using gated models
|
| 50 |
)
|
| 51 |
# It is interesting to see model architecture
|
| 52 |
print (model)
|
|
@@ -54,8 +54,7 @@ print (model)
|
|
| 54 |
#load model for Unsloth inference (2x faster inference)
|
| 55 |
FastLanguageModel.for_inference(model)
|
| 56 |
|
| 57 |
-
#Prepare
|
| 58 |
-
|
| 59 |
SYSTEM_PROMPT = """
|
| 60 |
Task: Generate Cypher statement to query a graph database.
|
| 61 |
Instructions: Use only the provided relationship types and properties in the schema.
|
|
@@ -86,7 +85,7 @@ messages = [
|
|
| 86 |
#Apply the tokenizer chat template to the input messages
|
| 87 |
prompt_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True,tokenize=False)
|
| 88 |
|
| 89 |
-
#Turn the prompt text into a set of tokens and load
|
| 90 |
inputs = tokenizer([prompt_text], return_tensors = "pt").to("cuda")
|
| 91 |
|
| 92 |
#Generate cypher (streaming mode on)
|
|
|
|
| 30 |
- 40GB+ of GPU Memory
|
| 31 |
- Python 3.10 (tested)
|
| 32 |
|
| 33 |
+
Next:
|
| 34 |
- Create a Python Environment
|
| 35 |
- `pip install unsloth`
|
| 36 |
|
|
|
|
| 47 |
max_seq_length = 50000,
|
| 48 |
dtype = None,
|
| 49 |
load_in_4bit = True,
|
|
|
|
| 50 |
)
|
| 51 |
# It is interesting to see model architecture
|
| 52 |
print (model)
|
|
|
|
| 54 |
#load model for Unsloth inference (2x faster inference)
|
| 55 |
FastLanguageModel.for_inference(model)
|
| 56 |
|
| 57 |
+
#Prepare data for the model
|
|
|
|
| 58 |
SYSTEM_PROMPT = """
|
| 59 |
Task: Generate Cypher statement to query a graph database.
|
| 60 |
Instructions: Use only the provided relationship types and properties in the schema.
|
|
|
|
| 85 |
#Apply the tokenizer chat template to the input messages
|
| 86 |
prompt_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True,tokenize=False)
|
| 87 |
|
| 88 |
+
#Turn the prompt text into a set of tokens and load them to GPU
|
| 89 |
inputs = tokenizer([prompt_text], return_tensors = "pt").to("cuda")
|
| 90 |
|
| 91 |
#Generate cypher (streaming mode on)
|