Update README.md
README.md
CHANGED
|
@@ -101,4 +101,84 @@ Virtuo 1.0. Uso, modificação e redistribuição, incluindo comercial, com pres
|
|
| 101 |
## Créditos
|
| 102 |
Virtuo Turing – Artificial Intelligence, S.A. (Portugal) e Octávio Viana.
|
| 103 |
Base © Mistral AI (Apache-2.0).
|
| 104 |
Website: https://justina.cloud
|
|
|
|
| 101 |
## Créditos
|
| 102 |
Virtuo Turing – Artificial Intelligence, S.A. (Portugal) e Octávio Viana.
|
| 103 |
Base © Mistral AI (Apache-2.0).
|
| 104 |
+
Website: https://justina.cloud
|
| 105 |
+
|
| 106 |
+
# Justina Clarus 24B — safetensors (v2)
|
| 107 |
+
|
| 108 |
+
Version 2 is reinforced with more training sessions and more European Portuguese (PT-PT) Q/A pairs, keeping the focus on the Código de Processo Civil (CPC), the Código Civil (CC), and related topics.
|
| 109 |
+
|
| 110 |
+
## What’s new in v2
|
| 111 |
+
- More Q/A pairs and more training iterations.
|
| 112 |
+
- Improved stylistic consistency in technical and legal PT-PT.
|
| 113 |
+
- Greater robustness to variations in question wording within the same domain.
|
| 114 |
+
|
| 115 |
+
## Generalization and non-memorization
|
| 116 |
+
- The model does not memorize all answers verbatim. It retains general patterns and may converge to consistent formulations.
|
| 117 |
+
- It learned the format, tone, and patterns of formal PT-PT Q/A with specialized jargon (e.g., legal, technical). It answers consistently in that style even for questions different from those in the dataset.
|
| 118 |
+
Useful for: applications that need consistency with the dataset’s tone without literal reproduction. Well suited to retrieval-augmented generation (RAG).
|
| 119 |
+
- It captures semantic and syntactic patterns of the PT-PT legal corpus. For identical or very close questions, answers tend to be accurate (>80–90% semantic equivalence, even when the wording is not reproduced verbatim).
|
| 120 |
+
Useful for: scenarios with varied questions within the same legal theme, where better generalization matters.
|
| 121 |
+
|
| 122 |
+
## Primary uses
|
| 123 |
+
This model is a base for:
|
| 124 |
+
1) fine-tuning to specific legal domains;
|
| 125 |
+
2) integration in RAG;
|
| 126 |
+
3) injecting user-supplied context at prompt time (laws, interpretations) to compose legal text.
|
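Use 3 above (injecting user-supplied context at prompt time) could be sketched as follows. This is a minimal illustration, not the authors' method: the helper name `build_prompt` and the `Contexto:` label are assumptions, and only the `Pergunta:` / `Resposta:` framing comes from the usage example in this README.

```python
from typing import List


def build_prompt(question: str, passages: List[str]) -> str:
    """Prepend numbered context passages to a Pergunta/Resposta prompt.

    The "Contexto:" header is a hypothetical convention chosen for this sketch.
    """
    # Number each passage so the answer can refer back to its source.
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    return f"Contexto:\n{context}\n\nPergunta: {question.strip()}\nResposta:"


if __name__ == "__main__":
    print(build_prompt(
        "Indique as espécies de ações no art. 10.º, n.º 1 CPC.",
        ["<texto legal fornecido pelo utilizador>"],
    ))
```

The resulting string can be passed to the tokenizer in place of a bare question; the passages themselves must come from the user or a retriever.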
| 127 |
+
|
| 128 |
+
## Legal coverage
|
| 129 |
+
- CPC: types of actions, procedures, jurisdiction, appeals, res judicata.
|
| 130 |
+
- CC: obligations, tort, personality rights.
|
| 131 |
+
- Reinforcements: abuse of rights; family and minors; sexual freedom; popular actions.
|
| 132 |
+
- Additional context (selective): CPP, CP, CRP, ECHR, GDPR (+ Law 58/2019), Law 112/2009, Law 130/2015, LAV, Reg. (EU) 1215/2012, Reg. (EU) 2019/1111, Dir. 2004/25/EC, CVM 194/196.
|
| 133 |
+
|
| 134 |
+
## Data and schema
|
| 135 |
+
Training data is in JSONL, with one self-contained Q/A pair per line.
|
| 136 |
+
|
| 137 |
+
Schema
|
| 138 |
+
{"messages":[{"role":"user","content":"…"},{"role":"assistant","content":"…"}]}
|
| 139 |
+
|
| 140 |
+
Examples
|
| 141 |
+
{"messages":[{"role":"user","content":"Quais são as espécies fundamentais de ações no Direito Processual Civil segundo o artigo 10.º, n.º 1 CPC?"},{"role":"assistant","content":"Segundo o artigo 10.º, n.º 1 CPC, distinguem-se duas espécies fundamentais de ações no esquema do processo civil."}]}
|
| 142 |
+
{"messages":[{"role":"user","content":"O que caracteriza a ação declarativa?"},{"role":"assistant","content":"A ação declarativa pode ser de simples apreciação, de condenação ou constitutiva (art. 10.º, n.os 2 e 3 CPC), pedindo-se em cada subespécie providências distintas."}]}
|
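A record in this schema can be checked before fine-tuning with a few lines of standard-library Python. The sketch below assumes each line holds exactly one user turn followed by one assistant turn, as in the examples above; `validate_record` is a hypothetical helper name, not part of any released tooling.

```python
import json


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check it is a single user/assistant pair."""
    record = json.loads(line)
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or len(messages) != 2:
        raise ValueError("expected a 'messages' list with exactly two entries")
    if not all(isinstance(m, dict) for m in messages):
        raise ValueError("each message must be a JSON object")
    roles = [m.get("role") for m in messages]
    if roles != ["user", "assistant"]:
        raise ValueError(f"unexpected roles: {roles}")
    for m in messages:
        content = m.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError("message content must be a non-empty string")
    return record


if __name__ == "__main__":
    sample = (
        '{"messages":[{"role":"user","content":"O que caracteriza a ação declarativa?"},'
        '{"role":"assistant","content":"A ação declarativa pode ser de simples apreciação, '
        'de condenação ou constitutiva."}]}'
    )
    print(validate_record(sample)["messages"][0]["role"])
```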
| 143 |
+
|
| 144 |
+
## Usage
|
| 145 |
+
Distributed as safetensors weights for use with the Hugging Face transformers library.
|
| 146 |
+
|
| 147 |
+
Python (FP16/BF16)
|
| 148 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 149 |
+
import torch
|
| 150 |
+
repo = "VirtuoTuring/justina_clarus-24b-safetensors"
|
| 151 |
+
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
|
| 152 |
+
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
|
| 153 |
+
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=dtype, device_map="auto")
|
| 154 |
+
prompt = "Pergunta: Indique as espécies de ações no art. 10.º, n.º 1 CPC.\nResposta:"
|
| 155 |
+
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device),
|
| 156 |
+
    max_new_tokens=400, do_sample=True, temperature=0.2, top_p=0.9)  # sampling must be enabled for temperature/top_p to apply
|
| 157 |
+
print(tok.decode(out[0], skip_special_tokens=True))
|
| 158 |
+
|
| 159 |
+
Python 4-bit (bitsandbytes)
|
| 160 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
|
| 161 |
+
import torch
|
| 162 |
+
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
|
| 163 |
+
bnb_4bit_use_double_quant=True,
|
| 164 |
+
bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16)
|
| 165 |
+
repo = "VirtuoTuring/justina_clarus-24b-safetensors"  # same repository as in the FP16/BF16 example
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
|
| 166 |
+
model = AutoModelForCausalLM.from_pretrained(repo, quantization_config=bnb, device_map="auto")
|
| 167 |
+
|
| 168 |
+
## Good practice
|
| 169 |
+
- Cite article numbers when applicable.
|
| 170 |
+
- Validate against official sources. Human review is mandatory for filings.
|
| 171 |
+
- For production, prefer low temperature and explicit token limits.
|
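The first two practices above can be partly supported by a cheap automatic check that flags answers containing no article citation, so a human reviewer looks at them first. This is a rough sketch under stated assumptions: the regular expression only recognizes the forms `artigo N`, `artigos N`, `art. N` and `arts. N`, captures just the first number after each keyword, and `cited_articles` is a hypothetical helper name.

```python
import re
from typing import List

# Matches "artigo 10", "artigos 10", "art. 10", "arts. 10" (case-insensitive).
ARTICLE_RE = re.compile(r"\b(?:artigos?|arts?\.)\s*(\d+)", re.IGNORECASE)


def cited_articles(text: str) -> List[str]:
    """Return the article numbers that follow an 'artigo'/'art.' keyword."""
    return ARTICLE_RE.findall(text)


if __name__ == "__main__":
    answer = "Segundo o artigo 10.º, n.º 1 CPC, distinguem-se duas espécies de ações."
    if not cited_articles(answer):
        print("No article cited: send to human review.")
    else:
        print("Cited articles:", cited_articles(answer))
```

A match only shows that an article number is present; whether the cited article is the right one still has to be validated against official sources.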
| 172 |
+
|
| 173 |
+
## Limitations
|
| 174 |
+
- Context window ~4k tokens.
|
| 175 |
+
- Not a substitute for legal professionals or courts.
|
| 176 |
+
- May miss special regimes or recent legislative changes.
|
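Because of the limited context window, injected context has to be trimmed to leave room for the question and the generated answer. The sketch below shows one simple way to do that, keeping passages in order until a token budget is exhausted. The exact window size (4096), the helper name `fit_passages`, and the whitespace-based counter in the demo are assumptions; in practice the counter would wrap the model's tokenizer.

```python
from typing import Callable, List

CONTEXT_WINDOW = 4096  # assumption: "~4k tokens" taken as 4096
MAX_NEW_TOKENS = 400   # matches max_new_tokens in the usage example


def fit_passages(
    passages: List[str],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[str]:
    """Keep passages in order until adding the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for passage in passages:
        cost = count_tokens(passage)
        if used + cost > budget:
            break  # stop at the first passage that does not fit
        kept.append(passage)
        used += cost
    return kept


if __name__ == "__main__":
    # Whitespace splitting is only a stand-in for a real tokenizer here.
    def rough_count(s: str) -> int:
        return len(s.split())

    question = "Indique as espécies de ações no art. 10.º, n.º 1 CPC."
    budget = CONTEXT_WINDOW - MAX_NEW_TOKENS - rough_count(question)
    print(fit_passages(["primeiro excerto", "segundo excerto"], rough_count, budget))
```

With the tokenizer loaded as `tok`, a closer count would be `lambda s: len(tok(s)["input_ids"])`, which slightly overestimates because special tokens are counted for every passage.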
| 177 |
+
|
| 178 |
+
## License
|
| 179 |
+
Virtuo 1.0. Use, modification, and redistribution, including commercial use, are permitted provided that notices are preserved and Virtuo Turing – Artificial Intelligence, S.A. is referenced.
|
| 180 |
+
|
| 181 |
+
## Credits
|
| 182 |
+
Virtuo Turing – Artificial Intelligence, S.A. (Portugal) and Octávio Viana.
|
| 183 |
+
Base © Mistral AI (Apache-2.0).
|
| 184 |
Website: https://justina.cloud
|