ÆRA-4B

Overview

ÆRA is a specialized 4 billion parameter language model developed by AND EMILI as an enterprise-focused foundation for building intelligent agents and automation pipelines. Unlike general-purpose conversational models, ÆRA is intentionally designed with a narrow, practical focus on context-based reasoning and structured outputs.

Key Capabilities

🇮🇹 Native Italian Language Support

ÆRA excels at understanding and generating Italian text, making it ideal for Italian-speaking enterprises and applications.

📄 Context-Only Responses

ÆRA is trained to rely exclusively on provided context rather than internal knowledge. When asked questions without relevant context, it will respond honestly:

"Currently I don't have access to information about the actors who played Dr. Who. Feel free to share content and I will analyze it and tell you what I can infer from it."

This behavior reduces hallucinations and makes the model's outputs more predictable in enterprise applications.

🔧 Structured Output Generation

  • JSON Generation: Reliably produces well-formed JSON outputs
  • Entity Extraction: Identifies and extracts entities from provided text
  • Classification: Categorizes content based on given criteria
  • Sentiment Analysis: Analyzes emotional tone in context
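
As a minimal sketch of these capabilities combined (the prompt wording, and the assumption that the model returns bare JSON, are illustrative rather than a canonical format):

import json
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={"torch_dtype": torch.bfloat16, "device_map": "auto"},
)

# Entity extraction from provided context, returned as JSON
contract = "Il contratto è stato firmato da Mario Rossi il 3 marzo 2024 per un importo di €5.000."
messages = [
    {"role": "system", "content": contract},
    {"role": "user", "content": 'Estrai le entità in JSON con i campi "persona", "data" e "importo".'},
]

raw = pipe(messages, max_new_tokens=100)[0]['generated_text'][-1]['content']
entities = json.loads(raw)  # assumes bare JSON output; validate and handle parse errors in production
print(entities)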

🛠️ Function Calling

Native support for tool use and function calling, enabling seamless integration into agentic workflows and automation pipelines.
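
A minimal sketch of wiring a tool through the transformers chat template (get_weather is a hypothetical tool, and it is an assumption that ÆRA's chat template accepts the standard transformers tools argument):

from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Restituisce il meteo attuale per una città.

    Args:
        city: Nome della città.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("and-emili/aera-4b")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Che tempo fa a Milano?"}],
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
# prompt now embeds the tool schema; generate as usual, then parse any
# tool-call block the model emits before executing the function.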

Design Philosophy

ÆRA is not intended to be a general-knowledge assistant like ChatGPT. Instead, it serves as a lightweight, efficient starting point for enterprises exploring:

  • Retrieval Augmented Generation (RAG) implementations
  • Document analysis and information extraction
  • Automated workflows with structured outputs
  • Multi-agent systems requiring reliable, predictable behavior

Use Cases

This model is ideal for companies looking to:

  • Test the viability of RAG systems for their specific needs
  • Build proof-of-concepts for document processing pipelines
  • Implement lightweight automation without cloud dependencies
  • Evaluate whether LLM-based solutions fit their requirements

If initial tests with ÆRA prove successful, organizations can then invest in developing more specialized, powerful models tailored to their specific domain needs.

Technical Details

  • Parameters: 4 billion
  • Training: Post-trained on synthetic data focused on structured reasoning and Italian language tasks
  • Deployment: Optimized for local deployment on standard hardware
  • Privacy: Runs entirely on-premises with no external API calls

Precision & Memory

  • Recommended: GPU with bfloat16 or float16.
  • If you don’t set torch_dtype, many setups will load float32 on CPU → higher RAM usage and slower inference.
  • If you don’t pass device_map="auto", the model may not use your GPU.
  • Best practice: load on GPU with torch_dtype=torch.bfloat16 (or torch.float16) and device_map="auto". Total runtime memory is higher than the weights alone because of activation buffers and the KV-cache, and it grows with context length and batch size.
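
A rough back-of-envelope check of the weight footprint alone (buffers and the KV-cache add on top of these figures):

params = 4e9
print(f"bf16/fp16 weights: ~{params * 2 / 1e9:.0f} GB")  # ~8 GB
print(f"float32 weights:   ~{params * 4 / 1e9:.0f} GB")  # ~16 GB: an accidental fp32 load doubles weight memory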

GGUF Weights for Local Runtimes

GGUF 4-bit weights are available for local runners like LM Studio, Ollama, and llama.cpp.
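For example, llama.cpp can chat with such a file directly (the filename here is illustrative; use whichever quantization you downloaded): llama-cli -m aera-4b-Q4_K_M.gguf -cnv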

Getting Started

Using Pipeline (Simplest)

from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={
        "torch_dtype": torch.bfloat16,  # or torch.float16 if preferred
        "low_cpu_mem_usage": True,
        "device_map": "auto",
    },
)
messages = [{"role": "user", "content": "Chi sei?"}]
answer = pipe(messages, max_new_tokens=100)[0]['generated_text'][-1]['content']

print(answer) 
# Output: 'Ciao! Mi chiamo ÆRA, un assistente virtuale sviluppato da AND EMILI.'

Direct Model Loading

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("and-emili/aera-4b", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "and-emili/aera-4b",
    torch_dtype=torch.bfloat16,  # or torch.float16
    device_map="auto",
    low_cpu_mem_usage=True,
)

messages = [
    {"role": "user", "content": "Chi è L'attuale presidente della Repubblica Italiana?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
# Output: 'Al momento non ho informazioni aggiornate sull'attuale presidente della Repubblica Italiana. 
#         Se hai un testo o dei dati specifici che vuoi condividere, posso aiutarti a estrarre questa informazione.'

RAG-Style Context Analysis

from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "low_cpu_mem_usage": True,
        "device_map": "auto",
    },
)

# Document/context
document = """
Il nuovo prodotto XYZ-3000 è stato lanciato nel 2024 con un prezzo di €1,299. 
Include 3 anni di garanzia e supporto tecnico gratuito. Il prodotto pesa 2.5kg 
ed è disponibile in tre colori: nero, argento e blu. La batteria dura 48 ore 
con uso normale.
"""

messages = [
    {"role": "system", "content": document},
    {"role": "user", "content": "Quanto costa il prodotto e quali colori sono disponibili?"}
]

response = pipe(messages, max_new_tokens=100, do_sample=True, temperature=0.3)[0]['generated_text'][-1]['content']
print(response) 
# Output: "Il prodotto XYZ-3000 costa €1,299 e è disponibile in tre colori: nero, argento e blu."

OpenAI-Compatible API (via vLLM)

For production deployments, ÆRA supports OpenAI-compatible endpoints through vLLM, enabling structured outputs validated against Pydantic schemas.
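
A minimal way to stand up such an endpoint is vLLM's built-in server, e.g. vllm serve and-emili/aera-4b --dtype bfloat16 (flags beyond the model ID are illustrative). With the server running, the client below requests a structured meeting summary: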


from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, List

client = OpenAI(
    api_key="your-key",
    base_url="https://your-vllm-endpoint/v1",
)

# Complex structured output for meeting analysis
class ActionItem(BaseModel):
    azione: str = Field(description="Descrizione dell'azione da intraprendere")
    responsabile: Optional[str] = Field(description="Persona responsabile")
    scadenza: Optional[str] = Field(description="Data di scadenza")
    priorita: str = Field(description="Priorità: alta, media, bassa")

class MeetingSummary(BaseModel):
    riassunto: str = Field(description="Riassunto generale della riunione")
    decisioni_prese: List[str] = Field(description="Lista delle decisioni prese")
    azioni_da_intraprendere: List[ActionItem] = Field(description="Azioni specifiche da intraprendere")
    partecipanti: List[str] = Field(default=[], description="Lista dei partecipanti")
    prossima_riunione: Optional[str] = Field(description="Data della prossima riunione se menzionata")

# Real meeting notes to analyze
meeting_notes = """
Riunione del 15 giugno 2024 - Team Marketing
Presenti: Laura Bianchi (Marketing Manager), Marco Verdi (Social Media), Sara Neri (Grafica)

Discusso nuovo piano marketing Q3:
- Approvato budget €15.000 per campagna social media
- Laura coordinerà con agenzia esterna per video promozionali
- Marco deve preparare content calendar entro 30 giugno
- Sara creerà mockup nuova brochure entro 25 giugno
- Decidere fornitori stampa entro luglio
- Prossimo meeting: 29 giugno ore 14:00

Priorità alta: lancio campagna entro 15 luglio
Marco deve anche analizzare performance attuali social
"""

completion = client.beta.chat.completions.parse(
    model="and-emili/aera-4b",
    messages=[
        {"role": "system", "content": "Sei un assistente esperto che riassume riunioni aziendali italiane."},
        {"role": "user", "content": f"Analizza e riassumi questi appunti:\n\n{meeting_notes}"}
    ],
    response_format=MeetingSummary,
    temperature=0.5
)

result = completion.choices[0].message.parsed
print(f"RIASSUNTO: {result.riassunto}\n")
print(f"DECISIONI PRESE: {', '.join(result.decisioni_prese)}\n")
print("AZIONI DA INTRAPRENDERE:")
for action in result.azioni_da_intraprendere:
    print(f"- {action.azione}")
    if action.responsabile:
        print(f"  Responsabile: {action.responsabile}")
    print(f"  Priorità: {action.priorita}")



# Customer Support Automation with Escalation Logic
class CustomerResponse(BaseModel):
    risposta: str = Field(description="Risposta professionale al cliente")
    categoria_richiesta: str = Field(description="Categoria: spedizione, reso, pagamento, etc.")
    livello_urgenza: str = Field(description="Urgenza: basso, medio, alto")
    azioni_suggerite: List[str] = Field(description="Azioni che il cliente può intraprendere")
    escalation_richiesta: bool = Field(description="Se necessita escalation a operatore umano")

inquiry = "URGENTE! Il mio ordine per il matrimonio di domani non è ancora arrivato! Avevo pagato la spedizione express!"

completion = client.beta.chat.completions.parse(
    model="and-emili/aera-4b",
    messages=[
        {"role": "system", "content": "Sei un assistente clienti professionale per e-commerce."},
        {"role": "user", "content": inquiry}
    ],
    response_format=CustomerResponse,
    temperature=0.5
)

response = completion.choices[0].message.parsed
print(f"Urgenza: {response.livello_urgenza}")        # "alto"
print(f"Escalation: {response.escalation_richiesta}") # True
print(f"Risposta: {response.risposta}")

Advanced Use Cases

For more complex examples, including:

  • Customer support automation
  • Meeting notes summarization
  • Contract information extraction

check the examples in our GitHub repository.

Limitations

  • Does not provide information beyond what's in the given context
  • Not suitable for open-ended creative tasks or general knowledge queries
  • Optimized for Italian; performance may vary in other languages
  • Designed for specific enterprise use cases, not general conversation

About AND EMILI

AND EMILI specializes in developing practical AI solutions for enterprise automation and intelligence augmentation.


License: Apache 2.0
