ÆRA-4B
Overview
ÆRA is a specialized 4 billion parameter language model developed by AND EMILI as an enterprise-focused foundation for building intelligent agents and automation pipelines. Unlike general-purpose conversational models, ÆRA is intentionally designed with a narrow, practical focus on context-based reasoning and structured outputs.
Key Capabilities
🇮🇹 Native Italian Language Support
ÆRA excels at understanding and generating Italian text, making it ideal for Italian-speaking enterprises and applications.
📄 Context-Only Responses
ÆRA is trained to rely exclusively on provided context rather than internal knowledge. When asked questions without relevant context, it will respond honestly:
"Currently I don't have access to information about the actors who played Dr. Who. Feel free to share content and I will analyze it and tell you what I can infer from it."
This behavior ensures reliability and reduces hallucination in enterprise applications.
🔧 Structured Output Generation
- JSON Generation: Reliably produces well-formed JSON outputs
- Entity Extraction: Identifies and extracts entities from provided text
- Classification: Categorizes content based on given criteria
- Sentiment Analysis: Analyzes emotional tone in context
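For instance, a minimal entity-extraction sketch using the same pipeline API shown in Getting Started (the source text and the JSON field names here are illustrative, not a fixed schema):
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={"torch_dtype": torch.bfloat16, "device_map": "auto"},
)
# Illustrative source text (as context) and requested fields
messages = [
    {"role": "system", "content": "Mario Rossi ha ordinato 3 laptop il 12 maggio 2024."},
    {"role": "user", "content": 'Estrai le entità in JSON con i campi "persona", "prodotto", "quantita", "data".'},
]
print(pipe(messages, max_new_tokens=120)[0]["generated_text"][-1]["content"])
# Expected: a well-formed JSON object with the requested fields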
🛠️ Function Calling
Native support for tool use and function calling, enabling seamless integration into agentic workflows and automation pipelines.
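A minimal sketch of the standard transformers tool-use flow, assuming a recent transformers version and that the model's chat template defines tool formatting (the get_weather function is hypothetical):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("and-emili/aera-4b")

# Hypothetical tool: transformers builds a JSON schema from the
# signature and docstring and injects it into the prompt.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    ...

messages = [{"role": "user", "content": "Che tempo fa a Bologna?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
# Generate from `prompt`, parse the emitted tool call, run the function,
# append a {"role": "tool", ...} message, and generate again for the final answer.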
Design Philosophy
ÆRA is not intended to be a general-knowledge assistant like ChatGPT. Instead, it serves as a lightweight, efficient starting point for enterprises exploring:
- Retrieval Augmented Generation (RAG) implementations
- Document analysis and information extraction
- Automated workflows with structured outputs
- Multi-agent systems requiring reliable, predictable behavior
Use Cases
This model is ideal for companies looking to:
- Test the viability of RAG systems for their specific needs
- Build proof-of-concepts for document processing pipelines
- Implement lightweight automation without cloud dependencies
- Evaluate whether LLM-based solutions fit their requirements
If initial tests with ÆRA prove successful, organizations can then invest in developing more specialized, powerful models tailored to their specific domain needs.
Technical Details
- Parameters: 4 billion
- Training: Post-trained on synthetic data focused on structured reasoning and Italian language tasks
- Deployment: Optimized for local deployment on standard hardware
- Privacy: Runs entirely on-premises with no external API calls
Precision & Memory
- Recommended: GPU with bfloat16 or float16.
- If you don't set torch_dtype, many setups will load float32 on CPU → higher RAM usage and slower inference.
- If you don't pass device_map="auto", the model may not use your GPU.
- Best practice: load on GPU with torch_dtype=torch.bfloat16 (or torch.float16) and device_map="auto".
- Total runtime memory is higher than the weights alone due to buffers and the KV-cache, and it scales with context length and batch size.
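As a rough baseline, the 4B weights alone occupy about 8 GB in bfloat16/float16 (4 × 10⁹ parameters × 2 bytes per parameter) and about 16 GB in float32, before activations and KV-cache are counted.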
GGUF weights for local runtimes
GGUF 4-bit weights are available for local runners like LM Studio, Ollama, and llama.cpp.
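For example, a minimal sketch with the llama-cpp-python bindings (the quantized file name is illustrative; check the repository for the actual GGUF files):
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="aera-4b-Q4_K_M.gguf", n_ctx=4096)  # hypothetical file name
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Chi sei?"}],
    max_tokens=100,
)
print(out["choices"][0]["message"]["content"])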
Getting Started
Using Pipeline (Simplest)
from transformers import pipeline
import torch
pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={
        "torch_dtype": torch.bfloat16,  # or torch.float16 if preferred
        "low_cpu_mem_usage": True,
        "device_map": "auto",
    },
)
messages = [{"role": "user", "content": "Chi sei?"}]
answer = pipe(messages)[0]['generated_text'][-1]['content']
print(answer)
# Output: 'Ciao! Mi chiamo ÆRA, un assistente virtuale sviluppato da AND EMILI.'
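The chat pipeline returns the full conversation in generated_text, so indexing [-1] selects the newly generated assistant turn.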
Direct Model Loading
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("and-emili/aera-4b", use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    "and-emili/aera-4b",
    torch_dtype=torch.bfloat16,  # or torch.float16
    device_map="auto",
    low_cpu_mem_usage=True,
)
messages = [
    {"role": "user", "content": "Chi è l'attuale presidente della Repubblica Italiana?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
# Output: 'Al momento non ho informazioni aggiornate sull'attuale presidente della Repubblica Italiana.
# Se hai un testo o dei dati specifici che vuoi condividere, posso aiutarti a estrarre questa informazione.'
RAG-Style Context Analysis
from transformers import pipeline
import torch
pipe = pipeline(
    "text-generation",
    model="and-emili/aera-4b",
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "low_cpu_mem_usage": True,
        "device_map": "auto",
    },
)
# Document/context
document = """
Il nuovo prodotto XYZ-3000 è stato lanciato nel 2024 con un prezzo di €1,299.
Include 3 anni di garanzia e supporto tecnico gratuito. Il prodotto pesa 2.5kg
ed è disponibile in tre colori: nero, argento e blu. La batteria dura 48 ore
con uso normale.
"""
messages = [
    {"role": "system", "content": document},
    {"role": "user", "content": "Quanto costa il prodotto e quali colori sono disponibili?"}
]
response = pipe(messages, max_new_tokens=100, temperature=0.3)[0]['generated_text'][-1]['content']
print(response)
# Output: "Il prodotto XYZ-3000 costa €1,299 e è disponibile in tre colori: nero, argento e blu."
OpenAI-Compatible API (via vLLM)
For production deployments, ÆRA supports OpenAI-compatible endpoints through vLLM, enabling structured output with Pydantic schemas.
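On recent vLLM versions the server can be started with vllm serve and-emili/aera-4b (older versions use python -m vllm.entrypoints.openai.api_server --model and-emili/aera-4b); the endpoint URL and API key in the example below are placeholders for your own deployment: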
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, List
client = OpenAI(
    api_key="your-key",
    base_url="https://your-vllm-endpoint/v1",
)
# Complex structured output for meeting analysis
class ActionItem(BaseModel):
    azione: str = Field(description="Descrizione dell'azione da intraprendere")
    responsabile: Optional[str] = Field(description="Persona responsabile")
    scadenza: Optional[str] = Field(description="Data di scadenza")
    priorita: str = Field(description="Priorità: alta, media, bassa")

class MeetingSummary(BaseModel):
    riassunto: str = Field(description="Riassunto generale della riunione")
    decisioni_prese: List[str] = Field(description="Lista delle decisioni prese")
    azioni_da_intraprendere: List[ActionItem] = Field(description="Azioni specifiche da intraprendere")
    partecipanti: List[str] = Field(default=[], description="Lista dei partecipanti")
    prossima_riunione: Optional[str] = Field(description="Data della prossima riunione se menzionata")
# Real meeting notes to analyze
meeting_notes = """
Riunione del 15 giugno 2024 - Team Marketing
Presenti: Laura Bianchi (Marketing Manager), Marco Verdi (Social Media), Sara Neri (Grafica)
Discusso nuovo piano marketing Q3:
- Approvato budget €15.000 per campagna social media
- Laura coordinerà con agenzia esterna per video promozionali
- Marco deve preparare content calendar entro 30 giugno
- Sara creerà mockup nuova brochure entro 25 giugno
- Decidere fornitori stampa entro luglio
- Prossimo meeting: 29 giugno ore 14:00
Priorità alta: lancio campagna entro 15 luglio
Marco deve anche analizzare performance attuali social
"""
completion = client.beta.chat.completions.parse(
    model="and-emili/aera-4b",
    messages=[
        {"role": "system", "content": "Sei un assistente esperto che riassume riunioni aziendali italiane."},
        {"role": "user", "content": f"Analizza e riassumi questi appunti:\n\n{meeting_notes}"}
    ],
    response_format=MeetingSummary,
    temperature=0.5
)
result = completion.choices[0].message.parsed
print(f"RIASSUNTO: {result.riassunto}\n")
print(f"DECISIONI PRESE: {', '.join(result.decisioni_prese)}\n")
print("AZIONI DA INTRAPRENDERE:")
for action in result.azioni_da_intraprendere:
    print(f"- {action.azione}")
    if action.responsabile:
        print(f" Responsabile: {action.responsabile}")
    print(f" Priorità: {action.priorita}")
# Customer Support Automation with Escalation Logic
class CustomerResponse(BaseModel):
    risposta: str = Field(description="Risposta professionale al cliente")
    categoria_richiesta: str = Field(description="Categoria: spedizione, reso, pagamento, etc.")
    livello_urgenza: str = Field(description="Urgenza: basso, medio, alto")
    azioni_suggerite: List[str] = Field(description="Azioni che il cliente può intraprendere")
    escalation_richiesta: bool = Field(description="Se necessita escalation a operatore umano")
inquiry = "URGENTE! Il mio ordine per il matrimonio di domani non è ancora arrivato! Avevo pagato la spedizione express!"
completion = client.beta.chat.completions.parse(
    model="and-emili/aera-4b",
    messages=[
        {"role": "system", "content": "Sei un assistente clienti professionale per e-commerce."},
        {"role": "user", "content": inquiry}
    ],
    response_format=CustomerResponse,
    temperature=0.5
)
response = completion.choices[0].message.parsed
print(f"Urgenza: {response.livello_urgenza}") # "alto"
print(f"Escalation: {response.escalation_richiesta}") # True
print(f"Risposta: {response.risposta}")
Advanced Use Cases
For more complex examples, including:
- Customer support automation
- Meeting notes summarization
- Contract information extraction
check the examples in our GitHub repository.
Limitations
- Does not provide information beyond what's in the given context
- Not suitable for open-ended creative tasks or general knowledge queries
- Optimized for Italian; performance may vary in other languages
- Designed for specific enterprise use cases, not general conversation
About AND EMILI
AND EMILI specializes in developing practical AI solutions for enterprise automation and intelligence augmentation.
License: Apache 2.0