---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---

## Autoencoder for Hugging Face Transformers (Block-based)

A flexible, production-grade Autoencoder implementation built to fit naturally into the Transformers ecosystem. It supports a new block-based architecture with ready-to-use templates for classic MLP, VAE/beta-VAE, Transformer, Recurrent, Convolutional, mixed hybrids, and learnable preprocessing.

### Key features
- Block-based architecture: Linear, Attention, Recurrent (LSTM/GRU), Convolutional, Variational blocks
- Class-based configuration presets in template.py for quick starts
- Variational and beta-VAE variants (KL-controlled)
- Learnable preprocessing and inverse transforms
- Hugging Face-compatible config/model API and from_pretrained/save_pretrained

## Install and load from the Hub (code repo)

```python
from huggingface_hub import snapshot_download
import sys, torch

repo_dir = snapshot_download(
    repo_id="amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],
)
sys.path.append(repo_dir)

from modeling_autoencoder import AutoencoderForReconstruction
model = AutoencoderForReconstruction.from_pretrained(repo_dir)

x = torch.randn(8, 20)
out = model(input_values=x)
print("latent:", out.last_hidden_state.shape, "reconstructed:", out.reconstructed.shape)
```

## Quickstart with class-based templates

```python
import torch

from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(4, 784)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Available presets (template.py)
- ClassicAutoencoderConfig: Dense MLP AE
- VariationalAutoencoderConfig: VAE with KL regularization
- BetaVariationalAutoencoderConfig: beta-VAE (beta > 1)
- TransformerAutoencoderConfig: Attention-based encoder for sequences
- RecurrentAutoencoderConfig: LSTM/GRU encoder for sequences
- ConvolutionalAutoencoderConfig: 1D Conv encoder for sequences
- ConvAttentionAutoencoderConfig: Mixed Conv + Attention encoder
- LinearRecurrentAutoencoderConfig: Linear down-projection + RNN
- PreprocessedAutoencoderConfig: MLP AE with learnable preprocessing
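
Every preset accepts at least input_dim and latent_dim (as the examples throughout this README show), so swapping architectures is typically a one-line change. A minimal sketch, leaving all preset-specific options at their defaults:

```python
from modeling_autoencoder import AutoencoderModel
from template import (
    ClassicAutoencoderConfig,
    VariationalAutoencoderConfig,
    ConvAttentionAutoencoderConfig,
)

# Same data dimensionality, three different architectures
for cfg_cls in (ClassicAutoencoderConfig, VariationalAutoencoderConfig, ConvAttentionAutoencoderConfig):
    model = AutoencoderModel(cfg_cls(input_dim=64, latent_dim=16))
    print(cfg_cls.__name__, sum(p.numel() for p in model.parameters()))
```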

## Block-based architecture

The autoencoder uses a modular block system where you define encoder_blocks and decoder_blocks as lists of dictionaries. Each block dict specifies its type and parameters.

### Available block types

#### LinearBlock
Dense layer with optional normalization, activation, dropout, and residual connections.

```python
{
    "type": "linear",
    "input_dim": 256,
    "output_dim": 128,
    "activation": "relu",        # relu, gelu, tanh, sigmoid, etc.
    "normalization": "batch",    # batch, layer, group, instance, none
    "dropout_rate": 0.1,
    "use_residual": False,       # adds skip connection if input_dim == output_dim
    "residual_scale": 1.0
}
```

#### AttentionBlock
Multi-head self-attention with feed-forward network. Works with 2D (B, D) or 3D (B, T, D) inputs.

```python
{
    "type": "attention",
    "input_dim": 128,
    "num_heads": 8,
    "ffn_dim": 512,              # if None, defaults to 4 * input_dim
    "dropout_rate": 0.1
}
```

#### RecurrentBlock
LSTM, GRU, or vanilla RNN encoder. Outputs final hidden state or all timesteps.

```python
{
    "type": "recurrent",
    "input_dim": 64,
    "hidden_size": 128,
    "num_layers": 2,
    "rnn_type": "lstm",          # lstm, gru, rnn
    "bidirectional": True,
    "dropout_rate": 0.1,
    "output_dim": 128            # final output dimension
}
```

#### ConvolutionalBlock
1D convolution for sequence data. Expects 3D input (B, T, D).

```python
{
    "type": "conv1d",
    "input_dim": 64,             # input channels
    "output_dim": 128,           # output channels
    "kernel_size": 3,
    "padding": "same",           # "same" or integer
    "activation": "relu",
    "normalization": "batch",
    "dropout_rate": 0.1
}
```

#### VariationalBlock
Produces mu and logvar for VAE reparameterization. Used internally by the model when autoencoder_type="variational".

```python
{
    "type": "variational",
    "input_dim": 128,
    "latent_dim": 64
}
```

### Custom configuration examples

#### Mixed architecture (Conv + Attention + Linear)
```python
from configuration_autoencoder import AutoencoderConfig

enc = [
    # 1D convolution for local patterns
    {"type": "conv1d", "input_dim": 64, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},
    {"type": "conv1d", "input_dim": 128, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},

    # Self-attention for global dependencies
    {"type": "attention", "input_dim": 128, "num_heads": 8, "ffn_dim": 512, "dropout_rate": 0.1},

    # Final linear projection
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]

dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "identity", "normalization": "none"}
]

cfg = AutoencoderConfig(
    input_dim=64,
    latent_dim=32,
    autoencoder_type="classic",
    encoder_blocks=enc,
    decoder_blocks=dec
)
```

#### Hierarchical encoder (multiple scales)
```python
enc = [
    # Local features
    {"type": "linear", "input_dim": 784, "output_dim": 512, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 512, "output_dim": 256, "activation": "relu", "normalization": "batch"},

    # Mid-level features with residual
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},

    # High-level features
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]
```

#### Sequence-to-sequence with recurrent encoder
```python
enc = [
    {"type": "recurrent", "input_dim": 100, "hidden_size": 128, "num_layers": 2, "rnn_type": "lstm", "bidirectional": True, "output_dim": 256},
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "tanh", "normalization": "layer"}
]

dec = [
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "tanh", "normalization": "layer"},
    {"type": "linear", "input_dim": 128, "output_dim": 100, "activation": "identity", "normalization": "none"}
]
```

### Input shape handling
- **2D inputs (B, D)**: Work with Linear blocks directly. Attention/Recurrent/Conv blocks treat them as (B, 1, D)
- **3D inputs (B, T, D)**: Work with all block types. Linear blocks operate per-timestep
- **Output shapes**: The decoder typically outputs the same shape as the input. For sequence models, the final shape depends on the decoder architecture; both cases are sketched below
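
A short sketch of both cases, using the ConvAttentionAutoencoderConfig preset from later in this README (input_dim=64, latent_dim=64):

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ConvAttentionAutoencoderConfig

model = AutoencoderModel(ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64))

# 3D input (B, T, D): conv and attention blocks operate along T
seq_out = model(torch.randn(8, 50, 64), return_dict=True)

# 2D input (B, D): coerced to (B, 1, D) for attention/recurrent/conv blocks
vec_out = model(torch.randn(8, 64), return_dict=True)

print(seq_out.reconstructed.shape, vec_out.reconstructed.shape)
```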

## Configuration (configuration_autoencoder.py)

AutoencoderConfig is the core configuration class. Important fields:
- input_dim: feature dimension (D)
- latent_dim: latent size
- encoder_blocks, decoder_blocks: block lists (see block types above)
- activation, dropout_rate, use_batch_norm: defaults used by some presets
- autoencoder_type: classic | variational | beta_vae | denoising | sparse | contractive | recurrent
- Reconstruction losses: mse | bce | l1 | huber | smooth_l1 | kl_div | cosine | focal | dice | tversky | ssim | perceptual
- Preprocessing: use_learnable_preprocessing, preprocessing_type, learn_inverse_preprocessing

Example:
```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    autoencoder_type="variational",
    encoder_blocks=[{"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu"}],
    decoder_blocks=[{"type": "linear", "input_dim": 32, "output_dim": 128, "activation": "identity", "normalization": "none"}],
)
```

## Models (modeling_autoencoder.py)

Main classes:
- AutoencoderModel: core module exposing forward that returns last_hidden_state (latent) and reconstructed
- AutoencoderForReconstruction: HF-compatible model wrapper with from_pretrained/save_pretrained

Forward usage:
```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

model = AutoencoderModel(ClassicAutoencoderConfig(input_dim=20, latent_dim=8))  # dims chosen to match x below
x = torch.randn(8, 20)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Variational behavior
If cfg.autoencoder_type == "variational" or "beta_vae":
- The model uses an internal VariationalBlock to compute mu and logvar
- Samples z during training; uses mu during eval
- KL term available via model._mu/_logvar (exposed in hidden_states when requested)

```python
out = model(x, return_dict=True, output_hidden_states=True)
latent, mu, logvar = out.hidden_states
```

## Preprocessing (preprocessing.py)

- PreprocessingBlock wraps LearnablePreprocessor and can be placed before/after the core encoder/decoder
- When enabled via config.use_learnable_preprocessing, the model constructs two blocks: pre (forward) and post (inverse)
- The block tracks reg_loss, which is added to preprocessing_loss in the model output; see the sketch below for reading it off

```python
from modeling_autoencoder import AutoencoderModel
from template import PreprocessedAutoencoderConfig

cfg = PreprocessedAutoencoderConfig(input_dim=64, latent_dim=32, preprocessing_type="neural_scaler")
model = AutoencoderModel(cfg)
```

## Utilities (utils.py)

Common helpers:
- _get_activation(name)
- _get_norm(name, num_groups=None)
- _flatten_3d_to_2d(x), _maybe_restore_3d(x, ref)
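
A rough sketch of how these helpers compose; the exact return behavior is an assumption based on their names, so treat the comments as illustrative:

```python
import torch
from utils import _get_activation, _flatten_3d_to_2d, _maybe_restore_3d

act = _get_activation("gelu")        # assumed to return the matching torch.nn activation module
x = torch.randn(4, 10, 32)           # (B, T, D)

flat = _flatten_3d_to_2d(x)          # assumed to fold time into the batch axis for per-timestep ops
y = _maybe_restore_3d(act(flat), x)  # restores (B, T, D) using x as the shape reference
print(y.shape)
```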

## Training examples

### Basic MSE reconstruction
```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:  # your DataLoader yielding (B, 784) batches
    out = model(x, return_dict=True)
    loss = torch.nn.functional.mse_loss(out.reconstructed, x)
    loss.backward(); opt.step(); opt.zero_grad()
```

### VAE with KL term
```python
from template import VariationalAutoencoderConfig

cfg = VariationalAutoencoderConfig(input_dim=784, latent_dim=32)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:
    out = model(x, return_dict=True, output_hidden_states=True)
    recon = torch.nn.functional.mse_loss(out.reconstructed, x)
    _, mu, logvar = out.hidden_states
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + cfg.beta * kl
    loss.backward(); opt.step(); opt.zero_grad()
```

### Sequence reconstruction (Conv + Attention)
```python
from template import ConvAttentionAutoencoderConfig

cfg = ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(8, 50, 64)  # (B, T, D)
out = model(x, return_dict=True)
```

## End-to-end saving/loading
```python
from modeling_autoencoder import AutoencoderForReconstruction

model.save_pretrained("./my_ae")
reloaded = AutoencoderForReconstruction.from_pretrained("./my_ae")
```

## Troubleshooting
- Check that block input_dim/output_dim align across adjacent blocks (a validation sketch follows this list)
- For attention/recurrent/conv blocks, prefer 3D inputs (B, T, D). 2D inputs are coerced to (B, 1, D)
- For variational/beta-VAE, ensure latent_dim is set; the KL term is available via hidden states
- When preprocessing is enabled, preprocessing_loss is included in the output for logging/regularization
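
For the first point, a small helper can catch dimension mismatches before model construction. validate_blocks below is not part of the library, just a sketch over the documented block dict schema (attention blocks preserve input_dim, so they fall back to it when output_dim is absent):

```python
def validate_blocks(blocks):
    """Check that each block's output feeds the next block's input_dim."""
    prev_out = None
    for i, blk in enumerate(blocks):
        if prev_out is not None and blk["input_dim"] != prev_out:
            raise ValueError(
                f"block {i}: input_dim={blk['input_dim']} does not match previous output {prev_out}"
            )
        prev_out = blk.get("output_dim", blk["input_dim"])

validate_blocks([
    {"type": "linear", "input_dim": 128, "output_dim": 64},
    {"type": "attention", "input_dim": 64, "num_heads": 4},
    {"type": "linear", "input_dim": 64, "output_dim": 32},
])  # passes; swapping 64 -> 65 above would raise
```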

## Full AutoencoderConfig reference

Below is a comprehensive reference for all fields in configuration_autoencoder.AutoencoderConfig. Some fields are primarily used by presets or advanced features but are documented here for completeness; a worked example follows the validation notes.

- input_dim (int, default=784): Input feature dimension D. For sequences, D is the per-timestep feature size.
- hidden_dims (List[int], default=[512,256,128]): Legacy convenience list for simple MLPs. Prefer encoder_blocks.
- encoder_blocks (List[dict] | None): Block list for the encoder. See Block-based architecture for block schemas.
- decoder_blocks (List[dict] | None): Block list for the decoder. If omitted, the model may derive a simple decoder from hidden_dims.
- latent_dim (int, default=64): Latent space dimension.
- activation (str, default="relu"): Default activation for Linear blocks when using legacy paths or presets.
- dropout_rate (float, default=0.1): Default dropout used in presets and some layers.
- use_batch_norm (bool, default=True): Default normalization flag used in presets ("batch" if True, else "none").
- tie_weights (bool, default=False): If True, share/tie encoder and decoder weights (not always active, depending on the architecture).
- reconstruction_loss (str, default="mse"): Loss used in AutoencoderForReconstruction. One of:
  - "mse", "bce", "l1", "huber", "smooth_l1", "kl_div", "cosine", "focal", "dice", "tversky", "ssim", "perceptual".
- autoencoder_type (str, default="classic"): Architecture variant. One of:
  - "classic", "variational", "beta_vae", "denoising", "sparse", "contractive", "recurrent".
- beta (float, default=1.0): KL weight for VAE/beta-VAE.
- temperature (float, default=1.0): Reserved for temperature-based operations.
- noise_factor (float, default=0.1): Denoising strength used by denoising variants.
- rnn_type (str, default="lstm"): For recurrent variants. One of: "lstm", "gru", "rnn".
- num_layers (int, default=2): Number of RNN layers for recurrent variants.
- bidirectional (bool, default=True): Whether the RNN is bidirectional in recurrent variants.
- sequence_length (int | None, default=None): Optional fixed sequence length; if None, variable length is supported.
- teacher_forcing_ratio (float, default=0.5): For recurrent decoders that use teacher forcing.
- use_learnable_preprocessing (bool, default=False): Enable learnable preprocessing.
- preprocessing_type (str, default="none"): One of: "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson".
- preprocessing_hidden_dim (int, default=64): Hidden size for preprocessing networks.
- preprocessing_num_layers (int, default=2): Number of layers for preprocessing networks.
- learn_inverse_preprocessing (bool, default=True): Whether to learn the inverse transform for reconstruction.
- flow_coupling_layers (int, default=4): Number of coupling layers for normalizing flows.

Derived helpers and flags:
- has_block_lists: True if either encoder_blocks or decoder_blocks is provided.
- is_variational: True if autoencoder_type in {"variational", "beta_vae"}.
- is_denoising, is_sparse, is_contractive, is_recurrent: Variant flags.
- has_preprocessing: True if preprocessing is enabled and type != "none".

Validation notes:
- activation must be one of the supported values listed in configuration_autoencoder.py
- reconstruction_loss must be one of the supported values
- Many numeric parameters are validated to be positive or within [0, 1]
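
As a worked example of these fields and the derived flags, here is a recurrent configuration using only parameters documented above (block lists are omitted, so the legacy hidden_dims path applies):

```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=100,
    latent_dim=32,
    autoencoder_type="recurrent",
    rnn_type="gru",
    num_layers=2,
    bidirectional=True,
    teacher_forcing_ratio=0.5,
    reconstruction_loss="huber",
)

print(cfg.is_recurrent)       # True
print(cfg.is_variational)     # False
print(cfg.has_block_lists)    # False: no encoder_blocks/decoder_blocks given
print(cfg.has_preprocessing)  # False: preprocessing is disabled by default
```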

## Training with Hugging Face Trainer

The AutoencoderForReconstruction model computes reconstruction loss internally using config.reconstruction_loss. For VAEs/beta-VAEs, it adds the KL term scaled by config.beta. You can plug it directly into transformers.Trainer.

```python
from transformers import Trainer, TrainingArguments
from modeling_autoencoder import AutoencoderForReconstruction
from template import ClassicAutoencoderConfig
import torch
from torch.utils.data import Dataset

# 1) Config and model
cfg = ClassicAutoencoderConfig(input_dim=64, latent_dim=16)
model = AutoencoderForReconstruction(cfg)

# 2) Dummy dataset (replace with your own)
class ToyAEDataset(Dataset):
    def __init__(self, n=1024, d=64):
        self.x = torch.randn(n, d)
    def __len__(self):
        return self.x.size(0)
    def __getitem__(self, idx):
        xi = self.x[idx]
        return {"input_values": xi, "labels": xi}

train_ds = ToyAEDataset()

# 3) TrainingArguments
args = TrainingArguments(
    output_dir="./ae-trainer",
    per_device_train_batch_size=64,
    learning_rate=1e-3,
    num_train_epochs=3,
    logging_steps=50,
    save_steps=200,
    report_to=[],  # disable wandb if not configured
)

# 4) Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
)

# 5) Train
trainer.train()

# 6) Use the model
x = torch.randn(4, 64)
out = model(input_values=x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

Notes:
- The dataset must yield dicts with "input_values" and optionally "labels"; if labels are missing, the model uses the input as the target.
- For sequence inputs, the shape is (B, T, D). For simple vectors, (B, D).
- Set cfg.reconstruction_loss to e.g. "bce" to switch the internal loss (the decoder head applies sigmoid when BCE is used); see the sketch below.
- For VAE/beta-VAE, use VariationalAutoencoderConfig/BetaVariationalAutoencoderConfig.
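
A minimal sketch of that loss switch, setting the documented reconstruction_loss field on the config after construction:

```python
from modeling_autoencoder import AutoencoderForReconstruction
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=64, latent_dim=16)
cfg.reconstruction_loss = "bce"  # BCE expects targets in [0, 1]
model = AutoencoderForReconstruction(cfg)
```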

### Example using AutoencoderConfig directly

The example below shows how to define a configuration purely with block dicts using AutoencoderConfig, without the template classes.

```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel
import torch

# Encoder: Linear -> Attention -> Linear
enc = [
    {"type": "linear", "input_dim": 128, "output_dim": 128, "activation": "relu", "normalization": "batch", "dropout_rate": 0.1},
    {"type": "attention", "input_dim": 128, "num_heads": 4, "ffn_dim": 512, "dropout_rate": 0.1},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"},
]

# Decoder: Linear -> Linear (final identity)
dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "identity", "normalization": "none"},
]

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    encoder_blocks=enc,
    decoder_blocks=dec,
    autoencoder_type="classic",
)

model = AutoencoderModel(cfg)
x = torch.randn(4, 128)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

For a variational model, set autoencoder_type="variational" and the model will internally use a VariationalBlock for mu/logvar and sampling.
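
Continuing the example above, a sketch of the variational variant (same block lists, only the config changes):

```python
vae_cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    encoder_blocks=enc,
    decoder_blocks=dec,
    autoencoder_type="variational",
)

vae = AutoencoderModel(vae_cfg)
out = vae(torch.randn(4, 128), return_dict=True, output_hidden_states=True)
latent, mu, logvar = out.hidden_states  # mu/logvar come from the internal VariationalBlock
```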

## Reference
Core modules:
- configuration_autoencoder.AutoencoderConfig
- modeling_autoencoder.AutoencoderModel, AutoencoderForReconstruction
- blocks: BlockFactory, BlockSequence, Linear/Attention/Recurrent/Convolutional/Variational blocks
- preprocessing: PreprocessingBlock (learnable preprocessing wrapper)
- template: class-based presets listed above

## License
Apache-2.0 (see LICENSE)