---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---

## Autoencoder for Hugging Face Transformers (Block-based)

A flexible, production-grade Autoencoder implementation built to fit naturally into the Transformers ecosystem. It supports a new block-based architecture with ready-to-use templates for classic MLP, VAE/beta-VAE, Transformer, Recurrent, Convolutional, mixed hybrids, and learnable preprocessing.

### Key features
- Block-based architecture: Linear, Attention, Recurrent (LSTM/GRU), Convolutional, Variational blocks
- Class-based configuration presets in template.py for quick starts
- Variational and beta-VAE variants (KL-controlled)
- Learnable preprocessing and inverse transforms
- Hugging Face-compatible config/model API and from_pretrained/save_pretrained

## Install and load from the Hub (code repo)

```python
from huggingface_hub import snapshot_download
import sys, torch

repo_dir = snapshot_download(
    repo_id="amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],
)
sys.path.append(repo_dir)

from modeling_autoencoder import AutoencoderForReconstruction
model = AutoencoderForReconstruction.from_pretrained(repo_dir)

x = torch.randn(8, 20)
out = model(input_values=x)
print("latent:", out.last_hidden_state.shape, "reconstructed:", out.reconstructed.shape)
```

## Quickstart with class-based templates

```python
import torch

from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(4, 784)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Available presets (template.py)
- ClassicAutoencoderConfig: Dense MLP AE
- VariationalAutoencoderConfig: VAE with KL regularization
- BetaVariationalAutoencoderConfig: beta-VAE (beta > 1)
- TransformerAutoencoderConfig: Attention-based encoder for sequences
- RecurrentAutoencoderConfig: LSTM/GRU encoder for sequences
- ConvolutionalAutoencoderConfig: 1D Conv encoder for sequences
- ConvAttentionAutoencoderConfig: Mixed Conv + Attention encoder
- LinearRecurrentAutoencoderConfig: Linear down-projection + RNN
- PreprocessedAutoencoderConfig: MLP AE with learnable preprocessing
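
Every preset accepts at least input_dim and latent_dim (as the examples throughout this README show), so swapping architectures is typically a one-line change. A minimal sketch, leaving all preset-specific options at their defaults:

```python
from modeling_autoencoder import AutoencoderModel
from template import (
    ClassicAutoencoderConfig,
    VariationalAutoencoderConfig,
    ConvAttentionAutoencoderConfig,
)

# Same data dimensionality, three different architectures
for cfg_cls in (ClassicAutoencoderConfig, VariationalAutoencoderConfig, ConvAttentionAutoencoderConfig):
    model = AutoencoderModel(cfg_cls(input_dim=64, latent_dim=16))
    print(cfg_cls.__name__, sum(p.numel() for p in model.parameters()))
```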

## Block-based architecture

The autoencoder uses a modular block system where you define encoder_blocks and decoder_blocks as lists of dictionaries. Each block dict specifies its type and parameters.

### Available block types

#### LinearBlock
Dense layer with optional normalization, activation, dropout, and residual connections.

```python
{
    "type": "linear",
    "input_dim": 256,
    "output_dim": 128,
    "activation": "relu",        # relu, gelu, tanh, sigmoid, etc.
    "normalization": "batch",    # batch, layer, group, instance, none
    "dropout_rate": 0.1,
    "use_residual": False,       # adds skip connection if input_dim == output_dim
    "residual_scale": 1.0
}
```

#### AttentionBlock
Multi-head self-attention with feed-forward network. Works with 2D (B, D) or 3D (B, T, D) inputs.

```python
{
    "type": "attention",
    "input_dim": 128,
    "num_heads": 8,
    "ffn_dim": 512,              # if None, defaults to 4 * input_dim
    "dropout_rate": 0.1
}
```

#### RecurrentBlock
LSTM, GRU, or vanilla RNN encoder. Outputs final hidden state or all timesteps.

```python
{
    "type": "recurrent",
    "input_dim": 64,
    "hidden_size": 128,
    "num_layers": 2,
    "rnn_type": "lstm",          # lstm, gru, rnn
    "bidirectional": True,
    "dropout_rate": 0.1,
    "output_dim": 128            # final output dimension
}
```

#### ConvolutionalBlock
1D convolution for sequence data. Expects 3D input (B, T, D).

```python
{
    "type": "conv1d",
    "input_dim": 64,             # input channels
    "output_dim": 128,           # output channels
    "kernel_size": 3,
    "padding": "same",           # "same" or integer
    "activation": "relu",
    "normalization": "batch",
    "dropout_rate": 0.1
}
```

#### VariationalBlock
Produces mu and logvar for VAE reparameterization. Used internally by the model when autoencoder_type="variational".

```python
{
    "type": "variational",
    "input_dim": 128,
    "latent_dim": 64
}
```

### Custom configuration examples

#### Mixed architecture (Conv + Attention + Linear)
```python
from configuration_autoencoder import AutoencoderConfig

enc = [
    # 1D convolution for local patterns
    {"type": "conv1d", "input_dim": 64, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},
    {"type": "conv1d", "input_dim": 128, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},

    # Self-attention for global dependencies
    {"type": "attention", "input_dim": 128, "num_heads": 8, "ffn_dim": 512, "dropout_rate": 0.1},

    # Final linear projection
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]

dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "identity", "normalization": "none"}
]

cfg = AutoencoderConfig(
    input_dim=64,
    latent_dim=32,
    autoencoder_type="classic",
    encoder_blocks=enc,
    decoder_blocks=dec
)
```

#### Hierarchical encoder (multiple scales)
```python
enc = [
    # Local features
    {"type": "linear", "input_dim": 784, "output_dim": 512, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 512, "output_dim": 256, "activation": "relu", "normalization": "batch"},

    # Mid-level features with residual
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},

    # High-level features
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]
```

#### Sequence-to-sequence with recurrent encoder
```python
enc = [
    {"type": "recurrent", "input_dim": 100, "hidden_size": 128, "num_layers": 2, "rnn_type": "lstm", "bidirectional": True, "output_dim": 256},
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "tanh", "normalization": "layer"}
]

dec = [
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "tanh", "normalization": "layer"},
    {"type": "linear", "input_dim": 128, "output_dim": 100, "activation": "identity", "normalization": "none"}
]
```

### Input shape handling
- **2D inputs (B, D)**: Work with Linear blocks directly. Attention/Recurrent/Conv blocks treat them as (B, 1, D)
- **3D inputs (B, T, D)**: Work with all block types. Linear blocks operate per-timestep
- **Output shapes**: The decoder typically outputs the same shape as the input. For sequence models, the final shape depends on the decoder architecture; both cases are sketched below
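
A short sketch of both cases, using the ConvAttentionAutoencoderConfig preset from later in this README (input_dim=64, latent_dim=64):

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ConvAttentionAutoencoderConfig

model = AutoencoderModel(ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64))

# 3D input (B, T, D): conv and attention blocks operate along T
seq_out = model(torch.randn(8, 50, 64), return_dict=True)

# 2D input (B, D): coerced to (B, 1, D) for attention/recurrent/conv blocks
vec_out = model(torch.randn(8, 64), return_dict=True)

print(seq_out.reconstructed.shape, vec_out.reconstructed.shape)
```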

## Configuration (configuration_autoencoder.py)

AutoencoderConfig is the core configuration class. Important fields:
- input_dim: feature dimension (D)
- latent_dim: latent size
- encoder_blocks, decoder_blocks: block lists (see block types above)
- activation, dropout_rate, use_batch_norm: defaults used by some presets
- autoencoder_type: classic | variational | beta_vae | denoising | sparse | contractive | recurrent
- Reconstruction losses: mse | bce | l1 | huber | smooth_l1 | kl_div | cosine | focal | dice | tversky | ssim | perceptual
- Preprocessing: use_learnable_preprocessing, preprocessing_type, learn_inverse_preprocessing

Example:
```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    autoencoder_type="variational",
    encoder_blocks=[{"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu"}],
    decoder_blocks=[{"type": "linear", "input_dim": 32, "output_dim": 128, "activation": "identity", "normalization": "none"}],
)
```

## Models (modeling_autoencoder.py)

Main classes:
- AutoencoderModel: core module exposing forward that returns last_hidden_state (latent) and reconstructed
- AutoencoderForReconstruction: HF-compatible model wrapper with from_pretrained/save_pretrained

Forward usage:
```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

model = AutoencoderModel(ClassicAutoencoderConfig(input_dim=20, latent_dim=8))  # dims chosen to match x below
x = torch.randn(8, 20)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Variational behavior
If cfg.autoencoder_type == "variational" or "beta_vae":
- The model uses an internal VariationalBlock to compute mu and logvar
- Samples z during training; uses mu during eval
- KL term available via model._mu/_logvar (exposed in hidden_states when requested)

```python
out = model(x, return_dict=True, output_hidden_states=True)
latent, mu, logvar = out.hidden_states
```

## Preprocessing (preprocessing.py)

- PreprocessingBlock wraps LearnablePreprocessor and can be placed before/after the core encoder/decoder
- When enabled via config.use_learnable_preprocessing, the model constructs two blocks: pre (forward) and post (inverse)
- The block tracks reg_loss, which is added to preprocessing_loss in the model output; see the sketch below for reading it off

```python
from modeling_autoencoder import AutoencoderModel
from template import PreprocessedAutoencoderConfig

cfg = PreprocessedAutoencoderConfig(input_dim=64, latent_dim=32, preprocessing_type="neural_scaler")
model = AutoencoderModel(cfg)
```

## Utilities (utils.py)

Common helpers:
- _get_activation(name)
- _get_norm(name, num_groups=None)
- _flatten_3d_to_2d(x), _maybe_restore_3d(x, ref)
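
A rough sketch of how these helpers compose; the exact return behavior is an assumption based on their names, so treat the comments as illustrative:

```python
import torch
from utils import _get_activation, _flatten_3d_to_2d, _maybe_restore_3d

act = _get_activation("gelu")        # assumed to return the matching torch.nn activation module
x = torch.randn(4, 10, 32)           # (B, T, D)

flat = _flatten_3d_to_2d(x)          # assumed to fold time into the batch axis for per-timestep ops
y = _maybe_restore_3d(act(flat), x)  # restores (B, T, D) using x as the shape reference
print(y.shape)
```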

## Training examples

### Basic MSE reconstruction
```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:  # your DataLoader yielding (B, 784) batches
    out = model(x, return_dict=True)
    loss = torch.nn.functional.mse_loss(out.reconstructed, x)
    loss.backward(); opt.step(); opt.zero_grad()
```

### VAE with KL term
```python
from template import VariationalAutoencoderConfig

cfg = VariationalAutoencoderConfig(input_dim=784, latent_dim=32)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:
    out = model(x, return_dict=True, output_hidden_states=True)
    recon = torch.nn.functional.mse_loss(out.reconstructed, x)
    _, mu, logvar = out.hidden_states
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + cfg.beta * kl
    loss.backward(); opt.step(); opt.zero_grad()
```

### Sequence reconstruction (Conv + Attention)
```python
from template import ConvAttentionAutoencoderConfig

cfg = ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(8, 50, 64)  # (B, T, D)
out = model(x, return_dict=True)
```

## End-to-end saving/loading
```python
from modeling_autoencoder import AutoencoderForReconstruction

model.save_pretrained("./my_ae")
reloaded = AutoencoderForReconstruction.from_pretrained("./my_ae")
```

## Troubleshooting
- Check that block input_dim/output_dim align across adjacent blocks (a validation sketch follows this list)
- For attention/recurrent/conv blocks, prefer 3D inputs (B, T, D). 2D inputs are coerced to (B, 1, D)
- For variational/beta-VAE, ensure latent_dim is set; the KL term is available via hidden states
- When preprocessing is enabled, preprocessing_loss is included in the output for logging/regularization
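
For the first point, a small helper can catch dimension mismatches before model construction. validate_blocks below is not part of the library, just a sketch over the documented block dict schema (attention blocks preserve input_dim, so they fall back to it when output_dim is absent):

```python
def validate_blocks(blocks):
    """Check that each block's output feeds the next block's input_dim."""
    prev_out = None
    for i, blk in enumerate(blocks):
        if prev_out is not None and blk["input_dim"] != prev_out:
            raise ValueError(
                f"block {i}: input_dim={blk['input_dim']} does not match previous output {prev_out}"
            )
        prev_out = blk.get("output_dim", blk["input_dim"])

validate_blocks([
    {"type": "linear", "input_dim": 128, "output_dim": 64},
    {"type": "attention", "input_dim": 64, "num_heads": 4},
    {"type": "linear", "input_dim": 64, "output_dim": 32},
])  # passes; swapping 64 -> 65 above would raise
```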

## Full AutoencoderConfig reference

Below is a comprehensive reference for all fields in configuration_autoencoder.AutoencoderConfig. Some fields are primarily used by presets or advanced features but are documented here for completeness; a worked example follows the validation notes.

- input_dim (int, default=784): Input feature dimension D. For sequences, D is the per-timestep feature size.
- hidden_dims (List[int], default=[512,256,128]): Legacy convenience list for simple MLPs. Prefer encoder_blocks.
- encoder_blocks (List[dict] | None): Block list for the encoder. See Block-based architecture for block schemas.
- decoder_blocks (List[dict] | None): Block list for the decoder. If omitted, the model may derive a simple decoder from hidden_dims.
- latent_dim (int, default=64): Latent space dimension.
- activation (str, default="relu"): Default activation for Linear blocks when using legacy paths or presets.
- dropout_rate (float, default=0.1): Default dropout used in presets and some layers.
- use_batch_norm (bool, default=True): Default normalization flag used in presets ("batch" if True, else "none").
- tie_weights (bool, default=False): If True, share/tie encoder and decoder weights (not always active, depending on the architecture).
- reconstruction_loss (str, default="mse"): Loss used in AutoencoderForReconstruction. One of:
  - "mse", "bce", "l1", "huber", "smooth_l1", "kl_div", "cosine", "focal", "dice", "tversky", "ssim", "perceptual".
- autoencoder_type (str, default="classic"): Architecture variant. One of:
  - "classic", "variational", "beta_vae", "denoising", "sparse", "contractive", "recurrent".
- beta (float, default=1.0): KL weight for VAE/beta-VAE.
- temperature (float, default=1.0): Reserved for temperature-based operations.
- noise_factor (float, default=0.1): Denoising strength used by denoising variants.
- rnn_type (str, default="lstm"): For recurrent variants. One of: "lstm", "gru", "rnn".
- num_layers (int, default=2): Number of RNN layers for recurrent variants.
- bidirectional (bool, default=True): Whether the RNN is bidirectional in recurrent variants.
- sequence_length (int | None, default=None): Optional fixed sequence length; if None, variable length is supported.
- teacher_forcing_ratio (float, default=0.5): For recurrent decoders that use teacher forcing.
- use_learnable_preprocessing (bool, default=False): Enable learnable preprocessing.
- preprocessing_type (str, default="none"): One of: "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson".
- preprocessing_hidden_dim (int, default=64): Hidden size for preprocessing networks.
- preprocessing_num_layers (int, default=2): Number of layers for preprocessing networks.
- learn_inverse_preprocessing (bool, default=True): Whether to learn the inverse transform for reconstruction.
- flow_coupling_layers (int, default=4): Number of coupling layers for normalizing flows.

Derived helpers and flags:
- has_block_lists: True if either encoder_blocks or decoder_blocks is provided.
- is_variational: True if autoencoder_type in {"variational", "beta_vae"}.
- is_denoising, is_sparse, is_contractive, is_recurrent: Variant flags.
- has_preprocessing: True if preprocessing is enabled and type != "none".

Validation notes:
- activation must be one of the supported values listed in configuration_autoencoder.py
- reconstruction_loss must be one of the supported values
- Many numeric parameters are validated to be positive or within [0, 1]
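
As a worked example of these fields and the derived flags, here is a recurrent configuration using only parameters documented above (block lists are omitted, so the legacy hidden_dims path applies):

```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=100,
    latent_dim=32,
    autoencoder_type="recurrent",
    rnn_type="gru",
    num_layers=2,
    bidirectional=True,
    teacher_forcing_ratio=0.5,
    reconstruction_loss="huber",
)

print(cfg.is_recurrent)       # True
print(cfg.is_variational)     # False
print(cfg.has_block_lists)    # False: no encoder_blocks/decoder_blocks given
print(cfg.has_preprocessing)  # False: preprocessing is disabled by default
```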

## Training with Hugging Face Trainer

The AutoencoderForReconstruction model computes reconstruction loss internally using config.reconstruction_loss. For VAEs/beta-VAEs, it adds the KL term scaled by config.beta. You can plug it directly into transformers.Trainer.

```python
from transformers import Trainer, TrainingArguments
from modeling_autoencoder import AutoencoderForReconstruction
from template import ClassicAutoencoderConfig
import torch
from torch.utils.data import Dataset

# 1) Config and model
cfg = ClassicAutoencoderConfig(input_dim=64, latent_dim=16)
model = AutoencoderForReconstruction(cfg)

# 2) Dummy dataset (replace with your own)
class ToyAEDataset(Dataset):
    def __init__(self, n=1024, d=64):
        self.x = torch.randn(n, d)
    def __len__(self):
        return self.x.size(0)
    def __getitem__(self, idx):
        xi = self.x[idx]
        return {"input_values": xi, "labels": xi}

train_ds = ToyAEDataset()

# 3) TrainingArguments
args = TrainingArguments(
    output_dir="./ae-trainer",
    per_device_train_batch_size=64,
    learning_rate=1e-3,
    num_train_epochs=3,
    logging_steps=50,
    save_steps=200,
    report_to=[],  # disable wandb if not configured
)

# 4) Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
)

# 5) Train
trainer.train()

# 6) Use the model
x = torch.randn(4, 64)
out = model(input_values=x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

Notes:
- The dataset must yield dicts with "input_values" and optionally "labels"; if labels are missing, the model uses the input as the target.
- For sequence inputs, the shape is (B, T, D). For simple vectors, (B, D).
- Set cfg.reconstruction_loss to e.g. "bce" to switch the internal loss (the decoder head applies sigmoid when BCE is used); see the sketch below.
- For VAE/beta-VAE, use VariationalAutoencoderConfig/BetaVariationalAutoencoderConfig.
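
A minimal sketch of that loss switch, setting the documented reconstruction_loss field on the config after construction:

```python
from modeling_autoencoder import AutoencoderForReconstruction
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=64, latent_dim=16)
cfg.reconstruction_loss = "bce"  # BCE expects targets in [0, 1]
model = AutoencoderForReconstruction(cfg)
```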

### Example using AutoencoderConfig directly

The example below shows how to define a configuration purely with block dicts using AutoencoderConfig, without the template classes.

```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel
import torch

# Encoder: Linear -> Attention -> Linear
enc = [
    {"type": "linear", "input_dim": 128, "output_dim": 128, "activation": "relu", "normalization": "batch", "dropout_rate": 0.1},
    {"type": "attention", "input_dim": 128, "num_heads": 4, "ffn_dim": 512, "dropout_rate": 0.1},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"},
]

# Decoder: Linear -> Linear (final identity)
dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "identity", "normalization": "none"},
]

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    encoder_blocks=enc,
    decoder_blocks=dec,
    autoencoder_type="classic",
)

model = AutoencoderModel(cfg)
x = torch.randn(4, 128)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

For a variational model, set autoencoder_type="variational" and the model will internally use a VariationalBlock for mu/logvar and sampling.
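
Continuing the example above, a sketch of the variational variant (same block lists, only the config changes):

```python
vae_cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    encoder_blocks=enc,
    decoder_blocks=dec,
    autoencoder_type="variational",
)

vae = AutoencoderModel(vae_cfg)
out = vae(torch.randn(4, 128), return_dict=True, output_hidden_states=True)
latent, mu, logvar = out.hidden_states  # mu/logvar come from the internal VariationalBlock
```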

## Reference
Core modules:
- configuration_autoencoder.AutoencoderConfig
- modeling_autoencoder.AutoencoderModel, AutoencoderForReconstruction
- blocks: BlockFactory, BlockSequence, Linear/Attention/Recurrent/Convolutional/Variational blocks
- preprocessing: PreprocessingBlock (learnable preprocessing wrapper)
- template: class-based presets listed above

## License
Apache-2.0 (see LICENSE)