This CUDA extension implements fused dropout + residual + LayerNorm from the flash-attention repo.
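As a rough illustration of what the fused kernel computes, here is an unfused reference for a single row of features, written in plain Python. It is a sketch under assumptions, not the extension's API: the function name `dropout_add_layer_norm_ref`, its parameters, and the returned pair are hypothetical, and the order of operations (dropout on the input, then the residual add, then LayerNorm over the sum) is assumed from the description above.

```python
import math
import random


def dropout_add_layer_norm_ref(x, residual, weight, bias,
                               p=0.1, eps=1e-5, training=True, rng=None):
    """Unfused reference for one row (hypothetical helper, not the CUDA API).

    Assumed semantics: out = LayerNorm(dropout(x) + residual).
    Returns (out, summed) so the pre-norm sum can feed the next residual branch.
    """
    rng = rng or random.Random()

    # Step 1: inverted dropout, scaling kept values by 1 / (1 - p).
    if training and p > 0.0:
        keep = 1.0 - p
        dropped = [xi / keep if rng.random() >= p else 0.0 for xi in x]
    else:
        dropped = list(x)

    # Step 2: residual add.
    summed = [d + r for d, r in zip(dropped, residual)]

    # Step 3: LayerNorm over the feature dimension (biased variance).
    n = len(summed)
    mean = sum(summed) / n
    var = sum((s - mean) ** 2 for s in summed) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(s - mean) * inv_std * w + b
           for s, w, b in zip(summed, weight, bias)]
    return out, summed


# Example call in eval mode (dropout disabled).
out, summed = dropout_add_layer_norm_ref(
    [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5],
    weight=[1.0] * 4, bias=[0.0] * 4, training=False)
```

Run separately, these three steps each read and write the full activation tensor; a fused kernel does them in one pass over memory, which is the motivation for the extension.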