Unable to deploy model with Hugging Face TEI

by dhruv-wrk - opened Aug 25, 2025

Aug 25, 2025

Hi, has anyone been able to deploy this model with Hugging Face TEI on SageMaker? I am trying to figure out how to use it in SageMaker and compute the sparse embeddings through the endpoint.

zhichao-geng

opensearch-project org Sep 4, 2025

Hi @dhruv-wrk, I'm not very familiar with HF TEI, but I will take a look into it. Until we arrive at a solution, you can try this tutorial to get the model deployed on SageMaker: https://github.com/opensearch-project/ml-commons/blob/main/docs/model_serving_framework/deploy_sparse_model_to_SageMaker.ipynb

vishva399

Oct 21, 2025

I am facing the same issue as @dhruv-wrk.

Code as given on the model card:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei", version="1.8.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})

zhichao-geng

opensearch-project org Oct 24, 2025

Hi @dhruv-wrk @vishva399 ,

To use SPLADE pooling in TEI, we need to make one change to @vishva399's code: add a "POOLING" field to env.

hub = {
    'HF_MODEL_ID': 'opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini',
    'POOLING': 'splade',
}
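A minimal sketch of the env change above, wrapped in a hypothetical helper (`build_tei_sparse_env` is not part of the SageMaker SDK or TEI). It also encodes the v1/v2 recommendation from this thread using a naive substring check on the model ID; that check is an assumption about the OpenSearch naming scheme, not an official rule.

```python
from typing import Dict


def build_tei_sparse_env(model_id: str) -> Dict[str, str]:
    """Hypothetical helper: env vars for a TEI container using SPLADE pooling.

    Assumption: v3-series OpenSearch sparse models contain "-v3-" in their ID.
    Those use a log1p_relu activation that TEI's hard-coded pooling does not
    implement, so we reject them rather than serve mismatched embeddings.
    """
    if "-v3-" in model_id:
        raise ValueError(
            f"{model_id} looks like a v3 model; TEI's splade pooling does not "
            "match its activation, so use a v1/v2 model instead"
        )
    # POOLING=splade switches TEI from dense pooling to SPLADE pooling
    return {"HF_MODEL_ID": model_id, "POOLING": "splade"}
```

The returned dict can be passed as `env=` to `HuggingFaceModel` in place of the hand-written `hub` dict.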

Furthermore, TEI's pooling logic is hard-coded (https://github.com/huggingface/text-embeddings-inference/blob/9ef569d83083afa30784223d0a0352229d094898/backends/python/server/text_embeddings_server/models/pooling.py#L38). The v3-series models use a log1p_relu activation, which differs from TEI's implementation, so we recommend using the v1/v2-series models with TEI.
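For reference, here is a sketch of standard SPLADE max pooling: apply log(1 + relu(x)) to each token's vocabulary logits, ignore padding tokens, and take the maximum over tokens. We assume this is what TEI's hard-coded splade pooling computes (check the linked pooling.py to confirm). The function name `splade_max_pool` is hypothetical, and it deliberately does not implement the v3 log1p_relu variant, which is why v3 outputs served this way would not match what the model was trained to produce.

```python
import math
from typing import List, Sequence


def splade_max_pool(
    logits: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Standard SPLADE max pooling for a single input.

    logits: [seq_len][vocab_size] MLM-head scores.
    attention_mask: [seq_len], 1 for real tokens and 0 for padding.
    Returns one non-negative weight per vocabulary entry.
    """
    vocab_size = len(logits[0])
    pooled = [0.0] * vocab_size
    for token_scores, keep in zip(logits, attention_mask):
        if not keep:
            continue  # padding tokens never contribute
        for j, score in enumerate(token_scores):
            # relu zeroes negative scores; log1p dampens large activations
            weight = math.log1p(max(score, 0.0))
            pooled[j] = max(pooled[j], weight)
    return pooled
```

The sparse embedding is then the non-zero entries, e.g. `{j: w for j, w in enumerate(pooled) if w > 0}`, keyed by vocabulary index.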

zhichao-geng

opensearch-project org Oct 24, 2025

To support the new pooling options, we would need to open issues or PRs against the huggingface/text-embeddings-inference repo.

vishva399

Oct 24, 2025

Thank you for your quick response; it was very helpful.
