boto3
torch
streamlit
transformers
sentence-transformers
PyPDF4
PyPDF2
docx2txt
scikit-learn
pdfplumber
xhtml2pdf
fpdf