LLM Task Underspecification Detection
Evaluate gendered pronoun resolution in text
Visualize transformer computations with a tuned lens
Interact with Falcon-Chat for personalized conversations
Track, rank and evaluate open LLMs and chatbots
Explore and calibrate model predictions to better understand probabilities
Generate text answers to various prompts
Generate code snippets in Python, Java, and JavaScript