Choosing the Right Model for the Job
Lesson Overview
In this video, we'll explore how to choose the right large language model (LLM) for a specific task in AirOps Studio. Models are updated at a rapid pace, so it helps to understand the strengths and weaknesses of each option in order to optimize your workflow and get the best results.
- 00:00: Introduction to selecting the right LLM in AirOps Studio
- 00:28: Overview of the various AI model providers and the rapid pace of updates
- 01:18: Comparing models across intelligence, speed, and price dimensions
- 03:40: Reasoning models and their applications
Reasoning Models
Reasoning models are a newer class of LLMs that take more time to think through a problem: they break it into sub-parts, plan an approach, and work through the steps in sequence. They are well suited to tasks that require strategic planning or research analysis.
Examples of reasoning models include:
- OpenAI: o4-mini and o3
- Anthropic: Claude 3.7 Sonnet with thinking enabled
- Perplexity: Perplexity Sonar Reasoning
Online Models
Online models have access to the internet. They are a good choice when a research task needs up-to-date information, or when you want fresh data without running your own search. AirOps Studio also offers built-in steps for adding Google Search and scraping existing pages, which let you customize the process to your specific use case.
Examples of online models include:
- OpenAI: GPT-4o Search Preview
- Perplexity: All models are online by default
- Google: Online model to be added soon
General Workhorse Models
These models are excellent for common, day-to-day tasks and are known for their thoughtfulness, accuracy, and instruction adherence.
Recommended general workhorse models:
- Anthropic: Claude 3.7 Sonnet (without deep thinking)
- OpenAI: GPT-4.1
Lightweight Models
Lightweight models are best suited for simple classification tasks, reformatting, and instances where you want to keep costs and latency low while ensuring the job gets done.
Examples of lightweight models include:
- Google: Gemini Flash
- OpenAI: GPT-4.1 nano and GPT-4.1 mini
- Anthropic: Claude Haiku
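The four categories above can be read as a simple lookup from the kind of task to a model tier. Here is a minimal Python sketch of that idea. It is not part of AirOps Studio: the `TaskKind` names and the `suggest_models` helper are hypothetical, and the model strings are the display names from this lesson, not API identifiers.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task labels, one per model category in this lesson."""
    STRATEGY = "strategy"              # strategic planning, research analysis
    FRESH_RESEARCH = "fresh_research"  # needs up-to-date information from the web
    GENERAL = "general"                # common, day-to-day tasks
    SIMPLE = "simple"                  # classification, reformatting, low cost/latency


# Which model category the lesson recommends for each kind of task.
MODEL_CATEGORY = {
    TaskKind.STRATEGY: "reasoning",
    TaskKind.FRESH_RESEARCH: "online",
    TaskKind.GENERAL: "general workhorse",
    TaskKind.SIMPLE: "lightweight",
}

# Example models per category (display names only, taken from the lesson).
EXAMPLES = {
    "reasoning": ["o3", "Claude 3.7 Sonnet (thinking)", "Perplexity Sonar Reasoning"],
    "online": ["GPT-4o Search Preview", "Perplexity models"],
    "general workhorse": ["Claude 3.7 Sonnet", "GPT-4.1"],
    "lightweight": ["Gemini Flash", "GPT-4.1 mini", "GPT-4.1 nano", "Claude Haiku"],
}


def suggest_models(task: TaskKind) -> tuple[str, list[str]]:
    """Return the recommended category and its example models for a task kind."""
    category = MODEL_CATEGORY[task]
    return category, EXAMPLES[category]


if __name__ == "__main__":
    category, models = suggest_models(TaskKind.SIMPLE)
    print(f"{category}: {', '.join(models)}")
```

Treat the result as a starting shortlist. The right model still depends on how each candidate performs on your own task.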
Key Takeaways
- Experiment with different models to find the best fit for your specific use case. Model benchmark scores don't always correlate with performance on your task.
- Use reasoning models like o4-mini, o3, Claude 3.7 Sonnet with thinking, or Perplexity Sonar Reasoning for strategic questions and tasks that require extended thinking time.
- Use online models such as GPT-4o Search Preview or Perplexity's offerings for up-to-date information and research tasks without conducting your own search.
- Rely on general workhorse models like Claude 3.7 Sonnet or GPT-4.1 for common, day-to-day tasks that require thoughtfulness, accuracy, and instruction adherence.
- Choose lightweight models like Gemini Flash, GPT-4.1 nano or mini, or Claude Haiku for simple classification tasks, reformatting, and instances where cost and latency are a priority.
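One way to act on the first takeaway is to run the same prompt through several candidate models and compare the outputs and response times side by side. The Python sketch below illustrates that comparison under stated assumptions: `compare_models` is a hypothetical helper, and `run_model` is a placeholder for whatever function you use to call a model. `fake_run_model` is a stand-in so the example runs offline; it does not call any real provider.

```python
import time
from typing import Callable


def compare_models(
    prompt: str,
    models: list[str],
    run_model: Callable[[str, str], str],
) -> list[dict]:
    """Run one prompt against each model, recording its output and latency."""
    results = []
    for model in models:
        start = time.perf_counter()
        output = run_model(model, prompt)  # caller-supplied model call
        elapsed = time.perf_counter() - start
        results.append({"model": model, "output": output, "seconds": elapsed})
    return results


def fake_run_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call; returns a canned string."""
    return f"[{model}] {prompt.upper()}"


if __name__ == "__main__":
    rows = compare_models(
        "classify: great product!",
        ["model-a", "model-b"],  # placeholder names, not real model identifiers
        fake_run_model,
    )
    for row in rows:
        print(f'{row["model"]}: {row["seconds"]:.4f}s -> {row["output"]}')
```

Latency is the only dimension measured here. Judging output quality and price still means reviewing the outputs and checking each provider's current pricing.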