Incident management insights, guides, and product updates from Rootly
We benchmarked the latest LLMs, including OpenAI's o3, Gemini 2.5 Pro, and Mistral Magistral, against real-world SRE tasks. See how they stack up and why no single model dominates across all use cases.
Laurence Liang
Discover IncidentDiagram, an open-source CLI tool that uses LLMs to turn incident retrospectives and codebases into easy-to-understand visual diagrams.
Jeba Emmanuel
Reliability engineering is evolving quickly, and AI is the catalyst. That’s why we’re excited to unveil Rootly AI Labs, a community-focused program dedicated to reshaping reliability through open collaboration, innovative prototypes, and cutting-edge research.
Sylvain Kalache
Rootly AI Labs analyzes the performance of Meta’s Llama 4 models and finds that they underperform competitors such as Claude 3.5 Sonnet and Qwen2.5.
A practical framework for evaluating AI tools based on four core pillars: Accuracy, Transparency, Adaptability, and Agentic capabilities.
Dinesh Sukhija
Connect Rootly to Cursor, Claude or Copilot with our open source MCP Server, available on GitHub.
Can a smaller AI model outperform a larger one? A distilled version of DeepSeek R1 (70B) outperformed Llama and nearly matched GPT-4o at classifying error logs. These results suggest that model efficiency, not just size, is key to AI performance in incident management.