Blog

Incident management insights, guides, and product updates from Rootly

Benchmarking Magistral for SRE tasks

We benchmarked the latest LLMs – including OpenAI's o3, Gemini 2.5 Pro, and Mistral Magistral – against real-world SRE tasks. See how they stack up and why no single model dominates across all use cases.

Laurence Liang

July 7, 2025

8 mins

How we built an OSS LLM-powered Incident Diagram Generator

Discover IncidentDiagram, an open-source CLI tool that uses LLMs to turn incident retrospectives and codebases into easy-to-understand visual diagrams.

Jeba Emmanuel

April 29, 2025

4 mins + video demo

Announcing Rootly AI Labs: Accelerating Reliability Engineering Through Community-Driven Innovation

Reliability engineering is evolving quickly, and AI is the catalyst. That’s why we’re excited to unveil Rootly AI Labs, a community-focused program dedicated to reshaping reliability through open collaboration, innovative prototypes, and cutting-edge research.

Sylvain Kalache

April 25, 2025

5 mins

Llama 4 underperforms: a benchmark against coding-centric models

Rootly AI Labs analyzes the performance of Meta’s Llama 4 models and finds they underperform competitors like Claude 3.5 Sonnet and Qwen2.5.

Sylvain Kalache

April 11, 2025

6 mins

A Guide to Evaluating AIOps and Agentic AI Tools

A practical framework for evaluating AI tools based on four core pillars: Accuracy, Transparency, Adaptability, and Agentic capabilities.

Dinesh Sukhija

March 31, 2025

8 mins

Introducing the Rootly MCP Server

Connect Rootly to Cursor, Claude, or Copilot with our open-source MCP Server, available on GitHub.

Sylvain Kalache

March 20, 2025

5 mins

Classifying Error Logs with AI: Can DeepSeek R1 Outperform GPT-4o and Llama 3?

Can a smaller AI model outperform a larger one? A distilled version of DeepSeek R1 (70B) outperformed Llama and nearly matched GPT-4o in classifying error logs. These results suggest that model efficiency, not just size, is key to AI performance in incident management.

Sylvain Kalache

February 19, 2025

6 mins