I — 📜 THE AI AGENTS AND ORCHESTRATORS MANUAL

1. 🚀 Introduction & Vision: The Era of Execution

Artificial intelligence is no longer just a chatbot that answers questions, like ChatGPT or Claude. We are entering the era of execution.

⚙️ AI becomes operational.

It no longer just advises; it acts. We are developing systems capable of making decisions, executing complex tasks, and learning from their mistakes.

The goal is to free humans from repetitive tasks so they can focus on strategy and creation.

2. 🔍 Market Analysis: Moving Beyond Isolated AI

Most companies use AI in an isolated way: an employee copies and pastes text into an LLM, retrieves the result, and processes it manually. This wastes time and undermines efficiency.

User ➡️ LLM (Manual) ➡️ Isolated Result

⚡ VS ⚡

Connected Workflow 🔗 AI Orchestration 🔗 Real ROI

The true ROI of AI lies in orchestrating and automating end-to-end processes through connected workflows.

3. 🏗️ Technical Pillars: Interconnection

Our approach is based on the interconnection of four major technological pillars:

🧠 LLMs

Strategic thinking and decision-making engines.

🔌 n8n

Workflow orchestration and data flows.

📚 RAG

Long-term memory and targeted business context.

🌐 APIs

Connection to business tools (CRM, ERP, Slack).

4. 💡 Our Solutions & Actions

We deploy concrete solutions to support this transformation:

🎓 AI & Agents Training

Upskilling managers and teams to integrate AI into their operational daily lives.

🛠️ Advanced Prompt Generator

An internal tool that structures instructions precisely to improve results and reduce hallucinations, making outputs more reliable.

🛡️ Engineering Manifesto

Our philosophy: technical rigor, refusal of superficial jargon, and implementation of robust architectures (Docker, Controlled Cloud, Sovereignty).

5. 🔄 Expected Transformation: The Augmented Enterprise

Moving from a reactive approach to an AI-augmented enterprise.

💎 Core Values

✅ Transparency
✅ Security
✅ Auditability
✅ Scalability

II — UNDERSTANDING THE LANDSCAPE

Before choosing a model, understand that there is no universal "best model", only profiles suited to specific uses. Here is the taxonomy that structures this guide.

The 6 AI Model Profiles

1. UI-first — The Visual Creatives

These models excel in generating interfaces: React components, Tailwind, animations, landing pages, clean HTML/CSS. They have a good sense of visual rhythm and produce quickly usable frontend code.

Strengths: Speed, visual quality, React/Tailwind, responsive design.
Limitations: Less reliable on backend architecture, sometimes generic on designs.
Typical Examples: Gemini Flash, Kimi K2.5

2. Reasoning-first — The Architects

These models think before they act. They break down problems, identify edge cases, and propose solid structures. Excellent for complex debugging, refactoring, and architectural decisions.

Strengths: Logic, debugging, architecture, consistency over long contexts.
Limitations: Sometimes slower, more verbose, less "creative" on the frontend.
Typical Examples: Claude Sonnet, GPT-5, GLM-5

3. Agent-first — The Autonomous Workers

These models are optimized for agentic flows: tool calling, execution loops, task chains. They know how to use tools, self-correct, and progress through several steps without constant supervision.

Strengths: Tool calling, orchestration, pipelines, autonomy.
Limitations: Sometimes less refined on creative tasks or deep one-off reasoning.
Typical Examples: DeepSeek V4, Claude (via API), Qwen with agents

4. Coding-first — The Reliable Developers

These models have been massively trained on code. They understand the nuances of frameworks, produce correct and consistent code, and handle multi-file projects well.

Strengths: Fullstack, backend APIs, TypeScript, React Native.
Typical Examples: Gemini Flash, Qwen 3.6 Plus, Claude Sonnet

5. Low-cost — The Economic Workers

These models offer an exceptional quality/cost ratio. Perfect for repetitive tasks, high-volume pipelines, secondary agents, and data preprocessing.

Strengths: Very low cost, speed, good general level.
Typical Examples: DeepSeek, Gemini Flash, Claude Haiku

6. Long-context — The Large Document Readers

These models handle contexts of several hundred thousand tokens. Essential for analyzing large codebases, long documents, or maintaining consistency over very long sessions.

Strengths: Extended context, consistency over long sessions.
Typical Examples: Gemini (1M tokens), Claude (200k tokens)

Evaluating a Model: The 5 Axes

For each model or use case, evaluate it on these 5 axes:

Axis	What it Measures	Questions to Ask
Speed	How fast output is generated	Do I need immediate results?
Depth	Reasoning quality	Is the problem complex or simple?
Cost	Price per token/request	What is the frequency of use?
Autonomy	Agentic capability	Should it act alone or just answer?
Creativity	Originality of outputs	Is it creative or technical work?
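
One way to make the five axes actionable is a weighted score per use case. The sketch below is only an illustration: the model names, the 1 to 5 scores, and the weights are hypothetical placeholders, and the "cost" score is assumed to mean "higher is cheaper".

```python
# Hypothetical 1-5 scores on the five axes; replace them with your own observations.
# For "cost", a higher score means a cheaper model.
AXES = ("speed", "depth", "cost", "autonomy", "creativity")

candidates = {
    "model_a": {"speed": 5, "depth": 2, "cost": 5, "autonomy": 3, "creativity": 3},
    "model_b": {"speed": 2, "depth": 5, "cost": 2, "autonomy": 4, "creativity": 3},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of the per-axis scores (higher is better)."""
    total_weight = sum(weights[axis] for axis in AXES)
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


def best_model(weights: dict) -> str:
    """Return the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


# High-volume classification: speed and cost dominate.
bulk_weights = {"speed": 3, "depth": 1, "cost": 3, "autonomy": 1, "creativity": 0}
# Hard debugging session: depth dominates.
debug_weights = {"speed": 1, "depth": 4, "cost": 1, "autonomy": 1, "creativity": 0}

print(best_model(bulk_weights))   # model_a
print(best_model(debug_weights))  # model_b
```

The same two candidates win different jobs once the weights change, which is the point of evaluating per use case rather than globally.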

The 3 Classic Pitfalls

1. The Single Model

Using the same model for everything — because it's simple, because it's what you know — is the most common pitfall. It's also the most costly and least efficient in the long run.

Using GPT-5 to generate simple CSS is like taking a taxi to go buy bread.

2. Blind Benchmarking

Benchmarks measure performance under controlled conditions. They don't measure what matters: quality on your specific task, in your context, with your constraints. The best model is the one that finishes your work quickly, cleanly, with few retries.

3. "More Expensive = Better"

False. As a rule of thumb, modern low-cost models (Gemini Flash, DeepSeek, Haiku) do roughly 80% of the work of a premium model for roughly 10% of the price. The real skill is knowing when to pay for power and when to save.
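
The cost argument can be made concrete with a little arithmetic. The prices and volumes below are invented for illustration (a premium model at 10 dollars per million tokens, a low-cost model at a tenth of that); substitute your providers' actual rates.

```python
def pipeline_cost(total_tokens: int, premium_share: float,
                  premium_price: float, lowcost_price: float) -> float:
    """Cost in dollars of a pipeline; prices are per million tokens.

    premium_share is the fraction of tokens sent to the premium model.
    """
    premium_tokens = total_tokens * premium_share
    lowcost_tokens = total_tokens - premium_tokens
    return (premium_tokens * premium_price + lowcost_tokens * lowcost_price) / 1_000_000


# Hypothetical workload: 1,000 requests of about 2,000 tokens each.
TOTAL_TOKENS = 1_000 * 2_000

all_premium = pipeline_cost(TOTAL_TOKENS, 1.0, premium_price=10.0, lowcost_price=1.0)
mixed = pipeline_cost(TOTAL_TOKENS, 0.2, premium_price=10.0, lowcost_price=1.0)
savings = 1 - mixed / all_premium

print(f"all premium: ${all_premium:.2f}, mixed: ${mixed:.2f}, savings: {savings:.0%}")
```

Under these assumptions, keeping the premium model for only the hardest 20% of the work cuts the bill by roughly 70%.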

III — 🤖 AI Agents & Ecosystems Directory

A technical and structured mapping of tools, environments, and protocols that define software engineering and automation based on autonomous agents.

1. 🛠️ AI Development Agents

Autonomous execution systems interacting directly with the code lifecycle (IDE, CLI, sandboxing environments, and Git).

Agent	Technical Description
Claude Code	Anthropic's native CLI agent running locally in the terminal. Capable of reading the codebase, running tests, managing Git, and using MCP servers.
OpenAI Codex CLI	Command-line interface using Codex/GPT models to translate natural-language commands into executable scripts.
Gemini CLI	Command-line tool allowing direct interaction with the Gemini API for refactoring and large-context code analysis tasks.
Cursor Agent	Advanced agent mode integrated into the Cursor IDE, capable of autonomously planning and applying multi-file modifications.
Windsurf	"Agent-native" IDE orchestrating real-time collaborative workflows between the developer and the agent (Cascade).
Aider	Command-line programming assistant optimized for Git, allowing code editing in existing repositories with strict commit tracking.
Goose	Agnostic open-source agent (Block/Agentic AI Foundation). Runs directly on the machine (CLI/Desktop) to automate build or refactoring recipes without Docker friction.
OpenHands	Successor to OpenDevin. Open-source platform allowing autonomous software agents to modify code and execute commands in a secure Docker sandbox.
Amp	Coding agent developed by Sourcegraph, leveraging their global code intelligence engine and semantic repository indexing.
Freebuff	Open-source community alternative focused on local codebase modification.
Codebuff	Assistant specialized in navigation, refactoring, and rewriting of large-scale software projects.
Pi Coding Agent	Recent autonomous agent designed for resolving complex architectural tickets and generating unit tests.

2. 🏢 AI Work Environments

Cloud workspaces and IDEs where agents collaborate synchronously or asynchronously with the user.

Claude Cowork: Anthropic's collaborative space designed to align multiple model instances towards achieving business or technical team goals.
Microsoft 365 Copilot: Integration of Microsoft's agent ecosystem within the enterprise Graph for automating document and communication flows.
ChatGPT (Agent Mode): Advanced asynchronous processing and systemic tool-calling features within the OpenAI interface.
Perplexity Labs: Experimentation and evaluation environment for real-time information retrieval and data grounding.
Manus: Generalist interface agent capable of autonomously navigating the Web, manipulating third-party applications, and delivering end-to-end projects in the background.
Lovable: Full-stack application builder (Vibe Coding) generating code, interface, and infrastructure from natural language descriptions.
Bolt.new: In-browser sandboxed web development environment for designing, running, and deploying full-stack applications based on Vite and Node.
Replit: Cloud platform integrating native editing and deployment agents to instantly go from prompt to production application.

3. ⚙️ Agent Orchestrators

High-level frameworks managing task distribution, memory, and multi-agent collaboration.

OpenClaw: Open-source orchestration solution for efficiently deploying, configuring, and coupling autonomous agents to third-party channels (e.g., Telegram, REST API).
Hermes Agent: Advanced processing agent oriented towards autonomous background task management and standardized tooled interaction.
CrewAI: Orchestration framework based on specific roles (agents, tasks, tools) to simulate engineering or operational teams.
LangGraph: LangChain extension for modeling agent workflows as cyclic graphs, essential for complex iterative behaviors.
AutoGen: Microsoft framework facilitating the development of multi-agent systems capable of conversing with each other to solve problems.
Semantic Kernel: Open-source Microsoft SDK for integrating LLMs into conventional languages like C#, Python, and Java.
PydanticAI: Type-safe application framework for building production agents, ensuring strict data structure validation via Pydantic.
Smolagents: Ultra-light framework developed by Hugging Face, focused on simplicity and writing native Python code by the agent to execute its actions.

4. 🌐 Web Agents

Agents specialized in DOM interaction, navigation, and automation of Web processes on behalf of humans.

OpenAI Operator: Autonomous agent designed to take control of the browser or system to execute complex workflows on demand.
Browser Use: Python framework for connecting any LLM to a Chromium browser for semantic interaction with web page elements.
Skyvern: Solution using computer vision and LLMs to automate workflows on complex or non-API Web sites, replacing traditional scraping.
Stagehand: Open-source browser automation framework built on Playwright, optimized for robust AI-guided actions.
Browserbase: Cloud infrastructure platform for running, managing, and monitoring fleets of headless browsers dedicated to AI agents.
Steel Browser: Managed cloud browser optimized for AI agents, including integrated navigation fingerprint and proxy management.

5. 🔄 Automation & Workflows

Integration platforms connecting LLMs and agents to databases and third-party APIs via pipeline architectures.

n8n: Low-code / native-code workflow automation platform featuring advanced AI nodes. Ideal for connecting PostgreSQL, vector databases, and asynchronous data processing architectures.
Make: Visual automation tool for building API integration and text data routing scenarios.
Zapier: Mainstream solution for rapid interconnection of SaaS applications with basic agent calling features.
Flowise: Low-code UI for designing and hosting LangChain-based applications and RAG-type architectures.
Dify: Unified LLM application development platform combining prompt management, RAG (Retrieval-Augmented Generation), and operational orchestration.
Langflow: Visual rapid prototyping environment for AI architectures based on reusable modular components.

6. 🧠 Agent Building Frameworks

Fundamental libraries and SDKs for developing custom cognitive architectures.

LangChain: The pioneer framework for assembling artificial intelligence components, processing chains, and tool connectors.
LlamaIndex: Framework specialized in ingesting, indexing, and efficiently querying heterogeneous data structures with LLMs.
DSPy: Declarative programming framework replacing manual prompt engineering with an algorithmic optimization process for prompts and model weights.
Haystack: Highly modular, open-source AI orchestration framework designed for building custom RAG and semantic search systems.
Agno: Development framework oriented towards creating robust agents with native state management and multi-model support.
Mastra: Modern TypeScript/JavaScript framework designed to easily integrate agentic features and workflow management into Node.js and frontend framework applications.
Atomic Agents: Modular and atomic development approach, promoting the creation of highly predictable and reusable tools and subagents.

7. 📡 Interoperability Protocols

The standardized layer essential for structured communication between agents, servers, and clients.

Protocol	Technical Role	Importance / Maturity
MCP (Model Context Protocol)	Open standard developed by Anthropic linking models to secure data sources (GitHub, Slack, SQL databases, Docker environments) via a unified API.	⭐⭐⭐⭐⭐
A2A (Agent-to-Agent)	Emerging specification allowing message routing, subtask delegation, and context negotiation between distinct autonomous systems.	⭐⭐⭐⭐⭐
ACP (Agent Communication Protocol)	Open standard for inter-agent messaging, ensuring formatting and integrity of distributed communications.	⭐⭐⭐⭐
OpenAPI Specification	Formal description of REST API endpoints allowing agents to dynamically generate HTTP requests without human intervention.	⭐⭐⭐⭐
JSON Schema	Strict definition of data structures (Input/Output). Crucial for constraining Structured Output of models and avoiding type analysis errors.	⭐⭐⭐⭐
OAuth 2.0	Authorization delegation framework, ensuring that agents securely access third-party resources on behalf of the user.	⭐⭐⭐⭐
SSE (Server-Sent Events)	Unidirectional streaming protocol allowing real-time reception of token streams and agent execution logs.	⭐⭐⭐
WebSocket	Persistent bidirectional communication channel for real-time state synchronization and synchronous multi-agent collaboration.	⭐⭐⭐
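
To illustrate the JSON Schema row above, here is a minimal sketch of checking a model's tool-call arguments before executing anything. The `create_ticket` schema and the `validate_flat` helper are hypothetical, and the helper covers only required keys and primitive types, a small subset of JSON Schema; a production system would rely on a full validator.

```python
import json

# Hypothetical schema for the arguments of a "create_ticket" tool.
CREATE_TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "integer"},
    },
    "required": ["title", "priority"],
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_flat(payload: dict, schema: dict) -> list:
    """Return a list of error messages; an empty list means the payload passed."""
    errors = []
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in payload:
            continue
        value = payload[key]
        expected = _PY_TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        bool_as_number = isinstance(value, bool) and rule["type"] != "boolean"
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"{key}: expected {rule['type']}")
    return errors


# Simulated model outputs, parsed from JSON text as they would arrive over an API.
good = json.loads('{"title": "Fix login", "priority": 2}')
bad = json.loads('{"title": "Fix login", "priority": "high"}')

print(validate_flat(good, CREATE_TICKET_SCHEMA))  # []
print(validate_flat(bad, CREATE_TICKET_SCHEMA))   # ['priority: expected integer']
```

Rejecting the malformed call before execution is what keeps a type error in the model's output from turning into a failed or wrong action downstream.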

8. 📚 Useful Resources

Prompts: Repositories of system context configurations and structured optimization techniques (e.g., deconstruction/diagnostic approaches).
Templates: Pre-configured code skeletons to quickly initialize multi-agent architectures or MCP servers.
llms.txt: Proposed convention for a Markdown file at the root of a website that gives AI agents and crawlers a clean, concise summary of the site's content.
Benchmarks: Test protocols and performance metrics (SWE-bench, GAIA) evaluating the problem-solving capacity of agents in the real world.
Tutorials: Technical documentation and implementation guides for data process automation and engineering pipelines.

V — AGENTIC THINKING

What is an agent? (Not the marketing definition)

The word "agent" is everywhere. It is often misused. Here is the definition that matters operationally:

An AI agent is a model that can take actions in the real world — calling APIs, reading files, writing code, browsing the web, sending messages — and chain these actions autonomously to achieve a goal.

What distinguishes an agent from a simple chatbot:

It has access to tools (tool calling / function calling)
It can act in several steps without human intervention at each step
It can self-correct based on intermediate results
It maintains state and memory over the duration of a task
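
The characteristics above can be sketched as a very small loop. This is a toy under stated assumptions: `scripted_model` is a hard-coded stand-in for an LLM, and the two arithmetic tools replace real API or file calls. It shows tool access, multi-step execution, and state kept for the task, but not self-correction.

```python
from typing import Callable


# Hypothetical tools; a real agent would call APIs or read files here.
def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


TOOLS: dict[str, Callable[..., int]] = {"add": add, "multiply": multiply}


def scripted_model(goal: str, history: list) -> dict:
    """Stand-in for an LLM: decides the next action from the history so far.

    Hard-coded plan for the goal "(2 + 3) * 4": add, then multiply, then finish.
    """
    if not history:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    if len(history) == 1:
        return {"tool": "multiply", "args": {"a": history[0]["result"], "b": 4}}
    return {"final": history[-1]["result"]}


def run_agent(goal: str, model=scripted_model, max_steps: int = 5):
    history = []                      # state kept for the duration of the task
    for _ in range(max_steps):        # hard cap so the loop always terminates
        action = model(goal, history)
        if "final" in action:
            return action["final"], history
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"tool": action["tool"], "result": result})
    raise RuntimeError("step budget exhausted")


answer, trace = run_agent("(2 + 3) * 4")
print(answer)                         # 20
print([step["tool"] for step in trace])  # ['add', 'multiply']
```

The step cap is a design choice worth keeping in any real loop: an agent that can act without supervision also needs a bound on how long it may do so.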

An agent doesn't "answer" — it "does".

Orchestration vs Execution: The Fundamental Distinction

The most frequent confusion in AI projects is mixing two roles that must remain separate:

Orchestration	Execution
Decides what to do	Does what is decided
Chooses the right agent for each task	Executes a specific task
Handles errors and redirects	Reports errors
Maintains the global vision	Maintains the local focus
Models: Claude Sonnet, GPT-5	Models: DeepSeek, Haiku, Gemini Flash

The classic error: using a powerful and expensive model for simple task execution. Result: exploding bill, unnecessary latency, no quality gain.

The right approach: lightweight model for execution, intelligent model for orchestration — and human for final supervision.

New Agentic Workflows

Flow 1 — Task Decomposition

Before launching anything, you decompose. Practical example:

Goal: "Create a user profile page with photo, bio, and activity history"

Design data structure (backend data model) → Qwen
Create API endpoints → Qwen
Generate main React component → Gemini Flash
Create sub-components (photo, bio, history) → Gemini Flash
Integrate and test consistency → GLM-5 or Claude Sonnet

Each step is clear, assignable, verifiable. That is agentic decomposition.
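
The decomposition above can be captured as plain data, which makes each step assignable and checkable. A minimal sketch; the `Step` class and helper names are hypothetical, and the last step arbitrarily picks Claude Sonnet where the text offers two options.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    model: str          # model family assigned to the step
    done: bool = False  # flipped once the step's output has been verified


plan = [
    Step("Design data structure", "Qwen"),
    Step("Create API endpoints", "Qwen"),
    Step("Generate main React component", "Gemini Flash"),
    Step("Create sub-components (photo, bio, history)", "Gemini Flash"),
    Step("Integrate and test consistency", "Claude Sonnet"),
]


def next_step(steps):
    """Return the first unfinished step, or None when the plan is complete."""
    return next((step for step in steps if not step.done), None)


def steps_by_model(steps):
    """Group step descriptions by assigned model, e.g. to batch calls per provider."""
    grouped = {}
    for step in steps:
        grouped.setdefault(step.model, []).append(step.description)
    return grouped


print(next_step(plan).description)         # Design data structure
print(sorted(steps_by_model(plan)))        # ['Claude Sonnet', 'Gemini Flash', 'Qwen']
```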

Flow 2 — Intelligent Routing

Routing is the real-time decision: "For this specific task, which model?"

Routing criteria:

Task complexity (simple → Haiku/Flash; complex → Sonnet)
Task type (UI → Kimi/Gemini; backend → Qwen; debug → GLM-5)
Acceptable cost (repetitive task → low-cost; critical task → premium)
Required speed (real-time → Flash; reflection → Sonnet)

A good routing system can automate these decisions. But even manually, developing this reflex changes everything.
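
As a sketch of what automating these decisions might look like, the function below encodes the criteria as simple rules. The precedence (criticality and complexity first, then task type) and the label values are assumptions made for illustration, not a prescribed policy.

```python
def route(task_type: str, complexity: str, critical: bool = False) -> str:
    """Pick a model family from the routing criteria.

    task_type:  "ui", "backend", "debug", or anything else
    complexity: "simple" or "complex"
    critical:   True when an error would be costly
    """
    if critical or complexity == "complex":
        # Critical or complex work goes to a premium reasoning model.
        return "GLM-5" if task_type == "debug" else "Claude Sonnet"
    if task_type == "ui":
        return "Gemini Flash"
    if task_type == "backend":
        return "Qwen"
    if task_type == "debug":
        return "GLM-5"
    return "Claude Haiku"  # simple, uncategorized tasks default to a low-cost model


print(route("ui", "simple"))        # Gemini Flash
print(route("backend", "complex"))  # Claude Sonnet
print(route("summary", "simple"))   # Claude Haiku
```

Even if the rules stay this crude, writing them down forces the "which model for this task" decision to be explicit and reviewable.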

Flow 3 — Feedback Loop

Agents don't do everything right the first time. The strength is in the loop:

The agent produces a result
You (or another agent) evaluate the result
If satisfactory: move to the next step
If unsatisfactory: correct the prompt, relaunch, or change model

This loop short-circuits the "I send a prompt and hope" mental model. It replaces hope with control.
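
The loop can be written down in a few lines. In this sketch `fake_generate` and `fake_evaluate` are stand-ins for a real model call and a real quality check, and the model names are placeholders; only the control flow (evaluate, refine the prompt, then switch model) reflects the steps above.

```python
import json


def feedback_loop(generate, evaluate, prompt, models, max_attempts=3):
    """Retry with reviewer feedback, then fall through to the next model."""
    attempts = []
    for model in models:
        current_prompt = prompt
        for attempt in range(1, max_attempts + 1):
            result = generate(model, current_prompt)
            ok, feedback = evaluate(result)
            attempts.append((model, attempt, ok))
            if ok:
                return result, attempts
            # Unsatisfactory: correct the prompt with the evaluator's feedback.
            current_prompt = f"{prompt}\nReviewer feedback: {feedback}"
    raise RuntimeError("no satisfactory result from any model")


def fake_generate(model, prompt):
    # Stand-in for a model call: it only respects the format once feedback is present.
    return '{"status": "ok"}' if "Reviewer feedback" in prompt else "status ok"


def fake_evaluate(result):
    # Stand-in for a quality check: the output must parse as JSON.
    try:
        json.loads(result)
        return True, ""
    except json.JSONDecodeError:
        return False, "output must be valid JSON"


result, attempts = feedback_loop(
    fake_generate, fake_evaluate, "Report the status as JSON.", ["small-model", "large-model"]
)
print(result)    # {"status": "ok"}
print(attempts)  # [('small-model', 1, False), ('small-model', 2, True)]
```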

Flow 4 — Context Memory

A major problem with agents: they forget. Most models do not have persistent memory between sessions.

Practical solutions:

Pass the relevant context at each call ("here is where we are")
Maintain a state file that the agent can read and update
Use memory tools (vector databases, automatic summaries)
Structure short sessions with explicit checkpoints
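
The state-file approach is the easiest of these to try. Below is a minimal sketch using a JSON file; the file name, the field names, and the example notes are all hypothetical, and the file is written to a temporary directory so the example has no side effects on a real project.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the state file, or start fresh if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "notes": ""}


def save_checkpoint(path: Path, step: str, note: str) -> dict:
    """Record a finished step and the latest note, then persist to disk."""
    state = load_state(path)
    state["completed"].append(step)
    state["notes"] = note
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


def context_header(state: dict) -> str:
    """Build the 'here is where we are' text to prepend to the next prompt."""
    done = ", ".join(state["completed"]) or "nothing yet"
    return f"Here is where we are. Completed: {done}. Notes: {state['notes']}"


state_path = Path(tempfile.mkdtemp()) / "agent_state.json"
save_checkpoint(state_path, "design schema", "users table has id, bio, photo_url")
save_checkpoint(state_path, "create endpoints", "GET /profile implemented")

header = context_header(load_state(state_path))
print(header)
```

Because the state lives on disk rather than in the conversation, a new session (or a different model) can pick up from the last checkpoint.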

Humans in the Loop: When to Supervise, When to Let Go

Human supervision has a cost: your time and attention. It must be reserved for moments when it adds value.

Supervise Actively	Let It Run
Irreversible decisions	Repetitive and tested tasks
First execution of a flow	Stable pipelines with logs
Public or client outputs	Internal preprocessing
Large amounts / sensitive data	Low-value classification / extraction
New agents / tools	Agents already validated on hundreds of cases

The golden rule: supervise until you have confidence. Let go as soon as you have reliable quality metrics.
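
The table and the golden rule can be turned into an explicit gate that decides when to pause for a human. The `Action` fields and the threshold of 100 validated runs are illustrative assumptions; tune them to your own risk tolerance.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    irreversible: bool = False   # e.g. deleting data, sending money
    public_output: bool = False  # visible to clients or the public
    validated_runs: int = 0      # successful, reviewed executions of this flow so far


def needs_human(action: Action, min_validated_runs: int = 100) -> bool:
    """Return True when a human should approve the action before it runs."""
    if action.irreversible or action.public_output:
        return True  # always supervise high-stakes actions
    # Otherwise supervise until the flow has earned confidence.
    return action.validated_runs < min_validated_runs


print(needs_human(Action("delete account", irreversible=True, validated_runs=500)))  # True
print(needs_human(Action("classify ticket", validated_runs=300)))                    # False
print(needs_human(Action("classify ticket", validated_runs=3)))                      # True
```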

Agentic Anti-patterns: Errors to Avoid

Anti-pattern 1 — Too Much Autonomy Too Soon

Giving an agent access to critical systems before validating its behavior on simple cases. Result: poorly executed irreversible actions.

Rule: always start in "read-only" mode, then grant permissions progressively.

Anti-pattern 2 — Poorly Managed Context

Launching an agent on a long task without passing it the relevant history. It "forgets" the beginning, producing inconsistent outputs.

Rule: always include the minimum necessary context — neither too much (context pollution) nor too little (loss of consistency).

Anti-pattern 3 — Exploding Cost

Using a premium model for all steps of a pipeline, including the simplest ones. Result: bill ×10 without quality gain.

Rule: profile each step, assign the cheapest model that does the work well.

Anti-pattern 4 — Prompt Too Vague

"Do something interesting with this data." Agents don't handle ambiguity as well as humans. Result: random outputs, looping retries.

Rule: be as precise as you would be with a junior collaborator — expected format, constraints, examples if possible.

Anti-pattern 5 — No Error Handling

A pipeline with no plan for what happens when an agent fails: it crashes, and everything downstream stops.

Rule: always provide a fallback — another model, a degraded output, a human alert.
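
A fallback chain of this kind can be sketched as follows. The provider functions are stand-ins that simulate a failing and a working model; the names and the degraded message are hypothetical.

```python
def call_with_fallback(task, providers, alert):
    """Try each (name, callable) provider in order; alert a human if all fail."""
    errors = []
    for name, call in providers:
        try:
            return {"provider": name, "output": call(task), "degraded": False}
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    alert(errors)  # last resort: notify a human with everything that went wrong
    return {"provider": None, "output": "Service temporarily unavailable.", "degraded": True}


def flaky_primary(task):
    raise TimeoutError("primary model timed out")


def steady_backup(task):
    return f"summary of: {task}"


alerts = []
result = call_with_fallback(
    "Q3 report", [("primary", flaky_primary), ("backup", steady_backup)], alerts.append
)
all_down = call_with_fallback("Q3 report", [("primary", flaky_primary)], alerts.append)

print(result["provider"])    # backup
print(all_down["degraded"])  # True
print(alerts)                # [['primary: primary model timed out']]
```

The three outcomes map onto the rule directly: another model, a degraded output, and a human alert.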

V — 💭 AGENTIC THINKING

To move smoothly from theory to production, agentic thinking on your machine (configured through files such as CLAUDE.md or .clauderc) should follow a strict 6-step protocol. This workflow turns a simple chatbot into an autonomous and reliable software engineer.

1. Plan Mode Default (Planning Mode)

Before any modification, the agent switches to planning mode. It maps the directory tree, inspects dependencies, and lists the files affected. It then writes an action plan and submits it for validation before execution.

Before writing or modifying a single line of code, you must open a planning phase. 
Analyze the existing tree, read the necessary files, and write a structured action plan in list form. 
Wait for my explicit validation before moving to execution.

2. Subagent Strategy

To avoid context overload, the main agent delegates to specialized subagents. Each subagent handles a targeted task (tests, parsing, UI), ensuring precision and modularity.

For any complex task involving more than 3 files or distinct technologies (e.g., Frontend + Backend), 
behave like an orchestrator. Decompose the work and generate ultra-targeted instructions (micro-prompts) 
to guide your subagents or your own future iterations in an isolated way.

3. Self-Improvement Loop

The agent rereads and critiques its own code before submitting it. It looks for security flaws, duplications, unnecessary complexity, and missing typings. Corrections are automatically applied in this short loop.

Once the code is written, apply an automatic critical review before presenting it to me. 
Analyze your own proposal for: security flaws, duplication (DRY), unnecessary complexity (KISS), and missing typings. 
Correct your own errors invisibly in this phase.

4. Verification Before Done (Systematic Verification)

A task is only validated after executing unit tests and the production build. Without complete success, the task remains open.

You are formally prohibited from declaring a task as finished or asking me to test if you have not yourself executed 
the project's tests and the production build in the terminal. 
The success of these commands is the only acceptable validation criterion.
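
The same gate can also be enforced outside the prompt, by a small script that runs the commands and checks exit codes. This is a sketch under assumptions: the `verify` helper is hypothetical, and the commands shown are placeholders that simply invoke the Python interpreter so the example runs anywhere; a real project would pass its own test and build commands.

```python
import subprocess
import sys


def verify(commands):
    """Run each command in order; stop at the first failure.

    Returns (success, logs), where logs holds the captured output of every command run.
    """
    logs = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        logs.append(f"$ {' '.join(cmd)}\nexit={proc.returncode}\n{proc.stdout}{proc.stderr}")
        if proc.returncode != 0:
            return False, logs  # the task stays open; logs feed the bug-fixing step
    return True, logs


# Placeholder commands standing in for the project's test suite and production build.
passing = [
    [sys.executable, "-c", "print('tests ok')"],
    [sys.executable, "-c", "print('build ok')"],
]
failing = [[sys.executable, "-c", "import sys; sys.exit(1)"]]

ok, ok_logs = verify(passing)
ko, ko_logs = verify(failing)
print(ok, ko)  # True False
```

Checking exit codes in code rather than trusting the agent's own report makes "done" a measurable state instead of a claim.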

5. Demand Elegance (Balanced Elegance Requirement)

Code must remain simple, robust, and readable. No over-engineering or heavy frameworks if a native solution suffices. Elegance takes precedence over gratuitous complexity.

Constantly strive for elegance and architectural simplicity. 
Never propose over-engineering or heavy frameworks if a native or simple solution is suitable. 
The code must be minimal, modern, documented on the 'why', and human-readable.

6. Autonomous Bug Fixing

If a test or build fails, the agent analyzes the logs, isolates the bug, and proposes a fix. It restarts the modification loop without requesting human help, unless it remains blocked after repeated attempts.

If a test or build command fails at step 4, do not interrupt your execution to ask me for help. 
Immediately analyze the terminal's error logs, locate the faulty line, issue a new hypothesis, 
and correct the course autonomously.

VI — READING LEVELS

This guide is designed to be read and reread as you progress. Here is how to approach it based on your current level.

🟢 Beginner Level — Where to Start

You are discovering AI models or have just started using them in your workflow.

What to Remember

There is no single "best model" — there are models adapted to specific uses
Start with Gemini 2.5 Flash for the majority of your coding tasks
Add Qwen 3.6 Plus as soon as you work on a Python backend or API
Use Claude Sonnet when you are stuck on something really difficult

Minimal Setup to Get Started

Gemini Flash → your default daily model
Qwen → your backend model
A premium model (Claude Sonnet or GPT-5) → your safety net

What You Don't Need to Understand Yet

Multi-agent orchestration — that will come later
Automatic routing — start by doing it manually
Complex pipelines — first validate simple cases

🟡 Intermediate Level — Combining Models

You already use several models, but by intuition rather than systematically.

What to Integrate

Develop the "which model for this specific task" reflex before each session
Apply stack patterns (IV) rather than choosing case by case
Start decomposing large tasks into assignable sub-tasks
Set up short feedback loops

Key Skills to Develop

Write precise prompts with context, expected format, and constraints
Recognize when a model should be changed (disappointing results → change model)
Manage context manually between sessions

First Agent to Build

A simple agent that takes a React component specification, breaks it down into steps, and generates each part with the right model. Nothing complex — but it forces thinking in flows.

🔴 Advanced Level — Orchestration and Architectures

You master the basics and want to build robust agentic systems.

What to Build

An automatic routing system based on task type and complexity
Pipelines with error management, fallbacks, and logs
A persistent memory layer (vector database or structured state file)
Automated quality metrics to evaluate agent outputs

Architectures to Explore

Hierarchical agents: an orchestrator + specialized workers
Parallel agents: several agents on independent sub-tasks simultaneously
Self-correcting agents: validation loop integrated into each agent
Human-in-the-loop: supervision points automatically triggered on uncertain cases

The Central Question at This Level

How to build a system that remains reliable as it gains autonomy? The answer: tests, metrics, logs, and progressive supervision.

Progression Table

Level	Main Skill	Typical Setup	Next Step
🟢 Beginner	Choosing the right model by use case	Gemini + Qwen + 1 premium	Apply a stack pattern
🟡 Intermediate	Combining models, decomposing tasks	SaaS or Low-cost stack	Build a first simple agent
🔴 Advanced	Orchestration, routing, robust pipelines	Economic Elite Setup	Multi-agent architecture with metrics

ANNEXES

A. Glossary

Key terms in this guide, defined without unnecessary jargon.

AI Agent: AI model capable of taking autonomous actions in the real world using tools, chaining several steps, and self-correcting.
Context window: The maximum amount of text a model can process at once. A 1M-token context window can hold an entire novel at once. Important for long projects.
Fine-tuning: The process of additional training of a model on specific data to improve its performance in a precise field.
Hallucination: When a model produces false information with confidence. Frequent on precise facts, dates, and names. Always check for critical content.
Orchestration: The coordination of several agents or models to accomplish a complex task. The orchestrator decides who does what, and in what order.
RAG (Retrieval-Augmented Generation): A technique that allows a model to fetch information from a database before answering. Reduces hallucinations and allows using recent data.
Routing: The decision to send a task to a specific model according to its characteristics. Can be manual (you decide) or automatic (a system decides).
System prompt: Instruction given to the model upstream of the conversation to define its role, tone, and constraints. Very powerful for customizing behavior.
Temperature: A parameter that controls the model's level of creativity/randomness. 0 = near-deterministic and predictable. 1+ = creative and varied. For code: keep low. For creativity: increase.
Token: The basic unit that models process. About 0.75 words in English. Model cost is calculated in tokens. 1000 tokens ≈ 750 words.
Tool calling (function calling): A model's ability to call external functions or APIs — search the web, read a file, send an email. The fundamental building block of agents.
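
The rule of thumb in the Token entry (1000 tokens ≈ 750 words) is enough for a rough budget estimate. In this sketch the price is a made-up figure and word count is a crude proxy; actual tokenizers vary by model and language.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, from the Token entry


def estimate_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a given token count and a price per million tokens."""
    return tokens * price_per_million / 1_000_000


document = "word " * 750           # a 750-word placeholder document
tokens = estimate_tokens(document)
print(tokens)                      # 1000
print(estimate_cost(tokens * 1_000, price_per_million=2.0))  # 2.0 (hypothetical price)
```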

B. Quick Decision Table

To quickly choose the right model according to the situation:

Situation	Recommended Model	Reason
Standard React or other frontend framework component	Gemini Flash	Speed + frontend quality
Complex backend data model	Qwen 3.6 Plus	Excellent Python/ORM
Inexplicable bug	GLM-5 or Claude Sonnet	Deep reasoning
Creative landing page	Kimi K2.5	Visual creativity
Automated pipeline	DeepSeek V4	Ultra-low cost, tool calling
Small CSS correction	Claude Haiku	Fast and cheap
Critical system architecture	Claude Sonnet / GPT-5	Maximal intelligence
Long session (100k+ tokens)	Gemini or Claude	Large context
Massive multi-file refactoring	Claude Sonnet	Consistency over large context
High-volume test/classification	DeepSeek or Haiku	Volume + cost

C. This Guide is Alive

The AI model market is evolving fast. A model recommended today may be outdated in six months. A new competitor can emerge overnight.

This guide must be updated regularly. The principles (agentic thinking, orchestration, routing, stack patterns) remain stable. Specific model recommendations evolve.

Treat it as a living system: note your own observations, add your use cases, and invalidate what no longer corresponds to your reality.

The best guide is the one you adapt to your reality.