
AutoGen and MCP: Building Powerful Multi-Agent Systems

Introduction

In today's rapidly changing artificial intelligence landscape, applications of Large Language Models (LLMs) have evolved from simple conversational systems into complex agent-based architectures. Microsoft's AutoGen framework, a powerful tool for building multi-agent systems, is at the forefront of this shift. This article delves into a key capability of the framework, its support for the Model Context Protocol (MCP), and explores how MCP can be leveraged to build powerful multi-agent systems.

Introduction to AutoGen

AutoGen is an open-source framework developed by Microsoft, designed to simplify the process of building multi-agent systems based on large language models. It provides a flexible set of tools and APIs that enable developers to create networks of agents that can collaborate with each other, with each agent performing specific tasks or playing specific roles.

The core advantages of AutoGen include:

  1. Multi-agent collaboration: Supports complex interactions and collaboration between multiple agents
  2. Tool utilization capabilities: Agents can use various tools to extend their capabilities
  3. Flexible conversation flow: Supports customized conversation flows and control logic
  4. Scalability: Easy integration of new models and tools

What is MCP (Model Context Protocol)?

The Model Context Protocol (MCP) is an open standard designed to unify how AI models interact with external tools and services. In the AutoGen framework, MCP serves as a bridge connecting agents with external tools.

The core philosophy of MCP is to provide a standardized protocol that allows AI models (such as large language models) to interact consistently with various external services and tools. These tools can be local command-line utilities, remote API services, or even other AI systems.

Key Features of MCP

  1. Standardized interfaces: Provides unified tool invocation and response formats
  2. Multiple communication methods: Supports standard input/output (STDIO) and Server-Sent Events (SSE) communication
  3. Tool discovery mechanism: Allows dynamic discovery and use of available tools
  4. Session management: Supports maintaining session state for tool calls
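Under the hood, MCP is built on JSON-RPC 2.0: a client discovers a server's tools with the `tools/list` method and invokes one with `tools/call`. As a rough illustration of the standardized interface (the tool name and arguments below are hypothetical), the messages on the wire look like this:

```python
import json

# MCP is built on JSON-RPC 2.0. A client first discovers the tools a
# server exposes ("tools/list"), then invokes one of them ("tools/call").
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_issues",           # hypothetical tool name
        "arguments": {"query": "Marvin"},  # tool-specific arguments
    },
}

# Adapters in AutoGen serialize and transport these messages, so agent
# code never has to construct them by hand.
print(json.dumps(call_request))
```

In practice you never write these messages yourself; the point is that every MCP server answers the same two methods, which is what makes tool discovery and invocation uniform across servers.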

MCP Implementation in AutoGen

In the AutoGen framework, MCP support is provided through the autogen_ext.tools.mcp module. This module offers various components that make it easy for developers to integrate MCP-compatible tools into AutoGen agents.

Core Components

  1. McpWorkbench: Wraps an MCP server and provides an interface to list and call tools provided by the server
  2. StdioMcpToolAdapter: Allows interaction with MCP tools via standard input/output
  3. SseMcpToolAdapter: Allows interaction with MCP tools that support Server-Sent Events (SSE) over HTTP
  4. McpSessionActor: Manages sessions with MCP servers

Configuration Parameters

MCP tool adapters require specific server parameters to establish connections:

  1. StdioServerParams: Parameters for connecting to an MCP server via standard input/output

    • command: The command to execute
    • args: Command arguments
    • env: Environment variables
    • read_timeout_seconds: Read timeout duration
  2. SseServerParams: Parameters for connecting to an MCP server via HTTP/SSE

    • url: Server URL
    • headers: HTTP headers
    • timeout: Connection timeout
    • sse_read_timeout: SSE read timeout
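Assuming the autogen-ext package is installed, constructing these parameter objects looks roughly like this (the command, URL, token, and environment variable are placeholders):

```python
from autogen_ext.tools.mcp import SseServerParams, StdioServerParams

# A local MCP server launched as a subprocess and spoken to over STDIO.
stdio_params = StdioServerParams(
    command="uvx",                 # the command to execute
    args=["mcp-server-fetch"],     # command arguments
    env={"EXAMPLE_VAR": "value"},  # environment variables (hypothetical)
    read_timeout_seconds=30,       # read timeout duration
)

# A remote MCP server reached over HTTP with Server-Sent Events.
sse_params = SseServerParams(
    url="https://example.com/mcp/sse",            # placeholder server URL
    headers={"Authorization": "Bearer <token>"},  # placeholder HTTP header
    timeout=10,                                   # connection timeout
    sse_read_timeout=300,                         # SSE read timeout
)
```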

Real-world Case: Building a Multi-source Information Retrieval System

Let's explore a practical case that demonstrates how to use AutoGen and MCP to build a system capable of retrieving information from multiple sources (GitHub, Jira, and Confluence).

System Architecture

The system consists of three main components:

  1. Search Agent: Responsible for retrieving relevant information from multiple sources
  2. Summary Agent: Responsible for processing and summarizing the retrieved information
  3. User Proxy: Represents the user in interactions with other agents

The system uses MCP tools to connect to GitHub and Atlassian (Jira and Confluence) services, enabling agents to access information on these platforms.

Code Implementation

python
from typing import Sequence
import os
import asyncio
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_agentchat.messages import BaseAgentEvent, BaseChatMessage
from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console


def get_model_client() -> AzureOpenAIChatCompletionClient:
    return AzureOpenAIChatCompletionClient(
        azure_deployment=os.getenv("AZURE_OPENAI_MODEL_DEPLOYMENT"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        model="gpt-4o"
    )

search_agent_prompt = """
Please search for as much related information as possible from Jira, Confluence, and GitHub using the tools provided to you,
based on the user's question. Return the information you have found without any processing or comments.
If no information is found, reply with "Sorry, I couldn't find any relevant information on this issue."
"""

summary_agent_prompt = """
You are an AI assistant. Please help answer the user's question according to Search_Agent's response.
"""

async def main() -> None:
    github_server_params = StdioServerParams(
            command="docker",
            args=[
                "run",
                "-i",
                "--rm",
                "-e",
                "GITHUB_PERSONAL_ACCESS_TOKEN",
                "-e",
                "GH_HOST",
                "ghcr.io/github/github-mcp-server"
            ],
            env={
                "GITHUB_PERSONAL_ACCESS_TOKEN": os.getenv("GITHUB_PERSONAL_ACCESS_TOKEN"),
                "GH_HOST": os.getenv("GH_HOST")
            }
    )
    github_tools = await mcp_server_tools(github_server_params)
    
    atl_server_params = StdioServerParams(
        command="uv",
        args=[
            'run',
            'mcp-atlassian',
            '-v',
            '--jira-url',
            os.getenv("JIRA_HOST"),
            '--jira-personal-token',
            os.getenv("JIRA_PERSONAL_TOKEN"),
            '--confluence-url',
            os.getenv("CONFLUENCE_HOST"),
            '--confluence-personal-token',
            os.getenv("CONFLUENCE_PERSONAL_TOKEN"),
        ],
    )

    atl_tools = await mcp_server_tools(atl_server_params)

    # Create an agent that can use the tools
    search_agent = AssistantAgent(
        name="search_agent",
        model_client=get_model_client(),
        tools=github_tools + atl_tools,
        system_message=search_agent_prompt,
    )

    summary_agent = AssistantAgent(
        name="summary_agent",
        model_client=get_model_client(),
        system_message=summary_agent_prompt,
    )
        
    user_proxy = UserProxyAgent("user", input_func=input)

    # Create the termination condition which will end the conversation when the user says "Exit".
    termination = TextMentionTermination("Exit")

    def selector_func(messages: Sequence[BaseAgentEvent | BaseChatMessage]) -> str | None:
        if messages[-1].source == "user":
            return search_agent.name
        elif messages[-1].source == search_agent.name:
            return summary_agent.name
        elif messages[-1].source == summary_agent.name:
            return user_proxy.name
        return user_proxy.name

    team = SelectorGroupChat(
        [search_agent, summary_agent, user_proxy],
        model_client=get_model_client(),
        termination_condition=termination,
        selector_func=selector_func,
        allow_repeated_speaker=False,  # Do not let an agent speak multiple turns in a row.
    )

    task = "what is Marvin?"

    await Console(team.run_stream(task=task))


if __name__ == "__main__":
    asyncio.run(main())

Code Analysis

  1. MCP Server Configuration:

    • Using StdioServerParams to configure GitHub and Atlassian MCP servers
    • Passing authentication information and server addresses through environment variables
  2. Tool Acquisition:

    • Using the mcp_server_tools function to obtain available tools from MCP servers
    • Combining GitHub and Atlassian tools into a single tool list
  3. Agent Creation:

    • Creating a search agent and assigning all MCP tools to it
    • Creating a summary agent responsible for processing search results
    • Creating a user proxy to handle user input
  4. Conversation Flow Control:

    • Using selector_func to define the interaction sequence between agents
    • Implementing a simple workflow: user question → search agent retrieval → summary agent summarization → return to user
  5. Termination Condition:

    • Using TextMentionTermination to define conversation termination conditions

Advanced MCP Application Scenarios

Beyond the example above, MCP can be applied to various advanced scenarios:

1. File System Operations

Using a file system MCP server, agents can perform file creation, reading, writing, and other operations:

python
from pathlib import Path

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

# Set up file system MCP server parameters
desktop = str(Path.home() / "Desktop")
server_params = StdioServerParams(
    command="npx.cmd",  # use "npx" on macOS/Linux
    args=["-y", "@modelcontextprotocol/server-filesystem", desktop]
)

# Get all available tools
tools = await mcp_server_tools(server_params)

# Create an agent that can use these tools
agent = AssistantAgent(
    name="file_manager",
    model_client=OpenAIChatCompletionClient(model="gpt-4"),
    tools=tools,
)

2. Web Content Retrieval

Using a fetch MCP server, agents can retrieve and process web content:

python
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

# Get the fetch tool
fetch_mcp_server = StdioServerParams(command="uvx", args=["mcp-server-fetch"])
tools = await mcp_server_tools(fetch_mcp_server)

# Create an agent that can use the fetch tool
agent = AssistantAgent(
    name="fetcher",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=tools,
    reflect_on_tool_use=True,  # have the agent reflect on tool output before replying
)

3. Web Browser Automation

Using a Playwright MCP server, agents can control web browsers to perform complex interactions:

python
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import (
    StdioServerParams,
    create_mcp_server_session,
    mcp_server_tools,
)

model_client = OpenAIChatCompletionClient(model="gpt-4o")

params = StdioServerParams(
    command="npx",
    args=["@playwright/mcp@latest"],
    read_timeout_seconds=60,
)

# Keep a single session open so the browser retains state across tool calls
async with create_mcp_server_session(params) as session:
    await session.initialize()
    tools = await mcp_server_tools(server_params=params, session=session)

    agent = AssistantAgent(
        name="Assistant",
        model_client=model_client,
        tools=tools,
    )

Advantages and Limitations of MCP

Advantages

  1. Standardized Interface: Provides a unified way to call tools, simplifying the integration process
  2. Diverse Tool Support: Supports various types of tools, from local command-line tools to remote API services
  3. Session Management: Supports maintaining session state for tool calls, suitable for stateful tools (like browsers)
  4. Extensibility: Easy to add new tools and services

Limitations

  1. External Service Dependency: Requires support from external MCP servers
  2. Configuration Complexity: Configuration of some tools can be relatively complex
  3. Performance Overhead: Inter-process or network communication may introduce additional latency
  4. Security Considerations: Need to carefully handle tool permissions and authentication information

Conclusion

The MCP module in AutoGen provides crucial support for building powerful multi-agent systems. Through standardized tool interfaces, developers can easily integrate various external tools and services into agent systems, greatly expanding the range of agent capabilities.

From simple file operations to complex web browser automation, MCP enables agents to interact more richly with the real world. This capability is essential for building truly useful AI applications, as it allows AI systems not only to understand user intentions but also to take concrete actions to fulfill these intentions.

As AutoGen and MCP continue to evolve, we can expect to see more innovative multi-agent applications emerge, providing more intelligent and efficient services to users across various domains.
