Skip to content

OpenManus Technical Analysis: Architecture and Implementation of an Open-Source Agent Framework

OpenManus is an open-source agent framework that provides functionality similar to Manus but without requiring an invite code, allowing developers and users to easily build and use powerful AI agents. This article provides a comprehensive analysis of OpenManus from its architecture design to core components, usage methods, and technical features.

1. Project Overview

OpenManus was developed by a team of developers from the MetaGPT community, including @mannaandpoem, @XiangJinyu, @MoshiQAQ, and others. The project focuses on building a universal agent framework that enables users to complete complex tasks through simple instructions, including but not limited to programming, information retrieval, file processing, and web browsing.

2. Architecture Design

OpenManus adopts a modular architecture design, which consists of the following core modules:

2.1 Agent Layer

The Agent layer is the core of OpenManus, responsible for coordinating various components and executing user instructions. It includes the following types of agents:

  • ToolCallAgent: Base agent class that handles the abstraction of tool/function calls
  • PlanningAgent: Planning agent responsible for creating and managing task execution plans
  • ReActAgent: Reactive agent that implements the think-act-observe loop pattern
  • Manus: Main agent implementation, inheriting from ToolCallAgent, serving as the direct entry point for user interaction

The agent layer follows a hierarchical inheritance relationship, with each agent type having specific responsibilities and extension capabilities:

ReActAgent --> ToolCallAgent --> PlanningAgent --> Manus

2.2 Tool Layer

The tool layer provides a rich set of tools that enable the agent to interact with external systems. Major tools include:

  • PythonExecute: Executes Python code
  • BrowserUseTool: Browser operation tool, supporting webpage navigation, element interaction, etc.
  • GoogleSearch: Google search tool
  • FileSaver: File saving tool
  • PlanningTool: Planning management tool
  • CreateChatCompletion: Tool for creating chat completions
  • Terminate: Tool for terminating execution

These tools are defined through a unified interface (BaseTool), ensuring that agents can consistently call and use them.

2.3 Prompt Layer

The prompt layer defines system prompts and instruction templates for interaction with the LLM, including:

  • System Prompts: Define the agent's role and capability scope
  • Step Prompts: Guide the agent on how to use available tools to perform tasks
  • Planning Prompts: Used for task decomposition and plan creation

2.4 LLM Interaction Layer

This layer is responsible for communicating with large language models, supporting different LLM providers and model configurations. Core functionalities include:

  • Model configuration management
  • Tool/function calling interface
  • Response parsing and processing

3. Core Process Analysis

The workflow of OpenManus can be summarized in the following key steps:

3.1 Task Planning Process

  1. User inputs task request
  2. Initial plan creation (create_initial_plan)
  3. Task decomposition into executable steps
  4. Generation of structured plan (including goals, steps, and status tracking)

3.2 Task Execution Process

  1. Get current active step (_get_current_step_index)
  2. Thinking phase (think): Evaluate current status and decide next action
  3. Action phase (act): Execute selected tools and record results
  4. Update plan status (update_plan_status)
  5. Repeat the above steps until the plan is completed or maximum step count is reached

3.3 Tool Calling Mechanism

  1. Parse tool call parameters
  2. Execute the tool and obtain results
  3. Add results to the agent's memory
  4. Handle status changes for special tools (such as termination)

4. Key Component Analysis

4.1 PlanningAgent

PlanningAgent is a core agent type in OpenManus, specifically responsible for task planning and execution. Its main functions include:

  • Plan Creation and Management: Creating structured plans based on user input
  • Step Tracking: Recording the execution status of each step
  • Progress Management: Automatically marking the current active step and updating completion status

The key feature is its ability to maintain task execution context, ensuring that complex tasks can be completed in sequence while providing clear execution status feedback.

4.2 ToolCallAgent

ToolCallAgent implements the fundamental mechanism for tool calling. Its main responsibilities include:

  • Tool Selection: Deciding which tools to use to complete tasks
  • Parameter Parsing: Processing tool parameters in JSON format
  • Result Processing: Formatting tool execution results
  • Error Handling: Providing robust error capture and reporting mechanisms

The design of ToolCallAgent allows OpenManus to flexibly extend new tools while maintaining a unified calling interface.

4.3 BrowserUseTool

BrowserUseTool is a powerful tool that allows agents to interact with web browsers. Its main functions include:

  • Web Navigation: Visiting specified URLs
  • Element Interaction: Clicking, inputting text
  • Content Extraction: Getting HTML, text, and links
  • JavaScript Execution: Running JS code in pages
  • Tab Management: Creating, switching, and closing tabs

This tool is implemented based on the browser-use library, providing agents with rich web interaction capabilities.

4.4 PythonExecute

The PythonExecute tool allows agents to execute Python code, enabling them to:

  • Perform data processing tasks
  • Conduct system operations
  • Create and modify files
  • Call other Python libraries and APIs

This makes OpenManus capable of executing programming tasks, which is a key component of its universal capabilities.

5. Usage Guide

5.1 Installation and Configuration

OpenManus provides two installation methods:

  1. Using conda:

    bash
    conda create -n open_manus python=3.12
    conda activate open_manus
    git clone https://github.com/mannaandpoem/OpenManus.git
    cd OpenManus
    pip install -r requirements.txt
  2. Using uv (Recommended):

    bash
    curl -LsSf https://astral.sh/uv/install.sh | sh
    git clone https://github.com/mannaandpoem/OpenManus.git
    cd OpenManus
    uv venv
    source .venv/bin/activate
    uv pip install -r requirements.txt

5.2 Configuring LLM

OpenManus requires LLM API configuration to work properly:

  1. Create a configuration file:

    bash
    cp config/config.example.toml config/config.toml
  2. Edit config.toml to add API keys:

    toml
    [llm]
    model = "gpt-4o"
    base_url = "https://api.openai.com/v1"
    api_key = "sk-..."  # Replace with your actual API key
    max_tokens = 4096
    temperature = 0.0

5.3 Basic Usage

Starting OpenManus requires just one command:

bash
python main.py

Then input your requests through the terminal, such as:

  • "Create a simple website calculator"
  • "Find the latest research on climate change"
  • "Help me optimize this Python code"

OpenManus will automatically plan tasks and call the appropriate tools to complete the request.

6. Code Analysis

6.1 Main Entry (main.py)

python
async def main():
    agent = Manus()
    while True:
        try:
            prompt = input("Enter your prompt (or 'exit'/'quit' to quit): ")
            prompt_lower = prompt.lower()
            if prompt_lower in ["exit", "quit"]:
                logger.info("Goodbye!")
                break
            if not prompt.strip():
                logger.warning("Skipping empty prompt.")
                continue
            logger.warning("Processing your request...")
            await agent.run(prompt)
        except KeyboardInterrupt:
            logger.warning("Goodbye!")
            break

The main entry is very concise, creating a Manus agent instance, then looping to receive user input and passing it to the agent for processing.

6.2 Manus Agent Implementation

python
class Manus(ToolCallAgent):
    name: str = "Manus"
    description: str = "A versatile agent that can solve various tasks using multiple tools"
    system_prompt: str = SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT
    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
        )
    )

The Manus agent inherits from ToolCallAgent and configures system prompts and available tool sets.

6.3 Tool Execution Method (execute_tool)

python
async def execute_tool(self, command: ToolCall) -> str:
    if not command or not command.function or not command.function.name:
        return "Error: Invalid command format"

    name = command.function.name
    if name not in self.available_tools.tool_map:
        return f"Error: Unknown tool '{name}'"

    try:
        # Parse arguments
        args = json.loads(command.function.arguments or "{}")
        
        # Execute tool
        logger.info(f"🔧 Activating tool: '{name}'...")
        result = await self.available_tools.execute(name=name, tool_input=args)
        
        # Format result
        observation = (
            f"Observed output of cmd `{name}` executed:\n{str(result)}"
            if result
            else f"Cmd `{name}` completed with no output"
        )
        
        # Handle special tools
        await self._handle_special_tool(name=name, result=result)
        
        return observation
    except Exception as e:
        # Error handling
        error_msg = f"⚠️ Tool '{name}' encountered a problem: {str(e)}"
        logger.error(error_msg)
        return f"Error: {error_msg}"

This method demonstrates how OpenManus executes tool calls, including parameter parsing, error handling, and result formatting.

7. Project Roadmap

According to the project roadmap, OpenManus plans to implement the following features in the future:

  1. Enhance Planning capabilities and optimize task decomposition and execution logic
  2. Introduce standardized evaluation metrics (based on GAIA and TAU-Bench)
  3. Expand model adaptation and optimize low-cost application scenarios
  4. Implement containerized deployment to simplify installation and usage workflows
  5. Enrich example libraries with more practical cases
  6. Develop frontend/backend to improve user experience

8. Conclusion

OpenManus is a well-designed open-source agent framework that achieves powerful task planning and execution capabilities through its modular architecture and flexible tool integration mechanism. Its core advantages include:

  1. Flexible Agent System: From basic tool-calling agents to complex planning agents, providing implementations of agents at different levels
  2. Rich Tool Set: Built-in Python execution, browser interaction, search, and other tools to meet diverse task requirements
  3. Structured Task Planning: Ability to automatically decompose complex tasks and track execution progress
  4. Open Source: Allowing community contributions and extensions, continuously evolving

OpenManus provides developers with a powerful platform that makes building complex AI agents simple and intuitive, making it an ideal choice for AI application development. Whether researchers, developers, or ordinary users, everyone can benefit from OpenManus by creating their own intelligent assistants to solve various problems.

For developers who want to learn more or contribute to the project, it is recommended to check the official GitHub repository and join the community discussion group to jointly promote the development of OpenManus.

Last updated: