LangGraph — Testing

Part 12. Testing LangGraph applications — FakeListChatModel, graph assertions, no-network discipline, and fixture patterns.

The testing setup

Test the graph, not the model. Use FakeListChatModel to simulate model responses without hitting the API:

import pytest
from langchain_core.messages import AIMessage, HumanMessage, ToolCall
 
# Build graph once
@pytest.fixture
def graph():
    return builder.compile(checkpointer=MemorySaver())
 
# Fake model that calls a tool
@pytest.fixture
def fake_with_tool_call():
    return FakeListChatModel(responses=[
        AIMessage(content="", tool_calls=[
            ToolCall(name="get_weather", args={"city": "Tokyo"}, id="call-1"),
        ]),
        AIMessage(content="It's sunny in Tokyo."),
    ])

Test 1: the agent calls a tool

def test_calls_get_weather_tool(graph, fake_with_tool_call):
    graph = build_graph(llm=fake_with_tool_call, tools=[get_weather])
 
    result = graph.invoke(
        {"messages": [HumanMessage(content="What's the weather in Tokyo?")]},
        config={"configurable": {"thread_id": "test-1"}},
    )
 
    # Check that a tool was called
    tool_calls = [
        m.tool_calls[-1]
        for m in result["messages"]
        if hasattr(m, "tool_calls") and m.tool_calls
    ]
    assert any(tc.name == "get_weather" for tc in tool_calls)

Test 2: the tool result is in the next model call

def test_tool_result_feeds_back_to_model(graph, fake_with_tool_call):
    graph = build_graph(llm=fake_with_tool_call, tools=[get_weather])
 
    result = graph.invoke(
        {"messages": [HumanMessage(content="What's the weather?")]},
    )
 
    # ToolMessage was appended
    tool_messages = [m for m in result["messages"] if isinstance(m, ToolMessage)]
    assert len(tool_messages) == 1
    assert "Tokyo" in tool_messages[0].content

Test 3: without tool calls, the agent finishes

def test_no_tool_calls_ends(graph):
    fake = FakeListChatModel(responses=[
        AIMessage(content="The weather is sunny."),
    ])
    graph = build_graph(llm=fake, tools=[get_weather])
 
    result = graph.invoke(
        {"messages": [HumanMessage(content="Hello")]},
    )
 
    # No tool calls made
    tool_calls = [
        m.tool_calls[-1]
        for m in result["messages"]
        if hasattr(m, "tool_calls") and m.tool_calls
    ]
    assert len(tool_calls) == 0
    # Final message is the AI response
    assert isinstance(result["messages"][-1], AIMessage)

Test 4: checkpointer persists across calls

def test_checkpointer_persistence(graph):
    config = {"configurable": {"thread_id": "test-persist"}}
 
    # First call
    graph.invoke(
        {"messages": [HumanMessage(content="Hi")]},
        config=config,
    )
 
    # Second call — should resume with history
    result = graph.invoke(
        {"messages": [HumanMessage(content="What did I just say?")]},
        config=config,
    )
 
    # The agent saw the history
    assert len(result["messages"]) > 2

Test 5: routing conditional edges

def test_routing_to_approval(graph, fake_destructive_tool):
    graph = build_graph(llm=fake_destructive_tool, tools=DESTRUCTIVE_TOOLS)
 
    result = graph.invoke(
        {"messages": [HumanMessage(content="delete cluster cl-001")]},
        config={"configurable": {"thread_id": "test-route"}},
    )
 
    # Graph paused with pending_approval
    assert result.get("pending_approval") is not None
    assert result["pending_approval"]["tool"] == "delete_cluster"

`FakeListChatModel` patterns

Infinite responses (for multi-step tests)

fake = FakeListChatModel(
    responses=itertools.repeat(AIMessage(content="ok")),
    cycle=True,
)

`usage_metadata` on responses

fake = FakeListChatModel(responses=[
    AIMessage(content="hi", usage_metadata={
        "input_tokens": 5,
        "output_tokens": 3,
        "total_tokens": 8,
    }),
])

Streaming chunks

for chunk in fake.stream("hi"):
    print(chunk.content, end="")
# "hi" — one chunk, the full response

No-network discipline

# conftest.py
@pytest.fixture(autouse=True)
def block_network(monkeypatch):
    import socket
    def refuse(*args, **kwargs):
        raise RuntimeError("Network call in test!")
    monkeypatch.setattr(socket, "socket", refuse)

Or with pytest-socket:

[tool.pytest.ini_options]
addopts = "--disable-socket"

This catches accidental LiteLLM calls before they cost money.

Fixture patterns

Reset checkpointer per test

@pytest.fixture
def graph():
    return builder.compile(checkpointer=MemorySaver())

Each test gets a fresh in-memory checkpointer.

Per-module graph (shared, not reset)

# conftest.py
@pytest.fixture(scope="module")
def graph():
    return builder.compile()

Shared across tests in the module. Faster (no re-compile) but tests must not pollute each other’s state.

`FakeListChatModel` per test

@pytest.fixture
def fake_tool_call():
    return FakeListChatModel(responses=[
        AIMessage(content="", tool_calls=[
            ToolCall(name="get_weather", args={"city": "Tokyo"}, id="call-1"),
        ]),
        AIMessage(content="It's sunny."),
    ])

Common pitfalls

FakeListChatModel exhausted — add cycle=True or enough responses for all the model calls in your test.
tool_call_id mismatch — when manually building ToolMessages, the tool_call_id must match the ToolCall.id. Let ToolNode handle this.
block_network doesn’t catch urllib3 direct sockets. Use pytest-socket for more thorough blocking.
checkpointer shared across tests — if you share the graph fixture, the checkpointer accumulates state. Use MemorySaver per test or clean it between tests.
async tests need @pytest.mark.asyncio. Don’t forget the marker for async graph calls.

cloudnative wiki

Explorer

LangGraph — Testing

The testing setup

Test 1: the agent calls a tool

Test 2: the tool result is in the next model call

Test 3: without tool calls, the agent finishes

Test 4: checkpointer persists across calls

Test 5: routing conditional edges

`FakeListChatModel` patterns

Infinite responses (for multi-step tests)

`usage_metadata` on responses

Streaming chunks

No-network discipline

Fixture patterns

Reset checkpointer per test

Per-module graph (shared, not reset)

`FakeListChatModel` per test

Common pitfalls

See also

Graph View

Table of Contents

Backlinks

cloudnative wiki

Explorer

LangGraph — Testing

The testing setup

Test 1: the agent calls a tool

Test 2: the tool result is in the next model call

Test 3: without tool calls, the agent finishes

Test 4: checkpointer persists across calls

Test 5: routing conditional edges

FakeListChatModel patterns

Infinite responses (for multi-step tests)

usage_metadata on responses

Streaming chunks

No-network discipline

Fixture patterns

Reset checkpointer per test

Per-module graph (shared, not reset)

FakeListChatModel per test

Common pitfalls

See also

Graph View

Table of Contents

Backlinks

`FakeListChatModel` patterns

`usage_metadata` on responses

`FakeListChatModel` per test