Observe

See what your agent did. Plain English. Not raw traces.

The difference

Traditional Traces:
  tool_call(name="stripe_create_refund", args={"payment_intent": "pi_abc", "amount": 5000})
  tool_result(result={"id": "re_xyz789", "status": "succeeded"})

Mokra Observe:
  Agent refunded $50.00 to customer
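To make the contrast concrete, here is a minimal sketch of how a raw trace could be condensed into that one-line summary. It is an illustration only, not Mokra's implementation: `describe_refund` is a hypothetical helper, and it assumes the amount is expressed in cents, as in the trace above.

```ruby
# Hypothetical helper, for illustration only; not part of Mokra.
# Assumes the amount is in the smallest currency unit (cents).
def describe_refund(args)
  dollars = format("%.2f", args.fetch("amount") / 100.0)
  "Agent refunded $#{dollars} to customer"
end

# Arguments copied from the traditional trace above.
trace_args = { "payment_intent" => "pi_abc", "amount" => 5000 }
puts describe_refund(trace_args)
```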

Basic usage

After running your agent in a MockWorld, call observe():

world = mockworld(name: "Refund test", services: ["stripe", "shopify", "sendgrid"])

world.run do
  agent.process_refund("order-1234")
end

world.observe

Output:

╭─────────────────────────────────────────────────────────────╮
│  MockWorld: Refund test                                     │
│  Duration: 1.2s | Requests: 4                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  GET  shopify/admin/api/2024-01/orders/1234.json            │
│  → Retrieved order #1234 ($150.00, paid)                    │
│                                                             │
│  GET  stripe/v1/payment_intents/pi_abc123                   │
│  → Retrieved payment intent for $150.00                     │
│                                                             │
│  POST stripe/v1/refunds                                     │
│  → Created full refund of $150.00                           │
│                                                             │
│  POST sendgrid/v3/mail/send                                 │
│  → Sent email to ana@example.com                            │
│    Subject: "Your refund has been processed"                │
│                                                             │
╰─────────────────────────────────────────────────────────────╯

HTTP layer visibility

Mokra operates at the HTTP layer, so it captures every outbound request regardless of how it was made:

Direct HTTP calls via requests, fetch, Net::HTTP
SDK calls (Stripe SDK, Shopify SDK, etc.)
Calls from any library or framework
Background HTTP requests
Retry attempts
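Why does working at the HTTP layer cover SDKs and libraries too? Because they all end up calling the same low-level client. The sketch below shows one general Ruby technique for this kind of interception, `Module#prepend`, applied to a stand-in client. This page does not describe how Mokra intercepts requests, so treat `FakeHTTPClient` and `RequestRecorder` as hypothetical names used only for illustration.

```ruby
# Stand-in for an HTTP client. Higher-level code (SDKs, tools, frameworks)
# would all call through a method like this one.
class FakeHTTPClient
  def request(verb, url)
    { status: 200, verb: verb, url: url }
  end
end

# Records every call that passes through #request, then returns the
# original response unchanged.
module RequestRecorder
  LOG = []

  def request(verb, url)
    response = super
    LOG << { verb: verb, url: url, status: response[:status] }
    response
  end
end

# Prepending places the recorder ahead of the original method in the
# lookup chain, so callers do not need to change.
FakeHTTPClient.prepend(RequestRecorder)

client = FakeHTTPClient.new
client.request("GET", "https://api.stripe.com/v1/payment_intents/pi_abc123")
client.request("POST", "https://api.stripe.com/v1/refunds")

RequestRecorder::LOG.each do |entry|
  puts "#{entry[:verb]} #{entry[:url]} -> #{entry[:status]}"
end
```

Since the recorder wraps the client itself, any caller that reaches the client is logged, whether or not it goes through an agent framework's callbacks.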

What others miss

LangSmith traces LangChain callbacks. But what about:

Direct HTTP calls inside your tools?
SDK calls that bypass your tool abstraction?
HTTP calls from libraries LangChain doesn’t know about?

If it hits the network, Mokra sees it.

Options

Print vs return

# Print to console (default)
world.observe

# Return as string for logging
log = world.observe(print: false)
Rails.logger.info(log)

Filter by service

# Only show Stripe observations
world.observe(service: "stripe")

# Only show a specific mock server
world.observe(mock_server_id: "ms_abc123")

Programmatic access

Access observations as data:

observations = world.observations

observations.each do |obs|
  puts "#{obs.method} #{obs.path}"
  puts "  Impact: #{obs.description}"
  puts "  Status: #{obs.status_code}"
end

# Filter
refunds = observations.select { |o| o.path.include?("refund") }
errors = observations.select { |o| o.status_code >= 400 }
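Because observations are plain objects, ordinary Enumerable methods are enough to summarize them. The following sketch runs without Mokra by using a stand-in `Observation` struct and made-up sample data. It assumes that `path` starts with the service name, as in the printed output above, and it needs Ruby 2.7 or later for `tally`.

```ruby
# Stand-in for Mokra's observation objects; only the fields used here.
Observation = Struct.new(:path, :status_code)

# Sample data for illustration only.
observations = [
  Observation.new("shopify/admin/api/2024-01/orders/1234.json", 200),
  Observation.new("stripe/v1/payment_intents/pi_abc123", 200),
  Observation.new("stripe/v1/refunds", 402),
]

# Count requests per service, assuming the path begins with the service name.
per_service = observations.map { |o| o.path.split("/").first }.tally

# Collect the paths of failed requests (HTTP status 400 and above).
failed_paths = observations.select { |o| o.status_code >= 400 }.map(&:path)

puts per_service.inspect
puts failed_paths.inspect
```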

Debugging agent behavior

When an AI agent misbehaves, Observe shows you what went wrong:

world.observe()

Expected:

GET  shopify/orders/1234 → Retrieved order
POST stripe/refunds → Created refund
POST sendgrid/mail/send → Sent confirmation

Actual (bug!):

GET  shopify/orders/1234 → Retrieved order
GET  shopify/orders/1234 → Retrieved order (DUPLICATE)
GET  shopify/orders/1234 → Retrieved order (DUPLICATE)
POST stripe/refunds → Created refund
POST stripe/refunds → Created refund (DUPLICATE!)
POST sendgrid/mail/send → Sent confirmation
POST sendgrid/mail/send → Sent confirmation (DUPLICATE!)

Instantly visible: the agent is stuck in a loop, creating duplicate refunds.
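The same check can be automated. The sketch below replays the "Actual" sequence with a stand-in `Request` struct, where `verb` plays the role of `obs.method`, and flags any write request that was sent more than once. Repeated GETs are reported separately because they are often harmless. The struct and variable names are assumptions made for this example.

```ruby
# Stand-in for observation objects; verb corresponds to the HTTP method.
Request = Struct.new(:verb, :path)

# The "Actual" sequence from above.
requests = [
  Request.new("GET",  "shopify/orders/1234"),
  Request.new("GET",  "shopify/orders/1234"),
  Request.new("GET",  "shopify/orders/1234"),
  Request.new("POST", "stripe/refunds"),
  Request.new("POST", "stripe/refunds"),
  Request.new("POST", "sendgrid/mail/send"),
  Request.new("POST", "sendgrid/mail/send"),
]

# Count identical verb + path pairs (requires Ruby 2.7+ for tally).
counts = requests.map { |r| "#{r.verb} #{r.path}" }.tally

# Anything seen more than once is a duplicate; repeated POSTs are the
# dangerous ones because they change state.
duplicates = counts.select { |_key, count| count > 1 }
duplicated_writes = duplicates.select { |key, _count| key.start_with?("POST") }

duplicated_writes.each { |key, count| puts "#{key} sent #{count} times" }
```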

Best practices

1. Always observe before asserting

world.run { ... }
world.observe  # See what happened first
world.assert(...) # Then assert on it

2. Log observations in CI

# Even in passing tests, log observations
Rails.logger.info(world.observe(print: false))

3. Use observe to debug failures

begin
  world.assert("exactly one refund created")
rescue AssertionError
  puts "Assertion failed. Here's what happened:"
  world.observe
  raise
end
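If this pattern shows up in many tests, it can be wrapped in a small helper. The version below is a sketch under stated assumptions: `assert_with_observe` is a hypothetical helper, and `FakeWorld` and `FakeAssertionError` are stand-ins so the example runs without Mokra. In a real test, pass your MockWorld and whichever error class its assert raises.

```ruby
# Stand-in error class for illustration.
class FakeAssertionError < StandardError; end

# Stand-in world whose assertion always fails and which counts observe calls.
class FakeWorld
  attr_reader :observe_calls

  def initialize
    @observe_calls = 0
  end

  def assert(_expectation)
    raise FakeAssertionError, "expected exactly one refund"
  end

  def observe
    @observe_calls += 1
    puts "(observations would print here)"
  end
end

# Hypothetical helper: run the assertion, print observations on failure,
# then re-raise so the test still fails.
def assert_with_observe(world, expectation, error_class)
  world.assert(expectation)
rescue error_class
  puts "Assertion failed. Here's what happened:"
  world.observe
  raise
end

world = FakeWorld.new
begin
  assert_with_observe(world, "exactly one refund created", FakeAssertionError)
rescue FakeAssertionError => e
  puts "Re-raised: #{e.message}"
end
```

Re-raising keeps the original failure and backtrace intact, so the test runner reports the assertion error rather than swallowing it.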

Next steps

Assert

AI Agents Guide
