Monitoring Poor User Experiences with AI through Braintrust and Slack Alerts

Evin Callahan

11 Nov 2025 - 06 Mins read

AI failures don't look like traditional software failures. Instead of throwing a 500 error or crashing, they fail silently by providing incorrect information, missing context, or giving unhelpful responses.

Your users might be struggling with your chatbot right now, and you'd never know until they complain—or worse, just leave.

This is the fundamental challenge of AI monitoring: how do you know when your chatbot is failing users?

The answer lies in proactive monitoring with intelligent scoring. In this guide, we'll build a complete monitoring pipeline that:

Captures every conversation your users have with your chatbot
Runs automated "scorers" to detect problematic interactions
Sends real-time alerts to Slack when issues arise
Gives your team the context they need to investigate and fix problems

The Problem: Silent AI Failures

Traditional software monitoring is straightforward. If your API returns a 500 error, you know something broke. If your database connection fails, you get an alert. The failure modes are explicit and detectable.

AI applications fail differently:

A user asks for product information and gets outdated data
Your chatbot hallucinates a feature that doesn't exist
An MCP tool call returns an error, but the LLM tries to continue anyway
The conversation goes in circles without resolving the user's question

These failures are subtle, context-dependent, and hard to detect with traditional monitoring. You need a system that understands the quality of AI interactions, not just whether the code executed successfully.

The Solution: Braintrust + Hookdeck + Slack

We'll use three tools to build our monitoring pipeline:

Braintrust - Captures conversation logs, runs scorers, and triggers alerts (free to start)
Hookdeck - Transforms webhook payloads into Slack-compatible messages (free to start)
Slack - Receives formatted alerts for your team to act on

Alerts originate in Braintrust, pass through a Hookdeck webhook that reshapes the payload, and arrive in Slack through an incoming webhook:

[Figure: webhook flow between components]

Setting Up Braintrust Logging

First, we need to capture conversations. Braintrust provides SDKs that make this straightforward, along with integration guides for many frameworks in its documentation.

Below we'll show how to do that with a Next.js app.

Install the Braintrust SDK in a Next.js app

First install the package via your package manager:

npm install braintrust
# or
pnpm add braintrust
# or
yarn add braintrust

Instrument Your Chatbot Code

Add logging to your chat endpoint or API route. In this case we wrap our call in a Braintrust span that logs the entire conversation:

// app/api/chat/route.js
import { initLogger } from "braintrust";

// Initialize the logger with your API key
const logger = initLogger({
  projectId: "uuid",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

export async function POST(request) {
  const { messages, userId } = await request.json();

  // Start a span for this conversation
  const span = logger.startSpan({ name: "chat-conversation" });

  // Record the input so it appears alongside the output in Braintrust
  span.log({ input: { messages, userId } });

  try {
    // Your existing chatbot logic
    const response = await generateChatResponse(messages);

    // Log the successful response
    span.log({
      output: response,
      metadata: {
        userId,
        messageCount: messages.length,
      },
    });

    return Response.json(response);
  } catch (error) {
    // Log errors too
    span.log({
      output: { error: error.message },
      metadata: { userId, error: true },
    });
    throw error;
  } finally {
    span.end();
  }
}

That's it! Your conversations are now flowing into Braintrust in real time. You can view them in the Braintrust dashboard at braintrust.dev.

What Gets Logged

Braintrust captures:

Input: The user's messages and context
Output: The chatbot's responses, including tool calls
Metadata: User IDs, timestamps, session info
Traces: Full execution traces if you're using nested spans
Scores: Results from your automated scorers (we'll set these up next)
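
To make these fields concrete, here is a minimal sketch of a helper that assembles the object handed to span.log(). The helper name buildLogEvent, the sessionId field, and the metadata layout are assumptions for illustration, not part of the Braintrust SDK.

```javascript
// Hypothetical helper: builds the event object passed to span.log().
// input, output, and metadata mirror the fields listed above; sessionId
// and loggedAt are illustrative additions, not required by Braintrust.
function buildLogEvent({ messages, response, userId, sessionId }) {
  return {
    input: { messages },
    output: response,
    metadata: {
      userId,
      sessionId,
      messageCount: messages.length,
      loggedAt: new Date().toISOString(),
    },
  };
}

const event = buildLogEvent({
  messages: [{ role: "user", content: "Where is my order?" }],
  response: { role: "assistant", content: "Let me check that for you." },
  userId: "user-123",
  sessionId: "session-456",
});

console.log(event.metadata.messageCount); // 1
```

Keeping the event shape in one place makes it easier to keep metadata consistent across routes, which matters later when alert filters rely on those fields.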

Create a Braintrust Scorer to Detect Bad Conversations

Now that your messages are flowing into Braintrust, let's score each conversation.

Scorers are the heart of your monitoring system. They're automated evaluators that analyze each conversation and assign a score from 0 to 1, where 1 is perfect and 0 is a complete failure.
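
As a rough illustration of the 0 to 1 scale, the sketch below maps a judge's verdict onto a score. The verdict labels and their values are made up for this example; they are not Braintrust defaults.

```javascript
// Hypothetical mapping from a judge's verdict to a score between 0 and 1.
const VERDICT_SCORES = new Map([
  ["happy", 1],
  ["neutral", 0.5],
  ["frustrated", 0],
]);

// Returns the score for a verdict, or null when the verdict is not
// recognized, so an unexpected label is not mistaken for a failure.
function verdictToScore(verdict) {
  const key = String(verdict).trim().toLowerCase();
  return VERDICT_SCORES.has(key) ? VERDICT_SCORES.get(key) : null;
}

console.log(verdictToScore("Happy")); // 1
console.log(verdictToScore("frustrated")); // 0
console.log(verdictToScore("confused")); // null
```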

To get started, let's create a scorer that tells us about a conversation a user had.

Example Scorer: "is the user happy"

This scorer rates the quality of a user's conversation with the LLM.

Each time a user chats, the entire conversation is run through the scorer, which outputs a value determined by the prompt we give it. In this case, the prompt simply asks how happy the user is.

In the Braintrust console, create a new scorer that looks something like this:

[Figure: Braintrust scorer config]

This scorer will run automatically on every logged conversation and assign a score. You can create multiple scorers for different failure modes:

Response Quality: Is the answer helpful and accurate?
Conversation Flow: Is the user getting stuck in loops?
Latency: Are responses taking too long?
Error Handling: Are errors being handled gracefully?

Because this scorer runs on every conversation each time a message is submitted, use a cheaper model to keep costs down. gpt-5-nano is a reasonable choice.
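
Not every scorer needs an LLM. As one possible take on the "Conversation Flow" idea above, this sketch scores a conversation by how often the user repeats an earlier message. The heuristic and the function name conversationFlowScore are assumptions for illustration; a real loop detector would likely need fuzzier matching.

```javascript
// Hypothetical code-based scorer: 1 means no repeated user messages,
// lower values mean the user kept re-asking the same thing.
function conversationFlowScore(messages) {
  // Normalize user messages so trivial differences do not hide repeats.
  const userTexts = messages
    .filter((m) => m.role === "user" && typeof m.content === "string")
    .map((m) => m.content.trim().toLowerCase().replace(/\s+/g, " "));

  // Nothing to score if the user never sent text.
  if (userTexts.length === 0) return null;

  const seen = new Set();
  let repeats = 0;
  for (const text of userTexts) {
    if (seen.has(text)) repeats += 1;
    else seen.add(text);
  }
  return 1 - repeats / userTexts.length;
}

const score = conversationFlowScore([
  { role: "user", content: "How do I reset my password?" },
  { role: "assistant", content: "Go to Settings." },
  { role: "user", content: "how do I reset my  password?" },
  { role: "user", content: "Thanks" },
]);
console.log(score); // about 0.67: one of three user messages is a repeat
```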

Create a Slack Webhook URL for Alerts

In Slack, go to Apps → Incoming Webhooks
Click Add to Slack
Choose the channel for alerts (e.g., #ai-monitoring)
Copy the webhook URL (looks like https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXX)

We'll use this next in Hookdeck.
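
Before wiring up Hookdeck, it can help to confirm the webhook works on its own. The sketch below posts a plain text message to a Slack incoming webhook. The function name postToSlack and the SLACK_WEBHOOK_URL environment variable are assumptions; the fetch implementation is passed in so the function can be exercised without network access.

```javascript
// Minimal sketch: send a plain text message to a Slack incoming webhook.
// fetchImpl defaults to the global fetch (available in Node 18+).
async function postToSlack(webhookUrl, text, fetchImpl = fetch) {
  const response = await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    throw new Error(`Slack webhook returned HTTP ${response.status}`);
  }
  return response.status;
}

// Example call (commented out so nothing is sent by accident):
// postToSlack(process.env.SLACK_WEBHOOK_URL, "Test message from the monitoring pipeline");
```

If the message shows up in your channel, any later delivery problem is in the Braintrust or Hookdeck steps rather than in Slack.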

Setting Up Hookdeck for Slack

Braintrust sends webhook payloads in its own format, but Slack expects a specific message structure. Hookdeck sits in the middle to transform the data.

Create a connection in Hookdeck

This will take a webhook, and reroute it to a new destination with a few small tweaks:

Sign up at https://hookdeck.com (free to start)
- If you're a new user, their "getting started" wizard will take you through all of these steps.
Create a new Connection for Braintrust webhooks. It'll give you an inbound webhook URL that looks like: https://hkdk.events/abcdef
Set the destination type to HTTP and point it at the Slack webhook URL created above
Check off the transformation step and paste in the following code:

// Converts Braintrust alert payload into Slack message blocks
function buildSlackBlocks(request) {
  const { organization, project, automation, details } = request.body;

  return {
    blocks: [
      {
        type: "header",
        text: {
          type: "plain_text",
          text: `🚨 ${organization.name}: ${automation.name}`,
          emoji: true,
        },
      },
      {
        type: "section",
        fields: [
          {
            type: "mrkdwn",
            text: `*Organization:*\n${organization.name}`,
          },
          {
            type: "mrkdwn",
            text: `*Project:*\n${project.name}`,
          },
          {
            type: "mrkdwn",
            text: `*Event Type:*\n${automation.event_type}`,
          },
          {
            type: "mrkdwn",
            text: `*Triggered Logs:*\n${details.count}`,
          },
          {
            type: "mrkdwn",
            text: `*Time Range:*\n${details.time_start} → ${details.time_end}`,
          },
        ],
      },
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: details.message,
        },
      },
      {
        type: "actions",
        elements: [
          {
            type: "button",
            text: {
              type: "plain_text",
              text: "View Related Logs",
            },
            url: details.related_logs_url,
          },
          {
            type: "button",
            text: {
              type: "plain_text",
              text: "View Automation",
            },
            url: automation.url,
          },
        ],
      },
      {
        type: "context",
        elements: [
          {
            type: "mrkdwn",
            text: details.is_test ? "🧪 *Test Alert*" : "⚠️ *Production Alert*",
          },
        ],
      },
    ],
  };
}

// Replace the request body with the Slack blocks before it is forwarded
addHandler("transform", (request, context) => {
  request.body = buildSlackBlocks(request);
  return request;
});

This transformation:

Extracts key information from the Braintrust payload
Formats it into Slack's Block Kit format
Adds action buttons to view logs and automation details
Includes visual indicators for test vs production alerts
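
The transformation assumes every field it reads is present. As a sketch of how one might check a payload before relying on it, the helper below lists which of those fields are missing. The paths are taken from the transformation code above; the sample payload values are placeholders, not a documented Braintrust schema.

```javascript
// Paths the transformation above reads from request.body.
const REQUIRED_PATHS = [
  "organization.name",
  "project.name",
  "automation.name",
  "automation.event_type",
  "automation.url",
  "details.count",
  "details.time_start",
  "details.time_end",
  "details.message",
  "details.related_logs_url",
];

// Walks a dotted path, returning undefined if any segment is missing.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Hypothetical helper: returns the required paths that are null or absent.
function missingFields(body) {
  return REQUIRED_PATHS.filter((path) => getPath(body, path) == null);
}

// Placeholder payload shaped like the fields the transformation expects.
const samplePayload = {
  organization: { name: "My Company" },
  project: { name: "my-chatbot" },
  automation: {
    name: "Low user happiness",
    event_type: "scorer_threshold",
    url: "https://example.com/automation",
  },
  details: {
    count: 3,
    time_start: "2025-11-11 10:00",
    time_end: "2025-11-11 10:05",
    message: "3 conversations scored below 0.5",
    related_logs_url: "https://example.com/logs",
    is_test: true,
  },
};

console.log(missingFields(samplePayload)); // []
console.log(missingFields({ organization: { name: "My Company" } }));
```

A check like this could run at the top of the transform handler to fall back to a simple text message when the payload is incomplete.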

Save and enable the connection, and you should be good to go here!

Now let's create the alert in Braintrust to send to this connection.

Create a Braintrust Alert

At this point, the backend is logging conversations to Braintrust and the scorer is assigning scores to them. Now we'll create an alert based on those scores.

In Braintrust:

Navigate to Configuration → Alerts
Click + Alert in the top right
Enter the alert name
Enter the Hookdeck inbound webhook URL created in the previous section: https://hkdk.events/abcdef
Set the trigger frequency (e.g., "Immediately" or "Every 5 minutes")
Specify what to alert on using a Braintrust Query Language (BTQL) filter. This one looks for low happiness scores and filters out tool call messages:

scores is not null
and scores."user-happiness" < 0.5
and is_root = true
and output[0].tool_calls is null

Save the alert

This query triggers when:

A conversation has been scored
The user-happiness score is below 50%
The log is the root span of its trace
The first output message contains no tool calls
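
To reason about which logs the filter matches, the four conditions in the query can be mirrored in plain JavaScript. This is only a sketch for checking the logic against sample rows; the row shape is an assumption, and Braintrust evaluates the real filter itself.

```javascript
// Mirrors the BTQL filter above against a hypothetical row object.
function matchesAlert(row, threshold = 0.5) {
  const score = row.scores == null ? undefined : row.scores["user-happiness"];
  const firstOutput = Array.isArray(row.output) ? row.output[0] : undefined;
  return (
    row.scores != null && // scores is not null
    typeof score === "number" &&
    score < threshold && // scores."user-happiness" < 0.5
    row.is_root === true && // is_root = true
    (firstOutput == null || firstOutput.tool_calls == null) // output[0].tool_calls is null
  );
}

const unhappy = {
  scores: { "user-happiness": 0.2 },
  is_root: true,
  output: [{ role: "assistant", content: "I can't help with that." }],
};
const happy = { ...unhappy, scores: { "user-happiness": 0.9 } };
const toolCall = {
  ...unhappy,
  output: [{ role: "assistant", tool_calls: [{ id: "call_1" }] }],
};

console.log(matchesAlert(unhappy)); // true
console.log(matchesAlert(happy)); // false
console.log(matchesAlert(toolCall)); // false
```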

Testing the Complete Flow

Let's verify everything works:

1. Send a Test Alert from Braintrust

In your Braintrust alert:

Click Send Test
This triggers a webhook to Hookdeck

2. Verify in Hookdeck

Check the Hookdeck dashboard:

You should see the incoming request from Braintrust
View the transformation output
Confirm it was delivered to Slack

3. Check Slack

You should receive a formatted message like:

🚨 My Company: MCP Tool Failures Detected

Organization: My Company
Project: my-chatbot
Event Type: scorer_threshold
Triggered Logs: 3
Time Range: 2025-11-11 10:00 → 2025-11-11 10:05

3 conversations detected with MCP tool success scores below 0.5

[View Related Logs] [View Automation]

⚠️ Production Alert

What Happens When You Get an Alert

When an alert comes through:

Click "View Related Logs" to see the problematic conversations
Review the scorer details to understand what failed
Check the conversation context - what was the user trying to do?
Investigate the tool calls - which MCP tools failed and why?
Fix the underlying issue - update your MCP server, improve prompts, or adjust tool configurations
Monitor the scores - verify your fix improved the success rate

Advanced: Multiple Scorers and Alert Conditions

You can create sophisticated monitoring by combining multiple scorers:

-- Alert on multiple failure conditions
(scores."mcp-tool-success" < 0.5 OR scores."response-quality" < 0.6)
and is_root = true
and metadata.environment = 'production'

Common scorer patterns:

Response Quality: Uses an LLM to evaluate if the response was helpful
Factual Accuracy: Checks responses against your knowledge base
Conversation Completion: Detects if users got their questions answered
Error Rate: Tracks technical errors vs successful completions
Latency: Monitors response times and timeout rates
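
As an example of one of these patterns, a latency scorer can be a few lines of code. The sketch below maps a response time onto the 0 to 1 scale; the function name latencyScore and the 2 second target and 10 second timeout are assumed values to adjust for your own application.

```javascript
// Hypothetical latency scorer: 1 at or under the target, 0 at or beyond
// the timeout, and a linear ramp in between.
function latencyScore(durationMs, targetMs = 2000, timeoutMs = 10000) {
  if (!Number.isFinite(durationMs) || durationMs < 0) return null;
  if (durationMs <= targetMs) return 1;
  if (durationMs >= timeoutMs) return 0;
  return 1 - (durationMs - targetMs) / (timeoutMs - targetMs);
}

console.log(latencyScore(1500)); // 1
console.log(latencyScore(6000)); // 0.5
console.log(latencyScore(12000)); // 0
```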

Best Practices

Start Simple

Begin with one or two scorers focused on your biggest pain points. Add more as you learn what matters.

Tune Your Thresholds

A score of 0.5 might be too sensitive or not sensitive enough. Adjust based on your alert volume and false positive rate.
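
One way to tune a threshold is to replay it against conversations your team has already reviewed. The sketch below does this for a handful of made-up samples; the wasBad label (a reviewer's judgment) and the function name evaluateThreshold are assumptions for illustration.

```javascript
// For a given threshold, count the alerts it would raise, the share of
// those alerts that reviewers judged fine, and the bad conversations missed.
function evaluateThreshold(samples, threshold) {
  const alerts = samples.filter((s) => s.score < threshold);
  const falseAlerts = alerts.filter((s) => !s.wasBad).length;
  const missed = samples.filter((s) => s.wasBad && s.score >= threshold).length;
  return {
    threshold,
    alertCount: alerts.length,
    falseAlertShare: alerts.length === 0 ? 0 : falseAlerts / alerts.length,
    missed,
  };
}

// Made-up reviewed conversations: score from the scorer, label from a human.
const samples = [
  { score: 0.1, wasBad: true },
  { score: 0.3, wasBad: true },
  { score: 0.45, wasBad: false },
  { score: 0.55, wasBad: true },
  { score: 0.8, wasBad: false },
  { score: 0.9, wasBad: false },
];

for (const threshold of [0.4, 0.5, 0.6]) {
  console.log(evaluateThreshold(samples, threshold));
}
```

With this made-up data, raising the threshold from 0.4 to 0.6 catches the one bad conversation that was missed, but doubles the alert count from 2 to 4.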

Include Context in Alerts

The more context in your Slack message, the faster your team can respond. Include:

User IDs (if not sensitive)
Conversation summaries
Specific error messages
Links to full traces

Review Alerts Regularly

Set up a weekly review of alerts to:

Identify patterns in failures
Adjust scorer logic
Update alert thresholds
Improve your chatbot based on real issues

Use Environments

Create separate Braintrust projects for staging and production. Test your scorers and alerts in staging before enabling in production.
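
A small sketch of how the logger might pick its project per environment. The project names and the projectNameFor helper are hypothetical; anything other than production falls back to staging so local runs do not pollute production logs.

```javascript
// Hypothetical project names; replace with your own.
const PROJECTS = {
  production: "my-chatbot",
  staging: "my-chatbot-staging",
};

// Only an explicit "production" value selects the production project.
function projectNameFor(env) {
  return env === "production" ? PROJECTS.production : PROJECTS.staging;
}

console.log(projectNameFor("production")); // my-chatbot
console.log(projectNameFor("development")); // my-chatbot-staging

// Possible usage with the logger from earlier (not executed here):
// initLogger({ projectName: projectNameFor(process.env.NODE_ENV), apiKey: process.env.BRAINTRUST_API_KEY });
```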

No affiliation

NOTE: We have no affiliation with Braintrust or Hookdeck.