Your Next AI Startup Should Be Built on Temporal [Part 3: Automated Prompt Testing]

Written by Marshall Thompson | June 13, 2024

Welcome to part three of our series on using Temporal to improve the reliability of applications built around LLMs like the one that powers ChatGPT. In part one, you learned how to use Temporal to clone a repo and ingest its documentation into a RAG database for use with your LLM. Part two taught you how to use context injection to give users more accurate answers to prompts made against that documentation. In this post, you’ll use Temporal and another LLM to automatically test the accuracy of your application’s answers to the prompts from part two.

Let that sink in—you’re teaching an LLM to test the output of another LLM. Most common feedback cycles track users’ reactions after a change ships, like A/B testing, or rely on reactive signals like click-through rates and calls to customer service. The LLM-based solution in this post allows you to improve the reliability and quality of your content before it ever reaches your users.

If you’d like to jump directly to the code, check out the GitHub repo for this project.

The importance of prompt validation

If you’ve ever used an LLM-based chat, you might have noticed that asking the same question in a slightly different way can yield a drastically different result. You want to give your users accurate information when they ask questions about your project; in this case, Hatchify. If the answers are inaccurate or poorly constructed, users won’t trust the results and will stop using the service. But how do you keep your answers accurate?

Setting up validation workflows with Temporal

Think of LLMs like any other function: when you call a function with certain inputs, you get a certain result. However, asking the same question of the same LLM does not guarantee the same deterministic result. So, to test the accuracy of your LLM’s results, you first need to identify the sources of input. Then you can write tests with different inputs and check for specific results. The solution needs to be flexible enough to handle the LLM’s non-deterministic answers.

LLM-based applications have several sources of input:

  1. The documentation used to populate your RAG database
  2. The user’s question being passed in the prompt to the LLM
  3. The “enhanced” prompt, which the application creates by combining results from your RAG database with the user’s original question
  4. The LLM itself – there are many different LLMs, and they are constantly changing

Any change in any of these inputs can cause the application to produce different results. A fully tested application would include Continuous Integration that runs tests whenever one of these inputs changes. Those tests would rate the accuracy of each answer with a score, and a test would pass only when its score exceeds a certain threshold.

Testing user input can be tricky because similar questions can be asked in multiple ways. The results might be similar, but not identical, so you need a reliable system for scoring the responses. The cool thing, however, is that you can teach an LLM like OpenAI’s GPT-4 Turbo how to validate the responses from your documentation chat. The new LLM can decide whether the answers are close enough and even give a numeric score used to pass or fail the test.

Create the initial questions

To simplify things, define an object where each key is a query for the prompt you’re testing and each value is a high-quality example of an expected response. This example uses Hatchify prompts:

// test/prompts/hatchify-assistant.json
{
  "How can I create a custom route?": "To create a custom route in your Hatchify application, you can follow these steps:\n\n1. **Install Koa Router**: If you are using Koa, it's recommended to use the `@koa/router` package to help create custom routes easily. You can install it using npm:\n\n```bash\nnpm install @koa/router\nnpm install @types/koa__router --save-dev\n```\n\n2. **Example Code**: Here is an example code snippet that demonstrates how to add a custom route to your Hatchify application:\n\n```typescript\nimport Koa from \"koa\"\nimport KoaRouter from \"@koa/router\"\nimport { hatchifyKoa, UnexpectedValueError } from \"@hatchifyjs/koa\"\nimport * as Schemas from \"../schemas\"\n\nconst app = new Koa()\nconst hatchedKoa = hatchifyKoa(Schemas, {\n  prefix: \"/api\",\n  database: {\n    uri: process.env.DB_URI,\n  }\n})\n\nconst router = new KoaRouter()\n\nrouter.get(\"/api/custom-route\", async function customRouteHandler(ctx): Promise<void> {\n  // Add your custom logic here\n  // You can access request data, query parameters, and more from `ctx`\n\n  // Example: Return a custom response\n  ctx.body = { message: \"Custom route response\" }\n})\n\napp.use(router.routes())\napp.use(hatchedKoa.middleware.allModels.all)\n\napp.listen(3000, () => {\n  console.log(\"Server started on http://localhost:3000\")\n})\n```\n\n3. **Custom Logic**: Inside the custom route handler function, you can add your custom logic, such as data manipulation, validation, or integration with third-party services. You have access to the request context (`ctx`) to handle incoming requests and send responses.\n\n4. **Middleware**: Ensure to add your custom route before the `hatchedKoa.middleware.allModels.all` middleware to properly handle the custom route logic.\n\nBy following these steps and customizing the example code provided, you can easily create custom routes in your Hatchify application to handle specific business logic or requirements.",
  "How can I see a list of all endpoints?": "To see a list of all endpoints generated by Hatchify in your application, you can use the `printEndpoints()` method provided by the `hatchedKoa` object. This method will print out a list of all the endpoints that have been generated based on your schemas.\n\nHere is an example of how you can use the `printEndpoints()` method in your code:\n\n```typescript\nimport { hatchifyKoa } from \"@hatchifyjs/koa\";\nimport * as Schemas from \"../schemas\";\n\nconst hatchedKoa = hatchifyKoa(Schemas, { prefix: \"/api\" });\n\n// Call the printEndpoints() method to print out the list of endpoints\nhatchedKoa.printEndpoints();\n```\n\nBy running this code in your application, you will see a list of all the endpoints that have been generated by Hatchify based on your schemas. This can help you understand the API routes available in your application and how they are structured.",
  "How can I setup custom validation for an endpoint?": "To set up custom validation for an endpoint in your Hatchify application, you can leverage the validation hooks provided by Hatchify. These hooks allow you to define custom validation logic for your resources.\n\nHere's a general outline of how you can set up custom validation for an endpoint:\n\n1. Define your schema with the necessary attributes and validation functions. For example, you can define a schema for an `Employee` resource with custom validation logic:\n\n```typescript\nexport const Employee = {\n  name: \"Employee\",\n  attributes: {\n    firstName: string(),\n    lastName: string(),\n    startDate: datetime(),\n    endDate: datetime(),\n  },\n  validation: {\n    startDateBeforeEndDate() {\n      const { startDate, endDate } = this as unknown as { startDate: Date; endDate: Date };\n      if (startDate && endDate && startDate > endDate) {\n        throw [\n          new UnexpectedValueError({\n            detail: \"Start date cannot be after end date.\",\n            pointer: \"data/attributes/startDate\",\n          }),\n        ];\n      }\n    },\n  },\n} satisfies PartialSchema;\n```\n\n2. Implement the custom validation logic within the validation function defined in your schema. In this example, the `startDateBeforeEndDate` function checks if the `startDate` is before the `endDate`.\n\n3. Use the custom validation function in your endpoint logic. You can call this validation function before creating or updating a resource to ensure data integrity.\n\nBy following these steps, you can easily set up custom validation for your endpoints in Hatchify, ensuring that your data meets the required criteria before being processed.",
  "How can I define an attribute as an optional integer?": "To define an attribute as an optional integer, you can use the `integer` function from the Hatchify open-source project in TypeScript. Here is an example of how you can define an optional integer attribute:\n\n```typescript\nimport { integer } from '@hatchifyjs/core';\n\nexport const ExampleModel = {\n  name: 'ExampleModel',\n  attributes: {\n    optionalNumber: integer({ required: false }),\n  },\n};\n```\n\nIn this example, the `optionalNumber` attribute is defined as an optional integer by setting `required: false` in the `integer` function. This means that the `optionalNumber` attribute can be `null` or an integer value.",
  "How can I define schema relationships?": "To define schema relationships in your TypeScript application using the Hatchify open-source project, you can utilize the `belongsTo`, `hasMany`, and `hasOne` functions provided by the `@hatchifyjs/core` package. Here's a brief overview of how you can define different types of relationships:\n\n### Belongs To Relationship\n- Use `belongsTo` to create a relationship from the current schema to the target schema.\n- Example:\n  ```typescript\n  import { belongsTo } from \"@hatchifyjs/core\";\n\n  const Player = {\n    name: \"Player\",\n    relationships: {\n      team: belongsTo(\"Team\"),\n    },\n  };\n  ```\n\n### Has Many Relationship\n- Use `hasMany` to create a one-to-many relationship from the current schema to the target schema.\n- Example:\n  ```typescript\n  import { hasMany } from \"@hatchifyjs/core\";\n\n  const Team = {\n    name: \"Team\",\n    relationships: {\n      players: hasMany(\"Player\"),\n    },\n  };\n  ```\n\n### Has One Relationship\n- Use `hasOne` to create a one-to-one relationship from the current schema to the target schema.\n- Example:\n  ```typescript\n  import { hasOne } from \"@hatchifyjs/core\";\n\n  const SalesPerson = {\n    name: \"SalesPerson\",\n    relationships: {\n      account: hasOne(\"Account\"),\n    },\n  };\n  ```\n\nBy defining these relationships within your schema objects, you can establish the connections between different models in your application.",
  "How can I query for data created between two dates?": "To query for data created between two dates using Hatchify, you can utilize the `$gte` (greater than or equal to) and `$lte` (less than or equal to) operators on the date attribute. Here's an example of how you can construct the query:\n\n```typescript\n// Query for data created between two dates\nconst startDate = \"2023-01-01\";\nconst endDate = \"2023-12-31\";\n\n// Construct the filter query\nconst filterQuery = `filter[createdAt][$gte]=${startDate}&filter[createdAt][$lte]=${endDate}`;\n\n// Use the filter query in your API request\nconst apiUrl = `/api/data?${filterQuery}`;\n```\n\nIn this example:\n- `startDate` represents the start date of the range.\n- `endDate` represents the end date of the range.\n- The `filterQuery` string is constructed with the `$gte` and `$lte` operators on the `createdAt` attribute.\n- Finally, you can use the `filterQuery` in your API request to retrieve data created between the specified dates.\n\nMake sure to replace `createdAt` with the actual attribute representing the creation date in your data model.",
  "How can I query data with pagination?": "To query data with pagination using the Hatchify open-source project in TypeScript, you can follow the pagination techniques provided in the documentation. Here are the steps for different pagination methods:\n\n### Offset and Limit Pagination\n\n1. Use the `page[offset]` parameter to determine the starting point in the dataset.\n2. Use the `page[limit]` parameter to specify the maximum number of records to include on each page.\n\nExample URL for fetching 10 todos, skipping the first 5:\n```curl\nGET /api/todos?page[offset]=5&page[limit]=10\n```\n\n### Page-Based Pagination\n\n1. Use the `page[number]` parameter to specify the desired page number.\n2. Use the `page[size]` parameter to determine the number of records per page.\n\nExample URL for fetching the second page of todos where each page has 20 results:\n```curl\nGET /api/todos?page[number]=2&page[size]=20\n```\n\nBy following these techniques and adjusting the parameters according to your requirements, you can effectively query data with pagination using Hatchify in TypeScript."
}

Build the Temporal workflow

Start by creating the testPromptsWorkflow in workflows.ts. This is the same file that contains the workflows from parts one and two of this series.

import { proxyActivities, startChild } from '@temporalio/workflow'
import type * as activities from './activities'

// Proxy the activities this workflow calls. The timeout is an assumption;
// tune it to the longest you expect a single LLM call to take.
const { loadTestCases, validateQueryResult, summarizeValidationResults } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
})

type TestPromptsWorkflowInput = {
  latestDocumentProcessingId: string
  testName: string
}

type TestPromptsWorkflowOutput = {
  validationResults: {
    query: string
    answer: string
    score: number
    reason: string
  }[],
  summary: string
  averageScore: number
}

export async function testPromptsWorkflow(input: TestPromptsWorkflowInput): Promise<TestPromptsWorkflowOutput> {
  // 1.
  const testCases = await loadTestCases({ testName: input.testName })

  // 2.
  const queries = Object.keys(testCases)

  // 3.
  const childWorkflowHandles = await Promise.all(
    queries.map((query) => {
      return startChild('invokePromptWorkflow', {
        taskQueue: 'invoke-prompt-queue',
        args: [{
          query,
          latestDocumentProcessingId: input.latestDocumentProcessingId,
        }]
      })
    })
  );

  // 4.
  const childWorkflowResponses = await Promise.all(
    childWorkflowHandles.map(async (handle, index) => {
      const query = queries[index];
      const expectedResponse = testCases[query]
      const result = await handle.result();

      return {
        query,
        expectedResponse,
        actualResponse: result.response,
      }
    })
  );

  // 5.
  const validationResults = await Promise.all(
    childWorkflowResponses.map((response) => validateQueryResult(response))
  )

  // 6.
  const { summary, averageScore } = await summarizeValidationResults({ validationResults })

  // 7.
  return {
    validationResults,
    summary,
    averageScore
  }
}

Let’s review what’s happening in this workflow:

  1. The workflow runs an activity called loadTestCases based on the testName provided in the input. This is passed from the client and, in our case, is hard-coded to hatchify-assistant.

  2. The workflow extracts the queries from the loaded test cases.

  3. For each query, it starts a child workflow named invokePromptWorkflow with specific arguments and adds it to the childWorkflowHandles array. The child workflows are started in parallel using Promise.all.

  4. The workflow waits for all child workflows to complete and collects their responses. For each response, it constructs an object containing the query, the expected response from the test cases, and the actual response from the workflow. These objects are stored in the childWorkflowResponses array.

  5. It validates each workflow response against the expected response using the validateQueryResult function. The validation results are stored in the validationResults array.

  6. The workflow summarizes the validation results using the summarizeValidationResults function, which returns a summary and an averageScore.

  7. Finally, it returns an object containing the validationResults, the summary, and the averageScore.

Create the `loadTestCases` activity

In the activities folder, create the file test-prompts-activities.ts and put all of your new activities inside. Start with the activity for loading test cases:

import fs from 'fs'

type LoadTestCasesInput = {
  testName: string
}
type LoadTestCasesOutput = Record<string, string>

export async function loadTestCases(input: LoadTestCasesInput): Promise<LoadTestCasesOutput> {
  const testCases = await fs.promises.readFile(`./test/prompts/${input.testName}.json`, 'utf8')
  return JSON.parse(testCases)
}

The above activity loads the test cases from the specified JSON file and parses them into a plain JavaScript object before returning them to the workflow. As application requirements expand, the activity could be extended to pull from other sources, such as additional JSON files or even an S3 bucket full of files.
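For example, here’s a minimal sketch of an S3-backed variant using the AWS SDK v3. The bucket name (TEST_CASES_BUCKET) and key layout are assumptions for illustration:

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3'

const s3 = new S3Client({})

export async function loadTestCasesFromS3(input: LoadTestCasesInput): Promise<LoadTestCasesOutput> {
  // Hypothetical bucket/key layout: s3://$TEST_CASES_BUCKET/prompts/<testName>.json
  const response = await s3.send(new GetObjectCommand({
    Bucket: process.env.TEST_CASES_BUCKET,
    Key: `prompts/${input.testName}.json`,
  }))

  // SDK v3 response bodies expose transformToString() for reading the stream
  return JSON.parse(await response.Body!.transformToString())
}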

Create the `validateQueryResult` activity

Since you already built the invokePromptWorkflow in the last article, you can move on to the validateQueryResult activity. Add it to the same test-prompts-activities.ts file:

import { createMemoizedOpenAI } from '../chat-gpt'

const getGPTModel = createMemoizedOpenAI("gpt-4-turbo");

type ValidateQueryResultInput = {
  query: string
  expectedResponse: string
  actualResponse: string
}
type ValidateQueryResultOutput = {
  query: string
  answer: string
  score: number
  reason: string
}

export async function validateQueryResult(input: ValidateQueryResultInput): Promise<ValidateQueryResultOutput> {
  // 1.
  const gptModel = getGPTModel()

  // 2.
  const response = await gptModel.invoke([
    [ 'system', 'You are responsible for verifying that the prompt provided invoked the correct response from an LLM. Consider the following question and expected answer, as well as the prompt\'s answer.' ],
    [ 'system', `The question is: ${input.query}` ],
    [ 'system', `The expected answer is: ${input.expectedResponse}` ],
    [ 'system', `The prompt's answer was: ${input.actualResponse}` ],
    [ 'system', 'Provide a score between 0 and 100 representing how well the prompt did at answering the question correctly.' ],
    [ 'system', 'If the expected answer and the prompt\'s answer are essentially the same, the score should be on the high end.' ],
    [ 'system', 'If key details are missing or incorrect in the prompt\'s answer or irrelevant information is included, the score should be lower.' ],
    [ 'system', 'If the prompt\'s answer is entirely wrong, the score should be 0. Put your score and reasoning inside a JSON object using the keys "score" and "reason"'],
    [ 'system', 'You must never consider your own answers, but instead always determine score based on the expected answer above.' ],
  ], {
    response_format: {
      type: 'json_object'
    }
  })

  // 3.
  const result = JSON.parse(response.content.toString()) as {
    reason: string
    score: number
  };

  // 4.
  return {
    query: input.query,
    answer: input.actualResponse,
    score: result.score,
    reason: result.reason,
  }
}

Let’s review what’s going on in this activity:

  1. It initializes a GPT model using the getGPTModel function.

  2. It invokes the GPT model with a series of system prompts. These prompts tell GPT to evaluate a query and its response. The query, expected response, and actual response are provided in the input to the function. The GPT model is expected to return a JSON object containing a score and a reason. The score is a number between 0 and 100 representing how well the query was answered. The reason is a string explaining the score.

  3. The response from the GPT model is parsed into a JavaScript object.

  4. Finally, the function returns an object containing the query, the actual response, and the score and reason from the GPT model.
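One caveat: step 3 trusts the model to return well-formed JSON. Because Temporal retries failed activities, a guard like this sketch (the helper name is illustrative) can reject malformed or out-of-range responses so the retry policy gets another attempt:

import { ApplicationFailure } from '@temporalio/common'

// A minimal sketch: reject malformed model output so Temporal retries the activity.
function parseValidationResponse(raw: string): { score: number; reason: string } {
  try {
    const parsed = JSON.parse(raw)
    if (typeof parsed.score !== 'number' || parsed.score < 0 || parsed.score > 100) {
      throw new Error(`score out of range: ${parsed.score}`)
    }
    return parsed
  } catch (error) {
    throw ApplicationFailure.retryable(`Could not parse validation response: ${error}`)
  }
}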

The above activity uses a utility called createMemoizedOpenAI from ../chat-gpt.ts. This is the same logic that you used in a previous article, but now you’ll extract it into a shared location:

import { ChatOpenAI } from "@langchain/openai"

const { OPENAI_API_KEY } = process.env

export function createMemoizedOpenAI(modelName: string = 'gpt-3.5-turbo') {
  let _gptModel: ChatOpenAI
  return () => {
    if (!_gptModel) {
      _gptModel = new ChatOpenAI({
        openAIApiKey: OPENAI_API_KEY,
        temperature: 0,
        modelName,
      })
    }
    return _gptModel
  }
}

The above code sets up a connection to an OpenAI model, defaulting to gpt-3.5-turbo to preserve the previous articles’ functionality; the new activity passes in gpt-4-turbo instead. The createMemoizedOpenAI function is the last piece required for this activity to run. Now you can move on to the last required activity.
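The “memoized” part means the returned getter constructs the ChatOpenAI client on first use and reuses that same instance for every later call in the worker process. A quick sketch:

const getGPT4 = createMemoizedOpenAI('gpt-4-turbo')

const first = getGPT4()  // creates the ChatOpenAI instance
const second = getGPT4() // returns the cached instance
console.log(first === second) // true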

Create the `summarizeValidationResults` activity

Next, you’ll need to create the summarizeValidationResults activity:

type SummarizeValidationResultsInput = {
  validationResults: ValidateQueryResultOutput[]
};
type SummarizeValidationResultsOutput = {
  summary: string
  averageScore: number
};

export async function summarizeValidationResults(input: SummarizeValidationResultsInput): Promise<SummarizeValidationResultsOutput> {
  // 1.
  const gptModel = getGPTModel()

  // 2.
  const comments = input.validationResults.map((validation) => validation.reason)
  const averageScore = input.validationResults.reduce((total, validation) => total + validation.score, 0) / input.validationResults.length;

  // 3.
  const response = await gptModel.invoke([
    [ 'system', 'You are responsible for summarizing comments about an LLM prompt\'s effectiveness.' ],
    [ 'system', 'Consider the following comments. Each one was made by a person providing feedback about the quality of a prompt\'s output.' ],
    [ 'system', `Comments:\n${comments.join("\n\n")}` ],
    [ 'system', 'You must provide a concise summary of common patterns in the comments, good or bad.' ],
  ])

  // 4.
  return {
    summary: response.content.toString(),
    averageScore
  }
}

Let’s review what this activity does:

  1. It initializes a GPT model using the getGPTModel function, which is reused from the previous activity you wrote.

  2. It extracts the comments from the validation results and then calculates the average score.

  3. It invokes the GPT model with a series of system prompts. These prompts instruct the model to summarize the comments about the effectiveness of a prompt's output, which are provided in the input to the function. The GPT model is expected to return a summary of the comments.

  4. Finally, the function returns an object containing the summary from the GPT model and the averageScore.

Build the Temporal Worker

Now it’s time to build the part of the code that will actually execute your workflow and activities: the Temporal Worker. Create the worker in a file called test-prompts-worker.ts:

import { NativeConnection, Worker } from '@temporalio/worker';
import * as activities from './activities';

const TEMPORAL_ADDRESS = process.env.TEMPORAL_ADDRESS

async function run() {
  // 1.
  const connection = await NativeConnection.connect({
    address: TEMPORAL_ADDRESS
  });

  // 2.
  const worker = await Worker.create({
    connection,
    namespace: 'default',
    taskQueue: 'prompt-testing-queue',
    workflowsPath: require.resolve('./workflows'),
    activities
  });

  // 3.
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});

Let’s review what the above code does:

  1. It establishes a connection to the Temporal server using the NativeConnection.connect method and the server address.

  2. It creates a Temporal worker using the Worker.create method. The worker is configured to use the established connection, the "default" namespace, a task queue named "prompt-testing-queue", and the workflows and activities created earlier.

  3. It starts the worker using the worker.run method. The worker will now listen for tasks from the prompt-testing-queue and execute them.

Build the Temporal Client

Now that your workflow is complete, you need a way to run it. Create a Node.js script (using TypeScript) called test-prompts-client.ts containing a Temporal Client that will trigger the workflow:

import { Connection, Client } from '@temporalio/client';
import { testPromptsWorkflow } from './workflows';
import { nanoid } from 'nanoid';

async function run() {
  const connection = await Connection.connect({ address: 'localhost:7233' });
  const client = new Client({
    connection
  });

  const [ latestDocumentProcessingId ] = process.argv.slice(2)
  const id = `test-prompts-workflow-${nanoid()}`.toLowerCase().replaceAll('_', '')
  const handle = await client.workflow.start(testPromptsWorkflow, {
    taskQueue: 'prompt-testing-queue',
    args: [{
      latestDocumentProcessingId,
      testName: "hatchify-assistant"
    }],
    workflowId: id
  });

  console.log(`Workflow ${handle.workflowId} running`);
  console.log(await handle.result());
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});

Add a test-prompts script to your package.json so you can easily run it from the command line in your project:

"scripts": {
  "build": "tsc --build",
  "start": "ts-node src/process-documents-worker.ts && ts-node src/invoke-prompt-worker.ts",
  "process-documents": "ts-node src/process-documents-client.ts",
  "invoke-prompt": "ts-node src/invoke-prompt-client.ts",
  "test-prompts": "ts-node src/test-prompts-client.ts"
},
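Note that the testPromptsWorkflow also needs its worker listening on the prompt-testing-queue. If you run workers manually instead of through the repo’s Docker setup, a script along these lines (the name is illustrative) will start the worker you created above:

"test-prompts-worker": "ts-node src/test-prompts-worker.ts"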

Running the workflow

With everything in place, you're ready to run your new workflow. The easiest way to get up and running is to clone the repo we created, which includes a docker-compose.yml file that sets up all of the necessary services for you, including Temporal, worker runners, and a PostgreSQL database. The Docker-based configuration is great for local development; in production, you would use Temporal Cloud for better reliability, scaling, and uptime. Assuming you've followed the instructions in the README.md file, you can now run the command to test prompts:

npm run test-prompts

Once the workflow finishes running, you’ll see the object returned from the workflow, including the validationResults, followed by the summary and averageScore.

{
  validationResults: [
    {
      query: 'How can I create a custom route?',
      answer: 'To create a custom route in your Hatchify application, you can use the Koa Router to define the route and its associated logic. Here is an example of how you can create a custom route in TypeScript using Hatchify and Koa:\n' +
        '\n' +
        '1. First, ensure you have the necessary dependencies installed. If you are using Koa, you can install the Koa Router:\n' +
        '\n' +
        '```bash\n' +
        'npm install @koa/router\n' +
        'npm install @types/koa__router --save-dev\n' +
        '```\n' +
        '\n' +
        '2. Next, you can create a custom route by defining a new route using the Koa Router and adding your custom logic. Here is an example:\n' +
        '\n' +
        '```typescript\n' +
        'import Koa from "koa";\n' +
        'import KoaRouter from "@koa/router";\n' +
        'import { hatchifyKoa, UnexpectedValueError } from "@hatchifyjs/koa";\n' +
        'import * as Schemas from "../schemas";\n' +
        '\n' +
        'const app = new Koa();\n' +
        'const hatchedKoa = hatchifyKoa(Schemas, {\n' +
        '  prefix: "/api",\n' +
        '  database: {\n' +
        '    uri: process.env.DB_URI,\n' +
        '  },\n' +
        '});\n' +
        '\n' +
        '// Creating a Koa Router\n' +
        'const router = new KoaRouter();\n' +
        '\n' +
        '// Adding the custom route\n' +
        'router.get("/custom-route", async function customRouteHandler(ctx): Promise<void> {\n' +
        '  // Add your custom logic here\n' +
        '  ctx.body = "Custom Route Response";\n' +
        '});\n' +
        '\n' +
        '// Mount the router middleware\n' +
        'app.use(router.routes());\n' +
        '\n' +
        '// Use the Hatchify middleware for all models\n' +
        'app.use(hatchedKoa.middleware.allModels.all);\n' +
        '\n' +
        '// Start the server\n' +
        'app.listen(3000, () => {\n' +
        '  console.log("Server running on http://localhost:3000");\n' +
        '});\n' +
        '```\n' +
        '\n' +
        'In this example, we create a custom route `/custom-route` that responds with a simple message. You can replace the logic inside the route handler function with your custom business logic.\n' +
        '\n' +
        "Remember to adjust the route path, HTTP method, and logic according to your application's requirements. This example demonstrates the basic structure of creating a custom route in a Hatchify application using Koa.",
      score: 95,
      reason: "The prompt's answer closely matches the expected answer, providing a detailed and accurate guide on how to create a custom route in a Hatchify application using Koa Router. Both answers include similar steps, code examples, and explanations. The slight difference in the route path in the example code ('/custom-route' vs '/api/custom-route') is minor and does not significantly impact the overall correctness of the response."
    },
    {
      query: 'How can I see a list of all endpoints?',
      answer: 'To see a list of all endpoints generated by Hatchify, you can use the `printEndpoints()` method provided by the Hatchify middleware. This method will print out a list of all the endpoints that have been generated based on your schemas.\n' +
        '\n' +
        'Here is an example of how you can use the `printEndpoints()` method in your backend code:\n' +
        '\n' +
        '```typescript\n' +
        'import { hatchifyKoa } from "@hatchifyjs/koa";\n' +
        'import * as Schemas from "../schemas";\n' +
        '\n' +
        'const hatchedKoa = hatchifyKoa(Schemas, {\n' +
        '  prefix: "/api",\n' +
        '  database: {\n' +
        '    uri: process.env.DB_URI,\n' +
        '  },\n' +
        '});\n' +
        '\n' +
        '// This will print out a list of all endpoints generated by Hatchify\n' +
        'hatchedKoa.printEndpoints();\n' +
        '```\n' +
        '\n' +
        'By calling `hatchedKoa.printEndpoints()`, you will get a list of all the endpoints that have been created based on your schemas. This can help you understand and visualize the API routes available in your application.',
      score: 95,
      reason: "The prompt's answer accurately describes the method to see a list of all endpoints using the `printEndpoints()` method provided by Hatchify, similar to the expected answer. It includes a code example that is very similar to the expected answer, correctly showing the use of `hatchifyKoa` and the `printEndpoints()` method. The only minor difference is the additional detail about the database configuration in the code example, which is not incorrect but slightly deviates from the simplicity of the expected answer. This additional detail does not detract significantly from the correctness of the response regarding the original question."
    },
    {
      query: 'How can I setup custom validation for an endpoint?',
      answer: 'To set up custom validation for an endpoint in a Hatchify application, you can follow these steps:\n' +
        '\n' +
        '1. **Define Your Schema with Validation Logic**:\n' +
        '   - Define your schema with custom validation logic. You can add validation functions to your schema that will be executed during create or update operations.\n' +
        '   - Here is an example of adding custom validation to an `Employee` schema:\n' +
        '\n' +
        '   ```typescript\n' +
        ' // hatchify-app/schemas.ts\n' +
        ' import { datetime, string } from "@hatchify/core"\n' +
        ' import type { PartialSchema } from "@hatchify/core"\n' +
        ' import { UnexpectedValueError } from "@hatchifyjs/koa"\n' +
        '\n' +
        ' export const Employee = {\n' +
        ' name: "Employee",\n' +
        ' attributes: {\n' +
        ' firstName: string(),\n' +
        ' lastName: string(),\n' +
        ' startDate: datetime(),\n' +
        ' endDate: datetime(),\n' +
        ' },\n' +
        ' validation: {\n' +
        ' startDateBeforeEndDate() {\n' +
        ' const { startDate, endDate } = this as unknown as { startDate: Date; endDate: Date }\n' +
        ' if (startDate && endDate && startDate > endDate) {\n' +
        ' throw [\n' +
        ' new UnexpectedValueError({\n' +
        ' detail: "Start date cannot be after end date.",\n' +
        ' pointer: "data/attributes/startDate",\n' +
        ' }),\n' +
        ' ]\n' +
        ' }\n' +
        ' },\n' +
        ' },\n' +
        ' } satisfies PartialSchema\n' +
        ' ```\n' +
        '\n' +
        '2. **Implement Custom Logic in Your Endpoint**:\n' +
        ' - In your endpoint logic, you can access the parsed request data, perform custom validation, and handle errors accordingly.\n' +
        ' - Here is an example of custom validation logic in a POST request for creating assignments:\n' +
        '\n' +
        ' ```typescript\n' +
        ' // hatchify-app/backend/index.ts\n' +
        ' import { hatchifyKoa, Op, UnexpectedValueError } from "@hatchifyjs/koa"\n' +
        ' import { Assignment, Employee } from "../schemas.js"\n' +
        '\n' +
        ' router.post("/api/assignments", async (ctx, next) => {\n' +
        ' const createOptions = hatchedKoa.parse.Assignment.create(ctx.body)\n' +
        ' const { startDate, endDate, employeeId } = <Assignment>ctx.body\n' +
        '\n' +
        ' await hatchedKoa.orm.transaction(async (transaction) => {\n' +
        ' let assignmentsForEmployee = await hatchedKoa.orm.models.Assignment.findAll({\n' +
        ' where: { employeeId },\n' +
        ' transaction,\n' +
        ' })\n' +
        '\n' +
        ' assignmentsForEmployee = await hatchedKoa.orm.models.Assignment.findAll({\n' +
        ' where: {\n' +
        ' employeeId,\n' +
        ' startDate: { [Op.gt]: startDate },\n' +
        ' endDate: { [Op.lt]: endDate },\n' +
        ' },\n' +
        ' transaction,\n' +
        ' })\n' +
        '\n' +
        ' if (assignmentsForEmployee.length) {\n' +
        ' throw [\n' +
        ' new UnexpectedValueError({\n' +
        ' detail: "Employee already assigned.",\n' +
        ' pointer: "data/attributes/startDate",\n' +
        ' }),\n' +
        ' ]\n' +
        ' }\n' +
        '\n' +
        ' const assignment = await hatchedKoa.orm.models.Assignment.create(createOptions.body, { ...createOptions.ops, transaction })\n' +
        ' const result = hatchedKoa.serialize.Assignment.create(assignment)\n' +
        ' })\n' +
        '\n' +
        ' ctx.status = 201\n' +
        ' ctx.body = result\n' +
        ' })\n' +
        ' ```\n' +
        '\n' +
        '3. **Handle Custom Validation Errors**:\n' +
        ' - Ensure that your custom validation logic throws errors when validation fails. These errors can be caught and handled to provide appropriate responses to the client.\n' +
        '\n' +
        'By following these steps, you can set up custom validation for an endpoint in your Hatchify application to enforce specific business rules and data integrity requirements.',
      score: 95,
      reason: "The prompt's answer closely matches the expected answer in terms of content, structure, and detail. It provides a comprehensive guide on setting up custom validation for an endpoint in a Hatchify application, including defining a schema with validation logic, implementing custom logic in the endpoint, and handling validation errors. The slight variations in code examples and additional details about handling errors in endpoint logic do not detract from the overall accuracy and relevance of the response, hence the high score."
    },
    {
      query: 'How can I define an attribute as an optional integer?',
      answer: 'To define an attribute as an optional integer in the Hatchify open-source project using TypeScript, you can use the `integer()` function with the `required` parameter set to `false`. Here is an example:\n' +
        '\n' +
        '```typescript\n' +
        "import { integer } from 'hatchify';\n" +
        '\n' +
        'export const Todo = {\n' +
        ' name: "Todo",\n' +
        ' attributes: {\n' +
        ' optionalNumber: integer({ required: false }),\n' +
        ' },\n' +
        '};\n' +
        '```\n' +
        '\n' +
        'In this example, the `optionalNumber` attribute is defined as an optional integer. This means that the attribute can have a value of type integer or `null`.',
      score: 95,
      reason: "The prompt's answer accurately addresses how to define an optional integer attribute using the Hatchify open-source project in TypeScript, similar to the expected answer. It correctly uses the `integer()` function with the `required: false` parameter. The only minor discrepancy is the import path (`'hatchify'` instead of `'@hatchifyjs/core'`), which could be due to a version difference or a simplification in the example. This small detail does not significantly impact the correctness of the response regarding the functionality described."
    },
    {
      query: 'How can I define schema relationships?',
      answer: 'To define schema relationships in Hatchify, you can use the `belongsTo`, `hasMany`, and `hasOne` functions. Here is an example of defining relationships between `Player` and `Team` models using `belongsTo` and `hasMany`:\n' +
        '\n' +
        '```typescript\n' +
        'import { belongsTo, hasMany, string } from "@hatchifyjs/core"\n' +
        'import type { PartialSchema } from "@hatchifyjs/core"\n' +
        '\n' +
        'export const Player = {\n' +
        ' name: "Player",\n' +
        ' attributes: {\n' +
        ' firstName: string(),\n' +
        ' lastName: string(),\n' +
        ' },\n' +
        ' relationships: {\n' +
        ' team: belongsTo("Team"),\n' +
        ' },\n' +
        '} satisfies PartialSchema\n' +
        '\n' +
        'export const Team = {\n' +
        ' name: "Team",\n' +
        ' attributes: {\n' +
        ' name: string(),\n' +
        ' },\n' +
        ' relationships: {\n' +
        ' players: hasMany("Player"),\n' +
        ' },\n' +
        '} satisfies PartialSchema\n' +
        '```\n' +
        '\n' +
        'In this example, a `Player` belongs to a `Team`, and a `Team` has many `Players`. This establishes a one-to-many relationship between the two models. You can further explore more complex relationships like many-to-many using the `through` option as shown in the documentation.',
      score: 85,
      reason: "The prompt's answer correctly addresses how to define schema relationships in Hatchify using the `belongsTo` and `hasMany` functions, providing a clear example involving `Player` and `Team` models. However, it does not mention the `hasOne` relationship, which is included in the expected answer. This omission of one relationship type slightly reduces the completeness of the response, but the core concept and other details are accurately and effectively communicated."
    },
    {
      query: 'How can I query for data created between two dates?',
      answer: "To query for data created between two dates using Hatchify, you can use the `$gte` (greater than or equal to) and `$lte` (less than or equal to) operators on the date attribute. Here's an example of how you can filter data created between two dates:\n" +
        '\n' +
        '```typescript\n' +
        '// Example query to filter data created between two dates\n' +
        'const startDate = "2023-01-01";\n' +
        'const endDate = "2023-12-31";\n' +
        '\n' +
        '// Filter for data created between startDate and endDate\n' +
        'const filterQuery = `filter[createdAt][$gte]=${startDate}&filter[createdAt][$lte]=${endDate}`;\n' +
        '```\n' +
        '\n' +
        'In this example, `createdAt` is the attribute representing the creation date of the data. Adjust the attribute name according to your data model. The `startDate` and `endDate` variables represent the range of dates you want to filter. Update these values as needed for your specific use case.\n' +
        '\n' +
        'Include this filter query in your API request to retrieve data created between the specified dates.',
      score: 95,
      reason: "The prompt's answer closely matches the expected answer, providing a correct and detailed explanation on how to query for data created between two dates using the `$gte` and `$lte` operators. Both answers include similar code examples and explanations. The slight difference is in the additional guidance on how to use the filter query in an API request provided in the expected answer, which is briefly mentioned but not detailed in the prompt's answer. However, this does not significantly detract from the overall correctness and completeness of the response."
    },
    {
      query: 'How can I query data with pagination?',
      answer: 'To query data with pagination using Hatchify, you can follow the guidelines provided in the documentation. Here are the steps to perform pagination:\n' +
        '\n' +
        '### Offset and Limit Pagination\n' +
        '\n' +
        '1. Use the `page[offset]` parameter to specify the starting point in the dataset.\n' +
        '2. Use the `page[limit]` parameter to determine the maximum number of records to include on each page.\n' +
        '\n' +
        'For example, to request 10 todos, skipping the first 5, you can make the following request:\n' +
        '\n' +
        '```curl\n' +
        'GET /api/todos?page[offset]=5&page[limit]=10\n' +
        '```\n' +
        '\n' +
        '### Page-Based Pagination\n' +
        '\n' +
        '1. Use the `page[number]` parameter to specify the desired page number.\n' +
        '2. Use the `page[size]` parameter to determine the number of records per page.\n' +
        '\n' +
        'For example, to request the second page of todos where each page has 20 results, you can make the following request:\n' +
        '\n' +
        '```curl\n' +
        'GET /api/todos?page[number]=2&page[size]=20\n' +
        '```\n' +
        '\n' +
        'By following these techniques, you can effectively paginate data in your Hatchify-based application.',
      score: 95,
      reason: "The prompt's answer closely matches the expected answer, providing correct and detailed steps for both offset and limit pagination as well as page-based pagination using the Hatchify project. The examples given are accurate and align with the expected answer. The slight difference in wording does not affect the accuracy or completeness of the response."
    }
  ],
  summary: "The comments generally indicate that the prompt's answers are highly effective, closely matching the expected answers in terms of accuracy, detail, and relevance. Key observations include:\n" +
    '\n' +
    '1. **Accuracy and Completeness**: The prompt consistently provides accurate and detailed responses that align well with the expected answers. It effectively covers essential aspects of the questions, such as code examples and step-by-step guides.\n' +
    '\n' +
    "2. **Minor Discrepancies**: While the prompt's answers are largely accurate, there are minor discrepancies noted, such as slight variations in code paths, additional details not present in the expected answers, and minor omissions (e.g., not mentioning the `hasOne` relationship). These discrepancies do not significantly impact the overall correctness or utility of the responses.\n" +
    '\n' +
    '3. **Additional Details**: In some cases, the prompt provides additional details not found in the expected answers (e.g., database configuration details). These are generally seen as enhancing the response rather than detracting from it, although they sometimes deviate from the simplicity of the expected answers.\n' +
    '\n' +
    '4. **Slight Variations in Examples**: There are slight variations in the code examples provided by the prompt compared to the expected answers. These variations are minor and do not affect the fundamental correctness of the responses.\n' +
    '\n' +
    'Overall, the prompt is effective in delivering comprehensive and accurate information that is highly relevant to the questions asked, with only minor areas for improvement in terms of simplifying or precisely matching the expected answers.',
  averageScore: 93.57142857142857
}

Using the results and summary

The new workflow provides some nice benefits:

  1. You can begin iterating on your inputs with the goal of increasing the averageScore provided by the LLM.

  2. You can integrate it into your Continuous Integration workflow, like GitHub Actions, to monitor pull requests to the documentation and block merges that don’t meet a specific score (see the sketch after this list).

  3. End users can enjoy your service since you now have a great way to provide reliable and accurate answers.
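To build that CI gate, the client script from earlier already receives the workflow result, so the check can be as simple as comparing averageScore against a threshold and exiting nonzero. A minimal sketch, extending the client’s run() function with an assumed MINIMUM_SCORE environment variable:

const MINIMUM_SCORE = Number(process.env.MINIMUM_SCORE ?? 90) // assumed threshold

const { averageScore, summary } = await handle.result()
console.log(summary)

if (averageScore < MINIMUM_SCORE) {
  console.error(`Average score ${averageScore} is below the required ${MINIMUM_SCORE}`)
  process.exit(1) // a nonzero exit fails the CI job and blocks the merge
}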

Let’s focus on that third benefit. You’ve created a durable workflow that uses one LLM to test the output of another LLM. Think about what that means for improving user experience! Most feedback cycles are reactive, occurring after the user has experienced the end result. Your new LLM-based solution allows you to improve the reliability and quality of content before it ever reaches a user.

Benefits of using Temporal for prompt validation

Using Temporal's Durable Execution abstraction for this use case has benefits similar to the LLM prompting workflow from the previous article: the workflow can never unintentionally fail, and it can run for as long as necessary. The independent nature of Activities also makes it easy to scale your test suites over time.

Your tests are fault tolerant

Since these tests inherently rely on network requests to a third-party service, failures due to network issues or outages at the third-party provider are not uncommon. These failures would ordinarily cause the test suite to fail, but with Temporal, you get retries with essentially no added development effort. Once the issue is resolved, the test suite picks up where it left off, which eliminates the need to rerun the whole suite after an intermittent failure.
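Retry behavior is configured where the activities are proxied in workflows.ts. A sketch with illustrative values (tune them to your provider’s rate limits and reliability):

const { validateQueryResult, summarizeValidationResults } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
  retry: {
    initialInterval: '2 seconds', // wait before the first retry
    backoffCoefficient: 2,        // double the wait after each failure
    maximumInterval: '1 minute',  // cap the wait between attempts
    maximumAttempts: 10,
  },
})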

Your tests are scalable

This example only has a handful of test cases for a single prompt, but as your application evolves and the number of prompts grows, you may find yourself with hundreds or thousands of test cases. Running that many tests sequentially would be slow, and running them all in parallel would be extremely resource-intensive. Temporal provides a lot of flexibility here: for example, the test cases could be divided into smaller batches and run as independent activities, and the worker cluster can then be scaled to meet the demands of the active test suites and the needs of any enterprise customer.
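Here’s a sketch of what that batching could look like inside the workflow. The batch size and the validateQueryResultBatch activity are hypothetical:

// Split the responses into batches so each activity validates a manageable chunk
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Fan out one (hypothetical) batch-validation activity per 25 responses
const batches = chunk(childWorkflowResponses, 25)
const batchResults = await Promise.all(
  batches.map((batch) => validateQueryResultBatch({ batch }))
)
const validationResults = batchResults.flat()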

Your tests have enhanced observability and debugging

Temporal provides excellent observability and debugging capabilities. Each workflow execution is tracked with a detailed event history, which includes every state transition and retry attempt. This visibility makes it easier to understand what went wrong if a test fails, as you can trace the entire execution path. You can inspect logs, view state changes, and even replay workflows to reproduce issues. This level of observability simplifies troubleshooting and ensures that any issues can be quickly identified and resolved, leading to more reliable and maintainable test suites.

Conclusion

Thanks for joining us on this adventure using Temporal to build LLM-powered applications. We hope you’ve enjoyed innovating with us. If you’d like to learn more about how we use Temporal for LLMs and other projects, join our Community Discord.