
This article will teach you how to structure the response of an LLM such as GPT-4 or Llama 3 using validation libraries in Python.
This is a very relevant topic: extracting structured information, for example in JSON format, is fundamental for data mining tasks, where precise information must be pulled out of an unstructured source such as free text.
In addition, structured response formats are not fully reliable even in the most popular commercial systems, such as GPT, due to the LLM’s stochastic nature in generating output tokens.
We will use several libraries: Pydantic and Instructor for validation and schema modeling, and the OpenAI client and ollama for the LLM part. The proposed content is valid both for closed-source models, such as GPT by OpenAI or the models by Anthropic, and for open source models such as Llama 3.
By reading this article you will learn:
- what a data model is and how to define one
- how to make sure that your LLM respects the output format through validation rules
- how to use the Instructor and Pydantic libraries
Enjoy the read!
Why do we need structured output?
LLMs like GPT-4 can provide enormous value even without structuring their responses around a specific pattern. However, it is important (especially for programmers and those who work with data) that the model can respect a response pattern when the user requires it.
Starting from a particular version of GPT-3.5, OpenAI added the response_format parameter to its completions API. This lets the user set its type key to json_object, guiding the model towards a response better suited to the entered prompt.
Here’s an example:
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
)
print(response.choices[0].message.content)

>>> {"winner": "Los Angeles Dodgers"}
However, this logic does not always work. In fact, OpenAI’s documentation suggests writing the word "JSON" in the prompt precisely to guide GPT in generating it. This tip is so important that the API requires the word to appear somewhere in the messages whenever we use response_format={ "type": "json_object" }.
Why is it difficult for LLMs to produce consistent JSON output?
This is because LLMs are effectively machines that, given an input prompt, return the token most likely to follow the previous ones. A rigid format like JSON is rarely encountered "in nature", unless the model has been expressly guided during the training phase to see and understand such formats.
The JSON mode of newer LLMs does not guarantee that the output matches a specific schema, only that it is valid JSON that parses without errors.
It therefore remains important to be able to validate what is inside these outputs and raise exceptions and errors if they are not consistent with our data model.
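To make this concrete before we dive into the use case, here is a minimal sketch of that validation step using plain Pydantic (the Winner model is a hypothetical schema for the world-series example above):

from pydantic import BaseModel, ValidationError

class Winner(BaseModel):
    winner: str

# Raw string as an LLM would return it
raw_output = '{"winner": "Los Angeles Dodgers"}'

try:
    # Parses the JSON string and validates it against the schema in one step
    parsed = Winner.model_validate_json(raw_output)
    print(parsed.winner)
except ValidationError as e:
    # Raised when the output is malformed JSON or missing required fields
    print(e)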
Use Case
As mentioned, we will look at the example of extracting JSON information from a simple question to an LLM such as GPT-4 or Llama 3.
We could ask anything, but we will ask the model questions about the winners of the Soccer World Cup over time.
In particular we want to extract:
- Date of the final
- Host nation of the tournament
- Winning team
- Top scorers
We will not worry about validating the accuracy of the data, but only about adapting the LLM’s textual response to the schema we are about to define. We will walk through this example step by step in the rest of the article.
Required dependencies
Let’s now see the dependencies to install to run this tutorial.
Obviously, assuming that we already have an active development environment, we are going to install Pydantic, Instructor, the OpenAI client and ollama.
- Pydantic: the most famous data model definition and validation library, loved by the community for its ease of use, efficiency and relevance in data science
- Instructor: essentially a wrapper around Pydantic specialized for working with LLMs; it is the library that will allow us to create the validation logic
- OpenAI: the famous client for querying GPT and other OpenAI models
- ollama: a very convenient interface to open source LLMs like Llama 3

In our development environment, we issue the command to begin:
pip install pydantic instructor openai ollama
Since we also want to test open source models, the next step is to install ollama system-wide. You can learn how to install and use ollama by reading this dedicated article
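For reference, at the time of writing the official Linux install is a one-liner (installers for macOS and Windows are available on ollama.com):

curl -fsSL https://ollama.com/install.sh | sh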
Now we can focus on development.
Definition of a data model
A data model is a logical pattern used to structure data. Data models appear in many contexts, from defining tables in databases to validating input data.
I’ve already covered a bit of data modeling with Pydantic for data science and machine learning in a previous post.
Let’s start by creating Pydantic data models:
from pydantic import BaseModel, Field
from typing import List
import datetime

class SoccerData(BaseModel):
    date_of_final: datetime.date = Field(..., description="Date of the final event")
    hosting_country: str = Field(..., description="The nation hosting the tournament")
    winner: str = Field(..., description="The soccer team that won the final cup")
    top_scorers: list = Field(
        ..., description="A list of the top 3 scorers of the tournament"
    )

class SoccerDataset(BaseModel):
    reports: List[SoccerData] = []
In this script we import the BaseModel and Field classes from Pydantic and use them to create a data model: the structure that our final result must have.
Pydantic requires that we declare the type of each field in the model. For example, datetime.date forces the date_of_final field to be a date and not a string. Likewise, the top_scorers field must be a list, otherwise Pydantic will raise a validation error.
Finally, we create a data model that collects multiple instances of SoccerData. It is called SoccerDataset and will be used by Instructor to validate the presence of multiple reports, not just one.
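To see this validation at work before involving any LLM, we can feed the model deliberately bad data; a quick sketch:

from pydantic import ValidationError

try:
    SoccerData(
        date_of_final="not a date",  # fails: cannot be parsed as a date
        hosting_country="England",
        winner="England",
        top_scorers="Player A",      # fails: a string, not a list
    )
except ValidationError as e:
    print(e)  # Pydantic reports both offending fields at once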
Creating the system prompt
Very simply, we will write in English what the model must do, emphasizing the intent and structure of the result and providing examples.
system_prompt = """You are an expert sports journalist. You'll be asked to create a small report on who won the soccer world cups in specific years.
You'll report the date of the tournament's final, the top 3 scorers of the entire tournament, the winning team, and the nation hosting the tournament.
Return a JSON object with the following fields: date_of_final, hosting_country, winner, top_scorers.
If multiple years are inputted, separate the reports with a comma.
Here's an example
[
    {
        "date_of_final": "1966",
        "hosting_country": "England",
        "winner": "England",
        "top_scorers": ["Player A", "Player B", "Player C"]
    },
    {
        "date_of_final": ...
        "hosting_country": ...
        "winner": ...
        "top_scorers": ...
    },
]
Here's the years you'll need to report on:
"""
This prompt will be used as a system prompt and will simply allow us to pass the years of interest separated by a comma.
Creating the Instructor code
Here we will create the main logic of JSON validation and structuring thanks to Instructor, which exposes an interface similar to the one the OpenAI client provides for calling GPT via API.
First we’ll use OpenAI in a function called query_gpt that allows us to parameterize our prompt:
from openai import OpenAI
import instructor
def query_gpt(prompt: str) -> list:
client = instructor.from_openai(OpenAI(api_key="..."))
resp = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=SoccerDataset,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt},
],
)
return resp.model_dump_json(indent=4)
Let’s remember to pass our OpenAI API key to the newly created client. We will use GPT-3.5-Turbo, passing SoccerDataset as the response_model. Of course, we could use "gpt-4o", the most powerful model at the time of writing this article, if we wanted.
We don’t use SoccerData, but SoccerDataset. If we used the former, the LLM would only ever return a single result.
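Instructor can also re-ask the model when validation fails: its max_retries parameter sends Pydantic’s validation errors back to the LLM so it can correct itself. A sketch of the same call with retries enabled:

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=SoccerDataset,
    max_retries=2,  # on a validation error, re-prompt the model up to 2 more times
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
)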
Let’s put everything together and launch the software, passing the years "2010, 2014, 2018" as the content of the user prompt from which we want to generate the structured report.
from openai import OpenAI
import instructor
from typing import List
from pydantic import BaseModel, Field
import datetime

class SoccerData(BaseModel):
    date_of_final: datetime.date = Field(..., description="Date of the final event")
    hosting_country: str = Field(..., description="The nation hosting the tournament")
    winner: str = Field(..., description="The soccer team that won the final cup")
    top_scorers: list = Field(
        ..., description="A list of the top 3 scorers of the tournament"
    )

class SoccerDataset(BaseModel):
    reports: List[SoccerData] = []

system_prompt = """You are an expert sports journalist. You'll be asked to create a small report on who won the soccer world cups in specific years.
You'll report the date of the tournament's final, the top 3 scorers of the entire tournament, the winning team, and the nation hosting the tournament.
Return a JSON object with the following fields: date_of_final, hosting_country, winner, top_scorers.
If the query is invalid, return an empty report.
If multiple years are inputted, separate the reports with a comma.
Here's an example
[
    {
        "date_of_final": "1966",
        "hosting_country": "England",
        "winner": "England",
        "top_scorers": ["Player A", "Player B", "Player C"]
    },
    {
        "date_of_final": ...
        "hosting_country": ...
        "winner": ...
        "top_scorers": ...
    },
]
Here's the years you'll need to report on:
"""

def query_gpt(prompt: str) -> list:
    client = instructor.from_openai(OpenAI())
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=SoccerDataset,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.model_dump_json(indent=4)
if __name__ == "__main__":
    resp = query_gpt("2010, 2014, 2018")
    print(resp)
This is the result:
{
    "reports": [
        {
            "date_of_final": "2010-07-11",
            "hosting_country": "South Africa",
            "winner": "Spain",
            "top_scorers": [
                "Thomas Müller",
                "David Villa",
                "Wesley Sneijder"
            ]
        },
        {
            "date_of_final": "2014-07-13",
            "hosting_country": "Brazil",
            "winner": "Germany",
            "top_scorers": [
                "James Rodríguez",
                "Thomas Müller",
                "Neymar"
            ]
        },
        {
            "date_of_final": "2018-07-15",
            "hosting_country": "Russia",
            "winner": "France",
            "top_scorers": [
                "Harry Kane",
                "Antoine Griezmann",
                "Romelu Lukaku"
            ]
        }
    ]
}
Fantastic! GPT-3.5-Turbo followed our prompt perfectly, and Instructor validated the fields, creating a structure consistent with the data model. In fact, the result is not a plain string, as an LLM like GPT would typically return, but a validated Pydantic object that we serialize to JSON at the end.
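If we returned resp itself instead of resp.model_dump_json(...), we could work with the validated object directly; a small sketch, assuming query_gpt is modified to return the SoccerDataset instance:

resp = query_gpt("2010, 2014, 2018")  # hypothetical variant returning the object
for report in resp.reports:
    # Each report is a validated SoccerData instance with typed fields
    print(report.date_of_final.year, report.winner)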
Now let’s try to insert an input that makes no sense.
if __name__ == "__main__":
    print(query_gpt("hi, how are you?"))

>>>
{
    "reports": []
}
The LLM correctly returns an empty report, because this is how we asked it to handle invalid queries in the system prompt.
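This works because reports was declared with a default empty list, so an "empty report" is itself a valid SoccerDataset:

print(SoccerDataset().model_dump())
# {'reports': []}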
Using open source models with Instructor
We have seen how to use GPT with Instructor to get structured JSON output. Now let’s see how to use ollama to do the same with open source models like Llama 3.
Remember that you need to download Llama 3 via ollama before using it: run the ollama pull llama3 command to download it!
Let’s create a new function called query_llama.
def query_llama(prompt: str) -> list:
    client = instructor.from_openai(
        OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama",  # required value, but not actually used
        ),
        mode=instructor.Mode.JSON,
    )
    resp = client.chat.completions.create(
        model="llama3",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        response_model=SoccerDataset,
    )
    return resp.model_dump_json(indent=4)
There are some differences with the GPT code. Let’s see them.
- ollama is called through the same interface as GPT, but with a different base URL (base_url) and an API key that is required by the client but not actually checked by ollama (don’t ask me why)
- you need to enable JSON mode through the mode parameter

Let’s run the new function:
if __name__ == "__main__":
    print(query_llama("2010, 2014, 2018"))
and here are the results:
{
    "reports": [
        {
            "date_of_final": "2010-07-11",
            "hosting_country": "South Africa",
            "winner": "Spain",
            "top_scorers": [
                "Thomas Müller",
                "Wolfram Toloi",
                "Landon Donovan"
            ]
        },
        {
            "date_of_final": "2014-07-13",
            "hosting_country": "Brazil",
            "winner": "Germany",
            "top_scorers": [
                "James Rodríguez",
                "Miroslav Klose",
                "Thomas Müller"
            ]
        },
        {
            "date_of_final": "2018-07-15",
            "hosting_country": "Russia",
            "winner": "France",
            "top_scorers": [
                "Harry Kane",
                "Kylian Mbappé",
                "Antoine Griezmann"
            ]
        }
    ]
}
We have a list with correct JSON! All this locally with Llama 3.
As I mentioned before, validation occurs for structure, not content. In fact, the content is different from that generated by GPT.
Notice how the scorers differ. Perhaps it is possible to obtain the correct list by iterating on the prompt, specifying clearly which scorers we want to receive.
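If we wanted validation to cover some of the content too, Pydantic’s field_validator lets us encode such rules. A sketch (the StrictSoccerData model is hypothetical) that rejects any report without exactly three scorers; combined with Instructor’s retries, the error message would be fed back to the model:

from pydantic import BaseModel, Field, field_validator
from typing import List
import datetime

class StrictSoccerData(BaseModel):
    date_of_final: datetime.date
    hosting_country: str
    winner: str
    top_scorers: List[str] = Field(..., description="A list of the top 3 scorers")

    @field_validator("top_scorers")
    @classmethod
    def exactly_three(cls, scorers: List[str]) -> List[str]:
        # Content-level rule: the report must list exactly 3 scorer names
        if len(scorers) != 3:
            raise ValueError("expected exactly 3 top scorers")
        return scorers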
Conclusion
We have seen how to use Pydantic, Instructor and ollama to drive the output of an LLM into a structured format such as JSON.
Remember that the model is only guided through this process: there will still be cases where the JSON schema is not respected, due to the non-deterministic nature of LLMs.