🏷 LLMParser

LLMParser makes it easy for anyone to classify and extract structured data from text with large language models (LLMs). No prompt engineering or AI experience required.

Why?

While LLMs are extremely powerful, getting them to produce reliable JSON output is challenging.

LLMParser aims to solve this by enforcing a consistent JSON input and output format for classifying text and extracting fields with LLMs.

What can you do?

There are three main ways to use LLMParser:

  1. Classify Text - e.g., classify corporate contracts as NDA, MSA, etc.
  2. Extract Fields - e.g., extract job titles from LinkedIn profiles or dishes from menus
  3. Classify and Extract Fields - e.g., classify emails that relate to scheduling and extract available times

Quick Start

To get started, install the package with your preferred package manager.

Then, import LLMParser, instantiate a new parser, and parse some text.

LLMParser Class

The LLMParser constructor takes an object with four fields:

  • apiKey - string (required) OpenAI API key
  • categories - Category[] (optional) categories to classify text into
  • fields - Field[] (optional) fields to parse
  • model - string (optional) name of the LLM model to use (defaults to gpt-3.5-turbo)

Supply a categories array if you want to classify text.

Supply a fields array if you only want to extract fields from text.

You can supply either categories or fields, but not both.

If you want to classify and extract fields, you can add fields to a category.
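
The constructor rules above can be sketched as TypeScript types plus a small check. The `Field`, `Category`, and `ParserOptions` interfaces below are written from the descriptions in this README, and `validateOptions` is a hypothetical helper, not part of the package; the library's real definitions and error behavior may differ.

```typescript
// Shapes written from this README's descriptions; the package's own types may differ.
type FieldType = "string" | "number" | "boolean" | "date";

interface Field {
  name: string;
  description: string;
  type: FieldType;
}

interface Category {
  name: string;
  description: string;
  fields?: Field[]; // populate to classify and extract together
}

interface ParserOptions {
  apiKey: string;
  categories?: Category[];
  fields?: Field[];
  model?: string; // the README states this defaults to gpt-3.5-turbo
}

// Hypothetical helper: returns a list of problems instead of throwing,
// so the caller decides how to report them.
function validateOptions(options: ParserOptions): string[] {
  const problems: string[] = [];
  const hasCategories = (options.categories?.length ?? 0) > 0;
  const hasFields = (options.fields?.length ?? 0) > 0;
  if (!options.apiKey) {
    problems.push("apiKey is required");
  }
  if (hasCategories && hasFields) {
    problems.push("supply either categories or fields, not both");
  }
  return problems;
}
```

Running a check like this before constructing the parser surfaces a mixed categories/fields configuration early, rather than after a request has been sent.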

Parse Method

The LLMParser instance has one method, parse, which takes an object with two fields:

  • document - string (required) text to classify or extract fields from
  • forceClassifyAs - string (optional) name of a category to force classify as

The result of the parse method is an object of type ParseResult.
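
A minimal sketch of how the parse arguments might be assembled. `ParseArgs` is written from the two fields listed above; `ParserLike`, `buildParseArgs`, and `parseDocument` are hypothetical names introduced for illustration. The shape of ParseResult is not shown in this section, so the result is typed as `unknown`, and returning a Promise is an assumption.

```typescript
// Argument shape written from this README; the package's own type may differ.
interface ParseArgs {
  document: string;
  forceClassifyAs?: string;
}

// Structural stand-in for an LLMParser instance (assumption: parse is async).
interface ParserLike {
  parse(args: ParseArgs): Promise<unknown>;
}

// Hypothetical helper: include forceClassifyAs only when a category name is given.
function buildParseArgs(document: string, forceClassifyAs?: string): ParseArgs {
  return forceClassifyAs === undefined
    ? { document }
    : { document, forceClassifyAs };
}

// Hypothetical helper: forwards the assembled arguments to any parser-like object.
function parseDocument(
  parser: ParserLike,
  document: string,
  forceClassifyAs?: string
): Promise<unknown> {
  return parser.parse(buildParseArgs(document, forceClassifyAs));
}
```

Any object with a matching `parse` method can be passed as `parser`, which also makes the call easy to exercise with a fake in tests.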

Classifying Text

The first use case we will go over is classifying text. For example, say we want to classify a job posting as either "Software Engineer" or "Head of Community."

To do this we will instantiate a new LLMParser with the following categories:
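
A minimal sketch of such a categories array. The `Category` shape is written from this README, and the description strings are invented for illustration.

```typescript
// Shape written from this README; only the fields needed for classification are shown.
interface Category {
  name: string;
  description: string;
}

// Illustrative descriptions (assumptions): they give the LLM extra guidance per category.
const categories: Category[] = [
  {
    name: "Software Engineer",
    description: "Roles focused on designing, building, and maintaining software",
  },
  {
    name: "Head of Community",
    description: "Roles focused on growing and supporting a user community",
  },
];
```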

Now let's put it all together and classify a job posting.

And here is our classification!

Categories - Category[]

The categories array is how we tell our parser what categories to classify text into. Each category (type Category) has three properties:

  • name - string (required) the name of the category type
  • description - string (required) extra instructions to help the LLM
  • fields - Field[] (optional) an array of fields to extract

If you want to classify and extract fields, you can populate the fields property on a category. Here's an example:
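
A minimal sketch of a category that carries its own fields, using the scheduling-email use case mentioned earlier. The interfaces are written from this README; the category name, field name, and descriptions are hypothetical.

```typescript
// Shapes written from this README's descriptions; the package's own types may differ.
interface Field {
  name: string;
  description: string;
  type: "string" | "number" | "boolean" | "date";
}

interface Category {
  name: string;
  description: string;
  fields?: Field[];
}

// Hypothetical category: when a document is classified as "Scheduling",
// the fields listed here are the ones to extract from it.
const schedulingCategory: Category = {
  name: "Scheduling",
  description: "Emails about arranging or rescheduling a meeting",
  fields: [
    {
      name: "availableTimes",
      description: "Times the sender says they are available",
      type: "string",
    },
  ],
};
```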

Extracting Fields

The next use case we will cover is extracting fields from text. For example, say we want to extract the job title, company name, and location from a job posting.

To do this we will instantiate a new LLMParser with the following fields:
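
A minimal sketch of such a fields array for the job title, company name, and location. The `Field` shape is written from this README; the field names and descriptions are invented for illustration.

```typescript
// Shape written from this README; the package's own type may differ.
interface Field {
  name: string;
  description: string;
  type: "string" | "number" | "boolean" | "date";
}

// Illustrative names and descriptions (assumptions).
const fields: Field[] = [
  {
    name: "jobTitle",
    description: "The title of the role being advertised",
    type: "string",
  },
  {
    name: "companyName",
    description: "The name of the company that is hiring",
    type: "string",
  },
  {
    name: "location",
    description: "Where the role is based",
    type: "string",
  },
];
```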

Now let's put it all together and extract these fields from a job posting.

And here are our extracted fields!

Fields - Field[]

The fields array is how we tell our parser what fields to extract from a document. Each field (type Field) has three properties:

  • name - string (required) the name of the field
  • description - string (required) extra instructions to help the LLM
  • type - string (required) the type of the field (string, number, boolean, date)

Here's an example:
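
A minimal sketch covering the non-string types listed above. The `Field` shape is written from this README; the field names and descriptions are hypothetical, and how the library represents an extracted `date` value is not stated here.

```typescript
// Shape written from this README; the package's own type may differ.
interface Field {
  name: string;
  description: string;
  type: "string" | "number" | "boolean" | "date";
}

// Hypothetical fields, one per non-string type.
const typedFields: Field[] = [
  {
    name: "yearsOfExperience",
    description: "Minimum years of experience the posting asks for",
    type: "number",
  },
  {
    name: "isRemote",
    description: "Whether the role can be done remotely",
    type: "boolean",
  },
  {
    name: "applicationDeadline",
    description: "Last day applications are accepted",
    type: "date",
  },
];
```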

Classifying and Extracting Fields

The last way to use LLMParser is to classify and extract fields. For example, say we want to classify a job posting as either "Software Engineer" or "Head of Community" and then extract the job title and location.

To do this we will instantiate a new LLMParser with the following categories:
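
A minimal sketch of categories that each carry the fields to extract. The interfaces are written from this README; the field names and all descriptions are invented for illustration.

```typescript
// Shapes written from this README's descriptions; the package's own types may differ.
interface Field {
  name: string;
  description: string;
  type: "string" | "number" | "boolean" | "date";
}

interface Category {
  name: string;
  description: string;
  fields?: Field[];
}

// Both categories extract the same two fields, so define them once.
const jobFields: Field[] = [
  { name: "jobTitle", description: "The title of the role", type: "string" },
  { name: "location", description: "Where the role is based", type: "string" },
];

const categoriesWithFields: Category[] = [
  {
    name: "Software Engineer",
    description: "Roles focused on designing, building, and maintaining software",
    fields: jobFields,
  },
  {
    name: "Head of Community",
    description: "Roles focused on growing and supporting a user community",
    fields: jobFields,
  },
];
```

Sharing one `jobFields` array keeps the two categories in sync; give each category its own array if the fields should differ by category.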

Now let's put it all together and classify + extract data from a job posting.

And here are our results!


That was a quick overview of how to use LLMParser! Try it for yourself in the playground or install the package and get started.

If you have any questions or feedback, please email me at kevin@llmparser.com.