🏷 LLMParser


LLMParser makes it easy for anyone to classify and extract structured data from text with large language models (LLMs). No prompt engineering or AI experience required.

Why?

While LLMs are extremely powerful, producing reliable JSON output is challenging.

LLMParser aims to solve this by enforcing a consistent JSON input and output format for classifying and extracting text with LLMs.

What can you do?

There are three main ways to use LLMParser:

Classify Text - e.g., classify corporate contracts as NDA, MSA, etc.
Extract Fields - e.g., extract job titles from LinkedIn profiles or dishes from menus
Classify and Extract Fields - e.g., classify emails that relate to scheduling and extract available times

Quick Start

To get started, install the package:

npm install llmparser

yarn add llmparser

Then, import LLMParser, instantiate a new parser, and parse some text.

import { LLMParser } from 'llmparser';

// see below for examples on how to classify text, extract structured data, or both
const categories = [ /* ... */ ];

const parser = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  categories: categories,
});

const result = await parser.parse({
  document // text to classify and/or extract structured data from
})

LLMParser Class

The LLMParser constructor takes an object with four fields:

apiKey - string (required) OpenAI API key
categories - Category[] (optional) categories to classify text into
fields - Field[] (optional) fields to parse
model - string (optional) name of the LLM model to use (defaults to gpt-3.5-turbo)

Supply a categories array if you want to classify text.

const classifier = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  categories: jobTypes,
  model: 'gpt-3.5-turbo'  // or 'gpt-4' | 'text-davinci-003'
});

Supply a fields array if you only want to extract fields from text.

const extractor = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  fields: jobFields
});

You can supply either categories or fields, but not both.

If you want to classify and extract fields, you can add fields to a category.
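
The either/or rule above can be checked before a parser is constructed. Here is a minimal sketch, assuming a hypothetical helper `assertValidOptions` and locally declared option types that mirror the constructor fields listed above. None of these names are exported by llmparser, and the package may perform its own validation.

```typescript
// Local stand-ins that mirror the constructor fields described above.
// They are assumptions for illustration, not imports from llmparser.
interface FieldSpec {
  name: string;
  description: string;
  type: 'string' | 'number' | 'boolean' | 'date';
}

interface CategorySpec {
  name: string;
  description: string;
  fields?: FieldSpec[];
}

interface ParserOptions {
  apiKey: string;
  categories?: CategorySpec[];
  fields?: FieldSpec[];
  model?: string;
}

// Hypothetical helper: throws when both categories and top-level fields are supplied.
function assertValidOptions(options: ParserOptions): void {
  const hasCategories = (options.categories?.length ?? 0) > 0;
  const hasFields = (options.fields?.length ?? 0) > 0;
  if (hasCategories && hasFields) {
    throw new Error('Supply either categories or fields, not both.');
  }
}
```

To classify and extract at the same time, move the fields onto each category instead of passing them at the top level.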

Parse Method

The LLMParser instance has one method, parse, which takes an object with two fields:

document - string (required) text to classify or extract fields from
forceClassifyAs - string (optional) name of a category to force classify as

const result = await parser.parse({
  document // text to classify and/or extract structured data from
})
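
To force the parser to treat a document as a particular category, pass `forceClassifyAs` alongside `document`. The sketch below assumes a hypothetical wrapper `parseAs` and a structural `ParserLike` type standing in for an LLMParser instance; both are illustrative and not part of the package. The stub parser only records its arguments, so the snippet runs without an API key.

```typescript
// Arguments accepted by parse, as listed above.
interface ParseArgs {
  document: string;
  forceClassifyAs?: string;
}

// Structural stand-in for an LLMParser instance (an assumption for illustration).
interface ParserLike<R> {
  parse(args: ParseArgs): Promise<R>;
}

// Hypothetical wrapper: parse `text` as the category named `categoryName`.
function parseAs<R>(parser: ParserLike<R>, text: string, categoryName: string): Promise<R> {
  return parser.parse({ document: text, forceClassifyAs: categoryName });
}

// Stub parser that records what it was called with instead of calling an LLM.
const calls: ParseArgs[] = [];
const stubParser: ParserLike<string> = {
  parse(args: ParseArgs): Promise<string> {
    calls.push(args);
    return Promise.resolve('ok');
  },
};

void parseAs(stubParser, 'Head of Community at Notion', 'Head of Community');
```

With a real parser, the value passed as `forceClassifyAs` should match the `name` of one of the categories you supplied.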

The result of the parse method is an object of type ParseResult.

type ParseResult = Partial<ClassificationResult> & {
  fields?: FieldsResultObject;
};

type FieldResult = {
  value: PossibleFieldValues;
  source: string;
  confidence: number;
  type: 'string' | 'number' | 'boolean' | 'date';
};

type FieldsResultObject = {
  [key: string]: FieldResult;
};
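
Because every extracted field carries a confidence score, you may want to drop low-confidence fields before using a result. Here is a minimal sketch with a hypothetical helper, `confidentFields`. This document does not define `PossibleFieldValues` or `ClassificationResult`, so the two definitions below are guesses based on the example outputs shown later, and the sample data is made up.

```typescript
// Assumed shapes: guesses based on the example outputs in this document.
type PossibleFieldValues = string | number | boolean | Date;
type ClassificationResult = { type: string; confidence: number; source: string };

// These three types are copied from the definitions above.
type FieldResult = {
  value: PossibleFieldValues;
  source: string;
  confidence: number;
  type: 'string' | 'number' | 'boolean' | 'date';
};
type FieldsResultObject = { [key: string]: FieldResult };
type ParseResult = Partial<ClassificationResult> & { fields?: FieldsResultObject };

// Hypothetical helper: keep only fields whose confidence meets the threshold.
function confidentFields(result: ParseResult, minConfidence: number): FieldsResultObject {
  const kept: FieldsResultObject = {};
  for (const [key, field] of Object.entries(result.fields ?? {})) {
    if (field.confidence >= minConfidence) {
      kept[key] = field;
    }
  }
  return kept;
}

// Made-up sample result for illustration.
const sample: ParseResult = {
  fields: {
    'Job Title': { value: 'Head of Community', source: 'Head of Community', confidence: 1, type: 'string' },
    'Salary': { value: 'unknown', source: '', confidence: 0.4, type: 'string' },
  },
};

const reliable = confidentFields(sample, 0.8); // keeps only "Job Title"
```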

Classifying Text

The first use case we will go over is classifying text. For example, say we want to classify a job posting as either "Software Engineering" or "Head of Community."

To do this we will instantiate a new LLMParser with the following categories:

const categories = [{
  name: "Software Engineering",
  description: "Software engineers design, develop, and maintain software systems.",
},
{
  name: "Head of Community",
  description: "Heads of community build and engage a community of users and customers.",
}];

Now let's put it all together and classify a job posting.

import { LLMParser } from 'llmparser';

// classify text into 'Software Engineering' or 'Head of Community'
const categories = [{
  name: "Software Engineering",
  description: "Software engineers design, develop, and maintain software systems.",
},
{
  name: "Head of Community",
  description: "Heads of community build and engage a community of users and customers.",
}];

// instantiate the parser
const parser = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  categories,
});

// fake job posting
const jobPosting = `Head of Community at Notion (View all jobs)
San Francisco, California; New York, New York;
About Us: We're on a mission to make...`;

// classify the job posting
const classification = await parser.parse({
  document: jobPosting,
});

And here is our classification!

{
  "type": "Head of Community",
  "confidence": 0.9,
  "source": "Head of Community, lead our efforts to build an engaged and passionate community of Notion users and customers."
}
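
Before acting on a classification, it can help to check that the returned type is one of your categories and that the confidence is high enough. Here is a minimal sketch with a hypothetical helper, `acceptedCategory`. The `ClassificationResult` shape is inferred from the output above, and the 0.8 threshold is an arbitrary choice for illustration.

```typescript
// Shape inferred from the classification output above (an assumption).
type ClassificationResult = { type: string; confidence: number; source: string };

// Hypothetical helper: return the category name only if it is one we defined
// and the reported confidence meets the threshold; otherwise return null.
function acceptedCategory(
  classification: Partial<ClassificationResult>,
  knownNames: string[],
  minConfidence: number,
): string | null {
  const { type, confidence } = classification;
  if (type === undefined || confidence === undefined) return null;
  if (!knownNames.includes(type)) return null;
  return confidence >= minConfidence ? type : null;
}

const knownNames = ['Software Engineering', 'Head of Community'];

// The classification from the example above (source shortened here).
const classification = {
  type: 'Head of Community',
  confidence: 0.9,
  source: 'Head of Community, lead our efforts to build an engaged and passionate community',
};

const category = acceptedCategory(classification, knownNames, 0.8); // "Head of Community"
```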

Categories - Category[]

The categories array is how we tell our parser what categories to classify text into. Each category (type Category) has three fields:

name - string (required) the name of the category type
description - string (required) extra instructions to help the LLM
fields - Field[] (optional) an array of fields to extract

interface Category {
  name: string;
  description: string;
  fields?: Field[];
}

If you want to classify and extract fields, you can populate the fields property on a category. Here's an example:

import { Category } from 'llmparser';

// if you are using TypeScript you can type your categories
// const categoriesAndFields: Category[] = [
const categoriesAndFields = [
  {
    name: 'software',
    description: "Software engineers design, develop, and maintain software systems.",
    fields: [ // fields to extract
      {
        name: 'salary',
        description: 'salary range',
        type: 'string',
      }
    ]
  },
  {
    name: 'community',
    description: "Heads of community build and engage a community of users and customers.",
    fields: [
      {
        name: 'salary',
        description: 'salary range',
        type: 'string',
      }
    ]
  }
];
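
When several categories extract the same fields, you can define those fields once and attach them to each category. Here is a minimal sketch, assuming a hypothetical helper `withSharedFields`. The `Field` and `Category` interfaces are re-declared locally so the snippet is self-contained; in a real project you would import them from llmparser as shown above.

```typescript
// Local copies of the Field and Category shapes described in this document.
interface Field {
  name: string;
  description: string;
  type: 'string' | 'number' | 'boolean' | 'date';
}

interface Category {
  name: string;
  description: string;
  fields?: Field[];
}

// Define the shared field once.
const salaryField: Field = {
  name: 'salary',
  description: 'salary range',
  type: 'string',
};

// Hypothetical helper: return new categories with the shared fields appended.
function withSharedFields(categories: Category[], shared: Field[]): Category[] {
  return categories.map((category) => ({
    ...category,
    fields: [...(category.fields ?? []), ...shared],
  }));
}

const categoriesAndFields = withSharedFields(
  [
    { name: 'software', description: 'Software engineers design, develop, and maintain software systems.' },
    { name: 'community', description: 'Heads of community build and engage a community of users and customers.' },
  ],
  [salaryField],
);
```

The helper returns new objects rather than mutating its input, so the same base categories can be reused with different field sets.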

Extracting Fields

The next use case we will cover is extracting fields from text. For example, say we want to extract the job title, company name, and location from a job posting.

To do this we will instantiate a new LLMParser with the following fields:

const fields = [{
  name: "Job Title",
  description: "the title of the job",
  type: "string",
},
{
  name: "Company Name",
  description: "the name of the company",
  type: "string",
},
{
  name: "Location",
  description: "the location of the job",
  type: "string",
}];

Now let's put it all together and extract these fields from a job posting.

import { LLMParser } from 'llmparser';

// fields to extract
const fields = [{
  name: "Job Title",
  description: "the title of the job",
  type: "string",
},
{
  name: "Company Name",
  description: "the name of the company",
  type: "string",
},
{
  name: "Location",
  description: "the location of the job",
  type: "string",
}];

// instantiate the parser
const parser = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  fields,
});

// fake job posting
const jobPosting = `Head of Community at Notion (View all jobs)
San Francisco, California; New York, New York;
About Us: We're on a mission to make...`;

// extract fields from the job posting
const extractedFields = await parser.parse({
  document: jobPosting,
});

And here are our extracted fields!

{
  "fields": {
    "Job Title": {
      "value": "Head of Community",
      "source": "Head of Community",
      "confidence": 1,
      "type": "string"
    },
    "Company Name": {
      "value": "Notion",
      "source": "Notion",
      "confidence": 1,
      "type": "string"
    },
    "Location": {
      "value": "San Francisco, California; New York, New York",
      "source": "San Francisco, California; New York, New York",
      "confidence": 1,
      "type": "string"
    }
  }
}
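
If you only need the extracted values, you can flatten the result into a plain key/value object. Here is a minimal sketch with a hypothetical helper, `fieldValues`. The value type is an assumption, since this document does not define `PossibleFieldValues`.

```typescript
// FieldResult as described earlier; the value union is an assumption.
type FieldResult = {
  value: string | number | boolean | Date;
  source: string;
  confidence: number;
  type: 'string' | 'number' | 'boolean' | 'date';
};
type FieldsResultObject = { [key: string]: FieldResult };

// Hypothetical helper: map each field name to its extracted value.
function fieldValues(fields: FieldsResultObject): Record<string, FieldResult['value']> {
  const values: Record<string, FieldResult['value']> = {};
  for (const [key, field] of Object.entries(fields)) {
    values[key] = field.value;
  }
  return values;
}

// The extracted fields from the example above.
const extracted: FieldsResultObject = {
  'Job Title': { value: 'Head of Community', source: 'Head of Community', confidence: 1, type: 'string' },
  'Company Name': { value: 'Notion', source: 'Notion', confidence: 1, type: 'string' },
  'Location': {
    value: 'San Francisco, California; New York, New York',
    source: 'San Francisco, California; New York, New York',
    confidence: 1,
    type: 'string',
  },
};

const flat = fieldValues(extracted);
// flat['Company Name'] is "Notion"
```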

Fields - Field[]

The fields array is how we tell our parser what fields to extract from a document. Each field (type Field) has three fields:

name - string (required) the name of the field
description - string (required) extra instructions to help the LLM
type - string (required) the type of the field (string, number, boolean, date)

interface Field {
  name: string;
  description: string;
  type: 'string' | 'number' | 'boolean' | 'date';
}

Here's an example:

import { Field } from 'llmparser';

// if you are using TypeScript you can type your fields
// const fields: Field[] = [{
const fields = [{
  name: "Job Title",
  description: "the title of the job posting.",
  type: "string",
},
{
  name: "Company Name",
  description: "the name of the company",
  type: "string",
},
{
  name: "Location",
  description: "the location of the job posting",
  type: "string",
}];
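
Fields are not limited to strings. The sketch below declares one field for each of the other supported types and adds a hypothetical helper, `duplicateNames`, that flags repeated field names. Since results are keyed by field name, two fields with the same name would collide. The field names and descriptions here are made up for illustration, and `Field` is re-declared locally so the snippet is self-contained.

```typescript
// Local copy of the Field shape described above.
interface Field {
  name: string;
  description: string;
  type: 'string' | 'number' | 'boolean' | 'date';
}

// Illustrative fields using the non-string types.
const typedFields: Field[] = [
  { name: 'Minimum Salary', description: 'the lowest salary in the posted range', type: 'number' },
  { name: 'Remote', description: 'whether the job can be done remotely', type: 'boolean' },
  { name: 'Posted Date', description: 'the date the job was posted', type: 'date' },
];

// Hypothetical helper: return every field name that appears more than once.
function duplicateNames(fields: Field[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const field of fields) {
    if (seen.has(field.name)) {
      dupes.add(field.name);
    } else {
      seen.add(field.name);
    }
  }
  return Array.from(dupes);
}
```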

Classifying and Extracting Fields

The last way to use LLMParser is to classify and extract fields. For example, say we want to classify a job posting as either "Software Engineering" or "Head of Community" and then extract the job title and company name.

To do this we will instantiate a new LLMParser with the following categories:

const categories = [{
  name: "Software Engineering",
  description: "Software engineers design, develop, and maintain software systems.",
  fields: [{ // extract these fields after classification
    name: "Job Title",
    description: "the title of the job",
    type: "string",
  },
  {
    name: "Company Name",
    description: "the name of the company",
    type: "string",
  }]
},
{
  name: "Head of Community",
  description: "Heads of community build and engage a community of users and customers.",
  fields: [{ // extract these fields after classification
    name: "Job Title",
    description: "the title of the job",
    type: "string",
  },
  {
    name: "Company Name",
    description: "the name of the company",
    type: "string",
  }]
}];

Now let's put it all together and classify + extract data from a job posting.

import { LLMParser } from 'llmparser';

// classify text into 'Software Engineering' or 'Head of Community' and extract fields
const categories = [{
  name: "Software Engineering",
  description: "Software engineers design, develop, and maintain software systems.",
  fields: [{ // extract these fields after classification
    name: "Job Title",
    description: "the title of the job",
    type: "string",
  },
  {
    name: "Company Name",
    description: "the name of the company",
    type: "string",
  }]
},
{
  name: "Head of Community",
  description: "Heads of community build and engage a community of users and customers.",
  fields: [{ // extract these fields after classification
    name: "Job Title",
    description: "the title of the job",
    type: "string",
  },
  {
    name: "Company Name",
    description: "the name of the company",
    type: "string",
  }]
}];

// instantiate the parser
const parser = new LLMParser({
  apiKey: process.env.OPENAI_API_KEY,
  categories,
});

// fake job posting
const jobPosting = `Head of Community at Notion (View all jobs)
San Francisco, California; New York, New York;
About Us: We're on a mission to make...`;

// classify + extract fields from the job posting
const results = await parser.parse({
  document: jobPosting,
});

And here are our results!

{
  "type": "Head of Community",
  "confidence": 0.9,
  "source": "Head of Community, lead our efforts to build an engaged and passionate community of Notion users and customers.",
  "fields": {
    "Job Title": {
      "value": "Head of Community",
      "source": "Head of Community",
      "confidence": 1,
      "type": "string"
    },
    "Company Name": {
      "value": "Notion",
      "source": "Notion",
      "confidence": 1,
      "type": "string"
    }
  }
}
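
Since both the classification and the fields are optional on a ParseResult, code that consumes a combined result should handle either part being absent. Here is a minimal sketch with a hypothetical helper, `describeResult`, that turns a result into a one-line summary. The type shapes are inferred from the outputs in this document.

```typescript
// Shapes inferred from the example outputs (assumptions for illustration).
type FieldResult = {
  value: string | number | boolean | Date;
  source: string;
  confidence: number;
  type: 'string' | 'number' | 'boolean' | 'date';
};
type ParseResult = {
  type?: string;
  confidence?: number;
  source?: string;
  fields?: { [key: string]: FieldResult };
};

// Hypothetical helper: summarize the category and any extracted values.
function describeResult(result: ParseResult): string {
  const category = result.type ?? 'unclassified';
  const parts = Object.entries(result.fields ?? {}).map(
    ([key, field]) => `${key}: ${String(field.value)}`,
  );
  return parts.length > 0 ? `${category} (${parts.join(', ')})` : category;
}

// The combined result from the example above (source shortened here).
const combined: ParseResult = {
  type: 'Head of Community',
  confidence: 0.9,
  source: 'Head of Community, lead our efforts to build an engaged and passionate community',
  fields: {
    'Job Title': { value: 'Head of Community', source: 'Head of Community', confidence: 1, type: 'string' },
    'Company Name': { value: 'Notion', source: 'Notion', confidence: 1, type: 'string' },
  },
};

const summary = describeResult(combined);
// "Head of Community (Job Title: Head of Community, Company Name: Notion)"
```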

That was a quick overview of how to use LLMParser! Try it for yourself in the playground or install the package and get started.

If you have any questions or feedback, please email me at kevin@llmparser.com.