Data Extraction
Using the structured output APIs, we can guarantee that the results returned from the language model match a particular schema.
For example, say we have a block of text and we want to extract specific information from it without writing a custom parser.
Example Text
Fireball Lvl. 3 Evocation
Casting Time: 1 action
Range: 150 feet
Target: A point you choose within range
Components: V S M (A tiny ball of bat guano and sulfur)
Duration: Instantaneous
Classes: Sorcerer, Wizard
A bright streak flashes from your pointing finger to a point you choose
within range and then blossoms with a low roar into an explosion of
flame. Each creature in a 20-foot-radius sphere centered on that point
must make a Dexterity saving throw. A target takes 8d6 fire damage on a
failed save, or half as much damage on a successful one. The fire
spreads around corners. It ignites flammable objects in the area that
aren’t being worn or carried.
At Higher Levels: When you cast this spell using a spell slot of 4th
level or higher, the damage increases by 1d6 for each slot level
above 3rd.
And the JSON schema we would like to use for the data extraction:
Example Response Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"name": { "type": "string" },
"castingTime": { "type": "string" },
"classes": {
"type": "array",
"items": { "type": "string" }
},
"range": { "type": "string" },
"damage": { "type": "string" }
},
"required": [
"name",
"castingTime",
"classes"
]
}
Now all we need to do is call the API, explaining what we want to pull out of the text blob and passing in the JSON schema for the response.
# Base URL for the API; API_KEY must already be set in the environment.
export API=https://api.sightglass.ai/api/v1
curl -X PUT \
-H "Authorization: Bearer ${API_KEY}" \
"${API}/action/ask" \
--data '{
"query": "extract the name, casting time, classes, range, and damage of this spell",
"text": <text example above>,
"jsonSchema": <schema example above> }'
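The same request can be assembled in code, which avoids hand-escaping the text and schema inside a shell string. The following is a minimal sketch using only the Python standard library. The function names `build_ask_request` and `send` are hypothetical, and two details are assumptions not confirmed above: that the schema is embedded as a nested JSON object rather than a string, and that the API expects a JSON content type.

```python
import json
import urllib.request

# Base URL taken from the curl example.
API = "https://api.sightglass.ai/api/v1"


def build_ask_request(api_key, query, text, schema):
    """Build (but do not send) the PUT request for /action/ask."""
    # Assumption: the schema is sent as a nested JSON object, not a string.
    body = json.dumps({"query": query, "text": text, "jsonSchema": schema})
    return urllib.request.Request(
        url=f"{API}/action/ask",
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and decode the JSON body of the reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, load the spell text and the schema, pass them to `build_ask_request` along with the query string, and hand the result to `send`. Because `json.dumps` handles quoting and newlines, the text blob can be passed in exactly as it appears above.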
Our API will work its magic and return the results. There is some post-processing to ensure schemas are valid and default values are respected.
Response
{
"time": 2.594,
"status": "Ok",
"result": {
"jsonResponse": {
"castingTime": "1 action",
"classes": [
"Sorcerer",
"Wizard"
],
"damage": "8d6 fire damage",
"name": "Fireball",
"range": "150 feet"
}
}
}
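On the client side, the extracted data sits under `result.jsonResponse`, so a caller typically unwraps the envelope before using it. The sketch below shows one way to do that and to double-check the `required` fields from the schema. The helper names are hypothetical, it is not a full JSON Schema validator, and treating any status other than "Ok" as a failure is an assumption, since only the "Ok" case is shown here.

```python
import json

# The "required" list from the example schema above.
SCHEMA = {"required": ["name", "castingTime", "classes"]}

# The example response shown above, as raw JSON text.
RAW_RESPONSE = """
{
  "time": 2.594,
  "status": "Ok",
  "result": {
    "jsonResponse": {
      "castingTime": "1 action",
      "classes": ["Sorcerer", "Wizard"],
      "damage": "8d6 fire damage",
      "name": "Fireball",
      "range": "150 feet"
    }
  }
}
"""


def missing_required(schema, payload):
    """Return the required keys that are absent from the payload."""
    return [key for key in schema.get("required", []) if key not in payload]


def extract_result(raw, schema):
    """Unwrap the response envelope and check the required fields."""
    envelope = json.loads(raw)
    # Assumption: any status other than "Ok" signals a failed request.
    if envelope.get("status") != "Ok":
        raise RuntimeError(f"unexpected status: {envelope.get('status')!r}")
    payload = envelope["result"]["jsonResponse"]
    missing = missing_required(schema, payload)
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return payload


spell = extract_result(RAW_RESPONSE, SCHEMA)
print(spell["name"], spell["classes"])
```

Optional fields such as `range` and `damage` are not in the `required` list, so reading them with `spell.get("damage")` is safer than indexing directly.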
Content Analysis
In addition to data extraction, we can ask the language model to analyze content and present that information in a machine-readable response.
For example, suppose we are analyzing reviews for a restaurant and need not only to determine the sentiment but also to extract complaints and praises as part of the analysis.
Example (Real) Review
So disappointed the food really. I got the pork belly "carbonara"
and not only was the pork belly the worst I've ever had (a was cold
and undercooked), but the rice cakes were even . And maybe it was the
yolk that made the rice cakes taste bad but something was off and
literally had to gulp down my food with the wine. I was so disappointed;
I haven't had such a bad man in such a long time. I had a decision to
make at that time, say something or not. And despite being a $28 meal
that I absolutely hated, I chickened out and paid the full amount.
I know probably the wrong decision but I just couldn't bring myself
to have them take it back because I'd feel terrible. I hate doing that.
So probably the wrong decision but here we are.
On the other hand,
the service was absolutely fantastic! Everyone was so nice. I greatly
appreciated that!
To drink I had the cab. Tasty and lighter than most cabs.
I'd probably come back for the wine/drinks.
The food? I don't think so
And the JSON schema we would like to use for the response:
Example Response Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"sentiment": {
"type": "string",
"enum": ["happy", "satisfied", "not satisfied", "angry"]
},
"complaints": {
"type": "array",
"items": { "type": "string" }
},
"praises": {
"type": "array",
"items": { "type": "string" }
}
},
"required": [
"sentiment",
"complaints",
"praises"
]
}
Similar to the data extraction example, we call the API, explaining what we want to pull out of the text blob and passing in the JSON schema for the response.
# Base URL for the API; API_KEY must already be set in the environment.
export API=https://api.sightglass.ai/api/v1
curl -X PUT \
-H "Authorization: Bearer ${API_KEY}" \
"${API}/action/ask" \
--data '{
"query": "classify the sentiment of this text and extract any complaints or praises",
"text": <text example above>,
"jsonSchema": <schema example above> }'
Our API will work its magic and return the results. There is some post-processing to ensure schemas are valid and default values are respected.
Response
{
"time": 5.457,
"status": "Ok",
"result": {
"jsonResponse": {
"complaints": [
"the pork belly was the worst I've ever had",
"the pork belly was cold and undercooked",
"the rice cakes were bad",
"something was off with the yolk and rice cakes",
"had to gulp down my food with the wine",
"I haven't had such a bad meal in a long time",
"I paid the full amount despite hating the meal"
],
"praises": [
"the service was absolutely fantastic",
"everyone was so nice",
"greatly appreciated the service"
],
"sentiment": "not satisfied"
}
}
}
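Because the schema restricts `sentiment` to a fixed set of values and requires all three fields, the `jsonResponse` object maps cleanly onto a small typed record. The sketch below is one possible way to consume it. The `ReviewAnalysis` class and `parse_analysis` function are hypothetical names, and the enum check is a defensive client-side step, not something the API requires.

```python
from dataclasses import dataclass

# The enum values from the response schema above.
ALLOWED_SENTIMENTS = ("happy", "satisfied", "not satisfied", "angry")


@dataclass
class ReviewAnalysis:
    sentiment: str
    complaints: list
    praises: list


def parse_analysis(json_response):
    """Turn a jsonResponse dict into a ReviewAnalysis, checking the enum."""
    sentiment = json_response["sentiment"]
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    return ReviewAnalysis(
        sentiment=sentiment,
        complaints=list(json_response["complaints"]),
        praises=list(json_response["praises"]),
    )


# The jsonResponse object from the example response above.
analysis = parse_analysis({
    "complaints": [
        "the pork belly was the worst I've ever had",
        "the pork belly was cold and undercooked",
        "the rice cakes were bad",
        "something was off with the yolk and rice cakes",
        "had to gulp down my food with the wine",
        "I haven't had such a bad meal in a long time",
        "I paid the full amount despite hating the meal",
    ],
    "praises": [
        "the service was absolutely fantastic",
        "everyone was so nice",
        "greatly appreciated the service",
    ],
    "sentiment": "not satisfied",
})
print(f"{analysis.sentiment}: {len(analysis.complaints)} complaints, "
      f"{len(analysis.praises)} praises")
```

With many reviews, records like this can be grouped by `sentiment` or have their complaints tallied without any further text parsing.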