Features In-Depth

Document & Text Content Basics

Create and analyze document embeddings with a single API call

Add Documents

Our API lets you create and store collections of documents for analysis. A document can be any text-based content, and arbitrary metadata can be stored alongside it.

export API=https://api.sightglass.ai/api/v1

# Assumes API_KEY is already exported in your shell.
curl -X PUT \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/collections/test/documents" \
        --data '{"text": "lorem...", "displayName": "test", "metadata": { "url": "http://example.com" }}'

No need to worry about context sizes, embedding types, or vector databases. The document is automatically segmented, and its embeddings are automatically created and stored.

Example output

{
  "time": 0.297,
  "status": "ok",
  "result": {
    "uuid": "2f26e64e-01d6-4db5-a5c9-36aa4c2218da"
  }
}
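For callers who would rather not shell out to curl, the same request can be assembled in Python. This is a minimal sketch using only the standard library, not an official client: the function name `build_add_document_request` is hypothetical, the endpoint, method, and field names are copied from the curl example above, and the `Content-Type` header is an assumption that the example does not confirm. The request is built but not sent.

```python
import json
import urllib.request

API = "https://api.sightglass.ai/api/v1"  # base URL from the curl example


def build_add_document_request(api_key, collection, text, display_name, metadata=None):
    """Build (but do not send) the PUT request that stores one document."""
    payload = {"text": text, "displayName": display_name, "metadata": metadata or {}}
    return urllib.request.Request(
        url=f"{API}/collections/{collection}/documents",
        # json.dumps takes care of quoting and escaping the text.
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts a JSON content type; the curl example sets none.
            "Content-Type": "application/json",
        },
    )


req = build_add_document_request(
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    collection="test",
    text="lorem...",
    display_name="test",
    metadata={"url": "http://example.com"},
)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`.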

Summaries for content of any length

Condense lengthy text content into concise, coherent summaries with a single API call. Summaries can be created automatically as new documents are added or kicked off manually via an API request.

For example, say we have scraped Wikipedia pages and are analyzing their content.

Example Document

  url: https://en.wikipedia.org/wiki/Rust_(programming_language)
  text: Rust is a multi-paradigm, general-purpose programming language that
  emphasizes performance, type safety, and concurrency. It enforces memory
  safety—ensuring that all references point to valid memory—without
  requiring the use of a garbage collector or reference counting present
  in other memory-safe languages. To simultaneously enforce memory safety
  and prevent concurrent data races, its "borrow checker" tracks the object
  lifetime of all references in a program during compilation. Rust borrows
  ideas from functional programming, including static types, immutability,
  higher-order functions, and algebraic data types. It is popularized for
  systems programming.[12][13][14]

  Software developer Graydon Hoare created Rust as a personal project
  while working at Mozilla Research in 2006. Mozilla officially sponsored
  the project in 2009. Since the first stable release in May 2015, Rust has
  been adopted by companies including Amazon, Discord, Dropbox, Facebook
  (Meta), Google (Alphabet), and Microsoft. In December 2022, it became
  the first language other than C and assembly to be supported in the
  development of the Linux kernel.

  Rust has been noted for its growth as a newer language[15][16] and has
  been the subject of academic programming languages research.[17][18][19]

We can summarize the stored document content with a single request. Finished summaries are stored alongside the documents for future reference.

Summarization API Request

export API=https://api.sightglass.ai/api/v1

curl -X POST \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/collections/test/documents/2f26e64e-01d6-4db5-a5c9-36aa4c2218da/summarize"

Example output

Rust is a powerful programming language that combines multiple programming
paradigms and is designed to prioritize performance, type safety, and concurrency.
One of its standout features is its ability to enforce memory safety without
relying on a garbage collector or reference counting. This is achieved
through the use of a "borrow checker" that tracks object lifetimes, ensuring
memory safety and preventing data races.
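The UUID in the summarize URL is the one returned when the document was added. The sketch below shows one way to derive the summarize endpoint from that earlier response. The helper name `summarize_url` is hypothetical, and the case-insensitive status check is an assumption based on the examples showing both "ok" and "Ok".

```python
import json

API = "https://api.sightglass.ai/api/v1"  # base URL from the curl example

# The example output returned when the document was added.
add_response = json.loads("""
{
  "time": 0.297,
  "status": "ok",
  "result": {"uuid": "2f26e64e-01d6-4db5-a5c9-36aa4c2218da"}
}
""")


def summarize_url(collection, response):
    """Return the summarize endpoint for the document named in an add-document response."""
    # Status casing differs between the examples, so compare case-insensitively.
    if response.get("status", "").lower() != "ok":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    uuid = response["result"]["uuid"]
    return f"{API}/collections/{collection}/documents/{uuid}/summarize"


url = summarize_url("test", add_response)
print(url)
```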

Semantic Search

Semantic search analyzes the content and meaning of queries, rather than relying solely on keyword matching, delivering more accurate and relevant results.

export API=https://api.sightglass.ai/api/v1

curl -X PUT \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/collections/test/search" \
        --data '{"query": "thoughts on the Apple Vision Pro"}'

The semantic search API returns the most relevant documents along with the relevant segments inside each document.

Example Response

{
  "time": 0.339022601,
  "status": "Ok",
  "result": [{
    "docId": "c21ef674-ad8e-48f2-924d-0f2679007d60",
    "title": "538: We Studied Thousands of Heads",
    "score": 96.42386,
    "context": [
      "super excited for the Apple vision pro...",
      ...
    ]
  }]
}
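On the client side, a response like this can be reduced to a ranked list of titles and snippets. The following is a sketch under stated assumptions: the helper `best_snippets` is hypothetical, the second hit in the sample data is invented for illustration, and the `min_score` cutoff of 50 is arbitrary because the score scale is not documented here. The sort is explicit since the example does not say whether results already arrive ordered.

```python
# Sample shaped like the example response. The low-scoring hit is made up
# for illustration and does not come from the API.
search_response = {
    "time": 0.339022601,
    "status": "Ok",
    "result": [
        {
            "docId": "hypothetical-doc-id",
            "title": "Hypothetical low-scoring episode",
            "score": 41.5,
            "context": ["a passing mention of headsets..."],
        },
        {
            "docId": "c21ef674-ad8e-48f2-924d-0f2679007d60",
            "title": "538: We Studied Thousands of Heads",
            "score": 96.42386,
            "context": ["super excited for the Apple vision pro..."],
        },
    ],
}


def best_snippets(response, min_score=0.0):
    """Return (title, score, first snippet) tuples, highest score first."""
    hits = [h for h in response.get("result", []) if h.get("score", 0.0) >= min_score]
    hits.sort(key=lambda h: h.get("score", 0.0), reverse=True)
    return [
        (h.get("title", ""), h.get("score", 0.0), (h.get("context") or [""])[0])
        for h in hits
    ]


for title, score, snippet in best_snippets(search_response, min_score=50.0):
    print(f"{score:6.2f}  {title}: {snippet}")
```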

Question & Answer

Need to answer free-form questions about a collection of content? The chat API will use semantic search to find relevant documents and generate a response using context from those documents.

export API=https://api.sightglass.ai/api/v1

curl -X PUT \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/collections/podcasts/chat" \
        --data '{"query": "what do people think about the Apple Vision Pro?"}'

As part of the response, the relevant documents and context are also returned.

Example Response

{
  "time": 0.339022601,
  "status": "Ok",
  "result": {
    "response": "According to the context, the host of the Accidental Tech Podcast are very excited...",
    "context": [
      { "uuid": ..., "context": "super excited for the Apple vision pro..." },
      ...
    ]
  }
}
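Because the supporting context comes back with the answer, a client can show the generated response together with its sources. This is a small sketch, assuming the response shape shown above: `format_answer` is a hypothetical helper, and the `uuid` value in the sample is a placeholder because the example elides it.

```python
chat_response = {
    "time": 0.339022601,
    "status": "Ok",
    "result": {
        "response": "According to the context, the host of the Accidental Tech Podcast are very excited...",
        "context": [
            # The uuid is elided in the example response; this value is a placeholder.
            {"uuid": "placeholder-uuid", "context": "super excited for the Apple vision pro..."},
        ],
    },
}


def format_answer(response):
    """Render the generated answer followed by a numbered list of supporting snippets."""
    result = response["result"]
    lines = [result["response"]]
    sources = result.get("context", [])
    if sources:
        lines.append("")
        lines.append("Sources:")
        for i, item in enumerate(sources, start=1):
            lines.append(f"[{i}] {item['uuid']}: {item['context']}")
    return "\n".join(lines)


print(format_answer(chat_response))
```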

Getting Structured Output

Explain what you want and easily pipe your insights and analysis into other tools.

Data Extraction

Using the structured output APIs, we can guarantee that the results returned from the language model match a particular schema.

For example, say we have a block of text and we want to extract specific information from it without writing a custom parser.

Example Text

Fireball Lvl. 3 Evocation

Casting Time: 1 action
Range: 150 feet
Target: A point you choose within range
Components: V S M (A tiny ball of bat guano and sulfur)
Duration: Instantaneous
Classes: Sorcerer, Wizard

A bright streak flashes from your pointing finger to a point you choose
within range and then blossoms with a low roar into an explosion of
flame. Each creature in a 20-foot-radius sphere centered on that point
must make a Dexterity saving throw. A target takes 8d6 fire damage on a
failed save, or half as much damage on a successful one. The fire
spreads around corners. It ignites flammable objects in the area that
aren’t being worn or carried.

At Higher Levels: When you cast this spell using a spell slot of 4th
level or higher, the damage increases by 1d6 for each slot level
above 3rd.

And the JSON schema we would like to use for the data extraction:

Example Response Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "castingTime": { "type": "string" },
    "classes": {
      "type": "array",
      "items": { "type": "string" }
    },
    "range": { "type": "string" },
    "damage": { "type": "string" }
  },
  "required": [
    "name",
    "castingTime",
    "classes"
  ]
}

Now all we need to do is call the API, explaining what we want pulled out of the text blob, and pass in the JSON schema for the response.

export API=https://api.sightglass.ai/api/v1

curl -X PUT \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/action/ask" \
        --data '{
          "query": "extract the name, casting time, classes, range, and damage of this spell",
          "text": <text example above>,
          "jsonSchema": <schema example above> }'

Our API will work its magic and return the results. Some post-processing ensures that schemas are valid and default values are respected.
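The request body for the ask endpoint embeds a multi-line text blob and a schema, which is awkward to escape by hand inside a shell string. The sketch below builds the body with `json.dumps` instead. The helper `build_ask_body` is hypothetical, the spell text is abbreviated, and sending `jsonSchema` as a nested JSON object (rather than a string) is an assumption that the curl example leaves open.

```python
import json

# Abbreviated copy of the example text; the embedded newlines are what make
# hand-written --data strings fragile.
spell_text = (
    "Fireball Lvl. 3 Evocation\n"
    "\n"
    "Casting Time: 1 action\n"
    "Range: 150 feet\n"
    "Classes: Sorcerer, Wizard\n"
)

response_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "castingTime": {"type": "string"},
        "classes": {"type": "array", "items": {"type": "string"}},
        "range": {"type": "string"},
        "damage": {"type": "string"},
    },
    "required": ["name", "castingTime", "classes"],
}


def build_ask_body(query, text, json_schema):
    """Serialize the three fields used in the curl example into one JSON string."""
    # Assumption: "jsonSchema" is sent as a nested JSON object, not as a string.
    return json.dumps({"query": query, "text": text, "jsonSchema": json_schema})


body = build_ask_body(
    "extract the name, casting time, classes, range, and damage of this spell",
    spell_text,
    response_schema,
)
print(len(body), "bytes of JSON ready to send")
```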

Response

{
  "time": 2.594,
  "status": "Ok",
  "result": {
    "jsonResponse": {
      "castingTime": "1 action",
      "classes": [
        "Sorcerer",
        "Wizard"
      ],
      "damage": "8d6 fire damage",
      "name": "Fireball",
      "range": "150 feet"
    }
  }
}
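Even with server-side post-processing, a client may want a quick sanity check before handing the result to other tools. The following sketch covers only the `required` list and the simple `type` keywords used in the example schema. It is not a full JSON Schema validator, and `check_against_schema` is a hypothetical helper, not part of the API.

```python
# Maps the JSON Schema type names used in the example schema to Python types.
TYPE_MAP = {"string": str, "array": list, "object": dict}

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "castingTime": {"type": "string"},
        "classes": {"type": "array", "items": {"type": "string"}},
        "range": {"type": "string"},
        "damage": {"type": "string"},
    },
    "required": ["name", "castingTime", "classes"],
}

# The "jsonResponse" object from the example response.
json_response = {
    "castingTime": "1 action",
    "classes": ["Sorcerer", "Wizard"],
    "damage": "8d6 fire damage",
    "name": "Fireball",
    "range": "150 feet",
}


def check_against_schema(instance, schema):
    """Return a list of problems; an empty list means these limited checks passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in instance:
            problems.append(f"missing required property: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in instance:
            continue  # optional property that was not returned
        expected = TYPE_MAP.get(rule.get("type"))
        if expected is not None and not isinstance(instance[key], expected):
            problems.append(f"{key}: expected {rule['type']}")
        elif rule.get("type") == "array":
            item_type = TYPE_MAP.get(rule.get("items", {}).get("type"))
            if item_type is not None and not all(isinstance(v, item_type) for v in instance[key]):
                problems.append(f"{key}: expected every item to be {rule['items']['type']}")
    return problems


print(check_against_schema(json_response, schema))
```

For complete validation against the 2020-12 draft, a dedicated validator library is a better fit than this hand-rolled check.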

Content Analysis

In addition to data extraction, we can ask the language model to analyze content and present that information in a machine-readable response.

For example, say we are analyzing reviews for a restaurant and need to determine not only the sentiment but also the specific complaints and praises.

Example (Real) Review

So disappointed the food really. I got the pork belly "carbonara"
and not only was the pork belly the worst I've ever had (a was cold
and undercooked), but the rice cakes were even . And maybe it was the
yolk that made the rice cakes taste bad but something was off and
literally had to gulp down my food with the wine. I was so disappointed;

I haven't had such a bad man in such a long time. I had a decision to
make at that time, say something or not. And despite being a $28 meal
that I absolutely hated, I chickened out and paid the full amount.
I know probably the wrong decision but I just couldn't bring myself
to have them take it back because I'd feel terrible. I hate doing that.
So probably the wrong decision but here we are.
On the other hand, the service was absolutely fantastic! Everyone was so
nice. I greatly appreciated that!

To drink I had the cab. Tasty and lighter than most cabs.
I'd probably come back for the wine/drinks.
The food? I don't think so

And the JSON schema we would like to use for the response:

Example Response Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["happy", "satisfied", "not satisfied", "angry"]
    },
    "complaints": {
      "type": "array",
      "items": { "type": "string" }
    },
    "praises": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": [
    "sentiment",
    "complaints",
    "praises"
  ]
}

As in the data extraction example, we call the API, explaining what we want pulled out of the text blob, and pass in the JSON schema for the response.

export API=https://api.sightglass.ai/api/v1

curl -X PUT \
        -H "Authorization: Bearer ${API_KEY}" \
        "${API}/action/ask" \
        --data '{
          "query": "classify the sentiment of this text and extract any complaints or praises",
          "text": <text example above>,
          "jsonSchema": <schema example above> }'

Our API will work its magic and return the results. Some post-processing ensures that schemas are valid and default values are respected.

Response

{
  "time": 5.457,
  "status": "Ok",
  "result": {
    "jsonResponse": {
      "complaints": [
        "the pork belly was the worst I've ever had",
        "the pork belly was cold and undercooked",
        "the rice cakes were bad",
        "something was off with the yolk and rice cakes",
        "had to gulp down my food with the wine",
        "I haven't had such a bad meal in a long time",
        "I paid the full amount despite hating the meal"
      ],
      "praises": [
        "the service was absolutely fantastic",
        "everyone was so nice",
        "greatly appreciated the service"
      ],
      "sentiment": "not satisfied"
    }
  }
}
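Since the sentiment is constrained to a fixed set of values, downstream code can branch on it directly. The sketch below routes one analyzed review to a follow-up action. The `triage` helper, the action names, and the choice of which sentiments count as negative are all assumptions made for illustration. The sample data is abbreviated from the response above.

```python
ALLOWED_SENTIMENTS = ["happy", "satisfied", "not satisfied", "angry"]  # enum from the schema
NEGATIVE_SENTIMENTS = {"not satisfied", "angry"}  # assumption: values treated as negative

# Abbreviated "jsonResponse" from the example response.
analysis = {
    "complaints": [
        "the pork belly was cold and undercooked",
        "the rice cakes were bad",
    ],
    "praises": [
        "the service was absolutely fantastic",
        "everyone was so nice",
    ],
    "sentiment": "not satisfied",
}


def triage(review_analysis):
    """Return 'escalate', 'thank', or 'archive' for one analyzed review."""
    sentiment = review_analysis["sentiment"]
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"sentiment outside the schema enum: {sentiment!r}")
    if sentiment in NEGATIVE_SENTIMENTS and review_analysis["complaints"]:
        return "escalate"  # hypothetical action name
    if review_analysis["praises"]:
        return "thank"  # hypothetical action name
    return "archive"  # hypothetical action name


print(triage(analysis))
```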