Conversing with Documents: Unleashing the Power of LLMs and LangChain

Unlock document insights. Learn how LLMs and LangChain can extract information and generate valuable content.

Yarnit Team | July 20, 2023 | AI Insights | 6 min read


Over the past few months, I’ve been captivated by the flood of apps claiming to be the ultimate “ChatGPT for your documents” on Product Hunt. The question that lingered in my mind was, “How do these apps actually work?” Curiosity led me down an exciting path of discovery, and I stumbled upon a framework that I think is revolutionizing the world of app development in the context of Large Language Models. It’s called LangChain.

As I delved deeper into the workings of LangChain, I discovered that creating such an app is not as daunting as it seems. In fact, it’s surprisingly achievable by combining three key workflows with the incredible power of the OpenAI API.

That being said, I humbly acknowledge that building a software application is a complex process, and, not being a software developer myself, I have only explored this at a surface level. The finer nuances of software development remain unexplored. Let me begin with the context of the technique and then introduce the application.

Decoding the technique

Document Embeddings — First things first, we need to convert our documents into something called “embeddings”. Think of it as a fancy way of representing our documents in a language that the computer can understand. We use the document loaders provided by LangChain to load the documents. Then, using an embeddings model, we convert the documents to vector embeddings. Once we have our embeddings, we store them in a vector store for future searches.

Below is an example of extracting a YouTube video transcript and storing it in a FAISS vector store.
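A minimal sketch of that workflow, assuming LangChain's YoutubeLoader, OpenAI embeddings, and a local FAISS index (the video URL is a placeholder):

```python
from langchain.document_loaders import YoutubeLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load the transcript of a YouTube video (placeholder URL)
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=<video_id>")
transcript = loader.load()

# Split the transcript into overlapping chunks that fit the model's context window
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(transcript)

# Embed the chunks and store them in a FAISS index for later similarity search
embeddings = OpenAIEmbeddings()  # expects OPENAI_API_KEY in the environment
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")
```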

Note: If the document is short, instead of creating embeddings we can pass the full text directly in the prompt, so the entire document serves as the context.

Context Establishment

Now, let’s talk about context. When a user asks a question, we need to understand the context and find the most relevant documents to provide accurate answers. We convert the user’s question into a vector representation (again using LangChain’s embeddings interface). Then, using this vector, we search through our library of embeddings and retrieve the most relevant documents.
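Continuing the sketch above, retrieval might look like this (the question text is illustrative):

```python
# Embed the user's question and fetch the chunks closest to it in vector space
question = "What is the main topic of the video?"
relevant_docs = vector_store.similarity_search(question, k=4)

# Concatenate the retrieved chunks into a single context string
context = "\n\n".join(doc.page_content for doc in relevant_docs)
```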

Language Models at Work 

Here comes the exciting part! We unleash the power of large language models. These super-smart models take the user’s question and the context we established earlier and generate precise answers. It’s like having a real conversation with our documents! These language models analyze the question, consider the context, and deliver the best response.
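Putting it together, here is a minimal sketch of this last step, reusing the context and question variables from above with LangChain's OpenAI wrapper (the prompt wording is just one way to do it):

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the context below. "
        "If the answer is not in the context, say that it is not "
        "provided in the document.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
)

llm = OpenAI(temperature=0)  # defaults to a GPT-3 completion model
answer = llm(prompt.format(context=context, question=question))
print(answer)
```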

A diagrammatic representation of the three workflows

Exploring the Development of an Application

With this grasp of the technique behind document-based conversations, let’s take a closer look at VIDIA.I — the app that brings this magic to life.

Overview

VIDIA.I integrates Streamlit, OpenAI API, and LangChain loaders and embeddings to deliver a seamless user experience. You will need an OpenAI API key to get started.
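I haven't inspected VIDIA.I's source in depth, but the scaffolding of a Streamlit app like this might look as follows (a hypothetical sketch, not VIDIA.I's actual code):

```python
import os
import streamlit as st

st.title("Chat with your documents")

# The user supplies their own OpenAI API key
api_key = st.sidebar.text_input("OpenAI API key", type="password")
if api_key:
    os.environ["OPENAI_API_KEY"] = api_key

# Accept an asset to process; other loaders handle links, audio, and more
uploaded_file = st.file_uploader("Upload a document", type=["pdf", "txt"])
```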

Asset Upload and Processing

VIDIA.I is designed to handle various asset types with ease. Whether it’s PDFs, web links, audio files, or more, you can upload them all to VIDIA.I for processing. For longer documents, embeddings are created.

Once uploaded, the app analyzes and processes these assets, enabling Q&A interactions, summarization, and extraction of valuable information. It’s like having your own document-savvy assistant right at your fingertips!
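As a sketch of how the routing might work, here is a hypothetical helper that maps asset types to LangChain loaders (an illustrative subset; VIDIA.I's actual handling may differ):

```python
from langchain.document_loaders import PyPDFLoader, WebBaseLoader, YoutubeLoader

def load_asset(source: str):
    """Pick a LangChain loader based on the asset type."""
    if source.lower().endswith(".pdf"):
        return PyPDFLoader(source).load()
    if "youtube.com" in source or "youtu.be" in source:
        return YoutubeLoader.from_youtube_url(source).load()
    # Fall back to treating the source as a web link
    return WebBaseLoader(source).load()
```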

Chat Away

Once the documents are uploaded and embeddings created (if needed), the chat window is enabled. You can ask questions related to the document.

Here, I have provided a link to a blog post on the OpenAI website. Let’s ask a few questions.

Let’s now ask a few general questions that aren’t part of the blog. VIDIA still answers the question but adds a disclaimer saying that the information is not provided in the document. Once we recognize that a question is out of context, we can choose how to handle it; one simple option is sketched below.
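One simple approach (a sketch, not necessarily what VIDIA.I does) is to threshold the retrieval scores: if even the closest chunks are far from the question, flag the answer as potentially ungrounded. FAISS returns L2 distances by default, so lower means more similar; the threshold below is illustrative.

```python
# Retrieve chunks together with their distance scores
docs_and_scores = vector_store.similarity_search_with_score(question, k=4)

# If nothing in the index is close to the question, warn the user
DISTANCE_THRESHOLD = 0.8  # illustrative value; tune per embedding model
if all(score > DISTANCE_THRESHOLD for _, score in docs_and_scores):
    print("Note: this answer may not be grounded in the uploaded document.")
```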

Document Summary, Questions, Talking Points

VIDIA also provides other information about the document that may be of interest, such as a summary, suggested questions, and talking points.

Try it out!

You can try out the app and play around with it here — https://abhinav-kimothi-vidia-i-srcmain-w0iybq.streamlit.app/

If you’re interested, check out the source code on GitHub.

Cautions

  1. The key to the success of a system like this is not the generative LLM itself but how well-curated context is passed to it.
  2. Storing the data and metadata correctly is critical.
  3. Using the right embedding model creates a world of difference.
  4. Using the right vector database to reduce the search space and enable efficient filtering can improve the system.
  5. Like everywhere else, the system can hallucinate. Beware!

Limitations of VIDIA

  1. VIDIA uses the GPT-3 text-davinci model and can be upgraded to newer models.
  2. It is not yet a chat system and does not remember previous responses. That capability can be built using the gpt-3.5/4 chat endpoints, as sketched after this list.
  3. It can’t handle multiple documents/entire websites, yet.
  4. CSV/excel/spreadsheets are out of scope for now.
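For reference, here is a minimal sketch of adding conversational memory with LangChain's ConversationalRetrievalChain and the gpt-3.5 chat endpoint, reusing the vector store from the earlier sketches:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Keep the chat history so follow-up questions have context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vector_store.as_retriever(),
    memory=memory,
)

result = chat_chain({"question": "Can you summarize the document?"})
print(result["answer"])
```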

Hope you find this useful. Do let me know what other techniques and applications you have come across for this use case.
