Natural Language Processing (Part 4)

LangChain (RAG)


LangChain is a framework for building applications on top of large language models (LLMs).

I used LangChain to create a medical report application.


Streamlit is an open-source Python library used to create web applications for data science and machine learning projects.

I used Streamlit to create the application's user interface.

hello world streamlit app

OpenAI's API

I updated my hello world application to call OpenAI's API.

I had to request a secret API key.

secret key
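A secret key is usually kept out of the source code. The sketch below shows one common way to do that: reading the key from an environment variable. The variable name OPENAI_API_KEY is the conventional one, and the helper load_api_key is a hypothetical name used only for illustration; neither is taken from the original application.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Assumption: the key was exported beforehand, for example with
    `export OPENAI_API_KEY=...` in the shell that starts the app.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

Keeping the key in the environment means it never ends up in a notebook or a code repository.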

Using LangChain to call OpenAI's API.

Document pages

I used LangChain to return the text found on each page of a sample medical report.

hp4.pdf medical report

Jupyter Notebook

To improve performance and reduce costs, I pre-processed the PDF file and stored the result.

I created a Jupyter Notebook to keep track of the steps I followed.

Running jupyter-notebook locally

I loaded pages from the medical report.


I broke the pages into paragraphs (texts).

[2] & [3]
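The splitting step can be illustrated with a minimal sketch that uses only the standard library. It assumes paragraphs are separated by blank lines, which may not hold for every PDF; the notebook itself may have used a LangChain text splitter instead. The function name split_into_paragraphs and the sample text are made up for illustration and are not taken from the report.

```python
import re


def split_into_paragraphs(page_text: str) -> list[str]:
    """Split one page of text into paragraphs at blank lines."""
    # One or more blank lines (possibly containing spaces) end a paragraph.
    parts = re.split(r"\n\s*\n", page_text)
    paragraphs = []
    for part in parts:
        # Collapse line breaks and repeated spaces inside a paragraph.
        cleaned = " ".join(part.split())
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs


# Made-up sample page, not text from the actual report.
sample_page = "Social history\n\nThe patient smokes ten\ncigarettes a day.\n\n\n\nNo known allergies."
paragraphs = split_into_paragraphs(sample_page)
```

Smaller pieces of text tend to give more focused matches when they are later compared with a query.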


I used Chroma to create an embeddings vector store and saved the store locally.

I used OpenAI's API to generate the embeddings.


I converted the query 'Does the patient smoke?' to an embedding and compared the result with the embeddings in the vector store.
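Comparing a query embedding with stored embeddings comes down to scoring how close two vectors are. The sketch below uses cosine similarity, which is one common choice; the metric the vector store actually used is not stated here, so treat that as an assumption. The three-dimensional vectors are toy values, since real embeddings have far more dimensions, and the names cosine_similarity and most_similar are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


def most_similar(query_vec, store, k=2):
    """Return the k texts whose vectors are closest to the query vector.

    `store` is a list of (text, vector) pairs standing in for a vector store.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy store with made-up vectors.
toy_store = [
    ("smoking paragraph", [1.0, 0.0, 0.0]),
    ("allergy paragraph", [0.0, 1.0, 0.0]),
    ("tobacco paragraph", [0.9, 0.1, 0.0]),
]
matches = most_similar([1.0, 0.0, 0.0], toy_store, k=2)
```

A vector store performs the same kind of ranking, with indexing so that it stays fast on large collections.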


I ensured that the embeddings vector store could be loaded from the local folder.


I located the most similar paragraphs and sent them, together with the query, to OpenAI's servers (see Context Injection).
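Context injection means placing the retrieved paragraphs in the prompt next to the question. A minimal sketch of that step follows. The prompt wording and the function name build_prompt are assumptions made for illustration; LangChain ships its own prompt templates, and the original application may have relied on those.

```python
def build_prompt(question: str, paragraphs: list[str]) -> str:
    """Combine retrieved paragraphs and a question into a single prompt."""
    # Number each paragraph so the context is easy to read and refer to.
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(paragraphs, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder paragraphs, not text from the actual report.
prompt = build_prompt(
    "Does the patient smoke?",
    ["First retrieved paragraph.", "Second retrieved paragraph."],
)
```

Sending only the most similar paragraphs keeps the prompt short, which fits the goal of reducing cost.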


The streamlit code

The finished application