Module Reference
LDDashRagChatbot
A Dash-based chatbot UI backed by Google Gemini (via LangChain) with optional Retrieval-Augmented Generation (RAG) over an uploaded document.
The application supports uploading common document types (txt/md/csv/json/pdf/docx), building a per-session vector index (Chroma), and answering questions using a LangChain Expression Language (LCEL) RAG chain.
Warning
This implementation stores vector indexes in a process-level in-memory dict
(RAG_SESSIONS). This is suitable for local development and demos, but is
not safe for multi-worker deployments (e.g., Gunicorn with multiple workers)
without persistence or an external store.
Environment Variables
- GOOGLE_API_KEY (required): Google Gemini API key.
- GEMINI_MODEL (optional): Gemini chat model name. Default: gemini-2.5-flash-lite.
- EMBEDDING_MODEL (optional): Google embedding model. Default: models/text-embedding-004.
- TEMPERATURE (optional): LLM temperature. Default: 0.2.
- CHUNK_SIZE (optional): Chunk size for text splitting. Default: 1000.
- CHUNK_OVERLAP (optional): Chunk overlap for text splitting. Default: 150.
- TOP_K (optional): Number of retrieved chunks for RAG. Default: 5.
- MAX_TURNS (optional): Rolling window of chat turns kept in UI memory. Default: 12.
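These settings are typically read at import time with os.getenv. A minimal sketch of that pattern, assuming the documented names and defaults (the module's exact parsing may differ):

    import os

    GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]  # required; raises KeyError if unset
    GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite")
    EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "models/text-embedding-004")
    TEMPERATURE = float(os.getenv("TEMPERATURE", "0.2"))
    CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
    CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "150"))
    TOP_K = int(os.getenv("TOP_K", "5"))
    MAX_TURNS = int(os.getenv("MAX_TURNS", "12"))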
Running
Run locally:
python LDDashRagChatbot.py
Then open the local URL printed in the console by the Dash development server.
Created on 1/20/26 at 9:24 PM by yuvarajdurairaj. Module name: LDDashRagChatbot.
- LDDashRagChatbot.RAG_SESSIONS = {}
In-memory store mapping a Dash session id to its RAG index state.
Keys are session ids (strings). Values are dicts with at least:
- vs: Chroma vector store
- retriever: Retriever derived from the vector store
- filename: Uploaded filename
- doc_count: Number of source Documents produced by parsing
- LDDashRagChatbot.build_rag_chain(retriever)[source]
Build an LCEL Retrieval-Augmented Generation (RAG) chain.
The chain follows this conceptual structure:
1. Retrieve relevant documents for a question
2. Format documents into a promptable context string
3. Combine {context} and {question} into a chat prompt
4. Invoke the Gemini chat model
5. Parse the model output into a plain string
LCEL equivalent:
( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() )
- Parameters:
retriever – A LangChain retriever implementing the Runnable interface and returning a list of Document objects.
- Returns:
An LCEL runnable chain that accepts a question string and returns an answer string.
- Return type:
Runnable
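A minimal sketch of how such a chain can be assembled from the documented components; the prompt wording and model construction here are illustrative, not the module's exact code:

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    from langchain_google_genai import ChatGoogleGenerativeAI

    def build_rag_chain(retriever):
        # Illustrative prompt; the module's actual wording may differ.
        prompt = ChatPromptTemplate.from_template(
            "Answer the question using only the context below.\n\n"
            "Context:\n{context}\n\nQuestion: {question}"
        )
        llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=0.2)
        # format_docs is the module-level helper documented below.
        return (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | llm
            | StrOutputParser()
        )

The returned chain is then invoked with a plain question string, e.g. chain.invoke("What does the report conclude?").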
- LDDashRagChatbot.build_vectorstore(docs: List[Document], collection_name: str) Chroma[source]
Build a Chroma vector store from Documents.
Documents are chunked using the module-level splitter and embedded using the module-level Gemini embedding model.
- Parameters:
docs – Input Documents to index.
collection_name – Name of the Chroma collection.
- Returns:
A Chroma vector store containing the embedded chunks.
- Return type:
Chroma
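A sketch of this step with a recursive character splitter and Gemini embeddings, assuming the documented defaults (import paths vary slightly across LangChain versions):

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_chroma import Chroma

    # Illustrative module-level splitter and embeddings using the documented defaults.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

    def build_vectorstore(docs, collection_name):
        chunks = splitter.split_documents(docs)
        return Chroma.from_documents(
            documents=chunks,
            embedding=embeddings,
            collection_name=collection_name,
        )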
- LDDashRagChatbot.clear_attachment(n, session_id)[source]
Dash callback: clear the RAG index for the current session.
- Parameters:
n – Number of clicks on the “Clear attachment” button.
session_id – Per-browser session identifier.
- Returns:
A status message indicating no attachment is indexed.
- Return type:
str
- LDDashRagChatbot.decode_upload(contents: str) bytes[source]
Decode Dash dcc.Upload contents into raw bytes.
Dash upload contents are provided as a data URL: data:<mime>;base64,<payload>
- Parameters:
contents – The upload contents string from dcc.Upload (contents=...).
- Returns:
The decoded file contents.
- Return type:
bytes
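One way to implement this decoding, consistent with the data-URL format above:

    import base64

    def decode_upload(contents: str) -> bytes:
        # A dcc.Upload payload looks like "data:<mime>;base64,<payload>";
        # everything after the first comma is the base64-encoded file.
        _, b64_payload = contents.split(",", 1)
        return base64.b64decode(b64_payload)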
- LDDashRagChatbot.format_docs(docs: List[Document]) str[source]
Format retrieved documents into a single context string.
This function is intended for use in an LCEL chain as a post-processing step after retrieval. It includes basic source metadata when available.
- Parameters:
docs – A list of LangChain Document objects returned by a retriever.
- Returns:
A formatted context string suitable for insertion into a prompt. If no documents are provided, returns "NO_RELEVANT_CONTEXT".
- Return type:
str
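A sketch of the documented behavior; the exact separator and metadata layout are illustrative:

    from typing import List
    from langchain_core.documents import Document

    def format_docs(docs: List[Document]) -> str:
        if not docs:
            return "NO_RELEVANT_CONTEXT"
        parts = []
        for doc in docs:
            source = doc.metadata.get("source", "unknown")  # basic source metadata
            parts.append(f"[source: {source}]\n{doc.page_content}")
        return "\n\n".join(parts)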
- LDDashRagChatbot.load_documents_from_upload(contents: str, filename: str) List[Document][source]
Load an uploaded file into LangChain Documents based on extension.
Supported extensions:
- txt, md, csv -> single Document
- json -> single Document (pretty-printed)
- pdf -> one Document per page
- docx -> single Document
- Parameters:
contents – The Dash upload contents string from dcc.Upload.
filename – The uploaded filename (used to infer file type and stored as metadata).
- Returns:
Extracted Documents representing the file content.
- Return type:
list[Document]
- Raises:
ValueError – If the file extension is unsupported.
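An illustrative dispatch over the supported extensions, built from the other documented helpers (the exact control flow in the module may differ):

    import os
    from typing import List
    from langchain_core.documents import Document

    def load_documents_from_upload(contents: str, filename: str) -> List[Document]:
        raw = decode_upload(contents)
        ext = os.path.splitext(filename)[1].lower().lstrip(".")
        if ext in ("txt", "md", "csv"):
            return [Document(page_content=parse_txt_like(raw), metadata={"source": filename})]
        if ext == "json":
            return [Document(page_content=parse_json(raw), metadata={"source": filename})]
        if ext == "pdf":
            return parse_pdf(raw, filename)
        if ext == "docx":
            return parse_docx(raw, filename)
        raise ValueError(f"Unsupported file type: .{ext}")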
- LDDashRagChatbot.on_send_or_clear(send_clicks, clear_clicks, n_submit, user_text, history, session_id)[source]
Dash callback: send a user message (or clear chat) and return updated history.
If a RAG index exists for the current session, the callback uses the LCEL RAG chain to answer the user question. Otherwise it falls back to a plain LLM call.
- Parameters:
send_clicks – Number of clicks on the “Send” button.
clear_clicks – Number of clicks on the “Clear chat” button.
n_submit – dcc.Input submit count (triggered when pressing Enter).
user_text – Current text in the user input box.
history – Chat history stored in dcc.Store.
session_id – Per-browser session identifier.
- Returns:
Updated chat history and the cleared input box value.
- Return type:
tuple[list, str]
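The core branching might look like the following sketch; answer_question is a hypothetical helper used only to isolate that logic, and llm stands for the module-level Gemini chat model:

    def answer_question(question: str, session_id: str) -> str:
        # Use the session's RAG chain when an index exists,
        # otherwise fall back to a plain LLM call.
        state = RAG_SESSIONS.get(session_id)
        if state is not None:
            chain = build_rag_chain(state["retriever"])
            return chain.invoke(question)
        return llm.invoke(question).content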
- LDDashRagChatbot.on_upload(contents, filename, session_id)[source]
Dash callback: handle file upload and build the RAG vector index.
- Parameters:
contents – Dash upload contents string (base64 data URL).
filename – Uploaded filename.
session_id – Per-browser session identifier.
- Returns:
A status message describing index build success/failure.
- Return type:
str
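A sketch of the upload-indexing flow using the documented helpers; index_upload is a hypothetical name and the status message text is illustrative:

    def index_upload(contents, filename, session_id) -> str:
        try:
            docs = load_documents_from_upload(contents, filename)
            vs = build_vectorstore(docs, collection_name=f"rag_{session_id}")
            RAG_SESSIONS[session_id] = {
                "vs": vs,
                "retriever": vs.as_retriever(search_kwargs={"k": 5}),  # TOP_K default
                "filename": filename,
                "doc_count": len(docs),
            }
            return f"Indexed {filename} ({len(docs)} document(s))."
        except ValueError as exc:
            return f"Could not index {filename}: {exc}"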
- LDDashRagChatbot.parse_docx(raw: bytes, filename: str) List[Document][source]
Extract text from a DOCX and return it as a list of Documents.
The DOCX content is extracted from paragraphs, joined, and returned as a single Document (which will later be chunked by the text splitter).
- Parameters:
raw – Raw bytes of the DOCX file.
filename – The original filename, stored in Document.metadata['source'].
- Returns:
A single Document containing extracted DOCX text, or an empty list if no text was found.
- Return type:
list[Document]
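A sketch of this extraction assuming the python-docx package (the module may use a different DOCX reader):

    import io
    from typing import List
    from docx import Document as DocxDocument  # python-docx
    from langchain_core.documents import Document

    def parse_docx(raw: bytes, filename: str) -> List[Document]:
        docx = DocxDocument(io.BytesIO(raw))
        text = "\n".join(p.text for p in docx.paragraphs if p.text.strip())
        if not text:
            return []
        return [Document(page_content=text, metadata={"source": filename})]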
- LDDashRagChatbot.parse_json(raw: bytes) str[source]
Parse JSON bytes into a pretty-printed text representation.
If JSON parsing fails, falls back to text decoding.
- Parameters:
raw – Raw bytes for a JSON file.
- Returns:
Pretty-printed JSON string or decoded text fallback.
- Return type:
str
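A sketch of the documented parse-then-fallback behavior:

    import json

    def parse_json(raw: bytes) -> str:
        try:
            return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Fall back to plain text decoding when the payload is not valid JSON.
            return parse_txt_like(raw)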
- LDDashRagChatbot.parse_pdf(raw: bytes, filename: str) List[Document][source]
Extract text from a PDF and return it as a list of Documents.
Each PDF page becomes a separate Document. Pages with no extractable text are skipped.
- Parameters:
raw – Raw bytes of the PDF file.
filename – The original filename, stored in Document.metadata['source'].
- Returns:
One Document per page containing extractable text.
- Return type:
list[Document]
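A sketch of per-page extraction assuming the pypdf package (the module may use a different PDF reader; the page metadata key is illustrative):

    import io
    from typing import List
    from pypdf import PdfReader
    from langchain_core.documents import Document

    def parse_pdf(raw: bytes, filename: str) -> List[Document]:
        reader = PdfReader(io.BytesIO(raw))
        docs = []
        for page_number, page in enumerate(reader.pages, start=1):
            text = (page.extract_text() or "").strip()
            if not text:
                continue  # skip pages with no extractable text
            docs.append(Document(page_content=text,
                                 metadata={"source": filename, "page": page_number}))
        return docs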
- LDDashRagChatbot.parse_txt_like(raw: bytes) str[source]
Decode bytes into text for plain-text-like files.
Attempts UTF-8 first and falls back to Latin-1 with replacement.
- Parameters:
raw – Raw bytes for a text-like file.
- Returns:
Decoded text content.
- Return type:
str
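A sketch of the documented decode-with-fallback behavior:

    def parse_txt_like(raw: bytes) -> str:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte, so this fallback always yields text.
            return raw.decode("latin-1", errors="replace")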
- LDDashRagChatbot.render_chat(history)[source]
Render chat messages as simple “bubble” components.
- Parameters:
history – List of chat message dicts with keys role and content.
- Returns:
A list of Dash HTML components representing the chat history.
- Return type:
list
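A sketch of bubble rendering with Dash HTML components; the inline styling is illustrative, not the app's actual look:

    from dash import html

    def render_chat(history):
        bubbles = []
        for msg in history or []:
            is_user = msg["role"] == "user"
            bubbles.append(
                html.Div(
                    msg["content"],
                    style={
                        "textAlign": "right" if is_user else "left",
                        "padding": "8px",
                        "margin": "4px",
                        "borderRadius": "12px",
                        "backgroundColor": "#d1e7ff" if is_user else "#f1f1f1",
                    },
                )
            )
        return bubbles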
- LDDashRagChatbot.trim_history(history)[source]
Trim chat history to a rolling window of recent turns.
The UI stores messages as dicts of the form {"role": "...", "content": "..."}. This function keeps only the last MAX_TURNS * 2 entries (user + assistant).
- Parameters:
history – List of chat message dicts.
- Returns:
Trimmed history list.
- Return type:
list
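A sketch of the documented windowing behavior:

    MAX_TURNS = 12  # documented default; configurable via the MAX_TURNS env var

    def trim_history(history):
        history = history or []
        # Each turn is a user message plus an assistant message,
        # hence the factor of two.
        return history[-(MAX_TURNS * 2):]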