Module Reference
LDDashRagChatbot
A Dash-based chatbot UI backed by Google Gemini (via LangChain) with optional Retrieval-Augmented Generation (RAG) over an uploaded document.
The application supports uploading common document types (txt/md/csv/json/pdf/docx), building a per-session vector index (Chroma), and answering questions using a LangChain Expression Language (LCEL) RAG chain.
Warning
This implementation stores vector indexes in a process-level in-memory dict
(RAG_SESSIONS). This is suitable for local development and demos, but is
not safe for multi-worker deployments (e.g., Gunicorn with multiple workers)
without persistence or an external store.
Environment Variables
- GOOGLE_API_KEY (required): Google Gemini API key.
- GEMINI_MODEL (optional): Gemini chat model name. Default: gemini-2.5-flash-lite.
- EMBEDDING_MODEL (optional): Google embedding model. Default: models/text-embedding-004.
- TEMPERATURE (optional): LLM temperature. Default: 0.2.
- CHUNK_SIZE (optional): Chunk size for text splitting. Default: 1000.
- CHUNK_OVERLAP (optional): Chunk overlap for text splitting. Default: 150.
- TOP_K (optional): Number of retrieved chunks for RAG. Default: 5.
- MAX_TURNS (optional): Rolling window of chat turns kept in UI memory. Default: 12.
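These settings are typically read at import time with os.getenv. A minimal sketch of that pattern, assuming the documented names and defaults (the module's exact parsing may differ):

    import os

    GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]  # required; raises KeyError if unset
    GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite")
    EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "models/text-embedding-004")
    TEMPERATURE = float(os.getenv("TEMPERATURE", "0.2"))
    CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
    CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "150"))
    TOP_K = int(os.getenv("TOP_K", "5"))
    MAX_TURNS = int(os.getenv("MAX_TURNS", "12"))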
Running
Run locally:
python LDDashRagChatbot.py
Then open the local URL printed in the console by the Dash development server.
Created on 1/20/26 at 9:24 PM by yuvarajdurairaj. Module name: LDDashRagChatbot.
- LDDashRagChatbot.RAG_SESSIONS = {}
In-memory store mapping a Dash session id to its RAG index state.
Keys are session ids (strings). Values are dicts with at least:
- vs: Chroma vector store
- retriever: Retriever derived from the vector store
- filename: Uploaded filename
- doc_count: Number of source Documents produced by parsing
- LDDashRagChatbot.build_rag_chain(retriever)[source]
Build an LCEL Retrieval-Augmented Generation (RAG) chain.
The chain follows this conceptual structure:
1. Retrieve relevant documents for a question
2. Format documents into a promptable context string
3. Combine {context} and {question} into a chat prompt
4. Invoke the Gemini chat model
5. Parse the model output into a plain string
LCEL equivalent:
( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() )
- Parameters:
retriever – A LangChain retriever implementing the Runnable interface and returning a list of Document objects.
- Returns:
An LCEL runnable chain that accepts a question string and returns an answer string.
- Return type:
Runnable
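A minimal sketch of how such a chain can be assembled from the documented components; the prompt wording and model construction here are illustrative, not the module's exact code:

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    from langchain_google_genai import ChatGoogleGenerativeAI

    def build_rag_chain(retriever):
        # Illustrative prompt; the module's actual wording may differ.
        prompt = ChatPromptTemplate.from_template(
            "Answer the question using only the context below.\n\n"
            "Context:\n{context}\n\nQuestion: {question}"
        )
        llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", temperature=0.2)
        # format_docs is the module-level helper documented below.
        return (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | llm
            | StrOutputParser()
        )

The returned chain is then invoked with a plain question string, e.g. chain.invoke("What does the report conclude?").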
- LDDashRagChatbot.build_vectorstore(docs: List[Document], collection_name: str) Chroma[source]
Build a Chroma vector store from Documents.
Documents are chunked using the module-level splitter and embedded using the module-level Gemini embedding model.
- Parameters:
docs – Input Documents to index.
collection_name – Name of the Chroma collection.
- Returns:
A Chroma vector store containing the embedded chunks.
- Return type:
Chroma
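A sketch of this step with a recursive character splitter and Gemini embeddings, assuming the documented defaults (import paths vary slightly across LangChain versions):

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_chroma import Chroma

    # Illustrative module-level splitter and embeddings using the documented defaults.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

    def build_vectorstore(docs, collection_name):
        chunks = splitter.split_documents(docs)
        return Chroma.from_documents(
            documents=chunks,
            embedding=embeddings,
            collection_name=collection_name,
        )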
- LDDashRagChatbot.clear_attachment(n, session_id)[source]
Dash callback: clear the RAG index for the current session.
- Parameters:
n – Number of clicks on the “Clear attachment” button.
session_id – Per-browser session identifier.
- Returns:
A status message indicating no attachment is indexed.
- Return type:
str
- LDDashRagChatbot.decode_upload(contents: str) bytes[source]
Decode Dash dcc.Upload contents into raw bytes.
Dash upload contents are provided as a data URL: data:<mime>;base64,<payload>
- Parameters:
contents – The upload contents string from dcc.Upload (contents=...).
- Returns:
The decoded file contents.
- Return type:
bytes
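One way to implement this decoding, consistent with the data-URL format above:

    import base64

    def decode_upload(contents: str) -> bytes:
        # A dcc.Upload payload looks like "data:<mime>;base64,<payload>";
        # everything after the first comma is the base64-encoded file.
        _, b64_payload = contents.split(",", 1)
        return base64.b64decode(b64_payload)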
- LDDashRagChatbot.format_docs(docs: List[Document]) str[source]
Format retrieved documents into a single context string.
This function is intended for use in an LCEL chain as a post-processing step after retrieval. It includes basic source metadata when available.
- Parameters:
docs – A list of LangChain Document objects returned by a retriever.
- Returns:
A formatted context string suitable for insertion into a prompt. If no documents are provided, returns "NO_RELEVANT_CONTEXT".
- Return type:
str
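A sketch of the documented behavior; the exact separator and metadata layout are illustrative:

    from typing import List
    from langchain_core.documents import Document

    def format_docs(docs: List[Document]) -> str:
        if not docs:
            return "NO_RELEVANT_CONTEXT"
        parts = []
        for doc in docs:
            source = doc.metadata.get("source", "unknown")  # basic source metadata
            parts.append(f"[source: {source}]\n{doc.page_content}")
        return "\n\n".join(parts)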
- LDDashRagChatbot.load_documents_from_upload(contents: str, filename: str) List[Document][source]
Load an uploaded file into LangChain Documents based on extension.
Supported extensions:
- txt, md, csv -> single Document
- json -> single Document (pretty-printed)
- pdf -> one Document per page
- docx -> single Document
- Parameters:
contents – The Dash upload contents string from dcc.Upload.
filename – The uploaded filename (used to infer file type and stored as metadata).
- Returns:
Extracted Documents representing the file content.
- Return type:
list[Document]
- Raises:
ValueError – If the file extension is unsupported.
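An illustrative dispatch over the supported extensions, built from the other documented helpers (the exact control flow in the module may differ):

    import os
    from typing import List
    from langchain_core.documents import Document

    def load_documents_from_upload(contents: str, filename: str) -> List[Document]:
        raw = decode_upload(contents)
        ext = os.path.splitext(filename)[1].lower().lstrip(".")
        if ext in ("txt", "md", "csv"):
            return [Document(page_content=parse_txt_like(raw), metadata={"source": filename})]
        if ext == "json":
            return [Document(page_content=parse_json(raw), metadata={"source": filename})]
        if ext == "pdf":
            return parse_pdf(raw, filename)
        if ext == "docx":
            return parse_docx(raw, filename)
        raise ValueError(f"Unsupported file type: .{ext}")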
- LDDashRagChatbot.on_send_or_clear(send_clicks, clear_clicks, n_submit, user_text, history, session_id)[source]
Dash callback: send a user message (or clear chat) and return updated history.
If a RAG index exists for the current session, the callback uses the LCEL RAG chain to answer the user question. Otherwise it falls back to a plain LLM call.
- Parameters:
send_clicks – Number of clicks on the “Send” button.
clear_clicks – Number of clicks on the “Clear chat” button.
n_submit – dcc.Input submit count (triggered when pressing Enter).
user_text – Current text in the user input box.
history – Chat history stored in dcc.Store.
session_id – Per-browser session identifier.
- Returns:
Updated chat history and the cleared input box value.
- Return type:
tuple[list, str]
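The core branching might look like the following sketch; answer_question is a hypothetical helper used only to isolate that logic, and llm stands for the module-level Gemini chat model:

    def answer_question(question: str, session_id: str) -> str:
        # Use the session's RAG chain when an index exists,
        # otherwise fall back to a plain LLM call.
        state = RAG_SESSIONS.get(session_id)
        if state is not None:
            chain = build_rag_chain(state["retriever"])
            return chain.invoke(question)
        return llm.invoke(question).content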
- LDDashRagChatbot.on_upload(contents, filename, session_id)[source]
Dash callback: handle file upload and build the RAG vector index.
- Parameters:
contents – Dash upload contents string (base64 data URL).
filename – Uploaded filename.
session_id – Per-browser session identifier.
- Returns:
A status message describing index build success/failure.
- Return type:
str
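A sketch of the upload-indexing flow using the documented helpers; index_upload is a hypothetical name and the status message text is illustrative:

    def index_upload(contents, filename, session_id) -> str:
        try:
            docs = load_documents_from_upload(contents, filename)
            vs = build_vectorstore(docs, collection_name=f"rag_{session_id}")
            RAG_SESSIONS[session_id] = {
                "vs": vs,
                "retriever": vs.as_retriever(search_kwargs={"k": 5}),  # TOP_K default
                "filename": filename,
                "doc_count": len(docs),
            }
            return f"Indexed {filename} ({len(docs)} document(s))."
        except ValueError as exc:
            return f"Could not index {filename}: {exc}"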
- LDDashRagChatbot.parse_docx(raw: bytes, filename: str) List[Document][source]
Extract text from a DOCX and return it as a list of Documents.
The DOCX content is extracted from paragraphs, joined, and returned as a single Document (which will later be chunked by the text splitter).
- Parameters:
raw – Raw bytes of the DOCX file.
filename – The original filename, stored in Document.metadata['source'].
- Returns:
A single Document containing extracted DOCX text, or an empty list if no text was found.
- Return type:
list[Document]
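A sketch of this extraction assuming the python-docx package (the module may use a different DOCX reader):

    import io
    from typing import List
    from docx import Document as DocxDocument  # python-docx
    from langchain_core.documents import Document

    def parse_docx(raw: bytes, filename: str) -> List[Document]:
        docx = DocxDocument(io.BytesIO(raw))
        text = "\n".join(p.text for p in docx.paragraphs if p.text.strip())
        if not text:
            return []
        return [Document(page_content=text, metadata={"source": filename})]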
- LDDashRagChatbot.parse_json(raw: bytes) str[source]
Parse JSON bytes into a pretty-printed text representation.
If JSON parsing fails, falls back to text decoding.
- Parameters:
raw – Raw bytes for a JSON file.
- Returns:
Pretty-printed JSON string or decoded text fallback.
- Return type:
str
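A sketch of the documented parse-then-fallback behavior:

    import json

    def parse_json(raw: bytes) -> str:
        try:
            return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Fall back to plain text decoding when the payload is not valid JSON.
            return parse_txt_like(raw)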
- LDDashRagChatbot.parse_pdf(raw: bytes, filename: str) List[Document][source]
Extract text from a PDF and return it as a list of Documents.
Each PDF page becomes a separate Document. Pages with no extractable text are skipped.
- Parameters:
raw – Raw bytes of the PDF file.
filename – The original filename, stored in Document.metadata['source'].
- Returns:
One Document per page containing extractable text.
- Return type:
list[Document]
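A sketch of per-page extraction assuming the pypdf package (the module may use a different PDF reader; the page metadata key is illustrative):

    import io
    from typing import List
    from pypdf import PdfReader
    from langchain_core.documents import Document

    def parse_pdf(raw: bytes, filename: str) -> List[Document]:
        reader = PdfReader(io.BytesIO(raw))
        docs = []
        for page_number, page in enumerate(reader.pages, start=1):
            text = (page.extract_text() or "").strip()
            if not text:
                continue  # skip pages with no extractable text
            docs.append(Document(page_content=text,
                                 metadata={"source": filename, "page": page_number}))
        return docs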
- LDDashRagChatbot.parse_txt_like(raw: bytes) str[source]
Decode bytes into text for plain-text-like files.
Attempts UTF-8 first and falls back to Latin-1 with replacement.
- Parameters:
raw – Raw bytes for a text-like file.
- Returns:
Decoded text content.
- Return type:
str
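A sketch of the documented decode-with-fallback behavior:

    def parse_txt_like(raw: bytes) -> str:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte, so this fallback always yields text.
            return raw.decode("latin-1", errors="replace")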
- LDDashRagChatbot.render_chat(history)[source]
Render chat messages as simple “bubble” components.
- Parameters:
history – List of chat message dicts with keys role and content.
- Returns:
A list of Dash HTML components representing the chat history.
- Return type:
list
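A sketch of bubble rendering with Dash HTML components; the inline styling is illustrative, not the app's actual look:

    from dash import html

    def render_chat(history):
        bubbles = []
        for msg in history or []:
            is_user = msg["role"] == "user"
            bubbles.append(
                html.Div(
                    msg["content"],
                    style={
                        "textAlign": "right" if is_user else "left",
                        "padding": "8px",
                        "margin": "4px",
                        "borderRadius": "12px",
                        "backgroundColor": "#d1e7ff" if is_user else "#f1f1f1",
                    },
                )
            )
        return bubbles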
- LDDashRagChatbot.trim_history(history)[source]
Trim chat history to a rolling window of recent turns.
The UI stores messages as dicts of the form {"role": "...", "content": "..."}. This function keeps only the last MAX_TURNS * 2 entries (user + assistant).
- Parameters:
history – List of chat message dicts.
- Returns:
Trimmed history list.
- Return type:
list
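A sketch of the documented windowing behavior:

    MAX_TURNS = 12  # documented default; configurable via the MAX_TURNS env var

    def trim_history(history):
        history = history or []
        # Each turn is a user message plus an assistant message,
        # hence the factor of two.
        return history[-(MAX_TURNS * 2):]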