Module Reference

LDDashRagChatbot

A Dash-based chatbot UI backed by Google Gemini (via LangChain) with optional Retrieval-Augmented Generation (RAG) over an uploaded document.

The application supports uploading common document types (txt/md/csv/json/pdf/docx), building a per-session vector index (Chroma), and answering questions using a LangChain Expression Language (LCEL) RAG chain.

Warning

This implementation stores vector indexes in a process-level in-memory dict (RAG_SESSIONS). This is suitable for local development and demos, but is not safe for multi-worker deployments (e.g., Gunicorn with multiple workers) without persistence or an external store.

Environment Variables

  • GOOGLE_API_KEY (required): Google Gemini API key.

  • GEMINI_MODEL (optional): Gemini chat model name. Default: gemini-2.5-flash-lite.

  • EMBEDDING_MODEL (optional): Google embedding model. Default: models/text-embedding-004.

  • TEMPERATURE (optional): LLM temperature. Default: 0.2.

  • CHUNK_SIZE (optional): Chunk size for text splitting. Default: 1000.

  • CHUNK_OVERLAP (optional): Chunk overlap for text splitting. Default: 150.

  • TOP_K (optional): Number of retrieved chunks for RAG. Default: 5.

  • MAX_TURNS (optional): Rolling window of chat turns kept in UI memory. Default: 12.
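For example, a local run might be configured like this (all values below are illustrative, not recommendations):

```shell
# Illustrative local configuration; only GOOGLE_API_KEY is required.
export GOOGLE_API_KEY="your-key-here"
export GEMINI_MODEL="gemini-2.5-flash-lite"
export TEMPERATURE="0.2"
export TOP_K="5"

python LDDashRagChatbot.py
```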

Running

Run locally:

python LDDashRagChatbot.py

Then open the URL printed to the console (Dash serves on http://127.0.0.1:8050 by default).

Created on 1/20/26 at 9:24 PM by yuvarajdurairaj. Module: LDDashRagChatbot.

LDDashRagChatbot.RAG_SESSIONS = {}

In-memory store mapping a Dash session id to its RAG index state.

Keys are session ids (strings). Values are dicts with at least:

  • vs: Chroma vector store

  • retriever: Retriever derived from the vector store

  • filename: Uploaded filename

  • doc_count: Number of source Documents produced by parsing
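A sketch of this shape, with a hypothetical `register_index` helper (not part of the module) and placeholder strings standing in for the live Chroma and retriever objects:

```python
# Illustration of the RAG_SESSIONS shape. The real values hold live
# Chroma/retriever objects; placeholders are used here instead.
RAG_SESSIONS = {}

def register_index(session_id, vs, retriever, filename, doc_count):
    # Store (or replace) the index state for one browser session.
    RAG_SESSIONS[session_id] = {
        "vs": vs,
        "retriever": retriever,
        "filename": filename,
        "doc_count": doc_count,
    }

register_index("abc-123", "<Chroma>", "<retriever>", "report.pdf", 12)
state = RAG_SESSIONS["abc-123"]
```

Because the store is a plain module-level dict, each worker process has its own copy, which is why multi-worker deployments need an external store.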

LDDashRagChatbot.build_rag_chain(retriever)[source]

Build an LCEL Retrieval-Augmented Generation (RAG) chain.

The chain follows this conceptual structure:

  • Retrieve relevant documents for a question

  • Format documents into a promptable context string

  • Combine {context} and {question} into a chat prompt

  • Invoke the Gemini chat model

  • Parse the model output into a plain string

LCEL equivalent:

(
  {"context": retriever | format_docs, "question": RunnablePassthrough()}
  | prompt
  | llm
  | StrOutputParser()
)
Parameters:

retriever – A LangChain retriever implementing the Runnable interface and returning a list of Document objects.

Returns:

An LCEL runnable chain that accepts a question string and returns an answer string.

Return type:

Runnable
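The LCEL composition above can be approximated with plain functions to make the data flow explicit. Everything below is a hypothetical stand-in: the stub retriever, prompt builder, and `fake_llm` replace the module's real Chroma retriever, chat prompt, and Gemini model.

```python
# Pure-Python sketch of the RAG chain's data flow (stubs, not the real chain).
def retrieve(question):
    # Real code: retriever.invoke(question) -> list[Document]
    return [{"page_content": "Paris is the capital of France."}]

def join_docs(docs):
    # Collapse retrieved chunks into one context string.
    return "\n\n".join(d["page_content"] for d in docs)

def build_prompt(context, question):
    return f"Context:\n{context}\n\nQuestion: {question}"

def fake_llm(prompt):
    # Real code: the Gemini chat model invoked via LangChain.
    return "Paris"

def rag_chain(question):
    # Mirrors: {"context": retriever | format_docs, "question": passthrough}
    #          | prompt | llm | StrOutputParser()
    context = join_docs(retrieve(question))
    return fake_llm(build_prompt(context, question))

answer = rag_chain("What is the capital of France?")
```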

LDDashRagChatbot.build_vectorstore(docs: List[Document], collection_name: str) → Chroma[source]

Build a Chroma vector store from Documents.

Documents are chunked using the module-level splitter and embedded using the module-level Gemini embedding model.

Parameters:
  • docs – Input Documents to index.

  • collection_name – Name of the Chroma collection.

Returns:

A Chroma vector store containing the embedded chunks.

Return type:

Chroma

LDDashRagChatbot.clear_attachment(n, session_id)[source]

Dash callback: clear the RAG index for the current session.

Parameters:
  • n – Number of clicks on the “Clear attachment” button.

  • session_id – Per-browser session identifier.

Returns:

A status message indicating no attachment is indexed.

Return type:

str

LDDashRagChatbot.decode_upload(contents: str) → bytes[source]

Decode Dash dcc.Upload contents into raw bytes.

Dash upload contents are provided as a data URL:

data:<mime>;base64,<payload>

Parameters:

contents – The upload contents string from dcc.Upload(contents=...).

Returns:

The decoded file contents.

Return type:

bytes
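A minimal sketch of this decoding step, assuming only the documented data-URL format:

```python
import base64

def decode_upload(contents: str) -> bytes:
    # Split "data:<mime>;base64,<payload>" on the first comma
    # and base64-decode the payload.
    _header, payload = contents.split(",", 1)
    return base64.b64decode(payload)

raw = decode_upload("data:text/plain;base64," + base64.b64encode(b"hello").decode())
```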

LDDashRagChatbot.format_docs(docs: List[Document]) → str[source]

Format retrieved documents into a single context string.

This function is intended for use in an LCEL chain as a post-processing step after retrieval. It includes basic source metadata when available.

Parameters:

docs – A list of LangChain Document objects returned by a retriever.

Returns:

A formatted context string suitable for insertion into a prompt. If no documents are provided, returns "NO_RELEVANT_CONTEXT".

Return type:

str
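A hedged sketch of the formatting logic. A tiny `Doc` class mimics the `page_content`/`metadata` attributes of a LangChain `Document`; the exact header format used for source metadata is an assumption.

```python
# Sketch of format_docs; Doc is a stand-in for langchain_core's Document.
class Doc:
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

def format_docs(docs):
    # Empty retrieval results map to a sentinel the prompt can react to.
    if not docs:
        return "NO_RELEVANT_CONTEXT"
    parts = []
    for d in docs:
        source = d.metadata.get("source")
        header = f"[source: {source}]\n" if source else ""
        parts.append(header + d.page_content)
    return "\n\n".join(parts)

ctx = format_docs([Doc("Chunk one.", {"source": "notes.md"}), Doc("Chunk two.")])
empty = format_docs([])
```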

LDDashRagChatbot.load_documents_from_upload(contents: str, filename: str) → List[Document][source]

Load an uploaded file into LangChain Documents based on extension.

Supported extensions:

  • txt, md, csv -> single Document

  • json -> single Document (pretty-printed)

  • pdf -> one Document per page

  • docx -> single Document

Parameters:
  • contents – The Dash upload contents string from dcc.Upload.

  • filename – The uploaded filename (used to infer file type and stored as metadata).

Returns:

Extracted Documents representing the file content.

Return type:

list[Document]

Raises:

ValueError – If the file extension is unsupported.
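The extension dispatch can be sketched as follows. The `infer_parser` helper and its return labels are hypothetical illustrations of the documented branching, not the module's actual code:

```python
# Hypothetical dispatch sketch mirroring the documented extension handling.
SUPPORTED = {"txt", "md", "csv", "json", "pdf", "docx"}

def infer_parser(filename: str) -> str:
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported file extension: {ext}")
    if ext in {"txt", "md", "csv"}:
        return "text"   # single Document
    if ext == "json":
        return "json"   # single pretty-printed Document
    if ext == "pdf":
        return "pdf"    # one Document per page
    return "docx"       # single Document

kind = infer_parser("report.PDF")
```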

LDDashRagChatbot.on_send_or_clear(send_clicks, clear_clicks, n_submit, user_text, history, session_id)[source]

Dash callback: send a user message (or clear chat) and return updated history.

If a RAG index exists for the current session, the callback uses the LCEL RAG chain to answer the user question. Otherwise it falls back to a plain LLM call.

Parameters:
  • send_clicks – Number of clicks on the “Send” button.

  • clear_clicks – Number of clicks on the “Clear chat” button.

  • n_submit – dcc.Input submit count (triggered when pressing Enter).

  • user_text – Current text in the user input box.

  • history – Chat history stored in dcc.Store.

  • session_id – Per-browser session identifier.

Returns:

Updated chat history and the cleared input box value.

Return type:

tuple[list, str]

LDDashRagChatbot.on_upload(contents, filename, session_id)[source]

Dash callback: handle file upload and build the RAG vector index.

Parameters:
  • contents – Dash upload contents string (base64 data URL).

  • filename – Uploaded filename.

  • session_id – Per-browser session identifier.

Returns:

A status message describing index build success/failure.

Return type:

str

LDDashRagChatbot.parse_docx(raw: bytes, filename: str) → List[Document][source]

Extract text from a DOCX and return it as a list of Documents.

The DOCX content is extracted from paragraphs, joined, and returned as a single Document (which will later be chunked by the text splitter).

Parameters:
  • raw – Raw bytes of the DOCX file.

  • filename – The original filename, stored in Document.metadata['source'].

Returns:

A single Document containing extracted DOCX text, or an empty list if no text was found.

Return type:

list[Document]

LDDashRagChatbot.parse_json(raw: bytes) → str[source]

Parse JSON bytes into a pretty-printed text representation.

If JSON parsing fails, falls back to text decoding.

Parameters:

raw – Raw bytes for a JSON file.

Returns:

Pretty-printed JSON string or decoded text fallback.

Return type:

str
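A minimal sketch of this parse-or-fallback behavior (the indent width and decode error handling are assumptions):

```python
import json

def parse_json(raw: bytes) -> str:
    # Pretty-print valid JSON; fall back to plain text decoding otherwise.
    try:
        return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)
    except json.JSONDecodeError:
        return raw.decode("utf-8", errors="replace")

pretty = parse_json(b'{"a": 1}')
fallback = parse_json(b"not json")
```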

LDDashRagChatbot.parse_pdf(raw: bytes, filename: str) → List[Document][source]

Extract text from a PDF and return it as a list of Documents.

Each PDF page becomes a separate Document. Pages with no extractable text are skipped.

Parameters:
  • raw – Raw bytes of the PDF file.

  • filename – The original filename, stored in Document.metadata['source'].

Returns:

One Document per page containing extractable text.

Return type:

list[Document]

LDDashRagChatbot.parse_txt_like(raw: bytes) → str[source]

Decode bytes into text for plain-text-like files.

Attempts UTF-8 first and falls back to Latin-1 with replacement.

Parameters:

raw – Raw bytes for a text-like file.

Returns:

Decoded text content.

Return type:

str
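The documented decode strategy can be sketched like this:

```python
def parse_txt_like(raw: bytes) -> str:
    # Try strict UTF-8 first, then Latin-1 with replacement as a fallback.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1", errors="replace")

utf8_text = parse_txt_like("café".encode("utf-8"))
fallback_text = parse_txt_like(b"\xff caf\xe9")  # not valid UTF-8
```

Latin-1 maps every byte to a character, so the fallback always yields some text rather than raising.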

LDDashRagChatbot.render_chat(history)[source]

Render chat messages as simple “bubble” components.

Parameters:

history – List of chat message dicts with keys role and content.

Returns:

A list of Dash HTML components representing the chat history.

Return type:

list

LDDashRagChatbot.trim_history(history)[source]

Trim chat history to a rolling window of recent turns.

The UI stores messages as dicts of the form {"role": "...", "content": "..."}. This function keeps only the last MAX_TURNS * 2 entries (user+assistant).

Parameters:

history – List of chat message dicts.

Returns:

Trimmed history list.

Return type:

list
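The windowing logic amounts to a single slice; a sketch assuming the documented default of MAX_TURNS = 12:

```python
MAX_TURNS = 12  # documented default

def trim_history(history):
    # Keep only the most recent MAX_TURNS user+assistant pairs.
    return history[-(MAX_TURNS * 2):]

long_history = [{"role": "user", "content": str(i)} for i in range(40)]
trimmed = trim_history(long_history)
```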

LDDashRagChatbot.update_chat_window(history)[source]

Dash callback: render the chat window from stored history.

Parameters:

history – Chat history stored in dcc.Store.

Returns:

Rendered chat components.

Return type:

list