GitLab Duo Chat: Expanded GitLab Documentation Context
Problem
GitLab Duo Chat currently has limited contextual awareness of documentation sources outside the core GitLab repository. According to #508285, Duo Chat indexes only the documentation in the main GitLab repository (/gitlab-org/gitlab/-/tree/master/doc) and excludes documentation for other deployed services such as Gitaly, GitLab Runner, and other satellite services.
This limitation creates several challenges:
- Incomplete knowledge base: Users expect Duo Chat to be aware of all documentation related to GitLab, including satellite services and tools that belong to the GitLab ecosystem but are maintained in separate repositories.
- Inaccurate or incomplete responses: When users ask about topics covered only in documentation outside the main repository, Duo Chat cannot give accurate or complete answers, which frustrates users and diminishes the product's value.
- Inconsistent user experience: Users expect an "all-knowing chat" (epic #14085, closed) that understands everything they can see in GitLab's ecosystem. The current limitation causes confusion: Duo Chat answers some documentation questions but not others, and nothing outside the documentation itself tells users where these boundaries lie.
- Limited context for evaluation: Because test data is generated from the indexed documentation, our evaluation does not capture this blind spot (#508285) and may overestimate performance.
What this is not about
Customers have requested that Duo Chat answer questions about their own documentation. That is a separate problem, tracked in #517943.
Proposal
Expand GitLab Duo Chat's documentation context to include all official GitLab documentation sources, not just the main repository, and optionally allow customers to include their own documentation sources. This would create a more comprehensive knowledge base for Duo Chat to draw upon when answering user questions.
- Index additional documentation sources:
  - As proposed in this comment, include all five official GitLab documentation repositories listed in the documentation and in https://gitlab.com/gitlab-org/gitlab-docs/-/blob/main/nanoc.yaml?ref_type=heads#L43:
    - GitLab: https://gitlab.com/gitlab-org/gitlab/-/tree/master/doc/
    - Runner: https://gitlab.com/gitlab-org/gitlab-runner/-/tree/main/docs/
    - Omnibus GitLab: https://gitlab.com/gitlab-org/omnibus-gitlab/-/tree/master/doc/
    - Charts: https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/doc/
    - GitLab Development Kit: https://gitlab.com/gitlab-org/gitlab-development-kit/-/tree/main/doc/
  - Include additional satellite service documentation:
    - Gitaly: https://gitlab.com/gitlab-org/gitaly/-/tree/master/doc/
    - GitLab Shell: https://gitlab.com/gitlab-org/gitlab-shell/-/tree/main/doc/
    - GitLab Pages: https://gitlab.com/gitlab-org/gitlab-pages/-/tree/master/doc/
    - GitLab Workhorse: https://gitlab.com/gitlab-org/gitlab-workhorse/-/tree/master/doc/
    - GitLab Elasticsearch Indexer: https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer/-/tree/master/doc/ (if available)
  - Include the Support Knowledge Base: https://support.gitlab.com/hc/en-us/sections/15215649512604-Knowledge-Base (referenced in this comment)
  - Create a consistent update mechanism to keep all documentation sources current.
- Versioning considerations:
  - Versioning is already handled for the main GitLab documentation, but the additional sources may follow different versioning schemes.
  - Ensure a consistent versioning approach across all documentation sources.
  - Map satellite service documentation versions to GitLab's main version numbers.
- Audience-aware documentation context:
  - The listed documentation sources serve different audiences and purposes:
    - Using GitLab and its services
    - Contributing to GitLab
  - Ensure that the solution distinguishes the user's goal and presents only relevant answers.
  - If that is not feasible, include in the RAG index only the documentation that addresses user needs, and leave out documentation aimed at contributors.
- Technical implementation:
  - Continue using the existing Google Vertex AI Search system.
  - Add the additional documentation sources to the indexing pipeline.
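The source list, versioning, and audience considerations above could be captured in a small source registry. The following is a minimal Python sketch under stated assumptions: `DocSource`, `Audience`, `version_map`, `sources_for`, and `stale_sources` are hypothetical names, not part of GitLab or of the Vertex AI Search API, and the update check assumes staleness is detected by comparing each repository's latest commit SHA with the SHA recorded at the last indexing run. Audience tags are assigned per repository as an assumption, and no version mappings are filled in.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Audience(Enum):
    USER = "user"                # docs about using GitLab and its services
    CONTRIBUTOR = "contributor"  # docs about contributing to GitLab


@dataclass
class DocSource:
    name: str
    repo_url: str
    audience: Audience
    # Hypothetical mapping from a GitLab "major.minor" version to the version
    # of this source's docs. None means the mapping has not been defined yet.
    version_map: Optional[Dict[str, str]] = None

    def version_for(self, gitlab_version: str) -> Optional[str]:
        """Return the docs version matching a GitLab version, or None if unmapped."""
        return (self.version_map or {}).get(gitlab_version)


# The five official repositories from the proposal. Audience tags are
# assumptions made per repository; version maps are intentionally left unset.
SOURCES: List[DocSource] = [
    DocSource("GitLab", "https://gitlab.com/gitlab-org/gitlab/-/tree/master/doc/", Audience.USER),
    DocSource("Runner", "https://gitlab.com/gitlab-org/gitlab-runner/-/tree/main/docs/", Audience.USER),
    DocSource("Omnibus GitLab", "https://gitlab.com/gitlab-org/omnibus-gitlab/-/tree/master/doc/", Audience.USER),
    DocSource("Charts", "https://gitlab.com/gitlab-org/charts/gitlab/-/tree/master/doc/", Audience.USER),
    DocSource(
        "GitLab Development Kit",
        "https://gitlab.com/gitlab-org/gitlab-development-kit/-/tree/main/doc/",
        Audience.CONTRIBUTOR,
    ),
]


def sources_for(audience: Audience, sources: List[DocSource] = SOURCES) -> List[DocSource]:
    """Fallback option: keep only the sources aimed at one audience."""
    return [s for s in sources if s.audience is audience]


def stale_sources(indexed: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return, sorted, the names of sources whose current commit SHA differs
    from the SHA recorded at the last indexing run (or that were never indexed)."""
    return sorted(name for name, sha in current.items() if indexed.get(name) != sha)
```

Tagging whole repositories is the simplest choice, but pages under `doc/development/` in the main repository target contributors, so the audience-aware option would likely need per-page tags. Similarly, the Helm charts publish their own release numbers, which is the kind of difference `version_map` is meant to capture.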
Benefits
- Enhanced user experience: Users will receive more comprehensive and accurate answers to their documentation questions about all GitLab components, not just the core product.
- Improved context awareness: Duo Chat will better meet user expectations of having knowledge about the entire GitLab ecosystem, including satellite services and supporting components.
- Better support for contributors: As noted in #514576, this will help answer questions from contributors about how to work with different GitLab components.
- Competitive advantage: This feature would differentiate GitLab Duo from competitors by providing deeper integration with the entire GitLab ecosystem.
References
- Issue #508285: "Duo Documentation does not index documentation in different repositories"
- Issue #517943: "Product Strategy: Duo Chat to support questions about customer docs or 3rd party docs"
- Issue #514576: "Teach Duo Chat to answer questions on how to contribute"
- Epic #14085 (closed): "Meet user expectation that Duo Chat should know everything the user sees and beyond"