Handle Vertex TokenLimits error in ActiveContext
What does this MR do and why?
As part of Semantic search: chat with your codebase (&16910), the `Ai::ActiveContext::BulkProcessWorker` generates embeddings for code snippets using `Gitlab::Llm::VertexAi::Embeddings::Text`. The embeddings generation is done as a bulk request, meaning that a single embeddings generation request can contain multiple inputs.
This can result in a "token limits exceeded" error in the call to the Vertex text embedding model.
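For a rough sense of why a bulk request overflows the limit, here is an illustrative back-of-the-envelope calculation. All numbers below are assumptions for illustration only, not Vertex AI's actual limits:

```ruby
# All values here are assumed for illustration; they are NOT the real limits.
token_limit_per_request = 20_000
avg_tokens_per_snippet  = 200
batch_size              = 250

total_tokens = avg_tokens_per_snippet * batch_size
puts total_tokens                            # => 50000
puts total_tokens > token_limit_per_request  # => true

# A safe batch size derived from the average token count (solution step 1):
safe_batch = token_limit_per_request / avg_tokens_per_snippet
puts safe_batch # => 100
```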
Call stack
1. `Ai::ActiveContext::BulkProcessWorker` runs every minute and processes jobs in any `ActiveContext` queue. For this specific scenario, we are concerned with the `Ai::ActiveContext::Queues::Code` queue.
2. The `BulkProcessWorker` processes a batch of `Ai::ActiveContext::References::Code`.
3. `Ai::ActiveContext::References::Code` calls `ActiveContext::Preprocessors::Embeddings::apply_embeddings`.
4. `ActiveContext::Preprocessors::Embeddings` calls `Ai::ActiveContext::Embeddings::Code::VertexText.generate_embeddings`. (It determines the correct embeddings generation class to call via `Ai::ActiveContext::Collections::Code::MODELS`.)
5. `Ai::ActiveContext::Embeddings::Code::VertexText.generate_embeddings` calls `Gitlab::Llm::VertexAi::Embeddings::Text#execute`.
Solution
This problem has a two-pronged solution:
1. Calculate the average token count of the code snippets, and set a batch size for processing to make sure we do not exceed token limits.
2. Token count calculation is not fully accurate, so we still need to handle the "token limits exceeded" error by:
   1. Having `Gitlab::Llm::VertexAi::Embeddings::Text` raise a specific error class for "token limits exceeded".
   2. In `Ai::ActiveContext::Embeddings::Code::VertexText.generate_embeddings`, catching the "token limits exceeded" error class and retrying the call to `Gitlab::Llm::VertexAi::Embeddings::Text` with a smaller batch.

This MR specifically addresses Step 2-2.
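The retry-with-a-smaller-batch behaviour can be sketched as follows. This is a simplified illustration, not the actual implementation: `TokenLimitExceededError`, the token limit, and the fake `embed` call all stand in for the real `Gitlab::Llm::VertexAi` classes.

```ruby
# Hypothetical sketch of Step 2-2: halve the batch on a token-limit error.
class TokenLimitExceededError < StandardError; end

TOKEN_LIMIT = 100 # assumed limit, for illustration only

# Fake embedding call: raises when the batch's total token count is too large.
# The real code calls Gitlab::Llm::VertexAi::Embeddings::Text#execute instead.
def embed(batch)
  total_tokens = batch.sum { |s| s.split.size }
  raise TokenLimitExceededError if total_tokens > TOKEN_LIMIT

  batch.map { |s| [s.split.size.to_f] } # one stand-in "embedding" per input
end

# Recursively split the batch in half until each half fits within the limit.
def generate_embeddings(contents)
  embed(contents)
rescue TokenLimitExceededError
  raise if contents.size <= 1 # a single input that still exceeds the limit

  mid = contents.size / 2
  generate_embeddings(contents[0...mid]) + generate_embeddings(contents[mid..])
end

inputs = ["a b c"] * 60 # 180 tokens total, over the assumed 100-token limit
embeddings = generate_embeddings(inputs)
puts embeddings.length # => 60, one embedding per input, order preserved
```

Concatenating the two halves' results preserves input order, which is why the caller can still zip embeddings back onto the original `contents`.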
References
Screenshots or screen recordings
N/A
How to set up and validate locally
Option 1: Validate through the BulkProcessWorker for Ai::ActiveContext::References::Code
You can use this validation if you have a ready list of refs / Elasticsearch docs that follow the schema detailed in this migration.
# ref_ids should be a large batch, so that you're sure the _total tokens count_ exceeds Vertex AI's limits
ref_ids = [the,ids,of,the,elasticsearch,docs]
::Ai::ActiveContext::Collections::Code.track_refs!(routing: "1", hashes: ref_ids)
::Ai::ActiveContext::BulkProcessWorker.new.perform("Ai::ActiveContext::Queues::Code", 0)
# The call to `::Ai::ActiveContext::BulkProcessWorker` should not result in any logged ERROR or WARNING.
Option 2: Validate by directly calling Ai::ActiveContext::Embeddings::Code::VertexText.generate_embeddings
# Create a single long input, then replicate it 250 times (the batch size limit)
# so the total token count is very large
str = (["The quick brown fox jumps over the lazy dog"] * 50).join("\n")
contents = [str] * 250
# First, let's test on the Gitlab::Llm::VertexAi::Embeddings::Text
# This is the LLM class, and it should result in a TokenLimitExceededError
generate_embeddings = Gitlab::Llm::VertexAi::Embeddings::Text.new(
contents,
user: User.first,
tracking_context: { action: 'embedding' },
unit_primitive: ::Ai::ActiveContext::References::Code::UNIT_PRIMITIVE,
model: 'text-embedding-005',
).execute
# Now, let's test the same input on Ai::ActiveContext::Embeddings::Code::VertexText.
# This is where the token limit exceeded error is handled, with the `contents` input
# being halved recursively until the total token counts of each contents batch is within limits.
# This should not result in an error
results = Ai::ActiveContext::Embeddings::Code::VertexText.generate_embeddings(
contents,
unit_primitive: ::Ai::ActiveContext::References::Code::UNIT_PRIMITIVE,
model: 'text-embedding-005',
user: User.first
)
# Check results length is the same as the length of the original `contents` input
results.length
=> 250
# Check that each `results` element is an array of vector embeddings:
results.all?(Array)
=> true
results.all? { |ems| ems.all?(Float) }
=> true
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #551002 (closed)