Integrate git-blame-tree(1) into Gitaly

When GitLab shows the tree of files you get something like this:

[Screenshots: Before (Screen_Shot_2025-04-30_at_16.50.24) and After (Screen_Shot_2025-04-30_at_16.51.31)]

Current state

GitLab already knows the files in the tree at this point, but not yet the commit that last touched each of them. To load that information, it calls the ListLastCommitsForTree RPC.

Internally, this RPC handler runs git-ls-tree(1) to list the files in the tree. It then calls log.LastCommitForPath() for each file, which returns a *catfile.Commit for that path. Those commits are filled into the ListLastCommitsForTreeResponse.
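
The per-path flow described above can be sketched roughly as follows. This is a minimal illustration, not Gitaly's actual code: `Commit`, `lastCommitFunc`, and `lastCommitsPerPath` are hypothetical names, and the lookup function only stands in for log.LastCommitForPath(), whose real signature is not shown in this issue.

```go
package main

import "fmt"

// Commit is a hypothetical stand-in for the commit object the handler returns.
type Commit struct {
	ID      string
	Subject string
}

// lastCommitFunc is a hypothetical stand-in for a per-path lookup such as
// log.LastCommitForPath(); the real signature is not shown in this issue.
type lastCommitFunc func(revision, path string) (Commit, error)

// lastCommitsPerPath performs one lookup per path, so the number of
// lookups grows linearly with the number of tree entries.
func lastCommitsPerPath(revision string, paths []string, lookup lastCommitFunc) (map[string]Commit, error) {
	commits := make(map[string]Commit, len(paths))
	for _, path := range paths {
		commit, err := lookup(revision, path)
		if err != nil {
			return nil, fmt.Errorf("last commit for %q: %w", path, err)
		}
		commits[path] = commit
	}
	return commits, nil
}

func main() {
	calls := 0
	// fake replaces the real Git lookup so the sketch runs without a repository.
	fake := func(revision, path string) (Commit, error) {
		calls++
		return Commit{ID: "c-" + path, Subject: "touch " + path}, nil
	}
	commits, err := lastCommitsPerPath("main", []string{"README.md", "go.mod"}, fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(commits), "commits resolved with", calls, "lookups")
}
```

The point of the sketch is the loop: every path costs a separate lookup, which is the cost a single git-blame-tree(1) call could avoid.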

With git-blame-tree

Once git-blame-tree(1) is available in Git, we could avoid calling log.LastCommitForPath() for each path separately and instead get the information for all files in the tree at once. To be able to use git-blame-tree(1), we'd need it to return a full GitCommit:

message GitCommit {
  // id ...
  string id = 1;
  // subject ...
  bytes subject = 2;
  // body ...
  bytes body = 3;
  // author ...
  CommitAuthor author = 4;
  // committer ...
  CommitAuthor committer = 5;
  // parent_ids ...
  repeated string parent_ids = 6;
  // body_size is the size of the commit body. If body exceeds a certain threshold,
  // it will be nullified, but its size will be set in body_size so we can know if
  // a commit had a body in the first place.
  int64 body_size = 7;
  // signature_type ...
  SignatureType signature_type = 8;
  // tree_id is the object ID of the tree. The tree ID will always be filled, even
  // if the tree is empty. In that case the value will be `4b825dc642cb6eb9a060e54bf8d69288fbee4904`.
  // That value is equivalent to `git hash-object -t tree /dev/null`.
  string tree_id = 9;
  // trailers is the list of Git trailers (https://git-scm.com/docs/git-interpret-trailers)
  // found in this commit's message. The number of trailers and their key/value
  // sizes are limited. If a trailer exceeds these size limits, it and any
  // trailers that follow it are not included.
  repeated CommitTrailer trailers = 10;
  // short_stats are the git stats including additions, deletions and changed_files,
  // they are only set when `include_shortstat == true`.
  CommitStatInfo  short_stats = 11;
  // referenced_by contains fully-qualified reference names (e.g refs/heads/main)
  // that point to the commit.
  repeated bytes referenced_by = 12; // protolint:disable:this REPEATED_FIELD_NAMES_PLURALIZED
  // encoding is the encoding of the commit message. This field will only be present if
  // `i18n.commitEncoding` was set to a value other than "UTF-8" at the time
  // this commit was made.
  // See: https://git-scm.com/docs/git-commit#_discussion
  string encoding = 13;
}

The proposed git-blame-tree implementation would only return the commit SHA, so extending it to return full commit details would be beneficial: it avoids another Git call (git-show) for each commit.
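
If git-blame-tree only returns commit IDs, the follow-up lookups can at least be limited to distinct commits, since several paths can share the same last commit. The sketch below assumes the blame-tree result has already been parsed into a path-to-commit-ID map; that map and the `uniqueCommitIDs` helper are hypothetical and not part of Gitaly.

```go
package main

import (
	"fmt"
	"sort"
)

// uniqueCommitIDs returns the distinct commit IDs referenced by a
// path -> commit ID mapping, sorted so the result is deterministic.
func uniqueCommitIDs(lastCommitByPath map[string]string) []string {
	seen := make(map[string]struct{}, len(lastCommitByPath))
	for _, id := range lastCommitByPath {
		seen[id] = struct{}{}
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Hypothetical parsed result: two of three paths share a last commit.
	byPath := map[string]string{
		"README.md": "aaa111",
		"go.mod":    "bbb222",
		"go.sum":    "bbb222",
	}
	fmt.Println(uniqueCommitIDs(byPath))
}
```

Each distinct ID would then need only one lookup to fill in the remaining GitCommit fields, instead of one per path.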

Pagination

At the moment, the ListLastCommitsForTreeRequest message has an offset and a limit field. Based on those, the handler takes the output of git-ls-tree(1) and keeps only the subset of paths within that range. It's inefficient to first get the full list of entries in a tree only to use a small subset of it.
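
The offset/limit selection amounts to a clamped slice over the full entry list. The following is a minimal sketch with a hypothetical `paginate` helper; treating a non-positive limit as "no entries" is an assumption of the sketch, not documented Gitaly behavior.

```go
package main

import "fmt"

// paginate returns the entries in [offset, offset+limit), clamped to the
// bounds of the slice. Returning nothing for limit <= 0 is an assumption.
func paginate(entries []string, offset, limit int) []string {
	if offset < 0 || offset >= len(entries) || limit <= 0 {
		return nil
	}
	end := len(entries)
	if limit < end-offset {
		end = offset + limit
	}
	return entries[offset:end]
}

func main() {
	entries := []string{"LICENSE", "Makefile", "README.md", "go.mod", "go.sum"}
	fmt.Println(paginate(entries, 1, 2))
}
```

Note that `entries` still has to be fully populated before the slice is taken, which is exactly the inefficiency described above.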

To use git-blame-tree we'll probably need to do the same. However, we need to avoid having git-blame-tree blame every file in the tree when we only want a subset. Therefore we'd still need to call git-ls-tree first, select the subset of paths we want the commit information for, and then run something like: git blame-tree <rev> -- <path-0> <path-1> <path-2> ...
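
Building the argument list for that invocation could look like the sketch below. `blameTreeArgs` is a hypothetical helper, and because git-blame-tree(1) is still a proposed command, the argument layout simply follows the form given above rather than any released Git interface.

```go
package main

import "fmt"

// blameTreeArgs builds the arguments for the invocation proposed in the issue:
//
//	git blame-tree <rev> -- <path-0> <path-1> ...
//
// The "--" separates the revision from the paths, as is conventional in Git.
func blameTreeArgs(revision string, paths []string) []string {
	args := make([]string, 0, len(paths)+3)
	args = append(args, "blame-tree", revision, "--")
	return append(args, paths...)
}

func main() {
	// The paths would be the page selected from the git-ls-tree(1) output.
	fmt.Println(blameTreeArgs("main", []string{"README.md", "go.mod"}))
}
```

A real handler would also have to validate the revision (for example, rejecting values that start with a dash) before passing it to Git.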

Future optimization

Because the UI already knows which files it needs the last commit for, I wonder if we can change the interface of ListLastCommitsForTree so that the caller passes the list of paths it needs the last commit for. (The name ListLastCommitsForTree might no longer fit in that case; a new RPC called ListLastCommitsForPaths might be better.)

Edited by Toon Claes