[go: up one dir, main page]

Skip to content

Resolve cross-join in ProjectsGrades.grades_for with groups

Summary

The vulnerabilities/projects_grade.rb model has a scope to include projects which cannot be used when the tables are decomposed.

Further details

See https://gitlab.com/gitlab-org/gitlab/-/blob/5fa07731e2b22537958c4ebe3efba9a039a9a9e9/ee/app/models/vulnerabilities/projects_grade.rb#L31

    def self.grades_for(vulnerables, filter: nil, include_subgroups: false)
      projects = vulnerables.map do |v|
        collection = include_subgroups ? v.all_projects : v.projects
        collection.non_archived
      end

      relation = ::Vulnerabilities::Statistic.for_project(projects.reduce(&:or))
      relation = relation.by_grade(filter) if filter
      relation = relation.allow_cross_joins_across_databases(url: 'https://gitlab.com/gitlab-org/gitlab/-/issues/503387')

Proposal

Change ProjectsGrade.grades_for to rely on a new Vulnerabilities::Statistic.by_group scope which is based on Add traversal_ids column to vulnerability_stati... (#512592 - closed).

Vulnerabilities::Statistic.by_group is similar to Vulnerabilities::Read.by_group in the way it compares traversal IDs to collect all the objects that match a group and its subgroup.

Additionally, Vulnerabilities::Statistic provides a way to select objects that are directly in the group, and not in its subgroups. This is necessary to support the include_subgroups paramater. For instance, we could have .by_group and .by_group_not_including_subgroups, or pass a boolean to .by_group to switch b/w including the subgroups or not.

NOTE: The original proposal was to iterate batches of groups, and sub-batches of projects. This has been implemented in Resolve cross-join in ProjectsGrade.grades_for ... (!172720 - merged). However, this implementation wasn't not fast enough for large group such as gitlab-org. See thread: #508631 (comment 2264738914)

Implementation plan

  • Implement Vulnerabilities::Statistic.by_group. It can include subgroups or not.
  • Update .grades_for to leverage .by_group. This is behind a feature flag.
  • Post migration : Add DB index to make .grades_for efficient when it uses .by_group.

Verification

Use the Group Security Dashboard or the groupVulnerabilityGrades query it triggers.

GraphQL query
query groupVulnerabilityGrades($fullPath: ID!) {
  group(fullPath: $fullPath) {
    id
    vulnerabilityGrades(includeSubgroups: true) {
      grade
      count
      projects {
        nodes {
          ...Project
          ...ProjectVulnerabilitySeveritiesCount
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

fragment Project on Project {
  id
  name
  nameWithNamespace
  securityDashboardPath
  fullPath
  avatarUrl
  path
  __typename
}

fragment ProjectVulnerabilitySeveritiesCount on Project {
  vulnerabilitySeveritiesCount(state: [DETECTED, CONFIRMED]) {
    ...VulnerabilitySeveritiesCount
    __typename
  }
  __typename
}

fragment VulnerabilitySeveritiesCount on VulnerabilitySeveritiesCount {
  critical
  high
  info
  low
  medium
  unknown
  __typename
}
{
  "fullPath": "gitlab-com"
}
Edited by Fabien Catteau