diff --git a/doc/operations/incident_management/alerts.md b/doc/operations/incident_management/alerts.md index b489b7de739d4e63992ec6e791be04b78f15fbb3..48e2c020f70819b13162660c9848e7a99cff41ad 100644 --- a/doc/operations/incident_management/alerts.md +++ b/doc/operations/incident_management/alerts.md @@ -76,22 +76,77 @@ page. Alerts provide **Overview** and **Alert details** tabs to give you the right amount of information you need. -### Alert overview tab +### Alert details tab -The **Overview** tab provides basic information about the alert: +The **Alert details** tab has two sections. The top section provides a short list of critical details such as the severity, start time, number of events, and originating monitoring tool. The second section displays the full alert payload. -![Alert Detail Overview](./img/alert_detail_overview_v13_1.png) +### Metrics tab -### Alert details tab +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.2. + +The **Metrics** tab displays a metrics chart for alerts coming from Prometheus. If the alert originated from any other tool, the **Metrics** tab is empty. To set up alerts for GitLab-managed Prometheus instances, see [Managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances). For externally-managed Prometheus instances, you must configure your alerting +rules to display a chart in the alert. For information about how to configure +your alerting rules, see [Embedding metrics based on alerts in incident issues](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues). See +[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) +for information about setting up alerts for your self-managed Prometheus +instance. + +To view the metrics for an alert: + + 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). + 1. Navigate to **Operations > Alerts**. + 1.
Select the alert you want to view. + 1. Below the title of the alert, select the **Metrics** tab. + +![Alert Metrics View](img/alert_detail_metrics_v13_2.png) -![Alert Full Details](./img/alert_detail_full_v13_1.png) +#### View an alert's logs -#### Update an alert's status +> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/201846) in GitLab Ultimate 12.8, and [improved](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.3. +> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25455) to [GitLab Core](https://about.gitlab.com/pricing/) 12.9. + +Viewing logs from a metrics panel can be useful if you're triaging an +application incident and need to [explore logs](../metrics/dashboards/index.md#chart-context-menu) +from across your application. These logs help you understand what's affecting +your application's performance and how to resolve any problems. + +To view the logs for an alert: + + 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). + 1. Navigate to **Operations > Alerts**. + 1. Select the alert you want to view. + 1. Below the title of the alert, select the **Metrics** tab. + 1. Select the [menu](../metrics/dashboards/index.md#chart-context-menu) of + the metric chart to view options. + 1. Select **View logs**. + +For additional information, see [View logs from metrics panel](#view-logs-from-metrics-panel). + +### Activity feed tab + +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. + +The **Activity feed** tab is a log of activity on the alert. When you take action on an alert, this is logged as a system note. This gives you a linear +timeline of the alert's investigation and assignment history.
+ +The following actions result in a system note: + +- [Updating the status of an alert](#update-an-alerts-status) +- [Creating an issue based on an alert](#create-an-incident-from-an-alert) +- [Assignment of an alert to a user](#assign-an-alert) + +![Alert Details Activity Feed](./img/alert_detail_activity_feed.png) + +## Alert actions + +There are different actions available in GitLab to help triage and respond to alerts. + +### Update an alert's status The Alert detail view enables you to update the Alert Status. See [Create and manage alerts in GitLab](./alerts.md) for more details. -#### Create an issue from an alert +### Create an incident from an alert > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217745) in GitLab 13.1. @@ -104,16 +159,14 @@ Closing a GitLab issue associated with an alert changes the alert's status to Resolved. See [Create and manage alerts in GitLab](alerts.md) for more details about alert statuses. -#### Update an alert's assignee +### Assign an alert > [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. -The Alert detail view allows users to update the Alert assignee. -GitLab supports only a single assignee per alert. - In large teams, where there is shared ownership of an alert, it can be -difficult to track who is investigating and working on it. The Alert detail -view enables you to update the Alert assignee: +difficult to track who is investigating and working on it. Assigning alerts eases collaboration and delegation by indicating which user owns the alert. GitLab supports only a single assignee per alert. + +To assign an alert: 1.
To display the list of current alerts, navigate to **Operations > Alerts**: @@ -131,26 +184,11 @@ view enables you to update the Alert assignee: ![Alert Details View Assignee(s)](./img/alert_todo_assignees_v13_1.png) -To remove an assignee, select **Edit** next to the **Assignee** dropdown menu +After completing their portion of investigating or fixing the alert, users can +unassign themselves from the alert. To remove an assignee, select **Edit** next to the **Assignee** dropdown menu and deselect the user from the list of assignees, or select **Unassigned**. -#### Alert system notes - -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. - -When you take action on an alert, this is logged as a system note, -which is visible in the Alert Details view. This gives you a linear -timeline of the alert's investigation and assignment history. - -The following actions will result in a system note: - -- [Updating the status of an alert](#update-an-alerts-status) -- [Creating an issue based on an alert](#create-an-issue-from-an-alert) -- [Assignment of an alert to a user](#update-an-alerts-assignee) - -![Alert Details View System Notes](./img/alert_detail_system_notes_v13_1.png) - -#### Create a to do from an alert +### Create a to do from an alert > [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. @@ -168,91 +206,6 @@ Select the **To-Do List** **{todo-done}** in the navigation bar to view your cur ![Alert Details Added to do](./img/alert_detail_added_todo_v13_1.png) -#### View an alert's metrics data - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.2. - -To view the metrics for an alert: - - 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). - 1. Navigate to **Operations > Alerts**. - 1. Select the alert you want to view. - 1. Below the title of the alert, select the **Metrics** tab. 
- -![Alert Metrics View](img/alert_detail_metrics_v13_2.png) - -For GitLab-managed Prometheus instances, metrics data is available for the -alert, making it easy to see surrounding behavior. For information about -setting up alerts, see [Managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances). - -For externally-managed Prometheus instances, you can configure your alerting -rules to display a chart in the alert. For information about how to configure -your alerting rules, see [Embedding metrics based on alerts in incident issues](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues). See -[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) -for information about setting up alerts for your self-managed Prometheus -instance. - -### Use cases for assigning alerts - -Consider a team formed by different sections of monitoring, collaborating on a -single application. After an alert surfaces, it's extremely important to route -the alert to the team members who can address and resolve the alert. - -Assigning Alerts eases collaboration and delegation. All assignees are shown in -your team's work-flows, and all assignees receive notifications, simplifying -communication and ownership of the alert. - -After completing their portion of investigating or fixing the alert, users can -unassign their account from the alert when their role is complete. You can -update the alert on the [Alert list](./alerts.md) to reflect if the alert has -been resolved. - -### View an alert's logs - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.3. - -To view the logs for an alert: - - 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). - 1. Navigate to **Operations > Alerts**. - 1. Select the alert you want to view. - 1. Below the title of the alert, select the **Metrics** tab. - 1. 
Select the [menu](../metrics/dashboards/index.md#chart-context-menu) of - the metric chart to view options. - 1. Select **View logs**. - -For additional information, see [View logs from metrics panel](#view-logs-from-metrics-panel). - -### Embed metrics in incidents and issues - -You can embed metrics anywhere [GitLab Markdown](../../user/markdown.md) is -used, such as descriptions, comments on issues, and merge requests. Embedding -metrics helps you share them when discussing incidents or performance issues. -You can output the dashboard directly into any issue, merge request, epic, or -any other Markdown text field in GitLab by -[copying and pasting the link to the metrics dashboard](../metrics/embed.md#embedding-gitlab-managed-kubernetes-metrics). - -You can embed both [GitLab-hosted metrics](../metrics/embed.md) and -[Grafana metrics](../metrics/embed_grafana.md) in incidents and issue -templates. - -#### Context menu - -You can view more details about an embedded metrics panel from the context -menu. To access the context menu, select the **{ellipsis_v}** **More actions** -dropdown box above the upper right corner of the panel. For a list of options, -see [Chart context menu](../metrics/dashboards/index.md#chart-context-menu). - -##### View logs from metrics panel - -> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/201846) in GitLab Ultimate 12.8. -> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25455) to [GitLab Core](https://about.gitlab.com/pricing/) 12.9. - -Viewing logs from a metrics panel can be useful if you're triaging an -application incident and need to [explore logs](../metrics/dashboards/index.md#chart-context-menu) -from across your application. These logs help you understand what's affecting -your application's performance and how to resolve any problems. - ## View the environment that generated the alert > - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/232492) in GitLab 13.5. 
diff --git a/doc/operations/incident_management/img/alert_detail_activity_feed.png b/doc/operations/incident_management/img/alert_detail_activity_feed.png new file mode 100644 index 0000000000000000000000000000000000000000..126332868bdc14590b84645c99dae38532ada68a Binary files /dev/null and b/doc/operations/incident_management/img/alert_detail_activity_feed.png differ diff --git a/doc/operations/incident_management/incidents.md b/doc/operations/incident_management/incidents.md index e6eda180eb54f9cb3bb7ae47faa6f9e0e99d9282..30a077aa0806169335d3a07a50ede243d733f6fe 100644 --- a/doc/operations/incident_management/incidents.md +++ b/doc/operations/incident_management/incidents.md @@ -6,86 +6,19 @@ info: To determine the technical writer assigned to the Stage/Group associated w # Incidents +Incidents are critical entities in incident management workflows. They represent a service disruption or outage that needs to be restored urgently. GitLab provides tools for the triage, response, and remediation of incidents. While no configuration is required to use the [manual features](#create-an-incident-manually) of incident management, some simple [configuration](#configure-incidents) is needed to automate incident creation. -For users with at least Guest [permissions](../../user/permissions.md), the -Incident Management list is available at **Operations > Incidents** -in your project's sidebar. The list contains the following metrics: - -![Incident List](img/incident_list_v13_4.png) - -- **Status** - To filter incidents by their status, click **Open**, **Closed**, - or **All** above the incident list. -- **Search** - The Incident list supports a simple free text search, which filters - on the **Title** and **Incident** fields. 
-- **Severity** - Severity of a particular incident, which can be one of the following - values: - - **{severity-critical}** **Critical - S1** - - **{severity-high}** **High - S2** - - **{severity-medium}** **Medium - S3** - - **{severity-low}** **Low - S4** - - **{severity-unknown}** **Unknown** - - [Editing incident severity](#incident-details) on the incident details page was - [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229402) in GitLab 13.4. - -- **Incident** - The description of the incident, which attempts to capture the - most meaningful data. -- **Date created** - How long ago the incident was created. This field uses the - standard GitLab pattern of `X time ago`, but is supported by a granular date/time - tooltip depending on the user's locale. -- **Assignees** - The user assigned to the incident. -- **Published** - Displays a green check mark (**{check-circle}**) if the incident is published - to a [Status Page](status_page.md). **(ULTIMATE)** - -The Incident list displays incidents sorted by incident created date. -([Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229534) to GitLab core in 13.3.) -To see if a column is sortable, point your mouse at the header. Sortable columns -display an arrow next to the column name. - -Incidents share the [Issues API](../../user/project/issues/index.md). - -TIP: **Tip:** -For a live example of the incident list in action, visit this -[demo project](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/incidents). - -## Configure incidents - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/4925) in GitLab Ultimate 11.11. - -With Maintainer or higher [permissions](../../user/permissions.md), you can enable -or disable Incident Management features in the GitLab user interface -to create issues when alerts are triggered: - -1. 
Navigate to **Settings > Operations > Incidents** and expand - **Incidents**: - - ![Incident Management Settings](./img/incident_management_settings_v13_3.png) - -1. For GitLab versions 11.11 and greater, you can select the **Create an issue** - checkbox to create an issue based on your own - [issue templates](../../user/project/description_templates.md#creating-issue-templates). - For more information, see - [Trigger actions from alerts](../metrics/alerts.md#trigger-actions-from-alerts) **(ULTIMATE)**. -1. To create issues from alerts, select the template in the **Issue Template** - select box. -1. To send [separate email notifications](alert_notifications.md#email-notifications) to users - with [Developer permissions](../../user/permissions.md), select - **Send a separate email notification to Developers**. -1. Click **Save changes**. +## Incident Creation -Appropriately configured alerts include an -[embedded chart](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues) -for the query corresponding to the alert. You can also configure GitLab to -[close issues](../metrics/alerts.md#trigger-actions-from-alerts) -when you receive notification that the alert is resolved. +You can create an incident manually or automatically. -## Create an incident manually +### Create incidents manually -If you have at least Guest [permissions](../../user/permissions.md), to create an Incident, you have two options. +If you have at least Guest [permissions](../../user/permissions.md), to create an Incident, you have two options to do this manually. -### From the Incidents List +**From the Incidents List:** > [Moved](https://gitlab.com/gitlab-org/monitor/health/-/issues/24) to GitLab core in 13.3. 
@@ -95,7 +28,7 @@ If you have at least Guest [permissions](../../user/permissions.md), to create a ![Incident List Create](./img/incident_list_create_v13_3.png) -### From the Issues List +**From the Issues List:** > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/230857) in GitLab 13.4. @@ -105,11 +38,31 @@ If you have at least Guest [permissions](../../user/permissions.md), to create a ![Incident List Create](./img/new_incident_create_v13_4.png) -## Configure PagerDuty integration +### Create incidents automatically + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/4925) in GitLab Ultimate 11.11. + +With Maintainer or higher [permissions](../../user/permissions.md), you can enable +GitLab to create incidents automatically whenever an alert is triggered: + +1. Navigate to **Settings > Operations > Incidents** and expand + **Incidents**: + + ![Incident Management Settings](./img/incident_management_settings_v13_3.png) + +1. Check the **Create an incident** + checkbox. +1. To customize the incident, select an [issue template](../../user/project/description_templates.md#creating-issue-templates). +1. To send [an email notification](alert_notifications.md#email-notifications) to users + with [Developer permissions](../../user/permissions.md), select + **Send a separate email notification to Developers**. Email notifications are also sent to users with **Maintainer** and **Owner** permissions. +1. Click **Save changes**. + +### Create incidents via the PagerDuty webhook > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/119018) in GitLab 13.3. -You can set up a webhook with PagerDuty to automatically create a GitLab issue +You can set up a webhook with PagerDuty to automatically create a GitLab incident for each PagerDuty incident. This configuration requires you to make changes in both PagerDuty and GitLab: @@ -126,7 +79,49 @@ in both PagerDuty and GitLab: to add the webhook URL to a PagerDuty webhook integration.
To confirm the integration is successful, trigger a test incident from PagerDuty to -confirm that a GitLab issue is created from the incident. +confirm that a GitLab incident is created from the incident. + +## Incident list +For users with at least Guest [permissions](../../user/permissions.md), the +Incident list is available at **Operations > Incidents** +in your project's sidebar. The list contains the following metrics: + +![Incident List](img/incident_list_v13_4.png) + +- **Status** - To filter incidents by their status, click **Open**, **Closed**, + or **All** above the incident list. +- **Search** - The Incident list supports a simple free text search, which filters + on the **Title** and **Incident** fields. +- **Severity** - Severity of a particular incident, which can be one of the following + values: + - **{severity-critical}** **Critical - S1** + - **{severity-high}** **High - S2** + - **{severity-medium}** **Medium - S3** + - **{severity-low}** **Low - S4** + - **{severity-unknown}** **Unknown** + + [Editing incident severity](#change-severity) on the incident details page was + [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229402) in GitLab 13.4. + +- **Incident** - The description of the incident, which attempts to capture the + most meaningful data. +- **Date created** - How long ago the incident was created. This field uses the + standard GitLab pattern of `X time ago`, but is supported by a granular date/time + tooltip depending on the user's locale. +- **Assignees** - The user assigned to the incident. +- **Published** - Displays a green check mark (**{check-circle}**) if the incident is published + to a [Status Page](status_page.md). **(ULTIMATE)** + +The Incident list displays incidents sorted by incident created date. +([Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229534) to GitLab core in 13.3.) +To see if a column is sortable, point your mouse at the header. 
Sortable columns +display an arrow next to the column name. + +Incidents share the [Issues API](../../user/project/issues/index.md). + +TIP: **Tip:** +For a live example of the incident list in action, visit this +[demo project](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/incidents). ## Incident details @@ -193,3 +188,46 @@ After enabling **Incident SLA** in the Incident Management configuration, newly- incidents display a SLA (Service Level Agreement) timer showing the time remaining before the SLA period expires. If the incident is not closed before the SLA period ends, GitLab adds a `missed::SLA` label to the incident. + +## Incident Actions + +There are different actions available to help triage and respond to incidents. + +### Assign incidents + +Assign incidents to users who are actively responding. Select **Edit** in the right-hand sidebar to select or deselect assignees. + +### Change severity + +See [Incident List](#incident-list) for a full description of the severities available. Select **Edit** in the right-hand sidebar to change the severity of an incident. + +### Add a to do + +Add a to-do for incidents that you want to track in your to-do list. Click the **Add a to do** button at the top of the right-hand sidebar to add a to do. + +### Manage incidents from Slack + +Slack slash commands allow you to control GitLab and view GitLab content without leaving Slack. + +Learn how to [set up Slack slash commands](../../user/project/integrations/slack_slash_commands.md) +and how to [use the available slash commands](../../integration/slash_commands.md). + +### Associate Zoom calls + +GitLab enables you to [associate a Zoom meeting with an issue](../../user/project/issues/associate_zoom_meeting.md) +for synchronous communication during incident management. After starting a Zoom +call for an incident, you can associate the conference call with an issue. Your +team members can join the Zoom call without requesting a link.
+ +### Embed metrics in incidents + +You can embed metrics anywhere [GitLab Markdown](../../user/markdown.md) is +used, such as descriptions, comments on issues, and merge requests. Embedding +metrics helps you share them when discussing incidents or performance issues. +You can output the dashboard directly into any issue, merge request, epic, or +any other Markdown text field in GitLab by +[copying and pasting the link to the metrics dashboard](../metrics/embed.md#embedding-gitlab-managed-kubernetes-metrics). + +You can embed both [GitLab-hosted metrics](../metrics/embed.md) and +[Grafana metrics](../metrics/embed_grafana.md) in incidents and issue +templates.