CN117441144A

CN117441144A - Systems and methods for handling asynchronous reset events while maintaining persistent storage state

Info

Publication number: CN117441144A
Application number: CN202280038691.7A
Authority: CN
Inventors: B·J·富勒
Original assignee: Oracle International Corp
Current assignee: Oracle International Corp
Priority date: 2021-03-31
Filing date: 2022-03-29
Publication date: 2024-01-23

Abstract

Techniques for handling asynchronous power transition events while maintaining persistent memory state are described herein. In some embodiments, a system may proxy asynchronous reset events through system logic that generates an interrupt to invoke a special persistent flush interrupt handler, which performs a persistent cache flush before invoking the hardware power transition. Additionally or alternatively, the system may include a hardware backup mechanism to ensure that all resets and power transitions requested in hardware complete reliably within a bounded time window, regardless of whether the persistent cache flush handler succeeds.

Description

System and method

Technical field

The present disclosure relates to cache management techniques. In particular, the present disclosure relates to techniques for flushing volatile cache state upon a power loss event.

Background

Modern server designs often incorporate persistent memory (PMEM), such as data center persistent memory modules (DCPMMs) or non-volatile dual in-line memory modules (NVDIMMs), into the memory architecture. Persistent memory offers several advantages over block-based persistent media, including low-latency random access and the ability to perform remote direct memory access (RDMA) operations directly to persistent memory.

Committing data directly to a persistent memory device is expensive, so servers with persistent memory typically support treating some volatile on-chip state as persistent in order to limit the number of explicit commit operations that software must perform. If the system can guarantee that the state of a volatile buffer will be flushed to persistent memory on every reset or power transition that would otherwise destroy the buffer's contents, then programs can treat any data committed to that volatile buffer as persistent. One such approach for flushing volatile buffers is called asynchronous dynamic random access memory refresh (ADR), in which the volatile buffers in the memory controller are included in the persistence domain. Under this approach, the system reserves the small amount of energy needed to keep the system powered after a power loss long enough to flush the volatile memory controller buffers out to the persistent memory devices.

Another technique, called enhanced ADR (eADR) or persistent cache flush (PCF), extends the volatile state that can be treated as persistent to include all processor caches and on-chip buffers. Processor caches are typically orders of magnitude larger than the volatile memory buffers in the memory controller, so the system needs significantly more energy to complete the flush process. Servers that support persistent cache flush must include some form of auxiliary energy storage to power the system during the persistent cache flush operation. Some servers include a battery backup unit (BBU) to provide enough energy to finish flushing data from the processor caches out to persistent memory after a power loss. BBUs can store large amounts of energy; however, they pose many challenges, including a large footprint, a limited ability to supply the high currents that server systems require, thermal constraints, and additional cost.

Asynchronous hardware reset events further complicate the implementation of persistent cache flush operations. Asynchronous hardware resets are typically implemented by directly asserting a reset request pin and may not be detected by the power sequencing logic of the processor or chipset. If the system allows an externally initiated reset event to trigger a hardware reset without invoking the persistent flush handler before the reset, then the persistent memory state may not be flushed correctly. If an application relies on persistent cache flush when the platform hardware does not fully support it, then application data may be lost or corrupted during a power disruption event.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, none of the approaches described in this section should be assumed to qualify as prior art merely by virtue of their inclusion in this section.

Description of the drawings

The embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. It should be noted that references to "an embodiment" or "one embodiment" in this disclosure do not necessarily refer to the same embodiment, and they mean at least one embodiment. In the drawings:

Figure 1 illustrates a system for performing persistent cache flush operations in accordance with some embodiments.

Figure 2 illustrates an example set of operations for performing persistent cache flush operations to maintain persistent memory state in accordance with some embodiments.

Figure 3 illustrates an example set of operations for managing persistent cache flush operations in a system with multiple power supplies in accordance with some embodiments.

Figure 4 illustrates an example system for managing multiple power supplies in accordance with some embodiments.

Figure 5 illustrates an example timing diagram with staggered warning signals from different power supply units in accordance with some embodiments.

Figure 6 illustrates an example set of operations for handling externally initiated asynchronous reset events in accordance with some embodiments.

Figure 7 illustrates an example system for intercepting and handling externally initiated asynchronous reset events in accordance with some embodiments.

Figure 8 illustrates an example set of operations for coordinating persistent memory operating modes in accordance with some embodiments.

Figure 9 shows a block diagram illustrating a computer system in accordance with some embodiments.

Detailed description

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding. One or more embodiments may be practiced without these specific details. Features described in one embodiment may be combined with features described in a different embodiment. In some examples, well-known structures and devices are described in block diagram form in order to avoid unnecessarily obscuring the present invention.

1. General overview

Techniques are described herein for using system power supply units (PSUs) to provide the auxiliary energy needed to flush volatile system state to persistent memory after a loss of alternating current (AC) power. In some embodiments, the techniques include implementing an extended hold-up window that is long enough to complete a full flush of the processor caches and memory controller buffers after a power loss event, using the energy available in the PSU bulk capacitors. Even though the amount of energy available in a PSU is relatively small compared with most BBUs, these techniques allow the volatile system caches to be flushed without a BBU.

Many PSUs include bulk capacitors that allow the system to ride through a temporary loss of AC power lasting up to 10 milliseconds (ms). An example PSU implementation assumes a worst-case output load and provides a 10 ms timer, shutting off the power output when the timer expires. Such an implementation limits the maximum hold-up window that the PSU can achieve to 10 ms regardless of system power consumption, which may not be enough time to flush all system caches.

In some embodiments, the PSUs are implemented to extend the hold-up window to a variable length of time that is determined by system power consumption, rather than a fixed length of time. The voltage on the bulk capacitors within one or more PSUs may be monitored, and a notification may be triggered when a programmable power-fail warning threshold voltage is detected on the bulk capacitors. The system may configure the voltage threshold to indicate that a specific minimum amount of energy, namely the amount needed to successfully complete a cache flush operation, is still available in the PSU. The PSU may also implement a second voltage threshold associated with the minimum amount of energy needed to safely sequence down the system power rails. Because both notifications are based on the amount of energy available in the PSU bulk capacitors, the system can implement a configurable hold-up window whose duration is determined by the system's power consumption rather than fixed in advance. The system can therefore define an operating point that minimizes energy consumption, rather than being constrained by a fixed-duration timer.

In some embodiments, system logic may implement an energy counter that estimates the total amount of energy available across all installed PSUs and generates an interrupt signal to invoke the persistent flush handler when the estimated total system energy reaches a threshold associated with the minimum energy needed to successfully complete a cache flush operation. The system logic may implement an energy counter for each PSU installed in the system. After a PSU has generated a power-fail warning signal to the system logic, the system logic may begin decrementing the energy counter associated with that PSU at a rate proportional to the number of active power supplies in the system and the operating mode of the system. The system logic may estimate the total energy available by summing the per-PSU counters. When the total estimated energy drops below a critical threshold, the system logic may generate an interrupt signal to invoke the persistent cache flush handler.
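The per-PSU energy counters described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the class name `PsuEnergyCounter`, the function `tick`, the millijoule and milliwatt units, and the equal load-sharing drain model are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class PsuEnergyCounter:
    """Estimated energy left in one PSU, in millijoules (hypothetical unit)."""
    energy_mj: float
    warned: bool = False  # True once this PSU has raised its power-fail warning


def tick(counters, system_load_mw, active_psus, dt_s, critical_mj):
    """Advance the estimate by dt_s seconds.

    Returns True when the summed estimate falls below critical_mj, i.e. when
    the flush interrupt should be raised.
    """
    # Assumed drain model: the load is shared equally by the active PSUs,
    # so fewer active supplies means each one drains faster.
    per_psu_mw = system_load_mw / max(active_psus, 1)
    for counter in counters:
        if counter.warned:  # only PSUs that have raised a warning are drained
            counter.energy_mj = max(0.0, counter.energy_mj - per_psu_mw * dt_s)
    return sum(c.energy_mj for c in counters) < critical_mj
```

A caller would invoke `tick` at a fixed interval and raise the interrupt the first time it returns `True`. The operating mode enters this sketch only through `system_load_mw`, which a reduced-power mode would lower.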

In some embodiments, the system is configured to reduce power consumption during the flush process in order to minimize the amount of energy needed to complete the flush. The processors, persistent memory devices, and supporting circuitry may remain powered. Other system components that do not take part in the flush process, such as fans, input/output (I/O) devices, and hard disk drives, may have their power disabled. Power control loops in the system may also include hooks for reducing central processing unit (CPU) power consumption, such as by reducing processor frequency.

To ensure that volatile system resources containing state that is considered persistent are properly flushed to persistent media before a system reset or power transition, in some embodiments every reset or power transition is preceded by execution of a persistent flush handler, which is responsible for pushing all volatile state out to persistent media before the reset or power transition takes place. The system may trap accesses to the registers used to initiate resets or power state transitions and initiate a persistent cache flush before allowing the trapped write to complete. Trapping accesses to these registers allows the system to run the cache flush handler before performing the requested reset or power transition action. A similar mechanism may be implemented to handle resets and power transitions requested by platform entities external to the host subsystem.
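The ordering guarantee behind the register trap can be illustrated with a small software model. This is only a sketch: `ResetRegisterTrap`, its callbacks, and the value `0x6` are hypothetical stand-ins, and a real platform would implement the trap in chipset logic and firmware rather than in Python.

```python
class ResetRegisterTrap:
    """Hypothetical model of a trapped reset/power-control register."""

    def __init__(self, flush_handler, commit_write):
        self._flush = flush_handler   # pushes volatile state to persistent media
        self._commit = commit_write   # performs the real register write
        self.log = []                 # records the order of the steps

    def write(self, value):
        """Intercept the write, flush first, and only then let it complete."""
        self.log.append("trapped")
        self._flush()
        self.log.append("flushed")
        self._commit(value)
        self.log.append("committed")


events = []
trap = ResetRegisterTrap(
    flush_handler=lambda: events.append("flush"),
    commit_write=lambda v: events.append(f"reset:{v:#x}"),
)
trap.write(0x6)  # arbitrary illustrative value, not a real register encoding
```

After the call, `events` holds the flush entry before the reset entry, which is the property the trap exists to enforce.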

System resets and power transitions may occur not only in response to a loss of power but also in response to events initiated by external agents. For example, certain system errors may trigger a hardware (HW) initiated system reset. As another example, a user may initiate a warm reset or a forced power-down by holding down a button or flipping a switch. If the system allows an externally initiated reset or power transition event to trigger a hardware reset without invoking the flush handler before the reset, then data residing in volatile processor caches or memory buffers may be lost. To ensure that externally initiated system resets or power transitions properly invoke the persistent flush handler, the system may proxy these asynchronous events through system logic that generates an interrupt to invoke a special persistent flush interrupt handler, which performs a persistent cache flush before invoking the requested HW operation. Additionally or alternatively, the system may include a HW backup mechanism to ensure that all resets and power transitions requested in HW complete reliably within a bounded time window, regardless of whether the persistent cache flush handler succeeds.
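The combination of a proxied reset and a bounded-time backup can be approximated in software with a worker thread and a timeout. The sketch below is an analogy under stated assumptions, not the hardware mechanism itself: `proxy_reset`, its parameters, and the returned status strings are hypothetical, and the timeout stands in for a hardware timer that forces the reset when the handler hangs or fails.

```python
import threading


def proxy_reset(flush_handler, do_reset, deadline_s):
    """Run the flush handler, then reset; never wait longer than deadline_s."""
    done = threading.Event()

    def run():
        try:
            flush_handler()
            done.set()          # signalled only if the flush finished cleanly
        except Exception:
            pass                # a failed flush must not block the reset

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    flushed = done.wait(timeout=deadline_s)  # the "backup timer"
    do_reset()                               # the reset always proceeds
    return "flushed-then-reset" if flushed else "backup-forced-reset"
```

In this model the reset is performed on every path, so a handler that hangs or raises delays the reset by at most `deadline_s`.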

The techniques described herein also provide a handshake mechanism and protocol for notifying the operating system whether the system hardware supports persistent cache flush. The system may determine whether the hardware is capable of supporting a full flush of the processor caches and volatile memory buffers in the event of a power loss or asynchronous reset. If the hardware is capable, then persistent cache flush may be selectively enabled and advertised to the operating system. Once persistent cache flush is enabled, the operating system may treat data committed to the volatile processor caches as persistent. If persistent cache flush is disabled or not supported by the system hardware, then such data may be lost on a power failure or reset event, and the platform may refrain from advertising support for persistent cache flush to the operating system.
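The handshake can be summarized as a two-step decision: firmware advertises the capability only when the platform can honor it, and the operating system treats cached data as persistent only when the capability is advertised and enabled. The sketch below assumes three hypothetical capability flags; the disclosure does not enumerate the exact checks, so these names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformState:
    """Hypothetical capability flags gathered by firmware."""
    psu_warning_supported: bool     # PSUs expose a programmable warning threshold
    async_reset_proxied: bool       # external resets are routed through system logic
    flush_handler_installed: bool   # persistent cache flush handler is present


def firmware_advertises_pcf(state):
    """Advertise persistent cache flush only if every prerequisite holds."""
    return (state.psu_warning_supported
            and state.async_reset_proxied
            and state.flush_handler_installed)


def os_treats_cache_as_persistent(state, os_enabled):
    """The OS may rely on cached data only if PCF is advertised and enabled."""
    return firmware_advertises_pcf(state) and os_enabled
```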

One or more embodiments described in this specification and/or recited in the claims may not be included in this General Overview section.

2. System architecture

In some embodiments, the techniques described herein are implemented on one or more computing devices, such as server appliances or other network hosts, that include persistent memory in the memory layout. Although an example computing architecture is provided herein, the techniques are applicable to a variety of computing architectures, which may vary depending on the particular implementation. The techniques may be used to (a) determine whether a particular combination of system components is capable of supporting persistent cache flush, (b) configure the system components to enable persistent cache flush if it is supported, and/or (c) execute a persistent cache flush handler before a power transition or reset when persistent cache flush is enabled.

Figure 1 illustrates a system for performing persistent cache flush operations in accordance with some embodiments. As shown in Figure 1, system 100 includes PSU 102a, PSU 102b, power management subsystem 104, persistent cache flush handler 106, memory subsystem 108, CPU 116, system management module 118, system firmware 120, peripheral components 124, and operating system 122. In other embodiments, system 100 may include more or fewer components than those shown in Figure 1. In some cases, the components shown in Figure 1 may be local to or remote from each other.

PSU 102a and PSU 102b convert electrical power into a form that allows the components of system 100 to operate properly. In some embodiments, PSU 102a and PSU 102b convert AC power into direct current (DC) energy used to power the components of system 100. Additionally or alternatively, PSU 102a and PSU 102b may include DC-DC power converters, such as converters that step the input voltage up or down. PSU 102a and PSU 102b may be electrically coupled to other components of system 100 via one or more power rails, such as +3 volt (V), +5 V, and/or +12 V rails. Although two PSUs are illustrated, the system may have only a single PSU or may have additional PSUs, depending on the particular implementation.

Power management subsystem 104 controls the delivery of power from the system PSUs to the components of system 100. In some embodiments, power management subsystem 104 selectively powers down components during a reset or power loss event to shut down system 100 gracefully. Additionally or alternatively, power management subsystem 104 may monitor the voltage level across the bulk capacitors in PSU 102a and PSU 102b. If the voltage level drops below a programmable threshold, then power management subsystem 104 may assert or deassert a signal to notify other components of system 100.

Memory subsystem 108 includes volatile and non-volatile storage areas. In some embodiments, the volatile storage areas include processor caches 110 and memory buffers 112. Processor caches 110 may include caches within CPU 116, such as level 3 (L3) and level 4 (L4) caches, which CPU 116 may use to reduce the number of data accesses to main memory. Memory buffers 112 may include registers in CPU 116 and/or in a memory controller that provide intermediate storage for data being transferred between different areas. For example, memory controller buffers may provide temporary storage for data that is being transferred between processor caches 110 and main memory.

Persistent memory 114 includes one or more non-volatile memory devices, such as data center persistent memory modules (DCPMMs) and non-volatile dual in-line memory modules (NVDIMMs). In some embodiments, persistent memory 114 is byte-addressable and resides on the memory bus, providing speed and latency similar to volatile DRAM, which is generally much faster than peripheral non-volatile storage devices, such as hard disks and flash drives, that do not reside on the memory bus. Further, persistent memory 114 may be paged and mapped by operating system 122 in the same manner as volatile DRAM, which is generally not the case for other forms of persistent storage. Persistent memory 114 may serve as main memory within system 100. In other cases, main memory may include one or more volatile memory modules, such as DRAM.

When the persistent cache flush handler is installed and the platform signaling mechanisms are enabled, data stored within the volatile memory areas, including processor caches 110 and memory buffers 112, may be treated as part of the persistent memory state, even in the event of a power loss or other power transition event. To maintain the persistent state, cache flush handler 106 executes and manages cache flush operations in response to detecting a triggering event. If persistent cache flush operations are not enabled, then a full cache flush may not be performed during a power transition event, and some or all of the data within the volatile memory areas may be lost. Without a persistent cache flush handler, data from memory buffers 112 may be flushed while processor caches 110 are not, which may reduce the amount of time needed to perform the flush operation.

System management module 118 includes software and/or hardware for managing system-level operations. In some embodiments, system management module 118 includes a service processor (SP) and the CPU chipset. System management module 118 may interface with one or more sensors to monitor hardware components. Additionally or alternatively, system management module 118 may perform other functions, including trapping writes to system registers, generating system management interrupts (SMIs), and monitoring system boot status.

System firmware 120 includes software that provides low-level control of the system hardware. In some embodiments, system firmware 120 includes software, such as basic input/output system (BIOS) firmware, that manages the boot process when the system is powered on or reset. System firmware 120 may also provide runtime services for operating system 122, such as managing persistent cache flush operations and peripheral components 124.

Operating system 122 includes software that supports operations including scheduling instruction execution on CPU 116, providing services to software applications, and controlling access to peripheral components 124. In some embodiments, system firmware 120 may advertise the ability to include cache contents in the persistence domain if system 100 supports it. Operating system 122 may then selectively enable or disable persistent cache flush. When it is enabled, operating system 122 may treat data committed to volatile memory, including processor caches 110 and memory buffers 112, as persistent.

Peripheral components 124 include auxiliary hardware devices, such as hard disks, input devices, display devices, and/or other output devices, that may be electrically coupled to other components of system 100. The power consumption of system 100 may vary based in part on which peripheral components 124 are connected and active. The maximum power load for a worst-case scenario may be calculated by assuming that all hardware components, including peripheral components 124, are operating at full capacity.

3. Persistent cache flush

3.1 Managing cache flush operations during power disruption events

When AC power is disrupted, it may be undesirable to trigger a cache flush operation immediately, because power may be restored quickly. However, there is a risk that if too much time passes without power being restored, the hold-up energy within PSU 102a and PSU 102b will be insufficient to perform a full cache flush. If persistent cache flush is enabled, the persistent memory state may then become corrupted. To maintain the persistent state, power management subsystem 104 may generate a warning signal when the energy remaining in the bulk capacitors of PSU 102a and PSU 102b falls below a threshold level.

Figure 2 illustrates an example set of operations for performing cache flush operations to maintain persistent memory state in accordance with some embodiments. One or more operations shown in Figure 2 may be modified, rearranged, or omitted altogether. Accordingly, the particular sequence of operations shown in Figure 2 should not be construed as limiting the scope of one or more embodiments.

Referring to Figure 2, process 200 includes estimating the ride-through time and the hold-up time based on system load (operation 202). The ride-through time corresponds to the estimated amount of time that system 100 can operate without AC power while still leaving enough energy to perform a cache flush and sequence down the power rails. The hold-up time corresponds to the amount of time needed to perform a full cache flush and sequence down the power rails at a given system load. The estimates may be calculated based on the full system load or on a reduced system load that is capped at a maximum value, as described further herein.
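One way to picture the estimate in operation 202 is as an energy budget: the hold-up phase reserves the energy it needs, and whatever remains funds the ride-through phase. The function below is a simplified sketch under that assumption. The name `estimate_times`, its parameters, and the constant-power model are introduced here for illustration and are not taken from the disclosure.

```python
def estimate_times(usable_energy_j, normal_load_w, flush_load_w,
                   flush_time_s, sequence_down_s):
    """Return (ride_through_s, hold_up_s) for a simple constant-power model.

    usable_energy_j: energy the PSUs can deliver after AC loss (assumed known)
    normal_load_w:   system power draw during ride-through
    flush_load_w:    reduced power draw while flushing and sequencing down
    """
    # Hold-up time: time to flush caches plus time to sequence down the rails.
    hold_up_s = flush_time_s + sequence_down_s
    # Energy that must be held in reserve for the hold-up phase.
    hold_up_energy_j = hold_up_s * flush_load_w
    # Remaining energy can be spent riding through the outage at normal load.
    ride_through_s = max(0.0, (usable_energy_j - hold_up_energy_j) / normal_load_w)
    return ride_through_s, hold_up_s
```

With 20 J of usable energy, a 1000 W normal load, a 500 W flush load, an 8 ms flush, and 2 ms of rail sequencing, the hold-up phase reserves 5 J, leaving 15 J for roughly 15 ms of ride-through. These figures are arbitrary and serve only to exercise the arithmetic.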

In some embodiments, process 200 programs one or more energy thresholds based on the estimated ride-through time and hold-up time (operation 204). For example, process 200 may estimate the voltage level on the PSU bulk capacitors that guarantees system 100 the estimated amount of hold-up time, under a limited system load, to complete the cache flush and the sequenced power rail shutdown. That voltage level may then be programmed as the threshold. In other embodiments, a time may be set based on the estimated ride-through time in place of a voltage- or energy-based threshold.
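The voltage threshold in operation 204 can be related to the required energy through the standard capacitor energy formula E = ½CV². The sketch below assumes an ideal capacitor and ignores converter losses; the function name `warning_threshold_v`, the `cutoff_v` parameter (the voltage below which the PSU can no longer regulate its outputs), and the `margin` factor are assumptions introduced for illustration.

```python
import math


def warning_threshold_v(capacitance_f, cutoff_v, hold_up_energy_j, margin=1.0):
    """Voltage at which the warning must fire to leave hold_up_energy_j usable.

    Usable energy between the warning voltage and the cutoff voltage is
    0.5 * C * (V_warn**2 - V_cutoff**2); solving for V_warn gives the result.
    """
    required_j = margin * hold_up_energy_j  # optional safety margin
    return math.sqrt(cutoff_v ** 2 + 2.0 * required_j / capacitance_f)
```

For a 1 mF capacitor, a 300 V cutoff, and 5 J of hold-up energy, the threshold works out to the square root of 100000, or about 316.2 V. The component values are illustrative and do not describe any particular PSU.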

In some embodiments, operations 202 and 204 are implemented as a process separate from the remaining operations described herein. For example, operations 202 and 204 may be performed during the boot sequence of system 100, which may calculate the amount of energy needed at each operating point and the associated voltage thresholds. The calculation may be based in part on the system components detected during the boot sequence and on the estimated power required to run those components in the normal operating mode and/or a reduced-power operating mode. The boot sequence may then set the programmable voltage thresholds for system 100. In other embodiments, the programmable thresholds may be set or modified through user input. For example, a system administrator may set the programmable voltage threshold for each operating point, which allows the administrator to apply domain knowledge about the system's power requirements.

Referring again to Figure 2, process 200 includes monitoring for a loss of AC power (operation 206). In some embodiments, system 100 includes sensors and/or circuitry, which may be embedded in PSU 102a and/or PSU 102b, that detect when input AC power is disrupted. In other embodiments, external circuitry and/or sensors may signal system 100 when AC power is lost.

Based on the monitoring circuitry, process 200 may detect a loss of AC power (operation 208). In response, process 200 triggers a notification (operation 210). In some embodiments, the notification is triggered by deasserting an acok signal. Deasserting this signal provides a warning that power is no longer stable and that the energy reserve within the PSU bulk capacitors has dropped to a critical point at which a system shutdown may need to be initiated to preserve the persistent state of the data, marking the start of the estimated hold-up time. In other words, the notification serves to alert power management subsystem 104 so that enough energy remains in the system PSUs to hold the power rails up long enough to perform a full cache flush of processor caches 110 and memory buffers 112.

In some implementations, the early warning mechanism is tied to a fixed time interval before shutdown. In that case, power management subsystem 104 may assume that system 100 is operating at maximum load and guarantee the minimum amount of time needed, at maximum load, to complete the cache flush and shut down the power rails in sequence. However, this approach may lead to a conservative implementation in which system shutdown is initiated earlier than desired, especially when the maximum PSU load is significantly higher than the actual system load during the persistent cache flush. By contrast, a programmable early warning threshold allows the system to trade off the energy consumed for ride-through before the warning signal is asserted against the energy consumed for hold-up after the warning signal is asserted.

在通知已经被触发之后，处理200继续以第一操作模式对系统组件供电(操作212)。当以第一操作模式运行时，系统组件可以使用PSU大容量电容器中的能量来供电。在一些实施例中，可以提供电力，就像AC没有中断一样。在其它实施例中，可以在系统100内进行省电调整。例如，可以调节处理器频率、可以调暗显示器亮度、和/或可以采取其它节电动作。附加地或替代地，数据可以继续被写入到处理器高速缓存110和存储器缓冲区112中并在其中更新。After the notification has been triggered, process 200 continues to power the system components in the first mode of operation (operation 212). When operating in the first mode of operation, system components may be powered using energy in the PSU bulk capacitor. In some embodiments, power may be provided as if the AC were not interrupted. In other embodiments, power saving adjustments may be made within system 100 . For example, processor frequency may be adjusted, display brightness may be dimmed, and/or other power saving actions may be taken. Additionally or alternatively, data may continue to be written to and updated in processor cache 110 and memory buffer 112 .

处理200还基于编程的阈值来监视一个或多个系统PSU内的能量水平(操作214)。在一些实施例中，系统100包括用于监视PSU中的大容量电容器两端的电压的传感器。由于大容量电容器的电容值是固定的，因此当AC电力中断时，电容器中的电压可以用作PSU能量水平的代理。在其它实施例中，可以根据大容量电容器的电容值和测量的电压来计算能量水平。Process 200 also monitors energy levels within one or more system PSUs based on programmed thresholds (operation 214). In some embodiments, system 100 includes a sensor for monitoring the voltage across a bulk capacitor in the PSU. Since the capacitance value of bulk capacitors is fixed, the voltage in the capacitor can be used as a proxy for the PSU's energy level when AC power is interrupted. In other embodiments, the energy level may be calculated based on the capacitance value of the bulk capacitor and the measured voltage.
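As an illustration of how a measured voltage can stand in for stored energy, the following minimal sketch applies the ideal-capacitor relation E = ½CV². The function names and the `min_operating_volts` parameter (the voltage below which the PSU can no longer supply its rails) are assumptions made for this example and do not appear in the disclosure.

```python
import math


def capacitor_energy_joules(capacitance_farads: float, voltage_volts: float) -> float:
    """Energy stored in an ideal capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_farads * voltage_volts ** 2


def threshold_voltage(capacitance_farads: float,
                      required_energy_joules: float,
                      min_operating_volts: float = 0.0) -> float:
    """Lowest capacitor voltage from which `required_energy_joules` can still be
    drawn before the voltage falls to `min_operating_volts` (an assumed floor)."""
    floor_energy = capacitor_energy_joules(capacitance_farads, min_operating_volts)
    return math.sqrt(2.0 * (required_energy_joules + floor_energy) / capacitance_farads)
```

Under these assumptions, a boot-time routine could convert the energy budget for each operating point into a voltage threshold by calling `threshold_voltage` once per operating point.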

处理200还确定一个或多个PSU的能量水平是否满足阈值(操作216)。例如，如果一个或多个大容量电容器两端的测量电压低于在操作204处编程的电压阈值，那么处理200可以确定满足阈值。如果不满足阈值，那么处理200可以继续监视PSU能量水平，直到电力恢复或者PSU大容量电容器中的电压达到或下降到低于可编程阈值。一旦满足阈值，就可以断言警告信号以触发高速缓存刷新和断电序列。Process 200 also determines whether the energy level of one or more PSUs meets a threshold (operation 216). For example, if the measured voltage across one or more bulk capacitors is below a voltage threshold programmed at operation 204, process 200 may determine that the threshold is met. If the threshold is not met, process 200 may continue to monitor the PSU energy level until power is restored or the voltage in the PSU bulk capacitor reaches or drops below the programmable threshold. Once a threshold is met, a warning signal can be asserted to trigger cache flush and power-down sequences.

在一些实施例中，处理200通过减少系统负载以最小化电力消耗来进入第二操作模式(操作218)。在此阶段期间，电力管理子系统104可以关闭高速缓存刷新操作中不涉及的组件。例如，电力管理子系统104可以使外围组件124断电，外围组件124可以包括硬盘驱动器、风扇、显示器、外围组件快速互连(PCIe)设备和/或其它外围硬件。附加地或替代地，电力管理子系统104可以调节CPU 116的时钟速度和频率以最小化电力消耗。In some embodiments, process 200 enters the second mode of operation by reducing system load to minimize power consumption (operation 218). During this phase, the power management subsystem 104 may shut down components that are not involved in the cache flush operation. For example, the power management subsystem 104 may power down peripheral components 124, which may include hard drives, fans, displays, Peripheral Component Interconnect Express (PCIe) devices, and/or other peripheral hardware. Additionally or alternatively, the power management subsystem 104 may adjust the clock speed and frequency of the CPU 116 to minimize power consumption.

处理200还执行高速缓存刷新(操作220)。在高速缓存刷新操作期间，CPU 116可以将存储在处理器高速缓存110和存储器缓冲区112中的数据写入到持久性存储器114，以维持数据的持久性状态。在一些实施例中，处理200可以在该操作期间继续监视PSU能量水平。如果PSU能量水平下降到第二电压阈值以下，那么即使高速缓存刷新未完成，处理200也可以触发断电序列，以防止所有电力轨道同时断电。第二电压阈值可以被编程为比第一阈值低得多的水平，从而留下足够的能量来按顺序使电力轨道关闭。Process 200 also performs a cache flush (operation 220). During cache flush operations, CPU 116 may write data stored in processor cache 110 and memory buffer 112 to persistent memory 114 to maintain a persistent state of the data. In some embodiments, process 200 may continue to monitor PSU energy levels during this operation. If the PSU energy level drops below the second voltage threshold, process 200 may trigger a power-down sequence to prevent all power rails from being powered down simultaneously, even if the cache flush is not completed. The second voltage threshold can be programmed to a much lower level than the first threshold, leaving enough energy to sequentially turn the power rail off.

一旦高速缓存刷新完成，处理200就使剩余系统组件断电(操作224)。处理200可以按顺序关闭电力轨道以优雅地使系统100关机。使电力轨道关闭的顺序可能因系统而异。Once the cache flush is complete, process 200 powers down remaining system components (operation 224). The process 200 can sequentially shut down the power rails to gracefully shut down the system 100 . The order in which power rails are turned off may vary from system to system.
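The flow of process 200 can be sketched as a small state machine. This is a simplified illustration rather than the disclosed implementation: the `Mode` names, the `next_mode` function, and the choice to make the flush phase irreversible once entered are all assumptions introduced for this example.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    RIDE_THROUGH = "ride_through"  # first operating mode (operation 212)
    FLUSH = "flush"                # second operating mode (operations 218 and 220)
    POWER_DOWN = "power_down"      # sequenced shutdown of the power rails


def next_mode(mode, ac_ok, bulk_volts, warn_volts, shutdown_volts, flush_done):
    """Return the next mode given the current mode and the latest measurements."""
    if mode is Mode.POWER_DOWN:
        return mode
    if mode is Mode.FLUSH:
        # Power down when the flush completes, or early if the second
        # (lower) voltage threshold is reached first.
        if flush_done or bulk_volts <= shutdown_volts:
            return Mode.POWER_DOWN
        return Mode.FLUSH
    # NORMAL or RIDE_THROUGH: keep monitoring until AC returns or the
    # programmed warning threshold is reached.
    if ac_ok:
        return Mode.NORMAL
    if bulk_volts <= warn_volts:
        return Mode.FLUSH
    return Mode.RIDE_THROUGH
```

Treating the flush phase as one-way keeps the sketch short; a real system would need to define what happens if AC power returns after the flush has started.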

The process depicted in Figure 2 can maintain persistent memory state without installing a BBU or otherwise relying on energy from a BBU. Instead, the energy within the bulk capacitors of one or more PSUs may be managed by power management subsystem 104 to guarantee persistence. In addition, power management subsystem 104 accounts for the runtime power load, which allows variable ride-through and hold-up times so that the stored energy is used more efficiently and effectively.

3.2 Managing multiple power supply units

When a system has multiple PSUs and one or more of them lose AC power, the energy in a single PSU may not be sufficient to complete the cache flush operation. However, the aggregate energy across multiple PSUs may be sufficient to complete the cache flush and maintain the persistent state of the data. If multiple PSUs are present, power management subsystem 104 may monitor the total energy available across all power supplies. When the aggregate level crosses the threshold, power management subsystem 104 may issue a power-failure warning signal to trigger the cache flush operation.

在一些实施例中，电力管理子系统104检测关于被管理的每个PSU的以下事件：In some embodiments, the power management subsystem 104 detects the following events for each PSU being managed:

· Loss of AC power, which may be detected by deassertion of the acok signal. If this signal is received from one or more PSUs, power management subsystem 104 may enter the first operating mode and reduce power during the ride-through window in case AC power is restored.

· An indication that the energy or voltage level in the PSU has crossed a first threshold, which may be detected by assertion of the vwarn signal. Power management subsystem 104 may combine the voltage warning information from all PSUs to determine when to enter the second operating mode, in which persistent cache flush handler 106 initiates the cache flush operation. Power management subsystem 104 may further reduce power during the second operating mode, as previously described.

· An indication that the energy or voltage level in the PSU has crossed a second threshold, which may be detected by way of the pwrok signal. Power management subsystem 104 may combine the pwrok signal information to determine whether to power down the system immediately. A shutdown may be triggered if a further drop in PSU energy levels would not leave enough energy to shut down the power rails safely in sequence.

In some embodiments, power management subsystem 104 maintains a set of per-PSU counters to track the estimated energy level in each PSU when AC power is lost. The initial value of each per-PSU counter may be hard-coded or programmable and corresponds to the amount of energy available in the PSU when vwarn is asserted. When power management subsystem 104 detects that a PSU has asserted vwarn, it may begin decrementing that PSU's energy counter at a rate set by the number of active power supplies in the system and the maximum load on each supply. For example, if there is a single active PSU and the maximum load is 1200 watts (W), the counter may be decremented at 1.2 joules (J) per millisecond. With two active power supplies, the load per supply is 600 W and the energy counter may be decremented at 600 mJ/ms. With four active power supplies, the energy counter may be decremented at 300 mJ/ms. As another example, if the worst-case system load is reduced to 1000 W, the decrement rate may be changed to 1 J/ms for a single power supply, 500 mJ/ms for two power supplies, and 250 mJ/ms for four power supplies. The counters may be tuned to provide the maximum ride-through time while retaining enough energy to complete the persistent cache flush in the event of a prolonged outage.
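The decrement rates quoted above follow from dividing the worst-case load evenly across the active supplies and noting that 1 W equals 1 mJ/ms. The following minimal sketch reproduces that arithmetic; the function name is hypothetical, and the even split of the load is the assumption stated in the examples.

```python
def decrement_rate_mj_per_ms(max_system_load_watts: float, active_psus: int) -> float:
    """Per-PSU energy counter decrement rate in millijoules per millisecond."""
    if active_psus < 1:
        raise ValueError("at least one active PSU is required")
    # Worst-case load shared evenly across the active supplies.
    per_psu_watts = max_system_load_watts / active_psus
    # 1 W = 1 J/s = 1 mJ/ms, so the wattage is already the rate in mJ/ms.
    return per_psu_watts
```

For a 1200 W worst-case load this gives 1200, 600, and 300 mJ/ms for one, two, and four active supplies, matching the figures in the text.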

图3图示了根据一些实施例的用于管理具有多个电源的系统中的持久性高速缓存刷新操作的示例操作集。图3中所示的一个或多个操作可以被一起修改、重新布置或省略。因此，图3中所示的特定操作顺序不应被解释为限制一个或多个实施例的范围。Figure 3 illustrates an example set of operations for managing persistent cache flush operations in a system with multiple power supplies, in accordance with some embodiments. One or more operations shown in Figure 3 may be modified, rearranged, or omitted altogether. Accordingly, the specific sequence of operations shown in Figure 3 should not be construed as limiting the scope of one or more embodiments.

参考图3，处理300检测来自一个或多个PSU的一个或多个vwarn信号的断言(操作302)。如前所述，每个PSU可以被配置为当AC电力丢失并且(一个或多个)PSU大容量电容器中的能量下降到阈值以下时断言信号，该阈值可以是可编程的。Referring to Figure 3, process 300 detects assertion of one or more vwarn signals from one or more PSUs (operation 302). As mentioned previously, each PSU may be configured to assert a signal when AC power is lost and the energy in the PSU bulk capacitor(s) drops below a threshold, which threshold may be programmable.

响应于检测到(一个或多个)vwarn信号，处理300发起一个或多个相关联的倒计时器(操作304)。在一些实施例中，倒计时器跟踪已断言vwarn信号的每个PSU的估计能量水平。处理300可以以与系统中的PSU的数量和每电源的最大负载成比例的速率递减计数器。在其它实施例中，可以使用其它机制来跟踪PSU内的能量水平。例如，处理300可以递增计数器代替递减计数器直到达到阈值，或使用其它跟踪逻辑。In response to detecting the vwarn signal(s), process 300 initiates one or more associated countdown timers (operation 304). In some embodiments, a countdown timer tracks the estimated energy level of each PSU that has asserted the vwarn signal. Process 300 may decrement the counter at a rate proportional to the number of PSUs in the system and the maximum load per power supply. In other embodiments, other mechanisms may be used to track energy levels within the PSU. For example, process 300 may increment a counter instead of decrementing it until a threshold is reached, or use other tracking logic.

附加地或替代地，处理300可以响应于检测到一个或多个vwarn信号而使系统100进入减少电力模式。减少电力操作模式可以由单个信号或阈值数量的信号来触发，这取决于特定的实施方式。在其它实施例中，处理300可以随着每个新检测到的信号而逐渐减少电力。例如，处理300可以利用每个新的vwarn信号来逐渐调节CPU频率和/或发起或增加如前所述的其它节电动作。Additionally or alternatively, process 300 may cause system 100 to enter a reduced power mode in response to detecting one or more vwarn signals. The reduced power operating mode may be triggered by a single signal or a threshold number of signals, depending on the particular implementation. In other embodiments, process 300 may gradually reduce power with each newly detected signal. For example, process 300 may utilize each new vwarn signal to gradually adjust the CPU frequency and/or initiate or increase other power saving actions as described above.

处理300还监视(a)基于倒计时器(或其它跟踪逻辑)的组合PSU的聚合能量水平，(b)来自其它PSU的附加vwarn信号的断言，以及(c)来自PSU的pwrok信号的断言(操作306)。如果检测到附加的vwarn信号，那么处理300发起用于断言该信号的(一个或多个)PSU的相关联的倒计时器(操作304)。Process 300 also monitors (a) the aggregate energy level of the combined PSUs based on a countdown timer (or other tracking logic), (b) assertion of additional vwarn signals from other PSUs, and (c) assertion of pwrok signals from the PSUs (operation 306). If an additional vwarn signal is detected, process 300 initiates the associated countdown timer for the PSU(s) asserting the signal (operation 304).

如果聚合能量水平满足第一阈值，那么处理300执行高速缓存刷新操作(操作308)。例如，处理300可以确定所有PSU上的聚合能量水平等于或低于最小阈值。在高速缓存刷新操作期间，CPU 116可以将存储在处理器高速缓存110和存储器缓冲区112中的数据写入持久性存储器114，以维持数据的持久性状态。在一些实施例中，处理300可以在该操作期间继续监视PSU能量水平。If the aggregate energy level meets the first threshold, process 300 performs a cache flush operation (operation 308). For example, process 300 may determine that the aggregate energy level across all PSUs is equal to or below a minimum threshold. During cache flush operations, CPU 116 may write data stored in processor cache 110 and memory buffer 112 to persistent memory 114 to maintain a persistent state of the data. In some embodiments, process 300 may continue to monitor PSU energy levels during this operation.

If the cache flush operation completes, or if the PSU energy drops below the second voltage threshold and triggers one or more pwrok signals, process 300 shuts down the power rails in sequence (operation 310). When a pwrok signal is detected, process 300 may initiate the power-down sequence even if the cache flush has not completed, to prevent all power rails from losing power at the same time. The second voltage threshold may be programmed to a level much lower than the first threshold, while still leaving enough energy to shut down the power rails in sequence.

Figure 4 illustrates an example system 400 for managing multiple power supplies in accordance with some embodiments. System 400 includes PSUs 402 and 404, although the number of PSUs may vary depending on the particular implementation. Each PSU may include one or more bulk capacitors, such as capacitor 406, which store electrostatic energy obtained from the connected AC power network. Capacitor-based storage allows a PSU to be implemented with a smaller footprint than a BBU and provides faster charge and discharge rates. PSUs 402 and 404 may be connected to the same AC power network or to different AC power networks, depending on the particular implementation. If they are connected to different AC power networks, one PSU may lose AC power while the other continues to be powered by its own AC network. In that scenario, each PSU may provide a separate acok signal (not shown) to power management subsystem 408. When AC power is lost, these signals may be deasserted independently by the individual PSUs to indicate which PSU has lost power. In other cases, deassertion of the acok signal may indicate that a group of PSUs, or all PSUs, have lost AC power.

In some embodiments, each PSU asserts the vwarn signal when the energy in its bulk capacitor (e.g., bulk capacitor 406) reaches a threshold. The vwarn signal thus notifies power management subsystem 408 that the available energy of the associated PSU is at the first threshold level. Power management subsystem 408 maintains a separate energy counter for each PSU, and each counter is triggered when the associated PSU asserts its vwarn signal. For example, when PSU 402 asserts the vwarn signal, power management subsystem 408 may decrement energy counter 410 at a rate set by the number of PSUs in the system and the maximum load on each supply. Energy counter 412 is managed independently of energy counter 410 (a vwarn signal from one PSU does not start the counter of any other PSU) and is decremented in response to PSU 404 asserting the vwarn signal. Power management subsystem 408 includes adder 414, which sums the estimated energy counts of the PSUs to compute aggregate energy counter 416.

In some embodiments, power management subsystem 408 monitors aggregate energy counter 416 to determine whether the aggregate energy across all PSUs has reached or fallen below a system threshold, which may be programmable and may vary depending on the particular implementation. If the threshold is reached, power management subsystem 408 asserts an SMI signal to halt the task currently being executed by CPU/chipset 422 in preparation for the persistent cache flush and reset. In response to the SMI, persistent cache flush handler 424 may initiate the persistent cache flush operation described previously.
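The counter arrangement of Figure 4 can be modeled in a few lines. The sketch below is an illustrative software model only, not the disclosed hardware: the class names, millijoule units, and 1 ms tick are assumptions, and a PSU that has not yet asserted vwarn is assumed to contribute its full initial estimate to the aggregate.

```python
from dataclasses import dataclass


@dataclass
class PsuEnergyCounter:
    """Estimated energy left in one PSU, in millijoules."""
    energy_mj: float     # initial value: energy available when vwarn asserts
    vwarn: bool = False  # counting starts only after this PSU asserts vwarn


class AggregateEnergyMonitor:
    """Hypothetical model of per-PSU counters feeding one aggregate counter."""

    def __init__(self, initial_energy_mj, max_load_watts, flush_threshold_mj):
        self.counters = [PsuEnergyCounter(e) for e in initial_energy_mj]
        self.max_load_watts = max_load_watts
        self.flush_threshold_mj = flush_threshold_mj

    def assert_vwarn(self, psu_index):
        # Only the asserting PSU's counter starts counting.
        self.counters[psu_index].vwarn = True

    def aggregate_mj(self):
        # Plays the role of the adder: sum of all per-PSU estimates.
        return sum(c.energy_mj for c in self.counters)

    def tick(self, elapsed_ms=1.0):
        """Advance time; return True when the SMI should be asserted."""
        # Worst-case load shared evenly; 1 W equals 1 mJ/ms.
        rate_mj_per_ms = self.max_load_watts / len(self.counters)
        for c in self.counters:
            if c.vwarn:
                c.energy_mj = max(0.0, c.energy_mj - rate_mj_per_ms * elapsed_ms)
        return self.aggregate_mj() <= self.flush_threshold_mj
```

In this model, calling `tick()` once per millisecond and asserting the SMI on the first `True` result corresponds to the threshold comparison on the aggregate counter.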

Figure 5 illustrates an example timing diagram 500 with staggered warning signals from different power supply units in accordance with some embodiments. The upper portion of diagram 500 depicts the timing of the power failure, while the lower portion depicts the potential power reductions that system 100 may make in response to the impending power failure. Diagram 500 assumes worst-case behavior, in which the system operates at maximum load until the power-failure cache flush is triggered. Once the power-failure cache flush is triggered, system power consumption is reduced so that the load is minimized while the flush operation completes.

The variables of diagram 500 may be defined as follows:

· t_{psu0_v1warn} – the time at which the first PSU asserts v1warn

· t_{psu1_v1warn} – the time at which the second PSU asserts v1warn

· t_{flush_start} – the time at which power management subsystem 104 initiates the power-failure flush operation

· t_{psu1_pwrok} – the time at which the second PSU deasserts pwrok and system 100 begins the power-down sequence

· T_{v1warn_delay} – the time delay between t_{psu0_v1warn} and t_{psu1_v1warn}

· T_{v1warn_debounce} – the time delay between t_{psu1_v1warn} and t_{flush_start}

· T_v1warn – the time needed to use all of the E_v1warn energy in a power supply at maximum load

· P_max – the maximum system load

· P_throttle – the system load after the first PSU asserts v1warn

· P_debounce – the system load after all PSUs have asserted v1warn

· P_flush – the system load after the power-failure flush operation is triggered

· E_v1warn – the usable energy available in a PSU after v1warn is asserted but before pwrok is deasserted

· E_pwrok – the usable energy available in a PSU after pwrok is deasserted and before the main power rails are lost

· E_{psu0_reserve} – the usable energy remaining in the first PSU to assert v1warn, once the second PSU asserts v1warn at t_{psu1_v1warn}

· E_reserve – the total usable energy remaining in all PSUs after v1warn is asserted on all PSUs but before pwrok is deasserted on the last PSU

· E_{v1warn_delay} – the energy consumed by system 100 during T_{v1warn_delay}

· E_flush – the energy required to successfully complete the power-failure flush operation

· N – the number of active PSUs supplying power to the system

Referring to Figure 5, when the first PSU asserts v1warn at t_{psu0_v1warn}, that PSU has (E_v1warn + E_pwrok) of energy available before the power supply shuts down the associated power rails and stops supplying power to the system. There are N active PSUs, and the system load is divided among all active PSUs in the system. At some later point, the second PSU asserts v1warn at t_{psu1_v1warn}. The amount of energy consumed during the period between t_{psu0_v1warn} and t_{psu1_v1warn} is denoted E_{v1warn_delay}.

During the period after the first power supply has asserted v1warn but before the second power supply has asserted v1warn, system 100 draws energy from all N active PSUs. The maximum energy that can be drawn from the first PSU after it asserts v1warn is (E_v1warn + E_pwrok). The energy remaining in the first PSU when the second PSU asserts v1warn is denoted E_{psu0_reserve}.

If T_{v1warn_delay} is small, the second PSU deasserts pwrok before the system has consumed all of the energy in the first power supply. In the worst case, when both power supplies assert v1warn at the same time, both also deassert pwrok at the same time. In these cases, if the system shuts down when all power supplies have deasserted pwrok, none of the E_pwrok energy in the first power supply may be usable. To allow for this possibility, system 100 may be configured to assume that the E_pwrok energy in the first power supply is unavailable.

To make use of all the energy in both power supplies, system 100 may be configured so that the power-failure flush does not begin immediately when the second PSU asserts v1warn. Instead, system 100 may delay triggering the flush until the amount of energy remaining in all active PSUs equals the amount required to complete the flush. System 100 may also be configured to guarantee that E_reserve ≥ E_flush, so that enough energy is reserved to complete the cache flush operation.

If both PSUs assert v1warn at the same time, then T_{v1warn_delay} = 0 and T_{v1warn_debounce} = T_v1warn. If the PSUs assert v1warn far apart in time, then T_{v1warn_debounce} = 0, and the power-failure flush may be triggered as soon as the second PSU asserts v1warn. Power management subsystem 104 may program the energy/voltage thresholds accordingly.

4. Managing externally initiated asynchronous reset events

Power interruption events are not the only cause of a system shutdown or reset. In some cases, a system error or user action may trigger a system shutdown or reset. For these externally initiated asynchronous events, monitoring for power loss may not be sufficient to maintain persistent memory state, because AC power may remain stable. An asynchronous hardware reset is typically implemented by directly asserting a reset request pin, which initiates the reset in hardware (HW) and may not provide any opportunity to invoke a software cache flush handler before the reset. In some embodiments, to prevent data loss, board logic is configured to generate an SMI signal that initiates a cache flush when an externally initiated reset request is detected.

图6图示了根据一些实施例的用于处理外部发起的异步重置事件的示例操作集。图6中所示的一个或多个操作可以被一起修改、重新布置或省略。因此，图6中所示的特定操作序列不应被解释为限制一个或多个实施例的范围。Figure 6 illustrates an example set of operations for handling externally initiated asynchronous reset events in accordance with some embodiments. One or more operations shown in Figure 6 may be modified, rearranged, or omitted altogether. Accordingly, the specific sequence of operations illustrated in Figure 6 should not be construed as limiting the scope of one or more embodiments.

参考图6，处理600拦截HW重置请求信号的断言(操作602)。在一些实施例中，平台发起的重置请求(包括由系统管理模块118发起的那些请求)通过电力管理子系统104来代理。这允许系统100在执行所请求的重置或电力转换动作之前运行持久性高速缓存刷新处理程序106。Referring to Figure 6, process 600 intercepts assertion of the HW reset request signal (operation 602). In some embodiments, platform-initiated reset requests, including those initiated by system management module 118 , are proxied through power management subsystem 104 . This allows the system 100 to run the persistent cache flush handler 106 before performing the requested reset or power transition action.

在一些实施例中，处理600确定是否启用持久性高速缓存刷新(操作604)。如下面进一步描述的，系统固件(或其它系统逻辑)可以选择性地启用或禁用持久性高速缓存刷新以配置处理器高速缓存110中的数据是否包括在持久性存储器状态中。In some embodiments, process 600 determines whether persistent cache flushing is enabled (operation 604). As described further below, system firmware (or other system logic) may selectively enable or disable persistent cache flushing to configure whether data in processor cache 110 is included in persistent memory state.

如果未启用持久性高速缓存刷新，那么处理600将请求路由到重置引脚(操作606)。在一些实施例中，电力管理子系统104将请求路由到系统芯片组。芯片组可以发起HW重置序列。If persistent cache flushing is not enabled, process 600 routes the request to the reset pin (operation 606). In some embodiments, power management subsystem 104 routes requests to the system chipset. The chipset can initiate the HW reset sequence.

如果启用持久性高速缓存刷新，那么处理600将请求路由到系统管理模块118(操作608)。在这种情况下，重置引脚不会响应于平台或用户发起的重置而立即断言，以留出时间调用基于软件的高速缓存刷新处理程序。If persistent cache flushing is enabled, process 600 routes the request to system management module 118 (operation 608). In this case, the reset pin is not asserted immediately in response to a platform- or user-initiated reset to allow time for the software-based cache flush handler to be called.

在一些实施例中，处理600生成SMI信号以将系统100置于系统管理模式(操作610)。SMI信号可以由系统管理模块118断言，系统管理模块118可以使用直接绑定到CPU116的特殊信令线。该信号可以使系统固件120(例如，BIOS)暂停CPU 116正在执行的当前任务，以准备高速缓存刷新和重置。In some embodiments, process 600 generates an SMI signal to place system 100 into system management mode (operation 610). The SMI signal may be asserted by the system management module 118 , which may use special signaling lines tied directly to the CPU 116 . This signal may cause system firmware 120 (eg, BIOS) to pause the current task being executed by CPU 116 in preparation for cache flushing and reset.

In some embodiments, if persistent cache flushing is enabled, system firmware 120 (e.g., BIOS) configures a general-purpose input/output (GPIO) pin within system management module 118 as a trigger for the SMI. The GPIO pin may be used to signal system firmware 120 when a cache flush followed by a warm reset is to be performed. This GPIO may differ from the GPIO used to signal the chipset of an impending power failure, in order to convey that the persistent cache flush handler should finish by requesting a warm reset rather than a power-down.

处理600接下来执行高速缓存刷新操作(操作612)。响应于SMI信号，系统固件120可以调用高速缓存刷新处理程序106来管理高速缓存刷新操作，如先前所描述的。因此，数据从诸如处理器高速缓存110和存储器缓冲区112之类的易失性存储器传送到持久性存储器114，从而维持持久性状态。Process 600 next performs a cache flush operation (operation 612). In response to the SMI signal, system firmware 120 may invoke cache flush handler 106 to manage cache flush operations, as previously described. Accordingly, data is transferred from volatile memory, such as processor cache 110 and memory buffer 112, to persistent memory 114, thereby maintaining a persistent state.

Process 600 further determines whether the flush is complete (operation 614). Persistent cache flush handler 106 may assert a signal or otherwise provide notification when the data in processor cache 110 and memory buffer 112 has been written to persistent memory 114.

Once the cache flush is complete, process 600 generates a reset request (operation 622). For example, persistent cache flush handler 106 may initiate the system reset by writing a specific value to a specific IO port/register of the PCH (e.g., 0x06 to port CF9) or by requesting that system logic assert the HW reset request signal to the chipset.

If the flush has not completed, process 600 may determine whether a timeout has been reached (operation 616). For example, process 600 may allow one second, or another threshold period configurable by system 100, for the flush operation to complete. In some cases, the system state associated with the reset event may prevent the flush from completing. Implementing a timeout prevents system 100 from entering a state in which a warm reset cannot be performed.

If the timeout is reached, process 600 generates a reset request signal directly to the chipset (operation 618). The reset request in operation 622 may likewise be a direct reset request to the chipset, or it may be a software-based request. The mechanism used to reset the system may therefore differ depending on whether the flush completed successfully.

响应于重置信号，然后系统100被重置(操作420)。在这种情境中的重置可以使得系统100关机或重新启动。In response to the reset signal, system 100 is then reset (operation 420). A reset in this scenario can cause the system 100 to shut down or restart.
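The decision flow of operations 614 through 622 can be sketched as a polling loop with a deadline. The Python below is an illustrative sketch only: the names `flush_then_reset` and `ResetPath`, and the injected `flush_done` and `clock` callables, are assumptions made for this example and do not appear in the source.

```python
import enum
import time
from typing import Callable


class ResetPath(enum.Enum):
    # Reset requested by the handler after a completed flush (operation 622).
    SOFTWARE = "software"
    # Reset requested directly to the chipset after the timeout (operation 618).
    HARDWARE = "hardware"


def flush_then_reset(
    flush_done: Callable[[], bool],
    timeout_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    poll_interval_s: float = 0.0,
) -> ResetPath:
    """Wait for the flush to finish, but never longer than timeout_s."""
    deadline = clock() + timeout_s
    while True:
        if flush_done():          # operation 614: is the flush complete?
            return ResetPath.SOFTWARE
        if clock() >= deadline:   # operation 616: has the timeout been reached?
            return ResetPath.HARDWARE
        if poll_interval_s:
            time.sleep(poll_interval_s)
```

Passing the clock in as a parameter keeps the sketch testable without real delays; either return value leads to the same outcome, a reset of system 100, which reflects the bounded-time guarantee described above.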

Figure 7 illustrates an example system 700 for intercepting and handling externally initiated asynchronous reset events in accordance with some embodiments. System 700 includes system management module 702, which may be implemented in programmable hardware such as a field-programmable gate array (FPGA), or through other hardware components (or a combination of hardware and software) as described further below. System management module 702 acts as a proxy that intercepts assertion of a hardware request signal, which may be triggered by a user pressing a reset button or by a reset request asserted by a board management controller (BMC) or a debug header.

System management module 702 includes logic gate 704, which routes the asserted reset request signal to demultiplexer 706. The select line coupled to demultiplexer 706 is set based on whether persistent cache flushing is enabled or disabled. A "0" or low-voltage state represents a memory operating mode in which persistent cache flushing is disabled and the data in processor cache 110 and memory buffer 112 is not managed as part of the persistence domain. A "1" or high-voltage state represents a persistent cache operating mode in which persistent cache flushing is enabled and the data in processor cache 110 and memory buffer 112 is part of the persistence domain. Depending on the particular implementation, however, the values on the select line may be swapped.

When persistent cache flushing is disabled, system management module 702 asserts a reset request interrupt signal on a pin electrically coupled to CPU/chipset 712. In response, reset control logic 714 on CPU/chipset 712 suspends the task currently executing and initiates a hardware reset, which may include sending a signal to reset finite state machine (FSM) 710. Reset FSM 710 may shut down the power rails one after another in a specific order to avoid damaging hardware components. As noted previously, the order in which the power rails are shut down may vary depending on the system architecture.

当启用持久性高速缓存刷新时，系统管理模块702使用直接绑定到CPU/芯片组712上的另一个引脚的特殊信令线来断言SMI。该信令线与先前描述的当未启用持久性高速缓存刷新时用于执行HW重置的线不同。响应于检测到SMI，CPU/芯片组712向电力故障刷新处理程序716发送基于软件的请求以发起持久性高速缓存刷新。When persistent cache flushing is enabled, the system management module 702 asserts SMI using a special signaling line tied directly to another pin on the CPU/chipset 712 . This signaling line is different from the previously described line used to perform a HW reset when persistent cache flushing is not enabled. In response to detecting the SMI, the CPU/chipset 712 sends a software-based request to the power failure flush handler 716 to initiate a persistent cache flush.
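The routing performed by demultiplexer 706 can be modeled as a pure function of the intercepted request and the select line. This is a minimal sketch under stated assumptions: the function name and the dictionary keys `hw_reset_request` and `smi` are hypothetical labels for the two output lines, and the select polarity follows the "0 = disabled, 1 = enabled" convention given above (which an implementation may swap).

```python
def route_reset_request(reset_requested: bool, persistent_flush_enabled: bool) -> dict:
    """Model one input routed to one of two outputs by a select line."""
    select = 1 if persistent_flush_enabled else 0
    return {
        # select == 0: pass the request straight through as a hardware reset.
        "hw_reset_request": reset_requested and select == 0,
        # select == 1: raise an SMI so the flush handler runs before the reset.
        "smi": reset_requested and select == 1,
    }
```

At most one output is ever asserted for a given request, which is what keeps the direct hardware reset from racing the flush handler in this model.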

In response to the request, persistent cache flush handler 716 initiates a persistent cache flush operation to transfer data from the processor caches and memory buffers to the persistent storage medium. If the cache flush completes successfully, power failure flush handler 716 sends a software reset request to reset control logic 714, which may trigger the power-down sequence described previously.

When persistent cache flushing is enabled, system management module 702 also starts timer 708. Timer 708 may count down or up until it is canceled or reaches a timeout value. The count may be canceled in response to detecting assertion, on an input pin, of the signal to reset FSM 710. That signal indicates that power failure flush handler 716 successfully flushed the processor caches and memory buffers to the persistent storage medium and that the reset sequence has been initiated. If the timeout value is reached before the timer is canceled, system management module 702 may directly assert the rst_req_in pin on CPU/chipset 712 to trigger a HW reset.
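The backstop behavior of timer 708 can be sketched as a small tick-driven state machine. The class below is an illustration, not the FPGA logic itself: the class name, the tick-based time unit, and the `hw_reset_asserted` flag (standing in for driving the rst_req_in pin) are assumptions made for the example.

```python
class BackstopTimer:
    """Counts ticks after an SMI; requests a hardware reset if not canceled in time."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.elapsed = 0
        self.running = False
        self.hw_reset_asserted = False

    def start(self) -> None:
        # Started when the SMI is asserted in persistent cache flush mode.
        self.elapsed = 0
        self.running = True

    def cancel(self) -> None:
        # Called when the reset FSM signal is observed, meaning the handler
        # finished the flush and began the reset sequence on its own.
        self.running = False

    def tick(self) -> None:
        if not self.running:
            return
        self.elapsed += 1
        if self.elapsed >= self.timeout_ticks:
            self.running = False
            self.hw_reset_asserted = True  # stands in for asserting rst_req_in
```

Because the timer lives outside the handler, a hung or failed flush cannot prevent the reset; it can only delay it by at most `timeout_ticks`.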

5. Coordinating persistent cache flush state between system components

System boot firmware may expose persistent cache flush support through a user-configurable option. However, boot firmware may be deployed on a wide variety of hardware platforms, and offering the option does not necessarily mean that a particular platform's hardware can support persistent cache flushing. Whether a platform can support persistent cache flushing may depend on the hardware configuration, the presence and/or health of energy storage modules, and the capabilities of the underlying hardware components. In some embodiments, components of system 100 participate in a handshake to (a) determine whether the hardware has sufficient capability to support persistent cache flushing; (b) selectively enable or disable persistent cache flushing; (c) when persistent cache flushing is enabled, configure system components to support it; and (d) communicate to the operating system whether persistent cache flushing has been successfully enabled.

图8图示了根据一些实施例的用于协调持久性存储器操作模式的示例操作集。图8中所示的一个或多个操作可以被一起修改、重新布置或省略。因此，图8中所示的特定操作顺序不应被解释为限制一个或多个实施例的范围。Figure 8 illustrates an example set of operations for coordinating persistent memory operating modes in accordance with some embodiments. One or more operations shown in Figure 8 may be modified, rearranged, or omitted altogether. Accordingly, the specific sequence of operations shown in Figure 8 should not be construed as limiting the scope of one or more embodiments.

参考图8，处理800发起引导序列(操作802)。在一些实施例中，引导序列加载系统固件120，其可以包括BIOS固件。固件可以被配置为公开用于持久性高速缓存刷新的用户可配置选项。例如，固件可以向用户呈现提示：用户是否想要启用持久性高速缓存刷新，或者用户可以导航用户界面，诸如BIOS设置实用程序屏幕。Referring to Figure 8, process 800 initiates a boot sequence (operation 802). In some embodiments, the boot sequence loads system firmware 120, which may include BIOS firmware. The firmware can be configured to expose user-configurable options for persistent cache flushing. For example, the firmware may present a prompt to the user whether the user wants to enable persistent cache flushing, or the user may navigate a user interface, such as a BIOS setup utility screen.

In some embodiments, the user interface exposes multiple values for a "Durability Domain" setup option that configures whether the platform operates in ADR mode or in persistent cache flush mode. For example, the user interface may expose options for selecting a "Memory Controller" setting or a "CPU Cache Hierarchy" setting. Under the "Memory Controller" setting, ADR is enabled but persistent cache flushing is disabled. When this setting is selected, memory buffer 112 is flushed during a power outage event, but the flush operation is not applied to processor cache 110. In some embodiments, the system hardware may be configured with this setting by default.

在“CPU高速缓存层次结构”设置中，启用了持久性高速缓存刷新。因此，如果选择该选项，那么在平台硬件支持持久性高速缓存刷新操作的情况下，在断电事件时刷新存储器缓冲区112和处理器高速缓存110中的数据。In the "CPU Cache Hierarchy" settings, persistent cache flushing is enabled. Therefore, if this option is selected, data in memory buffer 112 and processor cache 110 are flushed on a power outage event, provided the platform hardware supports persistent cache flush operations.

Additionally or alternatively, other settings may be supported. For example, a "Standard Domain" setting may be selectable, under which cached data is not flushed upon a power failure event. The user may select the preferred setting through the user interface, as described above. If the user does not select a setting, system firmware 120 may select a default setting, which may vary depending on the particular implementation.
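The three settings described above differ only in which volatile state each one asks the platform to flush on power loss. A minimal sketch of that mapping follows; the enum, the function name, and the string labels `memory_buffers` and `processor_caches` are assumptions made for illustration. The function reports what a setting requests, and whether the platform can honor the request is determined separately by the hardware capability checks.

```python
from enum import Enum


class DurabilityDomain(Enum):
    STANDARD = "Standard Domain"
    MEMORY_CONTROLLER = "Memory Controller"      # ADR only
    CPU_CACHE_HIERARCHY = "CPU Cache Hierarchy"  # persistent cache flush


def requested_flush_scope(setting: DurabilityDomain) -> frozenset:
    """Return the volatile state that the selected setting asks to be flushed."""
    if setting is DurabilityDomain.STANDARD:
        return frozenset()
    if setting is DurabilityDomain.MEMORY_CONTROLLER:
        return frozenset({"memory_buffers"})
    return frozenset({"memory_buffers", "processor_caches"})
```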

In some embodiments, system firmware 120 checks whether persistent cache mode has been selected, either by the user or by default (operation 804). Even when the option is selected, the platform hardware may in some cases not support persistent cache flush operations. In addition, system hardware may change over time as components are added, removed, age, and/or fail.

如果尚未选择持久性高速缓存模式，那么系统固件120继续引导序列而不通告对持久性高速缓存模式的支持(操作822)。引导序列可以包括初始化硬件组件、加载操作系统和/或处理尚未被处理的系统引导文件。引导序列可以继续，而不执行下面进一步描述的硬件能力检查。If persistent cache mode has not been selected, system firmware 120 continues the boot sequence without advertising support for persistent cache mode (operation 822). The boot sequence may include initializing hardware components, loading the operating system, and/or processing system boot files that have not yet been processed. The boot sequence can continue without performing the hardware capability checks described further below.

如果已选择持久性高速缓存模式，那么系统固件120向系统管理模块118发送请求以确定系统100是否能够支持持久性高速缓存刷新操作(操作806)。If persistent cache mode has been selected, system firmware 120 sends a request to system management module 118 to determine whether system 100 is capable of supporting persistent cache flush operations (operation 806).

响应于该请求，系统管理模块118评估系统100的硬件能力(操作808)。在一些实施例中，系统管理模块118可以参与和一个或多个硬件组件的握手以确定设置、配置和/或指示是否支持持久性高速缓存刷新的其它信息。例如，在引导序列期间，连接的硬件组件可以包括向系统固件提供该组件支持的特征列表的固件。系统管理模块118可以扫描所提供的特征列表和/或其它信息以确定特征是否与持久性高速缓存刷新兼容。In response to the request, system management module 118 evaluates the hardware capabilities of system 100 (operation 808). In some embodiments, the system management module 118 may participate in a handshake with one or more hardware components to determine settings, configuration, and/or other information indicating whether persistent cache flushing is supported. For example, during a boot sequence, a connected hardware component may include firmware that provides the system firmware with a list of features supported by the component. System management module 118 may scan the provided feature list and/or other information to determine whether the feature is compatible with persistent cache flushing.

在一些实施例中，评估系统100的硬件能力包括确定PSU 102a和/或PSU 102b是否支持生成预警信号以及配置可编程的vwarn阈值。例如，系统管理模块118可以确定PSU 402是否包括用于断言vwarn信号的引脚。如果PSU不具有这些能力，那么系统管理模块118可以确定平台硬件不支持持久性高速缓存刷新操作。In some embodiments, evaluating the hardware capabilities of system 100 includes determining whether PSU 102a and/or PSU 102b supports generating warning signals and configuring a programmable vwarn threshold. For example, system management module 118 may determine whether PSU 402 includes a pin for asserting the vwarn signal. If the PSU does not have these capabilities, system management module 118 may determine that the platform hardware does not support persistent cache flush operations.

Additionally or alternatively, system management module 118 may determine whether power management subsystem 104 includes logic for detecting the vwarn signal, monitoring aggregate energy levels across multiple PSUs, and/or triggering an interrupt when the system-wide energy level falls below a threshold. If power management subsystem 104 does not have these capabilities, system management module 118 may determine that the platform hardware does not support persistent cache flush operations.

附加地或替代地，系统管理模块118可以评估其它硬件能力。例如，系统管理模块118可以评估系统100以确定系统是否支持拦截重置信号以及配置GPIO引脚来处理异步重置事件。作为另一个示例，系统管理模块118可以评估CPU 116以确定其是否包括用于调用持久性高速缓存刷新处理程序的特殊信令线。Additionally or alternatively, system management module 118 may evaluate other hardware capabilities. For example, system management module 118 may evaluate system 100 to determine whether the system supports intercepting reset signals and configuring GPIO pins to handle asynchronous reset events. As another example, system management module 118 may evaluate CPU 116 to determine whether it includes a special signaling line for invoking a persistent cache flush handler.

附加地或替代地，系统管理模块118可以确定是否已经安装了支持持久性高速缓存刷新操作的任何BBU。如果已经安装了BBU，那么系统管理模块118可以确定支持持久性高速缓存刷新，即使PSU体系架构不提供支持。另一方面，如果未安装BBU并且PSU和/或电力管理子系统不支持持久性高速缓存刷新操作，那么系统管理模块118可以确定不支持持久性高速缓存刷新。Additionally or alternatively, the system management module 118 may determine whether any BBUs that support persistent cache flush operations have been installed. If the BBU has been installed, the system management module 118 may determine that persistent cache flushing is supported even if the PSU architecture does not provide support. On the other hand, if the BBU is not installed and the PSU and/or power management subsystem does not support persistent cache flush operations, the system management module 118 may determine that persistent cache flushing is not supported.

Additionally or alternatively, system management module 118 may evaluate other hardware capabilities. For example, system management module 118 may evaluate the capacity of auxiliary energy storage devices installed in the platform, such as BBUs, and determine whether those devices provide enough energy to power the system components that are active during the flush process. Additionally or alternatively, system management module 118 may evaluate battery health, such as by measuring battery impedance, to determine whether the platform hardware supports persistent cache flushing.
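The individual capability checks described above can be combined into a single yes/no answer. The sketch below shows one possible combination policy; the source presents the checks as alternatives that an embodiment may mix, so the specific rule here (a PSU-based path or a healthy, sufficiently large BBU, plus the reset-interception and signaling-line checks) is an assumption, as are the `PlatformReport` fields and the energy figures in joules.

```python
from dataclasses import dataclass


@dataclass
class PlatformReport:
    psu_has_vwarn: bool               # PSU can assert a vwarn early-warning signal
    power_mgmt_monitors_energy: bool  # power management can detect vwarn and interrupt
    bbu_installed: bool = False
    bbu_healthy: bool = False         # e.g. derived from an impedance measurement
    bbu_energy_joules: float = 0.0
    can_intercept_resets: bool = True
    cpu_has_flush_signal_line: bool = True


def supports_persistent_cache_flush(report: PlatformReport, flush_energy_joules: float) -> bool:
    """Hypothetical policy: one viable energy path plus the required signaling."""
    psu_path = report.psu_has_vwarn and report.power_mgmt_monitors_energy
    bbu_path = (
        report.bbu_installed
        and report.bbu_healthy
        and report.bbu_energy_joules >= flush_energy_joules
    )
    signaling_ok = report.can_intercept_resets and report.cpu_has_flush_signal_line
    return (psu_path or bbu_path) and signaling_ok
```

Under this policy an installed BBU can compensate for a PSU that lacks vwarn support, matching the BBU example in the text, while a degraded or undersized BBU cannot.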

Based on this evaluation, system management module 118 returns a response to system firmware 120 indicating whether the platform can support persistent cache flushing (operation 810). If it can, the response may grant system firmware 120 permission to enable persistent cache flushing. Otherwise, system management module 118 denies system firmware 120 the ability to enable persistent cache flushing.

当接收到响应时，系统固件120确定系统100是否支持持久性高速缓存刷新(操作812)。When the response is received, system firmware 120 determines whether system 100 supports persistent cache flushing (operation 812).

如果平台硬件不支持持久性高速缓存刷新，那么系统固件120继续引导序列而不向操作系统122通告对持久性高速缓存刷新的支持(操作822)。当持久性高速缓存刷新未被通告和启用时，操作系统122可以阻止应用尝试将处理器高速缓存视为在系统100中是持久性的。If the platform hardware does not support persistent cache flushing, system firmware 120 continues the boot sequence without advertising support for persistent cache flushing to operating system 122 (operation 822). When persistent cache flushing is not advertised and enabled, the operating system 122 may prevent applications from attempting to treat the processor cache as persistent in the system 100 .

如果支持持久性高速缓存刷新，那么系统固件120和/或系统管理模块118配置系统组件以支持持久性高速缓存刷新操作(操作814)。例如，系统固件120可以建立GPIO引脚、初始化每PSU计时器、配置PSU、以及以其它方式配置系统硬件/软件来执行高速缓存刷新操作，如前所述。If persistent cache flushing is supported, system firmware 120 and/or system management module 118 configure the system components to support persistent cache flushing operations (operation 814). For example, system firmware 120 may establish GPIO pins, initialize per-PSU timers, configure PSUs, and otherwise configure system hardware/software to perform cache flush operations, as previously described.

System firmware 120 and/or system management module 118 then advertise support for persistent cache flushing to operating system 122 (operation 816). In some embodiments, system firmware 120 may provide operating system 122 with a list of supported features and/or configuration settings. The list may include an entry indicating that persistent cache flushing is supported and enabled. However, the manner in which support is advertised may vary depending on the particular implementation.
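The firmware side of process 800 (operations 804 through 816, with operation 822 as the fall-through) can be sketched as follows. The function name, the callables passed in, and the feature string `persistent_cache_flush` are hypothetical; the source says only that firmware may hand the operating system a list of supported features.

```python
from typing import Callable, List


def firmware_boot(
    persistent_mode_selected: bool,
    evaluate_hardware: Callable[[], bool],
    configure_components: Callable[[], None],
) -> List[str]:
    """Return the feature list that firmware hands to the operating system."""
    features: List[str] = []
    if not persistent_mode_selected:   # operation 804 -> 822: skip the checks
        return features
    if not evaluate_hardware():        # operations 806-812 -> 822: not supported
        return features
    configure_components()             # operation 814: GPIO pins, timers, PSUs
    features.append("persistent_cache_flush")  # operation 816: advertise
    return features
```

The ordering matters in this sketch: the feature is advertised only after the hardware check has passed and configuration has run, so the operating system never sees support that the platform cannot back up.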

Based on the advertisement, operating system 122 detects whether persistent cache mode is supported (operation 818). For example, operating system 122 may scan the list of supported features during the boot sequence to determine whether system firmware 120 or system management module 118 is advertising support for persistent cache flushing.

如果平台硬件启用并支持持久性高速缓存模式，那么操作系统122向一个或多个应用通告持久性高速缓存模式(操作820)。在一些实施例中，应用可以查询操作系统122以确定持久性高速缓存模式是否可用并且被支持。操作系统122可以提供响应以指示应用是否可以依赖于持久性高速缓存。应用可以取决于是否启用和支持持久性高速缓存来实现不同的逻辑。例如，如果被启用，那么数据库应用可以将读取和写入视为已提交，而无需实现复杂的基于软件的检查，这可以简化应用代码并提供更高效的读取和写入的执行。If the platform hardware enables and supports persistent cache mode, operating system 122 advertises persistent cache mode to one or more applications (operation 820). In some embodiments, the application may query the operating system 122 to determine whether persistent cache mode is available and supported. The operating system 122 may provide a response to indicate whether the application can rely on the persistent cache. Applications can implement different logic depending on whether persistent caching is enabled and supported. For example, if enabled, database applications can treat reads and writes as committed without having to implement complex software-based checks, which can simplify application code and provide more efficient execution of reads and writes.
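The application-level branching described above might look like the sketch below. It is illustrative only: `commit_record`, the in-memory `log` list (standing in for stores that land in the processor cache), and the `flush_to_media` callback (standing in for an explicit commit to persistent memory) are assumptions, not an interface defined by the source.

```python
from typing import Callable, List


def commit_record(
    log: List[bytes],
    record: bytes,
    persistent_cache_available: bool,
    flush_to_media: Callable[[bytes], None],
) -> None:
    """Store a record, committing explicitly only when the cache is not persistent."""
    log.append(record)  # the write itself; it may still sit in a volatile cache
    if not persistent_cache_available:
        # Without persistent cache mode the application must force the data out.
        flush_to_media(record)
```

When the operating system reports that persistent cache mode is available, the explicit commit is skipped entirely, which is where the simpler code path and the lower write latency come from.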

随着系统组件的演进，可以重复处理800以确定对持久性高速缓存模式的支持是否已经改变。硬件的改变(诸如BBU的安装或PSU升级)可能导致系统100通告对其先前不支持的持久性高速缓存刷新的支持。在其它情况下，如果诸如BBU之类的组件被移除或出现故障，那么可以移除通告。As system components evolve, process 800 may be repeated to determine whether support for persistent cache mode has changed. A change in hardware, such as the installation of a BBU or a PSU upgrade, may cause the system 100 to advertise support for persistent cache flushes that it did not previously support. In other cases, the advertisement may be removed if a component such as the BBU is removed or fails.

6.硬件实施方式6. Hardware implementation

According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices, such as one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or network processing units (NPUs), that are persistently programmed to perform the techniques, or may include one or more general-purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, FPGAs, or NPUs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques.

例如，图9是图示可以在其上实现本发明的实施例的计算机系统900的框图。计算机系统900包括总线902或用于传送信息的其它通信机制以及与总线902耦合用于处理信息的硬件处理器904。硬件处理器904可以是例如通用微处理器。For example, FIG. 9 is a block diagram illustrating a computer system 900 on which embodiments of the invention may be implemented. Computer system 900 includes a bus 902 or other communication mechanism for communicating information and a hardware processor 904 coupled with bus 902 for processing information. Hardware processor 904 may be, for example, a general-purpose microprocessor.

计算机系统900还包括耦合到总线902用于存储信息和要由处理器904执行的指令的主存储器906，诸如随机存取存储器(RAM)或其它动态存储设备。主存储器906也可以用于存储在要由处理器904执行的指令的执行期间的临时变量或其它中间信息。这种指令当被存储在处理器904可访问的非暂态存储介质中时，使计算机系统900成为被定制用于执行指令中指定的操作的专用机器。Computer system 900 also includes main memory 906 , such as random access memory (RAM) or other dynamic storage device, coupled to bus 902 for storing information and instructions to be executed by processor 904 . Main memory 906 may also be used to store temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Such instructions, when stored in a non-transitory storage medium accessible to processor 904, cause computer system 900 to become a special purpose machine customized to perform the operations specified in the instructions.

计算机系统900还包括耦合到总线902用于存储静态信息和处理器904的指令的只读存储器(ROM)908或其它静态存储设备。诸如磁盘或光盘之类的存储设备910被提供并且耦合到总线902，以用于存储信息和指令。Computer system 900 also includes a read-only memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904 . A storage device 910, such as a magnetic or optical disk, is provided and coupled to bus 902 for storing information and instructions.

计算机系统900可以经由总线902耦合到用于向计算机用户显示信息的显示器912，诸如阴极射线管(CRT)或发光二极管(LED)监视器。可以包括字母数字键和其它键的输入设备914耦合到总线902，用于将信息和命令选择传送到处理器904。另一种类型的用户输入设备是光标控件916，诸如鼠标、轨迹球、触摸屏或光标方向键，用于将方向信息和命令选择传送到处理器904并且用于控制显示器912上的光标移动。输入设备914典型地具有两个轴(第一轴(例如，x)和第二轴(例如，y))上的两个自由度，其允许设备在平面中指定位置。Computer system 900 may be coupled via bus 902 to a display 912 for displaying information to a computer user, such as a cathode ray tube (CRT) or light emitting diode (LED) monitor. Input device 914 , which may include alphanumeric and other keys, is coupled to bus 902 for communicating information and command selections to processor 904 . Another type of user input device is a cursor control 916 , such as a mouse, trackball, touch screen, or cursor direction keys, for communicating directional information and command selections to the processor 904 and for controlling cursor movement on the display 912 . Input device 914 typically has two degrees of freedom in two axes, a first axis (eg, x) and a second axis (eg, y), which allows the device to specify a position in a plane.

Computer system 900 may implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the computer system, causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term "storage medium" as used herein refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 910. Volatile media include dynamic memory, such as main memory 906. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, content-addressable memory (CAM), and ternary content-addressable memory (TCAM).

Storage media are distinct from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media include coaxial cables, copper wire, and fiber optics, including the wires that make up bus 902. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.

各种形式的介质可以涉及将一条或多条指令的一个或多个序列携带到处理器904以供执行。例如，指令最初可以在远程计算机的磁盘或固态驱动器上携带。远程计算机可以将指令加载到其动态存储器中并使用调制解调器通过诸如电话线、光纤线缆或同轴线缆之类的网络线发送指令。计算机系统900本地的调制解调器可以在网络线上接收数据并使用红外线发射器将数据转换为红外线信号。红外线检测器可以接收红外线信号中携带的数据，并且适当的电路系统可以将数据放置在总线902上。总线902将数据携带到主存储器906，处理器904从主存储器906中检索并执行指令。由主存储器906接收的指令可以可选地在由处理器904执行之前或之后存储在存储设备910上。Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 904 for execution. For example, the instructions may initially be carried on a disk or solid-state drive of the remote computer. The remote computer can load the instructions into its dynamic memory and use a modem to send the instructions over a network line such as a telephone line, fiber optic cable, or coaxial cable. A modem local to computer system 900 can receive data on the network line and use an infrared transmitter to convert the data into an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 902. Bus 902 carries the data to main memory 906, from which processor 904 retrieves and executes the instructions. Instructions received by main memory 906 may optionally be stored on storage device 910 before or after execution by processor 904 .

计算机系统900还包括耦合到总线902的通信接口918。通信接口918提供耦合到网络链路920的双向数据通信，该网络链路920连接到本地网络922。例如，通信接口918可以是综合业务数字网络(ISDN)卡、线缆调制解调器、卫星调制解调器、或向对应类型的电话线提供数据通信连接的调制解调器。作为另一个示例，通信接口918可以是提供到兼容的局域网(LAN)的数据通信连接的LAN卡。也可以实现无线链路。在任何这种实现中，通信接口918发送和接收携带表示各种类型信息的数字数据流的电信号、电磁信号或光学信号。Computer system 900 also includes a communications interface 918 coupled to bus 902 . Communication interface 918 provides bidirectional data communications coupled to network link 920 , which connects to local network 922 . For example, communications interface 918 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem that provides a data communications connection to a corresponding type of telephone line. As another example, communications interface 918 may be a LAN card that provides a data communications connection to a compatible local area network (LAN). Wireless links can also be implemented. In any such implementation, communications interface 918 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

网络链路920通常通过一个或多个网络向其它数据设备提供数据通信。例如，网络链路920可以通过本地网络922提供到主计算机924或到由互联网服务提供商(ISP)926操作的数据装备的连接。ISP 926进而通过现在通常称为“互联网”928的全球分组数据通信网络提供数据通信服务。本地网络922和互联网928两者都使用携带数字数据流的电信号、电磁信号或光学信号。通过各种网络的信号以及在网络链路920上并且通过通信接口918的信号是传输介质的示例形式，这些信号将数字数据携带到计算机系统900或携带来自计算机系统900的数字数据。Network link 920 typically provides data communications to other data devices through one or more networks. For example, network link 920 may provide a connection to a host computer 924 through a local network 922 or to data equipment operated by an Internet service provider (ISP) 926. The ISP 926 in turn provides data communications services over the global packet data communications network now commonly referred to as the "Internet" 928 . Both local network 922 and Internet 928 use electrical, electromagnetic, or optical signals that carry digital data streams. Signals through the various networks and signals on network link 920 and through communication interface 918 are example forms of transmission media that carry digital data to and from computer system 900 .

计算机系统900可以通过(一个或多个)网络、网络链路920和通信接口918发送消息和接收数据，包括程序代码。在互联网示例中，服务器930可以通过互联网928、ISP 926、本地网络922和通信接口918传输对于应用程序的所请求代码。Computer system 900 may send messages and receive data, including program code, over network(s), network link 920, and communication interface 918. In the Internet example, server 930 may transmit the requested code for the application over the Internet 928, ISP 926, local network 922, and communication interface 918.

所接收的代码可以在其被接收时由处理器904执行，和/或存储在存储设备910或其它非易失性存储装置中以供以后执行。The received code may be executed by processor 904 as it is received, and/or stored in storage device 910 or other non-volatile storage for later execution.

7. Miscellaneous; extensions

实施例针对具有一个或多个设备的系统，该一个或多个设备包括硬件处理器并且被配置为执行本文描述的和/或以下权利要求中任一项所述的任何操作。Embodiments are directed to a system having one or more devices including a hardware processor and configured to perform any of the operations described herein and/or as recited in any of the following claims.

在实施例中，非暂态计算机可读存储介质包括指令，该指令当由一个或多个硬件处理器执行时，使得执行本文描述的和/或权利要求中任一项所述的任何操作。In embodiments, a non-transitory computer-readable storage medium includes instructions that, when executed by one or more hardware processors, cause any of the operations described herein and/or recited in any of the claims to be performed.

Any combination of the features and functionality described herein may be used in accordance with one or more embodiments. In the foregoing specification, embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what the applicants intend to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims

1. A method, comprising:

identifying a request to initiate a reset or power transition in a computing system;

responsive to identifying the request, determining whether a first mode for performing persistent cache flushes is enabled in the computing system;

in response to determining that the first mode is enabled, generating an interrupt that triggers a persistent cache flush operation; and

delaying the reset or power transition until the persistent cache flush operation has completed or a timeout has expired.

2. The method of claim 1, further comprising proxying, by system logic, a source of a power transition event, wherein the system logic is configured to (a) when operating in the first mode, generate the interrupt that triggers the persistent cache flush operation and delay the power transition event; and (b) when operating in a second mode, route a request to perform the power transition directly to a chipset of the computing system.

3. The method of claim 1, wherein the request to initiate the reset or power transition is detected based on an error in the computing system.

4. The method of claim 1, wherein the request to initiate the reset or power transition is detected based on a user interaction with hardware in the computing system.

5. The method of claim 1, wherein the request to initiate the reset or power transition is detected based on a user interaction with a baseboard management controller in the computing system.

6. The method of claim 1, further comprising: configuring a general purpose input/output pin of a hardware component within the computing system responsive to determining that the first mode is enabled, wherein the hardware component triggers the persistent cache flush operation when the interrupt is asserted on the general purpose input/output pin.

7. The method of claim 1, wherein the persistent cache flush operation transfers data in a volatile processor cache to persistent memory.

8. The method of claim 1, wherein resetting the computing system after the persistent cache flush operation has completed comprises writing a value to a register that triggers a warm reset of the computing system.

9. The method of claim 1, wherein initiating the power transition of the computing system after the persistent cache flush operation has completed comprises writing a value to a register that triggers the power transition of the computing system.

10. The method of claim 1, wherein resetting the computing system after the persistent cache flush operation has completed or the timeout has expired comprises asserting a reset request signal to a chipset of the computing system.

11. The method of claim 1, wherein initiating the power transition of the computing system after the persistent cache flush operation has completed or the timeout has expired comprises asserting a power state transition request signal to a chipset of the computing system.

12. The method of claim 1, wherein, when the first mode is enabled, a reset or power transition request is routed to an interrupt signal that triggers an interrupt handler that flushes a processor cache.

13. The method of claim 1, wherein when the first mode is disabled, the reset request is routed to a reset pin in the system without flushing the processor cache.

14. A system, comprising:

a hardware processor;

a chipset coupled to the hardware processor, comprising (a) a first pin for receiving an interrupt signal that triggers a persistent cache flush operation, and (b) a second pin for receiving a reset request signal;

wherein the chipset pauses task execution on the hardware processor and invokes a persistent cache flush handler in response to detecting the interrupt signal on the first pin;

wherein the chipset pauses task execution on the hardware processor and initiates a reset, without invoking the persistent cache flush handler, in response to detecting the reset request signal on the second pin.

15. The system of claim 14, further comprising system logic that acts as a proxy for a source of a reset or power transition event and is electrically coupled to the first pin and the second pin, wherein the system logic (a) when operating in a first mode, generates the interrupt signal to invoke the persistent cache flush handler and delays the reset or power transition event; and (b) when operating in a second mode, routes a request to perform a reset or power transition directly to the second pin of the chipset.

16. The system of claim 14, wherein the system logic comprises a timer that is started when the interrupt signal is generated; wherein the system logic is configured to send a reset request signal to the first pin in response to detecting that the timer has expired, and to cancel the timer if the persistent cache flush operation completes before the timer has expired.

17. The system of claim 14, wherein the system logic intercepts requests to initiate a reset or power transition event from hardware components that interact with a user, from a debug header, and from a baseboard management controller.

18. The system of claim 14, wherein the first pin is a general purpose input/output pin; wherein the chipset is configured to detect the interrupt signal and invoke the persistent cache flush handler in response to detecting that persistent cache flushing is enabled on the system.

19. The system of claim 14, wherein the hardware processor comprises a set of volatile processor caches; wherein the persistent cache flush handler, when invoked, transfers data in the set of volatile processor caches to persistent memory.

20. A system comprising means for performing the operations of any one of claims 1-13.

21. A non-transitory computer-readable medium comprising instructions that, when executed by a hardware processor, cause performance of the operations of any one of claims 1-13.
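
By way of illustration only, the control flow recited in the method claims (mode check, flush interrupt, and a bounded delay before the reset) can be modeled in software as in the following minimal sketch. It is a simulation under stated assumptions, not the claimed hardware implementation: the names `Mode`, `handle_reset_request`, `flush_ticks`, and `timeout_ticks` are hypothetical, time is modeled as discrete ticks, and the event strings are illustrative labels rather than real signals or registers.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    """Hypothetical names for the two operating modes of the system logic."""
    PERSISTENT_FLUSH = "first mode"   # flush caches before the reset
    DIRECT = "second mode"            # route the request straight to the chipset


def handle_reset_request(mode: Mode,
                         flush_ticks: Optional[int],
                         timeout_ticks: int) -> List[str]:
    """Return the ordered events produced for one reset or power transition request.

    flush_ticks is the number of ticks the flush handler needs to finish,
    or None if the handler hangs and never finishes (an assumed failure case).
    """
    events: List[str] = []

    # Second mode: no flush, the request goes directly to the chipset.
    if mode is Mode.DIRECT:
        events.append("route request to chipset reset pin")
        return events

    # First mode: raise the interrupt and start the backup timer.
    events.append("generate flush interrupt")
    events.append("start timer")

    # The reset is delayed until the flush completes or the timer expires.
    for tick in range(1, timeout_ticks + 1):
        if flush_ticks is not None and flush_ticks <= tick:
            events.append("flush completed; cancel timer")
            events.append("handler writes reset register")
            return events

    # Backup path: the reset still happens within the bounded window.
    events.append("timer expired; assert reset request signal")
    return events
```

In this sketch, a handler that finishes within the window ends with the handler itself triggering the reset, while a slow or hung handler ends with the timer path asserting the reset request, so the reset completes within a bounded time in either case.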