CN103370694B - Restarting data processing systems

Info

Publication number: CN103370694B
Application number: CN201280009293.9A
Authority: CN
Inventors: B.P.杜罗斯; J.S.霍利三世
Original assignee: Ab Initio Technology LLC
Current assignee: Ab Initio Technology LLC
Priority date: 2011-02-18
Filing date: 2012-02-16
Publication date: 2016-08-10
Anticipated expiration: 2032-02-16
Also published as: AU2012217636A1; CN103370694A; WO2012112763A1; AU2012217636B2; KR20140001988A; JP2017062838A; EP2676199B1; EP2676199A1; JP2014505958A; CA2827319A1; KR101835458B1; JP6377703B2; US20120216202A1; CA2827319C; US9116759B2

Abstract

Techniques are disclosed that include a computer-implemented method comprising: in response to a predetermined event (506), sending a message (604) through a process stage that includes at least first and second processes being executed as one or more tasks, the message indicating an abort of the execution of the one or more tasks; and initiating, by one or more of the processes upon receiving the message, the abort of the execution of the one or more tasks (606).

Description

Restarting data processing systems

Cross-Reference to Related Applications

This application claims priority to U.S. Application Serial No. 13/031,078, filed on February 18, 2011, entitled "Restarting Data Processing Systems," the entire contents of which are incorporated herein by reference.

Technical Field

This description relates to restarting data processing systems.

Background

The computational speed provided by single-processor computers has increased enormously over the past decades. However, many applications executed by such processors may require computational capacity that exceeds even the fastest single-processor computer. For example, in a transactional system such as an airline reservation system, multiple users may access computer resources concurrently, and these users typically expect low response times. A single-processor computer may be unable to keep up with such demand. Various architectures, such as parallel processing systems, have been developed to handle such applications and improve performance. In general, a parallel processing system uses multiple processors that may be located at a single site or distributed remotely. Because of their processing capabilities, such parallel processing systems have come to be relied upon for applications that process large volumes of data, which in some cases may include substantially continuous, near real-time processing. Such processing capabilities are expected to be robust and resistant to system failures, i.e., fault tolerant. These capabilities are useful for computer networks of all types and sizes, ranging from large-scale Internet-based data processing to private networks and communication systems (e.g., an enterprise "intranet").

Summary

In one aspect, in general, a computer-implemented method includes, in response to a predetermined event, sending a message through a process stage that includes at least first and second processes being executed as one or more tasks, the message indicating an abort of the execution of the one or more tasks, and initiating, by one or more of the processes upon receiving the message, the abort of the execution of the one or more tasks.

Aspects may include one or more of the following.

The computer-implemented method may include storing information related to an initial state of each of the first and second processes while the first and second processes are being initialized. Executing the processes may include executing at least one execution phase of each process and, upon completion of the corresponding execution phase, storing information representing an end state of that execution phase. The computer-implemented method may include resuming execution of one or more of the first and second processes from one of the saved end states without shutting down the processes.

The predetermined event may represent a loss of connection to an external device. The predetermined event may represent an error in an external device. Execution of the processes may be resumed when the connection to the external device has been re-established. Execution of the processes may be resumed when the error in the external device has been cleared. Execution of the processes may be resumed from the end state stored for an execution phase that preceded the occurrence of the predetermined event. If the predetermined event occurs substantially immediately after the processes are started, execution of one or more of the first and second processes may be resumed from the initial state.

Executing the processes may include performing one or more processing actions on a received data stream to produce output data. The computer-implemented method may include sending a checkpoint message through each of the first and second processes of the process stage, the checkpoint message including an instruction to store current information about the process, and, upon receipt of the checkpoint message at a process, suspending operation of the process and initiating storage of information related to the current execution state of the process to a storage area. The computer-implemented method may include overwriting a previously stored initial or end state with a new initial or end state.

Each of the first and second processes may be in communication with one or more data queues that receive and queue data for the process. The computer-implemented method may include generating the checkpoint message in response to a network-related event. The network-related event may represent a network shutdown. The computer-implemented method may include generating the checkpoint message periodically. The computer-implemented method may include generating the checkpoint message in response to one or more data values within, or derived from, incoming data records to be processed by the processes. The computer-implemented method may include resuming execution of one or more of the first and second processes from one of the saved end states based in part on information contained in a resume processing message. The computer-implemented method may include receiving one or more messages during a first execution phase of the first process that substantially immediately follows initialization of the first process, and resuming execution of the first process from the saved initial state without shutting down and restarting the first process.
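The three checkpoint triggers named above (a network-related event, a periodic schedule, and a data value within an incoming record) can be sketched as a single decision function. This is an illustration only: the function name `checkpoint_trigger`, the `end_of_batch` record field, and the reading of "periodically" as "every N records" are assumptions made for the example and are not taken from the described method.

```python
def checkpoint_trigger(record, records_since_checkpoint,
                       every_n=1000, network_shutdown_pending=False):
    """Return the reason a checkpoint message should be generated, or None.

    All parameter names are hypothetical; they only illustrate the three
    kinds of trigger mentioned in the text.
    """
    if network_shutdown_pending:
        return "network"      # network-related event, e.g. a planned shutdown
    if record.get("end_of_batch"):
        return "data"         # value carried in (or derived from) the record
    if records_since_checkpoint >= every_n:
        return "periodic"     # assumed count-based period
    return None


reasons = [
    checkpoint_trigger({"amount": 10}, 5),
    checkpoint_trigger({"amount": 10}, 1000),
    checkpoint_trigger({"amount": 10, "end_of_batch": True}, 5),
    checkpoint_trigger({"amount": 10}, 5, network_shutdown_pending=True),
]
```

A time-based period could replace the record count without changing the structure of the check.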

In another aspect, in general, a computer-readable storage medium storing a computer program includes instructions for causing a computing system to: in response to a predetermined event, send a message through a process stage that includes at least first and second processes executing one or more tasks, the message indicating an abort of the one or more tasks being executed; and initiate, by one or more of the processes upon receiving the message, the abort of the execution of the one or more tasks.

In another aspect, in general, a computing system includes a device or port configured to send, in response to a predetermined event, a message through a process stage that includes at least first and second processes executing one or more tasks, the message indicating an abort of the one or more tasks being executed; and at least one processor configured to initiate, by one or more of the processes upon receiving the message, the abort of the execution of the one or more tasks.

In another aspect, in general, a computing system includes means for sending, in response to a predetermined event, a message through a process stage that includes at least first and second processes executing one or more tasks, the message indicating an abort of the one or more tasks being executed; and means for initiating, by one or more of the processes upon receiving the message, the abort of the execution of the one or more tasks.

Aspects can include one or more of the following advantages.

Processes in a multi-process processing system can be executed in distinct execution phases. In the event of a system failure, terminating the processing system and restarting it from the most recently completed checkpoint may consume an excessive amount of processing time and resources. After a processing system has terminated its activity in response to an exception condition, the processing system may need to be manually reinitialized by an information technology specialist with extensive experience of such systems. This can result in significant system downtime. In some examples, a separate process may also be needed to detect the system failure and notify the specialist. Accordingly, to improve efficiency and reduce the consumption of processing resources, when a failed connection to a process is re-established, the processes within the processing system may be executed from their most recently recorded checkpoints rather than restarting the entire system. In one implementation, instead of terminating and restarting the entire system, individual processes in the system may be notified to suspend processing until the failed connection is re-established.

Other features and advantages of the invention will become apparent from the following description and from the claims.

Description of Drawings

Figure 1 is a block diagram of a multi-process data processing system.

Figures 2 and 3 illustrate example multi-process data processing systems.

Figure 4 is a flowchart illustrating an example checkpointing process.

Figures 5 and 6 are flowcharts of example recovery mechanisms.

Detailed Description

Referring to FIG. 1, a data processing system 100 provides multiple processes arranged in a pipelined fashion for processing data. Within the example system 100, data is received from a data source 102 (e.g., an application executing on a server 104 that functions as a Web server) and is passed to a multi-process data processing module 106 that executes on a computer system 108 or in a distributed manner (e.g., across two or more networked computer terminals). The data processing module 106 monitors, controls, and performs the data processing aspects of the system 100. To provide such processing, the data processing module 106 includes one or more queues 110, 112, 114 capable of storing data to be processed by one or more processes 116, 118. In this example, as shown, data received from the data source 102 is stored in an initial data queue 110 and is periodically provided to an initial process 116. The process 116 processes the data (e.g., transforms, filters, or validates its content) and provides the processed data to one or more downstream data queues 112, 112'. Subsequent processes 118, 118' may be provided with data from the queues 112, 112' and may perform other (or similar) processing before delivering results in turn to further downstream data queues 114, 114'. The illustrated layout of queues and processes in the data processing module 106 is one of many possible processing schemes that may be used. For example, the data processing module 106 may include additional processes (e.g., for parallel or serial execution) that may be located upstream of, downstream of, or independent of the processes shown. In some examples, data from the last set of queues (e.g., queues 114 and 114') may be output to a target application 120 (or multiple applications), such as a relational database management system (RDBMS).
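The queue-and-process layout of FIG. 1 can be sketched in a few lines. This is a minimal single-threaded illustration, assuming in-memory queues and trivial transformations; the helper `run_stage` and the variable names (which merely echo the reference numerals) are hypothetical and are not part of the described system.

```python
from collections import deque


def run_stage(in_queue, out_queues, transform):
    """Drain in_queue, apply transform, and fan results out to each downstream queue."""
    while in_queue:
        record = in_queue.popleft()
        result = transform(record)
        if result is None:          # a stage may filter a record out entirely
            continue
        for queue in out_queues:
            queue.append(result)


# Queues named after the reference numerals in FIG. 1 (illustrative only).
queue_110 = deque(["  alice ", "", "bob"])
queue_112, queue_112p = deque(), deque()
queue_114, queue_114p = deque(), deque()

# Process 116: clean and validate; empty records are dropped.
run_stage(queue_110, [queue_112, queue_112p], lambda r: r.strip() or None)
# Processes 118 and 118': further processing on each branch.
run_stage(queue_112, [queue_114], str.upper)
run_stage(queue_112p, [queue_114p], len)
```

In a real deployment each stage would run concurrently as its own process and the queues would be durable, but the data flow between stages has the same shape.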

The processes included in the data processing module 106 may communicate with external devices and/or other processing systems (e.g., a computer system 122). For example, a process may communicate with a Java Message Service (JMS) queue that supplies the process with messages from other systems. In some cases, the processes in the data processing module 106 may communicate with one or more databases (e.g., located in the external system 122). For example, the module 106 may perform updates to customer financial accounts in a bank database based on information received from corresponding customer sessions at one or more automated teller machines (ATMs).

By way of example, FIG. 2 illustrates a processing system 200 having a remotely executing processing module 202 (executed by an ATM 204) that provides data for processing at a central location. In the illustrated example, an initial process, such as a read ATM process 206, can receive customer account data (e.g., associated with a transaction) from the ATM and pass the data to a verify account process 208 for authenticating the account details. In this example, the verify account process 208 may check a PIN entered by the customer against a personal identification number (PIN) database 210. Once the customer's identity has been authenticated, further data records may be passed downstream to a check balance process 212, which may communicate with a second, different database 214, e.g., to check the balance of the identified customer account. After further transactions are completed, additional data may be sent downstream to an update balance process 216, which may communicate with a third database 218, e.g., to update the balance information associated with the customer account. A create answer process 220 may prepare an output summary of the transaction, which may be provided to an output display process 222 (e.g., for display to the user on the ATM 204). For system-level monitoring (e.g., system quality assurance) or other applications, the databases 210, 214, and 218 may communicate with a master data server 224. In some implementations, the databases may be executed, for example, by a separate computer system 226.

Processes in such a multi-process processing system may be executed in distinct execution phases. Execution of a process may include execution of one or more tasks within the process in the different execution phases. By segmenting execution into such phases, the different execution phases can be ended, for example, at logical endpoints or breakpoints in the data processing. Each execution phase may have one or more processing activities that achieve the goal of that execution phase. As an example, the verify account process 208 may be executed in different execution phases in one or more ways. For example, as a first execution phase, the verify account process 208 may initially receive personal identification number (PIN) information from the customer. The processing activities involved in receiving the PIN information from the customer may include, for example, displaying a prompt on the ATM display and running a routine to validate the data entry of the PIN information. In the next execution phase, the process 208 may establish a connection to the database 210 and use the PIN information as a key to identify the customer's record. Once the process 208 has completed its transaction with the customer's record, the connection to the database 210 may be terminated. In a final execution phase, the process 208 may generate a result based on the transaction described above. As such, each of these execution phases includes a distinct logical endpoint (or processing breakpoint) at which the verify account process 208 can be temporarily suspended and/or resumed.

In some situations, one or more events may occur that tend to affect the normal course of system operation. Such events may be exceptions or errors raised by hardware or software modules included in the processing system. For example, hardware exceptions or errors may include resets, interrupts, or other signals from one or more hardware units. Exceptions may also be generated by an arithmetic logic unit for numerical errors such as division by zero, overflow, instruction decode errors, and undefined instructions. In some cases, a device connected to a network may fail or go offline temporarily, causing other devices on the network to fail. Based on the occurrence of one or more such events, it may be necessary to temporarily halt operation of one or more of the databases so that corrective action can be taken (e.g., maintenance or switching over to a secondary system).

Other events that may require operations to be halted and corrective action to be taken include detection of a failure of one or more of the databases 210-224. Such failures may occur for a variety of reasons. For example, there may be an error in memory allocation or a conflict when writing to a memory space. There may also be an error in the underlying data operation, for example when a process attempts to withdraw funds from a depleted account. In addition to events involving temporary failures, there may be events triggered by operator intervention. In some implementations, an operator may correct the condition that caused the failure, or the system may correct the condition over time. Examples of such events include, without limitation, failure of one or more devices connected to a network, shutdown of one or more devices or software services for maintenance, failure or switchover of a device or software service, exhaustion of a resource such as storage space, load on a processing unit, and timeout of one or more software services.

To detect and address such events, a data processing system may use one or more techniques, often referred to as checkpointing techniques, to ensure minimal system downtime in the event of a failure or when the system is taken offline for maintenance or switchover. Checkpointing generally involves storing details of the current state of a process as a checkpoint record so that the process can use the stored information to restart from that state at a later time. For example, the verify account process 208 may save its current state in a checkpoint record upon completion of each execution phase (before starting the next execution phase or other processing).

A checkpoint record may include various types of information, such as process values, information about records that have been successfully processed, and other details relevant to the current execution phase of the process. For example, a checkpoint record may include information about the current position in a data queue (e.g., a data queue of FIG. 1) from which data is being processed, so that processing can resume from that queue position after operations have been halted. Along these lines, after recovering from a system failure, a process can restart from the stored intermediate checkpoint state rather than from its initial state.
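A checkpoint record that carries a queue position can be sketched as follows. This is an illustration under stated assumptions: the `CheckpointRecord` fields, the batch size, and the simulated failure position are all hypothetical, and the "processing" is simply a running total.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointRecord:
    phase: int                     # number of completed execution phases
    queue_position: int            # index of the next unprocessed record
    values: dict = field(default_factory=dict)   # process values to restore


def run(records, checkpoint, batch_size=2, fail_at_position=None):
    """Process records in batches, checkpointing at the end of each batch.

    Returns (last_checkpoint, finished). Work done after the last
    checkpoint is discarded when a failure is simulated.
    """
    while checkpoint.queue_position < len(records):
        start = checkpoint.queue_position
        end = min(start + batch_size, len(records))
        total = checkpoint.values.get("total", 0)
        for position in range(start, end):
            if position == fail_at_position:
                return checkpoint, False
            total += records[position]
        checkpoint = CheckpointRecord(checkpoint.phase + 1, end, {"total": total})
    return checkpoint, True


records = [10, 20, 30, 40, 50]
saved, finished = run(records, CheckpointRecord(0, 0), fail_at_position=3)
resumed, finished_after_resume = run(records, saved)
```

Here the failure occurs midway through the second batch, so the run resumes from queue position 2 with the total saved at the first checkpoint, and no record is counted twice.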

As an example, if the PIN database 210 fails, the verify account process 208 may raise an exception that causes the entire processing system to terminate. Upon restart, the processes (or portions of the processes) in the processing system can continue processing from their most recent checkpoint states. In this example, because the failure and restart occur at a point in time after the customer has provided his PIN information, the PIN information is restored to the process and does not need to be collected from the customer again. As such, there may be no need to prompt the customer for his PIN information a second time.
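The PIN example can be sketched as a process whose state is saved at the end of each execution phase, so that a restart after the first phase does not prompt the customer again. The class `PhasedProcess`, the phase functions, and the `fail_at` parameter used to simulate the database failure are assumptions made for illustration only.

```python
import copy


class PhasedProcess:
    """Runs phases in order and saves a copy of the state after each one."""

    def __init__(self, phases):
        self.phases = phases
        self.state = {}
        self.saved = [copy.deepcopy(self.state)]   # saved[0] is the initial state

    def run(self, start_phase=0, fail_at=None):
        # Restore the end state of the phase preceding start_phase.
        self.state = copy.deepcopy(self.saved[start_phase])
        del self.saved[start_phase + 1:]
        for index in range(start_phase, len(self.phases)):
            if index == fail_at:
                return index            # simulated failure before this phase runs
            self.phases[index](self.state)
            self.saved.append(copy.deepcopy(self.state))
        return len(self.phases)


prompts = []


def collect_pin(state):
    prompts.append("Enter PIN")         # stands in for the ATM prompt
    state["pin"] = "1234"


def look_up_record(state):
    state["record"] = {"pin": state["pin"], "account": "A-1"}


def produce_result(state):
    state["verified"] = state["record"]["pin"] == state["pin"]


process = PhasedProcess([collect_pin, look_up_record, produce_result])
stopped_at = process.run(fail_at=1)               # database unavailable in phase 1
completed = process.run(start_phase=stopped_at)   # resume from the saved end state
```

Because the second call starts from the end state of the first phase, `collect_pin` runs only once.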

In the event of a system failure, terminating the processing system and restarting it from the most recently completed checkpoint may consume an excessive amount of processing time and resources. After a processing system has terminated its activity in response to an exception condition, the processing system may need to be manually reinitialized by an information technology specialist with extensive experience of such systems. This can result in significant system downtime. In some examples, a separate process may need to be designed to detect the system failure and notify the specialist. Examples of checkpointing systems are described in U.S. Patent No. 6,584,581, entitled "Continuous Flow Checkpointing Data Processing," U.S. Patent No. 5,819,021, entitled "Overpartitioning System and Method for Increasing Checkpoints in Component-Based Parallel Applications," and U.S. Patent No. 5,712,971, entitled "Methods and Systems for Reconstructing the State of a Computation," the contents of each of which are incorporated herein in their entirety.

To improve efficiency and reduce the consumption of processing resources, when a failed connection is re-established, a process is executed from its most recently recorded checkpoint rather than restarting the entire system. In one implementation, instead of terminating and restarting the entire system, individual processes in the system may be notified to suspend processing until the failed connection is re-established.

FIG. 3 shows a block diagram of a multi-process system 300 that includes a data source process 302, processes 304a-n, a data receiving process 306, and a fault tolerance manager 308 in communication with each of the other processes (processes 302, 304a-n). In some implementations, the fault tolerance manager 308 may execute as another process within the multi-process system 300. In some cases, the fault tolerance manager 308 may be an application running on a separate computer system (not shown) or implemented in a dedicated processor, such as the checkpoint processor described, for example, in U.S. Patent No. 6,584,581, entitled "Continuous Flow Checkpointing Data Processing," the contents of which are incorporated herein in their entirety.

One or more techniques may be implemented to establish communication between the processes 302-306 and the fault tolerance manager 308. For example, separate exception channels 310a-n may be used to convey information about exception conditions that may occur in the processes 302-306. The channels 310a-n may be part of a wired network system, a wireless network system, or a combination of wired and wireless network systems. The processes 302-306 may use the channels 310a-n to communicate error information about the processes 302-306 to the fault tolerance manager 308. For example, if an external device in communication with the process 304a fails, the process 304a may immediately raise an error flag and communicate the error to the fault tolerance manager 308 via the exception channel 310b.

In addition to the exception channels 310a-n, the fault tolerance manager 308 may send command messages (e.g., abort/suspend command messages and checkpoint command messages) to the processes 302-306 over corresponding communication channels 312a-e. The communication channels 312a-e are arranged to convey command messages from the fault tolerance manager 308 to each of the processes 302-306 in sequence. For example, a message from the fault tolerance manager 308 may first be conveyed to the data source process 302 and then passed in sequence through each of the processes 304a-n and the data receiving process 306 via the channels 312b-d. The data receiving process 306 may use the channel 312e to return the command message to the fault tolerance manager 308.

The process 304a may further communicate with an external database 318 (executing on a computer system 320). At times, the connection to the database 318 may fail, or the database 318 may be taken offline for maintenance. The failure may be a hardware failure of the computer system 320 that executes the database 318. In such a case, the process 304a may raise an error flag on the exception channel 310a to notify the fault tolerance manager 308 that the connection has been lost.

Upon receiving notification of the error, the fault tolerance manager 308 may generate an abort command message 322 and propagate it through the processes 302-306. In some implementations, the abort command message 322 notifies each of the processes 302-306 to suspend operation and abort any work in progress. The abort command message 322 may be a special message packet that causes the processes to abort their current processing.

The abort message 322 is typically conveyed first to the data source process 302 over the channel 312a, then through each of the processes 302-306 over the channels 312b-d, and finally back to the fault tolerance manager 308 over the channel 312e. Upon receiving the abort message 322, each of the processes 302-306 aborts its current activity with relatively little delay (if any) and flushes or discards any unfinished tasks or records that may have been processed since its most recent checkpoint state. After a process has aborted its activity, it may pass the abort message 322 to the next downstream process. In this manner, the abort message 322 propagates all the way to the receiving process 306 before returning to the fault tolerance manager 308. The fault tolerance manager 308 waits until it receives the abort message 322 from the receiving process 306, which ensures that all of the processes 302-306 have aborted their current processing tasks (e.g., are in a quiescent state).
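The sequential propagation of the abort message can be sketched with a chain of objects in which each process discards its uncommitted work and then forwards the message downstream, with the last process handing it back to the manager. The classes `Process` and `FaultToleranceManager`, the string `"ABORT"`, and the direct method calls that stand in for the channels 312a-e are assumptions made for illustration.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.pending = []        # work done since the most recent checkpoint
        self.running = True
        self.downstream = None

    def receive(self, message, trace):
        if message == "ABORT":
            self.running = False
            self.pending.clear()             # discard uncommitted work
        trace.append(self.name)
        self.downstream.receive(message, trace)   # forward only after aborting


class FaultToleranceManager:
    def __init__(self, chain):
        # Link each process to the next; the last one links back to the manager.
        for upstream, downstream in zip(chain, chain[1:]):
            upstream.downstream = downstream
        chain[-1].downstream = self
        self.head = chain[0]
        self.returned = []
        self.last_error_source = None

    def receive(self, message, trace):
        # The message has come back, so every process has handled it.
        self.returned.append(message)

    def report_error(self, source_name):
        self.last_error_source = source_name
        trace = []
        self.head.receive("ABORT", trace)
        return trace


chain = [Process("302"), Process("304a"), Process("304b"), Process("306")]
chain[1].pending.extend(["record-7", "record-8"])
manager = FaultToleranceManager(chain)
visited = manager.report_error("304a")
```

Because each process forwards the message only after it has stopped, the message returning to the manager is enough to conclude that the whole chain is quiescent.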

When the database 318 fails because of a hardware failure in the computer system 320, the processes 302-306 are instructed to abort their processing. In some implementations, after the system has fully aborted its processing, the process 302 may wait for a specifiable amount of time, reflecting the average amount of time needed to correct the failure, and then begin processing again from the most recently saved checkpoint state. In some implementations, the process 304a may periodically poll the database 318 to check its status (i.e., to check whether the database 318 is operational). In some examples, the computer system 320 may be configured to automatically notify the process 304a when the database 318 has been returned to an operational state. When the connection to the database 318 is re-established, the processing system 300 may begin processing again from the most recently saved checkpoint state.
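The periodic polling option can be sketched as a small loop that probes the external resource until it reports that it is operational. The function `wait_until_operational`, its parameters, and the injected `is_operational` probe are hypothetical; a real probe would attempt a connection to the database.

```python
import time


def wait_until_operational(is_operational, poll_interval=1.0,
                           max_polls=10, sleep=time.sleep):
    """Poll until the probe succeeds.

    Returns the number of polls that were needed, or None if the resource
    was still unavailable after max_polls attempts.
    """
    for attempt in range(1, max_polls + 1):
        if is_operational():
            return attempt
        sleep(poll_interval)
    return None


# Simulated probe: the database comes back on the third poll.
responses = iter([False, False, True])
waits = []
polls_needed = wait_until_operational(lambda: next(responses),
                                      poll_interval=0.5,
                                      sleep=waits.append)
```

Passing the `sleep` function in as a parameter keeps the sketch testable without real delays; once the function returns a poll count, processing would resume from the most recently saved checkpoint state.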

In this regard, process 304a notifies fault tolerance manager 308 that the connection has been re-established. Fault tolerance manager 308 determines the most recent successfully completed checkpoint state for each of processes 302-306 and sends a resume processing message (not shown) to each of processes 302-306. Like other command messages, the resume processing message propagates sequentially through each of processes 302-306 via communication channels 312a-e.

In an embodiment, the resume processing message specifies the checkpoint state from which processes 302-306 are to resume processing. Checkpointing involves storing multiple checkpoint states to respective storage areas. To store checkpoint state data associated with each of processes 302, 304a-n, 306, a storage area (e.g., memory) may be allocated to each process. Each process periodically suspends its current operation at the end of a distinct execution phase and stores its checkpoint data in the associated storage area. For example, data source process 302 may periodically suspend its current operation at the end of distinct execution phases in processing (e.g., logical breakpoints introduced into the data stream) and store checkpoint information in storage area 311. In this manner, as each of processes 302, 304a-n, 306 executes, the corresponding storage areas 311, 314a-n, 316 periodically save checkpoint data. The checkpoint data may include information about the current states of, and/or data associated with, processes 302-306, so that those states can be reconstructed at a later time. Storage areas 311-316 may be implemented with various types of storage technology, for example, non-volatile storage on magnetic media such as hard disk drives.
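A per-process storage area of this kind might look like the sketch below. The `CheckpointStore` class, the JSON file layout, and the `checkpoint_<n>.json` naming are assumptions made for illustration; the source does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path


class CheckpointStore:
    """Hypothetical per-process storage area holding numbered checkpoint states."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, number: int, state: dict) -> None:
        # Write under a temporary name, then rename, so an interrupted write
        # never leaves a partial checkpoint under the final name.
        final = self.directory / f"checkpoint_{number}.json"
        tmp = final.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(final)

    def load(self, number: int) -> dict:
        return json.loads((self.directory / f"checkpoint_{number}.json").read_text())

    def latest(self) -> int:
        # Highest checkpoint number that was fully written.
        return max(int(p.stem.split("_")[1])
                   for p in self.directory.glob("checkpoint_*.json"))


with tempfile.TemporaryDirectory() as d:
    store = CheckpointStore(d)
    store.save(0, {"records_read": 0})    # checkpoint state zero
    store.save(1, {"records_read": 500})  # end of the first execution phase
    latest = store.latest()
    restored = store.load(latest)
```

Keeping every numbered state, rather than overwriting the previous one, matches the case in which earlier checkpoint records are retained.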

Fault tolerance manager 308 manages checkpointing by generating checkpoint command messages (not shown) and passing them sequentially to each of processes 302-306 over communication channels 312a-e. The checkpoint command messaging system is described in more detail in co-pending US Patent Application No. 13/030,998, the contents of which are incorporated herein by reference in their entirety. A checkpoint command message passes through each of processes 302-306 so that each process can checkpoint its current state when it receives the message. As such, the checkpoint command message travels to data source process 302, then sequentially through each of processes 304a-n and data receiving process 306, before returning to fault tolerance manager 308. Checkpointing may be initiated automatically at regular intervals. For example, fault tolerance manager 308 may initiate checkpoint command messages at a predetermined periodic rate, e.g., every five minutes. The periodic rate may be set to a default value or adjusted by a user. In some examples, one or more external triggers may initiate operations for storing checkpoint information. In one instance, a network message may inform fault tolerance manager 308 of an impending network shutdown, thereby triggering a checkpoint operation. In some implementations, a checkpoint operation may be triggered in response to values within, or derived from, the data records being processed. For example, the processed data records may include timestamps or breakpoint values that can be treated as logical points at which checkpointing may occur.
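The three kinds of trigger mentioned above (periodic, external, and data-driven) can be combined in one decision function. The sketch below is hypothetical: the function name, the `breakpoint` record field, and the priority order among triggers are assumptions, and the 300-second default simply mirrors the five-minute example.

```python
from typing import Optional


def checkpoint_trigger(now: float,
                       last_checkpoint_at: float,
                       interval: float = 300.0,
                       shutdown_notice: bool = False,
                       record: Optional[dict] = None) -> Optional[str]:
    """Return the reason a checkpoint should start now, or None if it should not."""
    if shutdown_notice:
        # External trigger, e.g. notice of an impending network shutdown.
        return "external"
    if record is not None and record.get("breakpoint", False):
        # Data-driven trigger: the record marks a logical checkpoint position.
        return "data"
    if now - last_checkpoint_at >= interval:
        # Periodic trigger at the configured rate.
        return "periodic"
    return None
```

For instance, `checkpoint_trigger(300.0, 0.0)` returns `"periodic"`, while the same call at `now=100.0` returns `None`.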

In addition to storing checkpoint information while data is being processed by the system, information may be stored before any data is processed. In one implementation, an initial checkpoint operation may be triggered when multi-process system 300 is first initialized, e.g., during startup. Fault tolerance manager 308 may pass an initial checkpoint command message through each of processes 302-306. In the example shown in FIG. 3, the initial checkpoint message is first transmitted to data source process 302. Data source process 302 immediately checkpoints, e.g., stores data representing its initial state to the associated data storage space 311, and passes the initial checkpoint message to the next downstream process 304a. This initial checkpoint state is referred to as checkpoint state zero. Similarly, in serial fashion, each of processes 304-306 may correspondingly store its initial state and associated data values to the appropriate storage area as checkpoint state zero. In an example, the initial state and associated data values may include initial values of global variables, reference data information, and audit variables including initial values of counters.

After each of processes 302-306 has stored its initial state, the initial checkpoint command message returns to fault tolerance manager 308 via channel 312e. Based on the message returning to fault tolerance manager 308 after its round trip through processes 302, 304a-n, 306, the fault tolerance manager is alerted that processes 302-306 have completed checkpoint state zero. In some implementations, while downstream processes are saving their current states, the source and other upstream processes may continue to receive data and perform other functions without waiting for all processes to save their states.

Similarly, additional checkpointing may be performed for each distinct execution phase of processes 302-306. As such, along with storing data representing initial checkpoint information, fault tolerance manager 308 may initiate the storage of additional information, for example, information associated with subsequent checkpoint cycles (e.g., checkpoint states 1, 2, 3, ..., n). To initiate storage of subsequent checkpoint information, techniques such as propagating further checkpoint command messages through processes 302, 304a-n, 306 may be used. Upon receiving a checkpoint command message, process 304a may either complete any ongoing tasks or suspend any outstanding tasks. In some examples, process 304a may delete previously created checkpoint records stored in data store 314 and reclaim the storage space. Process 304 may then create a new checkpoint record for its current state and associated data. In some cases, earlier checkpoint records are stored persistently in memory and are not overwritten by new checkpoint records. Further examples of information stored in checkpoint records are provided in US Patent No. 6,584,581, the contents of which are incorporated herein in their entirety.

In some instances, fault tolerance manager 308 may initiate additional checkpointing while a prior checkpoint operation is still being executed. For example, while process 304n is processing an arbitrary checkpoint state (e.g., checkpoint state N corresponding to checkpoint command message N), fault tolerance manager 308 may begin a subsequent checkpoint state (e.g., checkpoint state N+1) by generating and transmitting a subsequent checkpoint command message N+1 to source process 302. Along these lines, it is possible that, while checkpoint command message N is still traveling through processes 302-306, a new subsequent checkpoint command message N+1 is generated and passed through processes 302-306. In this manner, fault tolerance manager 308 can cause more frequent checkpointing of process states without having to wait until previous checkpoint states are completed.
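The overlap between successive checkpoint command messages can be visualized with a small tick-based simulation. This is only an illustrative model: `run_overlapping_checkpoints` is a hypothetical name, and the assumptions that one message is injected per tick and that each message advances exactly one process per tick are simplifications not found in the source.

```python
from collections import deque


def run_overlapping_checkpoints(num_processes: int, num_messages: int):
    """Simulate checkpoint messages travelling through a chain of processes.

    Returns the order in which messages return to the manager and the largest
    number of messages that were in flight at the same time.
    """
    in_flight = deque()              # entries are [message_number, position]
    completed = [0] * num_processes  # highest checkpoint stored by each process
    returned = []
    max_in_flight = 0
    next_message = 1
    while len(returned) < num_messages:
        if next_message <= num_messages:
            # The manager injects message N+1 without waiting for message N.
            in_flight.append([next_message, 0])
            next_message += 1
        max_in_flight = max(max_in_flight, len(in_flight))
        for message in in_flight:
            number, position = message
            completed[position] = number  # this process stores checkpoint `number`
            message[1] += 1               # forward to the next process
        while in_flight and in_flight[0][1] == num_processes:
            returned.append(in_flight.popleft()[0])
    return returned, max_in_flight


returned, max_in_flight = run_overlapping_checkpoints(num_processes=3, num_messages=4)
```

With three processes and four messages, up to three checkpoint messages are in transit at once, and they return to the manager in the order they were issued.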

In some cases, a system failure may occur while one or more checkpoint command messages are in transit through processes 302, 304a-n, 306. For example, consider a case in which fault tolerance manager 308 has initiated checkpoint state N by generating checkpoint command message N. While checkpoint command message N is being processed by processes 302-306, the connection between one of the processes (e.g., process 304a) and an external system (e.g., database 312) may fail. Upon being alerted to the situation, fault tolerance manager 308 may respond by passing an abort command message 322 through processes 302-306. Abort command message 322 may reach a process (e.g., process 304n) that is still processing checkpoint state N (e.g., storing the checkpoint information associated with checkpoint N). Upon receiving the abort command, process 304n may take one or more actions. For example, process 304n may complete checkpoint state N and abort all further processing. In another case, process 304n may discard the results associated with the current and subsequent states since the previous checkpoint state N-1, and abort further processing. As a result, each of processes 302-306 may be in a different checkpoint state when system 300 reaches quiescence. For example, all processes upstream of process 304n (e.g., data receiving process 306) may have completed checkpoint state N, while all processes downstream of process 304n (e.g., process 304a and data source process 302) may have completed only checkpoint state N-1.

When system 300 is ready to resume processing, fault tolerance manager 308 transmits one or more resume processing messages to each process via communication channels 312a-e. The resume processing message indicates to the processes the earliest fully committed (or completed) checkpoint state (e.g., checkpoint state N-1) from which they are to execute. In an example, processes that may have already completed checkpoint state N may simply reproduce the results from checkpoint state N-1 to checkpoint state N. In this way, processes 302-306 can avoid duplicating their earlier efforts. In an example, replaying the results from checkpoint state N-1 to checkpoint state N involves reproducing the results of the earlier processing actions that may have occurred between the two checkpoint states.
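One way to read "the earliest fully committed checkpoint state" is as the minimum, over all processes, of the latest checkpoint each process completed. The sketch below works under that assumption; the function names, process labels, and checkpoint numbers are hypothetical.

```python
def earliest_committed(latest_completed: dict) -> int:
    """Checkpoint state that every process has completed (the minimum of the latest states)."""
    return min(latest_completed.values())


def processes_ahead(latest_completed: dict, resume_from: int) -> list:
    """Processes that had already completed a later state and only need to replay results."""
    return sorted(name for name, state in latest_completed.items() if state > resume_from)


# Hypothetical situation after an abort: two processes finished state 7, two only state 6.
latest_completed = {"process_a": 7, "process_b": 7, "process_c": 6, "process_d": 6}
resume_from = earliest_committed(latest_completed)
ahead = processes_ahead(latest_completed, resume_from)
```

Here every process resumes from state 6, and the two processes that had reached state 7 can reproduce their earlier results for the interval between the two states rather than recomputing them.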

In an example, a system failure may occur substantially immediately after startup. In such a case, many of processes 302-306 may have completed only checkpoint state zero. These processes 302-306 can resume processing from checkpoint state zero based on the initialization data and startup values stored in the corresponding checkpoint records.

FIG. 4 is a flowchart depicting an example execution of a process (e.g., process 302 of FIG. 3) within a multi-process system. At startup, the process immediately stores its initial state to data storage as checkpoint state zero (step 402). The process may then execute in distinct execution phases (e.g., execution phases 1, 2, ..., N-1). At the end of each execution phase, the process may save its end state to data storage as a checkpoint state. For example, after the first execution phase, the process may save the end state of that phase as checkpoint state 1 (step 404). Similarly, after subsequent execution phases, the process may save the end states of those phases as checkpoint states 2, ..., N-1, and N (steps 406-410).

FIG. 5 is a flowchart depicting example steps performed in storing and resuming from checkpoints while executing a process. For example, when the process is initialized, information relating to the initial state of the process is stored in an associated storage area (step 502). The process is then executed in distinct execution phases. As such, at the end of each execution phase, the process stores information representing the end state of that phase (step 504). When a predetermined event occurs, e.g., loss of connection to an external device (step 508), execution of the process is aborted (step 506). Meanwhile, the process checks whether the event that triggered the abort has cleared (e.g., the connection to the external device has been re-established). During this time, the process is not shut down but remains aborted until the event is deemed cleared. Execution of the process then resumes from the most recently saved initial or end state (step 510).
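The flow of FIG. 5 can be sketched for a single process as follows. The `CheckpointedWorker` class and its running-total state are hypothetical; a `ConnectionError` raised mid-phase stands in for the predetermined event, and the step numbers in the comments refer to the steps named in the text.

```python
class CheckpointedWorker:
    """Hypothetical process that keeps a running total and checkpoints after each phase."""

    def __init__(self):
        self.total = 0
        self.checkpoints = [self.total]  # step 502: store the initial state

    def run_phase(self, records, fail_at=None):
        for index, value in enumerate(records):
            if fail_at is not None and index == fail_at:
                # Stand-in for the predetermined event, e.g. a lost connection.
                raise ConnectionError("external device unavailable")
            self.total += value
        self.checkpoints.append(self.total)  # step 504: store the phase's end state

    def restore(self):
        # Steps 506 and 510: drop uncommitted work and continue from the last saved
        # state. The worker object itself is kept alive rather than recreated.
        self.total = self.checkpoints[-1]


worker = CheckpointedWorker()
worker.run_phase([1, 2, 3])                    # phase 1 completes, checkpoint = 6
try:
    worker.run_phase([10, 20, 30], fail_at=2)  # phase 2 is interrupted part-way
except ConnectionError:
    worker.restore()
total_after_restore = worker.total
worker.run_phase([10, 20, 30])                 # phase 2 is rerun from the checkpoint
```

After the interruption the partial contribution of the failed phase is discarded, so rerunning the phase does not double-count any record.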

FIG. 6 is a flowchart depicting example steps performed in the event that an external system fails or is taken offline for maintenance. For example, the external system may be a database (e.g., database 318 of FIG. 3) that communicates with processes in the processing system. When the external system is taken offline for maintenance or fails, an error flag may be raised, for example, by the process that communicates with the failed external system (step 602). In response to the error flag, an abort message (e.g., abort command message 322 of FIG. 3) may be generated and sent through the processes (step 604). One or more techniques may be employed for such message transmission. For example, different transmission paths and different types of messages, or combinations of messages, may be implemented. When a process receives the abort command message, the current activity of that process is aborted (step 606). One or more techniques may be implemented for aborting such processes. Aborting may include the execution of one or more functions; for example, a process may discard any transactions performed since the most recent checkpoint state. Further action is aborted until the failed connection to the external system is re-established (step 608). When the connection is re-established, a resume processing message is transmitted to each of the processes (step 610). The resume processing message indicates the checkpoint state from which the processes are to resume processing. As such, each of the processes retrieves the relevant information about the checkpoint state from its associated storage area (step 612).

The restarting of processes described above can be implemented using software for execution on a computer. For example, the software forms procedures in one or more computer programs that execute on one or more programmed or programmable computer systems (which may be of various architectures such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. The software may form one or more modules of a larger program, for example, one that provides other services related to the design and configuration of dataflow graphs. The nodes and elements of the graph can be implemented as data structures stored in a computer-readable medium, or as other organized data conforming to a data model stored in a data repository.

The software may be provided on a storage medium, such as a CD-ROM, readable by a general or special purpose programmable computer, or delivered (encoded in a propagated signal) over a communication medium of a network to the computer where it is executed. All of the functions may be performed on a special purpose computer, or using special-purpose hardware such as coprocessors. The software may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computers. Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described above may be order independent, and thus can be performed in an order different from that described.

It is to be understood that the foregoing description is intended to illustrate, and not to limit, the scope of the invention, which is defined by the scope of the appended claims. For example, many of the functional steps described above may be performed in a different order without substantially affecting overall processing. Other embodiments are within the scope of the following claims.

Claims

1. A computer-implemented method comprising:

sending a message to at least a first process and a second process configured to perform one or more tasks in response to a predetermined event, the message indicating an abort of execution of the one or more tasks, wherein the first process is configured to provide data to the second process; and

upon receipt of the message, initiating, by one or more of at least the first process and the second process, the abort of execution of the one or more tasks.

2. The computer-implemented method of claim 1, comprising:

storing information about an initial state of each of the first process and the second process while the first process and the second process are being initialized.

3. The computer-implemented method of claim 1, wherein execution of a process includes executing at least one execution phase of the process, and storing information representative of an end state of the execution phase upon completion of the corresponding execution phase.

4. The computer-implemented method of claim 3, comprising:

resuming execution of one or more of the first process and the second process from one of the saved end states without closing the one or more of the first process and the second process.

5. The computer-implemented method of claim 1, wherein the predetermined event represents a loss of connection to an external device.

6. The computer-implemented method of claim 1, wherein the predetermined event represents an error of an external device.

7. The computer-implemented method of claim 5, wherein processing is resumed when a connection to the external device has been re-established.

8. The computer-implemented method of claim 6, wherein execution of the first process or the second process is resumed when an error of the external device has been cleared.

9. The computer-implemented method of claim 3, wherein process execution is resumed from an end state stored for an execution phase prior to the occurrence of the predetermined event.

10. The computer-implemented method of claim 2, wherein if the predetermined event occurs after initialization of one or more of the first process and the second process and before completion of an execution phase, then execution of the one or more of the first process and the second process is resumed from the initial state.

11. The computer-implemented method of claim 1, wherein process execution includes performing one or more processing actions on a received data stream to generate output data.

12. The computer-implemented method of claim 1, comprising:

sending a checkpoint message through the first process and the second process, the checkpoint message including instructions for storing current information about the first process and the second process, and

suspending operation upon receipt of said checkpoint message at said first process or said second process and initiating storage of information related to a current execution state of said first process or said second process to a storage area.

13. The computer-implemented method of claim 12, comprising:

overwriting a previously stored initial or end state with a new initial or end state.

14. The computer-implemented method of claim 1, wherein each of the first process and the second process communicates with one or more data queues for receiving and queuing data for at least the first process and the second process.

15. The computer-implemented method of claim 12, further comprising:

generating the checkpoint message in response to a network-related event.

16. The computer-implemented method of claim 15, wherein the network-related event represents a network shutdown.

17. The computer-implemented method of claim 12, further comprising:

generating the checkpoint message periodically.

18. The computer-implemented method of claim 12, further comprising:

generating the checkpoint message in response to one or more data values within or derived from an incoming data record to be processed by at least the first process and the second process.

19. The computer-implemented method of claim 2, further comprising:

resuming execution of one or more of the first process and the second process from one of the saved end states based in part on information contained in a resume processing message.

20. The computer-implemented method of claim 1, further comprising:

receiving the message during a first execution phase of the first process after initialization of the first process and prior to completion of the first execution phase of the first process, and

resuming execution of the first process from the saved initial state without shutting down and restarting the first process.

21. The method of claim 1, wherein the second process is configured to process data received from the first process by at least one of transforming, filtering, or validating data content.

22. A computing system comprising:

a device or port configured to send a message to at least a first process and a second process performing one or more tasks in response to a predetermined event, the message indicating an abort of the one or more tasks being performed, wherein the first process is configured to provide data to the second process; and

at least one processor configured to, upon receipt of said message, initiate the abort of execution of said one or more tasks by one or more of at least said first process and said second process.

23. The computing system of claim 22, wherein the at least one processor is configured to store information related to an initial state of each of the first process and the second process when the first process and the second process are being initialized.

24. The computing system of claim 22, wherein execution of a process includes executing at least one execution phase of the process, and storing information representative of an end state of the execution phase upon completion of the corresponding execution phase.

25. The computing system of claim 22, wherein the at least one processor is configured to resume execution of one or more of the first process and the second process from one of the saved end states without closing the one or more of the first process and the second process.

26. The computing system of claim 22, wherein the predetermined event represents at least one of a loss of connection to an external device or an error of an external device.

27. The computing system of claim 22, wherein process execution is resumed from an end state stored for an execution phase prior to the occurrence of the predetermined event.

28. The computing system of claim 22, wherein the at least one processor is configured to send a checkpoint message through the first process and the second process, the checkpoint message including instructions for storing current information about the first process and the second process.

29. The computing system of claim 22, wherein each of the first process and the second process communicates with one or more data queues for receiving and queuing data for at least the first process and the second process.

30. The computing system of claim 22, wherein the second process is configured to process data received from the first process by at least one of transforming, filtering, or validating data content.

31. A computing system comprising:

means for sending a message to at least a first process and a second process executing one or more tasks in response to a predetermined event, the message indicating an abort of the executing one or more tasks, wherein the first process is configured to provide data to the second process; and

means for, upon receipt of said message, initiating the abort of execution of said one or more tasks by one or more of at least said first process and said second process.

32. The computing system of claim 31, comprising means for storing information related to an initial state of each of the first process and the second process when the first process and the second process are being initialized.

33. The computing system of claim 31, comprising means for resuming execution of one or more of the first process and the second process from one of the saved end states without shutting down the one or more of the first process and the second process.

34. The computing system of claim 31, wherein the predetermined event represents at least one of a loss of connection to an external device or an error of an external device.

35. The computing system of claim 31, wherein the second process is configured to process data received from the first process by at least one of transforming, filtering, or validating data content.

36. A computer-implemented method comprising:

sending an abort command message from a manager executing on the computer system in response to a predetermined event;

receiving the abort command message at a first process configured to execute one or more tasks;

suspending execution of one or more tasks of the first process;

sending the abort command message from said first process to a second process configured to perform one or more tasks;

suspending execution of one or more tasks of the second process; and

sending a returned abort command message to the manager.

37. The method of claim 36, wherein the first process is configured to provide data to the second process, and the second process is configured to process the data to complete the one or more tasks.

38. The method of claim 36, comprising sending an abort command message from the second process to a sequence of downstream processes, wherein the returned abort command message is sent from the last of the downstream processes to the manager.

39. The method of claim 36, comprising, after said manager receives said returned abort command message, sending a resume processing message from said manager, and receiving said resume processing message.

40. A computing system comprising:

one or more processors that execute instructions to implement

a manager configured to send an abort command message in response to a predetermined event; and

a series of processes configured to perform one or more tasks;

wherein the first process in the series of processes is configured to suspend execution of one or more tasks of the first process upon receiving an abort command message, and to send an abort command message to the second process in the series of processes, the second process being downstream of the first process;

wherein the second process is configured to suspend execution of one or more tasks of the second process upon receiving an abort command message; and

wherein the last process in the series of processes is configured to send a returned abort command message to the manager.

41. The system of claim 40, wherein the first process is configured to provide data to the second process, and the second process is configured to process the data to complete the one or more tasks.

42. The system of claim 40, wherein the manager is configured to send a resume processing message to the series of processes after receiving the returned abort command message.