CN100530106C - Realization Method of Multi-machine Fault-Tolerant System Kernel - Google Patents
- Publication number
- CN100530106C
- Authority
- CN
- China
- Prior art keywords
- data
- task
- synchronization
- carry out
- scheduler task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Hardware Redundancy (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a method for implementing the kernel of a multi-machine fault-tolerant system. A middleware layer is formed between the application program and the operating system and manages system resources and application programs as multiple tasks, so that all input and output of the application passes through the middleware. Synchronization and data exchange between the computing units, as well as output voting, are carried out in the middleware: the interface to the application is provided by creating and managing buffers, synchronization between computing units is achieved through multitask management and data exchange, and output voting is achieved by comparing output data. Because the fault-tolerant kernel is a software middleware inserted between the application and the operating system, no complex hardware circuitry is needed; multitask management and data exchange give a relatively high synchronization density; and output voting by data comparison requires no additional output module. Different fault-tolerant structures can be obtained by configuring the middleware.
Description
Technical Field
The invention relates to a method for implementing the kernel of a multi-machine fault-tolerant system in computer-based automatic control systems.
Background Art
In automatic control applications where personal or equipment safety is at stake, the system must be extremely reliable. It must be safe not only in normal operation but also fail-safe: when a fault occurs, the fault must drive the system to a safe state. Highly reliable, highly safe multi-machine fault-tolerant systems built on fault-tolerance techniques were developed to meet this need. In a two-out-of-three computer system, for example, three computers act as three computing units, and a result is accepted as correct and output only when at least two of them produce the same result.
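The two-out-of-three rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `vote_2oo3`, the use of `None` for a unit that delivered no result, and the choice to return `None` when no two units agree are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(results: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value reported by at least two units, or None if no majority.

    A unit that produced no result (faulted or timed out) is passed as None
    and does not count toward any majority. (Assumed convention.)
    """
    valid = [r for r in results if r is not None]
    if not valid:
        return None
    value, count = Counter(valid).most_common(1)[0]
    return value if count >= 2 else None


print(vote_2oo3([42, 42, 41]))    # one unit disagrees with the other two
print(vote_2oo3([42, None, 42]))  # one unit delivered nothing
print(vote_2oo3([1, 2, 3]))       # no two units agree: output is withheld
```

Returning `None` rather than an arbitrary value reflects the fail-safe requirement: when no majority exists, the caller can withhold the output.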
Synchronization is the core technique in multi-machine fault tolerance: only when the computing units are synchronized can they output results for the same state to be voted on. There are two approaches. Tight synchronization uses a hardware synchronization device to force the computing units to run in lockstep on a common beat; this requires additional synchronization hardware and is technically demanding to implement. Loose synchronization uses software to bring the computing units, each running on its own clock, into an approximately synchronized state. Most loose synchronization mechanisms combine timed synchronization with output synchronization: after the system starts, synchronization messages are sent periodically, and a device at the output waits for the results of the computing units and votes on them once they arrive. Chinese patent 00109094, for example, proposes a program that periodically sends synchronization frames, together with several independent output units (OCM) with firmware-resident programs connected in series on the field data line, so that the three computing units are synchronized and voted on. This approach is implemented mainly in software and spares the system complex hardware circuitry. However, because voting takes place only in the external output units and the three computing units exchange no data while the application runs, synchronization accuracy is reduced, and additional output units are required.
Summary of the Invention
The object of the invention is to overcome these shortcomings of the prior art by providing a method for implementing the kernel of a multi-machine fault-tolerant system in software. The method provides relatively high synchronization accuracy, so that system faults are detected and handled as early as possible. It is implemented entirely in software on a real-time multitasking system, using standard POSIX and operating system interfaces; it needs no additional hardware such as input or output units and is highly portable.
Basic principle of the invention: on top of a real-time multitasking operating system, a middleware layer is formed between the application program and the operating system. It manages system resources, application programs and so on as multiple tasks, so that all input and output of the application passes through the middleware. The interface between the middleware and the application is provided by buffer management, with the middleware creating and managing the buffers; the interface between the middleware and the operating system uses standard POSIX and operating system interfaces. Synchronization and data exchange between the operation units (also called computing units), together with output voting, are carried out in the middleware: synchronization through multitask management and data exchange, and output voting through comparison of output data.
Based on this principle, the technical solution adopted is as follows. In the method for implementing the kernel of a multi-machine fault-tolerant system, a software middleware is formed between the application program and the operating system and manages system resources and application programs as multiple tasks, namely an application task, a main scheduling task, a computing-unit status monitoring task, and a communication channel management task, where:
Application task: runs the application program. It is created by the main scheduling task and either terminates on its own or is destroyed by the main scheduling task;
Main scheduling task (the scheduling task of the master channel, also called the scheduling task of the master channel's computing unit): responsible for synchronizing the system's computing units and for data distribution, data comparison, and scheduling of computation;
Computing-unit status monitoring task: monitors the communication status of the local computing unit and its partner computing units, and determines the state of the system as a whole;
Communication channel management task: monitors and manages the communication channels, and receives and sends data;
System kernel procedure: at the start of each operation cycle a process synchronization is performed so that the computing units begin the cycle at approximately the same moment. The data input phase follows: each computing unit acquires its data from outside, and a first data synchronization (data synchronization 1) then makes the data held by the computing units consistent and lets the application start at approximately the same moment on every unit. Once the input data is synchronized, the application task is started to run the application program. When the application finishes, each unit distributes its result to its partner computing units, and a second data synchronization (data synchronization 2) ensures that every computing unit has the data needed for comparison and that result comparison starts at approximately the same moment. The results are then compared, and a third data synchronization (data synchronization 3) ensures that the comparison results, that is, the data about to be output, are consistent and that output takes place at approximately the same moment. Finally, one or several computing units may be selected to output simultaneously.
Process synchronization: at the start of an operation cycle, the scheduling task of the computing unit acting as master channel issues a process synchronization command and is then suspended to wait for the slave channels' replies. The scheduling task on each slave channel first suspends itself and waits for the master channel's process synchronization command; when the slave's communication channel management task receives that command, it replies immediately. When the master channel's communication channel management task has received process synchronization replies from all slave channels (or has timed out), it sends a process execution command to all slave channels and at the same time signals the local scheduling task to continue, so that the master channel's scheduling task resumes. When a slave channel receives the process execution command from the master channel, it signals its local scheduling task to continue, so that the slave channel's scheduling task resumes. The computing units therefore differ by only one transmission delay, and the three computing units can be regarded as starting at approximately the same moment.
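The handshake described above can be sketched in Python with threads standing in for computing units and `queue.Queue` objects standing in for communication channels. This is a simplified illustration, not the patented implementation: the message names `PROC_SYNC`, `PROC_SYNC_ACK` and `PROC_EXEC`, the function names, and the 0.5 s timeout are all assumptions, and the separate scheduling and communication channel management tasks of each unit are collapsed into one thread for brevity.

```python
import queue
import threading
import time

# Hypothetical message names; the patent does not specify a wire format.
SYNC, ACK, EXEC = "PROC_SYNC", "PROC_SYNC_ACK", "PROC_EXEC"


def master(to_slaves, from_slaves, started, timeout=0.5):
    """Master channel: send SYNC, collect ACKs (or time out), send EXEC, resume."""
    for q in to_slaves:
        q.put(SYNC)
    deadline = time.monotonic() + timeout
    acks = 0
    while acks < len(to_slaves):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                          # timeout: go on without missing slaves
        try:
            if from_slaves.get(timeout=remaining) == ACK:
                acks += 1
        except queue.Empty:
            break
    for q in to_slaves:
        q.put(EXEC)
    started["master"] = time.monotonic()   # master scheduling task resumes here
    return acks


def slave(name, from_master, to_master, started):
    """Slave channel: wait for SYNC, acknowledge, wait for EXEC, resume."""
    if from_master.get() == SYNC:
        to_master.put(ACK)
    if from_master.get() == EXEC:
        started[name] = time.monotonic()   # slave scheduling task resumes here


def run_process_sync(n_slaves=2):
    """Run one handshake with one master and n_slaves slave threads."""
    started = {}
    ack_queue = queue.Queue()
    slave_queues = [queue.Queue() for _ in range(n_slaves)]
    threads = [
        threading.Thread(target=slave, args=(f"slave{i}", q, ack_queue, started))
        for i, q in enumerate(slave_queues)
    ]
    for t in threads:
        t.start()
    acks = master(slave_queues, ack_queue, started)
    for t in threads:
        t.join()
    return acks, started


acks, started = run_process_sync()
print(acks, sorted(started))
```

In this sketch the spread between the recorded start times plays the role of the single transmission delay mentioned in the text.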
The master channel can be selected in various ways, for example by rotation or by preemption.
Data synchronization: the scheduling task distributes the data to be synchronized (input data, for example) to the partner channels and then sends a data synchronization command to indicate that its data has been sent and synchronization is required. The scheduling task then suspends itself and waits until the other channels have finished sending their data, which they signal with their own data synchronization commands. When the communication management task has received the data synchronization commands from all partner computing units (or has timed out), which indicates that the data of all channels has been received, it signals the local scheduling task to continue, so that the local scheduling task resumes. Because every channel waits until all other channels have finished sending, the computing units can be regarded as executing the steps after data synchronization at approximately the same moment.
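A minimal Python sketch of this exchange follows, again using threads and queues as stand-ins for computing units and communication channels. The message kinds `DATA` and `DATA_SYNC`, the function names, the per-unit inbox layout, and the 0.5 s timeout are assumptions made for illustration; the patent does not prescribe them.

```python
import queue
import threading
import time

DATA, DATA_SYNC = "DATA", "DATA_SYNC"   # hypothetical message kinds


def data_sync(me, my_data, inboxes, timeout=0.5):
    """Send my_data to all partners, then wait for every partner's DATA_SYNC.

    Returns {unit: data} for this unit and every partner heard from in time.
    """
    partners = [u for u in inboxes if u != me]
    for p in partners:
        inboxes[p].put((me, DATA, my_data))
        inboxes[p].put((me, DATA_SYNC, None))   # "my data has been sent"
    received = {me: my_data}
    done = set()
    deadline = time.monotonic() + timeout
    while len(done) < len(partners):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                               # timeout: a partner is late or down
        try:
            sender, kind, payload = inboxes[me].get(timeout=remaining)
        except queue.Empty:
            break
        if kind == DATA:
            received[sender] = payload
        elif kind == DATA_SYNC:
            done.add(sender)
    return received


def run_data_sync(inputs, silent=()):
    """Run one data synchronization; units listed in `silent` never send."""
    inboxes = {u: queue.Queue() for u in inputs}
    views = {}

    def unit(u):
        views[u] = data_sync(u, inputs[u], inboxes)

    threads = [threading.Thread(target=unit, args=(u,))
               for u in inputs if u not in silent]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return views


views = run_data_sync({"A": 1, "B": 1, "C": 2})
print(views["A"] == views["B"] == views["C"])
```

Calling `run_data_sync({"A": 1, "B": 1, "C": 2}, silent=("C",))` exercises the timeout path: units A and B proceed after the timeout with only each other's data.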
The invention builds on a real-time multitasking operating system. Adding software middleware between the application and the operating system implements the fault-tolerant kernel without complex hardware circuitry. Buffer management provides the interface between the middleware and the application, and standard POSIX and operating system interfaces provide the interface between the middleware and the operating system, which makes the method highly portable. Multitask management and data exchange give a relatively high synchronization density, and output voting by comparison of output data requires no additional output module. Different fault-tolerant structures can be obtained by configuring the middleware.
Brief Description of the Drawings
Fig. 1 shows the operation cycle of the multi-machine fault-tolerant system kernel implementation method of the invention (two-out-of-three fault-tolerant system);
Fig. 2 shows the process synchronization procedure of the method (two-out-of-three fault-tolerant system);
Fig. 3 shows the data synchronization procedure of the method (two-out-of-three fault-tolerant system);
Fig. 4 is a program block diagram of the method, with the master channel selected by rotation;
Fig. 5 is a program block diagram of master channel selection by preemption in the method.
Detailed Description of the Embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment: a two-out-of-three fault-tolerant system is taken as the example. The method forms a software middleware between the application program and the operating system and manages system resources, application programs and so on as multiple tasks, namely an application task, a main scheduling task, a computing-unit status monitoring task, a communication channel management task, and others, where:
Application task: runs the application program. It is created by the main scheduling task and either terminates on its own or is destroyed by the main scheduling task.
Main scheduling task: responsible for synchronizing the computing units of the two-out-of-three system and for scheduling computation.
Computing-unit status monitoring task: monitors the communication status of the local computing unit and its partner computing units, and determines the state of the system as a whole.
Communication channel management task: monitors and manages the communication channels, and receives and sends data.
The operation cycle is shown in Fig. 1. At the start of each operation cycle a process synchronization is performed so that the three computing units begin the cycle at approximately the same moment. The data input phase follows. The three computing units each acquire data from outside, and a first data synchronization (data synchronization 1) then makes the data held by the three units consistent and lets the application start at approximately the same moment on every unit. Once the input data is synchronized, the application task is started to run the application program. When the application finishes, each unit distributes its result to its partner computing units, and a second data synchronization (data synchronization 2) ensures that every computing unit has the data needed for comparison and that result comparison starts at approximately the same moment. The results are then compared, and a third data synchronization (data synchronization 3) ensures that the comparison results, that is, the data about to be output, are consistent and that output takes place at approximately the same moment. Finally, one, two, or all three computing units may be selected to output simultaneously.
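The ordering of the phases in this cycle can be made concrete with a single-process Python simulation. It is a sketch under several assumptions that the patent does not state: inputs are reconciled in data synchronization 1 by taking the majority value, result comparison is a two-out-of-three majority, and the names `majority`, `run_cycle`, `read_input` and `application` are hypothetical. Real units would run concurrently and exchange data over communication channels.

```python
from collections import Counter


def majority(values):
    """Value held by at least two entries, else None (assumed 2-out-of-3 rule)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= 2 else None


def run_cycle(read_input, application, units=("A", "B", "C")):
    """Simulate one operation cycle for all units; return (output, phase log)."""
    log = ["process sync"]                        # units start the cycle together
    raw = {u: read_input(u) for u in units}       # data input, one read per unit
    log.append("data sync 1")
    agreed_input = majority(list(raw.values()))   # assumption: majority of inputs
    if agreed_input is None:
        return None, log                          # no consistent input: no output
    results = {u: application(u, agreed_input) for u in units}  # application task
    log.append("data sync 2")                     # every unit now holds all results
    verdicts = {u: majority(list(results.values())) for u in units}  # comparison
    log.append("data sync 3")                     # units confirm verdicts match
    if len(set(verdicts.values())) != 1:
        return None, log
    return verdicts[units[0]], log                # output by one or more units


def app(unit, x):
    """Toy application; unit C is deliberately faulty for the demonstration."""
    return x * 2 + (1 if unit == "C" else 0)


output, log = run_cycle(lambda u: 10, app)
print(output, log)
```

In the demonstration the faulty result from unit C is outvoted by units A and B, so the cycle still produces an output.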
The process synchronization procedure is shown in Fig. 2. At the start of an operation cycle, the scheduling task of the computing unit acting as master channel issues a process synchronization command and is then suspended to wait for the slave channels' replies. The scheduling task on each slave channel first suspends itself and waits for the master channel's process synchronization command. When the slave's communication channel management task receives that command, it replies immediately. When the master channel's communication channel management task has received process synchronization replies from all slave channels (or has timed out), it sends a process execution command to all slave channels and at the same time signals the local scheduling task to continue, so that the master channel's scheduling task resumes. When a slave channel receives the process execution command from the master channel, it signals its local scheduling task to continue, so that the slave channel's scheduling task resumes. The three computing units therefore differ by only one transmission delay and can be regarded as starting at approximately the same moment.
In this embodiment, the master channel is selected by rotation.
The data synchronization procedure is shown in Fig. 3. The scheduling task distributes the data to be synchronized (input data, for example) to the partner channels and then sends a data synchronization command to indicate that its data has been sent and synchronization is required. The scheduling task then suspends itself and waits until the other channels have finished sending their data, which they signal with their own data synchronization commands. When the communication management task has received the data synchronization commands from all partner computing units (or has timed out), which indicates that the data of all channels has been received, it signals the local scheduling task to continue, so that the local scheduling task resumes. Because every channel waits until all other channels have finished sending, the three computing units can be regarded as executing the steps after data synchronization at approximately the same moment.
Fig. 4 shows the program block diagram of the software middleware in this embodiment (master channel selected by rotation). The master channel rotates once per operation cycle: the three computing units take turns as master and manage the whole system jointly.
When a computing unit rejoins the system, the units may disagree about which of them is master, so a conflict avoidance policy is needed. In this embodiment, when a master conflict occurs, it is resolved by a preset rule under which the unit later in the order yields to the unit earlier in the order.
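The rotation and the yield rule described above can be sketched in a few lines of Python. The unit names, the modulo-based rotation, and the function names `master_for_cycle` and `resolve_conflict` are assumptions for illustration; the patent only states that mastership rotates once per cycle and that a later unit yields to an earlier one in a preset order.

```python
UNITS = ("A", "B", "C")   # hypothetical preset order of the three computing units


def master_for_cycle(cycle: int, units=UNITS) -> str:
    """Rotate mastership once per operation cycle (assumed round-robin)."""
    return units[cycle % len(units)]


def resolve_conflict(claimants, units=UNITS) -> str:
    """If several units claim mastership, the earliest in the preset order keeps it."""
    return min(claimants, key=units.index)


schedule = [master_for_cycle(c) for c in range(6)]
print(schedule)
print(resolve_conflict({"C", "B"}))
```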
Embodiment 2: essentially the same as the first embodiment, except that the master channel is selected by preemption.
Fig. 5 shows the program block diagram of master channel selection by preemption in this embodiment. In this mode, the computing-unit status monitoring task sends a status query command to the partner channels in every channel status detection period. When a reply arrives, it checks the communication status record table and sets the local unit as master if either of the following conditions is met.
Condition 1 (only one partner channel has a connection with this unit): the other partner channel is confirmed to be offline, meaning that this unit's own judgment agrees with the status reported by the connected partner channel; the replying partner channel's master status is non-master; and this unit's own master status is non-master. The unit then sets its own master status to master.
Condition 2 (both partner channels have a connection with this unit): the states of the two partner channels are consistent, meaning that this unit's own judgment agrees with the status reported by the partner channels; both partner channels' master status is non-master; and this unit's own master status is non-master. The unit then sets its own master status to master.
Because the three units are not synchronized in time in the strict sense, master determination across channels is subject to some error, and several units may decide at the same time that they are master. As soon as the status monitoring task detects that its own master status conflicts with a partner channel's master status, it clears its own master status and waits for the next determination, which avoids the conflict.
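The two conditions and the conflict rule can be expressed as a small decision function. The Python sketch below is one possible reading, not the patented logic: the `Reply` layout, the `sees_online` field used to check that a partner's view agrees with the local one, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """Status reply from a partner channel (hypothetical layout)."""
    is_master: bool
    sees_online: frozenset   # units this partner believes are online


def should_become_master(me, self_is_master, replies, all_units=("A", "B", "C")):
    """Apply Condition 1 or Condition 2 to the replies received this period."""
    if self_is_master:
        return False
    partners = [u for u in all_units if u != me]
    connected = [p for p in partners if p in replies]
    if any(replies[p].is_master for p in connected):
        return False                       # a partner already holds mastership
    if len(connected) == 1:
        # Condition 1: the connected partner must agree the other one is offline.
        missing = next(p for p in partners if p not in replies)
        return missing not in replies[connected[0]].sees_online
    if len(connected) == 2:
        # Condition 2: each partner must also see the other partner online.
        a, b = connected
        return b in replies[a].sees_online and a in replies[b].sees_online
    return False                           # no partner reachable: stay non-master


def clear_on_conflict(self_is_master, replies):
    """Conflict avoidance: drop our claim if any partner also claims mastership."""
    if self_is_master and any(r.is_master for r in replies.values()):
        return False
    return self_is_master


everyone = frozenset({"A", "B", "C"})
print(should_become_master("A", False, {"B": Reply(False, everyone),
                                        "C": Reply(False, everyone)}))
```

After `clear_on_conflict` returns `False`, the unit would simply run `should_become_master` again in the next detection period, matching the "wait for the next determination" step in the text.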
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB200610161298XA CN100530106C (en) | 2006-12-20 | 2006-12-20 | Realization Method of Multi-machine Fault-Tolerant System Kernel |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB200610161298XA CN100530106C (en) | 2006-12-20 | 2006-12-20 | Realization Method of Multi-machine Fault-Tolerant System Kernel |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101000561A (en) | 2007-07-18 |
| CN100530106C (en) | 2009-08-19 |
Family
ID=38692545
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB200610161298XA Active CN100530106C (en) | 2006-12-20 | 2006-12-20 | Realization Method of Multi-machine Fault-Tolerant System Kernel |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN100530106C (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101794242B (en) * | 2010-01-29 | 2012-07-18 | 西安交通大学 | Fault-tolerant computer system data comparing method serving operating system core layer |
| CN102298324B (en) * | 2011-06-21 | 2013-04-17 | 东华大学 | Cooperative intelligent accurate fault-tolerance controller and method thereof |
| CN108804109B (en) * | 2018-06-07 | 2021-11-05 | 北京四方继保自动化股份有限公司 | Industrial deployment and control method based on redundant arbitration of multiple functionally equivalent modules |
| US11671278B2 (en) * | 2021-09-02 | 2023-06-06 | Rivian Ip Holdings, Llc | Automotive embedded system timekeeping |
- 2006-12-20: CN application CNB200610161298XA filed; patent CN100530106C, status Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN101000561A (en) | 2007-07-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105930580B (en) | | Time synchronization and data exchange device and method for joint simulation of power system and information communication system |
| CN102591964B (en) | | Implementation method and device for data reading-writing splitting system |
| CN107634855A (en) | | A kind of double hot standby method of embedded system |
| CN105471622B (en) | | A kind of high availability method and system of the control node active-standby switch based on Galera |
| CN106790694A (en) | | The dispatching method of destination object in distributed system and distributed system |
| CN101383690B (en) | | A network synchronization method of fault-tolerant computer system based on socket |
| CN107483135A (en) | | A kind of high synchronous time triggered Ethernet device and method |
| CN102103532B (en) | | Safety redundancy computer system of train control vehicle-mounted equipment |
| CN115550384B (en) | | Cluster data synchronization method, device, equipment and computer-readable storage medium |
| CN109507866A (en) | | A kind of double-machine redundancy system and method based on network address drift technology |
| CN101237315A (en) | | A Synchronous Detection and Fault Isolation Method for Dual-Controller High-Availability Systems |
| CN100530106C (en) | | Realization Method of Multi-machine Fault-Tolerant System Kernel |
| CN101916068B (en) | | Computer control system based on 2-out-of-2 structure and implementation method thereof |
| CN201592724U (en) | | Time synchronous system of train control vehicular device |
| CN106502835A (en) | | A kind of disaster-tolerant backup method and device |
| CN102184157A (en) | | Information display device based on dual processor cooperation |
| CN102508745A (en) | | Triple-modular redundancy system based on two-stage loose synchronization and realization method thereof |
| CN105373563A (en) | | Database switching method and apparatus |
| EP3651027A1 (en) | | Synchronized high-assurance circuits |
| CN101183317A (en) | | Method for real-time interrupting synchronization with multiple progress states |
| CN108388228A (en) | | A kind of synchronous debugging method and apparatus for multichannel embedded control system |
| WO2013051067A1 (en) | | Computer and computer-control method |
| US8527741B2 (en) | | System for selectively synchronizing high-assurance software tasks on multiple processors at a software routine level |
| CN115277376B (en) | | Disaster recovery switching method, device, equipment and medium |
| CN117555688A (en) | | Data processing methods, systems, equipment and storage media based on active-active center |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | |
| | TR01 | Transfer of patent right | Effective date of registration: 20191008. Address after: Room 1401, floor 14, block a, building 1, Guorui building, No. 359, Jiangdong Middle Road, Jianye District, Nanjing City, Jiangsu Province, 210019. Patentee after: Nanjing Guorui Defense System Co., Ltd. Address before: 1313 box 03, box 210014, Nanjing City, Jiangsu Province. Patentee before: No. 14 Inst., China Electronic Science & Technology Group Corp. |