CN111190632B - Method and device for realizing server BMC dual-activity - Google Patents

Method and device for realizing server BMC dual-activity

Info

Publication number
CN111190632B
CN111190632B CN201911388465.8A CN201911388465A CN111190632B CN 111190632 B CN111190632 B CN 111190632B CN 201911388465 A CN201911388465 A CN 201911388465A CN 111190632 B CN111190632 B CN 111190632B
Authority
CN
China
Prior art keywords
bmc
version
working
logical partition
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911388465.8A
Other languages
Chinese (zh)
Other versions
CN111190632A (en)
Inventor
邢科钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911388465.8A
Publication of CN111190632A
Application granted
Publication of CN111190632B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/654 Updates using techniques specially adapted for alterable solid state memories, e.g. for EEPROM or flash memories
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

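The invention provides a method for realizing server BMC dual-activity, comprising: setting a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version; and monitoring the BMC working state in real time so that, when an abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state. The invention further provides a device for realizing server BMC dual-activity. The invention solves the problem that a BMC chip holding a single version causes the server to stop working once the BMC fails, and improves the working and management efficiency of the server. A third logical partition packaged in the EEPROM of the BMC chip stores a BMC version image file.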

Description

A method and device for realizing server BMC dual-activity

Technical Field

The present invention relates to the field of server BMC design, and in particular to a method and device for realizing server BMC dual-activity.

Background

As server products become more capable, the BMC, which acts as the management center, plays an increasingly important role. It is in constant use for day-to-day server function management, remote system debugging, and system log analysis. Alongside this convenience, the BMC also carries a risk of damage: once the BMC fails, the server drops off the control platform and becomes unreachable.

During server debugging, an interruption during a manual firmware refresh or a sudden power loss can disable the BMC. The server's self-check then fails, the server cannot start, and the whole machine stops working. During server operation, a BMC failure leaves the server unreachable: the machine-room management system cannot obtain the server's running state, which interrupts service and breaks management.

Current BMC chips hold a single version. One chip can contain only one version and offers no restore or failure-activation function. Once the BMC fails, the entire server stops working, which hurts the server's working and management efficiency.

Summary of the Invention

To solve the problems in the prior art, the present invention proposes a method and device for realizing server BMC dual-activity. It addresses the problem that a BMC chip holding a single version causes the server to stop working once the BMC fails, and it improves the working and management efficiency of the server.

A first aspect of the present invention provides a method for realizing server BMC dual-activity, comprising:

setting a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

monitoring the BMC working state in real time; if a BMC working abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.

With reference to the first aspect, a first possible implementation of the first aspect further comprises: setting a third logical partition in the EEPROM chip of the BMC, the third logical partition storing a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and used to refresh the partition, reactivating the BMC version.

Further, the BMC working version automatically backs up its configuration information while working and saves it in a configuration file.

Further, after the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported.

With reference to the first aspect, in a second possible implementation of the first aspect, monitoring the BMC working state in real time specifically means that the watchdog in the BMC monitors, in real time, the value of the state register of the BMC working version.

A second aspect of the present invention provides a device for realizing server BMC dual-activity, comprising:

a first setting module, which sets a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

a monitoring module, which monitors the BMC working state in real time; if a BMC working abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.

With reference to the second aspect, a first possible implementation of the second aspect further comprises: a second setting module, which sets a third logical partition in the EEPROM chip of the BMC, the third logical partition storing a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and used to refresh the partition, reactivating the BMC version.

Further, the BMC working version automatically backs up its configuration information while working and saves it in a configuration file.

Further, after the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported.

With reference to the second aspect, in a second possible implementation of the second aspect, real-time monitoring of the BMC working state in the monitoring module specifically means that the watchdog in the BMC monitors, in real time, the value of the state register of the BMC working version.

The technical solution adopted by the present invention provides the following technical effects:

1. The present invention solves the problem that a BMC chip holding a single version causes the server to stop working once the BMC fails, and improves the working and management efficiency of the server.

2. The present invention packages a third logical partition in the EEPROM of the BMC chip, and the third logical partition stores a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the BMC chip are kept available at all times. Once the BMC works abnormally, the original BMC working version can be refreshed and replaced, which keeps server operation efficient and reliable.

3. The BMC working version automatically backs up its configuration information while working and saves it in a configuration file. After the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported, restoring the BMC's management connection to the server so that server management does not drop offline.

It should be understood that the general description above and the detailed description below are exemplary and explanatory only and do not limit the present invention.

Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Figure 1 is a schematic flow chart of the method of Embodiment 1 of the present invention;

Figure 2 is a schematic diagram of the BMC logical partitions in Embodiment 1 of the present invention;

Figure 3 is a schematic flow chart of the method of Embodiment 2 of the present invention;

Figure 4 is a schematic structural diagram of the device of Embodiment 3 of the present invention;

Figure 5 is a schematic structural diagram of the device of Embodiment 4 of the present invention.

Detailed Description

To explain the technical features of this solution clearly, the present invention is described in detail below through specific embodiments and with reference to the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. The present invention may also repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments and/or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention.

Embodiment 1

As shown in Figures 1 and 2, the present invention provides a method for realizing server BMC dual-activity, comprising:

S1: set a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

S2: monitor the BMC working state in real time and determine whether the BMC is working abnormally. If so, go to step S3; if not, repeat step S2;

S3: the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.

In step S1, the BMC version in the first logical partition is the working version and the BMC version in the second logical partition is the activated version. The two versions are identical, so either can replace the other at any time.

In step S2, monitoring the BMC working state in real time specifically means that the watchdog in the BMC monitors, in real time, the value of the state register of the BMC working version. The state of each BMC version is written to and stored in the state register, and the state of the BMC working version is determined by checking the register value. For example, 00 indicates the activated state, 01 indicates the failed state, and 10 indicates the working state.
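
The example encoding above can be illustrated with a small decoder. This is a minimal sketch, not the patent's implementation: the names `VersionState`, `decode_state`, and `is_failed` are hypothetical, and rejecting the value 11 is an assumption, since the text does not say what that value means.

```python
from enum import Enum


class VersionState(Enum):
    """Two-bit state codes, using the example values given in the text."""
    ACTIVATED = 0b00  # 00: activated state
    FAILED = 0b01     # 01: failed state
    WORKING = 0b10    # 10: working state


def decode_state(raw: int) -> VersionState:
    """Map a raw two-bit register value to a named state.

    Raises ValueError for 0b11, which the text leaves undefined
    (rejecting it is an assumption of this sketch).
    """
    return VersionState(raw & 0b11)


def is_failed(raw: int) -> bool:
    """Return True when the register value reports the failed state."""
    return decode_state(raw) is VersionState.FAILED
```

A watchdog loop could call `is_failed` on each read of the register and start the takeover described in step S3 when it returns True.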

Two pointers are set in the state register. The first pointer points to the state of the BMC working version in the first logical partition, and the second pointer points to the state of the BMC activated version in the second logical partition. The state of the BMC working version and the state of the BMC activated version are determined from the values stored in the state register.
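
One way to picture the two pointers is as two offsets into a single register, each selecting a two-bit state field. The sketch below is an illustration under that assumption only. The text does not specify the register layout, so the bit offsets `FIRST_PTR` and `SECOND_PTR` and the helper names are hypothetical.

```python
FIRST_PTR = 0    # hypothetical bit offset of the first partition's state field
SECOND_PTR = 2   # hypothetical bit offset of the second partition's state field
FIELD_MASK = 0b11

# Example state codes from the text: 00 activated, 01 failed, 10 working.
STATE_NAMES = {0b00: "activated", 0b01: "failed", 0b10: "working"}


def read_field(register: int, pointer: int) -> int:
    """Read the two-bit state field that the pointer selects."""
    return (register >> pointer) & FIELD_MASK


def write_field(register: int, pointer: int, value: int) -> int:
    """Return a copy of the register with the selected field replaced."""
    cleared = register & ~(FIELD_MASK << pointer)
    return cleared | ((value & FIELD_MASK) << pointer)


def describe(register: int) -> dict:
    """Name the state of both partitions from one register value."""
    return {
        "first_partition": STATE_NAMES.get(read_field(register, FIRST_PTR), "undefined"),
        "second_partition": STATE_NAMES.get(read_field(register, SECOND_PTR), "undefined"),
    }
```

For instance, `describe(write_field(0, FIRST_PTR, 0b10))` reports the first partition as working and the second as activated, which matches the initial arrangement in step S1.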

In step S3, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state. Specifically, when the watchdog detects that the BMC is working abnormally, it sends a command to the second logical partition. On receiving the command, the second logical partition replaces the BMC working version of the first logical partition and enters the working state. The command may be implemented as an IPMI (Intelligent Platform Management Interface) command.
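
The detect-then-command sequence of step S3 can be sketched as a single watchdog poll. This is a simplified model under stated assumptions: `Partition`, `Watchdog`, and the `takeover:` command string are hypothetical, and `send_command` is a plain callback standing in for whatever transport is used. It is not a real IPMI call.

```python
from dataclasses import dataclass
from typing import Callable

# Example state codes from the text.
ACTIVATED, FAILED, WORKING = 0b00, 0b01, 0b10


@dataclass
class Partition:
    name: str
    state: int


@dataclass
class Watchdog:
    working: Partition                   # partition currently running the BMC
    standby: Partition                   # partition holding the activated version
    send_command: Callable[[str], None]  # stand-in for the command transport

    def poll(self) -> bool:
        """Check the working version once and fail over if it is abnormal.

        Returns True when a takeover happened. Treating any state other
        than WORKING as abnormal is an assumption of this sketch.
        """
        if self.working.state == WORKING:
            return False
        self.send_command(f"takeover:{self.standby.name}")
        self.working.state = FAILED
        self.standby.state = WORKING
        # The former standby now runs the BMC; swap the roles.
        self.working, self.standby = self.standby, self.working
        return True
```

Calling `poll` repeatedly corresponds to looping on step S2, and the branch that sends the command corresponds to step S3.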

The present invention solves the problem that a BMC chip holding a single version causes the server to stop working once the BMC fails, and improves the working and management efficiency of the server.

Embodiment 2

As shown in Figure 3, the present invention provides a method for realizing server BMC dual-activity, comprising:

S1: set a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

S2: monitor the BMC working state in real time and determine whether the BMC is working abnormally. If so, go to step S3; if not, repeat step S2;

S3: the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state;

S4: set a third logical partition in the EEPROM chip of the BMC. The third logical partition stores a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed, reactivating the BMC version.

In step S4, the BMC version image file may be the stable BMC version or the latest BMC version. For efficient management, the stable version is generally preferred, so that the BMC working version or BMC activated version is more stable after the refresh. The third logical partition may occupy half the space of the second logical partition. The BMC version image file is a compressed package in zip format. The BMC's built-in decompression tool first decompresses the zip file, which yields a bin file by default, and the BMC write tool then writes the bin file to whichever partition is to be refreshed, the first logical partition or the second. The first and second logical partitions can be assigned codes so that they can be told apart.
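
The decompress-then-write sequence in step S4 can be sketched with Python's standard `zipfile` module. The sketch rests on several assumptions: the partition is modeled as an in-memory `bytearray`, the member name `bmc_stable.bin` is hypothetical, and padding unused space with 0xFF is an illustrative choice. The BMC's own decompression and write tools are not described in the text and are not modeled here.

```python
import io
import zipfile


def load_image(zip_bytes: bytes) -> bytes:
    """Decompress the zip archive and return the contents of its .bin member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        bin_names = [n for n in archive.namelist() if n.endswith(".bin")]
        if len(bin_names) != 1:
            raise ValueError("expected exactly one .bin file in the image archive")
        return archive.read(bin_names[0])


def refresh_partition(partition: bytearray, image: bytes) -> None:
    """Overwrite the partition with the image and pad the remainder with 0xFF."""
    if len(image) > len(partition):
        raise ValueError("image does not fit in the target partition")
    partition[:len(image)] = image
    partition[len(image):] = b"\xff" * (len(partition) - len(image))


def make_demo_archive(payload: bytes) -> bytes:
    """Build a small zip archive in memory, standing in for the stored image file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("bmc_stable.bin", payload)
    return buffer.getvalue()
```

Because the stored file is compressed, it can be smaller than the partitions it restores, which is consistent with the third partition being allowed half the space of the second.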

Similarly, suppose the BMC activated version in the second logical partition has taken over the working state and the BMC version in the first logical partition has been refreshed. If the BMC then works abnormally again, the BMC version in the first logical partition (originally the working version, now in the activated state) replaces the BMC version in the second logical partition (originally the activated version, now in the working state). At the same time, the BMC version in the second logical partition, which is in the failed state after this second abnormality, is refreshed.
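
The alternation described above, where the two partitions trade roles on each failure, can be traced with a short simulation. This is a sketch only: `Partition`, `handle_failure`, and the version labels are hypothetical, and the refresh is reduced to assigning a version string.

```python
from dataclasses import dataclass

ACTIVATED, FAILED, WORKING = "activated", "failed", "working"


@dataclass
class Partition:
    name: str
    state: str
    version: str


def handle_failure(working: Partition, standby: Partition, image_version: str):
    """Swap roles after the working version fails, then refresh the failed one.

    Returns the new (working, standby) pair.
    """
    working.state = FAILED
    standby.state = WORKING          # the activated version takes over
    working.version = image_version  # failed partition refreshed from the image
    working.state = ACTIVATED        # and becomes the new activated version
    return standby, working


first = Partition("first", WORKING, "v1")
second = Partition("second", ACTIVATED, "v1")

# First abnormality: the second partition takes over, the first is refreshed.
working, standby = handle_failure(first, second, "stable")
# Second abnormality: the first partition takes over, the second is refreshed.
working, standby = handle_failure(working, standby, "stable")
```

After the two calls the first partition is working again and the second is back in the activated state, both refreshed from the image.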

The BMC working version automatically backs up its configuration information while working and saves it in a configuration file (that is, the BMC configuration information can be saved as a config file through the BMC web interface). After the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported, restoring the BMC's management connection to the server so that server management does not drop offline.
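
The backup and import of configuration information can be sketched as a JSON round trip. The text does not specify the config file format, so JSON, the file name, and the setting names used below are all assumptions. Writing through a temporary file is a design choice of this sketch, made so that an interrupted backup cannot leave a half-written config file.

```python
import json
import os
import tempfile


def backup_config(settings: dict, path: str) -> None:
    """Save the settings, replacing the old file only after a complete write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, sort_keys=True)
    os.replace(tmp_path, path)


def import_config(path: str) -> dict:
    """Load the saved settings after the BMC version has been reactivated."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

In this model the working version would call `backup_config` whenever its settings change, and the refreshed version would call `import_config` once after reactivation to recover settings such as its management network address.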

The present invention packages a third logical partition in the EEPROM of the BMC chip, and the third logical partition stores a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the BMC chip are kept available at all times. Once the BMC works abnormally, the original BMC working version can be refreshed and replaced, which keeps server operation efficient and reliable.

Embodiment 3

As shown in Figure 4, the technical solution of the present invention also provides a device for realizing server BMC dual-activity, comprising:

a first setting module 101, which sets a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

a monitoring module 102, which monitors the BMC working state in real time; if a BMC working abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.

Real-time monitoring of the BMC working state in the monitoring module 102 specifically means that the watchdog in the BMC monitors, in real time, the value of the state register of the BMC working version.

The present invention solves the problem that a BMC chip holding a single version causes the server to stop working once the BMC fails, and improves the working and management efficiency of the server.

Embodiment 4

As shown in Figure 5, the technical solution of the present invention also provides a device for realizing server BMC dual-activity, comprising:

a first setting module 101, which sets a first logical partition and a second logical partition in the EEPROM chip of the BMC, where the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version;

a monitoring module 102, which monitors the BMC working state in real time; if a BMC working abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state;

a second setting module 103, which sets a third logical partition in the EEPROM chip of the BMC, the third logical partition storing a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed, reactivating the BMC version.

The present invention packages a third logical partition in the EEPROM of the BMC chip, and the third logical partition stores a BMC version image file. When an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the BMC chip are kept available at all times. Once the BMC works abnormally, the original BMC working version can be refreshed and replaced, which keeps server operation efficient and reliable.

In the present invention, the BMC working version automatically backs up its configuration information while working and saves it in a configuration file. After the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported, restoring the BMC's management connection to the server so that server management does not drop offline.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that modifications or variations that can be made on the basis of the technical solutions of the present invention without creative effort remain within its scope of protection.

Claims (2)

1. A method for realizing server BMC dual-activity, characterized by comprising the following steps:
setting a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version; the BMC working version automatically backs up configuration information while working and stores the configuration information in a configuration file; the configuration information of the BMC is saved into the configuration file through a BMC web interface;
monitoring the BMC working state in real time and, if a BMC working abnormality is detected, the second logical partition receiving a command, replacing the BMC working version in the first logical partition, and entering the working state; wherein monitoring the BMC working state in real time specifically comprises: a watchdog in the BMC monitoring, in real time, the value of a state register of the BMC working version; a first pointer and a second pointer are set in the state register, the first pointer pointing to the state of the BMC working version in the first logical partition and the second pointer pointing to the state of the BMC activated version in the second logical partition, and the state of the BMC working version and the state of the BMC activated version are determined according to the values stored in the state register;
setting a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed to reactivate the BMC version; the BMC version image file is the stable BMC version, so that the BMC working version or the BMC activated version is more stable after refreshing; wherein reactivating the BMC version by decompressing and refreshing the BMC version image file when an abnormal BMC working state is detected specifically comprises: when the BMC activated version of the second logical partition is in the working state and the BMC version in the first logical partition has been refreshed, once the BMC works abnormally again, the BMC version in the first logical partition replaces the BMC version of the second logical partition, and at the same time the BMC version in the second logical partition is refreshed; and after the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported, restoring the management connection of the BMC to the server and ensuring that server management is not disconnected.
2. A device for realizing server BMC dual-activity, characterized by comprising:
a first setting module, configured to set a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds the BMC working version and the second logical partition holds the BMC activated version; the BMC working version automatically backs up configuration information while working and stores the configuration information in a configuration file; the configuration information of the BMC is saved into the configuration file through a BMC web interface;
a monitoring module, configured to monitor the BMC working state in real time, wherein if a BMC working abnormality is detected, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state; wherein monitoring the BMC working state in real time specifically comprises: a watchdog in the BMC monitoring, in real time, the value of a state register of the BMC working version; a first pointer and a second pointer are set in the state register, the first pointer pointing to the state of the BMC working version in the first logical partition and the second pointer pointing to the state of the BMC activated version in the second logical partition, and the state of the BMC working version and the state of the BMC activated version are determined according to the values stored in the state register;
a second setting module, configured to set a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when an abnormal BMC working state is detected, the BMC version image file is decompressed and refreshed to reactivate the BMC version; the BMC version image file is the stable BMC version, so that the BMC working version or the BMC activated version is more stable after refreshing; wherein reactivating the BMC version by decompressing and refreshing the BMC version image file when an abnormal BMC working state is detected specifically comprises: when the BMC activated version of the second logical partition is in the working state and the BMC version in the first logical partition has been refreshed, once the BMC works abnormally again, the BMC version in the first logical partition replaces the BMC version of the second logical partition, and at the same time the BMC version in the second logical partition is refreshed; and after the BMC version image file has been decompressed and refreshed and the reactivation of the BMC version is complete, the configuration file is imported, restoring the management connection of the BMC to the server and ensuring that server management is not disconnected.
CN201911388465.8A 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity Active CN111190632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911388465.8A CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911388465.8A CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Publications (2)

Publication Number Publication Date
CN111190632A CN111190632A (en) 2020-05-22
CN111190632B CN111190632B (en) 2024-02-27

Family

ID=70705916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911388465.8A Active CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Country Status (1)

Country Link
CN (1) CN111190632B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1885270A (en) * 2005-06-24 2006-12-27 株式会社东芝 Information processing apparatus, storage medium, and data rescue method
CN105279042A (en) * 2014-07-15 2016-01-27 华耀(中国)科技有限公司 Redundant backup system and method for BSD system
CN110399152A (en) * 2019-07-22 2019-11-01 浙江鸿泉车联网有限公司 A kind of device systems double copies upgrade method and device

Also Published As

Publication number Publication date
CN111190632A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
US11687391B2 (en) Serializing machine check exceptions for predictive failure analysis
CN112559395B (en) Relay protection device and method based on dual-Soc storage system exception handling mechanism
US20090276665A1 (en) Apparatus, system, and method of efficiently utilizing hardware resources for a software test
CN114416435A (en) Microprocessor architecture and microprocessor fault detection method
CN103955188A (en) Control system and method supporting redundancy switching function
US10102045B2 (en) Control device, control method and program
CN101799776A (en) Fault processing method of multi-core processor, multi-core processor and communication device
US12007820B2 (en) Systems, devices, and methods for controller devices handling fault events
CN114116280B (en) Interactive BMC self-recovery method, system, terminal and storage medium
CN105045531A (en) Buffer synchronization mechanism between double storage controllers
CN103488721A (en) Database bisynchronous method and system for master and slave boards
CN105302768A (en) Slave CPU exception processing method and apparatus
CN115237644A (en) System fault handling method, central computing unit and vehicle
JP2010186242A (en) Computer system
CN111190632B (en) Method and device for realizing server BMC dual-activity
CN115617550A (en) Processing device, control unit, electronic device, method, and computer program
CN107133134A (en) A kind of efficient RAID card Auto-Test System and method
CN210721440U (en) PCIE card abnormity recovery device, PCIE card and PCIE expansion system
JP2008152552A (en) Computer system and failure information management method
CN110825547A (en) SMBUS-based PCIE card exception recovery device and method
CN118132386B (en) System crash information storage method, device and computer system
CN105391575A (en) Treasury control method and system
JP6835422B1 (en) Information processing device and information processing method
JP2002229811A (en) Control method of logical partition system
JPS6290068A (en) Standby system monitoring method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China