
CN112003895B - Cloud host evacuation method, apparatus, device and storage medium in OpenStack cloud platform - Google Patents

Info

Publication number: CN112003895B
Authority: CN (China)
Prior art keywords: nova, service, cloud host, cloud, controller
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010723780.8A
Other languages: Chinese (zh)
Other versions: CN112003895A (en)
Inventors: 宋文平, 亓开元, 苏广峰, 张百林
Current Assignee: Suzhou Metabrain Intelligent Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Application filed by: Suzhou Inspur Intelligent Technology Co Ltd
Priority: CN202010723780.8A
Publications: CN112003895A (application), CN112003895B (granted)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to a cloud host evacuation method, apparatus, device and storage medium in an OpenStack cloud platform. The method includes: using the Nova controller service to receive an evacuation request and determining the cloud hosts to be evacuated according to the evacuation request; using the Nova controller service and a device management service to obtain device binding information of the cloud hosts to be evacuated, wherein the device binding information includes PCI devices and/or PCIe devices; using the Nova controller service to screen out the device-bound cloud hosts from the cloud hosts to be evacuated according to the device binding information; using the Nova controller service, the device-bound cloud hosts and a resource management service to determine whether an available computing node exists in the cloud platform; and, if an available computing node exists, using the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for each device-bound cloud host. The solution automatically evacuates device-bound cloud hosts to other computing nodes, improves the robustness of the cloud platform, and saves labor costs.

Description

Cloud host evacuation method, apparatus, device and storage medium in OpenStack cloud platform

Technical field

The invention belongs to the field of cloud computing, and in particular relates to a cloud host evacuation method, apparatus, device and storage medium in an OpenStack cloud platform.

Background

A multi-architecture cloud platform built on OpenStack includes multiple computing nodes. Each computing node consists of a host machine and several cloud hosts (virtual machines) running on that host machine. When the host machine suffers a hardware failure or a power outage that takes the whole computing node out of service, the cloud hosts on that node need to be evacuated to other available computing nodes.

The Nova evacuate service currently provided by the OpenStack cloud platform can evacuate cloud hosts that are not bound to host machine devices, but it cannot evacuate cloud hosts that use passthrough PCI/PCIe devices. Operations staff have to evacuate those cloud hosts manually and manually clean up the binding records between cloud hosts and PCI/PCIe devices in the database. Manual evacuation is both slow and costly in labor.

Summary of the invention

In view of this, it is necessary to provide a cloud host evacuation method, apparatus, device and storage medium in an OpenStack cloud platform that address the above technical problems.

According to one aspect of the present invention, a cloud host evacuation method in an OpenStack cloud platform is provided, the method comprising the following steps:

using the Nova controller service to receive an evacuation request, and determining the cloud hosts to be evacuated according to the evacuation request;

using the Nova controller service and a device management service to obtain device binding information of the cloud hosts to be evacuated, wherein the device binding information includes PCI devices and/or PCIe devices;

using the Nova controller service to screen out the device-bound cloud hosts from the cloud hosts to be evacuated according to the device binding information;

using the Nova controller service, the device-bound cloud hosts and a resource management service to determine whether an available computing node exists in the cloud platform;

if an available computing node exists, using the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for each device-bound cloud host.

In one embodiment, the method further includes:

using the Nova evacuate service to evacuate the remaining cloud hosts to be evacuated.

In one embodiment, the step of using the Nova controller service and the device management service to obtain the device binding information of the cloud hosts to be evacuated includes:

the Nova controller service calling the device management service to obtain the device binding information of the cloud hosts to be evacuated;

the device management service returning the device binding information to the Nova controller service.

In one embodiment, the step of using the Nova controller service, the device-bound cloud hosts and the resource management service to determine whether an available computing node exists in the cloud platform includes:

the Nova controller service obtaining configuration information of the device-bound cloud host, wherein the configuration information includes CPU information, memory information and disk information;

determining the resources required by the new cloud host according to the configuration information and the device binding information of the device-bound cloud host;

the Nova controller service calling the resource management service to query whether the cloud platform contains an available computing node that satisfies the resources required by the new cloud host.

In one embodiment, the step of using the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for the device-bound cloud host includes:

using the Nova controller service to call the device management service to bind the new cloud host to the device, and using the device management service to send the status of the device binding event to the Nova controller service;

using the Nova controller service to call the Nova compute service to create the new cloud host, and using the Nova compute service to listen for the status of the device binding event;

using the Nova controller service to send the status of the device binding event to the Nova compute service;

if the Nova compute service observes that the device binding event succeeded, using the Nova compute service to generate the xml file of the new cloud host and start the new cloud host.

In one embodiment, the method further includes:

if no available computing node exists, using the Nova controller service to generate cloud host evacuation exception information.

In one embodiment, the PCI device and/or PCIe device includes: GPU, FPGA, NVMe and SSD.

According to another aspect of the present invention, a cloud host evacuation apparatus in an OpenStack cloud platform is provided, the apparatus comprising:

a request receiving module, configured to use the Nova controller service to receive an evacuation request and determine the cloud hosts to be evacuated according to the evacuation request;

an obtaining module, configured to use the Nova controller service and the device management service to obtain the device binding information of the cloud hosts to be evacuated, wherein the device binding information includes PCI devices and/or PCIe devices;

a screening module, configured to use the Nova controller service to screen out the device-bound cloud hosts from the cloud hosts to be evacuated according to the device binding information;

a computing node determination module, configured to use the Nova controller service, the device-bound cloud hosts and the resource management service to determine whether an available computing node exists in the cloud platform;

a new cloud host creation module, configured to, when an available computing node exists, use the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for each device-bound cloud host.

According to yet another aspect of the present invention, a computer device is also provided, comprising: at least one processor; and

a memory storing a computer program that can run on the processor, wherein the processor performs the foregoing cloud host evacuation method in an OpenStack cloud platform when executing the program.

According to a further aspect of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program, and the foregoing cloud host evacuation method in an OpenStack cloud platform is performed when the computer program is executed by a processor.

The above cloud host evacuation method, apparatus, device and storage medium in an OpenStack cloud platform use the Nova controller service, the device management service, the resource management service and the Nova compute service to automatically evacuate device-bound cloud hosts to other computing nodes. This improves the robustness of the multi-architecture cloud platform and releases the PCI/PCIe devices on the failed node, so that those devices can be used again once the server has recovered from the failure, which saves labor costs.

Description of drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other embodiments from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a cloud host evacuation method in an OpenStack cloud platform according to one embodiment of the present invention;

FIG. 2 is a schematic diagram of the evacuation process for a device-bound cloud host in another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a cloud host evacuation apparatus in an OpenStack cloud platform in another embodiment of the present invention;

FIG. 4 is an internal structure diagram of a computer device in another embodiment of the present invention.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities or parameters that share a name but are not identical. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments do not repeat this note.

In one embodiment, referring to FIG. 1, the present invention provides a cloud host evacuation method in an OpenStack cloud platform, which specifically includes the following steps:

S100, use the Nova controller service to receive an evacuation request, and determine the cloud hosts to be evacuated according to the evacuation request.

Here, the Nova controller service is the collection of control plug-ins in the Nova component of OpenStack; it runs on the OpenStack control node and handles operations such as creating, deleting, scheduling, powering on and powering off cloud hosts. An evacuation request is a request, issued when the host machine of a computing node fails, to evacuate all cloud hosts on the failed node to other computing nodes. The request identifies the failed computing node and all cloud hosts on that node.

S200, use the Nova controller service and the device management service to obtain device binding information of the cloud hosts to be evacuated, wherein the device binding information includes PCI devices and/or PCIe devices.

Here, the device management service is the PCI device management system in OpenStack, which provides interfaces for listing PCI/PCIe devices, reading their attributes, and binding and unbinding them. Device binding information is the information describing which host machine devices a cloud host is bound to. Preferably, the PCI devices and/or PCIe devices include: GPU, FPGA, NVMe and SSD.

S300, use the Nova controller service to screen out the device-bound cloud hosts from the cloud hosts to be evacuated according to the device binding information.

S400, use the Nova controller service, the device-bound cloud hosts and the resource management service to determine whether an available computing node exists in the cloud platform.

Here, the resource management service is the OpenStack service that handles the use, scheduling and management of resources such as CPU, memory, disk and PCI/PCIe devices. An available computing node is a computing node in the multi-architecture cloud platform on which the cloud host can be created and which can provide the corresponding PCI devices and/or PCIe devices.

S500, if an available computing node exists, use the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for each device-bound cloud host. Here, Nova compute is the computing node service in OpenStack that starts cloud hosts.

The above cloud host evacuation method in an OpenStack cloud platform uses the Nova controller service, the device management service, the resource management service and the Nova compute service to automatically evacuate device-bound cloud hosts to other computing nodes. This improves the robustness of the multi-architecture cloud platform and releases the PCI/PCIe devices on the failed node, so that those devices can be used again once the server has recovered from the failure, which saves labor costs.
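Steps S100 to S300 can be illustrated with a minimal Python sketch. It is not the patented implementation and does not use any real OpenStack API: `EvacuationRequest`, `get_device_bindings`, `split_by_binding` and the in-memory `binding_table` are hypothetical names chosen for illustration, with a plain dictionary standing in for the device management service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EvacuationRequest:
    """Hypothetical request shape: the failed node and every cloud host on it (S100)."""
    failed_node: str
    instance_ids: Tuple[str, ...]


def get_device_bindings(instance_ids, binding_table: Dict[str, List[str]]):
    """Stand-in for the device management service lookup (S200).

    Returns, per cloud host, the list of PCI/PCIe devices it is bound to.
    """
    return {vm: list(binding_table.get(vm, [])) for vm in instance_ids}


def split_by_binding(request: EvacuationRequest, binding_table):
    """Separate device-bound cloud hosts from the rest (S300)."""
    bindings = get_device_bindings(request.instance_ids, binding_table)
    bound = [vm for vm in request.instance_ids if bindings[vm]]
    unbound = [vm for vm in request.instance_ids if not bindings[vm]]
    return bound, unbound, bindings
```

The device-bound list would feed the steps that follow, while the unbound list corresponds to the cloud hosts that the stock Nova evacuate service can already handle.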

Preferably, the method of the present invention further includes the following step:

S600, use the Nova evacuate service to evacuate the remaining cloud hosts to be evacuated.

Preferably, the method further includes: S700, if no available computing node exists, use the Nova controller service to generate cloud host evacuation exception information.

By combining the method with the Nova evacuate service provided by the OpenStack cloud platform, which evacuates the cloud hosts that are not bound to devices, the above method evacuates all cloud hosts on the failed computing node. For cloud hosts that cannot be evacuated automatically, exception information is raised automatically and the evacuation is recorded in the logs, which makes later inspection and maintenance by operations staff easier.
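The overall control flow, including the S600 fallback and the S700 exception path, might look like the following sketch. All names are hypothetical: `find_node`, `rebuild_with_devices` and `plain_evacuate` are callbacks standing in for the resource management lookup, the device-aware rebuild and the stock Nova evacuate service, and the result strings are an assumption made for readability.

```python
import logging

log = logging.getLogger("evacuation_sketch")


def evacuate_all(bound, unbound, find_node, rebuild_with_devices, plain_evacuate):
    """Evacuate every cloud host of a failed node and report a per-host outcome."""
    results = {}
    for vm in bound:
        node = find_node(vm)  # S400: look for a node with enough resources
        if node is None:
            # S700: record exception information instead of failing silently
            log.error("evacuation failed for %s: no available compute node", vm)
            results[vm] = "failed: no available compute node"
            continue
        rebuild_with_devices(vm, node)  # S500: create the new cloud host
        results[vm] = f"rebuilt on {node}"
    for vm in unbound:
        plain_evacuate(vm)  # S600: hosts without devices use the normal path
        results[vm] = "evacuated"
    return results
```

One design choice in this sketch is that a device-bound cloud host without a suitable target does not block the others; its failure is logged and the loop continues.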

Referring to FIG. 2, in another embodiment, the above step S200 specifically includes the following sub-steps:

S210, the Nova controller service calls the device management service to obtain the device binding information of the cloud hosts to be evacuated.

S220, the device management service returns the device binding information to the Nova controller service.

In another embodiment, the above step S400 specifically includes the following sub-steps:

S410, the Nova controller service obtains the configuration information of the device-bound cloud host, wherein the configuration information includes CPU information, memory information and disk information.

S420, determine the resources required by the new cloud host according to the configuration information and the device binding information of the device-bound cloud host.

S430, the Nova controller service calls the resource management service to query whether the cloud platform contains an available computing node that satisfies the resources required by the new cloud host.
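Sub-steps S410 to S430 amount to combining the cloud host's configuration with its bound devices and checking each candidate node against the result. The sketch below shows one way to express that; `Flavor`, `ComputeNode`, `required_resources` and `find_available_node` are hypothetical names, and the first-fit selection is an assumption, since the source does not say how a node is chosen when several qualify.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Flavor:
    """Configuration information of the cloud host (S410)."""
    vcpus: int
    memory_mb: int
    disk_gb: int


@dataclass
class ComputeNode:
    """Free capacity of a candidate node, as a resource manager might report it."""
    name: str
    free_vcpus: int
    free_memory_mb: int
    free_disk_gb: int
    free_devices: Counter  # e.g. Counter({"GPU": 1})


def required_resources(flavor, bound_devices):
    """Combine configuration and device bindings into one requirement (S420)."""
    return {
        "vcpus": flavor.vcpus,
        "memory_mb": flavor.memory_mb,
        "disk_gb": flavor.disk_gb,
        "devices": Counter(bound_devices),
    }


def find_available_node(nodes, need):
    """Return the first node that satisfies every requirement, or None (S430)."""
    for node in nodes:
        fits_basic = (
            node.free_vcpus >= need["vcpus"]
            and node.free_memory_mb >= need["memory_mb"]
            and node.free_disk_gb >= need["disk_gb"]
        )
        # A Counter returns 0 for device types the node does not have.
        fits_devices = all(
            node.free_devices[kind] >= count
            for kind, count in need["devices"].items()
        )
        if fits_basic and fits_devices:
            return node
    return None
```

A node with plenty of CPU and memory but no free GPU is rejected for a GPU-bound cloud host, which is the case the stock scheduler path described in the background does not cover.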

In another embodiment, the above step S500 specifically includes the following sub-steps:

S510, use the Nova controller service to call the device management service to bind the new cloud host to the device, and use the device management service to send the status of the device binding event to the Nova controller service.

S520, use the Nova controller service to call the Nova compute service to create the new cloud host, and use the Nova compute service to listen for the status of the device binding event.

S530, use the Nova controller service to send the status of the device binding event to the Nova compute service.

S540, if the Nova compute service observes that the device binding event succeeded, use the Nova compute service to generate the xml file of the new cloud host and start the new cloud host.
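Because device binding is asynchronous, the compute side has to wait for the binding status before it generates the xml file and starts the cloud host. The following sketch models that handshake with Python's standard `threading.Event`. It is a simplification under stated assumptions: `BindingEvent`, `compute_create_instance` and `controller_evacuate` are hypothetical names, the controller's forwarding of the status (S530) is collapsed into a single `publish` call, the timeout is an added safeguard not mentioned in the source, and the xml string is a placeholder rather than a real libvirt domain definition.

```python
import threading


class BindingEvent:
    """Carries the device binding status from the controller to the compute side."""

    def __init__(self):
        self._done = threading.Event()
        self.status = None

    def publish(self, status):
        self.status = status
        self._done.set()

    def wait(self, timeout):
        # Event.wait returns False if the timeout expires before publish().
        if not self._done.wait(timeout):
            return "timeout"
        return self.status


def compute_create_instance(vm, event, timeout=5.0):
    """Compute side: prepare the host, then wait for the binding result (S520, S540)."""
    prepared = {"nics": "prepared", "disks": "prepared"}
    status = event.wait(timeout)
    if status != "success":
        return {"vm": vm, "state": "error", "reason": status}
    domain_xml = f"<domain><name>{vm}</name></domain>"  # placeholder only
    return {"vm": vm, "state": "active", "xml": domain_xml, **prepared}


def controller_evacuate(vm, bind_device):
    """Controller side: start the asynchronous binding (S510), then ask compute to create."""
    event = BindingEvent()
    worker = threading.Thread(
        target=lambda: event.publish(bind_device(vm)), daemon=True
    )
    worker.start()
    return compute_create_instance(vm, event)
```

Calling `controller_evacuate("vm-1", lambda vm: "success")` yields a result whose state is `active`, while a binding that reports anything else leaves the new cloud host unstarted.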

In another embodiment, to make the technical solution of the present invention easier to understand, take as an example a cloud computing platform with two computing nodes that both have PCI/PCIe devices, denoted computing node 1 and computing node 2. Two cloud hosts run on computing node 1: cloud host 1 is bound to the GPU of computing node 1, and cloud host 2 is bound to the SSD of computing node 1. If computing node 1 loses power while computing node 2 keeps running normally, the Nova controller receives the failure event for host machine 1 and starts to evacuate cloud host 1 and cloud host 2. The specific process is as follows:

Step 1, the Nova controller service calls the device management service to obtain the PCI/PCIe device information bound to cloud host 1 and cloud host 2. Based on the PCI/PCIe device information and the CPU, memory and disk information in the specifications of cloud host 1 and cloud host 2, it calls the resource management system, which selects computing node 2 as the available computing node;

Step 3, the Nova controller service calls the device management service to bind the cloud hosts to the PCI/PCIe devices. Because the PCI device management service binds PCI/PCIe devices asynchronously, the device management service sends a PCI/PCIe binding event to the Nova controller service;

Step 4, the Nova controller service calls the Nova compute service to create new cloud host 1 and new cloud host 2;

Step 5, the Nova compute service creates the network interface, disk and other information for new cloud host 1 and new cloud host 2, and at the same time listens for the status of the PCI/PCIe device binding event, waiting for the PCI/PCIe binding to complete;

Step 6, the Nova controller service sends the PCI/PCIe device binding event status to the Nova compute service;

Step 7, after Nova compute receives a successful PCI/PCIe device binding event, it generates the xml files of new cloud host 1 and new cloud host 2 and starts new cloud host 1 and new cloud host 2. Cloud host 1 and cloud host 2 on computing node 1 are thereby evacuated to computing node 2, where new cloud host 1 is bound to the GPU of computing node 2 and new cloud host 2 is bound to the SSD of computing node 2.

At this point the GPU and SSD of computing node 1 have both been released, so operations staff do not need to clean up the binding records between cloud host 1, cloud host 2 and the PCI/PCIe devices in the database. New cloud host 1 and new cloud host 2 can run the applications of cloud host 1 and cloud host 2 normally, which keeps cloud platform services running. Once computing node 1 has recovered from the failure, its GPU and SSD can be used again.
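The bookkeeping in this example, releasing the devices on computing node 1 and binding equivalent devices on computing node 2, can be traced with a small sketch. The binding table layout, keyed by `(node, device type)`, and the function `move_bindings` are hypothetical illustrations; the source does not describe how the device management service stores its records.

```python
def move_bindings(table, old_vm, new_vm, src, dst):
    """Release old_vm's devices on src and bind the same device types to new_vm on dst.

    table maps (node, device_type) to the owning cloud host, or None when free.
    """
    held = [
        dev for (node, dev), owner in table.items()
        if node == src and owner == old_vm
    ]
    # Verify every target device first so a failure leaves the table untouched.
    for dev in held:
        if table.get((dst, dev), "missing") is not None:
            raise RuntimeError(f"no free {dev} on {dst}")
    for dev in held:
        table[(dst, dev)] = new_vm
        table[(src, dev)] = None  # released for use after the node recovers
    return held


bindings = {
    ("node-1", "GPU"): "vm-1",
    ("node-1", "SSD"): "vm-2",
    ("node-2", "GPU"): None,
    ("node-2", "SSD"): None,
}
move_bindings(bindings, "vm-1", "new-vm-1", "node-1", "node-2")
move_bindings(bindings, "vm-2", "new-vm-2", "node-1", "node-2")
```

After both calls, the two entries for `node-1` are `None` and the entries for `node-2` point to `new-vm-1` and `new-vm-2`, matching the end state described in the example.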

In another embodiment, referring to FIG. 3, the present invention provides a cloud host evacuation apparatus 70 in an OpenStack cloud platform, the apparatus specifically including:

a request receiving module 71, configured to use the Nova controller service to receive an evacuation request and determine the cloud hosts to be evacuated according to the evacuation request;

an obtaining module 72, configured to use the Nova controller service and the device management service to obtain the device binding information of the cloud hosts to be evacuated, wherein the device binding information includes PCI devices and/or PCIe devices;

a screening module 73, configured to use the Nova controller service to screen out the device-bound cloud hosts from the cloud hosts to be evacuated according to the device binding information;

a computing node determination module 74, configured to use the Nova controller service, the device-bound cloud hosts and the resource management service to determine whether an available computing node exists in the cloud platform;

a new cloud host creation module 75, configured to, when an available computing node exists, use the Nova controller service, the device management service and the Nova compute service to create a new cloud host on the available computing node for each device-bound cloud host.

It should be noted that, for the specific limitations of the cloud host evacuation apparatus in the OpenStack cloud platform, reference may be made to the limitations of the cloud host evacuation method described above, which are not repeated here. Each module in the above apparatus can be implemented in whole or in part by software, hardware, or a combination of the two. The modules can be embedded in, or be independent of, the processor of the computer device in hardware form, or be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.

根据本发明的另一方面,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图请参照图4所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时实现以上所述的OpenStack云平台中云主机疏散方法。According to another aspect of the present invention, a computer device is provided, and the computer device may be a server. Please refer to FIG. 4 for an internal structure diagram of the computer device. The computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium, an internal memory. The nonvolatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, the above-described cloud host evacuation method in the OpenStack cloud platform is implemented.

According to yet another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the cloud host evacuation method in the OpenStack cloud platform described above.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described. However, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The above embodiments represent only several implementations of the present application. Their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A cloud host evacuation method in an OpenStack cloud platform, characterized by comprising the following steps:
receiving an evacuation request by using a Nova controller service, and determining the cloud hosts to be evacuated according to the evacuation request;
acquiring device binding information of the cloud hosts to be evacuated by using the Nova controller service and a device management service, wherein the device binding information comprises PCI (Peripheral Component Interconnect) devices and/or PCIe (Peripheral Component Interconnect Express) devices;
screening out, by using the Nova controller service and according to the device binding information, the cloud host with the bound device from the cloud hosts to be evacuated;
determining whether an available computing node exists in the cloud platform by using the Nova controller service, the cloud host with the bound device, and a resource management service;
if an available computing node exists, creating a new cloud host on the available computing node for the cloud host with the bound device by using the Nova controller service, the device management service, and a Nova compute service.
2. The method of claim 1, further comprising:
evacuating the remaining cloud hosts to be evacuated by using the Nova evacuate service.
3. The method of claim 1, wherein the step of acquiring the device binding information of the cloud hosts to be evacuated by using the Nova controller service and the device management service comprises:
calling, by the Nova controller service, the device management service to acquire the device binding information of the cloud hosts to be evacuated;
returning, by the device management service, the device binding information to the Nova controller service.
4. The method of claim 1, wherein the step of determining whether an available computing node exists in the cloud platform by using the Nova controller service, the cloud host with the bound device, and the resource management service comprises:
acquiring, by the Nova controller service, configuration information of the cloud host with the bound device, wherein the configuration information comprises CPU information, memory information, and disk information;
determining the resources required by a new cloud host according to the configuration information and the device binding information of the cloud host with the bound device;
calling, by the Nova controller service, the resource management service to query whether the cloud platform has an available computing node that satisfies the resources required by the new cloud host.
5. The method of claim 1, wherein the step of creating a new cloud host on the available computing node for the cloud host with the bound device by using the Nova controller service, the device management service, and the Nova compute service comprises:
calling, by the Nova controller service, the device management service to bind the new cloud host to the device, and sending, by the device management service, the state of the device binding event to the Nova controller service;
calling, by the Nova controller service, the Nova compute service to create the new cloud host, and monitoring, by the Nova compute service, the state of the device binding event;
sending, by the Nova controller service, the state of the device binding event to the Nova compute service;
if the Nova compute service detects that the device binding event has succeeded, generating an XML file for the new cloud host by using the Nova compute service, and starting the new cloud host.
6. The method of claim 1, further comprising:
if no available computing node exists, generating cloud host evacuation exception information by using the Nova controller service.
7. The method of claim 1, wherein the PCI device and/or PCIe device comprises: GPU, FPGA, NVMe, and SSD.
8. A cloud host evacuation device in an OpenStack cloud platform, the device comprising:
a request receiving module, configured to receive an evacuation request by using a Nova controller service and determine the cloud hosts to be evacuated according to the evacuation request;
an obtaining module, configured to obtain device binding information of the cloud hosts to be evacuated by using the Nova controller service and a device management service, wherein the device binding information comprises PCI devices and/or PCIe devices;
a screening module, configured to screen out, by using the Nova controller service and according to the device binding information, the cloud host with the bound device from the cloud hosts to be evacuated;
a computing node determining module, configured to determine whether an available computing node exists in the cloud platform by using the Nova controller service, the cloud host with the bound device, and a resource management service;
and a new cloud host creation module, configured to, when an available computing node exists, create a new cloud host on the available computing node for the cloud host with the bound device by using the Nova controller service, the device management service, and a Nova compute service.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor, when executing the program, performs the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
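
The control flow recited in claims 1, 2, 4, and 6 can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: it calls no real Nova, device management, or resource management API, and every name in it (CloudHost, ComputeNode, find_available_node, evacuate) is hypothetical. Device matching is simplified to comparing device type strings.

```python
from dataclasses import dataclass, field


@dataclass
class CloudHost:  # hypothetical model of a cloud host to be evacuated
    name: str
    vcpus: int
    memory_mb: int
    disk_gb: int
    devices: list = field(default_factory=list)  # bound PCI/PCIe device types, e.g. ["GPU"]


@dataclass
class ComputeNode:  # hypothetical model of a candidate computing node
    name: str
    free_vcpus: int
    free_memory_mb: int
    free_disk_gb: int
    free_devices: list = field(default_factory=list)


def find_available_node(host, nodes):
    """Stand-in for the resource management query of claim 4."""
    for node in nodes:
        pool = list(node.free_devices)
        try:
            for dev in host.devices:
                pool.remove(dev)  # each bound device needs one free device of the same type
        except ValueError:
            continue  # this node lacks a required device
        if (node.free_vcpus >= host.vcpus
                and node.free_memory_mb >= host.memory_mb
                and node.free_disk_gb >= host.disk_gb):
            return node
    return None


def evacuate(hosts, nodes):
    """Return (placements, errors, remaining) for the hosts to be evacuated."""
    bound = [h for h in hosts if h.devices]          # screening step of claim 1
    remaining = [h for h in hosts if not h.devices]  # left to the ordinary evacuate path (claim 2)
    placements, errors = {}, []
    for host in bound:
        node = find_available_node(host, nodes)
        if node is None:
            # claim 6: no available computing node, so report an exception
            errors.append(f"no available compute node for {host.name}")
            continue
        # Reserve the resources so that later hosts see the reduced capacity.
        node.free_vcpus -= host.vcpus
        node.free_memory_mb -= host.memory_mb
        node.free_disk_gb -= host.disk_gb
        for dev in host.devices:
            node.free_devices.remove(dev)
        placements[host.name] = node.name  # the new cloud host would be created here
    return placements, errors, [h.name for h in remaining]
```

In this sketch, a host with no bound devices is never scheduled by the device-aware path; it is only reported back so that the standard evacuation mechanism can handle it. The device binding handshake of claim 5 (binding event, XML generation, start-up) is deliberately left out.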
CN202010723780.8A 2020-07-24 2020-07-24 Cloud host evacuation method, device, device and storage medium in OpenStack cloud platform Active CN112003895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723780.8A CN112003895B (en) 2020-07-24 2020-07-24 Cloud host evacuation method, device, device and storage medium in OpenStack cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723780.8A CN112003895B (en) 2020-07-24 2020-07-24 Cloud host evacuation method, device, device and storage medium in OpenStack cloud platform

Publications (2)

Publication Number Publication Date
CN112003895A CN112003895A (en) 2020-11-27
CN112003895B true CN112003895B (en) 2022-05-13

Family

ID=73468126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723780.8A Active CN112003895B (en) 2020-07-24 2020-07-24 Cloud host evacuation method, device, device and storage medium in OpenStack cloud platform

Country Status (1)

Country Link
CN (1) CN112003895B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806019B (en) * 2021-09-15 2024-02-23 济南浪潮数据技术有限公司 Method for binding and unbinding PMEM equipment in OpenStack cloud platform
CN113992574B (en) * 2021-09-30 2023-04-25 济南浪潮数据技术有限公司 Method, system and equipment for setting router binding node priority
CN114153555A (en) * 2021-10-31 2022-03-08 山东海量信息技术研究院 A cloud platform PMEM device management method, system, device and storage medium
CN115174407B (en) * 2022-06-17 2024-06-04 上海仪电(集团)有限公司中央研究院 Bandwidth dynamic allocation method and system based on private cloud environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522145A (en) * 2018-11-14 2019-03-26 江苏鸿信系统集成有限公司 A kind of virtual-machine fail automatic recovery system and its method
CN110908832A (en) * 2019-10-24 2020-03-24 烽火通信科技股份有限公司 Virtual machine fault evacuation method and system for cloud platform and computer readable medium

Also Published As

Publication number Publication date
CN112003895A (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China