CN110737510B

CN110737510B - block device management system

Info

Publication number: CN110737510B
Application number: CN201911011188.9A
Authority: CN
Inventors: 苗科展; 唐舜
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2022-07-05
Anticipated expiration: 2039-10-23
Also published as: CN110737510A

Abstract

The embodiments of the present disclosure disclose a block device management system. The block device management system includes a client and an application program interface, and further includes: an executor configured to submit tasks to the block device service manager module, query submitted tasks, fetch tasks from an in-memory database, and report task status; and an in-memory database configured to store metadata for device snapshots and data volumes and to manage jobs and tasks. The block device management system improves the fault tolerance of virtual-machine-side block device management and reduces operation and maintenance costs.

Description

block device management system

Technical Field

The present disclosure relates to the field of computer technology, specifically to cloud computing, and in particular to a block device management system.

Background

In cloud computing, virtual machine virtualization comprises three parts: compute virtualization, storage virtualization, and network virtualization. For storage virtualization, the industry has increasingly adopted block device virtualization built on distributed storage systems. Block device management then has two parts: storage-side block device data management, which covers the management of cloud block storage data in the distributed file system; and virtual-machine-side block device management, which covers the lifecycle of block devices and the management of operations on them.

Virtual-machine-side block device management involves managing the relationship between two resources: the virtual machine and the block device. For example, when a block device is attached to or detached from a virtual machine, failing to synchronize the states of the two accurately can leave the virtual machine metadata and the block device metadata inconsistent. When a block device is used on its own, many operations can be performed on it, and each kind of operation request needs a sound fault-tolerance mechanism.

Specifically, the current implementation for managing virtual-machine-side block devices is the block device management (Cinder) module of OpenStack. The Cinder module consists of five components: the application program interface (Cinder-API), the scheduler (Cinder-Scheduler), the local agent node (Cinder-Volume), a relational database management system (MySQL), and the client (Cinder Client). The modules communicate over the AMQP protocol, implemented with the message-oriented middleware Qpid.

The five components of the Cinder module are described below:

The application program interface accepts HTTP requests initiated by the client and by the compute service (Nova Compute, which imports the client module), performs user identity verification, and dispatches messages. It provides web services through a Web Server Gateway Interface (WSGI) service, operates on core and extended resources, and interacts with the database (DB), the data volumes (Volume, i.e., block devices), and the scheduler (Scheduler).

The scheduler handles messages that do not specify a backend service and schedules an eligible backend service for each such message. Scheduling consists of two stages: filtering (Filters) and weighting (Weighting). The scheduler is also a WSGI service; after startup it consumes, from Qpid, the scheduling messages produced by the API.
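The two-stage scheduling described above can be sketched as follows. This is a minimal illustration under assumed inputs, not Cinder-Scheduler's real filter and weigher classes: each backend is assumed to report a protocol and its free capacity, and `capacity_filter`, `protocol_filter`, and `free_space_weigher` are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    name: str
    protocol: str   # hypothetical field, e.g. "iscsi" or "nfs"
    free_gb: int    # hypothetical field: reported free capacity


def capacity_filter(b: Backend, req: dict) -> bool:
    # Drop backends that cannot hold the requested volume.
    return b.free_gb >= req["size_gb"]


def protocol_filter(b: Backend, req: dict) -> bool:
    # Keep backends matching the requested protocol (any, if none requested).
    return req.get("protocol") in (None, b.protocol)


def free_space_weigher(b: Backend, req: dict) -> float:
    # Prefer backends with more free capacity.
    return float(b.free_gb)


def schedule(backends: List[Backend], req: dict,
             filters: List[Callable[[Backend, dict], bool]],
             weighers: List[Callable[[Backend, dict], float]]) -> Optional[Backend]:
    # Stage 1 (Filters): keep only backends that pass every filter.
    candidates = [b for b in backends if all(f(b, req) for f in filters)]
    if not candidates:
        return None
    # Stage 2 (Weighting): pick the candidate with the highest summed weight.
    return max(candidates, key=lambda b: sum(w(b, req) for w in weighers))
```

Filters prune, weighers rank; keeping them as separate lists lets new rules be added without touching `schedule`.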

The local agent node adapts to different types of block devices: whether a volume is connected over the Network File System (NFS) or the iSCSI protocol, it loads the driver (Driver) matching the volume's type. The requests it handles include creating, deleting, and extending data volumes, attaching and detaching them, and creating and deleting snapshots.
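A minimal sketch of type-based driver loading, assuming a simple registry keyed by volume type. `VolumeDriver`, `ISCSIDriver`, `NFSDriver`, and `load_driver` are hypothetical names, and the `attach` methods only return a marker string where a real driver would perform the iSCSI login or NFS mount.

```python
from typing import Dict, Type


class VolumeDriver:
    """Hypothetical base class: one subclass per backend connection type."""

    def attach(self, volume_id: str) -> str:
        raise NotImplementedError


class ISCSIDriver(VolumeDriver):
    def attach(self, volume_id: str) -> str:
        # A real driver would log in to the iSCSI target here.
        return f"iscsi:{volume_id}"


class NFSDriver(VolumeDriver):
    def attach(self, volume_id: str) -> str:
        # A real driver would mount the NFS export here.
        return f"nfs:{volume_id}"


# Registry keyed by volume type (hypothetical type names).
DRIVERS: Dict[str, Type[VolumeDriver]] = {"iscsi": ISCSIDriver, "nfs": NFSDriver}


def load_driver(volume_type: str) -> VolumeDriver:
    """Instantiate the driver registered for a volume's type."""
    if volume_type not in DRIVERS:
        raise ValueError(f"no driver registered for volume type {volume_type!r}")
    return DRIVERS[volume_type]()
```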

MySQL: Cinder stores its runtime data in a database, including backend services, data volume data, image data, quota data, quality of service (QoS) data, and authentication data. It uses the object-relational mapping (ORM) framework SQLAlchemy to connect to, and abstract over (via driver engines), different backend database systems, wraps the SQL toolkit behind the unified operation interface defined in Cinder/Cinder/db/api.py, and lazily loads the driver class for the corresponding database at runtime (Cinder/db/sqlalchemy).

Client: encapsulates HTTP requests and provides command-line access to the application program interface; it can be used either as a shell command or as a Python module.

However, this implementation for managing virtual-machine-side block devices has the following problems:

The first is inter-service communication. Cinder uses HTTP for all external interaction and AMQP internally (with asynchronous and synchronous communication coexisting). Because of limitations of the Qpid module, locating the cause is costly when a Qpid message between the application program interface and the volume service times out. Internal communication also lacks a good retry-based fault-tolerance mechanism, so any communication failure becomes visible to the business.

The second is resource state synchronization. Attaching a data volume to, or detaching it from, a virtual machine is a transaction that drives the state machines (metadata updates) of two resources in two modules (Nova and Cinder). When one resource's state machine transition is interrupted abnormally, it triggers a state rollback of the other resource; ultimately the two resource states must be correctly associated, and synchronizing them across modules is costly.
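To make the coupling concrete, here is an in-process sketch of an attach that drives both state machines and rolls both back when one transition is interrupted. The state names and the `volume_step_fails` flag are assumptions for illustration. In the scenario described, the two states live in separate services (Nova and Cinder), which is exactly what makes this rollback expensive in practice.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    state: str


def attach_volume(vm: Resource, volume: Resource, volume_step_fails: bool = False) -> bool:
    """Drive both state machines; on interruption, roll both back to their prior states.

    volume_step_fails simulates an abnormal interruption (hypothetical flag).
    """
    vm_before, vol_before = vm.state, volume.state
    vm.state = "attaching"                 # VM-side (Nova) transition starts
    try:
        if volume_step_fails:
            raise RuntimeError("volume state update interrupted")
        volume.state = "in-use"            # volume-side (Cinder) transition
        vm.state = "active"                # VM-side transition completes
        return True
    except RuntimeError:
        # Compensating rollback so the two resources stay consistent.
        vm.state, volume.state = vm_before, vol_before
        return False
```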

The third is functional boundaries. In the current environment, part of the block device management work is already done by the virtual machine management service and the block device service. For example, the virtual machine management service already implements attaching and detaching system disks on different backends, such as local system disks and cloud disks, while the block device service already implements creating, deleting, and extending data volumes, attaching and detaching data volumes, and creating, deleting, and rolling back snapshots.

The fourth is database (DB) version iteration. Database model versions are controlled by migrations (the detailed changes for each version are recorded in db/sqlalchemy/migrate_repo/versions/), and versions are numbered linearly in the form 0xx_xxx. In multi-developer collaboration and backporting, this easily leads to conflicting or missing change records.
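The kind of conflict described above can be detected mechanically. The sketch below assumes migration files are named `NNN_description.py` (a three-digit form of the `0xx_xxx` pattern) and reports versions claimed by more than one file as well as gaps in the sequence. `check_migrations` is a hypothetical helper, not part of Cinder or SQLAlchemy-Migrate.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed naming: three-digit version, underscore, description, ".py".
MIGRATION_NAME = re.compile(r"^(\d{3})_\w+\.py$")


def check_migrations(filenames: List[str]) -> Tuple[Dict[int, List[str]], List[int]]:
    """Return (versions claimed by more than one file, versions missing from the sequence)."""
    by_version: Dict[int, List[str]] = defaultdict(list)
    for name in filenames:
        m = MIGRATION_NAME.match(name)
        if m:
            by_version[int(m.group(1))].append(name)
    duplicates = {v: sorted(names) for v, names in by_version.items() if len(names) > 1}
    if not by_version:
        return duplicates, []
    lo, hi = min(by_version), max(by_version)
    # Membership tests on a defaultdict do not create keys.
    gaps = [v for v in range(lo, hi + 1) if v not in by_version]
    return duplicates, gaps
```

Two developers who each add "the next" migration on separate branches produce exactly the duplicate case; a lost cherry-pick during backporting produces the gap case.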

Finally, there is the issue of concurrency performance and foreign keys. Most Cinder tables have foreign keys, so a write to a child table places a shared lock on the referenced parent row, which hurts concurrent write performance on the parent table (read performance is unaffected). For example, creating a data volume computes disk quota usage and then updates the reservation information. These are writes to the user's disk quota usage table (quota_usages) and to the reservations table (reservations), where usage_id in the reservations table is a foreign key referencing quota_usages. When a volume creation request arrives, it first writes quota_usages and then writes reservations; at that point the storage engine (InnoDB) holds a lock on the quota_usages record, blocking other write requests. When many volume creation requests arrive at once, this can make volume creation slow or even cause timeouts.

Summary of the Invention

The embodiments of the present disclosure provide a block device management system.

In a first aspect, an embodiment of the present disclosure provides a block device management system that includes a client and an application program interface, and further includes: an executor configured to submit tasks to the block device service manager module, query submitted tasks, fetch tasks from an in-memory database, and report task status; and an in-memory database configured to store metadata for device snapshots and data volumes and to manage jobs and tasks.

In some embodiments, the in-memory database is further configured to split upstream requests of various kinds into task lists and assign the task lists to executors for execution. The executor is further configured to track, based on a task Id in the task list assigned by the in-memory database, the status in the client of the task indicated by that task Id, including: if the task has not started, submitting it to the block device service manager and tracking its status; if the task is already executing or has completed, updating its status in the executor; and periodically reporting a heartbeat to the in-memory database to synchronize the task status held in the executor's memory to the in-memory database.
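A minimal sketch of the split-and-assign step and the executor's per-task decision, under stated assumptions: task ids are formed as `<job>-<n>`, tasks are spread round-robin across executors, and the operation and status names are hypothetical. The disclosure does not specify the assignment policy.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    job_id: str
    op: str  # hypothetical operation name, e.g. "create_volume"


def split_job(job_id: str, ops: List[str]) -> List[Task]:
    """Split one upstream request (job) into an ordered task list."""
    return [Task(f"{job_id}-{i}", job_id, op) for i, op in enumerate(ops)]


def assign(tasks: List[Task], executors: List[str]) -> Dict[str, List[str]]:
    """Spread task ids across executors round-robin (one possible policy)."""
    if not executors:
        raise ValueError("no executors available")
    plan: Dict[str, List[str]] = {e: [] for e in executors}
    for task, executor in zip(tasks, itertools.cycle(executors)):
        plan[executor].append(task.task_id)
    return plan


def next_step(status: str) -> str:
    """What an executor does with an assigned task id, given its current status."""
    if status == "not_started":
        return "submit_and_track"
    return "update_local_status"  # already executing or finished
```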

In some embodiments, a task's state in the executor consists of a task status flag and a task action flag. The task status flag takes one of the values: not executed, executing, execution failed, or execution succeeded. The task action flag takes one of the values: submit task, check task, or empty.
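The two-part task state can be modelled as a pair of enumerations. The enum members mirror the values listed above; the class names and the `is_terminal` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):
    NOT_EXECUTED = "not_executed"
    EXECUTING = "executing"
    FAILED = "failed"
    SUCCEEDED = "succeeded"


class TaskAction(Enum):
    SUBMIT = "submit"   # a submission is in flight
    CHECK = "check"     # a status query is in flight
    EMPTY = "empty"     # nothing in flight


@dataclass
class TaskState:
    task_id: str
    status: TaskStatus = TaskStatus.NOT_EXECUTED
    action: TaskAction = TaskAction.EMPTY

    def is_terminal(self) -> bool:
        # Failed and succeeded tasks need no further submission or polling.
        return self.status in (TaskStatus.FAILED, TaskStatus.SUCCEEDED)
```

Separating "where the task is" (status) from "what the executor is doing to it right now" (action) lets a restarted executor tell an unfinished submission apart from an unfinished query.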

In some embodiments, the executor is further configured to: before submitting the task indicated by a task Id, set its task status flag to not executed and its task action flag to empty; take a task from the not-executed queue, set its task action flag to submit task, and use a newly created coroutine to call the block device service client to submit the task to the block device service manager; in response to the submission result returned by the block device service client indicating success, move the task from the not-executed queue to the executing queue, set its task status flag to executing, and set its task action flag to empty; in response to the submission result indicating failure, move the task from the not-executed queue to the exited/failed queue and retry submitting it to the block device service manager; and in response to a heartbeat response from the in-memory database indicating that a task has been consumed by the block device service but its status has not yet been updated, and the task is not in the unprocessed task queue, add the task to the unprocessed task queue.
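The submission flow above can be sketched with `asyncio`, using one coroutine per task. This is a minimal, non-authoritative illustration: the client object and its async `submit(task_id) -> bool` method are hypothetical stand-ins for the block device service client, the unprocessed task queue is assumed to be the not-executed queue, and resetting a failed task's action to "empty" is an assumption the disclosure does not state.

```python
import asyncio
from collections import deque
from typing import Deque, Dict, Set, Tuple

NOT_EXECUTED, EXECUTING = "not_executed", "executing"
SUBMIT, EMPTY = "submit", "empty"


class Executor:
    def __init__(self, client) -> None:
        self.client = client                       # hypothetical async service client
        self.not_executed: Deque[str] = deque()    # not-executed queue
        self.executing: Set[str] = set()           # executing-task queue
        self.failed: Set[str] = set()              # exited/failed queue
        self.state: Dict[str, Tuple[str, str]] = {}  # task_id -> (status, action)

    def enqueue(self, task_id: str) -> None:
        # Before submission: status "not executed", action "empty".
        self.state[task_id] = (NOT_EXECUTED, EMPTY)
        if task_id not in self.not_executed:
            self.not_executed.append(task_id)

    async def _submit(self, task_id: str) -> None:
        self.state[task_id] = (NOT_EXECUTED, SUBMIT)
        ok = await self.client.submit(task_id)
        if task_id in self.not_executed:
            self.not_executed.remove(task_id)      # move out of the not-executed queue
        if ok:
            self.executing.add(task_id)
            self.state[task_id] = (EXECUTING, EMPTY)
        else:
            # Parked in the exited/failed queue for a later retry (action reset is assumed).
            self.failed.add(task_id)
            self.state[task_id] = (NOT_EXECUTED, EMPTY)

    async def submit_pending(self) -> None:
        # One new coroutine per task currently in the not-executed queue.
        await asyncio.gather(*(self._submit(t) for t in list(self.not_executed)))

    def on_consumed_but_unupdated(self, task_id: str) -> None:
        # Heartbeat says the service consumed the task but its status lags:
        # re-queue it unless it is already waiting.
        if task_id not in self.not_executed:
            self.not_executed.append(task_id)
```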

In some embodiments, moving the task from the not-executed queue to the exited/failed queue and retrying its submission to the block device service manager, in response to the submission result returned by the block device service client indicating failure, includes: in response to the submission result indicating that submission failed, the task indicated by the submission result being a resource-creation task, and the number of retries for submitting that task to the block device service manager being less than a predetermined threshold, moving the task from the not-executed queue to the exited/failed queue and retrying its submission to the block device service manager.
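The retry rule can be expressed as a predicate plus a loop. The threshold value and the set of resource-creation operations below are hypothetical; the disclosure only states that a predetermined threshold exists and that the rule applies to resource-creation tasks.

```python
from typing import Callable, Tuple

MAX_RETRIES = 3  # hypothetical value of the "predetermined threshold"
CREATE_OPS = frozenset({"create_volume", "create_snapshot"})  # hypothetical op names


def should_retry(op: str, submit_ok: bool, retries: int, max_retries: int = MAX_RETRIES) -> bool:
    """Retry only failed submissions of resource-creation tasks, below the threshold."""
    return not submit_ok and op in CREATE_OPS and retries < max_retries


def submit_with_retry(submit: Callable[[str], bool], op: str, task_id: str,
                      max_retries: int = MAX_RETRIES) -> Tuple[bool, int]:
    """Return (final submit result, number of retries performed)."""
    retries = 0
    ok = submit(task_id)
    while should_retry(op, ok, retries, max_retries):
        retries += 1  # the task sits in the exited/failed queue between attempts
        ok = submit(task_id)
    return ok, retries
```

Restricting retries to creation tasks is a safety choice: a repeated create is normally harmless if it failed cleanly, whereas blindly repeating other operations may not be.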

In some embodiments, the executor is further configured to perform at least one of the following: in response to running the timed polling task after a task has been submitted, set the submitted task's action flag to check task and send the block device service manager a query request for that task; in response to the query result returned by the block device service manager indicating that the task is still executing, leave the task status flag unchanged and set the task action flag to empty; in response to the query result indicating that the task succeeded, set the task status flag to execution succeeded and the task action flag to empty; in response to the query result indicating that the task failed, set the task status flag to execution failed and the task action flag to empty.
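One polling round can be sketched as a small state update. `TrackedTask` and the `query` callable (standing in for the query request to the block device service manager) are hypothetical; the status and action strings follow the flags listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrackedTask:
    task_id: str
    status: str = "executing"  # a submitted task starts out executing
    action: str = "empty"


SETTLED = {"succeeded", "failed"}


def poll_once(task: TrackedTask, query: Callable[[str], str]) -> None:
    """One timed-polling round for a submitted task."""
    task.action = "check"            # a status query is in flight
    result = query(task.task_id)     # hypothetical call to the service manager
    if result in SETTLED:
        task.status = result         # execution succeeded or failed
    elif result != "executing":
        raise ValueError(f"unexpected query result {result!r}")
    task.action = "empty"            # query done; "executing" leaves status unchanged
```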

In some embodiments, the executor is further configured to: in response to the task submitted to the block device service manager being a preset task and its submission succeeding, set the submitted task's status flag to execution succeeded and its action flag to empty; and in response to the submitted task being a preset task and its submission failing, set the submitted task's status flag to execution failed and its action flag to empty.
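Preset tasks differ from ordinary tasks only in how the submit result is interpreted, as the following sketch shows. Which tasks count as "preset" is not specified in the disclosure, and the failure branch for ordinary tasks is an assumption.

```python
from typing import Tuple


def after_submit(is_preset: bool, submit_ok: bool) -> Tuple[str, str]:
    """(status, action) right after a submission attempt."""
    if is_preset:
        # Preset tasks are settled by the submit result itself; no polling follows.
        return ("succeeded" if submit_ok else "failed"), "empty"
    # Ordinary tasks: success means "executing" until polling settles them;
    # on failure they stay "not executed" for the retry path (an assumption).
    return ("executing" if submit_ok else "not_executed"), "empty"
```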

In some embodiments, the executor is further configured to send heartbeat requests to the in-memory database using a ping-pong mechanism. A heartbeat request includes: a ping_id identifying the current request count, a sync flag indicating whether this is the initial request, and a task list identifying executing, succeeded, and failed tasks. The in-memory database is further configured to return a heartbeat response to the executor that includes: a pong_id identifying the next request count, the not-executed and executing tasks, and the exited/failed tasks corresponding to the ping_id.
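The heartbeat messages can be sketched as two record types. Field names follow the description above; the concrete types, the split of the task list into three lists, and the rules in `make_response` (`pong_id = ping_id + 1`, and echoing the reported failures back as exited tasks) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HeartbeatRequest:
    ping_id: int      # current request count
    sync: bool        # True only on the executor's initial request
    executing: List[str] = field(default_factory=list)
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


@dataclass
class HeartbeatResponse:
    pong_id: int      # request count the executor should use next
    not_executed: List[str] = field(default_factory=list)
    executing: List[str] = field(default_factory=list)
    exited: List[str] = field(default_factory=list)  # exited/failed tasks for this ping_id


def make_response(req: HeartbeatRequest, pending: List[str], running: List[str]) -> HeartbeatResponse:
    # Assumed: "next request count" means ping_id + 1, and the failed tasks
    # just reported are acknowledged back as exited.
    return HeartbeatResponse(pong_id=req.ping_id + 1, not_executed=list(pending),
                             executing=list(running), exited=list(req.failed))
```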

In some embodiments, the executor is further configured to perform at least one of the following: in response to receiving a heartbeat response, update its ping_id to the pong_id in the response, remove the exited/failed tasks identified in the response from the exited/failed queue, and add to the executor's not-executed queue any not-executed task in the response that is not already in that queue; in response to a heartbeat request failing to send, re-send the heartbeat request to the in-memory database; in response to a heartbeat request being sent successfully but no heartbeat response arriving within a preset time, re-send the heartbeat request to the in-memory database with the same ping_id as the unanswered request. The task list of executing, succeeded, and failed tasks in the re-sent request is not necessarily identical to the one in the unanswered request.
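The executor-side handling can be sketched as follows, assuming a synchronous `send` transport that raises `ConnectionError` on a send failure and `TimeoutError` when no response arrives in time. Because `ping_id` only advances when a response is received, both failure cases resend with the same `ping_id`, while the task list is rebuilt on each attempt. `HeartbeatState`, `heartbeat_round`, and the `max_attempts` cap are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set


class HeartbeatState:
    def __init__(self) -> None:
        self.ping_id = 0
        self.not_executed: Deque[str] = deque()
        self.exited: Set[str] = set()  # exited/failed queue

    def on_response(self, resp: Dict) -> None:
        self.ping_id = resp["pong_id"]                 # advance only when answered
        self.exited.difference_update(resp["exited"])  # acknowledged failures leave the queue
        for task_id in resp["not_executed"]:
            if task_id not in self.not_executed:       # avoid duplicates
                self.not_executed.append(task_id)


def heartbeat_round(state: HeartbeatState, send: Callable[[Dict], Dict],
                    current_tasks: Callable[[], List[str]], max_attempts: int = 3) -> bool:
    """Send one heartbeat, resending on failure or timeout with the SAME ping_id.

    The task list is rebuilt on each attempt, so a resend may carry a different list.
    """
    for _ in range(max_attempts):
        request = {"ping_id": state.ping_id, "tasks": current_tasks()}
        try:
            response = send(request)  # hypothetical transport
        except (ConnectionError, TimeoutError):
            continue
        state.on_response(response)
        return True
    return False
```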

In some embodiments, the in-memory database is further configured to cache in memory the task list of executing, succeeded, and failed tasks carried in each heartbeat request, and, in response to a received heartbeat request having the same ping_id as a previously received one, return to the executor the task list cached for that earlier request.
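The ping_id cache makes resent heartbeats idempotent on the in-memory database side. A minimal sketch, assuming the cache is keyed per executor and holds only the most recent request (the disclosure does not say how the cache is keyed or how long entries live); `HeartbeatServer` is a hypothetical name.

```python
from typing import Dict, Tuple


class HeartbeatServer:
    def __init__(self) -> None:
        self.task_status: Dict[str, str] = {}
        self.applied = 0  # number of heartbeats actually applied
        # Per-executor cache of the last (ping_id, response); keying by executor is assumed.
        self._last: Dict[str, Tuple[int, Dict]] = {}

    def handle(self, executor_id: str, ping_id: int, tasks: Dict[str, str]) -> Dict:
        last = self._last.get(executor_id)
        if last is not None and last[0] == ping_id:
            return last[1]  # duplicate ping_id: replay the cached answer, apply nothing
        self.task_status.update(tasks)
        self.applied += 1
        response = {"pong_id": ping_id + 1, "tasks": dict(tasks)}
        self._last[executor_id] = (ping_id, response)
        return response
```

A resend with a newer task list is not lost: the executor will report it again under the next ping_id once it receives the replayed response.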

In some embodiments, the executor is further configured to perform at least one of the following: use a newly created coroutine to take tasks from the not-executed queue; and use a newly created coroutine for each timed task, where the timed tasks include at least one of: periodically reporting heartbeats, periodically submitting tasks, and periodically querying task status.
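The timed tasks map naturally onto one coroutine each. This `asyncio` sketch only counts ticks where the real heartbeat, submission, and query work would go; the interval and the function names are assumptions.

```python
import asyncio
from typing import Callable, Dict


async def every(interval: float, fn: Callable[[], None], stop: asyncio.Event) -> None:
    """Run fn, then wait `interval` seconds (or until stopped), repeatedly."""
    while not stop.is_set():
        fn()
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop signal; run again


async def run_executor(duration: float, interval: float = 0.01) -> Dict[str, int]:
    counts = {"heartbeat": 0, "submit": 0, "query": 0}
    stop = asyncio.Event()

    def tick(name: str) -> Callable[[], None]:
        def run() -> None:
            counts[name] += 1  # placeholder for the real periodic work
        return run

    # One new coroutine (asyncio task) per periodic job.
    loops = [asyncio.create_task(every(interval, tick(n), stop)) for n in counts]
    await asyncio.sleep(duration)
    stop.set()
    await asyncio.gather(*loops)
    return counts
```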

In some embodiments, the client and the application program interface communicate with the in-memory database and the executor over RPC, and the client communicates with the application program interface over HTTP.

In some embodiments, the in-memory database includes: a service module of the block device management system, configured to receive heartbeat requests from the executor and data volume and snapshot operation requests from the application program interface; and a management module of the block device management system, configured to process those heartbeat requests and data volume and snapshot operation requests.

In some embodiments, the in-memory database further includes: a data volume service module, which is a service module defined in PB (Protocol Buffers) that exposes CRUD interfaces for data volume metadata and is configured to receive operation requests on data volume metadata; and a data volume management module, managed by the database manager, configured to persist operation requests on data volume metadata and to store and update that metadata.

In some embodiments, the in-memory database further includes: a snapshot service module, which is a service module defined in PB (Protocol Buffers) that exposes CRUD interfaces for snapshot metadata and is configured to receive operation requests on snapshot metadata; and a snapshot management module, managed by the database manager, configured to persist operation requests on snapshot metadata and to store and update that metadata.
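The service/management split for volume and snapshot metadata can be illustrated with a small CRUD store that logs each operation before applying it, so the in-memory state can be rebuilt from the log. This is a sketch under assumptions: the JSON log stands in for whatever persistence the database manager provides, and `MetadataStore` and its methods are hypothetical rather than the PB-defined interfaces.

```python
import json
from typing import Dict, List, Optional


class MetadataStore:
    """CRUD over volume or snapshot metadata; every write is logged before it is applied."""

    def __init__(self, kind: str) -> None:
        self.kind = kind                 # "volume" or "snapshot"
        self._rows: Dict[str, dict] = {}
        self.log: List[str] = []         # stand-in for durable persistence

    def _persist(self, op: str, key: str, meta: Optional[dict]) -> None:
        self.log.append(json.dumps({"op": op, "id": key, "meta": meta}, sort_keys=True))

    def create(self, key: str, meta: dict) -> None:
        if key in self._rows:
            raise KeyError(f"{self.kind} {key} already exists")
        self._persist("create", key, meta)
        self._rows[key] = dict(meta)

    def get(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def update(self, key: str, changes: dict) -> None:
        if key not in self._rows:
            raise KeyError(f"{self.kind} {key} not found")
        self._persist("update", key, changes)
        self._rows[key].update(changes)

    def delete(self, key: str) -> None:
        self._persist("delete", key, None)
        self._rows.pop(key, None)

    @classmethod
    def replay(cls, kind: str, log: List[str]) -> "MetadataStore":
        """Rebuild in-memory state from the persisted operation log."""
        store = cls(kind)
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "create":
                store.create(entry["id"], entry["meta"])
            elif entry["op"] == "update":
                store.update(entry["id"], entry["meta"])
            else:
                store.delete(entry["id"])
        return store
```

The same class serves both modules by changing `kind`, mirroring how the volume and snapshot modules differ only in the metadata they manage.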

In a second aspect, an embodiment of the present disclosure provides an electronic device/terminal/server including: one or more processors; and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the block device management systems described above.

In a third aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements any of the block device management systems described above.

The block device management system provided by the embodiments of the present disclosure includes a client and an application program interface, and further includes: an executor configured to submit tasks to the block device service manager module, query submitted tasks, fetch tasks from an in-memory database, and report task status; and an in-memory database configured to store metadata for device snapshots and data volumes and to manage jobs and tasks. Building on the client and application program interface of the original block device management system, the system adds an executor (Executor), which is a stateless component. As a result, all operations support fault tolerance, and upgrades of the backend system can be made transparent to users. Executors can also be scaled horizontally to handle a larger volume of block device (data volume and snapshot) operation requests, giving the system good scalability.
In addition, compared with the prior art, the block device management system removes several OpenStack Cinder components. It replaces the MySQL component with an in-memory database (NovaMaster), eliminating the foreign key dependency problem, and it removes the local agent node and the scheduler, which reduces the deployment cost (machines and manpower) of building a cluster, improves the stability of the block device management system, and makes changes to enterprise-facing (ToB) services more stable and more transparent. In some embodiments, BaiduRPC replaces Qpid to improve the fault tolerance of inter-system communication.

Brief Description of the Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

FIG. 1 is an exemplary system architecture diagram to which the present disclosure may be applied;

FIG. 2 is a schematic flowchart of an embodiment of the block device management system according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of another embodiment of the block device management system according to an embodiment of the present disclosure;

FIG. 4 is an exemplary structural diagram of an embodiment of the apparatus for assessing risk of the present disclosure;

FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a server of an embodiment of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. Note also that, for ease of description, the drawings show only the parts related to the invention.

Note that, where no conflict arises, the embodiments of the present disclosure and the features in them may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the block device management system or the apparatus for assessing risk of the present disclosure may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they can be any electronic device that supports browser applications, including but not limited to tablet computers, laptop computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server that provides various services, such as a back-end server supporting the browser applications running on the terminal devices 101, 102, and 103. The back-end server can analyze and otherwise process received data such as requests, and feed the processing results back to the terminal devices.

Note that the server may be hardware or software. When the server is hardware, it can be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

In practice, the block device management system provided by the embodiments of the present disclosure may be deployed in the terminal devices 101, 102, and 103 and/or the servers 105 and 106.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of each, as implementation requires.

Continuing to refer to FIG. 2, FIG. 2 shows an exemplary structural diagram 200 of an embodiment of the block device management system according to the present disclosure. The block device management system 200 includes a client 210 and an application program interface 220, and further includes: an executor 230, configured to submit tasks to the block device service manager module, query submitted tasks, obtain tasks from the in-memory database, and report task status; and an in-memory database 240, configured to store the metadata of device snapshots and data volumes and to manage jobs and tasks.

In this embodiment, the execution body of the block device management system (for example, the terminal or server shown in FIG. 1) may be provided with the client 210, the application program interface 220, the executor (Executor) 230, and the in-memory database (NovaMaster) 240.

Here, the block device management system (Cinder) introduces a layer of "logical storage volume" abstraction between virtual machines and the concrete storage devices. Cinder is not itself a storage technology; it only provides an intermediate abstraction layer. Cinder manages each kind of back-end storage by calling the driver interface for that back-end type, and offers users a unified storage interface for volume-related operations.

The client (Cinder Client, shown as the Cinder client in the figure) 210 wraps HTTP requests and provides command-line access to the application program interface. It can be used in two ways: from the shell or as a Python module.

The application program interface 220 accepts HTTP requests initiated by the client and by Nova Computer (which imports the client module), and performs user identity verification and message dispatch. It uses a WSGI Service to provide web services and operates on core resources and extended resources, which involves interaction with the database, the data volume service, and the scheduler. The interfaces currently exposed by the application program interface are listed in the following table:

These interfaces involve two kinds of functionality: updating metadata and calling the block device service. Updating metadata does not need to go through the data volume component of the prior-art block device management system. Operations on data volumes other than tgt mounting ultimately submit a Job request to the block device service (Jobs rather than tasks are used for more complex or longer-running operations), in which case the data volume component of the block device management system acts only as a client. Dividing by function therefore yields two roles: metadata storage and task management. Task management requires a task submission and monitoring module, which can be implemented by the executor 230, while metadata storage can be provided by the in-memory database 240. The executor 230 may use an executor management module to implement task submission and monitoring.

Therefore, while retaining the client 210 and the application program interface 220 of the block device management system, the execution body can deploy the executor 230 to submit execution tasks and monitor those already submitted, and deploy the in-memory database 240 to store the metadata of device snapshots and volumes and to manage tasks.

In the block device management system shown in the above embodiments of the present application, the newly added executor (Executor) is a stateless component, so all operations support fault tolerance; thanks to this property, back-end system upgrades can be made transparent to users. At the same time, executors can be scaled horizontally to handle a larger volume of block device (data volume and snapshot) operation requests, which gives the system good scalability. In addition, compared with the prior art, this block device management system removes several components of OpenStack Cinder. It replaces the MySQL component with an in-memory database, eliminating the foreign-key dependency problem, and it removes the local agent nodes and the scheduler of the block device management system. This lowers the deployment cost (machines and manpower) of building a cluster, improves the stability of the block device management system, and makes changes to business-facing (ToB) services more stable and transparent.

In some optional implementations of the above embodiments, the in-memory database is further configured to split the various upstream requests into task lists and to assign the task lists to executors for execution. The executor is further configured to track, based on the task Id in the task list assigned by the in-memory database, the status in the client of the task indicated by that task Id, including: if the task has not yet started, submitting it to the block device service manager (CDSMaster) and tracking its status; if the task has already been executed/completed, updating its status in the executor; and periodically reporting a heartbeat to the in-memory database to synchronize the status of the tasks held in the executor's memory to the in-memory database.

In this implementation, to work with the executor, the execution body configures the in-memory database to split the various upstream requests into subtasks, each of which is assigned to an executor for execution. After starting, the executor tracks the status of each task in the client of the block device management system according to the task Ids in its assigned task list. If a task has not started, the executor submits it to the block device service manager (CDSMaster) and keeps tracking its status; if the task has already been executed/completed, the executor updates the task's status in the executor. The executor reports to the in-memory database periodically, promptly synchronizing the task status held in its memory to the in-memory database. Using the in-memory database to split tasks and the executor to track their status in the client in this way removes the foreign-key dependency and improves the stability of the block device management system.
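The splitting and assignment step can be pictured with the minimal Python sketch below. The `SubTask` type, the Id format `<request>-<n>`, and round-robin assignment are all assumptions made for illustration; the text does not specify how the in-memory database derives task Ids or chooses an executor.

```python
import itertools
from dataclasses import dataclass


@dataclass
class SubTask:
    task_id: str
    op: str               # hypothetical operation name, e.g. "create_volume"
    state: str = "ready"  # every new subtask starts as not executed


def split_request(request_id: str, ops: list[str]) -> list[SubTask]:
    """Split one upstream request into subtasks with derived task Ids."""
    return [SubTask(f"{request_id}-{i}", op) for i, op in enumerate(ops)]


def assign(tasks: list[SubTask], executors: list[str]) -> dict[str, list[SubTask]]:
    """Assign subtasks to executors round-robin (an assumed policy)."""
    plan: dict[str, list[SubTask]] = {e: [] for e in executors}
    for task, executor in zip(tasks, itertools.cycle(executors)):
        plan[executor].append(task)
    return plan
```

Each executor would then track only the task Ids in its own list, which is what lets executors stay stateless: the authoritative assignment lives in the in-memory database.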

In some optional implementations of the above embodiments, the status of a task in the executor consists of a task status identifier and a task action identifier. The task status identifier takes one of the values not executed (ready), executing (running), execution failed (error), and executed successfully (exited); the task action identifier takes one of the values submitting the task (submitting), checking the task (checking), and none (none).

In this implementation, an Executor task uses two fields to identify its status: task_state and action. The former indicates the state the task is in, with legal values ready (not executed), running (executing), error (execution failed), and exited (executed successfully). The latter indicates the action the task is currently performing, with legal values submitting (the task is being submitted to the Master), checking (the task status is being checked), and none. Representing a task's status with both a task status identifier and a task action identifier gives a finer-grained view of how the task is progressing and makes the task status data more accurate.
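The two status fields map naturally onto two enumerations. In this sketch the class names (`TaskState`, `TaskAction`, `ExecutorTask`) and the helper `is_submittable` are hypothetical; only the field names `task_state` and `action` and their legal values come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    READY = "ready"       # not executed
    RUNNING = "running"   # executing
    ERROR = "error"       # execution failed
    EXITED = "exited"     # executed successfully


class TaskAction(Enum):
    SUBMITTING = "submitting"  # being submitted to the Master
    CHECKING = "checking"      # status is being checked
    NONE = "none"              # no action in progress


@dataclass
class ExecutorTask:
    task_id: str
    task_state: TaskState = TaskState.READY
    action: TaskAction = TaskAction.NONE

    def is_submittable(self) -> bool:
        # Only idle, not-yet-executed tasks are picked up by a submission cycle.
        return self.task_state is TaskState.READY and self.action is TaskAction.NONE
```

Keeping the action separate from the state is what lets a coroutine see that a task is already "in flight" (for example `ready` plus `submitting`) and skip it.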

In some optional implementations of this embodiment, the executor is further configured to: before the task indicated by a task Id is submitted, set its task status identifier to not executed and its task action identifier to none; obtain a task from the not-executed queue, set the task action identifier of the obtained task to submitting, and use a newly created coroutine to call the block device service client to submit the task to the block device service manager; in response to the submission result returned by the block device service client indicating that the submission succeeded, move the task indicated by the submission result from the not-executed queue to the executing task queue, set its task status identifier to executing, and set its task action identifier to none; in response to the submission result indicating that the submission failed, move the task from the not-executed queue to the exited/failed queue and retry submitting it to the block device service manager; and, in response to a heartbeat response returned by the in-memory database indicating that a task has been consumed by the block device service but its status has not yet been updated, add the task to the unprocessed task queue if it is not already present there.

In this implementation, in each task submission cycle, the executor traverses ready_queue and submits the tasks with state=ready and action=none to the block device service manager.

Specifically, the executor performs the following steps, as shown in FIG. 3. First, in step 310, the executor traverses ready_queue and selects the tasks whose task status identifier is not executed and whose task action identifier is none (state=ready, action=none); before submitting such a task, it sets the task action identifier to submitting (action=submitting). The executor then starts a new coroutine, calls the client to submit the task to the block device service manager, and receives the submission result returned by the block device service client (block device_tool). Next, in step 320, if the submission succeeds, the executor moves the task from ready_queue to running_queue and sets the task status identifier to executing and the task action identifier to none (state=running, action=none). In step 330, if the submission fails (for example, because the task object's state is invalid or the task parameters are wrong), the executor moves the task from ready_queue to exited_error_queue and sets the task status identifier to execution failed and the task action identifier to none (state=error, action=none).
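A minimal asyncio sketch of one submission cycle (steps 310 to 330) follows. The `submit` callable is a hypothetical stand-in for the block device service client, and the queues are modeled as dicts keyed by task Id; neither detail is prescribed by the text.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    state: str = "ready"
    action: str = "none"


class Executor:
    def __init__(self, submit):
        # `submit` is a hypothetical async callable standing in for the
        # block device service client; it returns True on success.
        self.submit = submit
        self.ready_queue: dict[str, Task] = {}
        self.running_queue: dict[str, Task] = {}
        self.exited_error_queue: dict[str, Task] = {}

    async def _submit_one(self, task: Task) -> None:
        ok = await self.submit(task.task_id)
        self.ready_queue.pop(task.task_id, None)
        if ok:  # step 320
            task.state, task.action = "running", "none"
            self.running_queue[task.task_id] = task
        else:   # step 330
            task.state, task.action = "error", "none"
            self.exited_error_queue[task.task_id] = task

    async def submit_cycle(self) -> None:
        # Step 310: pick state=ready, action=none and mark each one as
        # submitting before handing it to its own coroutine.
        jobs = []
        for task in list(self.ready_queue.values()):
            if task.state == "ready" and task.action == "none":
                task.action = "submitting"
                jobs.append(asyncio.create_task(self._submit_one(task)))
        await asyncio.gather(*jobs)
```

Each submission is wrapped in its own task via `asyncio.create_task`, mirroring the one-coroutine-per-submission rule, so the traversal itself never waits on the service; the final `gather` exists only so the example can observe the results.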

Here, a coroutine is a lightweight user-mode thread with the following characteristics: it is lightweight; it uses non-preemptive multitasking, in which a coroutine hands over control voluntarily; it is managed at the compiler/interpreter/virtual machine level; multiple coroutines may run on one or more threads; and a subroutine is a special case of a coroutine.
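The non-preemptive behavior can be seen in a small Python asyncio example. asyncio is used here only to illustrate coroutines in general, not as the executor's actual runtime: two coroutines share one thread and interleave only at their explicit `await` points.

```python
import asyncio


async def worker(name: str, log: list[str]) -> None:
    for step in range(2):
        log.append(f"{name}{step}")
        # Explicit yield point: control returns to the event loop here and
        # nowhere else; nothing preempts the coroutine between these lines.
        await asyncio.sleep(0)


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log
```

`asyncio.run(main())` returns `['a0', 'b0', 'a1', 'b1']`: each coroutine runs until it voluntarily yields, then the other resumes.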

Beyond the steps above, the following exceptional cases may occur when the executor submits a task, and they are handled as follows:

First, during the first three steps of the submission process above, execution may switch to another coroutine, which may be reporting a heartbeat, submitting other tasks, or querying task status. If that coroutine is reporting a heartbeat, the heartbeat response returned by the in-memory database may indicate that the task has already been consumed by the block device service (CDS) even though the executor has not yet updated that task's status. In this case, when adding the consumed task to the pending task queue, the executor first checks whether the task is already in that queue. If the other coroutine is submitting other tasks or querying task status, the task currently being submitted is unaffected.

Second, during the submission process, the submission may time out or receive no response. Because block device JOBs are idempotent, the executor can simply retry the submission.
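The timeout-and-retry rule can be sketched with `asyncio.wait_for`. The function name, the `submit` callable, and the attempt limit are assumptions; the sketch relies on the idempotency stated above, meaning a second submission of the same job Id must not create a second job.

```python
import asyncio


async def submit_with_retry(submit, job_id: str, timeout: float, attempts: int) -> bool:
    """Re-submit a job after a timeout (hypothetical helper).

    Safe only because the job is idempotent: submitting the same job_id
    twice must not create a second job on the service side.
    """
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(submit(job_id), timeout)
        except asyncio.TimeoutError:
            continue  # no response in time: retry with the same job_id
    return False
```

Reusing the same `job_id` on every attempt is the point: if the first attempt did reach the service, the retry is absorbed rather than duplicated.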

With this submission method, each submission runs in a new coroutine, so the traversal is never blocked. In addition, tasks can be submitted periodically and their status is marked with explicit identifiers, which gives a finer-grained view of task execution and makes the task status data more accurate.

In some optional implementations of this embodiment, moving the task from the not-executed queue to the exited/failed queue (exited_error_queue) and retrying its submission to the block device service manager, in response to the submission result returned by the block device service client indicating that the submission failed, includes: in response to the submission result indicating that the submission failed, the task indicated by the submission result being a resource-creation task, and the number of retries for submitting that task to the block device service manager being less than a predetermined threshold, moving the task from the not-executed queue to the exited/failed queue and retrying its submission to the block device service manager.

In this implementation, an upper limit (the predetermined threshold) is set on the number of retries for resource-creation tasks, so that such tasks are not retried excessively, which could otherwise cause resource leaks.
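A minimal sketch of the capped retry decision follows. The threshold value 3, the `create_` prefix used to recognize resource-creation tasks, and the choice to keep retrying non-creation tasks without a cap are all assumptions; the text fixes only that creation tasks are retried while their retry count is below a predetermined threshold.

```python
from dataclasses import dataclass

MAX_CREATE_RETRIES = 3  # hypothetical value for the "predetermined threshold"


@dataclass
class Task:
    task_id: str
    op: str
    retries: int = 0


def should_retry(task: Task, max_retries: int = MAX_CREATE_RETRIES) -> bool:
    # Resource-creation tasks ("create_*" here, an assumed naming convention)
    # are capped so that repeated retries cannot leak resources.
    if task.op.startswith("create_"):
        return task.retries < max_retries
    return True  # assumption: other tasks keep the plain retry behavior


def on_submit_failed(task: Task, ready_queue: dict, exited_error_queue: dict) -> bool:
    """Move a failed task to the exited/failed queue; return True if it should be re-submitted."""
    ready_queue.pop(task.task_id, None)
    exited_error_queue[task.task_id] = task
    if should_retry(task):
        task.retries += 1
        return True
    return False
```

With the cap at 3, a creation task that keeps failing is re-submitted three times and then left in the exited/failed queue.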

In some optional implementations of this embodiment, the executor is further configured to perform at least one of the following: when running the periodic polling task after tasks have been submitted, set the task action identifier of a submitted task to checking and send a query request for that task to the block device service manager; in response to the query result returned by the block device service manager indicating that the task is still executing, keep the task status identifier of the submitted task unchanged and set its task action identifier to none; in response to the query result indicating that the task succeeded, set the task status identifier to executed successfully and the task action identifier to none; and in response to the query result indicating that the task failed, set the task status identifier to execution failed and the task action identifier to none.

In this implementation, the executor polls task status periodically. As shown in the optional implementation in FIG. 3, in optional step 340, the executor first sets the task action identifier (task action) to checking and then sends a query request to the block device service manager (CDSMaster). In optional step 350, CDSMaster reports that the task is still executing; the task state remains unchanged and the task action identifier is set to none (task state unchanged, action=none). In optional step 360, CDSMaster reports that the task succeeded; the task status identifier is set to executed successfully and the task action identifier to none (state=exited, action=none). In optional step 370, CDSMaster reports that the task failed; the task status identifier is set to execution failed and the task action identifier to none (state=error, action=none).
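One polling step (340 to 370) can be sketched as below. The `query` callable and its three result strings are hypothetical stand-ins for the CDSMaster status query; the state and action transitions follow the text.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    state: str = "running"
    action: str = "none"


async def poll_once(task: Task, query) -> None:
    """Poll one submitted task.

    `query` is a hypothetical async stand-in for the CDSMaster status query
    and returns "running", "succeeded", or "failed".
    """
    task.action = "checking"        # step 340
    result = await query(task.task_id)
    if result == "succeeded":       # step 360
        task.state = "exited"
    elif result == "failed":        # step 370
        task.state = "error"
    # step 350 ("running"): state left unchanged
    task.action = "none"
```

Setting `action` to `checking` before the await means any other coroutine that runs during the query can see the task is already being checked.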

These steps define the operations the executor performs when periodically polling task status; they further refine the executor's status information during the workflow and make it more accurate.

In some optional implementations of this embodiment, the executor is further configured to: in response to the task submitted to the block device service manager being a preset task and the submission succeeding, set the task status identifier of the submitted task to executed successfully and its task action identifier to none; and in response to the submitted task being a preset task and the submission failing, set the task status identifier to execution failed and the task action identifier to none.

In this implementation, a preset task may be a task, chosen by those skilled in the art from experience or the application scenario, whose status does not need to be polled, for example a short-lived task or a task whose execution result is not of interest. Specific examples include stat_volume (query disk mount information), delete_volume (delete a disk), and delete_snapshot (delete a snapshot). For these preset tasks the execution result is available as soon as they are submitted, so their status does not need to be polled. As shown in the optional implementation in FIG. 3, in optional step 380, if a preset task is submitted successfully, its status can be set directly to state=exited, action=none; if the submission of a preset task fails (a failure returned by the block device service rather than one caused by the environment), its status can be set directly to state=error, action=none, as in step 330 above.
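The preset-task shortcut amounts to deciding the final state at submission time. In this sketch the `PRESET_TASKS` set holds the three examples named above, and the function `after_submit` and its return convention (whether polling is still needed) are assumptions.

```python
from dataclasses import dataclass

# Example preset tasks named in the text; in practice the set is configurable.
PRESET_TASKS = {"stat_volume", "delete_volume", "delete_snapshot"}


@dataclass
class Task:
    task_id: str
    op: str
    state: str = "ready"
    action: str = "submitting"


def after_submit(task: Task, ok: bool) -> bool:
    """Apply a submission result; return True if the task still needs polling."""
    task.action = "none"
    if task.op in PRESET_TASKS:
        # Step 380 (success) or step 330 (failure): the final status is known now.
        task.state = "exited" if ok else "error"
        return False
    task.state = "running" if ok else "error"
    return ok
```

Only non-preset tasks that were accepted by the service end up in the polling loop, which is where the efficiency gain described below comes from.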

For preset tasks (such as short-lived tasks or tasks whose execution results are not of interest), this workflow eliminates polling and makes task submission more efficient.

In some optional implementations of this embodiment, the executor is further configured to send heartbeat requests to the in-memory database using a ping-pong mechanism. A heartbeat request includes: a ping_id identifying the current request number; a synchronization flag indicating whether this is the initial request; and a task list identifying the tasks that are executing, have succeeded, or have failed. The in-memory database is further configured to return a heartbeat response to the executor, which includes: a pong_id identifying the next request number; the not-executed/executing tasks; and the exited/failed tasks corresponding to the ping_id.

In this implementation, when the executor starts, it initializes its parameters and the not-executed queue (ready_queue), the executing task queue (running_queue), and the exited/failed task queue (exited_error_queue), which hold tasks in the corresponding states. The executor then reports a heartbeat to the in-memory database; the first report has ping_id=0 and sync=true (indicating the initial report) and carries no task information. The in-memory database returns the tasks whose status is not executed and/or executing (ready/running), together with pong_id=1; this value becomes the ping_id of the next reporting cycle and ensures that heartbeat requests are processed in order. Finally, the executor places the assigned tasks in the corresponding queues.
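The heartbeat messages and the start-up exchange can be sketched as plain data classes. The field names `ping_id`, `sync`, and `pong_id` come from the text; `tasks`, `assigned`, `acked`, and the helper functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatRequest:
    ping_id: int
    sync: bool  # True only for the initial report
    tasks: dict[str, str] = field(default_factory=dict)  # task_id -> running/exited/error


@dataclass
class HeartbeatResponse:
    pong_id: int  # ping_id to use in the next cycle
    assigned: dict[str, str] = field(default_factory=dict)  # task_id -> ready/running
    acked: list[str] = field(default_factory=list)  # exited/error tasks of this ping


def initial_request() -> HeartbeatRequest:
    # First report after start-up: ping_id=0, sync=true, no task information.
    return HeartbeatRequest(ping_id=0, sync=True)


def load_assignment(resp: HeartbeatResponse, ready_queue: dict, running_queue: dict) -> int:
    """Place assigned tasks in the matching queue and return the next ping_id."""
    for task_id, state in resp.assigned.items():
        (ready_queue if state == "ready" else running_queue)[task_id] = state
    return resp.pong_id
```

Because the next request must carry the returned `pong_id`, the database can tell a fresh cycle from a retry of the previous one.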

The parameters and workflow defined here for heartbeat requests and responses specify the data carried by each heartbeat report and make that data more accurate.

In some optional implementations of this embodiment, the executor is further configured to perform at least one of the following: in response to receiving a heartbeat response, update ping_id to the pong_id in the response, remove the exited/failed tasks identified in the response from the exited/failed queue, and add each not-executed task in the response to the executor's not-executed queue if it is not already there; in response to a failure to send a heartbeat request, re-send the heartbeat request to the in-memory database; and in response to a heartbeat request being sent successfully but no heartbeat response being received within a preset time, re-send the heartbeat request to the in-memory database with the same ping_id as the unanswered request, where the task list of executing/succeeded/failed tasks in the re-sent request need not match the one in the unanswered request.

In this implementation, the executor reports a heartbeat to the in-memory database with ping_id=n (the n-th reporting cycle), sync=false (not the initial report), and tasks=running/exited/error, i.e. the tasks that are executing, have succeeded, or have failed. The in-memory database then updates the task status and returns the not-executed tasks together with the exited/failed tasks (exited/error tasks) of this report. On receiving the heartbeat response, the executor updates ping_id and removes the received exited/error tasks from the exited/failed queue (exited_error_queue). For each received not-executed task (ready task), it first checks whether the task is already in the not-executed queue (ready_queue) and adds it only if it is not.
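Applying a heartbeat response can be sketched as one method. The `ExecutorState` layout, with queues as dicts of per-task status, is an assumption; the three steps (update ping_id, drop acknowledged exited/error tasks, add ready tasks only if absent) follow the text.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorState:
    ping_id: int = 0
    ready_queue: dict[str, dict] = field(default_factory=dict)
    exited_error_queue: dict[str, dict] = field(default_factory=dict)

    def handle_response(self, pong_id: int, acked: list[str], ready: list[str]) -> None:
        self.ping_id = pong_id
        for task_id in acked:
            # The database has recorded this final state, so it can be dropped.
            self.exited_error_queue.pop(task_id, None)
        for task_id in ready:
            # Add only if absent: an entry that is already queued (possibly
            # mid-submission) keeps its current action instead of being reset.
            self.ready_queue.setdefault(task_id, {"state": "ready", "action": "none"})
```

The add-if-absent rule is also what resolves the race described for task submission, where a heartbeat can report a task as pending while a submission coroutine is still working on it.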

While the executor is reporting heartbeats to the in-memory database, the following exceptional cases may also occur:

In the first case, the heartbeat request is not sent successfully, and the executor simply re-sends it.

In the second case, the heartbeat request is sent normally but no heartbeat response is received. The executor then re-sends the request with the same ping_id, although the contents of tasks may have changed. On receiving the first request with a given ping_id, the in-memory database can cache the status data of that request in memory and return the cached content for every later request with the same ping_id.

In the third case, execution may switch to another coroutine between the first and second cases. That coroutine may be performing the next cycle's heartbeat report, whose ping_id is the same as the current report's, so there is no effect; it may be submitting tasks, which has no effect; or it may be querying task status, which also has no effect.

In the fourth case, a timeout is set for the executor's heartbeat request, and the request is re-sent immediately once it times out; a heartbeat response that the in-memory database returns after the timeout does not interfere with the executor's reception and processing.
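The executor-side handling of these cases can be sketched as a resend loop. The `send` transport, the `snapshot` callable, and the attempt limit are hypothetical; what the sketch preserves from the text is that every resend carries the same ping_id while the task list is rebuilt and may differ.

```python
import asyncio


async def report_heartbeat(send, ping_id: int, snapshot, timeout: float, attempts: int):
    """Send one heartbeat cycle, resending on send failure or timeout.

    `send` is a hypothetical async transport that returns the response or
    raises ConnectionError when the request could not be sent. `snapshot()`
    returns the current task list, which may change between attempts;
    ping_id never does.
    """
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(send(ping_id, snapshot()), timeout)
        except (ConnectionError, asyncio.TimeoutError):
            continue  # cases 1, 2 and 4: resend immediately with the same ping_id
    return None
```

Keeping ping_id fixed is what allows the database to recognize a resend and answer it from its cache instead of applying it as a new cycle.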

This heartbeat reporting method addresses the problems that commonly arise during heartbeat reporting and improves the stability of the system.

In some optional implementations of this embodiment, the in-memory database is further configured to cache in memory the task list of executing/succeeded/failed tasks carried in a heartbeat request and, in response to a received heartbeat request having the same ping_id as a previously received one, to return to the executor the content cached for that earlier request.

In this implementation, when a heartbeat request has been sent successfully but no response arrives within the preset time, the executor re-sends the request with the same ping_id, and the task list of executing/succeeded/failed tasks in the re-sent request need not match the one in the original request. To handle this, the in-memory database caches the status data of the first request with a given ping_id in memory and returns the cached content for all later requests with the same ping_id, which guarantees that requests with the same ping_id receive consistent responses.
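A minimal sketch of the database side follows, assuming that what is cached is the response built from the first request with a given ping_id, and that only the most recent ping_id needs to be remembered because ping_ids advance sequentially. The class name, response fields, and single-entry cache are assumptions.

```python
class HeartbeatHandler:
    """In-memory-database side of the ping-pong protocol (a sketch).

    The first request with a given ping_id is applied and its response cached;
    a retry with the same ping_id gets the cached response even if its task
    list differs, so a retried heartbeat is never applied twice.
    """

    def __init__(self) -> None:
        self.task_states: dict[str, str] = {}
        self._last_ping_id: int | None = None
        self._last_resp: dict | None = None

    def handle(self, ping_id: int, tasks: dict[str, str]) -> dict:
        if ping_id == self._last_ping_id:
            return self._last_resp  # same ping_id: answer from the cache
        self.task_states.update(tasks)
        done = [t for t, s in tasks.items() if s in ("exited", "error")]
        resp = {"pong_id": ping_id + 1, "acked": done}
        # Single-entry cache: ping_ids advance one at a time, so only the
        # latest one can still be retried.
        self._last_ping_id, self._last_resp = ping_id, resp
        return resp
```

Answering a retry from the cache makes the heartbeat idempotent per ping_id, which is the consistency guarantee described above.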

In some optional implementations of this embodiment, the executor is further configured to perform at least one of the following: obtain tasks from the not-executed queue using a newly created coroutine; and run each periodic task in a newly created coroutine, the periodic tasks including at least one of periodic heartbeat reporting, periodic task submission, and periodic task status queries.

In this implementation, the executor starts three timed tasks, each running in its own newly created coroutine: periodic heartbeat reporting, periodic task submission, and periodic task status querying.
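The "one coroutine per timed task" arrangement can be sketched with Python's asyncio. asyncio is only a stand-in for whatever coroutine runtime the executor actually uses. The function names, intervals and the `record` placeholder are hypothetical, and the real work (sending heartbeats, submitting, querying) is replaced by appending the task name to a list.

```python
import asyncio


async def run_periodically(name, interval, action, stop):
    """Call `action(name)` every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        await action(name)
        try:
            # Sleep for `interval`, but wake up early if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def run_executor(duration=0.05, interval=0.01):
    """Start the three timed tasks, each in its own coroutine, then stop them."""
    calls = []

    async def record(name):
        # Placeholder for the real work (send heartbeat, submit, query).
        calls.append(name)

    stop = asyncio.Event()
    timers = [
        asyncio.create_task(run_periodically(name, interval, record, stop))
        for name in ("report_heartbeat", "submit_tasks", "query_task_status")
    ]
    await asyncio.sleep(duration)
    stop.set()
    await asyncio.gather(*timers)
    return calls
```

Running each timer in its own coroutine means a slow submission round does not delay heartbeat reporting.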

The block device management system of the above embodiments of the present disclosure improves the fault tolerance of block device management on the virtual machine side and reduces operation and maintenance costs.

With further reference to FIG. 4, FIG. 4 shows an exemplary structural diagram of yet another embodiment of the block device management system according to the present disclosure.

As shown in FIG. 4, the block device management system 400 of this embodiment builds on the block device management system shown in FIG. 2. Its in-memory database 240 includes:

- a service module of the block device management system (CinderService) 241, configured to receive heartbeat requests from the executor and data volume and snapshot operation requests from the application program interface;
- a management module of the block device management system (CinderManager) 242, configured to process those heartbeat requests and data volume and snapshot operation requests.

Alternatively or additionally, the in-memory database further includes:

- a data volume service module (VolumeService) 243, a service defined in PB (Protocol Buffers) that exposes CRUD interfaces for data volume metadata and is configured to receive operation requests on that metadata;
- a data volume management module (VolumeManager) 244, managed by the database manager and configured to persist operation requests on data volume metadata and to store and update that metadata.

Alternatively or additionally, the in-memory database further includes:

- a snapshot service module (SnapshotService) 245, a service defined in PB that exposes CRUD interfaces for snapshot metadata and is configured to receive operation requests on that metadata;
- a snapshot management module (SnapshotManager) 246, managed by the database manager and configured to persist operation requests on snapshot metadata and to store and update that metadata.
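The split between a service module that receives CRUD requests and a manager module that persists them can be sketched as two small classes. This is an illustrative layering under stated assumptions, not the actual VolumeService/VolumeManager code. The operation names, method names and the dict used as the backing store are hypothetical, and a real service would be generated from a PB definition and reached over RPC.

```python
class VolumeManager:
    """Hypothetical manager layer: stores and updates data volume metadata.

    A plain dict stands in for the database the real manager would use.
    """

    def __init__(self):
        self._store = {}

    def create(self, volume_id, meta):
        if volume_id in self._store:
            raise KeyError(f"volume {volume_id} already exists")
        self._store[volume_id] = dict(meta)
        return self._store[volume_id]

    def read(self, volume_id):
        return self._store[volume_id]  # raises KeyError if absent

    def update(self, volume_id, meta):
        self._store[volume_id].update(meta)
        return self._store[volume_id]

    def delete(self, volume_id):
        return self._store.pop(volume_id)


class VolumeService:
    """Hypothetical service layer: receives CRUD requests and delegates them."""

    def __init__(self, manager):
        self._manager = manager

    def handle(self, op, volume_id, meta=None):
        if op == "create":
            return self._manager.create(volume_id, meta or {})
        if op == "read":
            return self._manager.read(volume_id)
        if op == "update":
            return self._manager.update(volume_id, meta or {})
        if op == "delete":
            return self._manager.delete(volume_id)
        raise ValueError(f"unsupported operation: {op}")
```

The same shape would apply to the snapshot service and manager, with snapshot metadata in place of volume metadata.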

In summary, the in-memory database implements three services: data volume metadata storage, snapshot metadata storage, and job and task management. The design of each component is shown in the following table:

In some optional implementations of this embodiment, the client and the application program interface communicate with the in-memory database and the executor via RPC. The client and the application program interface communicate with each other via HTTP.

As shown in FIG. 4, dashed lines denote RPC calls and solid lines denote function calls. Previously, modules interacted through message middleware (Qpid, using the AMQP protocol) and SQL; that interaction is replaced with RPC. Replacing Qpid with BaiduRPC improves the fault tolerance of inter-system communication.

The block device management system of the embodiment in FIG. 4 builds on the system shown in FIG. 2. It splits the in-memory database into service modules that receive requests and management modules that process them, for the block device management system itself, for data volumes, and for snapshots. This strengthens the in-memory database's support for the executor and improves the stability of the block device management system.

Referring next to FIG. 5, it shows a schematic structural diagram of an electronic device 500 (e.g., the server or terminal device in FIG. 1) suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, notebook computers and desktop computers. The terminal device/server shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in FIG. 5, the electronic device 500 may include a processing device 501 (e.g., a central processing unit or a graphics processor). The processing device 501 may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores the programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504, and an input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices may be connected to the I/O interface 505:

- input devices 506, including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope;
- output devices 507, including, for example, a liquid crystal display (LCD), speaker, and vibrator;
- storage devices 508, including, for example, magnetic tape and hard disks;
- a communication device 509.

The communication device 509 may allow the electronic device 500 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 5 shows the electronic device 500 with various devices, not all of the illustrated devices need to be implemented or provided; more or fewer devices may be implemented or provided instead. Each block shown in FIG. 5 may represent one device or, as needed, multiple devices.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508 or the ROM 502. When the computer program is executed by the processing device 501, the above-described functions defined in the methods of the embodiments of the present disclosure are performed. The computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to:

- an electrical connection with one or more wires;
- a portable computer disk or a hard disk;
- random access memory (RAM), read-only memory (ROM), or erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber or a portable compact disk read-only memory (CD-ROM);
- an optical storage device or a magnetic storage device;
- any suitable combination of the foregoing.

In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted over any suitable medium, including but not limited to electrical wire, optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.

The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into that device. The medium carries one or more programs. When the programs are executed by the electronic device, they cause it to implement a block device management system that includes a client and an application program interface, and further includes:

- an executor, configured to submit tasks to a block device service manager module, query submitted tasks, obtain tasks from an in-memory database, and report task status;
- an in-memory database, configured to store metadata for device snapshots and data volumes and to manage jobs and tasks.

Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. These include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in reverse order, depending on the functionality involved. Each block of the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. It may also be implemented by a combination of dedicated hardware and computer instructions.

The modules and devices involved in the embodiments of the present disclosure may be implemented in software or in hardware. The described modules may also be arranged in a processor. For example, a processor may implement a block device management system that includes a client and an application program interface, and further includes an executor and an in-memory database. The names of these modules and devices do not, in some cases, limit the modules and devices themselves. For example, the executor may also be described as "a device that submits tasks to the block device service manager module, queries submitted tasks, obtains tasks from the in-memory database, and reports task status".

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept. An example is a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

1. A block device management system, comprising a client and an application program interface, wherein the block device management system further comprises:

an executor configured to submit tasks to a block device service manager module, query submitted tasks, obtain tasks from an in-memory database, and report task states to the in-memory database; and

the in-memory database, configured to store metadata for device snapshots and data volumes and to manage jobs and tasks;

wherein the state of a task in the executor includes a task state identifier and a task action identifier; the task state identifier takes one of the values: not executed, executing, execution failed, and execution succeeded; and the task action identifier takes one of the values: submit task, check task, and null;

the executor is further configured to:

before submitting a task indicated by a task Id, set the task state identifier of that task to not executed and its task action identifier to null;

obtain a task from an unexecuted queue, set the task action identifier of the obtained task to submit task, and use a newly created coroutine to call a block device service client to submit the obtained task to the block device service manager; and

in response to the submission result returned by the block device service client indicating that the task was submitted successfully, move the task indicated by the submission result from the unexecuted queue to an executing task queue, set its task state identifier to executing, and set its task action identifier to null.
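The state and action identifiers and the submission flow of claim 1 can be sketched as a small in-memory state machine. This is a hedged illustration, not the claimed implementation. The enum values, class names and the synchronous `submit_fn` callable (standing in for the block device service client called from a coroutine) are assumptions. Parking a failed task in a retired/failed queue anticipates claim 3.

```python
from collections import deque
from enum import Enum


class TaskState(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    FAILED = "execution failed"
    SUCCEEDED = "execution succeeded"


class TaskAction(Enum):
    SUBMIT = "submit task"
    CHECK = "check task"
    NONE = "null"


class Task:
    def __init__(self, task_id):
        # Before submission: state = not executed, action = null.
        self.task_id = task_id
        self.state = TaskState.NOT_EXECUTED
        self.action = TaskAction.NONE


class Executor:
    """Hypothetical executor holding the queues named in the claims."""

    def __init__(self, submit_fn):
        self.submit_fn = submit_fn  # stand-in for the block device service client
        self.unexecuted = deque()
        self.executing = {}
        self.retired_failed = {}

    def add(self, task_id):
        self.unexecuted.append(Task(task_id))

    def submit_next(self):
        task = self.unexecuted.popleft()
        task.action = TaskAction.SUBMIT
        if self.submit_fn(task.task_id):
            # Success: move to the executing queue, mark executing, clear the action.
            self.executing[task.task_id] = task
            task.state = TaskState.EXECUTING
            task.action = TaskAction.NONE
        else:
            # Failure (claim 3): park the task in the retired/failed queue.
            self.retired_failed[task.task_id] = task
        return task
```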

2. The block device management system of claim 1, wherein the in-memory database is further configured to split various upstream requests into task lists and distribute the task lists to the executor for execution;

the executor is further configured to track, based on the task Ids in the task lists distributed by the in-memory database, the state of the task indicated by each task Id in the client, the tracking comprising:

if the task indicated by the task Id has not started, submitting it to the block device service manager and tracking its state;

if the task indicated by the task Id is executing or completed, updating its state in the executor; and

periodically reporting a heartbeat to the in-memory database, thereby synchronizing the state of the task held in the executor's memory to the in-memory database.

3. The block device management system of claim 1, wherein the executor is further configured to:

in response to the submission result returned by the block device service client indicating that submission of the task failed, move the task indicated by the submission result from the unexecuted queue into a retired/failed queue and retry submitting it to the block device service manager; and

in response to the heartbeat response returned by the in-memory database indicating that a task has been consumed by the block device service but its state has not been updated, and that task not being present in the unprocessed task queue, add the task to the unprocessed task queue.

4. The block device management system of claim 3, wherein moving the task from the unexecuted queue into the retired/failed queue and retrying its submission to the block device service manager, in response to the submission result returned by the block device service client indicating that submission of the task failed, comprises:

moving the task indicated by the submission result from the unexecuted queue into the retired/failed queue, and retrying its submission to the block device service manager, when all of the following hold: the submission result returned by the block device service client indicates that submission of the task failed; the task is a resource-creation task; and the number of retries of submitting the task to the block device service manager is less than a preset threshold.
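The retry condition of claim 4 (retry only resource-creation tasks, and only while the retry count is below a preset threshold) can be sketched as a loop. This is a minimal illustration under stated assumptions. `SubmitTask`, `submit_with_retry`, the `creates_resource` flag, the default threshold of 3 and the synchronous `submit_fn` are all hypothetical, and the queues are plain deques/lists.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SubmitTask:
    task_id: str
    creates_resource: bool  # True for resource-creation tasks (hypothetical flag)
    retries: int = 0


def submit_with_retry(task, submit_fn, unexecuted, retired_failed, max_retries=3):
    """Submit `task`. On failure, retry only resource-creation tasks whose
    retry count is still below `max_retries`. Returns True on success."""
    while not submit_fn(task.task_id):
        if not (task.creates_resource and task.retries < max_retries):
            return False
        # Move the task from the unexecuted queue into the retired/failed queue.
        if task in unexecuted:
            unexecuted.remove(task)
        if task not in retired_failed:
            retired_failed.append(task)
        task.retries += 1
    return True
```

Restricting retries to creation tasks suggests that other task types are not simply resubmitted. The claim does not say why, so the sketch leaves their failure path to the caller.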

5. The block device management system of claim 3, wherein the executor is further configured to perform at least one of:

in response to executing a timed polling task after a task has been submitted, setting the task action identifier of the submitted task to check task and sending a query request for the submitted task to the block device service manager;

in response to the query result returned by the block device service manager for the query request indicating that the task is executing, keeping the task state identifier of the submitted task unchanged and setting its task action identifier to null;

in response to the query result returned by the block device service manager for the query request indicating that the task succeeded, setting the task state identifier of the submitted task to execution succeeded and setting its task action identifier to null; and

in response to the query result returned by the block device service manager for the query request indicating that the task failed, setting the task state identifier of the submitted task to execution failed and setting its task action identifier to null.
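The polling rules of claim 5 reduce to a small mapping from query result to task state. This is a hedged sketch, not the claimed code. The result strings "executing", "succeeded" and "failed", and the helper names `begin_check` and `apply_query_result`, are assumptions chosen for illustration.

```python
from enum import Enum


class TaskState(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    FAILED = "execution failed"
    SUCCEEDED = "execution succeeded"


class TaskAction(Enum):
    SUBMIT = "submit task"
    CHECK = "check task"
    NONE = "null"


class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.state = TaskState.EXECUTING  # already submitted successfully
        self.action = TaskAction.NONE


def begin_check(task):
    """The timed polling task fires: mark the submitted task as being checked."""
    task.action = TaskAction.CHECK


def apply_query_result(task, result):
    """Map a query result from the block device service manager onto the task."""
    if result == "succeeded":
        task.state = TaskState.SUCCEEDED
    elif result == "failed":
        task.state = TaskState.FAILED
    elif result != "executing":
        raise ValueError(f"unknown query result: {result}")
    # "executing" leaves the state unchanged; every recognized result clears the action.
    task.action = TaskAction.NONE
```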

6. The block device management system of claim 3, wherein the executor is further configured to:

in response to the task submitted to the block device service manager being a preset task that was submitted successfully, set the task state identifier of the submitted task to execution succeeded and its task action identifier to null; and

in response to the task submitted to the block device service manager being a preset task whose submission failed, set the task state identifier of the submitted task to execution failed and its task action identifier to null.

7. The block device management system of claim 3, wherein the executor is further configured to:

send a heartbeat request to the in-memory database based on a ping-pong mechanism, wherein the heartbeat request includes: a ping_id identifying the current request round, a synchronization flag identifying whether the current request is an initial request, and a task list identifying tasks that are executing, succeeded, or failed;

the in-memory database is further configured to return a heartbeat response to the executor, the heartbeat response including a pong_id identifying the next request round, and the unexecuted/executed tasks and the retired/failed tasks corresponding to the ping_id.

8. The block device management system of claim 7, wherein the executor is further configured to perform at least one of:

in response to receiving the heartbeat response: updating the ping_id to the pong_id in the heartbeat response; removing the retired/failed tasks identified in the heartbeat response from the retired/failed queue; and adding each unexecuted task in the heartbeat response that is not already in the executor's unexecuted queue to that queue;

in response to the heartbeat request failing to be sent, re-initiating the heartbeat request to the in-memory database;

in response to the heartbeat request being sent successfully but no heartbeat response being received within a preset time, re-initiating the heartbeat request to the in-memory database using the ping_id of that unanswered request, wherein the task list of executing/succeeded/failed tasks in the resent heartbeat request is not necessarily the same as the task list in the unanswered heartbeat request.
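The executor side of the ping-pong heartbeat in claims 7 and 8 can be sketched as a small state holder. This is a minimal illustration under stated assumptions. `HeartbeatResponse`, `HeartbeatClient` and the request dict keys are hypothetical, and transport, timeouts and the coroutine that sends the heartbeat are left out. Because ping_id only advances when a response arrives, a resend after a send failure or timeout reuses the unanswered ping_id.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatResponse:
    pong_id: int           # ping_id to use for the next heartbeat
    unexecuted: list       # tasks the executor should run
    retired_failed: list   # retired/failed tasks the executor may drop


class HeartbeatClient:
    """Hypothetical executor-side ping-pong heartbeat state."""

    def __init__(self):
        self.ping_id = 0
        self.unexecuted = []
        self.retired_failed = []

    def build_request(self, task_list, initial=False):
        # Until a response arrives, every (re)sent request carries the same
        # ping_id, even if the task list has changed in the meantime.
        return {"ping_id": self.ping_id, "sync": initial, "tasks": list(task_list)}

    def on_response(self, resp):
        self.ping_id = resp.pong_id
        # Drop retired/failed tasks the in-memory database has acknowledged.
        self.retired_failed = [t for t in self.retired_failed if t not in resp.retired_failed]
        # Add newly assigned unexecuted tasks, skipping ones already queued.
        for task in resp.unexecuted:
            if task not in self.unexecuted:
                self.unexecuted.append(task)
```

Together with the server-side ping_id cache in claim 9, this makes a resent heartbeat safe: the database answers a repeated ping_id with its cached result instead of applying a second, possibly different, task list.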

9. The block device management system of claim 8, wherein the in-memory database is further configured to:

cache in memory the task list in a heartbeat request that identifies tasks that are executing, succeeded, or failed; and, in response to the ping_id of a received heartbeat request being the same as the ping_id of a previously received heartbeat request, return to the executor the task list cached in memory for that previously received heartbeat request.

10. The block device management system of any of claims 1-9, wherein the executor is further configured to perform at least one of:

obtaining tasks from the unexecuted queue using a newly created coroutine;

executing each timed task using a newly created coroutine, wherein the timed tasks include at least one of: periodically reporting heartbeats, periodically submitting tasks, and periodically querying tasks.

11. The block device management system of claim 1, wherein the client and the application program interface communicate with the in-memory database and the executor using RPC, and the client and the application program interface communicate with each other using HTTP.

12. The block device management system of claim 1, wherein the in-memory database comprises:

a service module of the block device management system, configured to receive heartbeat requests from the executor and data volume and snapshot operation requests from the application program interface; and

a management module of the block device management system, configured to process the heartbeat requests from the executor and the data volume and snapshot operation requests from the application program interface.

13. The block device management system according to claim 1 or 12, wherein the in-memory database further includes:

a service module of the data volume, which is a service module defined in PB that provides CRUD interfaces for the metadata of the data volume and is configured to receive operation requests for that metadata; and

a management module of the data volume, managed by the database manager and configured to persist operation requests for the metadata of the data volume and to store and update that metadata.

14. The block device management system according to claim 1 or 12, wherein the in-memory database further includes:

a service module of the snapshot, which is a service module defined in PB that provides CRUD interfaces for the metadata of the snapshot and is configured to receive operation requests for that metadata; and

a management module of the snapshot, managed by the database manager and configured to persist operation requests for the metadata of the snapshot and to store and update that metadata.

15. An electronic device/terminal/server comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the block device management system of any one of claims 1-14.

16. A computer-readable medium storing a computer program which, when executed by a processor, implements the block device management system of any one of claims 1-14.