
CN118740743A - Data transmission method and system - Google Patents


Info

Publication number
CN118740743A
Authority
CN
China
Prior art keywords
controller
link
controllers
application server
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310366680.8A
Other languages
Chinese (zh)
Inventor
吴勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202310366680.8A
Publication of CN118740743A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • H04L45/243 Multipath using M+N parallel active paths
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data transmission method and system for reducing the probability that an application server retransmits data read/write requests. The storage system includes a storage server and an application server; the storage server includes multiple controllers, and the application server is connected to at least two of the multiple controllers. The method includes: when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, the storage server sends first information to the application server; and the application server switches the data to be transmitted over a first link to a second link, where the second link is a link between a second controller of the at least two controllers and the application server.

Description

Data transmission method and system

Technical Field

The present application relates to the field of cloud storage technology, and in particular to a data transmission method and system.

Background Art

Network attached storage (NAS) is a data storage approach in which storage is connected to a computer network; it is commonly used for file storage. A device that provides the data storage service may be called a storage server or a NAS device.

A storage server may include multiple controllers. An application server can access a controller over a link between the application server and that controller in order to read data from, or write data to, the storage server. However, when the data processing capability of a controller in the storage server degrades, for example because of hard disk aging, that controller processes data less efficiently than the other controllers in the storage server. When such a controller receives a new data read/write request, it discards the request because of insufficient processing capacity and asks the application server to resend it. As a result, data read/write requests are retransmitted many times on the link between that controller and the application server, which may congest the link.

Summary of the Invention

Embodiments of the present application provide a data transmission method and system for reducing the probability that an application server retransmits data read/write requests.

In a first aspect, a data transmission method is provided. The method may be implemented by a storage system that includes a storage server and an application server, where the storage server includes multiple controllers and the application server is connected to at least two of the multiple controllers. The method includes: when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, the storage server sends first information to the application server, where the first information instructs the application server to switch links and includes an identifier of a first link, the first link being a link between the first controller and the application server; and the application server switches the data to be transmitted over the first link to a second link, the second link being a link between a second controller of the at least two controllers and the application server.

In the embodiments of the present application, if the data processing efficiency of one controller is lower than that of the other controllers, the storage server can instruct the application server to migrate the data that was to be transmitted over a link between that controller and the application server (for example, the first link) to a link of another controller. This reduces the amount of data waiting to be processed by the less efficient controller, reduces the number of times the application server retransmits data read/write requests, lowers the probability of congestion, and improves the performance of the storage system.
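The link switch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `FirstInformation` and `ApplicationServer`, the per-link queue of pending requests, and the fallback of picking the link with the fewest pending requests are all hypothetical and are not taken from the application. The sketch also assumes that every link other than the first link leads to a different controller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FirstInformation:
    """Hypothetical message asking the application server to switch links."""
    first_link_id: str                    # identifier of the link to move away from
    second_link_id: Optional[str] = None  # optional target chosen by the storage server


@dataclass
class ApplicationServer:
    # Requests waiting to be sent, keyed by link identifier (assumed layout).
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def handle_first_information(self, info: FirstInformation) -> str:
        """Move everything queued on the first link to a second link."""
        target = info.second_link_id
        if target is None:
            # Assumption: fall back to the other link with the fewest pending requests.
            candidates = [link for link in self.pending if link != info.first_link_id]
            target = min(candidates, key=lambda link: len(self.pending[link]))
        moved = self.pending.get(info.first_link_id, [])
        self.pending[info.first_link_id] = []
        self.pending.setdefault(target, []).extend(moved)
        return target


app = ApplicationServer({"link-1": ["r1", "r2"], "link-2": ["r3"], "link-3": []})
chosen = app.handle_first_information(FirstInformation("link-1"))
print(chosen, app.pending)
```

Requests already queued on the first link are moved rather than dropped, which matches the stated goal of avoiding retransmission.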

In a possible implementation, the data processing efficiency of the first controller being lower than that of any one of the other controllers includes one or more of the following: within one cycle, the difference between the amount of data processed by that other controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by that other controller is greater than the amount processed by the first controller; the difference between the input/output operations per second (IOPS) of that other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of that other controller is greater than the IOPS of the first controller; a central processing unit (CPU) core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by that other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of that other controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of that other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

In the above technical solution, if within one cycle the first controller processes less data than any one of the other controllers, or the first controller takes longer than that controller to process a given amount of data, or the CPU usage of the first controller is higher than that of that controller, the first controller is processing data more slowly and its data processing capability has decreased. If a core isolation event occurs on the first controller, at least one core used for data processing in the first controller has failed and can no longer process data, so the first controller has fewer cores available for data processing and its data processing capability decreases. If the difference between the IOPS of the first controller in two adjacent seconds is greater than the fifth threshold, the rate at which the first controller processes data fluctuates widely and its performance is unstable. If the data processing efficiency of the first controller decreases, the first controller may discard newly received data because of insufficient processing capacity and may request the application server to resend that data; the second information sent by the first controller to the application server serves this purpose. Therefore, if the number of times the first controller sends the second information to the application server within the preset duration is greater than the sixth threshold, the first controller has discarded a large amount of data, that is, its data processing capability is low.
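The seven criteria above can be expressed as a single predicate. The following Python sketch is illustrative only: the field names, the `Thresholds` container, and the choice of units are assumptions, and the thresholds are assumed to be non-negative so that each "difference greater than threshold" test already implies the accompanying "greater than" condition.

```python
from dataclasses import dataclass


@dataclass
class ControllerStats:
    """Per-controller measurements (all field names are hypothetical)."""
    bytes_per_cycle: int    # amount of data processed in one cycle
    iops: float             # IOPS in the current second
    prev_iops: float        # IOPS in the previous second
    core_isolated: bool     # True if a CPU core isolation event occurred
    batch_seconds: float    # time taken to process a fixed amount of data
    cpu_usage: float        # CPU usage as a fraction between 0.0 and 1.0
    resend_requests: int    # "second information" messages sent in the preset duration


@dataclass
class Thresholds:
    data: int        # first threshold
    iops: float      # second threshold
    seconds: float   # third threshold
    cpu: float       # fourth threshold
    jitter: float    # fifth threshold
    resend: int      # sixth threshold


def is_less_efficient(first: ControllerStats, other: ControllerStats, t: Thresholds) -> bool:
    """Return True if at least one of the seven criteria holds for `first`."""
    checks = [
        other.bytes_per_cycle - first.bytes_per_cycle > t.data,
        other.iops - first.iops > t.iops,
        first.core_isolated,
        first.batch_seconds - other.batch_seconds > t.seconds,
        first.cpu_usage - other.cpu_usage > t.cpu,
        abs(first.iops - first.prev_iops) > t.jitter,
        first.resend_requests > t.resend,
    ]
    return any(checks)
```

Because the text says "one or more of the following", the sketch combines the criteria with `any`; a deployment that wants fewer false positives could instead require several criteria to hold at once.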

In a possible implementation, the method further includes: the storage server sets the state of the first link to a low-priority state.

The priority state can be used by a controller when selecting a link; for example, a controller preferentially selects a high-priority link. Setting the first link to the low-priority state therefore reduces the probability that a controller selects the first link to transmit data, which helps improve data transmission efficiency.

In a possible implementation, the method further includes: the storage server sends third information to the application server, where the third information instructs the application server to update the state of the first link to the low-priority state; and the application server sets the state of the first link to the low-priority state.

In the above technical solution, the storage server sends the link state to the application server and instructs it to synchronize the state, which helps reduce the probability that the application server transmits data over a low-priority link.
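A minimal Python sketch of priority-aware link selection and state synchronization follows. The `LinkState` enum, the tuple used for the third information, and the tie-break on load are assumptions made for illustration; the application only states that high-priority links are preferred.

```python
from enum import IntEnum
from typing import Dict, Tuple


class LinkState(IntEnum):
    LOW_PRIORITY = 0
    HIGH_PRIORITY = 1


def pick_link(states: Dict[str, LinkState], loads: Dict[str, int]) -> str:
    """Prefer high-priority links; among equal priorities, prefer the lower load."""
    return min(states, key=lambda link: (-states[link], loads[link]))


def apply_third_information(app_states: Dict[str, LinkState],
                            third_information: Tuple[str, LinkState]) -> None:
    """Synchronize the application server's copy of a link state (hypothetical message shape)."""
    link_id, state = third_information
    app_states[link_id] = state


storage_states = {"link-1": LinkState.HIGH_PRIORITY, "link-2": LinkState.HIGH_PRIORITY}
app_states = dict(storage_states)
loads = {"link-1": 1, "link-2": 5}

# The storage server demotes the first link, then tells the application server.
storage_states["link-1"] = LinkState.LOW_PRIORITY
apply_third_information(app_states, ("link-1", LinkState.LOW_PRIORITY))
print(pick_link(app_states, loads))
```

Keeping the low-priority link in the table, rather than deleting it, leaves it available as a last resort if every other link becomes unusable.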

In a possible implementation, the method further includes: the application server determines that the link with the smallest link load is the second link; or, the first information further indicates the second link.

In the above technical solution, when the storage server determines the second link, it performs load balancing across the links of all controllers, which can yield a higher degree of load balancing among links and/or among controllers. The degree of load balancing among links reflects the difference between the amounts of data transmitted over different links; the degree of load balancing among controllers reflects the difference between the amounts of data processed by different controllers. A small difference indicates a high degree of load balancing; a large difference indicates a low degree. When the application server determines the second link itself, the application server may have fewer links than the storage server, so fewer links need to be considered during load balancing, which helps the second link be determined more efficiently.
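To make the notion of "degree of load balancing" concrete, the Python sketch below measures imbalance as the gap between the most and least loaded links. That metric, and the helper names `load_spread` and `move_traffic`, are assumptions; the application describes the degree only qualitatively as the difference between per-link data volumes.

```python
from typing import Dict


def load_spread(loads: Dict[str, int]) -> int:
    """Gap between the most and least loaded links; smaller means better balanced."""
    return max(loads.values()) - min(loads.values())


def move_traffic(loads: Dict[str, int], source: str, amount: int) -> Dict[str, int]:
    """Return a copy of `loads` with `amount` moved from `source` to the least loaded other link."""
    target = min((link for link in loads if link != source), key=loads.get)
    updated = dict(loads)
    updated[source] -= amount
    updated[target] += amount
    return updated


loads = {"link-1": 10, "link-2": 2, "link-3": 6}
print(load_spread(loads), load_spread(move_traffic(loads, "link-1", 4)))
```

Choosing the least loaded link as the target is what drives the spread down; moving the same traffic to an already busy link could leave the spread unchanged or make it worse.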

In a possible implementation, the first information further indicates the second link, and the method further includes: the storage server obtains load information of at least one link between the application server and the controllers, other than the first controller, of the at least two controllers, and determines that the link with the smallest load among the at least one link is the second link; or, the storage server obtains load information of the controllers, other than the first controller, of the at least two controllers, determines that the controller with the smallest load is the second controller, and determines that the link with the smallest load among the links between the second controller and the application server is the second link.

In the above technical solution, when the storage server selects the link with the smallest load from the links of all these controllers, the load is balanced more evenly among links; when the storage server first determines the controller with the smallest load and then selects that controller's link with the smallest load, the load is balanced more evenly among controllers.
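The two selection strategies can give different answers, as the Python sketch below illustrates. The `Link` tuple, the function names, and the numeric loads are hypothetical; only the two strategies themselves come from the text.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    link_id: str
    controller: str
    load: int


def least_loaded_link(links: List[Link], first_controller: str) -> Link:
    """Strategy 1: the least loaded link among all controllers except the first."""
    candidates = [link for link in links if link.controller != first_controller]
    return min(candidates, key=lambda link: link.load)


def least_loaded_controller_link(links: List[Link],
                                 controller_loads: Dict[str, int],
                                 first_controller: str) -> Link:
    """Strategy 2: pick the least loaded controller first, then its least loaded link."""
    others = {c: load for c, load in controller_loads.items() if c != first_controller}
    second_controller = min(others, key=others.get)
    candidates = [link for link in links if link.controller == second_controller]
    return min(candidates, key=lambda link: link.load)


links = [
    Link("l1", "A", 5),
    Link("l2", "B", 1), Link("l3", "B", 9),
    Link("l4", "C", 3), Link("l5", "C", 4),
]
controller_loads = {"A": 5, "B": 10, "C": 7}

print(least_loaded_link(links, "A").link_id)
print(least_loaded_controller_link(links, controller_loads, "A").link_id)
```

In this made-up data, controller B owns the single quietest link but is busier overall than controller C, so the first strategy favors link balance and the second favors controller balance.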

In a second aspect, a data transmission method is provided. The method may be applied to a storage server in a storage system that includes the storage server and an application server, where the storage server includes multiple controllers and the application server is connected to at least two of the multiple controllers. The method includes: when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, sending first information to the application server, where the first information instructs the application server to switch links and includes an identifier of a first link, the first link being a link between the first controller and the application server.

In a possible implementation, the data processing efficiency of the first controller being lower than that of any one of the other controllers includes one or more of the following: within one cycle, the difference between the amount of data processed by that other controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by that other controller is greater than the amount processed by the first controller; the difference between the IOPS of that other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of that other controller is greater than the IOPS of the first controller; a CPU core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by that other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of that other controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of that other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

In a possible implementation, the method further includes: setting the state of the first link to a low-priority state.

In a possible implementation, the method further includes: sending third information to the application server, where the third information instructs the application server to update the state of the first link to the low-priority state.

In a possible implementation, the first information further indicates a second link, where the second link is a link between a second controller of the at least two controllers and the application server.

In a possible implementation, the method further includes: obtaining load information of at least one link between the application server and the controllers, other than the first controller, of the at least two controllers, and determining that the link with the smallest load among the at least one link is the second link; or, obtaining load information of the controllers, other than the first controller, of the at least two controllers, determining that the controller with the smallest load is the second controller, and determining that the link with the smallest load among the links between the second controller and the application server is the second link.

In a third aspect, a data transmission method is provided. The method may be applied to an application server in a storage system that includes a storage server and the application server, where the storage server includes multiple controllers and the application server is connected to at least two of the multiple controllers. The method includes: receiving first information from the storage server, where the first information is sent by the storage server when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, the first information instructs the application server to switch links, and the first information includes an identifier of a first link, the first link being a link between the first controller and the application server; and switching the data to be transmitted over the first link to a second link, the second link being a link between a second controller of the at least two controllers and the application server.

In a possible implementation, the data processing efficiency of the first controller being lower than that of any one of the other controllers includes one or more of the following: within one cycle, the difference between the amount of data processed by that other controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by that other controller is greater than the amount processed by the first controller; the difference between the IOPS of that other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of that other controller is greater than the IOPS of the first controller; a CPU core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by that other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of that other controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of that other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

In a possible implementation, the method further includes: receiving third information from the storage server, where the third information instructs the application server to update the state of the first link to a low-priority state; and setting the state of the first link to the low-priority state.

In a possible implementation, the first information further indicates the second link.

In a possible implementation, the method further includes: determining that the link with the smallest link load is the second link.

In a fourth aspect, a data transmission apparatus is provided. The apparatus is configured to implement the method of the second aspect or of any possible implementation of the second aspect, and includes corresponding functional modules, each configured to implement steps of that method. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In a possible implementation, the data transmission apparatus may be provided in a storage server in a storage system, or the data transmission apparatus is the storage server. The storage system includes the storage server and an application server, the storage server includes multiple controllers, and the application server is connected to at least two of the multiple controllers. The apparatus includes a processing module and a communication module. The processing module is configured to, when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, control the communication module to send first information to the application server, where the first information instructs the application server to switch links and includes an identifier of a first link, the first link being a link between the first controller and the application server.
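The division of work between the processing module and the communication module can be sketched as follows in Python. The class layout, the dictionary used as the first information, and the use of an IOPS gap as the only efficiency criterion are simplifying assumptions for illustration; a real apparatus could apply any of the criteria listed for the second aspect.

```python
from typing import Dict, List


class CommunicationModule:
    """Stands in for the transport to the application server; records what was sent."""

    def __init__(self) -> None:
        self.sent: List[dict] = []

    def send(self, message: dict) -> None:
        self.sent.append(message)


class ProcessingModule:
    """Decides when the first information should be sent (IOPS criterion only)."""

    def __init__(self, communication: CommunicationModule, iops_threshold: float) -> None:
        self.communication = communication
        self.iops_threshold = iops_threshold

    def evaluate(self, iops_by_controller: Dict[str, float],
                 link_by_controller: Dict[str, str]) -> bool:
        slowest = min(iops_by_controller, key=iops_by_controller.get)
        fastest = max(iops_by_controller, key=iops_by_controller.get)
        gap = iops_by_controller[fastest] - iops_by_controller[slowest]
        if gap > self.iops_threshold:
            # Hypothetical message shape carrying the identifier of the first link.
            self.communication.send({
                "type": "first_information",
                "first_link_id": link_by_controller[slowest],
            })
            return True
        return False


comm = CommunicationModule()
processing = ProcessingModule(comm, iops_threshold=100)
processing.evaluate({"A": 300, "B": 500}, {"A": "link-1", "B": "link-2"})
print(comm.sent)
```

Separating the decision from the transport mirrors the module split in the text and makes the decision logic testable without a network.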

In a possible implementation, the data processing efficiency of the first controller being lower than that of any one of the other controllers includes one or more of the following: within one cycle, the difference between the amount of data processed by that other controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by that other controller is greater than the amount processed by the first controller; the difference between the IOPS of that other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of that other controller is greater than the IOPS of the first controller; a CPU core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by that other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of that other controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of that other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

In a possible implementation, the processing module is further configured to set the state of the first link to a low-priority state.

In a possible implementation, the communication module is further configured to send third information to the application server, where the third information instructs the application server to update the state of the first link to the low-priority state.

In a possible implementation, the first information further indicates a second link, where the second link is a link between a second controller of the at least two controllers and the application server.

In a possible implementation, the processing module is further configured to: obtain load information of at least one link between the application server and the controllers, other than the first controller, of the at least two controllers, and determine that the link with the smallest load among the at least one link is the second link; or, obtain load information of the controllers, other than the first controller, of the at least two controllers, determine that the controller with the smallest load is the second controller, and determine that the link with the smallest load among the links between the second controller and the application server is the second link.

In a fifth aspect, another data transmission apparatus is provided. The apparatus is configured to implement the method of the third aspect or of any possible implementation of the third aspect, and includes corresponding functional modules, each configured to implement steps of that method. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In a possible implementation, the data transmission apparatus may be provided in an application server in a storage system, or the data transmission apparatus is the application server. The storage system includes a storage server and the application server, the storage server includes multiple controllers, and the application server is connected to at least two of the multiple controllers. The apparatus includes a communication module and a processing module. The communication module is configured to receive first information from the storage server, where the first information is sent by the storage server when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the other controllers of the at least two controllers, the first information instructs the application server to switch links, and the first information includes an identifier of a first link, the first link being a link between the first controller and the application server. The processing module is configured to switch the data to be transmitted over the first link to a second link, the second link being a link between a second controller of the at least two controllers and the application server.

In a possible implementation, the data processing efficiency of the first controller being lower than that of said any one controller includes one or more of the following: within one cycle, the difference between the amount of data processed by said any one controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by said any one controller is greater than the amount processed by the first controller; the difference between the IOPS of said any one controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of said any one controller is greater than the IOPS of the first controller; a central processing unit (CPU) core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by said any one controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of said any one controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of said any one controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

在一种可能的实施方式中,所述通信模块还用于:接收来自所述存储服务器的第三信息,所述第三信息用于指示将所述第一链路的状态更新为低优先级状态;所述处理模块还用于:将所述第一链路的状态设置为所述低优先级状态。In a possible implementation, the communication module is further used to: receive third information from the storage server, the third information being used to indicate that the state of the first link is updated to a low priority state; and the processing module is further used to: set the state of the first link to the low priority state.

在一种可能的实施方式中,所述第一信息还用于指示所述第二链路。In a possible implementation manner, the first information is also used to indicate the second link.

在一种可能的实施方式中,所述处理模块还用于:确定链路负载最小的链路为所述第二链路。In a possible implementation manner, the processing module is further configured to: determine that the link with the smallest link load is the second link.

第六方面,提供一种存储系统,该存储系统包括存储服务器和应用服务器,所述存储服务器包括多个控制器,所述应用服务器与所述多个控制器中的至少两个控制器连接,其中,所述存储服务器,用于在所述至少两个控制器中的第一控制器的数据处理效率低于所述至少两个控制器中除所述第一控制器外的任一个控制器时,向所述应用服务器发送第一信息;其中,所述第一信息用于指示所述应用服务器切换链路,所述第一信息包括第一链路的标识,所述第一链路为所述第一控制器与所述应用服务器之间的链路;所述应用服务器,用于将待通过所述第一链路传输的数据切换到第二链路,所述第二链路为所述至少两个控制器中的第二控制器与所述应用服务器之间的链路。In a sixth aspect, a storage system is provided, which includes a storage server and an application server, the storage server includes multiple controllers, and the application server is connected to at least two of the multiple controllers, wherein the storage server is used to send first information to the application server when the data processing efficiency of a first controller among the at least two controllers is lower than that of any controller among the at least two controllers except the first controller; wherein the first information is used to instruct the application server to switch a link, the first information includes an identifier of a first link, the first link being a link between the first controller and the application server; the application server is used to switch data to be transmitted through the first link to a second link, the second link being a link between the second controller among the at least two controllers and the application server.
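A minimal sketch of the application-server side of this exchange follows. The class names, the in-memory queues, and the `links` layout are hypothetical; the source only states that the first information carries the identifier of the first link and may additionally indicate the second link.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FirstInfo:
    first_link_id: str                    # identifier of the first link
    second_link_id: Optional[str] = None  # set only if the storage server indicates the second link

class AppServerPaths:
    def __init__(self, links):
        # links: link_id -> (controller_id, current_load); layout is an assumption
        self.links = dict(links)
        self.pending = {link_id: [] for link_id in links}

    def submit(self, link_id, request):
        self.pending[link_id].append(request)

    def on_first_info(self, msg):
        """Move data waiting on the first link to the second link and return its id."""
        target = msg.second_link_id
        if target is None:
            # Fall back to the least-loaded link of a different controller.
            first_ctrl = self.links[msg.first_link_id][0]
            others = {l: load for l, (c, load) in self.links.items() if c != first_ctrl}
            target = min(others, key=others.get)
        self.pending[target].extend(self.pending[msg.first_link_id])
        self.pending[msg.first_link_id].clear()
        return target
```

The fallback deliberately excludes every link of the first controller, since the second link is defined as a link to a second controller rather than merely a different link.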

In a possible implementation, the data processing efficiency of the first controller being lower than that of said any one controller includes one or more of the following: within one cycle, the difference between the amount of data processed by said any one controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by said any one controller is greater than the amount processed by the first controller; the difference between the read/write operations per second (IOPS) of said any one controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of said any one controller is greater than the IOPS of the first controller; a central processing unit (CPU) core isolation event occurs on the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by said any one controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU usage of the first controller and the CPU usage of said any one controller is greater than a fourth threshold, and the CPU usage of the first controller is greater than that of said any one controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

在一种可能的实施方式中,所述存储服务器还用于:将所述第一链路的状态设置为低优先级状态。In a possible implementation manner, the storage server is further configured to: set the state of the first link to a low priority state.

在一种可能的实施方式中,所述存储服务器还用于:向所述应用服务器发送第三信息,所述第三信息用于指示所述应用服务器将所述第一链路的状态更新为所述低优先级状态;所述应用服务器还用于:将所述第一链路的状态设置为所述低优先级状态。In a possible implementation, the storage server is further used to: send third information to the application server, wherein the third information is used to instruct the application server to update the state of the first link to the low priority state; the application server is further used to: set the state of the first link to the low priority state.

在一种可能的实施方式中,所述应用服务器,用于确定链路负载最小的链路为所述第二链路;或,所述第一信息还用于指示所述第二链路。In a possible implementation manner, the application server is used to determine that the link with the smallest link load is the second link; or the first information is also used to indicate the second link.

In a possible implementation, the first information is further used to indicate the second link, and the storage server is further configured to: obtain load information of at least one link between the application server and the controllers other than the first controller among the at least two controllers, and determine that the link with the smallest load among the at least one link is the second link; or, obtain load information of the controllers other than the first controller among the at least two controllers, determine that the controller with the smallest load is the second controller, and determine that the link with the smallest load among the links between the second controller and the application server is the second link.

第七方面,提供一种计算设备集群,该计算设备集群包括至少一个计算设备,每个计算设备包括处理器和存储器,其中,该处理器用于执行所述至少一个计算设备的存储器中存储的指令,以使得所述计算设备集群执行上述第二方面或第三方面提供的方法。In a seventh aspect, a computing device cluster is provided, which includes at least one computing device, each computing device including a processor and a memory, wherein the processor is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method provided in the second or third aspect above.

第八方面,提供一种计算机可读存储介质,所述计算机可读存储介质用于存储计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行如上述第二方面或第三方面提供的方法。In an eighth aspect, a computer-readable storage medium is provided, wherein the computer-readable storage medium is used to store a computer program. When the computer program is run on a computer, the computer executes the method provided in the second or third aspect above.

第九方面,提供一种计算机程序产品,包括计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行如上述第二方面或第三方面提供的方法。In a ninth aspect, a computer program product is provided, comprising a computer program, which, when executed on a computer, enables the computer to execute the method provided in the second or third aspect above.

第十方面,提供一种芯片系统,包括处理器和接口,所述处理器用于从所述接口调用并运行指令,以使所述芯片系统实现上述第二方面或第三方面提供的方法。In a tenth aspect, a chip system is provided, comprising a processor and an interface, wherein the processor is used to call and execute instructions from the interface so that the chip system implements the method provided in the second or third aspect above.

For the beneficial effects of the second to tenth aspects, refer to the beneficial effects of the first aspect; they are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic structural diagram of a storage system provided in an embodiment of the present application;

FIG. 2A is a schematic structural diagram of another storage system provided in an embodiment of the present application;

FIG. 2B is a schematic structural diagram of yet another storage system provided in an embodiment of the present application;

FIG. 3 is a schematic flowchart of a data transmission method provided in an embodiment of the present application;

FIG. 4 is a structural block diagram of a data transmission device provided in an embodiment of the present application;

FIG. 5 is a structural block diagram of another data transmission device provided in an embodiment of the present application;

FIG. 6 shows a computing device provided in an embodiment of the present application;

FIG. 7 shows a computing device cluster provided in an embodiment of the present application;

FIG. 8 shows another computing device cluster provided in an embodiment of the present application.

DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. The terms used in the detailed description are intended only to explain specific embodiments of the present application and are not intended to limit the present application.

In the embodiments of the present application, "multiple" means two or more; accordingly, "multiple" can also be understood as "at least two". "At least one" can be understood as one or more, for example one, two, or more. For example, including at least one means including one, two, or more, without restricting which ones are included; for example, including at least one of A, B, and C may mean including A, B, C, A and B, A and C, B and C, or A and B and C. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B can represent three cases: A alone, both A and B, and B alone. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the objects before and after it.

除非有相反的说明,本申请实施例提及“第一”、“第二”等序数词用于对多个对象进行区分,不用于限定多个对象的顺序、时序、优先级或者重要程度。Unless otherwise specified, the ordinal numbers such as "first" and "second" mentioned in the embodiments of the present application are used to distinguish multiple objects and are not used to limit the order, timing, priority or importance of the multiple objects.

At present, load balancing between a storage server and an application server in a storage system is mainly performed by obtaining the degree of load balancing among the links between each controller and the application server and/or the degree of load balancing among the different controllers in the storage server, and using these two degrees to determine the degree of load balancing of the storage system's links as a whole, that is, the overall degree of load balancing of all links in the storage system. Load balancing can then be performed based on this overall degree, so as to balance the load across the different controllers. The degree of load balancing of the links indicates the difference between the amounts of data transmitted over different links; the degree of load balancing of the controllers indicates the difference between the amounts of data processed by different controllers. A small difference indicates a high degree of load balancing; a large difference indicates a low degree. Optionally, a small difference may mean that the difference between the highest load and the lowest load is small, and a large difference may mean that the difference between the highest load and the lowest load is large. After such load balancing, the amount of data to be processed by each controller is basically the same. However, if a controller's data processing capability is reduced, for example because of hard disk aging, that controller takes longer than the other controllers to process the same amount of data. It cannot process newly received data in time and therefore discards it, which raises the probability that data transmitted over the link between that controller and the application server is retransmitted and degrades the performance of the storage system. For example, the storage server includes two controllers, controller 1 and controller 2, where controller 1 is connected to the application server through one link and controller 2 is connected to the application server through two links. If the data processing capability of controller 1 is reduced, then after the storage system is load balanced according to its overall degree of load balancing, the amounts of data to be transmitted over the three links are basically the same. However, because the data processing capability of controller 1 is lower than that of controller 2, the one link between controller 1 and the application server has a higher probability of retransmission, which affects the performance of the storage system.
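The three-link example can be made concrete with a small calculation. All rates and capacities below are invented for illustration and do not come from the source; the function name `excess_per_controller` is likewise hypothetical.

```python
def excess_per_controller(link_owner, total_rate, capacity):
    """Spread total_rate evenly over all links, then report how much traffic
    each controller receives beyond what it can process."""
    per_link = total_rate / len(link_owner)
    offered = {}
    for link, controller in link_owner.items():
        offered[controller] = offered.get(controller, 0.0) + per_link
    return {c: max(0.0, offered[c] - capacity[c]) for c in offered}

# Controller 1 has one link, controller 2 has two links (as in the example above).
link_owner = {"link1": "controller1", "link2": "controller2", "link3": "controller2"}
# Hypothetical numbers: 300 requests/s in total; controller 1 is degraded to 60 requests/s.
excess = excess_per_controller(
    link_owner, total_rate=300, capacity={"controller1": 60, "controller2": 250}
)
```

Under these assumed numbers every link carries 100 requests/s, so controller 1 receives 40 requests/s more than it can handle even though the links are perfectly balanced, while controller 2 still has headroom.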

鉴于此,本申请实施例中,存储服务器可以检测所包括的至少两个控制器中每个控制器的数据处理效率,如果出现某个控制器的数据处理效率小于其他控制器中的任一个控制器,则表明该控制器数据处理能力下降。此时,存储服务器可以指示应用服务器将待通过该控制器对应的某条链路传输的数据迁移到其他控制器的链路上传输,由此可以减少数据处理效率较低的控制器待处理的数据量,提升存储系统的性能。In view of this, in the embodiment of the present application, the storage server can detect the data processing efficiency of each of the at least two controllers included. If the data processing efficiency of a controller is lower than that of any of the other controllers, it indicates that the data processing capacity of the controller has decreased. At this time, the storage server can instruct the application server to migrate the data to be transmitted through a link corresponding to the controller to the link of other controllers for transmission, thereby reducing the amount of data to be processed by the controller with lower data processing efficiency and improving the performance of the storage system.

请参考图1,为本申请实施例提供的一种存储系统的示意图。该存储系统包括存储服务器10以及一个或多个应用服务器20。其中,存储服务器10可以为一个或多个应用服务器20提供数据存储服务,每个应用服务器20上安装有客户端(图中未示出),应用服务器20可以通过客户端访问存储服务器10,以向存储服务器10写入数据,或从存储服务器10读取数据。Please refer to Figure 1, which is a schematic diagram of a storage system provided in an embodiment of the present application. The storage system includes a storage server 10 and one or more application servers 20. The storage server 10 can provide data storage services for one or more application servers 20, and each application server 20 is installed with a client (not shown in the figure). The application server 20 can access the storage server 10 through the client to write data to the storage server 10 or read data from the storage server 10.

Optionally, different application servers 20 can access the storage server 10 through different access protocols to write and/or read data. The storage server 10 can support multiple access protocols, for example the server message block (SMB) protocol or the network file system (NFS) protocol. For example, an application server 20 running Windows can access the storage server 10 through the SMB protocol to write and/or read data; an application server 20 running Linux or Unix can access the storage server 10 through the NFS protocol to write and/or read data.

请参考图2A,为本申请实施例提供的另一种存储系统,该存储系统例如为图1所示的存储系统的一种细化结构。该存储系统包括存储服务器10和应用服务器20。其中,应用服务器20包括至少一个第一端口(port),存储服务器10包括至少两个控制器,每个控制器包括至少一个第二端口。以图2A为例,应用服务器20包括两个第一端口,分别为port-a和port-b,存储服务器10包括两个控制器,分别为控制器A和控制器B,控制器A包括两个第二端口,分别为port1和port2,控制器B包括两个第二端口,分别为port3和port4。可选的,一个第一端口和一个第二端口连接可以形成一条链路,应用服务器20与存储服务器10之间可以存在多条链路,该多条链路例如是至少一个第一端口和至少两个第二端口连接得到的。其中,该至少两个第二端口属于至少两个控制器。可以理解的是,该第一端口和第二端口可以是物理端口也可以是逻辑端口,本申请实施例不对端口类型进行限制。Please refer to FIG. 2A, which is another storage system provided in an embodiment of the present application. The storage system is, for example, a detailed structure of the storage system shown in FIG. 1. The storage system includes a storage server 10 and an application server 20. Among them, the application server 20 includes at least one first port (port), and the storage server 10 includes at least two controllers, each of which includes at least one second port. Taking FIG. 2A as an example, the application server 20 includes two first ports, namely port-a and port-b, and the storage server 10 includes two controllers, namely controller A and controller B, respectively. Controller A includes two second ports, namely port1 and port2, respectively, and controller B includes two second ports, namely port3 and port4. Optionally, a first port and a second port are connected to form a link, and there may be multiple links between the application server 20 and the storage server 10, and the multiple links are, for example, obtained by connecting at least one first port and at least two second ports. Among them, the at least two second ports belong to at least two controllers. It can be understood that the first port and the second port can be physical ports or logical ports, and the embodiment of the present application does not limit the port type.

Optionally, the multiple links may be obtained by connecting one first port of the application server to at least two second ports of controller A and controller B. Taking two links between the application server 20 and the storage server 10 as an example, the two links are obtained by connecting port-a to port1 and to port3, respectively. Alternatively, the multiple links may be obtained by connecting two first ports of the application server 20 to at least two second ports of controller A and controller B. Again taking two links between the application server 20 and the storage server 10 as an example, the two links are obtained by connecting port-a to port1 and port-b to port4.
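The two wiring options for FIG. 2A can be written down as data. This is a sketch only: representing a link as a `(first_port, second_port)` pair and the helper `controllers_reached` are assumptions made for illustration.

```python
# Which controller owns each second port, following the FIG. 2A description.
PORT_OWNER = {"port1": "A", "port2": "A", "port3": "B", "port4": "B"}

def controllers_reached(links):
    """Return the set of controllers reachable over the given links."""
    return {PORT_OWNER[second_port] for _first_port, second_port in links}

# Option 1: one first port fans out to second ports on both controllers.
single_port_links = [("port-a", "port1"), ("port-a", "port3")]
# Option 2: two first ports, each connected to a different controller.
dual_port_links = [("port-a", "port1"), ("port-b", "port4")]
```

Both layouts reach controller A and controller B, which is the property the method relies on: the application server must be connected to at least two controllers for a link switch to be possible.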

可选的,应用服务器20可以通过该多条链路访问控制器A和/或控制器B进行数据传输。例如,应用服务器20可以通过该多条链路向控制器A和/或控制器B发送数据读写请求,用于请求读取数据或写入数据。控制器A或控制器B在接收到该请求时,可以基于该请求写入或读取数据,并向应用服务器20发送反馈信息。例如如果应用服务器20通过该多条链路向控制器A发送数据读取请求,控制器A基于该数据读取请求读取数据,例如从分布式文件系统读取数据,并将读取的数据发送给应用服务器20。如果应用服务器20通过该多条链路向控制器A发送数据写入请求,控制器A可以将应用服务器20发送的数据写入分布式文件系统,并向应用服务器20发送数据已写入的反馈信息。Optionally, the application server 20 may access the controller A and/or the controller B through the multiple links for data transmission. For example, the application server 20 may send a data read and write request to the controller A and/or the controller B through the multiple links to request to read data or write data. When receiving the request, the controller A or the controller B may write or read data based on the request, and send feedback information to the application server 20. For example, if the application server 20 sends a data read request to the controller A through the multiple links, the controller A reads data based on the data read request, such as reading data from a distributed file system, and sends the read data to the application server 20. If the application server 20 sends a data write request to the controller A through the multiple links, the controller A may write the data sent by the application server 20 into the distributed file system, and send feedback information to the application server 20 that the data has been written.

可选的,当存储服务器10为多个应用服务器20提供数据存储服务时,多个应用服务器20还可以通过交换机与存储服务器10连接。例如请参考图2B,该存储系统还包括交换机30,应用服务器20的每个第一端口可以与交换机30建立连接,以及交换机30与存储服务器10的每个控制器的每个第二端口建立连接。此时,应用服务器20与存储服务器10的每条链路包括应用服务器20的第一端口与交换机30连接得到的第一子链路和交换机30与控制器的第二端口连接得到的第二子链路。应用服务器20可以通过第一子链路向交换机30发送数据读写请求,该请求可以携带第二端口的标识,交换机30可以基于该请求中携带的标识通过对应的第二子链路向控制器发送该请求。Optionally, when the storage server 10 provides data storage services for multiple application servers 20, the multiple application servers 20 may also be connected to the storage server 10 through a switch. For example, please refer to FIG. 2B , the storage system also includes a switch 30, each first port of the application server 20 may establish a connection with the switch 30, and the switch 30 may establish a connection with each second port of each controller of the storage server 10. At this time, each link between the application server 20 and the storage server 10 includes a first sub-link obtained by connecting the first port of the application server 20 with the switch 30 and a second sub-link obtained by connecting the switch 30 with the second port of the controller. The application server 20 may send a data read and write request to the switch 30 through the first sub-link, and the request may carry an identifier of the second port. The switch 30 may send the request to the controller through the corresponding second sub-link based on the identifier carried in the request.
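The forwarding step at the switch can be sketched as a lookup on the second-port identifier carried in the request. The `Switch` class, the dictionary-shaped request, and the callable used to model a second sub-link are all hypothetical.

```python
class Switch:
    def __init__(self):
        # second-port identifier -> callable that delivers over the second sub-link
        self.second_sublinks = {}

    def attach(self, second_port_id, deliver):
        self.second_sublinks[second_port_id] = deliver

    def forward(self, request):
        """Send the request over the second sub-link named by its port identifier."""
        self.second_sublinks[request["second_port"]](request)
```

For example, if `port1` and `port3` are attached, a request carrying `"second_port": "port3"` is delivered only to the controller that owns `port3`.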

In the above technical solution, the application server 20 is connected to the controllers of the storage server through multiple links. When a link fails, for example because a port of the application server 20 or a port of a controller suffers hardware aging, failure, power loss, or accidental unplugging, the services on the failed port can be quickly switched to other ports; after the failed port resumes operation, it can take back the services that were switched to other ports. In this way, the storage resources of the storage server can be fully used and stable transmission of service data can be ensured.

基于上述内容,下面结合说明书附图对本申请实施例提供的方法进行介绍。Based on the above content, the method provided in the embodiment of the present application is introduced below in conjunction with the drawings in the specification.

请参考图3,为本申请实施例提供的一种数据传输方法的流程示意图,该方法可以通过图1、图2A或图2B所示的存储系统实现。Please refer to FIG. 3 , which is a flow chart of a data transmission method provided in an embodiment of the present application. The method can be implemented by the storage system shown in FIG. 1 , FIG. 2A or FIG. 2B .

S301:当至少两个控制器中的第一控制器的数据处理效率低于该至少两个控制器中除第一控制器外的任一个控制器时,存储服务器10向应用服务器20发送第一信息。相应的,应用服务器20接收来自存储服务器10的该第一信息。其中,至少两个控制器是存储服务器10包括的部分或全部控制器。S301: When the data processing efficiency of a first controller among at least two controllers is lower than that of any controller other than the first controller among the at least two controllers, the storage server 10 sends first information to the application server 20. Accordingly, the application server 20 receives the first information from the storage server 10. The at least two controllers are part or all of the controllers included in the storage server 10.

在控制器的使用过程中,可能由于硬件老化等问题而导致某个控制器的数据处理能力降低。例如该控制器的数据处理效率可能低于存储服务器10中的其他控制器的数据处理效率,该控制器在接收到新的数据时,由于数据处理能力不足,会丢弃该新的数据,并请求应用服务器重新发送该新的数据。这会使得该控制器和应用服务器之间的链路上数据重传的次数较多,可能会导致链路拥塞,影响存储系统的性能。During the use of the controller, the data processing capability of a controller may be reduced due to hardware aging and other issues. For example, the data processing efficiency of the controller may be lower than that of other controllers in the storage server 10. When the controller receives new data, it will discard the new data due to insufficient data processing capability and request the application server to resend the new data. This will cause a large number of data retransmissions on the link between the controller and the application server, which may cause link congestion and affect the performance of the storage system.

Therefore, optionally, the storage server 10 may check each of the at least two controllers it includes, for example at scheduled times, periodically, or aperiodically, to determine the data processing efficiency of the at least two controllers. The storage server 10 can then determine from the check results whether any controller's data processing efficiency is lower than that of the other controllers. Alternatively, the storage server 10 need not check the controllers itself: each of some or all of the controllers included in the storage server may send its own data processing efficiency to the storage server, for example at scheduled times, periodically, or aperiodically, so that the storage server 10 obtains the data processing efficiency of these controllers and can determine whether any controller's data processing efficiency is lower than that of the other controllers. If the data processing efficiency of the first controller is lower than that of any controller other than the first controller among the at least two controllers, the storage server 10 may send the first information, so as to reduce the amount of data to be processed by the less efficient controller and lower the probability that the application server 20 retransmits data read/write requests. Alternatively, if no controller has a data processing efficiency lower than that of the other controllers, the storage server 10 may refrain from sending the first information.

其中,第一控制器的数据处理效率低于至少两个控制器中除第一控制器外的任一个控制器,可以理解为,第一控制器的数据处理效率低于至少两个控制器中除第一控制器外的所有控制器的数据处理效率。例如共有3个控制器,分别为控制器1~控制器3。控制器1的数据处理效率低于除控制器1外的任一个控制器的数据处理效率,可以理解为,控制器1的数据处理效率低于控制器2的数据处理效率,也低于控制器3的数据处理效率。Among them, the data processing efficiency of the first controller is lower than that of any controller other than the first controller among the at least two controllers, which can be understood as the data processing efficiency of the first controller is lower than the data processing efficiency of all controllers other than the first controller among the at least two controllers. For example, there are 3 controllers, namely controller 1 to controller 3. The data processing efficiency of controller 1 is lower than that of any controller other than controller 1, which can be understood as the data processing efficiency of controller 1 is lower than that of controller 2 and lower than that of controller 3.

Optionally, the storage server 10 may determine whether any controller's data processing efficiency is lower than that of the other controllers within a first time period. If the data processing efficiency of the first controller is lower than that of any other controller among the at least two controllers within the first time period, the storage server 10 may send the first information; otherwise, it need not. Using the first time period makes the determination more stable, reduces the probability that services switch back and forth, and reduces the information exchanged between the storage server 10 and the application server 20, which lowers transmission overhead.

Here, the data processing efficiency of the first controller being lower than that of any other controller among the at least two controllers within the first time period means that the data processing efficiency of the first controller is determined one or more times within the first time period, and either each determined value is lower than that of any other controller among the at least two controllers, or the average of the determined values is lower than that of any other controller among the at least two controllers.
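A minimal sketch of the two readings (every sample lower, or the average lower) is given below. How the other controllers are sampled is not spelled out in the text, so the sketch assumes each controller has one sample per measurement and compares sample against sample, or mean against mean. The name `lags_over_window` and its parameters are hypothetical.

```python
from statistics import mean

def lags_over_window(samples, candidate, use_average=False):
    """Check whether `candidate` was less efficient than every other
    controller throughout the first time period.

    `samples` maps a controller name to its efficiency samples taken
    within the first time period (larger means more efficient).
    """
    others = [c for c in samples if c != candidate]
    if not others or not samples[candidate]:
        return False
    if use_average:
        # Reading 2: the average over the period is lower than every other average.
        own = mean(samples[candidate])
        return all(own < mean(samples[o]) for o in others)
    # Reading 1: every individual sample is lower than the matching sample
    # of every other controller.
    return all(mine < theirs
               for o in others
               for mine, theirs in zip(samples[candidate], samples[o]))
```

The two readings can disagree: a single good sample breaks the per-sample test while the average may still stay below the other controllers.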

Optionally, the data processing efficiency of the first controller being lower than that of any other controller includes one or more of the following: within one cycle, the difference between the amount of data processed by the other controller and the amount of data processed by the first controller is greater than a first threshold, and the other controller processed more data than the first controller; the difference between the IOPS of the other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of the other controller is greater than that of the first controller; a CPU core isolation event occurs in the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by the other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU occupancy of the first controller and the CPU occupancy of the other controller is greater than a fourth threshold, and the CPU occupancy of the first controller is greater than that of the other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; and/or, within a preset duration, the number of times the first controller sends second information to the application server 20 is greater than a sixth threshold, where the second information requests the application server 20 to resend a data read/write request.

For example, the storage server 10 may periodically count the amount of data each controller processes in one cycle, determine the data processing efficiency of each of the at least two controllers from that amount, and determine from the data processing efficiency of each controller whether the first controller exists. Optionally, when the data processing efficiency is determined from the amount of data, each of the one or more determinations mentioned above can be represented by the amount of data processed in one cycle, and the average of the one or more determined values can be represented by the total amount of data processed within the first time period.

For example, the storage server 10 may count the amount of input/output (IO) data of each controller in one cycle and calculate the difference between the amounts of data processed by any two controllers in that cycle. If the absolute value of the difference between the amount of data processed by the first controller and that processed by any other controller in the cycle is greater than the first threshold, with the first controller being the one that processed less data, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists. Optionally, the amount of data a controller processes in one cycle may also be called the IO bandwidth of the controller.

The storage server 10 may also periodically count the IOPS of each controller, determine the data processing efficiency of each of the at least two controllers from the IOPS, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, the storage server 10 may periodically obtain the number of data writes and reads each controller performs within 1 second, calculate the total number of write and read operations each controller performs within 1 second, and calculate the absolute value of the difference between those totals for any two controllers. If the absolute value of the difference between the total of the first controller and that of any other controller is greater than the second threshold, with the first controller having the smaller total, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists.
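The IOPS comparison could look like the sketch below. It follows the directional form of the criterion (the other controller's IOPS exceeds the first controller's by more than the second threshold) rather than the bare absolute value. The names `total_iops` and `lags_by_iops` and the sample figures are illustrative assumptions.

```python
def total_iops(reads_per_second: int, writes_per_second: int) -> int:
    """IOPS taken as the total number of reads and writes in one second."""
    return reads_per_second + writes_per_second

def lags_by_iops(iops: dict, candidate: str, second_threshold: int) -> bool:
    """True if every other controller's IOPS exceeds the candidate's
    IOPS by more than `second_threshold`."""
    others = [v for k, v in iops.items() if k != candidate]
    return bool(others) and all(v - iops[candidate] > second_threshold
                                for v in others)

iops = {
    "controller1": total_iops(2000, 1000),   # 3000 operations per second
    "controller2": total_iops(6000, 4000),   # 10000 operations per second
}
print(lags_by_iops(iops, "controller1", 5000))  # True: 10000 - 3000 = 7000 > 5000
print(lags_by_iops(iops, "controller2", 5000))  # False: controller2 is the faster one
```

The same shape applies to the per-cycle data volume check, with bytes per cycle in place of operations per second and the first threshold in place of the second.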

The storage server 10 may also measure the IOPS of each controller in two adjacent seconds, determine the data processing efficiency of each of the at least two controllers from the IOPS, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, the storage server 10 may calculate, for each controller, the difference between its IOPS in two adjacent seconds. If the absolute value of that difference for the first controller is greater than the fifth threshold, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists.
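A small sketch of the adjacent-second check follows. The function name `is_unstable` and the idea of passing a list of per-second IOPS readings are assumptions made for illustration.

```python
def is_unstable(iops_per_second: list, fifth_threshold: int) -> bool:
    """True if the IOPS of any two adjacent seconds differ by more than
    `fifth_threshold` in absolute value."""
    pairs = zip(iops_per_second, iops_per_second[1:])
    return any(abs(later - earlier) > fifth_threshold for earlier, later in pairs)
```

For instance, readings of 10000, 9800 and 4000 with a threshold of 3000 trigger the check because of the 5800 drop between the last two seconds, whereas 10000, 9800 and 9900 do not.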

The storage server 10 may also detect whether a CPU core isolation event occurs in each controller, determine the data processing efficiency of each of the at least two controllers from the detection result, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, a controller may include multiple CPUs, and each CPU includes multiple processing cores (referred to simply as cores in the embodiments of this application). When a core in a CPU fails, the failed core is isolated and stops working so that the remaining cores, and therefore the CPU, can continue to provide service. Consequently, if a core isolation event occurs in a CPU of a controller, that CPU has fewer usable cores and its data processing capability is reduced, that is, the first controller exists, and the first controller is the controller to which the CPU with the core isolation event belongs.

The storage server 10 may also periodically measure the time each controller takes to process the same amount of data (for example, a first data amount), determine the data processing efficiency of each of the at least two controllers from that time, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, the storage server 10 may obtain the time each controller takes to process the first data amount and calculate the difference between the times taken by any two controllers. If the absolute value of the difference between the time taken by the first controller and the time taken by any other controller is greater than the third threshold, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists. Take a first data amount of 5 GB and a third threshold of 200 μs as an example: if the first controller takes 1 ms to process 5 GB of data and any other controller takes 500 μs, the difference is 500 μs, which is greater than 200 μs, so the storage server 10 can determine that the first controller exists.
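The arithmetic of this example can be checked with a short sketch. Reading 1 ms as 1000 μs, the gap to 500 μs is 500 μs, which exceeds the 200 μs threshold. The function name `lags_by_duration` is hypothetical.

```python
def lags_by_duration(first_us: int, other_us: int, third_threshold_us: int) -> bool:
    """True when the first controller needs more time than the other
    controller and the gap exceeds the third threshold (all in microseconds)."""
    return first_us > other_us and (first_us - other_us) > third_threshold_us

first_us = 1_000  # 1 ms taken by the first controller
other_us = 500    # 500 us taken by another controller

print(first_us - other_us)                         # 500
print(lags_by_duration(first_us, other_us, 200))   # True
```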

Optionally, when the first controller takes longer than any other controller to process the first data amount, data waiting to be processed by the first controller has to wait longer than data waiting to be processed by the other controller. This waiting time may also be called the IO latency.

The storage server 10 may also obtain the time each controller takes to process the first data amount on two consecutive occasions, determine the data processing efficiency of each of the at least two controllers from those times, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, the storage server 10 may calculate, for each controller, the difference between the times taken on two consecutive occasions to process the first data amount. If that difference for the first controller is greater than a threshold, the first controller processes data fast at one moment and slowly at the next, that is, the first controller is unstable, and the storage server 10 can determine that the first controller exists.

The storage server 10 may also periodically measure the CPU occupancy of each controller, determine the data processing efficiency of each of the at least two controllers from the CPU occupancy, and determine from the data processing efficiency of each controller whether the first controller exists.

For example, the storage server 10 may calculate the difference between the CPU occupancy of any two controllers. If the absolute value of the difference between the CPU occupancy of the first controller and that of any other controller is greater than the fourth threshold, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists. Take a fourth threshold of 5% as an example: if the CPU occupancy of the first controller is 70% and that of any other controller is 50%, the difference is 20%, which is greater than 5%, so the storage server 10 can determine that the first controller exists.

The storage server 10 may also periodically count, within a preset duration, the number of times each controller sends the application server 20 second information instructing the application server 20 to resend data, determine the data processing efficiency of each of the at least two controllers from that count, and determine from the data processing efficiency of each controller whether the first controller exists.

When the data processing capability of a controller drops, the controller is under heavy processing pressure; it may discard data received from the application server 20 and send the second information to the application server 20. Therefore, if the storage server 10 determines that the number of times the first controller sends the second information to the application server 20 within the preset duration exceeds the sixth threshold, the data processing efficiency of the first controller is lower than that of the other controllers among the at least two controllers, that is, the first controller exists.
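One way to count resend requests within a preset duration is a sliding window, sketched below. The text only says the count is taken within a preset duration, so the sliding-window choice, the class name `ResendCounter` and its methods are all assumptions.

```python
from collections import deque

class ResendCounter:
    """Counts how often one controller sent the second information
    (a resend request) within the last `window_s` seconds."""

    def __init__(self, window_s: float, sixth_threshold: int):
        self.window_s = window_s
        self.sixth_threshold = sixth_threshold
        self._times = deque()  # timestamps of resend requests, oldest first

    def record(self, now_s: float) -> None:
        """Note that the controller sent the second information at `now_s`."""
        self._times.append(now_s)

    def exceeds(self, now_s: float) -> bool:
        """True if the count inside the window is above the sixth threshold."""
        cutoff = now_s - self.window_s
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()  # drop requests that fell out of the window
        return len(self._times) > self.sixth_threshold
```

A fixed, non-overlapping counting period would fit the description equally well; the sliding window merely avoids missing a burst that straddles two periods.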

In other words, the storage server 10 can decide from the efficiency with which the controllers process data whether the load needs to be balanced between controllers, which helps reduce the probability that the application server 20 retransmits data read/write requests.

In some embodiments, the storage server 10 may also measure the load of the second port included in each controller and/or of the link between each controller and the application server 20, and decide from that load whether S301 needs to be performed. For example, if the load shows that different ports or links are unevenly loaded, that is, load balancing is needed, the storage server 10 may perform S301; otherwise, S301 is not performed. The load includes, for example, IO bandwidth, IO latency, and/or IOPS. Optionally, if the IO of a second port or link is 0 for a certain period of time, that second port or link is faulty, for example because the port's network connection is down.
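The link check can be sketched as follows. The text does not define what counts as "uneven", so the ratio test and the default `imbalance_ratio` are purely illustrative assumptions, as is the function name `classify_links`.

```python
def classify_links(io_bytes: dict, imbalance_ratio: float = 2.0):
    """Split links into faulty ones and report whether the rest are uneven.

    `io_bytes` maps a link (or second port) name to the IO observed on it
    over the observation period. A link with zero IO is treated as faulty.
    Links are considered uneven when the busiest active link carries more
    than `imbalance_ratio` times the IO of the least busy active link.
    """
    faulty = sorted(link for link, b in io_bytes.items() if b == 0)
    active = {link: b for link, b in io_bytes.items() if b > 0}
    uneven = (len(active) >= 2 and
              max(active.values()) > imbalance_ratio * min(active.values()))
    return faulty, uneven

print(classify_links({"link1": 900, "link2": 100, "link3": 0}))
# (['link3'], True)
```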

When the storage server 10 determines that the first controller exists, it may send the first information to the application server 20. Optionally, the storage server 10 may provide data storage services for multiple application servers 20. Each of the at least two controllers can therefore process data from multiple application servers 20, and the storage server 10 may send the first information to multiple application servers 20. For example, if the storage server 10 and the application servers 20 are connected as shown in FIG. 2B and the first link determined by the storage server 10 is one of multiple second sub-links (for example, sub-link 1), the storage server 10 may send the first information to all application servers 20 that exchange data with the first controller over sub-link 1, so that all of those application servers 20 switch links.

Optionally, the first information may include an identifier of the link to be switched (for example, the first link). The first link may be one link or multiple links. It may be selected by the storage server 10 at random from the links corresponding to the first controller, or selected from those links according to link load. This application does not limit the number of links in the first link or the way the storage server 10 determines the first link. The following embodiments take a first link consisting of one link as an example.

Optionally, when each controller processes data from multiple application servers 20 and the number of CPUs processing data in the controller is smaller than the number of application servers 20, one CPU has to process data from multiple application servers 20. For example, if controller A has 2 CPUs processing data and has to process data from 10 application servers 20, each CPU in controller A can process data from 5 application servers 20. In this case, each CPU processes the data of its application servers 20 by time slicing. For example, CPU 1 processes the data of the first application server 20 for a certain length of time, then processes the data of the second application server 20 for the same length of time, then the data of the third application server 20, and so on; after it has processed the data of the last application server 20, it starts again with the data of the first application server 20.
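The time-slicing behaviour described here resembles round-robin scheduling, which can be sketched as below. The function name `time_slice`, the millisecond units and the idea of tracking remaining work per application server are assumptions used only to make the rotation concrete.

```python
from collections import deque

def time_slice(pending_ms: dict, slice_ms: int) -> list:
    """Serve each application server for at most `slice_ms` in turn and
    return the order of service as (server, milliseconds served) pairs.

    `pending_ms` maps an application server name to the processing time
    its queued data still needs on this CPU.
    """
    if slice_ms <= 0:
        raise ValueError("slice_ms must be positive")
    queue = deque(pending_ms.items())
    schedule = []
    while queue:
        server, remaining = queue.popleft()
        served = min(slice_ms, remaining)
        schedule.append((server, served))
        if remaining > served:
            # Unfinished work goes to the back of the queue and waits
            # until every other server has had its slice.
            queue.append((server, remaining - served))
    return schedule

print(time_slice({"app1": 3, "app2": 1, "app3": 2}, slice_ms=2))
# [('app1', 2), ('app2', 1), ('app3', 2), ('app1', 1)]
```

The rotation shows why data from one application server may wait while the CPU serves the others, which is the waiting time that grows when a controller slows down.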

S302: The application server 20 switches the data to be transmitted over the first link to the second link.

Optionally, the second link is a link between a second controller among the at least two controllers and the application server 20. The second link may be determined by the application server 20 itself or specified by the storage server 10, and it may be one link or multiple links; the embodiments of this application do not limit the number of second links. The following embodiments take a second link consisting of one link as an example.

If the second link is specified by the storage server 10, the storage server 10 may determine the second link in either of the following ways:

Mode 1: The storage server 10 obtains the load of at least one link, among the multiple links, that corresponds to a controller other than the first controller, selects the link with the smallest load from the at least one link, and determines that link as the second link.

Mode 2: The storage server 10 determines the load of the controllers other than the first controller among the at least two controllers and selects the controller with the smallest load as the target controller. The storage server 10 then obtains the load of the links corresponding to the target controller, selects the link with the smallest load among them, and determines that link as the second link. Here, the controllers other than the first controller among the at least two controllers have roughly the same data processing efficiency.
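The two modes can be sketched side by side. The data layout (a load per link, an owning controller per link, a load per controller) and the function names `pick_mode1` and `pick_mode2` are assumptions made for illustration.

```python
def pick_mode1(link_load: dict, link_owner: dict, first_controller: str):
    """Mode 1: least-loaded link among all links not owned by the first controller."""
    candidates = [l for l in link_load if link_owner[l] != first_controller]
    return min(candidates, key=link_load.get) if candidates else None

def pick_mode2(link_load: dict, link_owner: dict,
               controller_load: dict, first_controller: str):
    """Mode 2: pick the least-loaded other controller first, then the
    least-loaded link of that target controller."""
    others = [c for c in controller_load if c != first_controller]
    if not others:
        return None
    target = min(others, key=controller_load.get)
    candidates = [l for l in link_load if link_owner[l] == target]
    return min(candidates, key=link_load.get) if candidates else None

link_load = {"l1": 10, "l2": 40, "l3": 30, "l4": 5}
link_owner = {"l1": "c2", "l2": "c2", "l3": "c3", "l4": "c1"}
controller_load = {"c1": 90, "c2": 70, "c3": 20}

print(pick_mode1(link_load, link_owner, "c1"))                    # l1
print(pick_mode2(link_load, link_owner, controller_load, "c1"))   # l3
```

The sample data is chosen so that the modes disagree: Mode 1 picks the idlest link overall, while Mode 2 prefers the idlest controller even though its best link carries more load.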

If the second link is determined by the application server 20 itself, the application server 20 may determine it in the same way as Mode 1 above; details are not repeated here.

Optionally, the storage server 10 may also set the state of the first link to a low-priority state (also called a degraded state) and send third information to the application server 20 to instruct the application server 20 to update the state of the first link to the degraded state. Accordingly, on receiving the third information, the application server 20 may also set the state of the first link to the degraded state.
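On the application server side, handling the first information and the third information might look like the following sketch. The class `AppServerPaths`, its method names and the state strings are hypothetical; the application does not prescribe a message format or a state representation.

```python
from dataclasses import dataclass, field

@dataclass
class AppServerPaths:
    """Tracks which link the application server currently sends on and
    the state it has recorded for each link."""
    active: str
    states: dict = field(default_factory=dict)  # link -> "normal" or "degraded"

    def on_first_info(self, first_link: str, second_link: str) -> None:
        """S302: move pending traffic from the first link to the second link."""
        if self.active == first_link:
            self.active = second_link

    def on_third_info(self, first_link: str) -> None:
        """Record the low-priority (degraded) state of the first link."""
        self.states[first_link] = "degraded"
```

Keeping the degraded link in the table, rather than deleting it, leaves it available for later rebalancing or as a fallback path.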

Optionally, after the application server 20 switches the data to be transmitted over the first link to the second link, no data is transmitted on the first link, so the storage server 10 may also balance the links corresponding to the first controller. For example, the storage server 10 may migrate data carried by the other links between the first controller and the application servers, that is, the links other than the first link, onto the first link. Optionally, therefore, the storage server 10 may also send fourth information to at least one application server 20 that exchanges data with the first controller over those other links, instructing the at least one application server 20 to switch part of the data to be transmitted over the other links to the first link.

In the above technical solution, after the application server 20 switches the data to be transmitted over the first link to the second link, the at least two controllers process data with roughly the same efficiency. Take a storage server 10 with two controllers, controller 1 and controller 2, where the first link belongs to controller 1 and the second link belongs to controller 2. Before balancing, controller 1 has an IO bandwidth of 5 GB/s, an IO latency of 1 ms, and a CPU utilization of 100%, while controller 2 has an IO bandwidth of 5 GB/s, an IO latency of 500 μs, and a CPU utilization of 50%. After balancing, controller 1 has an IO bandwidth of 3 GB/s, an IO latency of 500 μs, and a CPU utilization of 70%, while controller 2 has an IO bandwidth of 7 GB/s, an IO latency of 500 μs, and a CPU utilization of 70%. After balancing, controller 1 and controller 2 therefore have the same IO latency, the same data processing efficiency, and the same CPU utilization, which reduces the probability that the application server 20 retransmits data read/write requests and improves the performance of the storage system.

In combination with the above method embodiments, the embodiments of this application also provide a data transmission apparatus, which is used to perform the method performed by the storage server 10 in the above method embodiments, or to perform the method performed by the application server 20 in the above method embodiments.

Refer to FIG. 4, which shows a data transmission apparatus provided in an embodiment of this application. The apparatus is used to perform the method performed by the storage server 10 in the above method embodiments and includes a processing module 401 and a communication module 402. The processing module 401 is configured to control the communication module 402 to send first information to the application server when the data processing efficiency of a first controller among the at least two controllers is lower than that of any controller other than the first controller among the at least two controllers. The first information instructs the application server to switch links and includes an identifier of a first link, and the first link is a link between the first controller and the application server.

In a possible implementation, the data processing efficiency of the first controller being lower than that of the any controller includes one or more of the following: within one cycle, the difference between the amount of data processed by the any controller and the amount of data processed by the first controller is greater than a first threshold, and the any controller processed more data than the first controller; the difference between the IOPS of the any controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of the any controller is greater than that of the first controller; a central processing unit (CPU) core isolation event occurs in the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by the any controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU occupancy of the first controller and the CPU occupancy of the any controller is greater than a fourth threshold, and the CPU occupancy of the first controller is greater than that of the any controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information requests the application server to resend a data read/write request.

In a possible implementation, the processing module 401 is further configured to set the state of the first link to a low-priority state.

In a possible implementation, the communication module 402 is further configured to send second information to the application server, where the second information requests the application server to update the state of the first link to the low-priority state.

In a possible implementation, the first information further indicates a second link, and the second link is a link between a second controller among the at least two controllers and the application server.

In a possible implementation, the processing module 401 is further configured to: obtain load information of at least one link between the application server and a controller other than the first controller among the at least two controllers, and determine the link with the smallest load among the at least one link as the second link; or obtain load information of the controllers other than the first controller among the at least two controllers, determine the controller with the smallest load among them as the second controller, and determine the link with the smallest load among the links between the second controller and the application server as the second link.

Referring to FIG. 5, an embodiment of the present application provides a data transmission apparatus configured to perform the method performed by the application server 20 in the foregoing method embodiments. The data transmission apparatus includes a communication module 501 and a processing module 502. The communication module 501 is configured to receive first information from the storage server, where the first information is sent by the storage server when the data processing efficiency of a first controller of the at least two controllers is lower than that of any one of the at least two controllers other than the first controller. The first information is used to instruct the application server to switch links and includes an identifier of a first link, where the first link is a link between the first controller and the application server. The processing module 502 is configured to switch data to be transmitted through the first link to a second link, where the second link is a link between a second controller of the at least two controllers and the application server.

In a possible implementation, the data processing efficiency of the first controller being lower than that of any one of the other controllers includes one or more of the following: within one period, the difference between the amount of data processed by that other controller and the amount of data processed by the first controller is greater than a first threshold, and the amount of data processed by that other controller is greater than the amount processed by the first controller; the difference between the IOPS of that other controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of that other controller is greater than the IOPS of the first controller; a central processing unit (CPU) core isolation event occurs in the first controller; the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by that other controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration; the difference between the CPU occupancy rate of the first controller and the CPU occupancy rate of that other controller is greater than a fourth threshold, and the CPU occupancy rate of the first controller is greater than that of the other controller; the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or, within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, where the second information is used to request the application server to resend a data read/write request.

In a possible implementation, the communication module 501 is further configured to receive second information from the storage server, where the second information is used to request that the state of the first link be updated to a low-priority state; and the processing module 502 is further configured to set the state of the first link to the low-priority state.

In a possible implementation, the first information is further used to indicate an identifier of the second link.

In a possible implementation, the processing module 502 is further configured to obtain load information of at least one link between the application server and the controllers, among the at least two controllers, other than the first controller, and determine that the link with the smallest load among the at least one link is the second link.
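
The application-server side of the apparatus in FIG. 5 can be sketched as a small path manager. This is an illustrative Python sketch only: the class `ApplicationServerPaths` and its methods are hypothetical names, each link is assumed to terminate on a different controller, and queue depth is used as a stand-in for link load because the application does not fix a load metric.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set


class ApplicationServerPaths:
    """Tracks pending requests per link and reacts to storage-server messages."""

    def __init__(self, link_ids: Iterable[str]) -> None:
        self.queues: Dict[str, Deque[str]] = {link: deque() for link in link_ids}
        self.low_priority: Set[str] = set()

    def submit(self, link: str, request: str) -> None:
        """Queue a read/write request for transmission on the given link."""
        self.queues[link].append(request)

    def on_first_information(self, first_link: str,
                             second_link: Optional[str] = None) -> str:
        """Move data pending on first_link to a second link and return that link.

        If the storage server did not indicate a second link, pick the other
        link with the fewest pending requests (assumed load metric).
        """
        if second_link is None:
            others = [link for link in self.queues if link != first_link]
            second_link = min(others, key=lambda link: len(self.queues[link]))
        pending = self.queues[first_link]
        while pending:
            # Preserve request order while draining the degraded link.
            self.queues[second_link].append(pending.popleft())
        return second_link

    def on_low_priority_request(self, link: str) -> None:
        """Mark a link as low priority at the storage server's request."""
        self.low_priority.add(link)
```

A caller would invoke `on_first_information` when the first information arrives and `on_low_priority_request` when the storage server asks for the link state to be updated; new requests can then avoid links present in `low_priority`.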

The present application further provides a computing device 600. As shown in FIG. 6, the computing device 600 includes a bus 601, a processor 602, a memory 603, and a communication interface 604. The processor 602, the memory 603, and the communication interface 604 communicate with one another through the bus 601. The computing device 600 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 600.

The bus 601 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one line is used in FIG. 6, but this does not mean that there is only one bus or only one type of bus. The bus 601 may include a path for transferring information between the components of the computing device 600 (for example, the memory 603, the processor 602, and the communication interface 604).

The processor 602 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP), or other processors.

The memory 603 may include a volatile memory, such as a random access memory (RAM). The memory 603 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

The memory 603 stores executable program code, and the processor 602 executes the executable program code to implement the functions of the processing module 401 and the communication module 402, or the functions of the communication module 501 and the processing module 502, thereby implementing the data transmission method. That is, the memory 603 stores instructions for performing the data transmission method.

Alternatively, the memory 603 stores executable code, and the processor 602 executes the executable code to implement the functions of the storage server and the application server, respectively, thereby implementing the data transmission method. That is, the memory 603 stores instructions for performing the data transmission method.

The communication interface 604 uses a transceiver module, for example but not limited to a network interface card or a transceiver, to implement communication between the computing device 600 and other devices or a communication network.

An embodiment of the present application further provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

As shown in FIG. 7, the computing device cluster includes at least one computing device 600. The memories 603 of one or more computing devices 600 in the cluster may store the same instructions for performing the data transmission method.

In some possible implementations, the memories 603 of one or more computing devices 600 in the cluster may each store part of the instructions for performing the data transmission method. In other words, a combination of one or more computing devices 600 may jointly execute the instructions for performing the data transmission method.

It can be understood that the memories 603 of different computing devices 600 in the cluster may store different instructions, each used to perform part of the functions of the storage server. That is, the instructions stored in the memories 603 of different computing devices 600 may implement the functions of one or more modules of the application server and the storage server.

In some possible implementations, one or more computing devices in the cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 8 shows one possible implementation. As shown in FIG. 8, two computing devices 600A and 600B are connected through a network; specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 603 of the computing device 600A stores instructions for performing the functions of the storage server.

It should be understood that the functions of the computing device 600A shown in FIG. 8 may also be performed by multiple computing devices 600. Similarly, the functions of the computing device 600B may also be performed by multiple computing devices 600.

An embodiment of the present application further provides another computing device cluster. For the connections between the computing devices in this cluster, reference may be made to the connection manner of the computing device cluster described with FIG. 7 and FIG. 8. The difference is that the memories 603 of one or more computing devices 600 in this cluster may store the same instructions for performing the data transmission method.

In some possible implementations, the memories 603 of one or more computing devices 600 in the cluster may each store part of the instructions for performing the data transmission method. In other words, a combination of one or more computing devices 600 may jointly execute the instructions for performing the data transmission method.

It can be understood that the memories 603 of different computing devices 600 in the cluster may store different instructions for performing part of the functions of the storage system. That is, the instructions stored in the memories 603 of different computing devices 600 may implement the functions of one or more apparatuses of the application server and the storage server.

An embodiment of the present application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any available medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the data transmission method.

An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the data transmission method.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. A data transmission method, applied to a storage system, the storage system including a storage server and an application server, the storage server including a plurality of controllers, the application server being connected to at least two controllers of the plurality of controllers, the method comprising:
When the data processing efficiency of a first controller in the at least two controllers is lower than that of any one controller except the first controller in the at least two controllers, the storage server sends first information to the application server; the first information is used for indicating the application server to switch links, and the first information comprises an identifier of a first link, wherein the first link is a link between the first controller and the application server;
the application server switches data to be transmitted through the first link to a second link between a second controller of the at least two controllers and the application server.
2. The method of claim 1, wherein the data processing efficiency of the first controller being lower than that of the any one controller comprises one or more of the following:
the difference between the amount of data processed by the any one controller and the amount of data processed by the first controller in one period is greater than a first threshold, and the amount of data processed by the any one controller is greater than the amount of data processed by the first controller;
the difference between the input/output operations per second (IOPS) of the any one controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of the any one controller is greater than the IOPS of the first controller;
a central processing unit (CPU) core isolation event occurs in the first controller;
the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by the any one controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration;
the difference between the CPU occupancy rate of the first controller and the CPU occupancy rate of the any one controller is greater than a fourth threshold, and the CPU occupancy rate of the first controller is greater than the CPU occupancy rate of the any one controller;
the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or
within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, wherein the second information is used to request the application server to resend a data read/write request.
3. The method of claim 1 or 2, wherein the method further comprises:
the storage server sets the state of the first link to a low priority state.
4. The method of claim 3, wherein the method further comprises:
the storage server sends third information to the application server, wherein the third information is used for indicating the application server to update the state of the first link to the low-priority state;
the application server sets the state of the first link to the low priority state.
5. The method of any one of claims 1-4, further comprising:
the application server determines that the link with the smallest link load is the second link; or
the first information is further used to indicate the second link.
6. The method of claim 5, wherein the first information is further for indicating the second link, the method further comprising:
The storage server acquires load information of at least one link between the application server and a controller except the first controller in the at least two controllers, and determines a link with the smallest load in the at least one link as the second link; or alternatively
The storage server acquires load information of other controllers except the first controller in the at least two controllers, determines the controller with the smallest load as the second controller, and determines the link with the smallest load in the links between the second controller and the application server as the second link.
7. A storage system comprising a storage server comprising a plurality of controllers and an application server coupled to at least two of the plurality of controllers, wherein,
The storage server is used for sending first information to the application server when the data processing efficiency of a first controller in the at least two controllers is lower than that of any one controller except the first controller in the at least two controllers; the first information is used for indicating the application server to switch links, and the first information comprises an identifier of a first link, wherein the first link is a link between the first controller and the application server;
The application server is configured to switch data to be transmitted through the first link to a second link, where the second link is a link between a second controller of the at least two controllers and the application server.
8. The storage system of claim 7, wherein the data processing efficiency of the first controller being lower than that of the any one controller comprises one or more of the following:
the difference between the amount of data processed by the any one controller and the amount of data processed by the first controller in one period is greater than a first threshold, and the amount of data processed by the any one controller is greater than the amount of data processed by the first controller;
the difference between the input/output operations per second (IOPS) of the any one controller and the IOPS of the first controller is greater than a second threshold, and the IOPS of the any one controller is greater than the IOPS of the first controller;
a central processing unit (CPU) core isolation event occurs in the first controller;
the difference between a first duration taken by the first controller to process a first amount of data and a second duration taken by the any one controller to process the first amount of data is greater than a third threshold, and the first duration is greater than the second duration;
the difference between the CPU occupancy rate of the first controller and the CPU occupancy rate of the any one controller is greater than a fourth threshold, and the CPU occupancy rate of the first controller is greater than the CPU occupancy rate of the any one controller;
the difference between the IOPS of the first controller in two adjacent seconds is greater than a fifth threshold; or
within a preset duration, the number of times the first controller sends second information to the application server is greater than a sixth threshold, wherein the second information is used to request the application server to resend a data read/write request.
9. The storage system of claim 7 or 8, wherein the storage server is further configured to:
The state of the first link is set to a low priority state.
10. The storage system of claim 9, wherein
the storage server is further configured to send third information to the application server, wherein the third information is used to instruct the application server to update the state of the first link to the low-priority state; and
the application server is further configured to set the state of the first link to the low-priority state.
11. The storage system of any one of claims 7 to 10, wherein
the application server is configured to determine that the link with the smallest link load is the second link; or
The first information is also used to indicate the second link.
12. The storage system of claim 11, wherein the first information is further for indicating the second link, the storage server is further for:
obtain load information of at least one link between the application server and a controller, among the at least two controllers, other than the first controller, and determine that the link with the smallest load among the at least one link is the second link; or
obtain load information of the controllers, among the at least two controllers, other than the first controller, determine that the controller with the smallest load is the second controller, and determine that the link with the smallest load among the links between the second controller and the application server is the second link.
13. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
A processor of the at least one computing device is configured to execute instructions stored in a memory of the at least one computing device to cause the cluster of computing devices to perform the method performed by the storage server or the application server as claimed in any one of claims 1 to 6.
14. A computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method performed by the storage server or the application server of any one of claims 1-6.
15. A computer-readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method performed by the storage server or the application server as claimed in any one of claims 1 to 6.
CN202310366680.8A 2023-03-31 2023-03-31 Data transmission method and system Pending CN118740743A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310366680.8A CN118740743A (en) 2023-03-31 2023-03-31 Data transmission method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310366680.8A CN118740743A (en) 2023-03-31 2023-03-31 Data transmission method and system

Publications (1)

Publication Number Publication Date
CN118740743A true CN118740743A (en) 2024-10-01

Family

ID=92846444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310366680.8A Pending CN118740743A (en) 2023-03-31 2023-03-31 Data transmission method and system

Country Status (1)

Country Link
CN (1) CN118740743A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination