
CN118648275A - Method and apparatus for network interface card (NIC) object consistency (NOC) messages - Google Patents

Info

Publication number
CN118648275A
Authority
CN
China
Prior art keywords
value
rnic
request
receiver
header
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280089961.7A
Other languages
Chinese (zh)
Inventor
本-沙哈尔·贝尔彻
萨吉夫·戈伦
大卫·亚隆
尤里·哈森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN118648275A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9063 Intermediate storage in different physical parts of a node or terminal
    • H04L49/9068 Intermediate storage in different physical parts of a node or terminal in the network interface card
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/901 Buffering arrangements using storage descriptor, e.g. read or write pointers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method and apparatus for checking network interface card (NIC) object consistency in remote direct memory access (RDMA) transactions are provided. The method includes: receiving, by an RDMA NIC at a receiver, a request from a sender to read an object from a memory address; checking whether the header version is unlocked; extracting a version number Vobj from each cache line; and verifying whether each Vobj matches the least significant byte (LSB) of the version field ObV in the header of the object. When the Vobj in every cache line matches the ObV, the method removes the Vobj from each cache line of the object, reads the data of the object, and sends the object to the sender. Otherwise, the verification is retried a predefined number of times, and if there is still no match, a failure response is sent.

Description

Method and apparatus for network interface card (NIC) object consistency (NOC) messages

Technical Field

The present invention, in some embodiments thereof, relates to communication systems and, more particularly but not exclusively, to methods and apparatus for network interface card (NIC) object coherency (NOC) messages.

Background Art

Remote direct memory access (RDMA) is direct memory access from the memory of one computer to the memory of another without involving the operating system of either computer. RDMA is widely used for low-latency, high-bandwidth networking in modern data centers and computer clusters. RDMA offloads memory operations from the central processing unit (CPU) to the RDMA NIC (RNIC), which accesses memory directly. The offload saves CPU time, leaving the CPU free to perform other tasks. The RDMA software layer uses instructions called "verbs" to perform RDMA operations, which are then converted into work requests written to the RNIC queues. Each work request is called a work queue element (WQE). WQEs are used both for tagged (one-sided write/read/atomic) operations and for untagged (two-sided send/receive) operations. RDMA peers communicate through queue pairs (QPs) that provide various transport services. Common QP types (published in the InfiniBand (IB) specification) are reliable connection (RC), reliable datagram (RD), unreliable datagram (UD), extended reliable connection (XRC), and unreliable connection (UC). Some companies have proprietary QP types, such as Amazon SRD and Mellanox DCT.

Summary of the Invention

An object of the present invention is to provide an apparatus and a method that test object consistency while reducing the use of computing resources and network bandwidth and shortening the latency of large data packets, by offloading object consistency checks and operations to the RNIC and by optimizing the object consistency flow.

Another object of the present invention is to provide a method for performing read before write (RBW) requests that reduces the use of computing resources and saves time.

Another object of the present invention is to provide a method for performing a compare and swap (CAS) and a read as a single RDMA protocol opcode, in which the RDMA handling is aware of the packet structure and performs the required operations on the packet using the combined CAS and read opcode, thereby saving time and computing resources.

Other systems, methods, features, and advantages of the present invention will be or become apparent to those skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included in this description, be within the scope of the present invention, and be protected by the accompanying claims.

In a first aspect, the present invention relates to a device for receiving a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the device being configured to:

receive a request from a sender to read an object from a memory address;

check whether the header version is unlocked, extract a version number Vobj from each cache line, and verify whether each Vobj matches the least significant byte (LSB) of the version field of the header of the object;

when the Vobj in each cache line matches the value in the version field:

remove the Vobj from each cache line of the object;

read the data of the object;

send the object to the sender;

when the header version is locked, or when the Vobj in any cache line does not match the value in the version field, retry a predefined number of times to verify that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object, and, when there is still no match, send a failure response.

Because the RNIC performs the check and sends the object to the sender without the Vobj values, network and computing resources are saved and latency is reduced.
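The read-side check of the first aspect can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the RNIC implementation: the 64-byte cache line, the little-endian encoding, the use of the top bit of ObV as a lock flag, and the names `try_read`, `noc_read`, and `read_memory` are all hypothetical. The 64-bit ObV and 16-bit Vobj widths follow the layout described for FIG. 2, and the header is kept in the returned object as a design choice of this sketch.

```python
import struct

CACHE_LINE = 64      # assumed cache-line size in bytes
VOBJ_LEN = 2         # 16-bit Vobj at the start of every later cache line (per FIG. 2)
LOCK_BIT = 1 << 63   # hypothetical: the source does not say how "locked" is encoded


def try_read(memory: bytes):
    """One verification pass; returns the object with Vobj stamps removed, or None."""
    (obv,) = struct.unpack_from("<Q", memory, 0)   # 64-bit ObV at the start of the header
    if obv & LOCK_BIT:
        return None                                # header version is locked
    expected = obv & 0xFFFF                        # low 16 bits of ObV
    obj = bytearray(memory[:CACHE_LINE])           # first cache line carries the header
    for off in range(CACHE_LINE, len(memory), CACHE_LINE):
        (vobj,) = struct.unpack_from("<H", memory, off)
        if vobj != expected:
            return None                            # inconsistent object, e.g. a concurrent update
        obj += memory[off + VOBJ_LEN:off + CACHE_LINE]   # drop the Vobj stamp
    return bytes(obj)


def noc_read(read_memory, max_retries: int = 3):
    """Retry a bounded number of times, then report failure."""
    for _ in range(max_retries + 1):
        obj = try_read(read_memory())              # read_memory() returns a fresh snapshot
        if obj is not None:
            return ("OK", obj)
    return ("FAIL", None)
```

Here `read_memory` stands in for the DMA read of the object's memory region; each retry takes a new snapshot so that a writer that has finished in the meantime is observed.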

In another implementation of the first aspect, the RNIC is further configured to:

receive a request from a sender to write an object to the memory address;

extract the value in the version field of the header of the object, write the value into each cache line of the object, and write the object to the memory address;

send a success response to the sender.
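The write path can be sketched in the same spirit. The snippet below is a minimal model, assuming a 64-byte cache line, a 64-bit little-endian version field at the start of the header, and a 16-bit stamp per later cache line; the function name `noc_write` and the byte-level encoding are illustrative assumptions, not taken from the source.

```python
import struct

CACHE_LINE = 64   # assumed cache-line size in bytes
VOBJ_LEN = 2      # assumed 16-bit version stamp per cache line


def noc_write(obj: bytes) -> bytes:
    """Build the in-memory image: header line unchanged, later lines prefixed with Vobj."""
    (obv,) = struct.unpack_from("<Q", obj, 0)      # version field in the object's header
    vobj = struct.pack("<H", obv & 0xFFFF)         # low 16 bits used as the per-line stamp
    image = bytearray(obj[:CACHE_LINE])            # first cache line already holds the header
    body = obj[CACHE_LINE:]
    step = CACHE_LINE - VOBJ_LEN                   # payload bytes that fit after each stamp
    for i in range(0, len(body), step):
        image += vobj + body[i:i + step]
    return bytes(image)
```

The sender transmits the object without stamps, and the stamped image is what lands in receiver memory, which is why the stored image is slightly larger than the object on the wire.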

In a second aspect, the present invention relates to a device for receiving a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the device being configured to:

receive a request from a sender to read an object from a memory address;

extract data from each data packet of the object;

calculate a hash function value for the data in each data packet of the object, and update the calculated hash function value of the current data packet at a temporary RNIC memory address;

at the last data packet, verify whether the updated hash function value matches the value in the hash value field in the header of the object;

when the hash value matches the value in the hash value field in the header of the object, send the object to the sender;

when the calculated hash value does not match the value in the hash value field of the header of the object, retry a predefined number of times to:

calculate the hash function value for the data in each data packet, update the calculated hash function value of the current data packet at the temporary RNIC memory address, and, at the last data packet, verify whether the updated hash function value matches the value in the hash value field in the header of the object; and, when there is still no match, send a failure response.
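The hash-based read check can be sketched as below. The source does not name the hash function, so CRC-32 from Python's `zlib` is used purely as a stand-in; the 4-byte little-endian hash field at the start of the object, the packet size `MTU`, and the names `packets`, `hash_checked_read`, and `read_object` are likewise assumptions made for illustration.

```python
import struct
import zlib

MTU = 1024   # assumed payload size of one data packet


def packets(data: bytes, mtu: int = MTU):
    """Split the object's data into packet-sized chunks."""
    for i in range(0, len(data), mtu):
        yield data[i:i + mtu]


def hash_checked_read(read_object, max_retries: int = 3):
    """Recompute the running hash per packet and compare it with the header field."""
    for _ in range(max_retries + 1):
        obj = read_object()                         # fresh snapshot of header + data
        (stored,) = struct.unpack_from("<I", obj, 0)
        running = 0                                 # models the temporary RNIC memory
        for pkt in packets(obj[4:]):
            running = zlib.crc32(pkt, running)      # update with the current packet
        if running == stored:                       # checked at the last packet
            return ("OK", obj)
    return ("FAIL", None)
```

Unlike the per-cache-line version stamps, this variant needs no change to the data layout, at the cost of hashing every byte on each read.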

In another implementation of the second aspect, the RNIC is further configured to:

receive a request from a sender to write an object to the memory address;

calculate the hash function value for the data in each data packet of the object, update the calculated hash function value of the current data packet at a temporary RNIC memory address, and, at the last data packet, write the updated hash function value into the hash value field in the header of the object;

write the object to the memory address;

send a success response to the sender.
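A minimal sketch of the receiver-side write with hashing follows. As before, CRC-32 stands in for the unspecified hash function, and the dictionary `memory`, the 4-byte little-endian hash field, and the function name `hash_stamped_write` are hypothetical names introduced only for this example.

```python
import struct
import zlib


def hash_stamped_write(data_packets, memory: dict, address: int) -> str:
    """Accumulate a running hash per packet, then store header hash + data."""
    running = 0                                  # models the temporary RNIC memory
    body = bytearray()
    for pkt in data_packets:
        running = zlib.crc32(pkt, running)       # update with the current packet
        body += pkt
    # After the last packet, the final hash goes into the header's hash field.
    memory[address] = struct.pack("<I", running) + bytes(body)
    return "OK"                                  # success response to the sender
```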

In another implementation of the second aspect, the RNIC is further configured to:

receive from the sender the request to write the object to the memory address, wherein the hash function value is calculated by the sender RNIC for the data in each data packet of the object, the calculated hash function value of the current data packet is updated at a temporary sender RNIC memory address, and, at the last data packet, the updated hash function value is written into the hash value field in the header of the object;

write the object to the memory address;

send a success response to the sender.

In another implementation of the second aspect, the RNIC is further configured to:

receive a read before write (RBW) request from one or more senders to read an object from the memory address, wherein a data packet of the RBW request includes a bit indicating that a request to write another object to the memory address is expected from the one or more senders;

allocate a row in a table, the row holding an identification (ID) of the one or more senders, a timestamp of the received RBW request, and, for each sender, the memory address from which the object was read;

upon receiving a request to write an object to the memory address from another sender, where the other sender has a row in the table for an RBW request to read the object from the memory address with a timestamp earlier than the timestamps of the requests of the one or more senders:

send a notification to the one or more senders, according to the sender IDs in the table, so that they refrain from sending a request to write to the memory address;

delete from the table the row of the other sender and the rows holding the IDs and timestamps of the one or more senders.

In another implementation of the second aspect, the one or more senders send an RBW request to read the object from the memory address, wherein a data packet of the RBW request includes a bit indicating that a request to write another object to the memory address is expected from the one or more senders.

In another implementation of the second aspect, the RNIC is further configured to:

send a zero read request, indicating that no request to write an object to the memory address is expected from the one or more senders.

In another implementation of the second aspect, the RNIC is further configured to:

receive a zero read request from the one or more senders, the zero read request indicating that no request to write an object to the memory address is expected from the one or more senders;

delete from the table the rows holding the IDs and timestamps of the one or more senders.
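The RBW bookkeeping described in the preceding implementations can be modeled as a small table keyed by sender and address. The sketch below is one possible reading, not the patented implementation: the class `RbwTable`, its method names, and the choice to key zero read requests by address are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RbwTable:
    # (sender_id, address) -> timestamp of the RBW read request
    rows: dict = field(default_factory=dict)

    def on_rbw_read(self, sender_id, address, timestamp):
        """RBW bit set: the sender announces that a write to this address will follow."""
        self.rows[(sender_id, address)] = timestamp

    def on_zero_read(self, sender_id, address):
        """Zero read: the sender no longer intends to write, so drop its row."""
        self.rows.pop((sender_id, address), None)

    def on_write(self, writer_id, address):
        """Return the sender IDs to notify that the object they read is now stale."""
        writer_ts = self.rows.pop((writer_id, address), None)
        if writer_ts is None:
            return []                              # writer had no RBW row for this address
        stale = [sid for (sid, addr), ts in self.rows.items()
                 if addr == address and ts > writer_ts]
        for sid in stale:
            del self.rows[(sid, address)]          # their rows are removed as well
        return sorted(stale)
```

Notifying the later readers lets them skip a write that would be based on a stale copy, instead of discovering the conflict only after sending the full object.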

In a third aspect, the present invention relates to a device for sending a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the device being configured to:

send to a receiver a request to read an object from a memory address, wherein a version number Vobj is extracted by the receiver from each cache line of the object, and the Vobj of each cache line is verified against the LSB value in the version field of the header of the object;

when the Vobj in each cache line matches the LSB value in the version field, receive the object from the receiver, wherein the receiver removes the Vobj from each cache line of the object; or

when the Vobj in any cache line does not match the LSB value in the version field, receive a failure response from the receiver.

In another implementation of the third aspect, the RNIC is further configured to:

send to a receiver a request to write an object to the memory address, wherein the LSB value in the version field of the header of the object is extracted by the receiver and written into each cache line of the object, and the object is written to the memory address;

receive a success response from the receiver.

In a fourth aspect, the present invention relates to a device for sending a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the device being configured to:

send to a receiver a request to read an object from a memory address, wherein data is extracted by the receiver from each data packet of the object, a hash function value is calculated for the data in each data packet of the object, and, at the last data packet, the updated hash function value is verified by the receiver against the value in the hash value field in the header of the object;

when the hash value matches the value in the hash value field in the header of the object, receive the object from the receiver;

when the calculated hash value does not match the value in the hash value field of the header of the object, receive a failure response.

In another implementation of the fourth aspect, the RNIC is further configured to:

send to a receiver a request to write an object to a memory address, wherein the hash function value is calculated by the receiver for the data in each data packet of the object, the calculated hash function value of the current data packet is updated by the receiver at a temporary receiver RNIC memory address, and, at the last data packet, the calculated hash function value is written into the hash value field of the header of the object;

receive a success response from the receiver after the object has been written to the memory address by the receiver.

In another implementation of the fourth aspect, the RNIC is further configured to:

calculate the hash function value for the data in each data packet of the object, and update the calculated hash function value of the current data packet at a temporary RNIC memory address;

at the last data packet, write the updated hash function value into the hash value field of the header of the object;

send to the receiver the request to write the object;

receive a success response from the receiver after the object has been written to the memory address by the receiver.

In a fifth aspect, the present invention relates to a device for optimizing the flow of a compare and swap operation followed by a read request into a single operation in a remote direct memory access (RDMA) transaction, comprising an RDMA network interface card (RNIC), the device being configured to:

receive from a sender a request for an optimized compare and swap and read operation;

compare the content of a first memory address with a first value;

when the content of the first memory address is equal to the first value:

replace the content of the first memory address with a second value;

read the content of a second memory address;

send to the sender a success response together with the content read from the second memory address;

when the content of the first memory address is not equal to the first value:

send a failure response to the sender.
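The combined compare and swap and read operation can be illustrated with a short single-threaded model. The dictionary `memory` and the function name `cas_and_read` are hypothetical; a real RNIC would perform the compare and swap atomically with respect to other accesses, which this sketch does not attempt to model.

```python
def cas_and_read(memory: dict, cas_addr, expected, new_value, read_addr):
    """Compare and swap at cas_addr, then read read_addr, as one request."""
    if memory.get(cas_addr) != expected:
        return ("FAIL", None)                    # comparison failed: nothing is changed
    memory[cas_addr] = new_value                 # swap in the second value
    return ("OK", memory.get(read_addr))         # success plus the content that was read
```

A typical use would be acquiring a lock word and fetching the protected object in one round trip instead of two, for example `cas_and_read(mem, lock_addr, 0, 1, obj_addr)`.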

In a sixth aspect, the present invention relates to a method for receiving a plurality of transactions, comprising:

at a remote direct memory access (RDMA) network interface card (RNIC):

receiving a request from a sender to read an object from a memory address;

extracting a data version number Vobj from each cache line of the object, and verifying whether the Vobj of each cache line matches the least significant byte (LSB) value in the version field of the header of the object;

when the Vobj in each cache line matches the value in the version field:

removing the Vobj from each cache line of the object;

reading the data of the object;

sending the object to the sender;

when the Vobj in any cache line does not match the LSB value in the version field, retrying a predefined number of times to verify whether each Vobj matches the LSB value in the version field in the header of the object, and, when there is still no match, sending a failure response.

In another implementation of the sixth aspect, the method further includes:

receiving a request from a sender to write an object to the memory address;

extracting the LSB value in the version field of the header of the object, writing the value into each cache line of the object, and writing the object to the memory address;

sending a success response to the sender.

In a seventh aspect, the present invention relates to a method for receiving a plurality of transactions, comprising:

at a remote direct memory access (RDMA) network interface card (RNIC):

receiving a request from a sender to read an object from a memory address;

extracting data from each data packet of the object;

calculating a hash function value for the data in each data packet of the object, and updating the calculated hash function value of the current data packet at a temporary RNIC memory address;

at the last data packet, verifying whether the updated hash function value matches the value in the hash value field in the header of the object;

when the hash value matches the value in the hash value field in the header of the object, sending the object to the sender;

when the calculated hash value does not match the value in the hash value field of the header of the object, retrying a predefined number of times to:

calculate the hash function value for the data in each data packet, update the calculated hash function value of the current data packet at the temporary RNIC memory address, and, at the last data packet, verify whether the updated hash function value matches the value in the hash value field in the header of the object; and, when there is still no match, sending a failure response.

In another implementation of the seventh aspect, the method further includes:

receiving a request from a sender to write an object to the memory address;

calculating the hash function value for the data in each data packet of the object, updating the calculated hash function value of the current data packet at a temporary RNIC memory address, and, at the last data packet, writing the calculated hash function value into the hash value field in the header of the object;

writing the object to the memory address and sending a success response.

In another implementation of the seventh aspect, the method further includes:

at the sender RNIC, calculating the hash function value for the data in each data packet of the object, and updating the calculated hash function value of the current data packet at a temporary sender RNIC memory address;

at the last data packet, writing the updated hash function value into the hash value field of the header of the object;

sending to the receiver the request to write the object;

at the receiver RNIC, receiving the request to write the object;

sending a success response to the sender.

In an eighth aspect, the present invention relates to a method for sending a plurality of transactions, comprising:

at a remote direct memory access (RDMA) network interface card (RNIC):

sending to a receiver a request to read an object from a memory address, wherein a version number Vobj is extracted by the receiver RNIC from each cache line of the object, and the Vobj of each cache line is verified by the receiver RNIC against the LSB value in the version field of the header of the object;

when the Vobj in each cache line matches the LSB value in the version field, receiving the object from the receiver RNIC, wherein the receiver removes the Vobj from each cache line of the object; or

when the Vobj in any cache line does not match the LSB value in the version field, receiving a failure response from the receiver RNIC.

In a ninth aspect, the present invention relates to a method for sending a plurality of transactions, comprising:

at a remote direct memory access (RDMA) network interface card (RNIC):

sending to a receiver RNIC a request to read an object from a memory address, wherein data is extracted by the receiver RNIC from each data packet of the object, a hash function value is calculated for the data in each data packet of the object, and, at the last data packet, the updated hash function value is verified by the receiver RNIC against the value in the hash value field in the header of the object;

when the hash value matches the value in the hash value field of the header of the object, receiving the object from the receiver;

when the calculated hash value does not match the value in the hash value field of the header of the object, receiving a failure response.

In a tenth aspect, the present invention relates to a method for optimizing the flow of a compare and swap operation followed by a read request into a single operation in a remote direct memory access (RDMA) transaction, comprising:

at a receiver, upon receiving from a sender a request for an optimized compare and swap and read operation:

comparing the content of a first memory address with a first value;

when the content of the first memory address is equal to the first value:

replacing the content of the first memory address with a second value;

reading the content of a second memory address;

sending to the sender a success response together with the content read from the second memory address;

when the content of the first memory address is not equal to the first value:

sending a failure response to the sender.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments pertain. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the embodiments, exemplary methods and/or materials are described below. In case of conflict, this patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the present invention are described herein, by way of example only, with reference to the accompanying drawings. With specific reference to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the present invention. In this regard, the description taken together with the drawings makes apparent to those skilled in the art how embodiments of the present invention may be practiced.

In the drawings:

FIG. 1 schematically shows a block diagram of an apparatus for checking NIC object consistency in an RDMA transaction, according to some embodiments of the present invention;

FIG. 2 schematically shows an object memory layout according to some embodiments of the present invention, in which the 64-bit object version field ObV is located at the beginning of the object header (i.e., in the first cache line), and the 16-bit LSB of ObV, denoted Vobj, is located at the beginning of each subsequent cache line;

FIG. 3a schematically shows the object memory layout before Vobj is removed from the cache lines, according to some embodiments of the present invention;

FIG. 3b schematically shows the object memory layout after Vobj is removed from the cache lines, according to some embodiments of the present invention;

FIG. 4 is a schematic sequence diagram of an example of checking NIC object consistency in an RDMA transaction, when a read request is received, by verifying whether the Vobj in each cache line is equal to the LSB of the object version number ObV in the object header, according to some embodiments of the present invention;

FIG. 5 is a schematic sequence diagram of an example of checking NIC object consistency in an RDMA transaction, when a write request is received, by verifying whether the Vobj in each cache line is equal to the LSB of the object version number ObV in the object header, according to some embodiments of the present invention;

FIG. 6 schematically shows a flowchart of a method for checking NIC object consistency in an RDMA transaction, when a read request is received, by verifying whether the Vobj in each cache line is equal to the LSB of ObV in the object header, according to some embodiments of the present invention;

FIG. 7 schematically shows a flowchart of a method for checking NIC object consistency in an RDMA transaction, when a read request is sent, by inserting the LSB of ObV in the object header into each cache line of the object, according to some embodiments of the present invention;

FIG. 8 schematically shows a memory layout according to some embodiments of the present invention, in which an object is stored in memory and has a header field whose value is calculated by a hash function over the object;

FIG. 9 schematically shows a flowchart of a method for checking NIC consistency in an RDMA transaction by a hash function when a read request is received, according to some embodiments of the present invention;

FIG. 10 schematically shows a sequence diagram of an example of checking NIC object consistency in an RDMA transaction by a hash function, without first locking the object, when a read request is received, according to some embodiments of the present invention;

FIG. 11 schematically shows a flowchart of an example of an RDMA write request in which the hash function is calculated in the receiver RNIC 122, according to some embodiments of the present invention;

FIG. 12 schematically shows a sequence diagram of an example of writing an object in an RDMA transaction using a hash function calculated at the receiver RNIC, according to some embodiments of the present invention;

FIG. 13 schematically shows a sequence diagram of an example of writing an object in an RDMA transaction using a hash function calculated at the sender RNIC, according to some embodiments of the present invention;

FIG. 14 schematically shows a flowchart of an example of an RDMA write request in which the hash function is calculated in the sender RNIC 112, according to some embodiments of the present invention;

FIG. 15 schematically shows a flowchart of a method for checking NIC consistency in an RDMA transaction by a hash function when a read request is sent, according to some embodiments of the present invention;

FIG. 16 schematically shows a flowchart of a method for sending a read before write (RBW) request, according to some embodiments of the present invention, where the RBW request is a read request that carries a notification about a write request expected to be sent after the read request;

FIGS. 17a to 17g schematically show examples of a method for read before write (RBW) requests, according to some embodiments of the present invention;

FIG. 18 schematically shows a flowchart of a method for optimizing the flow of a compare and swap (CAS) operation followed by a read request into a single operation in an RDMA transaction, according to some embodiments of the present invention;

FIG. 19 schematically shows a sequence diagram of the optimized compare and swap (CAS) and read operation, according to some embodiments of the present invention.

DETAILED DESCRIPTION

The present invention, in some embodiments thereof, relates to communication systems and, more particularly but not exclusively, to methods and apparatus for network interface card (NIC) object coherency (NOC) messages.

Multithreaded applications that run on multicore systems or on distributed computing platforms often contain data structures that are read frequently but change relatively rarely. One example is a database server that keeps a list of databases which rarely changes but must be consulted for every query that accesses a database. Another example is a main-memory distributed object store that uses RDMA. In such cases, read access must be extremely fast and inconsistencies must be prevented. One way to achieve this is to use lock-free read-only operations.

In a distributed shared memory system that uses RDMA, the lock-free read method places a heavy load on computing resources and network resources.

Some existing approaches enforce object consistency by inserting the least significant byte (LSB) of the version number into each cache line and then checking whether the version number has changed. However, this wastes a large amount of computing and network resources when remote objects are read. In these approaches, a remote read is implemented by issuing an RDMA read request over a reliable connection (RC) queue pair (QP) and then testing the LSB in each cache line against the LSB of the version number. If the test fails, the reader waits for a timeout and issues the request again, which wastes CPU and bandwidth and results in high latency.

In these approaches, a remote write is implemented by adding the LSB to each cache line at the remote end and then sending the object back to its originating machine, which wastes CPU and adds network latency.

Currently, RDMA does not support a way of testing object consistency that can strip the least significant byte (LSB) from each cache line of the object when reading, or add the LSB to each cache line, on the remote or the local NIC.

There is therefore a need for a device and a method for testing object consistency which, on the one hand, check whether a memory region is strictly serializable and consistent and, on the other hand, reduce the use of computing resources and network bandwidth and shorten the latency of large data packets.

According to some embodiments of the present invention, there are provided devices and methods that test object consistency while reducing the use of computing resources and network bandwidth and shortening the latency of large data packets, by offloading object consistency operations to the NIC and by optimizing object consistency flows.

Before explaining at least one embodiment of the present invention in detail, it is to be understood that the present invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The present invention is capable of other embodiments or of being practiced or carried out in various ways.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include one or more computer-readable storage media having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.

The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.

The computer-readable program instructions may execute entirely on the user's computer and/or computerized device, partly on the user's computer and/or computerized device, as a stand-alone software package, partly on the user's computer (and/or computerized device) and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer and/or computerized device through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or carry out combinations of special-purpose hardware and computer instructions.

Reference is now made to FIG. 1, which schematically shows a block diagram of an apparatus for checking NIC object consistency in an RDMA transaction, according to some embodiments of the present invention. The apparatus 100 includes a sender 110 and a receiver 120. The sender 110 includes a memory 111, an RDMA NIC (RNIC) 112 that includes a processor 113, and one or more applications 114. The receiver 120 includes a memory 121, an RNIC 122 that includes a processor 123, and one or more applications 124. The sender 110 sends a lock-free read request to the receiver 120 in order to read an object from the memory 121, which is remote memory from the point of view of the sender 110. According to some embodiments of the present invention, the RNIC 122 verifies object consistency by using the least significant byte (LSB) of the header version field, which is stored in each cache line. When the sender 110 requests to read an object from the memory 121, the object has the LSB of the header version number attached at a fixed position in each of its cache lines. The RNIC 122 then checks, by means of the processor 123, whether the header version is unlocked. If it is unlocked, the RNIC 122 extracts the version number Vobj from each cache line and verifies that it matches the LSB of the object version field ObV in the object header. The verification is performed by comparing the LSB of ObV with the Vobj of each cache line.

FIG. 2 schematically shows an object memory layout according to some embodiments of the present invention, in which the 64-bit object version field ObV is located at the beginning of the object header (i.e., in the first cache line), and the 16-bit LSB of ObV, denoted Vobj, is located at the beginning of each subsequent cache line. When an object is created or updated, ObV in the first cache line is incremented, the LSB of ObV is written to a fixed position in every cache line that the object spans, and the memory 121 is then updated. According to some embodiments of the present invention, when a thread on the remote sender 110 or on the local receiver 120 reads the object, verification is performed by looping over all cache lines of the object and checking whether the Vobj in each cache line is identical to the LSB of ObV. When they are identical, the object is read and sent to the sender 110 without the Vobj in each cache line.
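The update and verification steps of the FIG. 2 layout can be illustrated with a minimal sketch. The 64-byte cache line size, the little-endian encoding, the assumption that the object is padded to whole cache lines, and the function names `bump_version` and `is_consistent` are all assumptions made for illustration and are not taken from the specification.

```python
import struct

CACHE_LINE = 64  # assumed cache line size in bytes


def bump_version(obj: bytearray) -> int:
    """Increment the 64-bit ObV and stamp its 16-bit LSB into every later cache line."""
    (obv,) = struct.unpack_from("<Q", obj, 0)
    obv = (obv + 1) & 0xFFFFFFFFFFFFFFFF
    struct.pack_into("<Q", obj, 0, obv)  # ObV sits at the start of the first cache line
    for off in range(CACHE_LINE, len(obj), CACHE_LINE):
        struct.pack_into("<H", obj, off, obv & 0xFFFF)  # Vobj at the start of each line
    return obv


def is_consistent(obj: bytes) -> bool:
    """Check that the Vobj of every later cache line equals the 16-bit LSB of ObV."""
    (obv,) = struct.unpack_from("<Q", obj, 0)
    expected = obv & 0xFFFF
    for off in range(CACHE_LINE, len(obj), CACHE_LINE):
        (vobj,) = struct.unpack_from("<H", obj, off)
        if vobj != expected:
            return False
    return True
```

A reader that observes a cache line still carrying an older Vobj would see `is_consistent` return `False`, which corresponds to the mismatch case described above.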

When the sender 110 reads an object from the receiver 120, it allocates a buffer into which the object is written without the Vobj of each cache line. When the RNIC writes the object into the buffer, it skips the LSB fields and does not write them. The object is therefore received in the buffer without the LSB fields, i.e., without a Vobj in each cache line. When the header version of the object is locked, or when the Vobj of one or more cache lines is not equal to the LSB of ObV, the receiver retries a predefined number of times to verify that the header version is unlocked and that each Vobj matches the value in the version field of the object header. When there is still no match, the receiver 120 sends a failure response to the sender 110.
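Skipping the Vobj fields while filling the buffer might look like the following sketch. It assumes 64-byte cache lines, a 2-byte Vobj at the start of every cache line after the first, and that the first cache line (which holds the header) is copied unchanged. The name `strip_version_fields` is hypothetical.

```python
CACHE_LINE = 64  # assumed cache line size in bytes
VOBJ_BYTES = 2   # 16-bit Vobj, as in the FIG. 2 layout


def strip_version_fields(stored: bytes) -> bytes:
    """Copy an object into a buffer, skipping the Vobj of every later cache line."""
    out = bytearray(stored[:CACHE_LINE])  # first cache line is copied as-is
    for off in range(CACHE_LINE, len(stored), CACHE_LINE):
        # Skip the Vobj bytes and copy only the data that follows them.
        out += stored[off + VOBJ_BYTES: off + CACHE_LINE]
    return bytes(out)
```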

FIG. 3a schematically shows the object memory layout before Vobj is removed from the cache lines, according to some embodiments of the present invention. FIG. 3b schematically shows the object memory layout after Vobj is removed from the cache lines, according to some embodiments of the present invention.

According to some embodiments of the present invention, when the sender 110 sends a write request to the receiver 120 to write an object into the memory 121, the RNIC 122 of the receiver 120 extracts the value ObV from the version field of the object header, writes the value into each cache line of the object, and then writes the object to the memory address in the memory 121. When the object has been successfully written to the memory 121, the RNIC 122 sends a success response to the sender 110.

FIG. 4 is a schematic sequence diagram of an example of checking NIC object consistency in an RDMA transaction, when a read request is received, by verifying whether the Vobj in each cache line is equal to the LSB of ObV in the object header, according to some embodiments of the present invention. At 401, a QP is created at the application 114 and sent to the RNIC 112 of the sender. At 402, a read request is sent from the application 114 to the RNIC 112 to read address A of length L in the memory 121. In RDMA, when a QP is created at the sender, a QP is created in parallel at the receiver, and the receiver RNIC 122 waits at 403 to receive a read request from the sender. At 404, the read request is sent from the RNIC 112 at the sender to the RNIC 122 at the receiver, where it is received. At 405, once the read request has been received, a counter is set to zero, and at 406 the RNIC 122 checks whether the counter is equal to a predefined number X. At 407, when the counter differs from X, the RNIC 122 extracts the Vobj value from every cache line at the requested address and verifies whether each is equal to the LSB of the object version ObV in the object header. When they are not equal, at 408, the RNIC 122 waits for a predefined time interval T and adds 1 to the counter. The RNIC 122 then returns to 406 to attempt the verification again. After the receiver RNIC 122 has tried and failed the predefined number of times X, i.e., when the counter is equal to X, a failure response is sent to the sender at 409, and the RNIC 122 returns to 403 and waits for a new read request. When the Vobj value extracted from each cache line is equal to the LSB of ObV in the object header, at 411, the Vobj value is removed from each cache line of the object, and the object is read and sent to the sender without a Vobj in each cache line. The RNIC 122 then returns to 403 and waits for a new read request.
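The bounded retry loop of FIG. 4 (steps 405 to 411) can be sketched as follows. The verification itself is abstracted into a caller-supplied `read_object` callable that returns the stripped object when every Vobj matches and `None` otherwise. That callable, the name `serve_read`, and the tuple-shaped reply are illustrative assumptions.

```python
import time
from typing import Callable, Optional, Tuple


def serve_read(read_object: Callable[[], Optional[bytes]],
               max_attempts: int,
               interval_s: float,
               sleep: Callable[[float], None] = time.sleep
               ) -> Tuple[str, Optional[bytes]]:
    """Try to read a consistent object up to max_attempts (X) times, waiting interval_s (T)."""
    counter = 0                       # step 405: reset the counter
    while counter != max_attempts:    # step 406: compare the counter with X
        obj = read_object()           # step 407: extract and verify every Vobj
        if obj is not None:
            return ("success", obj)   # step 411: send the object without Vobj
        sleep(interval_s)             # step 408: wait for the interval T ...
        counter += 1                  # ... and add 1 to the counter
    return ("failure", None)          # step 409: send a failure response
```

Passing `sleep` as a parameter is a design choice of this sketch that lets the loop be exercised without real delays.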

FIG. 5 is a schematic sequence diagram of an example of checking NIC object consistency in an RDMA transaction, when a write request is received, by verifying whether the Vobj in each cache line is equal to the LSB of the object version number ObV in the object header, according to some embodiments of the present invention. At 501, a QP is created at the application 114 and sent to the RNIC 112 of the sender. At 502, a write request is sent from the application 114 to the RNIC 112 to write an object to address A of length L in the memory 121. The data packets sent from the sender in the write request carry no Vobj values in the cache lines. In RDMA, when a QP is created at the sender, a QP is created in parallel at the receiver, and the receiver RNIC 122 waits at 503 to receive a write request from the sender. At 504, the write request is sent from the RNIC 112 at the sender to the RNIC 122 at the receiver, where it is received. Once the write request has been received, the RNIC 122 extracts, at 505, the LSB value of ObV from the object header and inserts that value at the beginning of each cache line. After the LSB value has been inserted into each cache line, the object is written to the memory 121 at 506, and at 507 the RNIC 122 sends a success response to the sender 110.
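Step 505 can be illustrated with a short sketch. It assumes that the incoming object starts with the 64-bit ObV (little-endian), that cache lines are 64 bytes, and that Vobj is the 16-bit LSB of ObV placed at the start of every cache line after the first. The function name `insert_version_fields` is hypothetical.

```python
import struct

CACHE_LINE = 64  # assumed cache line size in bytes
VOBJ_BYTES = 2   # 16-bit Vobj


def insert_version_fields(obj: bytes) -> bytes:
    """Lay out a Vobj-free object so that each later cache line starts with Vobj."""
    (obv,) = struct.unpack_from("<Q", obj, 0)     # ObV from the object header
    vobj = struct.pack("<H", obv & 0xFFFF)        # its 16-bit LSB
    out = bytearray(obj[:CACHE_LINE])             # first cache line keeps the header
    room = CACHE_LINE - VOBJ_BYTES                # data bytes per later cache line
    for i in range(CACHE_LINE, len(obj), room):
        out += vobj + obj[i:i + room]
    return bytes(out)
```

The result is what would then be written to memory at step 506 in this model.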

FIG. 6 schematically shows a flowchart of a method for checking NIC object consistency in an RDMA transaction, when a read request is received, by verifying whether the Vobj in each cache line is equal to the LSB of ObV in the object header, according to some embodiments of the present invention. At 601, a request to read an object from a memory address in the memory 121 is received at the RNIC 122 of the receiver 120. At 602, the RNIC 122 extracts the data version number Vobj from each cache line of the object and verifies whether the Vobj of each cache line matches the LSB value of the version field ObV in the object header. At 603, when the Vobj in each cache line matches the value in the version field, the RNIC 122 removes Vobj from each cache line of the object, the data of the object is read, and the object is sent to the sender 110. At 604, when the Vobj in each cache line does not match the LSB value in the version field, the RNIC 122 retries a predefined number of times to verify whether each Vobj matches the LSB of ObV in the object header. When there is still no match, a failure response is sent to the sender 110.

FIG. 7 schematically shows a flowchart of a method for checking NIC object consistency in an RDMA transaction, when a read request is sent, by inserting the LSB of ObV in the object header into each cache line of the object, according to some embodiments of the present invention. At 701, a request to read an object from a memory address is sent by the RNIC 112 from the sender 110 to the receiver 120. When the read request is received at the receiver 120, the receiver extracts the version number Vobj from each cache line of the object and verifies whether the Vobj of each cache line matches the LSB value of the object version number field ObV in the object header. At 702, when the Vobj in each cache line matches the LSB value of ObV, the sender 110 receives the object from the receiver 120, with Vobj removed from each cache line of the object by the receiver; that is, the object arrives at the sender 110 without a Vobj in each cache line. Otherwise, at 703, when the Vobj in each cache line does not match the LSB value of ObV, the sender 110 receives a failure response from the receiver. When a write request to write an object to an address in the memory 121 is sent from the sender 110 to the receiver 120, the receiver 120 extracts the LSB value of ObV and writes it into each cache line of the object. The object is written to the memory address in the memory 121, and the sender 110 receives a success response from the receiver 120. According to some embodiments of the present invention, the LSB value of the version field in the object header is written at the beginning of each cache line.

According to some other embodiments of the present invention, when a read or write request is received, NIC consistency in the RDMA transaction may be checked by means of a hash function.

FIG. 8 schematically shows a memory layout according to some embodiments of the present invention, in which an object is stored in memory and has a header field whose value is calculated by a hash function over the object. In some embodiments of the present invention, when an object is created or updated, the hash value of the object is calculated and stored in the hash value header field. When the object is read by the sender (or any remote node) or by a thread on the local node, it is tested by calculating the hash value of the object and comparing the result with the hash value header. If the two are identical, the object is serializable. If they differ, the object is being processed by an application and should be re-read or discarded.

According to some embodiments of the present invention, any hash function with a good distribution may be used, including cyclic redundancy check 16 (CRC-16), cyclic redundancy check 32 (CRC-32), cyclic redundancy check 64 (CRC-64), message digest algorithm 5 (MD5), and the like.

The size of the hash value header field is chosen according to the hash function used: for example, MD5 uses a 128-bit header, CRC-64 uses a 64-bit header, CRC-32 uses a 32-bit header, and so on.

The hash function covers the entire content of the object and is calculated by the RNIC that writes the object to the memory address. If the hash value calculated by the reader matches the value in the hash field of the object header, the object is consistent.
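A minimal sketch of the hash-header check, using CRC-32 from Python's standard `zlib` module as one of the hash functions named above. Placing the 32-bit hash field at the very start of the object, encoding it little-endian, and excluding the field itself from the hashed content are assumptions of this sketch, as are the names `seal` and `is_consistent`.

```python
import struct
import zlib

HASH_BYTES = 4  # CRC-32 uses a 32-bit header field


def seal(payload: bytes) -> bytes:
    """Writer side: compute the hash over the content and store it in the header field."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def is_consistent(obj: bytes) -> bool:
    """Reader side: recompute the hash and compare it with the stored header value."""
    (stored,) = struct.unpack_from("<I", obj, 0)
    return stored == zlib.crc32(obj[HASH_BYTES:])
```

A reader that observes a partially updated object would, in this model, compute a hash that differs from the stored one and would re-read or discard the object.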

FIG. 9 schematically illustrates a flowchart of a method, provided by some embodiments of the present invention, for checking NIC consistency in an RDMA transaction by a hash function when a read request is received. At 901, a read request is received at the receiver 120 to read from a memory address in the memory 121. At 902, the RNIC 122 at the receiver 120 extracts data from each data packet of the object. At 903, the RNIC 122 calculates a hash function value for the data in each data packet of the object, and updates the calculated hash function value for the current data packet at a temporary memory address in the RNIC 122. At 904, when the RNIC 122 reaches the last data packet of the object, it verifies whether the updated hash function value matches the value in the hash value field in the header of the object. At 905, when the values match, the RNIC 122 sends the object to the sender 110 at 906 in response to the request to read the object. Alternatively, at 907, when the calculated hash value does not match the value in the hash value field of the header of the object, the RNIC 122 retries a predefined number of times: it recalculates the hash function value for the data in each data packet, updates the calculated value for the current data packet in the temporary RNIC memory, and at the last data packet verifies whether the updated hash function value matches the value in the hash value field of the header of the object. At 908, when there is still no match, the RNIC 122 sends a failure response to the sender 110.
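By way of non-limiting illustration, the single verification pass of steps 902 to 906 may be sketched as follows. The specification does not name a hash function; CRC-32 from the Python standard library is used here purely as an assumption, and the names `StoredObject` and `serve_read_once` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoredObject:
    """Hypothetical in-memory object: a header hash plus packet-sized chunks."""
    header_hash: int       # value of the hash value field in the object's header
    packets: List[bytes]   # object data, one entry per data packet


def serve_read_once(obj: StoredObject) -> Tuple[str, Optional[bytes]]:
    """One verification pass over the object (steps 902 to 906)."""
    running = 0  # stands in for the temporary memory address in the RNIC
    for payload in obj.packets:
        # Step 903: fold the current packet into the running hash value.
        running = zlib.crc32(payload, running)
    # Step 904: after the last packet, compare with the header field.
    if running == obj.header_hash:
        return "OK", b"".join(obj.packets)  # step 906: send the object
    return "MISMATCH", None                 # the caller may retry (step 907)
```

The running value is updated packet by packet, so the check needs only one word of temporary state regardless of the object size.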

FIG. 10 schematically shows a sequence diagram of an example, provided by some embodiments of the present invention, of checking NIC object consistency in an RDMA transaction by a hash function, without first locking the object, when a read request is received. At 1001, a QP is created on the application 114 and sent to the RNIC 112 of the sender 110. At 1002, a read request is sent from the application 114 to the RDMA NIC 112 to read address A of length L in the memory 121. When the QP is created at the sender, a QP is created in parallel at the receiver 120; the QP in the receiver is created using a hash function calculated on an object stored in the memory 121. The result of the hash function is stored, together with the header length, in the hash value field in the header of the object. At 1003, the receiver RDMA NIC 122 waits to receive a read request from the sender 110. At 1004, a read request is sent from the RDMA NIC 112 at the sender 110 to the RDMA NIC 122 at the receiver 120, where it is received. At 1005, when the read request is received, a counter is set to zero, and at 1006 the RDMA NIC 122 checks whether the counter is equal to a predefined number X. At 1007, when the counter differs from X, the RDMA NIC 122 extracts data from each packet of the object, calculates the hash function for each packet, and updates the calculated hash function value for the current packet at a temporary memory address in the RNIC 122. When the RNIC 122 reaches the last packet of the object, it verifies whether the updated hash function value matches the value in the hash value field in the header of the object. When the updated hash function value stored in the temporary memory of the RNIC 122 is not equal to the value of the hash value field in the header of the object, at 1008 the RDMA NIC 122 waits for a predefined time interval T and increments the counter by 1. The RDMA NIC 122 then returns to 1006 and attempts verification again. After the receiver RDMA NIC 122 has tried and failed the predefined number of times X, that is, when the counter is equal to X, a failure response is sent to the sender 110 at 1009, and at 1010 the RDMA NIC 122 returns to 1003 and waits for a new read request. When the updated hash function value stored in the temporary memory of the RNIC 122 is equal to the value of the hash value field in the header of the object, at 1011 the RNIC 122 sends the object to the sender 110 in response to the request to read the object. The object is then sent to the application 114 at 1012. Then, at 1013, the RDMA NIC 122 returns to 1003 and waits for a new read request.
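The counter-and-wait loop of steps 1005 to 1011 may be sketched as below. This is a minimal sketch under stated assumptions: CRC-32 stands in for the unspecified hash function, and `read_snapshot` is a hypothetical callback returning the header hash and the packets as they currently appear in memory, so that a concurrent writer can change the result between attempts.

```python
import time
import zlib
from typing import Callable, List, Optional, Tuple

Snapshot = Tuple[int, List[bytes]]  # (header hash value, packets of the object)


def verify_with_retry(read_snapshot: Callable[[], Snapshot],
                      x: int = 3,
                      t_seconds: float = 0.0) -> Tuple[bool, Optional[bytes]]:
    """Lock-free read check: retry up to X times, waiting T between attempts."""
    counter = 0                                 # step 1005
    while counter != x:                         # step 1006
        header_hash, packets = read_snapshot()  # step 1007
        running = 0
        for payload in packets:
            running = zlib.crc32(payload, running)
        if running == header_hash:
            return True, b"".join(packets)      # step 1011: send the object
        time.sleep(t_seconds)                   # step 1008: wait interval T
        counter += 1                            # step 1008: increment counter
    return False, None                          # step 1009: failure response
```

A mismatch is treated as a sign that a write is still in progress, which is why the loop waits rather than failing immediately.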

According to some other embodiments of the present invention, in an RDMA write request the hash function is calculated by the RNIC 122 in the receiver 120. FIG. 11 schematically shows a flowchart of an example of an RDMA write request provided by some embodiments of the present invention, in which the hash function is calculated in the receiver RNIC 122. At 1101, an RDMA write request is sent from the sender 110 to the receiver 120. In the write request of the example of FIG. 11, the data packets are divided into write first, write middle, write last, and immediate data (for example, in InfiniBand (IB)/RDMA over converged Ethernet (RoCE)/RoCE version 2 (RoCEv2)). The write request is received at the receiver RNIC 122, which at 1102 extracts data from all data packets of the object in the memory 121 and calculates the hash function for each data packet of the object. The result of each calculation is stored in a temporary memory of the RNIC 122, for example in the QP context (QPC), and is updated as each data packet is processed. When the hash function of the last data packet has been calculated, at 1103 the result is updated, for example in the QPC, and the final hash value is then written to the hash value field in the header of the object. At 1104, the RNIC 122 writes the object in the memory 121, and a success response is sent to the sender 110.
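The receiver-side write path of FIG. 11 may be illustrated by the following sketch. It assumes CRC-32 as the hash function and models the QPC as a single running value; the class `ReceiverRnic` and its dictionary-backed memory are hypothetical simplifications, not the claimed implementation.

```python
import zlib
from typing import Dict, List, Optional, Tuple


class ReceiverRnic:
    """Hypothetical receiver RNIC that hashes a write as its packets arrive."""

    def __init__(self) -> None:
        self.qpc_hash = 0                # running hash kept in the QP context
        self.pending: List[bytes] = []   # payloads of the write in progress
        # address -> (hash value field of the header, object data)
        self.memory: Dict[int, Tuple[int, bytes]] = {}

    def on_write_packet(self, address: int, payload: bytes,
                        last: bool) -> Optional[str]:
        # Step 1102: fold each packet into the running hash in the QPC.
        self.qpc_hash = zlib.crc32(payload, self.qpc_hash)
        self.pending.append(payload)
        if not last:
            return None
        # Steps 1103 and 1104: store the final hash in the header and
        # write the object, then report success to the sender.
        self.memory[address] = (self.qpc_hash, b"".join(self.pending))
        self.qpc_hash, self.pending = 0, []
        return "SUCCESS"
```

For example, feeding a write first, write middle, and write last packet in turn leaves one entry in `memory` whose header hash covers all three payloads.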

FIG. 12 schematically shows a sequence diagram of an example, provided by some embodiments of the present invention, of writing an object in an RDMA transaction using a hash function calculated at the receiver RNIC. At 1201, a QP is created on the application 114 and sent to the RNIC 112 of the sender 110. At 1202, a write request is sent from the application 114 to the RNIC 112 to write to address A of length L in the memory 121. When the QP is created at the sender, a QP is created in parallel at the receiver 120; the QP in the receiver is created using a hash function calculated on an object stored in the memory 121. The result of the hash function is stored, together with the header length, in the hash value field in the header of the object. At 1203, the receiver RNIC 122 waits to receive a write request from the sender 110. At 1204, a write request is sent from the RNIC 112 at the sender 110 to the RNIC 122 at the receiver 120, where it is received. At 1205, upon receiving the write request, the RNIC 122 extracts data from all data packets of the object in the memory 121 and calculates the hash function for each data packet of the object. The result of each calculation is stored in a temporary memory of the RNIC 122, for example in the QP context (QPC), and is updated as each data packet is processed. When the hash function of the last data packet has been calculated, the result is updated, for example in the QPC. At 1206, the RNIC 122 writes the object in the memory 121, and at 1207 a success response is sent to the sender 110.

According to some other embodiments of the present invention, in the case of an RDMA write request, the hash function may be calculated at the sender RNIC 112 instead of at the receiver RNIC 122. In this case, the receiver 120 receives from the sender 110 a request to write an object to a memory address in the memory 121, together with a hash function value calculated by the sender RNIC 112 over the data in each data packet of the object. The receiver RNIC 122 writes the object to the memory 121 and sends a success response to the sender 110.

According to some embodiments of the present invention, the hash function is calculated in the sender 110 rather than in the receiver 120. FIG. 13 schematically illustrates a sequence diagram of an example, provided by some embodiments of the present invention, of writing an object in an RDMA transaction using a hash function calculated at the sender RNIC. At 1301, a QP is created on the application 114 and sent to the RNIC 112 of the sender 110. The QP in the sender is created using a hash function calculated on an object stored in the memory 111. At 1302, the final value of the hash function is stored, together with the header length, in the hash offset field of the header of the object. At 1303, a write request is sent from the application 114 to the RNIC 112 to write to address A of length L in the memory 121. At 1304, the hash function is calculated over the entire object, and at 1305 the write request is sent from the sender RNIC 112 to the receiver RNIC 122. When the QP is created at the sender, a QP is created in parallel at the receiver 120. At 1306, the receiver RNIC 122 waits to receive a write request from the sender 110. At 1307, upon receiving the write request, the RNIC 122 writes the object to the memory 121, inserts the hash function value calculated at the sender RNIC 112 into the hash field in the header of the object written to the memory 121, and sends a success response to the sender 110.

FIG. 14 schematically illustrates a flowchart of an example of an RDMA write request over RoCEv2 provided by some embodiments of the present invention, in which the hash function is calculated in the sender RNIC 112. At 1401, a write-and-checksum opcode is created. In this opcode, the last SGE specifies the destination virtual address (VA) to which the data must be written, rather than the source location from which the data is fetched. This is because in this case the data (that is, the hash function value) is stored in the QPC; it is calculated by the sender RNIC 112 and is therefore already known to the RNIC 112. At 1402, the RNIC 112 reads data from the memory 111, the data being divided into scatter gather entries (SGEs). At 1403, the RNIC 112 extracts the data from each SGE, calculates the hash function, and stores the result in a temporary memory within the QPC of the RNIC 112. At 1404, the RNIC 112 sends the data packets it has read to the network as a write request, divided into write first, write middle, and write last. The destination VA is written in the last SGE read by the RNIC 112, so the RNIC knows the destination, that is, the address to which the object is written. In addition, at 1405 the last SGE is read by the RNIC 112, and at 1406 the final value of the hash function stored in the QPC of the RNIC 112 is written into an additional "write only" data packet, which at 1407 is sent to the network after the write first, write middle, and write last data packets. At 1408, when there is an immediate data packet, it is sent after the "write only" packet.
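The sender-side packetization of FIG. 14 may be sketched as follows. The sketch makes several assumptions that are not taken from the specification: CRC-32 as the hash function, one SGE per data packet, at least two SGEs, and a simplified packet representation `(opcode, destination VA, payload)`. The function name and the parameters `data_va` and `hash_va` are hypothetical.

```python
import zlib
from typing import List, Tuple

Packet = Tuple[str, int, bytes]  # (opcode, destination VA, payload)


def build_write_with_checksum(sges: List[bytes], data_va: int,
                              hash_va: int) -> List[Packet]:
    """Emit write first/middle/last packets, then a 'write only' hash packet."""
    if len(sges) < 2:
        raise ValueError("this sketch assumes at least two SGEs")
    packets: List[Packet] = []
    qpc_hash = 0   # running hash held in the QPC (step 1403)
    offset = 0
    for index, data in enumerate(sges):
        qpc_hash = zlib.crc32(data, qpc_hash)
        if index == 0:
            opcode = "WRITE_FIRST"
        elif index == len(sges) - 1:
            opcode = "WRITE_LAST"
        else:
            opcode = "WRITE_MIDDLE"
        packets.append((opcode, data_va + offset, data))  # step 1404
        offset += len(data)
    # Steps 1406 and 1407: the final hash travels in an extra packet whose
    # destination VA is the one carried by the last SGE.
    packets.append(("WRITE_ONLY", hash_va, qpc_hash.to_bytes(4, "big")))
    return packets
```

Because the hash is already held in the QPC, the extra packet needs only a destination address and no source buffer, which matches the reasoning given for the last SGE.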

Reference is now made to FIG. 15, which schematically shows a flowchart of a method, provided by some embodiments of the present invention, for checking NIC consistency in an RDMA transaction by a hash function when a read request is sent. At 1501, the RNIC 112 of the sender 110 sends the receiver 120 a read request to read an object from a memory address in the memory 121. At 1502, the RNIC 122 of the receiver 120 extracts data from each data packet of the object and calculates a hash function value for the data in each data packet of the object. The calculated hash function value is stored in the QPC using the temporary memory of the RNIC 122: for each data packet the hash function value is updated in the QPC, and at the last data packet the RNIC 122 verifies whether the updated hash function value matches the value in the hash value field in the header of the object. At 1503, when the hash value matches the value in the hash value field in the header of the object, the sender 110 receives the object (that is, the successfully read object) from the receiver at 1504. However, at 1505, when the calculated hash value does not match the value in the hash value field of the header of the object, the sender 110 receives a failure response at 1506. According to some embodiments of the present invention, the hash function may be calculated by the RNIC 112 of the sender 110.

According to some embodiments of the present invention, since the RDMA protocol defines specific read behavior, the RNIC enables the receiver to return a special read response (without a payload) for failure detection. The acknowledgment extended transport header (AETH) field can be used for this purpose. A sender that obtains the special read response or the AETH syndrome field therefore skips the packet sequence number (PSN) gap reserved for that particular read response, just as if it had obtained all of the packets (PSNs) from the receiver. Even if the read operation was not signaled for completion generation, the sender uses the completion queue to report the failure to the application as an error.
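The PSN arithmetic implied by skipping the reserved gap may be illustrated as below. This is a sketch under assumptions: it uses the 24-bit PSN space of InfiniBand and RoCE, and assumes that a read of a given length reserves one PSN per path-MTU-sized response packet, with a zero-length read reserving one. The function name is hypothetical.

```python
PSN_MODULUS = 1 << 24  # PSNs are 24-bit values and wrap around


def next_expected_psn(first_psn: int, read_length: int, path_mtu: int) -> int:
    """PSN the sender expects after a read response, or after skipping it."""
    # Ceiling division: number of response packets reserved for the read.
    reserved = max(1, -(-read_length // path_mtu))
    return (first_psn + reserved) % PSN_MODULUS


# A failed 4096-byte read at PSN 100 with a 1024-byte MTU still advances
# the expected PSN by four, as if all four response packets had arrived.
print(next_expected_psn(100, 4096, 1024))
```

Advancing by the full reserved range keeps later requests on the same QP aligned with the PSNs the receiver will use for them.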

Reference is now made to FIG. 16, which schematically shows a flowchart of a method, provided by some embodiments of the present invention, for sending a read before write (RBW) request. An RBW request is a read request that carries a notification that a write request is expected to follow the read request. At 1601, the RNIC 122 receives, from one or more senders, an RBW request to read an object from a memory address in the memory 121, where the data packet of the RBW request includes a bit indicating that a request to write the object to the memory address is expected from the one or more senders. At 1602, the RNIC 122 allocates rows in a table, each row holding the identification (ID) of a sender and a timestamp of the exact time at which that sender's request to read the object from the memory address was received. At 1603, a request to write the object to the memory address is received from another sender, where the other sender has a row in the table for its request to read the object from the memory address, with a timestamp that is smaller than the timestamps of the requests of the one or more senders. In this case, at 1604, according to some embodiments of the present invention, the RNIC 122 sends a notification to the one or more senders, according to the sender IDs in the table, so that they refrain from sending a request to write to the memory address, because the content of the memory address has changed. Then, at 1605, the RNIC 122 deletes from the table the row of the other sender as well as the rows holding the IDs and timestamps of the one or more senders, because the RBW request of the other sender is complete and the write requests of the one or more senders have been avoided. In some embodiments of the present invention, the one or more senders send an RBW request to read an object from the memory, where the data packet of the RBW request includes a bit indicating that a request to write another object to the memory address is expected from the one or more senders.

In some other embodiments of the present invention, after the one or more senders send the RBW request, the one or more senders may send a zero read request indicating that no request to write the object to the memory address is expected from the one or more senders. In this case, the receiver RNIC 122 receives the zero read request from the one or more senders and deletes from the table the rows holding the IDs and timestamps of the one or more senders.

FIGS. 17a to 17g schematically illustrate an example of a method for a read before write (RBW) request provided by some embodiments of the present invention. FIG. 17a shows sender B 1702 sending the RNIC 1705 of the receiver 1704 an RBW request to read from address 0x1234. According to some embodiments of the present invention, the read request also includes a bit indicating that a write request is expected from sender B 1702; when this bit is enabled, the read request is an RBW request. The RNIC 1705 receives the request and allocates a row for sender B 1702 in the table 1706. The row stores the ID of sender B 1702, the timestamp 1111 at which the RBW request from sender B 1702 was received, and the address 0x1234 on which the read and write operations will be performed. FIG. 17b schematically shows sender C 1703 sending the RNIC 1705 of the receiver 1704 an RBW request to read from address 0x1234. The read request also includes a bit indicating that a write request is expected from sender C 1703. The RNIC 1705 receives the RBW request and allocates a row for sender C 1703 in the table 1706. The row stores the ID of sender C 1703, the timestamp 1130 at which the RBW request from sender C 1703 was received, and the address 0x1234 on which the read and write operations will be performed. FIG. 17c schematically shows sender A 1701 sending the RNIC 1705 of the receiver 1704 a request to read from address 0x1234. This read request does not include any indication that a write request is expected later from sender A 1701. The RNIC 1705 receives the request, but no row is allocated for it in the table 1706.

FIG. 17d schematically illustrates sender B 1702 sending a write request to address 0x1234, as indicated in the RBW request previously sent by sender B 1702 (shown in FIG. 17a). FIG. 17e schematically illustrates the row of sender B 1702 being deleted, because the RBW request is complete: the write request of sender B 1702 has been completed and the write operation has been performed on the content of address 0x1234. Since the content of address 0x1234 has changed, the RNIC 1705 notifies the remaining senders in the table that are expected to write to address 0x1234 (in this case, sender C 1703) that the content of address 0x1234 has changed, and deletes the row of sender C 1703 from the table 1706. Owing to this notification, the write request expected from sender C 1703 is avoided. In FIG. 17f, sender A 1701 sends a write request to address 0x1234. However, since the content of the address changed in the time between the read request from sender A 1701 and the write request from sender A 1701, the write request fails. FIG. 17g schematically shows the RNIC 1705 sending a failure notification to sender A 1701. As this example shows, using a read request that carries a bit indicating an expected write request avoids sending a write request, having it fail, and sending a failure notification when the content of the address has already changed. This saves time and computing resources.
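The bookkeeping of table 1706 in the example of FIGS. 17a to 17e may be sketched as follows. The class `RbwTable` and its method names are hypothetical, and the sketch assumes that every other sender holding a row for the written address is notified; it does not model how the stale write of sender A in FIG. 17f is detected, which the description leaves open.

```python
from typing import Dict, List, Tuple


class RbwTable:
    """Hypothetical model of the RBW table kept by the receiver RNIC."""

    def __init__(self) -> None:
        # sender ID -> (address, timestamp of the RBW request)
        self.rows: Dict[str, Tuple[int, int]] = {}

    def on_read(self, sender_id: str, address: int, timestamp: int,
                rbw_bit: bool) -> None:
        # A row is allocated only when the RBW bit is set (FIGS. 17a, 17b).
        if rbw_bit:
            self.rows[sender_id] = (address, timestamp)

    def on_zero_read(self, sender_id: str) -> None:
        # The sender no longer intends to write: drop its row.
        self.rows.pop(sender_id, None)

    def on_write(self, sender_id: str, address: int) -> List[str]:
        """Complete an announced write; return the senders to notify."""
        row = self.rows.get(sender_id)
        if row is None or row[0] != address:
            return []  # the write was not announced by an RBW request
        del self.rows[sender_id]
        # Every other sender waiting on this address now holds stale data.
        stale = sorted(s for s, (a, _) in self.rows.items() if a == address)
        for other in stale:
            del self.rows[other]
        return stale
```

Replaying the example: after RBW reads from B (timestamp 1111) and C (timestamp 1130) and a plain read from A, a write from B returns `["C"]` and leaves the table empty.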

Reference is now made to FIG. 18, which schematically shows a flowchart of a method, provided by some embodiments of the present invention, for optimizing the flow of a compare and swap (CAS) operation followed by a read request into a single operation in an RDMA transaction. In a two-phase locking (2PL) protocol, an object must be locked before its contents are read. When the object resides on a remote node, this is currently accomplished with two one-sided RDMA requests: a CAS request (to lock the object) followed by a read request. This requires two round trips from the local node L to the remote node R. The optimized CAS and read operation is a new opcode, CAS-and-READ, that combines the two operations: if the CAS at address X succeeds, N bytes are read from address Y; if it does not succeed, a failure is returned. At 1801, the RNIC 122 receives from the sender 110 a request for an optimized compare and swap and read (CAS and read) operation. At 1802, the RNIC 122 compares the content of a first memory address with a first value. At 1803, when the content of the first memory address is equal to the first value, the RNIC 122 replaces the content of the first memory address with a second value. At 1804, the RNIC 122 reads the content of a second memory address, and at 1805 the RNIC 122 sends the sender 110 a success response together with the content read from the second memory address. However, at 1806, when the content is not equal to the first value, the RNIC 122 sends a failure response to the sender 110.
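The semantics of the combined opcode, as described in steps 1802 to 1806, may be sketched as follows. The sketch models the receiver memory as a `bytearray` and the CAS target as a 64-bit little-endian word; both choices, along with the function name `cas_and_read`, are assumptions made for illustration.

```python
from typing import Optional, Tuple

CAS_WIDTH = 8  # bytes; assumes a 64-bit compare-and-swap word


def cas_and_read(mem: bytearray, cas_addr: int, expected: int, new_value: int,
                 read_addr: int, length: int) -> Tuple[str, Optional[bytes]]:
    """If the CAS at cas_addr succeeds, read `length` bytes from read_addr."""
    current = int.from_bytes(mem[cas_addr:cas_addr + CAS_WIDTH], "little")
    if current != expected:                      # step 1802
        return "FAIL", None                      # step 1806
    # Step 1803: swap in the second value (for example, take the lock).
    mem[cas_addr:cas_addr + CAS_WIDTH] = new_value.to_bytes(CAS_WIDTH, "little")
    # Steps 1804 and 1805: read the object and return it with the success.
    return "OK", bytes(mem[read_addr:read_addr + length])
```

A caller that uses word value 0 for "unlocked" and 1 for "locked" obtains the lock and the object in one round trip; a second identical call fails because the word is no longer 0.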

FIG. 19 schematically illustrates a sequence diagram of the optimized compare and swap (CAS) and read operation provided by some embodiments of the present invention. At 1901, a QP is created at the sender 110 and at the receiver 120. At 1902, the RNIC 122 in the receiver 120 waits for a CAS and read request to arrive from the sender. At 1903, the CAS and read operation is sent from the application 114 to the RNIC 112 in the sender 110. The CAS and read operation includes the compare and swap destination address, a first value to be compared with the content of that destination address, a second value to replace the content of the CAS destination address, the read destination address, and the read length. At 1904, the CAS and read request is sent from the RNIC 112 in the sender to the RNIC 122 in the receiver. At 1905, after receiving the CAS and read request, the RNIC 122 goes to the CAS address, and at 1906 the RNIC 122 compares the content of the CAS address with the first value. At 1907, when the content of the CAS address is not equal to the first value, the RNIC 122 sends a failure response to the RNIC 112 in the sender 110, and at 1908 the failure response is sent to the application 114. However, when the content of the CAS address is equal to the first value, the RNIC 122 replaces the content of the CAS address with the second value. Then, at 1909, the RNIC 122 goes to the second memory address (the read address) and reads the content of the second memory address. Then, at 1910, the RNIC 122 checks whether the read operation succeeded. When the read fails and the content of the second memory address is not successfully read, at 1911 a failure response is sent to the RNIC 112 and forwarded from the RNIC 112 to the application 114. However, when the content of the second memory address is read successfully, at 1913 the RNIC 122 sends the sender 110 a success response with the content read from the second memory address. Then, at 1914, the success response is sent to the application 114.

Other systems, methods, features, and advantages of the present invention will be or become apparent to those skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.

The descriptions of the various embodiments of the present invention are presented for purposes of illustration, and are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.

It is expected that during the life of a patent maturing from this application, many relevant methods and apparatuses for checking NIC object consistency, reducing computing resources and network bandwidth, and shortening the latency of large data packets will be developed, and the scope of the terms methods and apparatuses for checking NIC object consistency, reducing computing resources and network bandwidth, and shortening the latency of large data packets is intended to include all such new technologies a priori.

As used herein, the term "about" refers to ± 10%.

The terms "including", "having", and variations thereof mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".

The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.

As used herein, the singular forms "a", "an", and "the" include plural references unless the context clearly indicates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.

The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.

As used herein, the word "optionally" means "provided in some embodiments and not provided in other embodiments". Any particular embodiment of the present invention may include a plurality of "optional" features, unless such features conflict.

Throughout this application, embodiments of the present invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the present invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, the description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as 1 to 3, 1 to 4, 1 to 5, 2 to 4, 2 to 6, and 3 to 6, as well as individual numbers within that range, for example 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Whenever a numerical range is indicated herein, it is meant to include any cited number (fractional or integer) within the indicated range. The phrases "a range between a first indicated number and a second indicated number" and "a range from a first indicated number to a second indicated number" are used interchangeably herein and are meant to include the first and second indicated numbers and all fractions and integers between them.

It should be understood that certain features of the present invention which are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the present invention which are, for brevity, described in the context of a single embodiment may also be provided separately, in any suitable sub-combination, or as suitable in any other described embodiment of the present invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

All publications, patents and patent applications mentioned in this specification are incorporated herein by reference in their entirety, to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document or documents of this application are incorporated herein by reference in their entirety.

Claims (23)

1. An apparatus for receiving a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the apparatus being configured to:
receive a request from a sender to read an object from a memory address;
check whether the header version is unlocked, extract a version number Vobj from each cache line, and verify whether the Vobj matches the least significant byte (LSB) of the version field of the header of the object;
when the Vobj in each cache line matches the value in the version field:
delete the Vobj from each cache line of the object,
read data of the object, and
transmit the object to the sender; and
when the header version is locked or the Vobj in each cache line does not match the value in the version field, retry a predefined number of times to verify whether the header version is unlocked and whether each Vobj matches the value in the version field in the header of the object, and send a failure response when there is no match.
2. The RNIC of claim 1, wherein the RNIC is further configured to:
receive a request from a sender to write an object to the memory address;
extract the value in the version field of the header of the object, write the value into each cache line of the object, and write the object to the memory address; and
send a success response to the sender.
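The version check recited in claims 1 and 2 can be illustrated with a minimal sketch. The claims do not fix a memory layout, so the following are assumptions made only for illustration: a 64-byte cache line, the Vobj stored in the last byte of each line, and a hypothetical `load` callable that stands in for one read of the header and the object body. The function names `write_versioned` and `read_versioned` are likewise illustrative.

```python
CACHE_LINE = 64            # assumed cache-line size in bytes
PAYLOAD = CACHE_LINE - 1   # assumed layout: last byte of each line holds Vobj


def write_versioned(version, data):
    """Stamp the LSB of the header version into every cache line (claim 2)."""
    vobj = version & 0xFF
    lines = []
    for off in range(0, len(data), PAYLOAD):
        chunk = data[off:off + PAYLOAD].ljust(PAYLOAD, b"\x00")
        lines.append(chunk + bytes([vobj]))
    return lines


def read_versioned(load, length, retries=3):
    """Validate each line's Vobj, then strip it from the data (claim 1).

    `load` is a hypothetical callable returning (version, locked, lines).
    Returns the payload on success, or None to signal a failure response.
    """
    for _ in range(retries + 1):
        version, locked, lines = load()
        if locked:
            continue  # header version is locked: retry
        if all(line[-1] == (version & 0xFF) for line in lines):
            return b"".join(line[:-1] for line in lines)[:length]
    return None
```

In this sketch a line left over from an older write carries a stale Vobj, so a torn object is detected without the reader taking any lock; the retry count plays the role of the "predefined number of times" in claim 1.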
3. An apparatus for receiving a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the apparatus being configured to:
receive a request from a sender to read an object from a memory address;
extract data from each data packet of the object;
calculate a hash function value for the data in each data packet of the object, and update the calculated hash function value of the current data packet at a temporary RNIC memory address;
verify, at the last data packet, whether the updated hash function value matches a value in a hash value field in the header of the object;
transmit the object to the sender when the hash value matches the value in the hash value field in the header of the object; and
when the calculated hash value does not match the value in the hash value field of the header of the object, retry a predefined number of times to:
calculate the hash function value for the data in each data packet, update the calculated hash function value of the current data packet at the temporary RNIC memory address, and verify, at the last data packet, whether the updated hash function value matches the value in the hash value field in the header of the object, and send a failure response when there is no match.
4. The RNIC of claim 3, wherein the RNIC is further configured to:
receive a request from a sender to write an object to the memory address;
calculate the hash function value for the data in each data packet of the object, update the calculated hash function value for the current data packet at a temporary RNIC memory address, and, at the last data packet, update the calculated hash function value in the hash value field in the header of the object;
write the object to the memory address; and
send a success response.
5. The RNIC of claim 4, wherein the RNIC is further configured to:
receive the request from the sender to write an object to the memory address, wherein the hash function value is calculated by the sender RNIC for the data in each data packet of the object, the calculated hash function value of the current data packet is updated at a temporary sender RNIC memory address, and, at the last data packet, the updated hash function value is written into the hash value field in the header of the object;
write the object to the memory address; and
send a success response to the sender.
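The per-packet hash update recited in claims 3 to 5 may be sketched as follows. The claims do not name a hash function or a packet size, so SHA-256 from the Python standard library and a 1024-byte packet are stand-ins chosen for illustration; the running `hashlib` object plays the role of the temporary RNIC memory address, and `packetize`, `write_hashed`, `read_hashed` and the `load` callable are hypothetical names.

```python
import hashlib

MTU = 1024  # assumed packet payload size in bytes


def packetize(data, mtu=MTU):
    """Split an object body into packet-sized chunks."""
    return [data[i:i + mtu] for i in range(0, len(data), mtu)]


def hash_packets(packets):
    """Fold every packet into one running hash (the 'temporary' value)."""
    running = hashlib.sha256()  # stand-in; the claims name no hash function
    for packet in packets:
        running.update(packet)
    return running.digest()


def write_hashed(data):
    """Store the final hash in the header's hash value field (claim 4)."""
    return {"hash": hash_packets(packetize(data)), "data": data}


def read_hashed(load, retries=3):
    """Recompute the hash and compare at the last packet (claim 3).

    `load` is a hypothetical callable returning the stored object.
    Returns the data on success, or None to signal a failure response.
    """
    for _ in range(retries + 1):
        obj = load()
        if hash_packets(packetize(obj["data"])) == obj["hash"]:
            return obj["data"]
    return None
```

Because the hash is updated packet by packet, the comparison against the header field costs no second pass over the object; a mismatch simply triggers another read attempt.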
6. The apparatus of claim 3, wherein the apparatus is further configured to:
receive, from one or more senders, a read-before-write (RBW) request to read an object from the memory address, wherein a packet of the RBW request includes a bit indicating that a request from the one or more senders to write another object to the memory address is to be expected;
allocate a row in a table, the row holding an identification (ID) of the one or more senders, a timestamp of the received RBW request, and the memory address from which the object was read for each sender; and
when a request to write an object to the memory address is received from another sender, wherein the other sender has a row in the table for an RBW request to read the object from the memory address with a timestamp that is less than the timestamp of the requests of the one or more senders:
send a notification to the one or more senders, according to the sender IDs in the table, to refrain from sending a request to write to the memory address; and
delete from the table the row of the other sender and the rows holding the IDs and timestamps of the one or more senders.
7. The apparatus of claim 6, wherein the apparatus is further configured to:
send, from one or more senders, RBW requests to read the object from the memory address, wherein a packet of each RBW request includes a bit indicating that a request from the one or more senders to write another object to the memory address is to be expected.
8. The apparatus of claim 7, wherein the apparatus is further configured to:
send a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more senders.
9. The apparatus of claim 7, wherein the apparatus is further configured to:
receive a zero read request from the one or more senders, the zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more senders; and
delete from the table the rows holding the IDs and timestamps of the one or more senders.
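One possible reading of the RBW table of claims 6 to 9 is sketched below. The class `RbwTable`, its method names, and the choice to key rows by (sender ID, memory address) are assumptions for illustration only; the claims specify just that each row holds a sender ID, a timestamp and a memory address.

```python
class RbwTable:
    """Tracks senders that read an address and announced a later write."""

    def __init__(self):
        self.rows = {}  # (sender_id, address) -> timestamp of RBW request

    def on_rbw_read(self, sender_id, address, timestamp):
        """Allocate a row for a read-before-write request (claim 6)."""
        self.rows[(sender_id, address)] = timestamp

    def on_zero_read(self, sender_id, address):
        """Sender no longer intends to write: drop its row (claim 9)."""
        self.rows.pop((sender_id, address), None)

    def on_write(self, writer_id, address):
        """Handle a write; return the sender IDs that must be notified."""
        writer_ts = self.rows.get((writer_id, address))
        if writer_ts is None:
            return []  # writer made no RBW request for this address
        # Senders whose RBW request is newer than the writer's read a value
        # that this write makes stale, so they are told not to write.
        stale = sorted(
            sid for (sid, addr), ts in self.rows.items()
            if addr == address and sid != writer_id and ts > writer_ts
        )
        for sid in stale:
            del self.rows[(sid, address)]
        del self.rows[(writer_id, address)]
        return stale
```

For example, if senders A and B both issue RBW requests for the same address with timestamps 1 and 2, a subsequent write from A returns `["B"]` and leaves the table empty for that address.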
10. An apparatus for transmitting a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the apparatus being configured to:
transmit a request to a receiver to read an object from a memory address, wherein a version number Vobj is extracted by the receiver from each cache line of the object, and the Vobj of each cache line is verified as matching the LSB value in a version field of a header of the object; and
receive the object from the receiver when the Vobj in each cache line matches the LSB value in the version field, wherein the receiver deletes the Vobj from each cache line of the object; or
receive a failure response from the receiver when the Vobj in each cache line does not match the LSB value in the version field.
11. The RNIC of claim 10, wherein the RNIC is further configured to:
transmit a request to a receiver to write an object to the memory address, wherein the LSB value in the version field of the header of the object is extracted by the receiver and written into each cache line of the object, and the object is written to the memory address; and
receive a success response from the receiver.
12. An apparatus for transmitting a plurality of transactions, comprising a remote direct memory access (RDMA) network interface card (RNIC), the apparatus being configured to:
transmit a request to a receiver to read an object from a memory address, wherein data is extracted by the receiver from each data packet of the object, a hash function value is calculated for the data in each data packet of the object, and, at the last data packet, the receiver verifies whether the updated hash function value matches a value in a hash value field in a header of the object; and
receive the object from the receiver when the hash value matches the value in the hash value field of the header of the object; or
receive a failure response when the calculated hash value does not match the value in the hash value field of the header of the object.
13. The RNIC of claim 12, wherein the RNIC is further configured to:
transmit a request to the receiver to write the object to the memory address, wherein the hash function value is calculated by the receiver for the data in each data packet of the object, the calculated hash function value for a current data packet is updated by the receiver at a temporary receiver RNIC memory address, and, at the last data packet, the calculated hash function value is updated in the hash value field of the header of the object; and
receive a success response from the receiver after the object is written to the memory address by the receiver.
14. The RNIC of claim 13, wherein the RNIC is further configured to:
calculate the hash function value for the data in each data packet of the object, and update the calculated hash function value of the current data packet at a temporary RNIC memory address;
write the updated hash function value to the hash value field of the header of the object at the last data packet;
transmit the request to write the object to the receiver; and
receive a success response from the receiver after the object is written to the memory address by the receiver.
15. An apparatus for optimizing compare and exchange operations and streaming operations of read requests as a single operation in a remote direct memory access (RDMA) transaction, comprising an RDMA network interface card (RNIC) configured to:
receive a request from a sender for an optimized compare and exchange and read operation;
compare the content of a first memory address with a first value;
when the content of the first memory address is equal to the first value:
replace the content of the first memory address with a second value,
read the content of a second memory address, and
transmit to the sender a success response and the content read from the second memory address; and
when the content of the first memory address is not equal to the first value:
send a failure response to the sender.
16. A method for receiving a plurality of transactions, comprising:
at a remote direct memory access (RDMA) network interface card (RNIC):
receiving a request from a sender to read an object from a memory address;
extracting a version number Vobj from each cache line of the object and verifying whether the Vobj of each cache line matches a least significant byte (LSB) value in a version field of a header of the object;
when the Vobj in each cache line matches the value in the version field:
deleting the Vobj from each cache line of the object,
reading data of the object, and
transmitting the object to the sender; and
when the Vobj in each cache line does not match the LSB value in the version field, retrying a predefined number of times to verify whether each Vobj matches the LSB value in the version field in the header of the object, and sending a failure response when there is no match.
17. The method of claim 16, further comprising:
receiving a request from a sender to write an object to the memory address;
extracting the LSB value in the version field of the header of the object, writing the value into each cache line of the object, and writing the object to the memory address; and
sending a success response to the sender.
18. A method for receiving a plurality of transactions, comprising:
at a remote direct memory access (RDMA) network interface card (RNIC):
receiving a request from a sender to read an object from a memory address;
extracting data from each data packet of the object;
calculating a hash function value for the data in each data packet of the object, and updating the calculated hash function value of the current data packet at a temporary RNIC memory address;
verifying, at the last data packet, whether the updated hash function value matches a value in a hash value field in the header of the object;
transmitting the object to the sender when the hash value matches the value in the hash value field in the header of the object; and
when the calculated hash value does not match the value in the hash value field of the header of the object, retrying a predefined number of times:
calculating the hash function value for the data in each data packet, updating the calculated hash function value of the current data packet at the temporary RNIC memory address, and verifying, at the last data packet, whether the updated hash function value matches the value in the hash value field in the header of the object, and sending a failure response when there is no match.
19. The method of claim 18, further comprising:
receiving a request from a sender to write an object to the memory address;
calculating the hash function value for the data in each data packet of the object, updating the calculated hash function value for the current data packet at a temporary RNIC memory address, and, at the last data packet, updating the calculated hash function value in the hash value field in the header of the object; and
writing the object to the memory address and sending a success response.
20. The method of claim 19, further comprising:
calculating, at a sender RNIC, the hash function value for the data in each data packet of the object, and updating the calculated hash function value for a current data packet at a temporary sender RNIC memory address;
writing the updated hash function value to the hash value field of the header of the object at the last data packet;
transmitting the request to write the object to the receiver;
receiving, at the receiver RNIC, the request to write the object to the memory address; and
sending a success response to the sender.
21. A method for transmitting a plurality of transactions, comprising:
at a remote direct memory access (RDMA) network interface card (RNIC):
transmitting a request to a receiver to read an object from a memory address, wherein a version number Vobj is extracted from each cache line of the object by a receiver RNIC, and the Vobj of each cache line is verified by the receiver RNIC as matching the LSB value in a version field of a header of the object; and
receiving the object from the receiver RNIC when the Vobj in each cache line matches the LSB value in the version field, wherein the receiver deletes the Vobj from each cache line of the object; or
receiving a failure response from the receiver RNIC when the Vobj in each cache line does not match the LSB value in the version field.
22. A method for transmitting a plurality of transactions, comprising:
at a remote direct memory access (RDMA) network interface card (RNIC):
transmitting a request to a receiver RNIC to read an object from a memory address, wherein data is extracted from each data packet of the object by the receiver RNIC, a hash function value is calculated for the data in each data packet of the object, and, at the last data packet, the receiver RNIC verifies whether the updated hash function value matches the value in the hash value field in the header of the object; and
receiving the object from the receiver when the hash value matches the value in the hash value field of the header of the object; or
receiving a failure response when the calculated hash value does not match the value in the hash value field of the header of the object.
23. A method for optimizing compare and exchange operations and streaming operations of read requests as a single operation in a remote direct memory access (RDMA) transaction, comprising:
at a receiver, when a request for an optimized compare and exchange and read operation is received from a sender:
comparing the content of a first memory address with a first value;
when the content of the first memory address is equal to the first value:
replacing the content of the first memory address with a second value,
reading the content of a second memory address, and
transmitting a success response and the content read from the second memory address to the sender; and
when the content of the first memory address is not equal to the first value:
sending a failure response to the sender.
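The combined compare and exchange and read operation of claims 15 and 23 can be sketched as a single atomic step. This is an illustration under stated assumptions, not the claimed hardware: the `Memory` class, its dictionary of cells and the `cas_and_read` method are hypothetical, and a host-side lock stands in for whatever atomicity the RNIC provides.

```python
import threading


class Memory:
    """Toy memory model; a lock stands in for RNIC-side atomicity."""

    def __init__(self, cells=None):
        self.cells = dict(cells or {})
        self._lock = threading.Lock()

    def cas_and_read(self, cas_addr, expected, new, read_addr):
        """Compare-and-exchange at cas_addr, then read read_addr.

        Returns (True, content) on success or (False, None) on failure,
        so the caller needs one round trip instead of two.
        """
        with self._lock:
            if self.cells.get(cas_addr) != expected:
                return (False, None)      # failure response
            self.cells[cas_addr] = new    # replace first value with second
            return (True, self.cells.get(read_addr))
```

A typical use under these assumptions is taking a lock word and fetching the protected data together: `Memory({0: 0, 8: b"payload"}).cas_and_read(0, 0, 1, 8)` succeeds and returns the payload, while a second identical call fails because the lock word no longer holds the expected value.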
CN202280089961.7A 2022-01-24 2022-01-24 Method and apparatus for network interface card (NIC) object consistency (NOC) messages Pending CN118648275A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/051476 WO2023138789A1 (en) 2022-01-24 2022-01-24 Methods and devices for network interface card (nic) object coherency (noc) messages

Publications (1)

Publication Number Publication Date
CN118648275A true CN118648275A (en) 2024-09-13

Family

ID=80168170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280089961.7A Pending CN118648275A (en) 2022-01-24 2022-01-24 Method and apparatus for network interface card (NIC) object consistency (NOC) messages

Country Status (2)

Country Link
CN (1) CN118648275A (en)
WO (1) WO2023138789A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7826470B1 (en) * 2004-10-19 2010-11-02 Broadcom Corp. Network interface device with flow-oriented bus interface
US8880935B2 (en) * 2012-06-12 2014-11-04 International Business Machines Corporation Redundancy and load balancing in remote direct memory access communications
CN109936510B (en) * 2017-12-15 2022-11-15 微软技术许可有限责任公司 Multipath RDMA transport

Also Published As

Publication number Publication date
WO2023138789A1 (en) 2023-07-27

Similar Documents

Publication Publication Date Title
US11023411B2 (en) Programmed input/output mode
CN110177118B (en) RDMA-based RPC communication method
US20210281499A1 (en) Network interface device
US7519650B2 (en) Split socket send queue apparatus and method with efficient queue flow control, retransmission and sack support mechanisms
US7089289B1 (en) Mechanisms for efficient message passing with copy avoidance in a distributed system using advanced network devices
US8265092B2 (en) Adaptive low latency receive queues
US7889749B1 (en) Cut-through decode and reliability
US6799200B1 (en) Mechanisms for efficient message passing with copy avoidance in a distributed system
US9411775B2 (en) iWARP send with immediate data operations
CN106487896B (en) Method and apparatus for handling remote direct memory access request
US7457845B2 (en) Method and system for TCP/IP using generic buffers for non-posting TCP applications
CN109564502B (en) Method and device for processing access request in storage device
TWI582609B (en) Method and apparatus for performing remote memory access(rma) data transfers between a remote node and a local node
CN111026324B (en) Updating method and device of forwarding table entry
US8312241B2 (en) Serial buffer to support request packets with out of order response packets
CN113553184A (en) A method, apparatus, electronic device and readable storage medium for realizing load balancing
CN110602211A (en) Out-of-order RDMA method and device with asynchronous notification
US7710990B2 (en) Adaptive low latency receive queues
US7817572B2 (en) Communications apparatus and communication method
CN115914144B (en) Direct access to storage devices by a data plane of a switch
CN118648275A (en) Method and apparatus for network interface card (NIC) object consistency (NOC) messages
EP4470185A2 (en) System and method for one-sided read rma using linked queues
CN116508011A (en) Data storage method and device
CN115150472A (en) Concurrency control method, network card, computer equipment, storage medium
Kamp AXI over Ethernet; a protocol for the monitoring and control of FPGA clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination