CN101751415B - Metadata service system, metadata synchronization method and write server update method - Google Patents
- Publication number: CN101751415B
- Application number: CN200810224708X
- Authority: CN (China)
- Prior art keywords: server, read, write, metadata, service device
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a metadata service system, a metadata synchronization method and a write server update method. The metadata service system comprises a write server and a read server. The write server stores the metadata of a parallel file system and accepts read access. It also accepts write access, modifies the metadata of the parallel file system, and synchronously propagates the modified metadata to the read server. The read server stores the metadata, accepts read access, and receives the synchronous metadata updates from the write server. When it detects that the write server has failed, the read server switches over to providing the write service. Because the read server and the write server back each other up, the invention eliminates the single point of failure and can efficiently serve large numbers of concurrent accesses.
Description
Technical Field
The invention relates to distributed storage in cluster computing. In particular, it relates to a metadata service system that stores, modifies and reads the metadata of a parallel file system, and to a metadata synchronization method and a write server update method.
Background Art
A file system is the subsystem of a computer system that stores and retrieves data, and it is generally built on top of disks. In networked applications, computing and storage are often separated, so a distributed file system is accessed over the network. High-performance computing and heavily loaded network services, in which many computers work in parallel, often need several dedicated storage servers to jointly provide the file system service in order to meet the access bandwidth demand.
A common architecture for high-performance parallel file systems stores the file system's "metadata", such as index information, on a metadata server and stores the data itself on separate data servers. A client only needs to fetch a small amount of index information from the metadata server. Most of the time, it interacts directly with the data servers. This architecture is simple, provides high access bandwidth, and suits most applications.
However, the metadata server in this architecture is a single point. Its failure makes the entire file system unavailable, so the architecture has an availability problem and cannot provide carrier-grade service. In addition, as a single point, the metadata server may become a performance bottleneck under dense peak access.
Most existing parallel file system architectures with multiple metadata servers use active/standby servers built on a shared storage device. Synchronization between the active and standby servers is done through that shared storage device. However, some state information is lost when a switchover between the active and standby servers occurs.
In addition, some prior-art parallel file systems, such as PVFS2, access index information spread across multiple servers through relayed multi-hop queries. This increases query latency. It also makes it difficult to keep the information on the multiple servers consistent.
In summary, metadata management in prior-art parallel file systems has the following drawbacks:
1. Existing parallel file systems in which a single server provides the metadata service are not highly available. Failure of that metadata server makes the entire file system unavailable.
2. Metadata servers that use a shared storage device to form an active/standby pair lose state data during a switchover. Moreover, only one server provides service at any given time, so it is difficult to provide enough capacity under a large number of concurrent accesses.
3. Another approach stores metadata in a distributed manner across multiple nodes, with the metadata servers querying one another for information. This reduces access efficiency and increases access latency. It also does not solve the problem of synchronizing state data.
Summary of the Invention
The invention provides a metadata service system that eliminates the single point of failure and can efficiently serve large numbers of concurrent accesses.
Based on this metadata service system, the invention further provides a metadata synchronization method. The method keeps the metadata stored on the servers synchronously updated and ensures that the data stored on all servers is consistent.
Based on this metadata service system, the invention further provides a write server update method. When the write server fails, a read server is converted into the write server, which eliminates the single point of failure.
The metadata service system provided by the invention comprises a write server, at least two read servers and an arbitration service device, wherein:
the write server is configured to store the metadata of the parallel file system and accept read access, and to accept write access, modify the metadata and synchronously propagate the modified metadata to the read servers;
each read server is configured to store the metadata, accept read access, and receive the synchronous metadata updates from the write server. It is further configured to monitor whether the write server has failed. When a failure of the write server is detected, each read server sends an arbitration request to the arbitration service device. Following the succession order determined by the arbitration service device, the read server holding the largest update sequence number is converted into the write server;
the arbitration service device is configured to determine, from the arbitration requests sent by the read servers, the order in which the read servers succeed to the write server role;
here, the update sequence number is a monotonically increasing sequence number that the write server assigns to each metadata update. It is carried along whenever the modified metadata is synchronously propagated to the read servers.
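The update sequence number can be illustrated with a minimal sketch. It assumes a single write server that stamps each update with the next integer, and read servers that keep the highest number they have applied. The class names are hypothetical. Continuing the count from the new write server's own latest number after a failover is an assumption; the patent does not state it.

```python
import itertools


class SequencedWriter:
    """Hypothetical write server that stamps each metadata update in order."""

    def __init__(self, last_seq=0):
        # Continue numbering after the last sequence number this server knows.
        self._counter = itertools.count(last_seq + 1)

    def stamp(self, update):
        # Every synchronous update is sent together with its sequence number.
        return next(self._counter), update


class SequencedReader:
    """Hypothetical read server that remembers its latest applied number."""

    def __init__(self):
        self.latest_seq = 0
        self.applied = []

    def apply(self, stamped):
        seq, update = stamped
        self.applied.append(update)
        self.latest_seq = max(self.latest_seq, seq)
        return self.latest_seq


writer = SequencedWriter(last_seq=1008)
reader = SequencedReader()
reader.apply(writer.stamp("mkdir /a"))
print(reader.latest_seq)  # 1009
```

The number a read server keeps here is the same value it would later report in its arbitration request.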
The read server that has been converted into the write server is further configured to synchronize the most recently modified metadata to the remaining read servers.
The arbitration service device can take one of two forms. It can be a physical device independent of the write server and the read servers, or it can be integrated into the read servers.
When the arbitration service device is a physical device independent of the write server and the read servers, it is further configured to store information about the write server and the read servers. It also receives server information queries from clients and returns the write server and read server information.
The metadata synchronization method provided by the invention is applied to the above metadata service system and comprises:
the write server accepts a write request, writes the metadata modified by this request into a temporary area, and sends a write request notification to the read servers;
after receiving the write request notification, each read server blocks operations on the corresponding pre-modification metadata. It does so either immediately or after any in-progress reads of that metadata have completed. Once blocking is in place, it returns a notification acknowledgement to the write server;
after receiving the notification acknowledgements returned by the read servers, the write server sends the metadata stored in the temporary area to the read servers. It also updates its own locally stored metadata records with the metadata in the temporary area;
each read server updates its locally stored metadata records and lifts the block once the update succeeds.
The method further comprises:
the write server monitors whether each read server has failed;
the write server sends the write request notification only to read servers that have not failed. Once it has determined that all currently live read servers have returned an acknowledgement, it sends the metadata stored in the temporary area to those read servers.
If a read server detects, after blocking is in place, that the write server has failed, it lifts the block.
The write server update method provided by the invention is applied to the above metadata service system and comprises:
the read servers monitor the current state of the write server;
when the read servers detect that the write server has failed, each read server sends an arbitration request to the arbitration service device. The arbitration service device determines the order in which the read servers succeed to the write server role. Following that order, the read server holding the largest update sequence number is converted into the write server. The update sequence number is the monotonically increasing sequence number that the write server assigns to each metadata update and carries along when synchronously propagating the modified metadata to the read servers.
Specifically, the above write server update method comprises:
when a read server detects that the write server has failed, it sends the arbitration service device an arbitration request asking for the lock. The request carries the latest update sequence number the read server has stored locally;
the arbitration service device stores the latest update sequence number of each read server and determines the succession order of the read servers. It then grants the lock to the read server that is first in that order;
the read server holding the lock compares its own latest update sequence number with the latest update sequence numbers of all read servers stored in the arbitration service device. If its number is the largest, it converts itself into the write server. Otherwise, it passes the lock to the next read server in the succession order.
When the arbitration service device is integrated into the read servers, the arbitration service device integrated in one of the read servers is designated in advance as the authoritative arbitration service device, and it receives the arbitration requests;
when the read server hosting the authoritative arbitration service device is converted into the write server, that server randomly selects the arbitration service device integrated in one of the remaining read servers as the new authoritative arbitration service device, and notifies all read servers.
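The handover of the authoritative role can be shown with a minimal in-memory sketch. It assumes that read servers are identified by name and that "notifying all read servers" means recording the new authoritative arbiter for each of them. The class and attribute names are hypothetical.

```python
import random


class IntegratedArbitration:
    """Tracks which read server's integrated arbitration module is authoritative."""

    def __init__(self, read_servers, authoritative, rng=None):
        self.read_servers = set(read_servers)
        self.authoritative = authoritative      # designated in advance
        self.rng = rng or random.Random()
        self.notified = {}                      # read server -> arbiter it was told about

    def promote_to_writer(self, server):
        """Called when `server` is converted from a read server into the write server."""
        self.read_servers.discard(server)
        if server == self.authoritative and self.read_servers:
            # Pick one remaining read server's arbitration module at random ...
            self.authoritative = self.rng.choice(sorted(self.read_servers))
            # ... and tell every remaining read server about the new authoritative arbiter.
            for s in self.read_servers:
                self.notified[s] = self.authoritative
        return self.authoritative


arb = IntegratedArbitration(["rs1", "rs2", "rs3"], authoritative="rs1", rng=random.Random(0))
print(arb.promote_to_writer("rs1") in {"rs2", "rs3"})  # True
```

If the promoted server was not the authoritative arbiter, nothing changes and no notification is needed.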
In the invention, the metadata service system consists of a write server and read servers. The write server accepts both read and write access. It modifies the metadata of the parallel file system and synchronously propagates the modified metadata to the read servers. The read servers store the metadata, accept read access, and receive the synchronous updates from the write server. When a read server detects that the write server has failed, it is converted into the write server. Because the read servers and the write server back each other up, the single point of failure is eliminated. Data synchronization keeps consistent metadata on all read servers, so they can back each other up and meet the demand for large numbers of concurrent accesses. The servers are also deployed in parallel, so no multi-hop queries are needed: every read server can serve client reads directly, which gives high read efficiency.
Brief Description of the Drawings
Fig. 1 is a first structural diagram of the metadata service system according to an embodiment of the invention;
Fig. 2 is a second structural diagram of the metadata service system according to an embodiment of the invention;
Fig. 3 is a third structural diagram of the metadata service system according to an embodiment of the invention;
Fig. 4 is a flowchart of the metadata synchronous update method according to an embodiment of the invention;
Fig. 5 is a first flowchart of the write server update method according to an embodiment of the invention;
Fig. 6 is a second flowchart of the write server update method according to an embodiment of the invention;
Fig. 7 is a signaling flowchart of a specific example of the write server update method.
Detailed Description of the Embodiments
Embodiments of the invention provide a metadata service system, a metadata synchronous update method and a write server update method. They rely on three mechanisms: multiple servers backing each other up, synchronous updating of the metadata stored on the servers, and write server update. Together these eliminate the single point of failure and allow large numbers of concurrent accesses to be served efficiently.
The system and methods provided by the invention are described in detail below with reference to the accompanying drawings.
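Fig. 1 is a first structural diagram of the metadata service system according to an embodiment of the invention. The system comprises a write server 11 and a read server 12, wherein:
the write server 11 is configured to store the metadata of the parallel file system and accept read access, and to accept write access, modify the metadata of the parallel file system, and synchronously propagate the modified metadata to the read server 12;
the read server 12 is configured to store the metadata of the parallel file system, accept read access, and receive the synchronous metadata updates from the write server 11. It is further configured to monitor whether the write server 11 has failed. When it detects that the write server 11 has failed, the read server 12 is converted into the write server.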
In the system of Fig. 1, one write server and one read server are configured. Both accept read access, and the write server additionally modifies metadata and performs data synchronization. When the write server fails, the read server is converted into the write server and accepts both read and write access, which eliminates the single point of failure.
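In one embodiment, the metadata service system further comprises an arbitration service device 13, and at least two read servers 12 are configured, as shown in the structural diagram of Fig. 2. Here the arbitration service device 13 is a physical device independent of the write server 11 and the read servers 12. The components function as follows:
the write server 11 is configured to store the metadata of the parallel file system and accept read access, and to accept write access, modify the metadata of the parallel file system, and synchronously propagate the modified metadata to each read server 12;
each read server 12 is configured to store the metadata, accept read access, and receive the synchronous metadata updates from the write server 11. It is further configured to monitor whether the write server 11 has failed, and to invoke the arbitration service device 13 when it detects that the write server 11 has failed;
the arbitration service device 13 is configured to determine which of the read servers 12 is converted into the write server.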
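In one embodiment, the arbitration service device 13 may also store information about the write server 11 and the read servers 12. It receives server information queries from clients and returns the write server and read server information. Based on the returned information, a client can send read requests to any read server and write requests to the write server.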
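Client-side routing can be sketched briefly under two assumptions. First, the server information returned by the arbitration service device is a write server name plus a list of read server names. Second, reads are spread round-robin; the patent only says "any read server", so round-robin is just one possible policy. `ServerInfo`, `MetadataClient` and `query_server_info` are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ServerInfo:
    """Server list as returned by the arbitration service device (hypothetical shape)."""
    write_server: str
    read_servers: list


class MetadataClient:
    def __init__(self, query_server_info):
        # query_server_info stands in for the client's query to the arbitration device.
        self.info = query_server_info()
        # Any read server may serve a read; round-robin is only one possible choice.
        self._readers = itertools.cycle(self.info.read_servers)

    def target_for(self, op):
        if op == "write":
            return self.info.write_server
        if op == "read":
            return next(self._readers)
        raise ValueError(f"unknown operation: {op!r}")


client = MetadataClient(lambda: ServerInfo("ws", ["rs1", "rs2"]))
print(client.target_for("write"), client.target_for("read"), client.target_for("read"))
# ws rs1 rs2
```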
In practice, the write server and the read servers may be servers with identical functionality. A server acting as the write server performs the write server functions. Besides storing the metadata of the parallel file system and accepting read access, it also accepts write access, modifies the metadata of the parallel file system, and synchronously propagates the modified metadata to each read server. A server acting as a read server stores metadata and accepts read access. Every server acting as a read server also has the write server functions; these are simply disabled or idle while it serves as a read server. Therefore, when the write server in the system fails, any read server can be converted into the write server.
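Fig. 3 is another structural diagram of the metadata service system according to an embodiment of the invention. The system comprises a write server 11, at least two read servers 12 and an arbitration service device 13. Unlike the structure shown in Fig. 2, the arbitration service device 13 is not a separate physical device; it is integrated into each read server. The arbitration service device 13 integrated into each read server may be a pure software module or a functional module combining software and hardware.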
Two procedures are described in detail below with reference to the metadata service system of the above embodiments: the metadata synchronous update procedure, and the write server update procedure used when the write server fails.
Fig. 4 is a flowchart of the metadata synchronous update method, which comprises:
Step S401: the write server accepts a write request and writes the metadata modified by this request into a temporary area;
Step S402: the write server sends a write request notification to the read servers;
Step S403: after receiving the write request notification, each read server blocks operations on the corresponding pre-modification metadata, either immediately or after any in-progress reads of that metadata have completed;
Step S404: once blocking is in place, the read server returns a notification acknowledgement to the write server;
Step S405: after receiving the notification acknowledgements returned by the read servers, the write server packages the metadata stored in the temporary area and sends it to the read servers. It also updates its own locally stored metadata records with the metadata in the temporary area;
Step S406: each read server updates its locally stored metadata records and lifts the block once the update succeeds.
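Steps S401 to S406 can be traced in a minimal single-process sketch. In-memory dicts stand in for metadata records, and direct method calls stand in for network messages. The class and method names are hypothetical. Incrementing a sequence number per update follows the summary above. The order of the two actions inside S405 (send the package, then update locally) is one reading of the step, not a stated requirement.

```python
class MetadataBlocked(Exception):
    """Raised when a record is blocked because an update is in progress."""


class ReadServer:
    def __init__(self, name):
        self.name = name
        self.store = {}          # path -> metadata record
        self.blocked = set()     # paths whose reads are currently blocked
        self.latest_seq = 0

    def on_write_notice(self, path):
        self.blocked.add(path)   # S403: block the pre-modification record
        return True              # S404: acknowledge the notification

    def on_update(self, seq, records):
        self.store.update(records)                # S406: update local records ...
        self.latest_seq = seq
        self.blocked.difference_update(records)   # ... then lift the block

    def read(self, path):
        if path in self.blocked:
            raise MetadataBlocked(path)
        return self.store.get(path)


class WriteServer:
    def __init__(self, readers):
        self.readers = readers
        self.store = {}
        self.seq = 0

    def write(self, path, record):
        temp = {path: record}                                   # S401: temporary area
        acks = [r.on_write_notice(path) for r in self.readers]  # S402
        if not all(acks):
            raise RuntimeError("a read server did not acknowledge")
        self.seq += 1
        for r in self.readers:                                  # S405: send the package
            r.on_update(self.seq, dict(temp))
        self.store.update(temp)                                 # S405: update local records
        return self.seq


readers = [ReadServer("rs1"), ReadServer("rs2")]
ws = WriteServer(readers)
ws.write("/data/a", {"size": 4096})
print([r.read("/data/a") for r in readers])
# [{'size': 4096}, {'size': 4096}]
```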
Step S402 can start as soon as the write server accepts the write request. The write request notification can be sent while the modified metadata is still being written into the temporary area; there is no need to wait until all of it has been written before performing step S402.
In step S403, reads of the corresponding metadata are blocked. This prevents a read from running concurrently with the subsequent data update and returning incorrect data.
In step S406, once a read server has successfully updated its locally stored metadata records, it promptly lifts the block and again accepts read access to that metadata. When the metadata service system has more than one read server, other read servers may not yet have finished updating. However, those read servers are in an unstable state, and their corresponding metadata cannot be accessed. Therefore no client process can read inconsistent metadata, which guarantees that the file system metadata stays consistent after modification.
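The variant of step S403 that waits for in-progress reads behaves like a per-record gate, which a minimal sketch built on Python's `threading.Condition` can show. The sketch assumes that new reads arriving while the record is blocked wait until the block is lifted rather than failing; the patent only says the operations are prohibited, so this is an assumption. `RecordGate` is a hypothetical name.

```python
import threading


class RecordGate:
    """Per-record gate on a read server: reads share it, an update blocks it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_reads = 0
        self._blocked = False

    def begin_read(self):
        with self._cond:
            while self._blocked:          # new reads wait while an update is pending
                self._cond.wait()
            self._active_reads += 1

    def end_read(self):
        with self._cond:
            self._active_reads -= 1
            self._cond.notify_all()

    def block(self):
        """Step S403: stop new reads, then wait for in-flight reads to drain."""
        with self._cond:
            self._blocked = True
            while self._active_reads > 0:
                self._cond.wait()

    def unblock(self):
        """Step S406: lift the block after the update has been applied."""
        with self._cond:
            self._blocked = False
            self._cond.notify_all()
```

A read server would return its acknowledgement (step S404) only after `block()` returns, so the write server never sends the package while a read of the old record is still running.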
In one embodiment, the write server and the read servers monitor one another's status and can report the observed status to one another. In a concrete deployment, this can be done by running a status monitoring program on the write server and on each read server; the monitoring technique itself is prior art and is not detailed here. Before sending the write request notification, the write server uses the current monitoring results to determine whether any read server has failed, and sends the notification only to read servers that have not failed. While collecting notification acknowledgements, the write server checks the current monitoring results. Once every live read server has returned an acknowledgement, it packages the metadata stored in the temporary area and sends it to all currently live read servers.
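One synchronization round that takes the monitor's output into account can be sketched as follows. The sketch assumes that the status monitor is exposed as an `is_alive(server)` callback and that messaging is synchronous. All parameter names are hypothetical. A real write server would keep waiting for a missing acknowledgement; this sketch just reports it.

```python
def sync_with_live_readers(readers, is_alive, send_notice, send_package, package):
    """Hypothetical write-server helper for one synchronization round.

    is_alive(r)              -> current result of the status monitor for read server r
    send_notice(r)           -> True if r acknowledged the write request notification
    send_package(r, package) -> sends the staged metadata to r
    """
    # Notify only the read servers the monitor currently reports as live.
    targets = [r for r in readers if is_alive(r)]
    acked = {r for r in targets if send_notice(r)}
    # A reader that failed in the meantime no longer has to acknowledge.
    still_live = [r for r in targets if is_alive(r)]
    missing = [r for r in still_live if r not in acked]
    if missing:
        # A real write server would keep waiting; this synchronous sketch reports it.
        raise RuntimeError(f"live read servers without acknowledgement: {missing}")
    for r in still_live:
        send_package(r, package)
    return still_live
```

If `rs3` is already down and `rs2` fails before acknowledging, the package goes only to `rs1`. This matches case 1 in the list that follows.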
Because the write server and the read servers monitor one another's status, the synchronous metadata update proceeds correctly in each of the following cases:
1. If a read server fails while blocking is in place, the write server learns of it. Without waiting for an acknowledgement from the failed read server, it continues synchronizing metadata with the remaining live read servers.
2. If a read server detects, while blocking is in place, that the write server has failed, the read servers notify one another and lift the block. The write operation fails, but the file system is not left in an unstable state.
3. If the write server fails while the packaged data is being sent, the read servers that have already received the packaged updated metadata can complete the update. Any read server that has not received the updated metadata can be resynchronized by the successor write server, as described in the write server update method below. This keeps the metadata stored on all read servers synchronized.
4. If a read server fails after the metadata has been packaged and sent to the read servers, the data updates on the other read servers are unaffected. The write server learns of the failed read server promptly and can conclude that the synchronization process has finished without waiting for that read server to respond.
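Case 2 can be illustrated from the read server side with a minimal sketch. It assumes that each read server keeps the set of records it has blocked in the current round, and that "notifying one another" means calling the same handler on peer read servers. `ReaderBlockState` and its methods are hypothetical.

```python
class ReaderBlockState:
    """Hypothetical per-read-server view of blocked records during a sync round."""

    def __init__(self, name):
        self.name = name
        self.blocked = set()
        self.peers = []          # other read servers to inform

    def block(self, path):
        self.blocked.add(path)

    def on_writer_failed(self, inform_peers=True):
        # Case 2: the write server failed while records were blocked.
        # The pending write is abandoned and every block is lifted.
        aborted = set(self.blocked)
        self.blocked.clear()
        if inform_peers:
            for peer in self.peers:
                peer.on_writer_failed(inform_peers=False)
        return aborted


rs1, rs2 = ReaderBlockState("rs1"), ReaderBlockState("rs2")
rs1.peers = [rs2]
for rs in (rs1, rs2):
    rs.block("/a")
print(rs1.on_writer_failed())  # {'/a'}
```

Peers are told not to re-broadcast, so the notification does not echo back and forth.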
In summary, the metadata synchronization method of the invention keeps the metadata in the file system consistent in each of these cases. At the same time, all read servers can serve read access simultaneously. This makes the system well suited to large numbers of concurrent accesses, especially bursty ones.
Based on the metadata service system of the above embodiments, the invention further provides a write server update method. Its implementation flow is shown in Fig. 5 and comprises:
Step S501: the read server monitors the current state of the write server;
Step S502: determine whether the write server has failed. If not, return to step S501; if it has, proceed to step S503;
Step S503: the read server is converted into the write server.
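The patent leaves the monitoring technique to the prior art. The sketch below therefore assumes one common choice, heartbeats with a timeout, to show steps S501 to S503 on a single read server. The timeout value, the injectable clock and the class name are hypothetical.

```python
import time


class WriterMonitor:
    """Steps S501-S503 on one read server, assuming heartbeat-timeout detection."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout            # seconds without a heartbeat before failover
        self.clock = clock
        self.last_heartbeat = clock()
        self.role = "read"

    def on_heartbeat(self):
        # S501: each heartbeat from the write server refreshes its observed state.
        self.last_heartbeat = self.clock()

    def check(self):
        # S502: treat the write server as failed once the timeout has elapsed.
        if self.role == "read" and self.clock() - self.last_heartbeat > self.timeout:
            self.role = "write"           # S503: convert into the write server
        return self.role


now = [0.0]
monitor = WriterMonitor(timeout=3.0, clock=lambda: now[0])
now[0] = 2.0
monitor.on_heartbeat()
now[0] = 5.5
print(monitor.check())  # write
```

With a single read server, conversion can follow the failure check directly. With several read servers, the arbitration flow of Fig. 6 decides which one converts.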
Consider the case where the metadata service system also includes an arbitration service device, at least two read servers are configured, and a read server detects that the current write server has failed. The flow for updating the write server is then shown in Fig. 6 and comprises:
Step S601: each read server sends an arbitration request to the arbitration service device, carrying the latest update sequence number it has stored locally.
Here, the update sequence number is the monotonically increasing sequence number that the write server assigns to each metadata update. It is carried along when the modified metadata is synchronously propagated to the read servers, and each read server stores the latest update sequence number locally.
In this embodiment, the arbitration request that a read server sends to the arbitration service device is a request for a lock.
Step S602: the arbitration service device stores the latest update sequence number of each read server and determines the order in which the read servers succeed to the write server role.
The arbitration service device may assign the succession order according to the order in which the arbitration requests arrive, or it may assign it randomly.
Step S603: the arbitration service device grants the lock to the read server that is first in the succession order.
Step S604: the read server currently holding the lock compares its locally stored latest update sequence number with the latest update sequence numbers of all read servers stored in the arbitration service device.
Step S605: determine whether the locally stored latest update sequence number is the largest among them. If so, proceed to step S606; otherwise, proceed to step S607;
Step S606: the read server currently holding the lock converts itself into the write server, and the flow ends.
Step S607: the read server currently holding the lock passes the lock to the next read server in the succession order, and the flow returns to step S604.
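Steps S601 to S607 can be condensed into a sketch of the arbitration logic, assuming the arbitration service device sees the requests as `(server, latest_seq)` pairs. Both ordering policies from step S602 are included. Function names are hypothetical. The lock passing is modeled as a loop rather than as messages between servers. The sample requests are listed in an arrival order chosen so that the arrival-order policy reproduces the succession order of Fig. 7.

```python
import random


def succession_order(requests, policy="arrival", rng=None):
    """S602: order the read servers; `requests` is [(server, latest_seq), ...] in arrival order."""
    servers = [server for server, _ in requests]
    if policy == "arrival":
        return servers
    if policy == "random":
        rng = rng or random.Random()
        rng.shuffle(servers)
        return servers
    raise ValueError(f"unknown policy: {policy!r}")


def elect_writer(requests, order):
    """S603-S607: pass the lock along `order` until its holder has the largest number."""
    latest = dict(requests)               # numbers stored by the arbitration device (S602)
    highest = max(latest.values())
    trail = []
    for holder in order:                  # S603 grants the lock to order[0]; S607 passes it on
        trail.append(holder)
        if latest[holder] == highest:     # S604/S605: compare with the stored numbers
            return holder, trail          # S606: holder converts itself into the write server
    raise RuntimeError("succession order does not contain every requesting server")


requests = [("rs2", 1008), ("rs4", 1009), ("rs3", 1008), ("rs1", 1009)]
print(elect_writer(requests, succession_order(requests)))
# ('rs4', ['rs2', 'rs4'])
```

Whatever the policy, the winner always holds the highest stored number. The policy only decides which of several equally up-to-date read servers wins.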
Fig. 7 is a signaling flowchart of a specific example of the write server update method, described as follows:
Assume the metadata service system has read server 1, read server 2, read server 3 and read server 4. Their locally stored latest update sequence numbers are 1009, 1008, 1008 and 1009 respectively. By the metadata synchronization method described above, the latest sequence numbers stored on the read servers differ by at most 1.
若监测到当前写服务器失效,则需要进行写服务器更新,在读服务器1、读服务器2、读服务器3和读服务器4中确定出一个服务器作为写服务器。具体信令过程为:If it is detected that the current write server is invalid, the write server needs to be updated, and one server among the read
读服务器1、读服务器2、读服务器3和读服务器4分别向仲裁服务装置发起请求加锁的仲裁请求,并携带本地存储的最新的更新序列号;Read
仲裁服务装置存储读服务器1、读服务器2、读服务器3和读服务器4的最新的更新序列号,并为读服务器1、读服务器2、读服务器3和读服务器4排定继承写服务器的继承顺序,分别为:读服务器2为第一继承顺序,读服务器4为第二继承顺序,读服务器3为第三继承顺序,读服务器1为第四继承顺序;并分配锁给第一继承顺序对应的读服务器2;The quorum server stores the latest update sequence numbers of
仲裁服务装置可以将排定的顺序通知给各读服务器,也可以将排定的顺序保存在本地,由各读服务器主动来获知;The arbitration service device can notify each read server of the scheduled order, or store the scheduled order locally, and each read server can actively learn about it;
Read server 2 learns that it is first in the succession order and holds the lock. Using the latest update sequence numbers of the other read servers stored in the arbitration service device, it determines whether its own locally stored latest update sequence number is the largest. Read server 2 holds 1008, while the largest number stored in the arbitration service device is 1009. Its number is therefore not the largest, and so not the most recent, since update sequence numbers increase with every metadata update. Read server 2 gives up the lock and yields it to read server 4, which is second in the succession order;
Read server 4 repeats the comparison performed by read server 2. Its locally stored latest update sequence number is 1009, equal to the largest number stored in the arbitration service device. Read server 4 therefore determines that its number is the largest, converts itself into the new write server, and records this in the arbitration service device;
When the other read servers find that a server (read server 4) has been registered as the write server in the arbitration service device, they set themselves as synchronized servers and keep their read server role;
The newly arbitrated write server holds the largest update sequence number, so its metadata is the most up to date. To make the metadata stored on all read servers consistent, the new write server (read server 4) resends its last update to the remaining read servers, so that read servers 1, 2 and 3 all receive it. Read servers 2 and 3 accept the update and store the latest update sequence number 1009. Read server 1 already holds 1009, which shows that it applied this update earlier; it can disregard the copy it already has and accept the resent update.
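The resend step can be illustrated with a short sketch. It assumes, as a simplification not stated in the patent, that a metadata update is a set of key/value overwrites. Under that assumption, accepting the same update a second time (read server 1's case) leaves the metadata unchanged. `Replica`, `resend_last_update` and the metadata keys are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """Hypothetical stand-in for a server's metadata store."""
    name: str
    latest_seq: int
    metadata: Dict[str, str] = field(default_factory=dict)

    def apply(self, seq: int, update: Dict[str, str]) -> str:
        """Accept a resent update tagged with the write server's sequence number."""
        if seq < self.latest_seq:
            # Assumption: should not occur after arbitration, since the writer holds the max.
            return "ignored-older"
        status = "reapplied" if seq == self.latest_seq else "applied"
        self.metadata.update(update)  # overwrite semantics make a repeat harmless
        self.latest_seq = seq
        return status


def resend_last_update(writer: Replica, last_update: Dict[str, str],
                       readers: List[Replica]) -> Dict[str, str]:
    """The new write server resends its last update to every remaining read server."""
    return {r.name: r.apply(writer.latest_seq, last_update) for r in readers}


if __name__ == "__main__":
    last = {"/dir/file": "v9"}  # hypothetical content of update 1009
    rs4 = Replica("rs4", 1009, {"/dir/file": "v9"})
    rs1 = Replica("rs1", 1009, {"/dir/file": "v9"})
    rs2 = Replica("rs2", 1008, {"/dir/file": "v8"})
    rs3 = Replica("rs3", 1008, {"/dir/file": "v8"})
    print(resend_last_update(rs4, last, [rs1, rs2, rs3]))
    # {'rs1': 'reapplied', 'rs2': 'applied', 'rs3': 'applied'}
```

Tagging the resent update with the writer's own number lets each read server tell whether it is catching up (1008 to 1009) or merely repeating an update it already holds.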
In the example shown in FIG. 7, the arbitration service device is a physical device independent of the write server and the read servers. When an arbitration service device is instead integrated into each read server, the device integrated into any one read server can be preset as the authorized arbitration service device. When the write server fails, the signaling flow is essentially the same, except that each read server sends its arbitration request to this designated authorized arbitration service device.
A problem arises if the read server hosting the authorized arbitration service device becomes the write server and later fails: at the next arbitration, the read servers would not know which read server's integrated arbitration service device should receive their requests. Therefore, when the read server hosting the authorized arbitration service device is converted into the write server, it randomly selects the arbitration service device integrated in one of the remaining read servers as the next authorized arbitration service device, and notifies every read server.
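The hand-off rule can be sketched in a few lines. The function names and the idea of returning a server-to-arbiter map as the "notification" are illustrative assumptions; the patent specifies only a random choice among the remaining read servers followed by notifying every read server.

```python
import random
from typing import Dict, List, Optional


def next_authorized_arbiter(arbiter_host: str, new_write_server: str,
                            read_servers: List[str],
                            rng: Optional[random.Random] = None) -> str:
    """Return the read server whose integrated arbiter is authorized after an election."""
    if new_write_server != arbiter_host:
        return arbiter_host  # the authorized arbiter's host is still a read server
    candidates = [s for s in read_servers if s != new_write_server]
    if not candidates:
        raise RuntimeError("no remaining read server can host the authorized arbiter")
    return (rng or random.Random()).choice(candidates)  # random pick, as described


def notify_read_servers(read_servers: List[str], new_write_server: str,
                        arbiter_host: str) -> Dict[str, str]:
    """Hypothetical notification: each remaining read server learns the new arbiter host."""
    return {s: arbiter_host for s in read_servers if s != new_write_server}
```

For example, if read server 4 hosted the authorized arbiter and won the election, `next_authorized_arbiter("rs4", "rs4", ["rs1", "rs2", "rs3", "rs4"])` returns one of `rs1`, `rs2` or `rs3`. The arbiter is never left on the write server, so a later write-server failure cannot take the authorized arbiter down with it.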
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc.
In summary, the present invention forms a metadata service system from a write server, at least two read servers and an arbitration service device. The system contains at least two read servers, and one of them can be converted into the write server when the write server fails, so the single-point-of-failure problem is effectively solved. Through data synchronization, the read servers store consistent metadata and can back each other up, which meets the demands of large numbers of concurrent accesses. Because the read servers are deployed in parallel, no multi-hop queries are needed: each read server can serve client read requests directly, giving high read efficiency.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200810224708XA CN101751415B (en) | 2008-12-09 | 2008-12-09 | Metadata service system, metadata synchronization method and write server update method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200810224708XA CN101751415B (en) | 2008-12-09 | 2008-12-09 | Metadata service system, metadata synchronization method and write server update method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101751415A CN101751415A (en) | 2010-06-23 |
| CN101751415B true CN101751415B (en) | 2012-03-28 |
Family
ID=42478406
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN200810224708XA Active CN101751415B (en) | 2008-12-09 | 2008-12-09 | Metadata service system, metadata synchronization method and write server update method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101751415B (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102694825A (en) * | 2011-03-22 | 2012-09-26 | 腾讯科技(深圳)有限公司 | Data processing method and data processing system |
| CN102780571A (en) * | 2011-05-11 | 2012-11-14 | 中兴通讯股份有限公司 | Main board and spare board switching processing method and system |
| CN103580891A (en) * | 2012-07-27 | 2014-02-12 | 腾讯科技(深圳)有限公司 | Data synchronization method and system and servers |
| CN103019875B (en) * | 2012-12-19 | 2015-12-09 | 北京世纪家天下科技发展有限公司 | The method of the two main transformation of a kind of fulfillment database and device |
| US9684672B2 (en) | 2013-07-01 | 2017-06-20 | Empire Technology Development Llc | System and method for data storage |
| CN103369051B (en) * | 2013-07-22 | 2016-04-27 | 中安消技术有限公司 | A kind of data server cluster system and method for data synchronization |
| CN104158898B (en) * | 2014-08-25 | 2018-01-19 | 曙光信息产业股份有限公司 | The update method of file layout in a kind of distributed file system |
| CN104268097B (en) * | 2014-10-13 | 2018-02-06 | 浪潮(北京)电子信息产业有限公司 | A kind of metadata processing method and system |
| CN105045938A (en) * | 2015-09-17 | 2015-11-11 | 浪潮(北京)电子信息产业有限公司 | Method and system for concurrent access to metadata |
| CN105468718B (en) * | 2015-11-18 | 2020-09-08 | 腾讯科技(深圳)有限公司 | Data consistency processing method, device and system |
| CN106603665B (en) * | 2016-12-16 | 2018-04-13 | 无锡华云数据技术服务有限公司 | Cloud platform continuous data synchronous method and its device |
| CN108829496A (en) * | 2018-05-29 | 2018-11-16 | 阿里巴巴集团控股有限公司 | A kind of service calling method, device and electronic equipment |
| CN111045870B (en) * | 2019-12-27 | 2022-06-10 | 北京浪潮数据技术有限公司 | Method, device and medium for saving and restoring metadata |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101059807A (en) * | 2007-01-26 | 2007-10-24 | 华中科技大学 | Method and system for promoting metadata service reliability |
| CN101247417A (en) * | 2008-03-07 | 2008-08-20 | 中国科学院计算技术研究所 | Two-layer metadata processing system and method |
- 2008-12-09: application CN200810224708XA filed in CN; patent CN101751415B, status Active
Non-Patent Citations (2)
| Title |
|---|
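| He Feiyue. "Research on Metadata Management in Parallel File Systems". China Master's Theses Full-text Database (Information Science and Technology). China Academic Journals (CD Edition) Electronic Publishing House, 2005, no. 3, pp. 20-28. |
| Zhang Hongliang et al. "Research and Application of the INFORMIX-HDR High-Availability Data Replication Scheme". Application Research of Computers, 2005, no. 4, pp. 171-173. |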
Also Published As
| Publication number | Publication date |
|---|---|
| CN101751415A (en) | 2010-06-23 |
Similar Documents
| Publication | Title |
|---|---|
| CN101751415B (en) | Metadata service system, metadata synchronization method and write server update method |
| US11360854B2 (en) | Storage cluster configuration change method, storage cluster, and computer system |
| CN101334797B (en) | Distributed file systems and its data block consistency managing method |
| JP6382454B2 (en) | Distributed storage and replication system and method |
| EP1625502B1 (en) | Redundant data assigment in a data storage system |
| CN105814544B (en) | System and method for supporting persistent partition recovery in a distributed data grid |
| US8856091B2 (en) | Method and apparatus for sequencing transactions globally in distributed database cluster |
| CN102148850B (en) | Cluster system and service processing method thereof |
| US20070061379A1 (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster |
| US20060212453A1 (en) | System and method for preserving state for a cluster of data servers in the presence of load-balancing, failover, and fail-back events |
| CN112039970B (en) | Distributed business lock service method, server, system and storage medium |
| US11003550B2 (en) | Methods and systems of operating a database management system DBMS in a strong consistency mode |
| WO2014177085A1 (en) | Distributed multicopy data storage method and device |
| US20230205638A1 (en) | Active-active storage system and data processing method thereof |
| JP4461147B2 (en) | Cluster database using remote data mirroring |
| CN116627315A (en) | Distributed block storage data access method and storage system based on stateful DHT |
| CN107239235B (en) | A multi-control multi-active RAID synchronization method and system |
| US20190251006A1 (en) | Methods and systems of managing consistency and availability tradeoffs in a real-time operational dbms |
| JP5480046B2 (en) | Distributed transaction processing system, apparatus, method and program |
| CN105323271A (en) | Cloud computing system, and processing method and apparatus thereof |
| WO2007028249A1 (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster with collision monitoring |
| CN116055494A (en) | Resource sharing method and system of distributed block storage gateway |
| CN113360279A (en) | Method for realizing remote multi-active system |
| CN108279850B (en) | Data resource storage method |
| WO2023125412A1 (en) | Method and system for synchronous data replication |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |