
CN103092927B - Method for fast reading and writing of files in a distributed environment - Google Patents


Info

Publication number
CN103092927B
CN103092927B CN201210590615.5A CN201210590615A CN103092927B CN 103092927 B CN103092927 B CN 103092927B CN 201210590615 A CN201210590615 A CN 201210590615A CN 103092927 B CN103092927 B CN 103092927B
Authority
CN
China
Prior art keywords
file
data
node
metadata node
metadata
Prior art date
Legal status
Expired - Fee Related
Application number
CN201210590615.5A
Other languages
Chinese (zh)
Other versions
CN103092927A (en)
Inventor
郑然
金海
章勤
姚传威
冯晓文
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201210590615.5A
Publication of CN103092927A
Application granted
Publication of CN103092927B
Expired - Fee Related (current legal status)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a fast file read method in a distributed environment. A client node sends a read-file request to the metadata node and determines whether it still holds a connection to the data node it used for its previous file read in the distributed file system. If it does not, the metadata node checks, using the information in its index area, whether the file is in its data area. If the file is not there, the metadata node uses its primary index information to find the data node that stores the file, and the client node establishes a connection with that data node. The data node locates the data block containing the file using the secondary index information, retrieves the file, and sends it to the client node, which receives the data and keeps the connection to that data node. The invention can solve the problems in existing methods of large memory usage on the metadata node and low efficiency when writing a large number of files.

Description

A Fast File Reading and Writing Method in a Distributed Environment

Technical Field

The invention belongs to the field of network communication, and more specifically relates to a method for fast reading and writing of files in a distributed environment.

Background

With the rapid development of technology and the Internet, storage systems must store massive amounts of data, serve large numbers of concurrent users, and provide highly reliable and highly available services. Traditional single-machine systems can no longer meet these requirements, whereas distributed file systems meet them well. Practical applications (personal applications, web applications, scientific computing, etc.) generate huge numbers of files, and how to store and access such massive numbers of files efficiently in a distributed environment remains an open problem and challenge.

Current mainstream distributed file systems include Google GFS, HDFS, Lustre, and Ceph. Their architectures and basic principles are broadly similar: each consists mainly of metadata nodes, data nodes, and client nodes. The metadata node stores the file system's metadata (the namespace, the file name -> data block mapping, and the data block -> data node mapping). Data nodes store the actual file data, generally in the form of data blocks. Client nodes query file information from the metadata node and transfer the actual file data with the data nodes, and they must communicate with the metadata node before every data access.

Distributed file systems have relatively low read and write performance for such files, for the following reasons. First, the metadata is kept in the metadata node's memory, so a large number of files occupies a lot of that memory (each file takes one index entry). Second, frequent access to a large number of files increases the load on the metadata node, because client nodes interact with it continuously, and it causes frequent disk seeks on the data nodes, degrading system performance. Third, when a client node accesses a file, the time it spends interacting with the metadata node may exceed the time it spends transferring data with the data node.

Summary of the Invention

In view of the defects of the prior art, an object of the present invention is to provide a fast file write method in a distributed environment. It aims to solve two problems in existing methods: the metadata node occupies a large amount of memory, and writing a large number of files is inefficient.

To achieve the above object, the present invention provides a fast file write method in a distributed environment, comprising the following steps:

步骤S301:对分布式环境下元数据节点的数据区及索引信息进行初始化,其中索引信息包括元数据节点的索引区和一级索引区,以及数据节点的二级索引区;Step S301: Initialize the data area and index information of the metadata node in the distributed environment, where the index information includes the index area and primary index area of the metadata node, and the secondary index area of the data node;

步骤S302:客户节点向元数据节点发出写文件请求;Step S302: the client node sends a file write request to the metadata node;

步骤S303:元数据节点根据写文件请求判断元数据节点的数据区的剩余空间是否大于或等于该文件大小,如果是,则转入步骤S304,否则转入步骤S308;Step S303: The metadata node determines whether the remaining space of the data area of the metadata node is greater than or equal to the size of the file according to the request for writing the file, if yes, then go to step S304, otherwise go to step S308;

步骤S304:元数据节点接收客户节点的文件,并将该文件存储到元数据节点的数据区的剩余空间中;Step S304: the metadata node receives the file of the client node, and stores the file in the remaining space of the data area of the metadata node;

步骤S305:元数据节点更新其索引区的信息:Step S305: The metadata node updates the information of its index area:

步骤S306:元数据节点判断元数据节点的数据区中存储的数据是否大于一个阈值,如果是,则转入步骤S307,否则过程结束;Step S306: the metadata node judges whether the data stored in the data area of the metadata node is greater than a threshold, if yes, then go to step S307, otherwise the process ends;

步骤S307:元数据节点将其数据区的数据作为一个普通文件存于分布式文件系统中,并清空其数据区及索引区中的数据,过程结束;Step S307: The metadata node stores the data in its data area as an ordinary file in the distributed file system, and clears the data in its data area and index area, and the process ends;

步骤S308:元数据节点将其数据区的数据作为一个普通文件存于分布式文件系统中,并清空其数据区及索引区中的数据;Step S308: the metadata node stores the data in its data area as an ordinary file in the distributed file system, and clears the data in its data area and index area;

步骤S309:元数据节点接收客户节点的文件数据,并将其存储到其数据区的剩余空间中;Step S309: the metadata node receives the file data of the client node and stores it in the remaining space of its data area;

步骤S310:元数据节点更新其索引区的信息。Step S310: The metadata node updates the information of its index area.

Each file is between 0 and 1 MB in size, and the size of an ordinary file is larger than the threshold.

In steps S305 and S310, the metadata node adds a new entry to its index area. The entry contains the file ID, the offset of the file in the data area, and the size of the file.

Step S301 comprises the following sub-steps:

步骤S401:判断是否已经对分布式环境下元数据节点的数据区及索引信息进行过初始化,如果是,则过程结束,否则转入步骤S402;Step S401: Determine whether the data area and index information of the metadata node in the distributed environment have been initialized, if yes, the process ends, otherwise, go to step S402;

步骤S402:元数据节点在其内存中开辟一个大小为M的区域,用以保存临时的文件,其中M为大于上述阈值的正整数;Step S402: the metadata node opens an area of size M in its memory to store temporary files, where M is a positive integer greater than the above threshold;

步骤S403:元数据节点设置索引区,用于存储每个文件在其数据区中的索引信息;Step S403: the metadata node sets an index area for storing the index information of each file in its data area;

步骤S404:元数据节点设置一级索引区,用于保存文件到数据节点的映射关系;Step S404: the metadata node sets a first-level index area for storing the mapping relationship between files and data nodes;

步骤S405:数据节点设置二级索引区,其位于数据节点中,用于存储文件的二级索引信息。Step S405: The data node sets a secondary index area, which is located in the data node and used to store the secondary index information of the file.

The secondary index information includes the mapping from files to data blocks, the offset of each file within its data block, and the size of the file.

Steps S307 and S308 each comprise the following sub-steps:

步骤S501:元数据节点将其数据区的数据作为一个普通文件保存于分布式文件系统中;Step S501: the metadata node saves the data in its data area as an ordinary file in the distributed file system;

步骤S502:元数据节点将该普通文件的索引信息发送到相应的数据节点的二级索引区中,数据节点将该索引信息添加到其二级索引区;Step S502: the metadata node sends the index information of the ordinary file to the secondary index area of the corresponding data node, and the data node adds the index information to its secondary index area;

步骤S503:元数据节点根据文件的ID和数据节点ID更新其一级索引信息;Step S503: the metadata node updates its primary index information according to the file ID and the data node ID;

步骤S504:元数据节点清空其数据区中的数据;Step S504: the metadata node clears the data in its data area;

步骤S505:元数据节点清空其索引区中的数据。Step S505: The metadata node clears the data in its index area.

In step S503, the metadata node adds the mapping between the file ID and the data node ID to its primary index area, so that the file can be located when it is read.

Compared with the prior art, the above technical solution of the present invention has the following beneficial effects:

(1) It saves metadata-node memory and increases the number of files the distributed file system can store. Steps S301, S307, and S308 keep the primary index information of files on the metadata node and the secondary index information on the data nodes. This reduces the memory usage of the metadata node, increases the number of files the distributed file system can store, and improves the memory utilization of the data nodes.

(2) It improves file-write performance. Through steps S301, S307, and S308, many files are merged in the metadata node's data area before being stored in the distributed file system. This reduces the number of interactions between client nodes and data nodes and the time spent writing a large number of files.
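A rough sense of how many small files one merged write covers can be derived from the limits given in the detailed embodiment below: each small file is at most 1 MB, and the threshold T is at least 60 (read here as megabytes). Suppose a flush at step S307 occurs only once the data area holds more than T. Then for the n files with sizes s_1, ..., s_n contained in that flush:

```latex
n \cdot 1\,\mathrm{MB} \;\ge\; \sum_{i=1}^{n} s_i \;>\; T \;\ge\; 60\,\mathrm{MB}
\quad\Longrightarrow\quad n > 60 \quad\Longrightarrow\quad n \ge 61
```

Under these assumptions, each ordinary file written to the distributed file system at step S307 carries at least 61 small files. One storage operation therefore replaces at least 61 separate ones.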

Another object of the present invention is to provide a fast file read method in a distributed environment. It aims to solve two problems in existing methods: the metadata node carries excessive load, and reading a large number of files is inefficient.

To achieve the above object, the present invention provides a fast file read method in a distributed environment, comprising the following steps:

步骤S601:客户节点向元数据节点发出读文件请求;Step S601: the client node sends a file read request to the metadata node;

步骤S602:客户节点判断其自身是否和分布式文件系统中该客户节点上一次读取文件所连接的数据节点保持着连接,若是,则转入步骤S603,否则转入步骤S606;Step S602: The client node judges whether it maintains a connection with the data node connected to the client node for reading the file last time in the distributed file system, if so, proceeds to step S603, otherwise proceeds to step S606;

步骤S603:客户节点向该数据节点发送读文件请求;Step S603: the client node sends a file read request to the data node;

步骤S604:数据节点根据其二级索引区中存储的二级索引信息进行查询,以判断其自身是否存储了读文件请求所对应的文件,若是则转入步骤S609,否则转入步骤S605;Step S604: The data node searches according to the secondary index information stored in its secondary index area to determine whether it has stored the file corresponding to the file read request, if so, proceed to step S609, otherwise proceed to step S605;

步骤S605:客户节点断开与该数据节点的连接;Step S605: the client node disconnects from the data node;

步骤S606:元数据节点根据其索引区中的信息查询该文件是否存在于其数据区中,若是则转入步骤S611,否则转入步骤S607;Step S606: The metadata node inquires whether the file exists in its data area according to the information in its index area, and if so, proceeds to step S611, otherwise proceeds to step S607;

步骤S607:元数据节点根据其一级索引信息查询存有该文件的数据节点;Step S607: the metadata node queries the data node storing the file according to its primary index information;

步骤S608:客户节点与该数据节点建立连接;Step S608: the client node establishes a connection with the data node;

步骤S609:数据节点根据二级索引信息查找该文件所在的数据块,根据二级索引信息获取文件,并将该文件发送给客户节点;Step S609: the data node searches for the data block where the file is located according to the secondary index information, obtains the file according to the secondary index information, and sends the file to the client node;

步骤S610:客户节点接收数据并保持与该数据节点的连接,然后过程结束;Step S610: the client node receives the data and maintains a connection with the data node, and then the process ends;

步骤S611:元数据节点根据其索引区中的索引信息从其数据区获取文件,并将该文件发送给客户节点。Step S611: the metadata node obtains the file from its data area according to the index information in its index area, and sends the file to the client node.

The connection between the client node and the data node may be a TCP connection or a UDP connection.

Compared with the prior art, the above technical solution of the present invention has the following beneficial effects:

(1) It reduces the load on the metadata node. Through steps S602 and S610, the client node keeps its connection to the data node that held the last file it read. If the next file to be read is on the same data node, the client node does not need to contact the metadata node. This is likely because file reads usually exhibit locality, so files in the same data block tend to be read consecutively. The effect is a lower load on the metadata node and a faster system response.

(2) It improves file-read performance. Through step S611, if the requested file is still in the metadata node's data area, the client node reads it directly from that data area, which is faster than reading from disk. It also avoids connecting to a data node and reading the file there, which significantly improves read efficiency. Through steps S602 and S610, the client node reads files by connecting directly to the data node, which further improves read performance.

Brief Description of the Drawings

FIG. 1 is an architecture diagram of the distributed file system to which the fast file read/write method in a distributed environment of the present invention is applied.

FIG. 2 is a framework diagram of the metadata node of the present invention.

FIG. 3 is a flow chart of the fast file write method in a distributed environment of the present invention.

FIG. 4 is a detailed flow chart of step S301 of the fast file write method in a distributed environment of the present invention.

FIG. 5 is a detailed flow chart of steps S307/S308 of the fast file write method in a distributed environment of the present invention.

FIG. 6 is a flow chart of the fast file read method in a distributed environment of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and are not intended to limit it.

First, the technical terms used in the present invention are explained and defined:

Metadata node: stores the metadata of the distributed file system (the file system namespace, the file name -> data block mapping, and the data block -> data node mapping).

Data node: stores the actual file data, generally in the form of data blocks. It receives block-operation commands from the metadata node via heartbeats.

Client node: connects to the metadata node to query file information, and connects to the metadata node and data nodes for the actual file transfer.

The present invention is described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the distributed file system architecture to which the fast file read/write method of the present invention is applied includes the following parts:

Metadata node: allocates a data area, an index area, and a primary index area in memory. The data area holds temporary files so that they can be merged. The index area stores the index information of each file in the data area. The primary index area stores the mapping from files to data nodes. The framework of the metadata node is shown in FIG. 2;

Data node: allocates a secondary index area in memory that stores the secondary index information of files. This consists of the mapping from files to data blocks, the offset of each file within its data block, and the size of the file; and

Client node: connects to the metadata node to query file information, and connects to the metadata node and data nodes for the actual file transfer.
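The split between the primary index on the metadata node and the secondary index on the data nodes can be sketched as a two-level lookup. The following is a minimal in-memory sketch, not the patent's implementation. The class names, the string file IDs, and the use of Python dicts for both indexes are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecondaryEntry:
    """One secondary-index record kept on a data node (hypothetical layout)."""
    block_id: int  # data block holding the merged ordinary file
    offset: int    # byte offset of the small file inside that block
    length: int    # size of the small file in bytes


@dataclass
class DataNode:
    node_id: int
    blocks: dict[int, bytes] = field(default_factory=dict)
    secondary_index: dict[str, SecondaryEntry] = field(default_factory=dict)

    def read(self, file_id: str) -> bytes | None:
        """Resolve file -> (block, offset, length) locally, then slice the block."""
        entry = self.secondary_index.get(file_id)
        if entry is None:
            return None
        block = self.blocks[entry.block_id]
        return block[entry.offset:entry.offset + entry.length]


@dataclass
class MetadataNode:
    # Primary index: file ID -> data node ID only; no per-file block mapping here.
    primary_index: dict[str, int] = field(default_factory=dict)


def lookup(meta: MetadataNode, nodes: dict[int, DataNode], file_id: str) -> bytes | None:
    node_id = meta.primary_index.get(file_id)  # first level, on the metadata node
    if node_id is None:
        return None
    return nodes[node_id].read(file_id)        # second level, on the data node


# Usage: two small files merged into block 0 on hypothetical data node 7.
dn = DataNode(7, blocks={0: b"helloworld"},
              secondary_index={"a": SecondaryEntry(0, 0, 5), "b": SecondaryEntry(0, 5, 5)})
meta = MetadataNode(primary_index={"a": 7, "b": 7})
print(lookup(meta, {7: dn}, "b"))  # b'world'
```

The design point the sketch illustrates is that the metadata node stores only one small mapping per file. The block, offset, and length details live on the data node that holds the merged file.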

As shown in FIG. 2, the metadata node framework of the present invention includes the following:

Data area: holds temporary files so that they can be merged;

Index area: stores the index information of each file in the data area. Index entries have a fixed length, with one entry per file. Each entry consists of fileID, offset, and length: fileID identifies the file (by its file name), offset is the file's offset in the data area, and length is the file's size. Each time a file's data is added to the data area, a corresponding index entry must be added to the index area;

Primary index area: a global index whose entries map a fileID to a data node ID, where the data node ID identifies a specific data node. When files have been merged into an ordinary file and stored on a data node, their storage information is added to the primary index. This lets a client node locate the specific data node that stores a file when it reads that file.
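Because index-area entries have a fixed length, they can be packed back to back and addressed by position. The sketch below packs entries with Python's `struct` module. The byte layout is entirely an assumption, since the source does not specify field widths: a 64-bit numeric fileID standing in for the file name, a 64-bit offset, and a 32-bit length, little-endian with no padding.

```python
from __future__ import annotations

import struct

# Hypothetical fixed-length layout: little-endian, no padding,
# 64-bit numeric fileID, 64-bit offset, 32-bit length (files are at most 1 MB).
ENTRY = struct.Struct("<QQI")


class IndexArea:
    """Append-only index area of fixed-length entries for the metadata node's data area."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, file_id: int, offset: int, length: int) -> None:
        self._buf += ENTRY.pack(file_id, offset, length)

    def __len__(self) -> int:
        return len(self._buf) // ENTRY.size

    def entry(self, i: int) -> tuple[int, int, int]:
        # Fixed length means entry i always starts at byte i * ENTRY.size.
        return ENTRY.unpack_from(self._buf, i * ENTRY.size)

    def find(self, file_id: int) -> tuple[int, int] | None:
        # Linear scan over a snapshot; returns (offset, length) in the data area.
        for fid, offset, length in ENTRY.iter_unpack(bytes(self._buf)):
            if fid == file_id:
                return offset, length
        return None

    def clear(self) -> None:
        self._buf.clear()


idx = IndexArea()
idx.append(42, 0, 1000)
idx.append(43, 1000, 2048)
print(ENTRY.size, len(idx), idx.entry(1), idx.find(43))  # 20 2 (43, 1000, 2048) (1000, 2048)
```

A real index area would likely add a hash lookup on top of the linear `find`. The fixed-width layout is what makes positional access (`entry(i)`) constant time.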

As shown in FIG. 3, the fast file write method in a distributed environment of the present invention comprises the following steps:

步骤S301:对分布式环境下元数据节点的数据区及索引信息进行初始化,其中索引信息包括元数据节点的索引区和一级索引区,以及数据节点的二级索引区;Step S301: Initialize the data area and index information of the metadata node in the distributed environment, where the index information includes the index area and primary index area of the metadata node, and the secondary index area of the data node;

步骤S302:客户节点向元数据节点发出写文件请求;在本发明中,文件的大小是介于0~1MB之间;Step S302: the client node sends a file write request to the metadata node; in the present invention, the file size is between 0 and 1MB;

步骤S303:元数据节点根据写文件请求判断元数据节点的数据区的剩余空间是否大于或等于该文件大小,如果是,则转入步骤S304,否则转入步骤S308;Step S303: The metadata node determines whether the remaining space of the data area of the metadata node is greater than or equal to the size of the file according to the request for writing the file, if yes, then go to step S304, otherwise go to step S308;

步骤S304:元数据节点接收客户节点的文件,并将该文件存储到元数据节点的数据区的剩余空间中;Step S304: the metadata node receives the file of the client node, and stores the file in the remaining space of the data area of the metadata node;

步骤S305:元数据节点更新其索引区的信息:具体而言,元数据节点在其索引区中添加一条新的表项,包括有文件ID、文件在数据区中的偏移、以及文件的大小;Step S305: The metadata node updates the information in its index area: specifically, the metadata node adds a new entry in its index area, including the file ID, the offset of the file in the data area, and the size of the file ;

步骤S306:元数据节点判断元数据节点的数据区中存储的数据是否大于一个阈值,如果是,则转入步骤S307,否则过程结束;具体而言,阈值的取值范围是60至63Mb;Step S306: the metadata node judges whether the data stored in the data area of the metadata node is greater than a threshold, if yes, then proceed to step S307, otherwise the process ends; specifically, the value range of the threshold is 60 to 63Mb;

步骤S307:元数据节点将其数据区的数据作为一个普通文件存于分布式文件系统中,并清空其数据区及索引区中的数据,过程结束;具体而言,普通文件是指文件大小大于上述阈值的文件;Step S307: The metadata node stores the data in its data area as an ordinary file in the distributed file system, and clears the data in its data area and index area, and the process ends; specifically, an ordinary file refers to a file whose size is larger than Documentation of the above thresholds;

步骤S308:元数据节点将其数据区的数据作为一个普通文件存于分布式文件系统中,并清空其数据区及索引区中的数据;Step S308: the metadata node stores the data in its data area as an ordinary file in the distributed file system, and clears the data in its data area and index area;

步骤S309:元数据节点接收客户节点的文件数据,并将其存储到其数据区的剩余空间中;Step S309: the metadata node receives the file data of the client node and stores it in the remaining space of its data area;

步骤S310:元数据节点更新其索引区的信息:具体而言,元数据节点在其索引区中添加一条新的表项,包括有文件ID、文件在数据区中的偏移、以及文件的大小;Step S310: The metadata node updates the information in its index area: specifically, the metadata node adds a new entry in its index area, including the file ID, the offset of the file in the data area, and the size of the file ;

As shown in FIG. 4, step S301 of the method of the present invention comprises the following sub-steps:

步骤S401:判断是否已经对分布式环境下元数据节点的数据区及索引信息进行过初始化,如果是,则过程结束,否则转入步骤S402;Step S401: Determine whether the data area and index information of the metadata node in the distributed environment have been initialized, if yes, the process ends, otherwise, go to step S402;

步骤S402:元数据节点在其内存中开辟一个大小为M的区域,用以保存临时的文件,其中M为大于上述阈值的正整数,其取值范围为64-128Mb;Step S402: The metadata node opens an area of size M in its memory to store temporary files, where M is a positive integer greater than the above threshold, and its value range is 64-128Mb;

步骤S403:元数据节点设置索引区,用于存储每个文件在其数据区中的索引信息;Step S403: the metadata node sets an index area for storing the index information of each file in its data area;

步骤S404:元数据节点设置一级索引区,用于保存文件到数据节点的映射关系;Step S404: the metadata node sets a first-level index area for storing the mapping relationship between files and data nodes;

步骤S405:数据节点设置二级索引区,其位于数据节点中,用于存储文件的二级索引信息;具体而言,二级索引信息包括:文件到数据块的映射、文件在数据块内的偏移、及文件的大小。Step S405: The data node sets the secondary index area, which is located in the data node and is used to store the secondary index information of the file; specifically, the secondary index information includes: the mapping from the file to the data block, the Offset, and file size.

As shown in FIG. 5, steps S307 and S308 of the method of the present invention each comprise the following sub-steps:

步骤S501:元数据节点将其数据区的数据作为一个普通文件保存于分布式文件系统中;Step S501: the metadata node saves the data in its data area as an ordinary file in the distributed file system;

步骤S502:元数据节点将该普通文件的索引信息发送到相应的数据节点的二级索引区中,数据节点将该索引信息添加到其二级索引区;Step S502: the metadata node sends the index information of the ordinary file to the secondary index area of the corresponding data node, and the data node adds the index information to its secondary index area;

步骤S503:元数据节点根据文件的ID和数据节点ID更新其一级索引信息;具体而言,元数据节点在其一级索引区中添加文件ID与数据节点ID的映射关系,以便进行文件的读取查询;Step S503: The metadata node updates its first-level index information according to the file ID and data node ID; specifically, the metadata node adds the mapping relationship between the file ID and the data node ID in its first-level index area, so that the file read query;

步骤S504:元数据节点清空其数据区中的数据;Step S504: the metadata node clears the data in its data area;

步骤S505:元数据节点清空其索引区中的数据。Step S505: The metadata node clears the data in its index area.

As shown in FIG. 6, the fast file read method in a distributed environment of the present invention comprises the following steps:

步骤S601:客户节点向元数据节点发出读文件请求;Step S601: the client node sends a file read request to the metadata node;

步骤S602:客户节点判断其自身是否和分布式文件系统中该客户节点上一次读取文件所连接的数据节点保持着连接,若是,则转入步骤S603,否则转入步骤S606;具体而言,客户节点和数据节点之间的连接可以是TCP连接或UDP连接;Step S602: The client node judges whether it maintains a connection with the data node connected to the client node last read file in the distributed file system, and if so, proceeds to step S603, otherwise proceeds to step S606; specifically, The connection between the client node and the data node can be a TCP connection or a UDP connection;

步骤S603:客户节点向该数据节点发送读文件请求;Step S603: the client node sends a file read request to the data node;

步骤S604:数据节点根据其二级索引区中存储的二级索引信息进行查询,以判断其自身是否存储了读文件请求所对应的文件,若是则转入步骤S609,否则转入步骤S605;具体而言,二级索引信息包括:文件到数据块的映射、文件在数据块内的偏移、及文件的大小;Step S604: The data node searches according to the secondary index information stored in its secondary index area to determine whether it has stored the file corresponding to the file read request, and if so, proceed to step S609; otherwise, proceed to step S605; Specifically, the secondary index information includes: the mapping of files to data blocks, the offset of files in data blocks, and the size of files;

步骤S605:客户节点断开与该数据节点的连接;Step S605: the client node disconnects from the data node;

步骤S606:元数据节点根据其索引区中的信息查询该文件是否存在于其数据区中,若是则转入步骤S611,否则转入步骤S607;Step S606: The metadata node inquires whether the file exists in its data area according to the information in its index area, and if so, proceeds to step S611, otherwise proceeds to step S607;

步骤S607:元数据节点根据其一级索引信息(即文件ID到数据节点的映射关系)查询存有该文件的数据节点;Step S607: The metadata node queries the data node storing the file according to its primary index information (ie, the mapping relationship between the file ID and the data node);

步骤S608:客户节点与该数据节点建立连接;Step S608: the client node establishes a connection with the data node;

步骤S609:数据节点根据二级索引信息查找该文件所在的数据块,根据二级索引信息获取文件,并将该文件发送给客户节点;Step S609: the data node searches for the data block where the file is located according to the secondary index information, obtains the file according to the secondary index information, and sends the file to the client node;

步骤S610:客户节点接收数据并保持与该数据节点的连接,然后过程结束;Step S610: the client node receives the data and maintains a connection with the data node, and then the process ends;

步骤S611:元数据节点根据其索引区中的索引信息从其数据区获取文件,并将该文件发送给客户节点。Step S611: the metadata node obtains the file from its data area according to the index information in its index area, and sends the file to the client node.

Those skilled in the art will readily understand that the above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. the quick write method of the file under distributed environment, is characterized in that, comprise the following steps:
Step S301: carry out initialization to the data field of metadata node under distributed environment and index information, wherein index information comprises index area and the one-level index area of metadata node, and the secondary index district of back end; This step comprises following sub-step:
Step S401: judge whether to carry out initialization to the data field of metadata node under distributed environment and index information, if so, then process terminates, otherwise proceeds to step S402;
Step S402: metadata node opens up a size within it in depositing be the region of M, in order to preserve interim file, wherein M is the positive integer being greater than a threshold value;
Step S403: metadata node arranges index area, for storing the index information of each file in its data field;
Step S404: metadata node arranges one-level index area, for preserving the mapping relations of file to back end;
Step S405: back end arranges secondary index district, and it is arranged in back end, for the secondary index information of storage file;
Step S302: the client node sends a file-write request to the metadata node;
Step S303: based on the write request, the metadata node determines whether the remaining space in its data area is greater than or equal to the size of the file; if so, proceed to step S304, otherwise proceed to step S308;
Step S304: the metadata node receives the file from the client node and stores it in the remaining space of its data area;
Step S305: the metadata node updates its index area;
Step S306: the metadata node determines whether the amount of data stored in its data area exceeds the threshold; if so, proceed to step S307, otherwise the process ends;
Step S307: the metadata node stores the contents of its data area in the distributed file system as one ordinary file, clears its data area and index area, and the process ends;
Step S308: the metadata node stores the contents of its data area in the distributed file system as one ordinary file, and clears its data area and index area; steps S307 and S308 each comprise the following sub-steps:
Step S501: the metadata node stores the contents of its data area in the distributed file system as one ordinary file;
Step S502: the metadata node sends the index information of this ordinary file to the second-level index area of the corresponding data node, and the data node adds this index information to its second-level index area;
Step S503: the metadata node updates its first-level index information according to the file ID and the data node ID;
Step S504: the metadata node clears its data area;
Step S505: the metadata node clears its index area;
Step S309: the metadata node receives the file data from the client node and stores it in the remaining space of its data area;
Step S310: the metadata node updates its index area.
2. The fast file-write method according to claim 1, characterized in that the size of each file is between 0 and 1 MB, and the size of the ordinary file is greater than the threshold.
3. The fast file-write method according to claim 1, characterized in that steps S305 and S310 specifically comprise: the metadata node adds a new entry to its index area containing the file ID, the offset of the file within the data area, and the size of the file.
4. The fast file-write method according to claim 1, characterized in that the second-level index information comprises: the mapping from the file to its data block, the offset of the file within that data block, and the size of the file.
5. The fast file-write method according to claim 1, characterized in that step S503 specifically comprises: the metadata node adds the mapping between the file ID and the data node ID to its first-level index area, for use in file read queries.
6. A fast file-read method in a distributed environment, characterized by comprising the following steps:
Step S601: the client node sends a file-read request to the metadata node;
Step S602: the client node determines whether it still holds a connection to the data node, in the distributed file system, from which it read its previous file; if so, proceed to step S603, otherwise proceed to step S606;
Step S603: the client node sends the file-read request to this data node;
Step S604: the data node queries the second-level index information stored in its second-level index area to determine whether it stores the requested file; if so, proceed to step S609, otherwise proceed to step S605;
Step S605: the client node closes its connection to this data node;
Step S606: the metadata node queries its index area to determine whether the file is present in its data area; if so, proceed to step S611, otherwise proceed to step S607;
Step S607: the metadata node queries its first-level index information to find the data node that holds the file;
Step S608: the client node establishes a connection with this data node;
Step S609: the data node uses the second-level index information to locate the data block containing the file, retrieves the file, and sends it to the client node;
Step S610: the client node receives the data and keeps its connection to this data node open; the process then ends;
Step S611: the metadata node retrieves the file from its data area according to the index information in its index area, and sends the file to the client node.
7. The fast file-read method according to claim 6, characterized in that the connection between the client node and the data node may be a TCP connection or a UDP connection.
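The write path of claim 1, together with the index layouts of claims 3 to 5, can be sketched in the same hedged way. Assumptions, all hypothetical: the metadata node's data area is a `bytearray` bounded by `capacity` (the size M from step S402); each flushed ordinary file is stored as a single block on one data node chosen round-robin (the claims do not specify placement); a file larger than M is rejected, since the claims do not say how such a file is handled; and step S401's already-initialized check is folded into the constructor. Names such as `MetadataNode.write` and `next_block_id` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataNode:
    node_id: int
    # Hypothetical: each flushed "ordinary file" is kept as one block.
    blocks: Dict[int, bytes] = field(default_factory=dict)
    # S405 / claim 4: file ID -> (block ID, offset in block, size).
    second_level_index: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)


class MetadataNode:
    def __init__(self, capacity: int, threshold: int, data_nodes: List[DataNode]):
        # S402: the data area size M must be greater than the threshold.
        if capacity <= threshold:
            raise ValueError("capacity M must exceed the threshold")
        self.capacity = capacity
        self.threshold = threshold
        self.data_area = bytearray()  # S402: in-memory buffer for small files
        self.index_area: Dict[str, Tuple[int, int]] = {}  # S403 / claim 3: file ID -> (offset, size)
        self.first_level_index: Dict[str, int] = {}  # S404 / claim 5: file ID -> data node ID
        self.data_nodes = data_nodes
        self.next_block_id = 0

    def _store(self, file_id: str, data: bytes) -> None:
        # S304/S309: append to the free space; S305/S310: add an index entry.
        self.index_area[file_id] = (len(self.data_area), len(data))
        self.data_area += data

    def _flush(self) -> None:
        # S501: persist the whole data area as one ordinary file.
        # Hypothetical placement policy: round-robin over the data nodes.
        block_id = self.next_block_id
        self.next_block_id += 1
        node = self.data_nodes[block_id % len(self.data_nodes)]
        node.blocks[block_id] = bytes(self.data_area)
        for file_id, (offset, size) in self.index_area.items():
            node.second_level_index[file_id] = (block_id, offset, size)  # S502
            self.first_level_index[file_id] = node.node_id  # S503
        self.data_area = bytearray()  # S504
        self.index_area.clear()  # S505

    def write(self, file_id: str, data: bytes) -> None:
        # Assumption: a file must fit in an empty data area (not covered by the claims).
        if len(data) > self.capacity:
            raise ValueError("file larger than the data area")
        # S303: compare the remaining space with the file size.
        if self.capacity - len(self.data_area) >= len(data):
            self._store(file_id, data)  # S304-S305
            if len(self.data_area) > self.threshold:  # S306
                self._flush()  # S307
        else:
            self._flush()  # S308
            self._store(file_id, data)  # S309-S310, no threshold re-check
```

As in the claim, a write that goes through step S308 stores the new file (steps S309 to S310) without re-checking the threshold, so the data area can hold more than the threshold until a later write triggers step S307 or S308.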
CN201210590615.5A 2012-12-29 2012-12-29 File rapid read-write method under a kind of distributed environment Expired - Fee Related CN103092927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210590615.5A CN103092927B (en) 2012-12-29 2012-12-29 File rapid read-write method under a kind of distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210590615.5A CN103092927B (en) 2012-12-29 2012-12-29 File rapid read-write method under a kind of distributed environment

Publications (2)

Publication Number Publication Date
CN103092927A CN103092927A (en) 2013-05-08
CN103092927B true CN103092927B (en) 2016-01-20

Family

ID=48205492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210590615.5A Expired - Fee Related CN103092927B (en) 2012-12-29 2012-12-29 File rapid read-write method under a kind of distributed environment

Country Status (1)

Country Link
CN (1) CN103092927B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103595797B (en) * 2013-11-18 2017-01-18 上海爱数信息技术股份有限公司 Caching method for distributed storage system
CN104750708B (en) * 2013-12-27 2018-09-28 华为技术有限公司 A kind of index establishing method of space-time data, querying method, device and equipment
CN104205780B (en) * 2014-01-23 2017-06-27 华为技术有限公司 A method and device for storing data
CN105279166B (en) * 2014-06-20 2019-01-25 中国电信股份有限公司 File management method and system
CN104965835B (en) * 2014-07-30 2018-12-07 浙江大华技术股份有限公司 A kind of file read/write method and device of distributed file system
CN105630779A (en) * 2014-10-27 2016-06-01 杭州海康威视系统技术有限公司 Hadoop distributed file system based small file storage method and apparatus
CN106326239B (en) * 2015-06-18 2020-01-31 阿里巴巴集团控股有限公司 Distributed file system and file meta-information management method thereof
CN105912428B (en) * 2016-05-20 2019-01-08 上海数腾软件科技股份有限公司 Realize that source data is converted into the system and method for virtual machine image in real time
CN109739434A (en) * 2018-12-03 2019-05-10 中科恒运股份有限公司 File reads address acquiring method, file reading and terminal device
CN110109622A (en) * 2019-04-28 2019-08-09 平安科技(深圳)有限公司 A kind of data processing method and relevant apparatus based on middleware
CN111581015B (en) * 2020-04-14 2021-06-29 上海爱数信息技术股份有限公司 Continuous data protection system and method for modern application
CN111858494B (en) * 2020-07-23 2024-05-17 珠海豹趣科技有限公司 File acquisition method and device, storage medium and electronic equipment
CN113986828A (en) * 2021-10-28 2022-01-28 浙江立元科技有限公司 Method and device for storing mass files, electronic equipment and storage medium
CN113703413B (en) * 2021-11-01 2022-01-25 西安热工研究院有限公司 Data interaction method and system, device and storage medium based on secondary index
CN118069650B (en) * 2024-02-27 2025-03-28 中国船舶科学研究中心 Distributed data management method based on key value

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101866359A (en) * 2010-06-24 2010-10-20 北京航空航天大学 A small file storage and access method in a cluster file system
CN102075584A (en) * 2011-01-30 2011-05-25 中国科学院计算技术研究所 Distributed file system and access method thereof
CN102332029A (en) * 2011-10-15 2012-01-25 西安交通大学 A method for associative storage of massive classifiable small files based on Hadoop

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8751561B2 (en) * 2008-04-08 2014-06-10 Roderick B. Wideman Methods and systems for improved throughput performance in a distributed data de-duplication environment
US8510267B2 (en) * 2011-03-08 2013-08-13 Rackspace Us, Inc. Synchronization of structured information repositories

Non-Patent Citations (2)

Title
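A method for storing and reading small files in Hadoop; Zhang Chunming (张春明) et al.; Computer Applications and Software (《计算机应用与软件》); 20121130; Vol. 29, No. 11; pp. 95-100 *
A file-merging strategy for optimizing distributed file systems; Chen Jian (陈剑) et al.; Journal of Computer Applications (《计算机应用》); 20111231; Vol. 31; pp. 161-163 *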

Also Published As

Publication number Publication date
CN103092927A (en) 2013-05-08

Similar Documents

Publication Publication Date Title
CN103092927B (en) File rapid read-write method under a kind of distributed environment
CN102662992B (en) Method and device for storing and accessing massive small files
US20210019257A1 (en) Persistent memory storage engine device based on log structure and control method thereof
CN104462225B (en) The method, apparatus and system of a kind of digital independent
CN100561386C (en) A data storage method and device
AU2017218964A1 (en) Cloud-based distributed persistence and cache data model
CN101566927B (en) Storage system, storage controller and data caching method
CN106909651A (en) A kind of method for being write based on HDFS small documents and being read
CN103176754A (en) Reading and storing method for massive amounts of small files
CN106021381A (en) Data access/storage method and device for cloud storage service system
CN103064639A (en) Method and device for storing data
WO2014101108A1 (en) Caching method for distributed storage system, node and computer readable medium
CN104111804A (en) Distributed file system
CN103034684A (en) Optimizing method for storing virtual machine mirror images based on CAS (content addressable storage)
CN102821138A (en) Metadata distributed storage method applicable to cloud storage system
WO2018137327A1 (en) Data transmission method for host and standby devices, control node, and database system
CN101986655A (en) Storage network and data reading and writing method thereof
CN102693286A (en) Method for organizing and managing file content and metadata
CN102420814A (en) Data access method and device and server
CN107291876A (en) A kind of DDM method
WO2021143351A1 (en) Distributed retrieval method, apparatus and system, computer device, and storage medium
CN102915340A (en) Expanded B+ tree-based object file system
CN110276713A (en) A high-efficiency caching method and system for remote sensing image data
CN107241444A (en) A kind of distributed caching data management system, method and device
JPWO2014010038A1 (en) Information processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20211229