CN107391761B

CN107391761B - A data management method and device based on deduplication technology

Info

Publication number: CN107391761B
Application number: CN201710750609.4A
Authority: CN
Inventors: 胡永刚; 王利朋
Original assignee: Suzhou Wave Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2020-03-06
Anticipated expiration: 2037-08-28
Also published as: CN107391761A

Abstract

The invention discloses a data management method and device based on deduplication technology. The method calculates a fingerprint value of target data through a HASH algorithm; determines a storage location corresponding to the fingerprint value of the target data through CRUSH mapping; then takes the target data as the data to be stored and judges whether data is already stored at the storage location of the data to be stored; if so, the reference count of the data to be stored is incremented by one; if not, the data to be stored is stored and its reference count is set to one; finally, the first metadata information of the data to be stored is stored. Therefore, in the process of data storage, repeated storage of data is avoided and working efficiency is improved; meanwhile, data management is realized based on deduplication technology, which saves cost and prolongs the service life of the storage system. The data management device based on deduplication technology provided by the embodiment of the invention also has the above technical effects.

Description

A data management method and device based on deduplication technology

Technical Field

The present invention relates to the technical field of cloud computing data centers, and more particularly to a data management method and device based on deduplication technology.

Background

With the rapid development of computer technology and the Internet industry, the volume of data grows day by day. Distributed storage systems emerged to save storage space and enable resource sharing. A distributed storage system spreads data across multiple independent devices, adopts a scalable architecture, uses multiple storage servers to share the storage load, and uses a location server to locate stored information. This improves the reliability, availability, and management efficiency of the system and makes it easy to scale.

However, because many terminals can access the storage servers, a large amount of duplicate data inevitably accumulates and occupies storage space. Deduplication technology, which optimizes storage capacity, addresses this problem. By eliminating duplicate data in the storage system, it reduces the data actually stored in the system or transmitted over the network, and it is widely used in backup, long-term archiving, and disaster recovery. In the field of distributed storage, handling duplicate data online is urgently needed in order to reduce the cost per unit of storage capacity.

Therefore, how to implement deduplication in the field of distributed storage, that is, how to store, read, and delete data in distributed storage using deduplication technology, is a problem to be solved by those skilled in the art.

Summary of the Invention

The purpose of the present invention is to provide a data management method and device based on deduplication technology, so that data can be stored, read, and deleted based on deduplication technology in the field of distributed storage.

To achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A data management method based on deduplication technology, comprising:

S11. Calculate the fingerprint value of the target data through the HASH algorithm;

S12. Determine the storage location corresponding to the fingerprint value of the target data through CRUSH mapping; take the target data as the data to be stored, and execute S13;

S13. Judge whether data exists in the storage location corresponding to the data to be stored; if so, execute S14; if not, execute S15;

S14. Increment the reference count corresponding to the data to be stored by one, and execute S16;

S15. Store the data to be stored in the storage location corresponding to the data to be stored, set the reference count corresponding to the data to be stored to one, and execute S16;

S16. Store the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.
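Steps S11 to S16 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class `DedupStore` and its fields are hypothetical names, SHA-1 is assumed as the HASH algorithm (the description names it later), and `locate` is only a placeholder for CRUSH mapping that uses the fingerprint itself as the location key.

```python
import hashlib


class DedupStore:
    """Toy in-memory stand-in for the object storage devices (hypothetical)."""

    def __init__(self):
        self.objects = {}    # storage location -> stored data
        self.refcount = {}   # storage location -> reference count
        self.metadata = {}   # logical name -> first metadata information

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # S11: fingerprint via a hash algorithm (SHA-1 assumed here).
        return hashlib.sha1(data).hexdigest()

    @staticmethod
    def locate(fp: str) -> str:
        # S12: placeholder for CRUSH mapping, NOT the real algorithm.
        return fp

    def put(self, name: str, data: bytes) -> str:
        fp = self.fingerprint(data)                  # S11
        loc = self.locate(fp)                        # S12
        if loc in self.objects:                      # S13: data already there?
            self.refcount[loc] += 1                  # S14: count one more reference
        else:
            self.objects[loc] = data                 # S15: first copy is stored
            self.refcount[loc] = 1
        self.metadata[name] = {"fingerprint": fp}    # S16: record first metadata
        return fp


store = DedupStore()
fp_a = store.put("a.bin", b"hello")
fp_b = store.put("b.bin", b"hello")   # same content, so no second copy is stored
```

After the two calls, the store holds one object with a reference count of two, and each logical name has its own metadata entry pointing at the shared fingerprint.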

Before S11 is executed, the method further includes:

S21. Judge whether second metadata information corresponding to the target data exists; if so, execute S22; if not, execute S11;

S22. Acquire the second metadata information;

S23. Judge whether a fingerprint value exists in the second metadata information; if so, execute S24; if not, execute S11;

S24. Compare the length of the target data with the preset data length; if the length of the target data is equal to the preset data length, execute S11; if the length of the target data is less than the preset data length, execute S25;

S25. Splice the target data with the data corresponding to the second metadata information to obtain spliced data, calculate the fingerprint value of the spliced data, and execute S26;

S26. Determine the storage location corresponding to the fingerprint value of the spliced data through CRUSH mapping; take the spliced data as the data to be stored, and execute S13.
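The branching in S21 to S24 can be sketched as a small decision function. The function name `plan_write`, the dictionary layout of the metadata, and the returned step labels are assumptions made for illustration; the case of a target longer than the preset length is not covered by the steps above, so the sketch rejects it.

```python
def plan_write(target: bytes, second_metadata, preset_length: int) -> str:
    """Return the label of the step that handles this write (hypothetical helper)."""
    if second_metadata is None:                    # S21: no metadata, create-write
        return "S11"
    if not second_metadata.get("fingerprint"):     # S22/S23: metadata has no fingerprint
        return "S11"
    if len(target) == preset_length:               # S24: full-length overwrite
        return "S11"
    if len(target) < preset_length:                # S24: partial overwrite, splice first
        return "S25"
    # Not described in the steps: data is assumed to be chunked beforehand.
    raise ValueError("target is longer than the preset data length")


FOUR_MIB = 4 * 1024 * 1024
existing = {"fingerprint": "0" * 40}               # placeholder fingerprint value
first_write = plan_write(b"x" * 10, None, FOUR_MIB)
partial_write = plan_write(b"x" * 10, existing, FOUR_MIB)
full_write = plan_write(bytes(FOUR_MIB), existing, FOUR_MIB)
```

Here `first_write` and `full_write` both resolve to S11, while `partial_write` resolves to S25.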

The case in which the length of the target data is equal to the preset data length includes:

If the length of the target data is equal to the preset data length, decrementing the reference count of the data corresponding to the second metadata information by one;

Judging whether the reference count of the data corresponding to the second metadata information is zero;

If so, deleting the data corresponding to the second metadata information.

Splicing the target data with the data corresponding to the second metadata information to obtain spliced data, and calculating the fingerprint value of the spliced data, includes:

Acquiring the data content corresponding to the second metadata information;

Splicing the target data with the data corresponding to the second metadata information according to the preset data length and the data offset to obtain spliced data;

Calculating the fingerprint value of the spliced data;

Decrementing the reference count of the data corresponding to the second metadata information by one.

The method further includes:

Receiving a deletion request sent by a client;

Determining the data to be deleted according to the deletion request, and acquiring the third metadata information of the data to be deleted and the fingerprint value of the data to be deleted contained in the third metadata information;

Determining the storage location corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and decrementing the reference count corresponding to the data to be deleted by one;

Judging whether the reference count corresponding to the data to be deleted is zero;

If so, deleting the data to be deleted and the third metadata information.
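The deletion flow above can be sketched as follows. The function `delete` and the three dictionaries are hypothetical, and the fingerprint is again used directly as the location key in place of CRUSH mapping. The text does not say what happens to the third metadata information while the count stays above zero, so the sketch leaves it untouched in that case.

```python
import hashlib


def delete(name, metadata, objects, refcount) -> bool:
    """Drop one reference to the data named `name`; reclaim it at zero references."""
    fp = metadata[name]["fingerprint"]   # fingerprint from the third metadata information
    loc = fp                             # placeholder for CRUSH mapping
    refcount[loc] -= 1                   # one reference fewer
    if refcount[loc] == 0:               # nobody references the data any more
        del objects[loc]
        del refcount[loc]
        del metadata[name]
        return True
    return False


fp = hashlib.sha1(b"hello").hexdigest()
objects = {fp: b"hello"}
refcount = {fp: 2}
metadata = {"a.bin": {"fingerprint": fp}, "b.bin": {"fingerprint": fp}}

first = delete("a.bin", metadata, objects, refcount)    # count drops from 2 to 1
second = delete("b.bin", metadata, objects, refcount)   # count reaches 0, data removed
```

The first call only lowers the count; the second call removes the stored data because no reference remains.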

A data management device based on deduplication technology, comprising:

A first calculation module, configured to calculate the fingerprint value of the target data through the HASH algorithm;

A first determination module, configured to determine the storage location corresponding to the fingerprint value of the target data through CRUSH mapping, and to take the target data as the data to be stored;

A first judgment module, configured to judge whether data exists in the storage location corresponding to the data to be stored;

A first execution module, configured to increment the reference count corresponding to the data to be stored by one when data exists in the storage location corresponding to the data to be stored;

A first storage module, configured to store the data to be stored in the storage location corresponding to the data to be stored, and to set the reference count corresponding to the data to be stored to one, when no data exists in that storage location;

A second storage module, configured to store the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

The device further includes:

A second judgment module, configured to judge whether second metadata information corresponding to the target data exists, and to trigger the first calculation module if it does not;

A first acquisition module, configured to acquire the second metadata information when second metadata information corresponding to the target data exists;

A third judgment module, configured to judge whether a fingerprint value exists in the second metadata information, and to trigger the first calculation module if it does not;

A comparison module, configured to compare the length of the target data with the preset data length when a fingerprint value exists in the second metadata information, and to trigger the first calculation module if the length of the target data is equal to the preset data length;

A splicing module, configured to splice the target data with the data corresponding to the second metadata information to obtain spliced data, and to calculate the fingerprint value of the spliced data, when the length of the target data is less than the preset data length;

A second determination module, configured to determine the storage location corresponding to the fingerprint value of the spliced data through CRUSH mapping.

The comparison module includes:

A first execution unit, configured to decrement the reference count of the data corresponding to the second metadata information by one when the length of the target data is equal to the preset data length;

A first judgment unit, configured to judge whether the reference count of the data corresponding to the second metadata information is zero;

A first deletion unit, configured to delete the data corresponding to the second metadata information when the reference count of that data is zero.

The splicing module includes:

An acquisition unit, configured to acquire the data content corresponding to the second metadata information;

A splicing unit, configured to splice the target data with the data corresponding to the second metadata information according to the preset data length and the data offset to obtain spliced data;

A calculation unit, configured to calculate the fingerprint value of the spliced data;

A second execution unit, configured to decrement the reference count of the data corresponding to the second metadata information by one;

A second judgment unit, configured to judge whether the reference count of the data corresponding to the second metadata information is zero;

A second deletion unit, configured to delete the data corresponding to the second metadata information when the reference count of that data is zero.

The device further includes:

A receiving module, configured to receive a deletion request sent by a client;

A second acquisition module, configured to determine the data to be deleted according to the deletion request, and to acquire the third metadata information of the data to be deleted and the fingerprint value of the data to be deleted contained in the third metadata information;

A third determination module, configured to determine the storage location corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and to decrement the reference count corresponding to the data to be deleted by one;

A fourth judgment module, configured to judge whether the reference count corresponding to the data to be deleted is zero;

A deletion module, configured to delete the data to be deleted and the third metadata information when the reference count corresponding to the data to be deleted is zero.

As can be seen from the above solutions, in the data management method based on deduplication technology provided by the embodiment of the present invention, the fingerprint value of the target data is calculated through the HASH algorithm, and the storage location corresponding to that fingerprint value is determined through CRUSH mapping. The fingerprint value uniquely identifies the target data and therefore uniquely determines its storage location. The target data is then taken as the data to be stored, and it is judged whether data already exists in the corresponding storage location. Because the data to be stored has a unique storage location, data already present at that location means the data to be stored has been stored before; it is not stored again, and its reference count is incremented by one instead. If no data is present at that location, the data to be stored has not yet been stored; it is written to that location and its reference count is set to one. Finally, the first metadata information of the data to be stored, which includes its fingerprint value, is stored. In this way, repeated storage of the same data is avoided during data storage, which improves working efficiency and saves storage space; at the same time, data management based on deduplication technology is realized in the field of distributed storage, which saves cost and extends the life of the storage system.

Correspondingly, the data management device based on deduplication technology provided by the embodiment of the present invention also has the above technical effects.

Description of Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a data management method based on deduplication technology disclosed in an embodiment of the present invention;

FIG. 2 is a flowchart of another data management method based on deduplication technology disclosed in an embodiment of the present invention;

FIG. 3 is a flowchart of the data deletion method in a data management method based on deduplication technology according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data management device based on deduplication technology disclosed in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The embodiment of the present invention discloses a data management method and device based on deduplication technology, so that data can be stored, read, and deleted based on deduplication technology in the field of distributed storage.

Referring to FIG. 1, a data management method based on deduplication technology provided by an embodiment of the present invention includes steps S11 to S16, which are described in turn below.

Specifically, in this embodiment, the target data is the data to be stored by the current operation. Before the fingerprint value of the target data is calculated, the target data must first be divided into blocks.

At present, in the field of distributed storage, data to be stored is generally divided according to the size of the underlying storage object so that the data in the underlying storage is regular. For example, if the underlying storage object size is 4 MB and the target data is 10 MB, the target data is split on 4 MB boundaries into three blocks of 4 MB, 4 MB, and 2 MB. That is, the data to be stored is cut into blocks of at most 4 MB.
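The fixed-size division described above can be sketched in a few lines. The function name `split_into_chunks` is hypothetical, and 4 MB is taken here as 4 × 1024 × 1024 bytes, which is an assumption about the unit.

```python
MIB = 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = 4 * MIB):
    """Cut data into consecutive blocks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


chunks = split_into_chunks(bytes(10 * MIB))
sizes_in_mib = [len(c) // MIB for c in chunks]   # [4, 4, 2]
```

Only the last block can be shorter than the object size, which is why the later steps compare a block's length against the preset data length.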

Specifically, in S11 the fingerprint value of the target data is calculated through the HASH algorithm from the data content of the blocks described above. Each fingerprint value corresponds one-to-one to the data content of a block, that is, to the data to be stored; the data content and the fingerprint value are paired and form key-value matching information. If the target data is divided into multiple blocks, each block has its own fingerprint value, and the subsequent operations are performed on the data corresponding to each fingerprint value. If the target data forms a single block, the target data has one fingerprint value, and the subsequent operations are performed for that fingerprint value. In this embodiment, the target data forms a single block with a single fingerprint value.
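A short sketch of per-block fingerprinting follows, assuming SHA-1 as the HASH algorithm (the description names SHA-1 later). The helper `fingerprint_blocks` is a hypothetical name; it builds the fingerprint-to-content pairing as a dictionary.

```python
import hashlib


def fingerprint_blocks(blocks):
    """Map each block's SHA-1 fingerprint (hex string) to its content."""
    return {hashlib.sha1(block).hexdigest(): block for block in blocks}


pairs = fingerprint_blocks([b"abc", b"abc", b"xyz"])
abc_fingerprint = hashlib.sha1(b"abc").hexdigest()
```

Identical blocks collapse to one key, so `pairs` has two entries for the three blocks above; this is the property that the reference count in S13 to S15 relies on.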

Specifically, in S12 the fingerprint value calculated by the HASH algorithm is used at the Rados layer: the CRUSH mapping process takes the fingerprint value of the target data in place of the target data itself and passes it to the object storage device, which then looks up the storage location corresponding to the target data in its own storage system and thereby determines the storage location.
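The property that matters for deduplication is that placement is a deterministic function of the fingerprint, so equal content always lands on the same device. The sketch below illustrates only that property using rendezvous (highest-random-weight) hashing as a stand-in. It is not the CRUSH algorithm, and the function `place` and the device names are hypothetical.

```python
import hashlib


def place(fingerprint: str, devices, replicas: int = 1):
    """Pick devices for a fingerprint deterministically (stand-in, NOT CRUSH)."""
    def score(device):
        # Each (fingerprint, device) pair gets a pseudo-random but repeatable score.
        return hashlib.sha1(f"{fingerprint}:{device}".encode()).digest()
    return sorted(devices, key=score, reverse=True)[:replicas]


devices = ["osd.0", "osd.1", "osd.2", "osd.3"]
fp = hashlib.sha1(b"some block").hexdigest()
first_lookup = place(fp, devices, replicas=2)
second_lookup = place(fp, list(reversed(devices)), replicas=2)
```

Both lookups return the same two devices regardless of the order in which the device list is supplied, because the choice depends only on the fingerprint and the device names.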

Specifically, in S13 the target data is taken as the data to be stored. After the object storage device determines the storage location of the data to be stored, it first judges whether data exists at that location. If data exists, the data to be stored has already been stored; if not, the data to be stored has not yet been stored.

Specifically, in S14, when step S13 determines that the data to be stored has already been stored, the data is not stored again; instead, the reference count corresponding to the data to be stored is incremented by one.

Specifically, in S15, when step S13 determines that the data to be stored has not yet been stored, the data is stored in the storage location corresponding to the data to be stored, and the reference count corresponding to the data to be stored is set to one.

Specifically, in S16, after the data to be stored has been stored, its reference count is also stored in a dedicated storage location prepared by the object storage device. The metadata information of the data to be stored is stored as well; it includes attributes such as the fingerprint value of the data to be stored.

Specifically, when the metadata store saves fingerprint values, it first stores 8 KB of metadata and then stores the fingerprint values of the file after that metadata. The metadata store keeps metadata information per file, with the 8 KB of metadata stored as one object in the cluster environment. For a 4 MB data block, 4088 KB then remain for fingerprint data. With SHA-1 as the fingerprint HASH algorithm, one fingerprint is 20 bytes, so 209305 fingerprint values can be stored, which at 4 MB of data per fingerprint corresponds to about 817 GB of data.
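The figures above can be checked with a short calculation. The constant names are illustrative, and binary units (1 KB = 1024 bytes) are assumed because they reproduce the stated numbers.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BLOCK_SIZE = 4 * MIB        # size of the block that holds metadata plus fingerprints
METADATA_SIZE = 8 * KIB     # metadata stored ahead of the fingerprints
FINGERPRINT_SIZE = 20       # bytes per SHA-1 fingerprint
DATA_PER_FINGERPRINT = 4 * MIB

fingerprint_space_kib = (BLOCK_SIZE - METADATA_SIZE) // KIB                 # 4088
fingerprint_count = (BLOCK_SIZE - METADATA_SIZE) // FINGERPRINT_SIZE        # 209305
covered_gib = fingerprint_count * DATA_PER_FINGERPRINT / GIB                # about 817.6
```

The division leaves 12 bytes unused (4,186,112 bytes available, 20 bytes per fingerprint), and the covered capacity of roughly 817.6 GB is stated as 817 GB in the text.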

It can be seen that, in the data management method based on deduplication technology provided by this embodiment, the fingerprint value of the target data is calculated through the HASH algorithm, and the storage location corresponding to that fingerprint value is determined through CRUSH mapping. The fingerprint value uniquely identifies the target data and therefore uniquely determines its storage location. The target data is then taken as the data to be stored, and it is judged whether data already exists in the corresponding storage location. Because the data to be stored has a unique storage location, data already present at that location means the data to be stored has been stored before; it is not stored again, and its reference count is incremented by one instead. If no data is present at that location, the data to be stored has not yet been stored; it is written to that location and its reference count is set to one. Finally, the first metadata information of the data to be stored, which includes its fingerprint value, is stored. In this way, repeated storage of the same data is avoided during data storage, which improves working efficiency and saves storage space; at the same time, data management based on deduplication technology is realized in the field of distributed storage, which saves cost and extends the life of the storage system.

Referring to FIG. 2, another data management method based on deduplication technology provided by an embodiment of the present invention includes steps S21 to S26, followed by S13 to S16 described above.

Specifically, in this embodiment, before the target data is stored, S21 first judges whether second metadata information of the target data exists, that is, whether the target data is being stored for the first time or stored again. This determines whether the current operation is a create-write or a modify-write. If the second metadata information exists, the target data is not being stored for the first time, the current operation is a modify-write, and step S22 is executed. If the second metadata information does not exist, the target data is being stored for the first time, and step S11 is executed.

Specifically, in S22 the second metadata information is acquired as follows: the file system client obtains the index information of the target data and requests the second metadata information from the metadata store; the metadata store retrieves the second metadata information according to the index information of the target data. The second metadata information includes the fingerprint value of the target data, which is stored as a key-value pair.

It should be noted that only the second metadata information is acquired here. To obtain the data content corresponding to a piece of metadata, that is, to read the data, the client obtains the index information of the data to be read and requests the metadata information from the metadata store. The metadata store retrieves the metadata information according to the index information; it includes the fingerprint values, stored as key-value pairs, of the objects that make up the file. Rados then reads the data directly from the object storage device according to the data offset, the data length, and the fingerprint values. This completes the data reading process.
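The read path can be sketched as a lookup of the file's ordered fingerprints followed by offset-based reads from the stored objects. Everything here is illustrative: the function `read`, the `fingerprints` list in the metadata, and the tiny 4-byte object size are assumptions chosen to keep the example readable, and a dictionary lookup stands in for locating the object storage device.

```python
import hashlib


def read(name, offset, length, metadata, objects, object_size):
    """Read `length` bytes at `offset` from the file `name` (hypothetical layout)."""
    fingerprints = metadata[name]["fingerprints"]   # one fingerprint per object, in order
    end = offset + length
    out = bytearray()
    pos = offset
    while pos < end:
        index, inner = divmod(pos, object_size)     # which object, and where inside it
        if index >= len(fingerprints):
            break                                   # request runs past the last object
        obj = objects[fingerprints[index]]          # fingerprint identifies the object
        piece = obj[inner:inner + (end - pos)]
        if not piece:
            break                                   # request runs past the end of the file
        out += piece
        pos += len(piece)
    return bytes(out)


blocks = [b"abcd", b"efgh", b"ij"]
objects = {hashlib.sha1(b).hexdigest(): b for b in blocks}
metadata = {"f": {"fingerprints": [hashlib.sha1(b).hexdigest() for b in blocks]}}
middle = read("f", 2, 5, metadata, objects, object_size=4)   # b"cdefg"
tail = read("f", 8, 10, metadata, objects, object_size=4)    # b"ij"
```

The first read spans two objects; the second is cut short at the end of the file.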

Specifically, after the second metadata information of the target data is acquired in step S22, S23 judges whether the second metadata information is complete, that is, whether a fingerprint value exists in it. If a fingerprint value exists, step S24 is executed; if not, step S11 is executed.

Specifically, after step S23 determines that a fingerprint value exists in the second metadata information, the length of the target data is compared with the preset data length. Before the lengths are compared, the target data is generally divided into blocks; the division process is similar to that of the above embodiment and is not repeated here.

Specifically, after the target data is divided into blocks, the length of each block is compared with the preset data length. In this embodiment, the target data is assumed to form a single data block, so the length of the block equals the length of the target data, and it suffices to compare the length of the target data with the preset data length. The preset data length is the system default, generally 4 MB. If the length of the target data is equal to the preset data length, S11 is executed; if the length of the target data is less than the preset data length, S25 is executed.

Specifically, in S25, if the length of the target data is less than the preset data length, the target data is spliced with the data corresponding to the second metadata information according to the data offset and the data length. In this embodiment, the preset data length is 4 MB. For example, the data corresponding to the second metadata information is a data object covering 0 to 4 MB, the target data is 1 MB long, and the range from 2 MB to 3 MB of the object is to be modified. The whole 0 to 4 MB of data corresponding to the second metadata information is first read and divided into three segments, 0 to 2 MB, 2 to 3 MB, and 3 to 4 MB. The original 1 MB of content at 2 to 3 MB is replaced with the 1 MB of target data, and the three segments (0 to 2 MB, the new 2 to 3 MB, and 3 to 4 MB) are joined into a new 4 MB block, which is the spliced data.
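The splice in this example amounts to overwriting a byte range of the old object with the target data. A minimal sketch follows, with `splice` as a hypothetical helper name and 1 MB taken as 1024 × 1024 bytes.

```python
MIB = 1024 * 1024


def splice(old: bytes, target: bytes, offset: int) -> bytes:
    """Return a copy of `old` with `target` written at byte `offset`."""
    if offset < 0 or offset + len(target) > len(old):
        raise ValueError("target does not fit inside the existing object")
    # Head of the old data, then the new content, then the untouched tail.
    return old[:offset] + target + old[offset + len(target):]


old_object = bytes(4 * MIB)              # existing 0 to 4 MB object (all zero bytes)
target_data = b"\x01" * MIB              # 1 MB of new content
spliced = splice(old_object, target_data, offset=2 * MIB)
```

The spliced block keeps its 4 MB length, so its fingerprint is computed over a full-size object, and the reference count of the old object is then decremented as described in the summary.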

Specifically, in this embodiment the process of determining the storage location corresponding to the fingerprint value of the spliced data is similar to that of the foregoing embodiment and is not repeated here. Once that storage location is determined, the spliced data is taken as the data to be stored and execution continues at step S13.

It can be seen that, in the data management method based on deduplication technology provided by this embodiment, the method first determines whether second metadata information corresponding to the target data exists. If it exists, the second metadata information is acquired; if not, S11 is executed. After the second metadata information is acquired, the method determines whether it contains a fingerprint value. If so, the length of the target data is compared with the preset data length; if not, S11 is executed. If the length of the target data is equal to the preset data length, S11 is executed. If it is less than the preset data length, the target data is spliced with the data corresponding to the second metadata information to obtain spliced data, the fingerprint value of the spliced data is calculated, the storage location corresponding to that fingerprint value is determined through CRUSH mapping, and the spliced data is taken as the data to be stored before S13 is executed. In this way, repeated storage of the same data is avoided during data storage, which improves working efficiency and saves storage space; at the same time, data management based on deduplication technology is realized in the field of distributed storage, which saves cost and extends the service life of the storage system.
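The branching in steps S21 to S24 can be summarized as a small decision function. The sketch below is illustrative only: the function name `next_step`, the dictionary layout of the metadata, and the error raised for an over-long block are assumptions, since the text only covers blocks whose length is equal to or less than the preset data length.

```python
from typing import Optional

PRESET_LEN = 4 * 1024 * 1024  # assumed byte value of the default "4M"


def next_step(second_metadata: Optional[dict], target_len: int,
              preset_len: int = PRESET_LEN) -> str:
    """Return the label of the step that follows the S21-S24 checks."""
    # S21: no existing metadata for this target data, so it is a first write
    if second_metadata is None:
        return "S11"
    # S23: metadata exists but carries no fingerprint value
    if not second_metadata.get("fingerprint"):
        return "S11"
    # S24: a full-length overwrite goes to S11, a partial write to S25 (splicing)
    if target_len == preset_len:
        return "S11"
    if target_len < preset_len:
        return "S25"
    # Not described in the text: blocks are assumed to be chunked beforehand
    raise ValueError("target data is longer than the preset data length")
```

For instance, `next_step({"fingerprint": "ab12"}, 1024)` selects the splicing branch S25, while `next_step(None, 1024)` goes straight to S11.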

Based on any of the foregoing embodiments, it should be noted that the case in which the length of the target data is equal to the preset data length includes:

Specifically, in the modify-write process, when the length of the target data is equal to the preset data length, the reference count of the data corresponding to the second metadata information is decremented by one. If that data has no other references, the reference count is zero after the decrement, and the data corresponding to the second metadata information is then deleted.

Based on any of the foregoing embodiments, it should be noted that splicing the target data and the data corresponding to the second metadata information to obtain spliced data, and calculating the fingerprint value of the spliced data, includes:

Specifically, in the modify-write process, when the length of the target data is less than the preset data length, the target data is spliced with the data corresponding to the second metadata information according to the preset data length and the data offset to obtain spliced data, and the fingerprint value of the spliced data is calculated. It is then determined whether the data corresponding to the second metadata information still has other references, as follows: the reference count of that data is decremented by one. If the count is zero after the decrement, no other references exist and the data corresponding to the second metadata information is deleted; if the count is not zero, other references exist and the data is retained.
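The reference-count handling for the old data during a modify-write can be illustrated as follows. This is a minimal sketch, assuming the reference counts and stored blocks are kept in two plain dictionaries keyed by fingerprint; the names `release`, `refcounts`, and `store` are hypothetical.

```python
def release(refcounts: dict, store: dict, fingerprint: str) -> bool:
    """Drop one reference to the old data; delete it once no references remain.

    Returns True if the data was deleted, False if it was retained.
    """
    refcounts[fingerprint] -= 1
    if refcounts[fingerprint] == 0:
        # No other object refers to this block any more
        del refcounts[fingerprint]
        del store[fingerprint]
        return True
    # Other references still exist, so the block is kept
    return False


# The old block is referenced twice: the first release keeps it, the second deletes it.
refcounts = {"fp_old": 2}
store = {"fp_old": b"old data"}
kept = release(refcounts, store, "fp_old")
deleted = release(refcounts, store, "fp_old")
```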

Based on any of the foregoing embodiments, it should be noted that the data management method based on deduplication technology provided by the embodiment of the present invention further includes a data deletion method. Referring to FIG. 3, the specific process is as follows:

S31. Receive the deletion request sent by the client;

S32. Determine the data to be deleted according to the deletion request, and acquire the third metadata information of the data to be deleted and the fingerprint value of the data to be deleted contained in the third metadata information;

S33. Determine the storage location corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and decrement the reference count corresponding to the data to be deleted by one;

S34. Determine whether the reference count corresponding to the data to be deleted is zero;

S35. If so, delete the data to be deleted and the third metadata information;

S36. If not, do not perform the deletion operation for the time being.

Specifically, the data deletion method includes a read step: the data to be deleted is determined according to the deletion request, and the third metadata information of the data to be deleted and the fingerprint value contained in it are acquired. Only the third metadata information and its fingerprint value are read here; the content of the data to be deleted is not read. The storage location corresponding to the fingerprint value of the data to be deleted is then determined through CRUSH mapping, the reference count corresponding to the data to be deleted is decremented by one, its metadata information is deleted, and the client is informed that the data was deleted successfully. If the reference count is zero after the decrement, the data to be deleted has no other references and is deleted.
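The deletion flow S31 to S36 might look like the sketch below. It follows the paragraph above in removing the object's metadata entry immediately and the shared data only when the reference count reaches zero. Everything here is an assumption made for illustration: `placement` is a simple modulo stand-in and not the real CRUSH algorithm, SHA-256 stands in for the unspecified HASH algorithm, and the dictionary layouts and names are hypothetical.

```python
import hashlib


def placement(fingerprint: str, num_nodes: int) -> int:
    # Stand-in for CRUSH mapping (NOT the real algorithm): fingerprint -> node index
    return int(fingerprint, 16) % num_nodes


def delete_object(name: str, metadata: dict, nodes: list) -> bool:
    """Handle a delete request; return True if the underlying data was removed."""
    fp = metadata[name]["fingerprint"]          # S32: only metadata is read
    node = nodes[placement(fp, len(nodes))]     # S33: locate by fingerprint
    node[fp]["refcount"] -= 1                   # S33: drop one reference
    del metadata[name]                          # the object's own metadata goes away
    if node[fp]["refcount"] == 0:               # S34
        del node[fp]                            # S35: last reference, delete the data
        return True
    return False                                # S36: data still referenced, keep it


# Two objects share one stored block, so only the second delete removes the data.
data = b"block"
fp = hashlib.sha256(data).hexdigest()
nodes = [{} for _ in range(3)]
nodes[placement(fp, 3)][fp] = {"data": data, "refcount": 2}
metadata = {"a": {"fingerprint": fp}, "b": {"fingerprint": fp}}
first = delete_object("a", metadata, nodes)
second = delete_object("b", metadata, nodes)
```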

A data management device based on deduplication technology provided by an embodiment of the present invention is introduced below. The data management device described below and the data management method described above may be cross-referenced.

Referring to FIG. 4, a data management device based on deduplication technology provided by an embodiment of the present invention includes:

a first calculation module 401, configured to calculate the fingerprint value of the target data through the HASH algorithm;

a first determination module 402, configured to determine the storage location corresponding to the fingerprint value of the target data through CRUSH mapping, and to take the target data as the data to be stored;

a first judgment module 403, configured to judge whether data is stored in the storage location corresponding to the data to be stored;

a first execution module 404, configured to add one to the reference count corresponding to the data to be stored when data is stored in the storage location corresponding to the data to be stored;

a first storage module 405, configured to store the data to be stored in the storage location corresponding to the data to be stored when no data is stored in that location, and to set the reference count corresponding to the data to be stored to one;

a second storage module 406, configured to store first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

The device further includes:

Specifically, when the second judgment module judges that no second metadata information corresponding to the target data exists, the first calculation module is triggered and calculates the fingerprint value of the target data through the HASH algorithm. The first determination module determines the storage location corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored. The first judgment module then judges whether data is stored in the storage location corresponding to the data to be stored. If data is stored there, the first execution module adds one to the reference count corresponding to the data to be stored; if not, the first storage module stores the data to be stored in that storage location and sets the corresponding reference count to one. Finally, the second storage module stores the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

Specifically, when the third judgment module judges that the second metadata information contains no fingerprint value, the first calculation module is triggered and calculates the fingerprint value of the target data through the HASH algorithm. The first determination module determines the storage location corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored. The first judgment module then judges whether data is stored in the storage location corresponding to the data to be stored. If data is stored there, the first execution module adds one to the reference count corresponding to the data to be stored; if not, the first storage module stores the data to be stored in that storage location and sets the corresponding reference count to one. Finally, the second storage module stores the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

Specifically, when the second metadata information contains a fingerprint value, the length of the target data is compared with the preset data length. If the length of the target data is equal to the preset data length, the first calculation module is triggered and calculates the fingerprint value of the target data through the HASH algorithm. The first determination module determines the storage location corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored. The first judgment module then judges whether data is stored in the storage location corresponding to the data to be stored. If data is stored there, the first execution module adds one to the reference count corresponding to the data to be stored; if not, the first storage module stores the data to be stored in that storage location and sets the corresponding reference count to one. Finally, the second storage module stores the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

The comparison module includes:

The splicing module includes:

The device further includes:

It can be seen that, in the data management device based on deduplication technology provided by this embodiment, the first calculation module first calculates the fingerprint value of the target data through the HASH algorithm. The first determination module determines the storage location corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored. The first judgment module judges whether data is stored in the storage location corresponding to the data to be stored. If data is stored there, the first execution module adds one to the reference count corresponding to the data to be stored; if not, the first storage module stores the data to be stored in that storage location and sets the corresponding reference count to one. Finally, the second storage module stores the first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored. The storage of the data and of its metadata information is thereby completed.
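The basic write path handled by modules 401 to 406 (steps S11 to S16) can be sketched as a small in-memory store. This is a sketch under stated assumptions, not the patented implementation: SHA-256 is used as the HASH algorithm because the text does not name one, `_locate` is a modulo stand-in rather than real CRUSH mapping, and the class `DedupStore` and its fields are hypothetical. Overwrites of an existing name (the modify-write path) are deliberately left out.

```python
import hashlib


class DedupStore:
    def __init__(self, num_nodes: int = 4):
        # Each node maps fingerprint -> [data, reference count]
        self.nodes = [dict() for _ in range(num_nodes)]
        # First metadata information: object name -> fingerprint
        self.metadata = {}

    def _locate(self, fingerprint: str) -> dict:
        # Stand-in for CRUSH mapping (NOT the real algorithm)
        return self.nodes[int(fingerprint, 16) % len(self.nodes)]

    def write(self, name: str, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()   # S11: fingerprint of the target data
        node = self._locate(fp)                 # S12: storage location from fingerprint
        if fp in node:                          # S13: is data already stored there?
            node[fp][1] += 1                    # S14: duplicate, add one reference
        else:
            node[fp] = [data, 1]                # S15: store data, reference count = 1
        self.metadata[name] = fp                # S16: store metadata with fingerprint
        return fp

    def refcount(self, fingerprint: str) -> int:
        return self._locate(fingerprint).get(fingerprint, [None, 0])[1]


store = DedupStore()
fp_a = store.write("a", b"hello")
fp_b = store.write("b", b"hello")   # same content, so no second copy is stored
blocks = sum(len(node) for node in store.nodes)
```

Because placement is derived from the fingerprint, identical content always lands in the same location, which is what lets the existence check in S13 double as the duplicate check.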

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.

The above description of the disclosed embodiments enables any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data management method based on deduplication technology, characterized by comprising:

S11. Calculate the fingerprint value of the target data through the HASH algorithm;

S12. Determine the storage location corresponding to the fingerprint value of the target data through CRUSH mapping; take the target data as the data to be stored, and execute S13;

S13. Determine whether data is stored in the storage location corresponding to the data to be stored; if so, execute S14; if not, execute S15;

S14. Add one to the reference count corresponding to the data to be stored, and execute S16;

S15. Store the data to be stored in the storage location corresponding to the data to be stored, set the reference count corresponding to the data to be stored to one, and execute S16;

S16. Store first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored;

wherein, before S11 is executed, the method further comprises:

S21. Determine whether second metadata information corresponding to the target data exists; if so, execute S22; if not, execute S11;

S22. Acquire the second metadata information;

S23. Determine whether the second metadata information contains a fingerprint value; if so, execute S24; if not, execute S11;

S24. Compare the length of the target data with the preset data length; if the length of the target data is equal to the preset data length, execute S11; if the length of the target data is less than the preset data length, execute S25;

S25. Splice the target data and the data corresponding to the second metadata information to obtain spliced data, calculate the fingerprint value of the spliced data, and execute S26;

S26. Determine the storage location corresponding to the fingerprint value of the spliced data through CRUSH mapping; take the spliced data as the data to be stored, and execute S13.

2. The data management method based on deduplication technology according to claim 1, wherein the case in which the length of the target data is equal to the preset data length comprises:

if the length of the target data is equal to the preset data length, decrementing the reference count of the data corresponding to the second metadata information by one;

judging whether the reference count of the data corresponding to the second metadata information is zero;

if so, deleting the data corresponding to the second metadata information.

3. The data management method based on deduplication technology according to claim 1, wherein splicing the target data and the data corresponding to the second metadata information to obtain spliced data, and calculating the fingerprint value of the spliced data, comprises:

acquiring the data content corresponding to the second metadata information;

splicing the target data and the data corresponding to the second metadata information according to the preset data length and the data offset to obtain spliced data;

calculating the fingerprint value of the spliced data;

decrementing the reference count of the data corresponding to the second metadata information by one;

if the reference count is zero after the decrement, deleting the data corresponding to the second metadata information.

4. The data management method based on deduplication technology according to any one of claims 1-3, characterized by further comprising:

Receive the deletion request sent by the client;

Determine the data to be deleted according to the deletion request, and obtain the third metadata information of the data to be deleted and the fingerprint value of the data to be deleted in the third metadata information;

Determine the storage location corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and decrement the reference count corresponding to the data to be deleted by one;

Determine whether the reference count corresponding to the data to be deleted is zero;

If so, delete the data to be deleted and the third metadata information.

5. A data management device based on deduplication technology, characterized by comprising:

a first calculation module, configured to calculate the fingerprint value of the target data through the HASH algorithm;

a first determination module, configured to determine the storage location corresponding to the fingerprint value of the target data through CRUSH mapping, and to take the target data as the data to be stored;

a first judgment module, configured to judge whether data is stored in the storage location corresponding to the data to be stored;

a first execution module, configured to add one to the reference count corresponding to the data to be stored when data is stored in the storage location corresponding to the data to be stored;

a first storage module, configured to store the data to be stored in the storage location corresponding to the data to be stored when no data is stored in that location, and to set the reference count corresponding to the data to be stored to one;

a second storage module, configured to store first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored;

wherein the data management device further comprises:

a second judgment module, configured to judge whether second metadata information corresponding to the target data exists, and if not, to trigger the first calculation module;

a first acquisition module, configured to acquire the second metadata information when second metadata information corresponding to the target data exists;

a third judgment module, configured to judge whether the second metadata information contains a fingerprint value, and if not, to trigger the first calculation module;

a comparison module, configured to compare the length of the target data with a preset data length when the second metadata information contains a fingerprint value, and to trigger the first calculation module if the length of the target data is equal to the preset data length;

a splicing module, configured to splice the target data and the data corresponding to the second metadata information when the length of the target data is less than the preset data length to obtain spliced data, and to calculate the fingerprint value of the spliced data;

a second determination module, configured to determine the storage location corresponding to the fingerprint value of the spliced data through CRUSH mapping.

6. The data management device based on deduplication technology according to claim 5, wherein the comparison module comprises:

a first execution unit, configured to decrement the reference count of the data corresponding to the second metadata information by one when the length of the target data is equal to the preset data length;

a first judging unit, configured to judge whether the reference count of the data corresponding to the second metadata information is zero;

a first deletion unit, configured to delete the data corresponding to the second metadata information when the reference count of the data corresponding to the second metadata information is zero.

7. The data management device based on deduplication technology according to claim 5, wherein the splicing module comprises:

an acquiring unit, configured to acquire data content corresponding to the second metadata information;

a splicing unit, configured to splice the target data and the data corresponding to the second metadata information according to the preset data length and the data offset to obtain spliced data;

a calculation unit, configured to calculate the fingerprint value of the spliced data;

a second execution unit, configured to decrement the reference count of the data corresponding to the second metadata information by one;

a second judging unit, configured to judge whether the reference count of the data corresponding to the second metadata information is zero;

a second deletion unit, configured to delete the data corresponding to the second metadata information when the reference count of the data corresponding to the second metadata information is zero.

8. The data management device based on deduplication technology according to any one of claims 5-7, characterized by further comprising:

a receiving module, configured to receive the deletion request sent by the client;

a second acquisition module, configured to determine the data to be deleted according to the deletion request, and to acquire the third metadata information of the data to be deleted and the fingerprint value of the data to be deleted contained in the third metadata information;

a third determination module, configured to determine the storage location corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and to decrement the reference count corresponding to the data to be deleted by one;

a fourth judgment module, configured to judge whether the reference count corresponding to the data to be deleted is zero;

a deletion module, configured to delete the data to be deleted and the third metadata information when the reference count corresponding to the data to be deleted is zero.