
CN103902735B - Application-aware data routing method and system for large-scale cluster deduplication - Google Patents


Info

Publication number
CN103902735B
Authority
CN
China
Prior art keywords
backup
file
node
application
deduplication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410158590.0A
Other languages
Chinese (zh)
Other versions
CN103902735A (en)
Inventor
付印金
胡谷雨
倪桂强
谢钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA University of Science and Technology
Priority to CN201410158590.0A
Publication of CN103902735A
Application granted
Publication of CN103902735B
Legal status: Expired - Fee Related (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/44 Distributed routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an application-aware data routing method for large-scale cluster deduplication, and a large-scale backup storage cluster system that implements the method. The method comprises the steps: S10) obtaining backup file meta-information; S20) identifying the application type of each file; S30) calculating the load of the deduplication storage nodes; S40) selecting a routing node for each file; S50) sending each file to its target node; and S60) deduplicating files within each node. The cluster system comprises multiple backup clients, one backup server, and multiple deduplication storage servers. The method and system offer a high deduplication ratio, high node throughput, low system communication overhead, and a balanced system load.

Description

Application-aware data routing method and system for large-scale cluster deduplication

Technical Field

The invention belongs to the technical fields of information storage and cluster computing, and relates in particular to an application-aware data routing method for large-scale cluster deduplication and to a large-scale backup storage cluster system.

Background

Data in backup storage systems that manage massive data volumes is highly redundant. Cluster deduplication performs distributed, parallel data deduplication on a cluster of backup storage servers, meeting the scalability requirements of massive backup data in both capacity and performance. Cluster deduplication has become a core storage-management technology for building energy-efficient, environmentally friendly, high-efficiency green data centers.

To limit system overhead, cluster deduplication usually adopts a loosely coupled design that does not deduplicate across nodes. Data sent by the backup clients is first distributed to the deduplication storage server nodes by a data routing scheme, and each deduplication storage server then independently, in parallel with the others, removes duplicate content within its own node. Data routing therefore directly determines the storage space utilization of the backup data, the throughput of each deduplication storage server node, and the load balance and communication overhead of the deduplication storage server cluster. The data routing method is thus key to the efficiency of cluster deduplication.

Current data routing methods for cluster deduplication fall into three main categories: block-level routing based on a distributed hash table (DHT), superblock-level routing based on state information, and file-level routing based on similarity.

Block-level DHT routing, as in the USENIX FAST'09 paper "HYDRAstor: a Scalable Secondary Storage" (published 2009-02-23) and the Chinese invention patent application "Distributed data deduplication system and method thereof" (application no. 201110461322.2, published 2011-12-28), assigns each chunk to a deduplication node by looking up the chunk's fingerprint in a distributed hash table. This effectively improves space utilization and reduces communication overhead, but it cannot preserve data locality within nodes, which reduces system throughput.

Stateful superblock-level routing, as in the USENIX FAST'11 paper "Tradeoffs in Scalable Data Routing for Deduplication Clusters" (published 2011-02-14), merges many consecutive chunks into superblocks of uniform size. Before routing each superblock, it queries every node for how many of the superblock's chunks that node already stores, then routes the superblock to the node holding the most duplicates, subject to load balance. This achieves a high data reduction ratio while keeping the load balanced, but the broadcast communication overhead and the frequent chunk fingerprint lookups inside the nodes severely degrade system performance.

Similarity-based file-level routing, as in the IEEE/ACM MASCOTS'09 paper "Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup" (published 2009-09-21), applies Broder's theorem on min-wise independent permutations: the minimum chunk fingerprint in a file serves as the file's similarity feature, and a distributed hash routes similar files to the same deduplication storage server node. When similarity in the data stream is low, however, file similarity goes undetected and cluster deduplication of the backup data performs poorly.
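The following Python sketch contrasts the prior-art block-level DHT routing and Extreme Binning-style file-level routing described above. It is illustrative only: SHA-1 chunk fingerprints and a plain `fingerprint % num_nodes` placement (standing in for a real distributed hash table or consistent hashing) are assumptions of this sketch, and the function names are hypothetical.

```python
import hashlib


def chunk_fp(chunk: bytes) -> int:
    """SHA-1 fingerprint of a chunk, as an integer."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def dht_route_chunk(chunk: bytes, num_nodes: int) -> int:
    """Block-level routing: every chunk is placed independently by its own fingerprint,
    so consecutive chunks of one file scatter across nodes (locality is lost)."""
    return chunk_fp(chunk) % num_nodes


def extreme_binning_route(chunks: list[bytes], num_nodes: int) -> int:
    """File-level routing: the minimum chunk fingerprint is the file's representative
    (min-wise independence), and it alone picks the node for the whole file."""
    representative = min(chunk_fp(c) for c in chunks)
    return representative % num_nodes


file_a = [b"alpha", b"beta", b"gamma"]
print([dht_route_chunk(c, 8) for c in file_a])  # chunks may go to different nodes
print(extreme_binning_route(file_a, 8))         # one node for the whole file
```

Because the file-level variant routes on the minimum fingerprint alone, two files that share their smallest chunk always land on the same node, while files whose minima differ may be sent to different nodes even if they share most other chunks; this is the low-similarity weakness noted above.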

In short, for cluster deduplication at data-center scale, with hundreds or thousands of nodes, the prior art suffers from a low deduplication ratio, low node throughput, high system communication overhead, and unbalanced system load.

Summary of the Invention

The object of the invention is to provide an application-aware data routing method and system for large-scale cluster deduplication that achieves a high deduplication ratio, high node throughput, low system communication overhead, and a balanced system load.

The technical solution that achieves this object is an application-aware data routing method for large-scale cluster deduplication, implemented on a large-scale backup storage cluster system comprising multiple backup clients (100), one backup server (200), and multiple deduplication storage servers (300), characterized by the following steps:

S10) Obtain backup file meta-information: the backup client (100) sends the backup server (200) a file backup request message containing file meta-information such as the file name, user, and size;

S20) Identify the file application type: the backup server (200) classifies each backup file by application type according to its meta-information, and queries the application index structure to obtain a list of candidate deduplication storage server (300) nodes that can store files of that type;

S30) Calculate deduplication storage node load: the backup server (200) queries the application-aware index structure to obtain the real-time load of each deduplication storage server (300) node and, from this load information and the backup file meta-information, computes a list of low-load deduplication storage server (300) nodes that keeps the load balanced;

S40) Select the file routing node: the backup server (200) analyzes the candidate node list and the low-load node list, selects one low-load candidate node that stores data of the same application type as the file's routing target node, and returns the result to the backup client (100);

S50) Send files to target nodes: according to the routing decisions returned by the backup server (200), the backup client (100) sends each file in the backup session to its target deduplication storage server (300) node;

S60) Deduplicate files within each node: each deduplication storage server (300) node independently deduplicates the different types of application files according to their differences in data format and content.

A large-scale backup storage cluster system for implementing the application-aware data routing method for large-scale cluster deduplication comprises multiple backup clients (100), one backup server (200), and multiple deduplication storage servers (300), characterized in that:

the backup client (100) sends the backup server (200) a file backup request message containing file meta-information such as the file name, user, and size;

the backup server (200) identifies the application type of each backup file from its meta-information and queries the application index structure to obtain a list of node numbers of candidate deduplication storage servers (300) that can store files of that type;

the backup server (200) queries the application-aware index structure to obtain the real-time load of each deduplication storage server (300) node and, from this load information and the backup file meta-information, computes a list of low-load deduplication storage server (300) nodes that keeps the load balanced;

the backup server (200) analyzes the candidate node list and the low-load node list, selects one low-load candidate node that stores data of the same application type as the file's routing target node, and returns the result to the backup client (100);

the backup client (100) sends each file in the backup session to its target deduplication storage server (300) node according to the routing decisions returned by the backup server (200);

each deduplication storage server (300) node independently deduplicates the different types of application files according to their differences in data format and content.

Compared with the prior art, the invention has the following notable advantages:

1. High deduplication ratio: the application-aware routing policy assigns similar data to the same deduplication storage server node, reducing data overlap between nodes, and each node deduplicates its files independently per application type.

2. High node throughput: data is assigned at file granularity, which preserves good data-access locality.

3. Balanced system load: storage resources are allocated dynamically according to the actual physical storage capacity of each deduplication storage server node, keeping the load balanced across the entire backup storage cluster.

4. Low communication overhead: routing decisions are made at application granularity, which greatly reduces message traffic in the system.

In summary, the invention provides an application-aware data routing method that supports cluster deduplication on backup storage clusters of hundreds or thousands of nodes. It greatly reduces the storage space used by backup data, improves the deduplication throughput of the deduplication storage server nodes, reduces communication overhead within the cluster, and keeps the load balanced across the deduplication storage server nodes.

The invention is described in further detail below with reference to the drawings and specific embodiments.

Description of Drawings

FIG. 1 is a schematic diagram of the structure of the large-scale backup storage cluster system of the invention.

FIG. 2 is the main flowchart of the application-aware data routing method for large-scale cluster deduplication of the invention.

FIG. 3 is a schematic diagram of identifying the file application type.

FIG. 4 is a flowchart of the file routing node selection step in FIG. 2.

Detailed Description

As shown in FIG. 1, the large-scale backup storage cluster system of the invention comprises multiple backup clients 100, one backup server 200, and multiple deduplication storage servers 300.

The backup client 100 sends the backup server 200 a file backup request message containing file meta-information such as the file name, user, and size. According to the routing decisions returned by the backup server 200, the backup client 100 then sends each file in the backup session to its target deduplication storage server 300 node.

Each backup client 100 comprises a file I/O module 101 and a backup request module 102. The backup request module 102 conducts the file backup session with the backup server 200, and the file I/O module 101 backs up each file to the appropriate deduplication storage server 300 according to the routing decisions returned by the backup server 200.

The backup server 200, the core of the method of the invention, identifies the application type of each backup file from its meta-information and queries the application index structure to obtain a list of node numbers of candidate deduplication storage servers 300 that can store files of that type.

The backup server 200 queries the application-aware index structure to obtain the real-time load of each deduplication storage server 300 node and, from this load information and the backup file meta-information, computes a list of low-load deduplication storage server 300 nodes that keeps the load balanced.

The backup server 200 analyzes the candidate node list and the low-load node list, selects one low-load candidate node that stores data of the same application type as the file's routing target node, and returns the result to the backup client 100.

The backup server 200 comprises a backup session management module 201, an application awareness module 202, a file routing decision module 203, and a load balancing module 204. The backup session management module 201 receives backup requests from the backup clients 100, groups files by backup session and user, and returns the routing decisions to the backup clients 100. The application awareness module 202 classifies files by application type. The load balancing module 204 keeps the load of the deduplication storage server cluster balanced. The file routing decision module 203 assigns application files of the same type to the same low-load deduplication storage server node, reports the target node to the backup client 100, and records the mapping from each application file to its deduplication storage server node for use when the file is restored.
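A minimal Python sketch of the bookkeeping described above for the backup session management module 201 and the file-to-node mapping kept by the file routing decision module 203. The class and method names are hypothetical, and keying the restore map by (user, file name) is an assumption; a real system would also need to distinguish versions of the same file across sessions.

```python
from collections import defaultdict
from typing import Optional


class BackupSessionManager:
    """Hypothetical bookkeeping for modules 201 and 203 on the backup server 200."""

    def __init__(self) -> None:
        # (user, session id) -> file names requested in that backup session (module 201)
        self.sessions: dict[tuple[str, int], list[str]] = defaultdict(list)
        # (user, file name) -> node number chosen by the routing decision (module 203)
        self.placement: dict[tuple[str, str], int] = {}

    def add_request(self, user: str, session_id: int, file_name: str) -> None:
        """Group an incoming file backup request under its user's session."""
        self.sessions[(user, session_id)].append(file_name)

    def record_route(self, user: str, file_name: str, node: int) -> None:
        """Remember where a file was routed, for later restore."""
        self.placement[(user, file_name)] = node

    def node_for_restore(self, user: str, file_name: str) -> Optional[int]:
        """Return the node holding the file, or None if it was never routed."""
        return self.placement.get((user, file_name))


mgr = BackupSessionManager()
mgr.add_request("alice", 1, "Test.doc")
mgr.record_route("alice", "Test.doc", 2)
print(mgr.node_for_restore("alice", "Test.doc"))  # 2
```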

Each deduplication storage server 300 node independently deduplicates the different types of application files according to their differences in data format and content.

The deduplication storage server 300 comprises a data deduplication engine 301, a file metadata management module 302, and a data block management module 303. The data deduplication engine 301 deduplicates backup files, handling each application type independently according to that application's characteristics. The file metadata management module 302 manages the metadata and chunk fingerprint index of the files stored on the node, and the data block management module 303 manages the unique chunks that remain after deduplication.

As shown in FIG. 2, the application-aware data routing method for large-scale cluster deduplication of the invention is based on a scalable cluster deduplication system architecture. The large-scale backup storage cluster system, shown in FIG. 1, comprises multiple backup clients 100, one backup server 200, and multiple deduplication storage servers 300.

The application-aware data routing method for large-scale cluster deduplication of the invention comprises the following steps:

S10) Obtain backup file meta-information: the backup client 100 sends the backup server 200 a file backup request message containing file meta-information such as the file name, user, and size.

S20) Identify the file application type: the application awareness module 202 of the backup server 200 classifies each backup file by application type according to the meta-information received by the backup session management module 201, and queries the application index structure to obtain a list of candidate deduplication storage server 300 nodes that can store files of that type.

As shown in FIG. 3, the file application type identification step (S20) comprises:

S21) Obtain file meta-information: the backup server 200 extracts the file meta-information from the backup request, including the file name 230, user 231, and size 232. The file name 230 consists of a prefix and a suffix, and the suffix defines the application type. For example, Test.doc has the prefix Test and the suffix doc, so its application type is a Word document in doc format.

S22) Query the application index structure: the application type determined from the file name is used to query the application index structure, whose entries contain an application type 233, a node number 234, and a data volume 235.

Here, the application type 233 is the file-name suffix of the backup file, the node number 234 is the number of the deduplication storage server node that stores files of that type, and the data volume 235 is the physical data volume of files of that type stored on that node. In the example application index structure, the first and third rows match the doc type.

S23) Obtain candidate node numbers: the numbers of the deduplication storage server nodes that store files of the same application type are extracted from the application index structure and saved in the candidate deduplication storage server node list 236 (LIST1). As shown in FIG. 3, nodes 1 and 2 both store doc files.
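Steps S21 to S23 can be sketched in Python as follows. The request fields mirror the file name, user, and size named in S10, and the `IndexEntry` fields mirror items 233, 234, and 235. The non-doc index row and all data volumes are hypothetical values chosen for illustration, and lower-casing the suffix is an assumption of this sketch.

```python
import os
from typing import NamedTuple


class BackupRequest(NamedTuple):
    """S10 request message: file meta-information sent by the backup client."""
    name: str
    user: str
    size: int


class IndexEntry(NamedTuple):
    """One application index row: application type 233, node number 234, data volume 235."""
    app_type: str
    node: int
    volume: int


def application_type(file_name: str) -> str:
    """S21: the file-name suffix defines the application type ('Test.doc' -> 'doc')."""
    _prefix, suffix = os.path.splitext(file_name)
    return suffix[1:].lower()  # drop the leading '.'; empty string if there is no suffix


def candidate_nodes(index: list[IndexEntry], app_type: str) -> list[int]:
    """S22-S23: LIST1 holds the node numbers of index rows whose type matches."""
    list1: list[int] = []
    for entry in index:
        if entry.app_type == app_type and entry.node not in list1:
            list1.append(entry.node)
    return list1


# Hypothetical index: rows 1 and 3 match 'doc', as in the example above.
index = [IndexEntry("doc", 1, 60), IndexEntry("pdf", 1, 30), IndexEntry("doc", 2, 20)]
request = BackupRequest(name="Test.doc", user="alice", size=15)
print(candidate_nodes(index, application_type(request.name)))  # [1, 2]
```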

S30) Calculate deduplication storage node load: the load balancing module 204 of the backup server 200 queries the application-aware index structure to obtain the real-time load of each deduplication storage server 300 node and, from this load information and the backup file meta-information, computes the list LIST2 of low-load deduplication storage server 300 nodes that keeps the load balanced.

The deduplication storage node load calculation step (S30) comprises:

S31) Calculate the physical capacity used on each deduplication storage server node: the physical capacity $C_i$ used on node $i$ is

$$C_i = \sum_{j=1}^{K} C_{ij}, \qquad i = 1, 2, \ldots, N$$

where $N$ is the number of server nodes in the deduplication storage server cluster, $K$ is the number of application file types stored on node $i$, and $C_{ij}$ is the physical capacity occupied on node $i$ by application type $j$, obtained by querying the application index structure.

S32) Find low-load deduplication storage server nodes: if $C_i + S < T_i$, node $i$ is judged to be a low-load node and its node number $i$ is added to LIST2,

where $T_i$ is the load threshold of deduplication storage server node $i$, $S$ is the size of the backup file, and LIST2 is the low-load deduplication storage server node list.
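Steps S31 and S32 reduce to a sum over the application index followed by a threshold test. This Python sketch assumes index rows of the form (application type, node number, physical volume) and a per-node threshold map T_i; the volumes and thresholds in the usage lines are hypothetical.

```python
from collections import defaultdict

# Application index rows: (application type j, node i, C_ij in bytes)
Index = list[tuple[str, int, int]]


def used_capacity(index: Index) -> dict[int, int]:
    """S31: C_i = sum over j of C_ij, for every node that appears in the index."""
    c: dict[int, int] = defaultdict(int)
    for _app_type, node, volume in index:
        c[node] += volume
    return dict(c)


def low_load_nodes(index: Index, thresholds: dict[int, int], file_size: int) -> list[int]:
    """S32: LIST2 = nodes i with C_i + S < T_i.

    `thresholds` must list every node 1..N, because a node that stores nothing yet
    has no index rows and therefore C_i = 0.
    """
    c = used_capacity(index)
    return [i for i in sorted(thresholds) if c.get(i, 0) + file_size < thresholds[i]]


index = [("doc", 1, 60), ("pdf", 1, 30), ("doc", 2, 20)]
thresholds = {1: 100, 2: 100, 3: 100}
print(low_load_nodes(index, thresholds, 15))  # [2, 3]
```

Node 1 is excluded because 90 + 15 is not below its threshold of 100, while node 3, which holds no data yet, qualifies with C_3 = 0.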

S40) Select the file routing node: the file routing decision module 203 of the backup server 200 analyzes the candidate node list LIST1 and the low-load node list LIST2, selects one low-load candidate node that stores data of the same application type as the file's routing target node, and returns the result to the backup client 100.

As shown in FIG. 4, the file routing node selection step (S40) comprises:

S41) Input the list LIST1 of candidate deduplication storage server nodes that store files of the same application type, and the low-load node list LIST2;

S42) Check whether the intersection LIST1 ∩ LIST2 is empty; if so, go to S43, otherwise go to S46;

S43) Check whether the low-load node list LIST2 is empty; if so, go to S44, otherwise go to S45;

S44) Issue a warning that the deduplication storage server cluster is overloaded, and end processing;

S45) Select one node from the low-load node list LIST2 as the target node;

S46) Select one node from the low-load candidate subset LIST1 ∩ LIST2 and return it as the target node.
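The decision procedure of FIG. 4 (S41 to S46) maps onto a small function. The patent does not say how one node is chosen from the surviving set in S45 and S46, so picking the node with the smallest used capacity C_i (ties broken by node number) is an assumption of this Python sketch, as are the names `select_route_node` and `ClusterOverloaded`.

```python
class ClusterOverloaded(Exception):
    """S44: raised when no deduplication storage node can take the file."""


def select_route_node(list1: list[int], list2: list[int], used: dict[int, int]) -> int:
    """S41-S46: choose the target node from LIST1 (same-type nodes) and LIST2 (low-load nodes).

    `used` maps node number -> used capacity C_i; the least-loaded choice is an assumption.
    """
    in_list1 = set(list1)
    both = [n for n in list2 if n in in_list1]  # S42: LIST1 ∩ LIST2
    if both:
        pool = both                              # S46: low-load node already holding this type
    elif list2:
        pool = list2                             # S43 -> S45: any low-load node
    else:
        raise ClusterOverloaded("deduplication storage cluster load too high")  # S44
    return min(pool, key=lambda n: (used.get(n, 0), n))


print(select_route_node([1, 2], [2, 3], {1: 90, 2: 20}))  # 2
print(select_route_node([1], [2, 3], {1: 90, 2: 20}))     # 3
```

Preferring the intersection keeps files of one application type together, which is what lets S60 find duplicates; falling back to any low-load node trades some deduplication for load balance.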

S50) Send files to target nodes: according to the routing decisions returned by the backup server 200, the backup client 100 sends each file in the backup session to its target deduplication storage server 300 node.
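On the client side, step S50 only requires sending each file to the node named in its routing decision. One possible arrangement, assumed here rather than taken from the patent, is to batch a session's files by target node so that the file I/O module 101 opens one transfer per node; the function name is hypothetical.

```python
from collections import defaultdict


def group_by_target(decisions: dict[str, int]) -> dict[int, list[str]]:
    """Group one backup session's files by the target node chosen in S40.

    `decisions` maps file name -> node number as returned by the backup server.
    """
    batches: dict[int, list[str]] = defaultdict(list)
    for file_name, node in decisions.items():
        batches[node].append(file_name)
    return dict(batches)


print(group_by_target({"a.doc": 2, "b.pdf": 3, "c.doc": 2}))
# {2: ['a.doc', 'c.doc'], 3: ['b.pdf']}
```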

S60) Deduplicate files within each node: each deduplication storage server 300 node independently deduplicates the different types of application files according to their differences in data format and content.

The data deduplication engine 301 of the deduplication storage server node 300 independently optimizes deduplication for each type of application file according to its data format and content, and reports the physical capacity added by storing the deduplicated file back to the backup server 200 as a message that updates its application index structure. The file metadata management module 302 and the data block management module 303 respectively manage the metadata of the files stored on the node (including the chunk fingerprint index) and the unique chunks that remain after deduplication.
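A minimal Python sketch of the node-side behaviour in S60: chunk fingerprints are indexed separately per application type, only unseen chunks are stored, and the number of newly stored bytes is returned as the capacity increase to report back to the backup server's application index (as the C_ij of that node and type). Fixed-size chunking and SHA-1 fingerprints are assumptions applied to every type here; the patent only says the engine adapts to each application's data format and content, which in practice might mean, for example, content-defined chunking for some types. All names are hypothetical.

```python
import hashlib
from collections import defaultdict


class DedupNode:
    """Hypothetical per-node deduplication engine (301) with per-application fingerprint indexes."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        # application type -> fingerprints of chunks already stored for that type (module 302)
        self.fp_index: dict[str, set[str]] = defaultdict(set)
        # (application type, fingerprint) -> unique chunk content (module 303)
        self.chunks: dict[tuple[str, str], bytes] = {}
        # file name -> (application type, ordered chunk fingerprints), used for restore
        self.recipes: dict[str, tuple[str, list[str]]] = {}

    def store(self, file_name: str, app_type: str, data: bytes) -> int:
        """Deduplicate `data` against chunks of the same application type only.

        Returns the number of bytes newly stored, i.e. the physical capacity increase
        to report back to the backup server.
        """
        added = 0
        recipe: list[str] = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in self.fp_index[app_type]:
                self.fp_index[app_type].add(fp)
                self.chunks[(app_type, fp)] = chunk
                added += len(chunk)
            recipe.append(fp)
        self.recipes[file_name] = (app_type, recipe)
        return added

    def restore(self, file_name: str) -> bytes:
        """Reassemble a file from its recipe of chunk fingerprints."""
        app_type, recipe = self.recipes[file_name]
        return b"".join(self.chunks[(app_type, fp)] for fp in recipe)


node = DedupNode(chunk_size=4)
print(node.store("a.doc", "doc", b"AAAABBBBCCCC"))  # 12
print(node.store("b.doc", "doc", b"AAAABBBBDDDD"))  # 4
print(node.store("c.pdf", "pdf", b"AAAA"))          # 4
```

Storing `c.pdf` adds 4 bytes even though an identical chunk already exists under `doc`, because each application type is deduplicated independently; that is the trade-off of keeping per-type indexes.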

The invention exploits application awareness to optimize cluster deduplication, providing a data routing technique that both saves backup storage space and improves the scalability of the cluster system. It can be applied in network backup software, distributed file systems, and cloud storage software to achieve efficient parallel data deduplication.

Of course, the present invention may have various other embodiments. Those skilled in the art may make corresponding changes and modifications based on the present invention without departing from its spirit and essence, and all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (4)

1. An application-aware data routing method for large-scale cluster deduplication, implemented in a large-scale backup storage cluster system comprising multiple backup clients (100), one backup server (200) and multiple deduplication storage servers (300), the method comprising the following steps:
S10) Obtaining backup file meta-information: the backup client (100) sends the backup server (200) a file backup request message containing file meta-information such as the file name, user and size;
S20) Perceiving the file application type: the backup server (200) classifies the application type of each backup file according to its meta-information, and queries the application index structure to obtain a list of candidate deduplication storage server (300) nodes that can store application files of that type;
S30) Calculating deduplication storage node load: the backup server (200) queries the application-aware index structure to obtain the real-time dynamic load of each deduplication storage server (300) node. From this load information and the backup file meta-information, it computes a list of low-load deduplication storage server (300) nodes that keeps the cluster load balanced;
S40) Selecting the file routing node: the backup server (200) analyzes the candidate node list and the low-load node list, selects one low-load candidate node that stores application data of the same type as the file routing target node, and returns the result to the backup client (100);
S50) Sending files to the target node: according to the file routing decision returned by the backup server (200), the backup client (100) sends each file in the backup session to the corresponding routing-target deduplication storage server (300) node;
S60) Deduplicating files within the node: each deduplication storage server (300) node deduplicates each type of application file independently, according to the differences in data format and content between application types;
characterized in that the step of perceiving the file application type (S20) comprises:
S21) Obtaining file meta-information: the backup server (200) obtains the file meta-information in the backup request, including the file name, user and size. The file name consists of a prefix and a suffix, and the suffix defines the application type;
S22) Querying the application index structure: the application index structure is queried using the application type determined from the file name. The application index contains index entries such as application type, node number and data volume;
S23) Obtaining candidate deduplication storage server node numbers: the numbers of the deduplication storage server nodes that store files of the same application type are looked up in the application index structure, and the result is saved to the candidate deduplication storage server node list.
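A minimal Python sketch of steps S21 to S23. It assumes the application index structure is held as a dictionary keyed by (application type, node number), with the stored data volume as the value. Lower-casing suffixes and assigning the type `"unknown"` to names without a suffix are illustrative assumptions, not part of the claim.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical layout of the application index:
# (application type, node number) -> data volume stored for that type.
AppIndex = Dict[Tuple[str, int], int]


def app_type_of(file_name: str) -> str:
    """S21: the file-name suffix defines the application type."""
    suffix = os.path.splitext(file_name)[1]  # ".DOC" for "report.DOC"
    return suffix.lstrip(".").lower() or "unknown"


def candidate_nodes(file_name: str, index: AppIndex) -> List[int]:
    """S22-S23: numbers of nodes already storing files of the same type."""
    app = app_type_of(file_name)
    return sorted(node for (t, node) in index if t == app)
```

`os.path.splitext` keeps only the last suffix, so a name such as `archive.tar.gz` is classified as `gz`.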
2. The application-aware data routing method according to claim 1, characterized in that the step of calculating deduplication storage node load (S30) comprises:
S31) Calculating the physical capacity used by each deduplication storage server node: the physical capacity $C_i$ of deduplication storage server node $i$ can be expressed as
$C_i = \sum_{j=1}^{K} C_{ij}$, where $i = 1, 2, \ldots, N$;
here $N$ is the number of server nodes in the deduplication storage server cluster, $K$ is the number of application file types stored on node $i$, and $C_{ij}$ is the physical capacity occupied by application type $j$ on deduplication storage server node $i$, obtained by querying the application index structure;
S32) Finding low-load deduplication storage server nodes: when $C_i + S < T_i$, node $i$ is judged to be a low-load node, and its node number $i$ is added to the low-load deduplication storage server node list;
where $T_i$ is the load threshold of deduplication storage server node $i$, and $S$ is the size of the backup file.
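Steps S31 and S32 reduce to summing $C_{ij}$ over application types and comparing the result against the threshold. The Python sketch below assumes the same hypothetical (application type, node number) → capacity dictionary as the index. It also assumes that a node with no index entries has $C_i = 0$.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def node_capacities(index: Dict[Tuple[str, int], int]) -> Dict[int, int]:
    """S31: C_i = sum over j of C_ij for the application types on node i."""
    used: Dict[int, int] = defaultdict(int)
    for (_app_type, node), c_ij in index.items():
        used[node] += c_ij
    return dict(used)


def low_load_nodes(
    index: Dict[Tuple[str, int], int],
    thresholds: Dict[int, int],
    file_size: int,
) -> List[int]:
    """S32: node i is low-load when C_i + S < T_i (strict inequality)."""
    used = node_capacities(index)
    return sorted(
        i for i, t_i in thresholds.items() if used.get(i, 0) + file_size < t_i
    )
```

Worked example: suppose node 1 holds 600 bytes of `doc` data and 300 bytes of `jpg` data, so $C_1 = 900$, and $T_1 = 1000$. A 200-byte file gives $900 + 200 = 1100 \geq 1000$, so node 1 is not low-load. A 99-byte file gives $999 < 1000$, so node 1 qualifies.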
3. The application-aware data routing method according to claim 1, characterized in that the step of selecting the file routing node (S40) comprises:
S41) Inputting the candidate deduplication storage server node list LIST1, whose nodes store files of the same application type, and the low-load deduplication storage server node list LIST2;
S42) Determining whether the intersection LIST1 ∩ LIST2 of the two node lists is empty; if so, go to step S43, otherwise go to step S46;
S43) Determining whether the low-load deduplication storage server node list LIST2 is empty; if so, go to step S44, otherwise go to step S45;
S44) Issuing a warning that the load of the deduplication storage server cluster is too high, and ending the process;
S45) Selecting one node from the low-load deduplication storage server node list LIST2;
S46) Selecting one node from the low-load candidate subset LIST1 ∩ LIST2 and returning it as the target node.
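The branch structure of S41 to S46 can be sketched in Python as follows. The claim only says to "select one" node. Taking the first node in LIST2's order is an assumption here; if LIST2 is sorted by ascending load, this picks the least-loaded node. The S44 warning is modelled with Python's standard `warnings` module, and the function returns `None` in that case.

```python
import warnings
from typing import List, Optional


def select_target(list1: List[int], list2: List[int]) -> Optional[int]:
    """Sketch of S41-S46.

    list1: candidate nodes already storing the same application type (LIST1).
    list2: low-load nodes (LIST2). Choosing the first element is an assumption.
    """
    candidates = set(list1)
    both = [n for n in list2 if n in candidates]  # LIST1 ∩ LIST2, in LIST2 order
    if both:  # S42: intersection not empty -> S46
        return both[0]
    if list2:  # S43: LIST2 not empty -> S45
        return list2[0]
    # S44: no low-load node at all; warn and end processing.
    warnings.warn("deduplication storage cluster load too high", RuntimeWarning)
    return None
```

Preferring the intersection keeps files of one application type together, which is what gives deduplication its opportunities. Falling back to any low-load node trades some deduplication for load balance.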
4. A large-scale backup storage cluster system for implementing the application-aware data routing method according to claim 1, comprising multiple backup clients (100), one backup server (200) and multiple deduplication storage servers (300), characterized in that:
the backup client (100) is configured to send the backup server (200) a file backup request message containing file meta-information such as the file name, user and size;
the backup server (200) is configured to perceive the application type of each backup file from its meta-information, and to query the application index structure to obtain a list of node numbers of candidate deduplication storage servers (300) that can store application files of that type;
the backup server (200) is configured to query the application-aware index structure to obtain the real-time dynamic load of each deduplication storage server (300) node, and to compute, from this load information and the backup file meta-information, a list of low-load deduplication storage server (300) nodes that keeps the load balanced;
the backup server (200) is configured to analyze the candidate node list and the low-load node list, select one low-load candidate node that stores application data of the same type as the file routing target node, and return the result to the backup client (100);
the backup client (100) sends each file in the backup session to the corresponding routing-target deduplication storage server (300) node according to the file routing decision returned by the backup server (200);
each deduplication storage server (300) node is configured to deduplicate each type of application file independently, according to the differences in data format and content between application types;
each backup client (100) comprises a file I/O module (101) and a backup request module (102). The backup request module (102) conducts file backup sessions with the backup server (200), and the file I/O module (101) backs up each file to the corresponding deduplication storage server (300) according to the file routing decision returned by the backup server (200);
the backup server (200) comprises a backup session management module (201), an application awareness process module (202), a file routing decision module (203) and a load balancing module (204). The backup session management module (201) receives backup requests from the backup clients (100), groups files by backup session (the same backup session of the same user), and feeds the file routing decision back to the backup client (100). The application awareness process module (202) classifies files by application type. The load balancing module (204) keeps the system load of the deduplication storage server cluster balanced. The file routing decision module (203) assigns application files of the same type to the same low-load deduplication storage server node, feeds the routing target node information back to the backup client (100), and records the mapping from application files to deduplication storage server nodes for use when files are restored;
the deduplication storage server (300) comprises a data deduplication engine (301), a file metadata management module (302) and a data block management module (303). The data deduplication engine (301) deduplicates backup files, performing deduplication independently for the files of each application type according to the characteristics of that application. The file metadata management module (302) manages the metadata and block fingerprint index information of the files stored on the node, and the data block management module (303) manages the unique data blocks that remain after deduplication.
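The following Python sketch gives a rough illustration of how the backup session management module (201) and the file routing decision module (203) might cooperate. It groups requests by (user, session), routes all files of one application type within a session to a single node, and keeps the file-to-node mapping for restores. The class name, the `choose_node` callback (standing in for steps S20 to S40), and keying the restore map by (user, file name) are all hypothetical.

```python
import os
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class BackupServer:
    """Toy model of modules 201 and 203; all names are hypothetical."""

    def __init__(self, choose_node: Callable[[str], int]) -> None:
        self.choose_node = choose_node  # stands in for steps S20-S40
        self.sessions: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        self.placement: Dict[Tuple[str, str], int] = {}  # for restores

    def request(self, user: str, session: str, file_name: str) -> None:
        """Module 201: group incoming files by (user, backup session)."""
        self.sessions[(user, session)].append(file_name)

    def route_session(self, user: str, session: str) -> Dict[str, int]:
        """Module 203: one target node per application type within the
        session; each decision is also recorded for later restores."""
        per_type: Dict[str, int] = {}
        decision: Dict[str, int] = {}
        for name in self.sessions[(user, session)]:
            app = os.path.splitext(name)[1].lstrip(".").lower()
            if app not in per_type:
                per_type[app] = self.choose_node(app)
            decision[name] = per_type[app]
            self.placement[(user, name)] = per_type[app]
        return decision

    def restore_node(self, user: str, file_name: str) -> int:
        """Look up the node a file was routed to."""
        return self.placement[(user, file_name)]
```

Keying the restore map by (user, file name) means a later backup of the same file name overwrites the earlier entry. A real system would likely also key the map by session or version.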
CN201410158590.0A 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system Expired - Fee Related CN103902735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410158590.0A CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410158590.0A CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Publications (2)

Publication Number Publication Date
CN103902735A CN103902735A (en) 2014-07-02
CN103902735B true CN103902735B (en) 2017-02-22

Family

ID=50994057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410158590.0A Expired - Fee Related CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Country Status (1)

Country Link
CN (1) CN103902735B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216803A (en) * 2014-09-29 2014-12-17 北京奇艺世纪科技有限公司 Data backup method and device for out-of-service nodes
CN106202134A (en) * 2015-05-30 2016-12-07 中国石油化工股份有限公司 Data redundancy inspection method
CN105159925B (en) * 2015-08-04 2019-08-30 北京京东尚科信息技术有限公司 A kind of data-base cluster data distributing method and system
US20170214627A1 (en) * 2016-01-21 2017-07-27 Futurewei Technologies, Inc. Distributed Load Balancing for Network Service Function Chaining
CN107666495B (en) * 2016-07-27 2020-11-10 平安科技(深圳)有限公司 Disaster recovery method and terminal for application
CN109214206A (en) * 2018-08-01 2019-01-15 武汉普利商用机器有限公司 cloud backup storage system and method
CN110213319B (en) * 2018-10-08 2022-03-08 腾讯科技(深圳)有限公司 Access method and device, terminal, server and storage medium
CN112685223B (en) * 2019-10-17 2024-12-20 伊姆西Ip控股有限责任公司 File backup based on file type
CN111400105A (en) * 2020-03-27 2020-07-10 北京拓世寰宇网络技术有限公司 Database backup method and device
CN111858494B (en) * 2020-07-23 2024-05-17 珠海豹趣科技有限公司 File acquisition method and device, storage medium and electronic equipment
CN113590535B (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 An efficient data migration method and device for deduplication storage system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6065018A (en) * 1998-03-04 2000-05-16 International Business Machines Corporation Synchronizing recovery log having time stamp to a remote site for disaster recovery of a primary database having related hierarchial and relational databases
CN101751394A (en) * 2008-12-16 2010-06-23 青岛海信传媒网络技术有限公司 Method and system for synchronizing data

Also Published As

Publication number Publication date
CN103902735A (en) 2014-07-02

Similar Documents

Publication Publication Date Title
CN103902735B (en) Application perception data routing method oriented to large-scale cluster deduplication and system
CN105069111B (en) Block level data duplicate removal method based on similitude in cloud storage
CN105487818B (en) For the efficient De-weight method of repeated and redundant data in cloud storage system
US9639543B2 (en) Adaptive index for data deduplication
CN104969213B (en) Data flow for low latency data access is split
US10291696B2 (en) Peer-to-peer architecture for processing big data
CN103106249B (en) A kind of parallel data processing system based on Cassandra
CN103067525B (en) A kind of cloud storing data backup method of feature based code
CN105005611B (en) A kind of file management system and file management method
CN102708165A (en) Method and device for processing files in distributed file system
CN104820717A (en) Massive small file storage and management method and system
CN106066896A (en) A kind of big Data duplication applying perception deletes storage system and method
CN107085539A (en) A cloud database system and a method for dynamically adjusting cloud database resources
US20140244794A1 (en) Information System, Method and Program for Managing the Same, Method and Program for Processing Data, and Data Structure
Singh et al. Scalable metadata management techniques for ultra-large distributed storage systems--A systematic review
CN103455531A (en) Parallel indexing method supporting real-time biased query of high dimensional data
CN105354250A (en) Data storage method and device for cloud storage
Von der Weth et al. Multiterm keyword search in NoSQL systems
CN104376014B (en) Resource issue and querying method in a kind of structured P 2 P network
KR20160072305A (en) Partitioning System and Method for Distributed Storage of Large Scale Semantic Web Data in Dynamic Environments
CN103246716B (en) Based on object copies efficient management and the system of object cluster file system
CN104978327B (en) A method for querying data, a management control node and a target data node
Akdogan et al. Cost-efficient partitioning of spatial data on cloud
Furfaro et al. Managing multidimensional historical aggregate data in unstructured P2P networks
CN103207897B (en) A kind of distributed storage inquiry system and operation method thereof and running gear

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170222