
CN101814045B - Data organization method for backup services - Google Patents


Info

Publication number
CN101814045B
Authority
CN
China
Prior art keywords
data
backup
space
index
storage server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010101523978A
Other languages
Chinese (zh)
Other versions
CN101814045A (en)
Inventor
周可
王桦
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN2010101523978A
Publication of CN101814045A
Application granted
Publication of CN101814045B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data organization method for the storage server side of backup service software, used to improve the efficiency of data organization and data management on the storage server. The method comprises: ① initializing the storage space of the storage server into a metadata area (comprising a main record, an index header and a data index) and a data area; ② accepting a user command and determining its type: a backup operation proceeds to the next step, a restore operation goes to step ④, and a delete operation goes to step ⑤; ③ handling the backup operation by backing up user data to the data area of the storage server while using deduplication to avoid backing up duplicate data, then returning to step ②; ④ handling the restore operation by locating the data in the user-specified restore list within the data area of the storage server and transmitting it to the client, then returning to step ②; ⑤ handling the delete operation by finding the data the user has specified for deletion and processing it according to the reference counts of the corresponding backup data blocks in the data area of the storage server, then returning to step ②. The method improves the utilization, manageability and scalability of the storage server side, saves network bandwidth and improves backup efficiency.

Description

A data organization method for backup services
Technical field
The invention belongs to the field of computer data storage and backup methods, and specifically relates to a data organization method for backup services. The method implements block-level deduplication.
Background
With the rapid development of the information society, information systems increasingly pervade everything from daily life to business operations, and dependence on them keeps growing. Especially in industries such as finance, communications, transportation and insurance, the loss or corruption of critical data can cause immeasurable losses to individuals and enterprises.
The backup service discussed here is essentially a backup and recovery software system that provides a degree of disaster tolerance. It can provide complete data backup, recovery and related management tasks for individual and enterprise users, who can customize backup policies according to their actual needs. The backup service is also a software delivery model: the provider builds all the network infrastructure and the software and hardware platforms that an enterprise needs for its information systems, and is responsible for the whole series of services from initial deployment to later maintenance. The enterprise does not need to buy software or hardware, build a machine room or recruit IT staff; it can use the information system over the Internet. Just as water flows when a tap is turned on, the enterprise leases software services according to its actual needs.
Data backup and recovery are important measures for ensuring information security. The growing importance of data requires in particular that data on storage systems be protected effectively and comprehensively. With the emergence and development of high-speed networks, communication technology and innovative mass-storage technology, the underlying storage resources have changed drastically compared with the past. The spread of more and more information systems has also made the volume of data worth preserving grow geometrically. All of this places higher demands on the research and development of data backup and recovery software and related technology.
When using a backup service, users' requirements on storage space and data generally include: the amount of storage space used can be increased or reduced on demand; all available space can be used without obstruction, that is, as long as there is remaining space and the network is reachable, a backup task executes correctly; and backed-up data can be recovered at any time. To meet these requirements, user space and data must have a degree of logical independence, so the way user space is managed and the way backup data is organized need to be studied. In addition, efficient space allocation and reclamation mechanisms need to be designed that fully exploit the storage-space coupling of duplicate data while preserving the logical independence of user data and enabling efficient data lookup and access.
In a typical backup software architecture, a storage server is a physical medium authenticated by the management console of the data backup software. It can be a hard-disk space on a server, a storage device attached to a server, or a disk mapping on the network. Multiple storage servers can be configured through the management console; under the unified management of the backup server, backup clients back data up to the corresponding storage server.
Earlier designs of the storage server side of backup software mostly used file-level backup. With file-level backup, the backup software is aware only of the file layer and backs up all files on the source disk to another target medium. File-level backup software therefore either relies on the file system interface provided by the operating system to back up files, or has file system functionality of its own and can recognize file system metadata. In short, file-level backup software reads data file by file and then stores the files it has read on another medium. This is clearly a performance bottleneck for PB-scale storage systems: because the unit of data managed on the storage server side is the file, large amounts of duplicate data are inevitably backed up, and the storage server side becomes very inconvenient to manage. Using block-level deduplication for data backup can solve these problems to a large extent.
Moreover, in earlier backup software the storage capacity initially allocated to a user on the storage server side was often fixed, which greatly reduced the scalability of the system. In practice the system cannot predict how much storage each user will ultimately use (although a maximum quota may apply depending on the user's permissions and type). Allocating too much is likely to waste storage space and reduce utilization, while allocating too little may severely restrict the user.
Avamar was recently acquired by EMC. Its patented deduplication and global single-instance storage technology ensures that a backup data segment is stored only once globally. This can effectively reduce the amount of data moved and recovered by a factor of 300, while also enabling daily full backups and fast recovery. For each 24 KB data segment, Avamar generates a unique 20-byte ID using the SHA-1 hash algorithm. This unique ID is the fingerprint of the data segment, so Avamar's software can use it to determine whether a data segment has been stored before. However, SHA-1 is computationally complex and consumes a great deal of CPU. Because the data segments are small, the fingerprint space consumed is also large when the volume of user backup data is large, and there are scalability problems as well.
Summary of the invention
The object of the invention is to provide a data organization method for backup services that implements block-level deduplication and improves the efficiency of data organization and management.
The invention provides a data organization method for backup services, comprising the following steps:
(1) Initialization:
Initialize the metadata, including assigning initial values to the index header information, data index information and data area metadata information of the metadata area;
Pre-allocate one data space in the data area in preparation for accepting users' backup requests;
(2) Receive a user command and determine its type:
If it is a backup operation, go to step (3); if it is a restore operation, go to step (4); if it is a delete operation, go to step (5);
(3) Perform backup processing as follows:
(3.1) First divide the file to be backed up into chunks, then hash the content of each data chunk with the MD5 algorithm to obtain a fingerprint that uniquely identifies the chunk; data chunks are stored, indexed by fingerprint, in the index header and data index of the storage server's metadata;
(3.2) The backup client sends the fingerprint of the data chunk to the storage server, and the storage server uses the fingerprint to query whether the chunk already exists;
(3.3) If the fingerprint does not exist, the backup client sends the data chunk to the storage server; the chunk is then a new backup data block, for which the storage server dynamically allocates storage space and completes the write operation. If the fingerprint exists, the storage server only needs to update the index information for that chunk by incrementing its reference count by one;
(3.4) Return to step (2);
(4) On restore, the backup server looks up the hash list of the files to be restored; the storage-space metadata is accessed according to the hash list to locate the logical positions in the corresponding data space; the data of the files to be restored is then read in turn from the storage server into a memory buffer, passed to the backup client over a socket and assembled into the required file set; then return to step (2);
(5) Delete backed-up files as follows:
(5.1) The backup server in the backup system software looks up the hash list of the files to be deleted;
(5.2) The index header and data index mapping table in the metadata area of the storage server are searched by hash value; if the hash value passed in as the entry parameter does not exist, return immediately with the return value false;
(5.3) Otherwise decrement by 1 the reference count in the object metadata corresponding to the hash value, and return true;
(5.4) Return to step (2).
Enterprise data today is not only varied and fast-growing but also highly redundant: many identical files or data items are stored within and across systems, and edited files are largely redundant as well, since the redundancy already exists in earlier versions of the file. Traditional backup software backs up this redundant data again and again, amplifying the redundancy. The better solution at present is deduplication. Deduplication not only achieves high compression ratios and frees storage space, but also lowers the cost of disk-based backup and of data management. The invention is a data organization and management method for the storage server side based on block-level deduplication. It efficiently carries out backup and restore data transfers with clients, and performs local storage-space management and data organization according to the backup server's policy. The invention achieves global deduplication without affecting users' normal backup and recovery, and as the number of users and the volume of backup data grow, the effect of deduplication becomes increasingly pronounced. It can significantly reduce the amount of data that users need to back up, saving backup time, network bandwidth and the storage space that backups require.
Description of drawings
Fig. 1 shows the storage data organization used by the method and the process of locating a data item;
Fig. 2 is a flow diagram of the method;
Fig. 3 is a flow chart of dynamic storage-space allocation in the invention;
Fig. 4 is a flow chart of the write operation within the backup operation of the invention.
Embodiment
The invention is described in further detail below by way of an embodiment. The embodiment is illustrative only, and the scope of protection of the invention is not limited by it.
The backup service system is based on a three-party architecture consisting of a backup server, storage servers and backup clients. The backup client is responsible for accepting customized data backup policies, restore requests and other data management requests. The backup server connects backup clients and storage servers and is the control center of the whole data backup software. It is responsible for user permission control, global job scheduling and global storage management. When a backup client initiates a backup or restore operation, the backup server directs it to connect to the designated storage server and begin execution. The backup server also monitors the computing, transmission and storage load of each storage server and carries out a load-balancing policy. Basic metadata supporting the operation of the backup server, such as user information and storage server status, is intended to be stored in a database. The storage server is responsible for carrying out backup and restore data transfers with clients, and performs local storage-space management and data organization according to the backup server's policy.
The four data structures used in this example are the main record area, the index header, the data index and the data area. Their structure is shown in Fig. 1.
Main record area: describes the whole storage space. It holds the index header information, the data index information and the data area metadata information.
Index header: an object hash table that maps an object ID (a 160-bit hash value generated from the data content) to the data index. An object here is the basic unit of data storage in the storage system. Unlike files and blocks, the basic components of traditional storage systems, an object combines application data with storage attributes (metadata), and contains the data plus enough additional information to allow the data to be autonomous and self-managing. The hash value represents the object ID in object-oriented storage: file content is the basis for storage, and the hash-value index establishes the mapping between content and object. Because hash values are globally unique in a statistical sense, there is a globally unique namespace, which improves the manageability of system sharing. The system uses the mature MD5 algorithm, which transforms data content of arbitrary length into a 128-bit (16-byte) integer, namely the object ID.
Data index: an array of size N (N is the number of index entries in the data index, ranging from 2^20 to 2^30). Each element of the array is the metadata structure of one object and contains: the object ID; the starting offset address of the object in the data area (I is the data space number, J is the logical data block number within that data space, and K is the offset address within that logical data block); the size of the object's data; the number of copies of the object's data content; and the position in the object table of the next object that maps to the same position in the object hash table, which links the objects that map to the same hash table position into a linked list.
Data area: stores object data, which comprises the object ID, the data content length and the data content. For ease of storage-space management, the data area is divided into several contiguous data spaces (each represented by a separate data file), and each data space consists of a number of logical data blocks.
Data chunk: in the backup service system, the data to be processed in a backup or restore operation is divided into fixed-length pieces; each piece is a data chunk.
Backup data block: when a user backs up specified files and folders with the backup client, the backup client first divides the data to be backed up into fixed-length pieces (the chunk size in the actual backup service software is 4M); each piece is a backup data block.
Logical data block: on the storage server side, to make the storage space easier to manage and use efficiently, each data space is divided into a number of sub-storage units; each sub-storage unit is a logical data block (1G in the actual backup service software).
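The index header and data index described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `IndexEntry` and `MetadataArea`, the use of slot 0 as the "empty" marker, and the tiny default sizes are all hypothetical choices made for readability.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    """One element of the data index: the metadata of a single object."""
    object_id: bytes     # MD5 fingerprint of the content (16 bytes)
    space_no: int        # I: data space number
    block_no: int        # J: logical data block number inside that data space
    offset: int          # K: offset inside the logical data block
    size: int            # size of the object's data
    ref_count: int = 1   # number of references to the stored content
    next_slot: int = 0   # next entry chained to the same header position; 0 = end


class MetadataArea:
    """Index header (hash table) plus data index (array), with chaining."""

    def __init__(self, m: int = 8, capacity: int = 1024):
        self.m = m
        # Assumption: 0 marks a free header position, so slot 0 of the data index is unused.
        self.index_header: List[int] = [0] * (1 << m)
        self.data_index: List[Optional[IndexEntry]] = [None] * (capacity + 1)
        self.next_free = 1

    def bucket(self, object_id: bytes) -> int:
        # The first m bits of the hash value select the header position.
        return int.from_bytes(object_id, "big") >> (len(object_id) * 8 - self.m)

    def lookup(self, object_id: bytes) -> Optional[IndexEntry]:
        slot = self.index_header[self.bucket(object_id)]
        while slot != 0:
            entry = self.data_index[slot]
            if entry.object_id == object_id:
                return entry
            slot = entry.next_slot
        return None

    def insert(self, entry: IndexEntry) -> int:
        if self.next_free >= len(self.data_index):
            raise MemoryError("data index is full")
        slot = self.next_free
        self.next_free += 1
        b = self.bucket(entry.object_id)
        entry.next_slot = self.index_header[b]  # push onto the front of the chain
        self.index_header[b] = slot
        self.data_index[slot] = entry
        return slot


area = MetadataArea()
fp = hashlib.md5(b"example chunk").digest()
area.insert(IndexEntry(fp, space_no=0, block_no=0, offset=0, size=13))
```

Chaining through `next_slot` mirrors the text's description of linking objects that map to the same hash table position; the production sizes named in the text (N between 2^20 and 2^30) are replaced by small defaults here.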
The implementation of this example is further described below with reference to the drawings.
As shown in Fig. 2, the method comprises the following steps:
(1) Initialization:
Stored data is normally divided into two parts: the metadata area and the data area. The data actually backed up by users is stored in the data area, and the information describing that user data is stored in the metadata area. Initializing the metadata area mainly means assigning initial values to its index header information, data index information and data area metadata information. The index header, that is, the object hash table, is set entirely to 0, indicating that every position is available; every element of the data index array is also set to zero, indicating that no data has been written yet. One data space is pre-allocated in the data area in preparation for accepting users' backup requests. The total number of data spaces is defined as S, and S is at most 1000. The number of data spaces currently in use in the data area is V, with V<=S. The number of logical data blocks pre-allocated to each data space is defined as P, and P is at most 10. The maximum number of logical data blocks in each data space is defined as W, and W is at most 1024. In the current deployment of the backup service software, each data space consists of 1024 logical data blocks of 1G each, so each data space is at most 1T.
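The initialization step and the capacity figures above can be checked with a short Python sketch. The function names `initialize` and `max_space_bytes`, the dictionary layout, and the small default sizes are hypothetical; the text does not prescribe an in-memory representation, and "1G" is read here as 2^30 bytes.

```python
GIB = 1 << 30  # one logical data block ("1G"), assumed to mean 2**30 bytes


def initialize(m: int = 8, n: int = 1024, p: int = 10) -> dict:
    """Return zeroed metadata and one pre-allocated data space (illustrative layout)."""
    if p > 10:
        raise ValueError("P is at most 10 in the text")
    return {
        "index_header": [0] * (1 << m),  # all zero: every hash position is free
        "data_index": [0] * n,           # all zero: nothing has been written yet
        # One pre-allocated data space with P logical blocks, tracked as bytes used.
        "data_spaces": [{"used": [0] * p}],
        "V": 1,                          # data spaces currently in use
    }


def max_space_bytes(w: int = 1024, block_size: int = GIB) -> int:
    """Largest size one data space can reach: W logical blocks of block_size bytes."""
    return w * block_size


state = initialize()
```

With W = 1024 and 1G blocks, `max_space_bytes()` evaluates to 2^40 bytes, which matches the 1T per data space stated in the text.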
(2) Receive a user command and determine its type:
If it is a backup operation, go to step (3); if it is a restore operation, go to step (4); if it is a delete operation, go to step (5);
(3) Perform backup processing as follows:
(3.1) First divide the file to be backed up into chunks (the chunk size is defined as b, with b between 1M and 4M; b is 4M in the actual deployment of this backup service software), then hash the content of each data chunk with the MD5 algorithm to obtain a fingerprint that uniquely identifies the chunk; data chunks are stored, indexed by fingerprint, in the index header and data index of the storage server's metadata;
(3.2) The backup client sends the fingerprint of the data chunk to the storage server, and the storage server uses the fingerprint to query whether the chunk already exists;
(3.3) If the fingerprint does not exist, the backup client sends the data chunk to the storage server, which dynamically allocates storage space and completes the write operation for the chunk. If the fingerprint exists, no data needs to be transferred; the storage server only needs to update the index information for that chunk by incrementing its reference count by one;
(3.4) Return to step (2);
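Steps (3.1) to (3.3) can be sketched in Python as follows. This is a simplified single-process illustration: the `StorageServer` class, its method names and the in-memory dictionaries are hypothetical stand-ins for the storage server's metadata area and data area, and no network transfer takes place.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # b = 4M, the deployed value given in the text


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """(3.1) Divide the data into fixed-length chunks (the last one may be shorter)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


class StorageServer:
    """In-memory stand-in for the storage server (hypothetical interface)."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}   # fingerprint -> stored chunk
        self.ref_count: Dict[bytes, int] = {}  # fingerprint -> reference count

    def has(self, fingerprint: bytes) -> bool:
        return fingerprint in self.ref_count

    def add_reference(self, fingerprint: bytes) -> None:
        self.ref_count[fingerprint] += 1

    def store(self, fingerprint: bytes, chunk: bytes) -> None:
        self.blocks[fingerprint] = chunk
        self.ref_count[fingerprint] = 1


def backup(data: bytes, server: StorageServer,
           chunk_size: int = CHUNK_SIZE) -> Tuple[List[bytes], int]:
    """Back up data; return the file's hash list and the number of bytes actually sent."""
    hash_list: List[bytes] = []
    bytes_sent = 0
    for chunk in split_chunks(data, chunk_size):
        fingerprint = hashlib.md5(chunk).digest()  # (3.1) fingerprint of the chunk
        if server.has(fingerprint):                # (3.2) query by fingerprint only
            server.add_reference(fingerprint)      # (3.3) duplicate: count += 1
        else:
            server.store(fingerprint, chunk)       # (3.3) new block: send and write
            bytes_sent += len(chunk)
        hash_list.append(fingerprint)
    return hash_list, bytes_sent
```

Because only the fingerprint crosses the wire for a chunk that already exists, repeated content costs 16 bytes of traffic per chunk instead of the full chunk, which is where the bandwidth saving claimed in the text comes from.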
In step (3.3) above, storage space can be dynamically allocated by the process shown in Fig. 3. The steps are as follows:
(a1) Determine whether any of the P logical data blocks in the current data space has enough free space for a backup data block of the specified size; if so, go to step (a5); otherwise go to step (a2);
(a2) Determine whether P<W holds; if so, go to step (a6); otherwise go to step (a3);
(a3) Determine whether any other data space in the storage server's main record has enough free space for a backup data block of the specified size; if so, go to step (a5); otherwise go to step (a4);
(a4) Determine whether V<S holds; if so, add a data space on the storage server, allocate a data index entry for the backup data block to be written into the new data space, and go to step (a8); otherwise go to step (a7);
(a5) Allocate a data index entry for the backup data block to be written into the remaining free space, then go to step (a8);
(a6) On the storage server, grow this data space by the size of one backup data block, allocate a data index entry for the backup data block, then go to step (a8);
(a7) Because no free space large enough for a backup data block of the specified size can be found, report that dynamic allocation has failed;
(a8) End the dynamic allocation process.
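The decision sequence (a1) to (a8) can be written out as a small Python sketch. Several modelling choices here are assumptions and not taken from the text: free space is tracked as a per-block byte count with append-only placement, growth in (a6) is modelled as adding one whole logical block, and the names `DataSpace` and `Allocator` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataSpace:
    block_size: int
    free: List[int] = field(default_factory=list)  # free bytes per allocated logical block

    def find_block(self, size: int) -> Optional[int]:
        for j, free_bytes in enumerate(self.free):
            if free_bytes >= size:
                return j
        return None

    def take(self, j: int, size: int) -> int:
        """Reserve size bytes in block j and return the offset K (append-only)."""
        offset = self.block_size - self.free[j]
        self.free[j] -= size
        return offset


class Allocator:
    def __init__(self, S: int, W: int, P: int, block_size: int) -> None:
        self.S, self.W, self.P, self.block_size = S, W, P, block_size
        self.spaces = [self._new_space()]  # one pre-allocated data space (V = 1)
        self.current = 0

    def _new_space(self) -> DataSpace:
        return DataSpace(self.block_size, [self.block_size] * self.P)

    def allocate(self, size: int) -> Optional[Tuple[int, int, int]]:
        """Return (I, J, K) for a backup data block of the given size, or None on failure."""
        if size > self.block_size:
            return None
        cur = self.spaces[self.current]
        j = cur.find_block(size)                      # (a1) room in the current space?
        if j is not None:
            return self.current, j, cur.take(j, size) # (a5)
        if len(cur.free) < self.W:                    # (a2) may this space still grow?
            cur.free.append(self.block_size)          # (a6) grow the current space
            j = len(cur.free) - 1
            return self.current, j, cur.take(j, size)
        for i, space in enumerate(self.spaces):       # (a3) room in another space?
            j = space.find_block(size)
            if j is not None:
                return i, j, space.take(j, size)      # (a5)
        if len(self.spaces) < self.S:                 # (a4) V < S: add a data space
            self.spaces.append(self._new_space())
            self.current = len(self.spaces) - 1
            return self.current, 0, self.spaces[self.current].take(0, size)
        return None                                   # (a7) allocation failed
```

Checking the current data space first, then growing it, and only then looking at other data spaces keeps consecutive blocks of one backup close together; a new data space is created only when every existing one is exhausted.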
The write operation can be completed by the process shown in Fig. 4. Its steps are as follows:
(b1) Dynamically search for available storage space on the storage server side, checking whether a logical data block satisfies the conditions;
(b2) If no logical data block is available, return failure;
(b3) If there is available storage space large enough for the new backup data block, create a new data index entry, write the new backup data block to the corresponding location on the storage server, and then write the corresponding index header and data index metadata to the main record area.
(4) On restore, the backup server in the backup system software looks up the hash list of the files to be restored; the storage-space metadata is accessed according to the hash list to locate the logical positions in the corresponding data space; the data of the files to be restored is then read in turn from the storage server into a memory buffer, passed to the backup client over a socket and assembled into the required file set; then return to step (2).
As shown in Fig. 2, the process of locating the logical position in the corresponding data space by accessing the storage-space metadata according to the hash list is as follows:
(4.1) Let m be the predefined number of bits of the index header. The first m bits of the data hash value index into the index header, and the content of the index header entry gives the data index number.
Usually each index header entry occupies two bytes and there are 2^m entries in total; m generally ranges from 20 to 30.
(4.2) Address the data index through the index header. The data index then addresses the data item of a single operation, in three parts:
(4.2.1) The structure member I of the data index (I is the data space number) gives the data space number;
(4.2.2) The structure member J of the data index (J is the logical data block number within the data space) gives the block number within the data space;
(4.2.3) The structure member K of the data index (K is the offset address within the logical data block) gives the offset address of the data item within the logical data block. This amounts to three-level addressing, from which the data header and data body of the data item can be located.
(4.3) With the three-level logical data block address information above, the corresponding data area can be located and the data read.
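The addressing steps can be illustrated with a short Python sketch. The helper names are hypothetical, and the mapping from (J, K) to a position of `J * block_size + K` inside the data space is an assumption about how a data space file is laid out; the text only states that each data space is a separate data file made up of logical data blocks.

```python
from typing import Sequence


def header_slot(fingerprint: bytes, m: int) -> int:
    """(4.1) Use the first m bits of the hash value as the index header position."""
    total_bits = len(fingerprint) * 8
    return int.from_bytes(fingerprint, "big") >> (total_bits - m)


def file_offset(j: int, k: int, block_size: int) -> int:
    """Assumed layout: logical blocks are stored back to back inside a data space."""
    return j * block_size + k


def read_item(spaces: Sequence[bytes], i: int, j: int, k: int,
              size: int, block_size: int) -> bytes:
    """(4.2)-(4.3) Three-level addressing: data space I, logical block J, offset K."""
    space = spaces[i]                       # level 1: select the data space
    start = file_offset(j, k, block_size)   # levels 2 and 3: block, then offset
    return bytes(space[start:start + size])
```

In this sketch `spaces` is a list of in-memory byte buffers standing in for the data space files; a real storage server would seek within the file for data space I instead of slicing a buffer.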
(5) Delete backed-up files as follows:
(5.1) The backup server in the backup system software looks up the hash list of the backup files to be deleted;
(5.2) The index header and data index mapping table in the metadata area of the storage server are searched by hash value; if the hash value passed in as the entry parameter does not exist, return immediately with the return value false;
(5.3) Otherwise decrement by 1 the reference count in the object metadata corresponding to the hash value, and return true;
(5.4) Return to step (2).
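Steps (5.2) and (5.3) reduce to a reference-count decrement, sketched below in Python. The function names and the plain dictionary standing in for the index header and data index are hypothetical. The text does not say what happens once a count reaches zero, so this sketch leaves the entry in place and does not reclaim space.

```python
from typing import Dict, List


def release(fingerprint: bytes, ref_count: Dict[bytes, int]) -> bool:
    """(5.2)-(5.3) Return False if the hash value is unknown, else decrement and return True."""
    if fingerprint not in ref_count:
        return False
    ref_count[fingerprint] -= 1
    return True


def delete_backup(hash_list: List[bytes], ref_count: Dict[bytes, int]) -> List[bool]:
    """Apply the release step to every hash value of the file being deleted."""
    return [release(fp, ref_count) for fp in hash_list]
```

Deleting a file therefore never touches the data blocks themselves; other files that share a block keep it alive through their own references.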
Because this is an online backup service, the backup server and storage server run continuously in the background as daemons. There is therefore no termination case; they always wait for users' operation requests. The backup client is the interface through which users use the online backup service: a user can log in to the backup client at any time to perform operations such as backup, restore and delete.
Example:
The runtime environment of the backup service system is as follows:
1. Hardware and supporting environment
The backup client requires a host with at least 512 MB of memory and network throughput of at least 10 Mbps.
The dispatch server requires a host with at least 2 GB of memory and network throughput of at least 1000 Mbps.
The storage server requires a host with at least 4 GB of memory, TB-scale external storage, and network throughput of 1000 Mbps or more.
The hosts running the dispatch server and storage server software must have Gb-class network switching capacity between them, and the hosts running the client and server software must have network connectivity between them. The environment housing the server hosts must provide the basic conditions for normal operation, such as redundant power, redundant communication links, temperature control and fire protection.
2. Software runtime environment
The backup client program runs on Windows XP and later operating systems, or on operating system platforms based on the Linux 2.6 kernel.
The dispatch server and storage server run on the Windows Server 2003 operating system platform.
In the online backup service system currently implemented and running normally, each data space on the storage server side is 1T in size and there are at most 20 data spaces. Each data space is divided into 1024 logical data blocks of 1G each.
The above is a preferred embodiment of the invention, but the invention should not be limited to what is disclosed in this embodiment and the drawings. Any equivalent or modification made without departing from the spirit disclosed herein therefore falls within the scope of protection of the invention.

Claims (4)

1. A data organization method for backup services, characterized in that the method comprises the following steps:
(1) Initialization:
initializing the metadata, including assigning initial values to the index header information, data index information and data area metadata information of the metadata area;
pre-allocating one data space in the data area in preparation for accepting users' backup requests;
(2) receiving a user command and determining its type:
if it is a backup operation, going to step (3); if it is a restore operation, going to step (4); if it is a delete operation, going to step (5);
(3) performing backup processing as follows:
(3.1) first dividing the file to be backed up into chunks, then hashing the content of each data chunk with the MD5 algorithm to obtain a fingerprint that uniquely identifies the chunk, the data chunks being stored, indexed by fingerprint, in the index header and data index of the storage server's metadata;
(3.2) the backup client sending the fingerprint of the data chunk to the storage server, and the storage server using the fingerprint to query whether the chunk exists;
(3.3) if the fingerprint does not exist, the backup client sending the data chunk to the storage server, the chunk then being a new backup data block for which the storage server dynamically allocates storage space and completes the write operation; if it exists, only updating the index information for that chunk on the storage server by incrementing its reference count by one;
(3.4) returning to step (2);
(4) on restore, the backup server looking up the hash list of the files to be restored, accessing the storage-space metadata according to the hash list to locate the logical positions in the corresponding data space, then reading the data of the files to be restored in turn from the storage server into a memory buffer, passing it to the backup client over a socket and assembling the required file set, then returning to step (2);
(5) deleting backed-up files as follows:
(5.1) the backup server in the backup system software looking up the hash list of the files to be deleted;
(5.2) searching the index header and data index mapping table in the metadata area of the storage server by hash value, and, if the hash value passed in as the entry parameter does not exist, returning immediately with the return value false;
(5.3) otherwise decrementing by 1 the reference count in the object metadata corresponding to the hash value and returning true;
(5.4) returning to step (2).
2. The data organization method for backup services according to claim 1, characterized in that, in step (3.3), P denotes the number of logical data blocks pre-allocated to each data space, W denotes the maximum number of logical data blocks each data space can hold, V denotes the number of data spaces in use, and S denotes the total number of data spaces;
the steps of dynamically allocating storage space are as follows:
(a1) determining whether any of the P logical data blocks in the current data space has enough free space for a backup data block of the specified size; if so, going to step (a5); otherwise going to step (a2);
(a2) determining whether P<W holds; if so, going to step (a6); otherwise going to step (a3);
(a3) determining whether any other data space in the storage server's main record has enough free space for a backup data block of the specified size; if so, going to step (a5); otherwise going to step (a4);
(a4) determining whether V<S holds; if so, adding a data space on the storage server, allocating a data index entry for the backup data block to be written into the new data space, and going to step (a8); otherwise going to step (a7);
(a5) allocating a data index entry for the backup data block to be written into the remaining free space, then going to step (a8);
(a6) growing this data space on the storage server by the size of one logical data block, allocating a data index entry for the backup data block, then going to step (a8);
(a7) reporting that dynamic allocation has failed;
(a8) ending the dynamic allocation process.
3. The data organization method for backup services according to claim 1, characterized in that the write operation comprises the following steps:
(b1) dynamically searching for available storage space on the storage server side, checking whether a logical data block satisfies the conditions;
(b2) if no logical data block is available, returning failure;
(b3) if there is available storage space large enough for the new backup data block, creating a new data index entry, writing the new backup data block to the corresponding location on the storage server, and then writing the corresponding index header and data index metadata to the main record area.
4. The data organization method for backup services according to claim 1, characterized in that the process of locating the logical position in the corresponding data space by accessing the storage-space metadata according to the hash list is as follows:
the data organization method that is used for backup service according to claim 1, is characterized in that, the process of being positioned at the logical position in corresponding data space according to Hash list access storage space metadata information is as follows: (4.1)按照预先设定的索引头的位数,由数据Hash值的前位对索引头进行索引,索引头的内容构成数据索引号; (4.1) According to the preset number of digits in the index header, the index header is indexed by the first digit of the data Hash value, and the content of the index header constitutes the data index number; (4.2)通过索引头对数据索引寻址,再通过数据索引对一次操作的数据项进行寻址,具体包括三部分内容:(4.2) Address the data index through the index header, and then address the data item of an operation through the data index, which specifically includes three parts: (4.2.1)通过数据索引的结构体成员中的数据空间编号找到具体的数据空间号;(4.2.1) Find the specific data space number through the data space number in the structure member of the data index; (4.2.2)通过数据索引的结构体成员中的数据空间内的逻辑数据块编号找到数据空间中的块号;(4.2.2) Find the block number in the data space through the logical data block number in the data space in the structure member of the data index; (4.2.3)通过数据索引的结构体成员中的数据空间内的逻辑数据块内的偏移地址找到数据项在逻辑数据块中的偏移地址,定位数据项的数据头和数据实体;(4.2.3) Find the offset address of the data item in the logical data block through the offset address in the logical data block in the data space of the structure member of the data index, and locate the data header and data entity of the data item; (4.4)利用获得的地址信息,定位到相应的数据区读取数据。(4.4) Use the obtained address information to locate the corresponding data area to read data.
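
The backup, restore, and delete steps of claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation. The names `StorageServer`, `backup`, `restore`, and `delete` are hypothetical, as are the fixed chunk size and the in-memory dictionary and byte array that stand in for the metadata area and data area. The claim does not say how files are split or what happens when a reference count reaches zero, so the sketch uses fixed-size chunks and never reclaims space.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; the claim does not specify a size


class StorageServer:
    """In-memory stand-in for the storage server's metadata and data areas."""

    def __init__(self):
        self.index = {}               # fingerprint -> {"offset", "length", "refs"}
        self.data_area = bytearray()  # simplified append-only data area

    def has(self, fp):
        # Step (3.2): does a block with this fingerprint already exist?
        return fp in self.index

    def add_ref(self, fp):
        # Step (3.3), duplicate case: only the reference count changes.
        self.index[fp]["refs"] += 1

    def write(self, fp, chunk):
        # Step (3.3), new block case: store the data and create its index entry.
        self.index[fp] = {"offset": len(self.data_area),
                          "length": len(chunk), "refs": 1}
        self.data_area += chunk

    def read(self, fp):
        # Step (4): locate the block through its index entry and read it.
        entry = self.index[fp]
        start = entry["offset"]
        return bytes(self.data_area[start:start + entry["length"]])

    def release(self, fp):
        # Steps (5.2)-(5.3): False if unknown, else decrement and return True.
        if fp not in self.index:
            return False
        self.index[fp]["refs"] -= 1
        return True


def backup(server, content, chunk_size=CHUNK_SIZE):
    """Split content into chunks, send fingerprints first, data only if new."""
    hash_list = []
    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        fp = hashlib.md5(chunk).hexdigest()  # step (3.1): MD5 fingerprint
        if server.has(fp):
            server.add_ref(fp)
        else:
            server.write(fp, chunk)
        hash_list.append(fp)
    return hash_list  # the per-file Hash list used by restore and delete


def restore(server, hash_list):
    """Reassemble a file by reading its blocks in Hash-list order."""
    return b"".join(server.read(fp) for fp in hash_list)


def delete(server, hash_list):
    """Release every block referenced by a file; one boolean per fingerprint."""
    return [server.release(fp) for fp in hash_list]
```

For example, backing up `b"aaaabbbbaaaa"` with `chunk_size=4` yields a Hash list of three entries while the data area holds only the two distinct chunks (8 bytes), which is the deduplication effect the claim describes.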
CN2010101523978A 2010-04-22 2010-04-22 Data organization method for backup services Expired - Fee Related CN101814045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101523978A CN101814045B (en) 2010-04-22 2010-04-22 Data organization method for backup services

Publications (2)

Publication Number Publication Date
CN101814045A CN101814045A (en) 2010-08-25
CN101814045B CN101814045B (en) 2011-09-14

Family ID: 42621306

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN2010101523978A Expired - Fee Related CN101814045B (en) 2010-04-22 2010-04-22 Data organization method for backup services

Country Status (1)

Country Link
CN (1) CN101814045B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2546304A1 (en) * 2003-11-13 2005-05-26 Commvault Systems, Inc. System and method for performing an image level snapshot and for restoring partial volume data
CN100547555C (en) * 2007-12-10 2009-10-07 华中科技大学 A Data Backup System Based on Fingerprint

Also Published As

Publication number Publication date
CN101814045A (en) 2010-08-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2011-09-14
