[go: up one dir, main page]

CN113391948B - Folding type extensible distributed storage coding and repairing and expanding method - Google Patents

Folding type extensible distributed storage coding and repairing and expanding method Download PDF

Info

Publication number
CN113391948B
CN113391948B CN202110726617.1A CN202110726617A CN113391948B CN 113391948 B CN113391948 B CN 113391948B CN 202110726617 A CN202110726617 A CN 202110726617A CN 113391948 B CN113391948 B CN 113391948B
Authority
CN
China
Prior art keywords
matrix
check
nodes
information
symbols
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110726617.1A
Other languages
Chinese (zh)
Other versions
CN113391948A (en
Inventor
孙蓉
杜从军
刘景伟
裴庆祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110726617.1A priority Critical patent/CN113391948B/en
Publication of CN113391948A publication Critical patent/CN113391948A/en
Application granted granted Critical
Publication of CN113391948B publication Critical patent/CN113391948B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

本发明公开了一种折叠式可扩展分布式存储编码及修复、扩展方法,包括:确定各节点的编码参数后从最后一个阶段的生成矩阵开始依次构造出其他阶段对应的生成矩阵集,组合出一个编码组并从中选择一个码对待编码数据进行编码;出现节点失效时选择与信息节点相同数量的未失效的节点并从中下载数据符号对失效节点中的数据符号进行恢复;对编码数据进行扩展时将每个扩展组内的两个子条带进行合并。本发明具有能够提高扩展后系统的容错能力、具备MDS性质、扩展带宽低、可多次扩展等优点,可用于对节点具有计算能力的分布式存储系统进行编码、修复和扩展。

Figure 202110726617

The invention discloses a foldable and scalable distributed storage encoding, repairing and expanding method, comprising: after determining the encoding parameters of each node, starting from the last stage of the generation matrix, sequentially constructing the generating matrix sets corresponding to other stages, and combining a code group and select a code from it to encode the data to be encoded; when a node failure occurs, select the same number of non-failed nodes as the information nodes and download data symbols from them to restore the data symbols in the failed node; when the encoded data is extended Merge the two sub-stripes within each expansion group. The invention has the advantages of being able to improve the fault-tolerant capability of the expanded system, possessing MDS properties, low expansion bandwidth, and multi-expansion, and can be used for coding, repairing and expanding the distributed storage system whose nodes have computing power.

Figure 202110726617

Description

一种折叠式可扩展分布式存储编码及修复、扩展方法A foldable and scalable distributed storage encoding, repair and expansion method

技术领域technical field

本发明属于计算机技术领域,更进一步涉及分布式存储技术领域中的一种折叠式可扩展分布式存储编码及修复、扩展方法。本发明可用于对节点具有计算能力的分布式存储系统进行编码、修复和扩展。The invention belongs to the field of computer technology, and further relates to a foldable and scalable distributed storage coding, repair and expansion method in the field of distributed storage technology. The present invention can be used to code, repair and expand a distributed storage system whose nodes have computing power.

背景技术Background technique

由于分布式存储系统数据存储量大,节点失效的事件频发,分布式存储系统需要通过存储冗余数据提高系统的可靠性。纠删码技术是一种典型的数据冗余机制,通过将原始数据分割成信息块后进行编码产生校验块,并将信息块与校验块分散存储在分布式存储系统的节点中,实现容错的目的。分布式存储系统的规模通常随着使用时间的变化而不断增大,有越来越多的新节点加入到系统中,理想情况下分布式存储系统的编码参数应当可以根据应用需求进行动态扩展,基于纠删码的分布式存储系统在进行码参数扩展时,数据的迁移、更新过程需要在节点之间传输大量的数据,这会造成大量的网络带宽资源消耗,影响分布式存储系统的性能。Due to the large amount of data storage in the distributed storage system and frequent node failure events, the distributed storage system needs to improve the reliability of the system by storing redundant data. Erasure coding technology is a typical data redundancy mechanism. The original data is divided into information blocks and then encoded to generate check blocks, and the information blocks and check blocks are scattered and stored in the nodes of the distributed storage system. fault tolerance purpose. The scale of the distributed storage system usually increases with the change of usage time, and more and more new nodes are added to the system. Ideally, the coding parameters of the distributed storage system should be dynamically expandable according to the application requirements. When the erasure code-based distributed storage system expands the code parameters, a large amount of data needs to be transmitted between nodes in the process of data migration and update, which will cause a large amount of network bandwidth resource consumption and affect the performance of the distributed storage system.

华中科技大学在申请的专利文献“一种基于网络编码的存储扩展方法”(专利申请号:201810304384.4,公布号:CN 108536396 B)中公开了一种基于网络编码的存储扩展方法。该方法的思路是将存储扩展前的条带划分为多个扩展组,并进一步将每个扩展组划分为PG和DG;在DG内循环地依次从原节点中取数据块,得到一系列的数据集合;利用网络编码对每一个数据集合进行编码生成更新块,使用这些更新块对PG中的编码块进行本地更新或者异地更新;将编码块或者数据块传输至新增节点上,并保持扩展后数据块与编码块在所有节点上均匀放置;删除所有传输至新节点的数据块和编码块,并删除DG内所有的编码块。该方法在存储扩展时利用存储节点的计算资源对数据块进行编码并对部分编码块进行本地更新,减少了扩展带宽,但是,该方法仍然存在的不足之处是,进行存储扩展后系统中节点的数量增多,需要更多的编码块来满足系统对于容错能力的要求,该方法在存储扩展时单个条带内编码块的数量保持不变,系统扩展后的容错能力无法适应系统的节点规模。Huazhong University of Science and Technology disclosed a network coding-based storage expansion method in the patent document "A Network Coding-Based Storage Expansion Method" (Patent Application No.: 201810304384.4, Publication No.: CN 108536396 B). The idea of this method is to divide the strip before storage expansion into multiple expansion groups, and further divide each expansion group into PG and DG; cyclically take data blocks from the original node in DG, and obtain a series of Data set; use network coding to encode each data set to generate update blocks, and use these update blocks to update the code blocks in the PG locally or off-site; transfer the code blocks or data blocks to the new node, and keep the expansion The latter data blocks and coding blocks are evenly placed on all nodes; all data blocks and coding blocks transmitted to the new node are deleted, and all coding blocks in the DG are deleted. This method utilizes the computing resources of storage nodes to encode data blocks and locally update some of the encoded blocks during storage expansion, which reduces the expansion bandwidth. However, this method still has the disadvantage that after storage expansion, the nodes in the system The number of coding blocks increases, and more coding blocks are needed to meet the system's requirements for fault tolerance. This method keeps the number of coding blocks in a single strip unchanged during storage expansion, and the fault tolerance after system expansion cannot adapt to the node scale of the system.

陈亮等人发表的论文“随机二元扩展码:一种适用于分布式存储系统的编码”(计算机学报,2017年9月)提出一种可以动态调整码率和纠删能力的编码方法。该方法的编码矩阵由一个单元阵和一个随机阵构成,采用自顶向下的设计模式,通过控制随机矩阵中各个元素生成,达到码字整体上的高性能。该方法的参数具有动态调整的能力,其编码矩阵的行列可以自由伸缩,进而,存储系统可根据应用需求的变化,动态调整码率和纠删能力,该方法仍然存在的不足是,在对失效节点进行修复时需要连接的节点数量大于信息节点数量,修复度高。The paper "Random Binary Spreading Codes: A Coding Applicable to Distributed Storage Systems" (Journal of Computer, September 2017) published by Chen Liang et al. proposes a coding method that can dynamically adjust the code rate and erasure capability. The coding matrix of the method is composed of a unit matrix and a random matrix. The top-down design mode is adopted, and the generation of each element in the random matrix is controlled to achieve the overall high performance of the codeword. The parameters of this method have the ability to dynamically adjust, and the rows and columns of the encoding matrix can be freely scaled, and further, the storage system can dynamically adjust the code rate and erasure ability according to changes in application requirements. When a node is repaired, the number of nodes that need to be connected is greater than the number of information nodes, and the repair degree is high.

Francisco Maturana等人发表的论文“Convertible codes:New class of codesfor efficient conversion of coded data in distributed storage”(11thInnovations in Theoretical Computer Science Conference ser.LeibnizInternational Proceedings in Informatics,vol.151,pp.66:1-66:26,2020.)提出了一种有效进行码转换的编码方法,并在其后发表的论文“Bandwidth Cost of CodeConversions in Distributed Storage:Fundamental Limits and OptimalConstructions”(arXiv:2008.12707)中借助网络信息流图对码转换问题进行了分析,提出一种用于降低转换带宽的Convertible codes编码方案。Convertible codes有效降低了系统在从初始码向最终码进行扩展时的资源消耗,但该方法仍然存在的不足是,该方法仅可以以低的带宽资源消耗扩展一次。The paper "Convertible codes: New class of codes for efficient conversion of coded data in distributed storage" by Francisco Maturana et al. (11thInnovations in Theoretical Computer Science Conference ser. LeibnizInternational Proceedings in Informatics,vol.151,pp.66:1-66: 26, 2020.) proposed an efficient encoding method for code conversion, and used the network information flow graph in the subsequent paper "Bandwidth Cost of CodeConversions in Distributed Storage: Fundamental Limits and Optimal Constructionions" (arXiv:2008.12707). The problem of code conversion is analyzed, and a coding scheme of Convertible codes is proposed to reduce the conversion bandwidth. Convertible codes effectively reduce the resource consumption of the system when extending from the initial code to the final code, but this method still has the disadvantage that this method can only be extended once with low bandwidth resource consumption.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于针对上述现有技术的不足,提出一种折叠式可扩展分布式存储编码及修复、扩展方法,旨在解决系统扩展后的容错能力无法适应节点规模、修复失效节点时修复度高、可扩展次数少的问题。The purpose of the present invention is to propose a foldable and scalable distributed storage coding, repair and expansion method in view of the deficiencies of the above-mentioned prior art, aiming at solving the problem that the fault tolerance capability of the system after expansion cannot adapt to the node scale, and the repair degree when repairing a failed node The problem of high and low scalability.

实现本发明目的的思路是:由于本发明的编码方法是按照公式从第1个阶段的编码参数开始依次计算其他阶段的编码参数,由于在计算出的校验节点数量增多,解决了系统扩展后的容错能力无法适应节点规模的问题。将一个系统型MDS码作为基本码构造最后一个阶段对应的生成矩阵,并根据设定的折叠规则逆序地依次构造出其他阶段对应的生成矩阵集,组合出一个编码组,并从编码组中选择一个码对待编码数据进行编码。由于本发明的修复方法是从未失效的节点中选择与信息节点数相同数量的节点下载数据符号对失效节点中的数据符号进行恢复,由于选择的节点数等于信息节点数,解决了修复失效节点时修复度高的问题。由于本发明的扩展方法是对编码数据进行扩展时将每个扩展组内的两个子条带进行合并,由于编码组中有多个码,这样的合并过程可以进行多次,解决了可扩展次数少的问题。The idea of realizing the object of the present invention is: because the encoding method of the present invention calculates the encoding parameters of other stages in turn from the encoding parameters of the first stage according to the formula, due to the increase in the number of the calculated check nodes, it solves the problem of system expansion. The fault tolerance ability cannot adapt to the problem of node scale. A systematic MDS code is used as the basic code to construct the generator matrix corresponding to the last stage, and according to the set folding rules, the generator matrix sets corresponding to the other stages are constructed in reverse order, and a coding group is combined, and selected from the coding group. A code encodes the data to be encoded. Because the repair method of the present invention is to select the same number of nodes as the number of information nodes from the non-failed nodes to download data symbols to restore the data symbols in the failed node, since the number of selected nodes is equal to the number of information nodes, the problem of repairing the failed nodes is solved. time to fix high-level issues. Since the expansion method of the present invention is to combine the two sub-strips in each expansion group when the encoded data is expanded, since there are multiple codes in the encoding group, such a combination process can be performed multiple times, which solves the problem of the number of expansions. less problems.

为实现上述目的,本发明一种折叠式可扩展分布式存储编码方法的步骤包括如下:In order to achieve the above object, the steps of a foldable scalable distributed storage coding method of the present invention include the following steps:

(1)设定第1个阶段的编码参数:(1) Set the encoding parameters of the first stage:

将信息节点数k1、校验节点数r1、元校验节点数s1设定为第1个阶段的编码参数,其中,k1、r1为正整数,s1为非负整数且小于等于r1The number of information nodes k 1 , the number of check nodes r 1 , and the number of meta-check nodes s 1 are set as the encoding parameters of the first stage, wherein k 1 and r 1 are positive integers, s 1 is a non-negative integer and less than or equal to r 1 ;

(2)计算下一个阶段的编码参数:(2) Calculate the encoding parameters of the next stage:

(2a)按照下式,计算下一个阶段的信息节点数、校验节点数:(2a) Calculate the number of information nodes and check nodes in the next stage according to the following formula:

k′=2kk'=2k

r′=2r-sr′=2r-s

其中,k′、r′分别表示当前阶段的下一个阶段的信息节点数、校验节点数,k、r、s分别表示当前阶段的信息节点数、校验节点数、元校验节点数;Among them, k' and r' respectively represent the number of information nodes and check nodes in the next stage of the current stage, and k, r and s respectively represent the number of information nodes, check nodes and meta-check nodes in the current stage;

(2b)从取值范围{s,s+1,…,2r-s}中选取一个与希望以低修复复杂度修复的同时失效的信息节点数最大值相等的值作为下一个阶段的元校验节点数;(2b) From the value range {s,s+1,...,2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the meta-calibration of the next stage Check the number of nodes;

(3)判断当前迭代得到的编码参数的总数是否是等于m,若是,则执行步骤(4);否则,将本次确定的编码参数作为当前阶段的编码参数后执行步骤(2);m表示设定的拟构造的编码组中码的总数,m的值为大于等于2的整数;(3) Judging whether the total number of coding parameters obtained by the current iteration is equal to m, and if so, execute step (4); otherwise, execute step (2) after taking the coding parameters determined this time as the coding parameters of the current stage; m represents The total number of codes in the set coding group to be constructed, and the value of m is an integer greater than or equal to 2;

(4)确定最终编码参数:(4) Determine the final encoding parameters:

(4a)将当前迭代得到的编码参数中的校验节点数的值设定为当前迭代得到的编码参数中的元校验节点数;(4a) the value of the number of check nodes in the encoding parameter obtained by the current iteration is set to the number of meta-check nodes in the encoding parameter obtained by the current iteration;

(4b)将当前迭代得到的信息节点数、校验节点数与确定的元校验节点数组成最终编码参数;(4b) The number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters;

(5)构建最后一个阶段对应的生成矩阵:(5) Construct the generator matrix corresponding to the last stage:

将一个系统型MDS码作为基本码,利用基本码的生成矩阵构造方法,构建一个与最后一个阶段对应的生成矩阵GmTaking a systematic MDS code as the basic code, and using the generator matrix construction method of the basic code, a generator matrix G m corresponding to the last stage is constructed:

Figure GDA0003835645220000041
Figure GDA0003835645220000041

其中,Gm表示最后一个阶段对应的生成矩阵,

Figure GDA0003835645220000042
表示一个km阶的单位矩阵,km的取值等于最终编码参数中的信息节点数,
Figure GDA0003835645220000043
表示一个rm行km列的矩阵,rm的取值等于最终编码参数中的校验节点数;Among them, G m represents the generator matrix corresponding to the last stage,
Figure GDA0003835645220000042
represents a unit matrix of order k m , and the value of k m is equal to the number of information nodes in the final encoding parameter,
Figure GDA0003835645220000043
Represents a matrix with r m rows and k m columns, and the value of r m is equal to the number of check nodes in the final encoding parameter;

(6)设定生成矩阵的折叠规则:(6) Set the folding rules of the generated matrix:

(6a)将待折叠的矩阵划分成A、B、X、U、V五个矩阵:其中,A表示待折叠的矩阵的第1行至第l1行组成的一个左信息矩阵,

Figure GDA0003835645220000044
表示当前阶段的上一阶段对应的编码参数中的信息节点数;B表示待折叠的矩阵的第l2行至第l3行组成的一个右信息矩阵,
Figure GDA0003835645220000045
X表示待折叠的矩阵的第l4行至前l5行组成的矩阵,
Figure GDA0003835645220000046
表示当前阶段的上一阶段对应的编码参数中的元校验节点数;U表示待折叠的矩阵的第l6行至第l7行组成的矩阵,
Figure GDA0003835645220000047
表示当前阶段的上一阶段对应的编码参数中的校验节点数;V表示待折叠的矩阵的第l8行至最后1行组成的一个右非元矩阵,
Figure GDA0003835645220000048
(6a) Divide the matrix to be folded into five matrices of A, B, X, U, and V: wherein, A represents a left information matrix composed of the 1st row to the 11th row of the matrix to be folded,
Figure GDA0003835645220000044
Represents the number of information nodes in the coding parameters corresponding to the previous stage of the current stage; B represents a right information matrix composed of the 12th row to the 13th row of the matrix to be folded,
Figure GDA0003835645220000045
X represents the matrix composed of the 14th row to the first 15th row of the matrix to be folded,
Figure GDA0003835645220000046
Represents the number of meta-check nodes in the coding parameters corresponding to the previous stage of the current stage; U represents the matrix formed by the 16th row to the 17th row of the matrix to be folded,
Figure GDA0003835645220000047
Represents the number of check nodes in the coding parameters corresponding to the previous stage of the current stage; V represents a right non-element matrix composed of the 18th row to the last row of the matrix to be folded,
Figure GDA0003835645220000048

(6b)生成一个与矩阵X行列元素均相等的矩阵,该矩阵的最后μ个非零列的元素全部置零后得到一个左元矩阵X′,

Figure GDA0003835645220000049
生成一个与矩阵X行列元素均相等的矩阵,该矩阵的前μ个非零列的元素全部置零后得到一个右元矩阵X″;将矩阵U的最后μ个非零列的元素全部置零后得到一个左非元矩阵U′;(6b) Generate a matrix whose row and column elements are equal to those of matrix X, and obtain a left-element matrix X′ after all the elements of the last μ non-zero columns of the matrix are set to zero,
Figure GDA0003835645220000049
Generate a matrix with the same row and column elements as matrix X, and set all the elements of the first μ non-zero columns of the matrix to zero to obtain a right-element matrix X”; set all the elements of the last μ non-zero columns of the matrix U to zero Then get a left non-element matrix U';

(6c)按照下式,分别对矩阵A、矩阵X′、矩阵U′,矩阵B、矩阵X″、矩阵V进行组合,构建出对待折叠的矩阵折叠后的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵:(6c) According to the following formula, combine matrix A, matrix X', matrix U', matrix B, matrix X'', and matrix V respectively to construct the generation corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a matrix set:

Figure GDA00038356452200000410
Figure GDA00038356452200000410

Figure GDA0003835645220000051
Figure GDA0003835645220000051

其中,G′表示左生成矩阵、G″表示右生成矩阵;Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix;

(7)构建最后一个阶段的上一个阶段对应的生成矩阵集:(7) Construct the generation matrix set corresponding to the previous stage of the last stage:

(7a)根据生成矩阵的折叠规则,对最后一个阶段对应的生成矩阵进行折叠,(7a) According to the folding rule of the generator matrix, fold the generator matrix corresponding to the last stage,

并将折叠后的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中;And add the folded generator matrix to the generator matrix set corresponding to the previous stage of the current stage;

(7b)判断m-1的值是否等于2,若是,则执行步骤(10),否则,将本次迭代确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8);(7b) judge whether the value of m-1 is equal to 2, if yes, then execute step (10), otherwise, perform step (8) after taking the generation matrix set determined by this iteration as the generation matrix set corresponding to the current stage;

(8)构造上一阶段对应的生成矩阵集:(8) Construct the generating matrix set corresponding to the previous stage:

根据生成矩阵的折叠规则,对当前阶段对应的生成矩阵集中的每一个生成矩阵进行折叠,并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中;According to the folding rule of the generator matrix, fold each generator matrix in the generator matrix set corresponding to the current stage, and add the generator matrix obtained by folding to the generator matrix set corresponding to the previous stage of the current stage;

(9)判断当前迭代得到的生成矩阵集中生成矩阵的数量是否等于2m-1,若是,则执行步骤(10),否则,将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8);(9) Determine whether the number of generator matrices in the generator matrix set obtained by the current iteration is equal to 2 m-1 , if so, perform step (10), otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage. Execute step (8);

(10)确定各阶段的码:(10) Determine the code of each stage:

将每一个阶段对应的编码参数作为其对应的码的编码参数;将最后一个阶段对应的生成矩阵作为最后一个阶段的码的子条带的生成矩阵;将除最后一个阶段以外的其他每个阶段对应的生成矩阵集中的各生成矩阵作为其对应的码的各子条带的生成矩阵;Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of its corresponding code;

(11)将所有阶段的码组合成一个编码组;(11) code of all stages is combined into a coding group;

(12)从编码组中选择一个码,所选码的编码参数中信息节点数与校验节点数的和等于希望采用的节点总数;(12) select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes expected to be adopted;

(13)对待编码数据进行编码:(13) Encode the data to be encoded:

将待编码数据平均划分为t个信息符号,t=km;用所选码的每一个子条带对应的生成矩阵分别对待编码数据进行编码,得到该子条带的数据符号,由所有子条带的数据符号组成编码数据;将编码数据保存到相应节点中。Divide the data to be encoded into t information symbols on average, t =km; encode the data to be encoded with the corresponding generator matrix of each sub-strip of the selected code, obtain the data symbols of this sub-strip, by all sub-strips The data symbols of the strip make up the encoded data; the encoded data is stored in the corresponding node.

本发明一种折叠式可扩展分布式存储编码修复方法的步骤包括如下:The steps of a foldable and scalable distributed storage code repair method of the present invention include the following steps:

(1)将每个编码数据采用同一个码编码的失效节点的总数大于该码编码参数中的校验节点数的情况放弃修复,其余情况执行步骤(2);(1) The total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and the repair is abandoned, and step (2) is performed in the remaining cases;

(2)判断所有失效节点的失效信息节点的总数是否为非0值,若是,则执行步骤(3);否则,判定不存在失效信息节点但存在失效校验节点后执行步骤(6);(2) Determine whether the total number of failed information nodes of all failed nodes is a non-zero value, and if so, execute step (3); otherwise, execute step (6) after determining that there is no failed information node but there is a failed check node;

(3)判断

Figure GDA0003835645220000061
是否成立,若是,则执行步骤(4),否则,执行步骤(7);其中,α表示所有失效节点的失效信息节点的总数,
Figure GDA0003835645220000062
表示每个编码数据采用同一个码编码的参数中的元校验节点数、λ表示所有失效校验节点中失效元校验节点的总数;(3) Judgment
Figure GDA0003835645220000061
Is it true, if so, go to step (4), otherwise, go to step (7); where α represents the total number of failure information nodes of all failed nodes,
Figure GDA0003835645220000062
Represents the number of meta-check nodes in the parameters encoded by the same code for each encoded data, and λ represents the total number of failed meta-check nodes in all failed check nodes;

(4)划分数据符号:(4) Divide data symbols:

从与失效节点保存同一个编码数据的每个未失效的信息节点中分别下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的每个未失效的元校验节点中下载该元校验节点保存的所有的元校验符号,从与失效节点保存同一个编码数据的所有未失效的非元校验节点中任意选择η个非元校验节点下载该非元校验节点中所有的非元校验符号;将属于同一个子条带的数据符号划为同一个符号组,将属于同一个编码数据的符号组划为同一个数据集合;其中η的值等于

Figure GDA0003835645220000063
Download all information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the information symbols stored by the failed node from each non-invalidated meta-check node that stores the same encoded data as the failed node. Download all meta-check symbols saved by the meta-check node, and randomly select n non-meta-check nodes from all non-meta-check nodes that store the same encoded data as the failed node to download the non-meta-check node All non-meta-check symbols in the node; the data symbols belonging to the same sub-strip are divided into the same symbol group, and the symbol groups belonging to the same encoded data are divided into the same data set; the value of n is equal to
Figure GDA0003835645220000063

(5)对每个数据集合进行处理:(5) Process each data set:

(5a)按照下式,对每一个数据集合中的每一个符号组进行编号:(5a) Number each symbol group in each data set according to the following formula:

Figure GDA0003835645220000064
Figure GDA0003835645220000064

其中,jh,c表示第h个数据集合中第c个符号组的编号,

Figure GDA0003835645220000065
表示向上取整操作,qh,c表示第h个数据集合中第c个符号组对应子条带的生成矩阵中第一行元素中非零元素的列的序号,
Figure GDA0003835645220000066
表示每个编码数据采用同一个码编码的参数中的信息节点数;Among them, j h,c represents the number of the c-th symbol group in the h-th data set,
Figure GDA0003835645220000065
Represents the round-up operation, q h,c represents the sequence number of the column of the non-zero element in the first row element in the generator matrix of the c-th symbol group corresponding to the sub-strip in the h-th data set,
Figure GDA0003835645220000066
Indicates the number of information nodes in the parameters encoded by the same code for each encoded data;

(5b)对于每一个数据集合从该数据集合中编号为1的符号组依次对该数据集合中的每个符号组进行译码-消除操作;(5b) for each data set, perform decoding-elimination operation on each symbol group in the data set in turn from the symbol group numbered 1 in the data set;

(5c)所有的数据集合均已进行处理后执行步骤(9);(5c) Step (9) is performed after all data sets have been processed;

(6)划分信息符号:(6) Divide information symbols:

从与失效节点保存同一个编码数据的每个信息节点中下载该信息节点保存的所有的信息符号,将属于同一个子条带的信息符号划为同一个符号组;将属于同一个编码数据的符号组划为同一个数据集合后执行步骤(9);Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data Step (9) is performed after the group is divided into the same data set;

(7)恢复失效信息节点对应的信息符号:(7) Restore the information symbol corresponding to the failed information node:

(7a)从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的所有未失效的元校验节点中任意选择与α的值相等数量的元校验节点下载该元校验节点保存的所有的元校验符号;将属于同一个子条带的数据符号划为同一个符号组;(7a) Download all the information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the non-invalidated meta-check nodes that store the same encoded data with the failed node. Arbitrarily select a number of meta-check nodes equal to the value of α to download all meta-check symbols saved by the meta-check node; divide the data symbols belonging to the same sub-strip into the same symbol group;

(7b)利用与基本码对应的译码方法对每个符号组中的数据符号进行译码,恢复出该符号组中与失效信息节点对应的信息符号,并将其加入到该符号组中;(7b) utilize the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add it to the symbol group;

(7c)将属于同一个编码数据的符号组划为同一个数据集合;(7c) dividing the symbol groups belonging to the same encoded data into the same data set;

(8)判断所有失效节点中是否存在失效校验节点,若是,则执行步骤(9),否则,执行步骤(10);(8) Judge whether there is a failed check node in all the failed nodes, if so, execute step (9), otherwise, execute step (10);

(9)对信息符号进行编码:(9) Encode the information symbol:

利用失效校验节点对应的编码系数行矩阵对每个数据集合中所有符号组中所有的信息符号进行编码,恢复出失效校验节点对应的校验符号,并将恢复后的校验符号加入到与其对应同一个子条带的符号组中;All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same sub-strip;

(10)保存恢复出的数据符号:(10) Save the recovered data symbols:

增加与α的值相等数量的新的信息节点,增加与所有失效节点的失效校验节点的总数相等数量的新的校验节点,将恢复出的属于同一个信息节点的信息符号保存在同一个信息节点中,将恢复出的属于同一个校验节点的校验符号保存在同一个校验节点中;Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node;

(11)用新的节点替换掉失效的节点。(11) Replace the failed node with a new node.

本发明一种折叠式可扩展分布式存储编码扩展方法的步骤包括如下:The steps of a foldable scalable distributed storage coding extension method of the present invention include the following steps:

(1)添加新的节点:(1) Add a new node:

对除编码数据采用最后一个阶段的码进行编码的情况之外,向保存编码数据的节点中添加ρ个信息节点Y1、Y2、…、Yρ和γ个校验节点F1、F2、…、Fγ,其中,

Figure GDA0003835645220000081
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的信息节点数,
Figure GDA0003835645220000082
表示编码数据当前采用的码的编码参数中的信息节点数,
Figure GDA0003835645220000083
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的校验节点数,
Figure GDA0003835645220000084
表示编码数据当前采用的码的编码参数中的校验节点数;Add ρ information nodes Y 1 , Y 2 , . , ..., F γ , where,
Figure GDA0003835645220000081
Indicates the number of information nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure GDA0003835645220000082
Indicates the number of information nodes in the encoding parameter of the code currently used by the encoded data,
Figure GDA0003835645220000083
Indicates the number of check nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure GDA0003835645220000084
Indicates the number of check nodes in the encoding parameters of the code currently used by the encoded data;

(2)将编码数据中生成矩阵相互关联的两个子条带划为一个扩展组;(2) the two sub-strips in which the generation matrix is related to each other in the encoded data are divided into an extended group;

(3)合并每个扩展组内的两个子条带:(3) Merge the two sub-stripes within each extension group:

(3a)从保存编码数据的所有信息节点中下载并缓存每个扩展组中与右生成矩阵对应的子条带的所有的信息符号,将这些信息符号中位于子条带不同位置的信息符号分别迁移至信息节点Y1、Y2、…、Yρ中不同的信息节点中,将迁移后的信息符号和未迁移的信息符号组成合并后的子条带的信息符号;(3a) Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and separate the information symbols located at different positions of the sub-strip among these information symbols. Migrating to different information nodes in the information nodes Y 1 , Y 2 , .

(3b)将每个扩展组中两个子条带中位于同一个元校验节点的两个元校验符号在该元校验节点中相加,得到更新后的元校验符号;(3b) adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol;

(3c)利用与每个扩展组中子条带对应的左生成矩阵的互补矩阵对缓存的信息符号进行编码得到校正符号,利用校正符号对左生成矩阵对应的子条带的非元校验符号进行更新;(3c) Use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extension group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to encode the non-element check symbols of the sub-strip corresponding to the left generator matrix to update;

(3d)将每个扩展组中与右生成矩阵对应的子条带中不同位置的非元校验符号分别迁移至校验节点F1、F2、…、Fγ中不同的校验节点中;(3d) Migrate the non-meta-check symbols at different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F 1 , F 2 , . . . , F γ respectively ;

(3e)将更新后的元校验符号、更新后的非元校验符号和迁移后的校验符号组成合并后子条带的校验符号;(3e) the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip;

(4)将所有合并后的子条带的数据符号组成扩展后的编码数据。(4) The data symbols of all the merged sub-strips are formed into the extended coded data.

本发明与现有的技术相比具有以下优点:Compared with the prior art, the present invention has the following advantages:

第一,本发明的编码方法中在计算下一个阶段的编码参数时计算出的下一个阶段的校验节点数不少于当前阶段的校验节点数,克服了现有技术在存储扩展时单个条带内编码块的数量保持不变、系统扩展后的容错能力无法适应系统的节点规模的问题,使得利用本发明的编码方法构造的码字具有构造的下一个阶段的码时能够同时提高单个子条带内校验符号的数量、使系统扩展后的容错能力可以适应系统的节点规模的优点。First, in the coding method of the present invention, the number of check nodes in the next stage calculated when the coding parameters of the next stage are calculated is not less than the number of check nodes in the current stage, which overcomes the problem that the prior art requires a single check node during storage expansion. The number of coding blocks in the strip remains unchanged, and the fault-tolerant capability after system expansion cannot adapt to the node scale of the system, so that when the codeword constructed by the coding method of the present invention has the code of the next stage of construction, it can simultaneously improve the single-digit rate. The number of check symbols in the sub-strip enables the extended fault tolerance of the system to adapt to the node scale of the system.

第二,本发明的修复方法中在从未失效的节点选择节点下载数据符号时仅需选择与信息节点数相同数量的节点,克服了现有技术在对失效节点进行修复时需要连接的节点数量大于信息节点数量、修复度高的问题,使得本发明的修复方法具有在对失效节点进行修复时需要连接的节点数量等于信息节点数量,修复度较低的优点。Second, in the repair method of the present invention, when selecting a node from an unfailed node to download data symbols, only the same number of nodes as the number of information nodes need to be selected, which overcomes the number of nodes that need to be connected when repairing a failed node in the prior art The problems of being larger than the number of information nodes and having a high repair degree make the repair method of the present invention have the advantages that the number of nodes to be connected is equal to the number of information nodes when repairing a failed node, and the repair degree is low.

第三,本发明的扩展方法中通过对每个扩展组内的两个子条带进行合并可以实现对编码数据的扩展,由于编码组中有多个码,这样的扩展过程可以多次进行,克服了现有技术仅可以以低的带宽资源消耗扩展一次的问题,使得本发明的扩展方法具有可以以低的带宽资源消耗扩展多次的优点。Third, in the expansion method of the present invention, the expansion of the encoded data can be realized by merging the two sub-stripes in each expansion group. Since there are multiple codes in the encoding group, such an expansion process can be performed multiple times to overcome The problem that the prior art can only be extended once with low bandwidth resource consumption makes the extension method of the present invention have the advantage that it can be extended multiple times with low bandwidth resource consumption.

附图说明Description of drawings

图1为本发明的折叠式可扩展分布式存储编码的流程图;Fig. 1 is the flow chart of the foldable scalable distributed storage encoding of the present invention;

图2为本发明实施例中对待编码数据进行编码的示意图;2 is a schematic diagram of encoding data to be encoded in an embodiment of the present invention;

图3为本发明的折叠式可扩展分布式存储修复的流程图;Fig. 3 is the flow chart of the foldable scalable distributed storage repair of the present invention;

图4为本发明的折叠式可扩展分布式存储扩展的流程图;Fig. 4 is the flow chart of the foldable scalable distributed storage expansion of the present invention;

图5为本发明实施例中对编码数据进行扩展的示意图。FIG. 5 is a schematic diagram of extending encoded data in an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图和实施例,对本发明做进一步的描述。The present invention will be further described below with reference to the accompanying drawings and embodiments.

参照图1,对本发明的折叠式可扩展分布式存储编码方法的实现步骤做进一步的描述。Referring to FIG. 1 , the implementation steps of the foldable scalable distributed storage encoding method of the present invention will be further described.

步骤1,设定第1个阶段的编码参数。Step 1, set the encoding parameters of the first stage.

将信息节点数k1、校验节点数r1、元校验节点数s1设定为第1个阶段的编码参数,其中,k1、r1为正整数,s1为非负整数且小于等于r1The number of information nodes k 1 , the number of check nodes r 1 , and the number of meta-check nodes s 1 are set as the encoding parameters of the first stage, wherein k 1 and r 1 are positive integers, s 1 is a non-negative integer and less than or equal to r 1 .

在本发明实施例中,将第1个阶段的编码参数的信息节点数、校验节点数、元校验节点数分别设置为4、3、1。In the embodiment of the present invention, the number of information nodes, the number of check nodes, and the number of meta-check nodes of the coding parameters in the first stage are set to 4, 3, and 1, respectively.

步骤2,计算下一个阶段的编码参数。Step 2, calculate the encoding parameters of the next stage.

按照下式,计算下一个阶段的信息节点数、校验节点数:Calculate the number of information nodes and check nodes in the next stage according to the following formula:

k′=2kk'=2k

r′=2r-sr′=2r-s

其中,k′、r′分别表示当前阶段的下一个阶段的信息节点数、校验节点数,k、r、s分别表示当前阶段的信息节点数、校验节点数、元校验节点数;Among them, k' and r' respectively represent the number of information nodes and check nodes in the next stage of the current stage, and k, r and s respectively represent the number of information nodes, check nodes and meta-check nodes in the current stage;

从取值范围{s,s+1,…,2r-s}中选取一个与希望以低修复复杂度修复的同时失效的信息节点数最大值相等的值作为下一个阶段的元校验节点数。From the value range {s, s+1,..., 2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the number of meta-check nodes in the next stage .

步骤3,判断当前迭代得到的编码参数的总数是否是等于m,若是,则执行步骤4;否则,将本次确定的编码参数作为当前阶段的编码参数后执行步骤2;m表示设定的拟构造的编码组中码的总数,m的值为大于等于2的整数。Step 3, judge whether the total number of coding parameters obtained by the current iteration is equal to m, if so, execute step 4; otherwise, use the coding parameters determined this time as the coding parameters of the current stage and execute step 2; m represents the set simulation parameter. The total number of codes in the constructed coding group, and the value of m is an integer greater than or equal to 2.

步骤4,确定最终编码参数。Step 4, determine the final encoding parameters.

第1步,将当前迭代得到的编码参数中的校验节点数的值设定为当前迭代得到的编码参数中的元校验节点数;Step 1: Set the value of the number of check nodes in the encoding parameter obtained by the current iteration to the number of meta-check nodes in the encoding parameter obtained by the current iteration;

第2步,将当前迭代得到的信息节点数、校验节点数与确定的元校验节点数组成最终编码参数。In the second step, the number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters.

在本发明实施例中,设定编码组中的码的总数为3个,通过计算、选择可以得到第2个阶段的码的编码参数中信息节点数、校验节点数、元校验点数分别为8、5、2,第3个阶段的码的编码参数即最终编码参数中信息节点数、校验节点数、元校验点数分别为16、8、8。In the embodiment of the present invention, the total number of codes in the coding group is set to 3, and the number of information nodes, the number of check nodes, and the number of meta-check points in the coding parameters of the codes in the second stage can be obtained by calculation and selection, respectively. are 8, 5, and 2, and the coding parameters of the code in the third stage, that is, the number of information nodes, the number of check nodes, and the number of meta-check points in the final coding parameters are 16, 8, and 8, respectively.

步骤5,构建最后一个阶段对应的生成矩阵。Step 5, construct the generator matrix corresponding to the last stage.

将一个系统型MDS码作为基本码,利用基本码的生成矩阵构造方法,构建一个与最后一个阶段对应的生成矩阵GmTaking a systematic MDS code as the basic code, and using the generator matrix construction method of the basic code, a generator matrix G m corresponding to the last stage is constructed:

Figure GDA0003835645220000101
Figure GDA0003835645220000101

其中,Gm表示最后一个阶段对应的生成矩阵,

Figure GDA0003835645220000102
表示一个km阶的单位矩阵,km的取值等于最终编码参数中的信息节点数,
Figure GDA0003835645220000103
表示一个rm行km列的矩阵,rm的取值等于最终编码参数中的校验节点数。Among them, G m represents the generator matrix corresponding to the last stage,
Figure GDA0003835645220000102
represents a unit matrix of order k m , and the value of k m is equal to the number of information nodes in the final encoding parameter,
Figure GDA0003835645220000103
Represents a matrix with r m rows and k m columns, and the value of rm is equal to the number of check nodes in the final encoding parameter.

在本发明实施例中,采用一个系统型(24,16)RS作为基本码构造一个生成矩阵

Figure GDA0003835645220000104
In the embodiment of the present invention, a system type (24,16) RS is used as the basic code to construct a generator matrix
Figure GDA0003835645220000104

步骤6,设定生成矩阵的折叠规则。Step 6, set the folding rule of the generated matrix.

第1步,将待折叠的矩阵划分成A、B、X、U、V五个矩阵:其中,A表示待折叠的矩阵的第1行至第l1行组成的一个左信息矩阵,

Figure GDA0003835645220000105
表示当前阶段的上一阶段对应的编码参数中的信息节点数;B表示待折叠的矩阵的第l2行至第l3行组成的一个右信息矩阵,
Figure GDA0003835645220000106
X表示待折叠的矩阵的第l4行至前l5行组成的矩阵,
Figure GDA0003835645220000111
表示当前阶段的上一阶段对应的编码参数中的元校验节点数;U表示待折叠的矩阵的第l6行至第l7行组成的矩阵,
Figure GDA0003835645220000112
表示当前阶段的上一阶段对应的编码参数中的校验节点数;V表示待折叠的矩阵的第l8行至最后1行组成的一个右非元矩阵,
Figure GDA0003835645220000113
Step 1: Divide the matrix to be folded into five matrices of A, B, X, U, and V: where A represents a left information matrix composed of the 1st row to the 11th row of the matrix to be folded,
Figure GDA0003835645220000105
Represents the number of information nodes in the coding parameters corresponding to the previous stage of the current stage; B represents a right information matrix composed of the 12th row to the 13th row of the matrix to be folded,
Figure GDA0003835645220000106
X represents the matrix composed of the 14th row to the first 15th row of the matrix to be folded,
Figure GDA0003835645220000111
Represents the number of meta-check nodes in the coding parameters corresponding to the previous stage of the current stage; U represents the matrix formed by the 16th row to the 17th row of the matrix to be folded,
Figure GDA0003835645220000112
Represents the number of check nodes in the coding parameters corresponding to the previous stage of the current stage; V represents a right non-element matrix composed of the 18th row to the last row of the matrix to be folded,
Figure GDA0003835645220000113

第2步,生成一个与矩阵X行列元素均相等的矩阵,该矩阵的最后μ个非零列的元素全部置零后得到一个左元矩阵X′,

Figure GDA0003835645220000114
生成一个与矩阵X行列元素均相等的矩阵,该矩阵的前μ个非零列的元素全部置零后得到一个右元矩阵X″;将矩阵U的最后μ个非零列的元素全部置零后得到一个左非元矩阵U′;The second step is to generate a matrix with the same row and column elements as the matrix X. After the elements of the last μ non-zero columns of the matrix are all set to zero, a left-element matrix X' is obtained.
Figure GDA0003835645220000114
Generate a matrix with the same row and column elements as matrix X, and set all the elements of the first μ non-zero columns of the matrix to zero to obtain a right-element matrix X”; set all the elements of the last μ non-zero columns of the matrix U to zero Then get a left non-element matrix U';

第3步,按照下式,分别对矩阵A、矩阵X′、矩阵U′,矩阵B、矩阵X″、矩阵V进行组合,构建出对待折叠的矩阵折叠后的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵:Step 3: Combine matrix A, matrix X', matrix U', matrix B, matrix X'' and matrix V respectively according to the following formula to construct the matrix corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a generator matrix set:

Figure GDA0003835645220000115
Figure GDA0003835645220000115

Figure GDA0003835645220000116
Figure GDA0003835645220000116

其中,G′表示左生成矩阵、G″表示右生成矩阵。Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix.

步骤7,构建最后一个阶段的上一个阶段对应的生成矩阵集。Step 7: Construct the generator matrix set corresponding to the previous stage of the last stage.

第1步,根据生成矩阵的折叠规则,对最后一个阶段对应的生成矩阵进行折叠,Step 1, according to the folding rules of the generator matrix, fold the generator matrix corresponding to the last stage,

并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中;And add the generated matrix obtained by folding to the set of generating matrices corresponding to the previous stage of the current stage;

第2步,判断m-1的值是否等于2,若是,则执行步骤10,否则,将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤8。Step 2, judge whether the value of m-1 is equal to 2, if so, go to step 10, otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage and then go to step 8.

步骤8,构造上一阶段对应的生成矩阵集。Step 8: Construct the generator matrix set corresponding to the previous stage.

根据生成矩阵的折叠规则,对当前阶段对应的生成矩阵集中的每一个生成矩阵进行折叠,并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中。According to the folding rules of the generator matrix, each generator matrix in the generator matrix set corresponding to the current stage is folded, and the folded generator matrix is added to the generator matrix set corresponding to the previous stage of the current stage.

步骤9,判断当前得到的生成矩阵集的总数是否是等于m-1,若是,则执行步骤10,否则,将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤8。Step 9: Determine whether the total number of generator matrix sets currently obtained is equal to m-1, if so, perform step 10; otherwise, perform step 8 after taking the generator matrix set determined this time as the generator matrix set corresponding to the current stage.

在本发明实施例中,为便于描述,将矩阵P8×16中第1至第8行对应的编码系数行矩阵分别表示为p1、p2、…、p8,用符号

Figure GDA0003835645220000128
表示保留pq中第u到v个元素并将其余元素置零后得到的编码系数行矩阵,1≤q≤8,1≤u<v≤16,用符号Eu′×u′表示一个u′行u′列的单位矩阵,1≤u′≤16,用符号Ou″×v″表示一个u″行v″列的全零矩阵,1≤u″≤16,1≤v″≤16。首先,将最后一阶段对应的生成矩阵G3划分为五个矩阵:左信息矩阵A3=[E8×88×8]、右信息矩阵B3=[O8×8|E8×8]、矩阵
Figure GDA0003835645220000121
矩阵
Figure GDA0003835645220000122
右非元矩阵
Figure GDA0003835645220000123
生成两个与矩阵X3相同的矩阵并将其中一个矩阵的最后8个非零列的元素全部置零,得到构造的元矩阵
Figure GDA0003835645220000124
将另一个矩阵的前8个非零列的元素全部置零,得到构造的右元矩阵
Figure GDA0003835645220000125
将矩阵U3的最后8个非零列的元素全部置零后得到左非元矩阵
Figure GDA0003835645220000126
分别对矩阵A3、矩阵X1 3、矩阵U1 3,矩阵B3、矩阵X2 3、矩阵V3进行组合,构建出对矩阵G3折叠后得到的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵
Figure GDA0003835645220000127
即:In this embodiment of the present invention, for the convenience of description, the coding coefficient row matrices corresponding to the 1st to 8th rows in the matrix P 8×16 are represented as p 1 , p 2 , . . . , p 8 , respectively, with symbols
Figure GDA0003835645220000128
Represents the coding coefficient row matrix obtained after retaining the uth to vth elements in p q and setting the remaining elements to zero, 1≤q≤8, 1≤u<v≤16, and the symbol E u′×u′ represents a u The identity matrix of 'row and u' column, 1≤u'≤16, use the symbol O u"×v" to represent an all-zero matrix with u" row and v" column, 1≤u"≤16, 1≤v"≤16 . First, divide the generator matrix G 3 corresponding to the last stage into five matrices: left information matrix A 3 =[E 8×8 |O 8×8 ], right information matrix B 3 =[O 8×8 |E 8 ×8 ], matrix
Figure GDA0003835645220000121
matrix
Figure GDA0003835645220000122
right non-element matrix
Figure GDA0003835645220000123
Generate two matrices identical to matrix X 3 and zero out the elements of the last 8 non-zero columns of one of the matrices to get the constructed metamatrix
Figure GDA0003835645220000124
Set all the elements of the first 8 non-zero columns of another matrix to zero to get the constructed right-element matrix
Figure GDA0003835645220000125
The left non-element matrix is obtained by setting all the elements of the last 8 non-zero columns of the matrix U 3 to zero
Figure GDA0003835645220000126
Respectively combine matrix A 3 , matrix X 1 3 , matrix U 1 3 , matrix B 3 , matrix X 2 3 , and matrix V 3 to construct the generation corresponding to the previous stage of the current stage obtained by folding matrix G 3 Two interrelated generator matrices in a matrix set
Figure GDA0003835645220000127
which is:

Figure GDA0003835645220000131
Figure GDA0003835645220000131

Figure GDA0003835645220000132
Figure GDA0003835645220000132

G2,1、G2,1即为第2个阶段的生成矩阵集中的2个生成矩阵。G 2,1 and G 2,1 are the two generator matrices in the generator matrix set in the second stage.

采用同样的方法分别对G2,1、G2,2进行折叠,可以分别得到两个生成矩阵G1,1、G1,2,G1,3、G1,4Using the same method to fold G 2,1 and G 2,2 respectively, two generating matrices G 1,1 , G 1,2 , G 1,3 and G 1,4 can be obtained respectively:

Figure GDA0003835645220000133
Figure GDA0003835645220000133

Figure GDA0003835645220000134
Figure GDA0003835645220000134

Figure GDA0003835645220000135
Figure GDA0003835645220000135

Figure GDA0003835645220000141
Figure GDA0003835645220000141

G1,1、G1,2,G1,3、G1,4即为第1个阶段的生成矩阵集中的4个生成矩阵。G 1,1 , G 1,2 , G 1,3 , and G 1,4 are the four generator matrices in the generator matrix set in the first stage.

步骤10,确定各阶段的码。Step 10: Determine the codes of each stage.

将每一个阶段对应的编码参数作为其对应的码的编码参数;将最后一个阶段对应的生成矩阵作为最后一个阶段的码的子条带的生成矩阵;将除最后一个阶段以外的其他每个阶段对应的生成矩阵集中的各生成矩阵作为其对应的码的各子条带的生成矩阵。Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of the corresponding code.

步骤11,将所有阶段的码组合成一个编码组。Step 11: Combine codes of all stages into a coding group.

步骤12,从编码组中选择一个码,所选码的编码参数中信息节点数与校验节点数的和等于希望采用的节点总数。Step 12, select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes to be used.

在本发明实施例中,希望采用的节点总数为7个节点,故从编码组中选择第1个阶段的码。In the embodiment of the present invention, the total number of nodes expected to be used is 7 nodes, so the code of the first stage is selected from the coding group.

步骤13,对待编码数据进行编码。Step 13, encoding the data to be encoded.

将待编码数据平均划分为t个信息符号,t=km;用所选码的每一个子条带对应的生成矩阵分别对待编码数据进行编码,得到该子条带的数据符号,由所有子条带的数据符号组成编码数据;将编码数据保存到相应节点中。Divide the data to be encoded into t information symbols on average, t =km; encode the data to be encoded with the corresponding generator matrix of each sub-strip of the selected code, obtain the data symbols of this sub-strip, by all sub-strips The data symbols of the strip make up the encoded data; the encoded data is stored in the corresponding node.

所述的子条带的数据符号包括信息符号和校验符号两类,校验符号包括元校验符号和非元校验符号两小类,所述数据符号通过生成矩阵中各行对应的编码系数行矩阵对待编码数据进行编码得到;所述信息符号为左生成矩阵中的左信息矩阵或右生成矩阵中的右信息矩阵对待编码数据进行编码得到的数据符号;所述元校验符号为左生成矩阵中的左元矩阵或右生成矩阵中的右元矩阵对待编码数据进行编码得到的数据符号;所述非元校验符号为左生成矩阵中的左非元矩阵或右生成矩阵中的右非元矩阵对待编码数据进行编码得到的数据符号。The data symbols of the sub-strips include two types of information symbols and check symbols, and the check symbols include two subclasses of meta-check symbols and non-meta-check symbols. The data symbols are generated by encoding coefficients corresponding to each row in the generation matrix. The row matrix is obtained by encoding the data to be encoded; the information symbol is the data symbol obtained by encoding the data to be encoded by the left information matrix in the left generator matrix or the right information matrix in the right generator matrix; the meta-check symbol is the left generator The left element matrix in the matrix or the right element matrix in the right generator matrix is the data symbol obtained by encoding the data to be encoded; the non-element check symbol is the left non-element matrix in the left generator matrix or the right non-element matrix in the right generator matrix. The metamatrix is the data symbol obtained by encoding the data to be encoded.

所述的编码数据保存到相应节点指的是,每一个子条带中不同位置的数据符号保存在不同的节点中,位于不同子条带中相同位置的数据符号保存在同一个节点中;所述的节点包括信息节点和校验节点两类,所述校验节点包括元校验节点与非元校验节点两小类;所述信息节点用于保存信息符号;所述元校验节点用于保存元校验符号;所述非元校验节点用于保存非元校验符号。The said encoded data is stored in the corresponding node means that the data symbols at different positions in each sub-strip are stored in different nodes, and the data symbols at the same position in different sub-strips are stored in the same node; The nodes described include two types of information nodes and check nodes, and the check nodes include two sub-categories of meta-check nodes and non-meta-check nodes; the information nodes are used to store information symbols; the meta-check nodes are used for for storing meta-check symbols; the non-meta-check nodes are used for storing non-meta-check symbols.

参照图2,对本发明实施例中对待编码数据进行编码的实现步骤做进一步的描述。Referring to FIG. 2 , the implementation steps of encoding the data to be encoded in the embodiment of the present invention will be further described.

图2中D1、D2、D3、D4表示4个信息节点,C1、C2、C3表示3个校验节点,a1、a2、…、a16表示16个信息符号,M表示待编码数据,M=[a1,a2,…,a16]T,T表示转置操作。用第1个阶段的码的4个子条带对应的生成矩阵G1,1、G1,2、G1,3、G1,4分别对待编码数据M进行编码,得到各子条带的数据如下:In Figure 2, D 1 , D 2 , D 3 , and D 4 represent 4 information nodes, C 1 , C 2 , and C 3 represent 3 check nodes, and a 1 , a 2 , ..., a 16 represent 16 information symbols , M represents the data to be encoded, M=[a 1 , a 2 , . . . , a 16 ] T , T represents the transposition operation. Use the generator matrix G 1,1 , G 1,2 , G 1,3 , G 1,4 corresponding to the four sub-stripes of the code in the first stage to encode the data M to be encoded, respectively, to obtain the data of each sub-strip as follows:

第1个子条带包括数据符号a1、a2、a3、a4

Figure GDA0003835645220000151
The first sub-strip includes data symbols a 1 , a 2 , a 3 , a 4 ,
Figure GDA0003835645220000151

第2个子条带包括数据符号a5、a6、a7、a8

Figure GDA0003835645220000152
The second sub-strip includes data symbols a 5 , a 6 , a 7 , a 8 ,
Figure GDA0003835645220000152

第3个子条带包括数据符号a9、a10、a11、a12

Figure GDA0003835645220000153
The third sub-strip includes data symbols a 9 , a 10 , a 11 , a 12 ,
Figure GDA0003835645220000153

第4个子条带包括数据符号a13、a14、a15、a16

Figure GDA0003835645220000154
p7M、p8M。The 4th substrip includes data symbols a 13 , a 14 , a 15 , a 16 ,
Figure GDA0003835645220000154
p7M , p8M .

由以上4个子条带的数据共同组成了一个编码数据。将第1个子条带中的第1至第4个信息符号依次保存在信息节点D1、D2、D3、D4中,第1至第3个校验符号依次保存在校验节点C1、C2、C3中;其他子条带的数据符号采用与子条带1相同的方式进行保存。The data of the above 4 sub-strips together constitute one encoded data. Store the 1st to 4th information symbols in the first sub-strip in the information nodes D 1 , D 2 , D 3 , and D 4 in sequence, and store the 1st to 3rd check symbols in the check node C in sequence 1 , C 2 , and C 3 ; the data symbols of other sub-stripes are stored in the same way as sub-strip 1 .

参照图3,对本发明的折叠式可扩展分布式存储编码修复方法的实现步骤做进一步的描述。Referring to FIG. 3 , the implementation steps of the foldable and scalable distributed storage code repair method of the present invention are further described.

步骤1,将每个编码数据采用同一个码编码的失效节点的总数大于该码编码参数中的校验节点数的情况放弃修复,其余情况执行步骤2。Step 1: Abandon repair if the total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and step 2 is performed in other cases.

在本发明实施例中,假设图2中所有的节点中有2个节点失效。In this embodiment of the present invention, it is assumed that 2 of all the nodes in FIG. 2 fail.

步骤2,判断所有失效节点的失效信息节点的总数是否为非0值,若是,则执行步骤3;否则,判定不存在失效信息节点但存在失效校验节点后执行步骤6。Step 2, determine whether the total number of failure information nodes of all failed nodes is a non-zero value, if so, go to step 3;

步骤3,判断

Figure GDA0003835645220000155
是否成立,若是,则执行步骤4,否则,执行步骤7;其中,α表示所有失效节点的失效信息节点的总数,
Figure GDA0003835645220000156
表示每个编码数据采用同一个码编码的参数中的元校验节点数、λ表示所有失效校验节点中失效元校验节点的总数。Step 3, judge
Figure GDA0003835645220000155
Whether it is true, if yes, go to step 4, otherwise, go to step 7; where α represents the total number of failure information nodes of all failed nodes,
Figure GDA0003835645220000156
Represents the number of meta-check nodes in the parameters encoded with the same code for each encoded data, and λ represents the total number of failed meta-check nodes in all failed check nodes.

在本发明实施例中,假设图2中失效的2个节点为D3、D4,即失效信息节点的总数为2个,失效元校验节点的数量为0个。In the embodiment of the present invention, it is assumed that the two failed nodes in FIG. 2 are D 3 and D 4 , that is, the total number of failed information nodes is two, and the number of failed meta-check nodes is zero.

步骤4,划分数据符号。Step 4, dividing the data symbols.

从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的每个未失效的元校验节点中下载该元校验节点保存的所有的元校验符号,从与失效节点保存同一个编码数据的所有未失效的非元校验节点中任意选择η个非元校验节点下载该非元校验节点中所有的非元校验符号;将属于同一个子条带的数据符号划为同一个符号组,将属于同一个编码数据的符号组划为同一个数据集合;其中η的值等于

Figure GDA0003835645220000161
Download all information symbols stored by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download from each non-invalidated meta-check node that stores the same encoded data as the failed node. For all the meta-check symbols stored in the meta-check node, select n non-meta-check nodes arbitrarily from all non-meta-check nodes that store the same encoded data as the failed node to download the non-meta-check node All non-meta-check symbols in ; the data symbols belonging to the same sub-strip are divided into the same symbol group, and the symbol groups belonging to the same encoded data are divided into the same data set; wherein the value of n is equal to
Figure GDA0003835645220000161

在本发明实施例中,从图2所示的节点D1、D2中下载所有的信息符号,从元校验节点C1中下载校验符号

Figure GDA0003835645220000162
选择非元校验节点C2并从C2中下载校验符号
Figure GDA0003835645220000163
将下载的数据符号划分为4个符号组:对应第1个子条带的符号组为
Figure GDA0003835645220000164
对应第2个子条带的符号组为
Figure GDA0003835645220000165
对应第3个子条带的符号组为
Figure GDA0003835645220000166
对应第4个子条带的符号组为
Figure GDA0003835645220000167
In this embodiment of the present invention, all information symbols are downloaded from the nodes D 1 and D 2 shown in FIG. 2 , and check symbols are downloaded from the meta-check node C 1
Figure GDA0003835645220000162
Select non-meta check node C 2 and download check symbols from C 2
Figure GDA0003835645220000163
The downloaded data symbols are divided into 4 symbol groups: the symbol group corresponding to the first sub-strip is
Figure GDA0003835645220000164
The symbol group corresponding to the second sub-strip is
Figure GDA0003835645220000165
The symbol group corresponding to the third sub-strip is
Figure GDA0003835645220000166
The symbol group corresponding to the 4th sub-strip is
Figure GDA0003835645220000167

步骤5,对每个数据集合进行处理。Step 5, process each data set.

第1步,按照下式,对每一个数据集合中的每一个符号组进行编号:Step 1, according to the following formula, number each symbol group in each data set:

Figure GDA0003835645220000168
Figure GDA0003835645220000168

其中,jh,c表示第h个数据集合中第c个符号组的编号,

Figure GDA00038356452200001610
表示向上取证操作,qh,c表示第h个数据集合中第c个符号组对应子条带的生成矩阵中第一行元素中非零元素的列的序号,
Figure GDA0003835645220000169
表示每个编码数据采用同一个码编码的参数中的信息节点数;Among them, j h,c represents the number of the c-th symbol group in the h-th data set,
Figure GDA00038356452200001610
Represents the upward forensic operation, q h,c represents the sequence number of the column of the non-zero element in the first row element in the generator matrix of the c-th symbol group corresponding to the sub-strip in the h-th data set,
Figure GDA0003835645220000169
Indicates the number of information nodes in the parameters encoded by the same code for each encoded data;

第2步,对于每一个数据集合从该数据集合中编号为1的符号组依次对该数据集合中的每个符号组进行译码-消除操作。Step 2, for each data set, sequentially perform decoding-elimination operations on each symbol group in the data set from the symbol group numbered 1 in the data set.

所述的译码-消除操作是先进行译码操作再进行消除操作,译码操作指的是,利用与基本码对应的译码方法对该数据集合中的当前符号组中的数据符号进行译码,恢复出失效信息节点对应的信息符号,并将其加入到该数据集合中的当前符号组中;消除操作指的是,利用当前符号组中的信息符号对该数据集合中编号大于当前符号组的编号的每一个符号组中的校验符号进行消除;消除是指若校验符号的编码系数行矩阵中与信息符号对应的元素的值为非零值,则将这些信息符号与对应的非零值相乘后从校验符号中减去,得到消除后的校验符号。The decoding-elimination operation is to first perform a decoding operation and then perform an elimination operation. The decoding operation refers to decoding the data symbols in the current symbol group in the data set by using a decoding method corresponding to the basic code. code, recover the information symbol corresponding to the failed information node, and add it to the current symbol group in the data set; the elimination operation refers to using the information symbol in the current symbol group to use the information symbol in the data set with a number greater than the current symbol The check symbols in each symbol group of the group number are eliminated; elimination means that if the value of the elements corresponding to the information symbols in the coding coefficient row matrix of the check symbols is non-zero, then these information symbols are compared with the corresponding ones. The non-zero values are multiplied and subtracted from the check symbol to obtain the eliminated check symbol.

在本发明实施例中,与图2所示的第1至第4个子条带对应的符号集合依次被编号为1、2、3、4。对编号为1的符号组

Figure GDA0003835645220000171
进行RS译码恢复出信息符号a3,a4;然后,利用编号为1的符号组中的信息符号a1,a2,a3,a4对编号为2、3、4的每一个符号组中的校验符号进行消除:在编号为2的符号组中,由校验符号
Figure GDA0003835645220000172
的编码系数行矩阵
Figure GDA0003835645220000173
可知
Figure GDA0003835645220000174
中与信息符号a1,a2,a3,a4对应的元素的值为非零值,故从
Figure GDA0003835645220000175
中减去
Figure GDA0003835645220000176
得到消除后的校验符号
Figure GDA0003835645220000177
编号为2的符号组被更新为
Figure GDA0003835645220000178
采用同样的方法对编号为3、4的符号组中的校验符号进行消除后,编号为3的符号组保持不变、编号为4的符号组更新为
Figure GDA0003835645220000179
采用与上述过程相同的方法,可以依次恢复出信息符号a7、a8,a11、a12,a15、a16。In this embodiment of the present invention, the symbol sets corresponding to the first to fourth sub-strips shown in FIG. 2 are numbered as 1, 2, 3, and 4 in sequence. pair number 1 symbol group
Figure GDA0003835645220000171
Perform RS decoding to recover the information symbols a 3 , a 4 ; then, use the information symbols a 1 , a 2 , a 3 , a 4 in the symbol group numbered 1 to pair each symbol numbered 2, 3, and 4 The check symbol in the group is eliminated: in the symbol group numbered 2, the check symbol is
Figure GDA0003835645220000172
The encoded coefficient row matrix of
Figure GDA0003835645220000173
know
Figure GDA0003835645220000174
The values of the elements corresponding to the information symbols a 1 , a 2 , a 3 , and a 4 are non-zero, so from
Figure GDA0003835645220000175
subtract
Figure GDA0003835645220000176
Get the check symbol after elimination
Figure GDA0003835645220000177
Symbol group number 2 is updated to
Figure GDA0003835645220000178
After the check symbols in the symbol groups numbered 3 and 4 are eliminated by the same method, the symbol group numbered 3 remains unchanged, and the symbol group numbered 4 is updated to
Figure GDA0003835645220000179
Using the same method as the above process, the information symbols a 7 , a 8 , a 11 , a 12 , a 15 , and a 16 can be recovered in sequence.

第3步,所有的数据集合均已进行处理后执行步骤9。In step 3, step 9 is executed after all data sets have been processed.

步骤6,划分信息符号。Step 6, dividing information symbols.

从与失效节点保存同一个编码数据的每个信息节点中下载该信息节点保存的所有的信息符号,将属于同一个子条带的信息符号划为同一个符号组;将属于同一个编码数据的符号组划为同一个数据集合后执行步骤9。Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data After the group is divided into the same data set, perform step 9.

步骤7,恢复失效信息节点对应的信息符号。Step 7, restore the information symbol corresponding to the failed information node.

第1步,从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的所有未失效的元校验节点中任意选择与α的值相等数量的元校验节点下载该元校验节点保存的所有的元校验符号;将属于同一个子条带的数据符号划为同一个符号组;Step 1: Download all the information symbols saved by the information node from each non-invalidated information node that saves the same encoded data as the failed node, and check all the non-invalidated meta-checks from all the non-invalidated information nodes that store the same encoded data as the failed node. The node arbitrarily selects the same number of meta-check nodes as the value of α to download all the meta-check symbols saved by the meta-check node; divides the data symbols belonging to the same sub-strip into the same symbol group;

第2步,利用与基本码对应的译码方法对每个符号组中的数据符号进行译码,恢复出该符号组中与失效信息节点对应的信息符号,并将其加入到该符号组中;Step 2: Use the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add them to the symbol group ;

第3步,将属于同一个编码数据的符号组划为同一个数据集合。Step 3: Divide the symbol groups belonging to the same encoded data into the same data set.

步骤8,判断所有失效节点中是否存在失效校验节点,若是,则执行步骤9,否则,执行步骤10。Step 8, determine whether there is a failed check node in all the failed nodes, if yes, go to Step 9, otherwise, go to Step 10.

步骤9,对信息符号进行编码。Step 9, encode the information symbol.

利用失效校验节点对应的编码系数行矩阵对每个数据集合中所有符号组中所有的信息符号进行编码,恢复出失效校验节点对应的校验符号,并将恢复后的校验符号加入到与其对应同一个子条带的符号组中。All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same substrip.

步骤10,保存恢复出的数据符号。Step 10, save the recovered data symbols.

增加与α的值相等数量的新的信息节点,增加与所有失效节点的失效校验节点的总数相等数量的新的校验节点,将恢复出的属于同一个信息节点的信息符号保存在同一个信息节点中,将恢复出的属于同一个校验节点的校验符号保存在同一个校验节点中。Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node.

本发明实施例中,将恢复出的信息符号a3、a7、a11、a15保存到一个新的节点D3′,将恢复出的信息符号a4、a8、a12、a16保存在另一个新的节点D4′。In the embodiment of the present invention, the recovered information symbols a 3 , a 7 , a 11 , and a 15 are stored in a new node D 3 ′, and the recovered information symbols a 4 , a 8 , a 12 , and a 16 are stored Save in another new node D4 '.

步骤11,用新的节点替换掉失效的节点。Step 11, replace the failed node with a new node.

本发明实施例中,用节点D3′、D4′替换掉失效的节点D3、D4In the embodiment of the present invention, the failed nodes D 3 , D 4 are replaced with nodes D 3 ′, D 4 ′.

参照图4,对本发明的折叠式可扩展分布式存储编码扩展方法的实现步骤做进一步的描述。Referring to FIG. 4 , the implementation steps of the foldable scalable distributed storage coding extension method of the present invention will be further described.

步骤1,添加新的节点。Step 1, add a new node.

对除编码数据采用最后一个阶段的码进行编码的情况之外,向保存编码数据的节点中添加ρ个信息节点Y1、Y2、…、Yρ和γ个校验节点F1、F2、…、Fγ,其中,

Figure GDA0003835645220000181
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的信息节点数,
Figure GDA0003835645220000182
表示编码数据当前采用的码的编码参数中的信息节点数,
Figure GDA0003835645220000183
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的校验节点数,
Figure GDA0003835645220000184
表示编码数据当前采用的码的编码参数中的校验节点数。Add ρ information nodes Y 1 , Y 2 , . , ..., F γ , where,
Figure GDA0003835645220000181
Indicates the number of information nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure GDA0003835645220000182
Indicates the number of information nodes in the encoding parameter of the code currently used by the encoded data,
Figure GDA0003835645220000183
Indicates the number of check nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure GDA0003835645220000184
Indicates the number of check nodes in the encoding parameter of the code currently used by the encoded data.

参照图5,对本发明实施例中编码数据扩展的实现步骤做进一步的描述。Referring to FIG. 5 , the implementation steps of coded data extension in the embodiment of the present invention will be further described.

本发明实施例中对图2所示的编码数据进行扩展,图5(a)表示在图2的基础上添加节点后的编码数据存储示意图,其中,D5、D6、D7、D8分别表示在图2的基础上新添加的4个信息节点,C4、C5分别表示在图2的基础上新添加的2个校验节点。图5(b)表示对编码数据扩展后的编码数据存储示意图。In the embodiment of the present invention, the coded data shown in FIG. 2 is extended, and FIG. 5(a) shows a schematic diagram of coded data storage after adding nodes on the basis of FIG. 2 , wherein D 5 , D 6 , D 7 , and D 8 Respectively represent four information nodes newly added on the basis of FIG. 2 , and C 4 and C 5 respectively represent two newly added check nodes on the basis of FIG. 2 . FIG. 5(b) is a schematic diagram showing the storage of encoded data after extending the encoded data.

步骤2,将编码数据中生成矩阵由同一个矩阵折叠得到的两个子条带划为一个扩展组。Step 2: Divide the two sub-strips obtained by folding the generated matrix from the same matrix in the encoded data into an extended group.

步骤3,合并每个扩展组内的两个子条带。Step 3, merge the two sub-stripes within each expansion group.

第1步,从保存编码数据的所有信息节点中下载并缓存每个扩展组中与右生成矩阵对应的子条带的所有的信息符号,将这些信息符号中位于子条带不同位置的信息符号分别迁移至信息节点Y1、Y2、…、Yρ中不同的信息节点中,将迁移后的信息符号和未迁移的信息符号组成合并后的子条带的信息符号;Step 1: Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and put the information symbols located in different positions of the sub-strip among these information symbols. Respectively migrate to different information nodes in the information nodes Y 1 , Y 2 , .

第2步,将每个扩展组中两个子条带中位于同一个元校验节点的两个元校验符号在该元校验节点中相加,得到更新后的元校验符号;Step 2, adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol;

第3步,利用与每个扩展组中子条带对应的左生成矩阵的互补矩阵对缓存的信息符号进行编码得到校正符号,利用校正符号对左生成矩阵对应的子条带的非元校验符号进行更新。The third step is to use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extended group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to check the non-element of the sub-strip corresponding to the left generator matrix. symbols are updated.

所述的每个扩展组中子条带对应的左生成矩阵的互补矩阵指的是,将与所述扩展组中子条带对应的左生成矩阵对应的被折叠的矩阵表示为矩阵

Figure GDA0003835645220000191
将所述扩展组中子条带对应的左生成矩阵中的左非元矩阵表示为矩阵W,用矩阵
Figure GDA0003835645220000192
的第l6′行至第l7′行的元素组成一个矩阵
Figure GDA0003835645220000193
表示编码数据当前采用的码的编码参数中的元校验节点数,将
Figure GDA0003835645220000194
得到的矩阵表示为矩阵T,用矩阵T中的所有非零列组成扩展组中子条带对应的左生成矩阵的互补矩阵。The complementary matrix of the left generator matrix corresponding to the sub-strip in each expansion group refers to that the collapsed matrix corresponding to the left generator matrix corresponding to the sub-strip in the expansion group is represented as a matrix
Figure GDA0003835645220000191
Denote the left non-element matrix in the left generator matrix corresponding to the sub-strip in the extended group as a matrix W, and use the matrix
Figure GDA0003835645220000192
The elements from row l 6 ' to row l 7 ' form a matrix
Figure GDA0003835645220000193
Indicates the number of meta-check nodes in the encoding parameters of the code currently used by the encoded data, set
Figure GDA0003835645220000194
The resulting matrix is denoted as matrix T, and all non-zero columns in matrix T are used to form the complementary matrix of the left generator matrix corresponding to the substrips in the extended group.

第4步,将每个扩展组中与右生成矩阵对应的子条带中不同位置的非元校验符号分别迁移至校验节点F1、F2、…、Fγ中不同的校验节点中;Step 4: Migrate the non-element check symbols from different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F 1 , F 2 , . . . , F γ respectively middle;

第5步,将更新后的元校验符号、更新后的非元校验符号和迁移后的校验符号组成合并后子条带的校验符号。In step 5, the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip.

本发明实施例中,由于图5(a)所示的第1个子条带的生成矩阵G1,1和第2个子条带的生成矩阵G1,2是相互关联的,故第1个子条带与第2个子条带划为一个扩展组,同样的,第3个子条带与第4个子条带划为一个扩展组。对于由第1个子条带与第2个子条带组成的扩展组,下载并缓存第2个子条带中所有的信息符号a5、a6、a7、a8,将这些信息符号依次迁移至节点D5、D6、D7、D8,迁移后的符号和未迁移的信息符号a1、a2、a3、a4组成了合并后的子条带的信息符号。在节点C1内部,通过合并

Figure GDA0003835645220000195
Figure GDA0003835645220000196
得到更新后的元校验符号
Figure GDA0003835645220000197
Figure GDA0003835645220000201
左生成矩阵G1,1对应的互补矩阵为
Figure GDA0003835645220000202
通过利用互补矩阵对缓存的信息符号a5、a6、a7、a8进行编码得到校正符号
Figure GDA0003835645220000203
Figure GDA0003835645220000204
利用得到的两个校正符号分别对
Figure GDA0003835645220000205
进行消除,即从
Figure GDA0003835645220000206
中减去
Figure GDA0003835645220000207
Figure GDA0003835645220000208
中减去
Figure GDA0003835645220000209
得到更新后的两个校验符号
Figure GDA00038356452200002010
将第2个子条带中的非元校验符号
Figure GDA00038356452200002011
分别迁移至节点C4、C5;将更新后的元校验符号、更新后的非元校验符号、迁移后的校验符号组成合并后子条带所有的校验符号。图5(b)中左侧的子条带为对第1个子条带、第2个子条带合并后得到的结果。采用同样的方式对第3个子条带、第4个子条带进行合并可以得到另外一个新的子条带。图5(b)中右侧的子条带为对第3个子条带、第4个子条带合并后得到的结果。In this embodiment of the present invention, since the generator matrix G 1,1 of the first sub-strip and the generator matrix G 1,2 of the second sub-strip shown in FIG. 5(a) are related to each other, the first sub-strip is The strip and the second sub-strip are grouped into an extended group, and similarly, the third and fourth sub-strips are grouped into an extended group. For the extended group consisting of the first sub-strip and the second sub-strip, download and cache all the information symbols a 5 , a 6 , a 7 , a 8 in the second sub-strip, and migrate these information symbols to Nodes D 5 , D 6 , D 7 , D 8 , the migrated symbols and the non-migrated information symbols a 1 , a 2 , a 3 , and a 4 form the information symbols of the merged sub-strip. Inside node C1 , by merging
Figure GDA0003835645220000195
and
Figure GDA0003835645220000196
Get the updated meta check symbol
Figure GDA0003835645220000197
which is
Figure GDA0003835645220000201
The complementary matrix corresponding to the left generator matrix G 1,1 is
Figure GDA0003835645220000202
Correction symbols are obtained by encoding the buffered information symbols a 5 , a 6 , a 7 , a 8 with complementary matrices
Figure GDA0003835645220000203
Figure GDA0003835645220000204
Using the obtained two correction symbols, respectively
Figure GDA0003835645220000205
to eliminate, i.e. from
Figure GDA0003835645220000206
subtract
Figure GDA0003835645220000207
from
Figure GDA0003835645220000208
subtract
Figure GDA0003835645220000209
Get the updated two check symbols
Figure GDA00038356452200002010
put the non-meta check symbols in the 2nd substrip
Figure GDA00038356452200002011
Migrate to nodes C 4 and C 5 respectively; compose the updated meta-check symbols, updated non-meta-check symbols, and migrated check symbols to form all the check symbols of the merged sub-stripes. The sub-strip on the left in Figure 5(b) is the result obtained by merging the first sub-strip and the second sub-strip. In the same way, the third sub-strip and the fourth sub-strip can be merged to obtain another new sub-strip. The sub-strip on the right in Figure 5(b) is the result obtained by merging the third sub-strip and the fourth sub-strip.

步骤4,将所有合并后的子条带的数据符号组成扩展后的编码数据。In step 4, the data symbols of all merged sub-strips are formed into the expanded coded data.

在本发明实施例中,将图5(b)中所示的两个新的子条带所有的数据符号组成了扩展后的编码数据。In the embodiment of the present invention, all the data symbols of the two new sub-strips shown in FIG. 5(b) form the extended coded data.

Claims (7)

1.一种折叠式可扩展分布式存储编码方法,其特征在于,计算下一个阶段的编码参数,根据生成矩阵的折叠规则构建上一个阶段对应的生成矩阵集,确定各阶段对应的码,将所有阶段对应的码组成一个编码组;该方法的步骤包括如下:1. a foldable scalable distributed storage coding method is characterized in that, calculating the coding parameters of the next stage, constructing the corresponding generating matrix set of the previous stage according to the folding rule of the generating matrix, determining the code corresponding to each stage, The codes corresponding to all stages form a coding group; the steps of the method include the following: (1)设定第1个阶段的编码参数:(1) Set the encoding parameters of the first stage: 将信息节点数k1、校验节点数r1、元校验节点数s1设定为第1个阶段的编码参数,其中,k1、r1为正整数,s1为非负整数且小于等于r1The number of information nodes k 1 , the number of check nodes r 1 , and the number of meta-check nodes s 1 are set as the encoding parameters of the first stage, wherein k 1 and r 1 are positive integers, s 1 is a non-negative integer and less than or equal to r 1 ; (2)计算下一个阶段的编码参数:(2) Calculate the encoding parameters of the next stage: (2a)按照下式,计算下一个阶段的信息节点数、校验节点数:(2a) Calculate the number of information nodes and check nodes in the next stage according to the following formula: k′=2kk'=2k r′=2r-sr′=2r-s 其中,k′、r′分别表示当前阶段的下一个阶段的信息节点数、校验节点数,k、r、s分别表示当前阶段的信息节点数、校验节点数、元校验节点数;Among them, k' and r' respectively represent the number of information nodes and check nodes in the next stage of the current stage, and k, r and s respectively represent the number of information nodes, check nodes and meta-check nodes in the current stage; (2b)从取值范围{s,s+1,…,2r-s}中选取一个与希望以低修复复杂度修复的同时失效的信息节点数最大值相等的值作为下一个阶段的元校验节点数;(2b) From the value range {s,s+1,...,2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the meta-calibration of the next stage Check the number of nodes; (3)判断当前迭代得到的编码参数的总数是否是等于m,若是,则执行步骤(4);否则,将本次确定的编码参数作为当前阶段的编码参数后执行步骤(2);m表示设定的拟构造的编码组中码的总数,m的值为大于等于2的整数;(3) Judging whether the total number of coding parameters obtained by the current iteration is equal to m, and if so, execute step (4); otherwise, execute step (2) after taking the coding parameters determined this time as the coding parameters of the current stage; m represents The total number of codes in the set coding group to be constructed, and the value of m is an integer greater than or equal to 2; (4)确定最终编码参数:(4) Determine the final encoding parameters: (4a)将当前迭代得到的编码参数中的校验节点数的值设定为当前迭代得到的编码参数中的元校验节点数;(4a) the value of the number of check nodes in the encoding parameter obtained by the current iteration is set to the number of meta-check nodes in the encoding parameter obtained by the current iteration; (4b)将当前迭代得到的信息节点数、校验节点数与确定的元校验节点数组成最终编码参数;(4b) The number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters; (5)构建最后一个阶段对应的生成矩阵:(5) Construct the generator matrix corresponding to the last stage: 将一个系统型MDS码作为基本码,利用基本码的生成矩阵构造方法,构建一个与最后一个阶段对应的生成矩阵GmTaking a systematic MDS code as the basic code, and using the generator matrix construction method of the basic code, a generator matrix G m corresponding to the last stage is constructed:
Figure FDA0003835645210000021
Figure FDA0003835645210000021
其中,Gm表示最后一个阶段对应的生成矩阵,
Figure FDA0003835645210000022
表示一个km阶的单位矩阵,km的取值等于最终编码参数中的信息节点数,
Figure FDA0003835645210000023
表示一个rm行km列的矩阵,rm的取值等于最终编码参数中的校验节点数;
Among them, G m represents the generator matrix corresponding to the last stage,
Figure FDA0003835645210000022
represents a unit matrix of order k m , and the value of k m is equal to the number of information nodes in the final encoding parameter,
Figure FDA0003835645210000023
Represents a matrix with r m rows and k m columns, and the value of r m is equal to the number of check nodes in the final encoding parameter;
(6)设定生成矩阵的折叠规则:(6) Set the folding rules of the generated matrix: (6a)将待折叠的矩阵划分成A、B、X、U、V五个矩阵:其中,A表示待折叠的矩阵的第1行至第l1行组成的一个左信息矩阵,
Figure FDA0003835645210000024
Figure FDA0003835645210000025
表示当前阶段的上一阶段对应的编码参数中的信息节点数;B表示待折叠的矩阵的第l2行至第l3行组成的一个右信息矩阵,
Figure FDA0003835645210000026
X表示待折叠的矩阵的第l4行至前l5行组成的矩阵,
Figure FDA0003835645210000027
Figure FDA0003835645210000028
表示当前阶段的上一阶段对应的编码参数中的元校验节点数;U表示待折叠的矩阵的第l6行至第l7行组成的矩阵,
Figure FDA0003835645210000029
Figure FDA00038356452100000210
Figure FDA00038356452100000211
表示当前阶段的上一阶段对应的编码参数中的校验节点数;V表示待折叠的矩阵的第l8行至最后1行组成的一个右非元矩阵,
Figure FDA00038356452100000212
(6a) Divide the matrix to be folded into five matrices of A, B, X, U, and V: wherein, A represents a left information matrix composed of the 1st row to the 11th row of the matrix to be folded,
Figure FDA0003835645210000024
Figure FDA0003835645210000025
Represents the number of information nodes in the coding parameters corresponding to the previous stage of the current stage; B represents a right information matrix composed of the 12th row to the 13th row of the matrix to be folded,
Figure FDA0003835645210000026
X represents the matrix composed of the 14th row to the first 15th row of the matrix to be folded,
Figure FDA0003835645210000027
Figure FDA0003835645210000028
Represents the number of meta-check nodes in the coding parameters corresponding to the previous stage of the current stage; U represents the matrix formed by the 16th row to the 17th row of the matrix to be folded,
Figure FDA0003835645210000029
Figure FDA00038356452100000210
Figure FDA00038356452100000211
Represents the number of check nodes in the coding parameters corresponding to the previous stage of the current stage; V represents a right non-element matrix composed of the 18th row to the last row of the matrix to be folded,
Figure FDA00038356452100000212
(6b)生成一个与矩阵X行列元素均相等的矩阵,该矩阵的最后μ个非零列的元素全部置零后得到一个左元矩阵X′,
Figure FDA00038356452100000213
生成一个与矩阵X行列元素均相等的矩阵,该矩阵的前μ个非零列的元素全部置零后得到一个右元矩阵X″;将矩阵U的最后μ个非零列的元素全部置零后得到一个左非元矩阵U′;
(6b) Generate a matrix whose row and column elements are equal to those of matrix X, and obtain a left-element matrix X′ after all the elements of the last μ non-zero columns of the matrix are set to zero,
Figure FDA00038356452100000213
Generate a matrix with the same row and column elements as matrix X, and set all the elements of the first μ non-zero columns of the matrix to zero to obtain a right-element matrix X”; set all the elements of the last μ non-zero columns of the matrix U to zero Then get a left non-element matrix U';
(6c)按照下式,分别对矩阵A、矩阵X′、矩阵U′,矩阵B、矩阵X″、矩阵V进行组合,构建出对待折叠的矩阵折叠后的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵:(6c) According to the following formula, combine matrix A, matrix X', matrix U', matrix B, matrix X'', and matrix V respectively to construct the generation corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a matrix set:
Figure FDA00038356452100000214
Figure FDA00038356452100000214
Figure FDA00038356452100000215
Figure FDA00038356452100000215
其中,G′表示左生成矩阵、G″表示右生成矩阵;Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix; (7)构建最后一个阶段的上一个阶段对应的生成矩阵集:(7) Construct the generation matrix set corresponding to the previous stage of the last stage: (7a)根据生成矩阵的折叠规则,对最后一个阶段对应的生成矩阵进行折叠,并将折叠后的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中;(7a) According to the folding rule of the generator matrix, the generator matrix corresponding to the last stage is folded, and the folded generator matrix is added to the generator matrix set corresponding to the previous stage of the current stage; (7b)判断m-1的值是否等于2,若是,则执行步骤(10),否则,将本次迭代确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8);(7b) judge whether the value of m-1 is equal to 2, if yes, then execute step (10), otherwise, perform step (8) after taking the generation matrix set determined by this iteration as the generation matrix set corresponding to the current stage; (8)构造上一阶段对应的生成矩阵集:(8) Construct the generating matrix set corresponding to the previous stage: 根据生成矩阵的折叠规则,对当前阶段对应的生成矩阵集中的每一个生成矩阵进行折叠,并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中;According to the folding rule of the generator matrix, fold each generator matrix in the generator matrix set corresponding to the current stage, and add the generator matrix obtained by folding to the generator matrix set corresponding to the previous stage of the current stage; (9)判断当前迭代得到的生成矩阵集中生成矩阵的数量是否等于2m-1,若是,则执行步骤(10),否则,将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8);(9) Determine whether the number of generator matrices in the generator matrix set obtained by the current iteration is equal to 2 m-1 , if so, perform step (10), otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage. Execute step (8); (10)确定各阶段的码:(10) Determine the code of each stage: 将每一个阶段对应的编码参数作为其对应的码的编码参数;将最后一个阶段对应的生成矩阵作为最后一个阶段的码的子条带的生成矩阵;将除最后一个阶段以外的其他每个阶段对应的生成矩阵集中的各生成矩阵作为其对应的码的各子条带的生成矩阵;Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of its corresponding code; (11)将所有阶段的码组合成一个编码组;(11) code of all stages is combined into a coding group; (12)从编码组中选择一个码,所选码的编码参数中信息节点数与校验节点数的和等于希望采用的节点总数;(12) select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes expected to be adopted; (13)对待编码数据进行编码:(13) Encode the data to be encoded: 将待编码数据平均划分为t个信息符号,t=km;用所选码的每一个子条带对应的生成矩阵分别对待编码数据进行编码,得到该子条带的数据符号,由所有子条带的数据符号组成编码数据;将编码数据保存到相应节点中。Divide the data to be encoded into t information symbols on average, t =km; encode the data to be encoded with the corresponding generator matrix of each sub-strip of the selected code, obtain the data symbols of this sub-strip, by all sub-strips The data symbols of the strip make up the encoded data; the encoded data is stored in the corresponding node.
2.根据权利要求1所述的一种折叠式可扩展分布式存储编码方法,其特征在于,步骤(13)中所述的子条带的数据符号包括信息符号和校验符号两类,校验符号包括元校验符号和非元校验符号两小类,所述数据符号通过生成矩阵中各行对应的编码系数行矩阵对待编码数据进行编码得到;所述信息符号为左生成矩阵中的左信息矩阵或右生成矩阵中的右信息矩阵对待编码数据进行编码得到的数据符号;所述元校验符号为左生成矩阵中的左元矩阵或右生成矩阵中的右元矩阵对待编码数据进行编码得到的数据符号;所述非元校验符号为左生成矩阵中的左非元矩阵或右生成矩阵中的右非元矩阵对待编码数据进行编码得到的数据符号。2. a kind of foldable scalable distributed storage coding method according to claim 1, is characterized in that, the data symbol of the sub-strip described in the step (13) comprises two kinds of information symbols and check symbols, The verification symbols include two sub-categories of meta-check symbols and non-meta-check symbols, and the data symbols are obtained by encoding the data to be encoded by the coding coefficient row matrix corresponding to each row in the generator matrix; the information symbols are the left generator matrix in the left generator matrix. The data symbol obtained by encoding the data to be encoded by the right information matrix in the information matrix or the right generator matrix; the element check symbol is the left element matrix in the left generator matrix or the right element matrix in the right generator matrix to encode the data to be encoded The obtained data symbol; the non-element check symbol is the data symbol obtained by encoding the data to be encoded by the left non-element matrix in the left generator matrix or the right non-element matrix in the right generator matrix. 3.根据权利要求1所述的一种折叠式可扩展分布式存储编码方法,其特征在于,步骤(13)中所述的将编码数据保存到相应节点中指的是,每一个子条带中不同位置的数据符号保存在不同的节点中,位于不同子条带中相同位置的数据符号保存在同一个节点中;所述的节点包括信息节点和校验节点两类,所述校验节点包括元校验节点与非元校验节点两小类;所述信息节点用于保存信息符号;所述元校验节点用于保存元校验符号;所述非元校验节点用于保存非元校验符号。3. a kind of foldable scalable distributed storage coding method according to claim 1, is characterized in that, described in step (13), to save coded data in corresponding node refers to, in each sub-strip Data symbols at different positions are stored in different nodes, and data symbols at the same position in different sub-strips are stored in the same node; the nodes include two types of information nodes and check nodes, and the check nodes include There are two types of meta-check nodes and non-meta-check nodes; the information nodes are used to store information symbols; the meta-check nodes are used to store meta-check symbols; the non-meta-check nodes are used to store non-meta-check nodes. Check mark. 4.根据权利要求1所述的折叠式可扩展分布式存储编码的一种折叠式可扩展分布式存储编码修复方法,其特征在于,对每个数据集合进行处理,对信息符号进行编码,保存恢复出的数据符号;该方法的步骤包括如下:4. A kind of foldable scalable distributed storage coding repair method of foldable scalable distributed storage coding according to claim 1 is characterized in that, each data set is processed, information symbols are coded, stored The recovered data symbols; the steps of the method include the following: (1)将每个编码数据采用同一个码编码的失效节点的总数大于该码编码参数中的校验节点数的情况放弃修复,其余情况执行步骤(2);(1) The total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and the repair is abandoned, and step (2) is performed in the remaining cases; (2)判断所有失效节点的失效信息节点的总数是否为非0值,若是,则执行步骤(3);否则,判定不存在失效信息节点但存在失效校验节点后执行步骤(6);(2) Determine whether the total number of failed information nodes of all failed nodes is a non-zero value, and if so, execute step (3); otherwise, execute step (6) after determining that there is no failed information node but there is a failed check node; (3)判断
Figure FDA0003835645210000041
是否成立,若是,则执行步骤(4),否则,执行步骤(7);其中,α表示所有失效节点的失效信息节点的总数,
Figure FDA0003835645210000042
表示每个编码数据采用同一个码编码的参数中的元校验节点数、λ表示所有失效校验节点中失效元校验节点的总数;
(3) Judgment
Figure FDA0003835645210000041
Is it true, if so, go to step (4), otherwise, go to step (7); where α represents the total number of failure information nodes of all failed nodes,
Figure FDA0003835645210000042
Represents the number of meta-check nodes in the parameters encoded by the same code for each encoded data, and λ represents the total number of failed meta-check nodes in all failed check nodes;
(4)划分数据符号:(4) Divide data symbols: 从与失效节点保存同一个编码数据的每个未失效的信息节点中分别下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的每个未失效的元校验节点中下载该元校验节点保存的所有的元校验符号,从与失效节点保存同一个编码数据的所有未失效的非元校验节点中任意选择η个非元校验节点下载该非元校验节点中所有的非元校验符号;将属于同一个子条带的数据符号划为同一个符号组,将属于同一个编码数据的符号组划为同一个数据集合;其中η的值等于
Figure FDA0003835645210000054
Download all information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the information symbols stored by the failed node from each non-invalidated meta-check node that stores the same encoded data as the failed node. Download all meta-check symbols saved by the meta-check node, and randomly select n non-meta-check nodes from all non-meta-check nodes that store the same encoded data as the failed node to download the non-meta-check node All non-meta-check symbols in the node; the data symbols belonging to the same sub-strip are divided into the same symbol group, and the symbol groups belonging to the same encoded data are divided into the same data set; the value of n is equal to
Figure FDA0003835645210000054
(5)对每个数据集合进行处理:(5) Process each data set: (5a)按照下式,对每一个数据集合中的每一个符号组进行编号:(5a) Number each symbol group in each data set according to the following formula:
Figure FDA0003835645210000051
Figure FDA0003835645210000051
其中,jh,c表示第h个数据集合中第c个符号组的编号,
Figure FDA0003835645210000052
表示向上取整操作,qh,c表示第h个数据集合中第c个符号组对应子条带的生成矩阵中第一行元素中非零元素的列的序号,
Figure FDA0003835645210000053
表示每个编码数据采用同一个码编码的参数中的信息节点数;
Among them, j h,c represents the number of the c-th symbol group in the h-th data set,
Figure FDA0003835645210000052
Represents the round-up operation, q h,c represents the sequence number of the column of the non-zero element in the first row element in the generator matrix of the c-th symbol group corresponding to the sub-strip in the h-th data set,
Figure FDA0003835645210000053
Indicates the number of information nodes in the parameters encoded by the same code for each encoded data;
(5b)对于每一个数据集合从该数据集合中编号为1的符号组依次对该数据集合中的每个符号组进行译码-消除操作;(5b) for each data set, perform decoding-elimination operation on each symbol group in the data set in turn from the symbol group numbered 1 in the data set; (5c)所有的数据集合均已进行处理后执行步骤(9);(5c) Step (9) is performed after all data sets have been processed; (6)划分信息符号:(6) Divide information symbols: 从与失效节点保存同一个编码数据的每个信息节点中下载该信息节点保存的所有的信息符号,将属于同一个子条带的信息符号划为同一个符号组;将属于同一个编码数据的符号组划为同一个数据集合后执行步骤(9);Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data Step (9) is performed after the group is divided into the same data set; (7)恢复失效信息节点对应的信息符号:(7) Restore the information symbol corresponding to the failed information node: (7a)从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号,从与失效节点保存同一个编码数据的所有未失效的元校验节点中任意选择与α的值相等数量的元校验节点下载该元校验节点保存的所有的元校验符号;将属于同一个子条带的数据符号划为同一个符号组;(7a) Download all the information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the non-invalidated meta-check nodes that store the same encoded data with the failed node. Arbitrarily select a number of meta-check nodes equal to the value of α to download all meta-check symbols saved by the meta-check node; divide the data symbols belonging to the same sub-strip into the same symbol group; (7b)利用与基本码对应的译码方法对每个符号组中的数据符号进行译码,恢复出该符号组中与失效信息节点对应的信息符号,并将其加入到该符号组中;(7b) utilize the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add it to the symbol group; (7c)将属于同一个编码数据的符号组划为同一个数据集合;(7c) dividing the symbol groups belonging to the same encoded data into the same data set; (8)判断所有失效节点中是否存在失效校验节点,若是,则执行步骤(9),否则,执行步骤(10);(8) Judging whether there is a failed check node in all the failed nodes, if so, execute step (9), otherwise, execute step (10); (9)对信息符号进行编码:(9) Encode the information symbol: 利用失效校验节点对应的编码系数行矩阵对每个数据集合中所有符号组中所有的信息符号进行编码,恢复出失效校验节点对应的校验符号,并将恢复后的校验符号加入到与其对应同一个子条带的符号组中;All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same sub-strip; (10)保存恢复出的数据符号:(10) Save the recovered data symbols: 增加与α的值相等数量的新的信息节点,增加与所有失效节点的失效校验节点的总数相等数量的新的校验节点,将恢复出的属于同一个信息节点的信息符号保存在同一个信息节点中,将恢复出的属于同一个校验节点的校验符号保存在同一个校验节点中;Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node; (11)用新的节点替换掉失效的节点。(11) Replace the failed node with a new node.
5.根据权利要求4所述的一种折叠式可扩展分布式存储编码修复方法,其特征在于,步骤(5b)中所述的译码-消除操作是先进行译码操作再进行消除操作,译码操作指的是,利用与基本码对应的译码方法对该数据集合中的当前符号组中的数据符号进行译码,恢复出失效信息节点对应的信息符号,并将其加入到该数据集合中的当前符号组中;消除操作指的是,利用当前符号组中的信息符号对该数据集合中编号大于当前符号组的编号的每一个符号组中的校验符号进行消除;消除是指若校验符号的编码系数行矩阵中与信息符号对应的元素的值为非零值,则将这些信息符号与对应的非零值相乘后从校验符号中减去,得到消除后的校验符号。5. a kind of foldable scalable distributed storage coding repair method according to claim 4, is characterized in that, the decoding-elimination operation described in the step (5b) is to carry out the decoding operation and then carry out the elimination operation again, The decoding operation refers to decoding the data symbols in the current symbol group in the data set by using the decoding method corresponding to the basic code, recovering the information symbols corresponding to the failed information nodes, and adding them to the data. In the current symbol group in the set; the elimination operation refers to using the information symbols in the current symbol group to eliminate the check symbols in each symbol group whose number is greater than the number of the current symbol group in the data set; elimination refers to If the values of the elements corresponding to the information symbols in the coding coefficient row matrix of the check symbols are non-zero values, multiply these information symbols by the corresponding non-zero values and then subtract them from the check symbols to obtain the eliminated calibration symbols. test symbol. 6.根据权利要求1所述的折叠式可扩展分布式存储编码的一种折叠式可扩展分布式存储编码扩展方法,其特征在于,将编码数据中生成矩阵由同一个矩阵折叠得到的两个子条带划为一个扩展组,合并同一个扩展组内的两条子条带,将所有新的子条带的数据符号组成扩展后的新的编码数据,该方法的步骤包括如下:6. a kind of foldable scalable distributed storage coding expansion method of foldable scalable distributed storage coding according to claim 1, is characterized in that, the two subfolders obtained by the same matrix folding in the coded data are generated by the matrix. The strip is divided into an extended group, two sub-strips in the same extended group are merged, and the data symbols of all the new sub-strips are formed into expanded new encoded data. The steps of the method include the following: (1)添加新的节点:(1) Add a new node: 对除编码数据采用最后一个阶段的码进行编码的情况之外,向保存编码数据的节点中添加ρ个信息节点Y1、Y2、…、Yρ和γ个校验节点F1、F2、…、Fγ,其中,
Figure FDA0003835645210000061
Figure FDA0003835645210000062
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的信息节点数,
Figure FDA0003835645210000063
表示编码数据当前采用的码的编码参数中的信息节点数,
Figure FDA0003835645210000064
Figure FDA0003835645210000065
表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的校验节点数,
Figure FDA0003835645210000066
表示编码数据当前采用的码的编码参数中的校验节点数;
Add ρ information nodes Y 1 , Y 2 , . , ..., F γ , where,
Figure FDA0003835645210000061
Figure FDA0003835645210000062
Indicates the number of information nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure FDA0003835645210000063
Indicates the number of information nodes in the encoding parameter of the code currently used by the encoded data,
Figure FDA0003835645210000064
Figure FDA0003835645210000065
Indicates the number of check nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,
Figure FDA0003835645210000066
Indicates the number of check nodes in the encoding parameters of the code currently used by the encoded data;
(2)将编码数据中生成矩阵相互关联的两个子条带划为一个扩展组;(2) the two sub-strips in which the generation matrix is related to each other in the encoded data are divided into an extended group; (3)合并每个扩展组内的两个子条带:(3) Merge the two sub-stripes within each extension group: (3a)从保存编码数据的所有信息节点中下载并缓存每个扩展组中与右生成矩阵对应的子条带的所有的信息符号,将这些信息符号中位于子条带不同位置的信息符号分别迁移至信息节点Y1、Y2、…、Yρ中不同的信息节点中,将迁移后的信息符号和未迁移的信息符号组成合并后的子条带的信息符号;(3a) Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and separate the information symbols located at different positions of the sub-strip among these information symbols. Migrating to different information nodes in the information nodes Y 1 , Y 2 , . (3b)将每个扩展组中两个子条带中位于同一个元校验节点的两个元校验符号在该元校验节点中相加,得到更新后的元校验符号;(3b) adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol; (3c)利用与每个扩展组中子条带对应的左生成矩阵的互补矩阵对缓存的信息符号进行编码得到校正符号,利用校正符号对左生成矩阵对应的子条带的非元校验符号进行更新;(3c) Use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extension group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to encode the non-element check symbols of the sub-strip corresponding to the left generator matrix to update; (3d)将每个扩展组中与右生成矩阵对应的子条带中不同位置的非元校验符号分别迁移至校验节点F1、F2、…、Fγ中不同的校验节点中;(3d) Migrate the non-meta-check symbols at different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F 1 , F 2 , . . . , F γ respectively ; (3e)将更新后的元校验符号、更新后的非元校验符号和迁移后的校验符号组成合并后子条带的校验符号;(3e) the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip; (4)将所有合并后的子条带的数据符号组成扩展后的编码数据。(4) The data symbols of all the merged sub-strips are formed into the extended coded data.
7.根据权利要求6所述的一种折叠式可扩展分布式存储编码扩展方法,其特征在于,步骤(3c)中所述的每个扩展组中子条带对应的左生成矩阵的互补矩阵指的是,将与所述扩展组中子条带对应的左生成矩阵对应的被折叠的矩阵表示为矩阵
Figure FDA0003835645210000071
将所述扩展组中子条带对应的左生成矩阵中的左非元矩阵表示为矩阵W,用矩阵
Figure FDA0003835645210000072
的第l6′行至第l7′行的元素组成一个矩阵
Figure FDA0003835645210000073
Figure FDA0003835645210000074
Figure FDA0003835645210000075
表示编码数据当前采用的码的编码参数中的元校验节点数,将
Figure FDA0003835645210000076
得到的矩阵表示为矩阵T,用矩阵T中的所有非零列组成扩展组中子条带对应的左生成矩阵的互补矩阵。
7. a kind of foldable scalable distributed storage coding expansion method according to claim 6, is characterized in that, the complementary matrix of the left generating matrix corresponding to the sub-strip in each expansion group described in step (3c) Refers to denoting the collapsed matrix corresponding to the left generating matrix corresponding to the substrip in the expanded group as a matrix
Figure FDA0003835645210000071
Denote the left non-element matrix in the left generator matrix corresponding to the sub-strip in the extended group as a matrix W, and use the matrix
Figure FDA0003835645210000072
The elements from row l 6 ' to row l 7 ' form a matrix
Figure FDA0003835645210000073
Figure FDA0003835645210000074
Figure FDA0003835645210000075
Indicates the number of meta-check nodes in the encoding parameters of the code currently used by the encoded data, set
Figure FDA0003835645210000076
The resulting matrix is denoted as matrix T, and all non-zero columns in matrix T are used to form the complementary matrix of the left generator matrix corresponding to the substrips in the extended group.
CN202110726617.1A 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method Active CN113391948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110726617.1A CN113391948B (en) 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110726617.1A CN113391948B (en) 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method

Publications (2)

Publication Number Publication Date
CN113391948A CN113391948A (en) 2021-09-14
CN113391948B true CN113391948B (en) 2022-10-21

Family

ID=77624387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110726617.1A Active CN113391948B (en) 2021-06-29 2021-06-29 Folding type extensible distributed storage coding and repairing and expanding method

Country Status (1)

Country Link
CN (1) CN113391948B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571104A (en) * 2012-01-15 2012-07-11 西安电子科技大学 Distributed encoding and decoding method for RA (Repeat Accumulate) code
CN102667761A (en) * 2009-06-19 2012-09-12 布雷克公司 Scalable Cluster Database
CN103688514A (en) * 2013-02-26 2014-03-26 北京大学深圳研究生院 An Encoding and Storage Node Restoration Method of Minimal Storage Regeneration Code
CN103688515A (en) * 2013-03-26 2014-03-26 北京大学深圳研究生院 An Encoding and Storage Node Restoration Method of Minimal Bandwidth Regenerating Codes
CN104503706A (en) * 2014-12-23 2015-04-08 中国科学院计算技术研究所 Data storing method and data reading method based on disk array
CN106790408A (en) * 2016-11-29 2017-05-31 中国空间技术研究院 A kind of coding method repaired for distributed memory system node

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9098519B2 (en) * 2008-09-16 2015-08-04 File System Labs Llc Methods and apparatus for distributed data storage

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102667761A (en) * 2009-06-19 2012-09-12 布雷克公司 Scalable Cluster Database
CN102571104A (en) * 2012-01-15 2012-07-11 西安电子科技大学 Distributed encoding and decoding method for RA (Repeat Accumulate) code
CN103688514A (en) * 2013-02-26 2014-03-26 北京大学深圳研究生院 An Encoding and Storage Node Restoration Method of Minimal Storage Regeneration Code
CN103688515A (en) * 2013-03-26 2014-03-26 北京大学深圳研究生院 An Encoding and Storage Node Restoration Method of Minimal Bandwidth Regenerating Codes
CN104503706A (en) * 2014-12-23 2015-04-08 中国科学院计算技术研究所 Data storing method and data reading method based on disk array
CN106790408A (en) * 2016-11-29 2017-05-31 中国空间技术研究院 A kind of coding method repaired for distributed memory system node

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Distributed cloud storage using network coding;Márton Sipos等;《2014 IEEE 11th Consumer Communications and Networking Conference (CCNC)》;20150309;全文 *
一种网络编码分布式存储系统中的数据更新策略;刘冰星等;《小型微型计算机系统》;20170315;第38卷(第03期);全文 *
分布式存储中的纠删码容错技术研究;王意洁等;《计算机学报》;20160919;第40卷(第01期);全文 *
随机二元扩展码:一种适用于分布式存储系统的编码;陈亮等;《计算机学报》;20171114;第40卷(第9期);全文 *

Also Published As

Publication number Publication date
CN113391948A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN101106381B (en) Hierarchical LDPC decoder and decoding processing method
CN103688514B (en) An Encoding and Storage Node Restoration Method of Minimal Storage Regeneration Code
CN103688515B (en) An Encoding and Storage Node Restoration Method of Minimal Bandwidth Regenerating Codes
CN103336785B (en) A kind of distributed storage method based on network code and device thereof
CN107086870B (en) MDS array code encoding and decoding method for repairing multi-node failure
CN109491835B (en) Data fault-tolerant method based on dynamic block code
CN103746774B (en) The fault-tolerant coding method that a kind of efficient data is read
CN110764950A (en) Hybrid coding method, data repair method, and system based on RS code and regeneration code
WO2016058289A1 (en) Mds erasure code capable of repairing multiple node failures
CN108132854B (en) Erasure code decoding method capable of simultaneously recovering data elements and redundant elements
CN107395207A (en) The MDS array codes coding and restorative procedure of more fault-tolerances
WO2018072294A1 (en) Method for constructing check matrix and method for constructing horizontal array erasure code
CN109358980A (en) A RAID6 encoding method friendly to data update and single-disk error repair
CN110781024B (en) Matrix Construction Method of Symmetric Partial Repetition Code and Faulty Node Restoration Method
CN108762978B (en) A Grouping Construction Method of Partially Repeated Cyclic Codes
CN112799605A (en) Square Partially Repeated Code Construction Method, Node Repair Method and Capacity Calculation Method
CN113391948B (en) Folding type extensible distributed storage coding and repairing and expanding method
US10187084B2 (en) Method of encoding data and data storage system
CN119834814A (en) System MSR code generation and restoration method with minimum packet number
WO2017041233A1 (en) Encoding and storage node repairing method for functional-repair regenerating code
WO2020029418A1 (en) Method for constructing repair binary code generator matrix and repair method
WO2018119976A1 (en) Efficient data layout optimization method for data warehouse system
CN111679793A (en) A fast recovery method for single disk failure based on STAR code
CN108647108B (en) Construction method of minimum bandwidth regeneration code based on cyclic VFRC
CN115993941A (en) Distributed data storage error correction method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant