CN113391948B

CN113391948B - Folding type extensible distributed storage coding and repairing and expanding method

Info

Publication number: CN113391948B
Application number: CN202110726617.1A
Authority: CN
Inventors: 孙蓉; 杜从军; 刘景伟; 裴庆祺
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2022-10-21
Anticipated expiration: 2041-06-29
Also published as: CN113391948A

Abstract

The invention discloses a foldable and scalable distributed storage encoding, repairing and expanding method, comprising: after determining the encoding parameters of each node, starting from the last stage of the generation matrix, sequentially constructing the generating matrix sets corresponding to other stages, and combining a code group and select a code from it to encode the data to be encoded; when a node failure occurs, select the same number of non-failed nodes as the information nodes and download data symbols from them to restore the data symbols in the failed node; when the encoded data is extended Merge the two sub-stripes within each expansion group. The invention has the advantages of being able to improve the fault-tolerant capability of the expanded system, possessing MDS properties, low expansion bandwidth, and multi-expansion, and can be used for coding, repairing and expanding the distributed storage system whose nodes have computing power.

Description

A foldable and scalable distributed storage encoding, repair and expansion method

技术领域technical field

本发明属于计算机技术领域，更进一步涉及分布式存储技术领域中的一种折叠式可扩展分布式存储编码及修复、扩展方法。本发明可用于对节点具有计算能力的分布式存储系统进行编码、修复和扩展。The invention belongs to the field of computer technology, and further relates to a foldable and scalable distributed storage coding, repair and expansion method in the field of distributed storage technology. The present invention can be used to code, repair and expand a distributed storage system whose nodes have computing power.

背景技术Background technique

由于分布式存储系统数据存储量大，节点失效的事件频发，分布式存储系统需要通过存储冗余数据提高系统的可靠性。纠删码技术是一种典型的数据冗余机制，通过将原始数据分割成信息块后进行编码产生校验块，并将信息块与校验块分散存储在分布式存储系统的节点中，实现容错的目的。分布式存储系统的规模通常随着使用时间的变化而不断增大，有越来越多的新节点加入到系统中，理想情况下分布式存储系统的编码参数应当可以根据应用需求进行动态扩展，基于纠删码的分布式存储系统在进行码参数扩展时，数据的迁移、更新过程需要在节点之间传输大量的数据，这会造成大量的网络带宽资源消耗，影响分布式存储系统的性能。Due to the large amount of data storage in the distributed storage system and frequent node failure events, the distributed storage system needs to improve the reliability of the system by storing redundant data. Erasure coding technology is a typical data redundancy mechanism. The original data is divided into information blocks and then encoded to generate check blocks, and the information blocks and check blocks are scattered and stored in the nodes of the distributed storage system. fault tolerance purpose. The scale of the distributed storage system usually increases with the change of usage time, and more and more new nodes are added to the system. Ideally, the coding parameters of the distributed storage system should be dynamically expandable according to the application requirements. When the erasure code-based distributed storage system expands the code parameters, a large amount of data needs to be transmitted between nodes in the process of data migration and update, which will cause a large amount of network bandwidth resource consumption and affect the performance of the distributed storage system.

华中科技大学在申请的专利文献“一种基于网络编码的存储扩展方法”(专利申请号：201810304384.4，公布号：CN 108536396 B)中公开了一种基于网络编码的存储扩展方法。该方法的思路是将存储扩展前的条带划分为多个扩展组，并进一步将每个扩展组划分为PG和DG；在DG内循环地依次从原节点中取数据块，得到一系列的数据集合；利用网络编码对每一个数据集合进行编码生成更新块，使用这些更新块对PG中的编码块进行本地更新或者异地更新；将编码块或者数据块传输至新增节点上，并保持扩展后数据块与编码块在所有节点上均匀放置；删除所有传输至新节点的数据块和编码块，并删除DG内所有的编码块。该方法在存储扩展时利用存储节点的计算资源对数据块进行编码并对部分编码块进行本地更新，减少了扩展带宽，但是，该方法仍然存在的不足之处是，进行存储扩展后系统中节点的数量增多，需要更多的编码块来满足系统对于容错能力的要求，该方法在存储扩展时单个条带内编码块的数量保持不变，系统扩展后的容错能力无法适应系统的节点规模。Huazhong University of Science and Technology disclosed a network coding-based storage expansion method in the patent document "A Network Coding-Based Storage Expansion Method" (Patent Application No.: 201810304384.4, Publication No.: CN 108536396 B). The idea of this method is to divide the strip before storage expansion into multiple expansion groups, and further divide each expansion group into PG and DG; cyclically take data blocks from the original node in DG, and obtain a series of Data set; use network coding to encode each data set to generate update blocks, and use these update blocks to update the code blocks in the PG locally or off-site; transfer the code blocks or data blocks to the new node, and keep the expansion The latter data blocks and coding blocks are evenly placed on all nodes; all data blocks and coding blocks transmitted to the new node are deleted, and all coding blocks in the DG are deleted. This method utilizes the computing resources of storage nodes to encode data blocks and locally update some of the encoded blocks during storage expansion, which reduces the expansion bandwidth. However, this method still has the disadvantage that after storage expansion, the nodes in the system The number of coding blocks increases, and more coding blocks are needed to meet the system's requirements for fault tolerance. This method keeps the number of coding blocks in a single strip unchanged during storage expansion, and the fault tolerance after system expansion cannot adapt to the node scale of the system.

陈亮等人发表的论文“随机二元扩展码：一种适用于分布式存储系统的编码”(计算机学报，2017年9月)提出一种可以动态调整码率和纠删能力的编码方法。该方法的编码矩阵由一个单元阵和一个随机阵构成，采用自顶向下的设计模式，通过控制随机矩阵中各个元素生成，达到码字整体上的高性能。该方法的参数具有动态调整的能力，其编码矩阵的行列可以自由伸缩，进而，存储系统可根据应用需求的变化，动态调整码率和纠删能力，该方法仍然存在的不足是，在对失效节点进行修复时需要连接的节点数量大于信息节点数量，修复度高。The paper "Random Binary Spreading Codes: A Coding Applicable to Distributed Storage Systems" (Journal of Computer, September 2017) published by Chen Liang et al. proposes a coding method that can dynamically adjust the code rate and erasure capability. The coding matrix of the method is composed of a unit matrix and a random matrix. The top-down design mode is adopted, and the generation of each element in the random matrix is controlled to achieve the overall high performance of the codeword. The parameters of this method have the ability to dynamically adjust, and the rows and columns of the encoding matrix can be freely scaled, and further, the storage system can dynamically adjust the code rate and erasure ability according to changes in application requirements. When a node is repaired, the number of nodes that need to be connected is greater than the number of information nodes, and the repair degree is high.

Francisco Maturana等人发表的论文“Convertible codes:New class of codesfor efficient conversion of coded data in distributed storage”(11thInnovations in Theoretical Computer Science Conference ser.LeibnizInternational Proceedings in Informatics,vol.151,pp.66:1-66:26,2020.)提出了一种有效进行码转换的编码方法，并在其后发表的论文“Bandwidth Cost of CodeConversions in Distributed Storage:Fundamental Limits and OptimalConstructions”(arXiv:2008.12707)中借助网络信息流图对码转换问题进行了分析，提出一种用于降低转换带宽的Convertible codes编码方案。Convertible codes有效降低了系统在从初始码向最终码进行扩展时的资源消耗，但该方法仍然存在的不足是，该方法仅可以以低的带宽资源消耗扩展一次。The paper "Convertible codes: New class of codes for efficient conversion of coded data in distributed storage" by Francisco Maturana et al. (11thInnovations in Theoretical Computer Science Conference ser. LeibnizInternational Proceedings in Informatics,vol.151,pp.66:1-66: 26, 2020.) proposed an efficient encoding method for code conversion, and used the network information flow graph in the subsequent paper "Bandwidth Cost of CodeConversions in Distributed Storage: Fundamental Limits and Optimal Constructionions" (arXiv:2008.12707). The problem of code conversion is analyzed, and a coding scheme of Convertible codes is proposed to reduce the conversion bandwidth. Convertible codes effectively reduce the resource consumption of the system when extending from the initial code to the final code, but this method still has the disadvantage that this method can only be extended once with low bandwidth resource consumption.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于针对上述现有技术的不足，提出一种折叠式可扩展分布式存储编码及修复、扩展方法，旨在解决系统扩展后的容错能力无法适应节点规模、修复失效节点时修复度高、可扩展次数少的问题。The purpose of the present invention is to propose a foldable and scalable distributed storage coding, repair and expansion method in view of the deficiencies of the above-mentioned prior art, aiming at solving the problem that the fault tolerance capability of the system after expansion cannot adapt to the node scale, and the repair degree when repairing a failed node The problem of high and low scalability.

实现本发明目的的思路是：由于本发明的编码方法是按照公式从第1个阶段的编码参数开始依次计算其他阶段的编码参数，由于在计算出的校验节点数量增多，解决了系统扩展后的容错能力无法适应节点规模的问题。将一个系统型MDS码作为基本码构造最后一个阶段对应的生成矩阵，并根据设定的折叠规则逆序地依次构造出其他阶段对应的生成矩阵集，组合出一个编码组，并从编码组中选择一个码对待编码数据进行编码。由于本发明的修复方法是从未失效的节点中选择与信息节点数相同数量的节点下载数据符号对失效节点中的数据符号进行恢复，由于选择的节点数等于信息节点数，解决了修复失效节点时修复度高的问题。由于本发明的扩展方法是对编码数据进行扩展时将每个扩展组内的两个子条带进行合并，由于编码组中有多个码，这样的合并过程可以进行多次，解决了可扩展次数少的问题。The idea of realizing the object of the present invention is: because the encoding method of the present invention calculates the encoding parameters of other stages in turn from the encoding parameters of the first stage according to the formula, due to the increase in the number of the calculated check nodes, it solves the problem of system expansion. The fault tolerance ability cannot adapt to the problem of node scale. A systematic MDS code is used as the basic code to construct the generator matrix corresponding to the last stage, and according to the set folding rules, the generator matrix sets corresponding to the other stages are constructed in reverse order, and a coding group is combined, and selected from the coding group. A code encodes the data to be encoded. Because the repair method of the present invention is to select the same number of nodes as the number of information nodes from the non-failed nodes to download data symbols to restore the data symbols in the failed node, since the number of selected nodes is equal to the number of information nodes, the problem of repairing the failed nodes is solved. time to fix high-level issues. Since the expansion method of the present invention is to combine the two sub-strips in each expansion group when the encoded data is expanded, since there are multiple codes in the encoding group, such a combination process can be performed multiple times, which solves the problem of the number of expansions. less problems.

为实现上述目的，本发明一种折叠式可扩展分布式存储编码方法的步骤包括如下：In order to achieve the above object, the steps of a foldable scalable distributed storage coding method of the present invention include the following steps:

(1)设定第1个阶段的编码参数：(1) Set the encoding parameters of the first stage:

将信息节点数k₁、校验节点数r₁、元校验节点数s₁设定为第1个阶段的编码参数，其中，k₁、r₁为正整数，s₁为非负整数且小于等于r₁；The number of information nodes k ₁ , the number of check nodes r ₁ , and the number of meta-check nodes s ₁ are set as the encoding parameters of the first stage, wherein k ₁ and r ₁ are positive integers, s ₁ is a non-negative integer and less than or equal to r ₁ ;

(2)计算下一个阶段的编码参数：(2) Calculate the encoding parameters of the next stage:

(2a)按照下式，计算下一个阶段的信息节点数、校验节点数：(2a) Calculate the number of information nodes and check nodes in the next stage according to the following formula:

k′＝2kk'=2k

r′＝2r-sr′=2r-s

其中，k′、r′分别表示当前阶段的下一个阶段的信息节点数、校验节点数，k、r、s分别表示当前阶段的信息节点数、校验节点数、元校验节点数；Among them, k' and r' respectively represent the number of information nodes and check nodes in the next stage of the current stage, and k, r and s respectively represent the number of information nodes, check nodes and meta-check nodes in the current stage;

(2b)从取值范围{s,s+1,…,2r-s}中选取一个与希望以低修复复杂度修复的同时失效的信息节点数最大值相等的值作为下一个阶段的元校验节点数；(2b) From the value range {s,s+1,...,2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the meta-calibration of the next stage Check the number of nodes;

(3)判断当前迭代得到的编码参数的总数是否是等于m，若是，则执行步骤(4)；否则，将本次确定的编码参数作为当前阶段的编码参数后执行步骤(2)；m表示设定的拟构造的编码组中码的总数，m的值为大于等于2的整数；(3) Judging whether the total number of coding parameters obtained by the current iteration is equal to m, and if so, execute step (4); otherwise, execute step (2) after taking the coding parameters determined this time as the coding parameters of the current stage; m represents The total number of codes in the set coding group to be constructed, and the value of m is an integer greater than or equal to 2;

(4)确定最终编码参数：(4) Determine the final encoding parameters:

(4a)将当前迭代得到的编码参数中的校验节点数的值设定为当前迭代得到的编码参数中的元校验节点数；(4a) the value of the number of check nodes in the encoding parameter obtained by the current iteration is set to the number of meta-check nodes in the encoding parameter obtained by the current iteration;

(4b)将当前迭代得到的信息节点数、校验节点数与确定的元校验节点数组成最终编码参数；(4b) The number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters;

(5)构建最后一个阶段对应的生成矩阵：(5) Construct the generator matrix corresponding to the last stage:

将一个系统型MDS码作为基本码，利用基本码的生成矩阵构造方法，构建一个与最后一个阶段对应的生成矩阵G^m：Taking a systematic MDS code as the basic code, and using the generator matrix construction method of the basic code, a generator matrix G ^m corresponding to the last stage is constructed:

其中，G^m表示最后一个阶段对应的生成矩阵，

表示一个k_m阶的单位矩阵，k_m的取值等于最终编码参数中的信息节点数，

表示一个r_m行k_m列的矩阵，r_m的取值等于最终编码参数中的校验节点数；Among them, G ^m represents the generator matrix corresponding to the last stage,

represents a unit matrix of order k _m , and the value of k _m is equal to the number of information nodes in the final encoding parameter,

Represents a matrix with r _m rows and k _m columns, and the value of r _m is equal to the number of check nodes in the final encoding parameter;

(6)设定生成矩阵的折叠规则：(6) Set the folding rules of the generated matrix:

(6a)将待折叠的矩阵划分成A、B、X、U、V五个矩阵：其中，A表示待折叠的矩阵的第1行至第l₁行组成的一个左信息矩阵，

表示当前阶段的上一阶段对应的编码参数中的信息节点数；B表示待折叠的矩阵的第l₂行至第l₃行组成的一个右信息矩阵，

X表示待折叠的矩阵的第l₄行至前l₅行组成的矩阵，

表示当前阶段的上一阶段对应的编码参数中的元校验节点数；U表示待折叠的矩阵的第l₆行至第l₇行组成的矩阵，

表示当前阶段的上一阶段对应的编码参数中的校验节点数；V表示待折叠的矩阵的第l₈行至最后1行组成的一个右非元矩阵，

(6a) Divide the matrix to be folded into five matrices of A, B, X, U, and V: wherein, A represents a left information matrix composed of the _1st row to the 11th row of the matrix to be folded,

Represents the number of information nodes in the coding parameters corresponding to the previous stage of the current stage; B represents a right information matrix composed of the _12th row to the 13th _row of the matrix to be folded,

X represents the matrix composed of the _14th row to the first _15th row of the matrix to be folded,

Represents the number of meta-check nodes in the coding parameters corresponding to the previous stage of the current stage; U represents the matrix formed by the _16th row to the _17th row of the matrix to be folded,

Represents the number of check nodes in the coding parameters corresponding to the previous stage of the current stage; V represents a right non-element matrix composed of the _18th row to the last row of the matrix to be folded,

(6b)生成一个与矩阵X行列元素均相等的矩阵，该矩阵的最后μ个非零列的元素全部置零后得到一个左元矩阵X′，

生成一个与矩阵X行列元素均相等的矩阵，该矩阵的前μ个非零列的元素全部置零后得到一个右元矩阵X″；将矩阵U的最后μ个非零列的元素全部置零后得到一个左非元矩阵U′；(6b) Generate a matrix whose row and column elements are equal to those of matrix X, and obtain a left-element matrix X′ after all the elements of the last μ non-zero columns of the matrix are set to zero,

Generate a matrix with the same row and column elements as matrix X, and set all the elements of the first μ non-zero columns of the matrix to zero to obtain a right-element matrix X”; set all the elements of the last μ non-zero columns of the matrix U to zero Then get a left non-element matrix U';

(6c)按照下式，分别对矩阵A、矩阵X′、矩阵U′，矩阵B、矩阵X″、矩阵V进行组合，构建出对待折叠的矩阵折叠后的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵：(6c) According to the following formula, combine matrix A, matrix X', matrix U', matrix B, matrix X'', and matrix V respectively to construct the generation corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a matrix set:

其中，G′表示左生成矩阵、G″表示右生成矩阵；Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix;

(7)构建最后一个阶段的上一个阶段对应的生成矩阵集：(7) Construct the generation matrix set corresponding to the previous stage of the last stage:

(7a)根据生成矩阵的折叠规则，对最后一个阶段对应的生成矩阵进行折叠，(7a) According to the folding rule of the generator matrix, fold the generator matrix corresponding to the last stage,

并将折叠后的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中；And add the folded generator matrix to the generator matrix set corresponding to the previous stage of the current stage;

(7b)判断m-1的值是否等于2，若是，则执行步骤(10)，否则，将本次迭代确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8)；(7b) judge whether the value of m-1 is equal to 2, if yes, then execute step (10), otherwise, perform step (8) after taking the generation matrix set determined by this iteration as the generation matrix set corresponding to the current stage;

(8)构造上一阶段对应的生成矩阵集：(8) Construct the generating matrix set corresponding to the previous stage:

根据生成矩阵的折叠规则，对当前阶段对应的生成矩阵集中的每一个生成矩阵进行折叠，并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中；According to the folding rule of the generator matrix, fold each generator matrix in the generator matrix set corresponding to the current stage, and add the generator matrix obtained by folding to the generator matrix set corresponding to the previous stage of the current stage;

(9)判断当前迭代得到的生成矩阵集中生成矩阵的数量是否等于2^m-1，若是，则执行步骤(10)，否则，将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤(8)；(9) Determine whether the number of generator matrices in the generator matrix set obtained by the current iteration is equal to 2 ^m-1 , if so, perform step (10), otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage. Execute step (8);

(10)确定各阶段的码：(10) Determine the code of each stage:

将每一个阶段对应的编码参数作为其对应的码的编码参数；将最后一个阶段对应的生成矩阵作为最后一个阶段的码的子条带的生成矩阵；将除最后一个阶段以外的其他每个阶段对应的生成矩阵集中的各生成矩阵作为其对应的码的各子条带的生成矩阵；Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of its corresponding code;

(11)将所有阶段的码组合成一个编码组；(11) code of all stages is combined into a coding group;

(12)从编码组中选择一个码，所选码的编码参数中信息节点数与校验节点数的和等于希望采用的节点总数；(12) select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes expected to be adopted;

(13)对待编码数据进行编码：(13) Encode the data to be encoded:

将待编码数据平均划分为t个信息符号，t＝k_m；用所选码的每一个子条带对应的生成矩阵分别对待编码数据进行编码，得到该子条带的数据符号，由所有子条带的数据符号组成编码数据；将编码数据保存到相应节点中。Divide the data to be encoded into t information symbols on average, _t =km; encode the data to be encoded with the corresponding generator matrix of each sub-strip of the selected code, obtain the data symbols of this sub-strip, by all sub-strips The data symbols of the strip make up the encoded data; the encoded data is stored in the corresponding node.

本发明一种折叠式可扩展分布式存储编码修复方法的步骤包括如下：The steps of a foldable and scalable distributed storage code repair method of the present invention include the following steps:

(1)将每个编码数据采用同一个码编码的失效节点的总数大于该码编码参数中的校验节点数的情况放弃修复，其余情况执行步骤(2)；(1) The total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and the repair is abandoned, and step (2) is performed in the remaining cases;

(2)判断所有失效节点的失效信息节点的总数是否为非0值，若是，则执行步骤(3)；否则，判定不存在失效信息节点但存在失效校验节点后执行步骤(6)；(2) Determine whether the total number of failed information nodes of all failed nodes is a non-zero value, and if so, execute step (3); otherwise, execute step (6) after determining that there is no failed information node but there is a failed check node;

(3)判断

是否成立，若是，则执行步骤(4)，否则，执行步骤(7)；其中，α表示所有失效节点的失效信息节点的总数，

表示每个编码数据采用同一个码编码的参数中的元校验节点数、λ表示所有失效校验节点中失效元校验节点的总数；(3) Judgment

Is it true, if so, go to step (4), otherwise, go to step (7); where α represents the total number of failure information nodes of all failed nodes,

Represents the number of meta-check nodes in the parameters encoded by the same code for each encoded data, and λ represents the total number of failed meta-check nodes in all failed check nodes;

(4)划分数据符号：(4) Divide data symbols:

从与失效节点保存同一个编码数据的每个未失效的信息节点中分别下载该信息节点保存的所有的信息符号，从与失效节点保存同一个编码数据的每个未失效的元校验节点中下载该元校验节点保存的所有的元校验符号，从与失效节点保存同一个编码数据的所有未失效的非元校验节点中任意选择η个非元校验节点下载该非元校验节点中所有的非元校验符号；将属于同一个子条带的数据符号划为同一个符号组，将属于同一个编码数据的符号组划为同一个数据集合；其中η的值等于

Download all information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the information symbols stored by the failed node from each non-invalidated meta-check node that stores the same encoded data as the failed node. Download all meta-check symbols saved by the meta-check node, and randomly select n non-meta-check nodes from all non-meta-check nodes that store the same encoded data as the failed node to download the non-meta-check node All non-meta-check symbols in the node; the data symbols belonging to the same sub-strip are divided into the same symbol group, and the symbol groups belonging to the same encoded data are divided into the same data set; the value of n is equal to

(5)对每个数据集合进行处理：(5) Process each data set:

(5a)按照下式，对每一个数据集合中的每一个符号组进行编号：(5a) Number each symbol group in each data set according to the following formula:

其中，j_h,c表示第h个数据集合中第c个符号组的编号，

表示向上取整操作，q_h,c表示第h个数据集合中第c个符号组对应子条带的生成矩阵中第一行元素中非零元素的列的序号，

表示每个编码数据采用同一个码编码的参数中的信息节点数；Among them, j _h,c represents the number of the c-th symbol group in the h-th data set,

Represents the round-up operation, q _h,c represents the sequence number of the column of the non-zero element in the first row element in the generator matrix of the c-th symbol group corresponding to the sub-strip in the h-th data set,

Indicates the number of information nodes in the parameters encoded by the same code for each encoded data;

(5b)对于每一个数据集合从该数据集合中编号为1的符号组依次对该数据集合中的每个符号组进行译码-消除操作；(5b) for each data set, perform decoding-elimination operation on each symbol group in the data set in turn from the symbol group numbered 1 in the data set;

(5c)所有的数据集合均已进行处理后执行步骤(9)；(5c) Step (9) is performed after all data sets have been processed;

(6)划分信息符号：(6) Divide information symbols:

从与失效节点保存同一个编码数据的每个信息节点中下载该信息节点保存的所有的信息符号，将属于同一个子条带的信息符号划为同一个符号组；将属于同一个编码数据的符号组划为同一个数据集合后执行步骤(9)；Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data Step (9) is performed after the group is divided into the same data set;

(7)恢复失效信息节点对应的信息符号：(7) Restore the information symbol corresponding to the failed information node:

(7a)从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号，从与失效节点保存同一个编码数据的所有未失效的元校验节点中任意选择与α的值相等数量的元校验节点下载该元校验节点保存的所有的元校验符号；将属于同一个子条带的数据符号划为同一个符号组；(7a) Download all the information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the non-invalidated meta-check nodes that store the same encoded data with the failed node. Arbitrarily select a number of meta-check nodes equal to the value of α to download all meta-check symbols saved by the meta-check node; divide the data symbols belonging to the same sub-strip into the same symbol group;

(7b)利用与基本码对应的译码方法对每个符号组中的数据符号进行译码，恢复出该符号组中与失效信息节点对应的信息符号，并将其加入到该符号组中；(7b) utilize the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add it to the symbol group;

(7c)将属于同一个编码数据的符号组划为同一个数据集合；(7c) dividing the symbol groups belonging to the same encoded data into the same data set;

(8)判断所有失效节点中是否存在失效校验节点，若是，则执行步骤(9)，否则，执行步骤(10)；(8) Judge whether there is a failed check node in all the failed nodes, if so, execute step (9), otherwise, execute step (10);

(9)对信息符号进行编码：(9) Encode the information symbol:

利用失效校验节点对应的编码系数行矩阵对每个数据集合中所有符号组中所有的信息符号进行编码，恢复出失效校验节点对应的校验符号，并将恢复后的校验符号加入到与其对应同一个子条带的符号组中；All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same sub-strip;

(10)保存恢复出的数据符号：(10) Save the recovered data symbols:

增加与α的值相等数量的新的信息节点，增加与所有失效节点的失效校验节点的总数相等数量的新的校验节点，将恢复出的属于同一个信息节点的信息符号保存在同一个信息节点中，将恢复出的属于同一个校验节点的校验符号保存在同一个校验节点中；Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node;

(11)用新的节点替换掉失效的节点。(11) Replace the failed node with a new node.

本发明一种折叠式可扩展分布式存储编码扩展方法的步骤包括如下：The steps of a foldable scalable distributed storage coding extension method of the present invention include the following steps:

(1)添加新的节点：(1) Add a new node:

对除编码数据采用最后一个阶段的码进行编码的情况之外，向保存编码数据的节点中添加ρ个信息节点Y₁、Y₂、…、Y_ρ和γ个校验节点F₁、F₂、…、F_γ，其中，

表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的信息节点数，

表示编码数据当前采用的码的编码参数中的信息节点数，

表示编码数据当前采用的码对应阶段的下一个阶段的码的编码参数中的校验节点数，

表示编码数据当前采用的码的编码参数中的校验节点数；Add _ρ information nodes _Y ₁ , Y ₂ , _. , ..., F _γ , where,

Indicates the number of information nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,

Indicates the number of information nodes in the encoding parameter of the code currently used by the encoded data,

Indicates the number of check nodes in the coding parameters of the code of the next stage of the code corresponding to the current stage of the coded data,

Indicates the number of check nodes in the encoding parameters of the code currently used by the encoded data;

(2)将编码数据中生成矩阵相互关联的两个子条带划为一个扩展组；(2) the two sub-strips in which the generation matrix is related to each other in the encoded data are divided into an extended group;

(3)合并每个扩展组内的两个子条带：(3) Merge the two sub-stripes within each extension group:

(3a)从保存编码数据的所有信息节点中下载并缓存每个扩展组中与右生成矩阵对应的子条带的所有的信息符号，将这些信息符号中位于子条带不同位置的信息符号分别迁移至信息节点Y₁、Y₂、…、Y_ρ中不同的信息节点中，将迁移后的信息符号和未迁移的信息符号组成合并后的子条带的信息符号；(3a) Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and separate the information symbols located at different positions of the sub-strip among these information symbols. _Migrating to different information nodes in the information nodes Y ₁ , Y ₂ , .

(3b)将每个扩展组中两个子条带中位于同一个元校验节点的两个元校验符号在该元校验节点中相加，得到更新后的元校验符号；(3b) adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol;

(3c)利用与每个扩展组中子条带对应的左生成矩阵的互补矩阵对缓存的信息符号进行编码得到校正符号，利用校正符号对左生成矩阵对应的子条带的非元校验符号进行更新；(3c) Use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extension group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to encode the non-element check symbols of the sub-strip corresponding to the left generator matrix to update;

(3d)将每个扩展组中与右生成矩阵对应的子条带中不同位置的非元校验符号分别迁移至校验节点F₁、F₂、…、F_γ中不同的校验节点中；(3d) Migrate the non-meta-check symbols at different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F ₁ , F ₂ , . . . , F _γ respectively ;

(3e)将更新后的元校验符号、更新后的非元校验符号和迁移后的校验符号组成合并后子条带的校验符号；(3e) the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip;

(4)将所有合并后的子条带的数据符号组成扩展后的编码数据。(4) The data symbols of all the merged sub-strips are formed into the extended coded data.

本发明与现有的技术相比具有以下优点：Compared with the prior art, the present invention has the following advantages:

第一，本发明的编码方法中在计算下一个阶段的编码参数时计算出的下一个阶段的校验节点数不少于当前阶段的校验节点数，克服了现有技术在存储扩展时单个条带内编码块的数量保持不变、系统扩展后的容错能力无法适应系统的节点规模的问题，使得利用本发明的编码方法构造的码字具有构造的下一个阶段的码时能够同时提高单个子条带内校验符号的数量、使系统扩展后的容错能力可以适应系统的节点规模的优点。First, in the coding method of the present invention, the number of check nodes in the next stage calculated when the coding parameters of the next stage are calculated is not less than the number of check nodes in the current stage, which overcomes the problem that the prior art requires a single check node during storage expansion. The number of coding blocks in the strip remains unchanged, and the fault-tolerant capability after system expansion cannot adapt to the node scale of the system, so that when the codeword constructed by the coding method of the present invention has the code of the next stage of construction, it can simultaneously improve the single-digit rate. The number of check symbols in the sub-strip enables the extended fault tolerance of the system to adapt to the node scale of the system.

第二，本发明的修复方法中在从未失效的节点选择节点下载数据符号时仅需选择与信息节点数相同数量的节点，克服了现有技术在对失效节点进行修复时需要连接的节点数量大于信息节点数量、修复度高的问题，使得本发明的修复方法具有在对失效节点进行修复时需要连接的节点数量等于信息节点数量，修复度较低的优点。Second, in the repair method of the present invention, when selecting a node from an unfailed node to download data symbols, only the same number of nodes as the number of information nodes need to be selected, which overcomes the number of nodes that need to be connected when repairing a failed node in the prior art The problems of being larger than the number of information nodes and having a high repair degree make the repair method of the present invention have the advantages that the number of nodes to be connected is equal to the number of information nodes when repairing a failed node, and the repair degree is low.

第三，本发明的扩展方法中通过对每个扩展组内的两个子条带进行合并可以实现对编码数据的扩展，由于编码组中有多个码，这样的扩展过程可以多次进行，克服了现有技术仅可以以低的带宽资源消耗扩展一次的问题，使得本发明的扩展方法具有可以以低的带宽资源消耗扩展多次的优点。Third, in the expansion method of the present invention, the expansion of the encoded data can be realized by merging the two sub-stripes in each expansion group. Since there are multiple codes in the encoding group, such an expansion process can be performed multiple times to overcome The problem that the prior art can only be extended once with low bandwidth resource consumption makes the extension method of the present invention have the advantage that it can be extended multiple times with low bandwidth resource consumption.

附图说明Description of drawings

图1为本发明的折叠式可扩展分布式存储编码的流程图；Fig. 1 is the flow chart of the foldable scalable distributed storage encoding of the present invention;

图2为本发明实施例中对待编码数据进行编码的示意图；2 is a schematic diagram of encoding data to be encoded in an embodiment of the present invention;

图3为本发明的折叠式可扩展分布式存储修复的流程图；Fig. 3 is the flow chart of the foldable scalable distributed storage repair of the present invention;

图4为本发明的折叠式可扩展分布式存储扩展的流程图；Fig. 4 is the flow chart of the foldable scalable distributed storage expansion of the present invention;

图5为本发明实施例中对编码数据进行扩展的示意图。FIG. 5 is a schematic diagram of extending encoded data in an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图和实施例，对本发明做进一步的描述。The present invention will be further described below with reference to the accompanying drawings and embodiments.

参照图1，对本发明的折叠式可扩展分布式存储编码方法的实现步骤做进一步的描述。Referring to FIG. 1 , the implementation steps of the foldable scalable distributed storage encoding method of the present invention will be further described.

步骤1，设定第1个阶段的编码参数。Step 1, set the encoding parameters of the first stage.

将信息节点数k₁、校验节点数r₁、元校验节点数s₁设定为第1个阶段的编码参数，其中，k₁、r₁为正整数，s₁为非负整数且小于等于r₁。The number of information nodes k ₁ , the number of check nodes r ₁ , and the number of meta-check nodes s ₁ are set as the encoding parameters of the first stage, wherein k ₁ and r ₁ are positive integers, s ₁ is a non-negative integer and less than or equal to r ₁ .

在本发明实施例中，将第1个阶段的编码参数的信息节点数、校验节点数、元校验节点数分别设置为4、3、1。In the embodiment of the present invention, the number of information nodes, the number of check nodes, and the number of meta-check nodes of the coding parameters in the first stage are set to 4, 3, and 1, respectively.

步骤2，计算下一个阶段的编码参数。Step 2, calculate the encoding parameters of the next stage.

按照下式，计算下一个阶段的信息节点数、校验节点数：Calculate the number of information nodes and check nodes in the next stage according to the following formula:

k′＝2kk'=2k

r′＝2r-sr′=2r-s

从取值范围{s,s+1,…,2r-s}中选取一个与希望以低修复复杂度修复的同时失效的信息节点数最大值相等的值作为下一个阶段的元校验节点数。From the value range {s, s+1,..., 2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the number of meta-check nodes in the next stage .

步骤3，判断当前迭代得到的编码参数的总数是否是等于m，若是，则执行步骤4；否则，将本次确定的编码参数作为当前阶段的编码参数后执行步骤2；m表示设定的拟构造的编码组中码的总数，m的值为大于等于2的整数。Step 3, judge whether the total number of coding parameters obtained by the current iteration is equal to m, if so, execute step 4; otherwise, use the coding parameters determined this time as the coding parameters of the current stage and execute step 2; m represents the set simulation parameter. The total number of codes in the constructed coding group, and the value of m is an integer greater than or equal to 2.

步骤4，确定最终编码参数。Step 4, determine the final encoding parameters.

第1步，将当前迭代得到的编码参数中的校验节点数的值设定为当前迭代得到的编码参数中的元校验节点数；Step 1: Set the value of the number of check nodes in the encoding parameter obtained by the current iteration to the number of meta-check nodes in the encoding parameter obtained by the current iteration;

第2步，将当前迭代得到的信息节点数、校验节点数与确定的元校验节点数组成最终编码参数。In the second step, the number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters.

在本发明实施例中，设定编码组中的码的总数为3个，通过计算、选择可以得到第2个阶段的码的编码参数中信息节点数、校验节点数、元校验点数分别为8、5、2，第3个阶段的码的编码参数即最终编码参数中信息节点数、校验节点数、元校验点数分别为16、8、8。In the embodiment of the present invention, the total number of codes in the coding group is set to 3, and the number of information nodes, the number of check nodes, and the number of meta-check points in the coding parameters of the codes in the second stage can be obtained by calculation and selection, respectively. are 8, 5, and 2, and the coding parameters of the code in the third stage, that is, the number of information nodes, the number of check nodes, and the number of meta-check points in the final coding parameters are 16, 8, and 8, respectively.

步骤5，构建最后一个阶段对应的生成矩阵。Step 5, construct the generator matrix corresponding to the last stage.

其中，G^m表示最后一个阶段对应的生成矩阵，

表示一个r_m行k_m列的矩阵，r_m的取值等于最终编码参数中的校验节点数。Among them, G ^m represents the generator matrix corresponding to the last stage,

Represents a matrix with r _m rows and _k _m columns, and the value of rm is equal to the number of check nodes in the final encoding parameter.

在本发明实施例中，采用一个系统型(24,16)RS作为基本码构造一个生成矩阵

In the embodiment of the present invention, a system type (24,16) RS is used as the basic code to construct a generator matrix

步骤6，设定生成矩阵的折叠规则。Step 6, set the folding rule of the generated matrix.

第1步，将待折叠的矩阵划分成A、B、X、U、V五个矩阵：其中，A表示待折叠的矩阵的第1行至第l₁行组成的一个左信息矩阵，

X表示待折叠的矩阵的第l₄行至前l₅行组成的矩阵，

Step 1: Divide the matrix to be folded into five matrices of A, B, X, U, and V: where A represents a left information matrix composed of the _1st row to the 11th row of the matrix to be folded,

第2步，生成一个与矩阵X行列元素均相等的矩阵，该矩阵的最后μ个非零列的元素全部置零后得到一个左元矩阵X′，

生成一个与矩阵X行列元素均相等的矩阵，该矩阵的前μ个非零列的元素全部置零后得到一个右元矩阵X″；将矩阵U的最后μ个非零列的元素全部置零后得到一个左非元矩阵U′；The second step is to generate a matrix with the same row and column elements as the matrix X. After the elements of the last μ non-zero columns of the matrix are all set to zero, a left-element matrix X' is obtained.

第3步，按照下式，分别对矩阵A、矩阵X′、矩阵U′，矩阵B、矩阵X″、矩阵V进行组合，构建出对待折叠的矩阵折叠后的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵：Step 3: Combine matrix A, matrix X', matrix U', matrix B, matrix X'' and matrix V respectively according to the following formula to construct the matrix corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a generator matrix set:

其中，G′表示左生成矩阵、G″表示右生成矩阵。Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix.

步骤7，构建最后一个阶段的上一个阶段对应的生成矩阵集。Step 7: Construct the generator matrix set corresponding to the previous stage of the last stage.

第1步，根据生成矩阵的折叠规则，对最后一个阶段对应的生成矩阵进行折叠，Step 1, according to the folding rules of the generator matrix, fold the generator matrix corresponding to the last stage,

并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中；And add the generated matrix obtained by folding to the set of generating matrices corresponding to the previous stage of the current stage;

第2步，判断m-1的值是否等于2，若是，则执行步骤10，否则，将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤8。Step 2, judge whether the value of m-1 is equal to 2, if so, go to step 10, otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage and then go to step 8.

步骤8，构造上一阶段对应的生成矩阵集。Step 8: Construct the generator matrix set corresponding to the previous stage.

根据生成矩阵的折叠规则，对当前阶段对应的生成矩阵集中的每一个生成矩阵进行折叠，并将折叠得到的生成矩阵加入到当前阶段的上一个阶段对应的生成矩阵集中。According to the folding rules of the generator matrix, each generator matrix in the generator matrix set corresponding to the current stage is folded, and the folded generator matrix is added to the generator matrix set corresponding to the previous stage of the current stage.

步骤9，判断当前得到的生成矩阵集的总数是否是等于m-1，若是，则执行步骤10，否则，将本次确定的生成矩阵集作为当前阶段对应的生成矩阵集后执行步骤8。Step 9: Determine whether the total number of generator matrix sets currently obtained is equal to m-1, if so, perform step 10; otherwise, perform step 8 after taking the generator matrix set determined this time as the generator matrix set corresponding to the current stage.

在本发明实施例中，为便于描述，将矩阵P_8×16中第1至第8行对应的编码系数行矩阵分别表示为p¹、p²、…、p⁸，用符号

表示保留p^q中第u到v个元素并将其余元素置零后得到的编码系数行矩阵，1≤q≤8，1≤u＜v≤16，用符号E_u′×u′表示一个u′行u′列的单位矩阵，1≤u′≤16，用符号O_u″×v″表示一个u″行v″列的全零矩阵，1≤u″≤16，1≤v″≤16。首先，将最后一阶段对应的生成矩阵G³划分为五个矩阵：左信息矩阵A³＝[E_8×8|Ο_8×8]、右信息矩阵B³＝[O_8×8|E_8×8]、矩阵

矩阵

右非元矩阵

生成两个与矩阵X³相同的矩阵并将其中一个矩阵的最后8个非零列的元素全部置零，得到构造的元矩阵

将另一个矩阵的前8个非零列的元素全部置零，得到构造的右元矩阵

将矩阵U³的最后8个非零列的元素全部置零后得到左非元矩阵

分别对矩阵A³、矩阵X₁ ³、矩阵U₁ ³，矩阵B³、矩阵X₂ ³、矩阵V³进行组合，构建出对矩阵G³折叠后得到的当前阶段的上一个阶段对应的生成矩阵集中的两个相互关联的生成矩阵

即：In this embodiment of the present invention, for the convenience of description, the coding coefficient row matrices corresponding to the 1st to 8th rows in the matrix P _8×16 are represented as p ¹ , p ² , . . . , p ⁸ , respectively, with symbols

Represents the coding coefficient row matrix obtained after retaining the uth to vth elements in p ^q and setting the remaining elements to zero, 1≤q≤8, 1≤u<v≤16, and the symbol E _u′×u′ represents a u The identity matrix of 'row and u' column, 1≤u'≤16, use the symbol O _u"×v" to represent an all-zero matrix with u" row and v" column, 1≤u"≤16, 1≤v"≤16 . First, divide the generator matrix G ³ corresponding to the last stage into five matrices: left information matrix A ³ =[E _8×8 |O _8×8 ], right information matrix B ³ =[O _8×8 |E _{8 ×8} ], matrix

matrix

right non-element matrix

Generate two matrices identical to matrix X ³ and zero out the elements of the last 8 non-zero columns of one of the matrices to get the constructed metamatrix

Set all the elements of the first 8 non-zero columns of another matrix to zero to get the constructed right-element matrix

The left non-element matrix is obtained by setting all the elements of the last 8 non-zero columns of the matrix U ³ to zero

Respectively combine matrix A ³ , matrix X ₁ ³ , matrix U ₁ ³ , matrix B ³ , matrix X ₂ ³ , and matrix V ³ to construct the generation corresponding to the previous stage of the current stage obtained by folding matrix G ³ Two interrelated generator matrices in a matrix set

which is:

G^2,1、G^2,1即为第2个阶段的生成矩阵集中的2个生成矩阵。G ^2,1 and G ^2,1 are the two generator matrices in the generator matrix set in the second stage.

采用同样的方法分别对G^2,1、G^2,2进行折叠，可以分别得到两个生成矩阵G^1,1、G^1,2，G^1,3、G^1,4：Using the same method to fold G ^2,1 and G ^2,2 respectively, two generating matrices G ^1,1 , G ^1,2 , G ^1,3 and G ^1,4 can be obtained respectively:

G^1,1、G^1,2，G^1,3、G^1,4即为第1个阶段的生成矩阵集中的4个生成矩阵。G ^1,1 , G ^1,2 , G ^1,3 , and G ^1,4 are the four generator matrices in the generator matrix set in the first stage.

步骤10，确定各阶段的码。Step 10: Determine the codes of each stage.

将每一个阶段对应的编码参数作为其对应的码的编码参数；将最后一个阶段对应的生成矩阵作为最后一个阶段的码的子条带的生成矩阵；将除最后一个阶段以外的其他每个阶段对应的生成矩阵集中的各生成矩阵作为其对应的码的各子条带的生成矩阵。Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of the corresponding code.

步骤11，将所有阶段的码组合成一个编码组。Step 11: Combine codes of all stages into a coding group.

步骤12，从编码组中选择一个码，所选码的编码参数中信息节点数与校验节点数的和等于希望采用的节点总数。Step 12, select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes to be used.

在本发明实施例中，希望采用的节点总数为7个节点，故从编码组中选择第1个阶段的码。In the embodiment of the present invention, the total number of nodes expected to be used is 7 nodes, so the code of the first stage is selected from the coding group.

步骤13，对待编码数据进行编码。Step 13, encoding the data to be encoded.

所述的子条带的数据符号包括信息符号和校验符号两类，校验符号包括元校验符号和非元校验符号两小类，所述数据符号通过生成矩阵中各行对应的编码系数行矩阵对待编码数据进行编码得到；所述信息符号为左生成矩阵中的左信息矩阵或右生成矩阵中的右信息矩阵对待编码数据进行编码得到的数据符号；所述元校验符号为左生成矩阵中的左元矩阵或右生成矩阵中的右元矩阵对待编码数据进行编码得到的数据符号；所述非元校验符号为左生成矩阵中的左非元矩阵或右生成矩阵中的右非元矩阵对待编码数据进行编码得到的数据符号。The data symbols of the sub-strips include two types of information symbols and check symbols, and the check symbols include two subclasses of meta-check symbols and non-meta-check symbols. The data symbols are generated by encoding coefficients corresponding to each row in the generation matrix. The row matrix is obtained by encoding the data to be encoded; the information symbol is the data symbol obtained by encoding the data to be encoded by the left information matrix in the left generator matrix or the right information matrix in the right generator matrix; the meta-check symbol is the left generator The left element matrix in the matrix or the right element matrix in the right generator matrix is the data symbol obtained by encoding the data to be encoded; the non-element check symbol is the left non-element matrix in the left generator matrix or the right non-element matrix in the right generator matrix. The metamatrix is the data symbol obtained by encoding the data to be encoded.

所述的编码数据保存到相应节点指的是，每一个子条带中不同位置的数据符号保存在不同的节点中，位于不同子条带中相同位置的数据符号保存在同一个节点中；所述的节点包括信息节点和校验节点两类，所述校验节点包括元校验节点与非元校验节点两小类；所述信息节点用于保存信息符号；所述元校验节点用于保存元校验符号；所述非元校验节点用于保存非元校验符号。The said encoded data is stored in the corresponding node means that the data symbols at different positions in each sub-strip are stored in different nodes, and the data symbols at the same position in different sub-strips are stored in the same node; The nodes described include two types of information nodes and check nodes, and the check nodes include two sub-categories of meta-check nodes and non-meta-check nodes; the information nodes are used to store information symbols; the meta-check nodes are used for for storing meta-check symbols; the non-meta-check nodes are used for storing non-meta-check symbols.

参照图2，对本发明实施例中对待编码数据进行编码的实现步骤做进一步的描述。Referring to FIG. 2 , the implementation steps of encoding the data to be encoded in the embodiment of the present invention will be further described.

图2中D₁、D₂、D₃、D₄表示4个信息节点，C₁、C₂、C₃表示3个校验节点，a₁、a₂、…、a₁₆表示16个信息符号，M表示待编码数据，M＝[a₁,a₂,…,a₁₆]^T，T表示转置操作。用第1个阶段的码的4个子条带对应的生成矩阵G^1,1、G^1,2、G^1,3、G^1,4分别对待编码数据M进行编码，得到各子条带的数据如下：In Figure 2, D ₁ , D ₂ , D ₃ , and D ₄ represent 4 information nodes, C ₁ , C ₂ , and C ₃ represent 3 check nodes, and a ₁ , a ₂ , ..., a ₁₆ represent 16 information symbols , M represents the data to be encoded, M=[a ₁ , a ₂ , . . . , a ₁₆ ] ^T , T represents the transposition operation. Use the generator matrix G ^1,1 , G ^1,2 , G ^1,3 , G ^1,4 corresponding to the four sub-stripes of the code in the first stage to encode the data M to be encoded, respectively, to obtain the data of each sub-strip as follows:

第1个子条带包括数据符号a₁、a₂、a₃、a₄、

The first sub-strip includes data symbols a ₁ , a ₂ , a ₃ , a ₄ ,

第2个子条带包括数据符号a₅、a₆、a₇、a₈、

The second sub-strip includes data symbols a ₅ , a ₆ , a ₇ , a ₈ ,

第3个子条带包括数据符号a₉、a₁₀、a₁₁、a₁₂、

The third sub-strip includes data symbols a ₉ , a ₁₀ , a ₁₁ , a ₁₂ ,

第4个子条带包括数据符号a₁₃、a₁₄、a₁₅、a₁₆、

p⁷M、p⁸M。The 4th substrip includes data symbols a ₁₃ , a ₁₄ , a ₁₅ , a ₁₆ ,

^p7M , ^p8M .

由以上4个子条带的数据共同组成了一个编码数据。将第1个子条带中的第1至第4个信息符号依次保存在信息节点D₁、D₂、D₃、D₄中，第1至第3个校验符号依次保存在校验节点C₁、C₂、C₃中；其他子条带的数据符号采用与子条带1相同的方式进行保存。The data of the above 4 sub-strips together constitute one encoded data. Store the 1st to 4th information symbols in the first sub-strip in the information nodes D ₁ , D ₂ , D ₃ , and D ₄ in sequence, and store the 1st to 3rd check symbols in the check node C in sequence ₁ , C ₂ , and C ₃ ; the data symbols of other sub-stripes are stored in the same way as sub-strip 1 .

参照图3，对本发明的折叠式可扩展分布式存储编码修复方法的实现步骤做进一步的描述。Referring to FIG. 3 , the implementation steps of the foldable and scalable distributed storage code repair method of the present invention are further described.

步骤1，将每个编码数据采用同一个码编码的失效节点的总数大于该码编码参数中的校验节点数的情况放弃修复，其余情况执行步骤2。Step 1: Abandon repair if the total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and step 2 is performed in other cases.

在本发明实施例中，假设图2中所有的节点中有2个节点失效。In this embodiment of the present invention, it is assumed that 2 of all the nodes in FIG. 2 fail.

步骤2，判断所有失效节点的失效信息节点的总数是否为非0值，若是，则执行步骤3；否则，判定不存在失效信息节点但存在失效校验节点后执行步骤6。Step 2, determine whether the total number of failure information nodes of all failed nodes is a non-zero value, if so, go to step 3;

步骤3，判断

是否成立，若是，则执行步骤4，否则，执行步骤7；其中，α表示所有失效节点的失效信息节点的总数，

表示每个编码数据采用同一个码编码的参数中的元校验节点数、λ表示所有失效校验节点中失效元校验节点的总数。Step 3, judge

Whether it is true, if yes, go to step 4, otherwise, go to step 7; where α represents the total number of failure information nodes of all failed nodes,

Represents the number of meta-check nodes in the parameters encoded with the same code for each encoded data, and λ represents the total number of failed meta-check nodes in all failed check nodes.

在本发明实施例中，假设图2中失效的2个节点为D₃、D₄，即失效信息节点的总数为2个，失效元校验节点的数量为0个。In the embodiment of the present invention, it is assumed that the two failed nodes in FIG. 2 are D ₃ and D ₄ , that is, the total number of failed information nodes is two, and the number of failed meta-check nodes is zero.

步骤4，划分数据符号。Step 4, dividing the data symbols.

从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号，从与失效节点保存同一个编码数据的每个未失效的元校验节点中下载该元校验节点保存的所有的元校验符号，从与失效节点保存同一个编码数据的所有未失效的非元校验节点中任意选择η个非元校验节点下载该非元校验节点中所有的非元校验符号；将属于同一个子条带的数据符号划为同一个符号组，将属于同一个编码数据的符号组划为同一个数据集合；其中η的值等于

Download all information symbols stored by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download from each non-invalidated meta-check node that stores the same encoded data as the failed node. For all the meta-check symbols stored in the meta-check node, select n non-meta-check nodes arbitrarily from all non-meta-check nodes that store the same encoded data as the failed node to download the non-meta-check node All non-meta-check symbols in ; the data symbols belonging to the same sub-strip are divided into the same symbol group, and the symbol groups belonging to the same encoded data are divided into the same data set; wherein the value of n is equal to

在本发明实施例中，从图2所示的节点D₁、D₂中下载所有的信息符号，从元校验节点C₁中下载校验符号

选择非元校验节点C₂并从C₂中下载校验符号

将下载的数据符号划分为4个符号组：对应第1个子条带的符号组为

对应第2个子条带的符号组为

对应第3个子条带的符号组为

对应第4个子条带的符号组为

In this embodiment of the present invention, all information symbols are downloaded from the nodes D ₁ and D ₂ shown in FIG. 2 , and check symbols are downloaded from the meta-check node C ₁

Select non-meta check node C ₂ and download check symbols from C ₂

The downloaded data symbols are divided into 4 symbol groups: the symbol group corresponding to the first sub-strip is

The symbol group corresponding to the second sub-strip is

The symbol group corresponding to the third sub-strip is

The symbol group corresponding to the 4th sub-strip is

步骤5，对每个数据集合进行处理。Step 5, process each data set.

第1步，按照下式，对每一个数据集合中的每一个符号组进行编号：Step 1, according to the following formula, number each symbol group in each data set:

其中，j_h,c表示第h个数据集合中第c个符号组的编号，

表示向上取证操作，q_h,c表示第h个数据集合中第c个符号组对应子条带的生成矩阵中第一行元素中非零元素的列的序号，

Represents the upward forensic operation, q _h,c represents the sequence number of the column of the non-zero element in the first row element in the generator matrix of the c-th symbol group corresponding to the sub-strip in the h-th data set,

第2步，对于每一个数据集合从该数据集合中编号为1的符号组依次对该数据集合中的每个符号组进行译码-消除操作。Step 2, for each data set, sequentially perform decoding-elimination operations on each symbol group in the data set from the symbol group numbered 1 in the data set.

所述的译码-消除操作是先进行译码操作再进行消除操作，译码操作指的是，利用与基本码对应的译码方法对该数据集合中的当前符号组中的数据符号进行译码，恢复出失效信息节点对应的信息符号，并将其加入到该数据集合中的当前符号组中；消除操作指的是，利用当前符号组中的信息符号对该数据集合中编号大于当前符号组的编号的每一个符号组中的校验符号进行消除；消除是指若校验符号的编码系数行矩阵中与信息符号对应的元素的值为非零值，则将这些信息符号与对应的非零值相乘后从校验符号中减去，得到消除后的校验符号。The decoding-elimination operation is to first perform a decoding operation and then perform an elimination operation. The decoding operation refers to decoding the data symbols in the current symbol group in the data set by using a decoding method corresponding to the basic code. code, recover the information symbol corresponding to the failed information node, and add it to the current symbol group in the data set; the elimination operation refers to using the information symbol in the current symbol group to use the information symbol in the data set with a number greater than the current symbol The check symbols in each symbol group of the group number are eliminated; elimination means that if the value of the elements corresponding to the information symbols in the coding coefficient row matrix of the check symbols is non-zero, then these information symbols are compared with the corresponding ones. The non-zero values are multiplied and subtracted from the check symbol to obtain the eliminated check symbol.

在本发明实施例中，与图2所示的第1至第4个子条带对应的符号集合依次被编号为1、2、3、4。对编号为1的符号组

进行RS译码恢复出信息符号a₃,a₄；然后，利用编号为1的符号组中的信息符号a₁,a₂,a₃,a₄对编号为2、3、4的每一个符号组中的校验符号进行消除：在编号为2的符号组中，由校验符号

的编码系数行矩阵

可知

中与信息符号a₁,a₂,a₃,a₄对应的元素的值为非零值，故从

中减去

得到消除后的校验符号

编号为2的符号组被更新为

采用同样的方法对编号为3、4的符号组中的校验符号进行消除后，编号为3的符号组保持不变、编号为4的符号组更新为

采用与上述过程相同的方法，可以依次恢复出信息符号a₇、a₈，a₁₁、a₁₂，a₁₅、a₁₆。In this embodiment of the present invention, the symbol sets corresponding to the first to fourth sub-strips shown in FIG. 2 are numbered as 1, 2, 3, and 4 in sequence. pair number 1 symbol group

Perform RS decoding to recover the information symbols a ₃ , a ₄ ; then, use the information symbols a ₁ , a ₂ , a ₃ , a ₄ in the symbol group numbered 1 to pair each symbol numbered 2, 3, and 4 The check symbol in the group is eliminated: in the symbol group numbered 2, the check symbol is

The encoded coefficient row matrix of

know

The values of the elements corresponding to the information symbols a ₁ , a ₂ , a ₃ , and a ₄ are non-zero, so from

subtract

Get the check symbol after elimination

Symbol group number 2 is updated to

After the check symbols in the symbol groups numbered 3 and 4 are eliminated by the same method, the symbol group numbered 3 remains unchanged, and the symbol group numbered 4 is updated to

Using the same method as the above process, the information symbols a ₇ , a ₈ , a ₁₁ , a ₁₂ , a ₁₅ , and a ₁₆ can be recovered in sequence.

第3步，所有的数据集合均已进行处理后执行步骤9。In step 3, step 9 is executed after all data sets have been processed.

步骤6，划分信息符号。Step 6, dividing information symbols.

从与失效节点保存同一个编码数据的每个信息节点中下载该信息节点保存的所有的信息符号，将属于同一个子条带的信息符号划为同一个符号组；将属于同一个编码数据的符号组划为同一个数据集合后执行步骤9。Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data After the group is divided into the same data set, perform step 9.

步骤7，恢复失效信息节点对应的信息符号。Step 7, restore the information symbol corresponding to the failed information node.

第1步，从与失效节点保存同一个编码数据的每个未失效的信息节点中下载该信息节点保存的所有的信息符号，从与失效节点保存同一个编码数据的所有未失效的元校验节点中任意选择与α的值相等数量的元校验节点下载该元校验节点保存的所有的元校验符号；将属于同一个子条带的数据符号划为同一个符号组；Step 1: Download all the information symbols saved by the information node from each non-invalidated information node that saves the same encoded data as the failed node, and check all the non-invalidated meta-checks from all the non-invalidated information nodes that store the same encoded data as the failed node. The node arbitrarily selects the same number of meta-check nodes as the value of α to download all the meta-check symbols saved by the meta-check node; divides the data symbols belonging to the same sub-strip into the same symbol group;

第2步，利用与基本码对应的译码方法对每个符号组中的数据符号进行译码，恢复出该符号组中与失效信息节点对应的信息符号，并将其加入到该符号组中；Step 2: Use the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add them to the symbol group ;

第3步，将属于同一个编码数据的符号组划为同一个数据集合。Step 3: Divide the symbol groups belonging to the same encoded data into the same data set.

步骤8，判断所有失效节点中是否存在失效校验节点，若是，则执行步骤9，否则，执行步骤10。Step 8, determine whether there is a failed check node in all the failed nodes, if yes, go to Step 9, otherwise, go to Step 10.

步骤9，对信息符号进行编码。Step 9, encode the information symbol.

利用失效校验节点对应的编码系数行矩阵对每个数据集合中所有符号组中所有的信息符号进行编码，恢复出失效校验节点对应的校验符号，并将恢复后的校验符号加入到与其对应同一个子条带的符号组中。All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same substrip.

步骤10，保存恢复出的数据符号。Step 10, save the recovered data symbols.

增加与α的值相等数量的新的信息节点，增加与所有失效节点的失效校验节点的总数相等数量的新的校验节点，将恢复出的属于同一个信息节点的信息符号保存在同一个信息节点中，将恢复出的属于同一个校验节点的校验符号保存在同一个校验节点中。Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node.

本发明实施例中，将恢复出的信息符号a₃、a₇、a₁₁、a₁₅保存到一个新的节点D₃′，将恢复出的信息符号a₄、a₈、a₁₂、a₁₆保存在另一个新的节点D₄′。In the embodiment of the present invention, the recovered information symbols a ₃ , a ₇ , a ₁₁ , and a ₁₅ are stored in a new node D ₃ ′, and the recovered information symbols a ₄ , a ₈ , a ₁₂ , and a ₁₆ are stored Save in another new node _D4 '.

步骤11，用新的节点替换掉失效的节点。Step 11, replace the failed node with a new node.

本发明实施例中，用节点D₃′、D₄′替换掉失效的节点D₃、D₄。In the embodiment of the present invention, the failed nodes D ₃ , D ₄ are replaced with nodes D ₃ ′, D ₄ ′.

参照图4，对本发明的折叠式可扩展分布式存储编码扩展方法的实现步骤做进一步的描述。Referring to FIG. 4 , the implementation steps of the foldable scalable distributed storage coding extension method of the present invention will be further described.

步骤1，添加新的节点。Step 1, add a new node.

表示编码数据当前采用的码的编码参数中的信息节点数，

表示编码数据当前采用的码的编码参数中的校验节点数。Add _ρ information nodes _Y ₁ , Y ₂ , _. , ..., F _γ , where,

Indicates the number of check nodes in the encoding parameter of the code currently used by the encoded data.

参照图5，对本发明实施例中编码数据扩展的实现步骤做进一步的描述。Referring to FIG. 5 , the implementation steps of coded data extension in the embodiment of the present invention will be further described.

本发明实施例中对图2所示的编码数据进行扩展，图5(a)表示在图2的基础上添加节点后的编码数据存储示意图，其中，D₅、D₆、D₇、D₈分别表示在图2的基础上新添加的4个信息节点，C₄、C₅分别表示在图2的基础上新添加的2个校验节点。图5(b)表示对编码数据扩展后的编码数据存储示意图。In the embodiment of the present invention, the coded data shown in FIG. 2 is extended, and FIG. 5(a) shows a schematic diagram of coded data storage after adding nodes on the basis of FIG. 2 , wherein D ₅ , D ₆ , D ₇ , and D ₈ Respectively represent four information nodes newly added on the basis of FIG. 2 , and C ₄ and C ₅ respectively represent two newly added check nodes on the basis of FIG. 2 . FIG. 5(b) is a schematic diagram showing the storage of encoded data after extending the encoded data.

步骤2，将编码数据中生成矩阵由同一个矩阵折叠得到的两个子条带划为一个扩展组。Step 2: Divide the two sub-strips obtained by folding the generated matrix from the same matrix in the encoded data into an extended group.

步骤3，合并每个扩展组内的两个子条带。Step 3, merge the two sub-stripes within each expansion group.

第1步，从保存编码数据的所有信息节点中下载并缓存每个扩展组中与右生成矩阵对应的子条带的所有的信息符号，将这些信息符号中位于子条带不同位置的信息符号分别迁移至信息节点Y₁、Y₂、…、Y_ρ中不同的信息节点中，将迁移后的信息符号和未迁移的信息符号组成合并后的子条带的信息符号；Step 1: Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and put the information symbols located in different positions of the sub-strip among these information symbols. _Respectively migrate to different information nodes in the information nodes Y ₁ , Y ₂ , .

第2步，将每个扩展组中两个子条带中位于同一个元校验节点的两个元校验符号在该元校验节点中相加，得到更新后的元校验符号；Step 2, adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol;

第3步，利用与每个扩展组中子条带对应的左生成矩阵的互补矩阵对缓存的信息符号进行编码得到校正符号，利用校正符号对左生成矩阵对应的子条带的非元校验符号进行更新。The third step is to use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extended group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to check the non-element of the sub-strip corresponding to the left generator matrix. symbols are updated.

所述的每个扩展组中子条带对应的左生成矩阵的互补矩阵指的是，将与所述扩展组中子条带对应的左生成矩阵对应的被折叠的矩阵表示为矩阵

将所述扩展组中子条带对应的左生成矩阵中的左非元矩阵表示为矩阵W,用矩阵

的第l₆′行至第l₇′行的元素组成一个矩阵

表示编码数据当前采用的码的编码参数中的元校验节点数，将

得到的矩阵表示为矩阵T，用矩阵T中的所有非零列组成扩展组中子条带对应的左生成矩阵的互补矩阵。The complementary matrix of the left generator matrix corresponding to the sub-strip in each expansion group refers to that the collapsed matrix corresponding to the left generator matrix corresponding to the sub-strip in the expansion group is represented as a matrix

Denote the left non-element matrix in the left generator matrix corresponding to the sub-strip in the extended group as a matrix W, and use the matrix

The elements from row l ₆ ' to row l ₇ ' form a matrix

Indicates the number of meta-check nodes in the encoding parameters of the code currently used by the encoded data, set

The resulting matrix is denoted as matrix T, and all non-zero columns in matrix T are used to form the complementary matrix of the left generator matrix corresponding to the substrips in the extended group.

第4步，将每个扩展组中与右生成矩阵对应的子条带中不同位置的非元校验符号分别迁移至校验节点F₁、F₂、…、F_γ中不同的校验节点中；Step 4: Migrate the non-element check symbols from different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F ₁ , F ₂ , . . . , F _γ respectively middle;

第5步，将更新后的元校验符号、更新后的非元校验符号和迁移后的校验符号组成合并后子条带的校验符号。In step 5, the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip.

本发明实施例中，由于图5(a)所示的第1个子条带的生成矩阵G^1,1和第2个子条带的生成矩阵G^1,2是相互关联的，故第1个子条带与第2个子条带划为一个扩展组，同样的，第3个子条带与第4个子条带划为一个扩展组。对于由第1个子条带与第2个子条带组成的扩展组，下载并缓存第2个子条带中所有的信息符号a₅、a₆、a₇、a₈，将这些信息符号依次迁移至节点D₅、D₆、D₇、D₈，迁移后的符号和未迁移的信息符号a₁、a₂、a₃、a₄组成了合并后的子条带的信息符号。在节点C₁内部，通过合并

与

得到更新后的元校验符号

即

左生成矩阵G^1,1对应的互补矩阵为

通过利用互补矩阵对缓存的信息符号a₅、a₆、a₇、a₈进行编码得到校正符号

利用得到的两个校正符号分别对

进行消除，即从

中减去

从

中减去

得到更新后的两个校验符号

将第2个子条带中的非元校验符号

分别迁移至节点C₄、C₅；将更新后的元校验符号、更新后的非元校验符号、迁移后的校验符号组成合并后子条带所有的校验符号。图5(b)中左侧的子条带为对第1个子条带、第2个子条带合并后得到的结果。采用同样的方式对第3个子条带、第4个子条带进行合并可以得到另外一个新的子条带。图5(b)中右侧的子条带为对第3个子条带、第4个子条带合并后得到的结果。In this embodiment of the present invention, since the generator matrix G ^1,1 of the first sub-strip and the generator matrix G ^1,2 of the second sub-strip shown in FIG. 5(a) are related to each other, the first sub-strip is The strip and the second sub-strip are grouped into an extended group, and similarly, the third and fourth sub-strips are grouped into an extended group. For the extended group consisting of the first sub-strip and the second sub-strip, download and cache all the information symbols a ₅ , a ₆ , a ₇ , a ₈ in the second sub-strip, and migrate these information symbols to Nodes D ₅ , D ₆ , D ₇ , D ₈ , the migrated symbols and the non-migrated information symbols a ₁ , a ₂ , a ₃ , and a ₄ form the information symbols of the merged sub-strip. Inside node _C1 , by merging

and

Get the updated meta check symbol

which is

The complementary matrix corresponding to the left generator matrix G ^1,1 is

Correction symbols are obtained by encoding the buffered information symbols a ₅ , a ₆ , a ₇ , a ₈ with complementary matrices

Using the obtained two correction symbols, respectively

to eliminate, i.e. from

subtract

from

subtract

Get the updated two check symbols

put the non-meta check symbols in the 2nd substrip

Migrate to nodes C ₄ and C ₅ respectively; compose the updated meta-check symbols, updated non-meta-check symbols, and migrated check symbols to form all the check symbols of the merged sub-stripes. The sub-strip on the left in Figure 5(b) is the result obtained by merging the first sub-strip and the second sub-strip. In the same way, the third sub-strip and the fourth sub-strip can be merged to obtain another new sub-strip. The sub-strip on the right in Figure 5(b) is the result obtained by merging the third sub-strip and the fourth sub-strip.

步骤4，将所有合并后的子条带的数据符号组成扩展后的编码数据。In step 4, the data symbols of all merged sub-strips are formed into the expanded coded data.

在本发明实施例中，将图5(b)中所示的两个新的子条带所有的数据符号组成了扩展后的编码数据。In the embodiment of the present invention, all the data symbols of the two new sub-strips shown in FIG. 5(b) form the extended coded data.

Claims

1. a foldable scalable distributed storage coding method is characterized in that, calculating the coding parameters of the next stage, constructing the corresponding generating matrix set of the previous stage according to the folding rule of the generating matrix, determining the code corresponding to each stage, The codes corresponding to all stages form a coding group; the steps of the method include the following:

(1) Set the encoding parameters of the first stage:

The number of information nodes k ₁ , the number of check nodes r ₁ , and the number of meta-check nodes s ₁ are set as the encoding parameters of the first stage, wherein k ₁ and r ₁ are positive integers, s ₁ is a non-negative integer and less than or equal to r ₁ ;

(2) Calculate the encoding parameters of the next stage:

(2a) Calculate the number of information nodes and check nodes in the next stage according to the following formula:

k'=2k

r′=2r-s

Among them, k' and r' respectively represent the number of information nodes and check nodes in the next stage of the current stage, and k, r and s respectively represent the number of information nodes, check nodes and meta-check nodes in the current stage;

(2b) From the value range {s,s+1,...,2r-s}, select a value equal to the maximum number of information nodes that fail to be repaired at the same time with low repair complexity as the meta-calibration of the next stage Check the number of nodes;

(3) Judging whether the total number of coding parameters obtained by the current iteration is equal to m, and if so, execute step (4); otherwise, execute step (2) after taking the coding parameters determined this time as the coding parameters of the current stage; m represents The total number of codes in the set coding group to be constructed, and the value of m is an integer greater than or equal to 2;

(4) Determine the final encoding parameters:

(4a) the value of the number of check nodes in the encoding parameter obtained by the current iteration is set to the number of meta-check nodes in the encoding parameter obtained by the current iteration;

(4b) The number of information nodes obtained by the current iteration, the number of check nodes and the determined number of meta-check nodes are formed into final coding parameters;

(5) Construct the generator matrix corresponding to the last stage:

Taking a systematic MDS code as the basic code, and using the generator matrix construction method of the basic code, a generator matrix G ^m corresponding to the last stage is constructed:

Among them, G ^m represents the generator matrix corresponding to the last stage,

(6) Set the folding rules of the generated matrix:

(6b) Generate a matrix whose row and column elements are equal to those of matrix X, and obtain a left-element matrix X′ after all the elements of the last μ non-zero columns of the matrix are set to zero,

(6c) According to the following formula, combine matrix A, matrix X', matrix U', matrix B, matrix X'', and matrix V respectively to construct the generation corresponding to the previous stage of the current stage after the matrix to be folded is folded Two interrelated generator matrices in a matrix set:

Among them, G′ represents the left generator matrix, and G″ represents the right generator matrix;

(7) Construct the generation matrix set corresponding to the previous stage of the last stage:

(7a) According to the folding rule of the generator matrix, the generator matrix corresponding to the last stage is folded, and the folded generator matrix is added to the generator matrix set corresponding to the previous stage of the current stage;

(7b) judge whether the value of m-1 is equal to 2, if yes, then execute step (10), otherwise, perform step (8) after taking the generation matrix set determined by this iteration as the generation matrix set corresponding to the current stage;

(8) Construct the generating matrix set corresponding to the previous stage:

According to the folding rule of the generator matrix, fold each generator matrix in the generator matrix set corresponding to the current stage, and add the generator matrix obtained by folding to the generator matrix set corresponding to the previous stage of the current stage;

(9) Determine whether the number of generator matrices in the generator matrix set obtained by the current iteration is equal to 2 ^m-1 , if so, perform step (10), otherwise, take the generator matrix set determined this time as the generator matrix set corresponding to the current stage. Execute step (8);

(10) Determine the code of each stage:

Take the encoding parameter corresponding to each stage as the encoding parameter of its corresponding code; take the generator matrix corresponding to the last stage as the generator matrix of the sub-strip of the code of the last stage; take every other stage except the last stage Each generator matrix in the corresponding generator matrix set is used as the generator matrix of each sub-strip of its corresponding code;

(11) code of all stages is combined into a coding group;

(12) select a code from the coding group, and the sum of the number of information nodes and the number of check nodes in the coding parameters of the selected code is equal to the total number of nodes expected to be adopted;

(13) Encode the data to be encoded:

Divide the data to be encoded into t information symbols on average, _t =km; encode the data to be encoded with the corresponding generator matrix of each sub-strip of the selected code, obtain the data symbols of this sub-strip, by all sub-strips The data symbols of the strip make up the encoded data; the encoded data is stored in the corresponding node.

2. a kind of foldable scalable distributed storage coding method according to claim 1, is characterized in that, the data symbol of the sub-strip described in the step (13) comprises two kinds of information symbols and check symbols, The verification symbols include two sub-categories of meta-check symbols and non-meta-check symbols, and the data symbols are obtained by encoding the data to be encoded by the coding coefficient row matrix corresponding to each row in the generator matrix; the information symbols are the left generator matrix in the left generator matrix. The data symbol obtained by encoding the data to be encoded by the right information matrix in the information matrix or the right generator matrix; the element check symbol is the left element matrix in the left generator matrix or the right element matrix in the right generator matrix to encode the data to be encoded The obtained data symbol; the non-element check symbol is the data symbol obtained by encoding the data to be encoded by the left non-element matrix in the left generator matrix or the right non-element matrix in the right generator matrix.

3. a kind of foldable scalable distributed storage coding method according to claim 1, is characterized in that, described in step (13), to save coded data in corresponding node refers to, in each sub-strip Data symbols at different positions are stored in different nodes, and data symbols at the same position in different sub-strips are stored in the same node; the nodes include two types of information nodes and check nodes, and the check nodes include There are two types of meta-check nodes and non-meta-check nodes; the information nodes are used to store information symbols; the meta-check nodes are used to store meta-check symbols; the non-meta-check nodes are used to store non-meta-check nodes. Check mark.

4. A kind of foldable scalable distributed storage coding repair method of foldable scalable distributed storage coding according to claim 1 is characterized in that, each data set is processed, information symbols are coded, stored The recovered data symbols; the steps of the method include the following:

(1) The total number of failed nodes encoded by the same code for each encoded data is greater than the number of check nodes in the code encoding parameters, and the repair is abandoned, and step (2) is performed in the remaining cases;

(2) Determine whether the total number of failed information nodes of all failed nodes is a non-zero value, and if so, execute step (3); otherwise, execute step (6) after determining that there is no failed information node but there is a failed check node;

(3) Judgment

(4) Divide data symbols:

(5) Process each data set:

(5a) Number each symbol group in each data set according to the following formula:

Among them, j _h,c represents the number of the c-th symbol group in the h-th data set,

(5b) for each data set, perform decoding-elimination operation on each symbol group in the data set in turn from the symbol group numbered 1 in the data set;

(5c) Step (9) is performed after all data sets have been processed;

(6) Divide information symbols:

Download all the information symbols stored by the information node from each information node that stores the same encoded data as the failed node, and divide the information symbols belonging to the same sub-strip into the same symbol group; classify the symbols belonging to the same encoded data Step (9) is performed after the group is divided into the same data set;

(7) Restore the information symbol corresponding to the failed information node:

(7a) Download all the information symbols saved by the information node from each non-invalidated information node that stores the same encoded data as the failed node, and download all the non-invalidated meta-check nodes that store the same encoded data with the failed node. Arbitrarily select a number of meta-check nodes equal to the value of α to download all meta-check symbols saved by the meta-check node; divide the data symbols belonging to the same sub-strip into the same symbol group;

(7b) utilize the decoding method corresponding to the basic code to decode the data symbols in each symbol group, recover the information symbols corresponding to the failed information nodes in the symbol group, and add it to the symbol group;

(7c) dividing the symbol groups belonging to the same encoded data into the same data set;

(8) Judging whether there is a failed check node in all the failed nodes, if so, execute step (9), otherwise, execute step (10);

(9) Encode the information symbol:

All information symbols in all symbol groups in each data set are encoded using the coding coefficient row matrix corresponding to the failed check node, the check symbol corresponding to the failed check node is recovered, and the recovered check symbol is added to the in the symbol group corresponding to the same sub-strip;

(10) Save the recovered data symbols:

Add a number of new information nodes equal to the value of α, add a number of new check nodes equal to the total number of failed check nodes of all failed nodes, and save the recovered information symbols belonging to the same information node in the same In the information node, the recovered check symbols belonging to the same check node are stored in the same check node;

(11) Replace the failed node with a new node.

5. a kind of foldable scalable distributed storage coding repair method according to claim 4, is characterized in that, the decoding-elimination operation described in the step (5b) is to carry out the decoding operation and then carry out the elimination operation again, The decoding operation refers to decoding the data symbols in the current symbol group in the data set by using the decoding method corresponding to the basic code, recovering the information symbols corresponding to the failed information nodes, and adding them to the data. In the current symbol group in the set; the elimination operation refers to using the information symbols in the current symbol group to eliminate the check symbols in each symbol group whose number is greater than the number of the current symbol group in the data set; elimination refers to If the values of the elements corresponding to the information symbols in the coding coefficient row matrix of the check symbols are non-zero values, multiply these information symbols by the corresponding non-zero values and then subtract them from the check symbols to obtain the eliminated calibration symbols. test symbol.

6. a kind of foldable scalable distributed storage coding expansion method of foldable scalable distributed storage coding according to claim 1, is characterized in that, the two subfolders obtained by the same matrix folding in the coded data are generated by the matrix. The strip is divided into an extended group, two sub-strips in the same extended group are merged, and the data symbols of all the new sub-strips are formed into expanded new encoded data. The steps of the method include the following:

(1) Add a new node:

Add _ρ information nodes _Y ₁ , Y ₂ , _. , ..., F _γ , where,

(2) the two sub-strips in which the generation matrix is related to each other in the encoded data are divided into an extended group;

(3) Merge the two sub-stripes within each extension group:

(3a) Download and cache all the information symbols of the sub-strip corresponding to the right generator matrix in each extension group from all the information nodes that save the encoded data, and separate the information symbols located at different positions of the sub-strip among these information symbols. _Migrating to different information nodes in the information nodes Y ₁ , Y ₂ , .

(3b) adding the two meta-check symbols located in the same meta-check node in the two sub-strips in each extended group in the meta-check node to obtain the updated meta-check symbol;

(3c) Use the complementary matrix of the left generator matrix corresponding to the sub-strip in each extension group to encode the buffered information symbols to obtain correction symbols, and use the correction symbols to encode the non-element check symbols of the sub-strip corresponding to the left generator matrix to update;

(3d) Migrate the non-meta-check symbols at different positions in the sub-strip corresponding to the right generator matrix in each extension group to different check nodes in the check nodes F ₁ , F ₂ , . . . , F _γ respectively ;

(3e) the updated meta-check symbol, the updated non-meta-check symbol and the migrated check symbol are formed into the check symbol of the merged sub-strip;

(4) The data symbols of all the merged sub-strips are formed into the extended coded data.

7. a kind of foldable scalable distributed storage coding expansion method according to claim 6, is characterized in that, the complementary matrix of the left generating matrix corresponding to the sub-strip in each expansion group described in step (3c) Refers to denoting the collapsed matrix corresponding to the left generating matrix corresponding to the substrip in the expanded group as a matrix

The elements from row l ₆ ' to row l ₇ ' form a matrix