CN103326731B - A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic - Google Patents
A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic Download PDFInfo
- Publication number
- CN103326731B CN103326731B CN201310130164.1A CN201310130164A CN103326731B CN 103326731 B CN103326731 B CN 103326731B CN 201310130164 A CN201310130164 A CN 201310130164A CN 103326731 B CN103326731 B CN 103326731B
- Authority
- CN
- China
- Prior art keywords
- scale
- alpha
- interval
- probability
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 230000002596 correlated effect Effects 0.000 title claims description 8
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 21
- 230000008569 process Effects 0.000 claims abstract description 19
- 230000007704 transition Effects 0.000 claims description 3
- 238000003491 array Methods 0.000 claims 2
- 230000014509 gene expression Effects 0.000 claims 2
- 238000005516 engineering process Methods 0.000 abstract description 6
- 230000000694 effects Effects 0.000 abstract description 2
- 230000000875 corresponding effect Effects 0.000 description 14
- 238000005259 measurement Methods 0.000 description 13
- 238000004364 calculation method Methods 0.000 description 7
- 238000002474 experimental method Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 230000003321 amplification Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 238000003199 nucleic acid amplification method Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
Landscapes
- Error Detection And Correction (AREA)
Abstract
本发明公开了一种基于分布式算术编码的隐马尔科夫相关信源编码方法,包括以下步骤:产生信源;对隐马尔科夫相关信源编码;利用前向算法进行的解码。本发明通过使用DAC来压缩记忆相关的信源,在具体实现中,将信源之间的相关性建模为一个隐马尔科夫过程,实验结果表明,新方案的执行结果接近斯理篇‑伍夫理论界限,较好的解决了现有基于熵编码的SWC技术,重叠区域后,解码端将存在有歧义的码字和SWC对称的问题。本发明较好的利用现有的技术,实验效果好,有着很好的应用价值。
The invention discloses a method for encoding hidden Markov correlation information sources based on distributed arithmetic coding, which comprises the following steps: generating information sources; encoding hidden Markov correlation information sources; and decoding by forward algorithm. The present invention uses DAC to compress the information source related to memory, in specific implementation, the correlation between information sources is modeled as a Hidden Markov process, experimental result shows, the executive result of new scheme is close to Si Li- The boundary of Wooff theory better solves the existing SWC technology based on entropy coding. After overlapping regions, there will be ambiguous codewords and SWC symmetry problems at the decoding end. The present invention preferably utilizes the existing technology, has good experimental effect and has good application value.
Description
技术领域technical field
本发明属于分布式算术编码领域,尤其涉及一种基于分布式算术编码的隐马尔科夫相关信源编码方法。The invention belongs to the field of distributed arithmetic coding, in particular to a hidden Markov correlation information source coding method based on distributed arithmetic coding.
背景技术Background technique
分布式算术编码(Distributed Arithmetic Coding,DAC)可以有效地应用于斯理篇-伍夫编码,特别是对于短数据块。最近,一些基于熵编码的SWC技术被提出,比如分布式算术编码(Distributed Arithmetic Coding,DAC)和重叠的准算术编码(OverlappedQuasi-Arithmetic,OQAC),这些技术可以看做是经典算术编码(Arithmetic Coding AC)的扩展,其原理是允许码率概率区间的重叠以实现在码率H(X|Y)≤R<H(x)的条件下编码信源X。引入重叠区域后,将产生一个较大的最终区间和一个较短的码字。然而,同时解码端将存在有歧义的码字。一个软连接解码器利用边信息Y来解码X。之后,研究者提出了时间共享DAC(time-shared DAC,TS-DAC)以解决对称的SWC问题。为了实现率增量SWC,研究者提出了率自适应DAC。Distributed Arithmetic Coding (DAC) can be effectively applied to Slipper-Wooffer coding, especially for short data blocks. Recently, some SWC technologies based on entropy coding have been proposed, such as Distributed Arithmetic Coding (DAC) and Overlapped Quasi-Arithmetic (OQAC), which can be regarded as classical arithmetic coding (Arithmetic Coding) The principle of the extension of AC) is to allow the overlapping of the code rate probability intervals to realize the encoding of the source X under the condition that the code rate H(X|Y)≤R<H(x). The introduction of overlapping regions results in a larger final interval and a shorter codeword. However, at the same time there will be ambiguous codewords at the decoding end. A soft-connected decoder utilizes side information Y to decode X. Afterwards, the researchers proposed a time-shared DAC (time-shared DAC, TS-DAC) to solve the symmetric SWC problem. In order to realize the rate-incremental SWC, the researchers proposed a rate-adaptive DAC.
发明内容Contents of the invention
本发明提供了一种基于分布式算术编码的隐马尔科夫相关信源编码方法,旨在解决现有基于熵编码的SWC技术,重叠区域后,解码端将存在有歧义的码字和SWC对称的问题。The present invention provides a hidden Markov correlation source coding method based on distributed arithmetic coding, which aims to solve the existing SWC technology based on entropy coding. After overlapping regions, there will be ambiguous codewords and SWC symmetry at the decoding end The problem.
本发明的目的在于提供一种基于分布式算术编码的隐马尔科夫相关信源编码方法,所述基于分布式算术编码的隐马尔科夫相关信源编码方法包括以下步骤:The object of the present invention is to provide a kind of hidden Markov correlation information source coding method based on distributed arithmetic coding, described hidden Markov correlation information source coding method based on distributed arithmetic coding comprises the following steps:
产生信源;generate a source;
对隐马尔科夫相关信源编码;Encoding hidden Markov correlation sources;
利用前向算法进行的解码。Decoding with forward algorithm.
进一步、所述产生信源的步骤为:Further, the steps of generating a source of information are:
读取HMM文件,确定隐马尔科夫模型;Read the HMM file and determine the hidden Markov model;
应用隐马尔科夫模型计算观察值序列O;Apply the hidden Markov model to calculate the observation sequence O;
根据偏差概率p,生成信源X;According to the deviation probability p, generate the source X;
将信源X与观察值序列O进行异或运算,生成边信息Y。The XOR operation is performed on the information source X and the observation value sequence O to generate side information Y.
进一步、所述对隐马尔科夫相关信源编码步骤为:Further, the described step of coding the Hidden Markov Correlation Information Source is:
设p是二元信源X的偏差概率,即p=P(xt=1),在经典的算术编码中,信源符号xt被迭代的映射到[0,1)的子区间中,这个子区间的长度与(1-p)和p成比例,在分布式算术编码中,分配的子区间会有重叠,即符号xt=0和xt=1分别对应于区间[0,(1-p)γ)和[1-pγ,1),定义low和high表示编码区间,range表示整个区间长度,half_range表示区间的一半,first_quarter表示整个区间的四分之一大小,third_quarter表示整个区间的四分之三大小。Let p be the bias probability of the binary source X, that is, p=P(x t =1), in classical arithmetic coding, the source symbol x t is iteratively mapped to the subinterval of [0, 1), The length of this subinterval is proportional to (1-p) and p. In distributed arithmetic coding, the allocated subintervals will overlap, that is, the symbols x t =0 and x t =1 correspond to the interval [0, ( 1-p) γ ) and [1-p γ , 1), define low and high to represent the coding interval, range represents the length of the entire interval, half_range represents half of the interval, first_quarter represents a quarter of the entire interval, third_quarter represents the entire Three-quarters the size of the interval.
进一步、所述对隐马尔科夫相关信源编码还包括具体的区间放大操作Further, the encoding of the hidden Markov correlation source also includes a specific interval amplification operation
当high<half_range时,这时子区间完全处在上半区,该区间的上下端点的最高有效位都是0,输出相同的最高有效位0,然后将区间的上界和下界分别扩大2倍,即:low=2*low,high=2*high;When high<half_range, the sub-interval is completely in the upper half of the range, the most significant bits of the upper and lower endpoints of the range are both 0, and the same most significant bit 0 is output, and then the upper and lower bounds of the interval are respectively enlarged by 2 times , namely: low=2*low, high=2*high;
当low>half_range时,这时子区间完全处在下半区,该区间的上下端点的最高有效位都是1,此时输出相同的最高有效位1,然后将区间的上界和下界分别扩大,即:low=2*(low-half_range),high=2*(high-half_range);When low>half_range, the sub-interval is completely in the lower half of the range, and the most significant bits of the upper and lower endpoints of the range are both 1. At this time, the same most significant bit 1 is output, and then the upper and lower bounds of the interval are respectively expanded. Namely: low=2*(low-half_range), high=2*(high-half_range);
当first_quarter≤low≤high≤third_quarter,如果还是按照上述的方法对low和high进行更新,由于子区间不断缩小,所以子区间总是会处在first_quarter≤low≤high≤third_quarter这个区间里,而始终没有相同的最高比特位可以输出,所以在这种情况下就用新的方法进行子区间更新:low=2*(low-first_quarter),high=2*(high-first_quarter)+1。When first_quarter≤low≤high≤third_quarter, if the low and high are still updated according to the above method, since the sub-interval keeps shrinking, the sub-interval will always be in the interval of first_quarter≤low≤high≤third_quarter, and there will never be The same highest bit can be output, so in this case, a new method is used to update the sub-interval: low=2*(low-first_quarter), high=2*(high-first_quarter)+1.
进一步、所述利用前向算法进行的解码步骤为:Further, the decoding steps carried out by using the forward algorithm are:
定义一个结构体node,结构体包括low,high,用来表示概率区间,metric用来表示每个分支的重要程度即分支中各个结点度量的总和,alpha用来记录局部概率αt(i),path用来保存解码出的字符,另外定义数组scale[2],其中scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1];Define a structure node, the structure includes low, high, which is used to represent the probability interval, metric is used to represent the importance of each branch, that is, the sum of the metrics of each node in the branch, and alpha is used to record the local probability α t (i) , path is used to save the decoded characters, and an array scale[2] is also defined, where scale[0] indicates the total probability that the position symbol corresponding to the side information is 0 when the current decoding symbol is 0 or 1, scale[0]=alpha [0][0]+alpha[0][1]; scale[1] indicates the total probability that the position symbol corresponding to the side information is 1 when the current decoding symbol is 0 or 1, scale[1]=alpha[1][ 0]+alpha[1][1];
步骤一、根据原始信源X的长度N,分配解码所需的结点空间,并初始化分布式算术码的解码缓冲区;Step 1. According to the length N of the original source X, allocate the node space required for decoding, and initialize the decoding buffer of the distributed arithmetic code;
步骤二、在第i次解码过程中(0<i≤N),读取边信息当前位置的字符si,Step 2. During the ith decoding process (0<i≤N), read the character s i at the current position of the side information,
步骤三、使用前向算法,对每个结点,计算第i次解码时,产生0分支和1分支的概率,当i=1即在状态处于初始时刻时,局部概率用公式a1(i)=πibi(z1)来计算并用alpha数组记录,当i>1时,局部概率用公式计算,公式中变量αt-1(j)由结构体node内取得,每次计算后,得到本次迭代的alpha[0],alpha[1],scale[0]和scale[1],scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1];Step 3. Use the forward algorithm to calculate the probability of branch 0 and branch 1 for each node when it is decoded for the i-th time. When i=1, that is, when the state is at the initial moment, the local probability is calculated using the formula a 1 (i )=π i b i (z 1 ) to calculate and record with alpha array, when i>1, the local probability is calculated by the formula Calculation, the variable α t-1 (j) in the formula is obtained from the structure node, and after each calculation, alpha[0], alpha[1], scale[0] and scale[1], scale of this iteration are obtained [0] indicates that when the current decoding symbol is 0 or 1, the total probability that the corresponding position symbol of the side information is 0, scale[0]=alpha[0][0]+alpha[0][1]; scale[1] indicates When the current decoding symbol is 0 or 1, the total probability of side information corresponding to the position symbol being 1, scale[1]=alpha[1][0]+alpha[1][1];
步骤四、计算第i次解码的符号ci,如果码字没有落在重叠区域内,则把当前的符号写入node.path,并将解码符号ci和边信息si进行异或运算,当前点的度量就用到达该点的所有路径的概率之和来表示,也就是scale[0]或者scale[1],如果边信息和当前解码符号相同,当前点的度量就用scale[0]表示,如果边信息和当前解码符号不同,就用scale[1]表示,并将结点中的局部alpha更新为更新整条分支的重要性,对于第k条分支,如果码字落在重叠区域内,则产生两个分支m和n,这时候node[m].metric表示的是与边信息相同的分支的度量,即当边信息为1时,node[m].metric表示1分支的度量,当边信息为0时,node[m].metric表示0分支的度量,而node[n].metric则相反,表示与边信息相反的分支的度量,对于m分支,其存储的符号与边信息相同,此时所以m分支的整体重要性就用node[m].metric+=log(scale[0])表示,即m分支上当前所有点的scale值的乘积,而n分支的整体重要性就用node[n].metric+=log(scale[1])表示,即n分支上当前所有点的scale值的乘积,并用新的概率alpha[0]更新原node点中node[m].alpha,用新的alpha[1]更新原node点中的node[n].alpha;Step 4: Calculate the i-th decoded symbol c i , if the code word does not fall in the overlapping area, write the current symbol into node.path, and perform XOR operation on the decoded symbol c i and the side information s i , The measure of the current point is the sum of the probabilities of all paths to the point To represent, that is, scale[0] or scale[1], if the side information is the same as the current decoding symbol, The metric of the current point is represented by scale[0]. If the side information is different from the current decoding symbol, it is represented by scale[1], and the local alpha in the node is updated as Update the importance of the entire branch, for the kth branch, If the codeword falls in the overlapping area, two branches m and n are generated. At this time, node[m].metric represents the measurement of the same branch as the side information, that is, when the side information is 1, node[m] .metric represents the measurement of branch 1. When the side information is 0, node[m].metric represents the measurement of branch 0, while node[n].metric is the opposite, representing the measurement of the branch opposite to the side information. For m branch , the stored sign is the same as the side information, at this time Therefore, the overall importance of the m branch is represented by node[m].metric+=log(scale[0]), that is, the product of the scale values of all current points on the m branch, and the overall importance of the n branch is represented by node[n ].metric+=log(scale[1]) means, that is, the product of the scale values of all current points on the n branch, and update the node[m].alpha in the original node point with the new probability alpha[0], and use the new alpha [1] Update node[n].alpha in the original node point;
步骤五、由于解码一次就会产生新的解码区间,所以计算完分支的度量并更新了局部概率后,要更新概率区间和相关的信息;Step 5. Since a new decoding interval will be generated after decoding once, after calculating the branch metric and updating the local probability, the probability interval and related information must be updated;
步骤六、对所有分支按照metric值从大到小进行排序,如果分支数大于我们所设定的阈值M,则将多余的分支剪掉,如果i<N,则返回步骤二;否则,解码结束,将分支排序后的第一条分支中path保存的数据作为解码输出。Step 6. Sort all branches according to the metric value from large to small. If the number of branches is greater than the threshold M we set, cut off the redundant branches. If i<N, return to step 2; otherwise, the decoding ends , take the data saved in the path in the first branch after branch sorting as the decoding output.
本发明通过使用DAC来压缩记忆相关的信源,在具体实现中,将信源之间的相关性建模为一个隐马尔科夫过程,实验结果表明,新方案的执行结果接近斯理篇-伍夫理论界限,较好的解决了现有基于熵编码的SWC技术,重叠区域后,解码端将存在有歧义的码字和SWC对称的问题。本发明较好的利用现有的技术,实验效果好,有着很好的应用价值。The present invention uses DAC to compress the information sources related to memory, and in the specific implementation, the correlation between information sources is modeled as a hidden Markov process, and the experimental results show that the execution result of the new scheme is close to that of Si Li- The boundary of Wooff theory better solves the existing SWC technology based on entropy coding. After overlapping regions, there will be ambiguous codewords and SWC symmetry problems at the decoding end. The present invention preferably utilizes the existing technology, has good experimental effect and has good application value.
附图说明Description of drawings
图1是本发明提供的基于分布式算术编码的隐马尔科夫相关信源编码方法的流程图;Fig. 1 is the flowchart of the hidden Markov correlation information source coding method based on distributed arithmetic coding provided by the present invention;
图2是本发明提供的基于分布式算术编码的一个解码树的示意图。Fig. 2 is a schematic diagram of a decoding tree based on distributed arithmetic coding provided by the present invention.
具体实施方式detailed description
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步的详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定发明。In order to make the purpose, technical solution and advantages of the present invention more clear, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the invention.
图1示出了本发明实施例提供的基于分布式算术编码的隐马尔科夫相关信源编码方法的实现流程。为了便于说明,仅仅示出了与本发明相关的部分。FIG. 1 shows the implementation flow of the distributed arithmetic coding-based hidden Markov correlation information source coding method provided by the embodiment of the present invention. For ease of illustration, only the parts relevant to the present invention are shown.
本发明的基于分布式算术编码的隐马尔科夫相关信源编码方法,包括以下步骤:The Hidden Markov Correlation Source Coding method based on Distributed Arithmetic Coding of the present invention comprises the following steps:
产生信源;generate a source;
对隐马尔科夫相关信源编码;Encoding hidden Markov correlation sources;
利用前向算法进行的解码。Decoding with forward algorithm.
作为本发明实施例的一优化方案,产生信源的步骤为:As an optimization scheme of the embodiment of the present invention, the steps of generating the information source are:
读取HMM文件,确定隐马尔科夫模型;Read the HMM file and determine the hidden Markov model;
应用隐马尔科夫模型计算观察值序列O;Apply the hidden Markov model to calculate the observation sequence O;
根据偏差概率p,生成信源X;According to the deviation probability p, generate the source X;
将信源X与观察值序列O进行异或运算,生成边信息Y。The XOR operation is performed on the information source X and the observation value sequence O to generate side information Y.
作为本发明实施例的一优化方案,对隐马尔科夫相关信源编码步骤为:As an optimization scheme of the embodiment of the present invention, the encoding steps of hidden Markov correlation sources are as follows:
设p是二元信源X的偏差概率,即p=P(xt=1),在经典的算术编码中,信源符号xt被迭代的映射到[0,1)的子区间中,这个子区间的长度与(1-p)和p成比例,在分布式算术编码中,分配的子区间会有重叠,即符号xt=0和xt=1分别对应于区间[0,(1-p)γ)和[1-pγ,1),定义low和high表示编码区间,range表示整个区间长度,half_range表示区间的一半,first_quarter表示整个区间的四分之一大小,third_quarter表示整个区间的四分之三大小。Let p be the bias probability of the binary source X, that is, p=P(x t =1), in classical arithmetic coding, the source symbol x t is iteratively mapped to the subinterval of [0, 1), The length of this subinterval is proportional to (1-p) and p. In distributed arithmetic coding, the allocated subintervals will overlap, that is, the symbols x t =0 and x t =1 correspond to the interval [0, ( 1-p) γ ) and [1-p γ , 1), define low and high to represent the coding interval, range represents the length of the entire interval, half_range represents half of the interval, first_quarter represents a quarter of the entire interval, third_quarter represents the entire Three-quarters the size of the interval.
作为本发明实施例的一优化方案,对隐马尔科夫相关信源编码还包括具体的区间放大操作As an optimization scheme of the embodiment of the present invention, the hidden Markov correlation source coding also includes a specific interval amplification operation
当high<half_range时,这时子区间完全处在上半区,该区间的上下端点的最高有效位都是0,输出相同的最高有效位0,然后将区间的上界和下界分别扩大2倍,即:low=2*low,high=2*high;When high<half_range, the sub-interval is completely in the upper half of the range, the most significant bits of the upper and lower endpoints of the range are both 0, and the same most significant bit 0 is output, and then the upper and lower bounds of the interval are respectively enlarged by 2 times , namely: low=2*low, high=2*high;
当low>half_range时,这时子区间完全处在下半区,该区间的上下端点的最高有效位都是1,此时输出相同的最高有效位1,然后将区间的上界和下界分别扩大,即:low=2*(low-half_range),high=2*(high-half_range);When low>half_range, the sub-interval is completely in the lower half of the range, and the most significant bits of the upper and lower endpoints of the range are both 1. At this time, the same most significant bit 1 is output, and then the upper and lower bounds of the interval are respectively expanded. Namely: low=2*(low-half_range), high=2*(high-half_range);
当first_quarter≤low≤high≤third_quarter,如果还是按照上述的方法对low和high进行更新,由于子区间不断缩小,所以子区间总是会处在first_quarter≤low≤high≤third_quarter这个区间里,而始终没有相同的最高比特位可以输出,所以在这种情况下就用新的方法进行子区间更新:low=2*(low-first_quarter),high=2*(high-first_quarter)+1。When first_quarter≤low≤high≤third_quarter, if the low and high are still updated according to the above method, since the sub-interval keeps shrinking, the sub-interval will always be in the interval of first_quarter≤low≤high≤third_quarter, and there will never be The same highest bit can be output, so in this case, a new method is used to update the sub-interval: low=2*(low-first_quarter), high=2*(high-first_quarter)+1.
作为本发明实施例的一优化方案,利用前向算法进行的解码步骤为:As an optimization scheme of the embodiment of the present invention, the decoding steps performed using the forward algorithm are:
定义一个结构体node,结构体包括low,high,用来表示概率区间,metric用来表示每个分支的重要程度即分支中各个结点度量的总和,alpha用来记录局部概率αt(i),path用来保存解码出的字符,另外定义数组scale[2],其中scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1];Define a structure node, the structure includes low, high, which is used to represent the probability interval, metric is used to represent the importance of each branch, that is, the sum of the metrics of each node in the branch, and alpha is used to record the local probability α t (i) , path is used to save the decoded characters, and an array scale[2] is also defined, where scale[0] indicates the total probability that the position symbol corresponding to the side information is 0 when the current decoding symbol is 0 or 1, scale[0]=alpha [0][0]+alpha[0][1]; scale[1] indicates the total probability that the position symbol corresponding to the side information is 1 when the current decoding symbol is 0 or 1, scale[1]=alpha[1][ 0]+alpha[1][1];
步骤一、根据原始信源X的长度N,分配解码所需的结点空间,并初始化分布式算术码的解码缓冲区;Step 1. According to the length N of the original source X, allocate the node space required for decoding, and initialize the decoding buffer of the distributed arithmetic code;
步骤二、在第i次解码过程中(0<i≤N),读取边信息当前位置的字符si,Step 2. During the ith decoding process (0<i≤N), read the character s i at the current position of the side information,
步骤三、使用前向算法,对每个结点,计算第i次解码时,产生0分支和1分支的概率,当i=1即在状态处于初始时刻时,局部概率用公式α1(i)=πibi(z1)来计算并用alpha数组记录,当i>1时,局部概率用公式计算,公式中变量αt-1(j)由结构体node内取得,每次计算后,得到本次迭代的alpha[0],alpha[1],scale[0]和scale[1],scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1];Step 3. Use the forward algorithm to calculate the probability of branch 0 and branch 1 for each node when it is decoded for the i-th time. When i=1, that is, when the state is at the initial moment, the local probability is calculated using the formula α 1 (i )=π i b i (z 1 ) to calculate and record with alpha array, when i>1, the local probability is calculated by the formula Calculation, the variable α t-1 (j) in the formula is obtained from the structure node, and after each calculation, alpha[0], alpha[1], scale[0] and scale[1], scale of this iteration are obtained [0] indicates that when the current decoding symbol is 0 or 1, the total probability that the corresponding position symbol of the side information is 0, scale[0]=alpha[0][0]+alpha[0][1]; scale[1] indicates When the current decoding symbol is 0 or 1, the total probability of side information corresponding to the position symbol being 1, scale[1]=alpha[1][0]+alpha[1][1];
步骤四、计算第i次解码的符号ci,如果码字没有落在重叠区域内,则把当前的符号写入node.path,并将解码符号ci和边信息si进行异或运算,当前点的度量就用到达该点的所有路径的概率之和来表示,也就是scale[0]或者scale[1],如果边信息和当前解码符号相同,当前点的度量就用scale[0]表示,如果边信息和当前解码符号不同,就用scale[1]表示,并将结点中的局部alpha更新为更新整条分支的重要性,对于第k条分支,如果码字落在重叠区域内,则产生两个分支m和n,这时候node[m].metric表示的是与边信息相同的分支的度量,即当边信息为1时,node[m].metric表示1分支的度量,当边信息为0时,node[m].metric表示0分支的度量,而node[n].metric则相反,表示与边信息相反的分支的度量,对于m分支,其存储的符号与边信息相同,此时所以m分支的整体重要性就用node[m].metric+=log(scale[0])表示,即m分支上当前所有点的scale值的乘积,而n分支的整体重要性就用node[n].metric+=log(scale[1])表示,即n分支上当前所有点的scale值的乘积,并用新的概率alpha[0]更新原node点中node[m].alpha,用新的alpha[1]更新原node点中的node[n].alpha;Step 4: Calculate the i-th decoded symbol c i , if the code word does not fall in the overlapping area, write the current symbol into node.path, and perform XOR operation on the decoded symbol c i and the side information s i , The measure of the current point is the sum of the probabilities of all paths to the point To represent, that is, scale[0] or scale[1], if the side information is the same as the current decoding symbol, The metric of the current point is represented by scale[0]. If the side information is different from the current decoding symbol, it is represented by scale[1], and the local alpha in the node is updated as Update the importance of the entire branch, for the kth branch, If the codeword falls in the overlapping area, two branches m and n are generated. At this time, node[m].metric represents the measurement of the same branch as the side information, that is, when the side information is 1, node[m] .metric represents the measurement of branch 1. When the side information is 0, node[m].metric represents the measurement of branch 0, while node[n].metric is the opposite, representing the measurement of the branch opposite to the side information. For m branch , the stored sign is the same as the side information, at this time Therefore, the overall importance of the m branch is represented by node[m].metric+=log(scale[0]), that is, the product of the scale values of all current points on the m branch, and the overall importance of the n branch is represented by node[n ].metric+=log(scale[1]) means that it is the product of the scale values of all current points on the n branch, and update the node[m].alpha in the original node with the new probability alpha[0], and use the new alpha [1] Update node[n].alpha in the original node point;
步骤五、由于解码一次就会产生新的解码区间,所以计算完分支的度量并更新了局部概率后,要更新概率区间和相关的信息;Step 5. Since a new decoding interval will be generated after decoding once, after calculating the branch metric and updating the local probability, the probability interval and related information must be updated;
步骤六、对所有分支按照metric值从大到小进行排序,如果分支数大于我们所设定的阈值M,则将多余的分支剪掉,如果i<N,则返回步骤二;否则,解码结束,将分支排序后的第一条分支中path保存的数据作为解码输出。Step 6. Sort all branches according to the metric value from large to small. If the number of branches is greater than the threshold M we set, cut off the redundant branches. If i<N, return to step 2; otherwise, the decoding ends , take the data saved in the path in the first branch after branch sorting as the decoding output.
下面结合附图及具体实施例对本发明的应用原理作进一步描述。The application principle of the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.
如图1所示,本发明实施例的基于分布式算术编码的隐马尔科夫相关信源编码方法包括以下步骤:As shown in Figure 1, the hidden Markov correlation source coding method based on distributed arithmetic coding in the embodiment of the present invention includes the following steps:
S101:产生信源;S101: Generate a source;
S102:对隐马尔科夫相关信源编码;S102: Encoding hidden Markov correlation sources;
S103:利用前向算法进行的解码。S103: Decoding by using the forward algorithm.
本发明的原理为:Principle of the present invention is:
A.二进制算术编码:A. Binary arithmetic coding:
设p是二元信源X的偏差概率,即p=P(xt=1)。在经典的算术编码中,信源符号xt被迭代的映射到[0,1)的子区间中,这个子区间的长度与(1-p)和p成比例。由此产生的码长大于X的信息熵,即R≥H(X)。在分布式算术编码[4]中,子区间的长度与修改后的概率(1-p)γ和pγ成比例,其中γ满足公式:Let p be the bias probability of the binary information source X, that is, p=P(x t =1). In classical arithmetic coding, the source symbol x t is iteratively mapped to a subinterval of [0, 1), the length of this subinterval is proportional to (1-p) and p. The resulting code length is greater than the information entropy of X, that is, R≥H(X). In distributed arithmetic coding [4] , the length of the subinterval is proportional to the modified probability (1-p) γ and p γ , where γ satisfies the formula:
产生的码长满足R≥γH(X)≥H(X|Y)。为了适应区间[0,1),子区间必须部分重叠。更具体的说,符号xt=0和xt=1分别对应于区间[0,(1-p)γ)和[1-pγ,1)。引入重叠区域后,将产生一个较大的最终区间和一个较短的码字。由此产生代价是,没有边信息Y,解码器不能准确地解码X。The generated code length satisfies R≥γH(X)≥H(X|Y). To fit in the interval [0, 1), the subintervals must partially overlap. More specifically, the symbols x t =0 and x t =1 correspond to the intervals [0, (1-p) γ ) and [1-p γ , 1), respectively. The introduction of overlapping regions results in a larger final interval and a shorter codeword. The resulting penalty is that without side information Y, the decoder cannot decode X accurately.
为了描述解码的过程,本文定义一个三元符号集{0,χ,1},其中χ表示一个有歧义的解码。定义Cx是码字,是第t个解码的符号,可以得到In order to describe the decoding process, this paper defines a ternary symbol set {0, χ, 1}, where χ represents an ambiguous decoding. Define C x to be a codeword, is the t-th decoded symbol, we can get
当第t个符号被解码的时候,如果该解码器产生分支:两个候选分支产生,对应两个符号xt=0或xt=1。对于每一个新的分支,它的度量值是不断更新的,并且它对应的区间会作为下一次迭代的开始。为了降低复杂性,每次解码完一个符号,解码器使用M-自由度算法来保证至多保留M个重要性最大的分支,而把其他分支都删除掉。When the t-th symbol is decoded, if The decoder generates branches: two candidate branches are generated, corresponding to the two symbols x t =0 or x t =1. For each new branch, its metric value is continuously updated, and its corresponding interval will be used as the start of the next iteration. In order to reduce the complexity, each time a symbol is decoded, the decoder uses the M-degree-of-freedom algorithm to ensure that at most M most important branches are retained, and all other branches are deleted.
需要注意的是,对于一个有限长度序列X,对最后的一些符号进行度量是不可靠的。因此,对最后T个符号的编码,其概率区间不重叠,即采用传统的算术编码。在1≤t≤(N-T)这个范围内,xt被映射到[0,(1-p)γ)和[1-pγ,1)这两个区间中;而对于(N-T+1)≤t≤N这个范围内,xt映射到区间[0,1-p)和[1-p,1)。It should be noted that for a sequence X of finite length, it is unreliable to measure some of the last symbols. Therefore, for the coding of the last T symbols, the probability intervals do not overlap, that is, the traditional arithmetic coding is adopted. In the range of 1≤t≤(NT), x t is mapped to the two intervals [0, (1-p) γ ) and [1-p γ , 1); and for (N-T+1 ) ≤ t ≤ N, x t is mapped to the interval [0, 1-p) and [1-p, 1).
因此,一个二进制DAC系统可以由四个参数来描述:{p,γ,M,T}。Therefore, a binary DAC system can be described by four parameters: {p, γ, M, T}.
隐马尔科夫模型和前向算法:Hidden Markov Model and Forward Algorithm:
设是一个状态序列,是一个观察值序列。用λ=(A,B,π)来定义一个隐马尔科夫过程:Assume is a sequence of states, is a sequence of observations. Use λ=(A, B, π) to define a hidden Markov process:
A={aji}:状态转移概率矩阵,其中aji=P(st=i|st-1=j);A={a ji }: state transition probability matrix, where a ji =P(s t =i|s t-1 =j);
B={bi(k)}:观察概率分布,其中bi(k)=P(zt=k|st=i);B={b i (k)}: observation probability distribution, where b i (k)=P(z t =k|s t =i);
π={πi}:初始状态分布,其中πi=P(s1=i)。π={π i }: initial state distribution, where π i =P(s 1 =i).
前向算法的目的是计算观察序列概率P(z1,...,zt|λ),已知观察值{z1,...,zt}和模型λ。定义αt(i)是观察序列{z1,...,zt}的局部概率,状态st用i表示,即The purpose of the forward algorithm is to calculate the observation sequence probability P(z 1 ,...,z t |λ), the known observations {z 1 ,...,z t } and the model λ. Definition α t (i) is the local probability of observing the sequence {z 1 ,...,z t }, and the state s t is denoted by i, namely
αt(i)=P(Z1,...,ZtSt=i|, (3)α t (i)=P(Z 1 , . . . , Z t S t =i|, (3)
开始时,当t=1时有At the beginning, when t=1 there is
a1(i)=πibi(z1). (4)a 1 (i)=π i b i (z 1 ). (4)
当t>1时,αt(i)可以通过迭代来得出即When t>1, α t (i) can be obtained through iteration, namely
因此,therefore,
在实际应用中,αt(i)通常用下面的公式进行归一化,In practical applications, α t (i) is usually normalized by the following formula,
其中in
在这种情况下,可以得到有In this case, one can get
本发明的具体实施过程为:The concrete implementation process of the present invention is:
第一步、相关信源的产生:The first step, the generation of relevant information sources:
对于两个统计相关的信源X和Y,分布式信源编码通常假设在序列X和序列Y之间有一个虚拟的相关信道,其中将信源X作为信道的输入,信源Y作为信道的输出,通过信道来描述X和Y之间的相关性,而它们之间的相关性可以用条件熵H(X|Y)来衡量,而条件熵H(X|Y)又可以使用隐状态序列与观察值序列来计算获得,在发明中,信源X和边信息Y是通过相关联的,其中Z是由一个使用参数λ的隐马尔科夫模型得到的,X通过一个有四个参数{p,γ,M,T}的DAC编码器进行编码,在编码时,信源X作为待编码的信源,信源Y作为边信息在解码端辅助解码,For two statistically correlated sources X and Y, distributed source coding usually assumes that there is a virtual correlation channel between sequence X and sequence Y, where source X is used as the input of the channel, and source Y is used as the input of the channel The output, through the channel to describe the correlation between X and Y, and the correlation between them can be measured by the conditional entropy H(X|Y), and the conditional entropy H(X|Y) can use the hidden state sequence Obtained by calculating with the observation sequence, in the invention, the source X and the side information Y are obtained by Associated, where Z is obtained by a Hidden Markov Model using parameter λ, X is encoded by a DAC encoder with four parameters {p, γ, M, T}, when encoding, the source X is used as the information source to be encoded, and the information source Y is used as side information to assist decoding at the decoding end.
在发明中,定义了HMM格式的文件,其中包含观察值个数M,状态个数N,状态转移概率矩阵A,观察概率分布B和初始概率分布π,相关信源的产生过程为:In the invention, a file in HMM format is defined, which contains the number of observations M, the number of states N, the state transition probability matrix A, the observation probability distribution B and the initial probability distribution π, and the generation process of the relevant information source is:
(1)读取HMM文件,确定隐马尔科夫模型,(1) Read the HMM file to determine the hidden Markov model,
(2)应用隐马尔科夫模型计算观察值序列O,一个隐马尔科夫模型主要包括两个信息序列,一个是可以直接观察得到的观察值序列,另一个是不可直接获取的隐状态序列,由步骤(1)得到的隐马尔科夫模型,可以计算观察值序列O,(2) Apply the hidden Markov model to calculate the observation value sequence O. A hidden Markov model mainly includes two information sequences, one is the observation value sequence that can be directly observed, and the other is the hidden state sequence that cannot be directly obtained. The hidden Markov model obtained by step (1) can calculate the observation sequence O,
(3)根据偏差概率p,生成信源X,(3) According to the deviation probability p, generate the source X,
(4)将信源X与观察值序列O进行异或运算,生成边信息Y。(4) XOR operation is performed on the information source X and the observation value sequence O to generate side information Y.
第二步、隐马尔科夫相关信源的编码过程:The second step, the encoding process of the hidden Markov correlation source:
设p是二元信源X的偏差概率,即p=P(xt=1),在经典的算术编码中,信源符号xt被迭代的映射到[0,1)的子区间中,这个子区间的长度与(1-p)和p成比例,在分布式算术编码中,分配的子区间会有重叠,即符号xt=0和xt=1分别对应于区间[0,(1-p)γ)和[1-pγ,1),需要注意的是,对于一个有限长度序列X[5],对最后的一些符号进行度量是不可靠的,因此,对最后T个符号的编码,其概率区间不重叠,即采用传统的算术编码,在1≤t≤(N-T)这个范围内,xt被映射到[0,(1-p)γ)和[1-pγ,1)这两个区间中;而对于(N-T+1)≤t≤N这个范围内,xt映射到区间[0,1-p)和[1-p,1),Let p be the bias probability of the binary source X, that is, p=P(x t =1), in classical arithmetic coding, the source symbol x t is iteratively mapped to the subinterval of [0, 1), The length of this subinterval is proportional to (1-p) and p. In distributed arithmetic coding, the allocated subintervals will overlap, that is, the symbols x t =0 and x t =1 correspond to the interval [0, ( 1-p) γ ) and [1-p γ , 1), it should be noted that for a finite-length sequence X [5] , it is unreliable to measure the last symbols, so the last T symbols The coding of , its probability intervals do not overlap, that is, using traditional arithmetic coding, in the range of 1≤t≤(NT), x t is mapped to [0, (1-p) γ ) and [1-p γ , 1) In these two intervals; and for the range of (N-T+1)≤t≤N, x t is mapped to the interval [0, 1-p) and [1-p, 1),
在编码的具体实现过程中,子区间的划分还要注意上溢和下溢的问题,定义low和high表示编码区间,range表示整个区间长度,half_range表示区间的一半,first_quarter表示整个区间的四分之一大小,third_quarter表示整个区间的四分之三大小,在对区间进行更新后,子区间的大小会不断收缩,如果不进行合适的处理会导致区间越来越小,当区间过小时就会超出计算机能够处理的精度范围从而溢出,为了不出现这种情况,必须对区间端点low和high进行比例放大操作,以使计算机能够保持足够的精度,具体的区间放大操作如下:In the specific implementation process of encoding, the division of sub-intervals should also pay attention to the problem of overflow and underflow. Define low and high to represent the encoding interval, range to represent the length of the entire interval, half_range to represent half of the interval, and first_quarter to represent the quarter of the entire interval One size, third_quarter means three-quarters of the entire interval, after the interval is updated, the size of the sub-interval will continue to shrink, if not properly processed, the interval will become smaller and smaller, when the interval is too small, it will Exceeding the range of accuracy that the computer can handle and overflowing, in order to avoid this situation, the low and high endpoints of the interval must be scaled up so that the computer can maintain sufficient accuracy. The specific interval enlargement operation is as follows:
(1)当high<half_range时,这时子区间完全处在上半区,该区间的上下端点的最高有效位都是0,此时输出相同的最高有效位0,然后将区间的上界和下界分别扩大2倍,即:low=2*low,high=2*high,(1) When high<half_range, at this time the sub-interval is completely in the upper half area, and the most significant bits of the upper and lower endpoints of the interval are both 0, and the same most significant bit 0 is output at this time, and then the upper bound of the interval and The lower bounds are expanded by 2 times, namely: low=2*low, high=2*high,
(2)当low>half_range时,这时子区间完全处在下半区,该区间的上下端点的最高有效位都是1,此时输出相同的最高有效位1,然后将区间的上界和下界分别扩大,即:low=2*(low-half_range),high=2*(high-half_range),(2) When low>half_range, the sub-interval is completely in the lower half of the range, and the most significant bits of the upper and lower endpoints of the range are both 1. At this time, the same most significant bit of 1 is output, and then the upper and lower bounds of the interval are Expand respectively, namely: low=2*(low-half_range), high=2*(high-half_range),
(3)当first_quarter≤low≤high≤third_quarter,如果还是按照上述的方法对low和high进行更新,由于子区间不断缩小,所以子区间总是会处在first_quarter≤low≤high≤third_quarter这个区间里,而始终没有相同的最高比特位可以输出,所以在这种情况下就用新的方法进行子区间更新:low=2*(low-first_quarter),high=2*(high-first_quarter)+1。(3) When first_quarter≤low≤high≤third_quarter, if the low and high are still updated according to the above method, since the sub-interval keeps shrinking, the sub-interval will always be in the interval of first_quarter≤low≤high≤third_quarter, However, there is always no same highest bit to output, so in this case, a new method is used to update the sub-interval: low=2*(low-first_quarter), high=2*(high-first_quarter)+1.
第三步、利用前向算法进行的解码过程:The third step is the decoding process using the forward algorithm:
假设二进制信源X和边信息Y通过相关联,其中Z是由一个使用参数λ的隐马尔科夫模型得到的,解码符号和边信息yt取异或得到的zt,即符合隐马尔科夫模型,可以使用前向算法计算其局部概率,解码的过程和在[4]中描述的非常相似,本发明的创新之处就是在解码的过程中,每条分支的度量用P(z1,...,zt|λ)来表示,Suppose binary source X and side information Y pass associated, where Z is obtained by a Hidden Markov Model using parameter λ, and the decoded symbol z t obtained by taking XOR with side information y t , namely Conforms to the Hidden Markov Model, and can use the forward algorithm to calculate its local probability. The decoding process is very similar to that described in [4]. The innovation of the present invention is that in the decoding process, the measurement of each branch is used P(z 1 ,..., z t |λ) to represent,
本发明提出的解码方法主要利用了前向算法,前向算法的具体实施过程如下:The decoding method proposed by the present invention mainly utilizes the forward algorithm, and the specific implementation process of the forward algorithm is as follows:
本发明在实施过程中,定义了一个结构体node,结构体包括low,high,用来表示概率区间,metric用来表示每个分支的重要程度即分支中各个结点度量的总和,alpha用来记录局部概率αt(i),path用来保存解码出的字符,另外定义数组scale[2],其中scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1],利用前向算法进行DAC解码的过程为:In the implementation process of the present invention, a structure node is defined, and the structure includes low and high, which are used to represent the probability interval, metric is used to represent the importance of each branch, that is, the sum of the metrics of each node in the branch, and alpha is used to Record the local probability α t (i), path is used to save the decoded characters, and define an array scale[2], where scale[0] indicates that when the current decoding symbol is 0 or 1, the side information corresponding to the position symbol is 0. Probability, scale[0]=alpha[0][0]+alpha[0][1]; scale[1] indicates the total probability that the position symbol corresponding to side information is 1 when the current decoding symbol is 0 or 1, scale[ 1]=alpha[1][0]+alpha[1][1], the process of DAC decoding using the forward algorithm is:
(1)根据原始信源X的长度N,分配解码所需的结点空间,并初始化分布式算术码的解码缓冲区,(1) According to the length N of the original source X, allocate the node space required for decoding, and initialize the decoding buffer of the distributed arithmetic code,
(2)在第i次解码过程中(0<i≤N),读取边信息当前位置的字符si,(2) During the i-time decoding process (0<i≤N), read the character si at the current position of the side information,
(3)使用前向算法,对每个结点,计算第i次解码时,产生0分支和1分支的概率,当i=1即在状态处于初始时刻时,局部概率用公式(4)来计算并用alpha数组记录,当i>1时,局部概率用公式(5)计算,公式中变量αt-1(j)由结构体node内取得,每次计算后,得到本次迭代的alpha[0],alpha[1],scale[0]和scale[1],scale[0]表示当前解码符号为0或1时,边信息对应位置符号为0的总概率,scale[0]=alpha[0][0]+alpha[0][1];scale[1]表示当前解码符号为0或1时,边信息对应位置符号为1的总概率,scale[1]=alpha[1][0]+alpha[1][1],(3) Using the forward algorithm, for each node, calculate the probability of branch 0 and branch 1 when it is decoded for the i-th time. When i=1, that is, when the state is at the initial moment, the local probability is calculated by formula (4) Calculate and record with the alpha array. When i>1, the local probability is calculated with the formula (5). The variable α t-1 (j) in the formula is obtained from the structure node. After each calculation, the alpha of this iteration is obtained[ 0], alpha[1], scale[0] and scale[1], scale[0] indicates the total probability that the position symbol corresponding to the side information is 0 when the current decoding symbol is 0 or 1, scale[0]=alpha[ 0][0]+alpha[0][1]; scale[1] indicates the total probability that the position symbol corresponding to the side information is 1 when the current decoding symbol is 0 or 1, scale[1]=alpha[1][0 ]+alpha[1][1],
(4)计算第i次解码的符号ci,如果码字没有落在重叠区域内,则把当前的符号写入node.path,并将解码符号ci和边信息si进行异或运算,当前点的度量就用到达该点的所有路径的概率之和来表示,也就是scale[0]或者scale[1],如果边信息和当前解码符号相同,当前点的度量就用scale[0]表示,如果边信息和当前解码符号不同,就用scale[1]表示,并将结点中的局部alpha更新为更新整条分支的重要性,对于第k条分支,如果码字落在重叠区域内,则产生两个分支m和n,这时候node[m].metric表示的是与边信息相同的分支的度量,即当边信息为1时,node[m].metric表示1分支的度量,当边信息为0时,node[m].metric表示0分支的度量,而node[n].metric则相反,表示与边信息相反的分支的度量,对于m分支,其存储的符号与边信息相同,此时所以m分支的整体重要性就用node[m].metric+=log(scale[0])表示,即m分支上当前所有点的scale值的乘积,而n分支的整体重要性就用node[n].metric+=log(scale[1])表示,即n分支上当前所有点的scale值的乘积,并用新的概率alpha[0]更新原node点中node[m].alpha,用新的alpha[1]更新原node点中的node[n].alpha,(4) Calculate the i-th decoding symbol c i , if the code word does not fall in the overlapping area, write the current symbol into node.path, and perform XOR operation on the decoded symbol c i and side information s i , The measure of the current point is the sum of the probabilities of all paths to the point To represent, that is, scale[0] or scale[1], if the side information is the same as the current decoding symbol, The metric of the current point is represented by scale[0]. If the side information is different from the current decoding symbol, it is represented by scale[1], and the local alpha in the node is updated as Update the importance of the entire branch, for the kth branch, If the codeword falls in the overlapping area, two branches m and n are generated. At this time, node[m].metric represents the measurement of the same branch as the side information, that is, when the side information is 1, node[m] .metric represents the measurement of branch 1. When the side information is 0, node[m].metric represents the measurement of branch 0, while node[n].metric is the opposite, representing the measurement of the branch opposite to the side information. For m branch , the stored sign is the same as the side information, at this time Therefore, the overall importance of the m branch is represented by node[m].metric+=log(scale[0]), that is, the product of the scale values of all current points on the m branch, and the overall importance of the n branch is represented by node[n ].metric+=log(scale[1]) means that it is the product of the scale values of all current points on the n branch, and update the node[m].alpha in the original node with the new probability alpha[0], and use the new alpha [1] Update node[n].alpha in the original node point,
(5)由于解码一次就会产生新的解码区间,所以计算完分支的度量并更新了局部概率后,要更新概率区间和相关的信息,(5) Since a new decoding interval will be generated after decoding once, after calculating the branch metric and updating the local probability, the probability interval and related information must be updated.
(6)对所有分支按照metric值从大到小进行排序,如果分支数大于我们所设定的阈值M,则将多余的分支剪掉,如果i<N,则返回步骤(2);否则,解码结束,将分支排序后的第一条分支中path保存的数据作为解码输出。(6) Sort all branches according to the metric value from large to small, if the number of branches is greater than the threshold M we set, cut off the redundant branches, if i<N, return to step (2); otherwise, After the decoding is completed, the data saved in the path in the first branch after branch sorting is used as the decoding output.
如图2所示的一个的解码树,假设边信息是10111,每条分支的度量都是用当前分支上所有点的scale值的乘积来表示的,比如分支1,那么分支1的度量就是其中5个点的scale值的乘积,即metric=scale[1]*scale[0]*scale[0]*scale[0]*scale[1],而每个点的scale是根据局部概率alpha来计算的,并且局部概率是不断更新的,计算方法scale[0]=alpha[0][0]+alpha[0][1],scale[1]=alpha[1][0]+alpha[1][1],alpha的计算方法是根据前面介绍的前向算法来计算的,可以由公式(4)和(5)计算得到,A decoding tree as shown in Figure 2, assuming that the side information is 10111, the metric of each branch is represented by the product of the scale values of all points on the current branch, such as branch 1, then the metric of branch 1 is The product of the scale value of 5 points, that is, metric=scale[1]*scale[0]*scale[0]*scale[0]*scale[1], and the scale of each point is calculated according to the local probability alpha , and the local probability is constantly updated, the calculation method scale[0]=alpha[0][0]+alpha[0][1], scale[1]=alpha[1][0]+alpha[1] [1], the calculation method of alpha is calculated according to the forward algorithm introduced earlier, which can be calculated by formulas (4) and (5),
本发明的实验结果:Experimental result of the present invention:
实验中验证了一个16位的DAC编解码器系统,X的偏差概率是p=0.5,设置M=2048和T=15,实验中模拟了与[9]中相同的两个状态(0和1)和两个输出信源(0和1)(见表1),每次实验中所用的数据块长度是N=1024,A 16-bit DAC codec system is verified in the experiment, the deviation probability of X is p=0.5, M=2048 and T=15 are set, and the same two states (0 and 1 ) and two output sources (0 and 1) (see Table 1), the data block length used in each experiment is N=1024,
为了实现无损压缩,每次实验都从γ=H(X|Y)开始(见表2),如果解码失败,以0.01的幅度增加γ,这个过程一直迭代到解码成功,对于每个模型,运行了100次试验,100次试验结果的平均值记录在表2中,In order to achieve lossless compression, each experiment starts from γ=H(X|Y) (see Table 2). If the decoding fails, increase γ by 0.01. This process is iterated until the decoding is successful. For each model, run 100 tests were carried out, and the average value of the 100 test results was recorded in Table 2,
表1仿真模型Table 1 Simulation model
        
表2实验结果Table 2 Experimental results
为了进行比较,表2中也记录了有着相同设置的[9]的实验结果,在[9]的每次试验中,N=16384的信源符号用LDPC码来进行编码,另外,为了要与隐马尔科夫模型同步,Nα个原始信源符号在没有压缩的情况下直接被送到解码器,For comparison, the experimental results of [9] with the same settings are also recorded in Table 2. In each experiment of [9], the source symbols of N = 16384 are encoded with LDPC codes. In addition, in order to compare with Hidden Markov model synchronization, Nα original source symbols are sent directly to the decoder without compression,
实验结果表明DAC的执行结果与基于LDPC的方法[9]的执行结果相似或者略好一点(模型1和模型2),此外,对于隐马尔科夫相关,DAC在两个方面优于以LDPC为基础的方法:Experimental results show that the performance of DAC is similar to or slightly better than that of the LDPC-based method [9] (Model 1 and Model 2). In addition, for hidden Markov correlation, DAC is better than LDPC-based methods in two aspects. Basic method:
(1)基于LDPC的方法需要较长的码字来达到较好的执行结果,而DAC对码字的长度并不敏感[4],(1) The LDPC-based method requires a longer codeword to achieve better execution results, while DAC is not sensitive to the length of the codeword [4] ,
(2)为了与隐马尔科夫模型保持同步,基于LDPC的方法必须发送一定比例的原始信号到解码端作为种子,然而,很难确定种子的最佳比例α,在[9]中报告的结果是通过穷举搜索得到,这限制了它在实际中的应用。(2) In order to keep pace with the Hidden Markov Model, LDPC-based methods must send a certain proportion of the original signal to the decoder as a seed, however, it is difficult to determine the optimal proportion α of the seed, the results reported in [9] is obtained through an exhaustive search, which limits its practical application.
以上所述仅为本发明的较佳实施例而已,并不用以限制本发明,凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等,均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention should be included in the protection of the present invention. within range.
Claims (2)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201310130164.1A CN103326731B (en) | 2013-04-16 | 2013-04-16 | A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201310130164.1A CN103326731B (en) | 2013-04-16 | 2013-04-16 | A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN103326731A CN103326731A (en) | 2013-09-25 | 
| CN103326731B true CN103326731B (en) | 2017-03-29 | 
Family
ID=49195296
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201310130164.1A Expired - Fee Related CN103326731B (en) | 2013-04-16 | 2013-04-16 | A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN103326731B (en) | 
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2016119726A1 (en) | 2015-01-30 | 2016-08-04 | Mediatek Inc. | Method and apparatus for entropy coding of source samples with large alphabet | 
| CN107294656B (en) * | 2017-06-06 | 2021-02-02 | 长安大学 | A Depth-First-Based Distributed Arithmetic Code Decoding Method | 
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US5608840A (en) * | 1992-06-03 | 1997-03-04 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for pattern recognition employing the hidden markov model | 
| CN101425184A (en) * | 2008-10-30 | 2009-05-06 | 西安电子科技大学 | Image segmentation method based on second generation Bandelet concealed Markov tree model | 
- 
        2013
        - 2013-04-16 CN CN201310130164.1A patent/CN103326731B/en not_active Expired - Fee Related
 
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US5608840A (en) * | 1992-06-03 | 1997-03-04 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for pattern recognition employing the hidden markov model | 
| CN101425184A (en) * | 2008-10-30 | 2009-05-06 | 西安电子科技大学 | Image segmentation method based on second generation Bandelet concealed Markov tree model | 
Non-Patent Citations (2)
| Title | 
|---|
| Distribution of Distributed Arithmetic Codewords for Equiprobable Binary Sources;方勇;《IEEE SIGNAL PROCESSING LETTERS》;20091230;全文 * | 
| 基于隐马尔可夫模型的分布式算术编码研究;王东;《中国优秀硕士学位论文全文数据库》;20120415;第26页,图3-5 * | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN103326731A (en) | 2013-09-25 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN101944067B (en) | Data storage method and storage system | |
| CN101656541B (en) | Coding method and device of RS codes | |
| CN106209113A (en) | A kind of decoding method of polarization code | |
| CN112582030B (en) | A Text Storage Method Based on DNA Storage Medium | |
| CN104683072A (en) | A Blind Identification Method of Parameters of Component Encoders with Punctured Turbo Codes | |
| CN106656198B (en) | A coding method based on LZ77 | |
| CN110661533A (en) | A Method for Optimizing the Decoding Performance of Decoder Stored Polar Codes | |
| CN115088038A (en) | Improved quality value compression framework in aligned sequencing data based on new context | |
| CN105553485A (en) | FPGA-based BCH encoding and decoding device and encoding and decoding method thereof | |
| US20240095543A1 (en) | Reconstruction of information stored in a dna stroage system | |
| KR20230040702A (en) | Method and apparatus for generating a decoding position control signal for decoding using polar codes | |
| CN102255617B (en) | Storage method of Huffman tree and method of decoding data by using array | |
| Safieh et al. | Efficient VLSI architecture for the parallel dictionary LZW data compression algorithm | |
| CN103326731B (en) | A kind of Hidden Markov correlated source coded method encoded based on distributed arithmetic | |
| Tang et al. | Error-correcting codes for short tandem duplication and edit errors | |
| JP2023132713A (en) | Data expansion device, memory system, and data expansion method | |
| CN106559085A (en) | A kind of normal form Hafman decoding method and its device | |
| CN113556134A (en) | Polar code puncturing encoder and encoding method suitable for simplifying serial offset decoding | |
| CN101630999A (en) | Fountain encoding and decoding method for forward error correction of binary erasure channel | |
| CN103957016A (en) | A Low Storage Capacity Turbo Code Decoder and Its Design Method | |
| Nishad et al. | Behavioral study of data structures on Lempel Ziv Welch (LZW) data compression algorithm and its computational complexity | |
| CN112416431B (en) | Source code segment pair comparison method based on coding sequence representation | |
| Yuan et al. | An adaptive ECC scheme for dynamic protection of NAND Flash memories | |
| WO2018219031A1 (en) | Polar code processing method, decoder and terminal | |
| Kouzani et al. | Multilabel classification by bch code and random forests | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170329 Termination date: 20180416 |