[go: up one dir, main page]

CN101667843B - Methods and devices for compressing and uncompressing data of embedded system - Google Patents

Methods and devices for compressing and uncompressing data of embedded system Download PDF

Info

Publication number
CN101667843B
CN101667843B CN2009100938637A CN200910093863A CN101667843B CN 101667843 B CN101667843 B CN 101667843B CN 2009100938637 A CN2009100938637 A CN 2009100938637A CN 200910093863 A CN200910093863 A CN 200910093863A CN 101667843 B CN101667843 B CN 101667843B
Authority
CN
China
Prior art keywords
byte string
byte
same
string
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100938637A
Other languages
Chinese (zh)
Other versions
CN101667843A (en
Inventor
杨河山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN2009100938637A priority Critical patent/CN101667843B/en
Publication of CN101667843A publication Critical patent/CN101667843A/en
Application granted granted Critical
Publication of CN101667843B publication Critical patent/CN101667843B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

本发明公开了一种嵌入式系统的数据压缩方法,包括:确定出待压缩数据中相同的字节串,相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串。本发明同时公开了一种嵌入式系统的数据解压缩方法,包括:压缩数据中包含有压缩标识时,获取代替字节串的距离及字节数目,并在已解压的数据中所述距离处获取出所述字节数目的字节,填入当前的解压数据中。本发明同时公开了实现前述方法的装置。本发明尤其适合字节流传输的嵌入式系统,而嵌入式系统中的字节流传输的重复字节串又比较多,因此,数据压缩率比较高,从而使嵌入式系统的CPU占用率处于较低的水平。

The invention discloses a data compression method for an embedded system, which includes: determining the same byte string in the data to be compressed, using the same byte string except the first byte string distance and the number of identical bytes to replace the identical byte strings. The invention also discloses a data decompression method for an embedded system, which includes: when the compressed data contains a compressed identifier, obtain the distance and the number of bytes to replace the byte string, and obtain the distance and the number of bytes in the decompressed data at the distance Obtain the bytes of the number of bytes and fill them into the current decompressed data. The invention also discloses a device for realizing the aforementioned method. The present invention is especially suitable for the embedded system of byte stream transmission, and the repeated byte string of byte stream transmission in the embedded system is more again, therefore, data compression ratio is higher, thereby makes the CPU occupancy rate of embedded system be in lower level.

Description

嵌入式系统的数据压缩、及解压缩方法与装置Data compression and decompression method and device for embedded system

技术领域 technical field

本发明涉及数据压缩技术,尤其涉及一种嵌入式系统的数据压缩方法与装置,以及解压缩方法与装置。The invention relates to data compression technology, in particular to a data compression method and device for an embedded system, and a decompression method and device.

背景技术 Background technique

在嵌入式系统数据链路传输数据的过程中,当数据流量非常大时,往往会导致链路的阻塞,同时会对数据上报的单板产生非常大的影响,使系统容量大大降低,甚至会导致单板复位。例如在码分多址(CDMA,Code Division MultipleAccess)系统的EVDO信道板日志上报时,当数据量较大时,将会存在大量的数据丢包现象,且当数据上报选项有多个时,由于链路的堵塞,很容易导致单板复位,给系统带来很多风险,同时给局外人员定位问题也增加了难度。为了能够准确定位问题,实现多个数据项的同时上报,并将丢包率降到最低,提出了在日志上报时,先对其进行数据压缩,在接收端再进行数据解压。从而减少了丢包问题,避免了数据传输链路的拥塞。由于传统的压缩方法,都是基于文件进行压缩的,并且都是在PC上进行压缩的,不用考虑CPU的占用率,但在嵌入式系统中,所有的数据都是以字节流的形式出现的,没有办法以文件的方式进行压缩存储,只能压缩缓冲数据流,而且由于数据的传输是一个不间断的过程,需要考虑到CPU的占用率、内存利用率,同时也需要考虑到压缩率。目前,尚未有专门针对嵌入式系统的较合适的数据压缩的技术方案。In the process of data link transmission in the embedded system, when the data flow is very large, it will often lead to blockage of the link, and at the same time, it will have a very large impact on the single board for data reporting, greatly reducing the system capacity, and even cause the board to reset. For example, when reporting logs on an EVDO channel board in a Code Division Multiple Access (CDMA, Code Division Multiple Access) system, when the amount of data is large, there will be a large number of data packet loss phenomena, and when there are multiple data reporting options, due to The blockage of the link can easily lead to the reset of the board, which brings a lot of risks to the system, and also increases the difficulty for outsiders to locate the problem. In order to accurately locate the problem, realize the simultaneous reporting of multiple data items, and minimize the packet loss rate, it is proposed that when the log is reported, the data is compressed first, and then the data is decompressed at the receiving end. Thereby, the packet loss problem is reduced, and the congestion of the data transmission link is avoided. Since the traditional compression methods are all based on file compression, and are all compressed on the PC, regardless of the CPU usage, but in the embedded system, all data appears in the form of byte streams Yes, there is no way to compress and store in the form of files, only the buffered data stream can be compressed, and since data transmission is an uninterrupted process, it is necessary to consider the CPU occupancy rate, memory utilization rate, and also the compression rate . At present, there is no suitable technical solution for data compression specifically for embedded systems.

发明内容 Contents of the invention

有鉴于此,本发明的主要目的在于提供一种嵌入式系统的数据压缩方法与装置,以及解压缩方法与装置,能对嵌入式系统中的字节流进行很好地压缩,利于数据传输。In view of this, the main purpose of the present invention is to provide a data compression method and device for an embedded system, and a decompression method and device, which can well compress the byte stream in the embedded system and facilitate data transmission.

为达到上述目的,本发明的技术方案是这样实现的:In order to achieve the above object, technical solution of the present invention is achieved in that way:

一种嵌入式系统的数据压缩方法,包括:A data compression method for an embedded system, comprising:

确定出待压缩数据中相同的字节串,相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串。Determine the same byte string in the data to be compressed, in the same byte string except the first byte string, use the distance from the same byte string and the number of the same byte to replace the same byte string.

优选地,所述方法还包括:Preferably, the method also includes:

传输数据时,为未压缩数据及压缩数据分别设置标识位。When transmitting data, set flags for uncompressed data and compressed data respectively.

优选地,所述与相同字节串的距离及相同字节的数目使用伽马Gamma编码方式进行编码。Preferably, the distance from the identical byte string and the number of identical bytes are encoded using Gamma encoding.

优选地,所述确定出待压缩数据中相同的字节串,包括:Preferably, said determining the same byte string in the data to be compressed includes:

所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则之前有与所述字节串相同的字节串,则继续滑动所述滑动窗,直到确定出与之前字节串相同的最长字节串,并确定出与之前相同的字节串的距离。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then there is a byte string identical to the byte string before, then continue The sliding window is slid until the longest byte string identical to the previous byte string is determined, and the distance to the same byte string is determined.

优选地,所述确定出待压缩数据中相同的字节串,包括:Preferably, said determining the same byte string in the data to be compressed includes:

所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则确定出与所述字节串相同且距离最近的字节串,并继续滑动所述滑动窗,判断与前一滑动窗中所述字节串相同且距离最近的字节串的后一滑动窗中字节是否与当前的滑动窗中的字节串相同,确定出字节串相同的最长字节串,并判断字节串相同的最长字节串的字节数是否达到设定阈值,未达到时则判断是否还存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,直到确定出的字节串相同的最长字节串的字节数达到设定阈值;或确定不存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,并以所确定的字节串相同的最长字节串作为最终的相同的字节串。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then determine the byte string that is the same as the byte string and the closest distance , and continue to slide the sliding window, and judge whether the byte in the next sliding window of the byte string that is the same as the byte string in the previous sliding window and the nearest byte string is the same as the byte string in the current sliding window, Determine the longest byte string with the same byte string, and judge whether the number of bytes in the longest byte string with the same byte string reaches the set threshold, and if it does not reach, judge whether there is still the same byte string as the first byte string Other byte strings identical to the byte strings in the same sliding window until the number of bytes of the longest byte string identical to the determined byte string reaches the set threshold; or it is determined that there is no byte string identical to the first byte string The byte strings in the same sliding window are the same as other byte strings, and the determined longest byte string is the same as the final identical byte string.

优选地,所述方法还包括:Preferably, the method also includes:

确定所述滑动窗中的字节串与之前字节串不同时,将所述字节串存储于索引链表的节点中,记录相同字节串的偏移量为零,记录所述字节串在类表数组中对应地址存储的数据为存储所述字节串的节点在所述索引链表中顺序值;确定所述滑动窗中的字节串与之前字节串相同时,将所述字节串存储于索引链表的节点中,所述字节串与当前最近的相同字节串的偏移量作为所述字节串的偏移量,记录所述字节串在类表数组中对应地址存储的数据为存储所述字节串的节点在所述索引链表中顺序值。When determining that the byte string in the sliding window is different from the previous byte string, store the byte string in the node of the index linked list, record the offset of the same byte string as zero, and record the byte string The data stored in the corresponding address in the class table array is the sequence value of the node storing the byte string in the index linked list; when it is determined that the byte string in the sliding window is the same as the previous byte string, the word The byte string is stored in the node of the index linked list, the offset between the byte string and the current closest same byte string is used as the offset of the byte string, and the corresponding byte string is recorded in the class list array The data stored by the address is the sequence value of the node storing the byte string in the index linked list.

一种嵌入式系统的数据解压缩方法,包括:A data decompression method for an embedded system, comprising:

压缩数据中包含有压缩标识时,获取代替字节串的距离及字节数目,并在已解压的数据中所述距离处获取出所述字节数目的字节,填入当前的解压数据中。When the compressed data contains a compression mark, obtain the distance and the number of bytes that replace the byte string, and obtain the bytes with the number of bytes at the distance in the decompressed data, and fill them in the current decompressed data .

优选地,所述获取代替字节串的距离及字节数目,包括:Preferably, said obtaining the distance and the number of bytes to replace the byte string includes:

使用Gamma编码方式对压缩数据中代替字节串的距离及字节数目进行解码而获取所述距离及所述字节数目。Gamma encoding is used to decode the distance and the number of bytes replacing the byte string in the compressed data to obtain the distance and the number of bytes.

一种嵌入式系统的数据压缩装置,包括:A data compression device for an embedded system, comprising:

确定单元,用于确定出待压缩数据中相同的字节串;以及a determining unit, configured to determine the same byte string in the data to be compressed; and

压缩单元,用于对相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串。The compression unit is used to replace the same byte string by using the distance from the same byte string and the number of the same byte except for the first byte string in the same byte string.

优选地,所述装置还包括:Preferably, the device also includes:

设置单元,用于传输数据时,为未压缩数据及压缩数据分别设置标识位。The setting unit is used to set the identification bits for the uncompressed data and the compressed data respectively when transmitting data.

优选地,所述装置还包括:Preferably, the device also includes:

编码单元,用于对与相同字节串的距离及相同字节的数目使用Gamma编码方式进行编码。The encoding unit is used to encode the distance from the identical byte string and the number of identical bytes using Gamma encoding.

优选地,所述确定单元确定出待压缩数据中相同的字节串,包括:Preferably, the determination unit determines the same byte string in the data to be compressed, including:

所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则之前有与所述字节串相同的字节串,则继续滑动所述滑动窗,直到确定出与之前字节串相同的最长字节串,并确定出与之前相同的字节串的距离。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then there is a byte string identical to the byte string before, then continue The sliding window is slid until the longest byte string identical to the previous byte string is determined, and the distance to the same byte string is determined.

优选地,所述确定单元确定出待压缩数据中相同的字节串,包括:Preferably, the determination unit determines the same byte string in the data to be compressed, including:

所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则确定出与所述字节串相同且距离最近的字节串,并继续滑动所述滑动窗,判断与前一滑动窗中所述字节串相同且距离最近的字节串的后一滑动窗中字节是否与当前的滑动窗中的字节串相同,确定出字节串相同的最长字节串,并判断字节串相同的最长字节串的字节数是否达到设定阈值,未达到时则判断是否还存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,直到确定出的字节串相同的最长字节串的字节数达到设定阈值;或确定不存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,并以所确定的字节串相同的最长字节串作为最终的相同的字节串。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then determine the byte string that is the same as the byte string and the closest distance , and continue to slide the sliding window, and judge whether the byte in the next sliding window of the byte string that is the same as the byte string in the previous sliding window and the nearest byte string is the same as the byte string in the current sliding window, Determine the longest byte string with the same byte string, and judge whether the number of bytes in the longest byte string with the same byte string reaches the set threshold, and if it does not reach, judge whether there is still the same byte string as the first byte string Other byte strings identical to the byte strings in the same sliding window until the number of bytes of the longest byte string identical to the determined byte string reaches the set threshold; or it is determined that there is no byte string identical to the first byte string The byte strings in the same sliding window are the same as other byte strings, and the determined longest byte string is the same as the final identical byte string.

一种嵌入式系统的数据解压缩装置,包括:A data decompression device for an embedded system, comprising:

确定单元,用于确定压缩数据中包含有压缩标识时,触发获取单元;A determination unit, configured to trigger an acquisition unit when determining that the compressed data contains a compression identifier;

获取单元,用于获取代替字节串的距离及字节数目;以及an acquisition unit, configured to acquire the distance and the number of bytes in place of the byte string; and

解压单元,用于在已解压的数据中所述距离处获取出所述字节数目的字节,并填入当前的解压数据中。The decompression unit is configured to obtain the bytes of the number of bytes at the distance in the decompressed data, and fill them into the current decompressed data.

本发明中,针对嵌入式系统的字节流数据,在待传输数据中确定出所有的相同的字节串,对于与之前相同的字节串,使用与之前相同的字节串的距离及相同的字节数来代替该相同的字节串,由于代替字节串的仅仅是两个数字,从而大大减少了待传输字节的数据量,从而达到了对字节流数据的压缩,解压时,对于未压缩字节,直接输出即可,对于压缩字节串,按压缩方式反向进行解压即可。本发明尤其适合字节流传输的嵌入式系统,而嵌入式系统中的字节流传输的重复字节串又比较多,因此,数据压缩率比较高,从而使嵌入式系统的CPU占用率处于较低的水平。In the present invention, for the byte stream data of the embedded system, determine all the same byte strings in the data to be transmitted, and for the same byte string as before, use the distance and the same byte string as before. The number of bytes to replace the same byte string, because only two numbers replace the byte string, which greatly reduces the amount of data to be transmitted bytes, thus achieving the compression of byte stream data, when decompressing , for uncompressed bytes, just output directly; for compressed byte strings, decompress in reverse according to the compression method. The present invention is especially suitable for the embedded system of byte stream transmission, and the repeated byte string of byte stream transmission in the embedded system is more again, therefore, data compression ratio is higher, thereby makes the CPU occupancy rate of embedded system be in lower level.

附图说明 Description of drawings

图1为本发明嵌入式系统的数据压缩方法的示意图;Fig. 1 is the schematic diagram of the data compression method of embedded system of the present invention;

图2为本发明嵌入式系统的数据压缩装置的组成结构示意图;Fig. 2 is the composition structure schematic diagram of the data compression device of embedded system of the present invention;

图3为本发明嵌入式系统的数据接压缩装置的组成结构示意图。FIG. 3 is a schematic diagram of the composition and structure of the data interface compression device of the embedded system of the present invention.

具体实施方式 Detailed ways

本发明的基本思想是:针对嵌入式系统的字节流数据,在待传输数据中确定出所有的相同的字节串,对于与之前相同的字节串,使用与之前相同的字节串的距离及相同的字节数来代替该相同的字节串,由于代替字节串的仅仅是两个数字,从而大大减少了待传输字节的数据量,从而达到了对字节流数据的压缩,解压时,对于未压缩字节,直接输出即可,对于压缩字节串,按压缩方式反向进行解压即可。本发明尤其适合字节流传输的嵌入式系统,而嵌入式系统中的字节流传输的重复字节串又比较多,因此,数据压缩率比较高,从而使嵌入式系统的CPU占用率处于较低的水平。The basic idea of the present invention is: for the byte stream data of the embedded system, all identical byte strings are determined in the data to be transmitted, and for the same byte strings as before, use the same byte strings as before. The distance and the same number of bytes are used to replace the same byte string. Since only two numbers are used to replace the byte string, the amount of data to be transmitted is greatly reduced, thereby achieving the compression of byte stream data. , when decompressing, for uncompressed bytes, just output directly, and for compressed byte strings, decompress in reverse according to the compression method. The present invention is especially suitable for the embedded system of byte stream transmission, and the repeated byte string of byte stream transmission in the embedded system is more again, therefore, data compression ratio is higher, thereby makes the CPU occupancy rate of embedded system be in lower level.

为使本发明的目的、技术方案和优点更加清楚明白,以下举实施例并参照附图,对本发明进一步详细说明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail by citing the following embodiments and referring to the accompanying drawings.

首先说明一下本发明采用的压缩原理,以字符串“abcdef1234abcdeki1234”为例进行说明,对于该字符串中,查找出内容相同的字节串,下面用“( )”括起来的部分就是字节串相同的部分,如:abcdef1234(abcde)ki(1234)。本发明使用相同字节串之间的距离以及相同字符的数目(或长度)这样一对信息,来替换相同的字节串,如上述字节串可表示为:abcdef1234(10,5)ki(11,4),其中,“(10,5)”中的“10”为相同字节串与当前位置之间的距离,“5”为相同字节串的数目(或长度)。同理,“(11,4)”中的“11”为相同字节串与当前位置之间的距离,“4”为相同字节串的数目。由于相同字节串之间的距离及相同字节数目这一对信息远小于被代替的字节串的大小,所以传输的字节流中的字节串得到了压缩。First of all, explain the compression principle adopted by the present invention, and take the character string "abcdef1234abcdeki1234" as an example to illustrate. For this character string, find out the byte string with the same content, and the part enclosed by "()" below is the byte string The same part, such as: abcdef1234(abcde)ki(1234). The present invention uses a pair of information such as the distance between the same byte string and the number (or length) of the same character to replace the same byte string, as the above byte string can be expressed as: abcdef1234(10,5)ki( 11, 4), wherein, "10" in "(10, 5)" is the distance between the same byte string and the current position, and "5" is the number (or length) of the same byte string. Similarly, "11" in "(11, 4)" is the distance between the same byte string and the current position, and "4" is the number of the same byte string. Since the distance between the same byte strings and the same byte number are much smaller than the size of the replaced byte string, the byte strings in the transmitted byte stream are compressed.

以下通过具体的示例,进一步阐明本发明技术方案的实质。The essence of the technical solution of the present invention is further clarified through specific examples below.

图1为本发明嵌入式系统的数据压缩方法的示意图,如图1所示,本发明以字节流“abcdef1234abcki123abcde1234”为例,详细阐明本发明数据压缩的实质方案。Fig. 1 is the schematic diagram of the data compression method of the embedded system of the present invention, as shown in Fig. 1, the present invention takes the byte stream "abcdef1234abcki123abcde1234" as an example, and elaborates the substantive scheme of the present invention's data compression.

如图1所示,为待压缩的字节流设置存储字节串用的索引链表,并设置存储字节串匹配状态的类表数组m_awSortTable[],该数组中存储的数据为0时,表示当前滑动窗中的字节串在之前的字节串中没有与之相同的字节串,而类表数组m_awSortTable[]中存储的数据不为0时,表示当前滑动窗中的字节串在之前的字节串中有与之相同的字节串。根据滑动窗中的具体的字节串的内容来确定该字节串匹配状态在类表数组中的存储地址,以下将详细说明。本示例中的滑动窗大小为2个字节,可根据具体的需要而任意设置滑动窗的大小,例如设置其大小为3个、4个字节等。As shown in Figure 1, an index linked list for storing byte strings is set for the byte stream to be compressed, and a class table array m_awSortTable[] for storing byte string matching status is set. When the data stored in this array is 0, it means The byte string in the current sliding window does not have the same byte string in the previous byte string, and when the data stored in the class table array m_awSortTable[] is not 0, it means that the byte string in the current sliding window is in The previous byte string has the same byte string as it. The storage address of the byte string matching state in the class table array is determined according to the content of the specific byte string in the sliding window, which will be described in detail below. The size of the sliding window in this example is 2 bytes, and the size of the sliding window can be set arbitrarily according to specific needs, such as setting the size to 3 or 4 bytes.

从字节流的开头开始按滑动窗大小遍历整个字节流,首先找到“ab”2字节串,计算“ab”字节串的匹配状态在m_awSortTable[]中的存储地址,具体的,通过“a×256+b”来确定,“a×256+b”相当于确定出“ab”字节串的Hash值,通过查找m_awSortTable[a×256+b],确认其为0,因为当前不存在与“ab”字节串相同(匹配)的字节串,则将“ab”插入到索引链表的节点1中,并将节点1对应的wOffSet、wSameOffSet及wnext进行赋值,即全为0,wOffSet表示当前的字节串与之前最近的相同的字节串的偏移量,wSameOffSet表示当前的字节串是否是相同字节的字节串,即确定当前的字节串是否为类似“aa”、“bb”、“22”等字节相同的字节串,本例中为非字节相同的字节串,wSameOffSet值为0,而为字节相同的字节串时,wSameOffSet值为1;wnext为与相同的字节串之间的偏移量,本发明中是确定与之前最近的字节串的偏移量(或距离),由于不存在与之相同的字节串,因此,该wnext为0。修改m_awSortTable[a×256+b]=1,表示“ab”插入到了索引链表的节点1中。Start from the beginning of the byte stream to traverse the entire byte stream according to the size of the sliding window, first find the "ab" 2-byte string, and calculate the storage address of the matching state of the "ab" byte string in m_awSortTable[], specifically, by "a×256+b" to determine, "a×256+b" is equivalent to determine the Hash value of the "ab" byte string, by searching m_awSortTable[a×256+b], confirm that it is 0, because there is currently no If there is a byte string identical to (matching) the "ab" byte string, insert "ab" into node 1 of the index list, and assign values to wOffSet, wSameOffSet, and wnext corresponding to node 1, that is, all are 0, wOffSet indicates the offset between the current byte string and the most recent same byte string, wSameOffSet indicates whether the current byte string is a byte string of the same byte, that is, to determine whether the current byte string is similar to "aa ", "bb", "22" and other byte strings with the same byte. In this example, it is a non-byte string with the same byte. 1; wnext is the offset between the same byte string, and in the present invention, it is to determine the offset (or distance) with the previous nearest byte string, because there is no identical byte string, so , the wnext is 0. Modify m_awSortTable[a×256+b]=1, indicating that “ab” is inserted into node 1 of the index linked list.

使滑动窗向后滑动一个字节,滑动窗当前为“bc”2字节串,确定之前的字节串中还是不存与之匹配的字节串,同“ab”相同的方式,将“bc”插入索引链表的节点2,其中,节点2对应的wOffSet=0,wSameOffSet=0,wnext=0,需要说明的是,此时m_awSortTable[b×256+c]=2(之前该值为空,即为0)。Make the sliding window slide back one byte, the sliding window is currently a "bc" 2-byte string, determine whether there is still no matching byte string in the previous byte string, in the same way as "ab", set " bc" is inserted into node 2 of the index linked list, wherein wOffSet=0, wSameOffSet=0, wnext=0 corresponding to node 2, it should be noted that, at this time, m_awSortTable[b×256+c]=2 (the previous value was empty , which is 0).

如此一直向后滑动滑动窗,直到滑动窗中出现下一个“ab”2字节串,通过计算“ab”字节串的类表数组地址,发现m_awSortTable[a×256+b]不为0,证明在前面已经出现过一次“ab”字节串,且通过查询m_awSortTable[a×256+b]=1可以知道与该字节串相同的字节串存储在索引链表的节点1中,此时从该两个相同的字节串“ab”开始,在两相同的字节串之后分别滑动滑动窗,按前述方式逐个比较两滑动窗中的字节串,直到不相同为止,得出最大的相同字节串,对于本例,即确定出三个相同字节“abc”。并计算出从第11个节点(第二个字节串“abc”或“ab”在索引链表中的存储的节点号)和第1个节点(第一个字节串“abc”或“ab”在索引链表中的存储的节点号)的差值,得到两字节串之间的距离为“10”,同时又得到相同字节串的字节数目(或长度)为“3”。并将m_awSortTable[a×256+b]=11(11为第二个字节串“ab”对应的节点号)。Keep sliding the sliding window backward until the next "ab" 2-byte string appears in the sliding window. By calculating the address of the table-like array of the "ab" byte string, it is found that m_awSortTable[a×256+b] is not 0, Prove that the "ab" byte string has appeared once before, and by querying m_awSortTable[a×256+b]=1, we can know that the byte string identical to this byte string is stored in node 1 of the index linked list, at this time Starting from the two identical byte strings "ab", slide the sliding window after the two identical byte strings, compare the byte strings in the two sliding windows one by one in the aforementioned manner, until they are different, and obtain the largest For the same byte string, for this example, three identical bytes "abc" are determined. And calculate the node number from the 11th node (the second byte string "abc" or "ab" stored in the index linked list) and the first node (the first byte string "abc" or "ab" "The node number stored in the index linked list) difference, the distance between the two byte strings is "10", and the number of bytes (or length) of the same byte string is "3". And m_awSortTable[a×256+b]=11 (11 is the node number corresponding to the second byte string “ab”).

接着继续滑动滑动窗,当滑动窗中出现“12”2字节串时,发现m_awSortTable[1×256+2]=7,说明在前面的字节流中已经出现过一次“12”字节串,同上一步方法,得到最大的相同字符,三个重复字符“123”和两者之间的距离“9”。Then continue to slide the sliding window. When the "12" 2-byte string appears in the sliding window, m_awSortTable[1×256+2]=7 is found, indicating that the "12" byte string has appeared once in the previous byte stream , with the method of the previous step, get the largest identical character, three repeated characters "123" and the distance between them "9".

继续滑动滑动窗,当滑动窗再次出现“ab”2字节串时,发现m_awSortTable[a×256+b]=11,说明在前面的字节流中已出现过一次“ab”字节串,利用以上方式得到与节点11所在的字节串(两者之间的距离,相同字节串的字节长度),这时发现节点11对应的wnext=1,说明在节点1还有一个“ab”字节串,此时,将同时在以上步骤得到字节流当前位置到节点1所在位置的(两者之间的距离,相同字节串的字节长度),比较两次得到的“相同字节串的字节长度”,最大的一个作为最终确定的相同字节串。本例中,当前字节串“ab”与索引链表的节点1中的字节串之间的相同字节串的字节长度大,为“5”,则以(18,5)来代替字节串“abcde”。Continue to slide the sliding window. When the "ab" 2-byte string appears again in the sliding window, m_awSortTable[a×256+b]=11 is found, indicating that the "ab" byte string has appeared once in the previous byte stream. Utilize the above method to obtain the byte string (the distance between the two, the byte length of the same byte string) with the node 11, and at this time find that the wnext=1 corresponding to the node 11 shows that there is also an "ab" at the node 1 "byte string, at this time, the current position of the byte stream to the position of node 1 will be obtained in the above steps at the same time (the distance between the two, the byte length of the same byte string), and the "same" obtained by comparing twice The byte length of the byte string", the largest one being the same byte string as finalized. In this example, the byte length of the same byte string between the current byte string "ab" and the byte string in node 1 of the index linked list is large, which is "5", then use (18, 5) to replace the word The stanza string "abcde".

需要说明的是,在滑动窗滑动到节点19处的“ab”时,本发明也可以先对节点11所在的字节串进行字节串的匹配,确定“相同字节串的字节长度”为“3”,则与设定的阈值(如设置为4)比较,大于等于阈值时,则直接以该最近的相同字节串的最大相同字节数作为最终的最大相同字节数;确定出的相同字节长度小于设定阈值时,确定该最近的相同字节串(“ab”)之前是否还有该相同的字节串(“ab”),并确定出与次近的相同字节串的“相同字节串的字节长度”,并与设定的阈值比较,如果小于设定阈值则继续查找之前是否还有与当前滑动窗口中相同的字节串(“ab”),直到查找出大于设定阈值的“相同字节串的字节长度”,或者查找完毕所有的字节流,如果查找完毕所有的字节流仍未查找到大于设定阈值的“相同字节串的字节长度”的字节串,则以所查找到的最大“相同字节串的字节长度”的子节串作为最终的相同字节串。It should be noted that, when the sliding window slides to "ab" at node 19, the present invention may first perform byte string matching on the byte string where node 11 is located, and determine "the byte length of the same byte string" If it is "3", then it is compared with the set threshold (such as being set to 4), and when it is greater than or equal to the threshold, the maximum number of identical bytes of the nearest identical byte string is directly used as the final maximum number of identical bytes; determine When the same byte length is less than the set threshold, determine whether there is the same byte string ("ab") before the nearest identical byte string ("ab"), and determine the next closest identical byte string The "byte length of the same byte string" of the section string, and compare it with the set threshold, if it is less than the set threshold, continue to search whether there is the same byte string ("ab") as in the current sliding window before, Until the "byte length of the same byte string" greater than the set threshold is found, or all the byte streams are searched, if all the byte streams are searched, the "same byte string" greater than the set threshold is still not found The byte string of the "byte length of the same byte string", then the subsection string of the largest "byte length of the same byte string" found is used as the final identical byte string.

以下说明本发明如何实现压缩的,假定有一段字节流“abcdef1234abcki123abcde1234abcki123abcde1234”,当滑动窗指向字节流开头位置,取出2字节串”ab”,由图1中的滑动窗滑动过程可知,此时”ab”2字节串,没有重复,所以先往输出缓冲区中,写入bit‘0’,接着写入‘a’。The following describes how the present invention realizes compression. Suppose there is a section of byte stream "abcdef1234abcki123abcde1234abcki123abcde1234". When the sliding window points to the beginning of the byte stream, the 2-byte string "ab" is taken out. As can be seen from the sliding window sliding process in Figure 1, this When "ab" is a 2-byte string, there is no repetition, so first write bit'0' to the output buffer, and then write 'a'.

滑动窗继续往后滑动一个字节,取出2字节串”bc”,此时”bc”2字节串,也有重复,所以也是先往输出缓冲区中,写入bit‘0’,接着写入‘b’。往索引链表添加一个节点“ab”,即图1中的索引链表中的节点1。The sliding window continues to slide one byte backward, and the 2-byte string "bc" is taken out. At this time, the 2-byte string of "bc" also has repetitions, so it is also written into the output buffer first, bit'0', and then write Enter 'b'. Add a node "ab" to the index linked list, that is, node 1 in the index linked list in FIG. 1 .

滑动窗继续往后滑动一个字节,取出2字节串“cd”,此时“cd”2字节串,也有重复,所以也是先往输出缓冲区中,写入bit‘0’,接着写入‘c’。再往索引链表的后一个节点中添加“bc”,即图1中的索引链表的节点2。The sliding window continues to slide one byte back, and takes out the 2-byte string "cd". At this time, the "cd" 2-byte string also has repetitions, so it is also written into the output buffer first, bit'0', and then write Enter 'c'. Then add "bc" to the last node of the index linked list, that is, node 2 of the index linked list in FIG. 1 .

依此类推,当滑动窗移到“4a”时,取出2字节串“4a”,此时“4a”2字节串,没有重复,所以也是先往输出缓冲区中,写入bit‘0’,接着写入‘4’。再往索引链表添加一个节点“34”,即图1中索引链表的节点9。By analogy, when the sliding window moves to "4a", the 2-byte string "4a" is taken out. At this time, the "4a" 2-byte string has no repetition, so it is also written into the output buffer first and bit'0 ', then write '4'. Then add a node "34" to the index linked list, that is, node 9 of the index linked list in FIG. 1 .

滑动窗继续往下滑动一个字节,取出2字节串“ab”,此时“ab”2字节串,在索引表里遍历时,发现已经在前面出现过。而且通过其索引节点,可以找到它的wOffSet偏移。从而能够定位它出现的位置,计算出“距离10”和“相同字节串的字节长度3”。此时先往输出缓冲区中,写入bit‘1’,接着先对“距离10”、“相同字节串的字节长度”进行伽马(Gamma)编码后输出到缓冲区中。然后往索引链表添加节点“4a”“ab”,即图1中的索引链表的节点10、节点11。当滑动窗移动到“12”时,处理情况和“ab”一样。The sliding window continues to slide down one byte, and the 2-byte string "ab" is taken out. At this time, the "ab" 2-byte string, when traversing in the index table, is found to have appeared before. And through its index node, you can find its wOffSet offset. Thereby, the position where it appears can be located, and the "distance 10" and "the byte length of the same byte string 3" can be calculated. At this time, first write bit '1' into the output buffer, and then first perform Gamma (Gamma) encoding on "distance 10" and "byte length of the same byte string" and then output them to the buffer. Then add nodes "4a" and "ab" to the index linked list, that is, nodes 10 and 11 of the index linked list in FIG. 1 . When the sliding window is moved to "12", the processing is the same as "ab".

以下简单介绍一下Gamma编码,假设对x编码,令q=Int(log2x),则编码的前一部分是q个1加一个0,后一部分是q位长的二进制数,其值等于x-2q。普通数值与Gamma编码值的对应关系如表1所示:The following is a brief introduction to Gamma coding. Assuming that x is coded, let q=Int(log 2 x), then the first part of the code is q 1s plus a 0, and the second part is a q-bit long binary number whose value is equal to x- 2q . The correspondence between ordinary values and Gamma coded values is shown in Table 1:

  数值 value  1 1   2 2   3 3   4 4   5 5   6 6   7 7   8 8   9 9   编码值 coded value  0 0   10 0 10 0   10 1 10 1   110 00 110 00   110 01 110 01   110 10 110 10   110 11 110 11   111 0000 111 0000   111 0001 111 0001

表1Table 1

滑动窗往下滑动到再一个“ab”时,取出2字节串“ab”此时“ab”2字节串,在索引表里遍历时,发现已经在前面出现过。而且通过其索引节点11,可以找到它的wOfffSet偏移。从而能够定位它出现的位置,计算出它距离9和相同的长度3。此时发现索引节点的wNext值不为0,表示在节点11之前还有出现过一次“ab”所以,通出wNext取得“ab”出现的索引位置,即节点1。计算出它的距离19,相同的长度为5,比与节点11计算出来的长度还要大。接着再取得节点1的wNext的值为0,表示节点1之前没有相同的字节串“ab”了。所以此时先往输出缓冲区中,写入bit‘1’,接着先对“距离”、“相同的长度”进行Gamma编码后输出到缓冲区中。然后往索引链表添加节点“3a”、“ab”、“bc”、“cd”,即图1中的索引链表的节点18、19、20、21。When the sliding window slides down to another "ab", the 2-byte string "ab" is taken out. At this time, the "ab" 2-byte string is traversed in the index table, and it is found that it has appeared before. And through its index node 11, its wOfffSet offset can be found. So as to be able to locate the position where it appears, calculate its distance 9 and the same length 3. At this time, it is found that the wNext value of the index node is not 0, which means that "ab" appeared once before node 11. Therefore, the index position where "ab" appears is obtained through wNext, that is, node 1. Calculate its distance 19, the same length is 5, which is larger than the calculated length with node 11. Then get the value of wNext of node 1 to be 0, which means that node 1 does not have the same byte string "ab" before. So at this time, first write bit'1' into the output buffer, and then first perform Gamma encoding on "distance" and "same length" and then output it to the buffer. Then add nodes "3a", "ab", "bc", and "cd" to the index linked list, that is, nodes 18, 19, 20, and 21 of the index linked list in FIG. 1 .

依此类推,最后压缩后的数据为“30988c664329986232198d300d669c6e”(对应的ASCII码),压缩前数据的ASCII码是“616263646566313233346162636b693132336162636465313233346162636b69313233616263646531323334”。因此大大压缩了节点流数据。依此类推,最后压缩后的数据为“30988c664329986232198d300d669c6e”(对应的ASCII码),压缩前数据的ASCII码是“616263646566313233346162636b693132336162636465313233346162636b69313233616263646531323334”。 The node stream data is thus greatly compressed.

举个解压缩例子:To give an example of decompression:

解压缩是一个逆反过程,相对较简单。假定压缩后的数据为”30988c664329986232198d300d669c6e”(对应的ASCII码),首先将指针指向开头,取出第一个bit,是‘0’,得出接下来的8个bit就是需要直接输出的字符。从而得到“61”(ASCII码),即字符‘a’,接着继续往下读一个“bit”位,发现还是‘0’,继续记出接下来的8个“bit”,即得到“62”(ASCII码),即字符‘b’,依此类推,可以读出“cdef1234”,当取完字符‘4’时,接下来再读一个bit位,发现是‘1’,此时,一直往下读出bit位直到不为‘1’时,得出q个1,然后再往下读一个0,接着再读出q位bit值。此时通过Gamma编码公式进行反推,可以得出压缩前编码的数据值,从而读出重复的长度;接着往下读出压缩前两者之间的距离所需要存储的bit长度,即读出两者之间的距离,然后就可以开始复制数据到当前位置输出。依此类推,就可以将所有数据解压出来。Decompression is a reverse process, relatively simple. Assuming that the compressed data is "30988c664329986232198d300d669c6e" (corresponding ASCII code), first point the pointer to the beginning, take out the first bit, which is '0', and obtain the next 8 bits are the characters that need to be output directly. To get "61" (ASCII code), that is, the character 'a', then continue to read a "bit" and find it is still '0', continue to write down the next 8 "bits", that is, get "62" (ASCII code), that is, the character 'b', and so on, you can read "cdef1234". When the character '4' is finished, read another bit and find it is '1'. At this time, go all the way When the bit is read out until it is not '1', q 1s are obtained, and then a 0 is read down, and then the q bit value is read out. At this time, the Gamma coding formula is used to invert, and the data value encoded before compression can be obtained, so as to read the length of the repetition; then read down the bit length that needs to be stored for the distance between the two before compression, that is, read The distance between the two, and then you can start copying data to the current location for output. By analogy, all the data can be decompressed.

图2为本发明嵌入式系统的数据压缩装置的组成结构示意图,如图2所示,本发明嵌入式系统的数据压缩装置包括确定单元20和压缩单元21,其中,确定单元20用于确定出待压缩数据中相同的字节串;压缩单元21用于对相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串。Fig. 2 is a schematic diagram of the composition structure of the data compression device of the embedded system of the present invention, as shown in Fig. 2, the data compression device of the embedded system of the present invention comprises a determination unit 20 and a compression unit 21, wherein the determination unit 20 is used to determine The same byte string in the data to be compressed; the compression unit 21 is used to replace the same byte string with the distance from the same byte string and the number of the same byte except the first byte string the same byte string.

如图2所示,本发明嵌入式系统的数据压缩装置还包括设置单元22,用于传输数据时,为未压缩数据及压缩数据分别设置标识位。As shown in FIG. 2 , the data compression device of the embedded system of the present invention further includes a setting unit 22 for setting identification bits for uncompressed data and compressed data respectively when transmitting data.

如图2所示,本发明嵌入式系统的数据压缩装置还包括编码单元23,用于对与相同字节串的距离及相同字节的数目使用Gamma编码方式进行编码。As shown in FIG. 2 , the data compression device of the embedded system of the present invention further includes an encoding unit 23 for encoding the distance from the identical byte string and the number of identical bytes using Gamma encoding.

确定单元20确定出待压缩数据中相同的字节串,包括:所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则之前有与所述字节串相同的字节串,则继续滑动所述滑动窗,直到确定出与之前字节串相同的最长字节串,并确定出与之前相同的字节串的距离。Determining unit 20 determines the same byte string in the data to be compressed, including: the set sliding window slides byte by byte from the first byte of the data to be compressed, according to the byte in the sliding window after each slide String determines the address of the byte string in the class list array, and judges whether the data stored in the address is zero. If it is zero, there is no byte string identical to the byte string before. When it is not zero, Then there is a byte string identical to the byte string before, then continue to slide the sliding window until the longest byte string identical to the previous byte string is determined, and the length of the same byte string is determined. distance.

或者,确定单元20确定出待压缩数据中相同的字节串,包括:所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则确定出与所述字节串相同且距离最近的字节串,并继续滑动所述滑动窗,判断与前一滑动窗中所述字节串相同且距离最近的字节串的后一滑动窗中字节是否与当前的滑动窗中的字节串相同,确定出字节串相同的最长字节串,并判断字节串相同的最长字节串的字节数是否达到设定阈值,未达到时则判断是否还存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,直到确定出的字节串相同的最长字节串的字节数达到设定阈值;或确定不存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,并以所确定的字节串相同的最长字节串作为最终的相同的字节串。Alternatively, the determining unit 20 determines the same byte string in the data to be compressed, including: the set sliding window slides byte by byte from the first byte of the data to be compressed, and according to the The byte string determines the address of the byte string in the class table array, and judges whether the data stored in the address is zero. If it is zero, there is no byte string identical to the byte string before, and it is not zero , then determine the byte string that is the same as the byte string and the closest to the byte string, and continue to slide the sliding window to determine the next byte string that is the same as the byte string in the previous sliding window and the closest Whether the byte in the sliding window is the same as the byte string in the current sliding window, determine the longest byte string with the same byte string, and determine whether the number of bytes in the longest byte string with the same byte string reaches Set the threshold, and when it is not reached, judge whether there are other byte strings that are identical to the byte strings in the sliding window that are the same as the first byte string, until the determined byte string is the same as the longest byte string The number of bytes reaches the set threshold; or it is determined that there is no other byte string that is the same as the byte string in the sliding window that is the same as the first byte string, and the longest byte string that is the same as the determined byte string as the final same byte string.

本领域技术人员应当理解,本发明图2所示的嵌入式系统的数据压缩装置是为实现前述嵌入式系统的数据压缩方法而设计的,图2所示装置中的各处理单元的实现功能可参照前述降低多载波相互干扰的方法中的相关描述而理解,各单元的功能可通过运行于处理器上的程序而实现,也可通过相应的逻辑电路而实现。Those skilled in the art should understand that the data compression device of the embedded system shown in Figure 2 of the present invention is designed for realizing the data compression method of the aforementioned embedded system, and the realization functions of each processing unit in the device shown in Figure 2 can be It can be understood with reference to the relevant descriptions in the aforementioned method for reducing mutual interference of multiple carriers that the functions of each unit may be implemented by a program running on a processor, or may be implemented by a corresponding logic circuit.

图3为本发明嵌入式系统的数据接压缩装置的组成结构示意图,如图3所示,本发明嵌入式系统的数据压缩装置包括确定单元30、获取单元31和解压单元32,其中,确定单元30用于确定压缩数据中包含有压缩标识时,触发获取单元31;获取单元31用于获取代替字节串的距离及字节数目;解压单元32用于在已解压的数据中所述距离处获取出所述字节数目的字节,并填入当前的解压数据中。Fig. 3 is a schematic diagram of the composition structure of the data compression device of the embedded system of the present invention, as shown in Fig. 3, the data compression device of the embedded system of the present invention comprises a determination unit 30, an acquisition unit 31 and a decompression unit 32, wherein the determination unit 30 is used to trigger the acquisition unit 31 when it is determined that the compressed data contains a compression identifier; the acquisition unit 31 is used to acquire the distance and the number of bytes in place of the byte string; Obtain the bytes of the number of bytes and fill them into the current decompressed data.

本领域技术人员应当理解,本发明图3所示的嵌入式系统的数据解压缩装置是为实现前述嵌入式系统的数据压缩方法而设计的,图3所示装置中的各处理单元的实现功能可参照前述降低多载波相互干扰的方法中的相关描述而理解,各单元的功能可通过运行于处理器上的程序而实现,也可通过相应的逻辑电路而实现。Those skilled in the art should understand that the data decompression device of the embedded system shown in Figure 3 of the present invention is designed for realizing the data compression method of the aforementioned embedded system, and the realization functions of each processing unit in the device shown in Figure 3 It can be understood with reference to the related descriptions in the aforementioned method for reducing mutual interference of multiple carriers, that the functions of each unit can be realized by a program running on a processor, or can be realized by a corresponding logic circuit.

以上所述,仅为本发明的较佳实施例而已,并非用于限定本发明的保护范围。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the protection scope of the present invention.

Claims (11)

1.一种嵌入式系统的数据压缩方法,其特征在于,包括:1. A data compression method for an embedded system, comprising: 确定出待压缩数据中相同的字节串,相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串;Determine the same byte string in the data to be compressed, in the same byte string except the first byte string, use the distance from the same byte string and the number of the same byte to replace the same byte string; 其中,所述确定出待压缩数据中相同的字节串,包括:Wherein, the determination of the same byte string in the data to be compressed includes: 所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则之前有与所述字节串相同的字节串,则继续滑动所述滑动窗,直到确定出与之前字节串相同的最长字节串,并确定出与之前相同的字节串的距离。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then there is a byte string identical to the byte string before, then continue The sliding window is slid until the longest byte string identical to the previous byte string is determined, and the distance to the same byte string is determined. 2.根据权利要求1所述的方法,其特征在于,所述方法还包括:2. The method according to claim 1, characterized in that the method further comprises: 传输数据时,为未压缩数据及压缩数据分别设置标识位。When transmitting data, set flags for uncompressed data and compressed data respectively. 3.根据权利要求1所述的方法,其特征在于,所述与相同字节串的距离及相同字节的数目使用伽马Gamma编码方式进行编码。3. The method according to claim 1, wherein the distance from the same byte string and the number of the same byte are encoded using a Gamma encoding method. 4.根据权利要求1所述的方法,其特征在于,所述确定出待压缩数据中相同的字节串,包括:4. The method according to claim 1, wherein said determining the same byte string in the data to be compressed comprises: 所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则确定出与所述字节串相同且距离最近的字节串,并继续滑动所述滑动窗,判断与前一滑动窗中所述字节串相同且距离最近的字节串的后一滑动窗中字节是否与当前的滑动窗中的字节串相同,确定出字节串相同的最长字节串,并判断字节串相同的最长字节串的字节数是否达到设定阈值,未达到时则判断是否还存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,直到确定出的字节串相同的最长字节串的字节数达到设定阈值;或确定不存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,并以所确定的字节串相同的最长字节串作为最终的相同的字节串。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then determine the byte string that is the same as the byte string and the closest distance , and continue to slide the sliding window, and judge whether the byte in the next sliding window of the byte string that is the same as the byte string in the previous sliding window and the nearest byte string is the same as the byte string in the current sliding window, Determine the longest byte string with the same byte string, and judge whether the number of bytes in the longest byte string with the same byte string reaches the set threshold, and if it does not reach, judge whether there is still the same byte string as the first byte string Other byte strings identical to the byte strings in the same sliding window until the number of bytes of the longest byte string identical to the determined byte string reaches the set threshold; or it is determined that there is no byte string identical to the first byte string The byte strings in the same sliding window are the same as other byte strings, and the determined longest byte string is the same as the final identical byte string. 5.根据权利要求1或4所述的方法,其特征在于,所述方法还包括:5. according to the described method of claim 1 or 4, it is characterized in that, described method also comprises: 确定所述滑动窗中的字节串与之前字节串不同时,将所述字节串存储于索引链表的节点中,记录相同字节串的偏移量为零,记录所述字节串在类表数组中对应地址存储的数据为存储所述字节串的节点在所述索引链表中顺序值;确定所述滑动窗中的字节串与之前字节串相同时,将所述字节串存储于索引链表的节点中,所述字节串与当前最近的相同字节串的偏移量作为所述字节串的偏移量,记录所述字节串在类表数组中对应地址存储的数据为存储所述字节串的节点在所述索引链表中顺序值。When determining that the byte string in the sliding window is different from the previous byte string, store the byte string in the node of the index linked list, record the offset of the same byte string as zero, and record the byte string The data stored in the corresponding address in the class table array is the sequence value of the node storing the byte string in the index linked list; when it is determined that the byte string in the sliding window is the same as the previous byte string, the word The byte string is stored in the node of the index linked list, the offset between the byte string and the current closest same byte string is used as the offset of the byte string, and the corresponding byte string is recorded in the class table array The data stored by the address is the sequence value of the node storing the byte string in the index linked list. 6.一种嵌入式系统的数据解压缩方法,其特征在于,包括:6. A data decompression method for an embedded system, comprising: 压缩数据中包含有压缩标识时,获取代替字节串的距离及字节数目,并在已解压的数据中所述距离处获取出所述字节数目的字节,填入当前的解压数据中;When the compressed data contains a compression mark, obtain the distance and the number of bytes that replace the byte string, and obtain the bytes with the number of bytes at the distance in the decompressed data, and fill them in the current decompressed data ; 具体包括:根据所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则直接输出对应的字符;不为零时,则通过Gamma编码公式反推得出压缩前编码的数据值,读出重复的长度,然后读出压缩前所述代替字节串和之前相同字节串之间的距离所需要存储的位长度,即读出两者之间的距离,复制对应数据到当前位置。Specifically include: according to the address of the byte string in the class table array, judge whether the data stored in the address is zero, if it is zero, then directly output the corresponding character; Deduce the encoded data value before compression, read the length of the repetition, and then read the bit length that needs to be stored for the distance between the replacement byte string and the same byte string before compression, that is, read the distance between the two Copy the corresponding data to the current location. 7.一种嵌入式系统的数据压缩装置,其特征在于,包括:7. A data compression device for an embedded system, comprising: 确定单元,用于确定出待压缩数据中相同的字节串;以及a determining unit, configured to determine the same byte string in the data to be compressed; and 压缩单元,用于对相同的字节串中除第一个字节串之外,使用与相同字节串的距离及相同字节的数目来代替所述相同的字节串;A compression unit for replacing the same byte string with the distance from the same byte string and the number of the same bytes except for the first byte string in the same byte string; 其中,所述确定单元确定出待压缩数据中相同的字节串,包括:Wherein, the determination unit determines the same byte string in the data to be compressed, including: 所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则之前有与所述字节串相同的字节串,则继续滑动所述滑动窗,直到确定出与之前字节串相同的最长字节串,并确定出与之前相同的字节串的距离。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then there is a byte string identical to the byte string before, then continue The sliding window is slid until the longest byte string identical to the previous byte string is determined, and the distance to the same byte string is determined. 8.根据权利要求7所述的装置,其特征在于,所述装置还包括:8. The device according to claim 7, further comprising: 设置单元,用于传输数据时,为未压缩数据及压缩数据分别设置标识位。The setting unit is used to set the identification bits for the uncompressed data and the compressed data respectively when transmitting data. 9.根据权利要求7所述的装置,其特征在于,所述装置还包括:9. The device according to claim 7, further comprising: 编码单元,用于对与相同字节串的距离及相同字节的数目使用Gamma编码方式进行编码。The encoding unit is used to encode the distance from the identical byte string and the number of identical bytes using Gamma encoding. 10.根据权利要求7所述的装置,其特征在于,所述确定单元确定出待压缩数据中相同的字节串,包括:10. The device according to claim 7, wherein the determination unit determines the same byte string in the data to be compressed, comprising: 所设置的滑动窗从待压缩数据的首字节起逐字节地滑动,根据每次滑动后所述滑动窗中的字节串确定所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则之前没有与所述字节串相同的字节串,不为零时,则确定出与所述字节串相同且距离最近的字节串,并继续滑动所述滑动窗,判断与前一滑动窗中所述字节串相同且距离最近的字节串的后一滑动窗中字节是否与当前的滑动窗中的字节串相同,确定出字节串相同的最长字节串,并判断字节串相同的最长字节串的字节数是否达到设定阈值,未达到时则判断是否还存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,直到确定出的字节串相同的最长字节串的字节数达到设定阈值;或确定不存在与第一次字节串相同的滑动窗中的字节串相同的其他字节串,并以所确定的字节串相同的最长字节串作为最终的相同的字节串。The set sliding window slides byte by byte from the first byte of the data to be compressed, determines the address of the byte string in the class table array according to the byte string in the sliding window after each slide, and judges the Whether the data stored in the above address is zero, if it is zero, there is no byte string identical to the byte string before, if it is not zero, then determine the byte string that is the same as the byte string and the closest distance , and continue to slide the sliding window, and judge whether the byte in the next sliding window of the byte string that is the same as the byte string in the previous sliding window and the nearest byte string is the same as the byte string in the current sliding window, Determine the longest byte string with the same byte string, and judge whether the number of bytes in the longest byte string with the same byte string reaches the set threshold, and if it does not reach, judge whether there is still the same byte string as the first byte string Other byte strings identical to the byte strings in the same sliding window until the number of bytes of the longest byte string identical to the determined byte string reaches the set threshold; or it is determined that there is no byte string identical to the first byte string The byte strings in the same sliding window are the same as other byte strings, and the determined longest byte string is the same as the final identical byte string. 11.一种嵌入式系统的数据解压缩装置,其特征在于,包括:11. A data decompression device for an embedded system, characterized in that, comprising: 确定单元,用于确定压缩数据中包含有压缩标识时,触发获取单元;A determination unit, configured to trigger an acquisition unit when determining that the compressed data contains a compression identifier; 获取单元,用于获取代替字节串的距离及字节数目;以及an acquisition unit, configured to acquire the distance and the number of bytes in place of the byte string; and 解压单元,用于在已解压的数据中所述距离处获取出所述字节数目的字节,并填入当前的解压数据中;The decompression unit is used to obtain the bytes of the number of bytes at the distance in the decompressed data and fill them into the current decompressed data; 所述获取单元,进一步用于通过Gamma编码公式反推得出压缩前编码的数据值,读出重复的长度,然后读出压缩前所述代替字节串和之前相同字节串之间的距离所需要存储的位长度,即读出两者之间的距离;The acquisition unit is further used to deduce the encoded data value before compression through the Gamma encoding formula, read the length of the repetition, and then read the distance between the replacement byte string and the same byte string before compression The bit length to be stored, that is, the distance between the two reads; 所述解压单元,进一步用于根据所述字节串在类表数组中的地址,判断所述地址中存储的数据是否为零,若为零则直接输出对应的字符;不为零时,则通过Gamma编码公式反推得出压缩前编码的数据值,根据获取单元读出的所述代替字节串的长度和所述代替字节串与之前相同字节串之间的距离,复制对应数据到当前位置。The decompression unit is further used to judge whether the data stored in the address is zero according to the address of the byte string in the class table array, and if it is zero, then directly output the corresponding character; if it is not zero, then Reverse the Gamma coding formula to obtain the data value encoded before compression, and copy the corresponding data according to the length of the replacement byte string read by the acquisition unit and the distance between the replacement byte string and the same byte string before to the current location.
CN2009100938637A 2009-09-22 2009-09-22 Methods and devices for compressing and uncompressing data of embedded system Expired - Fee Related CN101667843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100938637A CN101667843B (en) 2009-09-22 2009-09-22 Methods and devices for compressing and uncompressing data of embedded system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100938637A CN101667843B (en) 2009-09-22 2009-09-22 Methods and devices for compressing and uncompressing data of embedded system

Publications (2)

Publication Number Publication Date
CN101667843A CN101667843A (en) 2010-03-10
CN101667843B true CN101667843B (en) 2013-12-11

Family

ID=41804309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100938637A Expired - Fee Related CN101667843B (en) 2009-09-22 2009-09-22 Methods and devices for compressing and uncompressing data of embedded system

Country Status (1)

Country Link
CN (1) CN101667843B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12041259B2 (en) 2020-08-31 2024-07-16 Boe Technology Group Co., Ltd. Data processing methods and systems, and electronic devices

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856296B (en) * 2010-07-16 2017-05-17 珠海中慧微电子有限公司 Address data compression algorithm for low-voltage power line carrier communication
CN102110145A (en) * 2011-02-15 2011-06-29 中国科学院计算技术研究所 Micro FAT file system and quick data access method thereof
CN104699219B (en) * 2013-12-10 2020-06-23 联想(北京)有限公司 Electronic equipment and information processing method
CN104951473A (en) * 2014-03-31 2015-09-30 中国移动通信集团宁夏有限公司 Method and device for compressing data
CN106850507B (en) * 2015-12-04 2020-01-14 北京航空航天大学 Harmful code detection method and device based on HTTP compressed data stream
CN106982165A (en) * 2016-01-15 2017-07-25 厦门雅迅网络股份有限公司 Data compression method and its system
CN107682016B (en) * 2017-09-26 2021-09-17 深信服科技股份有限公司 Data compression method, data decompression method and related system
CN109033189B (en) * 2018-06-27 2021-08-24 创新先进技术有限公司 Compression method and device of link structure log, server and readable storage medium
CN110446124B (en) * 2019-08-19 2021-10-12 深圳市双翼科技股份有限公司 Unit remote management method, storage medium and device based on optical network terminal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6037883A (en) * 1998-05-07 2000-03-14 Microsoft Corporation Efficient memory usage for two-pass compression
WO2002015408A2 (en) * 2000-08-15 2002-02-21 Seagate Technology Llc Dual mode data compression for operating code

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6037883A (en) * 1998-05-07 2000-03-14 Microsoft Corporation Efficient memory usage for two-pass compression
WO2002015408A2 (en) * 2000-08-15 2002-02-21 Seagate Technology Llc Dual mode data compression for operating code

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12041259B2 (en) 2020-08-31 2024-07-16 Boe Technology Group Co., Ltd. Data processing methods and systems, and electronic devices

Also Published As

Publication number Publication date
CN101667843A (en) 2010-03-10

Similar Documents

Publication Publication Date Title
CN101667843B (en) Methods and devices for compressing and uncompressing data of embedded system
US20180285393A1 (en) Content estimation data compression
CN103236847B (en) Based on the data lossless compression method of multilayer hash data structure and Run-Length Coding
US8120516B2 (en) Data compression using a stream selector with edit-in-place capability for compressed data
US10187081B1 (en) Dictionary preload for data compression
US5153591A (en) Method and apparatus for encoding, decoding and transmitting data in compressed form
KR101725223B1 (en) Data compressing method of storage device
CN111294053B (en) Hardware-friendly data compression method, system and device
CN110868222B (en) LZSS compressed data error code detection method and device
WO2019153700A1 (en) Encoding and decoding method, apparatus and encoding and decoding device
US20190034334A1 (en) Methods, Devices and Systems for Compressing and Decompressing Data
CN108702160B (en) Method, apparatus and system for compressing and decompressing data
WO2021027487A1 (en) Encoding method and related device
CN103534968A (en) Method and device for encoding and decoding Ethernet physical layer
CN110557124A (en) Data compression method and device
CN104811209B (en) A kind of the compressed file data embedding method and device of anti-most long matching detection
CN101551820B (en) Generation method and apparatus for index database of points of interest attribute
CN100498794C (en) Method and device for compressing index
WO2024138981A1 (en) Data compression method and apparatus, data decompression method and apparatus, and electronic device and storage medium
US7796059B2 (en) Fast approximate dynamic Huffman coding with periodic regeneration and precomputing
CN106559085A (en) A kind of normal form Hafman decoding method and its device
CN118897690A (en) ECU flash update method and system based on compression algorithm
JPH0779265B2 (en) Decompression method of compressed data
CN116527775B (en) Data compression techniques using partition and don't care bit cancellation
WO2020259400A1 (en) Encoding method and apparatus, storage medium, and computer device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131211

CF01 Termination of patent right due to non-payment of annual fee