CN107330094B - Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs - Google Patents
Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs Download PDFInfo
- Publication number
- CN107330094B CN107330094B CN201710542207.5A CN201710542207A CN107330094B CN 107330094 B CN107330094 B CN 107330094B CN 201710542207 A CN201710542207 A CN 201710542207A CN 107330094 B CN107330094 B CN 107330094B
- Authority
- CN
- China
- Prior art keywords
- value
- node
- bloom filter
- key
- root node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
技术领域technical field
本发明涉及计算机网络领域和计算机系统存储领域,特别是高性能、高吞吐量的交互查询的应用领域,具体是一种可扩展的布鲁姆树对于键值对的存储结构及方法。The invention relates to the field of computer network and computer system storage, in particular to the application field of high-performance and high-throughput interactive query, in particular to an extensible Bloom tree storage structure and method for key-value pairs.
背景技术Background technique
近年来,随着计算机的飞速发展,数据库,网络和其他应用中的集合规模呈几何增长。存储和查询键值对(key,value)是计算机系统中常见的任务,这就需要设计对应的键值对存储数据结构,支持快速的键值对查询。键值对操作经常出现在网络和存储系统中,如key-value数据库MongoDB,CouchDB。每个放入到键值对存储系统中的唯一键key都对应一个value,例如(3,5)是一个键key为3,值value为5的键值对,将(3,5)存储到键值对存储系统后,可以通过查询键(key)为3,得到值(value)为5。In recent years, with the rapid development of computers, the size of collections in databases, networks and other applications has grown exponentially. Storing and querying key-value pairs (key, value) is a common task in computer systems, which requires designing a corresponding key-value pair storage data structure to support fast key-value pair query. Key-value pair operations often appear in networks and storage systems, such as key-value databases MongoDB and CouchDB. Each unique key key put into the key-value pair storage system corresponds to a value. For example, (3, 5) is a key-value pair with
设计高效的键值对存储及查询结构带来了巨大的挑战。在一个2层交换机中,一个MAC地址关联一个唯一的端口。当要转发一帧时,搜索引擎会查询这一帧要转发的目的地址的MAC表,因此,将一个MAC地址映射到一个端口的问题就转换成了一个键值对查询问题,这个时候,MAC地址就被看成了key,而要查询的端口号就变成了value。由于MAC地址是持续添加到列表中的,因此,元素的大小是未知的。如果采用cell的结构来存储这些键值对,需要消耗大量的空间,而且在查找对应key的value时会消耗大量的时间;如果采用静态的布鲁姆过滤器结构存储键值对,只能处理静态数据,这在实际应用中很不现实。因此,在高速的计算机网络中,如何高效存储这些信息,快速查询对应的键值对成为挑战。Designing efficient key-value pair storage and query structures brings great challenges. In a
布鲁姆过滤器(Bloom Filter)是一种空间节俭、查询高效的数据结构,它可以满足现如今生活中高效资源交互需求及查找需求,能够有效的表示数据集合。布鲁姆过滤器自1970年由B.Bloom提出以来,就被广泛应用于各种各样的计算机系统之中,用来表示庞大的数据集合,提高查询效率。布鲁姆过滤器结构实质是将集合中元素通过k个哈希函数映射到位向量中。布鲁姆过滤器在达到其高效表示集合的同时,进行元素查询时却存在一定的假阳性(某元素不属于集合而误判为属于集合中)误判率,而不存在假阴性(某元素属于集合而误判为不属于集合中)误判,在查询和存储上有较高的效率。Bloom filter is a space-saving and query-efficient data structure, which can meet the needs of efficient resource interaction and search in today's life, and can effectively represent data sets. Bloom filter has been widely used in various computer systems since it was proposed by B.Bloom in 1970 to represent huge data sets and improve query efficiency. The essence of the Bloom filter structure is to map the elements in the set to a bit vector through k hash functions. While the Bloom filter achieves its efficient representation of the set, there is a certain false positive (an element does not belong to the set but is misjudged as belonging to the set) false positive rate when performing element query, but there is no false negative (an element does not belong to the set). It belongs to the set and is misjudged as not belonging to the set) misjudgment, which has higher efficiency in query and storage.
但是传统的布鲁姆过滤器仅仅只能支持元素是否存在于集合的从属查询。如果元素为key,那么只能支持key是否存在于集合的从属查询,而无法支持(key,value)操作。因为布鲁姆过滤器无法直接存储value,因此并不能使用传统的布鲁姆过滤器对键值对进行操作。要使得布鲁姆过滤器支持键值对的基本操作,必须要改进传统的布鲁姆过滤器,设计新的布鲁姆过滤器结构。But traditional bloom filters can only support subordinate queries of whether an element exists in a collection. If the element is key, it can only support the subordinate query of whether the key exists in the collection, but cannot support the (key, value) operation. Because Bloom filters cannot store values directly, you cannot use traditional Bloom filters to operate on key-value pairs. To make the Bloom filter support the basic operations of key-value pairs, it is necessary to improve the traditional Bloom filter and design a new Bloom filter structure.
发明内容SUMMARY OF THE INVENTION
本发明所要解决的技术问题是,针对现有技术不足,提供一种动态存储键值对的布鲁姆过滤器树结构及键值对存储方法。The technical problem to be solved by the present invention is to provide a Bloom filter tree structure and a key-value pair storage method for dynamically storing key-value pairs in view of the deficiencies of the prior art.
为解决上述技术问题,本发明所采用的技术方案是:一种动态存储键值对的布鲁姆过滤器树结构,包括完全d叉树;每一棵完全d叉树的每一个节点都是一个布鲁姆过滤器;每一棵完全d叉树的每个叶子节点表示一个值value;每一个节点的存储单元大小是该节点的父节点的存储单元大小的一半,根节点包括d×k个不相同的哈希函数,即根节点包括d个哈希组,每组中包含k个哈希函数。In order to solve the above-mentioned technical problems, the technical solution adopted in the present invention is: a Bloom filter tree structure for dynamically storing key-value pairs, including a complete d-ary tree; each node of each complete d-ary tree is A Bloom filter; each leaf node of each complete d-ary tree represents a value value; the storage unit size of each node is half of the storage unit size of the parent node of the node, and the root node includes d×k There are different hash functions, that is, the root node includes d hash groups, and each group includes k hash functions.
相应地,本发明还提供了一种布鲁姆过滤器树结构进行键值对存储的方法,其包括:Correspondingly, the present invention also provides a method for storing key-value pairs in a Bloom filter tree structure, comprising:
插入操作:当需要插入一个键值对(key,value)时,首先查看value是否已经插入到布鲁姆过滤器中,如果未插入,则需要进行后续的增加新value操作;如果value已经存在于布鲁姆过滤器树中,则直接进行键值对插入,根据value查找该value对应的叶节点,得到此叶节点位置编码,确定一条从根节点到此叶节点的唯一路径,计算根节点的两组哈希函数,即对key使用k个哈希函数h(i,1),h(i,2),...,h(i,k),计算h(i,1)(key),h(i,2)(key),...,h(i,k)(key),;其中i表示选取哈希函数的组号,i=1或2;将i=1计算的哈希值保存在数组A中,将i=2计算的哈希值保存在数组B中;接着,根据叶节点的第一位编码,在根节点选取A、B数组中的一组值进行插入,即对A或B数组进行右移零位操作,然后根据叶节点的第一位编码,得到下一个需要进行插入操作的布鲁姆过滤器,再根据叶节点的第二位编码,选取A、B数组中的一组值进行右移一位操作,得到插入位置,进行插入;继续进行上述操作,对每一层布鲁姆过滤器插入key,直到插到value对应的叶节点为止;Insertion operation: When a key-value pair (key, value) needs to be inserted, first check whether the value has been inserted into the Bloom filter. If it has not been inserted, a subsequent operation of adding a new value is required; if the value already exists in the Bloom filter In the Bloom filter tree, the key-value pair is inserted directly, the leaf node corresponding to the value is searched according to the value, the position code of the leaf node is obtained, a unique path from the root node to the leaf node is determined, and the root node is calculated. Two sets of hash functions, that is, use k hash functions h (i,1) ,h (i,2) ,...,h (i,k) for the key, and calculate h (i,1) (key) ,h (i,2) (key),...,h (i,k) (key),; where i represents the group number of the selected hash function, i=1 or 2; The hash value is stored in the array A, and the hash value calculated by i=2 is stored in the array B; then, according to the first bit code of the leaf node, a set of values in the A and B arrays are selected at the root node for insertion, That is, perform a right-shift zero-bit operation on the A or B array, and then obtain the next Bloom filter that needs to be inserted according to the first bit code of the leaf node, and then select A, A group of values in the B array is shifted to the right by one bit to obtain the insertion position and inserted; continue the above operation, and insert the key into each layer of Bloom filter until the leaf node corresponding to the value is inserted;
查询操作:需要查询一个键值key对应的value时,首先,计算根节点的两组哈希函数h(i,1)(key),h(i,2)(key),...,h(i,k)(key)(i=1或2),将它们的值分别保存在A、B两组数组中;在根节点分别对A、B两数组值对应的位置单元进行查询操作,查到第一位编码值;根据得到的编码值,转到下一个节点继续查询,此时,对A、B两数组进行右移操作,在对应的位置单元进行查询,得到一个编码值;继续进行上述操作,直到查找到叶节点为止,最后得到一个完整的编码值,根据这个叶节点编码进行解码操作得到对应的value;Query operation: When you need to query the value corresponding to a key value key, first, calculate the two sets of hash functions h (i,1) (key),h (i,2) (key),...,h of the root node. (i,k) (key) (i=1 or 2), save their values in the two arrays of A and B respectively; at the root node, query the location units corresponding to the values of the two arrays of A and B, respectively, Find the first code value; according to the obtained code value, go to the next node to continue the query, at this time, perform a right-shift operation on the two arrays A and B, and query at the corresponding position unit to obtain a code value; continue Carry out the above operations until the leaf node is found, and finally obtain a complete encoded value, and perform the decoding operation according to the encoding of the leaf node to obtain the corresponding value;
增加新value操作:要添加一个新的value到布鲁姆过滤器中,分以下两种情况:如果原来的布鲁姆过滤器是一棵未满二叉树,当需要添加新的value时,直接在布鲁姆过滤器的末尾添加新的叶节点,这个叶节点表示新添加的value;如果原来的布鲁姆过滤器是一棵满二叉树,则在根节点的上方添加一个新布鲁姆过滤器,这个新添加的布鲁姆过滤器的大小是原布鲁姆过滤器树的根节点的2倍,此时,这个新的布鲁姆过滤器就变成了新的布鲁姆过滤器树的根节点,原来的布鲁姆过滤器就变成了这个新根节点的左子树,再创建一棵比原布鲁姆过滤器树高少一层的满二叉树作为新的根节点的右子树,把原根节点上置1的位置全部左移一位,插入到新的根节点中,此时新的根节点的两组H3哈希函数比原来的两组H3哈希函数多一层,即在基函数中多选取一行,继续在布鲁姆过滤器的末尾添加新的叶节点。Add new value operation: To add a new value to the Bloom filter, there are two cases: if the original Bloom filter is a binary tree that is not full, when a new value needs to be added, directly in Add a new leaf node at the end of the Bloom filter, which represents the newly added value; if the original Bloom filter is a full binary tree, add a new Bloom filter above the root node , the size of the newly added bloom filter is twice the root node of the original bloom filter tree, at this point, the new bloom filter becomes a new bloom filter tree The root node of the original Bloom filter becomes the left subtree of the new root node, and then a full binary tree that is one level less than the original Bloom filter tree is created as the right subtree of the new root node. Subtree, move all the positions set to 1 on the original root node to the left by one place, and insert them into the new root node. At this time, the two sets of H3 hash functions of the new root node are more than the original two sets of H3 hash functions. One more layer, that is, select one more row in the basis function, and continue to add new leaf nodes at the end of the Bloom filter.
所述插入操作中,在根节点选取待插入的数组的方法为:如果编码值为0,则选取A数组,如果编码值为1,则选取B数组。In the inserting operation, the method for selecting the array to be inserted at the root node is as follows: if the encoded value is 0, select the A array, and if the encoded value is 1, select the B array.
所述插入操作中,根据叶节点的第一位编码,得到下一个需要进行插入操作的布鲁姆过滤器的方法为:如果编码值为0,则操作左节点,如果编码值为1,则操作右节点。In the insertion operation, according to the first code of the leaf node, the method for obtaining the next Bloom filter that needs to be inserted into the operation is: if the code value is 0, the left node is operated; if the code value is 1, then Operate the right node.
与现有技术相比,本发明所具有的有益效果为:本发明在数据库交互查询、高速网络中资源定位、计算机网络监控等产生大量数据、需要进行键值对查询的应用领域,可以大大减少集合查询的时间,降低资源消耗,处理动态到达的数据,适应网络环境。Compared with the prior art, the present invention has the beneficial effects as follows: the present invention can greatly reduce the application fields in which a large amount of data is generated and the key-value pair query needs to be performed, such as database interactive query, resource location in high-speed network, computer network monitoring, etc. Set query time, reduce resource consumption, process dynamically arriving data, and adapt to the network environment.
附图说明Description of drawings
图1是布鲁姆树的结构示意图,它将布鲁姆过滤器与二叉树的结构相结合,每一个节点都是一个布鲁姆过滤器。根节点有两组H3哈希函数每组有k个哈希函数。一个叶节点就表示一个value,根据叶节点的编码可以得到一条由根节点到叶节点的唯一路径。根据叶节点编码,如果编码值为0,则对第一组哈希函数进行移位操作;如果编码值为1,则对第二组哈希函数进行移位操作。图1中就是得来的,就是得来的,而是得来的,是得来的。Figure 1 is a schematic diagram of the structure of a Bloom tree, which combines a Bloom filter with the structure of a binary tree, and each node is a Bloom filter. The root node has two sets of H3 hash functions Each group has k hash functions. A leaf node represents a value, and a unique path from the root node to the leaf node can be obtained according to the encoding of the leaf node. According to the leaf node encoding, if the encoded value is 0, the first group of hash functions is shifted; if the encoded value is 1, the second group of hash functions is shifted. In Figure 1 that is come, that is come, and Yes come, Yes come.
图2是在未满二叉树中增加新的value操作示意图。当一个未存在于叶节点的value需要插入时,如果二叉树不是满二叉树,则可以直接在树的末尾添加叶节点,表示插入的value。如图中编码为11的叶节点就是新添加的value。Figure 2 is a schematic diagram of the operation of adding a new value to an underfilled binary tree. When a value that does not exist in a leaf node needs to be inserted, if the binary tree is not a full binary tree, a leaf node can be added directly at the end of the tree to represent the inserted value. The leaf node coded as 11 in the figure is the newly added value.
图3是在满二叉树中增加新的value操作示意图。当一个未存在于叶节点的value需要插入时,如果二叉树是满二叉树,则不可以在原树上添加叶节点,此时需要往上扩展一层,如图3中新添加的level 3。原二叉树就变成了level 3的左子树,再构建一棵比原二叉树少一层的满二叉树作为level 3的右子树。此时,level 3也存在两组H3哈希函数其中是由得到,将原二叉树根节点中所有置向左移一位,插入到level 3中;而则是从原来的基矩阵中比多选取一行即可。此时构建了一棵新的未满二叉树,可以直接在树的末尾添加叶节点,表示插入的value。如图中编码为100的叶节点就是新添加的value。Figure 3 is a schematic diagram of the operation of adding a new value to a full binary tree. When a value that does not exist in a leaf node needs to be inserted, if the binary tree is a full binary tree, a leaf node cannot be added to the original tree. At this time, it is necessary to extend one level up, as shown in the newly added
图4是实施环境数据详细信息,包括数据来源,数据时间,包大小,本实施例与最近几年研究成果进行了对比,其中包括:Bloom Tree是2014年INFOCOM中的论文“Bloom tree:A search tree based on bloom filters for multiple-set membership testing”提出的一种将树和布鲁姆过滤器结构结合起来进行键值对查询的结构,该算法在每一个布鲁姆过滤器节点上都使用了d组哈希函数,从根节点开始计算,直到找到value对应的叶节点为止,该算法只能对静态数据进行处理;COMB是2012年IEEE/ACM Transactions onNetworking中的论文“Fast dynamic multiple-set membership testing usingcombinatorial bloom filters”提出的一种基于布鲁姆过滤器的键值对查询算法,该算法将value值进行编码后,对已知的value组数选取合适的布鲁姆过滤器数目和需要插值的布鲁姆过滤器数目,查询时则需要对所有布鲁姆过滤器进行查询,得到相应的编码。Figure 4 is the detailed information of the implementation environment data, including data source, data time, and packet size. The present embodiment is compared with the research results in recent years, including: Bloom Tree is a paper in INFOCOM in 2014 "Bloom tree: A search tree based on bloom filters for multiple-set membership testing", a structure that combines tree and bloom filter structures for key-value pair queries, the algorithm uses d on each bloom filter node The group hash function is calculated from the root node until the leaf node corresponding to the value is found. This algorithm can only process static data; COMB is the paper "Fast dynamic multiple-set membership testing" in IEEE/ACM Transactions on Networking in 2012 A key-value pair query algorithm based on Bloom filter proposed by usingcombinatorial bloom filters", the algorithm encodes the value value, selects the appropriate number of Bloom filters for the known number of value groups and the number of bloom filters that need to be interpolated. The number of Bloom filters. When querying, you need to query all Bloom filters to get the corresponding codes.
图5(a)~图5(f)是固定大小的布鲁姆过滤器树m=220bit,在不同哈希函数个数下,使用三种不同算法SBFT,Bloom Tree和COMB,处理1024组数据,在对数据的平均处理时间上进行比较。图5(a)为使用数据集MAWI 1进行仿真实验得到的实验结果示意图,图5(b)为使用数据集MAWI 2进行仿真实验得到的实验结果示意图,图5(c)为使用数据集MAWI 3进行仿真实验得到的实验结果示意图,图5(d)为使用数据集ClarkNet-HTTP进行仿真实验得到的实验结果示意图,图5(e)为使用数据集UMass进行仿真实验得到的实验结果示意图,图5(f)为使用数据集TKN进行仿真实验得到的实验结果示意图。图5(a)~图5(f)中所有数据结果显示,针对每种数据集,我们所提出的算法SBFT平均处理数据耗时最少。与Bloom Tree相比,我们的算法只需要在根节点选取d组哈希函数计算,然后进行移位操作,而BloomTree则需要每个节点都选取d组哈希函数计算,耗时较大;与COMB相比也类似,COMB对每一个布鲁姆过滤器都是需要选取多个哈希函数,耗时大,但COMB需要操作的布鲁姆过滤器数目比Bloom Tree要少,因此,耗时会比Bloom Tree小。Figures 5(a) to 5(f) are Bloom filter trees of fixed size m=2 20 bits. Under different numbers of hash functions, three different algorithms SBFT, Bloom Tree and COMB are used to process 1024 Group data, compared on the average processing time of the data. Figure 5(a) is a schematic diagram of the experimental results obtained by using the data set
图6是使用三种算法处理静态数据时内存消耗状态示意图。数据结果显示,三种算法消耗的内存是一样的,也就是说,我们的算法在处理数据耗时最少的情况下也不会占用多余的内存。Figure 6 is a schematic diagram of the memory consumption state when three algorithms are used to process static data. The data results show that the memory consumption of the three algorithms is the same, that is to say, our algorithm does not consume excess memory when processing the data with the least amount of time.
图7(a)~图7(f)是数据包不断增加时,三种算法处理数据包耗时情况示意图。每组使用k=3个哈希函数。由于Bloom Tree和COMB两种算法是在已知数据量情况下进行操作的,因此当数据包还未插入时,就需要把结构搭建好。而我们的算法则可以对未知数据量进行操作,随着数据包的到来处理数据,这是动态处理数据的过程。图7(a)为使用数据集MAWI1进行仿真实验得到的实验结果示意图,图7(b)为使用数据集MAWI 2进行仿真实验得到的实验结果示意图,图7(c)为使用数据集MAWI 3进行仿真实验得到的实验结果示意图,图7(d)为使用数据集ClarkNet-HTTP进行仿真实验得到的实验结果示意图,图7(e)为使用数据集UMass进行仿真实验得到的实验结果示意图,图7(f)为使用数据集TKN进行仿真实验得到的实验结果示意图。图7中所有数据结果显示,针对每种数据集,我们的算法处理数据包是一个动态的过程,并且处理数据耗时比Bloom Tree和COMB都要少,而Bloom Tree和COMB处理数据包则是一个静态过程,并且耗时较多。Figures 7(a) to 7(f) are schematic diagrams of the time-consuming situation of the three algorithms for processing data packets when the data packets increase continuously. Each group uses k=3 hash functions. Since the two algorithms of Bloom Tree and COMB operate under the condition of known data volume, the structure needs to be set up when the data packet has not been inserted. Our algorithm, on the other hand, can operate on an unknown amount of data and process data as packets arrive, which is a process of dynamically processing data. Figure 7(a) is a schematic diagram of the experimental results obtained by using the data set MAWI1 for simulation experiments, Figure 7(b) is a schematic diagram of the experimental results obtained by using the data set
图8(a)~图8(f)是数据包不断增加时,三种算法处理数据包消耗内存情况示意图。每组使用k=3个哈希函数。图8(a)为使用数据集MAWI 1进行仿真实验得到的实验结果示意图,图8(b)为使用数据集MAWI 2进行仿真实验得到的实验结果示意图,图8(c)为使用数据集MAWI 3进行仿真实验得到的实验结果示意图,图8(d)为使用数据集ClarkNet-HTTP进行仿真实验得到的实验结果示意图,图8(e)为使用数据集UMass进行仿真实验得到的实验结果示意图,图8(f)为使用数据集TKN进行仿真实验得到的实验结果示意图。图8(a)~图8(f)中所有数据结果显示,针对每种数据集,我们的算法处理数据包是一个动态的过程,我们的算法是边处理数据边搭建结构的过程,而Bloom Tree和COMB处理数据包则是一个静态过程,在处理数据之前就已经把结构搭建好了,这也说明它们的结构只能处理静态数据。Figures 8(a) to 8(f) are schematic diagrams of the memory consumption of the three algorithms for processing data packets when the data packets continue to increase. Each group uses k=3 hash functions. Figure 8(a) is a schematic diagram of the experimental results obtained by using the data set
具体实施方式Detailed ways
本实施中处理静态数据时,选取的内存大小m=220bit,根节点占用了95325bit,选取了18行8列的H3哈希函数基矩阵,在根节点上提取了基矩阵的前16行,每组选取k=3个哈希函数,处理组数g=1024组的数据,此时树高h=10。图1是静态布鲁姆树结构示意图,可以由此图分析键值对插入,查询过程。When processing static data in this implementation, the selected memory size is m=2 20 bits, the root node occupies 95325 bits, the H3 hash function base matrix with 18 rows and 8 columns is selected, and the first 16 bits of the base matrix are extracted from the root node. row, k=3 hash functions are selected for each group, and the data of the group number g=1024 groups are processed, and the tree height h=10 at this time. Figure 1 is a schematic diagram of a static Bloom tree structure, from which the key-value pair insertion and query process can be analyzed.
插入操作Insert(key,value)过程:根据value查找该value对应的叶节点,得到此叶节点位置编码,就可以确定一条从根节点到此叶节点的唯一路径。计算根节点的两组哈希函数,对key使用k个哈希函数h(i,1),h(i,2),...,h(i,k)(i表示选取哈希函数的组号)计算h(i,1)(key),h(i,2)(key),...,h(i,k)(key),将它们的值分别保存在两个数组A、B中。接着,根据叶节点的第一位编码,在根节点选取A、B数组中的一组值进行插入(如果编码值为0,则选取A数组,如果编码值为1,则选取B数组),即相当于对A、B数组进行了右移零位操作(A>>0或B>>0)。然后根据叶节点的第一位编码,得到下一个需要进行插入操作的布鲁姆过滤器(如果编码值为0,则操作左节点,如果编码值为1,则操作右节点)。再根据叶节点的第二位编码,选取A、B数组中的一组值进行右移一位操作(A>>1或B>>1),得到插入位置。继续进行类似的操作,对每一层布鲁姆过滤器插入key,直到插到value对应的叶节点为止。如图1,假设插入的键值对中value对应的编码为01,计算根节点的两组哈希函数,将它们的值分别保存在两个数组A、B中;由于编码第一位为0,则选择数组A在根节点右移0位A>>0插入根节点中,下一步到达左节点;由于编码第二位为1,则选择数组B在此节点右移一位B>>1插入节点中,下一步到达右节点;最后直接对数组B右移两位B>>2插入节点,完成插入操作。Insert (key, value) process: Find the leaf node corresponding to the value according to the value, get the position code of the leaf node, and then determine a unique path from the root node to the leaf node. Calculate the two sets of hash functions of the root node, and use k hash functions h (i,1) ,h (i,2) ,...,h (i,k) for the key (i represents the selected hash function group number) to calculate h (i,1) (key),h (i,2) (key),...,h (i,k) (key), and store their values in two arrays A, in B. Then, according to the first code of the leaf node, select a set of values in the A and B arrays at the root node for insertion (if the code value is 0, select the A array, if the code value is 1, select the B array), That is, it is equivalent to right-shifting the zero-bit operation on the A and B arrays (A>>0 or B>>0). Then, according to the first bit code of the leaf node, the next Bloom filter that needs to be inserted is obtained (if the code value is 0, the left node is operated, and if the code value is 1, the right node is operated). Then, according to the second bit code of the leaf node, select a set of values in the A and B arrays to perform a right-shift operation (A>>1 or B>>1) to obtain the insertion position. Continue to perform similar operations, inserting the key into each layer of Bloom filters until the leaf node corresponding to the value is inserted. As shown in Figure 1, assuming that the code corresponding to the value in the inserted key-value pair is 01, calculate the two sets of hash functions of the root node, and store their values in the two arrays A and B respectively; since the first bit of the code is 0 , then select array A and move 0 bits to the right of the root node, A>>0, and insert it into the root node, and then reach the left node in the next step; since the second bit of the code is 1, then select the array B and shift one bit to the right of this node B>>1 Insert into the node, the next step is to reach the right node; finally, directly move the array B to the right by two bits B>>2 to insert the node to complete the insertion operation.
查询Query(key)过程:首先,计算根节点的两组哈希函数h(i,1)(key),h(i,2)(key),...,h(i,k)(key)(i=1或2),其中i表示选取哈希函数的组号,将i=1计算的哈希值保存在数组A中,将i=2计算的哈希值保存在数组B中。在根节点分别对A、B两数组值对应的位置单元进行查询操作,如果对应位置的值都为1,即为满足,查到第一位编码值(如果A数组满足,则编码为0,如果B数组满足,则编码为1,如果都不满足,则该key值不存在,结束查询)。根据得到的编码值,转到下一个节点继续查询,此时,需要对A、B两数组进行右移操作(A>>1,B>>1),在对应的位置单元进行查询,得到一个编码值。继续进行类似操作,直到查找到叶节点为止。最后得到一个完整的编码值,根据这个叶节点编码就可以得到对应的value。由图1举例说明:假设查询key,首先计算根节点的两组哈希函数,将它们的值分别保存在A、B两组数组中;将A、B两数组的值分别在根节点中查询,A数组中的值在此节点中对应位置单元都为1,则记录下第一位编码为0,下一个操作节点是左节点;将A、B两数组右移一位A>>1,B>>1,分别在此节点中进行查询,B数组中的值在此节点中对应位置单元都为1,则记录下第二位编码为1,下一个操作节点是右节点;最后一步,直接对B数组右移两位B>>2进行查询,即最后的验证,这一步不用记录编码值,验证通过则查询到的编码为01,在表中找到对应的value即可,完成查询。Query (key) process: First, calculate the two sets of hash functions h (i,1) (key),h (i,2) (key),...,h (i,k) (key) of the root node ) (i=1 or 2), where i represents the group number of the selected hash function, the hash value calculated by i=1 is stored in array A, and the hash value calculated by i=2 is stored in array B. At the root node, query the position units corresponding to the two array values of A and B respectively. If the values of the corresponding positions are all 1, it is satisfied, and the first code value is found (if the A array is satisfied, the code is 0, If the B array is satisfied, the code is 1. If it is not satisfied, the key value does not exist, and the query ends). According to the obtained code value, go to the next node to continue the query. At this time, the two arrays A and B need to be shifted to the right (A>>1, B>>1), and the corresponding position unit is queried to get a encoded value. Continue similar operations until a leaf node is found. Finally, a complete encoded value is obtained, and the corresponding value can be obtained according to the encoding of this leaf node. Figure 1 illustrates as an example: Assuming to query the key, first calculate the two sets of hash functions of the root node, and store their values in the two arrays A and B respectively; query the values of the two arrays A and B in the root node respectively. , the values in the A array are all 1 in the corresponding position unit in this node, then the first bit code is recorded as 0, and the next operation node is the left node; move the A and B arrays to the right by one bit A>>1, B>>1, query in this node respectively, the value in the B array is 1 in the corresponding position unit in this node, then record the second bit code as 1, and the next operation node is the right node; the last step, Directly move the B array to the right by two bits B>>2 to query, that is, the final verification. This step does not need to record the code value. If the verification passes, the query code is 01, and the corresponding value can be found in the table to complete the query.
本实施例中处理动态数据时,增加新value操作分两种情况:如果原来的布鲁姆树是一棵未满二叉树,当需要添加新的value时,可以直接在树的末尾添加新的叶节点,这个叶节点就表示新添加的value,如图2编码为11的新的叶子节点;如果原来的布鲁姆树是一棵满二叉树,此时就需要在根节点的上方添加一个新布鲁姆过滤器,这个布鲁姆过滤器的大小是原根节点的2倍,此时,这个新的布鲁姆过滤器就是新的根节点,原来的布鲁姆树就变成了这个根节点的左子树,再创建一棵比原布鲁姆树高少一层的满二叉树作为新的根节点的右子树,把原根节点上置1的位置全部左移一位,插入到新的根节点中,此时根节点的两组H3哈希函数要比原来的两组H3哈希函数多一层,即在基函数中多选取一行,这时,就可以继续在树的末尾添加新的叶节点了,如图3中的编码为100的新的叶节点。When processing dynamic data in this embodiment, the operation of adding a new value can be divided into two cases: if the original Bloom tree is a binary tree that is not full, when a new value needs to be added, a new leaf can be added directly at the end of the tree node, this leaf node represents the newly added value, as shown in Figure 2, the new leaf node coded as 11; if the original Bloom tree is a full binary tree, then a new cloth needs to be added above the root node. Bloom filter, the size of this Bloom filter is twice the size of the original root node, at this time, the new Bloom filter is the new root node, and the original Bloom tree becomes this root The left subtree of the node, and then create a full binary tree that is one level less than the original Bloom tree as the right subtree of the new root node. In the new root node, the two sets of H3 hash functions of the root node are one layer more than the original two sets of H3 hash functions, that is, one more row is selected in the base function, and at this time, the tree can be continued. A new leaf node is added at the end of , such as the new leaf node with the
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710542207.5A CN107330094B (en) | 2017-07-05 | 2017-07-05 | Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710542207.5A CN107330094B (en) | 2017-07-05 | 2017-07-05 | Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107330094A CN107330094A (en) | 2017-11-07 |
| CN107330094B true CN107330094B (en) | 2020-06-16 |
Family
ID=60196387
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710542207.5A Active CN107330094B (en) | 2017-07-05 | 2017-07-05 | Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107330094B (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108021678B (en) * | 2017-12-07 | 2022-05-17 | 北京理工大学 | Key value pair storage structure with compact structure and quick key value pair searching method |
| CN108717448B (en) * | 2018-05-18 | 2022-02-25 | 南京大学 | A key-value pair storage-oriented range query filtering method and key-value pair storage system |
| CN109508326B (en) * | 2018-11-22 | 2020-03-17 | 北京百度网讯科技有限公司 | Method, device and system for processing data |
| CN110674133B (en) * | 2019-09-09 | 2022-05-24 | 北京航天自动控制研究所 | Compression storage and calculation method for high-dimensional interpolation |
| CN110851848B (en) * | 2019-11-12 | 2022-03-25 | 广西师范大学 | Privacy protection method for symmetric searchable encryption |
| CN113794558B (en) * | 2021-09-16 | 2024-02-27 | 烽火通信科技股份有限公司 | L-tree calculation method, device and system in XMS algorithm |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105745642A (en) * | 2014-03-31 | 2016-07-06 | 华为技术有限公司 | Device and method for inquiring data |
| CN106708427A (en) * | 2016-11-17 | 2017-05-24 | 华中科技大学 | Storage method suitable for key value pair data |
| CN106777003A (en) * | 2016-12-07 | 2017-05-31 | 安徽大学 | A kind of search index method and system towards Key Value storage systems |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10133764B2 (en) * | 2015-09-30 | 2018-11-20 | Sandisk Technologies Llc | Reduction of write amplification in object store |
-
2017
- 2017-07-05 CN CN201710542207.5A patent/CN107330094B/en active Active
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105745642A (en) * | 2014-03-31 | 2016-07-06 | 华为技术有限公司 | Device and method for inquiring data |
| CN106708427A (en) * | 2016-11-17 | 2017-05-24 | 华中科技大学 | Storage method suitable for key value pair data |
| CN106777003A (en) * | 2016-12-07 | 2017-05-31 | 安徽大学 | A kind of search index method and system towards Key Value storage systems |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107330094A (en) | 2017-11-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107330094B (en) | Bloom filter tree structure and key-value pair storage method for dynamically storing key-value pairs | |
| CN103561133B (en) | A kind of IP address attribution information index method and method for quickly querying | |
| CN106326475B (en) | Efficient static hash table implementation method and system | |
| EP3435256B1 (en) | Optimal sort key compression and index rebuilding | |
| CN105260464B (en) | The conversion method and device of data store organisation | |
| CN101459560B (en) | Long stream recognition method, data flow measuring method and device thereof | |
| CN103077208B (en) | URL(uniform resource locator) matched processing method and device | |
| CN103714134A (en) | Network flow data index method and system | |
| CN106777003B (en) | An index query method and system for Key-Value storage system | |
| CN108197313B (en) | A space-optimized dictionary indexing method through a 16-bit Trie tree | |
| CN109040143A (en) | A kind of detection method and device of BGP anomalous event | |
| CN102110171A (en) | Method for inquiring and updating Bloom filter based on tree structure | |
| CN106599091B (en) | RDF graph structure storage and index method based on key value storage | |
| US20220005546A1 (en) | Non-redundant gene set clustering method and system, and electronic device | |
| CN102651026A (en) | Method for optimizing word segmentation of search engine through precomputation and word segmenting device of search engine | |
| CN110389953B (en) | Data storage method, storage medium, storage device and server based on compressed graph | |
| CN113946580B (en) | Massive heterogeneous log data retrieval middleware | |
| Gong et al. | Abc: a practicable sketch framework for non-uniform multisets | |
| Khan et al. | Set-based unified approach for attributed graph summarization | |
| CN105574076B (en) | A key-value pair storage structure and method based on Bloom Filter | |
| CN112528094B (en) | Multi-field range TCAM coding method and system based on hierarchical mapping | |
| CN105354310B (en) | Map tile storage layout optimization method based on MapReduce | |
| CN104809170B (en) | Towards the storage method of tree type data under a kind of cloud environment | |
| CN112115307A (en) | Vertex data rule storage structure of facing graph and connection topology compression method | |
| Kniesburges et al. | Hashed Patricia Trie: Efficient longest prefix matching in peer-to-peer systems |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |