CN102646118B - Data indexing method and device - Google Patents
Data indexing method and device Download PDFInfo
- Publication number
- CN102646118B CN102646118B CN201210039265.3A CN201210039265A CN102646118B CN 102646118 B CN102646118 B CN 102646118B CN 201210039265 A CN201210039265 A CN 201210039265A CN 102646118 B CN102646118 B CN 102646118B
- Authority
- CN
- China
- Prior art keywords
- tree
- index
- indexed object
- lifetime
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
技术领域 technical field
本发明涉及数据管理领域,尤其涉及一种数据索引方法和装置。The invention relates to the field of data management, in particular to a data indexing method and device.
背景技术 Background technique
近几十年来,数据管理技术发展迅猛,在国民经济建设中起到了突出作用。以Oracle、DB2、SQL Server等为代表的大型关系数据库管理系统(Relational Database Management System,RDBMS)更是诸多大型信息管理系统、客户关系管理软件不可或缺的核心部分。同时,以可扩展标记语言(Extensible Markup Language,XML)为代表的半结构化数据管理技术也在数据交换和缺乏严格结构的数据管理方面占据一席之地。上述技术均对数据质量、待处理数据的准确性要求非常高。当原始数据的质量不高时,需要先经过预处理过程提升数据质量。以部门人事管理系统为例,员工的个人资料、薪酬待遇和日常考核等信息必须准确。但在诸如经济、军事和电信等领域,数据的不确定性普遍存在,其存在性未知而且各属性值存在误差。尽管数据预处理能够提升原始数据集合的质量,但也可能会丧失原始数据集合的部分性质,导致无法返回高质量的查询结果。典型的应用背景如下。In recent decades, data management technology has developed rapidly and played a prominent role in national economic construction. The large-scale relational database management system (Relational Database Management System, RDBMS) represented by Oracle, DB2, SQL Server, etc. is an indispensable core part of many large-scale information management systems and customer relationship management software. At the same time, the semi-structured data management technology represented by Extensible Markup Language (XML) also occupies a place in data exchange and data management lacking strict structure. The above technologies all have very high requirements on the data quality and the accuracy of the data to be processed. When the quality of the original data is not high, it needs to go through the preprocessing process to improve the data quality. Taking the departmental personnel management system as an example, information such as employee personal data, salary and benefits, and daily assessment must be accurate. However, in fields such as economics, military affairs, and telecommunications, data uncertainty is common, its existence is unknown, and there are errors in each attribute value. Although data preprocessing can improve the quality of the original data set, it may also lose some of the properties of the original data set, resulting in the inability to return high-quality query results. A typical application background is as follows.
基于位置的服务(Location Base Service,LBS)是移动计算领域的核心问题。位置服务跟踪移动物体(或者用户),然后将物体(或用户)的位置在电子地图上定位,以此为基础提供空间信息服务。在这类应用中,移动物体的位置受到特定技术手段(例如GPS(Global Positioning System,全球定位系统)技术)制约,存在一定的误差。尽管这项误差会随着技术手段的提升而逐步缩小,但是“位置隐私”问题却显得日益突出。移动物体的位置信息非常重要,有些用户并不愿意公诸于众,以免带来麻烦。“位置隐私”的目的是降低位置的精度——在某时刻,移动物体并非在某一空间“点”上,而是在一个“区域”内,从而保护了隐私。与此同时,各互联网服务提供商仍然能够根据这项“区域”信息提供相应的服务,例如,查询移动对象附近的医院、宾馆等设施。Location-based service (Location Base Service, LBS) is the core issue in the field of mobile computing. Location service tracks moving objects (or users), and then locates the location of objects (or users) on the electronic map, and provides spatial information services based on this. In such applications, the position of the moving object is restricted by specific technical means (such as GPS (Global Positioning System, Global Positioning System) technology), and there is a certain error. Although this error will gradually narrow with the improvement of technical means, the problem of "location privacy" is becoming more and more prominent. The location information of moving objects is very important, and some users are reluctant to disclose it to avoid trouble. The purpose of "location privacy" is to reduce the accuracy of location - at a certain moment, the moving object is not in a certain spatial "point", but in an "area", thereby protecting privacy. At the same time, each Internet service provider can still provide corresponding services according to this "area" information, for example, inquire about facilities such as hospitals and hotels near the mobile object.
索引技术是数据管理技术的重要内容。关系型数据库往往采用B+树及其变种为一维数据建立索引;在多维数据管理领域或时间-空间数据管理领域,广泛使用R树以及其变种进行索引。这些索引技术均能够大幅提高查询处理速度。同理,在处理不确定性数据中也需要关注索引问题。在某些查询任务中,例如top-k查询,元组的概率值也非常重要,因此需要针对概率维度创建一维索引,此时传统索引技术有效。但传统的索引技术无法解决所有问题。Indexing technology is an important part of data management technology. Relational databases often use B+ trees and their variants to index one-dimensional data; in the field of multidimensional data management or time-spatial data management, R trees and their variants are widely used for indexing. These indexing technologies can greatly improve the query processing speed. Similarly, indexing issues also need to be paid attention to when dealing with uncertain data. In some query tasks, such as top-k query, the probability value of the tuple is also very important, so it is necessary to create a one-dimensional index for the probability dimension, and the traditional index technology is effective at this time. But traditional indexing techniques cannot solve all problems.
当各元组的取值必须通过概率分布函数描述,且概率分布函数无法预先指定时,传统的索引技术索引效率将大幅降低,无法满足应用需求。When the value of each tuple must be described by a probability distribution function, and the probability distribution function cannot be specified in advance, the indexing efficiency of the traditional indexing technology will be greatly reduced and cannot meet the application requirements.
发明内容 Contents of the invention
本发明提供了一种数据索引方法和装置,解决了传统索引技术无法满足大型数据检索需要的问题。The invention provides a data indexing method and device, which solves the problem that the traditional indexing technology cannot meet the needs of large-scale data retrieval.
一种数据索引方法,包括:A data indexing method, comprising:
创建所述至少一个被索引对象R的索引树;creating an index tree of the at least one indexed object R;
根据至少一个被索引对象的ID建立哈希结构;Build a hash structure based on the ID of at least one indexed object;
在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期。The lifetime of each position of the indexed object in the index tree is stored in the hash structure.
优选的,所述创建至少一个被索引对象R的索引树包括:Preferably, said creating an index tree of at least one indexed object R includes:
创建最上层的TPR-Tree;Create the top-level TPR-Tree;
在所述TPR-Tree之下链接有至少一个2维R-Tree;There is at least one 2-dimensional R-Tree linked under the TPR-Tree;
将各R-Tree通过哈希链接链接至一个一维R-Tree。Link each R-Tree to a one-dimensional R-Tree through a hash link.
优选的,所述在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期具体为:Preferably, the lifetime of each position of the indexed object stored in the hash structure in the index tree is specifically:
在所述哈希结构中存储所述被索引对象处在所述TPR-Tree或所述2维R-Tree或所述一维R-Tree中的生命期。The lifetime of the indexed object in the TPR-Tree or the 2-dimensional R-Tree or the 1-dimensional R-Tree is stored in the hash structure.
优选的,上述数据索引方法还包括:Preferably, the above data indexing method also includes:
在对任一被索引对象进行时间间隔查询或时间片查询时,通过所述哈希结构查找所述被索引对象的生命期;When performing a time interval query or a time slice query on any indexed object, look up the lifetime of the indexed object through the hash structure;
根据所述被索引对象在所述索引树中各位置对应的生命期,确定所述被索引对象对应索引在所述索引树中的位置。According to the lifetime corresponding to each position of the indexed object in the index tree, the position of the index corresponding to the indexed object in the index tree is determined.
优选的,所述生命期具体为被索引对象处于同一状态下持续的时间间隔。Preferably, the lifetime is specifically a time interval during which the indexed object is in the same state.
本发明还提供了一种数据索引装置,包括:The present invention also provides a data indexing device, including:
索引树创建模块,用于创建所述至少一个被索引对象R的索引树;An index tree creation module, configured to create an index tree of the at least one indexed object R;
哈希结构生成模块,用于根据至少一个被索引对象的ID建立哈希结构;A hash structure generating module, configured to establish a hash structure according to the ID of at least one indexed object;
关联模块,用于在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期。An associating module, configured to store the lifetime of each position of the indexed object in the index tree in the hash structure.
优选的,所述索引树创建模块包括:Preferably, the index tree creation module includes:
第一创建单元,用于创建最上层的TPR-Tree;The first creation unit is used to create the uppermost TPR-Tree;
第二创建单元,用于在所述TPR-Tree之下链接有至少一个2维R-Tree;The second creation unit is used to link at least one 2-dimensional R-Tree under the TPR-Tree;
第三创建单元,将各R-Tree通过哈希链接链接至一个一维R-Tree。The third creation unit links each R-Tree to a one-dimensional R-Tree through a hash link.
优选的,上述数据索引装置还包括:Preferably, the above-mentioned data indexing device further includes:
索引模块,用于在对任一被索引对象进行时间间隔查询或时间片查询时,通过所述哈希结构查找所述被索引对象的生命期,并根据所述被索引对象在所述索引树中各位置对应的生命期,确定所述被索引对象对应索引在所述索引树中的位置。The indexing module is used to find the life cycle of the indexed object through the hash structure when performing time interval query or time slice query on any indexed object, and according to the indexed object in the index tree The lifetime corresponding to each position in the index tree is used to determine the position of the index corresponding to the indexed object in the index tree.
本发明提供了一种数据索引方法和装置,根据至少一个被索引对象的ID建立哈希结构,创建所述至少一个被索引对象R的索引树,再在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期,将哈希索引和索引树索引两种方式结合起来对数据进行索引,提高了索引效率和索引精度,解决了传统索引技术无法满足大型数据检索需要的问题。The present invention provides a data indexing method and device. A hash structure is established according to the ID of at least one indexed object, an index tree of the at least one indexed object R is created, and the indexed object R is stored in the hash structure. The life cycle of each position of the index object in the index tree, the hash index and the index tree index are combined to index the data, which improves the index efficiency and index accuracy, and solves the problem that the traditional index technology cannot meet the requirements of large data retrieval. Questions needed.
附图说明 Description of drawings
图1为本发明的实施例一提供的一种数据索引方法的流程图;FIG. 1 is a flowchart of a data indexing method provided by Embodiment 1 of the present invention;
图2为本发明的实施例中所涉及的索引树结构示意图;FIG. 2 is a schematic diagram of an index tree structure involved in an embodiment of the present invention;
图3为本发明的实施例中哈希结构与索引树关联关系的示意图;FIG. 3 is a schematic diagram of the relationship between the hash structure and the index tree in an embodiment of the present invention;
图4为本发明的实施例三提供的一种数据索引装置的结构示意图。FIG. 4 is a schematic structural diagram of a data indexing device provided by Embodiment 3 of the present invention.
具体实施方式 Detailed ways
索引技术是数据管理技术的重要内容。关系型数据库往往采用B+树及其变种为一维数据建立索引;在多维数据管理领域或时间-空间数据管理领域,广泛使用R树以及其变种进行索引。这些索引技术均能够大幅提高查询处理速度。同理,在处理不确定性数据中也需要关注索引问题。在某些查询任务中,例如top-k查询,元组的概率值也非常重要,因此需要针对概率维度创建一维索引,此时传统索引技术有效。但传统的索引技术无法解决所有问题。Indexing technology is an important part of data management technology. Relational databases often use B+ trees and their variants to index one-dimensional data; in the field of multidimensional data management or time-spatial data management, R trees and their variants are widely used for indexing. These indexing technologies can greatly improve the query processing speed. Similarly, indexing issues also need to be paid attention to when dealing with uncertain data. In some query tasks, such as top-k query, the probability value of the tuple is also very important, so it is necessary to create a one-dimensional index for the probability dimension, and the traditional index technology is effective at this time. But traditional indexing techniques cannot solve all problems.
当各元组的取值必须通过概率分布函数描述,且概率分布函数无法预先指定时,传统的索引技术索引效率将大幅降低,无法满足应用需求。When the value of each tuple must be described by a probability distribution function, and the probability distribution function cannot be specified in advance, the indexing efficiency of the traditional indexing technology will be greatly reduced and cannot meet the application requirements.
为了解决上述问题,本发明的实施例提供了一种数据索引方法和装置。下文中将结合附图对本发明的实施例进行详细说明。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互任意组合。In order to solve the above problems, embodiments of the present invention provide a data indexing method and device. Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined arbitrarily with each other.
首先结合附图,对本发明的实施例一进行说明。First, Embodiment 1 of the present invention will be described with reference to the accompanying drawings.
本发明实施例提供了一种数据索引方法,能够进行不确定性数据管理索引。传统的解决方案一般采用树索引或者哈希(Hash)索引的方式,但树索引技术和哈希索引技术都有其优缺点。比如,树索引技术适合随机数据访问;哈希索引技术适合顺序结构数据,类似广播信道。树索引技术对簇集的数据广播非常有效;但簇集对哈希索引技术性能影响不大。哈希索引技术特别适合多属性的数据索引;树索引技术提供了一种基于索引值较准确和完整的全局视图,客户机能快速地在树索引上找到想得到的数据的到达时间,这样,谐调时间自然就缩短了。由于哈希索引不包含数据帧的全局信息,它仅仅只能对客户机判定当前数据帧是否与查询有关提供帮助。其过滤的有效性在很大程度上取决于哈希索引的平均失效率。An embodiment of the present invention provides a data indexing method, which can perform uncertain data management indexing. Traditional solutions generally use tree index or hash (Hash) index, but both tree index technology and hash index technology have their advantages and disadvantages. For example, tree index technology is suitable for random data access; hash index technology is suitable for sequential structured data, similar to broadcast channels. Tree index technology is very effective for data broadcasting of clusters; but clusters have little effect on the performance of hash index technology. Hash index technology is especially suitable for multi-attribute data index; tree index technology provides a more accurate and complete global view based on the index value, and the client can quickly find the arrival time of the desired data on the tree index, so that the coordination time Naturally shortened. Since the hash index does not contain the global information of the data frame, it can only help the client to determine whether the current data frame is related to the query. The effectiveness of its filtering depends largely on the average failure rate of the hash index.
使用本发明实施例提供的数据索引方法完成数据索引的流程如图1所示,包括:The process of completing data indexing using the data indexing method provided by the embodiment of the present invention is shown in Figure 1, including:
步骤101、创建所述至少一个被索引对象R的索引树;Step 101, creating an index tree of the at least one indexed object R;
本发明实施例中,该索引树的最上层是TPR-Tree,然后是多个2维R-Tree,2维的R-Tree的哈希链接一个一维的R-Tree。本发明的实施例所涉及的索引树结构如图2所示。In the embodiment of the present invention, the top layer of the index tree is a TPR-Tree, and then multiple 2-dimensional R-Trees, and the hashes of the 2-dimensional R-Trees are linked to a 1-dimensional R-Tree. The index tree structure involved in the embodiment of the present invention is shown in FIG. 2 .
TPR树是具有R树结构的多路平衡树。树中每个非叶子结点都由若干个(TPBR,Point)单元组成。TPBR为当前包含其对应孩子的带时间参数边界矩形.Point是一个指向孩子结点的指针。叶子结点由若干个(TPBR,ObjectlD)组成。其中TPBR为包含对应移动对象的带时间参数边界矩形.ObjectlD是一个指向移动对象的指针,通过指针可以得到对应移动对象的详细信息。The TPR tree is a multi-way balanced tree with an R-tree structure. Each non-leaf node in the tree consists of several (TPBR, Point) units. TPBR is the bounding rectangle with time parameter that currently contains its corresponding child. Point is a pointer to the child node. The leaf node is composed of several (TPBR, ObjectlD). Among them, TPBR is a bounding rectangle with time parameter containing the corresponding moving object. ObjectID is a pointer pointing to the moving object, and the detailed information of the corresponding moving object can be obtained through the pointer.
R-tree是B-tree向多维空间发展的另一种形式,它将空间对象按范围划分,每个结点都对应一个区域和一个磁盘页,非叶结点的磁盘页中存储其所有子结点的区域范围,非叶结点的所有子结点的区域都落在它的区域范围之内;叶结点的磁盘页中存储其区域范围之内的所有空间对象的外接矩形。每个结点所能拥有的子结点数目有上、下限,下限保证对磁盘空间的有效利用,上限保证每个结点对应一个磁盘页,当插入新的结点导致某结点要求的空间大于一个磁盘页时,该结点一分为二(分裂)。R树是一种动态索引结构,即:它的查询可与插入或删除同时进行,而且不需要定期地对树结构进行重新组织。R-tree is another form of B-tree's development to multi-dimensional space. It divides spatial objects into ranges. Each node corresponds to an area and a disk page. The disk pages of non-leaf nodes store all its children. The area range of a node, the areas of all sub-nodes of non-leaf nodes fall within its area range; the disk pages of leaf nodes store the circumscribed rectangles of all spatial objects within its area range. The number of sub-nodes that each node can have has upper and lower limits. The lower limit ensures the effective use of disk space, and the upper limit ensures that each node corresponds to a disk page. When inserting a new node, the space required by a node When larger than one disk page, the node is split in two (split). The R-tree is a dynamic index structure, that is, its query can be performed simultaneously with insertion or deletion, and the tree structure does not need to be reorganized periodically.
步骤102、根据至少一个被索引对象的ID建立哈希结构;Step 102, establishing a hash structure according to the ID of at least one indexed object;
对于全部被索引对象,可以根据它们的ID构建哈希结构(哈希表)。For all indexed objects, a hash structure (hash table) can be constructed according to their IDs.
步骤103、在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期;Step 103, storing the lifetime of each position of the indexed object in the index tree in the hash structure;
本步骤中,在哈希结构内存储每个被索引对象处在TPR-Tree、R-Tree或(2维R-Tree+1维R-Tree)中的生命期。In this step, the lifetime of each indexed object in TPR-Tree, R-Tree or (2-dimensional R-Tree+1-dimensional R-Tree) is stored in the hash structure.
哈希结构与索引树的关联关系如图3所示。The relationship between the hash structure and the index tree is shown in Figure 3.
步骤104、在对任一被索引对象进行时间间隔查询或时间片查询时,通过所述哈希结构查找所述被索引对象的生命期;Step 104, when performing time interval query or time slice query on any indexed object, look up the lifetime of the indexed object through the hash structure;
步骤105、根据所述被索引对象在所述索引树中各位置对应的生命期,确定所述被索引对象对应索引在所述索引树中的位置;Step 105: Determine the position of the index corresponding to the indexed object in the index tree according to the lifetime corresponding to each position of the indexed object in the index tree;
本步骤中,在对于时间间隔查询和时间片查询时,通过哈希结构中搜索出来的生命期,可以直接确定从树索引中的哪个索引结构开始搜索。In this step, when querying time intervals and time slices, the lifetime searched in the hash structure can be used to directly determine which index structure in the tree index to start searching.
下面结合附图,对本发明的实施例二进行说明。Embodiment 2 of the present invention will be described below with reference to the accompanying drawings.
本发明实施例提供了一种数据索引方法,下面对本发明实施例所提供的数据索引方法应用于移动通信环境中的实现过程进行说明。最近十年来,随着无线通讯、定位技术的发展,基于位置服务(LBS)得到了非常广泛的应用。我们假定通讯公司需要跟踪每个手机用户的实时位置,以便给特定区域分配合理的带宽,保证通讯的顺畅,不至于出现拥塞现象;或者需要知道使用手机作为通讯工具报警的人的当前位置。这都需要对手机用户的位置信息进行实时追踪。持有手机者的移动可能纷繁复杂,但无非就是静止、类似静止、低速无限制移动、有限制高速移动(这通常需要借助交通工具)。An embodiment of the present invention provides a data indexing method. The implementation process of applying the data indexing method provided in the embodiment of the present invention to a mobile communication environment will be described below. In the past ten years, with the development of wireless communication and positioning technology, location-based services (LBS) have been widely used. We assume that the communication company needs to track the real-time location of each mobile phone user in order to allocate reasonable bandwidth to a specific area to ensure smooth communication without congestion; or need to know the current location of the person who uses the mobile phone as a communication tool to call the police. All of this requires real-time tracking of the location information of mobile phone users. The mobile phone holder's movement may be complicated, but it is nothing more than static, similar to static, low-speed unlimited movement, and limited high-speed movement (this usually requires the use of transportation).
可以使用R-树索引手机用户的静止和类似静止状态;使用TPR-树索引手机用户的低速无限制移动;而用(2-维R-树+1-维R-树)索引手机用户在有限制高速移动对象中的类似静止状态。R-trees can be used to index mobile phone users' static and similar static states; TPR-trees can be used to index mobile phone users' low-speed unlimited movement; and (2-dimensional R-tree + 1-dimensional R-tree) can be used to index mobile phone users in Constrains stationary-like states in high-speed moving objects.
该混合索引中的每条记录都有生命期,所谓生命期是指对象的移动速度和方向保持不变的时间间隔。用[tstart,tend]表示。Each record in the composite index has a lifetime, and the so-called lifetime refers to the time interval during which the moving speed and direction of the object remain unchanged. It is represented by [t start , t end ].
例如一手机用户出门去火车站,首先是作为一个低速移动对象被TPR-树索引;火车开动后,原记录被逻辑删除(令tend等于现在时间),将新的特殊记录插入到TPR-树,该记录指向相应的高速移动对象索引。到达目的地后,又将该特殊记录逻辑删除,并向TPR-树插入一个普通的记录。For example, when a mobile phone user goes out to the train station, it is firstly indexed by the TPR-tree as a low-speed moving object; after the train starts, the original record is logically deleted ( tend is equal to the current time), and a new special record is inserted into the TPR-tree , the record points to the corresponding high-speed moving object index. After reaching the destination, the special record is logically deleted, and an ordinary record is inserted into the TPR-tree.
对于被索引的全部对象,可以根据它们的ID,构建哈希结构,该结构中还存储每个对象处在TPR-树、R-树或(2维R-树+1维R-树)中的生命期。对于时间间隔查询和时间片查询,通过哈希结构中搜索出来的生命期,可以直接确定从哪个索引结构开始搜索。For all objects to be indexed, a hash structure can be constructed according to their IDs, which also stores each object in a TPR-tree, R-tree or (2-dimensional R-tree + 1-dimensional R-tree) lifetime. For time interval query and time slice query, through the lifetime searched in the hash structure, you can directly determine which index structure to start searching from.
下面结合附图,对本发明的实施例三进行说明。Embodiment 3 of the present invention will be described below with reference to the accompanying drawings.
本发明实施例提供了一种数据索引装置,其结构如图4所示,包括:An embodiment of the present invention provides a data indexing device, the structure of which is shown in Figure 4, including:
索引树创建模块401,用于创建所述至少一个被索引对象R的索引树;An index tree creation module 401, configured to create an index tree of the at least one indexed object R;
哈希结构生成模块402,用于根据至少一个被索引对象的ID建立哈希结构;A hash structure generation module 402, configured to establish a hash structure according to the ID of at least one indexed object;
关联模块403,用于在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期。An associating module 403, configured to store the lifetime of each position of the indexed object in the index tree in the hash structure.
优选的,所述索引树创建模块401包括:Preferably, the index tree creation module 401 includes:
第一创建单元,用于创建最上层的TPR-Tree;The first creation unit is used to create the uppermost TPR-Tree;
第二创建单元,用于在所述TPR-Tree之下链接有至少一个2维R-Tree;The second creation unit is used to link at least one 2-dimensional R-Tree under the TPR-Tree;
第三创建单元,将各R-Tree通过哈希链接链接至一个一维R-Tree。The third creation unit links each R-Tree to a one-dimensional R-Tree through a hash link.
优选的,上述数据索引装置还包括:Preferably, the above-mentioned data indexing device further includes:
索引模块404,用于在对任一被索引对象进行时间间隔查询或时间片查询时,通过所述哈希结构查找所述被索引对象的生命期,并根据所述被索引对象在所述索引树中各位置对应的生命期,确定所述被索引对象对应索引在所述索引树中的位置。The indexing module 404 is configured to search for the life cycle of the indexed object through the hash structure when performing time interval query or time slice query on any indexed object, and according to the indexed object in the index The lifetime corresponding to each position in the tree determines the position of the index corresponding to the indexed object in the index tree.
本发明的实施例提供了一种数据索引方法和装置,根据至少一个被索引对象的ID建立哈希结构,创建所述至少一个被索引对象R的索引树,再在所述哈希结构中存储所述被索引对象在所述索引树中各位置的生命期,将哈希索引和索引树索引两种方式结合起来对数据进行索引,提高了索引效率和索引精度,解决了传统索引技术无法满足大型数据检索需要的问题。将树索引和哈希索引两种索引技术结合,有效的提高了多维数据管理的效率。An embodiment of the present invention provides a data indexing method and device, which establishes a hash structure based on the ID of at least one indexed object, creates an index tree of the at least one indexed object R, and then stores in the hash structure For the life period of each position of the indexed object in the index tree, the hash index and the index tree index are combined to index the data, which improves the indexing efficiency and indexing accuracy, and solves the problem that the traditional indexing technology cannot meet The problem of large data retrieval needs. Combining the two index technologies of tree index and hash index effectively improves the efficiency of multi-dimensional data management.
本领域普通技术人员可以理解上述实施例的全部或部分步骤可以使用计算机程序流程来实现,所述计算机程序可以存储于一计算机可读存储介质中,所述计算机程序在相应的硬件平台上(如系统、设备、装置、器件等)执行,在执行时,包括方法实施例的步骤之一或其组合。Those of ordinary skill in the art can understand that all or part of the steps of the above-mentioned embodiments can be implemented using a computer program flow, the computer program can be stored in a computer-readable storage medium, and the computer program can be run on a corresponding hardware platform (such as system, device, device, device, etc.), and when executed, includes one or a combination of the steps of the method embodiment.
可选地,上述实施例的全部或部分步骤也可以使用集成电路来实现,这些步骤可以被分别制作成一个个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本发明不限制于任何特定的硬件和软件结合。Optionally, all or part of the steps in the above embodiments can also be implemented using integrated circuits, and these steps can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module accomplish. As such, the present invention is not limited to any specific combination of hardware and software.
上述实施例中的各装置/功能模块/功能单元可以采用通用的计算装置来实现,它们可以集中在单个的计算装置上,也可以分布在多个计算装置所组成的网络上。The devices/functional modules/functional units in the above embodiments can be realized by general-purpose computing devices, and they can be concentrated on a single computing device, or distributed on a network composed of multiple computing devices.
上述实施例中的各装置/功能模块/功能单元以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。上述提到的计算机可读取存储介质可以是只读存储器,磁盘或光盘等。When each device/functional module/functional unit in the above-mentioned embodiments is realized in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like.
任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求所述的保护范围为准。Any person familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present invention, and all should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope described in the claims.
Claims (6)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210039265.3A CN102646118B (en) | 2012-02-20 | 2012-02-20 | Data indexing method and device |
| PCT/CN2013/071627 WO2013123867A1 (en) | 2012-02-20 | 2013-02-18 | Data indexing method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210039265.3A CN102646118B (en) | 2012-02-20 | 2012-02-20 | Data indexing method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102646118A CN102646118A (en) | 2012-08-22 |
| CN102646118B true CN102646118B (en) | 2014-11-05 |
Family
ID=46658937
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201210039265.3A Active CN102646118B (en) | 2012-02-20 | 2012-02-20 | Data indexing method and device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN102646118B (en) |
| WO (1) | WO2013123867A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102646118B (en) * | 2012-02-20 | 2014-11-05 | 浪潮(北京)电子信息产业有限公司 | Data indexing method and device |
| CN102915382A (en) * | 2012-11-21 | 2013-02-06 | 亚信联创科技(中国)有限公司 | Method and device for carrying out data query on database based on indexes |
| CN105786932B (en) * | 2014-12-26 | 2020-03-27 | 北大医疗信息技术有限公司 | Query method and query device for clinical business in medical system |
| CN114691697B (en) * | 2022-04-22 | 2025-08-22 | 无锡科技职业学院 | A method for describing and applying the relationship between data using a multi-attribute mixed index |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102012908A (en) * | 2010-11-12 | 2011-04-13 | 浙江大学 | Method for inquiring visible neighbours of moving objects in environment with barriers |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1203433C (en) * | 2002-06-26 | 2005-05-25 | 联想(北京)有限公司 | Data storing and query combination method in a flush type system |
| CN101256579A (en) * | 2008-04-08 | 2008-09-03 | 中兴通讯股份有限公司 | Method for inquesting data organization in database |
| CN102646118B (en) * | 2012-02-20 | 2014-11-05 | 浪潮(北京)电子信息产业有限公司 | Data indexing method and device |
-
2012
- 2012-02-20 CN CN201210039265.3A patent/CN102646118B/en active Active
-
2013
- 2013-02-18 WO PCT/CN2013/071627 patent/WO2013123867A1/en active Application Filing
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102012908A (en) * | 2010-11-12 | 2011-04-13 | 浙江大学 | Method for inquiring visible neighbours of moving objects in environment with barriers |
Non-Patent Citations (3)
| Title |
|---|
| 李焕梅.《移动点对象Hash-R索引及反向最近邻查询》.《中国优秀硕士学位论文全文数据库》.2011,3.3.3节,图3-5. * |
| 李贞海.《交通网络中移动对象全时态索引研究与实现》.《中国优秀硕士学位论文全文数据库》.2011,2.7.2节. * |
| 金泽峰.《TRP-树在基于位置服务系统中的引用研究》.《中国优秀硕士学位论文全文数据库》.2008,4.2.1节,4.3.3节,图4.8. * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2013123867A1 (en) | 2013-08-29 |
| CN102646118A (en) | 2012-08-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109299102B (en) | A HBase secondary index system and method based on Elastcisearch | |
| US9576007B1 (en) | Index and query serving for low latency search of large graphs | |
| CN107273506B (en) | A method for joint query of multiple tables in a database | |
| Sharma et al. | Sql and nosql databases | |
| Hariharan et al. | Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems | |
| US8589425B2 (en) | Scalable rendering of large spatial databases | |
| CN104850601B (en) | Police service based on chart database analyzes application platform and its construction method in real time | |
| US10762068B2 (en) | Virtual columns to expose row specific details for query execution in column store databases | |
| CN106611053B (en) | Data cleaning and indexing method | |
| CN103023970A (en) | Method and system for storing mass data of Internet of Things (IoT) | |
| CN103390015A (en) | Mass data united storage method based on unified indexing and search method | |
| CN111639075B (en) | Non-relational database vector data management method based on flattened R tree | |
| CN106933833A (en) | A kind of positional information method for quickly querying based on Spatial Data Index Technology | |
| Cudré-Mauroux et al. | Graph data management systems for new application domains | |
| CN102646118B (en) | Data indexing method and device | |
| Montoya et al. | A knowledge base for personal information management | |
| CN104346444A (en) | Optimum site selection method based on road network reverse spatial keyword query | |
| CN108804580A (en) | A method of the key word of the inquiry in federal type RDF data library | |
| Alsubaiee et al. | Asterix: scalable warehouse-style web data integration | |
| Papadias et al. | Indexing and retrieval of historical aggregate information about moving objects | |
| CN115344568A (en) | Memory index mechanism processing method and device, electronic equipment and storage medium | |
| CN117271688B (en) | General position information organization method based on geospatial grid domain name | |
| EP1677209B1 (en) | Searching based on object relationships | |
| Kai et al. | Research on spatial database technology based on Arcsde | |
| Varga et al. | STORING LOCATION-BASED SERVICES'DATA IN KEY-VALUE STORE. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant |