[go: up one dir, main page]

CN101826098B - AB column diagram-based method for estimating spatial query selection rate - Google Patents

AB column diagram-based method for estimating spatial query selection rate Download PDF

Info

Publication number
CN101826098B
CN101826098B CN 201010105264 CN201010105264A CN101826098B CN 101826098 B CN101826098 B CN 101826098B CN 201010105264 CN201010105264 CN 201010105264 CN 201010105264 A CN201010105264 A CN 201010105264A CN 101826098 B CN101826098 B CN 101826098B
Authority
CN
China
Prior art keywords
bucket
histogram
probability
query
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201010105264
Other languages
Chinese (zh)
Other versions
CN101826098A (en
Inventor
程昌秀
陈荣国
周成虎
朱焰炉
张明波
谢炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Geographic Sciences and Natural Resources of CAS filed Critical Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN 201010105264 priority Critical patent/CN101826098B/en
Publication of CN101826098A publication Critical patent/CN101826098A/en
Application granted granted Critical
Publication of CN101826098B publication Critical patent/CN101826098B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种基于AB直方图的空间查询选择率估计方法,属于空间数据查询与处理技术领域。本发明首先对需要查询的矢量数据建立AB直方图,用户给定查询操作条件,空间选择操作的选择率是每个环形桶内对象数目与该桶满足给定选择条件的概率乘积的总和;而空间连接操作的选择率则是通过计算一个数据集中每个环形桶内对象的数目乘以该桶与另一个数据集中多个对象之间满足给定连接条件的概率最大值的总和。根据原始AB直方图以及相应空间关系的概率值,生成查询结果数据的AB直方图。本发明能较精确地估计含有disjoint,intersect,within,contains和overlap等空间关系的查询选择率,且能从原数据直方图生成查询结果数据的直方图,满足查询树中后续节点选择率估计的需求。

Figure 201010105264

A method for estimating the selection rate of spatial query based on an AB histogram belongs to the technical field of spatial data query and processing. The present invention first establishes the AB histogram for the vector data that needs to be queried, and the user specifies the query operation condition, and the selection rate of the space selection operation is the sum of the product number of objects in each annular bucket and the probability that the bucket satisfies the given selection condition; and The selection rate of the spatial join operation is calculated by calculating the sum of the number of objects in each ring bucket in a dataset multiplied by the maximum probability of satisfying a given join condition between the bucket and multiple objects in another dataset. According to the original AB histogram and the probability value of the corresponding spatial relationship, generate the AB histogram of the query result data. The present invention can more accurately estimate the query selectivity of spatial relations including disjoint, intersect, within, contains and overlap, and can generate a histogram of query result data from the original data histogram, satisfying the requirement of subsequent node selectivity estimation in the query tree need.

Figure 201010105264

Description

A kind of based on the histogrammic space querying selectivity estimation of AB method
Technical field
The present invention relates to a kind ofly based on the histogrammic space querying selectivity estimation of AB method, belong to spatial data inquiry and processing technology field.
Background technology
In recent years, spatial database becomes more and more important in many application, as Geographic Information System (GIS), computer-aided design (CAD) (CAD), image processing, multimedia system etc.Because space querying can be processed the multidimensional data in spatial database and satisfy application type widely, therefore quite concerned in recent years to the research of space querying, this is because space data quantity is huge, data structure is complicated, operation costs dearly, query optimization Cost Model in relational database can not be applicable to spatial data fully, so the optimization of space querying certainly will become difficult point and the breakthrough point that use in the space.
It is most important in space querying that the space is selected to be connected with the space, is also two kinds of more expensive operations of cost (I/O and CPU cost), about their query optimization attracts wide attention.At present, the Cost Model based on the Query Cost estimation is the most frequently used a kind of enquiring and optimizing method.The subject matter of estimation cost is the size of estimation Query Result (selection rate).
In Database Systems, for given inquiry, the cost that the more various differences of query optimizer carry into execution a plan, the person carries out therefrom to select the Least-cost.The whether accurate accuracy that directly has influence on the corresponding scheme cost estimation of selection rate estimation, thus the quality of query optimizer there is substantial impact.Therefore, the selectivity estimation of space querying is very important when seeking the most effective executive plan.
The algorithm that is used for the estimation selection rate mainly contains following three kinds: sampling method, parametric method and histogram.In numerous methods, be a kind of the most frequently used method based on histogrammic space querying selectivity estimation.Its basic thought: adopt certain strategy that data space is divided into several subspaces, the corresponding sub spaces of record cell; Statistics drops on the object number in its corresponding subspace in record cell; With someway these statistical values being estimated, obtain the estimated value of query results size, estimate thus Query Cost.These record cells are called bucket, and the set of bucket is called histogram.There is a serious defective in the spatial histogram of setting up in this way: the repeat count problem.
Chinese scholars has successively proposed the histograms such as MinSkew, SQ, CD, Euler, MP, GH.Wherein, MinSkew and SQ histogram are to adapt to the arbitary inquiry window by effective dividing data space, but there is the repeat count problem in they in the application for line and polygon data, therefore the error of its selection rate estimation is larger.The CD histogram is divided into equal-sized grid unit with data space, four angle points with data object MBR are corresponding respectively with four sub-histograms, a bucket storage in sub-histogram falls into the number of the corresponding angle point of this barrel, and it can return to the geographical geometric element number that intersects with query window accurately; The Euler histogram is the histogram that utilizes the Euler principle to build, its speciality to some extent aspect the topological relation selectivity estimation, but owing to there being " boundary problem ", can't accurately estimate the selection rate when query region does not overlap with the bins netting twine; Although CD and Euler histogram can solve the repeat count problem, but they can not distinguish more meticulous spatial relationship (as cotains, overlap etc.), two kinds of different situations are arranged in Fig. 2, CD and Euler histogram have but been stored respectively identical information (seeing accompanying drawing 2).The people such as Sun have studied more meticulous 4 kinds of topological relations (disjoint, cotains on Euelr histogrammic basis, within, overlap) selectivity estimation, 3 kinds of approximate algorithms have been proposed, but there is larger error when calculating Nie, makes its selectivity estimation error larger.Afterwards, the people such as liu had proposed the selectivity estimation algorithm that a kind of accurate multi-scalability solves above-mentioned 4 kinds of relations.The histogrammic improvement algorithm of MinSkew, SQ, CD, Euler and Euler is all selected operation for the space and is proposed, even can not apply in spatial join operation.
The histogram research that relevant space connects is less.Mainly contain at present MP algorithm and GH histogram.The basic assumption of MP algorithm is: the data object that intersects with grid unit is not only equally distributed, and similar width and height will be arranged.Therefore, under meticulous graticule mesh granularity, the non-constant of estimation effect.Generally only have under this extreme case of sizableness of the size of data object and query window, the error of MP algorithm is just smaller.The GH histogram is based on the observation that two crossing rectangles is always produced four crossing points, and the selection rate that the space connects can by at first estimating the number of joining between two data sets, then obtain divided by 4.The histogrammic degree of accuracy of GH depends on the height and width hypothesis relatively less than the height and width of query window of data object, if namely data centralization is some less data objects, the GH estimation precision is higher, otherwise, data centralization is take larger object as main, and the GH estimation precision is lower.MP algorithm and GH histogram can only be used for spatial join operation, can not be used for the selectivity estimation that the space is selected.
Above-mentioned existing various histograms, except the selectivity estimation of support space selection simultaneously and spatial join operation, also do not point out from the histogrammic method of existing histogram generated query result data, namely they only set up histogram in advance to raw data.And a query manipulation is carried out step by step according to query plan tree, and in query tree, the selection rate of each node is very important for the parameter in Cost Model.But do not estimate the histogramming algorithm of subsequent node selection rate at present, the well requirement of meeting spatial query optimization.
Summary of the invention
The objective of the invention is to propose a kind of based on the histogrammic space querying selectivity estimation of AB method.The method can more accurately be estimated disjoint, intersect, within, contains is connected the space of 5 kinds of spatial relationships to select the selection rate that is connected with the space with overlap, and can from the histogram of the histogram generated query result data of raw data, satisfy the demand of subsequent node selectivity estimation in query tree.
It is a kind of based on the histogrammic space querying selectivity estimation of AB method that the present invention proposes, and comprises the following steps:
The first step is pressed the histogrammic method for building up of AB, carries out the vector data of selectivity estimation for needs and sets up histogram;
Second step, the given query manipulation condition of user;
In the 3rd step, if only relate to a vector data set in described query manipulation condition, this is operating as the space and selects operation so, estimates first that each barrel of histogram satisfies the probability of given space alternative condition, then estimates the selection rate of this space selection operation;
The 4th step, if relate to two above vector data set in described query manipulation condition, this is operating as spatial join operation so, estimates first that each barrel of histogram satisfies the probability of given space condition of contact, then estimates the selection rate of two data ensemble spaces attended operation;
The 5th goes on foot, and utilizes the AB histogram of original AB histogram generated query result data, the selection rate of subsequent node in the estimation query tree.
Each barrel of estimation histogram of mentioning in above-mentioned the 3rd step satisfies the probability of given space alternative condition, and given space alternative condition refers to disjoint herein, intersect, within, contains, one or more in the overlap relation.Wherein, the probabilistic method of estimation disjoint and overlap is illustrated.
The 9-I model (9-intersection model) that proposes according to people such as Max J.Egenhofer as can be known, the spatial relationship between two spatial object P, Q can represent with following 3 * 3 matrixes:
P . i ∩ Q . i P . i ∩ Q . b P . i ∩ Q . e P . b ∩ Q . i P . b ∩ Q . b P . b ∩ Q . e P . e ∩ Q . i P . e ∩ Q . b P . e ∩ Q . e
Wherein, P.i, Q.i, P.e, Q.e, P.b, Q.b represent respectively inside, outside and the border of two objects.If only relatively whether intersect the inside of two spatial objects, spatial relationship can be divided into disjoint and two kinds of complementary situations of intersect; If pass through to compare the inside of two spatial objects and outside crossing situation, can obtain disjoint, overlap, contains, five kinds of objectionable intermingling situations of within and equal, be that the intersect relation can further be decomposed into overlap, contains, these four kinds of relations of within and equal.
When carrying out Probability estimate, it is 0 that there is the probability of equal relation in general hypothesis.Because in actual applications, the number that the MBR of query window and spatial object consists of the equal relation mostly is 1 (being 0 under normal circumstances) most, for a lot of spatial objects in whole data acquisition, the probability of equal relation is approximately 0, therefore this hypothesis is rational.Owing to being mutual exclusive two kinds of situations from disjoint and crossing intersect and consisting of complementary relationship, according to the fundamental property of probability, certain barrel of probability that satisfies disjoint equals 1 and deducts the probability that this barrel satisfies intersect.Because the intersect relation can be decomposed into overlap, contains, the mutual exclusive four kinds of situations of within and equal, and the probability of equal is 0, according to the addition formula of probability, certain barrel is satisfied the overlap probability and equals the probability that this barrel satisfy intersect and deduct respectively the probability that this barrel satisfies within, contains.
Above-mentioned estimated probability method is equally applicable to estimate in the 4th step that each barrel of histogram satisfies the probability of given space condition of contact.
It is a kind of based on the histogrammic space querying selectivity estimation of AB method that the present invention proposes, and its advantage is as follows:
(1) the AB histogram is the spatial histogram of a kind of more summary, practicality.The selectivity estimation that it can not only support space be selected, and the selectivity estimation that can support space connects.
(2) the AB histogram is to disjoint, intersect, and within, the selectivity estimation of contains and five kinds of spatial relationships of overlap has higher precision.In addition, utilize the rule of mutually changing between spatial relationship, the AB histogram can solve the selectivity estimation problem of more meticulousr spatial relationships.
(3) the AB histogram can from the histogram of the histogram generated query result data of raw data, satisfy the demand of subsequent node selectivity estimation in query tree.The histogram transformation approach makes the selection rate of estimating subsequent node in query tree and whole query tree become possibility.The selection rate of these nodes the optimization of space querying with carry out in be an important parameter.
(4) in spatial database, the AB histogram is easy to traditional attribute histogram integrated.Because except calculate different spaces concern the logical method of probability belong to the AB histogram peculiar, all the attribute histogram with traditional is identical for the histogrammic most of concepts of AB (such as the probability that represents various spatial relationships with patterned method etc.).Like this, can fully use for reference the existing good nature of traditional attribute histogram in the Spatial Query Optimization process.
Description of drawings
Fig. 1 is realization flow figure of the present invention;
Fig. 2 a and Fig. 2 b, Fig. 2 c are CD and the defective of Euler histogram when processing meticulous spatial relationship;
Fig. 3 a and Fig. 3 b are vector data, annular bucket and AB histogram schematic diagram;
Fig. 4 a, Fig. 4 b, Fig. 4 c, Fig. 4 d are the probability histogram of vector data shown in Figure 3 and query window spatial relationship;
Fig. 5 a is that another one vector data collection, Fig. 5 b are its corresponding AB histogram;
Fig. 6 a is that AB histogram, Fig. 6 b are intersect Spatial Join Query AB histogram as a result to the intersect Spatial Select Query as a result;
Fig. 7 a utilizes vector data, Fig. 7 b to be the query window data for the soil in certain city of China of using in the contrast experiment;
Fig. 8 is the selection rate experimental result that the space of 5 kinds of spatial relationships is selected;
Fig. 9 is the selection rate experimental result that the space of 5 kinds of spatial relationships connects, the land use data in D1 presentation graphs 7 (a); Query window data in D2 presentation graphs 7 (b).
Embodiment
1, take vector data shown in Figure 3 as example, the concrete implementation step that in the present invention, the space selects the query selection rate to estimate is described:
(1.1) be that vector data shown in Figure 3 generates the AB histogram;
MBR with each data object replaces corresponding data object, whole data space is divided 12 row * 12 row grid units, annular bucket according to each object MBR position judgement under it, if do not have the annular bucket to comprise this object, increase an annular bucket in the AB histogram, record the scope of annular bucket, and numerical value in bucket is set to 1; If have corresponding annular bucket to comprise this object, numerical value in bucket increased by 1.Repeat this process, until all data objects all have been assigned in corresponding annular bucket.The MBR of data object, and the annular bucket is as shown in Fig. 3 (a); The AB histogram of setting up is as shown in Fig. 3 (b).Total A, B, C, four annular buckets of D in this histogram, scope is respectively (2,2; 5,5), (6,2; 10,5), (2,6; 7,10), (0,0; 12,12), in each annular bucket, 2,3,1,2 data objects are arranged respectively.
(1.2) the given space querying operating conditions of user is selected operation for the space herein;
(i) user need to specify a query window, and query window herein is as shown in Fig. 3 (a) lower left corner.
(ii) estimate that each annular bucket satisfies the probability of intersect relation, as shown in Fig. 4 (a);
After estimating, A, B, C, four annular buckets of D are respectively 1.0,0.3,0.0,1.0 with the probability that query window satisfies the intersect relation.
(iii) estimate that each annular bucket satisfies the probability of within relation, as shown in Fig. 4 (b);
After estimating, A, B, C, four annular buckets of D are respectively 0.5,0.0,0.0,0.0 with the probability that query window satisfies the within relation.
(iv) estimate that each annular bucket satisfies the probability of contains relation, as shown in Fig. 4 (c);
After estimating, A, B, C, four annular buckets of D are respectively 0.0,0.0,0.0,0.5 with the probability that query window satisfies the contains relation.
(v) satisfy the overlap probability according to certain barrel and equal probability that this barrel satisfy intersect and deduct respectively the probability that this barrel satisfy within, contains and calculate probability that each annular bucket satisfies the overlap relation, as shown in Fig. 4 (d);
As calculated, A, B, C, four annular buckets of D are respectively 0.5,0.3,0.0,0.5 with the probability that query window satisfies the overlap relation.
(1.3) selection rate of selecting according to the space equals the summation that object number and this barrel in each annular bucket satisfy the probability product of given alternative condition, satisfies the spatial object selection rate of given space condition in the given query window of calculating user.
In Fig. 3 (a), the estimated value that query window and spatial object consist of the intersect relation is: 2 * 1.0+3 * 0.3+1 * 0.0+2 * 1=4.9; The estimated value that query window and spatial object consist of the overlap relation is: 2 * 0.5+3 * 0.3+1 * 0.0+2 * 0.5=2.9.In like manner can get the estimated value of other spatial relationships.
2, take Fig. 3 and vector data shown in Figure 5 as example, the concrete implementation step of intersect Spatial Join Query selectivity estimation in the present invention is described:
(2.1) be that Fig. 3 and vector data shown in Figure 5 generate the AB histogram;
Data space to two vector datas all is divided into 12 row * 12 row grid units, sets up respectively the AB histogram, and step is as described in concrete implementation step 1.1.Because grid unit does not become, therefore the AB histogram that in Fig. 3, vector data generates is also constant.The total A ' of the histogram that in Fig. 5, vector data generates, two annular buckets of B ', scope is respectively (5,7; 9,10), (1,1; 6,3), in each annular bucket, 1,1 data object is arranged respectively.
(2.2) the given space querying operating conditions of user is the intersect spatial join operation herein;
(i) obtain respectively the MBR distribution plan of data object;
The MBR of Fig. 3 data set is used as the master data set, and the MBR of Fig. 5 data set is as the query window set.
For the purpose of following narration is convenient, the data object MBR of annular bucket A ' in Fig. 5 is designated as I, the data object in annular bucket B ' is designated as II, namely always has 2 query windows in Fig. 5.
(ii) probability of the interior intersect spatial relationship of each annular bucket in drawing for estimate 3.
If bucket[i] outer shroud and query window rectangle from, P (i)=0.0; If bucket[i] outer shroud and query window rectangle intersection and interior ring and query window rectangle from, at this moment, need to calculate bucket[i] with length l ength and the width wide of query window rectangle lap, the distance of P (i)=min (length, wide) ÷ outer shroud and interior interannular; Otherwise, P (i)=1.Result of calculation is as follows:
1. the probability that after estimating, in Fig. 3, in A, B, C, four annular buckets of D, in data objects and Fig. 5, query window I satisfies the intersect relation is respectively 0.0,0.0,1.0,1.0;
2. the probability that after estimating, in Fig. 3, in A, B, C, four annular buckets of D, in data objects and Fig. 5, query window II satisfies the intersect relation is respectively 0.8,0.0,0.0,1.0;
(2.3) selection rate according to master data set space connection is to multiply by by the number that calculates the interior object of each annular bucket in this data acquisition the probability sum that this barrel satisfies given space condition of contact, and master data set (Fig. 3) with respect to the intersect space connection selection rate of query window data acquisition (Fig. 5) is:
Σ i = 1 m ( C ( i ) × max ( P ( i , 1 ) , P ( i , 2 ) , . . . , P ( i , l ) , . . . , P ( i , s ) ) ) = 2 × max ( 0.0,0.8 ) + 3 ×
max ( 0.0,0.0 ) + 1 × max ( 1.0,0.0 ) + 2 × max ( 1.0,1.0 ) = 2 × 0.8 + 3 × 0.0 + 1 × 1.0 + 2 ×
1.0 = 4.6 .
In like manner, the MBR of Fig. 3 data set can be used as the query window set, the MBR of Fig. 5 data set is as the master data set, and calculating chart 5 data sets are with respect to the intersect space connection selection rate of Fig. 3 data set.
Similar to embodiment 2, can estimate the selection rate of the Spatial Join Queries such as within, contains and overlap.
3, take intersect as example, the AB Nogata drawing generating method of Query Result in the present invention is described.
(3.1) the Query Result AB histogram of operation is selected in the intersect space.
Select to be operating as example with intersect space in Fig. 3.By Spatial Select Query as a result in the histogrammic bucket of AB numerical value be to be satisfied long-pending the obtaining as can be known of probability of given space alternative condition by numerical value in each annular bucket in the AB histogram of raw data and this barrel, in the histogrammic annular bucket of the AB of intersect Spatial Select Query result, numerical value is to be obtained by the amassing of probability that numerical value and this barrel in each annular bucket in the AB histogram of raw data satisfy intersect space alternative condition.In Query Result AB histogram, A, B, C, four interior numerical value of annular bucket of D are respectively: 2 * 1.0=2.0; 3 * 0.3=0.9; 1 * 0.0=0.0; 2 * 1.0=2.0.Generate the AB histogram of result as shown in Fig. 6 (a).
(3.2) the Query Result AB histogram of intersect spatial join operation.
The intersect spatial join operation is as example in Fig. 3 and Fig. 5.By numerical value in newly-generated AB histogram bucket be multiply by all probability that satisfy given space condition of contact in this barrel and query window data acquisition between a plurality of objects by numerical value in former annular bucket average as can be known, after vector data in Fig. 3 carried out the intersect spatial join operation, in newly-generated AB histogram annular bucket, numerical value was to multiply by by numerical value in former annular bucket the mathematical expectation of probability that satisfies intersect space condition of contact in this barrel and query window data acquisition between a plurality of objects to obtain.According to In Fig. 3 in the newly-generated AB histogram of data in A, B, C, four annular buckets of D numerical value be respectively: 2 * (0.0+0.8)/(1+1)=0.8; 3 * (0.0+0.0)/(1+1)=0.0; 1 * (1.0+0.0)/(1+1)=0.5; 2 * (1.0+1.0)/(1+1)=2.0.Generate the AB histogram of result as shown in Fig. 6 (b).In like manner can get the Query Result AB histogram after data intersect spatial join operation in Fig. 5, be omitted herein.
Similar to embodiment 3, in like manner can obtain disjoint, within, the AB histogram of the selection of the spaces such as contains and overlap and Spatial Join Query result.
The comparative analysis experiment:
We were with 1: 10 of certain city of China, and 000 land use data is experimental data, and it comprises 17207 polygons, and its MBR distributes as shown in Fig. 7 (a), and we define 9 rectangles as the query window data in addition, as shown in Fig. 7 (b).Select in test in the space, we respectively as different query windows, estimate to satisfy the number of the land use data of particular space relation with 9 rectangles.In the connecting test of space, 9 rectangles are regarded as another data set, are used as the set of query window.Use the inventive method, we respectively the space of the relations such as disjoint, intersect, within, contains and overlap are selected and the selectivity estimation of spatial join operation is tested, and have calculated the relative error between selectivity estimation value and true value.The experimental data statistics that the space is selected to be connected with the space is seen respectively Fig. 8 and Fig. 9.
As seen, the AB histogram can be used for be estimated disjoint, and intersect, within, contains are connected the space and are selected the selection rate that is connected with the space with the overlap relation.The error of method of the present invention is less, has higher accuracy rate.
The content that is not described in detail in instructions of the present invention belongs to the known prior art of this area professional and technical personnel.
The above is only the preferred embodiment of the present invention; should be pointed out that for those skilled in the art, under the prerequisite that does not break away from the principle of the invention; can also make some improvements and modifications, these improvements and modifications also should be considered as protection scope of the present invention.

Claims (4)

1.一种基于环形桶AB直方图的查询结果数据的AB直方图生成及空间查询选择率估计方法,用于优化空间查询,其特征在于包括如下步骤:1. a kind of AB histogram generation based on the query result data of annular bucket AB histogram and spatial query selectivity estimation method, for optimizing spatial query, it is characterized in that comprising the steps: 第一步,按AB直方图的建立方法,为需要进行选择率估计的矢量数据建立直方图;In the first step, according to the establishment method of AB histogram, a histogram is established for the vector data that needs to be estimated for selectivity; 第二步,用户给定查询操作条件;In the second step, the user specifies query operation conditions; 第三步,如果所述查询操作条件中只涉及一个矢量数据集合,那么该操作为空间选择操作,则先估算直方图各桶满足给定空间选择条件的概率,再估计该空间选择操作的选择率;In the third step, if only one vector data set is involved in the query operation condition, then the operation is a space selection operation, first estimate the probability that each bucket of the histogram satisfies the given space selection condition, and then estimate the selection of the space selection operation Rate; 第四步,如果所述查询操作条件中涉及两个以上矢量数据集合,那么该操作为空间连接操作,则先估算直方图各桶满足给定空间连接条件的概率,再估计两个数据集合空间连接操作的选择率;In the fourth step, if the query operation condition involves more than two vector data sets, then the operation is a spatial join operation. First, estimate the probability that each bucket of the histogram satisfies the given spatial join condition, and then estimate the space of the two data sets. selectivity of join operations; 第五步,利用原有的AB直方图生成查询结果数据的AB直方图,估算查询树中后续节点的选择率;The fifth step is to use the original AB histogram to generate the AB histogram of the query result data, and estimate the selection rate of subsequent nodes in the query tree; 所述AB直方图建立方法包括如下步骤:Described AB histogram establishment method comprises the steps: (2.1)将需要进行选择率估计的矢量数据的整个空间范围划分为大小相等的格网单元;(2.1) Divide the entire spatial range of the vector data that needs to be estimated into selectivity into grid cells of equal size; (2.2)最初AB直方图中没有任何环形桶;(2.2) There is no circular bucket in the initial AB histogram; (2.3)对于每个对象,如果它的最小外包矩形MBR不在任意一个环形桶的内外矩形之间,则向AB直方图中增加一个环形桶,并将环形桶内的数值赋为1;否则,将对象MBR所在的环形桶内数值增加1;(2.3) For each object, if its minimum bounding rectangle MBR is not between the inner and outer rectangles of any ring bucket, add a ring bucket to the AB histogram, and assign the value in the ring bucket to 1; otherwise, Increase the value in the ring bucket where the object's MBR is located by 1; (2.4)AB直方图中每个环形桶需要记录两类信息:环形桶的范围,以及落入该桶内数据对象MBR的个数,记为{(R(i),C(i))},其中1≤i≤N,N是直方图中环形桶的数目;(2.4) Each ring bucket in the AB histogram needs to record two types of information: the range of the ring bucket, and the number of MBR data objects falling into the bucket, recorded as {(R(i),C(i))} , where 1≤i≤N, N is the number of circular buckets in the histogram; 其中,先估算直方图各桶满足给定空间选择条件的概率,再估计该空间选择操作的选择率方法,包括如下步骤:Among them, first estimate the probability that each bucket of the histogram satisfies the given space selection condition, and then estimate the selection rate method of the space selection operation, including the following steps: (4.1)估算直方图各桶满足disjoint,intersect,within,contains,overlap中单个空间选择条件的概率;(4.1) Estimate the probability that each bucket of the histogram satisfies disjoint, intersect, within, contains, and overlap the probability of a single space selection condition; (4.2)估算直方图各桶满足disjoint,intersect,within,contains,overlap中复合空间选择条件的概率;(4.2) Estimate the probability that each bucket of the histogram satisfies the composite space selection conditions in disjoint, intersect, within, contains, and overlap; 对于涉及复合空间选择条件的查询都可以转换为由or或and连接的单个空间选择条件查询的组合,对于其中的单个空间选择条件按(4.1)步计算其满足单个空间条件的概率,对于由or或and连接的复合空间选择条件,则按下列公式1和公式2来计算该复合条件的概率:All queries involving composite space selection conditions can be converted into a combination of single space selection condition queries connected by or or and. For a single space selection condition, the probability of satisfying a single space condition is calculated according to step (4.1). For the single space selection condition by or or the compound space selection condition connected by and, the probability of the compound condition is calculated according to the following formula 1 and formula 2: Por(i)=P1(i)+P2(i)-P1(i)×P2(i)公式1P or (i)=P 1 (i)+P 2 (i)-P 1 (i)×P 2 (i) Formula 1 Pand(i)=P1(i)×P2(i)           公式2P and (i)=P 1 (i)×P 2 (i) Formula 2 (4.3)基于(4.1)(4.2)步得出各个桶满足给定空间选择条件的概率,则该空间选择的选择率等于每个环形桶内对象数目与该桶满足给定选择条件的概率乘积的总和,即空间选择的选择率为
Figure FDA00002481243200021
其中,1≤i≤N,N是直方图中环形桶的数目,P(i)是桶i满足给定空间条件的概率,C(i)是落入该桶内数据对象MBR的个数;
(4.3) Based on steps (4.1) and (4.2), the probability that each bucket satisfies the given space selection condition is obtained, then the selection rate of the space selection is equal to the product of the number of objects in each annular bucket and the probability that the bucket satisfies the given selection condition The sum of , that is, the selectivity of space selection is
Figure FDA00002481243200021
Among them, 1≤i≤N, N is the number of annular buckets in the histogram, P(i) is the probability that bucket i satisfies the given space condition, and C(i) is the number of data objects MBR falling into the bucket;
其中,先估算直方图各桶满足给定空间连接条件的概率,再估计两个数据集合空间连接操作的选择率方法,包括如下步骤:Among them, first estimate the probability that each bucket of the histogram satisfies the given spatial join condition, and then estimate the selectivity method of the spatial join operation of the two data sets, including the following steps: (5.1)对已经建立好AB直方图的两个矢量数据集合,分别获取数据对象的MBR分布图;(5.1) Obtain the MBR distribution diagram of the data object respectively for the two vector data collections that have established the AB histogram; (5.2)在估计两个数据集合空间连接的选择率时,将一个数据集的MBR当作基本数据集合,另一个数据集的MBR作为查询窗口数据集合;(5.2) When estimating the selectivity of the spatial connection of two data sets, the MBR of one data set is regarded as the basic data set, and the MBR of the other data set is used as the query window data set; (5.3)基本数据集合直方图的每个桶满足给定空间连接条件的概率是该桶与查询窗口数据集合中多个对象之间满足给定连接条件的概率最大值,即max(P(i,1),P(i,2),…,P(i,l),…,P(i,s)),其中,P(i,l)是基本数据集合与查询窗口数据集合中空间对象1间满足给定空间连接条件的概率,s为查询窗口数据集合中空间对象的数目;(5.3) The probability that each bucket of the basic data set histogram satisfies the given spatial connection condition is the maximum probability of satisfying the given connection condition between the bucket and multiple objects in the query window data set, that is, max(P(i ,1),P(i,2),…,P(i,l),…,P(i,s)), where P(i,l) is the spatial object in the basic data set and the query window data set The probability that a room satisfies the given spatial connection condition, s is the number of spatial objects in the query window data set; (5.4)基本数据集合空间连接的选择率是通过计算该数据集合中每个环形桶内对象的数目乘以该桶满足给定空间连接条件的概率之和,即 Σ i = 1 m ( C ( i ) × max ( P ( i , 1 ) , P ( i , 2 ) , . . . , P ( i , l ) , . . . , P ( i , s ) ) ) , 其中,m为直方图中环形桶的数目;(5.4) The selection rate of the spatial connection of the basic data set is calculated by calculating the sum of the number of objects in each annular bucket in the data set multiplied by the probability that the bucket satisfies the given spatial connection condition, that is, Σ i = 1 m ( C ( i ) × max ( P ( i , 1 ) , P ( i , 2 ) , . . . , P ( i , l ) , . . . , P ( i , the s ) ) ) , Among them, m is the number of circular buckets in the histogram; (5.5)反之,将基本数据集合作为查询窗口数据集合,将查询窗口数据集合作为基本数据集合,通过(5.3)-(5.4)步可以估算出原查询窗口数据集合满足给定空间连接条件的选择率;(5.5) Conversely, the basic data set is used as the query window data set, and the query window data set is used as the basic data set. Through steps (5.3)-(5.4), the selection of the original query window data set that satisfies the given spatial connection conditions can be estimated Rate; 其中,所述利用原有的AB直方图生成查询结果数据的AB直方图,包括如下步骤:Wherein, said utilizing the original AB histogram to generate the AB histogram of the query result data includes the following steps: (6.1)空间选择查询结果AB直方图的桶内数值是由原始数据的AB直方图中各个环形桶内数值与该桶满足给定空间选择条件的概率之积得到;(6.1) The value in the bucket of the space selection query result AB histogram is obtained by the product of the value in each ring bucket in the AB histogram of the original data and the probability that the bucket satisfies the given space selection condition; (6.2)由于空间连接查询的结果是由两个空间列组成的数据对g,g’,在满足连接条件数据集合中,g,g’的AB直方图中的每个环形桶内数值可分别按公式3和公式4计算,即新生成的AB直方图桶内数值是由原环形桶内数值乘以该桶与查询窗口数据集合中多个对象之间满足给定空间连接条件的所有概率之均值:(6.2) Since the result of the spatial join query is a data pair g, g' composed of two spatial columns, in the data set satisfying the join condition, the values in each annular bucket in the AB histogram of g, g' can be respectively Calculated according to Formula 3 and Formula 4, that is, the value in the newly generated AB histogram bucket is multiplied by the value in the original circular bucket multiplied by all the probabilities that satisfy the given spatial connection conditions between the bucket and multiple objects in the query window data set mean: C 1 ( i ) = C 0 ( i ) × Σ i = 1 s P ( i , l ) / s 公式3 C 1 ( i ) = C 0 ( i ) × Σ i = 1 the s P ( i , l ) / the s Formula 3 C 1 ′ ( i ) = C 0 ′ ( i ) × Σ i = 1 t P ′ ( i , l ) / t 公式4 C 1 ′ ( i ) = C 0 ′ ( i ) × Σ i = 1 t P ′ ( i , l ) / t Formula 4 其中,C0(i),C’0(i)分别是两个数据集合对应的AB直方图中第i个环形桶内数据对象MBR的数目;C1(i),C’1(i)分别是新生成的AB直方图环形桶内数据对象MBR的数目;P(i,l)是基本数据集合第i个环形桶与查询窗口数据中第l个对象满足给定空间连接条件的概率,P’(i,l)是查询窗口数据集合第i个环形桶与基本数据集合中第l个对象满足给定空间连接条件的概率;t、s分别为两个数据集合中空间对象的数目。Among them, C 0 (i), C' 0 (i) are the number of data object MBR in the i-th ring bucket in the AB histogram corresponding to the two data sets; C 1 (i), C' 1 (i) are the number of data object MBR in the ring bucket of the newly generated AB histogram; P(i,l) is the probability that the i-th ring bucket of the basic data set and the l-th object in the query window data meet the given spatial connection conditions, P'(i,l) is the probability that the i-th ring bucket in the query window data set and the l-th object in the basic data set meet the given spatial connection conditions; t and s are the number of spatial objects in the two data sets, respectively.
2.根据权利要求1所述的方法,其特征在于,所述给定空间选择和空间连接条件均是指disjoint,intersect,within,contains,overlap关系中的一种或多种。2. The method according to claim 1, wherein the given space selection and space connection conditions both refer to one or more of disjoint, intersect, within, contains, and overlap relations. 3.根据权利要求1所述的方法,其特征在于,所述环形桶是由数据对象的MBR所在的格网单元形成的环形区域,环形桶的范围用(xa,ya;xb,yb)表示,其中,(xa,ya),(xb,yb)分别表示环形桶外环的左下角坐标,右上角坐标。3. The method according to claim 1, wherein the annular bucket is an annular area formed by the grid unit where the MBR of the data object is located, and the range of the annular bucket is represented by (xa, ya; xb, yb) , where (xa, ya), (xb, yb) represent the coordinates of the lower left corner and the upper right corner of the outer ring of the ring barrel respectively. 4.根据权利要求1所述的方法,其特征在于,所述disjoint,intersect,within,contains,overlap空间关系中一种给定空间条件的概率估计法,包括如下步骤:4. The method according to claim 1, characterized in that, said disjoint, intersect, within, contains, a probability estimation method for a given spatial condition in the overlap spatial relationship, comprising the steps of: (8.1)估计i桶bucket[i]满足intersection概率方法如下:(8.1) The method of estimating that bucket[i] satisfies the intersection probability is as follows: A.若bucket[i]的外环与查询窗口矩形相离,则P(i)=0.0;A. If the outer ring of bucket[i] is separated from the rectangle of the query window, then P(i)=0.0; B.若bucket[i]的外环与查询窗口矩形相交、且内环与查询窗口矩形相离,此时,需要计算bucket[i]与查询窗口矩形重叠部分的长度length和宽度wide,则P(i)=min(length,wide)÷外环与内环间的距离;B. If the outer ring of bucket[i] intersects with the rectangle of the query window, and the inner ring is separated from the rectangle of the query window, at this time, it is necessary to calculate the length and width of the overlapping part of bucket[i] and the rectangle of the query window, then P (i)=min(length,wide)÷the distance between the outer ring and the inner ring; C.其他情况,P(i)=1;C. In other cases, P(i)=1; (8.2)估计i桶满足within概率方法如下:(8.2) The method of estimating that bucket i satisfies the within probability is as follows: A.若bucket[i]的外环与查询窗口矩形不构成within关系,且内环与查询窗口也不构成within关系,则P(i)=0.0;A. If the outer ring of bucket[i] does not form a within relationship with the query window rectangle, and the inner ring and the query window do not form a within relationship, then P(i)=0.0; B.若bucket[i]的外环与查询窗口矩形不构成within关系、且内环与查询窗口构成within关系,则P(i)=bucket[i]的内环的四条边与相应的查询窗口矩形四条边的最小距离÷外环与内环间的距离;B. If the outer ring of bucket[i] does not form a within relationship with the query window rectangle, and the inner ring forms a within relationship with the query window, then P(i)=the four sides of the inner ring of bucket[i] and the corresponding query window The minimum distance between the four sides of the rectangle ÷ the distance between the outer ring and the inner ring; C.其他情况,P(i)=1;C. In other cases, P(i)=1; (8.3)估计i桶满足contains概率方法如下:(8.3) The method of estimating the probability that bucket i satisfies contains is as follows: A.若bucket[i]的外环与查询窗口矩形不构成contains关系,且内环与查询窗口也不构成contains关系,则P(i)=0.0;A. If the outer ring of bucket[i] does not form a contains relationship with the query window rectangle, and the inner ring does not form a contains relationship with the query window, then P(i)=0.0; B.若bucket[i]的外环与查询窗口矩形不构成contains关系、且内环与查询窗口构成contains关系,则P(i)=bucket[i]的外环的四条边与相应的查询窗口矩形四条边的最小距离÷外环与内环间的距离;B. If the outer ring of bucket[i] does not form a contains relationship with the query window rectangle, and the inner ring forms a contains relationship with the query window, then P(i)=the four sides of the outer ring of bucket[i] and the corresponding query window The minimum distance between the four sides of the rectangle ÷ the distance between the outer ring and the inner ring; C.其他情况,P(i)=1;C. In other cases, P(i)=1; (8.4)i桶满足disjoint概率是1减去该桶满足intersect的概率;(8.4) The probability that bucket i satisfies disjoint is 1 minus the probability that the bucket satisfies intersect; (8.5)i桶满足overlap概率是该桶满足intersect的概率分别减去该桶满足within、contains的概率。(8.5) The probability that bucket i satisfies overlap is the probability that the bucket satisfies intersect minus the probability that the bucket satisfies within and contains respectively.
CN 201010105264 2010-02-03 2010-02-03 AB column diagram-based method for estimating spatial query selection rate Expired - Fee Related CN101826098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010105264 CN101826098B (en) 2010-02-03 2010-02-03 AB column diagram-based method for estimating spatial query selection rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010105264 CN101826098B (en) 2010-02-03 2010-02-03 AB column diagram-based method for estimating spatial query selection rate

Publications (2)

Publication Number Publication Date
CN101826098A CN101826098A (en) 2010-09-08
CN101826098B true CN101826098B (en) 2013-05-08

Family

ID=42690016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010105264 Expired - Fee Related CN101826098B (en) 2010-02-03 2010-02-03 AB column diagram-based method for estimating spatial query selection rate

Country Status (1)

Country Link
CN (1) CN101826098B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8949224B2 (en) * 2013-01-15 2015-02-03 Amazon Technologies, Inc. Efficient query processing using histograms in a columnar database
CN105103152A (en) * 2013-01-31 2015-11-25 惠普发展公司,有限责任合伙企业 approximate query processing
CN104731889B (en) * 2015-03-13 2018-02-06 河海大学 A kind of method for estimating query result size

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1334534A (en) * 2000-07-21 2002-02-06 Lg电子株式会社 Method for searching multimedia using progressive histogram

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7389283B2 (en) * 2004-12-07 2008-06-17 International Business Machines Corporation Method for determining an optimal grid index specification for multidimensional data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1334534A (en) * 2000-07-21 2002-02-06 Lg电子株式会社 Method for searching multimedia using progressive histogram

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Ning An等.Toward an Accurate Analysis of Range Queries on Spatial Data.《IEEE transactions on knowledge and data engineering》.2003,第15卷(第2期),第305-323页.
Toward an Accurate Analysis of Range Queries on Spatial Data;Ning An等;《IEEE transactions on knowledge and data engineering》;20030430;第15卷(第2期);第305-323页 *
熊伟等.空间数据库中距离连接选择率估计方法研究.《计算机学报》.2006,第29卷(第1期),第49-52页.
空间数据库中距离连接选择率估计方法研究;熊伟等;《计算机学报》;20060131;第29卷(第1期);第49-52页 *
空间查询代价模型;郭平等;《计算机科学》;20041231;第31卷(第12期);第65-67,80页 *
郭平等.空间查询代价模型.《计算机科学》.2004,第31卷(第12期),第65-67,80页.

Also Published As

Publication number Publication date
CN101826098A (en) 2010-09-08

Similar Documents

Publication Publication Date Title
CN103236086B (en) One takes the contextual multiple dimensioned DEM modeling method of the earth's surface hydrology into account
CN101510225B (en) Product STL Model Boolean Operation Method
TWI385544B (en) Density-based data clustering method
CN104156524B (en) The Aggregation Query method and system of transport data stream
CN103198151B (en) The search index system and method for regional urban public traffic vehicles operation information
CN102867066B (en) Data Transform Device and data summarization method
CN103927346B (en) Query connection method on basis of data volumes
CN102508973B (en) Rapid intersection method for STL (stereo lithography) models of products
CN102163224A (en) Adaptive spatial clustering method
CN107451183B (en) Knowledge map construction method based on text clustering idea
CN102073981A (en) Point group geographic entity selection method under the restriction of correlated elements
CN105574194B (en) A kind of coordinate points processing method and processing device for electronic map interface
CN110001066B (en) Method for determining filling direction of minimum partition in three-dimensional printing
CN101510228B (en) Nonuniform simplifying method for STL model of products
CN101692230A (en) Three-dimensional R tree spacial index method considering levels of detail
CN109598056A (en) Measurement Method, system and the storage medium of town site form compactness
CN101826098B (en) AB column diagram-based method for estimating spatial query selection rate
CN101976452B (en) Integrated filtering method of airborne laser scanning spot clouds based on contour line cluster analysis
CN112100130B (en) Massive remote sensing variable multi-dimensional aggregation information calculation method based on data cube model
CN119274081A (en) Drought monitoring method and system based on ground feature detection
CN118551248B (en) Spatial data clustering method, system, device and medium with vertical anisotropy
Zhao et al. A new k nearest neighbours algorithm using cell grids for 3d scattered point cloud
CN107276807B (en) Hierarchical network community tree pruning method based on community dynamic compactness
CN114049387A (en) Tree skeleton extraction method based on 3D point cloud
CN109241216B (en) A method for grid management using grid-level storage database architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130508

Termination date: 20170203

CF01 Termination of patent right due to non-payment of annual fee