[go: up one dir, main page]

CN114817772B - A segmented parallel expansion method for keyword optimal path query - Google Patents

A segmented parallel expansion method for keyword optimal path query Download PDF

Info

Publication number
CN114817772B
CN114817772B CN202210510076.3A CN202210510076A CN114817772B CN 114817772 B CN114817772 B CN 114817772B CN 202210510076 A CN202210510076 A CN 202210510076A CN 114817772 B CN114817772 B CN 114817772B
Authority
CN
China
Prior art keywords
keyword
path
vertex
query
target value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210510076.3A
Other languages
Chinese (zh)
Other versions
CN114817772A (en
Inventor
牛保宁
刘蒙蒙
杨茸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Technology filed Critical Taiyuan University of Technology
Priority to CN202210510076.3A priority Critical patent/CN114817772B/en
Publication of CN114817772A publication Critical patent/CN114817772A/en
Application granted granted Critical
Publication of CN114817772B publication Critical patent/CN114817772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/319Inverted lists
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明属于空间数据查询技术领域,具体是一种关键词最优路径查询的分段并行拓展方法。包括以下步骤。S100:计算查询图中任意两顶点间的最小代价值及最小代价值对应的目标值和最小目标值及最小目标值对应的代价值,同时记录查询图中的最小代价值和最小目标值。S200:关键词倒排列表构建:将顶点的所有关键词信息组合成非重合的关键词集合,构建形如{wi:vj}的关键词倒排列表,记录关键词对应的顶点。S300:跳过与查询关键词无关的顶点,构建目标值最小的关键词顶点路径记为以及代价值最小的关键词顶点路径记为S400:对关键词顶点路径进行划分,并对划分后的路径分段并行拓展。

The present invention belongs to the field of spatial data query technology, and specifically is a segmented parallel expansion method for keyword optimal path query. It includes the following steps. S100: Calculate the minimum cost value between any two vertices in the query graph, the target value corresponding to the minimum cost value, the minimum target value, and the cost value corresponding to the minimum target value, and record the minimum cost value and the minimum target value in the query graph at the same time. S200: Keyword inverted list construction: Combine all keyword information of the vertex into a non-overlapping keyword set, construct a keyword inverted list of the form {w i :v j }, and record the vertices corresponding to the keyword. S300: Skip vertices that are not related to the query keyword, and construct the keyword vertex path with the smallest target value, recorded as And the keyword vertex path with the smallest cost value is recorded as S400: Divide the keyword vertex path, and expand the divided path in segments in parallel.

Description

Segmentation parallel expansion method for keyword optimal path query
Technical Field
The invention belongs to the technical field of space data query, and particularly relates to a segmentation parallel expansion method for keyword optimal path query.
Background
With the diversification of the demands of people on travel paths, the path query is not just a common shortest path query, and people often comprehensively consider other factors such as path time overhead and interest points of path coverage. Consider the scenario where a user wants to travel to a certain location, and wants to find a path from a specified location, through a path meeting a point of interest specified by the user, and finally to an end point, where the shorter and better the time the path takes is on the premise of meeting the tourist's travel budget (hereinafter referred to as cost threshold). The above-mentioned query is called Keyword-aware Optimal Route Query (KOR) best path query. KOR is a path query that satisfies the keyword full coverage, cost threshold (path length) and minimum target value (time overhead). The query is typically used for travel planning, path navigation, and other applications.
Shortening the execution time is an important goal for KOR optimization, but two challenges now exist. The first challenge is to reduce the search size. The most advanced technology in the prior art uses adjacent edge expansion, which essentially traverses from the starting point one by one, or uses keyword vertex expansion, skips the vertex irrelevant to the keyword to continuously expand the keyword vertices except the adjacent vertex, but the method needs to calculate the simplified Skyline path in preprocessing, and has huge space-time overhead. Although the two methods adopt various pruning strategies to shorten the execution time during expansion, such as approximate dominant pruning, global priority expansion, feasible solution target value pruning, minimum cost value pruning and the like, the search scale of the two methods increases exponentially with the increase of the path length or the search depth, and the execution time is still unacceptable.
The second challenge is to combine existing path expansion methods with pruning strategies to further shorten execution time. The most advanced technology at present is essentially serial expansion, i.e. expanding one vertex at a time in the query phase or the preprocessing phase, but the method has reached its performance barrier. Parallel expansion paths may be a good choice. Related studies have demonstrated the feasibility of shortest path parallel path expansion under single constraint, such as expanding single source shortest paths and full source shortest paths in parallel. However, the current research on parallel expansion of the multi-constraint shortest paths is not related.
The prior art has the following problems:
(1) When searching long paths, the searching scale of the existing algorithm increases exponentially along with the increase of the path length or the searching depth, and the execution time is too long.
(2) Parallel expansion technology is mostly applied to shortest path query under single constraint condition, and research on multi-constraint path query such as KOR problem is not involved yet. In order to realize the parallel expansion of the KOR problem, the invention needs to explore how to parallelize the serial multi-constraint expansion.
Disclosure of Invention
The invention aims to solve the problems and provides a segmentation parallel expansion method for keyword optimal path query.
The invention adopts the following technical scheme that the segmentation parallel expansion method for the keyword optimal path query comprises the following steps.
S100, calculating a minimum cost value b (V i,vj) between any two vertexes in the query graph and a target value o σ(vi,vj corresponding to the minimum cost value, and a minimum target value o (V i,vj) and a cost value b τ(vi,vj corresponding to the minimum target value, and simultaneously recording the minimum cost value b min and the minimum target value o min in the query graph G= (V, E).
And S200, constructing a keyword inverted list, namely combining all keyword information { v.w 1,v.w2, & gt } of the vertexes into a non-coincident keyword set, constructing a keyword inverted list in the shape of { w i:vj }, and recording the vertexes corresponding to the keywords.
S300, skipping the vertex irrelevant to the query keyword, constructing the keyword vertex path with the minimum target value asThe keyword vertex path with the minimum cost value is marked as
S400, dividing the keyword vertex paths, and expanding the divided paths in a segmented parallel manner.
The step S300 includes the steps of,
S310, giving a queryvs、vtAnd delta respectively represents a query starting point, a query end point, a keyword set which needs to be covered by a query path and a cost threshold value, and is hidden and not containedThe method comprises the steps of (1) reserving a starting point, an ending point and keyword vertexes, constructing a simplified keyword vertex query graph G ', directly acquiring the keyword vertexes according to a keyword inverted list in S200 in the implementation process, and then adding the query starting point and the query ending point to form the query graph G ', wherein the query graph G ' contains information including (1) weights b(vi,vj)、oσ(vi,vj)、o(vi,vj)、bτ(vi,vj) between any two vertexes acquired in a preprocessing stage, a minimum cost value b min and a minimum target value o min in the query graph, and (2) querying the starting point, the ending point and the keyword vertexes.
S320, in the keyword vertex query graph G', keyword vertices are arranged and combined, traversed through breadth-first search, and finally returned to meet the keyword full coverage and cost threshold valueWherein the method comprises the steps ofThe vertex path of the keyword comprising the smallest target value is recorded asThe keyword vertex path with the minimum cost value is marked as
Step S320 includes the steps of giving a queryWherein the method comprises the steps ofwi={vk,vm},wj={vx,vy}。
S321, first construct an initial tag at v s Placing it in a priority queue.
S322, expanding from the starting point to all keyword vertexes, and constructing labels from v s to each keyword vertexAnd calculating a global priority enqueue, wherein the smaller the global priority, the higher the global priority.
S323, selecting the label with highest global priorityDequeuing at this timeHas included the keyword w j, directionVertex v i、vj related to keyword w i which is not contained is expanded, and new label is constructedAnd judging whether the new label meets the cost threshold and the approximate dominant condition, if so, calculating the global priority and then enqueuing, otherwise, discarding.
S324, continuing to select the label with the highest priority level for expansion, repeating the process S323 until the label is expanded to a terminal v t, and selecting a path corresponding to the label meeting the full coverage of the keyword and having the minimum cost threshold and the target value as a mark
S325, continuing to select the label expansion with the highest priority level, repeating the process S323 until the label expansion reaches the terminal v t, and selecting a path corresponding to the label meeting the full coverage of the keyword, the cost threshold and the minimum cost value as the path
In step S322, the tagIs expressed as global priority ofThe minimum target value from v s to v t is o (v s,vt), the priority factor β is set, 1< β <2, and the smaller the global priority of the tag is, the higher the global priority of the tag is.
Before step S400 is performed, the keyword vertex path with the smallest target value constructed in step S300 is selectedKeyword vertex path with minimum cost valueThe selection is carried out, and the specific selection process is as follows:
in construction of In the process of (2), the returned target value is the smallestIs marked asSimultaneously recording the corresponding target value O min and cost value B max, and returning the minimum cost valueIs marked asSimultaneous recording of B min and O max, availability of [ B min,Bmax ] and [ O min,Omax ], for assuranceThe effective path is returned for the path expansion, and O max is taken as the upper bound of the target value, namely, the path target value returned in the worst case is O max, namelyTo expand the path of the basis, byAndThe key word vertex path boundaries are shared by the following cases;
1) Is not the air-conditioner and is not the air-conditioner, Not null, and B min≤Bmax、Omin≤Omax, in the path-segment parallel expansion, toTaking O max as the upper boundary of the target value,The local cost threshold in (a) constrains the expansion of each path segment.
2)Is in the form of a hollow body,Not empty, path segment expansionTo be accurate.
3)Is in the form of a hollow body,For null, it is indicated that the cost threshold is too low, and the keyword vertex path with the smallest reconstruction target value for improving the cost threshold is recorded asThe keyword vertex path with the minimum cost value is marked as
Step S400 specifically includes the following steps,
And S410, setting parallelism, dividing the keyword vertex path into a plurality of sections, wherein all vertices except for the head vertex and the tail vertex in the keyword vertex path are keyword vertices, and dividing the path by using the keyword vertices.
And S420, carrying out parallel expansion on path segments, judging whether cost values of all paths from the initial point vertexes to the target vertexes of each path through the non-keyword vertexes are smaller than or equal to a local cost threshold value at the target vertexes when the path segments are expanded, if yes, carrying out feasible solution target value pruning strategy comparison applicable to parallel expansion, if yes, calculating global priority, and then queuing, otherwise, discarding.
In step S410, the principle of parallelism setting is to make the keyword vertexes in each path segment as uniform as possible so that the number of vertexes in each path segment is similar when expanding, thereby achieving the purpose of the close expansion time of each path segment.
In step S420, the judgment inequality of comparison of the feasible target value pruning strategy is:
Wherein, the AndRepresenting the vertices of the keywords,Is fromExpansion to v k,vk is fromTo the point ofA label of one vertex passing in the middle, if the constraint is satisfiedEnqueue continues expanding, otherwise discard
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a thought for converting a multi-constraint KOR problem into a limited shortest path problem, namely, firstly constructing a keyword vertex path meeting the full coverage and cost threshold of keywords, and then considering the minimum constraint of a target value in detailed expansion to segment the problem.
(2) Aiming at the problem of long execution time of long path search, the invention provides a segmentation parallel expansion method (PSE-KOR) for keyword optimal path query, which is characterized in that a keyword vertex path is constructed, then the keyword vertex is taken as a boundary to divide the path into a plurality of sections, the search scale (depth) during long path search is reduced, and then the execution time of the long path KOR is further shortened by adopting a parallel expansion mode.
(3) And for path expansion of each segment, local cost threshold pruning is provided to restrict expansion direction and search depth, and meanwhile, the association degree between the segmented paths is reduced. And meanwhile, the practical solution target value pruning strategy of approximate dominant pruning, global priority expansion and applicable parallel expansion is utilized to accelerate the expansion.
Drawings
FIG. 1 is a diagram of a query graph;
FIG. 2 is a schematic diagram of a keyword vertex path;
FIG. 3 is a schematic diagram of the construction of a keyword vertex path;
FIG. 4 is a keyword vertex path example.
Detailed Description
Aiming at the problems, the invention provides a segmentation parallel expansion method for keyword optimal path query, which comprises the steps of firstly skipping vertexes irrelevant to query keywords, expanding the vertexes of the keywords only, constructing keyword vertex paths, dividing the paths into multiple sections by taking the vertexes of the keywords as boundaries, expanding the paths in parallel, and finally splicing the paths into a complete path. In order to reduce the association degree among all road sections and the search scale, the invention also provides a local cost threshold and feasible solution target value pruning suitable for parallel expansion, and the local cost threshold and the feasible solution target value pruning are matched with the parallel expansion.
Before further explanation, the KOR definition will be explained first:
KOR is based on a road network, which can be abstracted into a query graph as shown in fig. 1.
Definition 1 query graph g= (V, E) is made up of vertex set V and edge set E. Any V epsilon V represents the interest point, the geographic position of V is represented by longitude and latitude, and the keyword set marked as v.loc and V is represented asThe path (v i,vj) connecting the neighbors v i and v j is called a edge, denoted as E, E. Each edge has two attributes, cost value b (v i,vj) and target value o (v i,vj).
As shown in FIG. 1, the typical cost value is represented by the value outside the brackets, and the target value is represented by the time required to pass the edge, represented by the value inside the brackets. Table 1 lists the keyword sets corresponding to the vertices in fig. 1.
Table 1 vertex keyword information
Definition 2 Path p= (v 0,v1,....vn) represents a path from v 0 through v 1,…vn-1 to v n, and p-overlaid keywords are expressed as
According to definition 2, the cost value and the target value of path p are respectively:
definition of 3KOR A KOR query Q is a four-tuple vs、vtAnd delta represents the query start point, end point, the covered keyword set, and cost threshold, respectively. Let p s,t denote the set of paths from v s to v t, and p cand denote the set of possible paths p that meet the keyword full coverage and cost threshold, i.eOptimal path of KOR
The technical scheme of the invention is further described as follows:
S100, calculating a minimum cost value b (V i,vj) between any two vertexes in the query graph and a target value o σ(vi,vj corresponding to the minimum cost value, a minimum target value o (V i,vj) and a cost value b τ(vi,vj corresponding to the minimum target value by using a Floyd algorithm, recording the minimum cost value b min and the minimum target value o min in G= (V, E), and storing the minimum cost value b min and the minimum target value o min in a memory.
And 200, constructing a keyword inverted index, namely combining all keyword information { v.w 1,v.w2..degree } of the interest points into a non-coincident keyword set, constructing a keyword inverted index table, and recording the interest points corresponding to the keywords, wherein the table 2 is shown.
TABLE 2 keyword inverted index Table
S300, skipping the vertex irrelevant to the query keyword, constructing the keyword vertex path with the minimum target value asThe keyword vertex path with the minimum cost value is marked as
Definition 4 keyword vertices givenFor keywordsIf a certain vertexCovering keywords w i, i.eThen call forIs a vertex covering the keyword w i.
Definition 5 keyword vertex Path givenContinuously checking uncovered keywords from v s, expanding the keywords to corresponding vertexes to obtain paths which are expanded from v s to v t and meet the full coverage and cost threshold of the keywords, and marking the paths asWeighing scaleIs a keyword vertex path.
Step S300 includes the steps of:
S310, giving a query vs、vtAnd delta respectively represents a query starting point, a query end point, a keyword set which needs to be covered by a query path and a cost threshold value, and is hidden and not containedThe method comprises the steps of (1) reserving a starting point, an ending point and keyword vertexes, constructing a simplified keyword vertex query graph G ', directly acquiring the keyword vertexes according to a keyword inverted list in S200 in the implementation process, and then adding the query starting point and the query ending point to form the query graph G ', wherein the query graph G ' contains information including (1) weights b(vi,vj)、oσ(vi,vj)、o(vi,vj)、bτ(vi,vj) between any two vertexes acquired in a preprocessing stage, a minimum cost value b min and a minimum target value o min in the query graph, and (2) querying the starting point, the ending point and the keyword vertexes.
A simplified keyword vertex query graph G' is shown in fig. 2. The broken lines connect the keyword vertices to form a keyword vertex query graph.
S320, in the keyword vertex query graph G', keyword vertices are arranged and combined, traversed through breadth-first search, and finally returned to meet the keyword full coverage and cost threshold valueWherein the method comprises the steps ofThe vertex path of the keyword comprising the smallest target value is recorded asThe keyword vertex path with the minimum cost value is marked asAs shown in fig. 3, the keyword vertex path with the smallest target value may be returned.
S320, in the keyword vertex query graph G', keyword vertices are arranged and combined, traversed through breadth-first search, and finally returned to meet the keyword full coverage and cost threshold valueWherein the method comprises the steps ofThe vertex path of the keyword comprising the smallest target value is recorded asThe keyword vertex path with the minimum cost value is marked as
Step S320 includes the steps of giving a queryWherein the method comprises the steps ofwi={vk,vm},wj={vx,vy}。
S321, first construct an initial tag at v s Placing it in a priority queue.
S322, expanding from the starting point to all keyword vertexes, and constructing labels from v s to each keyword vertexAnd calculating a global priority enqueue, wherein the smaller the global priority, the higher the global priority.
Label (Label)Is expressed as global priority ofThe minimum target value from v s to v t is o (v s,vt), the priority factor β is set, 1< β <2, and the smaller the global priority of the tag is, the higher the global priority of the tag is.
S323, selecting the label with highest global priorityDequeuing at this timeHas included the keyword w j, directionVertex v i、vj related to keyword w i which is not contained is expanded, and new label is constructedAnd judging whether the new label meets the cost threshold and the approximate dominant condition, if so, calculating the global priority and then enqueuing, otherwise, discarding.
Cost threshold-the path is meeting the guest's travel budget.
The approximation governs that given a parameter a, a slightly greater than 1 (e.g., a=1.1), if the start and end points of path p i、pj are the same,B (p i)≤b(pj) andPath p i approximately dominates that p j,pj is pruned.
Target correction given a query graph g= (V, E) and a queryKnown tagsMinimum cost value b min, minimum target value o min, which is corrected to target value of Is referred to as a correction ratio, where 0< epsilon <1.
S324, continuing to select the label with the highest priority level for expansion, repeating the process S323 until the label is expanded to a terminal v t, and selecting a path corresponding to the label meeting the full coverage of the keyword and having the minimum cost threshold and the target value as a mark
S325, continuing to select the label expansion with the highest priority level, repeating the process S323 until the label expansion reaches the end point v t, and selecting a path corresponding to the label meeting the full coverage of the keyword, the cost threshold and the minimum cost value as a mark
The expansion result is shown in G' of FIG. 4, v 2、v4 is in constructionKeyword vertices that were discarded.
The keyword vertex path with the smallest target value constructed in step S300Keyword vertex path with minimum cost valueThe selection is carried out, and the specific selection process is as follows:
in construction of In the process of (2), the returned target value is the smallestIs marked asSimultaneously recording the corresponding target value O min and cost value B max, and returning the minimum cost valueIs marked asB min and O max were recorded simultaneously. [ B min,Bmax ] and [ O min,Omax ] can be obtained. To ensure toReturns the effective path for the underlying path extension, as represented by O max (i.e) As an upper bound to the target value, the worst case return path target value is O max, i.e.To expand the path of the basis. In the expansion stage, byAndThe key vertex path boundaries that make up share the following:
(1) Is not the air-conditioner and is not the air-conditioner, Not empty, and B min≤Bmax、Omin≤Omax, in path segment expansion, toTaking O max as the upper boundary of the target value,The local cost threshold in (a) constrains the expansion of each path segment.
(2)Is in the form of a hollow body,Not empty, path segment expansionTo be accurate.
(3)Is in the form of a hollow body,For null, it is indicated that the cost threshold is too low, and the keyword vertex path with the smallest reconstruction target value for improving the cost threshold is recorded asThe keyword vertex path with the minimum cost value is marked as
The keyword vertex path with the smallest target value, therefore, has the smallest target value O min, since the solution is being performedWhen the target value is minimum, the path cost value is required to meet the cost threshold value, so that the contractThe path cost value of (3) is B max.
The keyword vertex path with the smallest cost value has the smallest target value B min because the solution is being carried outWhen the target value cost value is minimum, there is no constraint on the target value, so the contractThe path cost value of (3) is O max.
And (3) the minimum target value of the whole road network diagram and the minimum cost value of the whole road network diagram are respectively represented by o min and b min.
From the above, it can be seen that in constructionAnd obtaining two keyword vertex paths with the minimum target value and the minimum cost value, and selecting a specific expansion basis according to the three conditions in the actual expansion process.
S400, dividing the keyword vertex paths, and expanding the divided paths in a segmented parallel manner. The method comprises the following specific steps:
And S410, setting parallelism, dividing the keyword vertex path into a plurality of sections, wherein all vertices except for the head vertex and the tail vertex in the keyword vertex path are keyword vertices, and dividing the path by using the keyword vertices.
All vertices in the keyword vertex path except the head and tail vertices are keyword vertices, so the path can be segmented by using the keyword vertices. As shown in fig. 4, the pathCan be divided into three sections v s→v1、v1→v3、v3→vt or two sections v s→v1→v3、v3→vt.
The principle of setting the parallelism is that the keyword vertexes in each path segment are uniform as much as possible so that the number of vertexes in each path segment is similar when expanding, thereby achieving the purpose of the approaching expanding time of each path segment.
And S420, carrying out parallel expansion on path segments, judging whether cost values of all paths from the initial point vertexes to the target vertexes of each path through the non-keyword vertexes are smaller than or equal to a local cost threshold value at the target vertexes when the path segments are expanded, if yes, carrying out feasible solution target value pruning strategy comparison applicable to parallel expansion, if yes, calculating global priority, and then queuing, otherwise, discarding.
Feasible solution target value pruning when obtaining a feasible path p i(pi∈pcand), taking the target value o (p i) as pruning condition, if the target value of the path p j(pj∈pcand) is larger than o (p i), namely o (p i)<o(pj), then p j is pruned.
Local cost threshold value-known keyword vertex pathTotal cost value for the pathVertex for any two adjacent keywordsAndWhen the path is expanded, the methodAs a slaveTo the point ofThe local cost threshold of the path expansion is calculated.
When the keyword vertex path is constructed, two constraint conditions of keyword full coverage and cost threshold are satisfied, and the local cost threshold of each path segment is recorded, so that the local cost threshold can be used for replacing global cost threshold constraint when each path segment is expanded in parallel. The local cost threshold value only constrains each path and is independent from other paths, so that the association degree between paths can be reduced. The judgment inequality of comparison of the viable solution target value pruning strategy is as follows:
Wherein, the AndRepresenting the vertices of the keywords,Is fromExpansion to v k,vk is fromTo the point ofA label of one vertex passing in the middle, if the constraint is satisfiedEnqueue continues expanding, otherwise discard

Claims (5)

1.一种关键词最优路径查询的分段并行拓展方法,其特征在于:包括以下步骤,1. A segmented parallel expansion method for keyword optimal path query, characterized by comprising the following steps: S100:计算查询图中任意两顶点间的最小代价值b(vi,vj)及最小代价值对应的目标值oσ(vi,vj)和最小目标值o(vi,vj)及最小目标值对应的代价值bτ(vi,vj),同时记录查询图G=(V,E)中的最小代价值bmin和最小目标值ominS100: Calculate the minimum cost value b(v i ,v j ) between any two vertices in the query graph, the target value o σ (v i ,v j ) corresponding to the minimum cost value, the minimum target value o(v i ,v j ) and the cost value b τ (v i ,v j ) corresponding to the minimum target value, and record the minimum cost value b min and the minimum target value o min in the query graph G=(V,E); S200:关键词倒排列表构建:将顶点的所有关键词信息{v.w1,v.w2,...}组合成非重合的关键词集合,构建形如{wi:vj}的关键词倒排列表,记录关键词对应的顶点;S200: constructing a keyword inverted list: combining all keyword information {vw 1 ,vw 2 ,...} of the vertices into a non-overlapping keyword set, constructing a keyword inverted list of the form { wi : vj }, and recording the vertices corresponding to the keywords; S300:跳过与查询关键词无关的顶点,构建目标值最小的关键词顶点路径记为以及代价值最小的关键词顶点路径记为 S300: Skip vertices that are not related to the query keyword and construct the keyword vertex path with the smallest target value, recorded as And the keyword vertex path with the smallest cost value is recorded as S400:对关键词顶点路径进行划分,并对划分后的路径分段并行拓展;步骤S400包括以下步骤,S400: dividing the keyword vertex path, and expanding the divided path in segments in parallel; step S400 includes the following steps: S410:设置并行度,将关键词顶点路径划分为多段,关键词顶点路径中除首尾顶点外的所有顶点都是关键词顶点,使用关键词顶点对路径进行分割;S410: setting the degree of parallelism, dividing the keyword vertex path into multiple segments, all vertices except the first and last vertices in the keyword vertex path are keyword vertices, and the keyword vertices are used to segment the path; 设置并行度的原则是使各个路径分段中的关键词顶点尽量均匀以便拓展时各个路径分段中的顶点数相近,从而达到各路径分段的拓展时间接近的目的;在实际并行时,可以根据关键词顶点路径的中的关键词顶点的个数或者通过衡量每段局部代价阈值的大小设置并行度;The principle of setting the degree of parallelism is to make the keyword vertices in each path segment as uniform as possible so that the number of vertices in each path segment is close during expansion, thereby achieving the purpose of close expansion time for each path segment; in actual parallelism, the degree of parallelism can be set according to the number of keyword vertices in the keyword vertex path or by measuring the size of the local cost threshold of each segment; S420:对路径分段进行并行拓展,在拓展路径分段时,判断从起始点顶点经过非关键词顶点到每段路径的目标顶点的所有路径的代价值是否小于等于目标顶点处的局部代价阈值,若是且不被近似支配,则进行适用并行拓展的可行解目标值剪枝策略对比,若符合判断条件,则计算全局优先度后入队,否则被舍弃;S420: Expand the path segments in parallel. When expanding the path segments, determine whether the cost values of all paths from the starting point vertex through the non-keyword vertex to the target vertex of each path segment are less than or equal to the local cost threshold at the target vertex. If so and not dominated by the approximation, compare the target value pruning strategy of feasible solutions applicable to parallel expansion. If the judgment condition is met, calculate the global priority and enter the queue, otherwise it will be discarded. 可行解目标值剪枝策略对比的判断不等式为:The judgment inequality for the comparison of feasible solution target value pruning strategies is: 其中,代表关键词顶点,是从拓展到vk,vk是从中间经过的一个顶点的标签;如果满足约束,则入队继续拓展,否则丢弃 in, and Represents the keyword vertex, is from Expand to v k , v k is from arrive The label of a vertex in the middle; if the constraint is satisfied, then Enter the team and continue to expand, otherwise discard 2.根据权利要求1所述的关键词最优路径查询的分段并行拓展方法,其特征在于:所述的步骤S300包括以下步骤,2. The segmented parallel expansion method for keyword optimal path query according to claim 1 is characterized in that: the step S300 comprises the following steps: S310:给定一个查询vs、vt和Δ分别表示查询起点、查询终点、查询路径需要覆盖的关键词集合和代价阈值,隐藏没有包含中任意一个查询关键词的顶点,保留起点、终点和关键词顶点,构造一个简化的关键词顶点查询图G′,在实现过程中,直接根据S200中的关键词倒排列表获取关键词顶点,然后加入查询起点和查询终点,形成查询图G′;查询图G′包含的信息有:(1)预处理阶段获取的任意两顶点间的权值b(vi,vj)、oσ(vi,vj)、o(vi,vj)、bτ(vi,vj)以及查询图中的最小代价值bmin、最小目标值omin;(2)查询起点、终点、关键词顶点;S310: Given a query v s 、v t and Δ represent the query starting point, query end point, and keyword set and cost threshold that the query path needs to cover, respectively. The vertex of any query keyword in S200 is retained, and the starting point, end point and keyword vertex are retained to construct a simplified keyword vertex query graph G′. In the implementation process, the keyword vertex is directly obtained according to the keyword inverted list in S200, and then the query starting point and query end point are added to form the query graph G′; the query graph G′ contains the following information: (1) the weights b(v i ,v j ), o σ (v i ,v j ), o(v i ,v j ), b τ (v i ,v j ) between any two vertices obtained in the preprocessing stage, as well as the minimum cost value b min and the minimum target value o min in the query graph; (2) the query starting point, end point and keyword vertex ; S320:在关键词顶点查询图G′中,将关键词顶点排列组合,通过广度优先搜索对其遍历,最终返回满足关键词全覆盖和代价阈值的其中包括目标值最小的关键词顶点路径记为代价值最小的关键词顶点路径记为 S320: In the keyword vertex query graph G′, the keyword vertices are arranged and combined, and they are traversed through breadth-first search, and finally the ones that meet the keyword full coverage and cost threshold are returned. in The path including the keyword vertex with the smallest target value is recorded as The keyword vertex path with the smallest cost value is recorded as 3.根据权利要求2所述的关键词最优路径查询的分段并行拓展方法,其特征在于:所述的步骤S320包括以下步骤,给定一个查询其中wi={vk,vm},wj={vx,vy};3. The segmented parallel expansion method for keyword optimal path query according to claim 2 is characterized in that: the step S320 comprises the following steps: given a query in w i ={v k , v m }, w j ={v x , v y }; S321:首先在vs处构建初始标签将其置入优先级队列;S321: First construct the initial label at vs Put it into the priority queue; S322:从起点开始向所有关键词顶点拓展,并构建从vs到各关键词顶点的标签并计算全局优先度入队,其中全局优先度越小,全局优先级越高;S322: Expand from the starting point to all keyword vertices and construct labels from vs to each keyword vertex And calculate the global priority and enter the queue, where the smaller the global priority, the higher the global priority; S323:选取全局优先级最高的标签出队,此时已经包含关键词wj,向尚未包含的关键词wi相关的顶点vi、vj拓展,构建新标签 并判断新标签是否满足代价阈值和近似支配条件,若满足则计算全局优先度后入队,否则被舍弃;S323: Select the label with the highest global priority Out of the team, at this time Already contains the keyword w j , to Expand the vertices v i and v j related to the keyword w i that has not yet been included and construct a new label And determine whether the new label meets the cost threshold and approximate dominance conditions. If so, it is added to the queue after calculating the global priority, otherwise it is discarded; S324:继续选取优先级别最高的标签拓展,重复过程S323,直至拓展至终点vt,选取满足关键词全覆盖、代价阈值和目标值最小的标签对应的路径记为 S324: Continue to select the label with the highest priority to expand, repeat the process S323 until the end point v t is reached, and select the path corresponding to the label that satisfies the full coverage of keywords, the minimum cost threshold and the target value, and record it as S325:继续选取优先级别最高的标签拓展,重复过程S323,直至拓展至终点vt,选取满足关键词全覆盖、代价阈值和代价值最小的标签对应的路径记为 S325: Continue to select the label with the highest priority to expand, repeat the process S323 until the end point v t is reached, and select the path corresponding to the label that meets the keyword full coverage, cost threshold and minimum cost value, and record it as 4.根据权利要求3所述的关键词最优路径查询的分段并行拓展方法,其特征在于:所述的步骤S322中,标签的全局优先度表示为从vs到vt的最小目标值为o(vs,vt),设置优先级因子β,1<β<2,该标签的全局优先度越小,则该标签的全局优先级越高。4. The segmented parallel expansion method for keyword optimal path query according to claim 3 is characterized in that: in the step S322, the tag The global priority is expressed as The minimum target value from vs to vt is o( vs , vt ), and the priority factor β is set, 1<β<2. The smaller the global priority of the label, the higher the global priority of the label. 5.根据权利要求4所述的关键词最优路径查询的分段并行拓展方法,其特征在于:所述的进行步骤S400前,对步骤S300中构建的目标值最小的关键词顶点路径以及代价值最小的关键词顶点路径进行选择,具体选择过程为:5. The segmented parallel expansion method for keyword optimal path query according to claim 4 is characterized in that: before the step S400, the keyword vertex path with the smallest target value constructed in step S300 is And the keyword vertex path with the smallest cost value Make a selection. The specific selection process is as follows: 在构建的过程中,返回的目标值最小的记为同时记录对应的目标值Omin和代价值Bmax;返回的代价值最小的记为同时记录Bmin和Omax;可得[Bmin,Bmax]及[Omin,Omax];为保证以为依据的路径拓展会返回有效路径,以Omax作为目标值的上界,即最坏情况下返回的路径目标值为Omax,即以为拓展依据的路径,由组成的关键词顶点路径边界共有以下几种情况;In construction In the process, the target value returned is the smallest Recorded as At the same time, the corresponding target value O min and cost value B max are recorded; the one with the smallest cost value is returned. Recorded as Record B min and O max at the same time; [B min , B max ] and [O min , O max ] can be obtained; to ensure The path expansion based on will return a valid path, with O max as the upper bound of the target value, that is, the path target value returned in the worst case is O max , that is, To expand the path based on and There are several situations for the composed keyword vertex path boundaries; 1)不为空,不为空,且Bmin≤Bmax、Omin≤Omax,则在路径分段并行拓展中,以为准,并以Omax作为目标值的上界,中的局部代价阈值约束每段路径拓展;1) is not null, is not empty, and B minB max , O min ≤ O max , then in the path segmentation parallel expansion, and O max is used as the upper limit of the target value. The local cost threshold in constrains the expansion of each path; 2)为空,不为空,路径分段拓展以为准;2) is empty, Not empty, the path segment is extended to shall prevail; 3)为空,为空,说明代价阈值太低,提高代价阈值重新构建目标值最小的关键词顶点路径记为以及代价值最小的关键词顶点路径记为 3) is empty, If it is empty, it means that the cost threshold is too low. Increase the cost threshold to reconstruct the keyword vertex path with the smallest target value and record it as And the keyword vertex path with the smallest cost value is recorded as
CN202210510076.3A 2022-05-11 2022-05-11 A segmented parallel expansion method for keyword optimal path query Active CN114817772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210510076.3A CN114817772B (en) 2022-05-11 2022-05-11 A segmented parallel expansion method for keyword optimal path query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210510076.3A CN114817772B (en) 2022-05-11 2022-05-11 A segmented parallel expansion method for keyword optimal path query

Publications (2)

Publication Number Publication Date
CN114817772A CN114817772A (en) 2022-07-29
CN114817772B true CN114817772B (en) 2025-06-27

Family

ID=82512945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210510076.3A Active CN114817772B (en) 2022-05-11 2022-05-11 A segmented parallel expansion method for keyword optimal path query

Country Status (1)

Country Link
CN (1) CN114817772B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120199246B (en) * 2025-05-26 2025-08-01 浙江云朵网科技股份有限公司 Intelligent robot question-answering method and system based on artificial intelligence

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8364629B2 (en) * 2009-07-02 2013-01-29 Palo Alto Research Center Incorporated Depth-first search for target value problems
CA3054432C (en) * 2015-04-30 2024-04-23 10353744 Canada Ltd. Method and device for searching for electronic transaction certificate, and network search engine
CN105868336B (en) * 2016-03-28 2019-11-29 哈尔滨工程大学 Spatial key word querying method in road network towards set
CN106446242B (en) * 2016-10-12 2019-10-25 太原理工大学 An Efficient Multi-keyword Matching Optimal Path Query Method
CN107944041B (en) * 2017-12-14 2021-11-09 成都雅骏新能源汽车科技股份有限公司 Storage structure optimization method of HDFS (Hadoop distributed File System)
CN109974732B (en) * 2019-03-28 2022-11-15 东北大学 Top-k multi-request path planning method based on semantic perception
CN110347977A (en) * 2019-06-28 2019-10-18 太原理工大学 A kind of news automated tag method based on LDA model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
关键词最优路径查询的分段拓展算法;刘蒙蒙 等;计算机工程;20210802;第第48卷卷(第第6期期);第79-88页 *

Also Published As

Publication number Publication date
CN114817772A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
Bauer et al. Combining hierarchical and goal-directed speed-up techniques for dijkstra's algorithm
CN105091889B (en) A kind of determination method and apparatus of hotspot path
CN106503789A (en) Loop-free shortest path searching method based on Di Jiesitela and minimax ant colony
TW202146852A (en) Route deviation quantification and vehicular route learning based thereon
CN110348636A (en) Path planning prediction method, device, equipment and computer readable storage medium
CN111783895B (en) Neural network-based travel plan recommendation method, device, computer equipment and storage medium
Lowalekar et al. Zone path construction (zac) based approaches for effective real-time ridesharing
Wu et al. Service network design for same-day delivery with hub capacity constraints
US20230196215A1 (en) Method for computing a set of itineraries from a departure location to an arrival location using cluster-based searching
CN108765180A (en) The overlapping community discovery method extended with seed based on influence power
Wang et al. Constrained route planning over large multi-modal time-dependent networks
CN106052692A (en) Shortest route planning and navigating method and system
CN113468293A (en) Road network Top-k path query method based on multi-keyword coverage
CN114817772B (en) A segmented parallel expansion method for keyword optimal path query
CN107332770A (en) One kind must be through a routed path system of selection
Delling et al. Faster batched shortest paths in road networks
Garcia et al. Hybrid approach for the public transportation time dependent orienteering problem with time windows
CN117633535A (en) Commuting mode identification method and device
CN114611668A (en) Vector representation learning method and system based on heterogeneous information network random walk
He et al. Exploring public transport transfer opportunities for pareto search of multicriteria journeys
CN112418563B (en) Journey planning method based on graph clustering and iterative local search
CN108521376B (en) A flow table design method based on attribute similarity in software-defined networks
Aliyan et al. Analysis and performance evaluation of various shortest path algorithms
CN109711633B (en) Public transport travel path planning and indexing method based on MapReduce
CN115547087B (en) Urban road network shortest path acquisition method based on two-stage method and direction induction and application

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant