CN107704451A - Semantic analysis based on grammar networks and Lucene - Google Patents
Semantic analysis based on grammar networks and Lucene
- Publication number
- CN107704451A (application number CN201710972496.2A)
- Authority
- CN
- China
- Prior art keywords
- lucene
- semantic analysis
- grammar networks
- rule
- grammar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a semantic analysis method based on grammar networks and Lucene, comprising: a) writing grammar-network syntax rules; b) building Lucene index files; c) adding a list of Lucene-search leaf-node rules to the parser; d) sentence matching: matching the specified leaf nodes according to the defined rules. The invention combines grammar networks with Lucene, which reduces workload and repetition in semantic analysis. Retrieval through Lucene improves retrieval speed and indexes variables well, so semantics can be parsed quickly and efficiently.
Description
Technical field
The invention relates to the field of natural language processing, and in particular to a method for semantic analysis that combines the ABNF semantic specification with the Lucene search engine.
Background
With the rise of artificial intelligence, natural language processing has become an important branch of the field. It mainly studies the theories and methods by which people communicate with computers in natural language, and rule-based grammar specifications remain the mainstream choice on the market. Both regular-expression matching and grammar networks require every grammar that may occur to be enumerated exhaustively, but for data such as video titles, actors and television channels, exhaustive enumeration as text is clearly unreasonable.
Current natural language understanding generally relies either on rule matching or on deep learning with neural networks. Deep learning requires long-term data collection and a large number of samples that must be labeled and trained with dictionaries; its accuracy is unstable and its analysis results can drift. In the short term, and especially at the initial stage, it is a poor fit, so developing rule matching together with neural networks is a sound and reasonable approach.
Summary of the invention
The invention overcomes the deficiencies of the prior art and provides a semantic analysis method, based on grammar networks and Lucene, that parses semantics quickly and efficiently.
In view of the above problems of the prior art, according to one aspect of this disclosure, the invention adopts the following technical solution:
A semantic analysis method based on grammar networks and Lucene, comprising:
a) writing grammar-network syntax rules;
b) building Lucene index files;
c) adding a list of Lucene-search leaf-node rules to the parser;
d) sentence matching: matching the specified leaf nodes according to the defined rules.
To better implement the invention, further technical solutions are:
According to one embodiment of the invention, the method further comprises performing incremental builds of the Lucene index files on a timer.
According to another embodiment of the invention, ABNF syntax rules are used.
According to another embodiment of the invention, an antlr grammar-specification parser is used in step c).
According to another embodiment of the invention, in step d) matching proceeds layer by layer through the nodes.
According to another embodiment of the invention, the index files include VIDEO, CATEGORY and MUSICTYPE.
The invention may also be as follows:
According to another embodiment of the invention, the antlr parser is used to parse into a tree structure, and Lucene query nodes are added at the leaf nodes.
According to another embodiment of the invention, Lucene builds the index files of the variables on the hard disk.
Compared with the prior art, one of the beneficial effects of the invention is:
The semantic analysis method of the invention combines grammar networks with Lucene, which reduces workload and repetition in semantic analysis. Retrieval through Lucene improves retrieval speed and indexes variables well, so semantics can be parsed quickly and efficiently.
Brief description of the drawings
To explain the embodiments of this specification or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below refer only to some embodiments of this specification; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the method steps based on grammar networks and the Lucene search engine according to one embodiment of the invention.
Fig. 2 is a schematic flowchart of rule matching according to one embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to embodiments, but its implementation is not limited to them.
As noted in the background section, rule matching can use regular-expression matching or grammar-network matching. Rule matching requires every possible grammar to be enumerated, so it suits scenarios with a narrow scope, such as seat-reservation systems and TV commands. In the video domain, a request such as "I want to watch a film by a certain actor and a certain director" would require the rules to enumerate every actor and every director. The workload is too large and the repetition too high, which is clearly unreasonable. One embodiment of the invention therefore uses the keyword actor in place of all actors, director in place of all directors, and video in place of all video titles. When the parser recognizes one of these keyword nodes, it searches for the corresponding variable with Lucene.
The steps include: 1. using Lucene to build, on the hard disk, index files for variables such as actor, director, video and music; 2. choosing a suitable ABNF parser to parse the grammar specification; 3. modifying the parser to add leaf nodes that are looked up through Lucene.
Modifying the antlr grammar-specification parser to add Lucene-search TerminalNode leaf nodes includes the following steps:
A. First, build a separate index file for each kind of variable data, such as actors, roles, songs and novels. Taking the video title index VIDEO as an example:
Video title: name = "song is without cease"
Video alias list: otherNames = ["song to song", "song sung for you", "song is continuous", "can not nothing by My god"]
Video category: category = "movie"
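The record layout in step A can be sketched in Python as follows. This is only an illustration under stated assumptions: `VideoDoc` and `VariableIndex` are hypothetical names, and the dictionary is a toy in-memory stand-in for an index file, not the Lucene API, which the source does not show.

```python
from dataclasses import dataclass, field


@dataclass
class VideoDoc:
    """One document in the hypothetical VIDEO index."""
    name: str
    other_names: list = field(default_factory=list)
    category: str = ""


class VariableIndex:
    """Toy in-memory stand-in for one index file (not the Lucene API)."""

    def __init__(self):
        self._by_term = {}  # lower-cased title or alias -> document

    def add(self, doc):
        # Index the main title and every alias so that either one finds the document.
        for term in [doc.name, *doc.other_names]:
            self._by_term[term.lower()] = doc

    def lookup(self, text):
        """Return the document whose title or alias equals `text`, else None."""
        return self._by_term.get(text.strip().lower())


video_index = VariableIndex()
video_index.add(VideoDoc(
    name="song is without cease",
    other_names=["song to song", "song sung for you", "song is continuous"],
    category="movie",
))
```

Looking up an alias such as `video_index.lookup("song to song")` returns the same document as looking up the main title, which is the reason the alias list is stored alongside the name.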
B. Then write the Lucene-search leaf nodes into the grammar-network rule document as rules. Taking video titles in the film and television domain as an example:
Rule_VIDEO_QUERY = ["I"] ["want to see"] VIDEO ["this portion"] [CATEGORY]
The Lucene-search leaf nodes VIDEO and CATEGORY represent the video title and the video category (such as film, variety show or TV series) respectively. When tree traversal reaches the VIDEO node, dedicated Lucene search code is executed, rather than the simple string-equality check used for literals such as "I" and "this portion".
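The difference between literal nodes and search-backed leaf nodes can be sketched as a small backtracking matcher. This is a simplified illustration, not the antlr parser or Lucene: the rule encoding, the `match` function and the `INDEXES` sets are assumptions made for this example, and the literals are the English glosses used above.

```python
# Each rule element is (kind, value, optional):
#   kind "lit": value is a tuple of alternative literal strings
#   kind "var": value is the name of an index to search
RULE_VIDEO_QUERY = [
    ("lit", ("I",), True),
    ("lit", ("want to see",), True),
    ("var", "VIDEO", False),
    ("lit", ("this portion",), True),
    ("var", "CATEGORY", True),
]

# Hypothetical stand-ins for the index files.
INDEXES = {
    "VIDEO": {"song to song", "song is without cease"},
    "CATEGORY": {"movie", "variety show", "tv series"},
}


def match(rule, text, slots=None):
    """Return the captured variables if `text` matches `rule`, else None."""
    slots = dict(slots or {})
    text = text.strip()
    if not rule:
        return slots if not text else None
    (kind, value, optional), rest = rule[0], rule[1:]
    candidates = []
    if kind == "lit":
        # Literal node: plain string comparison on a word boundary.
        for literal in value:
            tail = text[len(literal):]
            if text.lower().startswith(literal.lower()) and (not tail or tail[0].isspace()):
                candidates.append((None, tail))
    else:
        # Variable node: search the index, trying the longest phrase first.
        words = text.split()
        for end in range(len(words), 0, -1):
            phrase = " ".join(words[:end])
            if phrase.lower() in INDEXES[value]:
                candidates.append((phrase, " ".join(words[end:])))
    for captured, remaining in candidates:
        new_slots = dict(slots)
        if captured is not None:
            new_slots[value] = captured
        result = match(rest, remaining, new_slots)
        if result is not None:
            return result
    # An optional element may also be skipped entirely.
    return match(rest, text, slots) if optional else None
```

With these assumptions, `match(RULE_VIDEO_QUERY, "I want to see song to song this portion movie")` captures both VIDEO and CATEGORY, while an unknown title fails at the required VIDEO node.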
To address the problem raised in the background, that exhaustively listing data such as television channels as text is clearly unreasonable, this embodiment combines grammar networks with Lucene. The regular part is handled by the grammar network, for example:
["I"] ["want to"/"will"] ["see"/"play"] [DIGIT "portion"/"individual"] actor ["starring"] [" "] ["film"/"TV series"]
This rule matches any of the following sentence patterns:
I want to see XXX; I will see XXX; play an XXX; films starring XXX; see XXX's TV series; ...
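A rough count shows why the fixed part and the variable part are split. The sketch below assumes each bracketed group is optional, counts DIGIT as a single placeholder, and uses a made-up actor count; none of these numbers come from the source.

```python
from math import prod

# Alternatives inside each optional group of the rule above, in order:
# ["I"], ["want to"/"will"], ["see"/"play"], [DIGIT "portion"/"individual"],
# ["starring"], [" "], ["film"/"TV series"].
# DIGIT is counted as a single placeholder, which is a simplification.
ALTERNATIVES_PER_GROUP = [1, 2, 2, 2, 1, 1, 2]

# Each optional group contributes its alternatives plus the "absent" case.
FIXED_PATTERNS = prod(n + 1 for n in ALTERNATIVES_PER_GROUP)


def exhaustive_entries(num_actors):
    """Sentences to write out if every actor name were listed in the rule text."""
    return FIXED_PATTERNS * num_actors


print(FIXED_PATTERNS)              # 648
print(exhaustive_entries(10_000))  # 6480000
```

Under these assumptions a single rule covers 648 fixed patterns, and the actor names live in one index instead of being multiplied into every pattern.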
The variable content XXX is handled by Lucene. As a full-text search tool, Lucene can retrieve the matching variable entity very quickly once the index has been built. Lucene is used because the index it builds can be stored on the local hard disk, so lookups are much faster than against a database such as mysql or mongo, or a caching system such as redis.
ABNF provides the standard semantic specification, the antlr parser parses it into a tree structure, and Lucene query nodes are added at the leaf nodes to provide search over variable data such as actors, film titles, song titles and directors.
As shown in Fig. 1, one embodiment of a concrete semantic-parsing workflow based on grammar networks and the Lucene search engine proceeds as follows:
A) Write the grammar-network syntax rules according to requirements, as follows:
Rule_VIDEO_QUERY = ["I"] ["want to see"] VIDEO ["this portion"] [CATEGORY]
Rule_MUSIC_QUERY = ["give"] ["this baby"] ["next"] ["head"] MUSICTYPE ["type"] [" "] ("sings"/"song");
B) After the program starts, perform incremental builds of the Lucene index files on a timer, as follows:
create or update the index files VIDEO, CATEGORY and MUSICTYPE;
C) Add rule nodes such as VIDEO, CATEGORY and MUSICTYPE to the parser;
D) Sentence matching: match the specified leaf nodes according to the defined rules.
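Step B), the timed incremental build, might look like the following sketch. The source does not say how changes are detected, so the `updated_at` timestamp, the record keys and the `IncrementalIndexer` class are all assumptions; the dictionary again stands in for a Lucene index file.

```python
class IncrementalIndexer:
    """Keeps one index current by adding only records changed since the last run."""

    def __init__(self):
        self.terms = {}        # lower-cased name -> record id
        self.last_build = 0.0  # timestamp of the previous build

    def refresh(self, records, now):
        """Index records whose `updated_at` is newer than the last build.

        `records` is an iterable of dicts with the hypothetical keys
        "id", "names" and "updated_at". Returns the number of records indexed.
        """
        changed = [r for r in records if r["updated_at"] > self.last_build]
        for record in changed:
            for name in record["names"]:
                self.terms[name.lower()] = record["id"]
        self.last_build = now
        return len(changed)


indexer = IncrementalIndexer()
records = [{"id": 1, "names": ["Song to Song"], "updated_at": 10.0}]
first_run = indexer.refresh(records, now=20.0)

# A record added after the first build is the only one picked up next time.
records.append({"id": 2, "names": ["Jazz", "Jazz music"], "updated_at": 25.0})
second_run = indexer.refresh(records, now=30.0)
```

In a running service `refresh` would be called periodically, for example from a scheduler thread, once per index file (VIDEO, CATEGORY, MUSICTYPE).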
Fig. 2 shows the rule-matching flowchart: matching proceeds layer by layer through the nodes, and the figure shows in detail the complete matching process for one rule.
In summary, in the semantic analysis method of the invention based on grammar networks and Lucene: a) grammar networks define the syntax rule structure; b) multiple variable modules are built, and Lucene retrieves the correct string match; c) the leaf-node rules of the variable modules are added to the grammar-network parser; d) within a specific domain the syntax rules are fairly limited, but the noun variables cannot be enumerated exhaustively, whereas Lucene offers high retrieval speed and indexes these variables well. Combining the two parses semantics quickly and efficiently.
References in this specification to "one embodiment", "another embodiment", "an embodiment" and the like mean that a specific feature, structure or characteristic described in connection with that embodiment is included in at least one embodiment generally described in this application. The same expression appearing in several places in the specification does not necessarily refer to the same embodiment. Furthermore, when a specific feature, structure or characteristic is described in connection with any embodiment, it is claimed that realizing that feature, structure or characteristic in combination with other embodiments also falls within the scope of the invention.
Although the invention has been described here with reference to several illustrative embodiments, it should be understood that those skilled in the art can devise many other modifications and embodiments that fall within the scope and spirit of the principles disclosed in this application. More specifically, within the scope of the disclosure and the claims, various variations and improvements can be made to the component parts and/or arrangements of the subject combination. In addition to variations and improvements in the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.
Claims (8)
- 1. A semantic analysis method based on grammar networks and Lucene, characterized by comprising: a) writing grammar-network syntax rules; b) building Lucene index files; c) adding a list of Lucene-search leaf-node rules to the parser; d) sentence matching: matching the specified leaf nodes according to the defined rules.
- 2. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized by further comprising: performing incremental builds of the Lucene index files on a timer.
- 3. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that ABNF syntax rules are used.
- 4. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that an antlr grammar-specification parser is used in step c).
- 5. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that in step d) matching proceeds layer by layer through the nodes.
- 6. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that the index files include: VIDEO, CATEGORY, MUSICTYPE.
- 7. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that the antlr parser is used to parse into a tree structure, and Lucene query nodes are added at the leaf nodes.
- 8. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that Lucene builds the index files of the variables on the hard disk.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710972496.2A, published as CN107704451A (en) | 2017-10-18 | 2017-10-18 | Semantic analysis based on grammar networks and Lucene |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN107704451A (en) | 2018-02-16 |
Family ID: 61181571
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710972496.2A (Pending), published as CN107704451A (en) | Semantic analysis based on grammar networks and Lucene | 2017-10-18 | 2017-10-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107704451A (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1512484A (en) * | 2002-12-27 | 2004-07-14 | 联想(北京)有限公司 | Organizing and identifying method for natural language |
| US20080052663A1 (en) * | 2006-07-17 | 2008-02-28 | Rod Cope | Project extensibility and certification for stacking and support tool |
| CN102243647A (en) * | 2010-05-11 | 2011-11-16 | 微软公司 | Extracting higher-order knowledge from structured data |
| CN106547516A (en) * | 2016-11-01 | 2017-03-29 | 航天恒星科技有限公司 | Spacecraft telecommand instructs upload control method and device |
Non-Patent Citations (1)
| Title |
|---|
| 杜红芳: "个人数据空间管理系统查询与索引机制的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109271459A (en) * | 2018-09-18 | 2019-01-25 | 四川长虹电器股份有限公司 | Chat robots and its implementation based on Lucene and grammer networks |
| CN112133303A (en) * | 2020-09-16 | 2020-12-25 | 四川长虹电器股份有限公司 | Method for realizing one set of system supporting multi-brand intelligent sound box semantic instruction |
| CN116913262A (en) * | 2023-07-24 | 2023-10-20 | 重庆赛力斯新能源汽车设计院有限公司 | Methods, devices, electronic equipment and readable storage media for in-vehicle semantic understanding |
| CN116913262B (en) * | 2023-07-24 | 2025-06-03 | 重庆赛力斯凤凰智创科技有限公司 | Method, device, electronic device and readable storage medium for vehicle-mounted semantic understanding |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180216 |