
CN107704451A - Semantic analysis based on grammar networks and Lucene - Google Patents


Info

Publication number
CN107704451A
Authority
CN
China
Prior art keywords
lucene
semantic analysis
grammar networks
rule
grammar
Prior art date
Legal status
Pending
Application number
CN201710972496.2A
Other languages
Chinese (zh)
Inventor
周红
刘楚雄
Current Assignee
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date
2017-10-18
Filing date
2017-10-18
Publication date
2018-02-16
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201710972496.2A
Publication of CN107704451A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semantic analysis method based on grammar networks and Lucene, comprising: a) writing grammar-network syntax rules; b) building Lucene index files; c) adding Lucene-search leaf-node rules to the parser; d) statement matching, in which the specified leaf nodes are matched according to the defined rules. By combining grammar networks with Lucene, the invention reduces workload and repetition in semantic analysis. Retrieval through Lucene improves retrieval speed and indexes variables well, so semantics can be parsed quickly and efficiently.

Description

Semantic analysis based on grammar networks and Lucene
Technical field
The present invention relates to the field of natural language processing, and in particular to a method that performs semantic analysis using ABNF grammar specifications and the Lucene search engine.
Background
With the rise of artificial intelligence, natural language processing has become an important direction of the field. It mainly studies the theories and methods by which people communicate with computers in natural language, and regularized grammar specifications remain the mainstream choice on the market. Whether regular-expression matching or grammar networks are used, every grammar that might occur has to be enumerated exhaustively; for data such as video names, actors and television channels, exhaustive enumeration as text is clearly unreasonable.
Current natural language understanding generally relies on rule matching or on neural-network deep learning. Deep learning needs long-term data collection and a large number of samples for training and dictionary annotation; its precision is unstable and its analysis results can drift. It is therefore unsuitable in the short term, and especially in the initial stage, so developing rule matching alongside neural networks is a robust and reasonable approach.
Summary of the invention
The present invention overcomes the deficiencies of the prior art by providing a semantic analysis method based on grammar networks and Lucene that parses semantics quickly and efficiently.
In view of the above problems of the prior art, according to one aspect of the disclosure, the present invention adopts the following technical scheme:
A semantic analysis method based on grammar networks and Lucene, comprising:
a) writing grammar-network syntax rules;
b) building Lucene index files;
c) adding Lucene-search leaf-node rules to the parser;
d) statement matching: matching the specified leaf nodes according to the defined rules.
To better implement the present invention, further technical schemes are as follows:
According to one embodiment of the invention, the method further comprises periodically building the Lucene index files incrementally.
According to another embodiment of the invention, ABNF syntax rules are used.
According to another embodiment of the invention, the ANTLR grammar parser is used in step c).
According to another embodiment of the invention, in step d) matching proceeds layer by layer according to the nodes.
According to another embodiment of the invention, the index files include VIDEO, CATEGORY and MUSICTYPE.
The present invention may also be implemented as follows:
According to another embodiment of the invention, the ANTLR parser parses the input into a tree, and Lucene query nodes are added at the leaf nodes.
According to another embodiment of the invention, Lucene builds the index files of the variables on the hard disk.
Compared with the prior art, the beneficial effects of the present invention include:
The semantic analysis method of the present invention combines grammar networks with Lucene, which reduces workload and repetition in semantic analysis. Retrieval through Lucene improves retrieval speed and indexes variables well, so semantics can be parsed quickly and efficiently.
Brief description of the drawings
To describe the embodiments of this specification or the technical schemes of the prior art more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of this specification; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the method steps based on a grammar network and the Lucene search engine, according to one embodiment of the invention.
Fig. 2 is a schematic flowchart of rule matching according to one embodiment of the invention.
Embodiment
The present invention is described in further detail below with reference to embodiments, but implementations of the invention are not limited thereto.
As mentioned in the background section, rule matching can use regular-expression matching or grammar-network matching. Rule matching needs every possible grammar to be enumerated, so it suits narrow scenarios such as seat-reservation systems and TV commands. In the video domain, a request such as "I want to see a film with a certain actor and a certain director" would require every actor and every director to be enumerated in the rules; the workload is too large and the repetition too high, which is clearly unreasonable. Therefore, in one embodiment of the invention, the keyword actor replaces all actors, director replaces all directors, and video replaces all video names; when the parser recognizes one of these keyword nodes, it searches for the corresponding variable with Lucene.
The steps are: 1. use Lucene to build index files on the hard disk for variables such as actor, director, video and music; 2. choose a suitable ABNF parser to parse the grammar specification; 3. modify the parser to add leaf nodes that perform Lucene lookups.
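Step 1 can be pictured with a toy inverted index. The sketch below is not Lucene: it is a minimal Python stand-in, assuming punctuation/whitespace tokenization and a JSON file in place of Lucene's on-disk index files. The class name `VariableIndex`, the variable-type strings and the sample names are hypothetical. Chinese input would need a Chinese-aware analyzer, which this tokenizer does not provide.

```python
import json
import os
import re
import tempfile
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters; a crude stand-in for a Lucene analyzer."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class VariableIndex:
    """One inverted index per variable type (e.g. "actor", "director", "video", "music")."""

    def __init__(self):
        # variable type -> token -> set of full values containing that token
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, var_type, value):
        for token in tokenize(value):
            self.postings[var_type][token].add(value)

    def search(self, var_type, text):
        """Return the values of `var_type` containing every token of `text` (AND query)."""
        table = self.postings.get(var_type, {})
        result = None
        for token in tokenize(text):
            hits = table.get(token, set())
            result = set(hits) if result is None else result & hits
        return result or set()

    def save(self, path):
        """Write the postings to a JSON file, standing in for on-disk index files."""
        data = {vt: {tok: sorted(vals) for tok, vals in table.items()}
                for vt, table in self.postings.items()}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        index = cls()
        with open(path, encoding="utf-8") as f:
            for var_type, table in json.load(f).items():
                for token, values in table.items():
                    index.postings[var_type][token].update(values)
        return index


if __name__ == "__main__":
    index = VariableIndex()
    index.add("actor", "Jane Example")  # hypothetical sample data
    index.add("actor", "John Example")
    index.add("director", "Ann Sample")
    path = os.path.join(tempfile.mkdtemp(), "variables.json")
    index.save(path)
    restored = VariableIndex.load(path)
    print(sorted(restored.search("actor", "example")))  # ['Jane Example', 'John Example']
    print(restored.search("actor", "jane example"))      # {'Jane Example'}
```

Keeping one posting table per variable type means an actor query never returns a director, which mirrors the separate per-variable index files described above.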
Modifying the ANTLR grammar parser to add TerminalNode leaf nodes that perform Lucene searches includes:
A. First, build a file-structured index for each kind of variable data, such as actors, roles, songs and novels. Taking the video name VIDEO as an example:
Video name: Name = "song is without cease"
Video alias list: otherNames = ["song to song", "song sung for you", "song is continuous", "can not nothing by My god"]
Video category: Category = "movie"
B. Then write the Lucene-search leaf nodes into the grammar-network rule file as rules. Taking video names in the film and TV domain as an example:
Rule_VIDEO_QUERY = ["I"] ["want to see"] VIDEO ["this"] [CATEGORY]
The Lucene-search leaf nodes VIDEO and CATEGORY represent the video name and the video category (e.g., film, variety show, TV series) respectively. When parsing reaches the VIDEO node, dedicated Lucene search code is executed, instead of the simple string-equality check used for literals such as "I" and "this".
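A minimal sketch of the two kinds of leaf described in A and B, assuming a plain dictionary stands in for the Lucene query. `VideoDoc`, `VideoIndex`, `LiteralLeaf`, `SearchLeaf` and `max_words` are hypothetical names. A literal leaf accepts only an exact token match, while a search leaf tries candidate spans (longest first) against an index that maps every title and alias to one document, so an alias such as "song to song" resolves to the same entry as the main Name.

```python
from dataclasses import dataclass, field


def normalize(text):
    """Case-fold and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(text.lower().split())


@dataclass
class VideoDoc:
    # Mirrors the example fields: Name, otherNames, Category.
    name: str
    other_names: list = field(default_factory=list)
    category: str = ""


class VideoIndex:
    """Maps every normalized title and alias to its document (stand-in for a Lucene query)."""

    def __init__(self):
        self.by_key = {}

    def add(self, doc):
        for title in [doc.name, *doc.other_names]:
            self.by_key[normalize(title)] = doc

    def lookup(self, text):
        return self.by_key.get(normalize(text))


class LiteralLeaf:
    """Leaf such as "I" or "this": plain string equality on the next tokens."""

    def __init__(self, text):
        self.words = normalize(text).split()

    def ends(self, tokens, start):
        stop = start + len(self.words)
        return [(stop, None)] if tokens[start:stop] == self.words else []


class SearchLeaf:
    """Leaf such as VIDEO: tries spans starting at `start`, longest first, against the index."""

    def __init__(self, index, max_words=6):
        self.index, self.max_words = index, max_words

    def ends(self, tokens, start):
        found = []
        for stop in range(min(len(tokens), start + self.max_words), start, -1):
            doc = self.index.lookup(" ".join(tokens[start:stop]))
            if doc is not None:
                found.append((stop, doc))
        return found


if __name__ == "__main__":
    videos = VideoIndex()
    videos.add(VideoDoc("song is without cease", ["song to song", "song sung for you"], "movie"))
    tokens = normalize("I want to see Song to Song").split()
    print(LiteralLeaf("want to see").ends(tokens, 1))  # [(4, None)]
    stop, doc = SearchLeaf(videos).ends(tokens, 4)[0]
    print(stop, doc.name, doc.category)  # 7 song is without cease movie
```

Both leaves return `(end_position, captured_value)` pairs, so a parser can treat them uniformly and backtrack over several candidate spans.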
To address the problem raised in the background, that exhaustively enumerating data such as television channels as text is clearly unreasonable, this embodiment combines grammar networks with Lucene, with the rule part handled by the grammar network. For example:
["I"] ["would like"/"want"] ["to watch"/"give me"] [DIGIT "portion"/"individual"] actor ["starring"] [" "] ["film"/"TV series"]
This rule can match any of the following clauses:
I'd like to watch XXX, I want to watch XXX, give me an XXX, films starring XXX, watch XXX's TV series, ...
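The arithmetic behind using a slot instead of enumeration can be checked in a few lines. This sketch assumes a hypothetical English rule loosely modeled on the one above (alternatives listed per position, "" marking an optional position) and a hypothetical catalogue of 10,000 actors; it only enumerates and counts surface forms, it does not match input.

```python
from itertools import product

# Hypothetical rule: each position lists its alternatives, "" marks an optional
# position, and "ACTOR" is a variable slot that stays symbolic.
RULE = [
    ["", "i"],
    ["", "want", "would like"],
    ["", "to see", "to watch"],
    ["ACTOR"],
    ["", "starring"],
    ["", "film", "tv series"],
]


def expand(rule):
    """Enumerate every surface form the rule accepts, keeping slots symbolic."""
    for combo in product(*rule):
        yield " ".join(part for part in combo if part)


def count_forms(rule, slot_sizes=None):
    """Count surface forms; a slot counts as slot_sizes[slot] alternatives if given."""
    slot_sizes = slot_sizes or {}
    total = 1
    for alternatives in rule:
        total *= sum(slot_sizes.get(alt, 1) for alt in alternatives)
    return total


if __name__ == "__main__":
    forms = list(expand(RULE))
    print(len(forms))                            # 108
    print(count_forms(RULE, {"ACTOR": 10_000}))  # 1080000
```

With the slot kept symbolic the rule has 2 × 3 × 3 × 1 × 2 × 3 = 108 patterns; writing actor names inline multiplies that by the catalogue size, which is the workload the text argues against.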
The variable part XXX is handled by Lucene. Once Lucene has built its index, it serves as a full-text search tool that can retrieve the matching variable entities very quickly. Lucene is chosen because the index it builds can be stored on the local hard disk, so searching it is much faster than searching data stored in a database such as MySQL or MongoDB, or in a caching system such as Redis.
ABNF provides the standard semantic specification; the ANTLR parser parses input into a tree, and Lucene query nodes are added at the leaf nodes to search variable data such as actors, film titles, song titles and directors.
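The embodiment uses an ANTLR-based parser for its ABNF-style rules. As a dependency-free illustration only (not ANTLR and not a full RFC 5234 parser), the recursive-descent sketch below turns the bracket notation used in this document into a small tree: `[...]` optional, `(...)` grouping, `/` alternation, quoted literals, and bare identifiers such as VIDEO as index-backed slots. It follows ABNF precedence, where concatenation binds tighter than `/`, so `[DIGIT "x"/"y"]` groups as `(DIGIT "x") / "y"`. All function names and the tuple node format are hypothetical.

```python
import re

# One token per match: a quoted literal, an identifier (slot or rule name), or punctuation.
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|([A-Za-z_][A-Za-z0-9_]*)|([\[\]()/=;]))')
# Tokens that end a concatenation.
STOP = {None, ("punct", "/"), ("punct", "]"), ("punct", ")"), ("punct", ";")}


class RuleSyntaxError(ValueError):
    pass


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise RuleSyntaxError(f"unexpected input at {pos}: {text[pos:pos + 10]!r}")
        literal, ident, punct = m.groups()
        if literal is not None:
            tokens.append(("lit", literal.strip().lower()))
        elif ident is not None:
            tokens.append(("ident", ident))
        else:
            tokens.append(("punct", punct))
        pos = m.end()
    return tokens


class RuleParser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok is None or tok[0] != kind or (value is not None and tok[1] != value):
            raise RuleSyntaxError(f"expected {value or kind}, got {tok}")
        self.i += 1
        return tok[1]

    def alternation(self):
        # Lowest precedence: "/" separates concatenations.
        options = [self.concatenation()]
        while self.peek() == ("punct", "/"):
            self.i += 1
            options.append(self.concatenation())
        return options[0] if len(options) == 1 else ("alt", options)

    def concatenation(self):
        items = []
        while self.peek() not in STOP:
            items.append(self.element())
        if not items:
            raise RuleSyntaxError(f"empty element before {self.peek()}")
        return items[0] if len(items) == 1 else ("seq", items)

    def element(self):
        kind, value = self.peek()
        self.i += 1
        if (kind, value) == ("punct", "["):
            inner = self.alternation()
            self.expect("punct", "]")
            return ("opt", inner)
        if (kind, value) == ("punct", "("):
            inner = self.alternation()
            self.expect("punct", ")")
            return inner
        if kind == "lit":
            return ("lit", value)
        if kind == "ident":
            return ("slot", value)  # e.g. VIDEO: resolved by an index lookup at match time
        raise RuleSyntaxError(f"unexpected token {value!r}")


def parse_rule(text):
    """Parse 'Name = body' (optionally ending in ';') into (name, tree)."""
    parser = RuleParser(tokenize(text))
    name = parser.expect("ident")
    parser.expect("punct", "=")
    tree = parser.alternation()
    if parser.peek() == ("punct", ";"):
        parser.i += 1
    if parser.peek() is not None:
        raise RuleSyntaxError(f"trailing tokens: {parser.tokens[parser.i:]}")
    return name, tree


if __name__ == "__main__":
    name, tree = parse_rule('Rule_VIDEO_QUERY = ["I"] ["want to see"] VIDEO ["this"] [CATEGORY]')
    print(name)  # Rule_VIDEO_QUERY
    print(tree)
```

Slots stay as `("slot", NAME)` leaves in the tree, which is where an index lookup would be attached, in the spirit of adding Lucene query nodes at the leaves.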
As shown in Fig. 1, one embodiment of a concrete semantic-parsing workflow based on grammar networks and the Lucene search engine is as follows:
a) Write grammar-network syntax rules according to requirements, for example:
Rule_VIDEO_QUERY = ["I"] ["want to see"] VIDEO ["this"] [CATEGORY]
Rule_MUSIC_QUERY = ["for"] ["me"] ["play"] ["a"] MUSICTYPE ["type"] [" "] ("song"/"tune");
b) After the program starts, incrementally build the Lucene index files at regular intervals:
create or update the index files VIDEO, CATEGORY and MUSICTYPE;
c) add rule nodes such as VIDEO, CATEGORY and MUSICTYPE to the parser;
d) statement matching: match the specified leaf nodes according to the defined rules.
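Step b) can be sketched as a loop that re-indexes only the records changed since the previous run. This is a minimal sketch under assumptions: `fetch_changed` is a hypothetical data-source callback, plain dictionaries stand in for the Lucene index files VIDEO, CATEGORY and MUSICTYPE, and the interval is arbitrary. With real Lucene, the upsert would correspond to replacing a document keyed on an ID term, as `IndexWriter.updateDocument` does.

```python
import threading
import time


class IncrementalIndexer:
    """Re-index only the records that changed since the previous run.

    `fetch_changed(since)` is a hypothetical callable returning a list of
    (record_id, index_name, text, updated_at) tuples with updated_at > since.
    Plain dicts stand in for Lucene index files such as VIDEO or MUSICTYPE.
    """

    def __init__(self, fetch_changed):
        self.fetch_changed = fetch_changed
        self.last_run = 0.0
        self.indexes = {}  # index name -> {record_id: text}

    def run_once(self):
        changed = self.fetch_changed(self.last_run)
        for record_id, index_name, text, updated_at in changed:
            # Upsert: a newer version of a record replaces the old one.
            self.indexes.setdefault(index_name, {})[record_id] = text
            self.last_run = max(self.last_run, updated_at)
        return len(changed)

    def start(self, interval_seconds):
        """Call run_once every `interval_seconds` on a background daemon thread."""
        def loop():
            while True:
                self.run_once()
                time.sleep(interval_seconds)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread


if __name__ == "__main__":
    records = [("v1", "VIDEO", "song to song", 10.0), ("c1", "CATEGORY", "movie", 11.0)]
    indexer = IncrementalIndexer(lambda since: [r for r in records if r[3] > since])
    print(indexer.run_once())  # 2
    print(indexer.run_once())  # 0: nothing changed since the last run
```

Using the largest `updated_at` seen as the next cutoff avoids rescanning old records, but a record stamped exactly at the cutoff after a run would be missed by the strict `>` comparison, so a production version needs a tie-breaking rule. Coordination with concurrent readers is also omitted here.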
Fig. 2 shows the rule-matching flowchart: matching proceeds layer by layer according to the nodes, and the figure illustrates in detail the complete matching process for one rule.
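The layer-by-layer matching of Fig. 2 can be approximated with a backtracking matcher over a rule tree. Assumptions throughout this sketch: the trees are written by hand in a tuple format (`lit`, `slot`, `opt`, `alt`, `seq`), the English literals are hypothetical glosses of the rules above, and Python sets stand in for the Lucene lookups at the VIDEO, CATEGORY and MUSICTYPE leaves. Each layer delegates to its children and only the leaves touch the input; the result is the matching rule's name plus the slot values.

```python
# Hypothetical stand-ins for the Lucene indexes behind each slot leaf.
LOOKUPS = {
    "VIDEO": {"song to song"},
    "CATEGORY": {"movie", "tv series"},
    "MUSICTYPE": {"rock", "lullaby"},
}

# Hand-built trees: ("opt", x) is optional, ("alt", [...]) is "/", ("slot", NAME) is a search leaf.
RULES = {
    "Rule_VIDEO_QUERY": ("seq", [
        ("opt", ("lit", "i")), ("opt", ("lit", "want to see")),
        ("slot", "VIDEO"), ("opt", ("lit", "this")), ("opt", ("slot", "CATEGORY")),
    ]),
    "Rule_MUSIC_QUERY": ("seq", [
        ("opt", ("lit", "play")), ("opt", ("lit", "a")), ("slot", "MUSICTYPE"),
        ("opt", ("lit", "type")), ("alt", [("lit", "song"), ("lit", "tune")]),
    ]),
}


def match(node, tokens, pos, lookups):
    """Yield (end, bindings) for every way `node` can match tokens starting at `pos`."""
    kind = node[0]
    if kind == "lit":
        words = node[1].split()
        if tokens[pos:pos + len(words)] == words:
            yield pos + len(words), {}
    elif kind == "slot":
        # Leaf backed by a search: try the longest candidate span first.
        for end in range(len(tokens), pos, -1):
            phrase = " ".join(tokens[pos:end])
            if phrase in lookups.get(node[1], ()):
                yield end, {node[1]: phrase}
    elif kind == "opt":
        yield from match(node[1], tokens, pos, lookups)
        yield pos, {}  # the optional element is skipped
    elif kind == "alt":
        for child in node[1]:
            yield from match(child, tokens, pos, lookups)
    elif kind == "seq":
        yield from match_seq(node[1], tokens, pos, lookups)
    else:
        raise ValueError(f"unknown node kind {kind!r}")


def match_seq(children, tokens, pos, lookups):
    """Match children one layer down, left to right, backtracking on failure."""
    if not children:
        yield pos, {}
        return
    for mid, left in match(children[0], tokens, pos, lookups):
        for end, right in match_seq(children[1:], tokens, mid, lookups):
            yield end, {**left, **right}


def parse_sentence(rules, sentence, lookups):
    """Return (rule_name, slot_values) for the first rule matching the whole sentence."""
    tokens = sentence.lower().split()
    for rule_name, tree in rules.items():
        for end, bindings in match(tree, tokens, 0, lookups):
            if end == len(tokens):
                return rule_name, bindings
    return None


if __name__ == "__main__":
    print(parse_sentence(RULES, "I want to see song to song this movie", LOOKUPS))
    # ('Rule_VIDEO_QUERY', {'VIDEO': 'song to song', 'CATEGORY': 'movie'})
    print(parse_sentence(RULES, "play a lullaby song", LOOKUPS))
    # ('Rule_MUSIC_QUERY', {'MUSICTYPE': 'lullaby'})
```

Trying rules in order and returning the first full-length match is the simplest policy; ranking competing rules is outside this sketch.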
In summary, the semantic analysis method of the invention based on grammar networks and Lucene: a) defines the syntax-rule structure with a grammar network; b) builds multiple variable modules and retrieves the correct string matches with Lucene; c) adds leaf-node rules for the variable modules to the grammar-network parser; d) within a specific domain the syntax rules are relatively limited, but the noun variables cannot be enumerated exhaustively, whereas Lucene offers high retrieval speed and indexes these variables well, so combining the two parses semantics quickly and efficiently.
References in this specification to "one embodiment", "another embodiment", "an embodiment" and the like mean that the specific feature, structure or characteristic described in connection with that embodiment is included in at least one embodiment described generally in this application. The appearance of the same phrase in several places in the specification does not necessarily refer to the same embodiment. Furthermore, when a specific feature, structure or characteristic is described in connection with any embodiment, implementing that feature, structure or characteristic in connection with other embodiments is also claimed to fall within the scope of the invention.
Although the invention has been described herein with reference to several illustrative embodiments, it should be understood that those skilled in the art can devise many other modifications and embodiments that fall within the scope and spirit of the principles disclosed in this application. More specifically, various variations and modifications can be made to the component parts and/or arrangements of the subject combination within the scope of the disclosure and the claims. In addition to variations and modifications of the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.

Claims (8)

  1. A semantic analysis method based on grammar networks and Lucene, characterized by comprising:
    a) writing grammar-network syntax rules;
    b) building Lucene index files;
    c) adding Lucene-search leaf-node rules to the parser;
    d) statement matching: matching the specified leaf nodes according to the defined rules.
  2. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized by further comprising: periodically building the Lucene index files incrementally.
  3. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that ABNF syntax rules are used.
  4. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that the ANTLR grammar parser is used in step c).
  5. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that, in step d), matching proceeds layer by layer according to the nodes.
  6. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that the index files include VIDEO, CATEGORY and MUSICTYPE.
  7. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that the ANTLR parser parses the input into a tree and Lucene query nodes are added at the leaf nodes.
  8. The semantic analysis method based on grammar networks and Lucene according to claim 1, characterized in that Lucene builds the index files of the variables on the hard disk.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710972496.2A CN107704451A (en) 2017-10-18 2017-10-18 Semantic analysis based on grammar networks and Lucene

Publications (1)

Publication Number Publication Date
CN107704451A (en) 2018-02-16

Family

ID=61181571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710972496.2A Pending CN107704451A (en) 2017-10-18 2017-10-18 Semantic analysis based on grammar networks and Lucene

Country Status (1)

Country Link
CN (1) CN107704451A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1512484A (en) * 2002-12-27 2004-07-14 联想(北京)有限公司 Organizing and identifying method for natural language
US20080052663A1 (en) * 2006-07-17 2008-02-28 Rod Cope Project extensibility and certification for stacking and support tool
CN102243647A (en) * 2010-05-11 2011-11-16 微软公司 Extracting higher-order knowledge from structured data
CN106547516A (en) * 2016-11-01 2017-03-29 航天恒星科技有限公司 Spacecraft telecommand instructs upload control method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
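Du Hongfang (杜红芳): "Research and Implementation of the Query and Indexing Mechanism of a Personal Dataspace Management System", China Master's Theses Full-text Database, Information Science and Technology Series *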

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271459A (en) * 2018-09-18 2019-01-25 四川长虹电器股份有限公司 Chatbot based on Lucene and grammar networks and implementation thereof
CN112133303A (en) * 2020-09-16 2020-12-25 四川长虹电器股份有限公司 Method for realizing one set of system supporting multi-brand intelligent sound box semantic instruction
CN116913262A (en) * 2023-07-24 2023-10-20 重庆赛力斯新能源汽车设计院有限公司 Methods, devices, electronic equipment and readable storage media for in-vehicle semantic understanding
CN116913262B (en) * 2023-07-24 2025-06-03 重庆赛力斯凤凰智创科技有限公司 Method, device, electronic device and readable storage medium for vehicle-mounted semantic understanding

Similar Documents

Publication Publication Date Title
US20240012810A1 (en) Clause-wise text-to-sql generation
CN102063476B (en) Video searching method and system
CN110442777B (en) BERT-based pseudo-correlation feedback model information retrieval method and system
JP6014725B2 (en) Retrieval and information providing method and system for single / multi-sentence natural language queries
Cantador et al. Enriching ontological user profiles with tagging history for multi-domain recommendations
JP5530425B2 (en) Method, system, and computer program for dynamic generation of user-driven semantic networks and media integration
KR101255405B1 (en) Indexing and searching speech with text meta-data
Heck et al. Leveraging knowledge graphs for web-scale unsupervised semantic parsing
CN111353030A (en) Knowledge question and answer retrieval method and device based on travel field knowledge graph
EP2836935B1 (en) Finding data in connected corpuses using examples
CN110110173A (en) Search result rank and presentation
CN102750949B (en) Voice recognition method and device
CN103092943B (en) A kind of method of advertisement scheduling and advertisement scheduling server
CN105706078A (en) Automatic definition of entity collections
WO2010119288A1 (en) Metadata browser
CN102880599B (en) For resolving the sentence heuristic approach that sentence is also supported to learn this parsing
CN101655862A (en) Method and device for searching information object
KR102729987B1 (en) Apparatus, method and computer program for processing inquiry
CN107704451A (en) Semantic analysis based on grammar networks and Lucene
CN103514289A (en) Method and device for building interest entity base
Song et al. VoiceQuerySystem: A voice-driven database querying system using natural language questions
CN103020311B (en) A kind of processing method of user search word and system
WO2025166377A1 (en) Pre-computation for intermediate-representation infused search and retrieval augmented generation
CN117009373A (en) Entity query method, query end, request end and electronic equipment
CN101398828A (en) Information precision search and information publishing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180216