CN101789008B - Human-machine interface system knowledge base and its construction method - Google Patents
- Publication number
- CN101789008B
- Authority
- CN
- China
- Prior art keywords
- corpus
- dialogue
- language material
- corresponding field
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a knowledge base for a man-machine interface system. A first corpus stores the corpus material with which users initiate dialogue; a second corpus stores, by domain, the corpus material for returned dialogue; a return-corpus extraction unit extracts the word-level corpus material of each domain from that domain's knowledge documents and sends it to the second corpus; a matching processing unit matches the user's dialogue-initiating input against the first corpus to obtain matched dialogue-initiation material, and then matches that material against the second corpus to obtain matched dialogue-return material; and a feedback unit feeds the matched dialogue-return material back to the user. The knowledge base makes the dialogue between the user and a chat robot domain-specific, keeping the topic within a comparatively specialized domain; the two corpora together form the knowledge base and separate form from content. The invention also provides a method for constructing the knowledge base.
Description
Technical field
The present invention relates to the fields of human-machine interface technology and natural language processing, and in particular to a man-machine interface system knowledge base and a method for constructing it.
Background technology
Man-machine interface systems such as Jabberwacky and ALICEBOT are mainly used for man-machine conversation and are commonly called chat robots (chatbots); their main aim is to get people and machines to converse by whatever means are available. A chat robot converses with a user by rule-matching the user's input against the knowledge base that the chat robot stores and immediately returning the matching result to the user. Because the matching statements in such a knowledge base are very broad and the dialogue is not divided into domains, the content returned to the user is also very broad, and the user's attention is easily diverted to other topics.
It is therefore necessary to provide an improved man-machine interface system knowledge base and construction method that overcome these defects of the prior art.
Summary of the invention
The purpose of this invention is to provide a man-machine interface system knowledge base and a method for constructing it that can restrict the domain of the dialogue between the user and the chat robot.
To achieve this, the invention provides a man-machine interface system knowledge base comprising a first corpus, a second corpus, a return-corpus extraction unit, a matching processing unit and a feedback unit. The first corpus stores the corpus material with which users initiate dialogue. The second corpus stores, by domain, the corpus material for returned dialogue. The return-corpus extraction unit is connected to the second corpus; it extracts the word-level corpus material of the corresponding domain from each domain's knowledge documents and sends the extracted material to the second corpus. The matching processing unit is connected to the first corpus and the second corpus; it matches the corpus material with which the user initiates dialogue against the material in the first corpus to obtain matched dialogue-initiation material, and matches that material against the material in the second corpus to obtain matched dialogue-return material. The feedback unit is connected to the matching processing unit and feeds the matched dialogue-return material back to the user.
In one embodiment of the invention, the knowledge base further comprises a dialogue corpus collection unit connected to the first corpus. It conducts dialogue experiments with users, collects the dialogue-initiation material produced in the experiments, formally generalizes the dialogue-initiation material whose frequency of use exceeds a prescribed threshold, and sends the generalized material to the first corpus.
In another embodiment of the invention, the return-corpus extraction unit comprises a first-stage return-corpus extraction unit and a second-stage return-corpus extraction unit. The first-stage unit extracts the sentences of the corresponding domain from each domain's knowledge documents. The second-stage unit is connected to the first-stage unit and to the second corpus; it extracts the word-level corpus material of the corresponding domain from the sentences extracted by the first-stage unit, formally classifies that material, and sends the classified material to the second corpus. The formally classified word-level material is the corpus material for returned dialogue.
In a further embodiment of the invention, the categories of the formal classification are "item", "behavior and action", "modification", "orientation and time" and "pure grammar", and the second corpus stores the formally classified word-level material of each domain by category.
In another embodiment of the invention, the knowledge base further comprises a natural language generation system connected to the matching processing unit and the feedback unit. It converts the matched dialogue-return material into natural language, and the result of the conversion is fed back to the user.
A method for constructing a man-machine interface system knowledge base comprises the following steps: storing the corpus material with which users initiate dialogue; extracting the word-level corpus material of the corresponding domain from each domain's knowledge documents; storing the extracted word-level material by category, as the corpus material for returned dialogue; matching the corpus material with which the user initiates dialogue against the stored dialogue-initiation material to obtain matched dialogue-initiation material, and matching that material against the stored dialogue-return material to obtain matched dialogue-return material; and feeding the matched dialogue-return material back to the user.
In one embodiment of the invention, the construction method further comprises: conducting dialogue experiments with users, collecting the dialogue-initiation material produced in the experiments, and formally generalizing the dialogue-initiation material whose frequency of use exceeds a prescribed threshold. The step of storing the corpus material with which users initiate dialogue is then specifically: storing the formally generalized dialogue-initiation material.
In another embodiment of the invention, the step of extracting the word-level corpus material of the corresponding domain from each domain's knowledge documents is specifically: extracting the sentences of the corresponding domain from each domain's knowledge documents; extracting the word-level material of the corresponding domain from the extracted sentences; and formally classifying the extracted word-level material, the classified material being the corpus material for returned dialogue.
In a further embodiment of the invention, the step of formally classifying the extracted word-level material is specifically: classifying the extracted word-level material of the corresponding domain into the categories "item", "behavior and action", "modification", "orientation and time" and "pure grammar". The step of storing the extracted word-level material is specifically: storing the formally classified word-level material of the corresponding domain by category.
In another embodiment of the invention, the step of feeding the matched dialogue-return material back to the user is specifically: converting the matched dialogue-return material into natural language, and feeding the result of the conversion back to the user.
Compared with the prior art, the second corpus of the knowledge base of the invention is organized by domain, so the dialogue between the user and the chat robot is selective: the topic can be kept within a comparatively specialized domain, and as many of that domain's specialist knowledge points as possible are passed to the user in dialogue form.
In addition, the knowledge base of the invention establishes the form of knowledge through the first corpus and the content of knowledge through the second corpus; the two corpora together form the knowledge base, separating form from content.
The invention will become clearer from the following description taken in conjunction with the accompanying drawings, which illustrate embodiments of the invention.
Description of drawings
Fig. 1 is a structural block diagram of the man-machine interface system knowledge base of the present invention.
Fig. 2 is a flow chart of the man-machine interface system knowledge base construction method of the present invention.
Embodiment
Embodiments of the invention are now described with reference to the accompanying drawings, in which like reference numerals denote like elements.
The man-machine interface system knowledge base of this embodiment comprises a first corpus 20, a dialogue corpus collection unit 10, a second corpus 30, a return-corpus extraction unit 40, a matching processing unit 50, a feedback unit 70 and a natural language generation system 60.
The first corpus 20 stores the corpus material with which users initiate dialogue.
The dialogue corpus collection unit 10 is connected to the first corpus 20. It conducts dialogue experiments with users through chat tools such as a chat robot platform, frequently asked questions (FAQ) or user questionnaires, collects the dialogue-initiation material produced in the experiments, formally generalizes the dialogue-initiation material whose frequency of use exceeds a prescribed threshold, and sends the generalized material to the first corpus 20. The more users take part in the experiments, the more dialogue material is retained and the higher the success rate of later matching.
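The frequency filter described above can be sketched as follows. This is only a minimal illustration under stated assumptions: collected inputs are plain strings, "frequency of use" is taken to be the raw occurrence count after a simple normalization, and the threshold is an integer. The names `normalize` and `select_frequent_openers` are hypothetical, and the source does not specify how the formal generalization itself is performed.

```python
from collections import Counter


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so trivially different inputs count together."""
    return " ".join(utterance.lower().split())


def select_frequent_openers(utterances: list[str], threshold: int) -> list[str]:
    """Return the normalized utterances whose count is strictly above `threshold`."""
    counts = Counter(normalize(u) for u in utterances)
    return sorted(u for u, n in counts.items() if n > threshold)


# Hypothetical inputs collected from dialogue experiments.
collected = [
    "What is a corpus?",
    "what is a  corpus?",
    "What is a corpus?",
    "Tell me a joke",
]
frequent = select_frequent_openers(collected, threshold=2)
print(frequent)
```

Only the inputs that survive this filter would go on to be generalized and stored in the first corpus.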
The second corpus 30 stores, by domain, the corpus material for returned dialogue.
The return-corpus extraction unit 40 is connected to the second corpus 30; it extracts the word-level corpus material of the corresponding domain from each domain's knowledge documents and sends the extracted material to the second corpus 30.
The return-corpus extraction unit 40 comprises a first-stage return-corpus extraction unit and a second-stage return-corpus extraction unit. The first-stage unit extracts the sentences of the corresponding domain from each domain's knowledge documents. The second-stage unit is connected to the first-stage unit and to the second corpus 30; it extracts the word-level corpus material of the corresponding domain from the sentences extracted by the first-stage unit, formally classifies that material, and sends the classified material to the second corpus 30. The formally classified word-level material is the corpus material for returned dialogue. Here, formal classification means adding additional information characters to the extracted word-level material of the corresponding domain.
As can be seen from the above, the return-corpus extraction unit 40 first breaks the continuous text of each domain's knowledge documents down into dialogue sentences, then breaks those down further by extracting from the sentences the word-level material that fits the categories of the formal classification, formally classifies it, and sends it to the second corpus 30 for storage.
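The two-stage breakdown described above (document to sentences, sentences to words) can be illustrated with a short sketch. It assumes English text in which sentences end with ".", "!" or "?" and words are runs of letters; the function names are hypothetical. Chinese source text, as in the original filing, would need a dedicated word segmenter in place of the second stage shown here.

```python
import re


def extract_sentences(document: str) -> list[str]:
    """First stage: split a domain knowledge document into sentences."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def extract_words(sentence: str) -> list[str]:
    """Second stage: split one sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


# Hypothetical domain knowledge document.
doc = "A corpus stores dialogue. The robot answers questions!"
sentences = extract_sentences(doc)
words = [extract_words(s) for s in sentences]
print(sentences)
print(words)
```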
The categories of the formal classification are "item", "behavior and action", "modification", "orientation and time" and "pure grammar", and the second corpus 30 stores the formally classified word-level material of each domain by category.
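One possible in-memory layout for such a store, keyed first by domain and then by category, is sketched below. The class name `SecondCorpus`, its methods and the example domains ("astronomy", "cooking") are assumptions made for illustration; the source names only the five categories and does not say how a word is assigned to a category, so the assignment here is done by hand.

```python
from collections import defaultdict
from enum import Enum


class Category(Enum):
    ITEM = "item"
    BEHAVIOR_AND_ACTION = "behavior and action"
    MODIFICATION = "modification"
    ORIENTATION_AND_TIME = "orientation and time"
    PURE_GRAMMAR = "pure grammar"


class SecondCorpus:
    """Stores return-dialogue words grouped by domain, then by category."""

    def __init__(self) -> None:
        self._store: dict[str, dict[Category, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, domain: str, category: Category, word: str) -> None:
        self._store[domain][category].add(word)

    def words(self, domain: str, category: Category) -> set[str]:
        # Use .get so that lookups never create empty entries.
        return set(self._store.get(domain, {}).get(category, set()))


corpus = SecondCorpus()
corpus.add("astronomy", Category.ITEM, "planet")
corpus.add("astronomy", Category.BEHAVIOR_AND_ACTION, "orbit")
corpus.add("cooking", Category.ITEM, "flour")
print(corpus.words("astronomy", Category.ITEM))
```

Because each domain has its own entry, a new domain can be added later without touching the material already stored for other domains.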
The matching processing unit 50 is connected to the first corpus 20 and the second corpus 30. It matches the corpus material with which the user initiates dialogue against the material in the first corpus 20 to obtain matched dialogue-initiation material, and matches that material against the material in the second corpus 30 to obtain matched dialogue-return material. The matching processing unit 50 establishes its matching rules using XML (Extensible Markup Language) and RegExp (regular expressions) and performs matching based on those rules.
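The source states only that the matching rules are built with XML and regular expressions; it does not give a rule schema. The sketch below therefore assumes a hypothetical layout in which each `<rule>` element carries an `id` attribute and one `<pattern>` child holding a regular expression, and it covers only the first matching stage (user input against stored dialogue-initiation forms).

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical rule file; element and attribute names are assumptions.
RULES_XML = r"""
<rules>
  <rule id="define"><pattern>^what is (?:an? )?([a-z ]+?)\??$</pattern></rule>
  <rule id="greet"><pattern>^(?:hello|hi)\b.*$</pattern></rule>
</rules>
"""


def load_rules(xml_text: str) -> list:
    """Parse the XML and compile each pattern into (rule id, compiled regex)."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.findall("rule"):
        pattern_text = rule.findtext("pattern", default="")
        rules.append((rule.get("id", ""), re.compile(pattern_text, re.IGNORECASE)))
    return rules


def match_opener(user_input: str, rules: list) -> Optional[tuple]:
    """Return (rule id, captured groups) for the first rule that matches."""
    for rule_id, pattern in rules:
        m = pattern.match(user_input.strip())
        if m:
            return rule_id, m.groups()
    return None


rules = load_rules(RULES_XML)
print(match_opener("What is a corpus?", rules))
```

The captured groups are what a second matching stage would look up in the domain-specific corpus.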
The natural language generation system 60 is connected to the matching processing unit 50; it converts the matched dialogue-return material into natural language and sends the result of the conversion to the feedback unit 70.
The feedback unit 70 is connected to the natural language generation system 60 and feeds the result of the conversion performed by the natural language generation system 60 back to the user.
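The source does not describe how the natural language generation system builds its output. As one very simple possibility, the sketch below fills a fixed word order from matched words keyed by category name; the slot order, the function name `generate_sentence` and the example words are all assumptions, and a real generator would need proper grammar handling.

```python
# Assumed fixed order in which classified words are placed in the sentence.
SLOT_ORDER = ("modification", "item", "behavior and action", "orientation and time")


def generate_sentence(slots: dict[str, str]) -> str:
    """Join the filled slots in a fixed order and wrap them as one sentence."""
    parts = [slots[name] for name in SLOT_ORDER if slots.get(name)]
    if not parts:
        return ""
    return "The " + " ".join(parts) + "."


# Hypothetical matched dialogue-return material, keyed by category.
matched = {
    "item": "planet",
    "behavior and action": "orbits the star",
    "modification": "small",
    "orientation and time": "every year",
}
print(generate_sentence(matched))
```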
As can be seen from the above, the knowledge base of the invention uses two separate corpora, the first corpus and the second corpus, to store respectively the form in which dialogue is initiated (the dialogue-initiation material) and the knowledge content carried by the dialogue (the dialogue-return material). Specifically, the invention establishes the form of knowledge representation through the first corpus 20 and the content of knowledge through the second corpus 30; the two corpora together form the knowledge base, separating form from content.
In addition, the second corpus 30 is organized by domain, so the dialogue between the user and the chat robot is selective: the topic can be kept within a comparatively specialized domain, and as many of that domain's specialist knowledge points as possible are passed to the user in dialogue form. It will be appreciated that the knowledge base built by the invention allows various applications to be developed quickly, for example question-and-answer learning systems and advertisement recommendation systems. Unlike a general chat robot, the knowledge base generated by this invention applies only to one specialized domain and addresses only specialized topics, so the user's attention cannot be diverted elsewhere, which prevents knowledge learning from turning into random chat. Moreover, because the content of the knowledge base is collected by domain, knowledge of different domains can be added continually at a later stage. The knowledge base model is therefore extensible.
As shown in Fig. 2, a method for constructing a man-machine interface system knowledge base comprises the following steps:
Step S10: conduct dialogue experiments with users through chat tools such as a chat robot platform, frequently asked questions (FAQ) or user questionnaires; collect the dialogue-initiation material produced in the experiments; and formally generalize the dialogue-initiation material whose frequency of use exceeds a prescribed threshold;
Step S20: store the formally generalized dialogue-initiation material;
Step S30: extract the sentences of the corresponding domain from each domain's knowledge documents;
Step S40: extract the word-level corpus material of the corresponding domain from the extracted sentences;
Step S50: classify the extracted word-level material into the categories "item", "behavior and action", "modification", "orientation and time" and "pure grammar", store the classified material, and use it as the corpus material for returned dialogue;
Step S60: match the user's dialogue-initiating input against the stored dialogue-initiation material to obtain matched dialogue-initiation material, and match that material against the stored dialogue-return material to obtain matched dialogue-return material;
Step S70: convert the matched dialogue-return material into natural language, that is, construct a natural sentence from the dialogue form matched in the first corpus and the knowledge content matched in the second corpus;
Step S80: feed the result of the conversion back to the user.
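The runtime part of the method (steps S60 to S80) can be sketched end to end as below, with the results of steps S10 to S50 represented by two small hand-written tables. Everything named here (`FIRST_CORPUS`, `SECOND_CORPUS`, `reply`, the "astronomy" domain and its entries, the fixed sentence template) is a hypothetical stand-in rather than the structure used in the source.

```python
import re
from typing import Optional

# First corpus: generalized dialogue-initiation forms (outcome of steps S10-S20).
FIRST_CORPUS = [
    re.compile(r"^what does (?:an? |the )?([a-z]+) do\??$", re.IGNORECASE),
]

# Second corpus: per-domain classified words (outcome of steps S30-S50).
SECOND_CORPUS = {
    "astronomy": {
        "planet": {"item": "planet", "behavior and action": "orbits a star"},
        "comet": {"item": "comet", "behavior and action": "travels around the sun"},
    }
}


def reply(user_input: str, domain: str) -> Optional[str]:
    # S60, first half: match the input against the stored initiation forms.
    for pattern in FIRST_CORPUS:
        m = pattern.match(user_input.strip())
        if m:
            break
    else:
        return None
    # S60, second half: match the captured term within the chosen domain only.
    entry = SECOND_CORPUS.get(domain, {}).get(m.group(1).lower())
    if entry is None:
        return None
    # S70: build a sentence from the classified words; S80: hand it back.
    return f"A {entry['item']} {entry['behavior and action']}."


print(reply("What does a planet do?", "astronomy"))
```

A question whose term is not stored for the chosen domain yields no reply, which reflects the idea of keeping the dialogue inside one domain.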
The invention has been described above in connection with preferred embodiments, but it is not limited to the embodiments disclosed and is intended to cover the various modifications and equivalent combinations made in accordance with its essence.
Claims (2)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010101037221A CN101789008B (en) | 2010-01-26 | 2010-01-26 | Human-machine interface system knowledge base and its construction method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2010101037221A CN101789008B (en) | 2010-01-26 | 2010-01-26 | Human-machine interface system knowledge base and its construction method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101789008A CN101789008A (en) | 2010-07-28 |
| CN101789008B true CN101789008B (en) | 2012-02-01 |
Family
ID=42532224
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2010101037221A Expired - Fee Related CN101789008B (en) | 2010-01-26 | 2010-01-26 | Human-machine interface system knowledge base and its construction method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101789008B (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2015089043A (en) * | 2013-10-31 | 2015-05-07 | シャープ株式会社 | Device control server, device control method, notification device, robot terminal, device control system, and program |
| US11477139B2 (en) | 2016-02-25 | 2022-10-18 | Meta Platforms, Inc. | Techniques for messaging bot rich communication |
| CN106844732B (en) * | 2017-02-13 | 2020-05-08 | 长沙军鸽软件有限公司 | Method for automatically acquiring session scene label incapable of being directly acquired |
| CN107832291B (en) * | 2017-10-26 | 2020-03-31 | 平安科技(深圳)有限公司 | Man-machine cooperation customer service method, electronic device and storage medium |
| CN109582777A (en) * | 2018-12-06 | 2019-04-05 | 中国银行股份有限公司 | A kind of human-machine intelligence's processing method and system |
| CN112100338B (en) * | 2020-11-02 | 2022-02-25 | 北京淇瑀信息科技有限公司 | Dialog theme extension method, device and system for intelligent robot |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6728695B1 (en) * | 2000-05-26 | 2004-04-27 | Burning Glass Technologies, Llc | Method and apparatus for making predictions about entities represented in documents |
| CN1949211A (en) * | 2005-10-13 | 2007-04-18 | 中国科学院自动化研究所 | New Chinese characters spoken language analytic method and device |
| CN101377777A (en) * | 2007-09-03 | 2009-03-04 | 北京百问百答网络技术有限公司 | Automatic inquiring and answering method and system |
| CN101630314A (en) * | 2008-07-16 | 2010-01-20 | 中国科学院自动化研究所 | Semantic query expansion method based on domain knowledge |
-
2010
- 2010-01-26 CN CN2010101037221A patent/CN101789008B/en not_active Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| CN101789008A (en) | 2010-07-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101789008B (en) | Human-machine interface system knowledge base and its construction method | |
| CN110674639B (en) | Natural language understanding method based on pre-training model | |
| CN105786798B (en) | Natural language is intended to understanding method in a kind of human-computer interaction | |
| JP6667504B2 (en) | Orphan utterance detection system and method | |
| CN110335595A (en) | Speech recognition-based interrogation dialogue method, device and storage medium | |
| CN111078856B (en) | Group chat conversation processing method and device and electronic equipment | |
| CN108427722A (en) | intelligent interactive method, electronic device and storage medium | |
| US8509396B2 (en) | Automatic creation of complex conversational natural language call routing system for call centers | |
| CN103413549A (en) | Voice interaction method and system and interaction terminal | |
| CN110704590B (en) | Method and apparatus for augmenting training samples | |
| CN106710586A (en) | Speech recognition engine automatic switching method and device | |
| KR101677859B1 (en) | Method for generating system response using knowledgy base and apparatus for performing the method | |
| US20170011114A1 (en) | Common data repository for improving transactional efficiencies of user interactions with a computing device | |
| CN103744836A (en) | Man-machine conversation method and device | |
| TW200933391A (en) | Network information search method applying speech recognition and sysrem thereof | |
| CN108595406B (en) | A reminding method, device, electronic device and storage medium of user status | |
| CN110442855A (en) | A kind of speech analysis method and system | |
| CN117725175A (en) | Intelligent system based on large language model | |
| JP2023034235A (en) | Text summarization method and text summarization system | |
| CN109388695B (en) | User intent recognition method, device, and computer-readable storage medium | |
| Harsha et al. | Lexical ambiguity in natural language processing applications | |
| CN110706704A (en) | Method, device and computer equipment for generating voice interaction prototype | |
| CN109147792A (en) | A kind of voice resume system | |
| CN111046149A (en) | Content recommendation method and device, electronic device and storage medium | |
| CN112149425A (en) | Terminal control method, device, equipment and computer readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120201; Termination date: 20130126 |