
CN118520095A - Method and device for generating reply content based on language model


Info

Publication number
CN118520095A
CN118520095A CN202410891489.XA CN202410891489A CN118520095A CN 118520095 A CN118520095 A CN 118520095A CN 202410891489 A CN202410891489 A CN 202410891489A CN 118520095 A CN118520095 A CN 118520095A
Authority
CN
China
Prior art keywords
knowledge
data
node
current query
query data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410891489.XA
Other languages
Chinese (zh)
Inventor
许春媛
贾伟
张安洁
陈梓健
汪利飞
尤炜岑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lazas Network Technology Shanghai Co Ltd
Zhejiang Bird Tide Supply Chain Management Co ltd
Original Assignee
Lazas Network Technology Shanghai Co Ltd
Zhejiang Bird Tide Supply Chain Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazas Network Technology Shanghai Co Ltd and Zhejiang Bird Tide Supply Chain Management Co ltd
Priority to CN202410891489.XA
Publication of CN118520095A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and device for generating reply content based on a language model. The method comprises: obtaining current query data of a user and, based on the current query data, querying a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge comprises first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base comprises multiple pieces of knowledge having hierarchical relationships; generating prompt information comprising the current query data and the target knowledge; and inputting the prompt information into a pre-trained language model, which generates the reply content corresponding to the prompt information. The method improves the extensibility of both the knowledge base and knowledge base retrieval, thereby improving the accuracy of the reply content that the language model generates from the retrieved knowledge.

Description

Method and device for generating reply content based on language model

Technical Field

The present disclosure relates to the field of natural language processing, and in particular to a method and device for generating reply content based on a language model.

Background Art

A dialogue system is a human-computer interaction technology that aims to let computers converse with humans naturally and fluently. Dialogue systems are an important research direction in artificial intelligence, with significant practical value and broad applicability, and in the related art they have been applied in fields such as intelligent customer service, intelligent assistants, and chatbots.

Dialogue systems in the related art already have excellent text comprehension and dialogue capabilities. However, in existing knowledge-base-based dialogue systems the knowledge base has poor extensibility, so the accuracy of the reply content that the dialogue system generates from the knowledge base during a conversation still needs improvement.

Summary of the Invention

To address the above technical problem, the present disclosure provides a method and device for generating reply content based on a language model. The technical solution is as follows:

According to a first aspect of the embodiments of this specification, a method for generating reply content based on a language model is provided, the method comprising:

obtaining current query data of a user and, based on the current query data, querying a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge comprises first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base comprises multiple pieces of knowledge having hierarchical relationships;

generating prompt information comprising the current query data and the target knowledge; and

inputting the prompt information into a pre-trained language model, the language model generating reply content corresponding to the prompt information.

According to a second aspect of the embodiments of this specification, a device for generating reply content based on a language model is provided, the device comprising:

a query module configured to: obtain current query data of a user and, based on the current query data, query a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge comprises first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base comprises multiple pieces of knowledge having hierarchical relationships; and

a generation module configured to: generate prompt information comprising the current query data and the target knowledge; and input the prompt information into a pre-trained language model, the language model generating reply content corresponding to the prompt information.

The technical solutions provided by the embodiments of this specification may have the following beneficial effects:

In the embodiments of this specification, target knowledge associated with the user's current query data is retrieved from a preset knowledge base, the retrieved target knowledge is combined with the current query data to generate prompt information, and the prompt information is input into a language model, which generates the corresponding reply content. In the reply generation stage, knowledge base retrieval introduces an external knowledge source that strengthens the reply capability of the language model. In addition, for a knowledge source with hierarchical relationships, a knowledge base that preserves those relationships is constructed in advance. When this knowledge base is queried for knowledge associated with the current query data, the first knowledge matching the current query data can be found directly, and the second knowledge having a hierarchical relationship with the first knowledge can then be found as well. This improves the extensibility of both the knowledge base and knowledge base retrieval, thereby improving the accuracy of the reply content that the language model generates from the retrieved knowledge.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of this specification or in the related art more clearly, the drawings needed to describe the embodiments or the related art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present disclosure, and a person of ordinary skill in the art can derive other drawings from them.

FIG. 1 is a schematic structural diagram of an interactive system provided by an embodiment of this specification;

FIG. 2 is a schematic flowchart of a reply content generation method according to an embodiment of this specification;

FIG. 3 is a schematic diagram of a document structure according to an embodiment of this specification;

FIG. 4 is a schematic diagram of the Markdown format according to an embodiment of this specification;

FIG. 5 is a schematic diagram of the KB-Tree structure according to an embodiment of this specification;

FIG. 6 is a schematic diagram of the hierarchical relationship of knowledge nodes according to an embodiment of this specification;

FIG. 7 is a schematic diagram of a specific application scenario of reply content generation according to an embodiment of this specification;

FIG. 8 is a schematic structural diagram of a reply content generation device according to an embodiment of this specification;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of this specification.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the appended claims.

The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various pieces of information, the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".

A dialogue system is a human-computer interaction technology that aims to let computers converse with humans naturally and fluently. Dialogue systems are an important research direction in artificial intelligence, with significant practical value and broad applicability. Their importance lies in providing an efficient and convenient mode of human-computer interaction. Traditional human-computer interfaces, such as graphical user interfaces and command-line interfaces, require users to learn specific operations and commands and to follow fixed interaction patterns. A dialogue system instead interacts through natural language, which is closer to everyday human communication, so users can express intent, ask questions, and obtain information through natural conversation without learning specific commands and operations.

Dialogue systems are widely used both in daily life and in technical fields. In daily life, the voice assistant, one application scenario of dialogue systems, has become a commonly used tool that helps users complete tasks such as checking the weather, playing music, and setting reminders. Dialogue systems are also widely used in customer service, where they provide automated customer support and question answering and reduce the burden on human agents.

In technical fields, dialogue systems are widely used in intelligent robots, virtual personal assistants, intelligent recommendation systems, and the like. An intelligent robot can converse with the user in real time and perform specified tasks; for example, a home assistant robot can communicate with the user through dialogue and carry out household management tasks. A virtual personal assistant (such as the voice assistant on a smartphone) can provide the user with personalized services and suggestions. An intelligent recommendation system can learn the user's likes and preferences through dialogue and provide personalized recommendations accordingly.

Dialogue systems are also widely used in education, healthcare, finance, e-commerce, and other fields. In education, a dialogue system can act as a learning companion that provides personalized learning support and feedback. In healthcare, it can be used for health consultation and diagnostic assistance. In finance, it can provide services such as customer service and investment consulting. In e-commerce, it can provide services such as personalized product recommendations and shopping guidance.

Dialogue systems in the related art already have excellent text comprehension and dialogue capabilities. However, in existing knowledge-base-based dialogue systems the knowledge base has poor extensibility, so the accuracy of the reply content that the dialogue system generates from the knowledge base during a conversation still needs improvement. Consider, for example, a traditional question-answering robot based on a question-answer (QA) knowledge base. In the knowledge base construction stage, extensive expert experience is needed to manually organize document knowledge into question-answer pairs, so knowledge base maintenance is labor-intensive and slow to reflect updates. In the user question stage, the robot can reply only with the fixed answer that the knowledge base stores for the relevant question, so its replies cannot extend beyond the stored pairs. In the reply stage, because the answer stored for each question is fixed, the robot's response template is fixed and the robot has no reading comprehension ability.

Although existing dialogue systems have been applied in many fields, some fields still make no use of them. Take the content security field as an example. As the business forms and scenarios in this field become more diverse and complex, the knowledge accumulated in it (including rules, cases, interpretations, risk calendars, workflows, review mechanisms, and contact persons) keeps growing. At present, however, business personnel who need content security knowledge usually have to consult content security operations staff, who reply manually, which is inefficient. How to apply a dialogue system in the content security field, so that business personnel (that is, the users who initiate queries) can obtain the content security knowledge and work guidance they need more conveniently, accurately, and intelligently without becoming content security experts, is therefore also a problem that urgently needs to be solved.

To address the above problems, the embodiments of this specification provide a method for generating reply content based on a language model that improves the extensibility of both the knowledge base and knowledge base retrieval, thereby improving the accuracy of the reply content that the language model generates from the retrieved knowledge. In practice, the user's current query data is obtained first, and a preset knowledge base is queried for target knowledge associated with that query data. The target knowledge includes first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base contains multiple pieces of knowledge having hierarchical relationships. Once the target knowledge has been found, prompt information containing the user's current query data and the associated target knowledge is generated and input into a pre-trained language model, which generates the reply content corresponding to the prompt information.

In the embodiments of this specification, target knowledge associated with the user's current query data can be retrieved from a preset knowledge base, the retrieved target knowledge is combined with the current query data to generate prompt information, and the prompt information is input into a language model, which generates the corresponding reply content. In the reply generation stage, knowledge base retrieval introduces an external knowledge source that strengthens the reply capability of the language model. In addition, for a knowledge source with hierarchical relationships, a knowledge base that preserves those relationships is constructed in advance. When this knowledge base is queried for knowledge associated with the current query data, the first knowledge matching the current query data can be found directly, and the second knowledge having a hierarchical relationship with the first knowledge can then be found as well. This improves the extensibility of both the knowledge base and knowledge base retrieval, thereby improving the accuracy of the reply content that the language model generates from the retrieved knowledge.
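As a rough illustration of retrieving first knowledge together with hierarchically related second knowledge, the following Python sketch may help. It is only a sketch under stated assumptions: the `KnowledgeNode` class, the word-overlap score, the choice of parent and direct children as the "second knowledge", and the sample texts are all hypothetical and are not taken from this specification, which does not prescribe a matching method at this point.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class KnowledgeNode:
    """One piece of knowledge plus its position in the hierarchy (hypothetical)."""
    text: str
    # repr/compare are disabled on parent to avoid parent <-> child recursion.
    parent: Optional["KnowledgeNode"] = field(default=None, repr=False, compare=False)
    children: list["KnowledgeNode"] = field(default_factory=list)

    def add_child(self, text: str) -> "KnowledgeNode":
        child = KnowledgeNode(text, parent=self)
        self.children.append(child)
        return child


def iter_nodes(root: KnowledgeNode) -> Iterator[KnowledgeNode]:
    """Yield every node of the tree in depth-first order."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words (an assumption)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_target_knowledge(root: KnowledgeNode, query: str):
    """Return (first knowledge, second knowledge) for the query.

    First knowledge: the best-matching node.
    Second knowledge: here assumed to be its parent and direct children.
    """
    first = max(iter_nodes(root), key=lambda node: overlap_score(query, node.text))
    second = []
    if first.parent is not None:
        second.append(first.parent)
    second.extend(first.children)
    return first, second


# Made-up sample hierarchy, for illustration only.
root = KnowledgeNode("Advertising rules")
medical = root.add_child("Medical advertising requirements")
medical.add_child("Medical advertisements must not guarantee efficacy")
root.add_child("Food advertising requirements")

first, second = retrieve_target_knowledge(root, "requirements for medical advertising")
print("first knowledge:", first.text)
print("second knowledge:", [node.text for node in second])
```

A flat question-answer store could return only the matched entry. Keeping parent and child links on each node is what lets the lookup bring back the surrounding context along with the match.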

In an exemplary embodiment, referring to FIG. 1, FIG. 1 shows an interactive system provided by an embodiment of this specification. The interactive system includes a server 101 and at least one client 102. For example, the client 102 may access the server 101 over a network to use the services that the server 101 provides, including but not limited to dialogue services, goods delivery services, goods purchasing services, reading services, audio and video playback services, and search services.

The server 101 may be a program installed on a back-end device to provide services to users. For example, as shown in FIG. 1, the back-end device may be a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

The client 102 may be a program installed on a user device to provide services to the user. The client 102 includes but is not limited to apps, web pages, mini programs, plug-ins, and components. As shown in FIG. 1, the user device includes but is not limited to a smartphone, a personal digital assistant, a tablet computer, a personal computer, a laptop computer, a virtual reality terminal device, and an augmented reality terminal device.

The method for generating reply content based on a language model provided by the embodiments of this specification may be executed by either the server 101 or the client 102, or by the two in cooperation (each executing some of the steps); this embodiment places no limitation on this. Taking execution by the server 101 as an example, the language model may be deployed on the server 101. The client 102 sends the user's current query data to the server 101, the server 101 executes the reply content generation method provided by the embodiments of this specification to obtain the reply content, and the server 101 then sends the reply content to the client 102 so that the client 102 can display it. Alternatively, the language model may be deployed on the client 102, and the client 102 executes the reply content generation method. No specific limitation is placed on this.

The flow of the reply content generation method is described below by way of example:

Referring to FIG. 2, an embodiment of this specification provides a method for generating reply content based on a language model. As shown in FIG. 2, the method includes the following steps:

S201. Obtain current query data of a user and, based on the current query data, query a preset knowledge base for target knowledge associated with the current query data.

Here, the target knowledge includes first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base contains multiple pieces of knowledge having hierarchical relationships.

S202. Generate prompt information containing the current query data and the target knowledge.

S203. Input the prompt information into a pre-trained language model, and have the language model generate the reply content corresponding to the prompt information.
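Steps S201 to S203 can be sketched as a small pipeline. The sketch below is illustrative only: the prompt wording, the function names `build_prompt` and `generate_reply`, and the stub retriever and stub model are assumptions made here for demonstration. This specification does not fix a prompt template or a particular language model interface.

```python
from typing import Callable, Sequence


def build_prompt(query: str, knowledge: Sequence[str]) -> str:
    """S202: combine the current query and the target knowledge into one prompt.

    The template text is a hypothetical example.
    """
    lines = ["Answer the question using only the knowledge below.", "", "Knowledge:"]
    lines += [f"{i}. {item}" for i, item in enumerate(knowledge, start=1)]
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)


def generate_reply(
    query: str,
    retrieve: Callable[[str], Sequence[str]],
    language_model: Callable[[str], str],
) -> str:
    """Run S201 (retrieve), S202 (build prompt), and S203 (generate) in order."""
    knowledge = retrieve(query)              # S201: query the knowledge base
    prompt = build_prompt(query, knowledge)  # S202: generate prompt information
    return language_model(prompt)            # S203: let the model produce the reply


def stub_retrieve(query: str) -> list[str]:
    """Stand-in for the knowledge base lookup (returns fixed placeholder text)."""
    return [
        "first knowledge (matches the query)",
        "second knowledge (hierarchically related to the first)",
    ]


def stub_model(prompt: str) -> str:
    """Stand-in for a pre-trained language model; it only reports the prompt size."""
    return f"[stub reply to a {len(prompt.splitlines())}-line prompt]"


reply = generate_reply("What does the rule require?", stub_retrieve, stub_model)
print(reply)
```

Passing the retriever and the model in as callables keeps the three steps separate, so either side can be swapped out, for example when the model is deployed on the server or on the client.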

As an example, a knowledge base can be built for a particular vertical domain, and a language model can then use that knowledge base to answer questions in the domain automatically. In practice, the source data of some vertical-domain knowledge bases, such as the content security domain, contains multiple pieces of knowledge having hierarchical relationships. In the content security domain, for instance, the source data includes laws and regulations, rules, cases, workflows, and review processes, and the pieces of knowledge recorded in this data are hierarchically related. Note that the content security domain is used here only as an example; in practice, the knowledge base of this embodiment may belong to any other vertical domain, and no specific limitation is placed on this.

The knowledge base can be built in several ways. In one example, the source data used to build the knowledge base includes a document. The text in the document is recognized, the recognized text is converted into Markdown-format data, and a tree structure is then built from the Markdown-format data to obtain a knowledge base containing that tree structure. Because the Markdown storage format carries hierarchical structure information, building the tree from it yields a knowledge structure with hierarchical relationships in the knowledge base. The tree structure contains multiple nodes, each node stores one piece of knowledge, and the relationships between the nodes correspond to the hierarchical relationships between the pieces of knowledge in the source data. This construction method is only an example; other implementations are not excluded in practice, and no limitation is placed on this.
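One way to picture building a tree from Markdown-format data is the heading-stack sketch below. It is a generic illustration rather than the construction used in this specification: the `Node` class, the `markdown_to_tree` function, the restriction to `#`-style headings, and the sample text are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Matches Markdown "#"-style headings of level 1 to 6.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Node:
    """A tree node holding one heading and the text directly under it."""
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def markdown_to_tree(markdown: str) -> Node:
    """Build a tree whose parent/child links mirror the heading levels."""
    root = Node(title="ROOT", level=0)
    stack = [root]  # path from the root to the most recent heading
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            node = Node(title=match.group(2), level=level)
            # Pop until the top of the stack is a shallower heading (the parent).
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            # Non-heading text belongs to the most recent heading.
            stack[-1].body.append(line.strip())
    return root


def outline(node: Node, depth: int = 0) -> list[str]:
    """Return the headings as an indented outline, one entry per node."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(outline(child, depth + 1))
    return lines


# Made-up sample document, for illustration only.
sample = """# Sample Law
## Chapter 1
General text.
### Article 1
Text of article 1.
## Chapter 2
### Article 2
Text of article 2.
"""

tree = markdown_to_tree(sample)
print("\n".join(outline(tree)))
```

The stack always holds the chain of open headings, so each new heading attaches to the nearest shallower one. This is how the hierarchy implied by `#`, `##`, and `###` becomes explicit parent-child links between nodes.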

As an example, the tree structure may be a KB-Tree data structure or another data structure; the specific implementation of the tree structure is not limited.

With reference to FIG. 3 to FIG. 5, and taking a document of the Advertising Law as the example document, one specific implementation of recognizing the text in the Advertising Law document, converting it into Markdown format, and building a KB-Tree data structure from that Markdown may include:

(1) Convert the Advertising Law document into Markdown format:

In this step, the Advertising Law is converted from its original format (which may be a Word document, a PDF, or another format) into Markdown, which may involve the following operations:

① Heading recognition: identify the headings in the document, such as chapter and section headings, and format them with Markdown heading syntax, for example using `#`, `##`, `###`, and so on for different heading levels.

② Text formatting: convert the body text into Markdown paragraphs.

③ List conversion: if the document contains lists (such as lists of clauses or conditions), convert them into Markdown unordered lists (using `-` or `*`) or ordered lists (a number followed by `.`).

④ Links and images: if the document contains links or images, format them with Markdown link and image syntax.

⑤ Table creation: if the document contains tabular data, create the tables with Markdown table syntax.

(2) Build the KB-Tree from the Markdown structure:

① Determine the parent node: use the highest-level heading of the text as the parent node of the KB-Tree.

② Build child nodes: for each heading or piece of content at the next level, create a child node of the parent node. For example, if the Markdown document has one first-level heading with several second-level headings beneath it, each second-level heading should be a child node of the first-level heading node.

③ Handle body content: non-heading body content can be added as a leaf node under the corresponding parent node, or included in the parent node's content attribute.

④ Add related attributes: if the Markdown document contains links, images, or other specially formatted content, that content can be added to the KB-Tree as node attributes.

⑤ Maintain hierarchical relationships: ensure that every node in the KB-Tree correctly reflects its level in the Markdown document's hierarchy.

The above process is illustrated below with reference to FIG. 3 to FIG. 5:

First, suppose the Advertising Law document has the structure shown in FIG. 3. Converting that structure into Markdown may give the result shown in FIG. 4, and the KB-Tree built from the Markdown in FIG. 4 may be as shown in FIG. 5.
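The heading-driven construction described in step (2) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the KB-Tree implementation of the embodiment: the `Node` class, the `build_kb_tree` function, and the synthetic `ROOT` entry node are hypothetical names, only `#`-style headings are handled, and body text is stored in the content attribute of the nearest enclosing heading node.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node of the sketched tree: a heading plus the body text under it."""
    title: str
    level: int
    content: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


# Matches Markdown headings such as "# Title" or "### Title".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def build_kb_tree(markdown: str) -> Node:
    """Build a tree whose shape mirrors the heading levels of the Markdown."""
    root = Node(title="ROOT", level=0)  # entry node, stores no knowledge
    stack = [root]                      # path from the root to the current node
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Pop until the top of the stack is a strictly higher-level heading.
            while stack[-1].level >= level:
                stack.pop()
            node = Node(title=match.group(2), level=level, parent=stack[-1])
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            # Non-heading text goes into the content of the enclosing node.
            stack[-1].content.append(line.strip())
    return root
```

Keeping a stack of open headings means each new heading attaches to the nearest preceding heading of a higher level, which is one simple way to preserve the hierarchy of the Markdown document.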

In practice, the way the KB-Tree is built from Markdown is not limited to the approach above. The document itself can also take various forms: as an example, it may be a DOC document, an Excel document, or a PDF document, with each document recording one category of knowledge; the specific form of the document is not limited.

As one example, for an Excel document, a column-name conversion template can be used to turn data stored in tabular form into plain-text data; as another example, for a PDF document, OCR-based PDF conversion can be used to extract text that exists in image form into plain-text data.

As an example, the hierarchical knowledge structure represented by the KB-Tree data structure of the knowledge base may include elements such as a node index, a title, and body text, which is not specifically limited.

As an example, one tree structure can be built from each piece of source data, and the knowledge base can contain multiple tree structures built from multiple pieces of source data. Each of these tree structures can serve as a subtree of a root node; the root node can act as the entry point of the knowledge base and does not store any specific knowledge.

As an example, once the tree structure has been built and the knowledge base containing it obtained, the knowledge base can be synchronized to an intermediate storage medium, OSS, and an online service can update the stored knowledge base on a schedule, so that the knowledge in it is easy to keep up to date. Specifically, new source data can be fetched periodically, a tree structure built from it as described in the embodiment above, and the newly built tree structure merged into the knowledge base.

The pre-trained language model can be built in various ways. As one example, a base model is selected first; a base model can be understood as a language model that has already been pre-trained on a large amount of general-purpose data and therefore has general capabilities. As another example, a large language model (LLM) can be selected as the base model.

After the base model is selected, it can be trained again on domain-specific data while retaining the capabilities encoded in its original parameters (in technical terms, the base model is "fine-tuned"). As an example, the domain-specific data may come from the content security domain mentioned in the embodiment above or from another vertical domain, which is not specifically limited.

The selected base model, that is, the pre-trained language model, can be trained again ("fine-tuned") in various ways. As an example, one training method may include: obtaining historical conversation data; generating differently worded versions of the user query data in the historical conversation data and of the reply data corresponding to that query data, so as to obtain a training data set; and training the language model again with at least this training data set. To prevent overfitting during retraining, the training data set should be constructed so that the questions and their corresponding replies are diverse, and different wordings are introduced as data augmentation.

There are various ways to generate differently worded data for the user query data in the historical conversation data and for the corresponding reply data. As one example, a preset description template is obtained, and based on it, data with semantically different wordings is generated for the user query data and for the corresponding reply data; the preset description template contains multiple preset, semantically different wordings for questions of the same type and for their answers. As another example, the questions and answers involved in the user query data in the historical conversation data may be definition-type questions and answers, and the preset description template may likewise target definition-type questions and answers.

As another example, a definition-type question may be a query about the definition of a domain-specific proper noun, or a query about the definition of some other object, which is not specifically limited.

To illustrate the definition type with a specific person A: "Who is A?", "Can you introduce A?", "Please introduce A", "What does A do?", and "Please briefly describe A" are all semantically different queries about A's definition information.
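Template-based augmentation of a definition-type question and answer pair could look roughly like the sketch below. The question templates reuse the example wordings given above; the answer templates, the `{name}` and `{definition}` placeholders, and the function name `augment_definition_pair` are assumptions introduced only for illustration.

```python
from itertools import product
from typing import Dict, List

# Question wordings taken from the example above; "{name}" is a placeholder.
QUESTION_TEMPLATES = [
    "Who is {name}?",
    "Can you introduce {name}?",
    "Please introduce {name}.",
    "What does {name} do?",
    "Please briefly describe {name}.",
]

# Hypothetical answer wordings, assumed here for illustration only.
ANSWER_TEMPLATES = [
    "{name} is {definition}.",
    "In short, {name} is {definition}.",
]


def augment_definition_pair(name: str, definition: str) -> List[Dict[str, str]]:
    """Expand one definition-type Q&A pair into every template combination."""
    return [
        {
            "question": q.format(name=name),
            "answer": a.format(name=name, definition=definition),
        }
        for q, a in product(QUESTION_TEMPLATES, ANSWER_TEMPLATES)
    ]
```

With five question templates and two answer templates, one historical pair yields ten training samples that differ in wording but share the same underlying definition.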

As another example, another way to generate differently worded data for the user query data in the historical conversation data and for the corresponding reply data may include: generating task information that prompts a general-purpose pre-trained language model to produce semantically different wordings of the same question and of the answer to that question; and inputting the task information together with the historical conversation data into the general-purpose pre-trained language model, so that it generates data with semantically different wordings for the user query data in the historical conversation data and for the corresponding reply data.

As another example, the "general-purpose pre-trained language model" may be the language model mentioned in S203 of this specification or another general-purpose language model, which is not specifically limited.

As an example, the language model can be trained again using the training data set above and supervised fine-tuning (SFT). SFT can be the process of constructing a supervised data set to retrain and fine-tune the language model's parameters so that they learn the knowledge and modes of expression in the relevant data. For example, a fine-tuning data set for the content security domain can be constructed (that is, the training data set above can be from the content security domain), with the data format adjusted to suit the chosen base model. The form of the model's inputs and outputs is the same during training as during application: during training, the input to the language model has the same form and structure as the prompt information in step 202, so the details are given in the later description of the prompt-information embodiments.

As an example, when the language model is an LLM, the retraining described above, that is, the fine-tuning of the language model, can be performed on the LLM using the AdaLora structure together with SFT. AdaLora can automatically allocate the number of fine-tunable parameters according to the importance of the language model's different layers and different types of parameters.

As an example, once the knowledge base and the language model have been built, they can be used to generate reply content. The usage stage of the knowledge base and the language model is illustrated below:

First, the user's current query data is obtained:

The user's current query data (query) can take various forms. As one example, the current query data may be text data; as another example, the text data may be keyword text, phrase text, or question text.

As another example, the user's current query data may be image data that contains extractable text information.

As another example, the user's current query data may be voice data that contains extractable text information.

These forms of the user's current query data are only illustrative; other forms are not excluded in practice, and no limitation is imposed.

After the current query data is obtained, target knowledge associated with it is queried from the knowledge base:

Some knowledge in the knowledge base may be time-sensitive, that is, valid only within a limited period. For example, in the domain of laws and regulations, a given provision may have a specific effective date or expiry date: Article X, Paragraph X of the Advertising Law may take effect only after day X of month X of year X. If a knowledge query considers only the content of the knowledge and not its period of validity, invalid knowledge may be retrieved, making the reply inaccurate.

To address this, target knowledge associated with the current query data can be queried from the preset knowledge base in various ways. As an example, at least one piece of knowledge in the preset knowledge base may be associated with standard time information; time-related text in the user's current query data is recognized and converted into standard time information, and the target knowledge is then queried based on the current query data and the converted standard time information. In the related art, a language model has no inherent sense of the current time, so it cannot necessarily tell which time or date a vague expression in the query, such as "today" or "this week", refers to. This embodiment can resolve such vague time expressions in the user's query, for example by converting "today" into the standard time information "April 1, 2024", so that time-related text in the user's current query data is recognized and understood more reliably.
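Resolving vague time expressions against the current date could be sketched as below. This is a minimal illustration under stated assumptions: the function name `normalize_time`, the small English vocabulary ("today", "yesterday", "tomorrow", "this week"), the Monday-to-Sunday week, and the ISO date strings used as the standard form are all choices made here, not details given in the source.

```python
import re
from datetime import date, timedelta
from typing import Dict, Tuple

# Assumed vocabulary of relative day expressions and their offsets in days.
DAY_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def normalize_time(query: str, now: date) -> Dict[str, Tuple[str, str]]:
    """Map each vague time expression found in the query to an ISO date range."""
    text = query.lower()
    resolved: Dict[str, Tuple[str, str]] = {}
    for word, offset in DAY_OFFSETS.items():
        if re.search(rf"\b{word}\b", text):
            day = now + timedelta(days=offset)
            resolved[word] = (day.isoformat(), day.isoformat())
    if re.search(r"\bthis week\b", text):
        # Assumption: a week runs from Monday to Sunday.
        monday = now - timedelta(days=now.weekday())
        sunday = monday + timedelta(days=6)
        resolved["this week"] = (monday.isoformat(), sunday.isoformat())
    return resolved
```

Passing `now` in explicitly, rather than reading the system clock inside the function, keeps the conversion deterministic and easy to test.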

Standard time information can be associated with knowledge in the preset knowledge base in various ways. As one example, the standard time information may already be carried in the knowledge; as another example, it may be attached to the knowledge while the knowledge is being stored in the knowledge base. For instance, if a piece of source data as a whole has an effective time range, the knowledge stored in every node of the tree structure built from it needs to be associated with standard time information; this can be done by associating the standard time information with the root node of that tree structure, so that all child nodes under the root node are associated with it as well. Alternatively, if only part of the knowledge in a piece of source data has an effective time range, only the nodes storing that knowledge are associated with standard time information. These association methods are only illustrative; other association methods are not excluded in practice, and no limitation is imposed.

Standard time information can take various forms. As one example, it may contain the year, month, and day, such as "April 1, 2024"; only the month or the year, such as "April" or "2024"; or a date without a year, such as "April 1". As another example, it may be precise to the hour, minute, or second, such as "13:01:01 on April 1, 2024", or it may correspond to a season, such as January to March (spring), April to June (summer), July to September (autumn), and October to December (winter). Other forms are not excluded in practice, and no limitation is imposed.

Some knowledge in the knowledge base may relate to sensitive persons; for example, particular deeds, historical events, or judicial cases may be tied to a specific sensitive person. If a knowledge query considers only the content of the knowledge and not its association with sensitive persons, sensitive knowledge may be retrieved, making the reply itself sensitive and creating considerable risk. To address this, as an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: at least one piece of knowledge in the preset knowledge base may be associated with named entity information; naming information related to named entities is recognized in the user's current query data; and the target knowledge is queried based on the current query data and the naming information recognized from it. In some dialogue scenarios, named entities are the content that most needs to be recognized in the user's query data, so a named entity recognition capability is provided to efficiently retrieve the knowledge associated with the named entities contained in the query data.

Named entity information can take various forms. As an example, it may be person-name information. In a content security dialogue scenario, for instance, the person names contained in the user's query data need particular attention, so that the name of a specific person is recognized, the knowledge associated with that person is retrieved quickly, and a reply is given efficiently. This form of named entity information is only illustrative; other forms are not excluded in practice. For example, named entity information may also describe other entities, such as a specific object, which is not specifically limited.

As an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: at least one piece of knowledge in the preset knowledge base may be associated with both standard time information and named entity information; the time-related text and the naming information related to named entities are recognized in the user's current query data; the time-related text is converted into standard time information; and the target knowledge is finally queried based on the current query data, the converted standard time information, and the naming information recognized from the current query data.
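One way to combine the two kinds of metadata during retrieval is to filter candidate knowledge by validity period and by named entity, as in the hedged sketch below. The `Knowledge` record, its field names, and the rule that knowledge without entity annotations is never excluded by the entity check are all assumptions made for illustration; the source does not specify how the two signals are combined.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Knowledge:
    """A piece of knowledge with optional time and named-entity metadata."""
    text: str
    effective_from: Optional[date] = None
    effective_to: Optional[date] = None
    entities: FrozenSet[str] = frozenset()


def filter_knowledge(
    items: Iterable[Knowledge],
    query_date: Optional[date] = None,
    query_entities: Iterable[str] = (),
) -> List[Knowledge]:
    """Keep knowledge that is valid on query_date and consistent with the entities."""
    wanted = set(query_entities)
    kept = []
    for item in items:
        if query_date is not None:
            if item.effective_from and query_date < item.effective_from:
                continue  # not yet in force on the queried date
            if item.effective_to and query_date > item.effective_to:
                continue  # already expired on the queried date
        # Assumption: entity-annotated knowledge must share at least one entity
        # with the query; unannotated knowledge passes this check.
        if wanted and item.entities and not (wanted & item.entities):
            continue
        kept.append(item)
    return kept
```

In a full pipeline this filter would typically run alongside the content-based query, so that only knowledge that is both relevant and valid for the resolved date reaches the language model.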

As an example, when querying target knowledge associated with the current query data from the preset knowledge base, because the knowledge base contains a tree structure in which each node stores one piece of knowledge, all nodes of the tree can be traversed in turn starting from the root node to retrieve the target knowledge associated with the current query data.

As an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: querying the knowledge base, at least by vector similarity, for first knowledge associated with the user's current query data, where the vector similarity between the first knowledge and the current query data is greater than or equal to a first threshold.

As an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: querying the knowledge base, by at least both vector similarity and character-level similarity, for first knowledge associated with the user's current query data, where the vector similarity between the first knowledge and the current query data is greater than or equal to the first threshold and the character-level similarity between them is greater than or equal to a second threshold. Combining the two query methods into a fused similarity query can improve query precision.

As an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: on top of the vector similarity query, or of the vector similarity query combined with the character-level similarity query, the keywords contained in the user's current query data can also be identified, and first knowledge matching those keywords is then retrieved from the knowledge base by keyword regular-expression matching. Keyword regular-expression matching can improve query precision for proper nouns in specific scenarios, for example for named entity information in the content security dialogue scenario described above; fusing it with the vector similarity query and the character-level similarity query can improve query precision further.

As an example, the fused query may work as follows: the target knowledge includes knowledge whose vector similarity with the current query data is greater than or equal to the first threshold and whose character-level similarity with the current query data is greater than or equal to the second threshold, and also includes the first knowledge that matches the keywords contained in the current query data.
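The fused selection rule just described (both similarity thresholds met, or a keyword match) can be written down as a short sketch. The function name `fused_query`, the parallel-list layout of the scores, and the use of `re.escape` to treat each keyword literally are assumptions for illustration; the similarity scores are taken as given inputs rather than computed here.

```python
import re
from typing import List, Sequence


def fused_query(
    knowledge: Sequence[str],
    vector_sim: Sequence[float],
    char_sim: Sequence[float],
    keywords: Sequence[str],
    first_threshold: float,
    second_threshold: float,
) -> List[str]:
    """Return knowledge passing both similarity thresholds or matching a keyword."""
    # One alternation pattern over the literal keywords; None when there are none.
    pattern = re.compile("|".join(re.escape(k) for k in keywords)) if keywords else None
    results = []
    for idx, text in enumerate(knowledge):
        passes_similarity = (
            vector_sim[idx] >= first_threshold and char_sim[idx] >= second_threshold
        )
        passes_keyword = bool(pattern and pattern.search(text))
        if passes_similarity or passes_keyword:
            results.append(text)
    return results
```

The keyword branch lets an exact proper-noun hit through even when its similarity scores are low, which is the case the keyword regular-expression matching is meant to cover.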

The vector similarity query can be implemented in various ways. As one example, it may be a text-to-vector (Text2vec) query: Text2vec is a neural network model for vectorized text representation that converts text into vector features, after which cosine similarity measures the relevance between the user's query and the retrieved knowledge; the specific implementation of the vector similarity query is not limited. As another example, the vector for each piece of knowledge in the knowledge base can be computed in advance and stored in the knowledge base. When the user's current query data is obtained, its vector is computed and compared with the vector of each piece of knowledge in the knowledge base to obtain the vector similarity.
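The comparison of a query vector against precomputed knowledge vectors can be sketched with plain cosine similarity, cos(a, b) = a·b / (‖a‖‖b‖). The sketch assumes the vectors have already been produced by some embedding model; the function names and the dictionary of knowledge IDs are hypothetical, and no particular Text2vec API is used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_query(
    query_vec: Sequence[float],
    knowledge_vecs: Dict[str, Sequence[float]],
    first_threshold: float,
) -> List[Tuple[str, float]]:
    """Return (knowledge_id, similarity) pairs at or above the threshold, best first."""
    scored = [
        (kid, cosine_similarity(query_vec, vec)) for kid, vec in knowledge_vecs.items()
    ]
    hits = [(kid, score) for kid, score in scored if score >= first_threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For a large knowledge base this linear scan would normally be replaced by an approximate nearest-neighbour index, but the thresholding logic stays the same.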

The character-level similarity query can be implemented in various ways. As an example, it may use the BM25 (Best Matching 25) algorithm. BM25 is an improved algorithm based on TF-IDF that measures the relevance between search terms and documents. In TF-IDF, TF is the term frequency and IDF is the inverse document frequency; together they can serve as representation features of text for information retrieval.
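A compact BM25 scorer over token lists is sketched below. It uses the common Okapi form with the non-negative IDF variant ln((N − n + 0.5)/(n + 0.5) + 1); the parameter defaults `k1=1.5` and `b=0.75` are conventional values assumed here, and treating each character as a token (for example via `list(text)`) is an assumption about how the character-level variant would be applied.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query_tokens: Sequence[str],
    docs_tokens: Sequence[Sequence[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avgdl = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: number of documents containing each token.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

The raw BM25 score is unbounded, so comparing it with the second threshold would in practice require either a threshold chosen on the same scale or some normalization step, which the source does not specify.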

As an example, another way to query target knowledge associated with the current query data from the preset knowledge base may include: the knowledge base uses a tree structure to store the multiple hierarchically related pieces of knowledge contained in the source data used to build it; a first node storing first knowledge that matches the user's current query data is located in the tree structure; a second node that has the same parent node as the first node is then obtained from the tree structure; and the second knowledge stored in the second node is obtained. Because the nodes holding the first knowledge and the second knowledge share a parent node, the two pieces of knowledge are at the same level of the hierarchy, so the topological relationship with the sibling nodes of the matched node can be used to associate further knowledge and broaden the retrieval.

As another example, the target knowledge associated with the current query data may be queried from the preset knowledge base as follows. The knowledge base uses a tree structure to store the multiple pieces of hierarchically related knowledge contained in the source data from which the knowledge base is built. In one variant, the first node storing the first knowledge that matches the user's current query data is located in the tree structure, the third node that is the parent node of the first node is then obtained, and the third knowledge stored in the third node is retrieved. In another variant, the first node is located in the same way, the fourth node that is a child node of the first node is then obtained, and the fourth knowledge stored in the fourth node is retrieved. In a further variant, both the third node (the parent of the first node) and the fourth node (a child of the first node) are obtained, and the third knowledge and the fourth knowledge are both retrieved. Because the node holding the third knowledge is the parent of the node holding the first knowledge, and the node holding the fourth knowledge is its child, the topological relationship of the parent or child nodes of the node that matched the current query data can be used to associate further knowledge and to expand the knowledge retrieval query.
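The parent and child variants above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class, the function name `parent_and_child_knowledge`, its two flags, and the sample data are hypothetical, and the first node is assumed to have been found by an earlier matching step.

```python
class Node:
    """A knowledge-tree node (hypothetical layout); `knowledge` is its stored text."""

    def __init__(self, knowledge, parent=None):
        self.knowledge = knowledge
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def parent_and_child_knowledge(first_node, use_parent=True, use_children=True):
    """Collect the third knowledge (parent) and the fourth knowledge (children)."""
    third = None
    fourth = []
    if use_parent and first_node.parent is not None:
        third = first_node.parent.knowledge
    if use_children:
        fourth = [child.knowledge for child in first_node.children]
    return third, fourth


# Sample tree: law -> chapter -> article -> clause (illustrative data only).
law = Node("Advertising Law")
chapter = Node("Chapter X", parent=law)
article = Node("Article X", parent=chapter)  # assumed to match the query
clause = Node("Clause X", parent=article)

third, fourth = parent_and_child_knowledge(article)
print(third, fourth)
```

The two flags correspond to the "and/or" in the text: either direction can be switched off when only the parent or only the children are wanted.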

Referring to FIG. 6, the hierarchical relationship between the above knowledge and the nodes to which it belongs is described below by way of example:

As an example, the second knowledge that has a hierarchical relationship with the first knowledge may be knowledge at the same level as the first knowledge, that is, the nodes to which the second knowledge and the first knowledge belong are sibling nodes with the same parent node. Alternatively, the second knowledge may be the third knowledge described above, whose node is the parent of the node to which the first knowledge belongs, or the fourth knowledge described above, whose node is a child of that node. The second knowledge may also comprise any one or all of these three kinds of knowledge: the knowledge stored in a sibling node (the second knowledge above), in the parent node (the third knowledge above), and in a child node (the fourth knowledge above) of the node to which the first knowledge belongs. It should be noted that this description of the hierarchical relationship is merely illustrative; other specific implementations are possible in practice, and no limitation is imposed here.

After the target knowledge has been retrieved, prompt information containing the user's current query data and the retrieved target knowledge can be generated.

The prompt information can be generated in several ways. As an example, identity information corresponding to the language model may be obtained based on the target knowledge, where the identity information instructs the language model to reply under a preset identity; prompt information containing the user's current query data, the target knowledge, and the identity information is then generated. Including identity (role) information in the prompt instructs the language model to reply in a specific role, which improves the accuracy of its replies in a specific domain.

The preset identity can be implemented in several ways. As an example, it may be a reply role corresponding to the domain to which the target knowledge belongs. For instance, when the target knowledge belongs to the content security domain, the preset identity may be a review assistant or a question-answering assistant for content security. As a more detailed example, when the target knowledge belongs to the advertising law field within the content security domain, the preset identity may be "advertising law expert". The specific implementation of the preset identity is not limited.

As an example, another way of generating the prompt information is to generate prompt information containing the user's current query data, the target knowledge, the identity information corresponding to the language model, and preset answer-restriction information. The answer-restriction information instructs the language model to reply within the answer scope represented by the target knowledge. This limits the scope of the language model's reply, keeps the model from fabricating reply content as far as possible, and further improves reply accuracy.

The prompt information is illustrated below.

As an example, the prompt information may contain one or more of the following:

① The user's current query data:

For example, text such as “the content of person A's amendment to the Advertising Law on date B” or “whether act D performed on date C violates the Advertising Law”. This can be adjusted flexibly as needed in practice and is not limited in this embodiment.

② Target knowledge:

For example, text such as “The amendment is as follows: Chapter X, Article X, Clause X of the Advertising Law was amended; the content before the amendment was XX, and the content after the amendment is XXX” or “The act violates Chapter X, Article X, Clause X of the Advertising Law and also violates Chapter X, Article X, Clause X; act D did not violate these provisions before date C but violates them after date C”. This can be adjusted flexibly as needed in practice and is not limited in this embodiment.

③ Identity information:

For example, text such as “You are a question-answering assistant in the content security field”, “You are a review assistant in the content security field”, or “You are an advertising law expert”. This can be adjusted flexibly as needed in practice and is not limited in this embodiment.

④ Preset answer-restriction information:

For example, text such as “Please answer based on the target knowledge in the prompt text”, “Please answer within the scope of the specific provisions of the Advertising Law involved”, or “Please answer within the date range of the amendments to the Advertising Law”. This can be adjusted flexibly as needed in practice and is not limited in this embodiment.

It should be noted that the above ways of generating the prompt information are merely illustrative; other ways of generating it are possible in practice, and no specific limitation is imposed.
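The four components listed above (query data, target knowledge, identity information, answer-restriction information) can be assembled into one prompt string along these lines. The function name `build_prompt`, the section labels, and the ordering of the parts are assumptions made for this sketch; the text does not prescribe a layout.

```python
def build_prompt(query, target_knowledge, identity=None, restriction=None):
    """Assemble prompt text from the optional and required components.

    The ordering (identity, restriction, knowledge, query) is an assumption
    of this sketch, not something the description fixes.
    """
    parts = []
    if identity:
        parts.append(identity)
    if restriction:
        parts.append(restriction)
    knowledge_lines = "\n".join(f"- {item}" for item in target_knowledge)
    parts.append("Target knowledge:\n" + knowledge_lines)
    parts.append("User query:\n" + query)
    return "\n\n".join(parts)


prompt = build_prompt(
    query="Does act D performed on date C violate the Advertising Law?",
    target_knowledge=[
        "Chapter X, Article X, Clause X: XX",   # first knowledge (placeholder text)
        "Chapter X, Article X, Clause X: XXX",  # hierarchically related knowledge
    ],
    identity="You are an advertising law expert.",
    restriction="Please answer based on the target knowledge in the prompt text.",
)
print(prompt)
```

Leaving `identity` and `restriction` optional mirrors the text, where the simplest prompt carries only the query and the target knowledge and the other two parts are added in the further variants.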

After the prompt information has been generated, it can be input into the pre-trained language model described above, and the language model generates the reply content corresponding to the prompt information.

Referring to FIG. 7, which shows a reply content generation framework built according to the embodiments of this specification. As an example, the framework may be built on a Retrieval-Augmented Generation (RAG) framework and can be divided into three stages: knowledge base construction, knowledge retrieval (query), and large-model inference and generation. All three stages can be deployed on the user's private server, which helps protect the privacy of user information and domain knowledge data.

Corresponding to the above method embodiments, the embodiments of this specification further provide a reply content generation apparatus. As shown in FIG. 8, the apparatus may include:

a query module 801, configured to acquire the user's current query data and, based on the current query data, query a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge includes first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and the source data used to construct the knowledge base contains multiple pieces of knowledge having a hierarchical relationship;

a generation module 802, configured to generate prompt information containing the current query data and the target knowledge, and to input the prompt information into a pre-trained language model, the language model generating the reply content corresponding to the prompt information.

As an example, the query module 801 is specifically configured to identify time-related text in the current query data, convert the time-related text into standard time information, and query the target knowledge based on the current query data and the converted standard time information.
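One way to picture the time normalization step is the sketch below, which finds explicit calendar dates in a query and rewrites them in a single canonical form. The choice of ISO 8601 (`YYYY-MM-DD`) as the "standard time information", the regular expression, and the function name `normalize_dates` are all assumptions of this example; relative expressions such as "last year" are outside its scope.

```python
import re
from datetime import date

# Matches dates such as 2021/4/29, 2021-04-29, 2021.4.29 and 2015年9月1日.
_DATE_PATTERN = re.compile(
    r"(\d{4})\s*[-/.年]\s*(\d{1,2})\s*[-/.月]\s*(\d{1,2})\s*日?"
)


def normalize_dates(query):
    """Return every valid date found in `query` as an ISO 8601 string."""
    results = []
    for year, month, day in _DATE_PATTERN.findall(query):
        try:
            results.append(date(int(year), int(month), int(day)).isoformat())
        except ValueError:
            # Skip digit groups that look like a date but are not one (e.g. month 13).
            continue
    return results


found = normalize_dates("Was the law amended on 2021/4/29 or on 2015年9月1日?")
print(found)
```

Once the query's dates share a format with the time information attached to knowledge entries, the two can be compared directly, for example as strings or as `date` objects.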

As an example, at least one piece of knowledge in the knowledge base is associated with named entity information, and the query module 801 is specifically configured to identify naming information related to a named entity in the current query data and to query the target knowledge based on the current query data and the naming information.

As an example, the query module 801 is specifically configured to query the knowledge base, at least by vector similarity search, for the first knowledge associated with the current query data, where the vector similarity between the first knowledge and the current query data is greater than or equal to a first threshold.

As an example, the query module 801 is specifically configured to query the knowledge base for the first knowledge separately by at least vector similarity search and character-level similarity search, where the vector similarity between the first knowledge and the current query data is greater than or equal to the first threshold, and the character-level similarity between the first knowledge and the current query data is greater than or equal to a second threshold.
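The two-threshold matching described above might look like the following sketch. Several things here are assumptions rather than details from the text: cosine similarity stands in for "vector similarity", `difflib.SequenceMatcher` stands in for "character-level similarity", the embedding vectors are supplied by the caller, the threshold values are arbitrary, and an entry is kept only when both conditions hold (taking the union of the two result sets would be another reading).

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def char_similarity(a, b):
    """Character-level similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def match_first_knowledge(query_text, query_vec, entries,
                          first_threshold=0.8, second_threshold=0.3):
    """Keep entries whose vector AND character-level similarity meet the thresholds.

    `entries` is a list of (text, embedding) pairs; embeddings are assumed to
    come from some external embedding model.
    """
    hits = []
    for text, vec in entries:
        if (cosine(query_vec, vec) >= first_threshold
                and char_similarity(query_text, text) >= second_threshold):
            hits.append(text)
    return hits


# Toy two-dimensional embeddings, for illustration only.
entries = [
    ("advertising law article 9", [0.9, 0.1]),
    ("food safety rules", [0.0, 1.0]),
]
hits = match_first_knowledge("advertising law", [1.0, 0.0], entries)
print(hits)
```

Vector search tends to catch paraphrases while character-level search catches exact terms such as article numbers, which is a common motivation for combining the two.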

As an example, the query module 801 is specifically configured to identify the keywords contained in the current query data and to query the knowledge base, by keyword regular-expression matching, for the first knowledge that matches those keywords.
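Keyword regular-expression matching can be sketched as below. The keyword identification step is assumed to have happened already, so the keywords are passed in directly; the function name `keyword_regex_search`, the case-insensitive flag, and the sample entries are assumptions of this example.

```python
import re


def keyword_regex_search(keywords, knowledge_base):
    """Return the knowledge entries that contain at least one of the keywords."""
    if not keywords:
        return []
    # Escape each keyword so characters such as "." or "(" are matched literally.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [entry for entry in knowledge_base if pattern.search(entry)]


knowledge_base = [
    "Advertising Law, Chapter X, Article X: XX",
    "Food Safety Law, Chapter X: XX",
    "Notes on the advertising law amendment",
]
matches = keyword_regex_search(["Advertising Law"], knowledge_base)
print(matches)
```

Compiling one alternation pattern keeps the scan to a single pass per entry, and `re.escape` prevents a keyword from being interpreted as regex syntax.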

As an example, the query module 801 is specifically configured to obtain, based on the target knowledge, identity information corresponding to the language model, where the identity information instructs the language model to reply under a preset identity, and to generate prompt information containing the current query data, the target knowledge, and the identity information.

As an example, the generation module 802 is specifically configured to generate prompt information containing the current query data, the target knowledge, the identity information, and preset answer-restriction information, where the answer-restriction information instructs the language model to reply within the answer scope represented by the target knowledge.

As an example, the training process of the language model includes at least:

acquiring historical conversation data and, for the user query data in the historical conversation data and the reply data corresponding to that user query data, generating data in different phrasings respectively, to obtain a training data set;

training the language model again with at least the training data set.

As an example, the knowledge base uses a tree structure to store the multiple pieces of knowledge having a hierarchical relationship; the query module 801 is specifically configured to locate, in the tree structure, the first node storing the first knowledge that matches the current query data, to obtain from the tree structure a second node having the same parent node as the first node, and to retrieve the second knowledge stored in the second node.

As an example, the knowledge base uses a tree structure to store the multiple pieces of knowledge having a hierarchical relationship; the query module 801 is specifically configured to locate, in the tree structure, the first node storing the first knowledge that matches the current query data; to obtain from the tree structure a third node that is the parent node of the first node and retrieve the third knowledge stored in the third node; and/or to obtain from the tree structure a fourth node that is a child node of the first node and retrieve the fourth knowledge stored in the fourth node.

As an example, the source data includes a document, and the knowledge base is constructed as follows: the text in the document is recognized and converted into data in Markdown format; a tree structure is built from the Markdown data, yielding a knowledge base containing the tree structure.
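The second half of this construction, turning Markdown data into a tree, can be sketched with a heading stack. The sketch assumes that heading depth (`#`, `##`, ...) is what encodes the hierarchy and that body lines belong to the nearest preceding heading; the `Node` class and `build_tree` function are hypothetical names, text recognition is assumed to have produced the Markdown already, and fenced code blocks inside the Markdown are not handled.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


class Node:
    """A tree node for one Markdown heading and the body text under it."""

    def __init__(self, title, level, parent=None):
        self.title = title
        self.level = level      # 0 for the synthetic root, 1-6 for headings
        self.text = []          # body lines directly under this heading
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_tree(markdown):
    """Build a heading tree; deeper headings become children of shallower ones."""
    root = Node("ROOT", 0)
    stack = [root]  # path from the root to the most recent heading
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Pop until the top of the stack is a strictly shallower heading.
            while stack[-1].level >= level:
                stack.pop()
            stack.append(Node(match.group(2), level, parent=stack[-1]))
        elif line.strip():
            stack[-1].text.append(line.strip())
    return root


sample = """# Advertising Law
## Chapter 1
Article 1 text.
## Chapter 2
### Section A
Article 9 text.
"""
tree = build_tree(sample)
law = tree.children[0]
print([chapter.title for chapter in law.children])
```

Because every node records its parent and children, the sibling, parent, and child lookups described earlier become direct attribute accesses on the resulting tree.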

The embodiments of this specification further provide a computer program product including a computer program which, when executed by a processor, implements the reply content generation method described in any of the above embodiments.

The embodiments of this specification further provide an electronic device. As shown in FIG. 9, the electronic device includes:

a processor 901;

a memory 902 for storing processor-executable instructions;

wherein the processor 901 is configured to implement the reply content generation method described in any of the above embodiments.

The embodiments of this specification further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the reply content generation method described in any of the above embodiments.

Since the apparatus embodiments essentially correspond to the method embodiments, the relevant parts of the method embodiments may be referred to for details. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. A person of ordinary skill in the art can understand and implement it without creative effort.

The above embodiments can be applied to one or more computer devices. A computer device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be any electronic product capable of human-computer interaction with a user, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol television (IPTV), a smart wearable device, and the like.

The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud-computing-based cloud consisting of a large number of hosts or network servers.

The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.

The user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this disclosure are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the relevant data must comply with the applicable laws, regulations, and standards of the relevant countries and regions, and a corresponding operation entry is provided for the user to grant or refuse authorization.

The division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, all such variations fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm or process also falls within the protection scope of this application.

Descriptions such as "specific examples" or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of this specification. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention set out herein. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art that are not set out in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.

It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.

The above are merely preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (12)

1. A language model-based reply content generation method, comprising:
acquiring current query data of a user, and querying, based on the current query data, a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge comprises first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and source data used to construct the knowledge base comprises multiple pieces of knowledge having a hierarchical relationship;
generating prompt information containing the current query data and the target knowledge;
inputting the prompt information into a pre-trained language model, the language model generating reply content corresponding to the prompt information.
2. The method of claim 1, wherein the knowledge base stores the multiple pieces of knowledge having a hierarchical relationship in a tree structure;
the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
querying the tree structure for a first node that stores first knowledge matching the current query data;
acquiring, from the tree structure, a second node having the same parent node as the first node, and acquiring second knowledge stored by the second node.
3. The method of claim 1 or 2, wherein the knowledge base stores the multiple pieces of knowledge having a hierarchical relationship in a tree structure;
the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
querying the tree structure for a first node that stores first knowledge matching the current query data;
acquiring, from the tree structure, a third node that is the parent node of the first node, and acquiring third knowledge stored by the third node, and/or acquiring, from the tree structure, a fourth node that is a child node of the first node, and acquiring fourth knowledge stored by the fourth node.
4. The method of claim 1, wherein the source data comprises a document, and the knowledge base is constructed by:
recognizing the text in the document, and converting the recognized text into data in Markdown format;
constructing a tree structure based on the data in Markdown format to obtain a knowledge base containing the tree structure;
and/or, at least one piece of knowledge in the knowledge base is associated with standard time information, and the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
identifying time-related text in the current query data and converting the time-related text into standard time information;
querying the target knowledge based on the current query data and the converted standard time information;
and/or, at least one piece of knowledge in the knowledge base is associated with named entity information, and the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
identifying naming information related to a named entity in the current query data, and querying the target knowledge based on the current query data and the naming information.
5. The method of claim 1, wherein the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
querying the knowledge base, by vector similarity search, for first knowledge associated with the current query data, wherein the vector similarity between the first knowledge and the current query data is greater than or equal to a first threshold.
6. The method of claim 5, wherein the querying of the preset knowledge base for the target knowledge associated with the current query data comprises:
querying the knowledge base for the first knowledge separately by at least vector similarity search and character-level similarity search, wherein the vector similarity between the first knowledge and the current query data is greater than or equal to the first threshold, and the character-level similarity between the first knowledge and the current query data is greater than or equal to a second threshold.
7. The method of claim 5, wherein the querying of the preset knowledge base for the target knowledge associated with the current query data further comprises:
identifying keywords contained in the current query data;
querying the knowledge base, by keyword regular-expression matching, for first knowledge matching the keywords contained in the current query data.
8. The method of claim 1, wherein the generating of prompt information containing the current query data and the target knowledge comprises:
acquiring, based on the target knowledge, identity information corresponding to the language model, wherein the identity information instructs the language model to reply under a preset identity;
generating prompt information containing the current query data, the target knowledge, and the identity information.
9. The method of claim 8, wherein the generating of prompt information containing the current query data, the target knowledge, and the identity information comprises:
generating prompt information containing the current query data, the target knowledge, the identity information, and preset answer-restriction information, wherein the answer-restriction information instructs the language model to reply within the answer scope represented by the target knowledge.
10. The method of claim 1, wherein the training process of the language model comprises at least:
acquiring historical conversation data, and generating data in different phrasings for the user query data in the historical conversation data and for the reply data corresponding to the user query data, respectively, to obtain a training data set;
training the language model again with at least the training data set.
11. A language model-based reply content generation apparatus, the apparatus comprising:
a query module configured to: acquire current query data of a user, and query, based on the current query data, a preset knowledge base for target knowledge associated with the current query data, wherein the target knowledge comprises first knowledge matching the current query data and second knowledge having a hierarchical relationship with the first knowledge, and source data used to construct the knowledge base comprises multiple pieces of knowledge having a hierarchical relationship;
a generation module configured to: generate prompt information containing the current query data and the target knowledge; and input the prompt information into a pre-trained language model, the language model generating reply content corresponding to the prompt information.
12. The apparatus of claim 11, wherein the knowledge base stores the multiple pieces of knowledge in a tree structure; the query module is further configured to: query the tree structure for a first node that stores first knowledge matching the current query data; acquire, from the tree structure, a second node having the same parent node as the first node, and acquire second knowledge stored by the second node;
and/or, the query module is further configured to: query the tree structure for a first node that stores first knowledge matching the current query data; acquire, from the tree structure, a third node that is the parent node of the first node, acquire third knowledge stored by the third node, acquire, from the tree structure, a fourth node that is a child node of the first node, and acquire fourth knowledge stored by the fourth node;
and/or, the source data comprises a document, and the knowledge base is constructed by: recognizing the text in the document, and converting the recognized text into data in Markdown format; constructing a tree structure based on the data in Markdown format to obtain a knowledge base containing the tree structure;
and/or, at least one piece of knowledge in the knowledge base is associated with standard time information, and the query module is further configured to: identify time-related text in the current query data and convert the time-related text into standard time information; query the target knowledge based on the current query data and the converted standard time information;
and/or, at least one piece of knowledge in the knowledge base is associated with named entity information, and the query module is further configured to: identify naming information related to a named entity in the current query data, and query the target knowledge based on the current query data and the naming information;
and/or, the query module is further configured to: query the knowledge base, by vector similarity search, for first knowledge associated with the current query data, wherein the vector similarity between the first knowledge and the current query data is greater than or equal to a first threshold.
CN202410891489.XA 2024-07-03 2024-07-03 Method and device for generating reply content based on language model Pending CN118520095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410891489.XA CN118520095A (en) 2024-07-03 2024-07-03 Method and device for generating reply content based on language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410891489.XA CN118520095A (en) 2024-07-03 2024-07-03 Method and device for generating reply content based on language model

Publications (1)

Publication Number Publication Date
CN118520095A (en) 2024-08-20

Family

ID=92281164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410891489.XA Pending CN118520095A (en) 2024-07-03 2024-07-03 Method and device for generating reply content based on language model

Country Status (1)

Country Link
CN (1) CN118520095A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118917305A (en) * 2024-10-11 2024-11-08 杭州谐云科技有限公司 RAG system optimization method, system, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342955A (en) * 2021-06-29 2021-09-03 南京星云数字技术有限公司 Question and answer sentence processing method and device and electronic equipment
CN117194646A (en) * 2023-10-24 2023-12-08 Oppo广东移动通信有限公司 Question and answer method and device and electronic equipment
CN117520523A (en) * 2023-12-29 2024-02-06 中邮消费金融有限公司 Data processing method, device, equipment and storage medium
CN117874204A (en) * 2024-01-16 2024-04-12 西安交通大学 Knowledge question-answering method, system, storage medium and computer equipment
CN117952200A (en) * 2023-12-27 2024-04-30 北京邮电大学 A method and system for constructing knowledge graph and personalized learning path
CN118093828A (en) * 2024-03-20 2024-05-28 中国联合网络通信集团有限公司 Question and answer method, system, device and medium

Similar Documents

Publication Publication Date Title
US12223404B2 (en) Iterative attention-based neural network training and processing
RU2745632C1 (en) Automated response server device, terminal device, response system, response method and program
US9519681B2 (en) Enhanced knowledge repository
US10795919B2 (en) Assisted knowledge discovery and publication system and method
US11132610B2 (en) Extracting structured knowledge from unstructured text
US10853396B2 (en) Intelligent natural language query processor
US9098492B2 (en) Knowledge repository
US20200327432A1 (en) Intelligent communication manager and summarizer
CN118410152B (en) Information processing method, question-answering method and question-answering system
CN118277588B (en) Query request processing method, electronic device and storage medium
JP2022523601A (en) Systems and methods for adaptive question answering
CN113407688B (en) Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN118520095A (en) Method and device for generating reply content based on language model
CN120162326A (en) A personalized interaction enhancement method and system based on large language model
CN119358657A (en) Method, device and storage medium for selecting knowledge base
Majid et al. Ontology-Based System for Educational Program Counseling.
CN118897910A (en) A method, device, equipment and medium for generating answers based on information retrieval
CN117217309A (en) Information processing method and device based on semantic understanding and computer equipment
Chiu et al. Using rough set theory to construct e-learning faq retrieval infrastructure
KR102790504B1 (en) Method for generating structured report through generative ai and the table of contents creation process
CN118535715B (en) Automatic reply method, equipment and storage medium based on tree structure knowledge base
Keltoum et al. An ontology driven question answering system for fatawa retrieval
JP7132576B2 (en) Security ID Conversation Search System
CN120687547A (en) Information determination method, device, electronic device, storage medium and program product
CN120541172A (en) A summarizing question-answering optimization method for private domain multimodal knowledge graph based on a large model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20240820