CN118964549A - A method for constructing multi-role intelligent agents based on large language models - Google Patents
- Publication number
- CN118964549A (Application CN202410953944.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- character
- role
- models
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
Provided is a method for constructing a multi-role intelligent agent based on a large language model, comprising: receiving multiple groups of sample data corresponding to different roles, where each group of sample data comprises interaction record data, technical terms, question-answering data, and operation processing data; converting each data element in each group of sample data into natural language texts using a sequence-to-sequence model, performing context modeling on all the natural language texts to establish topological relations among them, and obtaining multiple training data groups according to those topological relations, where each training data group contains a plurality of strongly related natural language texts; and fine-tuning the large language model with each training data group to obtain a corresponding role model, then integrating the role models into a multi-role intelligent agent. The scheme of the invention realizes automatic and rapid construction of a multi-role intelligent agent, enabling the agent to cope with the interaction requirements of a variety of different scenes.
Description
Technical Field
The invention relates to the technical field of psychological assessment, in particular to a method for constructing a multi-role intelligent agent based on a large language model.
Background
In the field of artificial intelligence, an "agent" is a system that can perceive its environment and make decisions based on the perceived information to achieve a specific goal. An agent may be simple (e.g., a rule-based system) or complex (e.g., an intelligent system with learning capabilities).
By fine-tuning a large language model on data for a specific task (corresponding to a specific role) to improve its performance on that task, an agent for the corresponding role can be obtained. Agents with multiple roles are also widely needed, for example in service robots for business premises such as banks and hospitals, in intelligent customer service, and in game AI. Agents in these application scenarios must support multiple roles so that they can switch roles among multiple specific tasks and thereby provide a better user experience.
However, construction methods for multi-role agents have been insufficiently researched in the prior art, so existing multi-role agents cannot meet users' actual needs.
Disclosure of Invention
In order to solve the technical problems, the invention provides a method for constructing a multi-role intelligent agent based on a large language model.
A method of constructing a multi-role agent based on a large language model, the method comprising the steps of:
Receiving a plurality of groups of sample data corresponding to different roles, wherein each group of sample data comprises interaction record data, technical terms, question-answering data, and operation processing data;
Converting each data element in each group of sample data into natural language texts using a sequence-to-sequence model, performing context modeling on all the natural language texts to establish topological relations among them, and obtaining a plurality of training data groups according to the topological relations, wherein each training data group comprises a plurality of strongly related natural language texts;
And fine-tuning the large language model with each training data group to obtain a corresponding role model, and integrating each role model into a multi-role intelligent agent.
In some embodiments, the large language model is a GPT model or a BERT model.
In some embodiments, said fine-tuning the large language model using each of said training data sets comprises:
Inputting each training data set into the large language model, and outputting a prediction feature vector by the large language model;
Inputting the predicted feature vector into an LDA classifier to obtain a prediction result, calculating the deviation between the prediction result and the corresponding true value, updating the parameters of the large language model according to the deviation, and sequentially inputting the remaining training data groups until the deviation converges.
In some embodiments, integrating each role model into the multi-role agent comprises:
Performing scene prediction on all natural language texts corresponding to each role model using a ResNet model to obtain a plurality of predicted scenes;
Analyzing the association strength of the predicted scenes among the role models, and constructing a scene association network of the role models according to the association strength;
Integrating each role model into the multi-role agent based on the scene association network.
In some embodiments, the multi-role agent works as follows:
The decision maker in the multi-role agent receives interactive content input by the user and performs semantic understanding on it to obtain a scene type and a corresponding confidence probability;
A matching calculation is performed between the scene type and the predicted scenes of each role model, yielding a plurality of matched role models and corresponding matching values; the role model with the largest matching value is determined as the target role model, and a specified number of candidate role models is screened from the remaining matches according to the confidence probability;
During interaction with the user, both the target role model and the candidate role models respond to the user's interactive content, but only the target role model's response content is output; when the target role model's response content is empty, the response content of any candidate role model is output.
In some embodiments, performing the matching calculation between the scene type and the predicted scenes of each role model, thereby obtaining the matched role models and corresponding matching values, comprises:
Calculating the semantic matching degree between the scene type and each predicted scene of a role model, counting the predicted scenes whose semantic matching degree exceeds a first threshold, and determining that count as the role model's matching value;
Screening out role models whose matching values are below a second threshold, the remaining role models being the matched role models.
The invention has the following beneficial effects: its scheme realizes automatic and rapid construction of a multi-role intelligent agent, enabling the agent to cope with the interaction requirements of a variety of different scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; a person skilled in the art may obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a method for constructing a multi-role agent based on a large language model according to an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following detailed description, which illustrates certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In addition, the technical features of the different embodiments of the present application described below may be combined with each other as long as they do not conflict.
As shown in fig. 1, the embodiment of the invention discloses a method for constructing a multi-role intelligent agent based on a large language model, which comprises the following steps:
Receiving a plurality of groups of sample data corresponding to different roles, wherein each group of sample data comprises interaction record data, technical terms, question-answering data, and operation processing data;
Converting each data element in each group of sample data into natural language texts using a sequence-to-sequence model, performing context modeling on all the natural language texts to establish topological relations among them, and obtaining a plurality of training data groups according to the topological relations, wherein each training data group comprises a plurality of strongly related natural language texts;
And fine-tuning the large language model with each training data group to obtain a corresponding role model, and integrating each role model into a multi-role intelligent agent.
The invention first collects multiple groups of sample data from different roles, such as assistant/advisor, teacher, or customer-service agent; corresponding examples include the work data of a bank or hospital guidance assistant, of a virtual teacher in a learning machine, or of an intelligent customer-service agent in an application. The sample data includes interaction record data (interactive dialogue records in text or voice form), technical terms, question-answering data, and operation processing data (processing operations related to the interactive dialogue and aimed at specific problems, such as outputting the answer to a question or turning on an air conditioner); the data thus spans unstructured text data, voice data, and operation processing data as well as structured technical terms and question-answering data. Semantic association analysis is then performed on all the natural language texts using a context-modeling mechanism, constructing a topological relation network over all the texts in which the distance between texts reflects their semantic association strength. The texts are grouped according to this association strength, and each group corresponds to one type of interaction, such as mathematical, English, or geography question-and-answer in the teacher role. This yields a plurality of training data groups, each group corresponding to one role. The training data groups belonging to each role are used to fine-tune a large language model to obtain the corresponding role model, and finally all role models are integrated and packaged to construct the multi-role agent.
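The grouping step described above can be sketched as a graph construction over the texts: link any two texts whose semantic association exceeds a threshold, then take the connected components as the training data groups. The sketch below is a minimal illustration only — the word-overlap (Jaccard) score stands in for the patent's context-modeling mechanism, and the `group_texts` name and the 0.3 threshold are assumptions, not part of the disclosure.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Toy semantic-association score: word-set Jaccard similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def group_texts(texts, threshold=0.3):
    """Link texts whose association exceeds the threshold, then return
    the connected components as training data groups."""
    adj = {i: set() for i in range(len(texts))}
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(texts[i], texts[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    groups, seen = [], set()
    for start in range(len(texts)):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(texts[node])
            stack.extend(adj[node] - seen)
        groups.append(comp)
    return groups

texts = [
    "what is the derivative of x squared",
    "the derivative of x squared is two x",
    "route planning from the station to the museum",
]
print(group_texts(texts, threshold=0.3))
```

With these toy inputs the two calculus texts fall into one group and the route-planning text into another, mirroring how one group per interaction type is formed.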
In some embodiments, the large language model is a GPT model or a BERT model.
In this embodiment, the conversion of sample data into natural language text is a simple, single-purpose function, so the invention preferably uses a sequence-to-sequence (Seq2Seq) model to reduce the conversion's computational cost. Such a model uses an encoder-decoder architecture: the input non-natural-language data is encoded into a sequence of symbols, which are then decoded to produce natural language text.
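For illustration, the conversion can be approximated with templates — a deliberately crude stand-in for the Seq2Seq decoder, since the patent does not specify the model's architecture or training. The `record_to_text` name, the `type`/`trigger`/`action` field names, and the templates themselves are all assumptions.

```python
def record_to_text(record: dict) -> str:
    """Template stand-in for the Seq2Seq encoder-decoder: render a
    structured sample-data element as a natural-language sentence.
    A real Seq2Seq model would decode tokens from the encoder state."""
    kind = record.get("type")
    if kind == "operation":      # operation processing data
        return (f"When the user says '{record['trigger']}', "
                f"perform the operation: {record['action']}.")
    if kind == "term":           # structured technical term
        return f"The term '{record['term']}' means: {record['definition']}."
    # default: question-answering data
    return f"Q: {record['question']} A: {record['answer']}."

print(record_to_text({"type": "operation",
                      "trigger": "it is hot in here",
                      "action": "turn on the air conditioner"}))
```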
The large language model, by contrast, serves as the basis for the role models, which will later be used for analytical computation on a variety of complex problems, so a GPT model or a BERT model is preferred. After pre-training, the large language model can be fine-tuned for specific tasks to adapt to different downstream tasks, such as text classification, question answering, or machine translation.
In some embodiments, said fine-tuning the large language model using each of said training data sets comprises:
Inputting each training data set into the large language model, and outputting a prediction feature vector by the large language model;
Inputting the predicted feature vector into an LDA classifier to obtain a prediction result, calculating the deviation between the prediction result and the corresponding true value, updating the parameters of the large language model according to the deviation, and sequentially inputting the remaining training data groups until the deviation converges.
In this embodiment, the large language model extracts feature vectors from the natural language texts in a training data group and performs semantic understanding and prediction on them, producing a predicted feature vector as its output. This output is then verified with an LDA classifier: the classifier classifies the predicted feature vector to obtain a prediction result, and because the training data group also carries the corresponding true-value label, the deviation between the prediction result and the label measures the accuracy of the predicted feature vector. The model's parameters are updated according to this deviation, and the remaining training data groups are input in turn to continue fine-tuning until the deviation converges.
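The shape of that loop — predict, score against the true value, update parameters from the deviation, stop when the deviation stops changing — can be shown with a deliberately tiny stand-in. Here a single scalar weight replaces the large language model and a direct error replaces the LDA classification step; the `fine_tune` name, learning rate, and tolerance are all assumptions for illustration only.

```python
def fine_tune(groups, lr=0.1, tol=1e-3, max_epochs=200):
    """Toy stand-in for the fine-tuning loop: a one-parameter 'model'
    whose output is compared against each group's true value; the
    squared deviation drives the update until it converges."""
    w = 0.0                            # stand-in for the LLM's parameters
    prev_dev = float("inf")
    for _ in range(max_epochs):
        dev = 0.0
        for x, y in groups:            # x: pooled input feature, y: true value
            pred = w * x               # 'predicted feature vector' -> score
            err = pred - y             # deviation of prediction from truth
            w -= lr * err * x          # parameter update from the deviation
            dev += err * err
        if abs(prev_dev - dev) < tol:  # deviation value has converged
            break
        prev_dev = dev
    return w, dev

w, dev = fine_tune([(1.0, 2.0), (2.0, 4.0)])
print(round(w, 3), round(dev, 6))
```

The weight converges toward 2.0 and the accumulated deviation toward zero, at which point training on the remaining groups would stop.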
In some embodiments, integrating each role model into the multi-role agent comprises:
Performing scene prediction on all natural language texts corresponding to each role model using a ResNet model to obtain a plurality of predicted scenes;
Analyzing the association strength of the predicted scenes among the role models, and constructing a scene association network of the role models according to the association strength;
Integrating each role model into the multi-role agent based on the scene association network.
In this embodiment, role models corresponding to different roles are obtained through the training described above, and each role model can handle the interactive services adapted to its role, such as problem solving in specific fields or device operation processes. However, interaction with a multi-role agent may span several fields: for example, a user discussing mathematical knowledge may switch to, or insert, a medical question, which requires the agent to switch to the corresponding role model. The user may give no switching prompt, so after content from a new field is input, the agent would otherwise need considerable time, or several rounds of re-interaction with the user, to determine the field of that content before it could schedule the right role model to respond, which is clearly inefficient.
To solve this problem, the method uses a ResNet model to analyze and predict scenes for all natural language texts corresponding to each role model, obtaining a plurality of predicted scenes per role model. A predicted scene is a usage scene to which the role model's interaction capabilities may be adapted: for example, role model 1 has geographic-knowledge interaction capabilities and adapts to scenes such as geography teaching and navigation, while role model 2 has planning-algorithm capabilities and adapts to scenes such as route planning within navigation; role models 1 and 2 therefore share the navigation scene and are associated with each other. In this way a scene association network of the role models is gradually established. The network also records association strength, defined as the number of identical or related predicted scenes between two role models; the larger that number, the stronger the association.
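Under the definition just given — association strength as the count of shared predicted scenes — the network reduces to a simple set-intersection computation. A minimal sketch (the `scene_association_network` name and the example scene lists are assumed for illustration):

```python
from itertools import combinations

def scene_association_network(predicted_scenes: dict) -> dict:
    """Association strength between two role models = number of
    predicted scenes they share; pairs with strength 0 get no edge."""
    edges = {}
    for a, b in combinations(predicted_scenes, 2):
        strength = len(set(predicted_scenes[a]) & set(predicted_scenes[b]))
        if strength:
            edges[(a, b)] = strength
    return edges

scenes = {
    "role_model_1": ["geography teaching", "navigation"],
    "role_model_2": ["route planning", "navigation"],
    "role_model_3": ["banking guidance"],
}
print(scene_association_network(scenes))
```

Role models 1 and 2 share the navigation scene (strength 1), while role model 3 shares no scene with either and remains unconnected, matching the example in the text.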
In some embodiments, the multi-role agent works as follows:
The decision maker in the multi-role agent receives interactive content input by the user and performs semantic understanding on it to obtain a scene type and a corresponding confidence probability;
A matching calculation is performed between the scene type and the predicted scenes of each role model, yielding a plurality of matched role models and corresponding matching values; the role model with the largest matching value is determined as the target role model, and a specified number of candidate role models is screened from the remaining matches according to the confidence probability;
During interaction with the user, both the target role model and the candidate role models respond to the user's interactive content, but only the target role model's response content is output; when the target role model's response content is empty, the response content of any candidate role model is output.
In this embodiment, when the role models are integrated into the multi-role agent, a decision maker should also be added; the decision maker schedules the appropriate role model to handle interactions with the user. The term "user" here covers human users as well as machines, electronic devices, and even other agents; the invention is not limited in this regard.
First, the user inputs interactive content to the multi-role agent, and the decision maker performs semantic understanding on it to derive the content's scene type and a corresponding confidence probability. A matching calculation between the scene type and each role model's predicted scenes then screens out several matched role models with their matching values; the best-matched role model is designated the target role model and the others candidate role models. All selected role models respond to the user's input simultaneously and produce their own response content, but the decision maker outputs only the target role model's response; only when that response is empty does it select one of the candidate role models' responses to output. That is, when the user switches to or inserts content from a new field for which the target role model has no interaction capability and thus produces no response, a candidate role model takes over the response. Although the agent has switched role models, the candidate role models computed their responses concurrently, so the interaction remains timely and smooth and the user perceives no lag.
The number of candidate role models to screen is determined by the confidence probability obtained above. Specifically, when the decision maker's confidence in the predicted scene type is high, its analysis of the field of the user's content is likely accurate, and only a few candidate role models need be set; when the confidence is low, the analysis is less reliable, and more candidate role models should be set so that one of them can continue the response if the target role model cannot. Clearly, the more candidate role models there are, the higher the success rate of a continued response.
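The decision maker's dispatch-and-fallback behavior can be sketched as follows. The patent does not fix how confidence maps to a candidate count, so the 0.8 cutoff and the `dispatch`/`respond` names here are illustrative assumptions only.

```python
def dispatch(matches, confidence, max_candidates=3):
    """matches: {role_model_name: matching_value}. The highest-scoring
    model becomes the target; the candidate count grows as the decision
    maker's confidence in the scene type falls (assumed policy)."""
    ranked = sorted(matches, key=matches.get, reverse=True)
    target, rest = ranked[0], ranked[1:]
    n = 1 if confidence >= 0.8 else max_candidates
    return target, rest[:n]

def respond(target_reply, candidate_replies):
    """Output the target model's reply; fall back to the first
    non-empty candidate reply when the target produces nothing."""
    if target_reply:
        return target_reply
    return next((r for r in candidate_replies if r), "")

target, candidates = dispatch({"rm1": 2, "rm2": 1, "rm3": 1}, confidence=0.5)
print(target, candidates)
print(respond("", ["", "candidate answer"]))
```

Because every selected model answers concurrently, the fallback in `respond` costs no extra latency — which is the patent's stated reason the user perceives no lag.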
In some embodiments, performing the matching calculation between the scene type and the predicted scenes of each role model, thereby obtaining the matched role models and corresponding matching values, comprises:
Calculating the semantic matching degree between the scene type and each predicted scene of a role model, counting the predicted scenes whose semantic matching degree exceeds a first threshold, and determining that count as the role model's matching value;
Screening out role models whose matching values are below a second threshold, the remaining role models being the matched role models.
In this embodiment, the semantic matching degree between the scene type predicted by the decision maker and each predicted scene obtained from the ResNet model is computed per role model. For example, suppose the scene type is a navigation scene; role model 1's predicted scenes include a navigation scene, an entertainment question-answering scene, and a video-screening scene; role model 2's include a weather-report scene and a geography-teaching scene; and role model 3's include a primary-knowledge-teaching scene and a banking-guidance scene. The scene type matches the navigation and entertainment question-answering scenes (music recommendation, food recommendation, and the like) of role model 1 with high semantic matching degree; it matches the weather-report scene of role model 2 highly but its geography-teaching scene poorly; and it matches both of role model 3's scenes poorly. The matching value of role model 1 is therefore 2, that of role model 2 is 1, and that of role model 3 is 0; role model 3 is screened out because its value is below the second threshold, and the remaining role models 1 and 2 are the matched role models.
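The worked example above can be reproduced directly. The numeric match degrees below are assumed stand-ins for the qualitative "high/low" judgments in the text, and the 0.5 / 1 threshold values are illustrative; only the resulting counts (2, 1, 0) come from the example itself.

```python
def matching_values(scene_type_matches, first_threshold=0.5, second_threshold=1):
    """scene_type_matches: {role_model: [semantic match degree per predicted
    scene]}. Matching value = count of predicted scenes whose match degree
    exceeds the first threshold; models below the second threshold are
    screened out."""
    values = {m: sum(1 for d in degrees if d > first_threshold)
              for m, degrees in scene_type_matches.items()}
    return {m: v for m, v in values.items() if v >= second_threshold}

degrees = {
    "role_model_1": [0.9, 0.7, 0.2],  # navigation, entertainment Q&A, video
    "role_model_2": [0.8, 0.3],       # weather report, geography teaching
    "role_model_3": [0.1, 0.2],       # primary teaching, banking guidance
}
print(matching_values(degrees))
```

The result keeps role models 1 and 2 with matching values 2 and 1, and screens out role model 3, exactly as in the example.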
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of some embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controllers or other computing devices, or some combination thereof.
The exemplary embodiments of the present disclosure described in detail above are illustrative only and are not limiting. Those skilled in the art will understand that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and such modifications should fall within the scope of the disclosure.
Claims (6)
1. A method for constructing a multi-role intelligent agent based on a large language model, characterized in that the method comprises the following steps:
Receiving a plurality of groups of sample data corresponding to different roles, wherein each group of sample data comprises interaction record data, technical terms, question-answering data, and operation processing data;
Converting each data element in each group of sample data into natural language texts using a sequence-to-sequence model, performing context modeling on all the natural language texts to establish topological relations among them, and obtaining a plurality of training data groups according to the topological relations, wherein each training data group comprises a plurality of strongly related natural language texts;
And fine-tuning the large language model with each training data group to obtain a corresponding role model, and integrating each role model into a multi-role intelligent agent.
2. The method for constructing a multi-role agent based on a large language model of claim 1, wherein: the large language model is a GPT model or a BERT model.
3. The method for constructing a multi-role agent based on a large language model of claim 2, wherein: the fine tuning of the large language model using each of the training data sets includes:
Inputting each training data set into the large language model, and outputting a prediction feature vector by the large language model;
Inputting the predicted feature vector into an LDA classifier to obtain a prediction result, calculating the deviation between the prediction result and the corresponding true value, updating the parameters of the large language model according to the deviation, and sequentially inputting the remaining training data groups until the deviation converges.
4. The method for constructing a multi-role agent based on a large language model of claim 1, wherein integrating each role model into the multi-role agent comprises:
Performing scene prediction on all natural language texts corresponding to each role model using a ResNet model to obtain a plurality of predicted scenes;
Analyzing the association strength of the predicted scenes among the role models, and constructing a scene association network of the role models according to the association strength;
Integrating each role model into the multi-role agent based on the scene association network.
5. The method for constructing a multi-role agent based on a large language model of claim 4, wherein the multi-role agent operates as follows:
a decision maker in the multi-role agent receives the interactive content input by the user and performs semantic understanding on it to obtain a scene type and a corresponding confidence probability;
a matching calculation is performed, based on the scene type, against the plurality of predicted scenes corresponding to each role model, yielding a plurality of matched role models and their matching values; the role model with the largest matching value is determined to be the target role model, and a specified number of candidate role models are screened from the remaining matches according to the confidence probability;
during interaction with the user, both the target role model and the candidate role models respond to the user's interactive content, but only the response of the target role model is output; when the response of the target role model is empty, the response of any candidate role model is output.
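The response-selection behaviour of claim 5 can be sketched as follows, assuming each role model is a callable that maps the user's input to a response string (an illustrative interface, not the patent's):

```python
def respond(user_input, target_model, candidate_models):
    """All matched models answer, but only the target's answer is surfaced;
    if the target returns an empty response, fall back to a candidate's answer."""
    # Every model responds, as claim 5 requires, even though most output is discarded.
    answers = [model(user_input) for model in [target_model] + list(candidate_models)]
    if answers[0]:
        return answers[0]                # normal case: the target model's response
    for answer in answers[1:]:
        if answer:
            return answer                # fallback: any candidate with a non-empty response
    return ""
```

Running the candidates in parallel with the target is what makes the empty-response fallback instant, at the cost of redundant inference.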
6. The method for constructing a multi-role agent based on a large language model of claim 5, wherein performing the matching calculation, based on the scene type, against the plurality of predicted scenes corresponding to each role model to obtain the plurality of matched role models and their matching values comprises:
calculating the semantic matching degree between the scene type and the plurality of predicted scenes of each role model, counting the number of predicted scenes whose semantic matching degree exceeds a first threshold, and taking that count as the matching value of the role model;
screening out the role models whose matching values are below a second threshold; the remaining role models are the plurality of matched role models.
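Claim 6's matching calculation can be sketched as below, with word-overlap (Jaccard) similarity standing in for the unspecified "semantic matching degree"; `matching_values`, `word_overlap`, and both threshold parameters are illustrative names.

```python
def matching_values(scene_type, role_scenes, similarity, first_threshold, second_threshold):
    """Count, per role model, the predicted scenes whose similarity to the detected
    scene type exceeds the first threshold; that count is the role's matching value.
    Roles whose matching value falls below the second threshold are screened out."""
    values = {}
    for role, scenes in role_scenes.items():
        value = sum(1 for s in scenes if similarity(scene_type, s) > first_threshold)
        if value >= second_threshold:
            values[role] = value
    return values

def word_overlap(a, b):
    """Jaccard similarity over word sets -- a toy 'semantic matching degree'."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)
```

In the full method, the decision maker would then pick the role with the largest value as the target role model and draw candidates from the rest.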
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410953944.4A CN118964549B (en) | 2024-07-17 | 2024-07-17 | A method for constructing multi-role intelligent agents based on large language models |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118964549A true CN118964549A (en) | 2024-11-15 |
| CN118964549B CN118964549B (en) | 2025-04-22 |
Family
ID=93382561
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410953944.4A Active CN118964549B (en) | 2024-07-17 | 2024-07-17 | A method for constructing multi-role intelligent agents based on large language models |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118964549B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120578747A (en) * | 2025-08-04 | 2025-09-02 | 江苏希地丰华项目管理集团有限公司 | Whole process engineering consultation project information interaction method and system |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080319735A1 (en) * | 2007-06-22 | 2008-12-25 | International Business Machines Corporation | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
| CN114547329A (en) * | 2022-01-25 | 2022-05-27 | 阿里巴巴(中国)有限公司 | Method for establishing pre-training language model, semantic analysis method and device |
| US20230386646A1 (en) * | 2022-05-26 | 2023-11-30 | Verily Life Sciences Llc | Combined vision and language learning models for automated medical reports generation |
| CN117216544A (en) * | 2023-05-24 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Model training method, natural language processing method, device and storage medium |
| US20240095491A1 (en) * | 2023-12-01 | 2024-03-21 | Quantiphi, Inc. | Method and system for personalized multimodal response generation through virtual agents |
| CN118012900A (en) * | 2023-12-21 | 2024-05-10 | 浙江大学 | A natural language intelligent query method and device based on multi-agent interaction |
Non-Patent Citations (2)
| Title |
|---|
| LIU Xiang; LEI Jingmin; SHANG Lei: "Campaign-level intelligent agent training system", Command Information System and Technology, no. 03, 28 June 2020 (2020-06-28) * |
| 紫气东来 (Ziqi Donglai): "LLM (XXII): Multi-agent systems in the LLM era", pages 1 - 11, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/665644399> * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118964549B (en) | 2025-04-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20250149050A1 (en) | Training method and device for audio separation network, audio separation method and device, and medium | |
| CN111680147B (en) | Data processing method, device, equipment and readable storage medium | |
| CN114281957A (en) | Natural language data query method and device, electronic equipment and storage medium | |
| CN112231373B (en) | Knowledge point data processing method, apparatus, device and computer readable medium | |
| CN111339302A (en) | Method and device for training element classification model | |
| CN118964549B (en) | A method for constructing multi-role intelligent agents based on large language models | |
| EP4009250A1 (en) | Method and device for reinforcement of multiple choice qa model based on adversarial learning techniques | |
| CN115221306B (en) | Automatic response evaluation method and device | |
| CN110991195A (en) | Machine translation model training method, device and storage medium | |
| CN115795017B (en) | Offline online fusion application method and system for dialogue system | |
| CN118133191B (en) | Target detection method and device for multi-mode data | |
| CN116596073A (en) | Natural language reasoning method, device and equipment based on reasoning path | |
| CN112069781A (en) | Comment generation method and device, terminal device and storage medium | |
| CN112685550A (en) | Intelligent question answering method, device, server and computer readable storage medium | |
| CN116956116A (en) | Text processing method and device, storage medium and electronic equipment | |
| CN118966591A (en) | Learning material recommendation method, device, electronic device and storage medium | |
| CN116842143A (en) | Dialog simulation method and device based on artificial intelligence, electronic equipment and medium | |
| CN114492592A (en) | Model training method and device | |
| CN120144724A (en) | An interactive question-answering system and question-answering method based on multi-model parallel reasoning | |
| Rao et al. | Ensemble based learning style identification using VARK. | |
| CN118170945B (en) | Post-class problem generation method and device for community video courses | |
| CN111079661A (en) | sign language recognition system | |
| CN117271877A (en) | Object recommendation method, device, equipment and medium | |
| Maheswari et al. | Educational Chatbot for Adopting Effective Teaching Learning Process using Long Short-Term Memory with Feature Extraction | |
| CN114265920B (en) | Intelligent robot conversation method and system based on signals and scenes |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||