CN114328992B

CN114328992B - Multimedia information recommendation method, device, program product, equipment and medium

Info

Publication number: CN114328992B
Application number: CN202111302377.9A
Authority: CN
Inventors: 旷宗强; 张翔宇; 黄帆
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2025-04-29
Anticipated expiration: 2041-11-04
Also published as: CN114328992A

Abstract

The invention provides a multimedia information recommendation method, device, program product, equipment and medium. The method comprises: obtaining first object feedback data for target multimedia information, and predicting, based on the first object feedback data, a first score corresponding to each first sub-object feedback data; determining, based on first object feature data of an object to be recommended, a fusion weight parameter corresponding to each first sub-object feedback data through a multi-target weight search network; and fusing the first scores corresponding to the first sub-object feedback data based on the corresponding fusion weight parameters to obtain a second score of the target multimedia information. The second score can be used to recommend multimedia information to the object to be recommended, which achieves more accurate recommendation while the multi-target weight search network requires a lower hardware cost.

Description

Multimedia information recommendation method, device, program product, equipment and medium

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a program product, a device, and a medium for recommending multimedia information.

Background

A neural network model based on machine learning can be used to recommend various videos. Artificial intelligence (Artificial Intelligence, AI) is a comprehensive technology of computer science that studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of sensing, reasoning and decision making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, such as natural language processing and machine learning/deep learning. As the technology develops, artificial intelligence is expected to be applied in more fields and to become increasingly valuable.

In the traditional technology, video recommendation requires a large number of accurately collected user operation records. For a new user, however, it is difficult to collect enough labeled samples for traditional machine learning to extract recommendable information from the data. As a result, the model easily overfits and new noise is easily introduced, which degrades the data processing effect of the model; at the same time, the hardware cost required for accurate video recommendation is high.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, a program product, a device, and a medium for recommending multimedia information, which can determine, through a multi-objective weight search network, a fusion weight parameter corresponding to feedback data of each first sub-object, and perform fusion processing on a first score corresponding to feedback data of each first sub-object, to obtain a second score of target multimedia information, so as to recommend multimedia information to an object to be recommended by using the second score.

The embodiment of the invention provides a multimedia information recommendation method, which comprises the following steps:

acquiring first object feedback data for target multimedia information, wherein the first object feedback data comprises at least two first sub-object feedback data with different feedback types;

predicting, based on the first object feedback data, a first score corresponding to each first sub-object feedback data;

determining, based on first object feature data of an object to be recommended, a fusion weight parameter corresponding to each first sub-object feedback data through a multi-target weight search network, wherein the multi-target weight search network is obtained by updating network parameters of an initial multi-target weight search network through evolutionary learning based on second object feature data of the object to be recommended and real object feedback data of a sample object for sample multimedia information;

and fusing the first scores corresponding to the first sub-object feedback data based on the corresponding fusion weight parameters to obtain a second score of the target multimedia information, wherein the second score of the target multimedia information is used for recommending multimedia information to the object to be recommended.
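The steps of the method can be illustrated with a minimal sketch. It assumes that the fusion mode is a weighted summation and stands in for the multi-target weight search network with a single linear layer followed by softmax; the function names, feature values and parameter matrix are hypothetical and are not taken from the embodiment.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fusion_weights(object_features, network_params):
    """Hypothetical stand-in for the multi-target weight search network:
    one linear layer plus softmax, giving one weight per feedback type."""
    logits = [sum(w * x for w, x in zip(row, object_features))
              for row in network_params]
    return softmax(logits)

def fuse_scores(first_scores, weights):
    """Weighted sum of the first scores (assumed fusion mode) -> second score."""
    return sum(s * w for s, w in zip(first_scores, weights))

# Illustrative values only.
first_scores = [0.30, 0.10, 0.05]      # e.g. predicted click, like, share scores
object_features = [1.0, 0.0]           # first object feature data
network_params = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]

weights = fusion_weights(object_features, network_params)
second_score = fuse_scores(first_scores, weights)
print(weights, second_score)
```

Because the weights depend on the object feature data, two objects with the same first scores can receive different second scores for the same target multimedia information.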

The embodiment of the invention also provides a multimedia information recommending device, which comprises:

The information transmission module is used for acquiring first object feedback data aiming at target multimedia information, wherein the first object feedback data comprises at least two first sub-object feedback data with different feedback types;

The information processing module is used for predicting and obtaining a first score corresponding to each first sub-object feedback data based on the first object feedback data;

The information processing module is used for determining fusion weight parameters corresponding to feedback data of each first sub-object through a multi-target weight search network based on first object feature data of an object to be recommended, wherein the multi-target weight search network is obtained by updating network parameters of an initial multi-target weight search network through an evolutionary learning mode based on second object feature data of the object to be recommended and real object feedback data of a sample object aiming at sample multimedia information;

The information processing module is configured to perform fusion processing on the first score corresponding to each first sub-object feedback data based on the fusion weight parameter corresponding to each first sub-object feedback data, so as to obtain a second score of the target multimedia information, where the second score of the target multimedia information is used to recommend the multimedia information to the object to be recommended.

In the above solution,

The information processing module is used for acquiring a training sample set, wherein the training sample set comprises the real object feedback data of the sample object aiming at sample multimedia information;

The information processing module is used for updating the network parameters of the initial multi-target weight search network through an evolution strategy algorithm by utilizing the training sample set, and obtaining the multi-target weight search network through the updated network parameters.

In the above solution,

The information processing module is used for calculating the network parameter updating direction of the initial multi-target weight search network according to the noise standard deviation parameter of the initial multi-target weight search network;

The information processing module is used for configuring a first random noise parameter for each sample object in the training sample set according to the noise standard deviation parameter;

The information processing module is configured to determine, according to the first random noise parameter, the network parameters of the initial multi-target weight search network and the noise standard deviation parameter, a fitness score corresponding to each sample object through a fitness function corresponding to the initial multi-target weight search network, and to use the fitness score corresponding to the sample object as the real object feedback data of the sample object;

the information processing module is used for sharing real object feedback data of the sample objects in all sample objects of the training sample set;

The information processing module is used for configuring a second random noise parameter for each sample object in the training sample set;

The information processing module is used for determining first gradient change parameters of different rounds of the initial multi-target weight search network according to the second random noise parameter, the network parameter of the initial multi-target weight search network and the noise standard deviation parameter;

The information processing module is used for updating the network parameters of the initial multi-target weight search network by using the first gradient change parameters of different rounds of the initial multi-target weight search network according to the network parameter updating direction of the initial multi-target weight search network.
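The update described above can be sketched as a generic evolution strategy step. This is a sketch under stated assumptions, not the embodiment's implementation: the fitness function is a toy quadratic, the names `es_step` and `toy_fitness` and all hyperparameter values are hypothetical, and the distinction between the first and second random noise parameters is not modeled (one noise vector per population member is drawn and reused).

```python
import random

def es_step(theta, fitness, rng, sigma=0.1, lr=0.02, population=50):
    """One evolution-strategy update: perturb the parameters with Gaussian
    noise scaled by sigma, score every perturbed copy, and move theta along
    the fitness-weighted average of the noise vectors."""
    dim = len(theta)
    noises, scores = [], []
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        noises.append(eps)
        scores.append(fitness(candidate))   # fitness score of this member
    # Standardize the shared scores so the step does not depend on their scale.
    mean = sum(scores) / population
    std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5
    if std == 0.0:
        std = 1.0
    grad = [0.0] * dim
    for eps, s in zip(noises, scores):
        advantage = (s - mean) / std
        for i in range(dim):
            grad[i] += advantage * eps[i]
    grad = [g / (population * sigma) for g in grad]
    return [t + lr * g for t, g in zip(theta, grad)]

def toy_fitness(params, target=(1.0, -2.0)):
    """Hypothetical fitness: higher is better, maximal at the target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

rng = random.Random(0)
theta = [0.0, 0.0]
initial = toy_fitness(theta)
for _ in range(100):
    theta = es_step(theta, toy_fitness, rng)
final = toy_fitness(theta)
print(initial, final)
```

The step needs only fitness scores, not backpropagated gradients, so the fitness function may be any measurable feedback signal.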

In the above solution,

The information processing module is used for updating network parameters of the initial multi-target weight search network through a meta-learning framework based on an evolution strategy algorithm by utilizing the training sample set, and obtaining the multi-target weight search network based on the updated network parameters.

In the above solution,

The information processing module is used for extracting first training sample sets of target task numbers from the training sample sets, wherein each first training sample set corresponds to one sample object;

configuring a support set for the initial multi-target weight search network based on at least one task number set;

calculating the network parameter updating direction of the initial multi-target weight search network according to the noise standard deviation parameter of the initial multi-target weight search network;

Configuring a third random noise parameter for each sample object in the support set according to the noise standard deviation parameter;

According to the third random noise parameter, the network parameter of the initial multi-target weight search network and the noise standard deviation parameter, determining the fitness score corresponding to each sample object in the support set through the fitness function corresponding to the initial multi-target weight search network, and taking the fitness score corresponding to the sample object as real object feedback data of the sample object;

Sharing real object feedback data of the sample objects among all sample objects of the support set;

configuring a fourth random noise parameter for each sample object in the support set;

determining second gradient change parameters of different rounds of the initial multi-target weight search network according to the fourth random noise parameter, the network parameter of the initial multi-target weight search network and the noise standard deviation parameter;

And updating the network parameters of the initial multi-target weight search network by using the second gradient change parameters of different rounds of the initial multi-target weight search network according to the network parameter updating direction of the initial multi-target weight search network to obtain the first network parameters of the initial multi-target weight search network.

In the above solution,

The information processing module is used for configuring a query set for the initial multi-target weight search network based on at least one task number set;

Configuring a fifth random noise parameter for each sample object in the query set according to the noise standard deviation parameters;

According to the fifth random noise parameter, the network parameter of the initial multi-target weight search network and the noise standard deviation parameter, determining a fitness score corresponding to each sample object in the query set through a fitness function corresponding to the initial multi-target weight search network, and taking the fitness score corresponding to the sample object as real object feedback data of the sample object;

sharing real object feedback data of the sample objects in all sample objects of the query set, and configuring a sixth random noise parameter for each sample object in the query set;

determining third gradient change parameters of different rounds of the initial multi-target weight search network according to the sixth random noise parameter, the network parameter of the initial multi-target weight search network and the noise standard deviation parameter;

And updating the network parameters of the initial multi-target weight search network by utilizing third gradient change parameters of different rounds of the initial multi-target weight search network and the first network parameters of the initial multi-target weight search network according to the network parameter updating direction of the initial multi-target weight search network to obtain second network parameters of the initial multi-target weight search network.
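The support-set and query-set updates described above can be sketched as a two-level loop. The sketch assumes a first-order scheme in which the query-set gradient is evaluated at the adapted parameters and applied to the initial parameters; this is one possible reading rather than the embodiment's exact procedure. The toy tasks, the names `es_gradient`, `meta_step` and `make_task`, and all hyperparameter values are hypothetical.

```python
import random

def es_gradient(theta, fitness, sigma, population, rng):
    """Estimate an ascent direction from fitness scores of noisy copies."""
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)])
              for eps in noises]
    mean = sum(scores) / population
    std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5
    if std == 0.0:
        std = 1.0
    grad = [0.0] * len(theta)
    for eps, s in zip(noises, scores):
        for i, e in enumerate(eps):
            grad[i] += (s - mean) / std * e
    return [g / (population * sigma) for g in grad]

def meta_step(theta, tasks, rng, sigma=0.1, inner_lr=0.02, outer_lr=0.02,
              population=30):
    """One meta-update over a batch of (support_fitness, query_fitness) tasks."""
    outer = [0.0] * len(theta)
    for support_fitness, query_fitness in tasks:
        # Support set: adapt the parameters (the "first network parameters").
        g_s = es_gradient(theta, support_fitness, sigma, population, rng)
        adapted = [t + inner_lr * g for t, g in zip(theta, g_s)]
        # Query set: gradient at the adapted parameters (first-order assumption).
        g_q = es_gradient(adapted, query_fitness, sigma, population, rng)
        outer = [o + g for o, g in zip(outer, g_q)]
    # Update the initial parameters (the "second network parameters").
    return [t + outer_lr * o / len(tasks) for t, o in zip(theta, outer)]

def make_task(target):
    """Toy task: support and query share one fitness, maximal at the target."""
    def fit(params):
        return -sum((a - b) ** 2 for a, b in zip(params, target))
    return fit, fit

def mean_query_fitness(params, tasks):
    return sum(query(params) for _, query in tasks) / len(tasks)

rng = random.Random(0)
tasks = [make_task(t) for t in ([1.0, 1.0], [1.2, 0.8], [0.8, 1.2])]
theta = [0.0, 0.0]
before = mean_query_fitness(theta, tasks)
for _ in range(100):
    theta = meta_step(theta, tasks, rng)
after = mean_query_fitness(theta, tasks)
print(before, after)
```

In this toy setting the meta-update moves the initial parameters toward a point from which each task can be reached with a small support-set adaptation.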

The embodiment of the invention also provides a computer program product, which comprises a computer program or instructions, wherein the computer program or instructions, when executed by a processor, implement the foregoing multimedia information recommendation method.

The embodiment of the invention also provides electronic equipment, which comprises:

a memory for storing executable instructions;

and a processor for implementing the foregoing data processing method when running the executable instructions stored in the memory.

The embodiment of the invention also provides a computer readable storage medium which stores executable instructions, wherein the executable instructions, when executed by a processor, implement the foregoing multimedia information recommendation method or the foregoing data processing method.

The embodiment of the invention has the following beneficial effects:

According to the embodiment of the invention, first object feedback data for the target multimedia information is obtained, and a first score corresponding to each first sub-object feedback data is predicted based on the first object feedback data. A fusion weight parameter corresponding to each first sub-object feedback data is determined through a multi-target weight search network based on the first object feature data of the object to be recommended, and the first scores are fused based on the corresponding fusion weight parameters to obtain the second score of the target multimedia information. Multimedia information can therefore be recommended accurately to the object to be recommended by using the second score, while the multi-target weight search network has stronger generalization and data processing capability and a lower hardware cost.

Drawings

Fig. 1 is a schematic view of a scenario of a multimedia information recommendation method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a composition structure of a multimedia information recommendation device according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of an alternative method for recommending multimedia information according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of an alternative method for recommending multimedia information according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a multi-objective weight search network according to an embodiment of the present invention;

Fig. 6 is a schematic diagram illustrating an alternative processing procedure of a multimedia information recommendation method according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the effect of the evolutionary strategy algorithm in an embodiment of the invention;

Fig. 8 is a schematic diagram illustrating an alternative processing procedure of a multimedia information recommendation method according to an embodiment of the present invention;

Fig. 9A is a schematic diagram illustrating an effect of a multimedia information recommendation method according to an embodiment of the present invention;

Fig. 9B is a schematic diagram illustrating an effect of a multimedia information recommendation method according to an embodiment of the present invention.

Detailed Description

The present invention will be further described in detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent, and the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

Before the embodiments of the present invention are described in further detail, the terms involved in the embodiments are explained; the following explanations apply to these terms.

1) Neural network (Neural Network, NN): an artificial neural network (Artificial Neural Network, ANN), called a neural network or neural-like network for short. In the fields of machine learning and cognitive science, it is a mathematical or computational model that mimics the structure and function of a biological neural network (the central nervous system of an animal, particularly the brain) and is used to estimate or approximate functions.

2) Model parameters: quantities that use common variables to establish the relationship between a function and its variables. In artificial neural networks, the model parameters are typically real matrices.

3) Model training: multi-class learning performed on an image data set. The model can be built with any of various deep learning frameworks, and a multi-class model is formed by combining multiple layers of CNN and other neural network layers. The input of the model is a three-channel or original-channel matrix formed by reading an image with a tool such as OpenCV, the output of the model is multi-class probabilities, and the webpage category is finally output through an algorithm such as softmax. During training, the model is driven toward the correct result through an objective function such as cross entropy.

4) Multi-task learning (Multi-Task Learning): in the field of machine learning, jointly learning and optimizing multiple related tasks at the same time can achieve better model accuracy than a single task, and the tasks help one another by sharing a representation layer. This training method is called multi-task learning, also called joint learning (Joint Learning).

5) Meta-learning (Meta Learning), also known as learning to learn (Learning to Learn), refers to the process of learning how to learn. Conventional machine learning learns a mathematical model for prediction from scratch, which is far from the human way of learning, in which historical experience (also called meta-knowledge) is accumulated and used to guide new learning tasks. Meta-learning learns the training process of different machine learning tasks, and learns how to train a model faster and better.

6) N-way K-shot: a common experimental setting for few-shot learning in the classification field. In the training stage, N categories are drawn from the training set with K samples per category, and the resulting N x K samples form a meta-task that serves as the support set (support set) of the model; a batch of samples is then drawn from the remaining data of the N categories to serve as the query set (query set) of the model. Such a task is called an N-way K-shot problem.

7) Task: the training and testing unit of a meta-learning model, consisting of a support set (support set) and a query set (query set). For example, under the 5-way 5-shot experimental setting, 5 categories are randomly selected from the data set and 5 samples are randomly selected from each category to form the support set; a certain number of samples (for example, 15) are then drawn from the same categories to form the query set, and together they form a task.
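The construction of a task from a support set and a query set can be sketched as follows. The sketch assumes that the query samples are drawn per category (15 for each of the 5 categories) and never overlap with the support samples; the data set contents and the name `sample_task` are hypothetical.

```python
import random

def sample_task(dataset, n_way, k_shot, n_query, rng):
    """Build one N-way K-shot task.

    dataset maps a category label to its list of samples. Returns a support
    set with n_way * k_shot items and a query set with n_way * n_query items,
    drawn from the same categories without overlap.
    """
    categories = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in categories:
        picked = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy data set: 10 categories with 30 samples each (illustrative only).
dataset = {f"class_{c}": [f"c{c}_s{i}" for i in range(30)] for c in range(10)}
rng = random.Random(0)
support, query = sample_task(dataset, n_way=5, k_shot=5, n_query=15, rng=rng)
print(len(support), len(query))
```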

8) In response to: indicates the condition or state on which a performed operation depends. When the condition or state is satisfied, the one or more operations performed may be in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which the operations are performed.

9) Terminals, including but not limited to, ordinary terminals that maintain long and/or short connections with a transmission channel, and dedicated terminals that maintain long connections with the transmission channel.

10) Client: a carrier in the terminal that implements a specific function. For example, a mobile client (APP) is the carrier of a specific function in a mobile terminal, such as a payment function or a video playing function.

11) In a recommendation system, the predicted values of a user for the candidate items (such as the predicted click-through rate and comment rate) are used together and fused in a certain way (such as weighted summation) into a score that measures the user's satisfaction with the recommendation; the candidate items are then sorted by their scores and finally recommended to the corresponding users.
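The sorting of candidate items by fused score can be sketched as follows, assuming weighted summation as the fusion mode. The candidate identifiers, predicted values and weights are illustrative, and `rank_candidates` is a hypothetical helper.

```python
def rank_candidates(candidates, weights, top_k):
    """Fuse each candidate's predicted values by weighted summation and
    return the identifiers of the top_k candidates, best first.

    candidates maps an item identifier to a list of predicted values,
    one per feedback type, in the same order as weights.
    """
    fused = {item: sum(w * p for w, p in zip(weights, preds))
             for item, preds in candidates.items()}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

# Predicted values per candidate: [click-through rate, comment rate].
candidates = {
    "video_a": [0.30, 0.02],
    "video_b": [0.10, 0.20],
    "video_c": [0.05, 0.01],
}

click_heavy = rank_candidates(candidates, [0.5, 0.5], top_k=2)
comment_heavy = rank_candidates(candidates, [0.2, 0.8], top_k=2)
print(click_heavy, comment_heavy)
```

The two calls show that changing only the fusion weights changes the order of the same candidates, which is why the choice of weights matters for the final recommendation.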

The embodiment of the invention can be implemented in combination with cloud technology. Cloud technology (Cloud technology) refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data. It can also be understood as the general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources, so cloud technology needs to be supported by cloud computing.

It should be noted that cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services as required. The network that provides the resources is referred to as the "cloud". From the user's point of view, resources in the cloud appear infinitely expandable and can be acquired at any time, used on demand, expanded at any time and paid for according to use. As the basic capability provider of cloud computing, a cloud computing resource pool platform, called a cloud platform for short and generally referred to as Infrastructure as a Service (IaaS), is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use. The cloud computing resource pool mainly comprises computing equipment (which can be a virtualized machine including an operating system), storage equipment and network equipment.

Fig. 1 is a schematic view of a scenario of a multimedia information processing method according to an embodiment of the present invention, referring to fig. 1, a terminal (including a terminal 10-1 and a terminal 10-2) is provided with a corresponding client capable of playing embedded multimedia information, the terminal is connected to a server 200 through a network 300, the network 300 may be a wide area network or a local area network, or a combination of the two, and data transmission is implemented by using a wireless link, where the multimedia information includes, but is not limited to, video, pictures, GIF animation, and advertisement information. The types of multimedia information acquired by the terminals (including the terminal 10-1 and the terminal 10-2) from the corresponding server 200 through the network 300 may be the same or different, for example, the terminals (including the terminal 10-1 and the terminal 10-2) may acquire various types of multimedia advertisements delivered by advertisers from the corresponding server 200 through the network 300, and may acquire short videos or long videos provided by a video operator from the corresponding server 200 through the network 300 for viewing, which is not limited by the specific type of multimedia information. Different multimedia information may be stored in the server 200, wherein the multimedia information used as an advertisement may be content in different dynamic formats, such as gif, mp4, mov, etc.

In the process that the terminal (terminal 10-1 and/or terminal 10-2) obtains and displays the corresponding service with embedded multimedia information from the server 200 through the network 300, the user can perform different operations on the multimedia information displayed in the multimedia information display window through the terminal (terminal 10-1 and/or terminal 10-2), so as to generate different user behaviors, for example, when the multimedia information is a video advertisement, the user can share and/or approve the exposed short video in the process of watching the information, and also can collect products provided in the video advertisement by clicking hyperlinks in the short video. When the multimedia information is a dynamic GIF advertisement, the user may forward and/or comment on the advertisement during the exposure of the advertisement through the terminal (terminal 10-1 and/or terminal 10-2), or may jump to the corresponding product purchase link page through the GIF advertisement.

As an example, when determining which multimedia information to recommend to the user's terminal 10-1 or 10-2 for playing, the server 200 needs to adjust the multimedia information to be played in a timely manner, for example by replacing any multimedia information in the set of multimedia information to be played, so as to meet the viewing requirements of different target users. Taking short video information as an example, the multi-target weight search network provided by the invention can be applied to short video playing. Short video playing usually processes different short video information from different data sources, and finally presents the corresponding information and the videos to be recommended on the user interface (User Interface, UI), where the accuracy and timeliness of the information directly affect the user experience. The background database of video playing receives a large amount of video data from different sources every day, and the obtained information recommended to the target user can also be called by other application programs (for example, the recommendation result of the short video recommendation process is migrated to a long video recommendation process or a news recommendation process). Of course, the multi-target weight search network matched with the corresponding target user can also be migrated to different video recommendation processes (for example, a web page video recommendation process, an applet video recommendation process or the video recommendation process of a long video client).

As an example, the server 200 is configured to deploy a corresponding multi-target weight search network to implement the multimedia information recommendation method provided by the present invention, or to deploy a multimedia information recommendation device to implement the method. Specifically, multimedia information recommendation may be implemented as follows: acquiring first object feedback data for target multimedia information, where the first object feedback data includes at least two first sub-object feedback data of different feedback types; predicting, based on the first object feedback data, a first score corresponding to each first sub-object feedback data; determining, based on first object feature data of an object to be recommended, a fusion weight parameter corresponding to each first sub-object feedback data through the multi-target weight search network, where the multi-target weight search network is obtained by updating network parameters of an initial multi-target weight search network through evolutionary learning based on second object feature data of the object to be recommended and real object feedback data of a sample object for sample multimedia information; and fusing the first scores corresponding to the first sub-object feedback data based on the corresponding fusion weight parameters to obtain a second score of the target multimedia information, where the second score is used to recommend multimedia information to the object to be recommended.

Further, the multimedia information to be recommended that matches the target user can be displayed and output through the terminal (the terminal 10-1 and/or the terminal 10-2). Taking short video playing as an example, the multi-target weight search network provided by the invention can be applied to process the corresponding object feature data in a short video recommendation environment and determine the fusion weight parameter corresponding to each sub-object feedback data. Short video playing usually processes different multimedia information from different data sources and finally presents the corresponding multimedia information on the user interface (User Interface, UI), where the accuracy and timeliness of multimedia information recommendation directly affect the user experience. The background database for video playing receives a large amount of multimedia information data from different sources every day, and the obtained multimedia information recommended to the user can also be called by other application programs (for example, the recommendation result of the short video recommendation process is migrated to a recommendation process or a news recommendation process in an instant messaging client). Of course, the multi-target weight search network matched with the corresponding target user can also be migrated to different video recommendation processes (for example, a web video recommendation process, an applet video recommendation process or the video recommendation process of a client in an instant messaging client).

The multimedia information recommendation method provided by the embodiment of the invention is implemented based on artificial intelligence. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain an optimal result. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

In the embodiment of the invention, the artificial intelligence software technologies mainly involved include speech processing and machine learning. For example, Automatic Speech Recognition (ASR) in speech technology may be involved, including speech signal preprocessing, speech signal frequency-domain analysis, speech signal feature extraction, speech signal feature matching/recognition, speech training, and the like.

For example, Machine Learning (ML) may be involved. Machine learning is a multi-domain interdiscipline that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning typically includes techniques such as Deep Learning, which includes artificial neural networks such as the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Deep Neural Network (DNN).

It can be appreciated that the multimedia information recommendation method and the speech processing provided by the invention can be applied to an intelligent device. The intelligent device can be any device with an information display function, for example, an intelligent terminal, a smart home device (such as a smart speaker or a smart washing machine), a smart wearable device (such as a smart watch), a vehicle-mounted intelligent central control system (which displays multimedia information to a user through applets that execute different tasks), or an AI medical device (which presents treatment cases by displaying multimedia information).

The following describes the structure of the multimedia information recommendation device according to the embodiment of the present invention in detail, and the multimedia information recommendation device may be implemented in various forms, such as a dedicated terminal with a multimedia information recommendation processing function, or may be a server provided with a multimedia information recommendation device processing function, for example, the server 200 in fig. 1. Fig. 2 is a schematic diagram of a composition structure of a multimedia information recommendation apparatus according to an embodiment of the present invention, and it can be understood that fig. 2 only shows an exemplary structure of the multimedia information recommendation apparatus, but not all the structures, and a part of or all the structures shown in fig. 2 can be implemented as required.

The multimedia information recommendation device provided by the embodiment of the invention comprises at least one processor 201, a memory 202, a user interface 203 and at least one network interface 204. The various components of the multimedia information recommendation device are coupled together by a bus system 205. It is understood that the bus system 205 is used to enable connected communications between these components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 205 in fig. 2.

The user interface 203 may include, among other things, a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, or touch screen, etc.

It will be appreciated that the memory 202 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include any computer program for operating on a terminal (e.g., 10-1), such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application may comprise various applications.

In some embodiments, the multimedia information recommendation apparatus provided by the embodiments of the present invention may be implemented by combining software and hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor, which is programmed to execute the multimedia information recommendation method provided by the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.

As an example of implementation of the multimedia information recommendation device provided by the embodiment of the present invention by combining software and hardware, the multimedia information recommendation device provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201, the software modules may be located in a storage medium, the storage medium is located in the memory 202, the processor 201 reads executable instructions included in the software modules in the memory 202, and the multimedia information recommendation method provided by the embodiment of the present invention is completed by combining necessary hardware (including, for example, the processor 201 and other components connected to the bus 205).

By way of example, the processor 201 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.

As an example of hardware implementation of the multimedia information recommendation apparatus provided by the embodiment of the present invention, the apparatus may be directly implemented by the processor 201 in the form of a hardware decoding processor, for example, by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components, to implement the multimedia information recommendation method provided by the embodiment of the present invention.

The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the multimedia information recommendation apparatus. Examples of such data include any executable instructions for operating on the multimedia information recommendation device, and a program implementing the multimedia information recommendation method according to an embodiment of the present invention may be included in the executable instructions.

In other embodiments, the multimedia information recommendation device provided in the embodiments of the present invention may be implemented in a software manner, and fig. 2 shows the multimedia information recommendation device stored in the memory 202, which may be software in the form of a program, a plug-in, or the like, and includes a series of modules, and as an example of the program stored in the memory 202, may include the multimedia information recommendation device, where the multimedia information recommendation device includes the following software modules:

An information transmission module 2081 and an information processing module 2082. When the software modules in the multimedia information recommendation device are read into a RAM by the processor 201 and executed, the multimedia information recommendation method provided by the embodiment of the invention is implemented. The functions of the software modules in the multimedia information recommendation device are as follows:

The information transmission module 2081 is configured to obtain first object feedback data for the target multimedia information, where the first object feedback data includes at least two first sub-object feedback data with different feedback types;

the information processing module 2082 is configured to predict and obtain a first score corresponding to each first sub-object feedback data based on the first object feedback data;

The information processing module 2082 is configured to determine, based on first object feature data of an object to be recommended, a fusion weight parameter corresponding to each first sub-object feedback data through a multi-objective weight search network, where the multi-objective weight search network is obtained by updating network parameters of an initial multi-objective weight search network through an evolutionary learning method based on second object feature data of the object to be recommended and real object feedback data of a sample object for sample multimedia information;

The information processing module 2082 is configured to perform fusion processing on the first score corresponding to each first sub-object feedback data based on the fusion weight parameter corresponding to each first sub-object feedback data, so as to obtain a second score of the target multimedia information, where the second score of the target multimedia information is used to recommend the multimedia information to the object to be recommended.

According to the electronic device shown in fig. 2, in one aspect of the invention, the invention also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer readable storage medium by a processor of the electronic device, which executes the computer instructions, causing the electronic device to perform the different embodiments and combinations of embodiments provided in the various alternative implementations of the multimedia information recommendation method described above.

Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an alternative method for recommending multimedia information according to an embodiment of the present invention, and it may be understood that the steps shown in fig. 3 may be performed by various electronic devices running the multimedia information recommending apparatus, for example, a terminal with the multimedia information recommending apparatus, a server, or a server cluster, where the terminal with the multimedia information recommending apparatus may be the electronic device with the multimedia information recommending apparatus in the embodiment shown in fig. 2. The following is a description of the steps shown in fig. 3.

Step 301, the multimedia information recommendation device acquires first object feedback data aiming at target multimedia information.

In some embodiments, before acquiring feedback data of a first object for target multimedia information, a user may first trigger a client for presenting the multimedia information, when the client is triggered and starts recommending different multimedia information to the user, a server receives a multimedia information recommendation request sent by a terminal, and the server responds to the multimedia information recommendation request to acquire multimedia information to be recommended in a multimedia information data source.

The first object feedback data comprises at least two first sub-object feedback data with different feedback types. Taking the multimedia information being a short video as an example, a user can share and/or like an exposed short video while watching it, and can also collect products offered in video advertisements by clicking hyperlinks in the short video. When the multimedia information is a dynamic GIF advertisement, during the exposure of the advertisement through the terminal (terminal 10-1 and/or terminal 10-2), the user may forward and/or comment on the advertisement, or may jump through the GIF advertisement to the corresponding product purchase page. Each such operation performed on the multimedia information may be used as first sub-object feedback data of one feedback type.

Step 302, the multimedia information recommendation device predicts and obtains a first score corresponding to each first sub-object feedback data based on the first object feedback data.

Step 303, the multimedia information recommendation device determines fusion weight parameters corresponding to the feedback data of each first sub-object through a multi-target weight search network based on the first object feature data of the object to be recommended.

In step 303, the first object feature data may include, but is not limited to, an age feature, a gender feature, a consumption level feature, and a user context feature. For any user serving as an object to be recommended, the context in which the user is located may include, but is not limited to, the time, place, and mood of the user when accessing the recommendation system, and this context is very important for improving recommendation accuracy. In particular, the time context feature of the user's access to the recommendation system allows the changes of interest reflected in the user's recent and long-term behavior to be captured, which ensures the continuity of interest prediction during recommendation.

The multi-target weight search network used in the recommendation process is obtained by updating network parameters of an initial multi-target weight search network through evolutionary learning, based on second object feature data of the object to be recommended and real object feedback data of a sample object for sample multimedia information, where each real object feedback data for the sample multimedia information is the feedback data of the sample object on the recommended sample multimedia information. The sample object may include one or more of a low-consumption user, a new user, and a high-consumption user. Specifically, a new user is a user who has registered to use the client for the first time; a low-consumption user is a user who has registered to use the short video client but has generated little user behavior information; and a high-consumption user is a user who has registered to use the short video client and has generated a large amount of user behavior information while using the client.

In an exemplary embodiment, in the recommendation environment of the short video client, the threshold for the number of pieces of user behavior information may be set to 10. When the number of pieces of user behavior information of a user is less than or equal to 10, the user may be treated as a low-consumption user, and when the number is greater than 10, the user may be treated as a high-consumption user. It should be noted that the threshold may be dynamically adjusted for different multimedia information recommendation environments, and the embodiment of the present invention is not specifically limited in this respect.
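The thresholding rule of this exemplary embodiment can be illustrated with a short sketch. The segment names, the function name `classify_user`, and the `is_first_registration` flag are hypothetical and are introduced only for illustration; only the default threshold of 10 comes from the text.

```python
from enum import Enum


class UserSegment(Enum):
    # Hypothetical labels for the three kinds of sample object described above.
    NEW = "new"
    LOW_ACTIVITY = "low_activity"    # few behavior records
    HIGH_ACTIVITY = "high_activity"  # many behavior records


def classify_user(num_behavior_records: int,
                  is_first_registration: bool,
                  threshold: int = 10) -> UserSegment:
    """Assign a user to a segment from the number of behavior records.

    The threshold defaults to 10 as in the exemplary embodiment and may be
    adjusted for other recommendation environments.
    """
    if is_first_registration:
        return UserSegment.NEW
    if num_behavior_records <= threshold:
        return UserSegment.LOW_ACTIVITY
    return UserSegment.HIGH_ACTIVITY
```

Passing a different `threshold` corresponds to adjusting the threshold for a different recommendation environment.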

Before multimedia information recommendation is implemented using the multi-objective weight search network, training of the multi-objective weight search network is also required to determine network parameters of the multi-objective weight search network.

Taking the multimedia information being a short video as an example, the network parameters determined by this training can adapt to the current multimedia information recommendation environment. Referring to fig. 4, fig. 4 is an optional flowchart of the multimedia information recommendation method provided by the embodiment of the present invention. It can be understood that the steps shown in fig. 4 may be performed by various servers running the multimedia information recommendation device, for example, a dedicated terminal, a server, or a server cluster with a video processing function.

Before the multi-objective weight search network training process provided by the invention is introduced, the defects of the related information recommendation techniques are first described. When recommending different types of multimedia information, the related art may use either single-objective optimization or multi-objective optimization. Take single-objective optimization based on the user click-through rate (CTR) as an example: it considers only whether the user clicks to watch or read after seeing a video or an article. In practice, however, a user can act on a product in many ways, such as liking, forwarding, collecting, and commenting. Optimizing only the CTR, that is, recommending content solely on the basis of the user's click behavior, may fail to achieve an ideal recommendation result. For example, a user may click on content merely because it is popular, which yields a higher CTR, but content recommended on the basis of CTR may not meet the user's actual needs.

Multi-objective optimization can be performed in four ways. Taking the prediction of the probability of a series of user behaviors such as reading, liking, forwarding, and commenting as an example, the ways include:

1) changing sample weights, 2) multi-model score fusion, 3) learning to rank (LTR), and 4) multi-task learning (MTL). In multi-task learning, multi-objective fusion is an important step in generating the ranking result: the feedback behavior indexes predicted by the MTL model are fused into a score that reflects the user's satisfaction with a short video (namely, the second score of the target multimedia information in the invention). The recommended short videos are ranked by this score so as to maximize user satisfaction with the recommendation system. In the related art, multi-objective fusion is mainly performed by two methods, formula fusion and learning to rank. Compared with learning to rank, formula fusion has the advantages of a simple model and low cost, and is widely used. Formula fusion consists of two parts, proxy function design and fusion weight search. The proxy function mostly adopts linear weighted fusion or weighted multiplicative fusion. Fusion weight search generally uses manually set parameters, grid search, or Bayesian optimization. With manually set parameters, several groups of solutions are selected as candidate fusion vectors according to subjective experience, and the optimal solution is selected through A/B testing. Grid search exhaustively iterates over every possible combination in the hyperparameter set and uses cross-validation to select the optimal hyperparameter combination as the fusion weights. Bayesian optimization builds a probability model from past evaluation results of the objective function and, by updating the model, infers the hyperparameter values at the extreme point as the fusion weights.

However, all of the above optimization methods suffer from low parameter search efficiency and high experimental resource consumption during training. Grid search is an exhaustive search over specified parameter values, and its search efficiency is low. Bayesian optimization uses information from previous search points to guide the next round of parameter collection, which improves parameter search efficiency, but the whole process still requires multiple groups of parameters to be deployed online to collect samples, so the resource consumption is high.
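The cost of exhaustive search can be made concrete with a small sketch. The function names `grid_search` and `toy_metric`, the candidate values, and the toy metric itself are assumptions made for illustration; in the setting described above, each evaluation would instead be a cross-validation run or an online experiment.

```python
import itertools


def grid_search(candidate_values, num_objectives, evaluate):
    """Exhaustively evaluate every weight combination and keep the best one.

    Returns the best weights, their metric value, and the number of
    evaluations, which grows as len(candidate_values) ** num_objectives.
    """
    best_weights, best_value, num_trials = None, float("-inf"), 0
    for weights in itertools.product(candidate_values, repeat=num_objectives):
        num_trials += 1
        value = evaluate(weights)
        if value > best_value:
            best_weights, best_value = weights, value
    return best_weights, best_value, num_trials


def toy_metric(weights, target=(0.5, 1.0, 0.0, 0.5)):
    # Hypothetical stand-in for an expensive evaluation: higher is better,
    # with the maximum reached when the weights equal the target.
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best, value, trials = grid_search([0.0, 0.5, 1.0], 4, toy_metric)
```

With only three candidate values and four objectives the search already needs 3^4 = 81 evaluations, and the result is still a single fixed weight vector shared by all users.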

Meanwhile, these methods cannot perform personalized modeling for particular users. Because the results of grid search and Bayesian optimization are still fixed parameters, they cannot be personalized and cannot be adapted to changes in the distribution of ranking scores, so the multimedia information recommendation strategy is too uniform and the user experience is affected.

More importantly, accurate video recommendation often requires the collection of a large number of user behavior records. In real-world scenarios, recommendation systems also often face few-shot learning problems, such as recommendation for low-consumption users (users with little sample information may be regarded as low-consumption users) and for new users. When the number of training samples is small, the model cannot fit the actual behavior of these users. It is therefore often difficult to collect enough labeled samples from new users for traditional machine learning to extract information usable for recommendation, the model is prone to overfitting, new noise is easily introduced, and the data processing effect of the model is degraded.

In order to solve these drawbacks, referring to fig. 4, the method for recommending multimedia information according to the embodiments of the present invention may be used for non-real-time multi-objective weight search network training, and may implement recommendation sequence adjustment for any multimedia information (including various video types such as drama, movie, and short video). The following is a description of the steps shown in fig. 4.

Step 401, a multimedia information recommendation device acquires a training sample set, wherein the training sample set comprises real object feedback data of the sample object aiming at sample multimedia information.

The training samples include operation behavior information of users in different types on multimedia information, and it is to be noted that, due to different viewing habits of users, different types of users often occur, for users who frequently view short videos, different operations on the short videos may be recorded in a viewing record, and satisfaction evaluation of short video recommendation is based on relevant indexes of clicking, browsing depth (residence time), purchasing, collecting, purchasing goods in the short videos, repeatedly purchasing goods in the short videos, giving good comments to the goods in the short videos, and the like. Different indexes are often generated under different recommendation systems, different periods and different product forms, and the operation behavior information of new users and users with fewer usage records on the multimedia information is often less, so that the formed training samples are also less. In the training process, data in the training sample set can be extracted, and the trained multi-objective weight search network can process the characteristics of different types of users through different training processes so as to meet the recommendation environments of different multimedia information recommendations.

Taking multimedia information as a short video as an example, in order to realize more personalized recommendation, when the target multimedia information is recommended, each index of the target multimedia information can be predicted by utilizing a multi-target ordering model, wherein different indexes can correspond to different feedback types, and the indexes can specifically comprise one or more of a play completion rate, a consumption duration, a praise rate, a comment rate, a purchase rate, a praise rate and the like. Therefore, the first sub-object feedback data of the target multimedia information corresponding to different feedback types can be obtained, and the corresponding first score is obtained through prediction based on the first sub-object feedback data.

Further, the prediction results of the multiple indexes (the first score corresponding to each first sub-object feedback data) are fused into one score (the second score of the target multimedia information) that measures the user's satisfaction with the target multimedia information, and this score is used for recommending the target multimedia information. For example, the above operations can be performed on different target multimedia information to obtain the second score of each, the target multimedia information can be ranked according to the second scores, and the top-ranked target multimedia information (for example, the first item or the first two items in the ranking result) can then be preferentially recommended to the user terminal. In an exemplary embodiment, the first score corresponding to each first sub-object feedback data may be fused by a target fusion function to obtain the second score of the target multimedia information, and the expression of the target fusion function is shown in Formula 1.

$$\mathrm{fus\_score} = \sum_{i} \left( \alpha_i \cdot \log(\mathrm{pred\_score}_i + \beta_i) + \gamma_i \right) \qquad \text{(Formula 1)}$$

Here pred_score_i represents the first score corresponding to the i-th first sub-object feedback data, fus_score represents the second score of the target multimedia information, and α, β, and γ are hyperparameters. As the fusion weight parameter of a fusion target, α measures how much importance the user attaches to each target and therefore directly affects the accuracy of the fusion function. For this reason, the multi-objective weight search network provided by the invention is used to dynamically provide personalized fusion weight parameters for each user, so that the recommended multimedia information is more accurate and the user obtains a better viewing experience.
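As an illustration, Formula 1 and the subsequent ranking by second score can be sketched as follows. The function names `fuse_scores` and `rank_items`, the use of the natural logarithm, and the example scores are assumptions; the text specifies neither the base of the logarithm nor concrete parameter values.

```python
import math


def fuse_scores(pred_scores, alpha, beta, gamma):
    """Second score per Formula 1: sum_i(alpha_i * log(pred_i + beta_i) + gamma_i).

    The natural logarithm is assumed here.
    """
    if not (len(pred_scores) == len(alpha) == len(beta) == len(gamma)):
        raise ValueError("all parameter lists must have one entry per objective")
    return sum(a * math.log(p + b) + g
               for p, a, b, g in zip(pred_scores, alpha, beta, gamma))


def rank_items(first_scores_by_item, alpha, beta, gamma):
    """Return item ids sorted by descending second score."""
    fused = {item: fuse_scores(scores, alpha, beta, gamma)
             for item, scores in first_scores_by_item.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical first scores for two objectives, e.g. like rate and comment rate.
items = {"video_a": [0.9, 0.1], "video_b": [0.2, 0.8]}
likes_first = rank_items(items, alpha=[1.0, 0.0], beta=[1.0, 1.0], gamma=[0.0, 0.0])
comments_first = rank_items(items, alpha=[0.0, 1.0], beta=[1.0, 1.0], gamma=[0.0, 0.0])
```

The two calls differ only in α, yet they produce opposite orderings of the same two items, which is why per-user fusion weights can change what is recommended.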

In some embodiments of the present invention, the model structure of the multi-objective ranking model may be a Long Short-Term Memory (LSTM) network, a Gated Recurrent Unit (GRU) network, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Deep Neural Network (DNN), or a three-factor model (Fama-French 3-factor model). The specific model structure may be flexibly selected according to the type of multimedia information. For example, when the multimedia information is a short video, the multi-objective ranking model may be constructed based on the long short-term memory network, and when the multimedia information is a video advertisement, the multi-objective ranking model may be constructed based on the recurrent neural network.

Referring to fig. 5, fig. 5 is a schematic diagram of the structure of the Multi-Objective Weight Search Network (MOWSN) used in the present invention according to an embodiment of the present invention. The MOWSN is a perceptron model. Specifically, the perceptron is a linear model for binary classification: its input is the feature vector of an instance and its output is the class of the instance, taking the values +1 and -1. The perceptron corresponds to a separating hyperplane that divides instances into positive and negative classes in the input space (feature space), and it is a discriminative model. Perceptron learning aims to find the separating hyperplane that linearly divides the training data; to this end, a loss function based on misclassification is introduced and minimized by gradient descent to obtain the perceptron model, which is then used to classify new inputs. An input of the model is X = (x_1, x_2, ..., x_n), where n represents the dimension of the input data and x_i represents one feature of the user, such as the user's age, gender, consumption level, or user context features. The output of the model is the weight α_k of each objective (k ∈ {1, 2, 3, ..., m}), where m represents the number of objectives of the multi-objective ranking model. α_k can be calculated by Formula 2 as follows:

$$\alpha_k = W^{T} X + b \qquad \text{(Formula 2)}$$

Here W = {w_1, w_2, ..., w_i, ..., w_n} denotes the parameters of the perceptron model, and b represents the bias of the perceptron model.
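A minimal sketch of Formula 2 is given below. It assumes, for illustration only, that each of the m objectives has its own weight vector of length n and its own bias, so that one α_k is produced per objective; the function name `fusion_weights` and the example feature values are hypothetical.

```python
def fusion_weights(x, weight_rows, biases):
    """Compute alpha_k = w_k . x + b_k for each objective k (Formula 2).

    x           : list of n user features (e.g. encoded age, gender, ...)
    weight_rows : m lists of n perceptron weights, one list per objective
                  (an assumed layout, one weight vector per objective)
    biases      : list of m biases
    """
    if len(weight_rows) != len(biases):
        raise ValueError("need one bias per objective")
    alphas = []
    for w_k, b_k in zip(weight_rows, biases):
        if len(w_k) != len(x):
            raise ValueError("each weight vector must match the feature dimension")
        alphas.append(sum(w * xi for w, xi in zip(w_k, x)) + b_k)
    return alphas


# Hypothetical example with n = 2 features and m = 2 objectives.
alphas = fusion_weights(x=[1.0, 2.0],
                        weight_rows=[[0.5, 0.25], [1.0, -1.0]],
                        biases=[0.0, 0.5])
```

Because the output depends on the user feature vector x, two users with different features receive different fusion weights from the same network parameters.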

Step 402, the multimedia information recommendation device uses the training sample set to update the network parameters of the initial multi-target weight search network through an evolutionary strategy algorithm, and obtains the multi-target weight search network through the updated network parameters.

Further, as noted above, accurate video recommendation often requires the collection of a large number of user behavior records, and in real-world scenarios recommendation systems often face few-shot learning problems, such as recommendation for low-consumption users (users with little sample information may be regarded as low-consumption users) and for new users. When the number of training samples is small, the model cannot fit the actual behavior of these users, so step 403 can also be performed for this case.

Step 403, the multimedia information recommendation device updates the network parameters of the initial multi-target weight search network by using the training sample set through a meta-learning framework based on an evolutionary strategy algorithm, and obtains the multi-target weight search network based on the updated network parameters.

The training methods of the two multi-objective weight search networks involved in step 402 and step 403 are described below. In the technical solution of the present invention, the evolutionary learning may use an Evolution Strategy (ES) algorithm; other algorithms may also be employed, and embodiments of the present invention are not limited in this regard.

When step 402 is performed, that is, when the network parameters of the initial multi-objective weight search network are updated by the evolutionary strategy algorithm and the multi-objective weight search network is obtained from the updated network parameters, the evolutionary strategy algorithm used is shown in table 1. The process of updating the network parameters of the initial multi-objective weight search network by the evolutionary strategy algorithm is described below with reference to table 1 and the processing steps in fig. 6.

TABLE 1

Referring to fig. 6, fig. 6 is a schematic diagram of an optional processing procedure of a multimedia information recommendation method according to an embodiment of the present invention, which specifically includes the following steps:

Step 601, calculating the network parameter updating direction of the initial multi-target weight search network according to the noise standard deviation parameter of the initial multi-target weight search network.

Since the noise ε follows a Gaussian distribution, the update direction of the network parameter W can be determined by the commonly used formula given as Formula 3.

Formula 3:
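For reference, the estimator commonly used with evolution strategies under Gaussian noise has the form below. Treating it as the content of Formula 3 is an assumption, made because it agrees with the description of table 1 in this section (scalar returns $F_i$, perturbations $\varepsilon_i \sim N(0, I)$, noise standard deviation σ, n target users).

```latex
\nabla_{W}\,\mathbb{E}_{\varepsilon \sim N(0, I)}\bigl[F(W + \sigma\varepsilon)\bigr]
  = \frac{1}{\sigma}\,\mathbb{E}_{\varepsilon \sim N(0, I)}\bigl[F(W + \sigma\varepsilon)\,\varepsilon\bigr]
  \approx \frac{1}{n\sigma}\sum_{i=1}^{n} F_i\,\varepsilon_i,
\qquad F_i = F(W + \sigma\varepsilon_i)
```

The right-hand side needs only the scalar fitness values and the perturbations, which is why no error back-propagation is required.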

Step 602, obtaining a noise standard deviation parameter and a learning rate parameter of the multi-objective weight search network.

As shown in line 1 of table 1, when the evolutionary strategy algorithm is used, the acquired noise standard deviation parameter is σ, the learning rate parameter of the multi-target weight search network is θ, and the network parameter $W_0$ of the initial multi-target weight search network is also obtained.

In some embodiments of the present invention, to make the parameters of the multi-objective weight search network more reasonable, different learning rate parameters may be set for different types of recommendation environments when the noise standard deviation parameter and the learning rate parameter are obtained. For example, the learning rate parameter in a short video recommendation environment containing entertainment information may be smaller than the learning rate parameter for short video recommendation of shopping promotions, so as to achieve accurate ranking of shopping promotion short videos and guide users to make more purchases.

Step 603, configuring a first random noise parameter for each sample object in the training sample set according to the noise standard deviation parameter.

The target users in table 1 are the sample objects in the processing steps shown in fig. 6, which is not repeated below. As shown in lines 4-5 of table 1, a random noise parameter needs to be configured for each target user i = 1, ..., n, for example by randomly generating $\varepsilon_1, \ldots, \varepsilon_n \sim N(0, I)$. Before the first random noise parameter is configured, the n target users need to be initialized with known random seeds and the network parameter $W_0$, as shown in line 2 of table 1. The random numbers generated during training are all pseudo-random numbers, that is, a sequence of values produced by an algorithm. The generating function therefore needs to be given an initial value, from which a sequence of random numbers is obtained by repeated iteration. This initial value is called a random seed; for example, the time at which the server starts training the multi-target weight search network can be used as the known random seed in table 1.
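The role of the known random seed can be illustrated with a minimal sketch: because the generator is pseudo-random, any party that knows a user's seed can rebuild that user's noise vector without it being transmitted. The helper name `noise_for_user`, the per-user seed scheme (base seed plus user index) and the concrete numbers are assumptions for illustration.

```python
import random
from typing import List


def noise_for_user(seed: int, dim: int) -> List[float]:
    """Draw one perturbation eps ~ N(0, I) of length `dim` from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


base_seed = 1234  # hypothetical; the text suggests e.g. the training start time
n_users, dim = 4, 3

# First pass: each user draws its own noise (lines 4-5 of table 1 as described).
first_draw = [noise_for_user(base_seed + i, dim) for i in range(n_users)]
# Later pass: the same noise is rebuilt from the known seeds (line 10 as described).
rebuilt = [noise_for_user(base_seed + i, dim) for i in range(n_users)]

print(first_draw == rebuilt)
```

Since the noise can be rebuilt locally, only the scalar fitness values need to be exchanged between target users.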

Step 604, determining a fitness score corresponding to each sample object through a fitness function corresponding to the initial multi-objective weight search network, according to the first random noise parameter, the network parameter of the initial multi-objective weight search network and the noise standard deviation parameter, and taking the fitness score corresponding to the sample object as real object feedback data of the sample object.

As shown in line 6 of table 1, by evaluating the fitness function $F_i = F(W_t + \sigma\varepsilon_i)$, a corresponding fitness score can be calculated for each target user i = 1, ..., n. Through the processing of step 604, the scalar return value $F_i$ of each sample object can be used as the real object feedback data of that sample object.

Step 605, sharing real object feedback data of the sample object in all sample objects of the training sample set.

It should be noted that the evolutionary strategy algorithm performs global optimization by simulating natural evolution, following Darwin's theory of natural selection and survival of the fittest. Therefore, when model training is implemented based on the evolutionary strategy algorithm, it relies only on user feedback data: the real object feedback data of each sample object is shared among all sample objects through step 605, so training does not depend on error back-propagation and is better suited to optimizing a neural network model comprising multiple stages and multiple strategies.

Step 606, configuring a second random noise parameter for each sample object in the training sample set.

Because step 605 shares the real object feedback data of each sample object among all sample objects of the training sample set, that data may influence the behavior of other sample objects, and the noise parameters therefore need to be reconfigured. As shown in line 10 of table 1, the second random noise parameters $\varepsilon_1, \ldots, \varepsilon_n \sim N(0, I)$ are reconfigured using the known random seeds, and the training process continues with them.

Step 607, determining first gradient change parameters of different rounds of the initial multi-objective weight search network according to the second random noise parameter, the network parameter of the initial multi-objective weight search network and the noise standard deviation parameter.

Step 608, updating the network parameters of the initial multi-target weight search network by using the first gradient change parameters of different rounds of the initial multi-target weight search network according to the network parameter updating direction of the initial multi-target weight search network.

In some embodiments, the recommendation effect may be measured by one or more of the number and duration of videos actually watched by the target user, that is, the real object feedback data may include one or more of the number and duration of videos actually watched. Multiple target users are processed in a highly parallel manner, and different target users communicate through only one scalar $F_i$. Through the evolutionary strategy algorithm shown in table 1, the network parameters W can be updated according to the real-time feedback behavior of users, so that personalized and highly accurate objective fusion parameters are obtained. In the update process, the network parameters of the initial multi-objective weight search network are updated as shown in line 11 of table 1. The per-round contribution of the first gradient change parameter can be controlled through the learning rate θ, thereby adjusting the accuracy of the multi-target weight search network.

Referring to fig. 7, fig. 7 is a schematic view of the effect of the evolutionary strategy algorithm in the embodiment of the present invention, with 4 rounds. As shown in fig. 7, after 4 iterations (learning rate θ is 4), the reward increases from -0.13 to 0.4. So that the computer program can better update the network parameters, the evolutionary strategy algorithm shown in table 1 can be adopted when optimizing the update result of the first network parameters, with θ = 0.01 and σ = 0.5. The fitness function F(x) measures the actual consumption behavior score of the user given the network parameters W, and this score is fed back by the user in real time.
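Steps 601 to 608 can be sketched as a short training loop. The sketch is reconstructed from the line-by-line description of table 1 in this section and assumes the commonly used update direction $(1/(n\sigma))\sum_i F_i\varepsilon_i$; it runs in a single process rather than in parallel across users, and `toy_fitness` is a hypothetical stand-in for real user feedback. The values θ = 0.01 and σ = 0.5 are the ones stated above.

```python
import random
from typing import Callable, List

Vector = List[float]


def es_step(W: Vector, fitness: Callable[[Vector], float], n: int,
            sigma: float, lr: float, rng: random.Random) -> Vector:
    """Run one evolution-strategy round and return the updated parameters."""
    dim = len(W)
    # One Gaussian perturbation per target user (lines 4-5 of table 1 as described).
    eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Each user reports a single scalar F_i = F(W + sigma * eps_i) (line 6 as described).
    scores = [fitness([w + sigma * e for w, e in zip(W, eps_i)]) for eps_i in eps]
    # Assumed update direction: (1 / (n * sigma)) * sum_i F_i * eps_i.
    direction = [sum(f * eps_i[d] for f, eps_i in zip(scores, eps)) / (n * sigma)
                 for d in range(dim)]
    # The learning rate scales the step taken in each round.
    return [w + lr * g for w, g in zip(W, direction)]


def toy_fitness(W: Vector) -> float:
    """Hypothetical stand-in for user feedback: highest at W = (1, -1)."""
    target = [1.0, -1.0]
    return -sum((w - t) ** 2 for w, t in zip(W, target))


rng = random.Random(0)
W0 = [0.0, 0.0]
W = list(W0)
for _ in range(300):
    W = es_step(W, toy_fitness, n=100, sigma=0.5, lr=0.01, rng=rng)

print(toy_fitness(W0), toy_fitness(W))
```

In a deployment as described in the text, `fitness` would be replaced by the score fed back by each user, and each user would evaluate only its own perturbation.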

Continuing with the process, provided in step 403, of updating the network parameters of the initial multi-objective weight search network through the meta-learning framework based on the evolutionary strategy algorithm: after the processing shown in fig. 6, the trained multi-objective weight search network can already process the feature vectors of target objects with larger sample volumes (greater than the sample volume threshold), but cannot accurately process the feature vectors of few-sample objects with smaller sample volumes (less than or equal to the sample volume threshold). Samples of few-sample users can therefore be extracted from the training sample set for more targeted training, in which the multi-objective weight search network is processed in a fine-tuning mode using the meta-learning framework. The principle of the fine-tuning mode is to take the known network structure and the known network parameters, modify the output layer, and use the samples of few-sample users (samples with sample volumes less than or equal to the sample volume threshold) as training samples, so that the parameters of all layers before the last layer of the multi-objective weight search network are retained and the cost of the training process is reduced.

Training in this fine-tuning mode uses a meta-learning framework, which is briefly described below. A meta-learning-based learning method consists of a meta-learner and a base learner. The meta-learner mainly learns the commonalities among different tasks from a small number of samples; on the basis of the knowledge learned by the meta-learner, the base learner performs at least one gradient iteration using a small number of samples of a given task, producing a neural network better suited to the individual characteristics of that task.

For example, in the Model-Agnostic Meta-Learning (MAML) algorithm, the meta-learner is responsible for finding the initialization parameters of the base learner, and its training process minimizes the meta-loss of the meta-learner over a large number of target tasks (that is, base learners). The base learner is the prediction model used by a target task; its initialization parameters are given by the meta-learner, after which it is trained through a small number of gradient iterations. For example, in a parameter generation algorithm based on the base learner, the meta-learner learns to generate the common network parameters of the classification layer of the base learner, and a new task generates its own base learner by iterating the network with gradients from a small number of samples.

Through the training process shown in fig. 6, the multi-objective weight search network is able to make video recommendations for users with abundant samples. For users with few samples, the multi-objective weight search network can be further trained through the meta-learning framework based on the evolutionary strategy algorithm, as shown in table 2.

TABLE 2

Where p(T) is the task number set; tasks are divided at user granularity, each task represents all training samples of one user, and $\theta_a = 0.02$, $\theta_b = 0.01$ and σ = 0.6 are set.

Referring to fig. 8 in combination with the processing procedures shown in table 2, fig. 8 is a schematic diagram of an alternative processing procedure of a multimedia information recommendation method according to an embodiment of the present invention, which specifically includes the following steps:

Step 801, a first training sample set of a target task number is extracted from the training sample sets, wherein each first training sample set corresponds to one sample object.

Step 802, configuring a support set for the initial multi-objective weight search network based on at least one task number set.

Specifically, a batch of tasks can be randomly selected, and within each task the support set is used to train a base learner, thereby updating the network parameters of the initial multi-target weight search network. After training of the base learners for the batch of tasks is completed, the training samples in the query set are used for testing, and the meta-learner is updated with the sum of the test results, that is, the slow weight update.

As shown in table 2, the task number set $T_t$ is randomly extracted from p(T), and each task in $T_t$ is denoted as task_i. For each task_i, ns samples (ns is preferably set to 5) are extracted to generate the support set of the task, denoted as support-task_i, and the remaining samples form the query set, denoted as query-task_i.

Because the meta-learning framework shown in table 2 still uses the evolutionary strategy algorithm to update the parameters of the multi-objective weight search network, the method shown in table 1 is used by both the base learner and the meta-learner. As shown in table 2, in this embodiment the base learner performs the loop of lines 7-9 of table 2 to update the network parameters of the initial multi-objective weight search network and finally determine the network parameters of the multi-objective weight search network; the meta-learner performs the loop of lines 10-11 of table 2, initializing the network parameters of the base learner, which is then trained through a small number of gradient iterations.
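The construction of the support set and query set can be sketched as follows. Tasks are keyed by user, in line with the statement that each task represents all training samples of one user; the function names, the user identifiers and the integer sample identifiers are hypothetical, and ns = 5 follows the preferred value given above.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_task(samples: Sequence[int], ns: int,
               rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly pick `ns` samples as the support set; the rest form the query set."""
    if ns > len(samples):
        raise ValueError("task has fewer than ns samples")
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    support = [samples[i] for i in idx[:ns]]
    query = [samples[i] for i in idx[ns:]]
    return support, query


def sample_task_batch(tasks_by_user: Dict[str, Sequence[int]], batch_size: int,
                      ns: int, rng: random.Random) -> Dict[str, Tuple[List[int], List[int]]]:
    """Draw a batch of tasks T_t from p(T) and split each into support and query sets."""
    users = rng.sample(sorted(tasks_by_user), batch_size)
    return {u: split_task(tasks_by_user[u], ns, rng) for u in users}


# Hypothetical p(T): each user id maps to that user's training sample ids.
p_T = {
    "user_a": list(range(0, 8)),
    "user_b": list(range(100, 112)),
    "user_c": list(range(200, 207)),
}

rng = random.Random(7)
T_t = sample_task_batch(p_T, batch_size=2, ns=5, rng=rng)
for user, (support, query) in T_t.items():
    print(user, len(support), len(query))
```

Every sample of a selected user ends up in exactly one of the two sets, so the query set is never used to train the base learner of that task.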

Step 803, calculating the network parameter updating direction of the initial multi-target weight search network according to the noise standard deviation parameter of the initial multi-target weight search network.

Step 804, configuring a third random noise parameter for each sample object in the support set according to the noise standard deviation parameter.

The target users involved in table 1 are the sample objects in the processing steps shown in fig. 8, which is not repeated below. As shown in lines 4-5 of table 1, a third random noise parameter needs to be configured for each target user i = 1, ..., n, for example by randomly generating $\varepsilon_1, \ldots, \varepsilon_n \sim N(0, I)$; the time at which the server starts training the multi-objective weight search network can be used as the known random seed in table 1.

Step 805, determining a fitness score corresponding to each sample object in the support set through a fitness function corresponding to the initial multi-objective weight search network, according to the third random noise parameter, the network parameter of the initial multi-objective weight search network, and the noise standard deviation parameter, and taking the fitness score corresponding to the sample object as real object feedback data of the sample object.

Step 806, sharing real object feedback data of the sample objects in all sample objects of the support set.

As shown in line 6 of table 1, by evaluating the fitness function $F_i = F(W_t + \sigma\varepsilon_i)$, a corresponding fitness score can be calculated for each target user i = 1, ..., n. Through the processing of steps 805 and 806, the scalar return value $F_i$ of each sample object can be used as the real object feedback data of that sample object. The process does not depend on error back-propagation, so the meta-learning framework based on the evolutionary strategy algorithm is also suited to optimizing a neural network model comprising multiple stages and multiple strategies. In addition, computation with the evolutionary strategy algorithm is simple and efficient, can be highly parallelized, and consumes few resources, so the hardware cost of model training can be effectively reduced.

Step 807 configures a fourth random noise parameter for each sample object in the support set.

Because the real object feedback data of each sample object is shared among all sample objects of the support set, that data may influence the behavior of other sample objects, and the noise parameters therefore need to be reconfigured through step 807. As shown in line 10 of table 1, the fourth random noise parameters $\varepsilon_1, \ldots, \varepsilon_n \sim N(0, I)$ are reconfigured using the known random seeds, and training of the model continues with them.

Step 808, determining second gradient change parameters of different rounds of the initial multi-objective weight search network according to the fourth random noise parameter, the network parameter of the initial multi-objective weight search network and the noise standard deviation parameter.

Through the process shown in table 2, the network parameters for each task are updated with support-task_i using the evolutionary strategy algorithm, resulting in the network parameters $W'_i$. For each task in $T_t$, a sample set is constructed from query-task_i, and the network parameters W are updated using the evolutionary strategy algorithm. In this way, the trained multi-objective weight search network overcomes the poor fitting ability caused by the lack of labeled samples.

Step 809, updating the network parameters of the initial multi-target weight search network by using the second gradient change parameters of different rounds of the initial multi-target weight search network according to the network parameter updating direction of the initial multi-target weight search network, so as to obtain the first network parameters of the initial multi-target weight search network.

The recommendation effect can be measured by the number and duration of videos actually watched by the user. Multiple target users are processed in a highly parallel manner, and different target users communicate through only one scalar $F_i$. Through the evolutionary strategy algorithm shown in table 1, the network parameters W can be updated according to the real-time feedback behavior of users, so that personalized and highly accurate objective fusion parameters are obtained. In the update process, the network parameters of the initial multi-objective weight search network are updated as shown in line 8 of table 2. The per-round contribution of the second gradient change parameter can be controlled through the learning rate θ, thereby adjusting the accuracy of the multi-target weight search network.

Since the processing steps shown in fig. 8 are performed on training samples of few-sample users, when a model-agnostic meta-learning algorithm is adopted, the meta-learner is responsible for finding the initialization parameters of the base learner, and its training process minimizes the meta-loss of the meta-learner over a large number of target tasks (that is, base learners). The base learner is the prediction model used by the target task; its initialization parameters are given by the meta-learner, after which it is trained through a small number of gradient iterations. The meta-learner therefore also needs to be trained.

In this embodiment, as shown in table 2, the base learner performs the loop of lines 7-9 of table 2 to update the network parameters of the initial multi-objective weight search network and finally determine the network parameters of the multi-objective weight search network, while the meta-learner performs the loop of lines 10-11 of table 2, initializing the network parameters of the base learner, which is then trained through a small number of gradient iterations. In this process, the parameter update still uses the evolutionary strategy algorithm. Specifically, referring to table 2, a query set is configured for the initial multi-objective weight search network based on at least one task number set; the network parameter updating direction of the initial multi-objective weight search network is calculated according to the noise standard deviation parameter of the initial multi-objective weight search network; and a fifth random noise parameter is configured for each sample object in the query set according to the noise standard deviation parameter. As shown in lines 4-5 of table 1, a fifth random noise parameter needs to be configured for each target user i = 1, ..., n, for example by randomly generating $\varepsilon_1, \ldots, \varepsilon_n \sim N(0, I)$, and the time at which the server starts training the multi-objective weight search network can be used as the known random seed in table 1.

Then, according to the fifth random noise parameter, the network parameter of the initial multi-objective weight search network and the noise standard deviation parameter, a fitness score corresponding to each sample object in the query set is determined through the fitness function corresponding to the initial multi-objective weight search network, and the fitness score corresponding to the sample object is taken as real object feedback data of the sample object, which is shared among all sample objects in the query set. Specifically, as shown in line 6 of table 1, by evaluating the fitness function $F_i = F(W_t + \sigma\varepsilon_i)$, a corresponding fitness score can be calculated for each target user i = 1, ..., n, and the scalar return value $F_i$ of each sample object can be taken as the real object feedback data of that sample object.

Finally, a sixth random noise parameter is configured for each sample object in the query set; third gradient change parameters of different rounds of the initial multi-target weight search network are determined according to the sixth random noise parameter, the network parameters of the initial multi-target weight search network and the noise standard deviation parameter; and, according to the network parameter updating direction of the initial multi-target weight search network, the network parameters of the initial multi-target weight search network are updated using the third gradient change parameters of different rounds and the first network parameters of the initial multi-target weight search network, to obtain second network parameters of the initial multi-target weight search network. As shown in lines 10-11 of table 2, the meta-learner uses all query-task_i of $T_t$, the third gradient change parameters of different rounds of the initial multi-objective weight search network, and the first network parameters of the initial multi-objective weight search network to update $W_{t+1}$. The updated $W_{t+1}$ can then continue to be used by the base learner, which, as shown in line 8 of table 2, continues to update the network parameters of the multi-target weight search network until the network parameters of the multi-target weight search network are determined and training of the multi-target weight search network is finished.

Therefore, for users with less sample information and new users, the defect of poor fitting capacity of the multi-objective weight search network can be overcome by the method shown in fig. 8, and accurate multimedia information recommendation for the users with less sample information and the new users is realized.
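The interplay of base learner and meta-learner in steps 801 to 809 can be sketched as follows. This is a first-order illustration under stated assumptions, not the exact procedure of table 2: the base learner takes one evolution-strategy step on the support fitness with $\theta_a$ to obtain $W'_i$, and the meta-learner averages the evolution-strategy directions measured at $W'_i$ on the query fitness and applies them to W with $\theta_b$. The function names are hypothetical, and each toy task uses the same hypothetical fitness for support and query, whereas in practice they would be computed from different samples.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Fitness = Callable[[Vector], float]


def es_direction(W: Vector, fitness: Fitness, n: int, sigma: float,
                 rng: random.Random) -> Vector:
    """Assumed ES update direction: (1 / (n * sigma)) * sum_i F_i * eps_i."""
    dim = len(W)
    direction = [0.0] * dim
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f = fitness([w + sigma * e for w, e in zip(W, eps)])
        for d in range(dim):
            direction[d] += f * eps[d] / (n * sigma)
    return direction


def meta_round(W: Vector, tasks: Sequence[Tuple[Fitness, Fitness]], n: int,
               sigma: float, lr_a: float, lr_b: float, rng: random.Random) -> Vector:
    """One meta-learning round over a batch of (support_fitness, query_fitness) tasks."""
    meta_direction = [0.0] * len(W)
    for support_fitness, query_fitness in tasks:
        # Base learner: adapt to the task on its support set, giving W'_i.
        g = es_direction(W, support_fitness, n, sigma, rng)
        W_i = [w + lr_a * gi for w, gi in zip(W, g)]
        # Meta-learner: measure the adapted parameters on the query set.
        gq = es_direction(W_i, query_fitness, n, sigma, rng)
        for d in range(len(W)):
            meta_direction[d] += gq[d] / len(tasks)
    return [w + lr_b * g for w, g in zip(W, meta_direction)]


def make_user_task(target: Vector) -> Tuple[Fitness, Fitness]:
    """Hypothetical few-sample user whose feedback is highest at `target`."""
    def fit(W: Vector) -> float:
        return -sum((w - t) ** 2 for w, t in zip(W, target))
    return fit, fit


def distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tasks = [make_user_task(t) for t in targets]
centre = [sum(t[d] for t in targets) / len(targets) for d in range(2)]

rng = random.Random(0)
W0 = [0.0, 0.0]
W = list(W0)
for _ in range(300):
    W = meta_round(W, tasks, n=50, sigma=0.6, lr_a=0.02, lr_b=0.01, rng=rng)

print(distance(W0, centre), distance(W, centre))
```

In this toy setting the learned initialization moves toward a point from which every user's task can be reached in a few base-learner steps, which is the behavior the meta-learner is meant to provide for few-sample users.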

Step 305, the multimedia information recommendation device performs fusion processing on the first score corresponding to each first sub-object feedback data based on the fusion weight parameter corresponding to each first sub-object feedback data to obtain a second score of the target multimedia information, wherein the second score of the target multimedia information is used for recommending the multimedia information to the object to be recommended.

The second score of the target multimedia information is used to represent the order in which multimedia information is recommended to the user. For example, suppose the two pieces of multimedia information are video advertisements A and B. When the second score of A is 1 and the second score of B is 2, the current user is more interested in video advertisement B. Therefore, according to the second scores, advertisement A is replaced by advertisement B, so that more play traffic is allocated to advertisement B, the exposure rate of advertisement B is improved, and the user obtains a better viewing experience.
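The fusion and ranking of step 305 can be illustrated with a minimal sketch. It assumes that the fusion is a weighted sum of the first scores using the fusion weight parameters; the exact fusion formula is not given in this section, and the function names, the weights and the first scores are hypothetical values chosen to reproduce the second scores 1 and 2 of the example above.

```python
from typing import Dict, List


def second_score(first_scores: List[float], alphas: List[float]) -> float:
    """Fuse per-objective first scores into one second score (assumed weighted sum)."""
    if len(first_scores) != len(alphas):
        raise ValueError("need one fusion weight per first score")
    return sum(a * s for a, s in zip(alphas, first_scores))


def rank(candidates: Dict[str, List[float]], alphas: List[float]) -> List[str]:
    """Order candidates from highest to lowest second score."""
    return sorted(candidates,
                  key=lambda name: second_score(candidates[name], alphas),
                  reverse=True)


# Hypothetical fusion weights for one user and two feedback types.
alphas = [0.5, 0.5]
# Hypothetical first scores of video advertisements A and B.
candidates = {"A": [1.0, 1.0], "B": [2.0, 2.0]}

scores = {name: second_score(s, alphas) for name, s in candidates.items()}
ranking = rank(candidates, alphas)
print(scores, ranking)
```

Because the fusion weights depend on the features of the object to be recommended, the same first scores can yield a different ranking for a different user.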

It should be noted that the multi-objective weight search network may be set up as multi-objective weight search networks of different dimensions (for example, a multi-objective weight search network of the user dimension and a multi-objective weight search network of the to-be-recommended video dimension), so at least 2 multi-objective weight search networks may be stacked to adapt to different scene requirements. Specifically, the number of multi-target weight search networks is determined according to the type of the multimedia information recommendation environment. When the number of multi-target weight search networks is two, the fusion weight parameters corresponding to the two networks are determined, where the first fusion weight parameter corresponds to the behavior information dimension of the first object and the second fusion weight parameter corresponds to the multimedia information dimension, and the corresponding second score is determined based on the first fusion weight parameter and the second fusion weight parameter.

The multimedia information to be recommended is acquired from a multimedia information data source, and the different types of sub-information it includes are processed through the multi-target weight search network to determine the identification embedding features, text embedding features and statistical features matched with the multimedia information to be recommended. The multimedia information attribute features corresponding to the multimedia information to be recommended are determined by converting its attribute types, and the user attribute features corresponding to the multimedia information to be recommended are determined. Feature fusion processing is then performed on the identification embedding features, the text embedding features, the statistical features, the multimedia information attribute features and the user attribute features, and the fused weight information is determined through the multi-target weight search network based on the result of the feature fusion processing.

Taking a video news information recommendation environment in a short video playing interface as an example, the multimedia information recommendation method provided by the embodiment of the present invention is described below. Fig. 9A is a schematic diagram of the effect of the multimedia information recommendation method in the embodiment of the present invention. As shown in fig. 9A, the video news information playing interface may be displayed in a corresponding APP or may be triggered through an instant messaging client applet (after training, the multi-objective weight search network may be packaged in the corresponding APP or stored as a plug-in in the instant messaging client applet; here the recommendation environment is the recommendation of news information). As short video application products continue to develop and grow, video news information carries far more information than text information, and video news information can be continuously recommended to a user through the corresponding application program, for example through the "see at a glance" portal included in the discovery page of an instant messaging client application, an audio recommendation portal of an audio application, a video recommendation portal of a video application, or a live recommendation portal of a live streaming application. When the target terminal runs the target application according to a user operation and controls the target application to display an application page that includes a trigger entry for opening the recommended content display page, the target terminal can detect a trigger operation on the trigger entry.

When a trigger operation on the trigger entry occurs, a recommendation request is sent to the server. After the recommended content fed back by the server in response to the recommendation request is received, the corresponding fusion weight parameters are determined using the multi-target weight search network by the multimedia information recommendation method provided by the invention, the second scores of the different target multimedia information are calculated, and the recommended content is displayed on the recommended content display page according to the second scores of the target multimedia information. As shown in fig. 9A, for object to be recommended 1 (that is, a short video viewing user), the second score of short video A is 1 and the second score of short video B is 2, so the short video first recommended to user 1 is B. As shown in fig. 9B, for object to be recommended 2, the second score of short video A is 2, so the short video first recommended to that user is A. In this way, short videos are accurately recommended to different objects to be recommended. Meanwhile, when an object to be recommended is a newly registered user, short videos can likewise be recommended to that user by the multimedia information recommendation method provided by the invention.

The invention has the following beneficial technical effects:

The method comprises the steps of obtaining first object feedback data aiming at target multimedia information, wherein the first object feedback data comprise at least two first sub-object feedback data of different feedback types, predicting to obtain first scores corresponding to the first sub-object feedback data based on the first object feedback data, determining fusion weight parameters corresponding to the first sub-object feedback data based on first object feature data of an object to be recommended through a multi-target weight search network, wherein the multi-target weight search network is obtained by updating network parameters of an initial multi-target weight search network based on second object feature data of the object to be recommended and real object feedback data of a sample object aiming at sample multimedia information in an evolutionary learning mode, enabling the real object feedback data aiming at the sample multimedia information to be the feedback data of the sample object aiming at the recommended sample multimedia information, processing the first scores corresponding to the first sub-object feedback data based on the fusion weight parameters corresponding to the first sub-object feedback data, and obtaining the second scores of the sample multimedia information to be recommended to the target multimedia information. 
Therefore, the first scores corresponding to the first sub-object feedback data can be fused to obtain the second score of the target multimedia information, and the second score is used to accurately recommend multimedia information to the object to be recommended. The multi-target weight search network achieves stronger generalization capability and data processing capability at lower hardware cost, which makes it convenient to deploy in a mobile terminal and enables its large-scale application.
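The fusion step summarized above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not specify the fusion formula or the network structure, so the sketch assumes a weighted sum for the fusion and uses a single linear layer followed by a softmax as a hypothetical stand-in for the multi-target weight search network. The names `fusion_weights` and `second_score`, the feedback types, and all numeric values are invented for the example.

```python
import math


def fusion_weights(object_features, network_params):
    """Hypothetical stand-in for the weight search network.

    One linear layer (one row of parameters per feedback type) followed by
    a softmax, so the fusion weights are positive and sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, object_features))
              for row in network_params]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def second_score(first_scores, weights):
    """Fuse per-feedback-type first scores into one second score.

    Assumption: the fusion is a weighted sum.
    """
    return sum(w * s for w, s in zip(weights, first_scores))


# Illustrative values only: two object features, three feedback types
# (for example click, like, share).
features = [0.5, 1.0]
params = [[0.2, 0.1], [0.0, 0.4], [-0.3, 0.2]]
first_scores = [0.8, 0.1, 0.3]

weights = fusion_weights(features, params)
print(second_score(first_scores, weights))
```

Because the weights depend on the object feature data, the same first scores can produce different second scores for different objects to be recommended.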

The foregoing describes only embodiments of the invention and is not intended to limit the scope of protection of the invention; any modifications, equivalents, and alternatives made within the spirit and principles of the invention shall fall within that scope.

Claims

1. A multimedia information recommendation method, the method comprising:

2. The method according to claim 1, wherein before determining, based on the first object feature data of the object to be recommended, the fusion weight parameter corresponding to each of the first sub-object feedback data through the multi-target weight search network, the method further comprises:

acquiring a training sample set, wherein the training sample set comprises real object feedback data of the sample object for sample multimedia information;

and updating network parameters of the initial multi-target weight search network by using the training sample set through an evolution strategy algorithm, and obtaining the multi-target weight search network from the updated network parameters.

3. The method of claim 2, wherein updating network parameters of the initial multi-target weight search network by using the training sample set through an evolution strategy algorithm comprises:

configuring a first random noise parameter for each sample object in the training sample set according to a noise standard deviation parameter;

determining, according to the first random noise parameter, the network parameters of the initial multi-target weight search network and the noise standard deviation parameter, a fitness score corresponding to each sample object through the fitness function corresponding to the initial multi-target weight search network, and taking the fitness score corresponding to the sample object as the real object feedback data of the sample object;

sharing the real object feedback data of each sample object among all sample objects of the training sample set;

configuring a second random noise parameter for each sample object in the training sample set;

determining first gradient change parameters of different rounds of the initial multi-target weight search network according to the second random noise parameter, the network parameters of the initial multi-target weight search network and the noise standard deviation parameter;

and updating the network parameters of the initial multi-target weight search network by using the first gradient change parameters of the different rounds, according to the network parameter updating direction of the initial multi-target weight search network.
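For illustration only, the steps of claim 3 resemble a basic evolution strategy update with Gaussian perturbations, which can be sketched as below. This is a simplified sketch under stated assumptions and not the claimed implementation: it draws one noise vector per sample object and reuses it for both the fitness evaluation and the gradient estimate (the claim recites a first and a second random noise parameter), it uses the mean of the shared fitness scores as a baseline, and the names `es_update` and `toy_fitness` and all hyperparameter values are hypothetical.

```python
import random


def es_update(theta, fitness, rng, sigma=0.1, learning_rate=0.1, population=50):
    """One evolution strategy round for the parameter list `theta`.

    sigma plays the role of the noise standard deviation parameter and
    `population` stands in for the number of sample objects.
    """
    # One Gaussian noise vector per sample object.
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
    # Fitness score of the perturbed parameters for each sample object.
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)])
              for eps in noises]
    # Scores are shared across all sample objects; their mean is a baseline.
    baseline = sum(scores) / population
    # Gradient estimate for this round from noise-weighted fitness scores.
    gradient = [
        sum((s - baseline) * eps[i] for s, eps in zip(scores, noises))
        / (population * sigma)
        for i in range(len(theta))
    ]
    # Move the parameters along the estimated ascent direction.
    return [t + learning_rate * g for t, g in zip(theta, gradient)]


def toy_fitness(params):
    """Toy fitness function (an assumption): highest when every parameter is 3."""
    return -sum((p - 3.0) ** 2 for p in params)


rng = random.Random(0)  # fixed seed so the run is repeatable
theta = [0.0]
for _ in range(200):
    theta = es_update(theta, toy_fitness, rng)
print(theta)
```

No gradient of the fitness function is computed directly; only fitness evaluations are needed, which is why an update of this kind can work with feedback signals that are not differentiable.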

4. The method according to claim 2, wherein updating network parameters of the initial multi-target weight search network by using the training sample set through an evolution strategy algorithm, and obtaining the multi-target weight search network from the updated network parameters comprises:

updating network parameters of the initial multi-target weight search network by using the training sample set through a meta-learning framework based on the evolution strategy algorithm, and obtaining the multi-target weight search network based on the updated network parameters.

5. The method of claim 4, wherein updating network parameters of the initial multi-target weight search network by using the training sample set through a meta-learning framework based on the evolution strategy algorithm comprises:

extracting a target task number of first training sample sets from the training sample set, wherein each first training sample set corresponds to one sample object;

6. The method of claim 5, wherein the method further comprises:

configuring a query set for the initial multi-target weight search network based on at least one task number set;

7. A multimedia information recommendation apparatus, the apparatus comprising:

8. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the multimedia information recommendation method of any one of claims 1 to 6.

9. An electronic device, the electronic device comprising:

a memory for storing executable instructions;

a processor for implementing the multimedia information recommendation method of any one of claims 1 to 6 when executing the executable instructions stored in the memory.

10. A computer readable storage medium storing executable instructions which when executed by a processor implement the multimedia information recommendation method of any one of claims 1-6.