CN115982453A - Content recommendation method and device based on feature engineering, electronic equipment and medium - Google Patents
- Publication number
- CN115982453A, CN202211633788.0A
- Authority
- CN
- China
- Prior art keywords
- feature
- content
- target
- information
- click
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a feature engineering-based content recommendation method and apparatus, an electronic device, and a medium.
Background
Feature engineering is the process of transforming raw data into features that better represent business logic. At present, feature engineering-based content recommendation generally proceeds as follows: raw data is acquired and feature-encoded, an offline-developed prediction model performs click prediction on the encoded data to obtain a prediction result, and content is recommended according to that result. In online content recommendation, online data is highly real-time, whereas the offline-developed prediction model is trained on offline data, which is less real-time. Because online and offline data differ in timeliness, the accuracy of the recommended content suffers when an offline-developed prediction model is deployed online. How to provide a feature engineering-based content recommendation method that improves the accuracy of recommended content has therefore become a technical problem in urgent need of a solution.
Summary
The main objective of the embodiments of the present application is to provide a feature engineering-based content recommendation method and apparatus, an electronic device, and a medium that can improve the accuracy of recommended content.
To achieve the above objective, a first aspect of the embodiments of the present application provides a feature engineering-based content recommendation method, the method comprising:
acquiring current click data, the current click data including current object information of a click object, current click content information, and current click position information;
reading feature engineering configuration information and feature extraction information from a preset configuration file;
performing feature encoding on the current click position information according to the feature engineering configuration information to obtain a current position feature;
reading a target object feature from a preset object encoding file according to the current object information;
reading a target content feature from a preset content encoding file according to the current click content information;
performing feature extraction on the current position feature, the target object feature, and the target content feature according to the feature extraction information to obtain target feature encoded data;
decoding the target feature encoded data to obtain a target object, target display content, and a target display position, wherein the target object is either the click object or an object other than the click object;
performing click prediction on the target feature encoded data by means of a preset prediction model to obtain a target click score, the target click score representing the probability that the target display content at the target display position is clicked by the target object;
recommending the target display content to the target object according to the target click score and the target display position.
In some embodiments, the current click data further includes a current click time, and after the performing click prediction on the target feature encoded data by means of a preset prediction model to obtain a target click score, the method further comprises:
performing feature encoding on the current object information according to the feature engineering configuration information to obtain a current object feature;
performing feature encoding on the current click content information according to the feature engineering configuration information to obtain a current content feature;
performing feature extraction on the current object feature, the current content feature, and the current position feature according to the feature extraction information to obtain current feature encoded data;
if the target feature encoded data is identical to the current feature encoded data, acquiring a display time of the target feature encoded data;
calculating the difference between the current click time and the display time to obtain a time difference;
adjusting the target click score according to the time difference.
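The text above states that the target click score is adjusted according to the time difference but does not give the adjustment rule. The following is a minimal sketch under a loudly stated assumption: the score decays exponentially as the delay between display and click grows. The function name `adjust_click_score`, the `half_life_s` parameter, and the decay form itself are hypothetical and are not taken from the application.

```python
import math


def adjust_click_score(score: float, click_time: float, display_time: float,
                       half_life_s: float = 3600.0) -> float:
    """Scale a click score by the delay between display and click.

    Assumption (not from the source): the score halves for every
    `half_life_s` seconds between the display time and the click time.
    Times are plain timestamps in seconds.
    """
    # Time difference between the current click time and the display time.
    time_diff = max(0.0, click_time - display_time)
    return score * math.pow(0.5, time_diff / half_life_s)


# A click one hour after display halves the score under the default half-life.
print(adjust_click_score(0.8, click_time=7200.0, display_time=3600.0))
```

Any monotonic function of the time difference could be substituted here; the only part grounded in the text is that the adjustment takes the time difference as its input.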
In some embodiments, before the performing click prediction on the target feature encoded data by means of a preset prediction model to obtain a target click score, the method further comprises:
pre-training the prediction model, which specifically includes:
acquiring historical display data and an original label of the historical display data, the historical display data including historical object information of a historical object, historical display content information, and historical display position information;
performing feature encoding on the historical object information according to the feature engineering configuration information to obtain a historical object feature;
performing feature encoding on the historical display content information according to the feature engineering configuration information to obtain a historical content feature;
performing feature encoding on the historical display position information according to the feature engineering configuration information to obtain a historical position feature;
performing feature extraction on the historical object feature, the historical content feature, and the historical position feature according to the feature extraction information to obtain training feature encoded data;
determining an original training score of the training feature encoded data based on the original label;
performing score prediction on the training feature encoded data by means of the prediction model to obtain a target training score;
performing a loss calculation on the original training score and the target training score to obtain a loss value;
adjusting the parameters of the prediction model according to the loss value to obtain the trained prediction model.
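The pre-training steps above (predict a score, compare it with the score derived from the original label, compute a loss, adjust parameters) can be sketched as follows. The application does not name a model type or loss function, so this sketch assumes a logistic regression trained with binary cross-entropy and plain stochastic gradient descent; the function names, learning rate, and toy data are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Target training score: predicted click probability for one encoded sample."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def mean_loss(weights, bias, samples, labels):
    """Mean binary cross-entropy between original and predicted scores."""
    eps = 1e-12  # clamp to keep log() finite
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, labels, lr=0.5, epochs=200):
    """Adjust model parameters by stochastic gradient descent on the loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # For cross-entropy with a sigmoid output, dLoss/dLogit = prediction - label.
            grad = predict(weights, bias, x) - y
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Toy training feature encoded data; the original score is 1 (clicked) or 0 (not clicked).
samples = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
labels = [1, 0, 1, 0]
weights, bias = train(samples, labels)
print(mean_loss([0.0, 0.0, 0.0], 0.0, samples, labels),
      mean_loss(weights, bias, samples, labels))
```

The second printed loss should be lower than the first, which is the only behaviour the sketch is meant to demonstrate.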
In some embodiments, the acquiring historical display data and an original label of the historical display data comprises:
acquiring an object behavior log of the historical object;
reading information from the object behavior log to obtain the historical object information, the historical display content information, and the historical display position information;
merging the historical object information, the historical display content information, and the historical display position information to obtain the historical display data;
reading time information from the object behavior log to obtain a content annotation time of the historical display data, the content annotation time being either a content exposure time or a content click time;
determining the original label according to the content annotation time.
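The steps above can be sketched as a small log-reading routine. The mapping from annotation time to label is not spelled out in this passage, so the sketch assumes that a record carrying a click time is a positive sample (label 1) and a record carrying only an exposure time is a negative sample (label 0). The record field names (`object_info`, `content_info`, `position_info`, `exposure_time`, `click_time`) are hypothetical.

```python
def build_training_rows(behavior_log):
    """Turn object behavior log records into (historical display data, original label) pairs.

    Assumption (not from the source): a record with a click time is labeled 1,
    and a record with only an exposure time is labeled 0.
    """
    rows = []
    for record in behavior_log:
        # Merge the three pieces of information into one historical display record.
        display_data = {
            "object": record["object_info"],
            "content": record["content_info"],
            "position": record["position_info"],
        }
        label = 1 if record.get("click_time") is not None else 0
        rows.append((display_data, label))
    return rows


log = [
    {"object_info": "u1", "content_info": "v9", "position_info": "top",
     "exposure_time": 1000.0, "click_time": 1012.5},
    {"object_info": "u1", "content_info": "v7", "position_info": "bottom",
     "exposure_time": 1000.0, "click_time": None},
]
for display_data, label in build_training_rows(log):
    print(display_data, label)
```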
In some embodiments, before the object behavior log is read, the method further comprises:
updating the object behavior log, which specifically includes:
acquiring object data and an object identification code, the object data being used to provide the historical object information;
acquiring content data and a content identification code, the content data being used to provide the historical content information;
recording the object identification code and the content identification code in the object behavior log;
associating the object data with the object behavior log by means of the object identification code, and associating the content data with the object behavior log by means of the content identification code.
In some embodiments, the object data includes basic object information and object behavior information, the content data includes basic content information and content statistics, and the associating the object data with the object behavior log by means of the object identification code and associating the content data with the object behavior log by means of the content identification code comprises:
establishing a mapping between the basic object information and the object behavior information to obtain an object wide table;
establishing a mapping between the basic content information and the content statistics to obtain a content wide table;
associating the object wide table with the object behavior log by means of the object identification code, and associating the content wide table with the object behavior log by means of the content identification code.
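The wide-table construction and the association by identification code amount to two joins. The sketch below uses plain dictionaries keyed by identification code rather than any particular database or data-frame library; the table contents and the field names `object_id` and `content_id` are illustrative assumptions.

```python
def build_wide_table(basic, extra):
    """Merge two tables keyed by the same identification code into one wide table."""
    return {key: {**basic[key], **extra.get(key, {})} for key in basic}


def attach_to_log(behavior_log, object_wide, content_wide):
    """Associate each log entry with its wide-table rows via the identification codes."""
    enriched = []
    for entry in behavior_log:
        row = dict(entry)  # copy so the original log is left untouched
        row["object"] = object_wide.get(entry["object_id"], {})
        row["content"] = content_wide.get(entry["content_id"], {})
        enriched.append(row)
    return enriched


object_wide = build_wide_table(
    {"u1": {"age_bucket": 2}},           # basic object information
    {"u1": {"clicks_last_7d": 14}},      # object behavior information
)
content_wide = build_wide_table(
    {"v9": {"category": "sports"}},      # basic content information
    {"v9": {"click_rate": 0.12}},        # content statistics
)
log = [{"object_id": "u1", "content_id": "v9", "position": "top"}]
print(attach_to_log(log, object_wide, content_wide))
```

Because the log stores only the identification codes, the wide tables can be refreshed independently of the log and re-joined when training data is built.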
In some embodiments, the reading feature engineering configuration information and feature extraction information from a preset configuration file comprises:
acquiring a current feature configuration version number;
reading the feature engineering configuration information and the feature extraction information from the configuration file according to the current feature configuration version number.
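Version-keyed configuration reading can be sketched as below. The application does not describe the configuration file format, so the JSON layout, the version strings, and the key names `feature_engineering` and `feature_extraction` are assumptions made purely for illustration.

```python
import json

# Hypothetical configuration file contents: one entry per feature configuration version.
CONFIG_TEXT = """
{
  "v1": {
    "feature_engineering": {"position_vocab": ["top", "bottom"]},
    "feature_extraction": ["position", "category"]
  },
  "v2": {
    "feature_engineering": {"position_vocab": ["top", "middle", "bottom"]},
    "feature_extraction": ["position", "category", "age_bucket"]
  }
}
"""


def load_feature_config(config_text, version):
    """Return (feature engineering config, feature extraction config) for one version."""
    config = json.loads(config_text)
    if version not in config:
        raise KeyError(f"unknown feature configuration version: {version}")
    entry = config[version]
    return entry["feature_engineering"], entry["feature_extraction"]


fe_config, extraction = load_feature_config(CONFIG_TEXT, "v2")
print(fe_config["position_vocab"], extraction)
```

Keying both training and serving on the same version number is one way to keep offline and online feature encoding consistent, which is the inconsistency the background section describes.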
To achieve the above objective, a second aspect of the embodiments of the present application provides a feature engineering-based content recommendation apparatus, the apparatus comprising:
a click data acquisition module, configured to acquire current click data, the current click data including current object information of a click object, current click content information, and current click position information;
a configuration information acquisition module, configured to read feature engineering configuration information and feature extraction information from a preset configuration file;
a position feature determination module, configured to perform feature encoding on the current click position information according to the feature engineering configuration information to obtain a current position feature;
an object feature determination module, configured to read a target object feature from a preset object encoding file according to the current object information;
a content feature determination module, configured to read a target content feature from a preset content encoding file according to the current click content information;
a feature extraction module, configured to perform feature extraction on the current position feature, the target object feature, and the target content feature according to the feature extraction information to obtain target feature encoded data;
a feature decoding module, configured to decode the target feature encoded data to obtain a target object, target display content, and a target display position, wherein the target object is either the click object or an object other than the click object;
a model prediction module, configured to perform click prediction on the target feature encoded data by means of a preset prediction model to obtain a target click score, the target click score representing the probability that the target display content at the target display position is clicked by the target object;
a content recommendation module, configured to recommend the target display content to the target object according to the target click score and the target display position.
To achieve the above objective, a third aspect of the embodiments of the present application provides an electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the method of the first aspect when executing the computer program.
To achieve the above objective, a fourth aspect of the embodiments of the present application provides a storage medium, the storage medium being a computer-readable storage medium that stores a computer program, wherein the computer program implements the method of the first aspect when executed by a processor.
The feature engineering-based content recommendation method and apparatus, electronic device, and medium proposed in the present application acquire multi-dimensional information (object information, content information, and position information) and perform feature encoding and feature extraction on it to obtain target feature encoded data. Because the target feature encoded data includes position features as well as object features and content features, the prediction model works with a larger set of feature dimensions when it performs click prediction. Combining position, object, and content features makes the predicted target click score more accurate, and content recommendation based on that score is correspondingly more accurate.
Brief Description of the Drawings
FIG. 1 is a flowchart of a feature engineering-based content recommendation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a feature engineering-based content recommendation method provided by another embodiment of the present application;
FIG. 3 is a flowchart of step S201 in FIG. 2;
FIG. 4 is a flowchart of a feature engineering-based content recommendation method provided by another embodiment of the present application;
FIG. 5 is a flowchart of step S404 in FIG. 4;
FIG. 6 is a flowchart of a feature engineering-based content recommendation method provided by another embodiment of the present application;
FIG. 7 is a block diagram of the module structure of a feature engineering-based content recommendation apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
It should be noted that although functional modules are divided in the apparatus diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the apparatus, or in an order different from that of the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
First, several terms used in the present application are explained:
Artificial intelligence (AI): a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence also covers the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Feature engineering: the process of transforming raw data into features that better represent business logic, so that applying these features to a prediction model improves the model's prediction accuracy on unseen data and improves machine learning performance. In fields such as search, advertising, and recommendation, the data determines the upper limit of the achievable effect and the algorithm only determines how closely that limit is approached; feature engineering is the bridge between data and algorithm. Feature engineering has a large impact on business outcomes, mainly because it can significantly improve model performance, greatly simplify model complexity, and reduce model maintenance costs. High-quality features can produce good results even on a simple linear model.
At present, feature engineering-based content recommendation generally proceeds as follows: online data is acquired and feature-encoded, an offline-developed prediction model performs click prediction on the encoded data to obtain a prediction result, and content is recommended according to that result. In online content recommendation, online data is highly real-time, whereas the offline-developed prediction model is trained on offline data, which is less real-time. Because online and offline data differ in timeliness, the accuracy of the recommended content suffers when an offline-developed prediction model is deployed online. How to provide a feature engineering-based content recommendation method that improves the accuracy of recommended content has therefore become a technical problem in urgent need of a solution.
On this basis, the main objective of the embodiments of the present application is to provide a feature engineering-based content recommendation method and a feature engineering-based content recommendation apparatus, electronic device, and storage medium. Multi-dimensional information (object information, content information, and position information) is acquired and subjected to feature extraction to obtain target feature encoded data, which includes both highly real-time position features and less real-time object and content features. When the prediction model makes a prediction on the target feature encoded data, it can therefore combine highly real-time online features with less real-time offline features, which makes the prediction result (the target click score) more accurate and in turn makes content recommendation based on the target click score more accurate. In addition, the content recommendation in the embodiments of the present application considers not only the target click score but also the target display position, which helps increase the target object's click-through rate on the target display content.
The feature engineering-based content recommendation method provided by the embodiments of the present application may be applied on a server side, and may also be implemented as software running on a terminal or on a server side. The server side may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The software may be, for example, an application that implements the feature engineering-based content recommendation method, but is not limited to this form.
The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: server computers, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, and distributed computing environments that include any of the above systems or devices. The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments of the present application provide a feature engineering-based content recommendation method and a feature engineering-based content recommendation apparatus, electronic device, and storage medium, which are described through the following embodiments. The feature engineering-based content recommendation method is described first.
It should be noted that, in each specific implementation of the present application, whenever processing involves data related to user identity or characteristics, such as user information, user behavior data, user history data, and user location information, the user's permission or consent is obtained first, and the collection, use, and processing of such data comply with the applicable laws, regulations, and standards of the relevant countries and regions. In addition, when an embodiment of the present application needs to obtain sensitive personal information of a user, the user's separate permission or separate consent is obtained through a pop-up window, a redirect to a confirmation page, or a similar mechanism. Only after that separate permission or consent has been explicitly obtained is the user-related data necessary for the normal operation of the embodiment acquired.
FIG. 1 is an optional flowchart of the feature engineering-based content recommendation method provided by an embodiment of the present application. The method may include, but is not limited to, steps S101 to S109.
Step S101: acquire current click data, the current click data including current object information of a click object, current click content information, and current click position information;
Step S102: read feature engineering configuration information and feature extraction information from a preset configuration file;
Step S103: perform feature encoding on the current click position information according to the feature engineering configuration information to obtain a current position feature;
Step S104: read a target object feature from a preset object encoding file according to the current object information;
Step S105: read a target content feature from a preset content encoding file according to the current click content information;
Step S106: perform feature extraction on the current position feature, the target object feature, and the target content feature according to the feature extraction information to obtain target feature encoded data;
Step S107: decode the target feature encoded data to obtain a target object, target display content, and a target display position, wherein the target object is either the click object or an object other than the click object;
Step S108: perform click prediction on the target feature encoded data by means of a preset prediction model to obtain a target click score, the target click score representing the probability that the target display content at the target display position is clicked by the target object;
Step S109: recommend the target display content to the target object according to the target click score and the target display position.
In steps S101 to S109 of this embodiment, multi-dimensional information (object information, content information, and position information) is acquired and subjected to feature encoding and feature extraction to obtain target feature encoded data, which includes position features as well as object features and content features. The position feature comes from a click performed by the click object and is therefore highly real-time, whereas the object features and content features are read from encoding files and are less real-time. As a result, when the prediction model performs click prediction on the target feature encoded data, it can combine highly real-time features with less real-time ones, which makes the prediction result (the target click score) more accurate and in turn makes content recommendation based on the target click score more accurate. In addition, the content recommendation in this embodiment considers not only the target click score but also the target display position, which helps increase the target object's click-through rate on the target display content.
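Steps S101 to S109 can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the implementation described in the application: the encoding files and configuration file are replaced by in-memory dictionaries, the prediction model is a fixed linear model with a sigmoid output, and decoding (S107) recovers only the display position. All names, feature fields, and weights are hypothetical.

```python
import math

# Hypothetical stand-ins for the preset object and content encoding files.
OBJECT_CODES = {"u1": {"age_bucket": 2, "active": 1}}
CONTENT_CODES = {"v9": {"category": 3, "hot": 1}, "v7": {"category": 1, "hot": 0}}

# Hypothetical stand-in for the preset configuration file (S102).
CONFIG = {
    "feature_engineering": {"position_vocab": ["top", "middle", "bottom"]},
    "feature_extraction": ["position", "age_bucket", "active", "category", "hot"],
}

# Toy linear model; weights are aligned with CONFIG["feature_extraction"].
WEIGHTS = [-0.4, 0.1, 0.5, 0.2, 0.8]
BIAS = -1.0


def extract_features(click, config):
    """S103 to S106: encode the position online, look up the rest, select fields."""
    vocab = config["feature_engineering"]["position_vocab"]
    merged = {"position": vocab.index(click["position"])}            # S103
    merged.update(OBJECT_CODES[click["object_id"]])                  # S104
    merged.update(CONTENT_CODES[click["content_id"]])                # S105
    return [merged[name] for name in config["feature_extraction"]]   # S106


def decode_position(features, config):
    """S107 (partial): recover the display position from the encoded vector."""
    index = config["feature_extraction"].index("position")
    return config["feature_engineering"]["position_vocab"][features[index]]


def click_score(features):
    """S108: map the encoded vector to a click probability in (0, 1)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def recommend(clicks, config):
    """S109: rank (score, position, content_id) triples, highest score first."""
    ranked = []
    for click in clicks:
        features = extract_features(click, config)
        ranked.append((click_score(features),
                       decode_position(features, config),
                       click["content_id"]))
    return sorted(ranked, reverse=True)


clicks = [
    {"object_id": "u1", "content_id": "v7", "position": "bottom"},
    {"object_id": "u1", "content_id": "v9", "position": "top"},
]
for score, position, content_id in recommend(clicks, CONFIG):
    print(f"{content_id} at {position}: {score:.3f}")
```

Only the position is encoded at request time; the object and content features are dictionary lookups, which mirrors the split between real-time and precomputed features that the paragraph above describes.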
In step S101 of some embodiments, the current click data may be obtained from the client. Taking video recommendation as an example, information about each clicked video is recorded in a click log. Accordingly, in one example, the click log of the client is obtained; the current object information, the current click content information and the current click position information of the click object are read from the click log; and these three pieces of information are merged to obtain the current click data.
In another example, the current browsing page of the client is obtained; an object identification code is obtained from the login account of the current browsing page; the current object information is read from a preset object database according to the object identification code; a click signal is received; in response to the click signal, the current click content information and the current click position information are obtained from the browsing page; and the current object information, the current click content information and the current click position information are merged to obtain the current click data.
It should be noted that the embodiments of this application refer to a click object, a historical object and a target object. All three are users, but they may be different users. For example, client 1 corresponds to object A and client 2 corresponds to object B. If object A clicks in real time on the browsing page of client 1, object A is the click object, and the current click data of object A can be obtained. Because the target object feature is read from the preset object encoding file according to the current object information, the target object feature may be the object feature of object A or the object feature of object B. The target object obtained from the target object feature may therefore be the click object or another object that is not the click object. The historical object appears in the model pre-training stage of this application; object A or object B can be a historical object regardless of whether it has clicked on the browsing page.
Both the current object information and the historical object information refer to object data. Object data generally includes object basic information and object behavior information. Object basic information includes gender, age, current city and the like. Object behavior information includes the list of clicked news items, the list of favorited news items, the list of liked news items and the like.
Both the current click content information and the historical display content information refer to content data. Content data includes content basic information and content statistical information. Content basic information includes first-level and second-level categories, publisher, keywords and the like. Content statistical information includes the number of favorites, the number of likes, the number of comments and the like.
The current click position information refers to the display position of a piece of content on the browsing page at the moment the click object clicks on that content. Taking video recommendation as an example, the client displays several videos on the browsing page at the same time, and each video has a corresponding display position on the page. The browsing page replaces the displayed videos as the object operates on it, so the display position of a video changes with the object's operations. When a click signal from the object is received, the current click position is obtained, in response to the click signal, from the display position of the clicked video.
In step S102 of some embodiments, the feature engineering configuration information and the feature extraction information are set in a configuration file in advance. The feature engineering configuration information is mainly used to perform feature encoding on data. For example, feature encoding is performed on the current click position information according to the feature engineering configuration information to obtain the current position feature.
In one example, step S102 may include: obtaining a current feature configuration version number; and reading the feature engineering configuration information and the feature extraction information from the configuration file according to the current feature configuration version number.
Specifically, the feature engineering configuration information and the feature extraction information in the feature configuration file are configurable, and the same feature configuration file is used in both the offline development stage and the online real-time recommendation stage. This avoids frequent updates to project code and long release cycles, and effectively improves the efficiency of feature iteration. In addition, the feature version can be configured for different feature iteration experiments, so that feature versions are controllable.
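As a rough illustration of reading both kinds of configuration by version number, the sketch below stores several versions in one JSON document. The JSON layout, the version labels `v1` and `v2`, the method names and the function `load_feature_config` are all hypothetical; the source does not specify the file format.

```python
import json

# Hypothetical configuration file content. The keys, version labels and
# method names are illustrative assumptions, not taken from the source.
CONFIG_TEXT = """
{
  "v1": {
    "feature_engineering": {"position": "one_hot", "object": "lookup", "content": "lookup"},
    "feature_extraction": {"order": ["position", "object", "content"]}
  },
  "v2": {
    "feature_engineering": {"position": "ordinal", "object": "lookup", "content": "lookup"},
    "feature_extraction": {"order": ["object", "content", "position"]}
  }
}
"""


def load_feature_config(config_text, version):
    """Return (feature engineering config, feature extraction info) for one version."""
    all_versions = json.loads(config_text)
    if version not in all_versions:
        raise KeyError(f"unknown feature configuration version: {version}")
    entry = all_versions[version]
    return entry["feature_engineering"], entry["feature_extraction"]


# The same call would be used offline and online, so both stages see one config.
engineering_cfg, extraction_cfg = load_feature_config(CONFIG_TEXT, "v1")
```

Switching an experiment to another feature version then only requires changing the version string, not the project code.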
It should be noted that the configuration file in this embodiment is the same in the offline development stage and the online real-time recommendation stage. Because offline data and online data differ, the feature engineering configuration information read from the configuration file may be unable to process some online data. For example, if browsing page M of the client displays three videos at a time, the display positions can be divided into upper, middle and lower. If the click object clicks on the video in the "upper" position, the current position information is "upper". If browsing page N displays five videos at a time, the display positions can be divided into upper, upper middle, middle, lower middle and lower. If the click object clicks on the video in the "upper middle" position, the current position information is "upper middle". Different browsing pages may therefore yield different position information, and the position information of browsing page M and browsing page N cannot be treated as equivalent.
Therefore, in one example, before step S103, mapping configuration information is obtained, and the format of the current position information is adjusted using the mapping configuration information to obtain adjusted current position information. Feature encoding is then performed on the adjusted current position information according to the feature engineering configuration information to obtain the current position feature. Specifically, after the format adjustment, the "upper middle" position of a browsing page may correspond to either "upper" or "middle".
In another example, the result of the format adjustment is checked. If the result is that the format cannot be adjusted, the historical position information of the previous click is obtained, and feature encoding is performed on that historical position information according to the feature engineering configuration information to obtain the current position feature. In addition, if the format cannot be adjusted, a reminder message is sent to the server. The reminder message reports that the feature engineering configuration information could not encode the data normally, so that developers can update or optimize the configuration file.
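The format adjustment and its fallback can be sketched as follows. The mapping table, the one-hot encoding, and the names `POSITION_MAPPING`, `encode_position` and `notify` are illustrative assumptions; the source only states that "upper middle" may map to "upper" or "middle", that the previous click position is used when adjustment fails, and that a reminder is sent.

```python
# Canonical positions assumed to be the ones known to the offline configuration.
CANONICAL_POSITIONS = ["upper", "middle", "lower"]

# Hypothetical mapping configuration: page-specific labels -> canonical labels.
POSITION_MAPPING = {
    "upper": "upper",
    "middle": "middle",
    "lower": "lower",
    "upper middle": "upper",  # the text allows "upper" or "middle" here
    "lower middle": "lower",  # assumption made by symmetry
}


def encode_position(raw_position, last_position, notify):
    """Adjust the position format, fall back if needed, then one-hot encode.

    last_position is the (already canonical) position of the previous click.
    notify is any callable that delivers the reminder message to the server.
    """
    position = POSITION_MAPPING.get(raw_position)
    if position is None:
        # Format cannot be adjusted: report it and reuse the previous click position.
        notify(f"cannot adjust position format: {raw_position!r}")
        position = last_position
    return [1 if position == name else 0 for name in CANONICAL_POSITIONS]


alerts = []
feature_page_n = encode_position("upper middle", "middle", alerts.append)
feature_unknown = encode_position("sidebar", "middle", alerts.append)
```

Here `feature_page_n` encodes "upper", while the unknown label falls back to the previous position "middle" and leaves one message in `alerts`.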
In step S103 of some embodiments, feature encoding is performed on the current position information according to the feature engineering configuration information to obtain the current position feature. The feature engineering configuration information includes matching information between the information to be processed and the feature processing methods. For example, it pairs object information with an object feature processing method, content information with a content feature processing method, and position information with a position feature processing method. Accordingly, in one example, the position feature processing method is read from the feature engineering configuration information according to the current position information, and the current position information is processed with that method to obtain the current position feature.
In step S104 of some embodiments, the target object feature is read from a preset object encoding file according to the current object information. Specifically, the object encoding file includes multiple candidate object features such as a gender feature, an age feature, an education feature and a news click sequence feature. The current object information includes gender information, age information, education information and the like. The candidate object features are screened according to the current object information to obtain the target object feature, where the target object feature includes at least one candidate object feature. For example, if the current object information includes gender information and age information, the gender feature is read from the object encoding file according to the gender information, the age feature is read from the object encoding file according to the age information, and the gender feature and the age feature are concatenated to obtain the target object feature.
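The lookup-and-concatenate behaviour of step S104 might look like the sketch below. The in-memory dictionary `OBJECT_ENCODING` stands in for the object encoding file, and its fields, value buckets and vectors are invented for illustration.

```python
# Hypothetical stand-in for the object encoding file, computed offline:
# field -> raw value -> encoded vector. All values here are made up.
OBJECT_ENCODING = {
    "gender": {"female": [1.0, 0.0], "male": [0.0, 1.0]},
    "age": {"18-25": [1.0, 0.0, 0.0], "26-40": [0.0, 1.0, 0.0], "41+": [0.0, 0.0, 1.0]},
    "education": {"bachelor": [1.0, 0.0], "master": [0.0, 1.0]},
}


def read_target_object_feature(current_object_info, encoding=OBJECT_ENCODING):
    """Select the candidate features named in current_object_info and concatenate them."""
    feature = []
    for field in encoding:  # iterate in file order so the layout stays stable
        if field in current_object_info:
            feature.extend(encoding[field][current_object_info[field]])
    return feature


# Only gender and age are present, so only those two features are concatenated.
target_object_feature = read_target_object_feature({"gender": "female", "age": "26-40"})
```

The same pattern would apply to the content encoding file in step S105, with content fields such as keywords or comment counts in place of the object fields.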
In step S105 of some embodiments, the target content feature is read from a preset content encoding file according to the current click content information. Specifically, the content encoding file includes multiple candidate content features such as first-level and second-level category features, a keyword feature, a click count feature, a favorite count feature, a like count feature and a comment count feature. The current content information includes first-level and second-level category information, keyword information, click count information, favorite count information, like count information, comment count information and the like. The candidate content features are screened according to the current content information to obtain the target content feature, where the target content feature includes at least one candidate content feature. For example, if the current content information includes keyword information and comment count information, the keyword feature is read from the content encoding file according to the keyword information, the comment count feature is read from the content encoding file according to the comment count information, and the keyword feature and the comment count feature are concatenated to obtain the target content feature.
In step S106 of some embodiments, feature extraction is performed on the current position feature, the target object feature and the target content feature according to the feature extraction information to obtain the target feature encoding data. Specifically, the feature extraction information is mainly used to perform feature extraction on multiple features; this feature extraction is also called feature merging or feature concatenation. For example, the current position feature, the target object feature and the target content feature are concatenated according to the feature extraction information to obtain the target feature encoding data.
In one example, before the features are concatenated, feature dimension processing may be applied to the current position feature, the target object feature and the target content feature so that all three have the same feature dimension.
In another example, step S106 specifically includes: obtaining the feature category of the target content feature as the original feature category; if the original feature category is continuous, discretizing the target content feature to obtain a target discrete content feature; and merging the current position feature, the target object feature and the target discrete content feature to obtain the target feature encoding data.
For example, the like count feature in the target content feature is a continuous feature. The model predicts better when the like count feature is discretized, so the like count feature is discretized, for example as log(like count feature * 10 + 1).
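A minimal sketch of this discretization and the subsequent merge is given below. The source only gives the expression log(like count feature * 10 + 1); the choice of the natural logarithm and the flooring into an integer bucket are assumptions, as are the function names.

```python
import math


def discretize_like_count(like_feature):
    """Map a non-negative continuous like-count feature to an integer bucket.

    Uses log(x * 10 + 1) from the text; natural log and floor() are assumptions.
    """
    if like_feature < 0:
        raise ValueError("like count feature must be non-negative")
    return math.floor(math.log(like_feature * 10 + 1))


def merge_features(position_feature, object_feature, discrete_content_feature):
    """Concatenate the three feature groups into one encoded vector."""
    return list(position_feature) + list(object_feature) + list(discrete_content_feature)


bucket = discretize_like_count(100.0)
target_feature_encoding = merge_features([1, 0, 0], [1.0, 0.0], [bucket])
```

The logarithm compresses the long tail of like counts, so very popular items fall into a small number of neighbouring buckets.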
In step S107 of some embodiments, the target feature encoding data is decoded to obtain the target object, the target display content and the target display position, where the target object is the click object or an object other than the click object. Specifically, feature separation is performed on the target feature encoding data to obtain an object feature to be decoded, a content feature to be decoded and a position feature to be decoded; each of these is then decoded to obtain the target object, the target display content and the target display position respectively. It should be noted that the object feature to be decoded is not exactly the same as the target object feature, the content feature to be decoded is not exactly the same as the target content feature, and the position feature to be decoded is not exactly the same as the current position feature. Because the current position feature, the target content feature and the target object feature are not directly related to one another, skipping feature extraction based on the feature extraction information and proceeding directly to feature decoding and to model prediction in the subsequent steps may reduce the accuracy of click prediction by the prediction model.
In one example, decoding the object feature to be decoded to obtain the target object specifically includes:
decoding the object feature to be decoded to obtain target object information, where the target object information includes at least one of gender information, age information and education information; and
screening out the target object from a preset object set according to the target object information.
Specifically, if the target object information is gender information, target objects with matching gender information are screened out from the object set. If the target object information is age information, an age range is determined from the age information, and target objects that fall within the age range are screened out from the object set.
In another example, decoding the content feature to be decoded to obtain the target display content specifically includes:
decoding the content feature to be decoded to obtain target content information, where the target content information includes at least one of keyword information and like count information; and
screening out the target display content from a preset content set according to the target content information.
Specifically, if the target content information includes keyword information, target display content that covers the keyword information is screened out from the content set. If the target content information includes like count information, a like count threshold is determined from the like count information, and target display content whose like count exceeds the threshold is screened out from the content set.
It should be noted that the prediction model in this embodiment may be a logistic regression (LR) model or an online learning model based on Follow the Regularized Leader (FTRL).
Referring to FIG. 2, in other embodiments, before step S108, the feature-engineering-based content recommendation method further includes pre-training the prediction model, which specifically includes, but is not limited to, steps S201 to S207:
Step S201: obtain historical display data and the original label of the historical display data, where the historical display data includes historical object information, historical display content information and historical display position information of a historical object;
Step S202: perform feature encoding on the historical object information according to the feature engineering configuration information to obtain a historical object feature; perform feature encoding on the historical display content information according to the feature engineering configuration information to obtain a historical content feature; and perform feature encoding on the historical display position information according to the feature engineering configuration information to obtain a historical position feature;
Step S203: perform feature extraction on the historical object feature, the historical content feature and the historical position feature according to the feature extraction information to obtain training feature encoding data;
Step S204: determine the original training score of the training feature encoding data based on the original label;
Step S205: perform score prediction on the training feature encoding data using the prediction model to obtain a target training score;
Step S206: perform a loss calculation based on the original training score and the target training score to obtain a loss value;
Step S207: adjust the parameters of the prediction model according to the loss value to obtain the trained prediction model.
In steps S201 to S207 of this embodiment, feature encoding is performed on the historical display data using the feature engineering configuration information to obtain offline features, which include the historical object feature, the historical content feature and the historical position feature. The offline features are input into the prediction model for click prediction to obtain the target training score. The target training score represents the predicted probability that the historical content at the historical position is clicked by the historical object. The original label indicates whether the historical content was clicked by the historical object: if so, the original label is a positive label; if not, it is a negative label. A positive original label gives a higher original training score, and a negative original label gives a lower original training score. The loss value is determined from the original training score and the target training score, and the parameters of the prediction model are adjusted according to the loss value to obtain the trained prediction model. In one example, the loss value is obtained by computing the difference between the original training score and the target training score.
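Steps S204 to S207 can be illustrated with a tiny logistic regression trained by stochastic gradient descent. This is only a sketch: the training data is invented, the original training scores are assumed to be 1.0 for positive labels and 0.0 for negative labels, and the update uses the signed difference between the predicted and original scores, which is the gradient of the log loss with respect to the logit. The source does not state which loss or optimizer is actually used.

```python
import math


def predict(weights, bias, features):
    """Target training score: sigmoid of a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, original_scores, learning_rate=0.5, epochs=200):
    """Adjust parameters from the difference between predicted and original scores."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, original in zip(samples, original_scores):
            diff = predict(weights, bias, features) - original  # difference-based loss signal
            weights = [w - learning_rate * diff * x for w, x in zip(weights, features)]
            bias -= learning_rate * diff
    return weights, bias


# Invented training feature encoding data: the first column marks the "upper" position.
samples = [[1, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
original_scores = [1.0, 0.0, 1.0, 0.0]  # assumed scores for positive / negative labels
weights, bias = train(samples, original_scores)
clicked_score = predict(weights, bias, [1, 0, 0, 1])
skipped_score = predict(weights, bias, [0, 0, 1, 0])
```

After training, encodings resembling clicked samples receive a higher score than encodings resembling non-clicked samples.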
In one example, the original label is a content annotation time, which is either a content exposure time or a content click time. If the content annotation time is a content exposure time, a preset exposure score is used as the original training score. If the content annotation time is a content click time, the difference between the content exposure time and the content click time is computed to obtain a time coefficient, and the original training score is the product of the time coefficient and the exposure score. The larger the difference between the content exposure time and the content click time, the smaller the time coefficient. However, the time coefficient is always greater than a preset coefficient threshold, and the preset coefficient threshold is at least 1.
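One way to realize such a time coefficient is sketched below. The functional form `threshold + scale / (scale + difference)` is an assumption chosen only because it decreases as the difference grows and always stays above the threshold; the default values of `exposure_score`, `coeff_threshold` and `scale` are likewise hypothetical.

```python
def original_training_score(exposure_time, click_time=None,
                            exposure_score=0.1, coeff_threshold=1.0, scale=60.0):
    """Original training score from the content annotation time (times in seconds).

    No click time: the sample is exposure-only and gets the preset exposure score.
    With a click time: score = time coefficient * exposure score, where the
    coefficient shrinks as the click lags the exposure but stays above the threshold.
    """
    if click_time is None:
        return exposure_score
    difference = click_time - exposure_time
    if difference < 0:
        raise ValueError("click time must not precede exposure time")
    time_coefficient = coeff_threshold + scale / (scale + difference)
    return time_coefficient * exposure_score


exposure_only = original_training_score(0.0)
fast_click = original_training_score(0.0, click_time=0.0)
slow_click = original_training_score(0.0, click_time=60.0)
```

With a threshold of at least 1, every clicked sample scores above an exposure-only sample, and a quick click scores above a delayed one.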
Referring to FIG. 3, in other embodiments, step S201 may include, but is not limited to, steps S301 to S305:
Step S301: obtain the object behavior log of the historical object;
Step S302: read information from the object behavior log to obtain the historical object information, the historical display content information and the historical display position information;
Step S303: merge the historical object information, the historical display content information and the historical display position information to obtain the historical display data;
Step S304: read time information from the object behavior log to obtain the content annotation time of the historical display data, where the content annotation time is a content exposure time or a content click time;
Step S305: determine the original label according to the content annotation time.
In steps S301 to S305 of this embodiment, the object behavior log stores the historical object information, the historical display content information, the historical display position information and the content annotation time. By reading the object behavior log, the historical display data and the original label can be obtained quickly.
It should be noted that, apart from the object behavior log, the historical object information may be stored in an HBase database, and the historical display content information, the historical position information and the content annotation time may each be stored in a Redis database. In another embodiment, step S201 may further include: obtaining the historical object information, the historical display content information, the historical display position information and the content annotation time from the HBase database and the Redis database through a data interface.
Referring to FIG. 4, in other embodiments, before step S301, the feature-engineering-based content recommendation method further includes updating the object behavior log, which specifically includes, but is not limited to, steps S401 to S404:
Step S401: obtain object data and an object identification code, where the object data provides the historical object information;
Step S402: obtain content data and a content identification code, where the content data provides the historical content information;
Step S403: record the object identification code and the content identification code in the object behavior log;
Step S404: associate the object data with the object behavior log through the object identification code, and associate the content data with the object behavior log through the content identification code.
In steps S401 to S404 of this embodiment, the object data and the content data are large. Storing them directly in the object behavior log would make the log too large and slow down reading it. Therefore, this embodiment assigns an object identification code to the object data and a content identification code to the content data, and associates the data through these identification codes, which reduces the storage required by the object behavior log and improves reading efficiency.
Referring to FIG. 5, in other embodiments, the object data includes object basic information and object behavior information, the content data includes content basic information and content statistical information, and step S404 specifically includes, but is not limited to, steps S501 to S503:
Step S501: establish a mapping between the object basic information and the object behavior information to obtain an object wide table;
Step S502: establish a mapping between the content basic information and the content statistical information to obtain a content wide table;
Step S503: associate the object wide table with the object behavior log through the object identification code, and associate the content wide table with the object behavior log through the content identification code.
In steps S501 to S503 of this embodiment, the user basic information includes gender, age, current city and the like, and the user behavior information includes the list of clicked news items, the list of favorited or liked news items and the like. The object wide table joins the user basic information and the user behavior information into a single full user profile table. The content basic information includes first-level and second-level categories, publisher, keywords and the like, and the content statistical information includes the number of favorites, the number of likes, the number of comments and the like. The content wide table joins the content basic information and the content statistical information into a single full content profile table. Building wide tables for the object data and the content data makes the data easier to manage and read.
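The wide tables and the identifier-based association can be sketched with plain dictionaries. All records, field names and the helpers `build_wide_table` and `resolve_log_row` are hypothetical; a real deployment would more likely use database tables keyed by the identification codes.

```python
# Invented example records, keyed by object or content identification code.
object_basic = {"u1": {"gender": "female", "age": 30, "city": "Shenzhen"}}
object_behavior = {"u1": {"clicked": ["c1"], "liked": []}}
content_basic = {"c1": {"category": "finance/insurance", "publisher": "p9"}}
content_stats = {"c1": {"favorites": 3, "likes": 12, "comments": 1}}


def build_wide_table(basic, extra):
    """Join two id-keyed tables into one wide table (full outer join on the id)."""
    return {key: {**basic.get(key, {}), **extra.get(key, {})}
            for key in basic.keys() | extra.keys()}


object_wide = build_wide_table(object_basic, object_behavior)
content_wide = build_wide_table(content_basic, content_stats)

# The behavior log stores only identification codes, which keeps each row small.
behavior_log = [{"object_id": "u1", "content_id": "c1", "position": "upper"}]


def resolve_log_row(log_row):
    """Follow the identification codes to attach the full profiles to a log row."""
    return {**log_row,
            "object": object_wide[log_row["object_id"]],
            "content": content_wide[log_row["content_id"]]}


resolved_row = resolve_log_row(behavior_log[0])
```

The log row itself never holds the profile fields; they are looked up only when the row is read for training.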
In step S108 of some embodiments, click prediction is performed on the target feature encoding data using the preset prediction model to obtain the target click score, where the target click score represents the probability that the target display content at the target display position is clicked by the target object.
Referring to FIG. 6, in other embodiments, the current click data further includes a current click time, and after step S108 the feature-engineering-based content recommendation method may further include, but is not limited to, steps S601 to S605:
Step S601: perform feature encoding on the current object information according to the feature engineering configuration information to obtain a current object feature; and perform feature encoding on the current click content information according to the feature engineering configuration information to obtain a current content feature;
Step S602: perform feature extraction on the current object feature, the current content feature and the current position feature according to the feature extraction information to obtain current feature encoding data;
Step S603: if the target feature encoding data is the same as the current feature encoding data, obtain the display time of the target feature encoding data;
Step S604: compute the difference between the current click time and the display time to obtain a time difference;
Step S605: adjust the target click score according to the time difference.
In steps S601 to S605 illustrated in this embodiment, if the target feature encoding data is the same as the current feature encoding data, the prediction model will output a relatively high target click score; that is, what was obtained is the click object's own current click data, and the same content would be recommended to that click object again. To prevent the same target object from repeatedly receiving the same recommended content, this embodiment determines whether the target feature encoding data is the same as the current feature encoding data. If they differ, the target click score needs no adjustment, and content is recommended directly according to it. If they are the same, the target click score is lowered so that other content with a higher target click score can be recommended to the target object. More specifically, this embodiment adjusts the target click score by the time difference between the display time and the current click time. Both times can be obtained from the client's system timer. The current click time is the system time at which the click object performs the click, and the display time is the system time at which the target display content is recommended to the target object for display. An adjustment coefficient can be calculated from the time difference, and the adjusted target click score is the product of the adjustment coefficient and the target click score. The adjustment coefficient lies in the range (0, 1) and grows with the time difference. A larger time difference means that the display time and the current click time are further apart, so the coefficient is larger and the target click score is reduced less; in that case, recommending the same display content to the same target object again is not a significant problem.
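The text fixes only two properties of the adjustment coefficient: it lies in (0, 1) and it increases with the time difference. The sketch below picks one function with those properties, `d / (d + tau)`. That formula, the time constant `tau`, and the use of the absolute difference are assumptions made for illustration, not the formula used by the patent.

```python
def adjustment_coefficient(time_diff, tau=3600.0):
    """Map a positive time difference (seconds) to a coefficient in (0, 1).

    The coefficient grows with `time_diff`; `tau` (assumed parameter) is the
    difference at which the coefficient equals 0.5.
    """
    if time_diff <= 0 or tau <= 0:
        raise ValueError("time_diff and tau must be positive")
    return time_diff / (time_diff + tau)


def adjust_click_score(click_score, click_time, display_time, tau=3600.0):
    """Scale the target click score by the coefficient (steps S604 and S605)."""
    time_diff = abs(click_time - display_time)  # step S604
    return click_score * adjustment_coefficient(time_diff, tau)  # step S605


# Timestamps in seconds, as a client system timer might report them.
adjusted = adjust_click_score(0.8, click_time=10_000.0, display_time=6_400.0)
```

With this choice, a repeat shown shortly after the click is penalized heavily, while one shown long afterwards keeps most of its score, which matches the behavior described above.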
In step S109 of some embodiments, the target display content is recommended to the target object according to the target click score and the target display position. In one example, step S109 specifically includes: if the target click score is greater than a preset click score threshold, obtaining the browsing page of the target object; if the target click score is less than or equal to the click score threshold, lowering the target display position and obtaining the browsing page of the target object; and displaying the target display content at the target display position on the browsing page.
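The threshold rule in this example of step S109 can be sketched as follows. The text does not say how a display position is represented or by how much it is lowered, so the sketch assumes 0-based slot indices where a larger index is lower on the page, and a hypothetical `demotion` step of one slot.

```python
def place_content(click_score, target_position, score_threshold=0.5, demotion=1):
    """Return the slot at which the target display content should be shown.

    Slots are 0-based and a larger index means a lower position on the page
    (assumption). A score above the threshold keeps the slot; otherwise the
    content is moved down by `demotion` slots.
    """
    if click_score > score_threshold:
        return target_position
    return target_position + demotion


kept = place_content(0.9, target_position=2)     # score above threshold
lowered = place_content(0.5, target_position=2)  # score equal to threshold
```

Note that a score exactly equal to the threshold is demoted, following the "less than or equal to" wording of the example.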
Note that if a target object is a new user, too little is known about that user, and content recommendation is likely to miss the target object's preferences. Recommendation accuracy then drops, the click-through rate of the content falls, and the new user is harder to retain. This embodiment therefore proposes reading the object encoding file using the current object information of the click object to obtain the target object feature associated with the current object information. The target object feature may represent the click object or a non-click object. When it represents a non-click object, that object may be another user whose object information is similar to the click object's, in which case recommendation accuracy is relatively high. If that user is a new user, content can thus be recommended to the new user, which helps retain new users.
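One way to picture this lookup is sketched below: return the stored feature of the click object when it exists, and otherwise fall back to the stored feature of the most similar known user. The in-memory list standing in for the object encoding file, the attribute-overlap similarity, and all names are assumptions for illustration; the text does not describe how similarity is measured.

```python
def lookup_object_feature(current_info, encoded_objects):
    """Return a target object feature for `current_info`.

    `encoded_objects` is a list of {"info": dict, "feature": list} entries
    standing in for the object encoding file (hypothetical layout).
    """
    if not encoded_objects:
        raise ValueError("object encoding data is empty")

    user_id = current_info.get("user_id")
    if user_id is not None:
        for entry in encoded_objects:
            if entry["info"].get("user_id") == user_id:
                return entry["feature"]  # the click object itself

    def overlap(info):
        # Count matching attributes, ignoring the identifier.
        return sum(
            1 for k, v in current_info.items()
            if k != "user_id" and info.get(k) == v
        )

    # New user: use the most similar non-click object instead.
    best = max(encoded_objects, key=lambda e: overlap(e["info"]))
    return best["feature"]


encoded = [
    {"info": {"user_id": "u1", "gender": "F", "city": "Shenzhen"}, "feature": [1, 0, 7]},
    {"info": {"user_id": "u2", "gender": "M", "city": "Beijing"}, "feature": [0, 1, 3]},
]
new_user_feature = lookup_object_feature(
    {"user_id": "u9", "gender": "M", "city": "Beijing"}, encoded
)
```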
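Referring to FIG. 7, an embodiment of the present application further provides a feature-engineering-based content recommendation apparatus that can implement the above feature-engineering-based content recommendation method. FIG. 7 is a block diagram of the module structure of the apparatus provided by the embodiment of the present application. The apparatus includes a click data acquisition module 701, a configuration information acquisition module 702, a position feature determination module 703, an object feature determination module 704, a content feature determination module 705, a feature extraction module 706, a feature decoding module 707, a model prediction module 708, and a content recommendation module 709. The click data acquisition module 701 is configured to acquire current click data, which includes current object information of a click object, current click content information, and current click position information. The configuration information acquisition module 702 is configured to read feature engineering configuration information and feature extraction information from a preset configuration file. The position feature determination module 703 is configured to perform feature encoding on the current click position information according to the feature engineering configuration information to obtain a current position feature. The object feature determination module 704 is configured to read a target object feature from a preset object encoding file according to the current object information. The content feature determination module 705 is configured to read a target content feature from a preset content encoding file according to the current click content information. The feature extraction module 706 is configured to perform feature extraction on the current position feature, the target object feature, and the target content feature according to the feature extraction information to obtain target feature encoding data. The feature decoding module 707 is configured to decode the target feature encoding data to obtain a target object, target display content, and a target display position, where the target object is the click object or a non-click object. The model prediction module 708 is configured to perform click prediction on the target feature encoding data by a preset prediction model to obtain a target click score, which represents the probability that the target display content at the target display position is clicked by the target object. The content recommendation module 709 is configured to recommend the target display content to the target object according to the target click score and the target display position.
Note that the specific implementation of the feature-engineering-based content recommendation apparatus is substantially the same as that of the method embodiments described above and is not repeated here.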
An embodiment of the present application further provides an electronic device, which includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory. When executed by the processor, the program implements the above feature-engineering-based content recommendation method. The electronic device may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.
Referring to FIG. 8, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:
a processor 801, which may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and which executes the relevant programs to implement the technical solutions provided by the embodiments of the present application;
a memory 802, which may be implemented as a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 802 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 802 and invoked by the processor 801 to execute the feature-engineering-based content recommendation method of the embodiments of the present application;
an input/output interface 803, used for information input and output;
a communication interface 804, used for communication between this device and other devices, either by wire (for example USB or a network cable) or wirelessly (for example a mobile network, Wi-Fi, or Bluetooth);
a bus 805, which transfers information between the components of the device (for example the processor 801, the memory 802, the input/output interface 803, and the communication interface 804);
where the processor 801, the memory 802, the input/output interface 803, and the communication interface 804 are communicatively connected to one another inside the device through the bus 805.
An embodiment of the present application further provides a storage medium. The storage medium is a computer-readable storage medium that stores one or more programs, and the one or more programs are executable by one or more processors to implement the above feature-engineering-based content recommendation method.
As a non-transitory computer-readable storage medium, the memory can store non-transitory software programs and non-transitory computer-executable programs. The memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The feature-engineering-based content recommendation method, apparatus, electronic device, and storage medium provided by the embodiments of the present application perform feature encoding and feature extraction on multi-dimensional information such as object information, content information, and position information to obtain target feature encoding data. The target feature encoding data contains both highly real-time position features and less time-sensitive object features and content features. As a result, when the prediction model performs click prediction on the target feature encoding data, it combines the highly real-time features with the less time-sensitive ones, which makes the prediction result (the target click score) more accurate and in turn raises the accuracy of content recommendation based on that score. In addition, content recommendation in the embodiments considers not only the target click score but also the target display position, which helps raise the target object's click-through rate on the target display content.
The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit them. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments apply equally to similar technical problems.
Those skilled in the art will understand that the technical solutions shown in FIGS. 1 to 6 do not limit the embodiments of the present application, which may include more or fewer steps than illustrated, combine certain steps, or use different steps.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules or units of the systems and devices, may be implemented as software, firmware, hardware, or appropriate combinations thereof.
The terms "first", "second", "third", "fourth", and the like (if any) in the description of the present application and in the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described herein can be practiced in orders other than those illustrated or described. Furthermore, the terms "include" and "have", and any variations of them, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean that only A exists, that only B exists, or that both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may mean: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where a, b, and c may each be single or multiple.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more to a unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store a program, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not limit the scope of rights of the embodiments of the present application. Any modification, equivalent replacement, or improvement made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211633788.0A CN115982453A (en) | 2022-12-19 | 2022-12-19 | Content recommendation method and device based on feature engineering, electronic equipment and medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115982453A (en) | 2023-04-18 |
Family ID: 85971607
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |