CN114912009A

CN114912009A - User portrait generation method, device, electronic equipment and computer program medium

Info

Publication number: CN114912009A
Application number: CN202110184908.2A
Authority: CN
Inventors: 刘雨丹; 郝晓波; 葛凯凯; 刘诗万; 林乐宇; 张旭
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-02-10
Filing date: 2021-02-10
Publication date: 2022-08-16

Abstract

The embodiments of the present application provide a method and apparatus for generating a user portrait, an electronic device, and a computer program medium, and relate to the technical field of artificial intelligence. The method includes: acquiring user feature information of a target user; generating, based on the user feature information, a plurality of user feature vectors of the target user under user feature dimensions; inputting the plurality of user feature vectors into a pre-trained machine learning model; obtaining the preference level label, output by the pre-trained machine learning model, of the target user for a target object under a target classification attribute; and, if the obtained preference level label matches a preset preference level label, generating a user portrait of the target user based on the target classification attribute. The technical solution of the embodiments improves the accuracy of the resulting user portrait.

Description

User portrait generation method, apparatus, electronic device and computer program medium

Technical Field

The present application relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and computer program medium for generating a user portrait.

Background

With the development of Internet technology, obtaining user portraits from big data and then delivering a variety of services through those portraits has become one of the core technologies of the Internet. A user portrait is the structuring and labeling of user information: by describing data along dimensions such as a user's demographic attributes, social attributes, and interests and preferences, it accurately characterizes and analyzes all aspects of the user's information and uncovers potential value.

In the related art, a user portrait is generally generated by extracting portrait labels from user behavior data, counting the portrait labels that appear in that data, scoring each user's portrait labels according to the counted frequencies, and then deriving the user portrait from the label scores. For users who have produced little or no behavior, however, the scarcity of behavior data makes it difficult to obtain a user portrait from label statistics, or makes the resulting portrait inaccurate, which in turn reduces the precision of services that rely on the user portrait.
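The label-statistics approach of the related art can be sketched as follows. This is only an illustration: the function name and the choice of relative frequency as the score are assumptions, since the text says only that labels are scored according to their counted frequency.

```python
from collections import Counter


def score_portrait_labels(behavior_labels):
    """Score each portrait label by its relative frequency in one user's behavior data.

    Relative frequency is an assumed scoring rule, used here for illustration.
    """
    counts = Counter(behavior_labels)
    total = sum(counts.values())
    if total == 0:
        # A user with no recorded behavior yields no label scores at all.
        return {}
    return {label: count / total for label, count in counts.items()}


active_user = ["sports", "sports", "music", "sports"]
inactive_user = []

print(score_portrait_labels(active_user))
print(score_portrait_labels(inactive_user))
```

The empty result for the inactive user is the weakness described above: with no behavior data, label statistics give nothing to build a portrait from.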

Summary of the Invention

The embodiments of the present application provide a method, apparatus, electronic device, and computer program medium for generating a user portrait, which are used to improve the accuracy of the obtained user portrait.

Other features and advantages of the present application will become apparent from the following detailed description, or may be learned in part by practice of the present application.

According to one aspect of the embodiments of the present application, a method for generating a user portrait is provided, including: acquiring user feature information of a target user, where the user feature information includes user attribute information and historical behavior data of the target user; generating, based on the user feature information, a plurality of user feature vectors of the target user under user feature dimensions, where the user feature dimensions are the feature dimensions formed by the categories of the user attribute information together with the operation event categories and the classification attributes of the operation objects contained in the historical behavior data; inputting the plurality of user feature vectors into a pre-trained machine learning model, where the pre-trained machine learning model is trained on sample data that includes a plurality of user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for a target object under a target classification attribute; obtaining the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute; and, if the obtained preference level label matches a preset preference level label, generating the user portrait of the target user based on the target classification attribute.
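The final steps of the method (model inference, label matching, and portrait generation) can be sketched as below. All names are hypothetical, the label values "high" and "low" are assumed, and `toy_model` is a stand-in threshold rule rather than the pre-trained model of the application.

```python
from typing import Callable, Dict, List, Optional

# One feature vector per user feature dimension (hypothetical representation).
FeatureVectors = Dict[str, List[float]]


def generate_user_portrait(
    feature_vectors: FeatureVectors,
    model: Callable[[FeatureVectors], str],
    target_attribute: str,
    preset_label: str = "high",
) -> Optional[Dict[str, str]]:
    """Return a portrait entry if the predicted label matches the preset label."""
    predicted_label = model(feature_vectors)
    if predicted_label == preset_label:
        return {"portrait_tag": target_attribute, "preference_level": predicted_label}
    return None


def toy_model(vectors: FeatureVectors) -> str:
    """Stand-in for the pre-trained model: thresholds the mean of all components."""
    values = [v for vec in vectors.values() for v in vec]
    return "high" if values and sum(values) / len(values) > 0.5 else "low"


vectors = {"age_group": [1.0, 0.0], "click/sports": [0.9, 0.8]}
print(generate_user_portrait(vectors, toy_model, "sports"))
```

Returning `None` when the labels do not match mirrors the conditional final step: the target classification attribute contributes to the portrait only when the predicted preference level matches the preset one.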

According to one aspect of the embodiments of the present application, an apparatus for generating a user portrait is provided, including: a first acquiring unit, configured to acquire user feature information of a target user, where the user feature information includes user attribute information and historical behavior data of the target user; a first generating unit, configured to generate, based on the user feature information, a plurality of user feature vectors of the target user under user feature dimensions, where the user feature dimensions are the feature dimensions formed by the categories of the user attribute information together with the operation event categories and the classification attributes of the operation objects contained in the historical behavior data; an input unit, configured to input the plurality of user feature vectors into a pre-trained machine learning model, where the pre-trained machine learning model is trained on sample data that includes a plurality of user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for a target object under a target classification attribute; an output unit, configured to obtain the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute; and a second generating unit, configured to generate the user portrait of the target user based on the target classification attribute if the obtained preference level label matches a preset preference level label.

In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user portrait further includes: a third generating unit, configured to perform conversion processing on each of the plurality of user feature vectors to generate first feature vectors corresponding to the plurality of user feature vectors, where the first feature vectors all have the same dimension; an aggregation unit, configured to aggregate the first feature vectors based on preset association relationships among the plurality of pieces of user feature information, generating a plurality of aggregated second feature vectors; and a prediction unit, configured to predict, based on the aggregated second feature vectors, the target user's preference level label for the target object under the target classification attribute.
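A minimal sketch of the conversion and aggregation steps follows. It assumes a plain linear map for the conversion and mean pooling within each preset group for the aggregation. Both are illustrative substitutes: the text does not fix the conversion, and the capsule layer shown in the figures is not reproduced here. The weight matrices and group names are hypothetical.

```python
def project(vector, weights):
    """Apply a linear map; `weights` has one row per output component."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def aggregate(first_vectors, groups):
    """Mean-pool the first feature vectors of each preset group of related features."""
    second_vectors = {}
    for group_name, members in groups.items():
        vecs = [first_vectors[name] for name in members]
        # zip(*vecs) walks the vectors component by component.
        second_vectors[group_name] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return second_vectors


# User feature vectors of different lengths are converted to a common dimension (3).
first_vectors = {
    "age_group": project([1.0, 2.0], [[1, 0], [0, 1], [1, 1]]),
    "click/sports": project([2.0], [[1], [2], [3]]),
}

# Preset association: both features belong to one hypothetical group.
second_vectors = aggregate(first_vectors, {"profile_and_clicks": ["age_group", "click/sports"]})
print(second_vectors)
```

Bringing every vector to the same dimension first is what makes the component-wise aggregation well defined.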

In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user portrait further includes: a second acquiring unit, configured to acquire training-set sample data for training a machine learning model to be trained, where each piece of sample data in the training set includes a plurality of user feature vectors of a sample user under the user feature dimensions and the sample user's preference level label for the target object under the target classification attribute; and a training unit, configured to train the machine learning model to be trained on the training-set sample data to obtain the pre-trained machine learning model.

In some embodiments of the present application, based on the foregoing solution, the second acquiring unit is configured to: acquire user feature information of the candidate users in a candidate user set; for each candidate user in the candidate user set, determine, based on the historical behavior data in the candidate user's user feature information, first-type candidate users and second-type candidate users, where a first-type candidate user is a candidate user whose historical behavior data contains a target operation event, a second-type candidate user is a candidate user whose historical behavior data does not contain a target operation event, and a target operation event is an operation event performed on the target object under the target classification attribute; generate, based on the user feature information of the first-type candidate users, a plurality of user feature vectors of the first-type candidate users under the user feature dimensions, and add a preference level label indicating a high degree of preference to the first-type candidate users, to obtain sample data of positive sample users; generate, based on the user feature information of the second-type candidate users, a plurality of user feature vectors of the second-type candidate users under the user feature dimensions, and add a preference level label indicating a low degree of preference to the second-type candidate users, to obtain sample data of negative sample users; and obtain, based on the sample data of the positive sample users and the sample data of the negative sample users, the training-set sample data for training the machine learning model to be trained.
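The split into positive and negative samples can be illustrated as below. The record layout (`user_id`, `history`, `object_attribute`) and the label values are assumptions made for the sketch, and feature-vector generation is omitted.

```python
def build_samples(candidates, target_attribute):
    """Label candidates by whether their history contains a target operation event."""
    positives, negatives = [], []
    for user in candidates:
        has_target_event = any(
            event["object_attribute"] == target_attribute
            for event in user["history"]
        )
        sample = {
            "user_id": user["user_id"],
            "label": "high" if has_target_event else "low",
        }
        (positives if has_target_event else negatives).append(sample)
    return positives, negatives


candidates = [
    {"user_id": "u1", "history": [{"event": "click", "object_attribute": "sports"}]},
    {"user_id": "u2", "history": [{"event": "click", "object_attribute": "music"}]},
    {"user_id": "u3", "history": []},
]
positives, negatives = build_samples(candidates, "sports")
print(positives)
print(negatives)
```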

In some embodiments of the present application, based on the foregoing solution, the second acquiring unit is configured to: extract candidate users from the second-type candidate users; and generate, based on the user feature information of the extracted candidate users, a plurality of user feature vectors of the extracted candidate users under the user feature dimensions, and add a preference level label indicating a low degree of preference to the extracted candidate users, to obtain sample data of negative sample users.

In some embodiments of the present application, based on the foregoing solution, the second acquiring unit is configured to: determine the portrait density of the second-type candidate users based on the frequency of the historical behavior data contained in their user feature information, where the portrait density is positively correlated with the frequency; extract candidate users at a first ratio from the second-type candidate users whose portrait density is higher than a predetermined portrait density threshold; and extract candidate users at a second ratio from the second-type candidate users whose portrait density is lower than or equal to the predetermined portrait density threshold, where the first ratio is greater than the second ratio.
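The density-dependent extraction of negative samples can be sketched as follows. The sketch assumes the portrait density is simply the number of historical behavior records, which is one way to satisfy the positive correlation with frequency. The function name, record layout, and ratio values are hypothetical.

```python
import random


def sample_negatives(second_type_users, density_threshold, first_ratio, second_ratio, seed=0):
    """Draw more negatives from dense users than from sparse ones."""
    if not first_ratio > second_ratio:
        raise ValueError("first_ratio must be greater than second_ratio")
    rng = random.Random(seed)
    # Assumed density measure: number of historical behavior records.
    dense = [u for u in second_type_users if len(u["history"]) > density_threshold]
    sparse = [u for u in second_type_users if len(u["history"]) <= density_threshold]
    picked = rng.sample(dense, round(len(dense) * first_ratio))
    picked += rng.sample(sparse, round(len(sparse) * second_ratio))
    return picked


# Ten dense users (five records each) and ten sparse users (one record each).
users = [{"user_id": i, "history": [0] * (5 if i < 10 else 1)} for i in range(20)]
picked = sample_negatives(users, density_threshold=2, first_ratio=0.8, second_ratio=0.2)
print(len(picked))
```

A plausible reading of the design is that the absence of a target operation event is stronger evidence of low preference for a user with a rich history than for one with almost no recorded behavior, so the denser group is sampled at the higher ratio.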

In some embodiments of the present application, based on the foregoing solution, the second acquiring unit is configured to: acquire a configuration file, where the configuration file is used to configure the attribute categories of the user attribute information to be acquired, as well as the operation event categories and the classification attributes of the operation objects of the historical behavior data to be acquired; for each candidate user in the candidate user set, acquire, based on the attribute categories configured in the configuration file, the candidate user's user attribute information under the configured attribute categories; and, for each candidate user in the candidate user set, acquire, based on the operation event categories and the classification attributes of the operation objects configured in the configuration file, the candidate user's historical behavior data under those operation event categories and classification attributes.
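A configuration-driven selection of features might look like the sketch below. The JSON layout, key names, and record fields are hypothetical, since the text does not specify the format of the configuration file.

```python
import json

# Hypothetical configuration: which attribute categories and which
# (operation event category, object classification attribute) pairs to collect.
CONFIG_TEXT = """
{
  "attribute_categories": ["age_group", "city"],
  "behavior_dimensions": [
    {"event_category": "click", "object_attribute": "sports"},
    {"event_category": "share", "object_attribute": "music"}
  ]
}
"""


def select_features(user, config):
    """Keep only the attributes and behavior records named in the configuration."""
    attributes = {
        name: user["attributes"].get(name) for name in config["attribute_categories"]
    }
    wanted = {
        (d["event_category"], d["object_attribute"])
        for d in config["behavior_dimensions"]
    }
    history = [
        event
        for event in user["history"]
        if (event["event_category"], event["object_attribute"]) in wanted
    ]
    return attributes, history


user = {
    "attributes": {"age_group": "25-34", "city": "Shenzhen", "gender": "f"},
    "history": [
        {"event_category": "click", "object_attribute": "sports"},
        {"event_category": "click", "object_attribute": "music"},
    ],
}
print(select_features(user, json.loads(CONFIG_TEXT)))
```

Keeping the feature dimensions in a configuration file means they can be changed without modifying the extraction code.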

According to one aspect of the embodiments of the present application, a computer-readable medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for generating a user portrait described in the foregoing embodiments.

According to one aspect of the embodiments of the present application, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for generating a user portrait described in the foregoing embodiments.

According to one aspect of the embodiments of the present application, a computer program product or computer program is provided, where the computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for generating a user portrait provided in the various optional embodiments above.

In the technical solutions provided by some embodiments of the present application, user feature information of a target user is acquired, where the user feature information includes user attribute information and historical behavior data of the target user; a plurality of user feature vectors of the target user under user feature dimensions are generated based on the user feature information, where the user feature dimensions are the feature dimensions formed by the categories of the user attribute information together with the operation event categories and the classification attributes of the operation objects contained in the historical behavior data; the plurality of user feature vectors are input into a pre-trained machine learning model, where the pre-trained machine learning model is trained on sample data that includes a plurality of user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for a target object under a target classification attribute; the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute is obtained; and, if the obtained preference level label matches a preset preference level label, the user portrait of the target user is generated based on the target classification attribute. Compared with obtaining a user portrait from label statistics, user feature vectors represent user preferences more comprehensively. This improves the accuracy of predicting a user's preference level label for a target object under a specific classification attribute, makes it easier to find target users with a specific user portrait, and thereby improves the accuracy of related service recommendations.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application. The drawings described below are merely some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.

FIG. 2 shows a flowchart of a method for generating a user portrait according to an embodiment of the present application.

FIG. 3 shows a flowchart of a method for generating a user portrait according to an embodiment of the present application.

FIG. 4 shows a schematic structural diagram of a pre-trained machine learning model according to an embodiment of the present application.

FIG. 5 shows a schematic diagram of the specific structure of a capsule layer in a pre-trained machine learning model according to an embodiment of the present application.

FIG. 6 shows a flowchart of a method for generating a user portrait according to an embodiment of the present application.

FIG. 7 shows a specific flowchart of step S610 of the method for generating a user portrait according to an embodiment of the present application.

FIG. 8 shows a specific flowchart of step S710 of the method for generating a user portrait according to an embodiment of the present application.

FIG. 9 shows a specific flowchart of step S740 of the method for generating a user portrait according to an embodiment of the present application.

FIG. 10 shows a specific flowchart of step S910 of the method for generating a user portrait according to an embodiment of the present application.

FIG. 11 shows a block diagram of an apparatus for generating a user portrait according to an embodiment of the present application.

FIG. 12 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present application. Those skilled in the art will appreciate, however, that the technical solutions of the present application may be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and so on. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the present application.

The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.

The flowcharts shown in the drawings are only exemplary illustrations; they do not necessarily include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be decomposed, and some can be combined or partially combined, so the actual order of execution may change according to the actual situation.

Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. For example, in the embodiments of the present application, artificial intelligence technology is used to determine a plurality of user feature vectors of a target user under the user feature dimensions, to determine from those vectors the target user's preference level label for a target object under a target classification attribute, and then to generate the user portrait of the target user based on that preference level label.

User portrait: a user portrait is a labeled user model abstracted from information such as a user's social attributes, living habits, and consumption behavior. The core work of building a user portrait is to attach "labels" to the user, where a label is a highly refined feature identifier obtained by analyzing user information.

Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

As shown in FIG. 1, the system architecture may include a user portrait demander platform 101, a network 102, and a user portrait provider platform 103. The user portrait demander platform 101 and the user portrait provider platform 103 are connected through the network 102 and exchange data over it. The network may include various connection types, such as wired communication links and wireless communication links.

It should be understood that the numbers of user portrait demander platforms 101, networks 102, and user portrait provider platforms 103 in FIG. 1 are merely illustrative. There may be any number of each, according to implementation requirements. For example, the user portrait provider platform 103 may be a server cluster that provides a user portrait generation service, and the user portrait demander platform 101 may be a server cluster or a client that needs to obtain user portraits. The client may be one or more of a mobile phone, a tablet, a portable computer, and a desktop computer, but is not limited to these. The user portrait provider platform 103 may be a platform that provides users with a variety of services, such as a social application or an instant messaging application, and may hold the user attribute information and historical behavior data of a large number of users.

The user portrait demander platform 101 provides identification information of the target user for whom a user portrait is to be generated. The user portrait provider platform 103 acquires the user feature information of the target user based on that identification information, where the user feature information includes user attribute information and historical behavior data of the target user; generates, based on the user feature information, a plurality of user feature vectors of the target user under user feature dimensions, where the user feature dimensions are the feature dimensions formed by the categories of the user attribute information together with the operation event categories and the classification attributes of the operation objects contained in the historical behavior data; inputs the plurality of user feature vectors into a pre-trained machine learning model, where the pre-trained machine learning model is trained on sample data that includes a plurality of user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for a target object under a target classification attribute; obtains the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute; and, if the obtained preference level label matches a preset preference level label, generates the user portrait of the target user based on the target classification attribute.

As can be seen from the above, compared with obtaining a user portrait from label statistics, user feature vectors represent user preferences more comprehensively. This improves the accuracy of predicting a user's preference level label for a target object under a specific classification attribute, makes it easier to find target users with a specific user portrait, and thereby improves the accuracy of related service recommendations.

It should be noted that the method for generating a user portrait provided in the embodiments of the present application is generally performed by the user portrait provider platform 103, and accordingly the apparatus for generating a user portrait is generally deployed in the user portrait provider platform 103. In other embodiments of the present application, however, the user portrait demander platform 101 may have functions similar to those of the user portrait provider platform 103 and may therefore perform the method for generating a user portrait provided in the embodiments of the present application. The implementation details of the technical solutions of the embodiments of the present application are described in detail below.

图2示出了根据本申请的一个实施例的用户画像的生成方法的流程图，该用户画像的生成方法可以由用户画像提供方平台来执行，该服务器可以是图1中所示的用户画像提供方平台103。参照图2所示，该用户画像的生成方法至少包括步骤S210至步骤S250，详细介绍如下。FIG. 2 shows a flowchart of a method for generating a user portrait according to an embodiment of the present application. The method for generating a user portrait may be performed by a user portrait provider platform, and the server may be the user portrait shown in FIG. 1 . Provider Platform 103. Referring to FIG. 2 , the method for generating a user portrait includes at least steps S210 to S250, which are described in detail as follows.
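The overall flow of obtaining feature information, generating vectors, predicting a label and matching it against a preset label can be sketched as follows. This is a minimal illustration only: `generate_user_portrait`, `toy_featurize`, `toy_model` and the dictionary layout are hypothetical names introduced here, and the stub model merely stands in for the pre-trained machine learning model.

```python
from typing import Callable, Dict, List, Optional


def generate_user_portrait(
    user_info: Dict,
    featurize: Callable[[Dict], List[List[float]]],
    model: Callable[[List[List[float]]], str],
    target_attribute: str,
    preset_label: str,
) -> Optional[Dict]:
    """Generate a portrait entry only if the predicted label matches the preset one."""
    vectors = featurize(user_info)   # one vector per user feature dimension
    predicted = model(vectors)       # preference level label from the model
    if predicted == preset_label:    # label matches the preset label
        return {"user_id": user_info["user_id"], "portrait": target_attribute}
    return None


# Hypothetical stand-ins for the real feature generation and trained model.
def toy_featurize(info: Dict) -> List[List[float]]:
    clicks = sum(1 for r in info["history"] if r["event"] == "click")
    return [[float(info["attributes"]["age"])], [float(clicks)]]


def toy_model(vectors: List[List[float]]) -> str:
    return "interested" if vectors[1][0] >= 2 else "not interested"


user_info = {
    "user_id": "u1",
    "attributes": {"age": 30},
    "history": [{"event": "click"}, {"event": "click"}, {"event": "browse"}],
}
portrait = generate_user_portrait(
    user_info, toy_featurize, toy_model, "home appliances", "interested"
)
print(portrait)
```

Passing the featurizer and model as callables keeps the matching logic independent of how the vectors and labels are actually produced.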

In step S210, user feature information of the target user is acquired, where the user feature information includes user attribute information and historical behavior data of the target user.

In an embodiment of the present application, user feature information refers to feature information of a user in multiple dimensions, specifically including user attribute information and historical behavior data, and the target user is the user for whom a user portrait is to be generated.

In an embodiment of the present application, the user feature information of the target user may be obtained by the user portrait provider platform from the target user's registration information. The user attribute information of the target user may include attribute information of various categories, such as age, date of birth, gender, address and job position, although it is not limited to these.

In an embodiment of the present application, the historical behavior data of the target user includes the target user's historical behavior data on the user portrait provider platform. Behavior data includes operation events and attribute information of the operation objects. An operation event may be a click, browse, favorite, comment, forward and so on. An operation object may be a product advertisement, content or the like, and the classification attributes of an operation object may include topic, category, tag and so on.

Optionally, when the operation object is a product advertisement, the operation event may be a browsing event and the classification attribute may be a category; when the operation object is content, the operation event may be a click, browsing, favorite, comment or forwarding event, and the classification attributes may be topic, category and tag.
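One possible in-memory representation of user feature information is sketched below. The class and field names (`BehaviorRecord`, `UserFeatureInfo`, `attributes`, `history`) are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BehaviorRecord:
    event: str                  # operation event, e.g. "click", "browse", "comment"
    object_type: str            # operation object, e.g. "ad" or "content"
    attributes: Dict[str, str]  # classification attributes of the operation object


@dataclass
class UserFeatureInfo:
    user_id: str
    attributes: Dict[str, str]  # user attribute information by category
    history: List[BehaviorRecord] = field(default_factory=list)


user = UserFeatureInfo(
    user_id="u1",
    attributes={"age": "30", "gender": "F"},
    history=[
        # A product advertisement that was browsed, classified by category.
        BehaviorRecord("browse", "ad", {"category": "home appliances"}),
        # A piece of content that was commented on, classified by topic, category and tag.
        BehaviorRecord(
            "comment",
            "content",
            {"topic": "travel", "category": "lifestyle", "tag": "hiking"},
        ),
    ],
)
print(len(user.history), user.history[0].event)
```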

In step S220, based on the user feature information, a plurality of user feature vectors of the target user under user feature dimensions are generated, where a user feature dimension is a feature dimension formed by the categories of the user attribute information, together with the operation event categories and the classification attributes of the operation objects contained in the historical behavior data.

In an embodiment of the present application, a user feature dimension is a feature dimension formed by the categories of the user attribute information, the operation event categories contained in the historical behavior data, and the classification attributes of the operation objects.

In an embodiment of the present application, for feature dimensions formed by the categories of user attribute information, each category of user attribute information generates one user feature vector. For example, if the user attribute information includes age and gender, then age and gender form two feature dimensions, and one user feature vector is generated for each of them.

In an embodiment of the present application, feature dimensions formed by the operation event categories and the classification attributes of the operation objects contained in the historical behavior data may be determined in several ways.

In one implementation, the feature dimensions may be determined from the classification attributes of the operation object. Taking content as the operation object, when the classification attributes of the content include topic, category and tag, each of topic, category and tag is treated as one feature dimension, and user feature vectors are generated for three feature dimensions.

In another implementation, the feature dimensions may be determined jointly from the classification attributes of the operation object and the operation event categories. Taking content as the operation object, when the classification attributes of the content include topic, category and tag, and the operation event categories include click events and comment events, then click events on topics, comment events on topics, click events on categories, comment events on categories, click events on tags and comment events on tags are each treated as one feature dimension, and user feature vectors are generated for six feature dimensions.
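The two ways of dividing behavior-related feature dimensions can be illustrated with a short sketch. The function name `behavior_dimensions` and the string identifiers are assumptions made for the example.

```python
from itertools import product
from typing import List, Optional, Tuple


def behavior_dimensions(
    classification_attributes: List[str],
    event_categories: Optional[List[str]] = None,
) -> List[Tuple[str, ...]]:
    """Enumerate feature dimensions for historical behavior data.

    Without event categories, each classification attribute is one dimension.
    With event categories, each (attribute, event) pair is one dimension.
    """
    if event_categories is None:
        return [(attr,) for attr in classification_attributes]
    return list(product(classification_attributes, event_categories))


by_attribute = behavior_dimensions(["topic", "category", "tag"])
by_attribute_and_event = behavior_dimensions(
    ["topic", "category", "tag"], ["click", "comment"]
)
print(len(by_attribute), len(by_attribute_and_event))  # 3 6
```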

It should be noted that the division of feature dimensions is not limited to the implementations above; other implementations are possible, and the present application does not specifically limit them.

In step S230, the plurality of user feature vectors are input into a pre-trained machine learning model, where the pre-trained machine learning model is trained on training set sample data containing a plurality of user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for target objects under the target classification attribute.

In an embodiment of the present application, the plurality of user feature vectors of the target user under the user feature dimensions may be input into the pre-trained machine learning model, which outputs the target user's preference level label for the target object under the target classification attribute. The pre-trained machine learning model may be a look-alike model or the like, or a deep neural network model or a CNN (Convolutional Neural Network) model. The preference level label reflects the degree of the target user's preference for the target object under the target classification attribute. From high to low, the preference level labels may include two levels, "interested" and "not interested"; alternatively, they may include four levels, "especially interested", "interested", "not interested" and "especially not interested". The higher the level, the greater the target user's preference for the target object under the target classification attribute.

Referring to FIG. 3, FIG. 3 shows a flowchart of a user portrait generation method according to an embodiment of the present application. The pre-trained machine learning model determines the target user's preference level label for the target object under the target classification attribute by the following method, which may specifically include steps S310 to S330.

Referring to FIG. 4, FIG. 4 shows a schematic structural diagram of the pre-trained machine learning model according to an embodiment of the present application. The pre-trained machine learning model shown in FIG. 4 may specifically include a projection layer (Projection Layer) 402, a capsule layer (Capsule Layer) 403 and a capsule attention layer (Capsule Attention Layer) 405.

Referring to FIG. 5, FIG. 5 shows a detailed schematic structural diagram of the capsule layer in the pre-trained machine learning model according to an embodiment of the present application. Steps S310 to S330 are described in detail below with reference to FIG. 3 to FIG. 5.

In step S310, the plurality of user feature vectors are each converted to generate first feature vectors corresponding to the plurality of user feature vectors, where the first feature vectors corresponding to the plurality of user feature vectors are vectors of the same dimension.

In an embodiment of the present application, as shown in FIG. 4, after the plurality of user feature vectors 401 are input into the pre-trained machine learning model, the model converts them through the projection layer (Projection Layer) 402 to generate the first feature vectors corresponding to the plurality of user feature vectors. It should be noted that the first feature vectors corresponding to the plurality of user feature vectors are vectors of the same dimension.

Specifically, when the projection layer (Projection Layer) 402 converts the plurality of user feature vectors into the corresponding first feature vectors, this may be implemented by a formula [formula image not reproduced], in which one symbol denotes the i-th user feature vector, another denotes the first feature vector corresponding to the i-th user feature vector, e and m are fixed parameters, and $W_d$ is the transformation matrix used to convert the plurality of user feature vectors. Converting the plurality of user feature vectors into vectors of the same dimension through the projection layer can improve the data processing efficiency of the pre-trained machine learning model.
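Because the projection formula itself is not reproduced in the text, the following is only a sketch of the general idea of mapping vectors of different lengths to a common dimension. It assumes a plain linear map with one hypothetical matrix per feature dimension and ignores the fixed parameters e and m; the names `make_projection` and `project` are introduced here for illustration.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def make_projection(in_dim: int, out_dim: int, rng: random.Random) -> Matrix:
    """Hypothetical projection matrix of shape (out_dim x in_dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix: Matrix, vector: Vector) -> Vector:
    """Matrix-vector product: maps an in_dim vector to an out_dim vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


rng = random.Random(0)
# User feature vectors of different lengths, one per feature dimension.
user_feature_vectors = [[1.0, 0.0], [0.2, 0.5, 0.1, 0.9], [0.3, 0.3, 0.4]]
common_dim = 4
projections = [make_projection(len(v), common_dim, rng) for v in user_feature_vectors]
first_feature_vectors = [project(m, v) for m, v in zip(projections, user_feature_vectors)]
print([len(v) for v in first_feature_vectors])  # [4, 4, 4]
```

In a trained model the matrices would be learned parameters rather than random values.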

In step S320, based on preset association relationships among the items of user feature information, the first feature vectors corresponding to the plurality of user feature vectors are aggregated to generate a plurality of aggregated second feature vectors, where the first feature vectors corresponding to the plurality of user feature vectors are vectors of the same dimension.

In an embodiment of the present application, for the first feature vectors obtained through the projection layer (Projection Layer), the capsule layer (Capsule Layer) 403 aggregates the first feature vectors corresponding to the plurality of user feature vectors according to the association relationships among the items of user feature information, generating a plurality of aggregated second feature vectors. The first feature vectors corresponding to the plurality of user feature vectors are vectors of the same dimension.

Specifically, the i-th first feature vector input into the capsule layer (Capsule Layer) 403 [symbol image not reproduced] may be aggregated according to a predetermined formula [formula image not reproduced] to generate a corresponding vector, and that vector is then normalized by a further formula [formula image not reproduced] to generate a unit vector.

Here $I_u$ denotes the sequence number of the first feature vectors, $S$ is a transformation matrix, $w_{ij}$ is the weight used when the i-th first feature vector is aggregated with the j-th first feature vector, and $b_{ij}$ is a parameter initialized when the capsule layer (Capsule Layer) 403 performs forward-propagation computation. $w_{ij}$ may be determined by $w_{ij} \leftarrow \mathrm{softmax}(b_{ij})$, that is, by normalizing $b_{ij}$. It should be noted that, when the machine learning model to be trained is trained and its parameters are updated during forward-propagation computation, $b_{ij}$ is updated according to a formula [formula image not reproduced], in which one term is obtained by transposing another [symbol images not reproduced]. When the machine learning model to be trained reaches the convergence condition, the weights $w_{ij}$ used to aggregate the first feature vectors are thereby determined. Squash denotes the transformation used to normalize the aggregated vector [symbol image not reproduced].
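The description of $w_{ij} \leftarrow \mathrm{softmax}(b_{ij})$, a shared matrix $S$, a transposed term in the $b_{ij}$ update and a Squash step resembles the dynamic routing procedure commonly used in capsule networks. The sketch below shows that standard procedure as an assumption, not as the exact formulas of the application: the update $b_{ij} \leftarrow b_{ij} + v_j^{\top} S e_i$, the squash form, the random initialization of $b_{ij}$ and the symbol names `e`, `z`, `v` are all illustrative choices.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(z: Vector) -> Vector:
    """Standard capsule squash: keeps direction, bounds the length below 1."""
    norm_sq = sum(x * x for x in z)
    if norm_sq == 0.0:
        return [0.0 for _ in z]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in z]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dynamic_routing(first_vectors: List[Vector], S: Matrix, num_capsules: int,
                    iterations: int = 3, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    transformed = [matvec(S, e) for e in first_vectors]  # S e_i
    # Routing logits b_ij, randomly initialized (an assumption) so capsules differ.
    b = [[rng.gauss(0.0, 1.0) for _ in range(num_capsules)] for _ in first_vectors]
    capsules: List[Vector] = []
    for _ in range(iterations):
        w = [softmax(row) for row in b]  # w_ij <- softmax(b_ij)
        capsules = []
        for j in range(num_capsules):
            # z_j = sum_i w_ij * S e_i
            z = [sum(w[i][j] * transformed[i][d] for i in range(len(transformed)))
                 for d in range(len(S))]
            capsules.append(squash(z))
        for i, u in enumerate(transformed):
            for j, v in enumerate(capsules):
                b[i][j] += sum(a * c for a, c in zip(v, u))  # b_ij += v_j^T S e_i
    return capsules


first_vectors = [[0.1, 0.4, 0.3], [0.9, 0.2, 0.1], [0.5, 0.5, 0.0], [0.0, 0.7, 0.6]]
S = [[0.3, -0.2, 0.5], [0.1, 0.4, -0.3]]
second_vectors = dynamic_routing(first_vectors, S, num_capsules=2)
print(len(second_vectors), len(second_vectors[0]))  # 2 2
```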

In step S330, based on the plurality of aggregated second feature vectors, the target user's preference level label for the target object under the target classification attribute is predicted.

In an embodiment of the present application, the capsule attention layer (Capsule Attention Layer) 405 analyzes the plurality of aggregated second feature vectors, predicts the target user's preference score for the target object under the target classification attribute, and, according to the correspondence between preference scores and preference level labels, predicts the target user's preference level label for the target object under the target classification attribute. It can be understood that the higher the preference score, the higher the corresponding preference level label.
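The correspondence between a preference score and a preference level label can be sketched as a simple threshold lookup. The cut points in `THRESHOLDS` and the assumption that scores lie in [0, 1] are hypothetical; the application does not specify them.

```python
from bisect import bisect_right

# Preference level labels ordered from low to high.
LEVELS = ["especially not interested", "not interested", "interested", "especially interested"]
# Hypothetical cut points for a score assumed to lie in [0, 1].
THRESHOLDS = [0.25, 0.5, 0.75]


def score_to_label(score: float) -> str:
    """Map a preference score to a label; higher scores give higher levels."""
    return LEVELS[bisect_right(THRESHOLDS, score)]


print(score_to_label(0.1), "|", score_to_label(0.9))
```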

Specifically, when the capsule attention layer 405 aggregates the input aggregated second feature vectors, it may first compute high-order feature crosses according to a formula [formula image not reproduced], and then normalize the result by a further formula [formula image not reproduced] to obtain the normalized $a_{ij}$. The target user's preference score for the target object under the target classification attribute is predicted based on $a_{ij}$. Here $b$ is a fixed parameter, $W \in \mathbb{R}^{t \times k}$, $b \in \mathbb{R}^{t}$, $h \in \mathbb{R}^{t}$, $t$ denotes the hidden-layer dimension of the capsule attention layer 405, $k$ denotes the vector dimension of the second feature vectors, $h^{T}$ denotes the set of input aggregated second feature vectors, $v_i$ and $v_j$ are respectively the i-th and j-th second feature vectors in that set, and $v_i \cdot v_j$ is the scalar product of the i-th and j-th second feature vectors.
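The parameter shapes $W \in \mathbb{R}^{t \times k}$, $b \in \mathbb{R}^{t}$ and $h \in \mathbb{R}^{t}$ are consistent with a pairwise attention of the form $a'_{ij} = h^{\top}\,\mathrm{ReLU}(W (v_i \odot v_j) + b)$ followed by a softmax over all pairs. The sketch below implements that form as an assumption: the element-wise product, the ReLU activation and the weighted-sum pooling are illustrative choices rather than the formulas of the application, and all numeric values are made up.

```python
import math
from itertools import combinations
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def pairwise_attention(vectors: List[Vector], W: Matrix, b: Vector,
                       h: Vector) -> Tuple[List[Tuple[int, int]], List[Vector], Vector]:
    pairs = list(combinations(range(len(vectors)), 2))
    # Feature cross of each pair, assumed here to be the element-wise product.
    crosses = [[x * y for x, y in zip(vectors[i], vectors[j])] for i, j in pairs]
    raw = []
    for c in crosses:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, c)) + bias)
                  for row, bias in zip(W, b)]              # ReLU(W c + b), length t
        raw.append(sum(hv * x for hv, x in zip(h, hidden)))  # h^T hidden
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    weights = [e / total for e in exps]                    # normalized a_ij
    return pairs, crosses, weights


second_vectors = [[1.0, 0.5], [0.2, 0.8], [0.6, 0.1]]      # k = 2
W = [[0.5, -0.1], [0.3, 0.2], [-0.4, 0.6]]                 # t x k with t = 3
b = [0.0, 0.1, -0.1]
h = [0.7, -0.2, 0.5]
pairs, crosses, weights = pairwise_attention(second_vectors, W, b, h)
# Attention-weighted sum of the crosses, a possible input to a final scoring layer.
pooled = [sum(a * c[d] for a, c in zip(weights, crosses)) for d in range(2)]
print(pairs, round(sum(weights), 6))
```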

Referring to FIG. 6, FIG. 6 shows a flowchart of a user portrait generation method according to an embodiment of the present application. The user portrait generation method in this embodiment may further include steps S610 to S620, described in detail as follows.

In step S610, training set sample data for training the machine learning model to be trained is acquired, where each piece of sample data in the training set sample data includes a plurality of user feature vectors of a sample user under the user feature dimensions and the sample user's preference level label for the target object under the target classification attribute.

In an embodiment of the present application, the training set sample data contains a large amount of sample data for training the machine learning model to be trained. When generating the sample data, existing users of the user portrait provider platform may be used as sample users: their user attribute information and historical behavior data are acquired, and a plurality of user feature vectors of each sample user under the user feature dimensions are generated from that information. In addition, for each sample user, the sample user's preference for target objects under the target classification attribute is determined from the user's specific user attribute information and historical behavior data, and the sample user's preference level label for the target object under the target classification attribute is then generated from the determined preference.

Referring to FIG. 7, FIG. 7 shows a detailed flowchart of step S610 of the user portrait generation method according to an embodiment of the present application. Step S610 may include steps S710 to S750, described in detail as follows.

In step S710, user feature information of the candidate users in a candidate user set is acquired.

In an embodiment, the candidate user set is a set of existing users of the user portrait provider platform and contains a large number of candidate users who may be selected as sample users.

Referring to FIG. 8, FIG. 8 shows a detailed flowchart of step S710 of the user portrait generation method according to an embodiment of the present application. Step S710 may include steps S810 to S830, described in detail as follows.

In step S810, a configuration file is acquired, where the configuration file is used to configure the attribute categories of the user attribute information to be acquired, and the operation event categories and the classification attributes of the operation objects of the historical behavior data to be acquired.

In an embodiment of the present application, when acquiring the user feature information of candidate users, in order to make it easier to generate a specific user portrait according to the requirements of a particular business scenario, the attribute categories of the user attribute information to be acquired, and the operation event categories and classification attributes of the operation objects of the historical behavior data, may be configured in advance to generate a corresponding configuration file.

Specifically, when the user portrait to be generated is directed at product advertisements, the configuration file may specify that the attribute categories of the user attribute information to be acquired include age, gender and address, that the operation event categories of the historical behavior data are click events and browsing events, and that the classification attributes of the operation object include the product category in the product advertisement. It should be noted that the product category may include one or more levels of categories. Taking a product category with first-level and second-level categories as an example, the first-level categories may include home appliances, food and beverages, medicine and health care and so on, while the second-level categories are the specific subdivisions within a first-level category; for instance, the second-level categories of home appliances may include televisions, air conditioners, refrigerators and so on.

In step S820, for each candidate user in the candidate user set, user attribute information of the candidate user under the configured attribute categories is acquired based on the categories of user attribute information configured in the configuration file.

In an embodiment of the present application, after the configuration file is acquired, for each candidate user in the candidate user set, the user attribute information of the candidate user under each configured attribute category may be acquired based on the categories of user attribute information configured in the configuration file.

In step S830, for each candidate user in the candidate user set, historical behavior data of the candidate user under the operation event categories and the classification attributes of the operation objects is acquired based on the operation event categories and classification attributes of the operation objects configured in the configuration file.

In an embodiment of the present application, after the configuration file is acquired, for each candidate user in the candidate user set, the historical behavior data of the candidate user under the corresponding operation event categories and classification attributes of the operation objects may be acquired based on the operation event categories and classification attributes configured in the configuration file.

In the technical solution of the embodiment shown in FIG. 8, the configuration file makes it possible to configure the attribute categories of the user attribute information to be acquired, as well as the operation event categories and classification attributes of the operation objects of the historical behavior data to be acquired. User feature data of specific dimensions can therefore be acquired on demand and in a targeted manner, which makes it easier to generate sample data suited to a specific business scenario.
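A configuration-driven selection of user feature data, as in steps S810 to S830, might look like the following sketch. The configuration keys, the record layout and the helper names `select_user_attributes` and `select_history` are assumptions made for illustration; the application does not define a concrete file format.

```python
from typing import Dict, List

# Hypothetical configuration for a product-advertisement portrait.
CONFIG = {
    "user_attribute_categories": ["age", "gender", "address"],
    "event_categories": ["click", "browse"],
    "classification_attributes": ["product_category"],
}


def select_user_attributes(raw_attributes: Dict, config: Dict) -> Dict:
    """Keep only the user attributes whose category is configured (cf. S820)."""
    wanted = config["user_attribute_categories"]
    return {k: v for k, v in raw_attributes.items() if k in wanted}


def select_history(raw_history: List[Dict], config: Dict) -> List[Dict]:
    """Keep only records with a configured event and classification attribute (cf. S830)."""
    events = set(config["event_categories"])
    attrs = config["classification_attributes"]
    selected = []
    for record in raw_history:
        if record["event"] not in events:
            continue
        kept = {a: record["attributes"][a] for a in attrs if a in record["attributes"]}
        if kept:
            selected.append({"event": record["event"], "attributes": kept})
    return selected


raw_attributes = {"age": 30, "gender": "F", "address": "Shenzhen", "position": "engineer"}
raw_history = [
    {"event": "click", "attributes": {"product_category": "home appliances/TV"}},
    {"event": "forward", "attributes": {"product_category": "food and beverages"}},
    {"event": "browse", "attributes": {"topic": "travel"}},
]
attributes = select_user_attributes(raw_attributes, CONFIG)
history = select_history(raw_history, CONFIG)
print(sorted(attributes), len(history))
```

Changing only `CONFIG` is enough to target a different business scenario, which is the benefit the embodiment attributes to the configuration file.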

Referring again to FIG. 7, in step S720, for each candidate user in the candidate user set, first-class candidate users and second-class candidate users are determined based on the historical behavior data in the candidate user's user feature information. A first-class candidate user is a candidate user whose historical behavior data contains a target operation event, a second-class candidate user is a candidate user whose historical behavior data does not contain a target operation event, and a target operation event is an operation event performed on a target object under the target classification attribute.

In an embodiment of the present application, because the pre-trained machine learning model is intended to predict a user's preference level label for target objects under a specific target classification attribute, the sample users used to train the machine learning model may include positive sample users and negative sample users. A positive sample user is a sample user with a relatively high preference for target objects under the specific target classification attribute, and a negative sample user is a sample user with a relatively low preference for such target objects. Training the machine learning model on sample data composed of positive and negative sample users enables it to identify effectively whether a target user has a high preference for target objects under the specific target classification attribute.

In an embodiment of the present application, a target operation event is an operation event in which a user operates on a target object under the target classification attribute, such as a click event or browsing event on such a target object. When a user's historical behavior data contains the target operation event, this indicates a relatively high interest in target objects under the target classification attribute; otherwise it indicates a relatively low interest. Therefore, each candidate user in the candidate user set may be classified according to whether the candidate user's historical behavior data contains the target operation event, yielding first-class candidate users, whose data contains the target operation event, and second-class candidate users, whose data does not. It can be understood that first-class candidate users serve as candidates for positive sample users, and second-class candidate users serve as candidates for negative sample users.
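The split into first-class and second-class candidate users can be sketched as follows. The record layout and the names `has_target_event` and `split_candidates` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def has_target_event(history: List[Dict], attribute: str, value: str,
                     target_events: Set[str]) -> bool:
    """True if any record is a target event on an object under the target attribute."""
    return any(
        r["event"] in target_events and r["attributes"].get(attribute) == value
        for r in history
    )


def split_candidates(candidates: List[Dict], attribute: str, value: str,
                     target_events: Set[str]) -> Tuple[List[Dict], List[Dict]]:
    first_class, second_class = [], []
    for user in candidates:
        if has_target_event(user["history"], attribute, value, target_events):
            first_class.append(user)    # candidates for positive sample users
        else:
            second_class.append(user)   # candidates for negative sample users
    return first_class, second_class


candidates = [
    {"user_id": "u1", "history": [{"event": "click", "attributes": {"product_category": "TV"}}]},
    {"user_id": "u2", "history": [{"event": "click", "attributes": {"product_category": "food"}}]},
    {"user_id": "u3", "history": []},
]
first_class, second_class = split_candidates(
    candidates, "product_category", "TV", {"click", "browse"}
)
print([u["user_id"] for u in first_class], [u["user_id"] for u in second_class])
```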

In step S730, based on the user feature information of the first-class candidate users, a plurality of user feature vectors of the first-class candidate users under the user feature dimensions are generated, and a preference level label indicating a high degree of preference is added for the first-class candidate users, yielding the sample data of positive sample users.

In an embodiment of the present application, when generating the sample data of positive sample users, a plurality of user feature vectors of the first-class candidate users under the user feature dimensions may be generated based on their user feature information, and a preference level label indicating a high degree of preference may be added for the first-class candidate users.

Optionally, all of the first-class candidate users may be selected as positive sample users.

Optionally, some of the first-class candidate users may be selected as positive sample users.

In step S740, based on the user feature information of the second-class candidate users, a plurality of user feature vectors of the second-class candidate users under the user feature dimensions are generated, and a preference level label indicating a low degree of preference is added for the second-class candidate users, yielding the sample data of negative sample users.

In an embodiment of the present application, when generating the sample data of negative sample users, a plurality of user feature vectors of the second-class candidate users under the user feature dimensions may be generated based on their user feature information, and a preference level label indicating a low degree of preference may be added for the second-class candidate users.

Optionally, all of the second-class candidate users may be selected as negative sample users.
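Assembling labelled sample data from the two classes of candidate users can be sketched as below. The featurizer `count_events` is a toy stand-in for the real generation of user feature vectors, and the label strings follow the two-level example given earlier; both are assumptions.

```python
from typing import Callable, Dict, List


def build_samples(first_class: List[Dict], second_class: List[Dict],
                  to_feature_vectors: Callable[[Dict], List[List[float]]]) -> List[Dict]:
    """Attach a high-preference label to first-class users and a low one to second-class users."""
    samples = []
    for user in first_class:
        samples.append({"features": to_feature_vectors(user), "label": "interested"})
    for user in second_class:
        samples.append({"features": to_feature_vectors(user), "label": "not interested"})
    return samples


def count_events(user: Dict) -> List[List[float]]:
    # Toy featurizer: one single-element vector per event category.
    return [[float(sum(1 for r in user["history"] if r["event"] == ev))]
            for ev in ("click", "browse")]


first_class = [{"user_id": "u1", "history": [{"event": "click"}, {"event": "browse"}]}]
second_class = [{"user_id": "u2", "history": [{"event": "browse"}]}]
samples = build_samples(first_class, second_class, count_events)
print([s["label"] for s in samples])
```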

Referring to FIG. 9, FIG. 9 shows a detailed flowchart of step S740 of the user portrait generation method according to an embodiment of the present application. Step S740 may include steps S910 to S920, described in detail as follows.

In step S910, candidate users are extracted from the second-class candidate users.

Referring to FIG. 10, FIG. 10 shows a detailed flowchart of step S910 of the user portrait generation method according to an embodiment of the present application. Step S910 may include steps S1010 to S1020, described in detail as follows.

In step S1010, the portrait density of each second-class candidate user is determined based on the frequency of the historical behavior data contained in that user's user feature information, where portrait density is positively correlated with the frequency.

在本申请的一个实施例中，针对第二类候选用户，可以将第二类候选用户中的全部候选用户选取为负样本用户。In an embodiment of the present application, for the second type of candidate users, all candidate users in the second type of candidate users may be selected as negative sample users.

In an embodiment of the present application, the second-type candidate users are the candidate users whose historical behavior data contains no target operation event, and they fall into two groups: those with rich historical behavior and those with sparse historical behavior. When a candidate user's historical behavior is rich and still contains no target operation event, the confidence that the candidate user is a negative sample user is high. When a candidate user's historical behavior is sparse and contains no target operation event, the absence may simply reflect how little behavior was recorded, so some of these candidates would be selected as negative sample users only because their history is sparse. Therefore, more candidates with relatively rich historical behavior, and fewer candidates with relatively sparse historical behavior, should be selected as negative sample users.

画像密集度作为反映用户历史行为是否丰富的一种度量值，画像密集度越大，用户历史行为越丰富。Portrait density is a measure that reflects the richness of user historical behavior. The greater the portrait density, the richer the user's historical behavior.

画像密集度可以根据第二类候选用户的用户特征信息中所包含的历史行为数据的频次来确定的，画像密集度与历史行为数据的频次是正相关关系。历史行为数据的频次可以由多种计算方式。The portrait density can be determined according to the frequency of the historical behavior data included in the user characteristic information of the second type of candidate users, and the portrait density is positively correlated with the frequency of the historical behavior data. The frequency of historical behavioral data can be calculated in a number of ways.

可选地，可以将历史行为数据所包含的各类别的操作事件的频次之和，作为历史行为数据的频次。Optionally, the sum of the frequencies of various types of operation events included in the historical behavior data may be used as the frequency of the historical behavior data.

可选地，可以为历史行为数据所包含的各类别的操作事件分配不同的权重，并将历史行为数据所包含的各类别的操作事件的频次的加权和，作为历史行为数据的频次。Optionally, different weights may be assigned to various types of operation events included in the historical behavior data, and the weighted sum of the frequencies of the various types of operation events included in the historical behavior data may be used as the frequency of the historical behavior data.
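The two ways of computing the frequency described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names, the event-category strings, the default weight of 1.0 for categories without a configured weight, and the use of the frequency itself as the portrait density are all assumptions (the text only requires that density be positively correlated with frequency).

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def behavior_frequency(
    events: Iterable[str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Frequency of a user's historical behavior data.

    `events` holds one operation-event category per logged event.
    Without weights, the result is the plain sum of per-category counts.
    With weights, it is the weighted sum; categories missing from
    `weights` default to 1.0 (an assumption made for this sketch).
    """
    counts = Counter(events)
    if weights is None:
        return float(sum(counts.values()))
    return float(sum(n * weights.get(cat, 1.0) for cat, n in counts.items()))


def portrait_density(frequency: float) -> float:
    """Map frequency to portrait density.

    The identity mapping is used here as the simplest increasing
    function; any monotonically increasing mapping would also satisfy
    the positive correlation stated in the text.
    """
    return frequency
```

For example, a history of two clicks, one share and one purchase has a frequency of 4 under the plain sum; with hypothetical weights of 1, 2 and 5 for those categories it has a frequency of 9.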

在步骤S1020中，在画像密集度高于预定的画像密集度阈值的第二类候选用户中，以第一比例抽取候选用户，在画像密集度低于或等于预定的画像密集度阈值的第二类候选用户中，以第二比例抽取候选用户，第一比例大于第二比例。In step S1020, among the second type of candidate users whose portrait density is higher than a predetermined portrait density threshold, the candidate users are extracted at a first ratio, and among the second type of candidate users whose portrait density is lower than or equal to the predetermined portrait density threshold Among the class candidate users, candidate users are extracted in a second proportion, and the first proportion is greater than the second proportion.

In an embodiment of the present application, the portrait density threshold is the value used to decide whether a candidate user has rich historical behavior. If a candidate user's portrait density is higher than the predetermined portrait density threshold, the candidate user has rich historical behavior; if it is lower than or equal to the threshold, the candidate user has sparse historical behavior. Among the second-type candidate users whose portrait density is higher than the threshold, candidate users may be extracted at a first ratio; among the second-type candidate users whose portrait density is lower than or equal to the threshold, candidate users may be extracted at a second ratio. The first ratio may be set greater than the second ratio.
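Step S1020 amounts to stratified sampling with two strata split by the density threshold. The sketch below assumes uniform random sampling within each stratum and rounds the sample size to the nearest integer; the application does not specify either detail, and the function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_negative_candidates(
    densities: Dict[str, float],
    threshold: float,
    first_ratio: float,
    second_ratio: float,
    seed: Optional[int] = None,
) -> List[str]:
    """Extract second-type candidate users as negative sample users.

    Users whose portrait density is above `threshold` are sampled at
    `first_ratio`; users at or below it are sampled at `second_ratio`.
    The first ratio must be greater than the second, as in the text.
    """
    if not 0.0 <= second_ratio < first_ratio <= 1.0:
        raise ValueError("require 0 <= second_ratio < first_ratio <= 1")

    rng = random.Random(seed)
    # Sort so that a fixed seed gives a reproducible sample.
    dense = sorted(u for u, d in densities.items() if d > threshold)
    sparse = sorted(u for u, d in densities.items() if d <= threshold)

    picked = rng.sample(dense, round(len(dense) * first_ratio))
    picked += rng.sample(sparse, round(len(sparse) * second_ratio))
    return picked
```

With ten dense and ten sparse candidates, a first ratio of 0.8 and a second ratio of 0.2, the function returns eight dense users and two sparse users.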

In the technical solution of the embodiment shown in FIG. 10, among the second-type candidate users, whose historical behavior data contains no target operation event, more candidates with relatively rich historical behavior and fewer candidates with relatively sparse historical behavior are selected as negative sample users. This improves the accuracy of the selected negative sample users and, in turn, the classification confidence of the pre-trained machine learning model.

还请继续参考图9，在步骤S920中，基于抽取的候选用户的用户特征信息，生成抽取的候选用户在用户特征维度下的多个用户特征向量，并为抽取的候选用户添加偏好程度低的偏好等级标签，得到负样本用户的样本数据。Please also continue to refer to Fig. 9, in step S920, based on the user feature information of the extracted candidate user, generate multiple user feature vectors of the extracted candidate user under the user feature dimension, and add a low preference level for the extracted candidate user. Preference level label to get the sample data of negative sample users.

在本申请的一个实施例中，对于从第二类候选用户中抽取到的候选用户，在生成其对应的样本数据时，可以基于抽取的候选用户的用户特征信息，生成抽取的候选用户在用户特征维度下的多个用户特征向量，并为抽取的候选用户添加偏好程度低的偏好等级标签，进而生成负样本用户的样本数据。In an embodiment of the present application, for the candidate users extracted from the second type of candidate users, when generating the corresponding sample data, the extracted candidate users can be generated based on the user feature information of the extracted candidate users. Multiple user feature vectors under the feature dimension are added, and preference grade labels with low preference are added to the extracted candidate users, thereby generating sample data of negative sample users.

还请继续参考图7，在步骤S750中，基于正样本用户的样本数据以及负样本用户的样本数据，得到用于对待训练的机器学习模型进行训练的训练集样本数据。Please continue to refer to FIG. 7, in step S750, based on the sample data of the positive sample users and the sample data of the negative sample users, the training set sample data for training the machine learning model to be trained is obtained.

在一个实施例中，在得到正样本用户的样本数据以及负样本用户的样本数据后，则可以将所得到的正样本用户的样本数据以及负样本用户的样本数据，作为训练集样本数据，以实现通过训练集样本数据对待训练的机器学习模型进行训练。In one embodiment, after obtaining the sample data of the positive sample users and the sample data of the negative sample users, the obtained sample data of the positive sample users and the sample data of the negative sample users can be used as the training set sample data to Realize the training of the machine learning model to be trained through the sample data of the training set.
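Assembling the training set in step S750 can be sketched as below. The representation is an assumption made for illustration: each sample user is a dictionary mapping a user feature dimension name to its feature vector, and the high and low preference level labels are encoded as 1 and 0.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: feature-dimension name -> feature vector.
FeatureVectors = Dict[str, List[float]]
Sample = Tuple[FeatureVectors, int]

HIGH_PREFERENCE = 1  # label added to positive sample users
LOW_PREFERENCE = 0   # label added to negative sample users


def build_training_set(
    positive_users: Sequence[FeatureVectors],
    negative_users: Sequence[FeatureVectors],
) -> List[Sample]:
    """Combine positive and negative sample data into one training set."""
    samples: List[Sample] = [(v, HIGH_PREFERENCE) for v in positive_users]
    samples += [(v, LOW_PREFERENCE) for v in negative_users]
    return samples
```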

还请继续参考图6，在步骤S620中，通过训练集样本数据对待训练的机器学习模型进行训练，得到预训练的机器学习模型。Please continue to refer to FIG. 6 , in step S620 , the machine learning model to be trained is trained by using the sample data of the training set to obtain a pre-trained machine learning model.

在本申请的一个实施例中，将训练集样本数据输入至待训练的机器学习模型中，通过训练集样本数据对待训练的机器学习模型进行训练，得到预训练的机器学习模型。对机器学习模型进行训练的过程是调整机器学习模型对应的网络层中的各项系数，使得对于输入的目标用户在用户特征维度下的多个用户特征向量，经过机器学习模型对应的网络层中的各项系数运算，输出得到目标用户对目标分类属性下的目标对象的偏好等级标签。In one embodiment of the present application, the sample data of the training set is input into the machine learning model to be trained, and the machine learning model to be trained is trained by the sample data of the training set to obtain a pre-trained machine learning model. The process of training the machine learning model is to adjust the coefficients in the network layer corresponding to the machine learning model, so that the input target user's multiple user feature vectors under the user feature dimension can pass through the network layer corresponding to the machine learning model. The coefficient operation of , and the output obtains the preference level label of the target user for the target object under the target classification attribute.
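To make "adjusting the coefficients" concrete, the sketch below trains a single-layer logistic model by stochastic gradient descent. It is a deliberately simplified stand-in, not the network of the application: it assumes the multiple user feature vectors have already been concatenated into one flat vector, that there are only two preference levels (1 for high, 0 for low), and the learning rate and epoch count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(
    samples: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 200,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit weights and bias so the model output approaches each label."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_label(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (high preference) or 0 (low preference)."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= 0.5 else 0
```

After training on a small separable set, `predict_label` reproduces the labels of the training samples; a production model would replace this single layer with the conversion, aggregation and prediction stages recited in claim 2.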

还请继续参考图2，在步骤S240中，获取预训练的机器学习模型输出的目标用户对目标分类属性下的目标对象的偏好等级标签。Please also continue to refer to FIG. 2 , in step S240 , the preference level label of the target user for the target object under the target classification attribute output by the pre-trained machine learning model is obtained.

在本申请的一个实施例中，获取预训练的机器学习模型输出的目标用户对目标分类属性下的目标对象的偏好等级标签，即可得到目标用户对目标分类属性下的目标对象的偏好情况。In an embodiment of the present application, the preference level label of the target user for the target object under the target classification attribute output by the pre-trained machine learning model can be obtained, so as to obtain the target user's preference for the target object under the target classification attribute.

在步骤S250中，若获取的偏好等级标签与预设的偏好等级标签相匹配，则基于目标分类属性生成目标用户的用户画像。In step S250, if the acquired preference level label matches the preset preference level label, a user portrait of the target user is generated based on the target classification attribute.

In an embodiment of the present application, the preset preference level label is a label indicating that the target user has a relatively high degree of preference for the target object under the target classification attribute, for example the labels "interested" and "particularly interested", or only the label "particularly interested". When the preference level label obtained from the pre-trained machine learning model matches the preset preference level label, it can be determined that the target user's preference level for the target object under the target classification attribute is high, so the user portrait of the target user can be generated according to the target classification attribute.
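The matching in step S250 reduces to a membership check followed by an update of the portrait. In the sketch below, the portrait is modeled as a list of classification attributes and the function name is hypothetical; the two preset label strings come from the example in the text, while "not interested" in the usage note is an invented low-preference label.

```python
from typing import FrozenSet, List

PRESET_LABELS: FrozenSet[str] = frozenset({"interested", "particularly interested"})


def update_portrait(
    portrait: List[str],
    target_attribute: str,
    predicted_label: str,
    preset_labels: FrozenSet[str] = PRESET_LABELS,
) -> List[str]:
    """Return a new portrait that includes `target_attribute` if the
    predicted preference level label matches a preset label."""
    if predicted_label in preset_labels and target_attribute not in portrait:
        return portrait + [target_attribute]
    return list(portrait)
```

Calling `update_portrait([], "basketball", "interested")` adds the attribute, whereas a predicted label of "not interested" leaves the portrait unchanged. Restricting the preset to "particularly interested" only, as the text allows, is a matter of passing a smaller `preset_labels` set.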

As can be seen from the above, user characteristic information of the target user is acquired, where the user characteristic information includes user attribute information and historical behavior data of the target user. Based on the user characteristic information, multiple user feature vectors of the target user under the user feature dimensions are generated, where a user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data, and the classification attribute of the operation object. The multiple user feature vectors are input into the pre-trained machine learning model, which is obtained by training on training set sample data containing multiple user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for target objects under the target classification attribute. The preference level label of the target user for the target object under the target classification attribute, as output by the pre-trained machine learning model, is acquired. If the acquired preference level label matches the preset preference level label, a user portrait of the target user is generated based on the target classification attribute. Compared with obtaining a user portrait from tag statistics, user feature vectors represent user preferences more comprehensively. This improves the accuracy of the predicted preference level label of a user for target objects under a specific classification attribute, makes it easier to find the target users that have a specific user portrait, and thereby improves the accuracy of related business recommendations.

以下介绍本申请的装置实施例，可以用于执行本申请上述实施例中的用户画像的生成方法。对于本申请装置实施例中未披露的细节，请参照本申请上述的方法的实施例。The apparatus embodiments of the present application are described below, which can be used to execute the method for generating a user portrait in the above-mentioned embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the above-mentioned method embodiments of the present application.

Referring to FIG. 11, an apparatus 1100 according to an embodiment of the present application includes a first acquisition unit 1110, a first generation unit 1120, an input unit 1130, an output unit 1140 and a second generation unit 1150. The first acquisition unit 1110 is configured to acquire user characteristic information of a target user, where the user characteristic information includes user attribute information and historical behavior data of the target user. The first generation unit 1120 is configured to generate, based on the user characteristic information, multiple user feature vectors of the target user under the user feature dimensions, where a user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data, and the classification attribute of the operation object. The input unit 1130 is configured to input the multiple user feature vectors into a pre-trained machine learning model, which is obtained by training on training set sample data containing multiple user feature vectors of sample users under the user feature dimensions and the sample users' preference level labels for target objects under the target classification attribute. The output unit 1140 is configured to acquire the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute. The second generation unit 1150 is configured to generate the user portrait of the target user based on the target classification attribute if the acquired preference level label matches the preset preference level label.

需要说明的是，图12示出的电子设备的计算机系统1200仅是一个示例，不应对本申请实施例的功能和使用范围带来任何限制。It should be noted that the computer system 1200 of the electronic device shown in FIG. 12 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 12, the computer system 1200 includes a central processing unit (CPU) 1201, which can perform various appropriate actions and processes, for example the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage part 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores various programs and data required for system operation. The CPU 1201, the ROM 1202 and the RAM 1203 are connected to one another through a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.

The following components are connected to the I/O interface 1205: an input part 1206 including a keyboard, a mouse and the like; an output part 1207 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like; a storage part 1208 including a hard disk and the like; and a communication part 1209 including a network interface card such as a local area network (LAN) card or a modem. The communication part 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage part 1208 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, where the computer program contains program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication part 1209, and/or installed from the removable medium 1211. When the computer program is executed by the central processing unit (CPU) 1201, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable computer program is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The computer program contained on the computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wired and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

描述于本申请实施例中所涉及到的单元可以通过软件的方式实现，也可以通过硬件的方式来实现，所描述的单元也可以设置在处理器中。其中，这些单元的名称在某种情况下并不构成对该单元本身的限定。The units involved in the embodiments of the present application may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.

作为另一方面，本申请还提供了一种计算机可读介质，该计算机可读介质可以是上述实施例中描述的电子设备中所包含的；也可以是单独存在，而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序，当上述一个或者多个程序被一个该电子设备执行时，使得该电子设备实现上述实施例中所述的方法。As another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments; it may also exist alone without being assembled into the electronic device. middle. The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by an electronic device, enables the electronic device to implement the methods described in the above-mentioned embodiments.

应当注意，尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元，但是这种划分并非强制性的。实际上，根据本申请的实施方式，上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之，上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。It should be noted that although several modules or units of the apparatus for action performance are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units to be embodied.

通过以上的实施方式的描述，本领域的技术人员易于理解，这里描述的示例实施方式可以通过软件实现，也可以通过软件结合必要的硬件的方式来实现。因此，根据本申请实施方式的技术方案可以以软件产品的形式体现出来，该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM，U盘，移动硬盘等)中或网络上，包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本申请实施方式的方法。From the description of the above embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein may be implemented by software, or may be implemented by software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the present application may be embodied in the form of software products, and the software products may be stored in a non-volatile storage medium (which may be CD-ROM, U disk, mobile hard disk, etc.) or on the network , which includes several instructions to cause a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.

本领域技术人员在考虑说明书及实践这里公开的实施方式后，将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化，这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses or adaptations of this application that follow the general principles of this application and include common knowledge or conventional techniques in the technical field not disclosed in this application .

应当理解的是，本申请并不局限于上面已经描述并在附图中示出的精确结构，并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。It is to be understood that the present application is not limited to the precise structures described above and illustrated in the accompanying drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method for generating a user portrait, comprising:

acquiring user characteristic information of a target user, wherein the user characteristic information comprises user attribute information and historical behavior data of the target user;

respectively generating a plurality of user feature vectors of the target user under a user feature dimension based on the user feature information, wherein the user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data and the classification attribute of the operation object;

inputting the plurality of user feature vectors into a pre-trained machine learning model, wherein the pre-trained machine learning model is obtained by training on training set sample data comprising a plurality of user feature vectors of sample users under the user feature dimension and preference level labels of the sample users for target objects under a target classification attribute;

acquiring the preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute;

and if the acquired preference level label is matched with a preset preference level label, generating a user portrait of the target user based on the target classification attribute.

2. The method for generating a user portrait as claimed in claim 1, wherein the pre-trained machine learning model determines the preference level label of the target user for the target object under the target classification attribute by:

respectively carrying out conversion processing on the plurality of user characteristic vectors to generate first characteristic vectors corresponding to the plurality of user characteristic vectors, wherein the first characteristic vectors corresponding to the plurality of user characteristics are vectors with the same dimensionality;

based on preset incidence relations among the plurality of user feature information, carrying out aggregation processing on first feature vectors corresponding to the plurality of user feature vectors to generate a plurality of aggregated second feature vectors, wherein the first feature vectors corresponding to the plurality of user features are vectors with the same dimensionality;

and predicting preference level labels of the target users on the target objects under the target classification attributes based on the aggregated second feature vectors.

3. The method for generating a user portrait as claimed in claim 1, further comprising:

acquiring training set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a plurality of user feature vectors of a sample user under a user feature dimension and a preference level label of the sample user for a target object under a target classification attribute;

and training the machine learning model to be trained through the training set sample data to obtain the pre-trained machine learning model.

4. The method for generating a user portrait as claimed in claim 3, wherein said acquiring training set sample data for training a machine learning model to be trained comprises:

acquiring user characteristic information of candidate users in a candidate user set;

for each candidate user in the candidate user set, determining a first type of candidate users and a second type of candidate users based on the historical behavior data in the user characteristic information of the candidate users, wherein the first type of candidate users are candidate users whose historical behavior data includes a target operation event, the second type of candidate users are candidate users whose historical behavior data does not include the target operation event, and the target operation event is an operation event that operates on a target object under the target classification attribute;

generating a plurality of user characteristic vectors of the first type candidate users under the user characteristic dimension based on the user characteristic information of the first type candidate users, and adding preference grade labels with high preference degrees to the first type candidate users to obtain sample data of a positive sample user;

generating a plurality of user feature vectors of the second type candidate users under the user feature dimension based on the user feature information of the second type candidate users, and adding preference level labels with low preference degrees to the second type candidate users to obtain sample data of negative sample users;

and obtaining training set sample data used for training a machine learning model to be trained based on the sample data of the positive sample user and the sample data of the negative sample user.

5. The method of claim 4, wherein the generating a plurality of user feature vectors of the second type candidate users in the user feature dimension based on the user feature information of the second type candidate users and adding preference level labels with low preference degrees to the second type candidate users to obtain sample data of negative sample users comprises:

extracting candidate users from the second type of candidate users;

and generating a plurality of user feature vectors of the extracted candidate users under the user feature dimension based on the extracted user feature information of the candidate users, and adding preference grade labels with low preference degrees to the extracted candidate users to obtain sample data of the negative sample users.

6. The method of claim 5, wherein the extracting candidate users from the second category of candidate users comprises:

determining portrait intensity of the second type candidate users based on frequency of historical behavior data contained in user characteristic information of the second type candidate users, wherein the portrait intensity and the frequency are in positive correlation;

and extracting candidate users at a first ratio from the second type candidate users whose portrait intensity is higher than a predetermined portrait intensity threshold, and extracting candidate users at a second ratio from the second type candidate users whose portrait intensity is lower than or equal to the predetermined portrait intensity threshold, the first ratio being greater than the second ratio.
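The threshold-based extraction above can be illustrated with a short sketch. It assumes that the portrait intensity is simply the raw behavior count (the claim only requires a positive correlation) and that each "ratio" is a sampling fraction; the function names and the `seed` parameter are hypothetical.

```python
import random


def portrait_intensity(behavior_count):
    # Assumption: intensity equals the behavior count. Any monotonically
    # increasing function of the frequency would satisfy the claim.
    return float(behavior_count)


def extract_negative_candidates(behavior_counts, threshold,
                                first_ratio, second_ratio, seed=0):
    """Sample second type candidate users in two strata split by portrait intensity."""
    if not first_ratio > second_ratio:
        raise ValueError("first_ratio must be greater than second_ratio")
    rng = random.Random(seed)
    # Sort the user ids so the sampling is reproducible for a given seed.
    strong = sorted(u for u, c in behavior_counts.items()
                    if portrait_intensity(c) > threshold)
    weak = sorted(u for u, c in behavior_counts.items()
                  if portrait_intensity(c) <= threshold)
    picked_strong = rng.sample(strong, round(len(strong) * first_ratio))
    picked_weak = rng.sample(weak, round(len(weak) * second_ratio))
    return picked_strong + picked_weak
```

Sampling more heavily from users with a high portrait intensity favors negative samples whose behavior history is rich, so the low-preference label is assigned mostly where there is enough data to support it.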

7. The method of claim 4, wherein the obtaining user feature information of the candidate users in the candidate user set comprises:

acquiring a configuration file, wherein the configuration file is used for configuring the attribute category of the acquired user attribute information, the operation event category of the acquired historical behavior data and the classification attribute of an operation object;

for each candidate user in the candidate user set, acquiring user attribute information of the candidate user under the configured attribute category based on the category of the user attribute information configured in the configuration file;

and for each candidate user in the candidate user set, acquiring historical behavior data of the candidate user under the operation event category and the classification attribute of the operation object based on the operation event category and the classification attribute of the operation object configured in the configuration file.
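A configuration-driven acquisition of this kind might look like the sketch below. The JSON layout, the key names, and the `profile` and `events` record shapes are assumptions made for illustration; the application does not specify a file format.

```python
import json

# Hypothetical configuration file content; the key names are illustrative only.
CONFIG_TEXT = """
{
  "attribute_categories": ["age", "city"],
  "operation_event_categories": ["click", "share"],
  "object_classification_attributes": ["sports", "finance"]
}
"""


def acquire_user_features(config, profile, events):
    """Keep only the attributes and behavior records named in the configuration."""
    attributes = {
        key: profile[key]
        for key in config["attribute_categories"]
        if key in profile
    }
    wanted_events = set(config["operation_event_categories"])
    wanted_classes = set(config["object_classification_attributes"])
    behavior = [
        e for e in events
        if e["event"] in wanted_events and e["category"] in wanted_classes
    ]
    return {"attributes": attributes, "behavior": behavior}


config = json.loads(CONFIG_TEXT)
```

Because the categories live in the configuration rather than in code, the same acquisition routine can be reused for a different target classification attribute by editing the file only.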

8. An apparatus for generating a user portrait, comprising:

a first acquisition unit, configured to acquire user feature information of a target user, where the user feature information comprises user attribute information and historical behavior data of the target user;

a first generating unit, configured to generate, based on the user feature information, a plurality of user feature vectors of the target user in a user feature dimension, where the user feature dimension is a feature dimension formed by a category of the user attribute information, and an operation event category and a classification attribute of an operation object included in the historical behavior data;

an input unit, configured to input the plurality of user feature vectors into a pre-trained machine learning model, where the pre-trained machine learning model is trained on training set sample data comprising a plurality of user feature vectors of sample users under the user feature dimension and preference level labels of the sample users for target objects under target classification attributes;

an output unit, configured to acquire a preference level label, output by the pre-trained machine learning model, of the target user for the target object under the target classification attribute;

and a second generating unit, configured to generate the user portrait of the target user based on the target classification attribute if the acquired preference level label matches a preset preference level label.
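The cooperation of the units above can be sketched end to end. The vector layout (one attribute vector, plus one count vector per operation event category indexed by classification attribute), the string labels, and `stub_model` are all assumptions; `stub_model` merely stands in for the pre-trained machine learning model, whose architecture is not given in the claims.

```python
from collections import Counter


def build_feature_vectors(attributes, events, event_categories, class_attributes):
    """Return one attribute vector plus one count vector per operation event category."""
    # Assumption: attribute values are already numerically encoded.
    vectors = [[float(v) for v in attributes.values()]]
    counts = Counter((e["event"], e["category"]) for e in events)
    for event in event_categories:
        vectors.append([float(counts[(event, c)]) for c in class_attributes])
    return vectors


def generate_user_portrait(vectors, model, target_attribute, preset_label="high"):
    """Tag the user with target_attribute only if the predicted label matches the preset one."""
    label = model(vectors)
    return {"tags": [target_attribute]} if label == preset_label else None


def stub_model(vectors):
    # Stand-in for the pre-trained model: "high" when the first event vector
    # records at least two operations on the first classification attribute.
    return "high" if vectors[1][0] >= 2 else "low"
```

With two "click" events on "sports" objects, `stub_model` returns the "high" label and the portrait is tagged with the target classification attribute; with fewer, no portrait is generated for that attribute.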

9. An electronic device, comprising:

one or more processors;

a storage device, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the user portrait generation method as claimed in any one of claims 1 to 7.

10. A computer program medium having computer readable instructions stored thereon which, when executed by a processor, implement the user portrait generation method as claimed in any one of claims 1 to 7.