CN114330519A - Data determination method and device, electronic equipment and storage medium


Info

Publication number: CN114330519A
Application number: CN202111565403.7A
Authority: CN (China)
Prior art keywords: target object, sample, vector, features, resource
Other languages: Chinese (zh)
Inventors: 林伟, 陈超超
Assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data determination method, apparatus, electronic device, and storage medium in the technical field of big data processing, which can characterize user interests from multiple aspects and improve the recall rate. The method includes the following steps: acquiring interaction information of a target object with respect to a target resource; determining features of the target object according to the interaction information, the features of the target object including features of multiple dimensions; determining multiple sets of weight coefficients, each set of weight coefficients including multiple weight coefficients corresponding to the features of the multiple dimensions; weighting the features of the target object with each set of weight coefficients to obtain multiple vectors of the target object, where each vector of the target object corresponds to one set of weight coefficients; and retrieving, for each vector of the target object, candidate resources whose relevance meets a preset requirement, and determining a data recall result corresponding to the target object according to the candidate resources corresponding to each of the multiple vectors of the target object.

Description

Data determination method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of big data processing technologies, and in particular, to a data determination method, a data determination apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Data recall refers to the process of determining, from a database containing millions of items, the portion of data that is of most interest to a user. The data may include pictures, text, videos, and the like, and may also include promoted resources such as advertisements and goods. The determined data can be displayed to the user after being ranked, and recall plays a vital role in data retrieval, information promotion, and so on.
In the related art, data recall mainly adopts vectorization-based recall schemes, tag-based recall schemes, and feature-based recall schemes. A vectorization-based scheme learns vector representations for users and advertisements through a neural network, computes the relevance between a user's vector and each advertisement's vector, and retrieves the target advertisements with high relevance. However, users' interests are diverse, and such a vectorization-based recall scheme can hardly characterize user interests from many aspects, so the recall result is biased toward popular advertisements and lacks accuracy.
Disclosure of Invention
The present disclosure provides a data determination method, apparatus, electronic device and storage medium, to at least solve the problem of low accuracy of data recall in the related art. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a data determination method, including: acquiring interaction information of a target object with respect to a target resource; determining features of the target object according to the interaction information, the features of the target object including features of multiple dimensions; determining multiple sets of weight coefficients, each set of weight coefficients including multiple weight coefficients corresponding to the features of the multiple dimensions; weighting the features of the target object with each set of weight coefficients to obtain multiple vectors of the target object, where each vector of the target object corresponds to one set of weight coefficients; and retrieving, for each vector of the target object, candidate resources whose relevance meets a preset requirement, and determining a data recall result corresponding to the target object according to the candidate resources corresponding to each of the multiple vectors of the target object.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
In the technical solution of this embodiment, multiple vectors of the target object can be obtained from the features of the target object, and these vectors can characterize user interests from multiple aspects, so that the recalled data matches user interests more comprehensively and the recall rate is improved. Meanwhile, expressing one target object through multiple vectors increases the differences between different users, so that the recalled data better meets each user's individual needs and the accuracy of data recall is improved.
In an exemplary embodiment, weighting the features of the target object with each set of weight coefficients to obtain multiple vectors of the target object includes: inputting the features of the target object into multiple preset feature selection networks, where each feature selection network contains one set of weight coefficients, and weighting the features of the target object with the weight coefficients of each feature selection network to obtain the vector output by that feature selection network.
The feature selection network is a machine learning model. Weighting the features of the target object with a machine learning model improves the accuracy of the weight coefficients and allows them to be updated continuously, which guarantees usability and effectiveness.
In an exemplary embodiment, the method further includes: acquiring sample features and label information corresponding to the sample features, where the sample features include sample object features and sample resource features, and the label information indicates whether an interaction behavior exists between the sample object corresponding to the sample object features and the sample resource corresponding to the sample resource features; inputting the sample object features in the sample features into multiple first models to obtain a sample vector output by each first model; determining a sample resource vector corresponding to the sample resource features in the sample features; calculating the similarity between each sample vector and the sample resource vector, and determining the target sample vector with the highest similarity; determining an estimated interaction behavior occurrence probability according to the target sample vector and the sample resource vector; and training the multiple first models according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features until a preset training end condition is met, so as to obtain the multiple feature selection networks.
The trained first models can accurately characterize the target object to obtain its vectors, and can represent different features of the target object from different aspects and with different emphases, which increases the differences between the vectors and improves their personalization.
In an exemplary embodiment, training the multiple first models according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features until a preset training end condition is met, so as to obtain the multiple feature selection networks, includes: determining a loss value according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features; updating the model parameters of the first model corresponding to the target sample vector according to the loss value, and ending the training of the multiple first models when the loss values meet a preset end condition; and taking the trained first models as the feature selection networks, where the model parameters of a trained first model are the weight coefficients of the corresponding feature selection network.
In an exemplary embodiment, before inputting the sample object features in the sample features into the multiple first models, the method further includes: randomly determining initial values of the model parameters of the multiple first models.
Randomly determining the initial values of the first models increases the differences between different first models, and therefore increases the differences between different object vectors.
In an exemplary embodiment, weighting the features of the target object with each set of weight coefficients to obtain multiple vectors of the target object includes: clustering the features of the target object to obtain class clusters; and determining the weight coefficients of the features of the target object according to the weight coefficient corresponding to each class cluster, so as to obtain a vector corresponding to each class cluster.
A clustering model can divide the features of the target object into multiple class clusters without supervision, which makes it easier to discover user interests and improves the recall rate of target resources. In addition, the clustering process requires no human involvement, which improves efficiency and saves labor time.
In an exemplary embodiment, weighting the features of the target object with the weight coefficients of a feature selection network to obtain the vector of the target object output by that feature selection network includes: inputting the features of the target object into the feature selection network, where the feature selection network includes a compression layer, an excitation layer, and an output layer; compressing the dimensions of the features of the target object through the compression layer to obtain one-dimensional features of the target object; weighting the one-dimensional features of the target object with the weight coefficients contained in the excitation layer to obtain weighted features of the target object; and reducing the dimension of the weighted features of the target object through the output layer to obtain a vector of the target object.
In the above embodiment, the feature selection network not only weights the features of the target object to obtain multiple vectors and enrich the representation of user interests, but also reduces the dimension of the features, which reduces the amount of computation and increases the computation speed.
In an exemplary embodiment, the interaction information includes one or more of object attribute information, object behavior information, resource attribute information, resource behavior information, or object-resource interaction information. The more information the interaction information contains, the larger the number and dimensionality of the features of the target object and the more comprehensive their characterization, which helps improve the accuracy of the user vectors.
According to a second aspect of the embodiments of the present disclosure, there is provided a data determination apparatus including: the data acquisition module is configured to acquire interactive information of a target object; the feature extraction module is configured to determine features of the target object according to the interaction information, wherein the features of the target object comprise features of multiple dimensions; a vector representation module configured to determine a plurality of sets of weight coefficients, each set of weight coefficients comprising a plurality of weight coefficients corresponding to features of a plurality of dimensions; weighting each group of weight coefficients and the characteristics of the target object to obtain a plurality of vectors of the target object; wherein each vector of the target object corresponds to a set of weight coefficients; the recall result determining module is configured to retrieve a plurality of candidate resources with the correlation meeting the preset requirement for each vector of the target object, and determine a data recall result corresponding to the target object according to the candidate resource corresponding to each vector in the plurality of vectors of the target object.
In an implementation manner of the second aspect of the embodiments of the present disclosure, the vector expression module is further configured to input the features of the target object into a plurality of preset feature selection networks, where each feature selection network includes a set of weight coefficients, and obtain the vector output by each feature selection network by weighting the features of the target object through the weight coefficients of the feature selection networks.
In an implementation manner of the second aspect of the embodiments of the present disclosure, the data determination apparatus further includes: the training data determining module is configured to obtain the sample characteristics and label information corresponding to the sample characteristics; the sample characteristics comprise sample object characteristics and sample resource characteristics; the label information is used for representing whether an interactive behavior exists between the sample object corresponding to the sample object characteristic and the sample resource corresponding to the sample resource characteristic; the first model training module is configured to input sample object features in the sample features into a plurality of first models to obtain a sample vector output by each first model; an object vector determination module configured to determine a sample resource vector corresponding to the sample resource feature in the sample features; the similarity calculation module is configured to calculate the similarity between each sample vector and the sample resource vector respectively and determine a target sample vector with the highest similarity; the interactive prediction module is configured to determine the occurrence probability of the pre-estimated interactive behavior according to the target sample vector and the sample resource vector; and the model training ending module is configured to train the plurality of first models according to the estimated interactive behavior occurrence probability and the label information corresponding to the sample characteristics until a preset training ending condition is met, so as to obtain a plurality of characteristic selection networks.
In one implementation of the second aspect of the embodiments of the present disclosure, the model training ending module includes: the loss determining module is configured to determine a loss value according to the estimated interactive behavior occurrence probability and the label information corresponding to the sample characteristics; updating model parameters in the first models corresponding to the target sample vectors according to the loss values, and ending the training of the plurality of first models until the loss values meet a preset ending condition; and taking the first model after the training is finished as a feature selection network, and taking the model parameters of the first model after the training is finished as the weight coefficients of the feature selection network.
In an implementation manner of the second aspect of the embodiments of the present disclosure, the data determination apparatus further includes: an initialization module configured to randomly determine initial values of model parameters of a plurality of first models.
In one implementation of the second aspect of the embodiments of the present disclosure, the vector expression module includes: the clustering module is configured to cluster the characteristics of the target object to obtain a clustered cluster; and the vector determining module is configured to determine a weight coefficient of the feature of the target object according to the weight coefficient corresponding to the class cluster so as to obtain a vector corresponding to each class cluster.
In one implementation of the second aspect of the embodiments of the present disclosure, the vector expression module includes: the characteristic input module is configured to input the characteristics of the target object into a characteristic selection network, and the characteristic selection network comprises a compression layer, an excitation layer and an output layer; the feature compression module is configured to compress the dimension of the feature of the target object through the compression layer to obtain the feature of the one-dimensional target object; the weighting module is configured to weight the characteristics of the one-dimensional target object through the weight coefficients contained in the excitation layer to obtain the weighted characteristics of the target object; and the vector output module is configured to reduce the dimension of the weighted features of the target object through an output layer to obtain a vector of the target object.
In an implementation of the second aspect of the embodiments of the present disclosure, the interaction information includes one or more of object attribute information, object behavior information, resource attribute information, resource behavior information, or object resource interaction information.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions. Wherein the processor is configured to execute the instructions to implement the method of the first aspect and any implementation manner of the embodiments of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of the first aspect of the embodiments of the present disclosure and any implementation thereof.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions. Wherein the computer programs/instructions, when executed by the processor, implement the methods of the first aspect and any of its embodiments of the present disclosure.
It should be understood that, for the beneficial effects that can be achieved by the data determination apparatus in the second aspect and any one of the embodiments thereof, the electronic device in the third aspect, the computer-readable storage medium in the fourth aspect, and the computer program product in the fifth aspect, reference may be made to the beneficial effects in the first aspect and any one of the possible design manners thereof, and details are not described here again.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow chart illustrating a method of data determination according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of data determination according to another exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a structure of a first model in a data determination method in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating a scenario of a data determination method according to an example embodiment;
FIG. 5 is a flow chart illustrating a method of data determination according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating a data determination device in accordance with an exemplary embodiment;
fig. 7 is a block diagram illustrating an electronic device for implementing the data determination method described above according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
An internet advertisement is an advertisement displayed on an electronic device in the form of text, pictures, audio, video, or other media through internet channels such as websites, web pages, and internet applications. Compared with advertisements in traditional media, internet advertisements have unique advantages: they not only spread fastest, but can also recommend the best-matching advertisement information according to each user's behavior characteristics, region, interests, and the like.
Internet advertisement delivery comprises four main links: targeting, recall, ranking, and bidding. Targeting refers to picking out eligible advertisements based on the user's own attributes, such as gender and age. Recall refers to selecting high-value advertisements from the targeting output. Ranking means estimating the value of each advertisement output by recall and sorting the advertisements by value. Bidding refers to selecting the N most valuable advertisements for display. In the recall stage, thousands of advertisements highly relevant to the user need to be found from an advertisement library containing millions of entries, so recall plays an important role in internet advertisement delivery.
Based on this, the embodiments of the present disclosure provide a data determination method, which can be applied to an electronic device. The electronic device may deliver internet advertisements in a website, web page, or application. When the electronic device delivers internet advertisements (hereinafter referred to as advertisements), it can obtain an advertisement recall result through the data determination method. The electronic device can then process the recall result in the subsequent links (i.e., ranking and bidding), so that the finally screened advertisements are displayed and the purpose of delivering advertisements to the user is achieved.
For example, the electronic device in the embodiments of the present application may be a portable computer (e.g., a mobile phone), a tablet computer, a notebook computer, a Personal Computer (PC), a wearable electronic device (e.g., a smart watch), an Augmented Reality (AR)/Virtual Reality (VR) device, an in-vehicle computer, a smart home device, and the like; this embodiment does not particularly limit the specific form of the electronic device.
It is to be understood that the data determination method in this embodiment may also be applied to a server or a cluster formed by a plurality of servers, and this embodiment is not particularly limited thereto.
Fig. 1 is a flowchart illustrating a data determination method according to an exemplary embodiment, where the data determination method is applied to the electronic device as shown in fig. 1, and includes the following steps:
step S11: and acquiring the interactive information of the target object aiming at the target resource.
Step S12: determining characteristics of the target object according to the interaction information, wherein the characteristics of the target object comprise characteristics of multiple dimensions.
Step S13: determining a plurality of sets of weight coefficients, each set of weight coefficients comprising a plurality of weight coefficients corresponding to features of a plurality of dimensions; weighting each group of weight coefficients and the characteristics of the target object to obtain a plurality of vectors of the target object; wherein each vector of the target object corresponds to a set of weight coefficients.
Step S14: and searching candidate resources with the correlation meeting the preset requirement for each vector of the target object, and determining a data recall result corresponding to the target object according to the candidate resources corresponding to each vector in the plurality of vectors of the target object.
The target resource refers to a resource promoted on the internet, such as an advertisement. The target object may be a user, or a user's account. The present disclosure mainly uses the target object as an example for explanation. When delivering advertisements, the electronic device can obtain the interaction information of the current target object, determine the features of the target object according to that interaction information, weight the features of the target object to obtain multiple vectors corresponding to the target object, and then retrieve the candidate resources corresponding to each vector, thereby obtaining a data recall result. In this embodiment, the electronic device determines the features of the target object from its interaction information and learns multiple vectors by weighting those features, so the user's interests can be described from multiple aspects by the multiple vectors, which increases the probability of matching an advertisement and improves the recall rate and accuracy of advertisement recall. In addition, the method increases the differences between different users, avoids recall results that lean toward popular advertisements, and meets users' personalized needs.
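To make the overall flow concrete, the sketch below retrieves candidate resources for each of a target object's vectors and merges the per-vector results into a single data recall result. It is a minimal illustration under assumptions not stated in this disclosure: cosine similarity is used as the relevance measure, the search is a brute-force scan over an in-memory candidate pool, and the threshold and top-k values are arbitrary.

```python
import numpy as np

def recall(object_vectors, candidate_vectors, top_k=5, min_score=0.1):
    """For each vector of the target object, keep the candidates whose cosine
    similarity meets the threshold, then merge the per-vector results."""
    recalled = {}
    for vec in object_vectors:
        scores = candidate_vectors @ vec / (
            np.linalg.norm(candidate_vectors, axis=1) * np.linalg.norm(vec) + 1e-9)
        for idx in np.argsort(-scores)[:top_k]:
            if scores[idx] >= min_score:
                # keep the best score a candidate achieved across all object vectors
                recalled[idx] = max(recalled.get(idx, 0.0), float(scores[idx]))
    # data recall result: candidate indices ordered by their best score
    return sorted(recalled, key=recalled.get, reverse=True)

# usage: 3 vectors of one target object, 100 candidate resources, 8-dim embeddings
rng = np.random.default_rng(0)
print(recall(rng.normal(size=(3, 8)), rng.normal(size=(100, 8))))
```

In a production system the brute-force scan would typically be replaced by an approximate nearest-neighbor index, but the merge logic stays the same.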
Next, the respective steps of the data determination method of the present embodiment are explained in detail with reference to the drawings.
In step S11, interaction information of the target object with respect to the target resource is acquired.
The interaction information is data recorded by the electronic device when the user browses and clicks target resources in a website, web page, or application, and is used to record the interaction behavior between the user and the target resources. Illustratively, the interaction information may include object attribute information and object behavior information. The object attribute information refers to attributes of the object (such as a user), and may specifically include the object's account, gender, age, region, and the like; the object behavior information refers to the object's behavior toward target resources, and may specifically include information on target resources the object clicked, purchased, or blocked, and the like. The interaction information may further include information related to the target resource, such as resource attribute information and resource behavior information. The resource attribute information refers to attributes of the target resource, and may specifically include the target resource's pictures, videos, text, and the like; the resource behavior information refers to the interaction behavior between the target resource and objects, and may specifically include the number of times the target resource was clicked or purchased within a period of time. The interaction information may also include object-resource interaction information; illustratively, this may include the type of interaction between the object and the target resource, such as exposure, click, or purchase, and the time when the interaction occurred. In addition, the interaction information may include other information, such as an object identifier, the target resource type, and the category of the target resource, which is not particularly limited in this embodiment.
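As a purely illustrative way to hold such interaction information in code, the record below groups the fields mentioned above; every field name is an assumption for the sketch, not a field defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InteractionRecord:
    # object attribute information
    object_id: str
    gender: str
    age: int
    region: str
    # object behavior information
    clicked_resources: List[str] = field(default_factory=list)
    purchased_resources: List[str] = field(default_factory=list)
    blocked_resources: List[str] = field(default_factory=list)
    # resource attribute / behavior information for one target resource
    resource_id: str = ""
    resource_type: str = ""
    resource_click_count: int = 0
    # object-resource interaction information
    interaction_type: str = ""   # e.g. "exposure", "click", "purchase"
    interaction_time: str = ""
```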
The more information contained in the interactive information, the higher the number and the dimensionality of the features of the target object, the more comprehensive the characterization of the features of the target object, and the improvement of the accuracy of the vector of the target object is facilitated.
In the process of resource release every time in history, the electronic device records resource attribute information, resource behavior information, object attribute information, object behavior information and object resource interaction information of target resources released every time, and records the information in a specific file, such as a log, for storage. The information may be read by an object identifier, which refers to identification information of the object (e.g., user) to be delivered, for verifying the identity of the object. For example, the object identifier may be a mobile phone number, an identification number, and the like, which is not limited in this embodiment. In addition, the information may also be read in other manners, for example, the information is queried and read through a resource identifier, where the resource identifier refers to identification information of a target resource, and the electronic device may allocate one piece of identification information to each target resource.
When a new round of popularization is needed, the electronic device can read the previously recorded log according to the object identifier of the target object which needs to be popularized currently, the information read from the log is used as the interactive information of the target object, and a new round of resource delivery is carried out by using the interactive information.
In step S12, determining the characteristics of the target object according to the interaction information; the features of the target object include features in multiple dimensions.
The characteristics of the target object refer to the characteristics represented by the interactive information of the target object, and the electronic equipment can extract a field from the interactive information as one characteristic of the target object; the stored attribute information of the target object may be used as the feature of the target object. A target object may have multiple characteristics, for example, the gender of the object may be used as one characteristic, the age may be used as one characteristic, the click behavior of the clicked target resource is used as one characteristic, the number of times the same target resource is clicked is used as one characteristic, and so on. A feature of the target object may also include a plurality of fields, for example, two fields of gender and age are used as a feature, and the feature may be an object attribute feature. Illustratively, the characteristics of the target object may include object attribute characteristics, object behavior characteristics, resource attribute characteristics, resource behavior characteristics, and object resource interaction characteristics. And extracting the stored fields from the interactive information to obtain the characteristics of the target object. For example, object behavior information (e.g., a target resource clicked by a user, the number of clicks of the target resource by the user, etc.) in the interaction information is extracted as the object behavior feature, and the like.
In step S13, determining a plurality of sets of weight coefficients, each set of weight coefficients including a plurality of weight coefficients corresponding to features of a plurality of dimensions; weighting each group of weight coefficients and the characteristics of the target object to obtain a plurality of vectors of the target object; wherein each vector of the target object corresponds to a set of weight coefficients.
In an exemplary embodiment, the electronic device may determine a weight coefficient for each feature of the target object and then convert the weighted features of the target object into a vector, thereby obtaining a vector of the target object. For example, One-Hot encoding may be used to map each feature of the target object to a number and obtain a vector corresponding to the target object. For instance, the feature "male" may be mapped to 1, the feature "20 years old" may be mapped to 010, and the feature "click" may be mapped to 100, so the features of the target object are expressed as the vector (1, 010, 100). Other encoding methods may also be used to express the features of the target object as vectors, such as Label Encoding. Label encoding defines each value of a feature as a numeric label and represents the feature with that numeric code. During encoding, each feature of the target object is converted into the corresponding numeric label according to its specific value, yielding a vector corresponding to the features of the target object. For example, feature 1 of the target object is "male", whose corresponding numeric label is "1", and so on.
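The following toy example illustrates label encoding of categorical features of a target object; the vocabularies and field choices are assumed for the sketch.

```python
# assumed vocabularies: each categorical value gets an integer code (label encoding)
gender_codes = {"female": 0, "male": 1}
behavior_codes = {"exposure": 0, "click": 1, "purchase": 2}

def encode_features(gender, age, behavior):
    """Turn raw target-object features into a numeric feature vector."""
    return [gender_codes[gender], age, behavior_codes[behavior]]

print(encode_features("male", 20, "click"))  # -> [1, 20, 1]
```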
According to the importance of the different features of the target object, the weight coefficient corresponding to each feature can be determined; features of the target object that are more important for predicting the click-through rate can be given a higher weight coefficient. After a weight coefficient has been determined for each feature of the target object, the weight coefficient is multiplied by that feature to obtain a weighted vector. For example, assuming the electronic device extracts 5 features of the target object, expressed as (1, 2, 2, 3, 4) after encoding, a set of weight coefficients (0.2, 0.5, 0.5, 0.7, 0.6) may be determined, and the resulting vector is (0.2 × 1, 0.5 × 2, 0.5 × 2, 0.7 × 3, 0.6 × 4) = (0.2, 1.0, 1.0, 2.1, 2.4).
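Continuing the example, the sketch below applies several assumed sets of weight coefficients to the same encoded features, producing one vector of the target object per set; only the first set of values comes from the text, the rest are made up for illustration.

```python
import numpy as np

features = np.array([1, 2, 2, 3, 4], dtype=float)   # encoded target-object features

# N = 3 sets of weight coefficients, one per vector of the target object
weight_sets = np.array([
    [0.2, 0.5, 0.5, 0.7, 0.6],   # the set used in the example above
    [0.9, 0.1, 0.3, 0.2, 0.8],   # assumed
    [0.4, 0.4, 0.9, 0.1, 0.5],   # assumed
])

vectors = weight_sets * features   # element-wise weighting: one row per vector
print(vectors[0])                  # [0.2 1.  1.  2.1 2.4]
```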
Similarly, N different vectors can be obtained by determining N different sets of weighting coefficients. Wherein N is a positive integer. In an exemplary embodiment, the weight coefficients may be determined for the features of the target object by way of machine learning. Specifically, the electronic device may pre-train a machine learning model, where the machine learning model includes a plurality of feature selection networks, and each feature selection network may learn a set of weight coefficients. And inputting the characteristics of the target object or the vector corresponding to the characteristics of the target object into the trained machine learning model, and weighting the characteristics of the target object through a plurality of characteristic selection networks in the machine learning model so as to obtain a vector output by each characteristic selection network.
The machine learning model learns weight coefficients that better fit the importance of the target object's features, which improves the accuracy of the vector expression and therefore the accuracy of recall. Moreover, obtaining the weight coefficients through a machine learning model is more efficient, and the weight coefficients can be continuously optimized and updated, so that accurate user vectors can still be obtained when the user's interests change, giving higher usability.
Illustratively, as shown in fig. 2, the method for training the machine learning model includes steps S21 to S26.
In step S21, a sample feature and label information corresponding to the sample feature are acquired; the sample characteristics comprise sample object characteristics and sample resource characteristics; the label information is used for representing whether an interactive behavior exists between the sample object corresponding to the sample object characteristic and the sample resource corresponding to the sample resource characteristic.
The sample features are the training data required to train the machine learning model. They may be historical data collected by the electronic device during historical advertisement delivery, and may include features of historical objects (i.e., sample object features), sample resource features, and label information. The sample resource features refer to features of historically delivered advertisements, and may specifically include, for example, the advertisement type and the advertised commodity type, as well as the advertisement content, such as pictures, videos, and text, which is not particularly limited in this embodiment. The label information indicates whether a historically delivered advertisement was clicked; for example, the label information may be "1" if the advertisement was clicked and "0" if it was not.
The electronic device may first obtain a number of sample features for training the model. Each sample feature may include a sample resource feature, a sample object feature, and tag information recorded by the electronic device at the time of delivery of an advertisement. The number of sample characteristics may then include the effectiveness of different advertisements delivered to different users.
Step S22: and inputting the sample object characteristics in the sample characteristics into a plurality of first models to obtain a sample vector output by each first model.
Illustratively, the first model may be built from a Squeeze-and-Excitation Network (SENet) model. The SENet model can automatically learn the importance of each feature and promote the more important features according to that importance. The SENet model is mainly divided into two parts: the Squeeze (compression) part and the Excitation part. The Squeeze part compresses the dimensions of the features, compressing multi-dimensional features into one-dimensional features. Illustratively, multi-dimensional features may be compressed into one-dimensional features by averaging the features across dimensions. For example, when the age feature has multiple dimensions such as "21 years", "22 years", and "23 years", the mean of these dimensions may be computed, converting the age feature into a one-dimensional feature, i.e., "22 years". The Excitation part then generates a weight for the one-dimensional features obtained by the Squeeze part through a parameter, e.g., w. Finally, the SENet model takes the weights output by the Excitation part as the importance of the features and weights the originally input features by multiplication to obtain the weighted features.
Fig. 3 schematically shows a block diagram of the first model. As shown in fig. 3, the first model 300 includes a compression layer 301, an excitation layer 302, and an output layer 303. The compression layer 301 is the Squeeze part of the SENet model; the excitation layer 302 is the Excitation part of the SENet model; and the output layer is a DNN (Deep Neural Network). The DNN can map high-dimensional features to low-dimensional features, thereby reducing the amount of computation on the features and increasing the computation speed.
For example, a high-dimensional sample object feature, such as the sample object feature X, may be input into the first model 300. First, the compression layer 301 in the first model 300 compresses the high-dimensional sample object feature X into a one-dimensional sample object feature. The excitation layer 302 then learns a set of weight coefficients W for the one-dimensional sample object features output by the compression layer 301, where the width of W matches the number of sample object features; for example, if there are 10 sample object features, there are also 10 weight coefficients. The excitation layer 302 weights the sample object features one by one with the learned weight coefficients W by multiplication, outputting W × X. The output layer (i.e., the DNN) 303 of the first model 300 then maps the high-dimensional W × X to a low dimension, outputting the sample vector X1.
In the present embodiment, the SENet model and the DNN are combined, the features are weighted, and the weighted features are reduced in dimension, so that the amount of calculation required for calculating the sample vector can be reduced, and the calculation speed can be increased.
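As a concrete illustration, the following is a minimal PyTorch-style sketch of such a first model: compression by averaging each feature field, an excitation block that produces one weight coefficient per field, and a DNN output layer. The layer sizes, the reduction ratio in the excitation block, and the field/embedding dimensions are assumptions for the sketch, not the exact network of this disclosure.

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """Sketch of one 'first model': compression -> excitation -> DNN output layer."""

    def __init__(self, num_fields: int, field_dim: int, out_dim: int = 32):
        super().__init__()
        # excitation: learns one weight coefficient per feature field
        self.excitation = nn.Sequential(
            nn.Linear(num_fields, num_fields // 2),
            nn.ReLU(),
            nn.Linear(num_fields // 2, num_fields),
            nn.Sigmoid(),
        )
        # output layer: DNN mapping the weighted features to a low-dimensional vector
        self.dnn = nn.Sequential(
            nn.Linear(num_fields * field_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):            # x: (batch, num_fields, field_dim)
        squeezed = x.mean(dim=-1)    # compression layer: average each field to one value
        w = self.excitation(squeezed)          # (batch, num_fields) weight coefficients
        weighted = x * w.unsqueeze(-1)         # weight each field's features
        return self.dnn(weighted.flatten(1))   # low-dimensional sample/object vector

# usage: 10 feature fields, each embedded into 8 dimensions
model = FirstModel(num_fields=10, field_dim=8)
print(model(torch.randn(4, 10, 8)).shape)   # torch.Size([4, 32])
```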
According to the structure shown in fig. 3, N first models may be provided, where N is a positive integer. Inputting the same sample object features into the N first models yields the sample vector output by each first model, i.e., N sample vectors. The parameters of the network layers, e.g., the excitation layer, differ between the first models, so the sample vector output by each first model also differs, and multiple different sample vectors can be obtained. Different sample vectors can characterize user interests from different angles, so user interests are expressed more richly and the recall rate is improved.
In step S23, a sample resource vector corresponding to the sample resource feature in the sample features is determined.
The sample features may include a plurality of sample resource features, and for example, a sample resource vector corresponding to the sample resource features may be obtained by encoding the sample resource features. The encoding method may be One-Hot encoding, and the like, and this embodiment is not particularly limited thereto. And, the sample resource features can also be mapped into a low-dimensional sample resource vector through DNN.
In step S24, the similarity between each sample vector and the sample resource vector is calculated, and the target sample vector with the highest similarity is determined.
The similarity between a sample vector and the sample resource vector can be calculated with a distance measure, such as the Euclidean distance or the cosine distance. The similarity between each sample vector and the sample resource vector is calculated, and the target sample vector with the highest similarity is determined from the multiple sample vectors. For example, for the sample resource vector A, the distances between sample vector a1, sample vector a2, sample vector a3 and the sample resource vector A are calculated, and the obtained values are taken as the similarities. Among sample vector a1, sample vector a2, and sample vector a3, if the similarity between sample vector a1 and sample resource vector A is the highest, sample vector a1 is the target sample vector. Similarly, if the similarity between sample vector a2 and sample resource vector A is the highest, sample vector a2 is the target sample vector; if the similarity between sample vector a3 and sample resource vector A is the highest, sample vector a3 is the target sample vector.
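The sketch below shows this similarity computation and the selection of the target sample vector using cosine similarity; the vector values are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

sample_vectors = [np.array([0.2, 1.0, 0.5]),    # sample vector a1
                  np.array([0.9, 0.1, 0.4]),    # sample vector a2
                  np.array([0.3, 0.8, 0.7])]    # sample vector a3
resource_vector = np.array([0.25, 0.9, 0.6])    # sample resource vector A

similarities = [cosine_similarity(v, resource_vector) for v in sample_vectors]
target_index = int(np.argmax(similarities))     # index of the target sample vector
print(target_index, round(similarities[target_index], 3))
```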
In step S25, the estimated interactive behavior occurrence probability is determined according to the target sample vector and the sample resource vector.
The estimated interaction behavior occurrence probability is a prediction of the probability that the target resource will be clicked. It may be a probability value, such as 0.6 or 0.7; the predicted probability may also be converted into a specific label, such as yes/no or 0/1, which is more convenient to match. For example, the estimated interaction behavior occurrence probability may be determined according to the similarity between the target sample vector and the sample resource vector: when the similarity between the target sample vector and the sample resource vector is greater than a threshold, the estimated interaction behavior occurrence probability may be 1, and if the similarity is less than the threshold, it is 0.
Predicting the interaction behavior occurrence probability between the target sample vector (the sample vector with the highest similarity) and the sample resource vector can improve the recall rate of the first models for the sample resource features.
In step S26, training a plurality of first models according to the estimated occurrence probability of the interactive behavior and the label information corresponding to the sample features until a preset training end condition is satisfied, to obtain a plurality of feature selection networks.
Training the first models refers to the process of adjusting the parameters of the first models. Each sample feature is input into the first models, and the model parameters are updated once according to the estimated interaction behavior occurrence probability and the label information; after this process has been iterated over many sample features, the model parameters of the first models reach their optimum and training ends. The preset training end condition may include the number of training iterations reaching a preset value, the first models converging, or the loss of the first models satisfying a preset condition.
Specifically, a loss value is determined according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features; the model parameters of the first model corresponding to the target sample vector are updated according to the loss value, and the training of the multiple first models ends when the loss values meet a preset condition; the trained first models are taken as the feature selection networks, and the model parameters of each trained first model are taken as the weight coefficients of the corresponding feature selection network.
In this embodiment, only the model parameters of the first model corresponding to the target sample vector are updated in one training process. When the loss value of a certain first model, for example, the first model a1, satisfies the preset condition, the training of the first model a1 is ended, and the training of other first models is continued. And when the loss values of all the first models meet the preset condition, finishing the training. And taking the plurality of first models after finishing training as a plurality of feature selection networks. The preset condition may be that the loss value is not greater than a threshold value, for example, 0.2, 0.3, or the change rate of the loss value is not greater than a specific threshold value, which is not particularly limited in this embodiment.
The model parameters of a first model are the weight coefficients it has learned, such as the weight coefficients W in the excitation layer 302 of the first model 300. By training the N first models, N sets of weight coefficients can be determined. For example, taking N = 3 as an example, as shown in fig. 4, the network structures of the first model 401, the first model 402, and the first model 403 may be the same as that of the first model 300. An initial value may be randomly determined for the weight coefficients W in each first model. The N first models thus randomly determine N sets of weight coefficients W, and random initialization makes the initial values of the N first models different, which ensures the differences between them.
The sample object features X are input into the first model 401, the first model 402, and the first model 403 respectively, so as to obtain the sample vector X1, the sample vector X2, and the sample vector X3 output by the first model 401, the first model 402, and the first model 403 respectively. The weight coefficients in the first model 401, the first model 402, and the first model 403 are independent of each other, and weighting the sample object features from multiple aspects with different emphases yields multiple different vectors. The sample resource feature Y of one target resource is input into the DNN, which reduces the dimensionality of the sample resource feature Y to obtain the corresponding sample resource vector Y1. The similarities of the sample vector X1, the sample vector X2, and the sample vector X3 with the sample resource vector Y1 are calculated respectively, and the target sample vector with the highest similarity is determined. Assuming the sample vector X1 has the highest similarity, the sample vector X1 is taken as the target sample vector.
After the target sample vector is determined, the model parameters of the first model corresponding to the target sample vector (e.g., the first model 401) need to be updated. The initial values of the weight coefficients of the first model 401 are determined randomly, and when these randomly determined values are updated, they may be enlarged or reduced; for example, each update increases or decreases the original weight coefficient by 0.1, 0.05, and so on.
For example, when the weight coefficients of the first model 401 are updated, the loss value between the label information corresponding to the sample resource vector Y1 and the estimated interaction behavior occurrence probability computed from the sample vector X1 (output by the first model 401 before the update) and the sample resource vector Y1 may be calculated. If the loss value is positive, the weight coefficients may be reduced by a preset value, for example 0.05; if the loss value is negative, they may be increased by a preset value, for example 0.02.
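The following sketch illustrates one training step in which only the first model that produced the target sample vector is updated. It substitutes standard gradient descent on a binary cross-entropy loss for the fixed-step weight adjustment described above, and the model definitions, dimensions, and learning rate are assumptions for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, obj_dim, res_dim, out_dim = 3, 80, 16, 32

def make_first_model():
    # stand-in for one "first model" (e.g. the SENet-style network sketched earlier)
    return nn.Sequential(nn.Linear(obj_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

first_models = [make_first_model() for _ in range(N)]        # random initial parameters
resource_dnn = nn.Linear(res_dim, out_dim)                   # maps sample resource features
optimizers = [torch.optim.SGD(m.parameters(), lr=0.01) for m in first_models]
loss_fn = nn.BCEWithLogitsLoss()

def train_step(sample_object_feat, sample_resource_feat, label):
    resource_vec = resource_dnn(sample_resource_feat)             # sample resource vector
    sample_vecs = [m(sample_object_feat) for m in first_models]   # one sample vector per model
    sims = [F.cosine_similarity(v, resource_vec, dim=-1) for v in sample_vecs]
    winner = int(torch.argmax(torch.stack(sims)))   # model that gave the target sample vector
    # similarity used as the logit of the estimated interaction behavior occurrence probability
    loss = loss_fn(sims[winner], label)
    optimizers[winner].zero_grad()
    loss.backward()                                 # only the winning first model is updated
    optimizers[winner].step()
    return winner, float(loss)

print(train_step(torch.randn(1, obj_dim), torch.randn(1, res_dim), torch.ones(1)))
```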
It should be understood that the embodiment shown in fig. 2 is a process of training a model, the updating process may be performed in a loop before the training is finished, and the steps S22 to S26 may be a loop body, and the features of the sample target object input in the step S22 are updated each time the loop body is performed. Moreover, only the first model corresponding to the highest similarity is updated in each cycle, and the training processes of a plurality of first models can be differentiated, so that the difference between the trained first models is improved. When the vectors of the target object are output by using the trained first models, different first models can be guaranteed to be weighted for the characteristics of the target object from different angles and side points, then different vectors are output, and individuation and accuracy of the representation of the user interest can be improved.
For example, m sample features may be obtained in step S21. When the loop is executed for the first time, the sample object features in the 1st sample feature may be input in step S22; when the loop is executed for the second time, the sample object features in the 2nd sample feature may be input, and so on. When the estimated interaction behavior occurrence probability computed from the sample vector output by a first model (e.g., the first model 401) and the sample resource vector is the same as the label information corresponding to the sample resource vector, or the loss value between them satisfies the preset end condition, that first model is no longer updated. When the estimated interaction behavior occurrence probabilities corresponding to all the sample vectors output by the first models are the same as the label information corresponding to the sample resource vectors, or the loss values between them satisfy the preset end condition, execution of the loop stops, the finally updated N first models are obtained, and the training of the first models is complete.
In other embodiments, the preset training end condition may also be that the number of times of training of the plurality of first models reaches a preset value. For example, when the plurality of first models are trained 100 times, the training is terminated to obtain the final first model. As the number of training passes increases, the probability of model overfitting also increases. The training times are controlled within a certain range, so that the problem of model overfitting can be prevented. For example, the electronic device may also stop the loop when all the sample features are calculated, obtain the finally updated N first models, and complete the training. Or, the electronic device may set a threshold, and stop the loop when the number of times the loop body is executed reaches the threshold, so as to obtain the finally updated N first models. Or, the electronic device may calculate a distance between the updated weight coefficient of the first model and the weight coefficient before updating after each updating, stop updating the weight coefficient of the first model when the distance is smaller than a preset value, and complete updating to obtain N trained first models when the weight coefficients of all the first models meet the preset value.
If the initial values of the N first models are different, the sample vector X1, the sample vector X2, and the sample vector X3 output by the N first models for the first time are different. And in each round of circulation, only the first model corresponding to the highest similarity is updated, that is, the updating processes and the updating times of the N first models are also different, so that the weight coefficients of the N first models obtained after training are also different, and the difference between different vectors can be ensured. The trained N first models can ensure that the features of the target object are accurately described to obtain vectors, the difference between the vectors can be ensured, and the individuation of the vectors is improved.
The updated N first models may serve as N feature selection networks, and the N feature selection networks may be combined into one machine learning model. When resource recall is required, this machine learning model can be used to obtain the plurality of vectors of the target object: the features of the target object are input into the model, the N feature selection networks apply their N sets of weight coefficients to those features, and N weighted vectors are output.
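Continuing the same hypothetical sketch, the N trained feature selection networks could be wrapped into a single model that maps the features of the target object to N weighted vectors; MultiVectorModel is an assumed name, not one used by the disclosure.

```python
import torch.nn as nn

class MultiVectorModel(nn.Module):
    """Hypothetical wrapper: N trained feature selection networks combined into one
    machine learning model that outputs N vectors for a target object."""
    def __init__(self, feature_selection_networks):
        super().__init__()
        self.nets = nn.ModuleList(feature_selection_networks)

    def forward(self, target_features):
        # Each network applies its own set of weight coefficients to the same features.
        return [net(target_features) for net in self.nets]

# Usage sketch (names assumed):
#   model = MultiVectorModel(trained_first_models)
#   vectors = model(target_object_features)   # N weighted vectors of the target object
```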
In some embodiments, multiple vectors of the target object may be obtained by other means as well. For example, as shown in fig. 5, obtaining a plurality of vectors of the target object may include step S51 and step S52.
In step S51, the features of the target object are clustered to obtain a plurality of class clusters.
The features of the target object may be clustered by a clustering model. The features of the target object may include multiple types, for example object attribute features (such as gender and age) and target resource attribute features (such as the target resource type and the target resource commodity type). The clustering model may calculate the similarity between each feature of the target object and the other features; if the similarity satisfies a preset requirement, the two corresponding features are placed in the same class cluster. If the similarity between two features does not satisfy the preset requirement, the two features are placed in different class clusters. For example, when the similarity between feature A and feature B of the target object does not satisfy the preset requirement, feature A may be assigned to class cluster 1 and feature B to class cluster 2.
In step S52, the weight coefficients of the features of the target object are determined according to the weight coefficients corresponding to the class clusters, and a vector of the target object is obtained for each class cluster.
For example, a weight coefficient may be predetermined for each class cluster, and the features of the target object inside a class cluster take that cluster's weight coefficient. The features of the target object outside the class cluster may take a preset value, for example 0 or 0.5, as their weight coefficient, so that one vector is obtained per class cluster. Suppose the weight coefficient corresponding to class cluster 1 is 1 and class cluster 1 contains feature 1, feature 2, feature 3 and feature 4 of the target object. The weight coefficient of each feature in class cluster 1 is then 1, and the weight coefficient of every feature not in class cluster 1 may be a specific value, for example 0, so that the weight coefficient of every feature of the target object is determined. After the weight coefficients are determined, the weighted features of the target object can be converted into a vector, giving the vector corresponding to class cluster 1. Similarly, one vector is obtained for each class cluster; for example, if the clustering model divides the features of the target object into 5 class clusters, 5 vectors of the target object are obtained.
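The disclosure does not name a specific clustering model, so the sketch below uses scikit-learn's KMeans as a stand-in and assumes each feature of the target object has an embedding used for clustering; the weight coefficient inside a class cluster is taken as 1 and outside as 0, matching the example above. All names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_vectors(feature_values, feature_embeddings, n_clusters=5,
                          in_weight=1.0, out_weight=0.0):
    """Hypothetical sketch of steps S51/S52: group the target object's features into
    class clusters, then build one weighted vector of the target object per cluster.

    feature_values:      (d,) array, the value of each feature of the target object
    feature_embeddings:  (d, e) array, one embedding per feature, used only for clustering
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feature_embeddings)
    vectors = []
    for c in range(n_clusters):
        # Features inside class cluster c keep the cluster's weight coefficient;
        # features outside the cluster take a preset value (e.g. 0).
        weights = np.where(labels == c, in_weight, out_weight)
        vectors.append(weights * feature_values)
    return vectors   # one vector of the target object per class cluster
```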
In the above embodiment, the clustering model can divide the features of the target object into a plurality of class clusters without supervision, which is more conducive to discovering user interests, helps slow down the convergence of the user vectors, and improves the resource recall rate. In addition, the clustering process requires no human participation, which improves efficiency and saves labor and time.
With continued reference to fig. 1, in step S14, candidate resources with correlations meeting the preset requirements are respectively retrieved for each vector, and a data recall result of the target object is determined according to the candidate resources corresponding to each vector in the plurality of vectors.
Relevance may refer to the estimated probability of occurrence of an interaction between a vector and a candidate resource.
The electronic device may store advertisements provided by advertisers in an advertisement library. All advertisements in the advertisement library can serve as candidate resources, such as clothing advertisements and daily-necessities advertisements. The features of a candidate resource may include the advertisement content, the advertisement type (e.g., video, picture, text), the advertisement duration, the advertised commodity type (e.g., clothes, daily necessities), the advertised commodity price, and so on, which is not limited in this embodiment.
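As a purely illustrative sketch (the disclosure does not fix a particular encoding), candidate-resource features such as the advertisement type or commodity type could be turned into a resource vector by one-hot encoding the categorical fields and appending the numeric ones; the field names and vocabularies below are assumptions.

```python
import numpy as np

AD_TYPES = ["video", "picture", "text"]             # assumed vocabulary
COMMODITY_TYPES = ["clothes", "daily_necessities"]  # assumed vocabulary

def encode_candidate_resource(ad_type, commodity_type, duration_s, price):
    """Hypothetical encoding of candidate-resource features into a resource vector."""
    def one_hot(value, vocab):
        vec = np.zeros(len(vocab))
        if value in vocab:
            vec[vocab.index(value)] = 1.0
        return vec
    return np.concatenate([
        one_hot(ad_type, AD_TYPES),
        one_hot(commodity_type, COMMODITY_TYPES),
        np.array([duration_s, price], dtype=float),  # numeric features appended as-is
    ])

# Example: a 30-second video advertisement for clothes priced at 199.
# resource_vector = encode_candidate_resource("video", "clothes", 30, 199)
```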
The features of each candidate resource can be converted into a vector to obtain the resource vector corresponding to that candidate resource. Then, for one vector of the target object, the estimated interaction behavior occurrence probability between that vector and each resource vector can be calculated, and the candidate resources whose estimated probability meets the preset requirement are selected as the target resources corresponding to that vector. After the target resources corresponding to each vector are calculated, the target resources corresponding to the plurality of vectors may be merged, and the merged result is used as the data recall result of the target object. For example, for N vectors of the target object, each vector is scored against the resource vectors of all candidate resources in the advertisement library to obtain estimated interaction behavior occurrence probabilities, the target resources corresponding to each vector are selected, the target resources corresponding to the N vectors are merged, and the merged result is the data recall result corresponding to the target object. In other words, as long as the estimated interaction probability between a candidate resource and any vector of the target object meets the preset requirement, that candidate resource can appear in the data recall result of the target object.
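A minimal sketch of the union-based recall described above, assuming the estimated interaction behavior occurrence probability is the sigmoid of the dot product between a vector of the target object and a resource vector, and that a fixed threshold plays the role of the preset requirement; all names are illustrative.

```python
import numpy as np

def recall_by_union(object_vectors, resource_vectors, threshold=0.5):
    """For each vector of the target object, keep the candidate resources whose
    estimated interaction probability meets the threshold, then merge the results."""
    recalled = set()
    for v in object_vectors:                                    # N vectors of the target object
        scores = 1.0 / (1.0 + np.exp(-(resource_vectors @ v)))  # estimated probabilities
        recalled |= set(np.flatnonzero(scores >= threshold).tolist())
    return recalled                                             # indices of recalled candidate resources
```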
In the embodiment of the present application, a plurality of vectors are computed from the features of the target object. These vectors can represent the features of the user (that is, the object) from multiple aspects, so the user is portrayed more comprehensively, the advertisement recall result obtained with the plurality of vectors is more complete, and the advertisement recall rate can be improved. For long-tail advertisements in particular, the probability of being matched into the recall result increases significantly, so the improvement in recall rate is even more pronounced.
For example, for one vector of the target object, the estimated interaction behavior occurrence probability between that vector and the resource vector of each candidate resource may be calculated, and the candidate resources whose estimated probability meets the preset requirement are selected as the target resources corresponding to that vector. After the target resources corresponding to each vector of the target object are obtained, the intersection of the target resources corresponding to the vectors may be computed and used as the data recall result of the target object. Taking the intersection of the target resources recalled by the individual vectors makes the recall result more precise, reduces the computing resources required for subsequent processing of the recall result, and helps improve the efficiency of promoting resources or data such as advertisements.
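For completeness, the intersection variant described in this paragraph could be sketched as follows, under the same assumptions as the union sketch above.

```python
import numpy as np

def recall_by_intersection(object_vectors, resource_vectors, threshold=0.5):
    """Keep only the candidate resources recalled by every vector of the target object."""
    per_vector = []
    for v in object_vectors:
        scores = 1.0 / (1.0 + np.exp(-(resource_vectors @ v)))
        per_vector.append(set(np.flatnonzero(scores >= threshold).tolist()))
    return set.intersection(*per_vector) if per_vector else set()
```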
FIG. 6 is a block diagram illustrating a data determination device according to an example embodiment. Referring to fig. 6, the data determination apparatus 600 includes: a data acquisition module 601, a feature extraction module 602, a vector expression module 603 and a recall result determination module 604. The data determination apparatus 600 may be configured to perform the data determination method, for example, the data acquisition module 601 may be configured to perform the step S11, the feature extraction module 602 may be configured to perform the step S12, the vector expression module 603 may be configured to perform the step S13, and the recall result determination module 604 may be configured to perform the step S14.
Specifically, the data acquisition module 601 is configured to acquire interaction information of the target object for the target resource. The feature extraction module 602 is configured to determine features of the target object according to the interaction information, the features of the target object including features of multiple dimensions. The vector expression module 603 is configured to determine a plurality of sets of weight coefficients, each set of weight coefficients comprising a plurality of weight coefficients corresponding to the features of the plurality of dimensions, and to weight each set of weight coefficients with the features of the target object to obtain a plurality of vectors of the target object, wherein each vector of the target object corresponds to one set of weight coefficients. The recall result determining module 604 is configured to retrieve, for each vector of the target object, candidate resources whose correlation meets the preset requirement, and to determine the data recall result corresponding to the target object according to the candidate resources corresponding to each of the plurality of vectors of the target object.
In one embodiment, the vector expression module 603 is further configured to input the features of the target object into a plurality of preset feature selection networks, each feature selection network includes a set of weight coefficients, and the weight coefficients of the feature selection networks are used to weight the features of the target object, so as to obtain a vector output by each feature selection network.
In one embodiment, the data determination apparatus 600 further comprises: a training data determining module configured to obtain sample features and label information corresponding to the sample features, where the sample features comprise sample object features and sample resource features, and the label information represents whether an interaction behavior exists between the sample object corresponding to the sample object features and the sample resource corresponding to the sample resource features; a first model training module configured to input the sample object features in the sample features into a plurality of first models to obtain a sample vector output by each first model; an object vector determination module configured to determine the sample resource vector corresponding to the sample resource features in the sample features; a similarity calculation module configured to calculate the similarity between each sample vector and the sample resource vector and determine the target sample vector with the highest similarity; an interaction prediction module configured to determine the estimated interaction behavior occurrence probability according to the target sample vector and the sample resource vector; and a model training ending module configured to train the plurality of first models according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features until a preset training end condition is met, so as to obtain the plurality of feature selection networks.
In one embodiment, the model training ending module comprises: a loss determining module configured to determine a loss value according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features; to update the model parameters of the first model corresponding to the target sample vector according to the loss value, and to end the training of the plurality of first models when the loss values meet a preset end condition; and to take each first model after training as a feature selection network, with the model parameters of the trained first model as the weight coefficients of that feature selection network.
In one embodiment, the data determination apparatus 600 further comprises: an initialization module configured to randomly determine initial values of model parameters of a plurality of first models.
In one embodiment, the vector expression module 603 comprises: a clustering module configured to cluster the features of the target object to obtain class clusters; and a vector determining module configured to determine the weight coefficients of the features of the target object according to the weight coefficients corresponding to the class clusters, so as to obtain the vector corresponding to each class cluster.
In one embodiment, the vector expression module 603 comprises: a feature input module configured to input the features of the target object into a feature selection network, where the feature selection network comprises a compression layer, an excitation layer and an output layer; a feature compression module configured to compress the dimensions of the features of the target object through the compression layer to obtain one-dimensional features of the target object; a weighting module configured to weight the one-dimensional features of the target object through the weight coefficients contained in the excitation layer to obtain the weighted features of the target object; and a vector output module configured to reduce the dimension of the weighted features of the target object through the output layer to obtain a vector of the target object.
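A hypothetical PyTorch sketch of the compression / excitation / output structure described above; the layer sizes and the use of a linear layer for compression are assumptions, since the disclosure does not specify them.

```python
import torch
import torch.nn as nn

class FeatureSelectionNetwork(nn.Module):
    """Hypothetical sketch of the compression / excitation / output structure."""
    def __init__(self, n_features, embed_dim, out_dim):
        super().__init__()
        self.compress = nn.Linear(embed_dim, 1)             # compression layer
        self.excite = nn.Parameter(torch.rand(n_features))  # excitation-layer weight coefficients
        self.output = nn.Linear(n_features, out_dim)        # output layer

    def forward(self, features):                        # features: (n_features, embed_dim)
        squeezed = self.compress(features).squeeze(-1)   # one-dimensional features, (n_features,)
        weighted = squeezed * self.excite                # weighted by the excitation coefficients
        return self.output(weighted)                     # vector of the target object, (out_dim,)
```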
In one embodiment, the interaction information includes one or more of object attribute information, object behavior information, resource attribute information, resource behavior information, or object resource interaction information.
With regard to the data determination apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the data determination method, and will not be elaborated here.
The present disclosure also provides an electronic device, which is applicable to perform the above data determination method. Exemplarily, fig. 7 is a schematic structural diagram of an electronic device 700 provided by the present disclosure. As shown in fig. 7, the electronic device 700 may include at least one processor 701 and a memory 703 for storing instructions executable by the processor 701. Wherein the processor 701 is configured to execute instructions in the memory 703 to implement the data determination method in the above-described embodiments.
Additionally, electronic device 700 may include a communication bus 702 and at least one communication interface 704.
The processor 701 may be a GPU, a micro-processing unit, an ASIC, or one or more integrated circuits for controlling the execution of programs in accordance with the disclosed aspects.
The communication bus 702 may include a path that conveys information between the aforementioned components.
Communication interface 704, using any transceiver or the like, may be used to communicate with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 703 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be a stand-alone device connected to the processor through the bus, or it may be integrated with the processor, for example as volatile storage within the GPU.
The memory 703 is used for storing instructions for executing the disclosed solution, and is controlled by the processor 701. The processor 701 is configured to execute instructions stored in the memory 703 to implement the functions of the data determination method of the present disclosure.
In particular implementations, processor 701 may include one or more GPUs, such as GPU0 and GPU1 in fig. 7, as one embodiment.
In a particular implementation, as an embodiment, the electronic device 700 may include multiple processors, such as the processor 701 and the processor 707 in fig. 7. Each of these processors may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, electronic device 700 may also include an output device 705 and an input device 706, as one embodiment. An output device 705 is in communication with the processor 701 and may display information in a variety of ways. For example, the output device 705 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 706 communicates with the processor 701 and may accept input from a user in a variety of ways. For example, the input device 706 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device 700 and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
The present disclosure also provides a computer-readable storage medium having instructions stored thereon, which, when executed by a processor of a server, enable the server to perform the data determination method provided by the embodiment of the present disclosure.
The embodiment of the present disclosure further provides a computer program product containing instructions, which when run on a server, causes the server to execute the data determination method provided by the embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for determining data, comprising:
acquiring interactive information of a target object aiming at a target resource;
determining the characteristics of the target object according to the interaction information; the features of the target object comprise features of multiple dimensions;
determining a plurality of sets of weight coefficients, each set of weight coefficients comprising a plurality of weight coefficients corresponding to features of the plurality of dimensions; weighting each group of weight coefficients and the characteristics of the target object to obtain a plurality of vectors of the target object; wherein each vector of the target object corresponds to a set of weight coefficients;
and searching candidate resources with the correlation meeting the preset requirement for each vector of the target object, and determining a data recall result corresponding to the target object according to the candidate resources corresponding to each vector in the plurality of vectors of the target object.
2. The method of claim 1, wherein weighting each set of weighting coefficients with the feature of the target object to obtain a plurality of vectors of the target object comprises:
inputting the characteristics of the target object into a plurality of preset characteristic selection networks, wherein each characteristic selection network comprises a group of weight coefficients, and weighting the characteristics of the target object through the weight coefficients of the characteristic selection networks to obtain the vector of the target object output by each characteristic selection network.
3. The method of claim 2, further comprising:
acquiring sample characteristics and label information corresponding to the sample characteristics; the sample characteristics comprise sample object characteristics and sample resource characteristics; the label information is used for representing whether an interaction behavior exists between the sample object corresponding to the sample object characteristic and the sample resource corresponding to the sample resource characteristic;
inputting sample object features in the sample features into a plurality of first models to obtain a sample vector output by each first model;
determining a sample resource vector corresponding to the sample resource feature in the sample features;
respectively calculating the similarity of each sample vector and the sample resource vector, and determining a target sample vector with the highest similarity;
determining the occurrence probability of the pre-estimated interactive behavior according to the target sample vector and the sample resource vector;
and training the plurality of first models according to the estimated interactive behavior occurrence probability and the label information corresponding to the sample characteristics until a preset training end condition is met, and obtaining the plurality of characteristic selection networks.
4. The method of claim 3, wherein training the plurality of first models according to the estimated interaction behavior occurrence probability and the label information corresponding to the sample features until a preset training end condition is met to obtain the plurality of feature selection networks comprises:
determining a loss value according to the estimated interactive behavior occurrence probability and the label information corresponding to the sample characteristics;
updating model parameters in the first models corresponding to the target sample vectors according to the loss values until the loss values meet preset end conditions, and ending the training of the plurality of first models;
and taking the first model after training as the feature selection network, wherein the model parameters of the first model after training are the weight coefficients of the feature selection network.
5. The method of claim 3, wherein prior to inputting sample object features of the sample features into a plurality of first models, the method further comprises:
initial values of model parameters of a plurality of the first models are randomly determined.
6. The method of claim 1, wherein weighting each set of weighting coefficients and features of the target object to obtain a plurality of vectors of the target object comprises:
clustering the characteristics of the target object to obtain a clustered class;
and determining the weight coefficient of the characteristics of the target object according to the weight coefficient corresponding to the class cluster so as to obtain the vector of the target object corresponding to each class cluster.
7. A data determination apparatus, comprising:
the data acquisition module is configured to acquire the interaction information of the target object aiming at the target resource;
a feature extraction module configured to determine features of the target object according to the interaction information, the features of the target object including features of multiple dimensions;
a vector representation module configured to determine a plurality of sets of weight coefficients, each set of weight coefficients comprising a plurality of weight coefficients corresponding to features of the plurality of dimensions; weighting each group of weight coefficients and the characteristics of the target object to obtain a plurality of vectors of the target object; wherein each vector of the target object corresponds to a set of weight coefficients;
and the recall result determining module is configured to search a plurality of candidate resources with correlation meeting preset requirements for each vector of the target object, and determine a data recall result corresponding to the target object according to the candidate resource corresponding to each vector in the plurality of vectors of the target object.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data determination method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data determination method of any of claims 1 to 6.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the data determination method of any of claims 1 to 6.
CN202111565403.7A 2021-12-20 2021-12-20 Data determination method and device, electronic equipment and storage medium Pending CN114330519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111565403.7A CN114330519A (en) 2021-12-20 2021-12-20 Data determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111565403.7A CN114330519A (en) 2021-12-20 2021-12-20 Data determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114330519A true CN114330519A (en) 2022-04-12

Family

ID=81052957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111565403.7A Pending CN114330519A (en) 2021-12-20 2021-12-20 Data determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114330519A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618748A (en) * 2022-11-29 2023-01-17 支付宝(杭州)信息技术有限公司 Model optimization method, device, equipment and storage medium
CN115618748B (en) * 2022-11-29 2023-05-02 支付宝(杭州)信息技术有限公司 Model optimization method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111538912B (en) Content recommendation method, device, equipment and readable storage medium
US20220405607A1 (en) Method for obtaining user portrait and related apparatus
CN110489582B (en) Method and device for generating personalized display image and electronic equipment
CN109783730A (en) Products Show method, apparatus, computer equipment and storage medium
CN111784455A (en) Article recommendation method and recommendation equipment
CN109471978B (en) Electronic resource recommendation method and device
CN110008397B (en) Recommendation model training method and device
US20210073890A1 (en) Catalog-based image recommendations
CN113379449B (en) Multimedia resource recall method and device, electronic equipment and storage medium
CN111292168B (en) Data processing method, device and equipment
CN112905897B (en) Similar user determination method, vector conversion model, device, medium and equipment
CN112801425B (en) Method and device for determining information click rate, computer equipment and storage medium
CN111967924A (en) Commodity recommendation method, commodity recommendation device, computer device, and medium
CN111861605A (en) Business object recommendation method
US11823217B2 (en) Advanced segmentation with superior conversion potential
CN115982463A (en) Resource recommendation method, device, equipment and storage medium
US20240214616A1 (en) Machine learning techniques for advanced frequency management
CN114862480A (en) Advertisement putting orientation method and its device, equipment, medium and product
CN114330519A (en) Data determination method and device, electronic equipment and storage medium
CN112036987B (en) Method and device for determining recommended commodity
CN113327132A (en) Multimedia recommendation method, device, equipment and storage medium
CN112115354A (en) Information processing method, information processing apparatus, server, and storage medium
AU2019200721B2 (en) Online training and update of factorization machines using alternating least squares optimization
CN112989174A (en) Information recommendation method and device, medium and equipment
CN111768218A (en) Method and device for processing user interaction information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination