CN114385917A

CN114385917A - Data processing method, device and equipment

Info

Publication number: CN114385917A
Application number: CN202210039937.4A
Authority: CN
Inventors: 王新民; 胡伟龙; 张伟锋; 谭莲芝; 袁镱; 潘欣
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2022-04-22

Abstract

The application discloses a data processing method, apparatus, and device, whose embodiments can be applied to scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving. The method includes: acquiring a plurality of features to be processed, the features to be processed including object features of a target object and resource features of a target resource to be pushed to the target object; performing outer product processing on the plurality of features to be processed so as to perform feature interaction among them and obtain a feature interaction vector corresponding to each feature to be processed, the feature interaction vector of each feature to be processed being a high-order semantic feature of that feature; and performing click rate estimation on the feature interaction vectors of the features to be processed to obtain an estimated click rate of the target object for the target resource, the estimated click rate indicating the probability that the target object clicks the target resource. The method can improve the accuracy of estimating the click rate of the target object for the target resource.

Description

Data processing method, device and equipment

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a data processing method, apparatus, device, storage medium, and computer program product.

Background

Click-through rate (CTR) prediction is a technique for predicting the probability that a user clicks on a resource. Click rate estimation is an important part of personalized pushing: whether a resource is pushed to a user can be decided according to the estimated probability that the user clicks the resource. The accuracy of click rate estimation therefore directly affects the accuracy of personalized pushing, and in turn the user experience and the exposure of resources. How to improve the accuracy of click rate estimation is thus a current research focus.

Disclosure of Invention

The embodiments of the present application provide a data processing method, a data processing apparatus, a data processing device, a storage medium, and a computer program product, which can improve the accuracy of estimating the click rate of a target object for a target resource.

In one aspect, an embodiment of the present application provides a data processing method, including:

acquiring a plurality of features to be processed, wherein the plurality of features to be processed include object features of a target object and resource features of a target resource to be pushed to the target object;

performing outer product processing on the plurality of features to be processed so as to perform feature interaction among them and obtain a feature interaction vector corresponding to each feature to be processed, wherein the feature interaction vector corresponding to each feature to be processed is a high-order semantic feature of that feature; and

performing click rate estimation on the feature interaction vectors corresponding to the features to be processed to obtain an estimated click rate of the target object for the target resource, wherein the estimated click rate indicates the probability that the target object clicks the target resource.
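The third step turns the interaction vectors into a probability. This part of the application does not fix the form of the estimation head, so the following is only a minimal sketch assuming a logistic-regression head over the concatenated feature interaction vectors; the function `estimate_ctr` and its `weights` and `bias` parameters are hypothetical.

```python
import math
from typing import Sequence


def estimate_ctr(interaction_vectors: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Hypothetical click rate head: logistic regression over the
    concatenated feature interaction vectors. Returns a value in [0, 1]."""
    # Concatenate [v_1, ..., v_N] into one flat input vector.
    x = [value for vec in interaction_vectors for value in vec]
    if len(x) != len(weights):
        raise ValueError("weights must match the concatenated vector length")
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    # Numerically stable sigmoid: avoids exp() overflow for large |logit|.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Two toy interaction vectors of dimension 2 and an all-ones weight vector.
    p = estimate_ctr([[0.5, -0.2], [0.1, 0.3]], [1.0, 1.0, 1.0, 1.0])
    print(f"estimated click rate: {p:.3f}")
```

A sigmoid output is a common choice here because it maps any real-valued score into [0, 1], which matches reading the estimated click rate as a probability.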

In one aspect, an embodiment of the present application provides a data processing apparatus, including:

an acquisition unit, configured to acquire a plurality of features to be processed, wherein the plurality of features to be processed include object features of a target object and resource features of a target resource to be pushed to the target object;

a processing unit, configured to perform outer product processing on the plurality of features to be processed so as to perform feature interaction among them and obtain a feature interaction vector corresponding to each feature to be processed, wherein the feature interaction vector corresponding to each feature to be processed is a high-order semantic feature of that feature;

the processing unit being further configured to perform click rate estimation on the feature interaction vector corresponding to each feature to be processed to obtain an estimated click rate of the target object for the target resource, wherein the estimated click rate indicates the probability that the target object clicks the target resource.

In one aspect, an embodiment of the present application provides a data processing device, where the data processing device includes an input interface and an output interface, and further includes:

a processor adapted to implement one or more instructions; and

a computer storage medium storing one or more instructions adapted to be loaded by the processor to execute the above data processing method.

In one aspect, an embodiment of the present application provides a computer storage medium storing computer program instructions which, when executed by a processor, are used to execute the above data processing method.

In one aspect, an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium; a processor of the data processing device reads the computer instructions from the computer-readable storage medium and executes them, and the computer instructions, when executed by the processor, are used to execute the above data processing method.

In the embodiments of the present application, after acquiring a plurality of features to be processed, including resource features of a target resource and object features of a target object, the data processing device may perform outer product processing on these features so as to perform feature interaction among them and obtain a feature interaction vector corresponding to each feature to be processed; it then performs click rate estimation on these feature interaction vectors to obtain the estimated click rate of the target object for the target resource, that is, the probability that the target object clicks the target resource. Because the outer product processing allows the high-order semantic features of the features to be processed to be learned, the accuracy of estimating the click rate of the target object for the target resource can be improved, and the target object can efficiently obtain information of interest.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a block diagram of a data processing system according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of obtaining the feature interaction vectors corresponding to the features to be processed according to an embodiment of the present application;

FIG. 4 is a schematic diagram of the correspondence between projection matrices, scaling weights, and features to be processed according to an embodiment of the present application;

FIG. 5 is a schematic diagram of obtaining the feature interaction vector corresponding to the nth feature to be processed according to an embodiment of the present application;

FIG. 6 is a schematic diagram of click rate estimation performed based on a data processing model according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of a method for training a data processing model according to an embodiment of the present application;

FIG. 8 is a schematic overall flowchart of training an initial image processing model according to an embodiment of the present application;

FIG. 9 is a schematic diagram of training an initial image processing model according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

To improve the accuracy of click rate estimation, the embodiments of the present application provide a data processing scheme based on machine learning in the field of artificial intelligence. After acquiring a plurality of features to be processed, including resource features of a target resource and object features of a target object, the scheme performs outer product processing on them to obtain a feature interaction vector corresponding to each feature to be processed, and then performs click rate estimation on these feature interaction vectors to obtain the estimated click rate of the target object for the target resource.

The data processing scheme can be executed by a data processing device, which may be a terminal device or a server. The terminal device may include, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, a smart wearable device, and the like; the server may be, for example, an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The data processing scheme can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving.

Based on the above data processing scheme, an embodiment of the present application provides a data processing system; FIG. 1 is a schematic structural diagram of this data processing system. The data processing system shown in FIG. 1 may include a data processing device 101 and a terminal device 102. The data processing device 101 may be a server, for example, an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms; the terminal device 102 may include any one or more of a smartphone, a tablet computer, a laptop, a desktop computer, a smart vehicle, and a smart wearable device. The data processing device 101 and the terminal device 102 may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.

In one embodiment, a target application may run on the terminal device 102. The target application may be any application capable of providing a resource pushing service, for example, a social application, a news application, a music application, or a video playing application. The target application may provide the resource pushing service to the target object corresponding to the terminal device 102, and the resources it pushes are related to the services it provides: for example, a news application may push news, commentary articles, and the like to the target object, while a video playing application may push movies, variety shows, documentaries, and the like. The data processing device 101 is the data processing device corresponding to the target application and provides service support, specifically resource pushing support, for the target application. For example, if the target application is a news application, the data processing device 101 is the data processing device corresponding to the news application and provides service support for it; likewise, if the target application is a video playing application, the data processing device 101 is the data processing device corresponding to the video playing application and provides service support for it. The target object may be any user of the services provided by the target application.

In one embodiment, the target object may use the services provided by the target application through its terminal device 102, for example, by clicking to view a resource pushed by the target application. After acquiring a plurality of features to be processed, including resource features of a target resource and object features of the target object, the data processing device 101 may perform outer product processing on them to obtain a feature interaction vector corresponding to each feature to be processed, and then perform click rate estimation on these feature interaction vectors to obtain the estimated click rate of the target object for the target resource. The data processing device 101 may then determine, according to the estimated click rate, whether to push the target resource to the target object; if it determines that the target resource should be pushed, it sends the target resource to the terminal device 102 so that the target object can click and view it. For example, if the target resource is an advertisement to be pushed in the target application, the objects to which the advertisement is pushed can be determined according to each object's estimated click rate for the advertisement, which can increase the exposure rate of the advertisement among the objects it is pushed to.

As another example, in an intelligent traffic scenario, the terminal device 102 may be a vehicle-mounted terminal and the target application may be an application that pushes traffic information. If the target resource is a piece of traffic information to be pushed, the data processing device 101 may determine, according to the estimated click rate of the target object for that traffic information, whether to push it to the vehicle-mounted terminal, so that the target object learns of the traffic information and the driving experience is improved. Further, the data processing device 101 may determine which traffic information to push according to the target object's estimated click rates for various pieces of traffic information; for example, the traffic information with the highest estimated click rate may be pushed to the target object, which further improves the driving experience.
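The push decisions in the examples above can be sketched as a threshold test and a ranking. Both the threshold value and the top-k selection are assumptions for illustration; the application only states that the decision is made according to the estimated click rate, and the helper names `should_push` and `top_k` are hypothetical.

```python
from typing import Dict, List


def should_push(estimated_ctr: float, threshold: float = 0.1) -> bool:
    """Hypothetical push rule: push only if the estimated click rate
    reaches a threshold chosen by the service."""
    return estimated_ctr >= threshold


def top_k(ctr_by_candidate: Dict[str, float], k: int) -> List[str]:
    """Rank candidates (objects for an advertisement, or pieces of traffic
    information for one driver) by estimated click rate, highest first.
    Ties are broken by name so the result is deterministic."""
    ranked = sorted(ctr_by_candidate.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    traffic = {"congestion_ahead": 0.42, "roadwork": 0.17, "rain_warning": 0.30}
    print(top_k(traffic, 1))                                 # ['congestion_ahead']
    print(should_push(traffic["roadwork"], threshold=0.2))   # False
```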

In one embodiment, the object features of the target object included in the plurality of features to be processed may be extracted by the data processing device 101 from raw object feature data that the target object generates in the target application and that is acquired from the terminal device 102; that is, the terminal device 102 may send the raw object feature data generated by the target object in the target application to the data processing device 101, which then extracts the object features from that data.

The present application involves data generated by the target object in the target application, such as the raw object feature data. When the embodiments of the present application are applied to specific products or technologies, user permission or consent is obtained, and the collection, use, and processing of the relevant data comply with local laws and regulations. For example, before acquiring data generated by the target object in the target application (such as the raw object feature data), the data processing device 101 may send an authorization agreement to the terminal device 102 of the target object; the data processing device 101 acquires the data only if the target object accepts the authorization agreement, and otherwise does not acquire it.
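The consent check described above can be sketched as a gate in front of data collection. This is a minimal illustration assuming a simple in-memory record of each object's answer to the authorization agreement; `ConsentRegistry` and `fetch_raw_object_data` are hypothetical names, not part of the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConsentRegistry:
    """Hypothetical record of whether each object accepted the
    authorization agreement sent to its terminal device."""
    answers: Dict[str, bool] = field(default_factory=dict)

    def record(self, object_id: str, agreed: bool) -> None:
        self.answers[object_id] = agreed

    def has_consent(self, object_id: str) -> bool:
        # An object that never answered is treated as not having agreed.
        return self.answers.get(object_id, False)


def fetch_raw_object_data(object_id: str,
                          registry: ConsentRegistry,
                          source: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the object's raw data only if consent was given; otherwise
    return None without reading the data source."""
    if not registry.has_consent(object_id):
        return None
    return source.get(object_id, [])
```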

Based on the above data processing scheme, an embodiment of the present application provides a data processing method. FIG. 2 is a schematic flowchart of this data processing method, which may be performed by a data processing device and may include the following steps:

S201, acquiring a plurality of features to be processed.

The plurality of features to be processed include object features of a target object, which may be any user of the services provided by a target application, and resource features of a target resource, which may be any resource that the target application can provide. Further, the target resource may be a resource to be pushed to the target object; that is, the plurality of features to be processed may include the object features of the target object and the resource features of the target resource to be pushed to the target object. For example, if the target application is a video playing application, the target resource may be a movie that the target application is to push to the target object; the click rate of the target object for the target resource indicates the probability that the target object clicks the target resource.

In one embodiment, the resource features of the target resource describe the target resource, for example, its resource identifier, its name, its resource type, and so on. The object features of the target object describe the target object, for example, its User Identification (UID), nickname, age, resident city, and so on. The object features may also describe the context in which the target object is located. For example, they may include page information describing the target page at which the target object is currently located in the target application, such as the page identifier and page type of the target page. As another example, they may include access information describing the pages the target object accessed in the target application before reaching the target page; this access information may include the page information of a preset number of pages accessed before the target page, together with the order in which those pages were accessed. The preset number is set according to specific requirements: the access information may cover, for example, all pages the target object accessed before the target page during a single use of the target application, or only several of those pages. A single use of the target application refers to the period from when the target object opens the target application until it closes it.
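The access-information feature can be sketched as a window over the pages visited during one use of the application. This assumes the session is available as an ordered list of page identifiers and that the access order is the 1-based position within the session; `context_pages` and `preset_number` are hypothetical names.

```python
from typing import List, Optional, Tuple


def context_pages(session_pages: List[str],
                  target_page: str,
                  preset_number: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return (access_order, page_id) for the pages visited before the
    target page within one use of the application (open -> close).

    session_pages lists page ids in visiting order; the last occurrence of
    target_page is taken as the current page. preset_number=None keeps every
    earlier page; otherwise only the most recent preset_number pages are kept.
    Raises ValueError if target_page was never visited.
    """
    current = len(session_pages) - 1 - session_pages[::-1].index(target_page)
    earlier = session_pages[:current]
    if preset_number is not None:
        earlier = earlier[max(0, len(earlier) - preset_number):]
    start = current - len(earlier)  # 0-based index of the first kept page
    # Access order is the 1-based position of each page within the session.
    return [(start + i + 1, page) for i, page in enumerate(earlier)]
```

For a session `["home", "news", "sports", "video"]` with target page `"video"` and `preset_number=2`, the sketch keeps `[(2, "news"), (3, "sports")]`.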

In one embodiment, the resource features of the target resource and the object features of the target object used for click rate estimation may differ between service scenarios. The embodiments of the present application do not limit their specific content, and any estimation of the click rate of a target object for a target resource implemented based on the method provided herein falls within the protection scope of the embodiments of the present application. For example, in a movie recommendation scenario, the click rate of the target object for a target movie (i.e., the target resource), that is, the probability that the target object clicks the target movie, needs to be estimated based on the target object's historical scores for movies. In this case, the resource features of the target resource may indicate the movie identifier, movie name, movie type, and so on of the target movie; the object features of the target object may describe the UID, nickname, age, resident city, and so on of the target object, and may further describe the target object's historical scores for the movies it has scored, together with the movie information (e.g., movie identifier, movie name, movie type) of those scored movies.
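For the movie recommendation example, assembling the raw features to be processed can be sketched as below. The field names are hypothetical placeholders chosen to mirror the text, and treating each (field, value) pair as one feature to be processed is an assumption made here for illustration; it is one way to arrive at the count N used later in step S202.

```python
from typing import Dict, List, Tuple, Union

# A raw value is a scalar attribute or, for rating history, a list of
# (movie_id, score) pairs.
RawValue = Union[str, int, List[Tuple[str, float]]]

# Hypothetical field lists for the movie recommendation example; the
# application explicitly leaves the concrete feature content open.
RESOURCE_FIELDS = ["movie_id", "movie_name", "movie_type"]
OBJECT_FIELDS = ["uid", "nickname", "age", "resident_city", "rated_movies"]


def movie_scene_features(movie: Dict[str, RawValue],
                         user: Dict[str, RawValue]) -> List[Tuple[str, RawValue]]:
    """Collect the raw features to be processed; each (field, value) pair
    becomes one feature to be processed, so N = len(result)."""
    features = [("resource." + name, movie[name]) for name in RESOURCE_FIELDS]
    features += [("object." + name, user[name]) for name in OBJECT_FIELDS]
    return features
```

Each raw value would still need to be vectorized (for categorical fields, e.g. by one-hot encoding) before the interaction step.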

In one embodiment, the resource features of the target resource may be extracted from raw resource feature data of the target resource, which may be uploaded to the data processing device by business personnel of the target application; the object features of the target object may be extracted from raw object feature data of the target object, which the data processing device may acquire from the terminal device of the target object. When the resource features are expressed as vectors, the data processing device needs to vectorize the raw resource feature data when extracting the resource features from it. For example, suppose the raw resource feature data indicates that the resource type of the target resource is resource type 1, i.e., it contains "resource type 1", and there are 3 resource types whose correspondence with the vector elements is {resource type 1, resource type 2, resource type 3}; then the resource feature corresponding to this raw data may be expressed as {1,0,0}. Likewise, when the object features are expressed as vectors, the data processing device needs to vectorize the raw object feature data when extracting the object features from it.
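The {1,0,0} example is one-hot encoding. A minimal sketch, assuming a fixed, ordered vocabulary per categorical field (the helper name `one_hot` is hypothetical):

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[int]:
    """Vectorize one categorical raw value as a one-hot vector whose
    element order follows `vocabulary`."""
    vector = [0] * len(vocabulary)
    # list.index raises ValueError if the value is not in the vocabulary.
    vector[list(vocabulary).index(value)] = 1
    return vector


RESOURCE_TYPES = ["resource type 1", "resource type 2", "resource type 3"]
print(one_hot("resource type 1", RESOURCE_TYPES))  # [1, 0, 0]
```

A production system would likely need a policy for values outside the vocabulary (for example, an extra "unknown" slot); the sketch simply raises ValueError.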

S202, performing outer product processing on the plurality of features to be processed so as to perform feature interaction among them and obtain a feature interaction vector corresponding to each feature to be processed.

The feature interaction vector corresponding to each feature to be processed is a high-order semantic feature of that feature.

In one embodiment, the number of features to be processed is N, where N is an integer greater than 1. Performing outer product processing on the plurality of features to be processed to obtain the feature interaction vector corresponding to each feature to be processed may include: traversing the N features to be processed and, for the nth feature to be processed, performing outer product processing on each of the N features to be processed and the nth feature in the target projection space corresponding to the nth feature, so that each feature to be processed interacts with the nth feature in that target projection space, thereby obtaining the feature interaction vector corresponding to the nth feature to be processed, where n is a positive integer less than or equal to N. FIG. 3 is a schematic diagram of obtaining the feature interaction vector corresponding to each feature to be processed according to an embodiment of the present application. Assume that N is 5, that is, there are 5 features to be processed: feature 1 to be processed through feature 5 to be processed. As denoted by 301, each feature to be processed interacts with every feature to be processed, yielding the feature interaction vectors corresponding to feature 1 to be processed, feature 2 to be processed, feature 3 to be processed, feature 4 to be processed, and feature 5 to be processed, respectively.
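The traversal can be sketched as follows. This section of the application does not spell out how the outer product matrix is reduced to a vector, so the sketch assumes one possible reduction: the outer product of e_n and e_i is multiplied element-wise by the projection matrix P[n][i], each row is summed, and the result is scaled by the scaling weight s[n][i]. It also assumes all features share one embedding dimension d. All names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(a: Vector, b: Vector) -> Matrix:
    """Outer product a b^T: entry [r][c] = a[r] * b[c]."""
    return [[x * y for y in b] for x in a]


def interaction_vector(n: int, feats: List[Vector],
                       projections: List[List[Matrix]],
                       scales: List[List[float]]) -> Vector:
    """Interaction vector of feature n (0-based) in its own projection space.

    Assumed form: v_n = sum_i scales[n][i] * rowsum(outer(e_n, e_i) * P[n][i]),
    where * is the element-wise product and P[n][i] = projections[n][i].
    """
    e_n = feats[n]
    d = len(e_n)
    v = [0.0] * d
    for i, e_i in enumerate(feats):       # every feature, the n-th included
        op = outer(e_n, e_i)
        p = projections[n][i]             # only space n's parameters are used
        for r in range(d):
            v[r] += scales[n][i] * sum(op[r][c] * p[r][c] for c in range(d))
    return v


def all_interaction_vectors(feats: List[Vector],
                            projections: List[List[Matrix]],
                            scales: List[List[float]]) -> List[Vector]:
    """Traverse the N features to be processed, one projection space each."""
    return [interaction_vector(n, feats, projections, scales)
            for n in range(len(feats))]
```

With identity projection matrices and unit scaling weights, this reduction collapses to the element-wise product of e_n with the sum of all features, which makes a quick sanity check: features [1, 2] and [3, 4] give [4.0, 12.0] and [12.0, 24.0].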

Each of the N features to be processed corresponds to one projection space, and the projection space corresponding to a given feature to be processed is used for feature interaction between every feature to be processed and that given feature. A set of projection matrices and a set of scaling weights are maintained in the projection space corresponding to a given feature to be processed. The set of projection matrices contains the projection matrices required to generate the feature interaction vector of the given feature by interacting each of the plurality of features to be processed with it, and the number of projection matrices equals the number of features to be processed; likewise, the set of scaling weights contains the scaling weights required for the same purpose, and the number of scaling weights equals the number of features to be processed.
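The bookkeeping described above (one projection space per feature, each holding as many projection matrices and scaling weights as there are features) can be sketched as a small data structure. The square dim × dim matrix shape, the Gaussian initialization, and the initial weight of 1.0 are assumptions; `ProjectionSpace` and `build_projection_spaces` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectionSpace:
    """Parameters maintained for ONE feature to be processed (hypothetical layout)."""
    matrices: List[List[List[float]]]   # N projection matrices, each dim x dim
    scaling_weights: List[float]        # N scaling weights


def build_projection_spaces(num_features: int, dim: int,
                            seed: int = 0) -> List[ProjectionSpace]:
    """Create one projection space per feature; each holds as many
    projection matrices and scaling weights as there are features.
    The initialization (Gaussian, std 0.1; weights 1.0) is an assumption."""
    rng = random.Random(seed)
    spaces = []
    for _ in range(num_features):
        matrices = [[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(dim)]
                    for _ in range(num_features)]
        spaces.append(ProjectionSpace(matrices, [1.0] * num_features))
    return spaces
```

In a trained model these parameters would be learned jointly with the click rate head rather than left at their initial values.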

Taking the nth of the N features to be processed as an example, a set of projection matrices and a set of scaling weights are maintained for the target projection space corresponding to the nth feature to be processed. The set of projection matrices contains N projection matrices, namely those required when each of the N features to be processed interacts with the nth feature to generate the feature interaction vector of the nth feature; the set of scaling weights likewise contains N scaling weights. The ith projection matrix and the ith scaling weight adjust the result of the interaction between the ith feature to be processed and the nth feature to be processed; adjusting the interaction result of every feature with the nth feature by its own projection matrix and scaling weight yields the feature interaction vector corresponding to the nth feature to be processed.
Furthermore, because each of the N features to be processed has its own projection space, a total of N × N projection matrices and N × N scaling weights are maintained across the N projection spaces.
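The parameter bookkeeping above can be sketched as follows. This is a minimal illustration in plain Python, assuming a hypothetical nested layout `params[n][i] = (W, alpha)` in which `W` is the d_i × d_n projection matrix used when feature i interacts with feature n in feature n's projection space and `alpha` is its scalar scaling weight. The function name, the random initialisation and the initial weight of 1.0 are assumptions, and indices in the code are 0-based whereas the text counts from 1.

```python
import random

def init_projection_spaces(dims, seed=0):
    """Hypothetical parameter layout: params[n][i] = (W, alpha).

    dims[i] is d_i, the embedding dimension of feature i (0-based here).
    W is the d_i x d_n projection matrix used when feature i interacts
    with feature n in feature n's projection space; alpha is its scalar
    scaling weight.
    """
    rng = random.Random(seed)
    num_features = len(dims)
    params = []
    for n in range(num_features):            # one projection space per feature n
        space = []
        for i in range(num_features):        # one (W, alpha) pair per partner feature i
            W = [[rng.gauss(0.0, 0.1) for _ in range(dims[n])]
                 for _ in range(dims[i])]    # d_i rows, d_n columns
            alpha = 1.0                      # initial value is an assumption
            space.append((W, alpha))
        params.append(space)
    return params

dims = [4, 3, 5, 2, 4]                       # N = 5 features, illustrative d_i values
params = init_projection_spaces(dims)
total_matrices = sum(len(space) for space in params)
total_weights = sum(1 for space in params for _ in space)
print(total_matrices, total_weights)         # 25 25, i.e. N x N each
```

Each projection space holds N (W, alpha) pairs, so N spaces hold N × N of each.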

In one embodiment, $F_{(\cdot)}$ may denote a feature to be processed; that is, $F_{(n)}$ denotes the nth and $F_{(i)}$ the ith of the N features to be processed. $W_i^{(n)}$ may denote the projection matrix used when the ith feature to be processed interacts with the nth feature to be processed in the target projection space corresponding to the nth feature to be processed, and $\alpha_i^{(n)}$ may denote the scaling weight used in that interaction. For example, fig. 4 is a schematic diagram of the correspondence between projection matrices, scaling weights and features to be processed according to an embodiment of the present application. Assuming that N = 5 and n = 2, in the projection space corresponding to the 2nd feature to be processed, the projection matrices used when each of the five features to be processed interacts with the 2nd feature are $W_1^{(2)}$, $W_2^{(2)}$, $W_3^{(2)}$, $W_4^{(2)}$ and $W_5^{(2)}$, and the corresponding scaling weights are $\alpha_1^{(2)}$, $\alpha_2^{(2)}$, $\alpha_3^{(2)}$, $\alpha_4^{(2)}$ and $\alpha_5^{(2)}$.
In one embodiment, the projection matrix used when the ith feature to be processed interacts with the nth feature to be processed in the target projection space corresponding to the nth feature has dimension $d_i \times d_n$, that is, it is a matrix with $d_i$ rows and $d_n$ columns, where $d_i$ is the dimension of the embedding vector corresponding to the ith feature to be processed and $d_n$ is the dimension of the embedding vector corresponding to the nth feature to be processed. The embedding vector corresponding to any of the N features to be processed is obtained by performing feature embedding processing on that feature. Feature embedding converts a high-dimensional sparse vector into a dense vector, which reduces the dimensionality of the features to be processed; because subsequent processing operates on the embedding vectors rather than on the raw features, the amount of data involved is reduced and processing resources are saved.
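The sparse-to-dense conversion performed by feature embedding can be illustrated with a one-hot input: multiplying a one-hot row vector by an embedding table simply selects one row of the table. The sketch below assumes a hypothetical table for a categorical feature with 6 possible values and embedding dimension d = 3; the table values and function names are illustrative only.

```python
def one_hot(index, vocab_size):
    """High-dimensional sparse representation of a categorical value."""
    return [1.0 if k == index else 0.0 for k in range(vocab_size)]

def embed(sparse, table):
    """Dense embedding = sparse (1 x V) row vector times table (V x d)."""
    d = len(table[0])
    return [sum(sparse[k] * table[k][j] for k in range(len(table)))
            for j in range(d)]

# hypothetical embedding table: V = 6 categorical values, d = 3
table = [[0.1 * (r + 1), 0.2 * (r + 1), -0.1 * r] for r in range(6)]
x = one_hot(4, 6)
e = embed(x, table)
print(e == table[4])  # True: a one-hot product just selects row 4
```

In practice the product is usually replaced by a direct row lookup, which gives the same dense vector without touching the zero entries.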

In a specific implementation, taking the feature interaction vector of the nth of the N features to be processed as an example, the data processing device performs outer product processing on each feature to be processed and the nth feature to be processed in the target projection space corresponding to the nth feature, so that each feature interacts with the nth feature in that space and the feature interaction vector corresponding to the nth feature is obtained. This may include: in the target projection space, computing the outer product of the embedding vector of the ith of the N features to be processed and the embedding vector of the nth feature to obtain a first processing result; adjusting the first processing result to obtain the feature interaction sub-vector of the ith and nth features to be processed; and summing the feature interaction sub-vectors of all the features to be processed with the nth feature to obtain the feature interaction vector corresponding to the nth feature to be processed.
The embedding vector of any of the N features to be processed, whether the ith or the nth, is obtained by performing feature embedding processing on that feature, where i is a positive integer less than or equal to N. The first processing result is adjusted using a target projection matrix and a target scaling weight selected from the maintained projection matrices and scaling weights: the target projection matrix $W_i^{(n)}$ and the target scaling weight $\alpha_i^{(n)}$ are those used when the ith feature to be processed interacts with the nth feature to be processed in the target projection space.
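The first processing result is an ordinary outer product. A minimal sketch, assuming embeddings are stored as plain Python lists treated as 1 × d row vectors (the vectors and names are illustrative):

```python
def outer(u, w):
    """Outer product u^T w: (1 x d_i) and (1 x d_n) vectors -> d_i x d_n matrix."""
    return [[a * b for b in w] for a in u]

v_i = [1.0, 2.0, 3.0]          # illustrative embedding of feature i, d_i = 3
v_n = [0.5, -1.0]              # illustrative embedding of feature n, d_n = 2
first_result = outer(v_i, v_n)
print(first_result)            # [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0]]
```

Row k of the result is v_n scaled by the kth element of v_i, which is why the first processing result has d_i rows and d_n columns.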

In a specific implementation, the data processing device adjusting the first processing result to obtain the feature interaction sub-vector of the ith and nth features to be processed may include: determining the target projection matrix for the ith and nth features based on a correspondence among a feature to be processed, the feature with which it interacts, and a projection matrix; determining the target scaling weight for the ith and nth features based on a correspondence among a feature to be processed, the feature with which it interacts, and a scaling weight; performing matrix adjustment processing on the first processing result based on the target projection matrix and the target scaling weight to obtain a second processing result; and performing vector conversion processing on the second processing result to obtain the feature interaction sub-vector of the ith and nth features to be processed.
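The four sub-steps can be sketched as below. The correspondences are modelled as hypothetical dictionaries `proj` and `scale` keyed by the pair (i, n); the patent does not specify how the correspondences are stored, so this layout is an assumption. The vector conversion multiplies the d_i × d_n second processing result by a 1 × d_i all-ones vector, which sums each column.

```python
def outer(u, w):
    return [[a * b for b in w] for a in u]

def interaction_subvector(i, n, embeddings, proj, scale):
    """Sketch of the four sub-steps; proj/scale are hypothetical dicts
    keyed by (i, n) that stand in for the stored correspondences."""
    v_i, v_n = embeddings[i], embeddings[n]
    first = outer(v_i, v_n)                       # first processing result, d_i x d_n
    W, alpha = proj[(i, n)], scale[(i, n)]        # look up target W and alpha
    second = [[alpha * W[r][c] * first[r][c]      # matrix adjustment (element-wise)
               for c in range(len(v_n))]
              for r in range(len(v_i))]           # second processing result, d_i x d_n
    ones = [1.0] * len(v_i)                       # 1 x d_i all-ones vector
    # vector conversion: ones (1 x d_i) times second (d_i x d_n) = column sums
    return [sum(ones[r] * second[r][c] for r in range(len(v_i)))
            for c in range(len(v_n))]

embeddings = {0: [1.0, 2.0], 1: [3.0, -1.0, 0.5]}
proj = {(0, 1): [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]}   # d_0 x d_1 = 2 x 3
scale = {(0, 1): 2.0}
sub = interaction_subvector(0, 1, embeddings, proj, scale)
print(sub)   # [12.0, -4.0, 2.0]
```

The resulting sub-vector has d_n elements, matching the embedding dimension of the nth feature.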

Based on the correspondence among a feature to be processed, the feature with which it interacts, and a projection matrix, the target projection matrix that the data processing device determines for the ith and nth features to be processed is the projection matrix used when the ith feature interacts with the nth feature in the target projection space, that is, $W_i^{(n)}$.

Based on the correspondence among a feature to be processed, the feature with which it interacts, and a scaling weight, the target scaling weight determined for the ith and nth features to be processed is the scaling weight used when the ith feature interacts with the nth feature in the target projection space, that is, $\alpha_i^{(n)}$.

In one embodiment, the feature interaction sub-vector obtained when the ith feature to be processed interacts with the nth feature to be processed in the target projection space corresponding to the nth feature may be given by formula 1:

$$\phi_i^{n} = \mathbf{1}_{d_i}\left[\left(\alpha_i^{(n)} W_i^{(n)}\right) \odot \left(v_i^{\top} v_n\right)\right] \qquad (1)$$

where $v_i$ denotes the embedding vector (a $1 \times d_i$ row vector) corresponding to the ith feature to be processed $F_{(i)}$, and $v_n$ denotes the embedding vector ($1 \times d_n$) corresponding to the nth feature to be processed $F_{(n)}$. $v_i^{\top} v_n$ denotes the outer product of $v_i$ and $v_n$, i.e. the first processing result; because $v_i$ has dimension $d_i$ and $v_n$ has dimension $d_n$, this outer product is a $d_i \times d_n$ matrix. $W_i^{(n)}$ and $\alpha_i^{(n)}$ are the target projection matrix and target scaling weight of the ith and nth features. $\odot$ denotes multiplication of corresponding elements of two matrices, so $\left(\alpha_i^{(n)} W_i^{(n)}\right) \odot \left(v_i^{\top} v_n\right)$ multiplies the scaled projection matrix and the outer-product matrix element by element. Because $W_i^{(n)}$ has dimension $d_i \times d_n$ and the scaling weight is a scalar parameter, this second processing result is also a $d_i \times d_n$ matrix. $\mathbf{1}_{d_i}$ denotes a $1 \times d_i$ vector whose elements are all 1; left-multiplying the $d_i \times d_n$ second processing result by $\mathbf{1}_{d_i}$ sums each of its columns and thereby converts it into a vector whose dimension equals $d_n$, the dimension of the embedding vector of the nth feature to be processed. This vector is the feature interaction sub-vector $\phi_i^{n}$ of the ith and nth features to be processed.
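Expanding formula 1 element by element shows that element j of the sub-vector equals v_n[j] · Σ_k α · v_i[k] · W[k][j], so the d_i × d_n outer-product matrix never has to be materialised. The sketch below, with illustrative numbers and hypothetical function names, computes both forms and checks that they agree; the factored form is an algebraic rewriting, not a variant described in the text.

```python
import math

def subvector_literal(v_i, v_n, W, alpha):
    """Formula 1 as written: 1_{d_i} [ (alpha W) ⊙ (v_i^T v_n) ]."""
    d_i, d_n = len(v_i), len(v_n)
    second = [[alpha * W[k][j] * (v_i[k] * v_n[j]) for j in range(d_n)]
              for k in range(d_i)]
    return [sum(second[k][j] for k in range(d_i)) for j in range(d_n)]

def subvector_factored(v_i, v_n, W, alpha):
    """Algebraically equal form: element j = v_n[j] * sum_k alpha v_i[k] W[k][j].
    It avoids building the d_i x d_n outer-product matrix."""
    d_i, d_n = len(v_i), len(v_n)
    projected = [sum(alpha * v_i[k] * W[k][j] for k in range(d_i))
                 for j in range(d_n)]
    return [v_n[j] * projected[j] for j in range(d_n)]

v_i, v_n = [1.0, 2.0], [3.0, -1.0, 0.5]
W = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
a = subvector_literal(v_i, v_n, W, 2.0)
b = subvector_factored(v_i, v_n, W, 2.0)
print(a, b)   # [12.0, -4.0, 2.0] [12.0, -4.0, 2.0]
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```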

In an embodiment, the data processing device sums the feature interaction sub-vectors obtained when each feature to be processed interacts with the nth feature to be processed in the target projection space corresponding to the nth feature, giving the feature interaction vector corresponding to the nth feature to be processed, which may be given by formula 2:

$$\phi^{n}(v) = \sum_{i=1}^{N} \phi_i^{n} = \sum_{i=1}^{N} \mathbf{1}_{d_i}\left[\left(\alpha_i^{(n)} W_i^{(n)}\right) \odot \left(v_i^{\top} v_n\right)\right] \qquad (2)$$

where $\phi^{n}(v)$ denotes the feature interaction vector corresponding to the nth feature to be processed, and $\phi_i^{n}$ denotes the feature interaction sub-vector of the ith and nth features to be processed.
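Formula 2 simply accumulates the sub-vectors over all N features, including the nth feature itself. A minimal sketch, assuming a hypothetical nested layout in which `W[n][i]` and `alpha[n][i]` hold the projection matrix and scaling weight used when feature i interacts with feature n; indices are 0-based and the numbers are illustrative.

```python
def subvector(v_i, v_n, W, alpha):
    """Formula 1, element by element: v_n[j] * sum_k alpha * v_i[k] * W[k][j]."""
    return [v_n[j] * sum(alpha * v_i[k] * W[k][j] for k in range(len(v_i)))
            for j in range(len(v_n))]

def interaction_vector(n, embeddings, W, alpha):
    """Formula 2: phi^n(v) = sum over i = 1..N of phi_i^n (i includes n itself).
    W[n][i] and alpha[n][i] hold W_i^(n) and alpha_i^(n) (hypothetical layout)."""
    v_n = embeddings[n]
    total = [0.0] * len(v_n)
    for i, v_i in enumerate(embeddings):
        sub = subvector(v_i, v_n, W[n][i], alpha[n][i])
        total = [t + s for t, s in zip(total, sub)]
    return total

embeddings = [[1.0, 2.0], [3.0, -1.0, 0.5]]          # N = 2, d_1 = 2, d_2 = 3
identity3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
W = {1: {0: [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], 1: identity3}}
alpha = {1: {0: 2.0, 1: 1.0}}
phi = interaction_vector(1, embeddings, W, alpha)
print(phi)   # [21.0, -3.0, 2.25]
```

Here the self-interaction term (i = n with an identity projection) contributes v_n ⊙ v_n = [9.0, 1.0, 0.25], and the cross term contributes [12.0, -4.0, 2.0].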

In one embodiment, by obtaining the feature interaction vector of each feature to be processed through outer product processing of the plurality of features to be processed, the data processing device obtains a high-order semantic space for each feature and can fully learn its high-order semantic characteristics. The resulting feature interaction vectors therefore carry the high-order semantic characteristics of the features to be processed, which can further improve the accuracy of click rate estimation.

In another embodiment, again taking the feature interaction vector of the nth of the N features to be processed as an example, the data processing device performing outer product processing on each feature to be processed and the nth feature in the target projection space corresponding to the nth feature, so that each feature interacts with the nth feature in that space and the feature interaction vector of the nth feature is obtained, may include: performing vector conversion processing on the scaling weight corresponding to each of the N features to be processed and the nth feature to obtain N scaling weight vectors, and stitching the N scaling weight vectors into a stitched scaling weight vector; rescaling, with the stitched scaling weight vector, the stitched embedding vector obtained by stitching the embedding vectors of the features to be processed, to obtain a third processing result; performing matrix adjustment processing on the third processing result based on a stitched projection matrix to obtain an implicit vector; and performing vector element interaction processing on the embedding vector of the nth feature and the implicit vector to obtain the feature interaction vector corresponding to the nth feature to be processed.
Here, the scaling weight corresponding to a feature to be processed and the nth feature is the scaling weight used when that feature interacts with the nth feature in the target projection space; for example, the scaling weight corresponding to the ith and nth features is the scaling weight used when the ith feature interacts with the nth feature in the target projection space, that is, $\alpha_i^{(n)}$.

The embedding vector of each feature to be processed is obtained by performing feature embedding processing on that feature. The stitched projection matrix is the matrix obtained by stitching the projection matrices corresponding to each feature to be processed and the nth feature, that is, the projection matrices used for the interactions with the nth feature in the target projection space.

In a specific implementation, taking the ith of the N features to be processed as an example, when the data processing device performs vector conversion processing on the scaling weight corresponding to the ith and nth features to be processed, it may multiply that scaling weight by a $1 \times d_i$ all-ones vector $\mathbf{1}_{d_i}$ to obtain a scaling weight vector $\alpha_i^{(n)} \mathbf{1}_{d_i}$. Every element of this scaling weight vector equals the scaling weight $\alpha_i^{(n)}$ of the ith and nth features, and its dimension is $1 \times d_i$; that is, it is a vector with 1 row and $d_i$ columns whose elements are all $\alpha_i^{(n)}$. Further, the data processing device stitches the N scaling weight vectors, and the resulting stitched scaling weight vector may be given by formula 3:

$$\tilde{\alpha}^{(n)} = \left[\alpha_1^{(n)}\mathbf{1}_{d_1}, \ldots, \alpha_i^{(n)}\mathbf{1}_{d_i}, \ldots, \alpha_N^{(n)}\mathbf{1}_{d_N}\right] \qquad (3)$$

where $\tilde{\alpha}^{(n)}$ denotes the stitched scaling weight vector; $\alpha_1^{(n)}\mathbf{1}_{d_1}$ denotes the scaling weight vector, of dimension $d_1$, obtained by vector conversion of the scaling weight corresponding to the 1st and nth features to be processed; $\alpha_i^{(n)}\mathbf{1}_{d_i}$ denotes the scaling weight vector, of dimension $d_i$, obtained from the scaling weight of the ith and nth features; and $\alpha_N^{(n)}\mathbf{1}_{d_N}$ denotes the scaling weight vector, of dimension $d_N$, obtained from the scaling weight of the Nth and nth features. Since the embedding vector of the ith feature to be processed also has dimension $d_i$, the stitched scaling weight vector has the same dimension $d = d_1 + \cdots + d_N$ as the stitched embedding vector obtained by stitching the embedding vectors of the features to be processed. $[\cdot, \ldots, \cdot]$ denotes the stitching operation.
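Formula 3 can be sketched as repeating each scalar weight d_i times and concatenating the results. The sketch below assumes `alphas_n[i]` holds the scaling weight of feature i with the fixed nth feature and `dims[i]` holds d_i; both names and the numbers are hypothetical.

```python
def stitched_scaling_vector(alphas_n, dims):
    """Formula 3 sketch: [alpha_1 * 1_{d_1}, ..., alpha_N * 1_{d_N}], length d = sum(dims).
    alphas_n[i] is alpha_i^(n) for a fixed n; dims[i] is d_i."""
    out = []
    for alpha, d_i in zip(alphas_n, dims):
        out.extend([alpha] * d_i)   # vector conversion: alpha times a 1 x d_i all-ones vector
    return out

dims = [2, 3, 1]                                   # illustrative d_1, d_2, d_3
alpha_tilde = stitched_scaling_vector([0.5, 2.0, -1.0], dims)
print(alpha_tilde)        # [0.5, 0.5, 2.0, 2.0, 2.0, -1.0]
print(len(alpha_tilde))   # 6 == d_1 + d_2 + d_3
```

Because each block has the same length as the corresponding embedding vector, the stitched scaling vector lines up element by element with the stitched embedding vector.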

In one embodiment, the stitched embedding vector obtained by stitching the embedding vectors corresponding to the features to be processed may be given by formula 4:

$$v = [v_1, \ldots, v_i, \ldots, v_N] \qquad (4)$$

where $v$ denotes the stitched embedding vector, of dimension $1 \times d$; $v_1$ denotes the embedding vector corresponding to the 1st of the N features to be processed, $v_i$ the embedding vector corresponding to the ith feature to be processed, and $v_N$ the embedding vector corresponding to the Nth feature to be processed; $[\cdot, \ldots, \cdot]$ denotes the stitching operation.

In one embodiment, the stitched projection matrix is the matrix obtained by stitching the projection matrices corresponding to each feature to be processed and the nth feature, that is, the projection matrices used for the interactions with the nth feature in the target projection space. The stitched projection matrix may be given by formula 5:

$$\tilde{W}^{(n)} = \begin{bmatrix} W_1^{(n)} \\ \vdots \\ W_i^{(n)} \\ \vdots \\ W_N^{(n)} \end{bmatrix} \qquad (5)$$

where $W_1^{(n)}$ denotes the projection matrix corresponding to the 1st and nth features to be processed, i.e. the projection matrix used when the 1st feature interacts with the nth feature in the target projection space, with dimension $d_1 \times d_n$; $W_i^{(n)}$ denotes the projection matrix corresponding to the ith and nth features, with dimension $d_i \times d_n$; $W_N^{(n)}$ denotes the projection matrix corresponding to the Nth and nth features, with dimension $d_N \times d_n$; and $\tilde{W}^{(n)}$ denotes the stitched projection matrix, obtained by stacking these matrices row-wise, with dimension $d \times d_n$.
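Formulas 4 and 5 are two concatenations: the embeddings are joined end to end into a 1 × d vector, and the d_i × d_n projection matrices are stacked row-wise into a d × d_n matrix. A minimal sketch with illustrative values and hypothetical function names:

```python
def stitch_embeddings(embeddings):
    """Formula 4 sketch: concatenate v_1, ..., v_N into one 1 x d vector."""
    return [x for v_i in embeddings for x in v_i]

def stitch_projections(matrices):
    """Formula 5 sketch: stack W_1^(n), ..., W_N^(n) (each d_i x d_n) row-wise
    into a single d x d_n matrix."""
    d_n = len(matrices[0][0])
    if any(len(row) != d_n for W in matrices for row in W):
        raise ValueError("every block must have d_n columns")
    return [list(row) for W in matrices for row in W]

embeddings = [[1.0, 2.0], [3.0, -1.0, 0.5]]           # d_1 = 2, d_2 = 3, so d = 5
identity3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
v = stitch_embeddings(embeddings)
W_tilde = stitch_projections([[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], identity3])  # n = 2, d_n = 3
print(v)                               # [1.0, 2.0, 3.0, -1.0, 0.5]
print(len(W_tilde), len(W_tilde[0]))   # 5 3, i.e. d x d_n
```

Row-wise stacking places the rows of W_i^(n) at the same offsets as the elements of v_i inside the stitched embedding vector, which is what lets a single vector-matrix product replace the N separate interactions.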

In one embodiment, matrix adjustment processing is performed on the third processing result based on the stitched projection matrix, and the resulting implicit vector may be given by formula 6:

$$g_n = \left(v \odot \tilde{\alpha}^{(n)}\right) \tilde{W}^{(n)} \qquad (6)$$

where $\odot$ denotes multiplication of corresponding elements of two matrices or vectors; $v$ denotes the stitched embedding vector; $\tilde{\alpha}^{(n)}$ denotes the stitched scaling weight vector; $v \odot \tilde{\alpha}^{(n)}$ denotes the third processing result; $\tilde{W}^{(n)}$ denotes the stitched projection matrix; and $g_n$ denotes the implicit vector, whose dimension ($1 \times d_n$) is the same as that of the embedding vector corresponding to the nth feature to be processed.
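Formula 6 is an element-wise rescaling followed by one vector-matrix product. The sketch assumes the stitched vectors and matrix are already built (values are illustrative: two features with d_1 = 2, d_2 = 3, scaling weights 2 and 1, and an identity block for the second feature).

```python
def implicit_vector(v, alpha_tilde, W_tilde):
    """Formula 6 sketch: g_n = (v ⊙ alpha_tilde) W_tilde."""
    third = [x * a for x, a in zip(v, alpha_tilde)]      # third processing result, 1 x d
    d_n = len(W_tilde[0])
    return [sum(third[k] * W_tilde[k][j] for k in range(len(third)))
            for j in range(d_n)]                          # 1 x d_n implicit vector

v = [1.0, 2.0, 3.0, -1.0, 0.5]                  # stitched embedding, d = 5
alpha_tilde = [2.0, 2.0, 1.0, 1.0, 1.0]         # alpha_1 = 2 (d_1 = 2), alpha_2 = 1 (d_2 = 3)
W_tilde = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0],    # W_1^(n), 2 x 3
           [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # W_2^(n) = identity, 3 x 3
g = implicit_vector(v, alpha_tilde, W_tilde)
print(g)   # [7.0, 3.0, 4.5]
```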

Further, the feature interaction vector corresponding to the nth feature to be processed may be given by formula 7:

$$\phi^{n}(v) = v_n \odot g_n \qquad (7)$$

where $\odot$ denotes multiplication of corresponding elements of two matrices or vectors, $\phi^{n}(v)$ denotes the feature interaction vector corresponding to the nth feature to be processed, $v_n$ denotes the embedding vector corresponding to the nth feature to be processed, and $g_n$ denotes the implicit vector.
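The two embodiments compute the same vector: summing formula 1 over i gives, for element j, v_n[j] · Σ_i Σ_k α_i · v_i[k] · W_i[k][j], which is exactly element j of v_n ⊙ ((v ⊙ α̃) W̃). The sketch below checks this numerically. It assumes, for a fixed n, that `W[i]` holds W_i^(n) and `alpha[i]` holds α_i^(n); names and values are illustrative, and indices are 0-based.

```python
import math

def via_subvectors(n, embs, W, alpha):
    """Formulas 1-2: sum of per-pair sub-vectors. W[i] = W_i^(n), alpha[i] = alpha_i^(n)."""
    v_n = embs[n]
    out = [0.0] * len(v_n)
    for i, v_i in enumerate(embs):
        for j in range(len(v_n)):
            out[j] += sum(alpha[i] * W[i][k][j] * v_i[k] * v_n[j]
                          for k in range(len(v_i)))
    return out

def via_stitching(n, embs, W, alpha):
    """Formulas 3-7: stitch, rescale, project once, then multiply by v_n element-wise."""
    v = [x for v_i in embs for x in v_i]                          # formula 4
    a = [alpha[i] for i, v_i in enumerate(embs) for _ in v_i]     # formula 3
    W_tilde = [row for W_i in W for row in W_i]                   # formula 5
    third = [x * s for x, s in zip(v, a)]                         # third processing result
    g = [sum(third[k] * W_tilde[k][j] for k in range(len(v)))
         for j in range(len(embs[n]))]                            # formula 6
    return [x * y for x, y in zip(embs[n], g)]                    # formula 7

embs = [[1.0, 2.0], [3.0, -1.0, 0.5]]
identity3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
W = [[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], identity3]   # W_1^(2), W_2^(2)
alpha = [2.0, 1.0]                                    # alpha_1^(2), alpha_2^(2)
p = via_subvectors(1, embs, W, alpha)
q = via_stitching(1, embs, W, alpha)
print(p, q)   # [21.0, -3.0, 2.25] [21.0, -3.0, 2.25]
```

A plausible motivation for the stitched form is that it replaces N separate d_i × d_n outer products with one (1 × d)(d × d_n) product, which maps well onto batched matrix multiplication.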

Referring to fig. 5, which is a schematic diagram of obtaining the feature interaction vector corresponding to the nth feature to be processed according to an embodiment of the present application: the stitched embedding vector is denoted by 501, and 502 marks the first vector element of the embedding vector corresponding to the ith feature to be processed (denoted $v_{i,1}$). As denoted by 503, the stitched embedding vector is rescaled with the stitched scaling weight vector to obtain the third processing result; 504 marks the 1st vector element of the ith scaling weight vector (equal to $\alpha_i^{(n)}$). Matrix adjustment processing is performed on the third processing result based on the stitched projection matrix shown by 505, yielding the implicit vector shown by 506, whose dimension is the same as that of the embedding vector corresponding to the nth feature to be processed; 507 marks the first vector element of the implicit vector (denoted $g_{n,1}$). Vector element interaction processing is then performed on the embedding vector corresponding to the nth feature to be processed, shown by 508, and the implicit vector shown by 506, to obtain the feature interaction vector corresponding to the nth feature to be processed, shown by 509.

In one embodiment, there are N features to be processed, and H channels are set in the target projection space corresponding to the nth of the N features, where N is an integer greater than 1, H is a positive integer, and n is a positive integer less than or equal to N. The data processing device performing outer product processing on the plurality of features to be processed, so that they interact and the feature interaction vector of each feature is obtained, may include: traversing the N features to be processed and, in each channel of the target projection space, performing outer product processing on each feature to be processed and the nth feature, so that they interact within that channel and a feature interaction vector of the nth feature is obtained for each channel; and combining the per-channel feature interaction vectors of the nth feature to obtain the feature interaction vector corresponding to the nth feature to be processed. Because this vector is obtained by combining the per-channel vectors, the number of channels can be set according to different requirements. Generally speaking, setting H to a value in the range [2, 6] yields relatively good feature interaction vectors, that is, the high-order semantic features of each feature to be processed can be learned more fully.
With H channels set in the projection space of each feature to be processed, H groups of projection matrices and H groups of scaling weights are maintained for that projection space, one group per channel, and the feature interaction vector of each projection space (that is, of each feature to be processed) is computed from them. In this way the multiple meanings (polysemy) of each feature to be processed can be learned fully, which can further improve the accuracy of click rate estimation.
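A multi-channel projection space can be sketched by running the single-channel computation once per channel, each with its own group of projection matrices and scaling weights, and then combining the per-channel results. The sketch assumes a hypothetical list `channels` of (W, alpha) groups for the nth projection space, uses the stitched form of formulas 3 to 7 for each channel, and uses an element-wise sum as the combination function; all names and values are illustrative.

```python
def channel_vector(n, embs, W, alpha):
    """One channel: stitched computation (formulas 3-7) with that channel's own W and alpha."""
    v = [x for v_i in embs for x in v_i]
    a = [alpha[i] for i, v_i in enumerate(embs) for _ in v_i]
    W_tilde = [row for W_i in W for row in W_i]
    g = [sum(v[k] * a[k] * W_tilde[k][j] for k in range(len(v)))
         for j in range(len(embs[n]))]
    return [x * y for x, y in zip(embs[n], g)]

def multi_channel_vector(n, embs, channels, combine):
    """channels[h] = (W, alpha): the group of projection matrices and scaling
    weights kept by channel h of feature n's projection space (H = len(channels))."""
    per_channel = [channel_vector(n, embs, W, alpha) for W, alpha in channels]
    return combine(per_channel)

def elementwise_sum(vectors):
    return [sum(col) for col in zip(*vectors)]

embs = [[1.0, 2.0], [3.0, -1.0, 0.5]]
identity3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
W = [[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]], identity3]
channels = [(W, [2.0, 1.0]),      # channel 1
            (W, [0.0, 1.0])]      # channel 2: same matrices, different weights (illustrative)
phi = multi_channel_vector(1, embs, channels, elementwise_sum)
print(phi)   # [30.0, -2.0, 2.5]
```

In a trained model each channel would normally hold its own distinct matrices; sharing W here only keeps the example short.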

H groups of projection matrices and H groups of scaling weights are maintained for the target projection space corresponding to the nth feature to be processed; that is, each of the H channels in that space maintains 1 group of projection matrices and 1 group of scaling weights, with the same structure as the single group of projection matrices and scaling weights maintained for the target projection space described above. In a specific implementation, the process by which the data processing device performs outer product processing on each feature to be processed and the nth feature within each channel of the target projection space, to obtain the per-channel feature interaction vector of the nth feature, is similar to the process described above for performing outer product processing in the target projection space without channels, and is not repeated here.

Further, the data processing device may combine the per-channel feature interaction vectors of the nth feature to be processed using a combination function, which may be any function capable of combining multiple vectors. For example, it may be a function that sums the per-channel feature interaction vectors element by element: the first vector elements of the per-channel vectors are summed to give the first vector element of the feature interaction vector of the nth feature, the second vector elements are summed to give its second vector element, and so on for every vector element, which yields the feature interaction vector corresponding to the nth feature to be processed.
As another example, it may be a function that averages the per-channel feature interaction vectors element by element: the first vector elements of the per-channel vectors are summed and averaged to give the first vector element of the feature interaction vector of the nth feature, the second vector elements are summed and averaged to give its second vector element, and so on for every vector element, which yields the feature interaction vector corresponding to the nth feature to be processed.

In one embodiment, if the feature interaction vector of the nth feature to be processed obtained in the hth channel of the target projection space, by performing outer product processing on each feature to be processed and the nth feature, is denoted $\phi^{n,h}(v)$, then the feature interaction vector corresponding to the nth feature to be processed may be given by formula 8:

$$\phi^{n}(v) = f\left(\phi^{n,1}(v), \ldots, \phi^{n,H}(v)\right) \qquad (8)$$

where $\phi^{n}(v)$ denotes the feature interaction vector corresponding to the nth feature to be processed, $f$ denotes the combination function, $\phi^{n,1}(v)$ denotes the feature interaction vector of the nth feature in the 1st channel, and $\phi^{n,H}(v)$ denotes the feature interaction vector of the nth feature in the Hth channel.
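The two example combination functions f (element-wise sum and element-wise average over the H channels) can be sketched as follows; the per-channel vectors are illustrative values for H = 2, and the function names are hypothetical.

```python
def combine_sum(channel_vectors):
    """f as element-wise sum: element j of the result sums element j over all H channels."""
    return [sum(col) for col in zip(*channel_vectors)]

def combine_mean(channel_vectors):
    """f as element-wise average over the H channels."""
    H = len(channel_vectors)
    return [sum(col) / H for col in zip(*channel_vectors)]

per_channel = [[21.0, -3.0, 2.25],    # phi^{n,1}(v), illustrative values
               [9.0, 1.0, 0.25]]      # phi^{n,2}(v)
print(combine_sum(per_channel))    # [30.0, -2.0, 2.5]
print(combine_mean(per_channel))   # [15.0, -1.0, 1.25]
```

The average differs from the sum only by the constant factor 1/H, so it keeps the magnitude of the combined vector independent of how many channels are configured.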

In one embodiment, the data processing method provided by the embodiments of the present application may be implemented based on a data processing model, which may include an embedding layer, a feature interaction layer, a forward fully connected layer and a normalization layer. The parameters used in the method are model parameters of the data processing model; for example, the projection matrices and the scaling weights are both model parameters, and the data processing model may be obtained by training an initial data processing model. Fig. 6 is a schematic diagram of click rate estimation based on the data processing model according to an embodiment of the present application: the data processing device may perform outer product processing on the plurality of features to be processed through the embedding layer and the feature interaction layer of the data processing model, so that the features interact and the feature interaction vector of each feature to be processed is obtained; it then performs click rate estimation processing on these feature interaction vectors through the forward fully connected layer and the normalization layer to obtain the estimated click rate of the target object for the target resource.

In a specific implementation, when obtaining the feature interaction vector of each feature to be processed through the embedding layer and the feature interaction layer of the data processing model, the data processing device may first perform feature embedding processing on each feature to be processed through the embedding layer to obtain its embedding vector, and then obtain the feature interaction vector of each feature from these embedding vectors through the feature interaction layer; this process is described in detail above and is not repeated here. Further, the data processing device performs click rate estimation processing on the feature interaction vectors through the forward fully connected layer and the normalization layer to obtain the estimated click rate of the target object for the target resource, which implements step S203.

S203, carrying out click rate estimation processing on the feature interaction vectors corresponding to the features to be processed to obtain the estimated click rate of the target object to the target resource.

The estimated click rate indicates the probability that the target object clicks the target resource.

In an embodiment, the click rate estimation processing may include: performing full-connection processing on the feature interaction vectors corresponding to the features to be processed to obtain prediction vectors corresponding to the plurality of features to be processed; and normalizing these prediction vectors to obtain the estimated click rate of the target object for the target resource. The full-connection processing may be performed by the forward fully connected layer of the data processing model, and the normalization processing by the normalization layer. The activation function applied to the output of the forward fully connected layer is the sigmoid function, which maps the prediction vectors corresponding to the plurality of features to be processed to a probability value in the range [0, 1].

In an embodiment, the estimated click rate of the target object for the target resource, obtained by performing click rate estimation processing on the feature interaction vector corresponding to each feature to be processed, may be given by formula 9:

$$y = \sigma\left(w\,[\phi^{1}(v), \ldots, \phi^{n}(v), \ldots, \phi^{N}(v)] + b\right) \qquad (9)$$

where $\phi^{1}(v)$ denotes the feature interaction vector corresponding to the 1st feature to be processed, $\phi^{n}(v)$ denotes the feature interaction vector corresponding to the nth feature to be processed, and $\phi^{N}(v)$ denotes the feature interaction vector corresponding to the Nth feature to be processed; $[\cdot]$ denotes concatenation; $w$ denotes the weight parameter of the forward fully connected layer and $b$ its bias term; $\sigma$ denotes the activation function of the normalization layer of the data processing model; and $y$ denotes the estimated click rate of the target object for the target resource.
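As an illustration of formula 9, the following Python sketch concatenates the N feature interaction vectors, applies the weight w and bias b, and passes the result through the sigmoid function. It assumes NumPy, a forward fully connected layer with a single output unit, and hypothetical random inputs; it is a minimal sketch, not the model's actual implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def estimate_ctr(interaction_vectors, w, b):
    """Formula (9): y = sigma(w . [phi^1(v), ..., phi^N(v)] + b)."""
    # Concatenate the N feature interaction vectors into one long vector.
    z = np.concatenate(interaction_vectors)
    # Forward fully connected layer with a single output unit (assumption).
    logit = float(np.dot(w, z) + b)
    # Sigmoid turns the logit into an estimated click probability.
    return sigmoid(logit)


# Hypothetical example: N = 3 interaction vectors of dimension 4 each.
rng = np.random.default_rng(0)
phis = [rng.normal(size=4) for _ in range(3)]
w = rng.normal(size=12)
b = 0.1
y = estimate_ctr(phis, w, b)
print(y)
```

A zero input with zero bias yields exactly 0.5, which is a quick sanity check on the sigmoid stage.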

In one embodiment, the data processing device may determine, according to the estimated click rate, whether to push the target resource to the target object; if it determines that the target resource is to be pushed, it may send the target resource to the terminal device of the target object, so that the target object can click and view the pushed target resource.

The data processing method may be implemented through a data processing model obtained by training an initial data processing model. The initial data processing model likewise comprises an embedding layer, a feature interaction layer, a forward fully connected layer and a normalization layer, but its model parameters differ from those of the data processing model. On this basis, an embodiment of the present application provides a training method for the data processing model. Fig. 7 is a schematic flowchart of the training method of the data processing model according to an embodiment of the present application. The training method shown in fig. 7 may be executed by the data processing device or by any other electronic device capable of training the data processing model, and may include the following steps:

S701, acquiring a training sample.

The training sample comprises a plurality of training features and a sample label. The plurality of training features includes resource features of a training resource and object features of a training object, and the sample label indicates whether the training object clicked the training resource. The resource features of the training resource and the object features of the training object cover the same feature items as the resource features of the target resource and the object features of the target object included in the features to be processed, and are not described again here.

In one embodiment, different data processing models may be trained on training samples from different service scenarios, so as to estimate the click rate of a target object on a target resource in each service scenario. For example, in a movie recommendation service scenario, the click rate of a target object on a target movie (i.e., the target resource), that is, the probability that the target object clicks the target movie, needs to be estimated based on the target object's historical scores for movies; the initial data processing model is therefore trained on the training objects' historical scores for movies to obtain the data processing model. Accordingly, when the initial data processing model for the movie recommendation scenario is trained, the resource features of the training resource may indicate the movie identifier, movie name, movie type and the like of the training movie; the object features of the training object may describe the UID, nickname, age, city of residence and the like of the training object; and the object features may further describe the training object's historical scores for the movies it has scored, together with the movie information of those scored movies (for example, movie identifier, movie name and movie type).

In one embodiment, the training samples may be extracted from historical data generated in a target application by its users; the training object may be any user of a service provided by the target application, and the training resource may be any resource the target application can provide, for example, a resource that the target application has already pushed to its users. In another embodiment, the training samples may be obtained from a public data set used for model training in the field of click rate estimation; when data processing models are trained for different service scenarios, the training samples may be obtained from the public data set for the corresponding scenario. For example, in the movie recommendation scenario, the initial data processing model is trained on the training objects' historical scores for movies so that the resulting data processing model can estimate the click rate of a target object on a target movie (i.e., the target resource) from the target object's historical scores; the training samples may then be obtained from the MovieLens data set, a public data set used for model training in movie recommendation. Each record in the MovieLens data set stores an object's historical score for a movie it has scored on a 5-star scale in half-star increments, i.e., the score can only be 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5 or 5 stars.

In an embodiment, when a data processing model for click rate estimation is trained in the movie recommendation scenario and the training samples are acquired from the MovieLens data set, the data processing device may first determine training data from the data set and then preprocess the training data to obtain the training samples. In a specific implementation, the data processing device may determine the training data by random extraction: for example, it may randomly extract 80% of the data in the data set as training data, 10% as validation data and 10% as test data, where the training data, validation data and test data are mutually disjoint.
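A minimal sketch of the 80%/10%/10% random extraction described above, in Python. The function name `split_dataset`, the fixed seed and the use of index shuffling are assumptions for illustration; shuffling once and then slicing is one simple way to guarantee that the three subsets are disjoint.

```python
import random


def split_dataset(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Randomly split records into disjoint train / validation / test sets."""
    indices = list(range(len(records)))
    # Shuffle the indices once with a seeded generator for reproducibility.
    random.Random(seed).shuffle(indices)
    n_train = int(len(records) * train_frac)
    n_val = int(len(records) * val_frac)
    # Consecutive slices of one permutation cannot overlap.
    train = [records[i] for i in indices[:n_train]]
    val = [records[i] for i in indices[n_train:n_train + n_val]]
    test = [records[i] for i in indices[n_train + n_val:]]
    return train, val, test


train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```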

Further, when preprocessing the training data to obtain a training sample, the data processing device may determine the sample label of the training sample from the training object's historical score for the training resource in the corresponding training data. Specifically, if the historical score is greater than or equal to a preset score threshold, the sample label indicates that the training object clicked the training resource, i.e., the training sample is a positive sample; if the historical score is less than the preset score threshold, the sample label indicates that the training object did not click the training resource, i.e., the training sample is a negative sample. For example, the preset score threshold may be 3 stars.
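The labeling rule can be sketched as follows, assuming scores are given in stars and using the example threshold of 3 stars; `make_label` is a hypothetical helper name.

```python
# Preset score threshold from the example above (3 stars).
SCORE_THRESHOLD = 3.0


def make_label(score, threshold=SCORE_THRESHOLD):
    """Return 1 (positive sample, treated as clicked) if score >= threshold, else 0."""
    return 1 if score >= threshold else 0


# MovieLens-style half-star scores.
scores = [0.5, 2.5, 3.0, 4.5, 5.0]
labels = [make_label(s) for s in scores]
print(labels)  # [0, 0, 1, 1, 1]
```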

In one embodiment, to obtain the object features of the training object and the resource features of the training resource in a training sample, the data processing device vectorizes the training data during preprocessing. Specifically, it may first standardize the numerical data contained in the training data and then convert the standardized data into vectors. Optionally, the numerical data may be standardized by any of several methods, for example mean-variance standardization or logarithm-based standardization. When logarithm-based standardization is used, numerical data greater than 2 may be replaced by its base-2 logarithm, i.e., $a_2 = \log_2(a_1)$, where $a_1$ denotes numerical data greater than 2 and $a_2$ denotes the standardized data.
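The two standardization options mentioned above can be sketched in Python as follows. The text only specifies the base-2 logarithm for values greater than 2, so leaving smaller values unchanged is an assumption here, as are the helper names.

```python
import math
import statistics


def log2_normalize(a1):
    """Log-based standardization: values greater than 2 become log2(a1).

    Values <= 2 are returned unchanged (assumption: the text only
    specifies the transformation for a1 > 2).
    """
    return math.log2(a1) if a1 > 2 else a1


def mean_variance_standardize(values):
    """Mean-variance (z-score) standardization: (x - mean) / std.

    Raises ZeroDivisionError if all values are identical.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


print([log2_normalize(a) for a in [1, 2, 8, 1024]])  # [1, 2, 3.0, 10.0]
```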

Fig. 8 is a schematic diagram of the overall process of training the initial data processing model according to an embodiment of the present application. The data processing device determines training data from a data set used for model training and preprocesses it to obtain training samples; initializes the model parameters of the initial data processing model through a hypercomplex product parameterization strategy; performs, through the initial data processing model, outer product processing on the plurality of training features included in a training sample, so as to perform feature interaction on them and obtain the feature interaction vector corresponding to each training feature; and performs click rate estimation processing on these feature interaction vectors to obtain the training estimated click rate of the training object for the training resource. The model parameters used during this outer product processing are those obtained by initializing the model parameters of the initial data processing model through the hypercomplex product parameterization strategy.

S702, initializing the model parameters of the initial data processing model by a hypercomplex product parameterization strategy based on the dimensionality of the embedded vector corresponding to each training feature in the plurality of training features.

The embedded vector corresponding to each training feature is obtained by performing feature embedding processing on that training feature.

In one embodiment, the number of training features is N, where N is an integer greater than 1, and the model parameters of the initial data processing model include training projection matrices. The data processing device may train the model parameters of the initial data processing model on the training samples and construct the data processing model from the trained model parameters. In other words, the model parameters of the data processing model are obtained by training those of the initial data processing model, and each projection matrix of the data processing model is obtained by training the corresponding training projection matrix of the initial data processing model, so a projection matrix in the data processing model and its counterpart in the initial data processing model have similar data structures and characteristics. In a specific implementation, initializing the model parameters of the initial data processing model through the hypercomplex product parameterization strategy based on the dimensions of the embedded vectors corresponding to the training features may include: traversing the N training features; for the kth of the N training features, initializing a stitched training projection matrix in the model parameters through the hypercomplex product parameterization strategy based on the dimension of the embedded vector corresponding to the kth training feature and the dimension of the training stitched embedded vector; and splitting the stitched training projection matrix into N training projection matrices. The training stitched embedded vector is obtained by concatenating the embedded vectors corresponding to the training features, and the N training projection matrices are used to obtain the feature interaction vector corresponding to the kth training feature.

In one embodiment, $F'_{(k)}$ denotes the kth of the N training features and $F'_{(j)}$ denotes the jth training feature. $W'_{j,k}$ denotes the training projection matrix used, in the training projection space corresponding to the kth training feature, for feature interaction between the jth training feature and the kth training feature, i.e., the training projection matrix corresponding to the jth and kth training features. $v'_j$ denotes the embedded vector corresponding to the jth training feature, and $v'_k$ denotes the embedded vector corresponding to the kth training feature.

Further, the training stitched embedded vector obtained by concatenating the embedded vectors corresponding to the training features may be expressed as $v' = [v'_1, \ldots, v'_j, \ldots, v'_N]$, where $v'$ denotes the training stitched embedded vector, $v'_1$, $v'_j$ and $v'_N$ denote the embedded vectors corresponding to the 1st, jth and Nth training features, and $[\cdot]$ denotes concatenation. If the embedded vector corresponding to the jth training feature has dimension $d_j$ and the embedded vector corresponding to the kth training feature has dimension $d_k$, the training stitched embedded vector has dimension $d = \sum_{j=1}^{N} d_j$.

In one embodiment, the projection matrix corresponding to the ith and nth features to be processed in the model parameters of the data processing model is the projection matrix used, in the target projection space corresponding to the nth feature to be processed, for feature interaction between the ith and nth features to be processed. Likewise, the projection matrix corresponding to the jth and kth features to be processed is the projection matrix used, in the target projection space corresponding to the kth feature to be processed, for feature interaction between the jth and kth features to be processed; it is obtained by training the training projection matrix $W'_{j,k}$ corresponding to the jth and kth training features in the initial data processing model. As required for training, the initialized training projection matrix $W'_{j,k}$ must have dimension $d_j \times d_k$. The stitched projection matrix of the target projection space corresponding to the kth feature to be processed in the data processing model corresponds to the stitched training projection matrix of the training projection space corresponding to the kth training feature in the initial data processing model. The stitched training projection matrix is obtained by stacking the training projection matrices $W'_{1,k}, \ldots, W'_{N,k}$, i.e., the matrices used in the training projection space of the kth training feature for feature interaction between each training feature and the kth training feature. Therefore, as required for training, the initialized stitched training projection matrix must have dimension $d \times d_k$. On this basis, the data processing device may initialize the stitched training projection matrix in the model parameters through the hypercomplex product parameterization strategy based on the dimension $d_k$ of the embedded vector corresponding to the kth training feature and the dimension $d$ of the training stitched embedded vector, and then split the stitched training projection matrix into N training projection matrices, which are used to obtain the feature interaction vector corresponding to the kth training feature.

In a specific implementation, initializing the stitched training projection matrix in the model parameters through the hypercomplex product parameterization strategy, based on the dimension of the embedded vector corresponding to the kth of the N training features and the dimension of the training stitched embedded vector, may include: initializing B projection parameter matrices based on the dimension $d_k$ of the embedded vector corresponding to the kth training feature and the dimension $d$ of the training stitched embedded vector, where the bth of the B projection parameter matrices is the Kronecker product of an initialized first parameter matrix of dimension $B \times B$ and an initialized second parameter matrix of dimension $\frac{d}{B} \times \frac{d_k}{B}$; and summing the B projection parameter matrices to obtain the stitched training projection matrix. B may be set according to specific requirements, provided that both $d$ and $d_k$ are divisible by B. Optionally, the model parameters may be initialized using Xavier initialization.

If the first parameter matrix of the bth projection parameter matrix is denoted $A_b$ and the second parameter matrix of the bth projection parameter matrix is denoted $S_b$, the stitched training projection matrix may be expressed as formula 10:

$$W'_k = \sum_{b=1}^{B} A_b \otimes S_b \qquad (10)$$

where $W'_k$ denotes the stitched training projection matrix and $A_b \otimes S_b$ denotes the Kronecker product of the first and second parameter matrices of the bth projection parameter matrix. The Kronecker product of any two matrices is a block matrix: the Kronecker product of a matrix $X_1$ of dimension $x_1 \times x_2$ and a matrix $X_2$ of dimension $x_3 \times x_4$ has dimension $x_1 x_3 \times x_2 x_4$. For example, if $X_1$ is $3 \times 4$ and $X_2$ is $5 \times 6$, their Kronecker product is $15 \times 24$ (i.e., $(3 \cdot 5) \times (4 \cdot 6)$). Hence $A_b \otimes S_b$ has dimension $\left(B \cdot \frac{d}{B}\right) \times \left(B \cdot \frac{d_k}{B}\right) = d \times d_k$, which satisfies the requirement that the stitched training projection matrix have dimension $d \times d_k$. Since the stitched training projection matrix is obtained by stacking the training projection matrices corresponding to each training feature and the kth training feature, i.e.,

$$W'_k = \begin{bmatrix} W'_{1,k} \\ \vdots \\ W'_{j,k} \\ \vdots \\ W'_{N,k} \end{bmatrix},$$

it can be split according to the dimension $d_j \times d_k$ required for each training projection matrix to obtain the N training projection matrices, including the training projection matrix $W'_{j,k}$ corresponding to the jth and kth training features in the model parameters of the initial data processing model.
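The following sketch checks the Kronecker dimension rule with the 3 x 4 and 5 x 6 example from the text, and then splits a stitched d x d_k matrix row-wise into N blocks of shape d_j x d_k. The embedding dimensions and the helper name `split_stitched` are hypothetical.

```python
import numpy as np

# Kronecker dimension rule from the text: (3 x 4) kron (5 x 6) -> (15 x 24).
X1 = np.ones((3, 4))
X2 = np.ones((5, 6))
print(np.kron(X1, X2).shape)  # (15, 24)


def split_stitched(W_k, dims):
    """Split a stitched (sum(dims) x d_k) matrix row-wise into N blocks.

    Block j has shape (dims[j], d_k), i.e. d_j x d_k.
    """
    if W_k.shape[0] != sum(dims):
        raise ValueError("row count must equal d = sum of d_j")
    # Row offsets where one block ends and the next begins.
    offsets = np.cumsum(dims)[:-1]
    return np.split(W_k, offsets, axis=0)


dims = [2, 3, 5]  # hypothetical d_j for N = 3 training features
d, d_k = sum(dims), 4
W_k = np.arange(d * d_k, dtype=float).reshape(d, d_k)
blocks = split_stitched(W_k, dims)
print([blk.shape for blk in blocks])  # [(2, 4), (3, 4), (5, 4)]
```

Stacking the blocks back with `np.vstack` recovers the stitched matrix, which confirms that the split loses nothing.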

In one embodiment, initializing the model parameters of the initial data processing model through the hypercomplex product parameterization strategy allows the training projection matrices of the initial data processing model to be learned from a hypercomplex space during subsequent training, so that interactions between real and imaginary components can be learned and the vector outer product can be generalized to a higher-dimensional real space. In addition, because the Kronecker product of the first and second parameter matrices of the bth projection parameter matrix reuses the matrix parameters of both, the number of parameters that must be trained for one stitched training projection matrix is reduced from $d \times d_k$ under a conventional parameterization to $B \cdot \left(B^2 + \frac{d}{B} \cdot \frac{d_k}{B}\right) = B^3 + \frac{d \cdot d_k}{B}$, which is approximately $\frac{1}{B}$ of the conventional count when $B^3$ is small relative to $d \cdot d_k$. This reduces the number of model parameters used in training the initial data processing model and hence the number of model parameters of the resulting data processing model, which lowers the deployment cost of the data processing model, makes it easier to deploy, and saves computing resources.
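To make the parameter saving concrete, the sketch below counts trainable parameters for one stitched d x d_k matrix under a dense parameterization and under the B-term Kronecker parameterization (B first parameter matrices of size B x B plus B second parameter matrices of size d/B x d_k/B). The dimensions chosen are hypothetical.

```python
def dense_param_count(d, d_k):
    """Parameters of a directly learned d x d_k matrix."""
    return d * d_k


def kron_param_count(d, d_k, B):
    """Parameters when the d x d_k matrix is a sum of B Kronecker products.

    Each term uses one B x B matrix and one (d/B) x (d_k/B) matrix.
    """
    return B * (B * B + (d // B) * (d_k // B))


d, d_k, B = 64, 16, 4
print(dense_param_count(d, d_k), kron_param_count(d, d_k, B))  # 1024 320
```

With these values the Kronecker form needs B^3 + d * d_k / B = 64 + 256 = 320 parameters instead of 1024.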

In one embodiment, the training projection space corresponding to each training feature may include H channels; correspondingly, when the click rate of the target object for the target resource is estimated, the target projection space corresponding to each feature to be processed also includes H channels. One set of training projection matrices is maintained for each channel, and its initialization is the same as that of the single set of training projection matrices maintained for one training projection space, so it is not repeated here. When the training samples are obtained from the MovieLens data set in the movie recommendation scenario, setting H to 3 (i.e., each training projection space includes 3 channels and 3 sets of training projection matrices are maintained) achieves a better training effect.

S703, performing outer product processing on the plurality of training features through the initial data processing model to perform feature interaction on the plurality of training features and obtain the feature interaction vector corresponding to each of the plurality of training features.

The feature interaction vector corresponding to each training feature is a high-order semantic feature of that training feature.

S704, carrying out click rate estimation processing on the feature interaction vectors corresponding to the training features to obtain the training estimated click rate of the training object to the training resources.

The training estimated click rate indicates the probability that the training object clicks the training resource.

Steps S703 and S704 are similar to steps S202 and S203 and are not repeated here.

S705, training the initial data processing model based on the training estimated click rate and the sample label to obtain the data processing model.

Fig. 9 is a schematic diagram of training the initial data processing model according to an embodiment of the present application. As shown in fig. 9, the data processing device may perform outer product processing on the plurality of training features through the embedding layer and the feature interaction layer of the initial data processing model, so as to perform feature interaction on them and obtain the feature interaction vector corresponding to each training feature; perform click rate estimation processing on these feature interaction vectors through the forward fully connected layer and the normalization layer of the initial data processing model to obtain the training estimated click rate of the training object for the training resource; and train the initial data processing model based on the training estimated click rate and the sample label to obtain the data processing model.

In one embodiment, training the initial data processing model based on the training estimated click rate and the sample label to obtain the data processing model may include: determining the loss value of a loss function based on the training estimated click rate and the sample label; and training the initial data processing model in the direction of reducing the loss value to obtain the data processing model. Optionally, the model parameters of the initial data processing model may be solved with the Adam optimization algorithm, and training optimization may be performed with the error back-propagation algorithm. When the initial data processing model is trained on a plurality of training samples, the loss function may be given by formula 11, where $Z$ is the number of training samples, $z$ is the index of a training sample, $y'_z$ is the sample label of the zth of the Z training samples, and $\hat{y}'_z$ is the training estimated click rate obtained by processing the zth training sample.
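The expression of formula 11 itself is not reproduced in this text. Assuming it is the binary cross-entropy (log loss) commonly used for click rate estimation with binary labels, it could be computed as follows; the helper name and the clipping constant `eps` are added for illustration and numerical safety, not taken from the original description.

```python
import numpy as np


def log_loss(labels, preds, eps=1e-7):
    """Mean binary cross-entropy over Z samples (assumed form of the loss).

    labels: sample labels y'_z in {0, 1}.
    preds:  training estimated click rates in [0, 1].
    """
    y = np.asarray(labels, dtype=float)
    # Clip predictions away from 0 and 1 so that log() stays finite.
    p = np.clip(np.asarray(preds, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


print(round(log_loss([1, 0], [0.5, 0.5]), 4))  # 0.6931
```

In practice the gradient of such a loss would be propagated back through the model and the parameters updated with Adam, as the text describes.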

In one embodiment, the model parameters of the initial data processing model may be initialized through the hypercomplex product parameterization strategy, and the initial data processing model may then be trained on the training samples to obtain the data processing model. Optionally, the feature interaction layer of the resulting data processing model may be applied as a module to other existing click rate estimation models, which reduces the deployment cost of those models; for example, it may be applied to a Field-aware Factorization Machine (FFM) model or a Field-weighted Factorization Machine (FwFM) model.

In the embodiments of the present application, after obtaining a training sample comprising a plurality of training features and a sample label, the data processing device may initialize the model parameters of the initial data processing model through the hypercomplex product parameterization strategy based on the dimension of the embedded vector corresponding to each training feature, and then train the initial data processing model from the initialized parameters. The hypercomplex product parameterization strategy allows the training projection matrices of the initial data processing model to be learned from a hypercomplex space, so that interactions between real and imaginary components can be learned and the vector outer product can be generalized to a higher-dimensional real space. It also reduces the number of model parameters used in training and, consequently, the number of model parameters of the trained data processing model, which lowers the deployment cost of the data processing model, makes it easier to deploy, and saves computing resources.

Based on the above data processing method embodiments, the present application provides a data processing apparatus. Fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may include an obtaining unit 1001 and a processing unit 1002, and the apparatus shown in fig. 10 may operate as follows:

an obtaining unit 1001, configured to obtain a plurality of features to be processed, where the plurality of features to be processed include an object feature of a target object and a resource feature of a target resource to be pushed to the target object;

the processing unit 1002 is configured to perform outer product processing on the multiple features to be processed, so as to perform feature interaction on the multiple features to be processed, to obtain a feature interaction vector corresponding to each feature to be processed in the multiple features to be processed, where the feature interaction vector corresponding to each feature to be processed is a high-order semantic feature of each feature to be processed;

the processing unit 1002 is further configured to perform click rate estimation processing on the feature interaction vector corresponding to each feature to be processed, so as to obtain an estimated click rate of the target object on the target resource, where the estimated click rate is used to indicate: a probability of the target object clicking the target resource.

In one embodiment, the number of the plurality of features to be processed is N, where N is an integer greater than 1;

when the processing unit 1002 performs outer product processing on the multiple features to be processed to perform feature interaction on the multiple features to be processed to obtain a feature interaction vector corresponding to each feature to be processed in the multiple features to be processed, the following operations are specifically performed:

traversing the N features to be processed, and, for the nth feature to be processed among the N features to be processed, performing outer product processing on each feature to be processed and the nth feature to be processed in the target projection space corresponding to the nth feature to be processed, so as to perform feature interaction between each feature to be processed and the nth feature to be processed in that target projection space and obtain the feature interaction vector corresponding to the nth feature to be processed, where n is a positive integer less than or equal to N.

In an embodiment, when the processing unit 1002 performs an outer product processing on each feature to be processed and the nth feature to be processed in a target projection space corresponding to an nth feature to be processed in the N features to perform feature interaction on each feature to be processed and the nth feature to be processed in the target projection space, so as to obtain a feature interaction vector corresponding to the nth feature to be processed, the following operations are specifically performed:

in the target projection space, computing the outer product of the embedded vector corresponding to the ith feature to be processed and the embedded vector corresponding to the nth feature to be processed among the N features to be processed, to obtain a first processing result; the embedded vector corresponding to any feature to be processed among the N features to be processed (the ith or the nth) is obtained by performing feature embedding processing on that feature, and i is a positive integer less than or equal to N;

adjusting the first processing result to obtain a feature interaction sub-vector for performing feature interaction on the ith feature to be processed and the nth feature to be processed;

and summing the feature interaction sub-vectors of the feature interaction between each feature to be processed and the nth feature to be processed to obtain a feature interaction vector corresponding to the nth feature to be processed.
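The traversal, outer product, adjustment and summation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: embeddings are plain Python lists, features are indexed from 0, the sum includes the pair (n, n), and the placeholder adjustment `column_sum` (a hypothetical stand-in for the adjustment processing) collapses each d_i x d_n outer product to length d_n so that sub-vectors from embeddings of different sizes can be added.

```python
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]
# adjust(first_processing_result, i, n) -> feature interaction sub-vector
Adjust = Callable[[Matrix, int, int], List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Outer product u v^T, a len(u) x len(v) matrix (the first processing result)."""
    return [[ui * vj for vj in v] for ui in u]


def column_sum(m: Matrix, i: int, n: int) -> List[float]:
    """Placeholder adjustment (hypothetical): collapse a d_i x d_n matrix to length d_n."""
    return [sum(row[c] for row in m) for c in range(len(m[0]))]


def interaction_vector(embeddings: Sequence[Sequence[float]], n: int,
                       adjust: Adjust = column_sum) -> List[float]:
    """Feature interaction vector of the n-th feature (0-based index in this sketch)."""
    total: Optional[List[float]] = None
    for i, e_i in enumerate(embeddings):       # traverse all N features, including n
        first = outer(e_i, embeddings[n])      # first processing result
        sub = adjust(first, i, n)              # feature interaction sub-vector
        if total is None:
            total = list(sub)
        elif len(sub) != len(total):
            raise ValueError("all sub-vectors for feature n must share one length")
        else:
            total = [a + b for a, b in zip(total, sub)]
    if total is None:
        raise ValueError("at least one feature is required")
    return total


emb = [[1.0, 2.0], [0.5, -1.0, 3.0], [2.0, 0.0]]   # embeddings of different sizes
print(interaction_vector(emb, n=1))                 # [3.75, -7.5, 22.5]
```

Passing a different `adjust` callable swaps in another adjustment without changing the traversal and summation logic.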

In an embodiment, when the processing unit 1002 performs adjustment processing on the first processing result to obtain a feature interaction sub-vector for performing feature interaction between the ith feature to be processed and the nth feature to be processed, the following operation is specifically performed:

determining the target projection matrix corresponding to the ith and nth features to be processed, based on the correspondence among a feature to be processed, the feature to be processed that interacts with it, and the projection matrix;

determining the target scaling weight corresponding to the ith and nth features to be processed, based on the correspondence among a feature to be processed, the feature to be processed that interacts with it, and the scaling weight;

performing matrix adjustment processing on the first processing result based on the target projection matrix and the target scaling weight to obtain a second processing result;

and performing vector conversion processing on the second processing result to obtain a feature interaction sub-vector for performing feature interaction on the ith feature to be processed and the nth feature to be processed.
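The adjustment processing can be sketched with lookup tables keyed by the feature pair (i, n), standing in for the correspondence among features, projection matrices and scaling weights. The concrete form of the matrix adjustment is an assumption made for illustration: here the second processing result is the scaling weight times W^T multiplied by the outer product, with W a hypothetical d_i x r projection matrix, and vector conversion is row-major flattening.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]   # (i, n): the two interacting features


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """First processing result: the d_i x d_n outer product of e_i and e_n."""
    return [[ui * vj for vj in v] for ui in u]


def adjust(first: Matrix, i: int, n: int,
           projections: Dict[Pair, Matrix],
           scales: Dict[Pair, float]) -> List[float]:
    """Hypothetical adjustment: second = scale * W^T @ first, then flatten.

    projections[(i, n)] plays the role of the target projection matrix (d_i x r)
    and scales[(i, n)] the target scaling weight.
    """
    w = projections[(i, n)]
    alpha = scales[(i, n)]
    d_i, d_n, r = len(first), len(first[0]), len(w[0])
    if len(w) != d_i:
        raise ValueError("projection matrix must have d_i rows")
    # Matrix adjustment: second processing result, shape r x d_n.
    second = [[alpha * sum(w[k][a] * first[k][c] for k in range(d_i))
               for c in range(d_n)]
              for a in range(r)]
    # Vector conversion: row-major flattening to a sub-vector of length r * d_n.
    return [x for row in second for x in row]


first = outer([1.0, 2.0], [3.0, 4.0])                 # [[3, 4], [6, 8]]
sub = adjust(first, 0, 1,
             projections={(0, 1): [[1.0], [1.0]]},     # d_i = 2, r = 1
             scales={(0, 1): 0.5})
print(sub)                                             # [4.5, 6.0]
```

Keeping r the same for every pair (i, n) makes all sub-vectors for the nth feature the same length, so they can be summed.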

In one embodiment, the number of the plurality of features to be processed is N, and the target projection space corresponding to the nth feature to be processed among the N features to be processed contains H channels, where N is an integer greater than 1, H is a positive integer, and n is a positive integer less than or equal to N; in this case, the processing unit 1002 specifically performs the following operations:

traversing the N to-be-processed features, and performing outer product processing on each to-be-processed feature and the nth to-be-processed feature in each channel in the target projection space so as to perform feature interaction on each to-be-processed feature and the nth to-be-processed feature in each channel in the target projection space, thereby obtaining a feature interaction vector corresponding to the nth to-be-processed feature in each channel;

and combining the feature interaction vectors corresponding to the nth feature to be processed in each channel to obtain the feature interaction vector corresponding to the nth feature to be processed.
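The multi-channel variant can be sketched by computing the interaction once per channel and combining the results. Treating "combining" as concatenation, and using a scaled column sum as each channel's adjustment, are both assumptions for illustration; the helper names are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Adjust = Callable[[Matrix, int, int], List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def channel_interaction(embeddings: Sequence[Sequence[float]], n: int,
                        adjust: Adjust) -> List[float]:
    """Interaction vector of feature n inside one channel: sum of adjusted outer products."""
    total: List[float] = []
    for i, e_i in enumerate(embeddings):
        sub = adjust(outer(e_i, embeddings[n]), i, n)
        total = list(sub) if i == 0 else [a + b for a, b in zip(total, sub)]
    return total


def multi_channel_interaction(embeddings: Sequence[Sequence[float]], n: int,
                              channel_adjusts: Sequence[Adjust]) -> List[float]:
    """Compute feature n's interaction vector in each of the H channels, then concatenate."""
    combined: List[float] = []
    for adjust in channel_adjusts:          # H = len(channel_adjusts)
        combined.extend(channel_interaction(embeddings, n, adjust))
    return combined


def scaled_column_sum(alpha: float) -> Adjust:
    """Hypothetical per-channel adjustment: scaled column sums of the outer product."""
    def adjust(m: Matrix, i: int, n: int) -> List[float]:
        return [alpha * sum(row[c] for row in m) for c in range(len(m[0]))]
    return adjust


emb = [[1.0, 1.0], [2.0, -1.0]]
z = multi_channel_interaction(emb, n=0,
                              channel_adjusts=[scaled_column_sum(1.0), scaled_column_sum(2.0)])
print(z)  # [3.0, 3.0, 6.0, 6.0]
```

Because each channel has its own adjustment, different channels can capture different projections of the same outer products.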

In one embodiment, the obtaining of the estimated click rate of the target object to the target resource by performing outer product processing on the plurality of to-be-processed features is realized by a data processing model, wherein the data processing model is obtained by training an initial data processing model;

the obtaining unit 1001 is further configured to obtain a training sample, where the training sample includes a plurality of training features and a sample label; the plurality of training features includes resource features of a training resource and object features of a training object, the sample label to indicate: whether the training object clicks on the training resource;

the processing unit 1002 is further configured to initialize a model parameter of the initial data processing model through a hypercomplex product parameterization strategy based on a dimension of an embedded vector corresponding to each of the plurality of training features; the embedded vector corresponding to each training feature is obtained by performing feature embedding processing on each training feature;

the processing unit 1002 is further configured to perform outer product processing on the multiple training features through the initial data processing model, so as to perform feature interaction on the multiple training features, to obtain a feature interaction vector corresponding to each training feature in the multiple training features, where the feature interaction vector corresponding to each training feature is a high-order semantic feature of each training feature;

the processing unit 1002 is further configured to perform click rate estimation processing on the feature interaction vectors corresponding to the training features to obtain a training estimated click rate of the training object on the training resources, where the training estimated click rate is used to indicate: a probability of the training object clicking the training resource;

the processing unit 1002 is further configured to train the initial data processing model based on the training estimated click rate and the sample label, so as to obtain the data processing model.
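Training against the sample label can be illustrated with binary cross-entropy, a common choice for click/no-click labels. The application does not name its loss here, so the loss, the plain SGD update and the logistic head are assumptions. For brevity the sketch updates only the head; training the full model would also propagate gradients into the projection matrices and embeddings.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_loss(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy between the estimated click rate p and label y (1 = clicked)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train_step(z: Sequence[float], y: int, weights: Sequence[float], bias: float,
               lr: float) -> Tuple[List[float], float, float]:
    """One SGD step on a hypothetical logistic click-rate head.

    z is the concatenated feature interaction vector of one training sample.
    Returns the updated weights, the updated bias and the loss before the update.
    """
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    p = sigmoid(logit)                      # training estimated click rate
    loss = bce_loss(p, y)
    grad_logit = p - y                      # d(loss)/d(logit) for sigmoid + BCE
    new_weights = [w - lr * grad_logit * x for w, x in zip(weights, z)]
    new_bias = bias - lr * grad_logit
    return new_weights, new_bias, loss


w, b, loss0 = train_step([1.0, -1.0], 1, [0.0, 0.0], 0.0, lr=0.5)
print(w, b, loss0)   # [0.25, -0.25] 0.25 0.693...
```

The gradient of the loss with respect to the logit reduces to p - y, which is why the sigmoid and cross-entropy are usually paired.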

In one embodiment, the number of the training features is N, the model parameters of the initial data processing model include a training projection matrix, and N is an integer greater than 1;

when the processing unit 1002 initializes the model parameters of the initial data processing model by the hypercomplex product parameterization strategy based on the dimension of the embedded vector corresponding to each of the plurality of training features, the following operations are specifically performed:

traversing the N training features, and initializing a splicing training projection matrix in the model parameters through the hypercomplex product parameterization strategy, based on the dimension of the embedded vector corresponding to the kth training feature among the N training features and the dimension of the training splicing embedded vector, where the training splicing embedded vector is obtained by splicing (concatenating) the embedded vectors corresponding to the training features;

and splitting the splicing training projection matrix to obtain N training projection matrices, which are used to obtain the feature interaction vector corresponding to the kth training feature.
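The splicing and splitting steps can be sketched as follows, assuming (hypothetically) that the splicing training projection matrix for the kth feature has shape d_k x d and is split column-wise, in the same feature order as the spliced embedding vector, into N blocks of shape d_k x d_i. The function names are illustrative.

```python
from itertools import accumulate
from typing import List, Sequence

Matrix = List[List[float]]


def splice(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Training splicing embedded vector: concatenation of all N embeddings (length d)."""
    return [x for e in embeddings for x in e]


def split_projection(spliced: Matrix, dims: Sequence[int]) -> List[Matrix]:
    """Split a d_k x d spliced projection matrix column-wise into N blocks of shape d_k x d_i."""
    d = sum(dims)
    if any(len(row) != d for row in spliced):
        raise ValueError("every row must have d = sum(dims) columns")
    bounds = [0, *accumulate(dims)]        # column offset of each feature's block
    return [[row[lo:hi] for row in spliced] for lo, hi in zip(bounds, bounds[1:])]


dims = [2, 1, 3]                                                    # d_1, d_2, d_3, so d = 6
spliced = [[float(c + 6 * r) for c in range(6)] for r in range(2)]  # d_k = 2
blocks = split_projection(spliced, dims)
print([len(b[0]) for b in blocks])                                  # [2, 1, 3]
```

Splitting along the same offsets used for splicing keeps each block aligned with the embedding it multiplies.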

In one embodiment, the dimension of the embedded vector corresponding to the kth training feature is d_k, and the dimension of the training splicing embedded vector is d;

the processing unit 1002 specifically executes the following operations when initializing the splicing training projection matrix in the model parameter by a hypercomplex product parameterization strategy based on the dimension of the embedded vector corresponding to the kth training feature of the N training features and the dimension of the training splicing embedded vector:

initializing B projection parameter matrices based on the dimension d_k of the embedded vector corresponding to the kth training feature and the dimension d of the training splicing embedded vector, where the bth projection parameter matrix of the B projection parameter matrices is the Kronecker product of an initialized first parameter matrix of dimension B x B and an initialized second parameter matrix whose dimension is determined by d_k, d and B;

and summing the B projection parameter matrices to obtain the splicing training projection matrix.
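A minimal sketch of this initialization, assuming the construction used in parameterized hypercomplex multiplication (PHM) layers: each second parameter matrix has shape (d_k/B) x (d/B), so that every Kronecker product, and hence their sum, is a d_k x d matrix. The Gaussian initializer, its scale and the function names are illustrative assumptions.

```python
import random
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: (m x n) kron (p x q) -> (m*p) x (n*q)."""
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
            for r in range(len(a) * p)]


def gaussian_matrix(rows: int, cols: int, rng: random.Random, std: float = 0.1) -> Matrix:
    """Randomly initialized parameter matrix (illustrative initializer)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def init_spliced_projection(d_k: int, d: int, B: int, seed: int = 0) -> Matrix:
    """Sum of B Kronecker products A_b (B x B) kron S_b ((d_k/B) x (d/B)), giving d_k x d."""
    if d_k % B or d % B:
        raise ValueError("B must divide both d_k and d")
    rng = random.Random(seed)
    total = [[0.0] * d for _ in range(d_k)]
    for _ in range(B):
        a_b = gaussian_matrix(B, B, rng)               # first parameter matrix
        s_b = gaussian_matrix(d_k // B, d // B, rng)   # second parameter matrix (assumed shape)
        term = kron(a_b, s_b)                          # bth projection parameter matrix
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


w = init_spliced_projection(d_k=4, d=8, B=2)
print(len(w), len(w[0]))   # 4 8
```

Only the B small A_b matrices and the B S_b matrices are trainable parameters; the full d_k x d matrix is rebuilt from them.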

According to an embodiment of the present application, the steps involved in the data processing methods shown in fig. 2 and fig. 7 may be performed by the units of the data processing apparatus shown in fig. 10. For example, step S201 shown in fig. 2 may be performed by the obtaining unit 1001, and steps S202 to S203 shown in fig. 2 may be performed by the processing unit 1002. As another example, step S701 shown in fig. 7 may be performed by the obtaining unit 1001, and steps S702 to S705 shown in fig. 7 may be performed by the processing unit 1002.

According to another embodiment of the present application, the units of the data processing apparatus shown in fig. 10 may be separately or wholly combined into one or several other units, or one or more of them may be further split into several functionally smaller units; this achieves the same operations without affecting the technical effects of the embodiments of the present application. The units are divided by logical function; in practice, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the data processing apparatus may also include other units, and in practice these functions may be implemented with the assistance of other units or through the cooperation of multiple units.

According to another embodiment of the present application, the data processing apparatus shown in fig. 10 may be constructed by running a computer program (including program codes) capable of executing the steps involved in the respective methods shown in fig. 2 and fig. 7 on a general-purpose computing device such as a computer including a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and a storage element, and implementing the data processing method of the embodiment of the present application. The computer program may be embodied on a computer-readable storage medium, for example, and loaded into and executed by the above-described computing apparatus via the computer-readable storage medium.

In this embodiment of the application, after the obtaining unit 1001 obtains the plurality of features to be processed, including the resource features of the target resource and the object features of the target object, the processing unit 1002 may perform outer product processing on them to perform feature interaction and obtain a feature interaction vector corresponding to each feature to be processed, and then perform click rate estimation processing on these feature interaction vectors to obtain the estimated click rate of the target object on the target resource, where the estimated click rate indicates the probability of the target object clicking the target resource. Because the outer product processing learns high-order semantic features of the features to be processed, the accuracy of the estimated click rate of the target object on the target resource can be improved, so that information of interest can be obtained efficiently.

Based on the above data processing method embodiments and data processing apparatus embodiment, the present application further provides a data processing device. Fig. 11 is a schematic structural diagram of a data processing device according to an embodiment of the present application. The data processing device shown in fig. 11 may comprise at least a processor 1101, an input interface 1102, an output interface 1103, and a computer storage medium 1104, which may be connected by a bus or in other ways.

The computer storage medium 1104 may be stored in the memory of the data processing device and is used for storing a computer program comprising program instructions, and the processor 1101 is used for executing the program instructions stored in the computer storage medium 1104. The processor 1101 (or CPU, Central Processing Unit) is the computing core and control core of the data processing device, and is adapted to implement one or more instructions, specifically to load and execute one or more instructions so as to implement the data processing method flow or the corresponding functions.

An embodiment of the present application further provides a computer storage medium (memory), which is a memory device in the data processing device used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the terminal and may also include an extended storage medium supported by the terminal. The computer storage medium provides storage space that stores the operating system of the terminal. One or more instructions suitable for being loaded and executed by the processor 1101 are also stored in this storage space; these instructions may be one or more computer programs (including program code). The computer storage medium may be a random access memory (RAM) or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor.

In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 1101 and the input interface 1102 to implement the corresponding steps of the method in the data processing method embodiment described above with reference to fig. 2 and fig. 7, and in particular, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 1101 and the input interface 1102 to implement the following steps:

an input interface 1102, configured to obtain a plurality of features to be processed, where the plurality of features to be processed include an object feature of a target object and a resource feature of a target resource to be pushed to the target object;

the processor 1101 is configured to perform outer product processing on the multiple features to be processed, so as to perform feature interaction on the multiple features to be processed, so as to obtain a feature interaction vector corresponding to each feature to be processed in the multiple features to be processed, where the feature interaction vector corresponding to each feature to be processed is a high-order semantic feature of each feature to be processed;

the processor 1101 is further configured to perform click rate estimation processing on the feature interaction vector corresponding to each feature to be processed, so as to obtain an estimated click rate of the target object on the target resource, where the estimated click rate is used to indicate: a probability of the target object clicking the target resource.

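In one embodiment, the number of the plurality of features to be processed is N, where N is an integer greater than 1. When performing outer product processing on the plurality of features to be processed to obtain the feature interaction vector corresponding to each feature to be processed, the processor 1101 specifically traverses the N features to be processed and, for the nth feature to be processed, performs outer product processing on each feature to be processed and the nth feature to be processed in the target projection space corresponding to the nth feature to be processed, in the same manner as described above for the processing unit 1002.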

In an embodiment, when performing this outer product processing in the target projection space corresponding to the nth feature to be processed, the processor 1101 specifically computes, for each ith feature to be processed, the outer product of the embedded vectors corresponding to the ith and nth features to be processed to obtain a first processing result, adjusts the first processing result to obtain a feature interaction sub-vector, and sums the feature interaction sub-vectors to obtain the feature interaction vector corresponding to the nth feature to be processed.

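In an embodiment, when adjusting the first processing result to obtain the feature interaction sub-vector for the ith and nth features to be processed, the processor 1101 specifically determines the target projection matrix and the target scaling weight corresponding to the ith and nth features to be processed, performs matrix adjustment processing on the first processing result based on them to obtain a second processing result, and performs vector conversion processing on the second processing result to obtain the feature interaction sub-vector.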

the input interface 1102 is further configured to obtain a training sample, where the training sample includes a plurality of training features and a sample label; the plurality of training features includes resource features of a training resource and object features of a training object, the sample label to indicate: whether the training object clicks on the training resource;

the processor 1101 is further configured to initialize a model parameter of the initial data processing model through a hypercomplex product parameterization strategy based on a dimension of an embedded vector corresponding to each of the plurality of training features; the embedded vector corresponding to each training feature is obtained by performing feature embedding processing on each training feature;

the processor 1101 is further configured to perform outer product processing on the plurality of training features through the initial data processing model to perform feature interaction on the plurality of training features, so as to obtain a feature interaction vector corresponding to each training feature in the plurality of training features, where the feature interaction vector corresponding to each training feature is a high-order semantic feature of each training feature;

the processor 1101 is further configured to perform click rate estimation processing on the feature interaction vectors corresponding to the training features to obtain a training estimated click rate of the training object on the training resources, where the training estimated click rate is used to indicate: a probability of the training object clicking the training resource;

the processor 1101 is further configured to train the initial data processing model based on the training estimated click rate and the sample label, so as to obtain the data processing model.

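In one embodiment, the number of the plurality of training features is N, N being an integer greater than 1, and the model parameters of the initial data processing model include a training projection matrix. When initializing the model parameters through the hypercomplex product parameterization strategy, the processor 1101 specifically traverses the N training features, initializes a splicing training projection matrix based on the dimension of the embedded vector corresponding to the kth training feature and the dimension of the training splicing embedded vector, and splits the splicing training projection matrix into N training projection matrices used to obtain the feature interaction vector corresponding to the kth training feature.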

In one embodiment, the dimension of the embedded vector corresponding to the kth training feature is d_k and the dimension of the training splicing embedded vector is d. When initializing the splicing training projection matrix, the processor 1101 specifically initializes B projection parameter matrices based on d_k and d, the bth of which is the Kronecker product of an initialized first parameter matrix of dimension B x B and an initialized second parameter matrix whose dimension is determined by d_k, d and B, and sums the B projection parameter matrices to obtain the splicing training projection matrix.

Embodiments of the present application provide a computer program product or a computer program, the computer program product comprising a computer program, the computer program being stored in a computer storage medium; the processor of the data processing device reads the computer program from the computer storage medium, and the processor executes the computer program, so that the data processing device executes the method embodiments as shown in fig. 2 and fig. 7. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above descriptions are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data processing method, comprising:

2. The method of claim 1, wherein the number of the plurality of features to be processed is N, N being an integer greater than 1;

the performing outer product processing on the plurality of features to be processed to perform feature interaction on the plurality of features to be processed to obtain a feature interaction vector corresponding to each feature to be processed in the plurality of features to be processed includes:

3. The method according to claim 2, wherein the performing outer product processing on each feature to be processed and the nth feature to be processed in a target projection space corresponding to an nth feature to be processed in the N features to be processed to perform feature interaction on each feature to be processed and the nth feature to be processed in the target projection space to obtain a feature interaction vector corresponding to the nth feature to be processed includes:

4. The method according to claim 3, wherein the adjusting the first processing result to obtain a feature interaction sub-vector for performing feature interaction between the ith feature to be processed and the nth feature to be processed comprises:

5. The method according to any one of claims 1 to 4, wherein the number of the plurality of features to be processed is N, the target projection space corresponding to the nth feature to be processed among the N features to be processed contains H channels, N is an integer greater than 1, H is a positive integer, and n is a positive integer less than or equal to N;

6. The method of claim 1, wherein the obtaining of the estimated click rate of the target object to the target resource by performing outer product processing on the plurality of features to be processed is performed by a data processing model, wherein the data processing model is obtained by training an initial data processing model;

the method further comprises the following steps:

obtaining a training sample, wherein the training sample comprises a plurality of training features and a sample label; the plurality of training features includes resource features of a training resource and object features of a training object, the sample label to indicate: whether the training object clicks on the training resource;

initializing the model parameters of the initial data processing model by a hypercomplex product parameterization strategy based on the dimensionality of the embedded vector corresponding to each training feature in the training features; the embedded vector corresponding to each training feature is obtained by performing feature embedding processing on each training feature;

performing outer product processing on the training features through the initial data processing model to perform feature interaction on the training features to obtain feature interaction vectors corresponding to the training features in the training features, wherein the feature interaction vectors corresponding to the training features are high-order semantic features of the training features;

performing click rate estimation processing on the feature interaction vectors corresponding to the training features to obtain a training estimated click rate of the training object on the training resources, wherein the training estimated click rate is used for indicating: a probability of the training object clicking the training resource;

and training the initial data processing model based on the training estimated click rate and the sample label to obtain the data processing model.

7. The method of claim 6, wherein the number of the plurality of training features is N, the model parameters of the initial data processing model include a training projection matrix, N is an integer greater than 1;

initializing the model parameters of the initial data processing model by a hypercomplex product parameterization strategy based on the dimensionality of the embedded vector corresponding to each of the plurality of training features, comprising:

8. The method of claim 7, wherein the dimension of the embedded vector corresponding to the kth training feature is d_k, and the dimension of the training splicing embedded vector is d;

initializing a splicing training projection matrix in the model parameters by a hypercomplex product parameterization strategy based on the dimension of the embedding vector corresponding to the kth training feature in the N training features and the dimension of the training splicing embedding vector, and the method comprises the following steps:

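initializing B projection parameter matrices based on the dimension d_k of the embedded vector corresponding to the kth training feature and the dimension d of the training splicing embedded vector, wherein the bth projection parameter matrix of the B projection parameter matrices is the Kronecker product of an initialized first parameter matrix of dimension B x B and an initialized second parameter matrix whose dimension is determined by d_k, d and B; and summing the B projection parameter matrices to obtain the splicing training projection matrix.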

9. A data processing apparatus, comprising:

10. A data processing device, characterized in that the data processing device comprises an input interface and an output interface, and further comprises:

a processor adapted to implement one or more instructions; and

a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the data processing method according to any one of claims 1-8.