CN111460302B - Data processing method, device, electronic equipment and computer readable storage medium - Google Patents

Data processing method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN111460302B
CN111460302B CN202010247366.4A
Authority
CN
China
Prior art keywords
vector
layer
text
text semantic
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010247366.4A
Other languages
Chinese (zh)
Other versions
CN111460302A (en)
Inventor
周瑜
赵彬杰
臧云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lazas Network Technology Shanghai Co Ltd
Original Assignee
Lazas Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazas Network Technology Shanghai Co Ltd filed Critical Lazas Network Technology Shanghai Co Ltd
Priority to CN202010247366.4A priority Critical patent/CN111460302B/en
Publication of CN111460302A publication Critical patent/CN111460302A/en
Application granted granted Critical
Publication of CN111460302B publication Critical patent/CN111460302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the disclosure discloses a data processing method, a device, electronic equipment and a computer readable storage medium. The data processing method comprises: acquiring input text data of a user and N pieces of historical behavior text data, wherein the historical behavior text data correspond to historical behaviors of the user and N is an integer greater than or equal to 1; acquiring, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors respectively corresponding to the N historical behavior text data, by using a first model; obtaining, by the processor, N third text semantic vectors C_i (i=1~N) according to the N second text semantic vectors, by using one second model or a plurality of second models connected in series; and obtaining, by the processor, the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i, so that the obtained interest vector of the user can more accurately express the current interest of the user.

Description

Data processing method, device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a computer readable storage medium.
Background
With the development of internet technology, internet platforms use recommendation models to determine content that may be of interest to a user and recommend that content to the user. In a recommendation model, user interests are characterized by the interest features of a user, and introducing these interest features into the recommendation model can enhance the degree of distinction between different samples, thereby improving the accuracy of the recommendation model. In the prior art, the interest features of a user are generally determined based on the user's historical behaviors. However, user behaviors are influenced by the time and place of the behavior, the application program in use, the exposed content and other factors; that is, historical behaviors often cannot accurately reflect the user's current preference. How to acquire effective user interest features based on user behaviors has therefore become an urgent problem to be solved.
Disclosure of Invention
To solve the problems in the related art, embodiments of the present disclosure provide a data processing method, apparatus, electronic device, and computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
acquiring input text data of a user and N historical behavior text data, wherein the historical behavior text data corresponds to the historical behavior of the user, and N is an integer greater than or equal to 1;
acquiring, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
obtaining, by a processor, N third text semantic vectors C_i (i=1~N) according to the N second text semantic vectors, by using one second model or a plurality of second models connected in series;
obtaining, by a processor, the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i.
With reference to the first aspect, in a first implementation manner of the first aspect, the acquiring input text data and N pieces of historical behavioral text data of a user includes:
acquiring the input text data of the user;
based on the input text data, the N historical behavioral text data are determined.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the determining the N pieces of historical behavioral text data based on the input text data includes:
Acquiring M candidate historical behavior text data of the user in a preset historical time period, wherein M is an integer greater than or equal to N;
and determining, according to the correlation between the input text data and the M candidate historical behavior text data, N candidate historical behavior text data among the M candidate historical behavior text data as the N historical behavior text data.
With reference to the first aspect, in a third implementation manner of the first aspect, the first model includes any one of the following models: word2vector model, item2vector model, BERT model.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the second model includes a first linear layer, a first multi-head attention layer, a first residual normalization layer, a first feedforward neural network layer, and a second residual normalization layer that are sequentially connected along an input-to-output direction of the second model, where the first multi-head attention layer includes a first sub-linear layer, a dot product attention layer, a first stitching layer, and a second sub-linear layer that are sequentially connected along the input-to-output direction of the first multi-head attention layer.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the dot product attention layer includes a first dot product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot product layer sequentially connected along an input-to-output direction of the dot product attention layer, where the first dot product layer is configured to perform a dot product operation on a first key vector and a first query vector output by the first sub-linear layer, and the second dot product layer is configured to perform a dot product operation on an output result of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
With reference to the first aspect, in a sixth implementation manner of the first aspect, the second model includes a second linear layer, a second multi-head attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer that are sequentially connected along an input-to-output direction of the second model, where the second multi-head attention layer includes a third sub-linear layer, a cosine operation attention layer, a second stitching layer, and a fourth sub-linear layer that are sequentially connected along the input-to-output direction of the second multi-head attention layer.
With reference to the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the cosine operation attention layer includes a cosine operation layer, a second mask layer, a second Softmax function activation layer, and a third dot product layer sequentially connected along an input-to-output direction of the cosine operation attention layer, where the cosine operation layer is configured to perform a cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and the third dot product layer is configured to perform a dot product operation on an output result of the second Softmax function activation layer and a second value vector output by the third sub-linear layer.
With reference to the first aspect, in an eighth implementation manner of the first aspect, the obtaining the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i comprises the following steps:
respectively acquiring, by using a third model, N correlation degrees w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the N correlation degrees w_i and the N third text semantic vectors C_i.
With reference to the eighth implementation manner of the first aspect, in a ninth implementation manner of the first aspect, the obtaining the interest vector of the user based on the N correlation degrees w_i and the N third text semantic vectors C_i comprises the following steps:
determining the correlation degree w_i as the weight corresponding to the third text semantic vector C_i;
determining a weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
With reference to the first aspect, in a tenth implementation manner of the first aspect, the obtaining the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i comprises the following steps:
respectively acquiring, by using a third model, N correlation degrees w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the first text semantic vector, the N correlation degrees w_i and the N third text semantic vectors C_i.
With reference to the tenth implementation manner of the first aspect, in an eleventh implementation manner of the first aspect, the obtaining the interest vector of the user based on the first text semantic vector, the N correlation degrees w_i and the N third text semantic vectors C_i comprises the following steps:
determining the correlation degree w_i as the weight corresponding to the third text semantic vector C_i;
determining a weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
and carrying out nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
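Purely as an illustration, the following PyTorch-style sketch shows how the steps of this implementation manner could be wired together; the module name InterestFusion, the hidden size and the tensor shapes are assumptions rather than part of the disclosure.

```python
import torch
import torch.nn as nn

class InterestFusion(nn.Module):
    """Sketch of the weighted sum, splicing, linearization and Tanh steps (assumed shapes)."""
    def __init__(self, dim: int):
        super().__init__()
        # Linearization of the first spliced vector (2*dim -> dim); the output size is an assumption.
        self.linear = nn.Linear(2 * dim, dim)

    def forward(self, first_vec, third_vecs, correlations):
        # first_vec:    (dim,)   first text semantic vector
        # third_vecs:   (N, dim) third text semantic vectors C_i
        # correlations: (N,)     correlation degrees w_i used as weights
        intermediate = (correlations.unsqueeze(-1) * third_vecs).sum(dim=0)  # intermediate interest vector
        spliced = torch.cat([first_vec, intermediate], dim=-1)               # first spliced vector
        return torch.tanh(self.linear(spliced))                              # interest vector of the user

fusion = InterestFusion(dim=8)
interest = fusion(torch.randn(8), torch.randn(5, 8), torch.softmax(torch.randn(5), dim=0))
```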
With reference to the eighth implementation manner or the tenth implementation manner of the first aspect, in a twelfth implementation manner of the first aspect, the third model comprises a difference processing layer, a third splicing layer, an activation layer and a third linear layer that are sequentially connected along an input-to-output direction of the third model, and the respectively acquiring, by using the third model, the N correlation degrees w_i between the first text semantic vector and the N third text semantic vectors C_i comprises:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer, which reduces the dimension of the activation vector to 1, to obtain a dimension-reduced value;
determining the dimension-reduced value as the correlation degree w_i between the first text semantic vector and the third text semantic vector C_i.
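A non-authoritative sketch of this twelfth implementation manner is given below; the use of ReLU as the activation layer and the toy dimension are assumptions, since the disclosure does not fix them.

```python
import torch
import torch.nn as nn

class CorrelationModel(nn.Module):
    """Sketch of the third model: difference processing, splicing, activation, linear layer to dimension 1."""
    def __init__(self, dim: int):
        super().__init__()
        self.activation = nn.ReLU()          # activation layer (assumed to be ReLU)
        self.linear = nn.Linear(3 * dim, 1)  # third linear layer, reduces the activation vector to 1 dimension

    def forward(self, first_vec, third_vec):
        diff = first_vec - third_vec                               # difference vector
        spliced = torch.cat([first_vec, third_vec, diff], dim=-1)  # second spliced vector
        activated = self.activation(spliced)                       # activation vector
        return self.linear(activated).squeeze(-1)                  # dimension-reduced value, i.e. w_i

model = CorrelationModel(dim=8)
w_i = model(torch.randn(8), torch.randn(8))
```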
In a second aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
the system comprises a first acquisition module, a second acquisition module and a first processing module, wherein the first acquisition module is configured to acquire input text data of a user and N pieces of historical behavior text data, the historical behavior text data correspond to historical behaviors of the user, and N is an integer greater than or equal to 1;
a second acquisition module configured to acquire, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
a third acquisition module configured to obtain, by the processor, N third text semantic vectors C_i (i=1~N) according to the N second text semantic vectors, by using one second model or a plurality of second models connected in series;
a fourth acquisition module configured to obtain, by the processor, the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i.
With reference to the second aspect, in a first implementation manner of the second aspect, the acquiring input text data and N pieces of historical behavioral text data of the user includes:
acquiring the input text data of the user;
based on the input text data, the N historical behavioral text data are determined.
With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the determining the N pieces of historical behavioral text data based on the input text data includes:
acquiring M candidate historical behavior text data of the user in a preset historical time period, wherein M is an integer greater than or equal to N;
and determining, according to the correlation between the input text data and the M candidate historical behavior text data, N candidate historical behavior text data among the M candidate historical behavior text data as the N historical behavior text data.
With reference to the second aspect, in a third implementation manner of the second aspect, the first model includes any one of the following models: word2vector model, item2vector model, BERT model.
With reference to the second aspect, in a fourth implementation manner of the second aspect, the second model includes a first linear layer, a first multi-head attention layer, a first residual normalization layer, a first feedforward neural network layer, and a second residual normalization layer that are sequentially connected along an input-to-output direction of the second model, where the first multi-head attention layer includes a first sub-linear layer, a dot product attention layer, a first stitching layer, and a second sub-linear layer that are sequentially connected along the input-to-output direction of the first multi-head attention layer.
With reference to the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the dot-product attention layer includes a first dot-product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot-product layer sequentially connected along an input-to-output direction of the dot-product attention layer, where the first dot-product layer is configured to perform a dot-product operation on a first key vector and a first query vector output by the first sub-linear layer, and the second dot-product layer is configured to perform a dot-product operation on an output result of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
With reference to the second aspect, in a sixth implementation manner of the second aspect, the second model includes a second linear layer, a second multi-head attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer that are sequentially connected along an input-to-output direction of the second model, where the second multi-head attention layer includes a third sub-linear layer, a cosine operation attention layer, a second stitching layer, and a fourth sub-linear layer that are sequentially connected along the input-to-output direction of the second multi-head attention layer.
With reference to the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the cosine operation attention layer includes a cosine operation layer, a second mask layer, a second Softmax function activation layer, and a third dot-product layer sequentially connected along an input-to-output direction of the cosine operation attention layer, where the cosine operation layer is configured to perform a cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and the third dot-product layer is configured to perform a dot-product operation on an output result of the second Softmax function activation layer and a second value vector output by the third sub-linear layer.
With reference to the second aspect, in an eighth implementation manner of the second aspect, the obtaining the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i comprises the following steps:
respectively acquiring, by using a third model, N correlation degrees w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the N correlation degrees w_i and the N third text semantic vectors C_i.
With reference to the eighth implementation manner of the second aspect, in a ninth implementation manner of the second aspect, the obtaining the interest vector of the user based on the N correlation degrees w_i and the N third text semantic vectors C_i comprises the following steps:
determining the correlation degree w_i as the weight corresponding to the third text semantic vector C_i;
determining a weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
With reference to the second aspect, in a tenth implementation manner of the second aspect, the obtaining the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i comprises the following steps:
respectively acquiring, by using a third model, N correlation degrees w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the first text semantic vector, the N correlation degrees w_i and the N third text semantic vectors C_i.
With reference to the tenth implementation manner of the second aspect, in an eleventh implementation manner of the second aspect, the obtaining the interest vector of the user based on the first text semantic vector, the N correlation degrees w_i and the N third text semantic vectors C_i comprises the following steps:
determining the correlation degree w_i as the weight corresponding to the third text semantic vector C_i;
determining a weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
and carrying out nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
With reference to the eighth implementation manner or the tenth implementation manner of the second aspect, in a twelfth implementation manner of the second aspect, the third model comprises a difference processing layer, a third splicing layer, an activation layer and a third linear layer that are sequentially connected along an input-to-output direction of the third model, and the respectively acquiring, by using the third model, the N correlation degrees w_i between the first text semantic vector and the N third text semantic vectors C_i comprises:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer, which reduces the dimension of the activation vector to 1, to obtain a dimension-reduced value;
determining the dimension-reduced value as the correlation degree w_i between the first text semantic vector and the third text semantic vector C_i.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to the first aspect or any one of the first to twelfth implementation manners of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the method according to the first aspect or any one of the first to twelfth implementation manners of the first aspect.
According to the technical scheme provided by the embodiment of the disclosure, input text data of a user and N (N is an integer greater than or equal to 1) pieces of historical behavior text data corresponding to historical behaviors of the user are acquired; a first text semantic vector corresponding to the input text data and N second text semantic vectors respectively corresponding to the N historical behavior text data are acquired by a processor using a first model; N third text semantic vectors C_i (i=1~N) are obtained by the processor according to the N second text semantic vectors using one second model or a plurality of second models connected in series; and the interest vector of the user is obtained by the processor according to the first text semantic vector and the N third text semantic vectors C_i. The acquired interest vector of the user can thus more accurately express the current interest of the user, and when the interest vector is applied to a recommendation system, the degree of distinction and the accuracy of the recommendation model can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart for obtaining input text data and N historical behavioral text data of a user according to an embodiment of the disclosure;
FIG. 3 illustrates a flowchart for determining the N historical behavioral text data based on the input text data according to an embodiment of the disclosure;
FIG. 4 shows a schematic structural diagram of a second model according to an embodiment of the present disclosure;
FIG. 5 shows a schematic structural diagram of a second model according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of obtaining, by a processor, the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i according to an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of obtaining the interest vector of the user based on the N correlation degrees w_i and the N third text semantic vectors C_i according to an embodiment of the disclosure;
FIG. 8 shows a flowchart of obtaining, by a processor, the interest vector of the user according to the first text semantic vector and the N third text semantic vectors C_i according to an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of obtaining the interest vector of the user based on the first text semantic vector, the N correlation degrees w_i and the N third text semantic vectors C_i according to an embodiment of the present disclosure;
FIG. 10 shows a schematic structural view of a third model according to an embodiment of the present disclosure;
FIG. 11 illustrates an application scenario diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 12 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
fig. 13 shows a block diagram of an electronic device according to an embodiment of the disclosure;
fig. 14 shows a schematic diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In this disclosure, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, portions, or combinations thereof are present or added.
The user data obtained in the present disclosure is either authorized, confirmed, or actively selected by the user. In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The inventor finds that, when the prior art obtains user interest features based on user behaviors, session-based recommendation (Session-Based Recommendation) is generally adopted to establish a user interest model. For example, the Deep Interest Network (DIN) introduces an attention mechanism for the first time, obtains the correlation between historical behavior commodities and a candidate advertisement commodity through an interest activation module, and uses this correlation as an interest weight to represent the user interest features.
In the prior art, when the interest features of the user are acquired, the correlation between a plurality of candidate advertisement commodities and the commodities related to the historical behaviors needs to be calculated. Because the number of candidate advertisement commodities is large, the calculation amount is large, which affects the overall calculation efficiency. Meanwhile, because the historical behaviors of the user may be very complicated, the user interest features obtained in this way often cannot accurately reflect the current interest of the user.
The present disclosure has been made to solve the problems in the prior art as found by the inventors.
Fig. 1 shows a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the data processing method includes the following steps S101 to S104:
in step S101, input text data of a user and N pieces of historical behavior text data are obtained, wherein the historical behavior text data corresponds to a historical behavior of the user, and N is an integer greater than or equal to 1;
in step S102, acquiring, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
in step S103, N third text semantic vectors C_i (i=1~N) are obtained by the processor according to the N second text semantic vectors, by using one second model or a plurality of second models connected in series;
in step S104, the interest vector of the user is obtained by the processor according to the first text semantic vector and the N third text semantic vectors C_i.
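For orientation only, the sketch below wires steps S101 to S104 together on toy data; every function is a hypothetical stand-in (the toy embedding, the identity second model and the softmax-weighted fusion are assumptions), not the method's actual components.

```python
import torch

def first_model(text: str) -> torch.Tensor:
    # Toy deterministic embedding standing in for the first model (Word2vector/BERT, etc.).
    torch.manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(8)

def second_model_stack(second_vecs):
    # Identity stand-in for one second model or several second models connected in series.
    return torch.stack(second_vecs)

def combine(first_vec, third_vecs):
    # Stand-in for step S104: weight the C_i by a correlation with the first vector and sum.
    weights = torch.softmax(third_vecs @ first_vec, dim=0)
    return (weights.unsqueeze(-1) * third_vecs).sum(dim=0)

input_text = "dress"                                     # S101: input text data of the user
history_texts = ["trousers", "base-layer shirt"]         # S101: N historical behavior text data
first_vec = first_model(input_text)                      # S102: first text semantic vector
second_vecs = [first_model(t) for t in history_texts]    # S102: N second text semantic vectors
third_vecs = second_model_stack(second_vecs)             # S103: N third text semantic vectors C_i
interest_vec = combine(first_vec, third_vecs)            # S104: interest vector of the user
```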
According to the embodiment of the disclosure, input text data of a user may be acquired, wherein the input text data may be a query sentence (query) input by the user through a terminal device, or may be any text input by the user. In general, text entered by a user can reflect aspects of the user's current interest. N (N is more than or equal to 1 and is an integer) historical behavior text data of the user can be obtained, wherein the historical behavior text data corresponds to the historical behavior of the user. For example, in the e-commerce field, the input text data may include names of goods, and the historical behavior text data may include names of goods that a user clicks, purchases, or collects on the e-commerce platform; in the take-away field, the input text data may include names of dishes, and the historical behavioral text data may include names of dishes that the user clicks, purchases, or collects at the take-away platform. The method for acquiring the input text data and the historical behavior text data is not particularly limited, and can be selected according to actual needs.
According to an embodiment of the present disclosure, in order to acquire the semantic features of the input text data and the N historical behavior text data, a first model may be used to acquire a first text semantic vector corresponding to the input text data and N second text semantic vectors respectively corresponding to the N historical behavior text data, so that the semantic features of the input text data and the N historical behavior text data are introduced through the first model. The present disclosure does not specifically limit the first model; any model that can convert text data into an embedding vector falls within the scope of the embodiments of the present disclosure, such as a Word2vector model, an Item2vector model, or a BERT (Bidirectional Encoder Representations from Transformers) model.
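As a minimal sketch of what the first model could look like, the snippet below mean-pools token embeddings in the spirit of a Word2vector-style model; the toy vocabulary, the whitespace tokenization and the embedding size are assumptions, and a pretrained Item2vector or BERT model could equally be used instead.

```python
import torch
import torch.nn as nn

class ToyTextEncoder(nn.Module):
    """Word2vector-style first model: token embeddings mean-pooled into one text semantic vector."""
    def __init__(self, vocab, dim: int = 8):
        super().__init__()
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.embedding = nn.Embedding(len(self.vocab), dim)

    def forward(self, text: str) -> torch.Tensor:
        ids = torch.tensor([self.vocab[tok] for tok in text.split() if tok in self.vocab])
        return self.embedding(ids).mean(dim=0)  # text semantic vector for the whole text

encoder = ToyTextEncoder(vocab=["dress", "trousers", "base-layer", "shirt"])
first_text_semantic_vector = encoder("dress")
```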
According to the embodiment of the present disclosure, one second model or a plurality of second models connected in series may be used to obtain N third text semantic vectors C_i (i=1~N) based on the N second text semantic vectors respectively corresponding to the N historical behavior text data, by learning the internal association relations among the N second text semantic vectors. The N third text semantic vectors C_i can thus embody the internal association relationship between the N second text semantic vectors, namely the N historical behavior text data. The second model may be a trained deep learning model, which is not specifically limited by the present disclosure and may be selected according to actual needs.
According to the embodiment of the disclosure, since the current input text data of the user can embody the current intention of the user to a certain extent, the interest vector of the user can be obtained based on the first text semantic vector corresponding to the input text data and the N third text semantic vectors C_i, which embody the internal association relationship among the N historical behavior text data reflecting the historical preference of the user, so that the obtained interest vector of the user can more accurately express the current interest of the user.
According to the technical scheme provided by the embodiment of the disclosure, input text data of a user and N (N is an integer greater than or equal to 1) pieces of historical behavior text data corresponding to historical behaviors of the user are acquired; a first text semantic vector corresponding to the input text data and N second text semantic vectors respectively corresponding to the N historical behavior text data are acquired by a processor using a first model; N third text semantic vectors C_i (i=1~N) are obtained by the processor according to the N second text semantic vectors using one second model or a plurality of second models connected in series; and the interest vector of the user is obtained by the processor according to the first text semantic vector and the N third text semantic vectors C_i. The acquired interest vector of the user can thus more accurately express the current interest of the user, and when the interest vector is applied to a recommendation system, the degree of distinction and the accuracy of the recommendation model can be improved.
Fig. 2 illustrates a flowchart for obtaining input text data and N historical behavioral text data of a user according to an embodiment of the present disclosure. As shown in fig. 2, the step S101, that is, obtaining the input text data and N pieces of historical behavioral text data of the user, includes the following steps S201 to S202:
in step S201, the input text data of the user is acquired;
in step S202, the N pieces of history behavioral text data are determined based on the input text data.
Since the historical behaviors of the user are very rich and may involve various objects of different behavior types, considering all historical behaviors would cause a huge amount of calculation and would also introduce interference information, so that the finally obtained interest vector of the user could not accurately represent the current preference of the user.
According to the embodiment of the disclosure, after the input text data of the user is obtained, since the input text data can reflect the current intention of the user to a certain extent, the N historical behavior text data of the user can be determined based on the obtained input text data, so that the obtained N historical behavior text data have a certain association relationship with the input text data, that is, with the current intention of the user.
For example, in the e-commerce field, assume that the user's current input text data is "dress" and that the user's historical behaviors include: clicking on oranges, purchasing trousers, purchasing a base-layer shirt, collecting beef jerky, and the like. Because the dress, the trousers and the base-layer shirt all belong to the clothing category, the trousers and the base-layer shirt can be used as historical behavior text data; the oranges and the beef jerky do not belong to the clothing category, so they are not used as historical behavior text data. The obtained historical behavior text data thus have a certain association relationship with the current intention of the user.
FIG. 3 illustrates a flow chart for determining the N historical behavioral text data based on the input text data according to an embodiment of the disclosure. As shown in fig. 3, the step S202, that is, determining the N pieces of history behavioral text data based on the input text data, includes the following steps S301 to S302:
in step S301, M candidate historical behavior text data of the user in a preset historical time period are obtained, where M is an integer greater than or equal to N;
In step S302, N candidate historical behavior text data in the M candidate historical behavior text data are determined as the N historical behavior text data according to the relevance between the input text data and the M candidate historical behavior text data.
According to the embodiment of the disclosure, M (M is an integer greater than or equal to N) candidate historical behavior text data of the user within a preset historical time period can be obtained, wherein the preset historical time period can be determined according to actual needs, for example a preset length of time going back from the current moment; the present disclosure does not specifically limit this. The candidate historical behavior text data may include all objects on which the user performed behaviors of different types.
According to the embodiment of the disclosure, in order to determine the association relationship between the input text data and the M candidate historical behavior text data, the correlation between the input text data and each of the M candidate historical behavior text data may be calculated. Since a higher correlation between the input text data and a candidate historical behavior text data indicates a stronger association with the current intention of the user, the N candidate historical behavior text data having the highest correlation with the input text data can be determined, among the M candidate historical behavior text data, as the N historical behavior text data.
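The sketch below shows one plausible way to carry out this selection, using cosine similarity between text semantic vectors as the correlation measure; the disclosure does not mandate a particular correlation metric, so this choice is an assumption.

```python
import torch
import torch.nn.functional as F

def select_top_n(input_vec: torch.Tensor, candidate_vecs: torch.Tensor, n: int):
    """Pick the N of M candidate historical behavior texts most correlated with the input text.

    input_vec:      (dim,)   semantic vector of the input text data
    candidate_vecs: (M, dim) semantic vectors of the M candidate historical behavior texts
    """
    scores = F.cosine_similarity(candidate_vecs, input_vec.unsqueeze(0), dim=-1)  # (M,) correlations
    top = torch.topk(scores, k=n)
    return top.indices, top.values  # indices of the N selected candidates and their correlations

indices, correlations = select_top_n(torch.randn(8), torch.randn(6, 8), n=3)
```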
According to an embodiment of the disclosure, the second model comprises a first linear layer, a first multi-headed attention layer, a first residual normalization layer, a first feedforward neural network layer and a second residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the first multi-headed attention layer comprises a first sub-linear layer, a dot product attention layer, a first splice layer and a second sub-linear layer connected in sequence along the input-to-output direction of the first multi-headed attention layer.
Fig. 4 shows a schematic structural diagram of a second model according to an embodiment of the present disclosure. As shown in fig. 4, the embodiment of the disclosure will be described by taking M (M is greater than or equal to 1 and is an integer) second models connected in series as an example, and the specific value of M is not specifically limited in the disclosure, and may be selected according to actual needs. The second model includes a first Linear layer 401 (Linear), a first Multi-Head Attention layer 402 (Multi-Head Attention), a first residual normalization layer 403 (Add & Norm), a first Feed Forward neural network layer 404 (Feed Forward), and a second residual normalization layer 405, which are sequentially connected in the input-to-output direction of the second model. The first multi-head attention layer 402 includes a first sub-linear layer 402A, a Dot product attention layer 402B (Scaled Dot-Product Attention), a first splicing layer (splice) 402C, and a second sub-linear layer 402D, which are sequentially connected along the input-to-output direction of the first multi-head attention layer 402.
According to an embodiment of the disclosure, the dot product attention layer includes a first dot product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot product layer sequentially connected along an input-to-output direction of the dot product attention layer, wherein the first dot product layer is configured to perform dot product operation on a first key vector and a first query vector output by the first sub-linear layer, and the second dot product layer is configured to perform dot product operation on an output of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
As shown in fig. 4, the dot-product attention layer 402B includes a first dot-product layer (Matmul), a scaling layer (Scale), a first Mask layer (Mask), a first Softmax function activation layer, and a second dot-product layer, which are sequentially connected in an input-to-output direction of the dot-product attention layer 402B.
According to embodiments of the present disclosure, the N second text semantic vectors may be input to the second model separately or together. Assume that the vectors input to the second model are B_i (i=1~N): when one second model is used, i.e. M=1, B_i is the i-th second text semantic vector; when M second models connected in series are used, i.e. M>1, the B_i input to the first second model is the i-th second text semantic vector, and the B_i input to each of the remaining second models is the i-th vector output by the previous second model. Under the action of the first linear layer 401, B_i is converted into a key vector K_i, a query vector Q_i and a value vector V_i. After the key vector K_i, the query vector Q_i and the value vector V_i are input into the first multi-head attention layer 402 together, a first key vector K_1i, a first query vector Q_1i and a first value vector V_1i are output respectively under the action of the first sub-linear layer 402A.
After the first key vector K_1i, the first query vector Q_1i and the first value vector V_1i are input into the dot product attention layer 402B, the first dot product layer performs a dot product operation on the first key vector K_1i output by the first sub-linear layer 402A and the N first query vectors Q_1j (j=1~N), that is, the first key vector K_1i is subjected to a dot product operation with each of the N first query vectors Q_1j corresponding to the N vectors B_j, so as to obtain a first dot product operation result D_i of the vector B_i; in this way the vector B_i can be associated with the other (N-1) vectors B_j (j=1~N, j≠i). Alternatively, after the first key vector K_1i, the first query vector Q_1i and the first value vector V_1i enter the dot product attention layer 402B, the first dot product layer performs a dot product operation on the first query vector Q_1i output by the first sub-linear layer 402A and the N first key vectors K_1j (j=1~N), that is, the first query vector Q_1i is subjected to a dot product operation with each of the N first key vectors K_1j corresponding to the N vectors B_j, so as to obtain the first dot product operation result D_i of the vector B_i; in this way, too, the vector B_i can be associated with the other (N-1) vectors B_j (j=1~N, j≠i).
The obtained first dot product operation result D_i of the vector B_i then sequentially passes through the scaling layer, the first mask layer and the first Softmax function activation layer, and the output result of the first Softmax function activation layer is obtained, wherein the scaling layer divides the first dot product operation result D_i by sqrt(d_k), and d_k may be the vector dimension of the first key vector or the first query vector. The second dot product layer performs a dot product operation on the output result of the first Softmax function activation layer and the first value vectors V_1j respectively corresponding to the N vectors B_j, so as to obtain a second dot product operation result Y_i.
After the second dot product operation result Y_i sequentially passes through the first splicing layer 402C, the second sub-linear layer 402D, the first residual normalization layer 403, the first feedforward neural network layer 404 and the second residual normalization layer 405, the output of the second model is obtained. When one second model is used, i.e. M=1, the output of the single second model is the third text semantic vector C_i; when a plurality of second models connected in series are used, i.e. M>1, the output of the M-th second model is the third text semantic vector C_i. Since the third text semantic vector C_i is obtained based on the association relationship between the second text semantic vector B_i and the other (N-1) second text semantic vectors, the obtained third text semantic vector C_i can embody the internal association relationship between the N second text semantic vectors, namely the N historical behavior text data.
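The PyTorch sketch below follows the layer order of the FIG. 4 variant of the second model (linear layer, attention with scaled dot-product scores, residual normalization, feed-forward network, residual normalization). For brevity it uses a single attention head, merges the first linear layer and the first sub-linear layer into one projection, and omits the first mask layer; those simplifications and the hidden size are assumptions.

```python
import math
import torch
import torch.nn as nn

class DotProductSecondModel(nn.Module):
    """Sketch of the second model of FIG. 4 (single head, mask layer omitted)."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_linear = nn.Linear(dim, 3 * dim)   # produces key, query and value vectors
        self.out_linear = nn.Linear(dim, dim)      # second sub-linear layer after splicing
        self.norm1 = nn.LayerNorm(dim)             # first residual normalization layer
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm2 = nn.LayerNorm(dim)             # second residual normalization layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) second text semantic vectors B_i (or the outputs of the previous second model)
        k, q, v = self.in_linear(x).chunk(3, dim=-1)
        scores = q @ k.transpose(0, 1) / math.sqrt(k.size(-1))  # first dot product layer + scaling layer
        attn = torch.softmax(scores, dim=-1)                    # first Softmax function activation layer
        y = attn @ v                                             # second dot product layer: results Y_i
        h = self.norm1(x + self.out_linear(y))                  # residual connection + normalization
        return self.norm2(h + self.ffn(h))                      # third text semantic vectors C_i (last model)

model = DotProductSecondModel(dim=8)
third_vectors = model(torch.randn(5, 8))  # N = 5 toy second text semantic vectors
```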
The inventors of the present disclosure recognize that when the second model includes the dot product attention layer 402B in the first multi-head attention layer 402, since d_k is the vector dimension of the first key vector or the first query vector, which is generally at least in the hundreds, sqrt(d_k) is greater than or equal to 10. As a result, the scaling result output by the scaling layer acting on the first dot product operation result and the value of the output result of the first Softmax function activation layer are very small, so that the obtained second dot product operation results Y_i tend toward a mean state, and the magnitude of the internal association degree between the N second text semantic vectors, namely the N historical behavior text data, cannot be distinguished. Meanwhile, because the value of the output result of the first Softmax function activation layer is very small, gradient dispersion easily occurs in the process of training the second model, convergence is difficult, and the parameters of the trained second model are difficult to determine. Accordingly, embodiments of the present disclosure improve upon the dot product attention layer 402B in the first multi-head attention layer 402.
According to an embodiment of the disclosure, the second model includes a second linear layer, a second multi-head attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer that are sequentially connected along an input-to-output direction of the second model, wherein the second multi-head attention layer includes a third sub-linear layer, a cosine operation attention layer, a second splice layer, and a fourth sub-linear layer that are sequentially connected along the input-to-output direction of the second multi-head attention layer.
Fig. 5 shows a schematic structural diagram of a second model according to an embodiment of the present disclosure. As shown in fig. 5, the embodiment of the present disclosure will be described by taking a series of X (X is greater than or equal to 1 and is an integer) second models as an example, and the specific value of X is not specifically limited in the present disclosure, and may be selected according to actual needs. The second model includes a second linear layer 501, a second multi-headed attention layer 502, a third residual normalization layer 503, a second feedforward neural network layer 504, and a fourth residual normalization layer 505, which are sequentially connected in the input-to-output direction of the second model. The second multi-head attention layer 502 includes a third sub-linear layer 502A, a cosine-operation attention layer 502B, a second splicing layer 502C, and a fourth sub-linear layer 502D sequentially connected along the input-to-output direction of the second multi-head attention layer 502.
According to an embodiment of the disclosure, the cosine operation attention layer includes a cosine operation layer, a second mask layer, a second Softmax function activation layer, and a third dot product layer sequentially connected along an input-to-output direction of the cosine operation attention layer, wherein the cosine operation layer is configured to perform cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and the third dot product layer is configured to perform dot product operation on an output of the second Softmax function activation layer and a second value vector output by the third sub-linear layer.
As shown in fig. 5, the cosine-operation attention layer 502B includes a cosine-operation layer, a second mask layer, a second Softmax-function activation layer, and a third dot stack, which are sequentially connected along an input-to-output direction of the cosine-operation attention layer 502B.
According to embodiments of the present disclosure, the N second text semantic vectors may be input to the second model separately or together. Assume that the vectors input to the second model are B_i (i=1~N): when one second model is used, i.e. X=1, B_i is the i-th second text semantic vector; when X second models connected in series are used, i.e. X>1, the B_i input to the first second model is the i-th second text semantic vector, and the B_i input to each of the remaining second models is the i-th vector output by the previous second model. Under the action of the second linear layer 501, B_i is converted into a key vector K_i, a query vector Q_i and a value vector V_i. After the key vector K_i, the query vector Q_i and the value vector V_i are input into the second multi-head attention layer 502 together, a second key vector K_2i, a second query vector Q_2i and a second value vector V_2i are output under the action of the third sub-linear layer 502A.
After the second key vector K_2i, the second query vector Q_2i and the second value vector V_2i are input into the cosine operation attention layer 502B, the cosine operation layer performs a cosine operation on the second key vector K_2i output by the third sub-linear layer 502A and the N second query vectors Q_2j (j=1~N), that is, the second key vector K_2i is subjected to a cosine operation with each of the N second query vectors Q_2j corresponding to the N vectors B_j, so as to obtain a first cosine operation result F_i of the vector B_i; in this way the vector B_i can be associated with the other (N-1) vectors B_j (j=1~N, j≠i). Alternatively, after the second key vector K_2i, the second query vector Q_2i and the second value vector V_2i are input into the cosine operation attention layer 502B, the cosine operation layer performs a cosine operation on the second query vector Q_2i output by the third sub-linear layer 502A and the N second key vectors K_2j (j=1~N), that is, the second query vector Q_2i is subjected to a cosine operation with each of the N second key vectors K_2j corresponding to the N vectors B_j, so as to obtain the first cosine operation result F_i of the vector B_i; in this way, too, the vector B_i can be associated with the other (N-1) vectors B_j (j=1~N, j≠i).
The obtained first cosine operation result F_i of the vector B_i then sequentially passes through the second mask layer and the second Softmax function activation layer, and the output result of the second Softmax function activation layer is obtained. The third dot product layer performs a dot product operation on the output result of the second Softmax function activation layer and the second value vectors V_2j respectively corresponding to the N vectors B_j, so as to obtain a third dot product operation result G_i.
After the third dot product operation result G_i sequentially passes through the second splicing layer 502C, the fourth sub-linear layer 502D, the third residual normalization layer 503, the second feedforward neural network layer 504 and the fourth residual normalization layer 505, the output of the second model is obtained. When one second model is used, i.e. X=1, the output of the single second model is the third text semantic vector C_i; when a plurality of second models connected in series are used, i.e. X>1, the output of the X-th second model is the third text semantic vector C_i. Since the third text semantic vector C_i is obtained based on the association relationship between the second text semantic vector B_i and the other (N-1) second text semantic vectors, the obtained third text semantic vector C_i can embody the internal association relationship between the N second text semantic vectors, namely the N historical behavior text data.
According to the technical scheme provided by the embodiment of the disclosure, the cosine operation attention layer is adopted to replace the dot product attention layer, so that the values of the output result of the second Softmax function activation layer are well differentiated, the magnitude of the internal association degree among the N second text semantic vectors, namely the N historical behavior text data, can be distinguished, gradient dispersion is avoided, and the convergence of the second model during training is accelerated.
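Continuing the previous sketch, the FIG. 5 variant only changes how the attention scores are computed: cosine similarity between the key and query vectors replaces the scaled dot product, so the values fed to the Softmax lie in [-1, 1] regardless of the vector dimension. The single-head form and the omitted second mask layer remain simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def cosine_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Cosine operation attention: scores are cosine similarities instead of scaled dot products."""
    # q, k, v: (N, dim) second query, key and value vectors output by the third sub-linear layer
    scores = F.cosine_similarity(q.unsqueeze(1), k.unsqueeze(0), dim=-1)  # (N, N), values in [-1, 1]
    attn = torch.softmax(scores, dim=-1)  # second Softmax function activation layer
    return attn @ v                       # third dot product layer: results G_i

outputs = cosine_attention(torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8))
```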
FIG. 6 illustrates a text semantic vector C according to the first text semantic vector and the N third text semantic vectors by a processor according to an embodiment of the present disclosure i And obtaining a flow chart of the interest vector of the user. As shown in fig. 6, the step S104 is to use a processor to generate the first text semantic vector and the N third text semantic vectors C i The obtaining of the interest vector of the user comprises the following steps S601-S602:
In step S601, a third model is used to respectively acquire the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
In step S602, the interest vector of the user is obtained based on the N correlations w_i and the N third text semantic vectors C_i.
According to the embodiment of the present disclosure, the first text semantic vector can reflect the current intention of the user to a certain extent, and the N third text semantic vectors C_i can reflect the historical preference of the user to a certain extent. In order that the obtained interest vector of the user better expresses the current interest of the user, a third model may be used to respectively acquire the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i; the third model is not particularly limited in this disclosure and may be selected according to actual needs. The interest vector of the user may then be acquired based on the obtained N correlations w_i and the N third text semantic vectors C_i, so that the acquired interest vector can better express the current interest of the user.
According to an embodiment of the present disclosure, obtaining the interest vector of the user in step S602 based on the N correlations w_i and the N third text semantic vectors C_i includes the following steps:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
FIG. 7 shows a schematic diagram of acquiring the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i, according to an embodiment of the present disclosure.
As shown in fig. 7, after the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i are respectively acquired, the correlation w_i may be determined as the weight corresponding to the third text semantic vector C_i; for example, w_1 is determined as the weight corresponding to the third text semantic vector C_1, w_2 as the weight corresponding to C_2, w_3 as the weight corresponding to C_3, ..., w_(N-1) as the weight corresponding to C_(N-1), and w_N as the weight corresponding to C_N. The weighted sum of the N third text semantic vectors C_i is then determined as the interest vector E_1 of the user, where E_1 = w_1*C_1 + w_2*C_2 + w_3*C_3 + ... + w_(N-1)*C_(N-1) + w_N*C_N.
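As a minimal sketch of the weighted sum E_1 above (assuming PyTorch tensors; the function name is illustrative):

```python
import torch

def interest_vector_weighted_sum(C, w):
    """E_1 = w_1*C_1 + ... + w_N*C_N as in FIG. 7.

    C: (N, d) third text semantic vectors; w: (N,) correlations used as weights.
    """
    return (w.unsqueeze(-1) * C).sum(dim=0)   # interest vector of shape (d,)
```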
FIG. 8 illustrates a flowchart of obtaining, by a processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i, according to an embodiment of the present disclosure. As shown in FIG. 8, step S104, in which the processor obtains the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i, includes the following steps S801-S802:
In step S801, a third model is used to respectively acquire the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
In step S802, the interest vector of the user is obtained based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i.
According to the embodiment of the present disclosure, the first text semantic vector can reflect the current intention of the user to a certain extent, and the N third text semantic vectors C_i can embody, to a certain extent, the internal association relationship among the N historical behavior text data. In order that the obtained interest vector of the user better expresses the current interest of the user, a third model may be used to respectively acquire the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i; the third model is not particularly limited in this disclosure and may be selected according to actual needs.
According to the embodiment of the present disclosure, in order that the obtained interest vector of the user better reflects the current interest of the user, the interest vector may be obtained based on the first text semantic vector, which reflects the current intention of the user to a certain extent, together with the obtained N correlations w_i and the N third text semantic vectors C_i, so that the obtained interest vector can better represent the current interest of the user.
According to an embodiment of the present disclosure, obtaining the interest vector of the user in step S802 based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i includes the following steps:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
performing nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
FIG. 9 illustrates a schematic diagram of acquiring the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i, according to an embodiment of the present disclosure.
As shown in fig. 9, after the N correlations w_i between the first text semantic vector A and the N third text semantic vectors C_i are respectively acquired, the correlation w_i may be determined as the weight corresponding to the third text semantic vector C_i; for example, w_1 is determined as the weight corresponding to the third text semantic vector C_1, w_2 as the weight corresponding to C_2, w_3 as the weight corresponding to C_3, ..., w_(N-1) as the weight corresponding to C_(N-1), and w_N as the weight corresponding to C_N. The weighted sum of the N third text semantic vectors C_i is determined as the intermediate interest vector E_t of the user, where E_t = w_1*C_1 + w_2*C_2 + w_3*C_3 + ... + w_(N-1)*C_(N-1) + w_N*C_N.
According to the embodiment of the present disclosure, in order that the obtained interest vector of the user better reflects the current search preference of the user, the first text semantic vector A, which reflects the current intention of the user to a certain extent, may be spliced with the obtained intermediate interest vector E_t to obtain a first spliced vector [E_t; A]. The first spliced vector [E_t; A] is linearized to obtain a linearized vector; the linearization may be a dimension reduction of the first spliced vector, for example multiplying the first spliced vector by a matrix W_E, so that the dimension of the obtained linearized vector is kept consistent with that of the first text semantic vector A or the intermediate interest vector E_t. The obtained linearized vector may then be subjected to nonlinear processing through a Tanh function to obtain the interest vector E_2 of the user, where E_2 = Tanh(W_E [E_t; A]).
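A minimal sketch of this splice-linearize-Tanh step, assuming PyTorch and that the matrix W_E is realized as a bias-free linear layer (an assumption; the embodiment only requires a matrix multiplication). The class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class InterestHead(nn.Module):
    """Sketch of E_2 = Tanh(W_E [E_t ; A]) from FIG. 9; names are assumed."""

    def __init__(self, dim):
        super().__init__()
        # W_E maps the 2*dim spliced vector back to dim, so that E_2 has
        # the same dimension as A or E_t.
        self.W_E = nn.Linear(2 * dim, dim, bias=False)

    def forward(self, A, C, w):
        E_t = (w.unsqueeze(-1) * C).sum(dim=0)      # intermediate interest vector
        first_spliced = torch.cat([E_t, A], dim=-1) # first spliced vector [E_t; A]
        return torch.tanh(self.W_E(first_spliced))  # interest vector E_2
```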
According to an embodiment of the present disclosure, the third model includes a difference processing layer, a third splicing layer, an activation layer, and a third linear layer sequentially connected along the input-to-output direction of the third model, and respectively acquiring the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i by using the third model includes:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i, and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer to reduce the dimension of the activation vector to 1, obtaining a dimension-reduction value;
determining the dimension-reduction value as the correlation w_i between the first text semantic vector and the third text semantic vector C_i.
Fig. 10 shows a schematic structural view of a third model according to an embodiment of the present disclosure. As shown in fig. 10, the third model includes a difference processing layer 1001, a third splice layer 1002, an activation layer 1003, and a third linear layer 1004, which are sequentially connected in the input-to-output direction of the third model.
According to embodiments of the present disclosure, the first text semantic vector and the N third text semantic vectors C_i may be respectively input into the difference processing layer 1001 to acquire N difference vectors. For example, when the first text semantic vector and the third text semantic vector C_i are input into the difference processing layer 1001, a difference operation may be performed on the values of each dimension of the first text semantic vector and the third text semantic vector C_i to obtain the i-th difference vector; the other (N-1) difference vectors can be obtained in the same way.
According to the embodiment of the present disclosure, the first text semantic vector, the N third text semantic vectors C_i, and the N difference vectors may be input into the third splicing layer 1002 to obtain N second spliced vectors. For example, the first text semantic vector, the third text semantic vector C_i, and the i-th difference vector are input into the third splicing layer 1002 to obtain the i-th second spliced vector; the other (N-1) second spliced vectors can be obtained in the same way.
According to an embodiment of the present disclosure, the N second spliced vectors may be input into the activation layer 1003 to obtain N activation vectors, where the activation layer may include a PReLU function activation layer or a Dice function activation layer.
According to an embodiment of the present disclosure, the N activation vectors may be respectively input into the third linear layer 1004, so that the dimension of each of the N activation vectors is reduced to 1 and N dimension-reduction values are acquired. The N dimension-reduction values may be determined as the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i.
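For illustration, a minimal sketch of the third model of FIG. 10, assuming PyTorch, a PReLU activation layer (Dice is the other option named above), and 1-D input vectors. The class and variable names are assumptions.

```python
import torch
import torch.nn as nn

class RelevanceModel(nn.Module):
    """Sketch of the third model of FIG. 10: difference processing layer,
    third splicing layer, activation layer, third linear layer."""

    def __init__(self, dim):
        super().__init__()
        self.act = nn.PReLU()             # activation layer (Dice is the other option)
        self.out = nn.Linear(3 * dim, 1)  # third linear layer: reduce dimension to 1

    def forward(self, A, C_i):
        diff = A - C_i                                 # difference vector
        spliced = torch.cat([A, C_i, diff], dim=-1)    # second spliced vector
        return self.out(self.act(spliced)).squeeze(-1) # dimension-reduction value w_i
```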
According to embodiments of the present disclosure, the first model, the second model, and the third model may be trained separately; they may be trained together as a whole model; or they may be trained jointly with a recommendation model to which the user interest vector obtained according to an embodiment of the present disclosure is applied. For example, in the e-commerce field, the historical search data of a user may be used as training samples: a search word input by the user serves as the input text data, commodities involved in historical behaviors related to the search word serve as the historical behavior text data, and the obtained user interest vector is input into the recommendation model, which generates a recommendation result. Commodities on which the user actually performed operations such as clicking, following, or purchasing in relation to the search word in the historical search data are taken as the commodities the user is actually interested in, and the training objective is to make the recommendation result of the recommendation model agree as closely as possible with the commodities the user is actually interested in.
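A hedged sketch of one joint training step under this scheme, assuming PyTorch, a binary click label, and hypothetical pipeline/recommender modules; none of these names come from the embodiment.

```python
import torch
import torch.nn.functional as F

def training_step(pipeline, recommender, optimizer, query, history, item, label):
    """One joint training step; pipeline, recommender, and the argument
    names are hypothetical and not taken from the embodiment."""
    E = pipeline(query, history)        # user interest vector from the three models
    score = recommender(E, item)        # predicted preference logit for the item
    # label: 1.0 if the user actually clicked/followed/purchased the item, else 0.0
    loss = F.binary_cross_entropy_with_logits(score, label)
    optimizer.zero_grad()
    loss.backward()                     # gradients flow through all three models
    optimizer.step()
    return loss.item()
```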
Fig. 11 shows an application scenario schematic of a data processing method according to an embodiment of the present disclosure. As shown in fig. 11, the application scenario includes a client 1101 and a server 1102. For convenience of description, only one client 1101 and one server 1102 are drawn in fig. 11; it should be understood that this is merely an example and not a limitation of the present disclosure, and the number, types, and connection manners of clients 1101 and servers 1102 may be set according to actual needs, which is not specifically limited in the present disclosure.
The user inputs text data Q through the client 1101. After acquiring the user's input text data Q, the server 1102 may acquire, based on the input text data Q, N pieces of historical behavior text data H_i (i=1~N) associated with the input text data Q, so that the acquired historical behavior text data H_i have a certain association with the current search preference of the user.
According to an embodiment of the present disclosure, in order to obtain semantic representations of the input text data Q and the N pieces of historical behavior text data H_i, the input text data Q and the N pieces of historical behavior text data H_i may be input into the first model to obtain a first text semantic vector A corresponding to the input text data Q and N second text semantic vectors B_i respectively corresponding to the N pieces of historical behavior text data H_i, whereby the semantic information of the input text data Q and of the N pieces of historical behavior text data H_i is introduced through the first model.
According to an embodiment of the present disclosure, in order to obtain the internal association relationship among the N pieces of historical behavior text data H_i, the N second text semantic vectors B_i corresponding to the N pieces of historical behavior text data H_i may be input into the X serially connected second models to obtain N third text semantic vectors C_i capable of reflecting the internal association relationship among the N historical behavior text data. The second model may include a second multi-head attention layer, and the second multi-head attention layer may include a cosine operation attention layer; by introducing the cosine operation attention mechanism, the normalized weight values remain differentiated, so that the obtained N third text semantic vectors C_i can express the magnitude of the internal association among the N second text semantic vectors B_i, namely among the N pieces of historical behavior text data H_i.
According to the embodiment of the present disclosure, the third model may be used to respectively acquire the correlations between the first text semantic vector A and the N third text semantic vectors C_i. An intermediate interest vector of the user may be obtained based on the N correlations and the N third text semantic vectors C_i. In order to better capture the real-time search intention of the user, the interest vector E_3 of the user may be acquired using the first text semantic vector A and the intermediate interest vector of the user, so that the obtained interest vector better represents the current interest of the user.
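Putting the pieces together, the following is a minimal end-to-end sketch of the FIG. 11 pipeline, reusing the cosine-attention blocks, relevance model, and interest head sketched earlier; all names are assumptions for illustration.

```python
import torch

def user_interest_vector(A, B, second_models, relevance_model, interest_head):
    """End-to-end sketch of the FIG. 11 pipeline (all names are assumptions).

    A: (d,) first text semantic vector for the input text data Q.
    B: (N, d) second text semantic vectors for the N historical behaviors H_i.
    second_models: X serially connected second-model blocks (cosine attention).
    """
    C = B
    for block in second_models:                 # X second models in series
        C = block(C)                            # (N, d) third text semantic vectors C_i
    w = torch.stack([relevance_model(A, C_i) for C_i in C])  # N correlations w_i
    return interest_head(A, C, w)               # interest vector of the user (E_2 / E_3)
```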
Fig. 12 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 12, the data processing apparatus 1200 includes a first acquisition module 1210, a second acquisition module 1220, a third acquisition module 1230, and a fourth acquisition module 1240.
The first obtaining module 1210 is configured to obtain input text data of a user and N pieces of historical behavior text data, where the historical behavior text data corresponds to a historical behavior of the user, and N is an integer greater than or equal to 1;
the second obtaining module 1220 is configured to obtain, by the processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
The third acquisition module 1230 is configured to acquire, by the processor, N third text semantic vectors C_i (i=1~N) from the N second text semantic vectors by using one second model or a plurality of second models connected in series;
The fourth acquisition module 1240 is configured to obtain, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i.
According to an embodiment of the present disclosure, the obtaining input text data and N pieces of historical behavior text data of a user includes:
acquiring the input text data of the user;
based on the input text data, the N historical behavioral text data are determined.
According to an embodiment of the disclosure, the determining the N pieces of historical behavioral text data based on the input text data includes:
acquiring M candidate historical behavior text data of the user in a preset historical time period, wherein M is an integer greater than or equal to N;
and according to the relativity of the input text data and the M candidate historical behavior text data, determining N candidate historical behavior text data in the M candidate historical behavior text data as the N historical behavior text data.
According to an embodiment of the present disclosure, the first model includes any one of the following models: word2vector model, item2vector model, BERT model.
According to an embodiment of the disclosure, the second model comprises a first linear layer, a first multi-headed attention layer, a first residual normalization layer, a first feedforward neural network layer and a second residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the first multi-headed attention layer comprises a first sub-linear layer, a dot product attention layer, a first splice layer and a second sub-linear layer connected in sequence along the input-to-output direction of the first multi-headed attention layer.
According to an embodiment of the disclosure, the dot product attention layer includes a first dot product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot product layer sequentially connected along an input-to-output direction of the dot product attention layer, where the first dot product layer is configured to perform dot product operation on a first key vector and a first query vector output by the first sub-linear layer, and the second dot product layer is configured to perform dot product operation on an output result of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
According to an embodiment of the disclosure, the second model includes a second linear layer, a second multi-head attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer that are sequentially connected along an input-to-output direction of the second model, wherein the second multi-head attention layer includes a third sub-linear layer, a cosine operation attention layer, a second splice layer, and a fourth sub-linear layer that are sequentially connected along the input-to-output direction of the second multi-head attention layer.
According to an embodiment of the disclosure, the cosine operation attention layer includes a cosine operation layer, a second mask layer, a second Softmax function activation layer, and a third dot product layer sequentially connected along an input-to-output direction of the cosine operation attention layer, where the cosine operation layer is configured to perform cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and the third dot product layer is configured to perform dot product operation on an output result of the second Softmax function activation layer and a second value vector output by the third sub-linear layer.
According to an embodiment of the present disclosure, obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i includes the following steps:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i.
According to an embodiment of the present disclosure, obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i includes the following steps:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
According to an embodiment of the present disclosure, obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i includes the following steps:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i.
According to an embodiment of the present disclosure, obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i includes the following steps:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
performing nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
According to an embodiment of the present disclosure, the third model includes a difference processing layer, a third splicing layer, an activation layer, and a third linear layer sequentially connected along the input-to-output direction of the third model, and respectively acquiring the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i by using the third model includes:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i, and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer to reduce the dimension of the activation vector to 1, obtaining a dimension-reduction value;
determining the dimension-reduction value as the correlation w_i between the first text semantic vector and the third text semantic vector C_i.
The present disclosure also discloses an electronic device, and fig. 13 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 13, the electronic device 1300 includes a memory 1301 and a processor 1302, wherein:
the memory 1301 is used to store one or more computer instructions that are executed by the processor 1302 to implement a method according to embodiments of the present disclosure.
Fig. 14 shows a schematic diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 14, the computer system 1400 includes a processing unit 1401 that can execute the various processes of the above-described embodiments according to a program stored in a Read Only Memory (ROM) 1402 or a program loaded from a storage section 1408 into a Random Access Memory (RAM) 1403. Various programs and data required for the operation of the system 1400 are also stored in the RAM 1403. The processing unit 1401, the ROM 1402, and the RAM 1403 are connected to each other through a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.
The following components are connected to the I/O interface 1405: an input section 1406 including a keyboard, a mouse, and the like; an output section 1407 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1408 including a hard disk or the like; and a communication section 1409 including a network interface card such as a LAN card or a modem. The communication section 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as needed. A removable medium 1411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1410 as needed, so that a computer program read therefrom is installed into the storage section 1408 as needed. The processing unit 1401 may be implemented as a processing unit such as a CPU, GPU, TPU, FPGA, or NPU.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods described above. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1409 and/or installed from the removable medium 1411.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present disclosure may be implemented in software or in programmable hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above-described embodiments; or may be a computer-readable storage medium, alone, that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in this disclosure is not limited to the specific combinations of the features described above, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept, for example embodiments formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (28)

1. A method of data processing, comprising:
Acquiring input text data of a user and N historical behavior text data, wherein the historical behavior text data corresponds to the historical behavior of the user, and N is an integer greater than or equal to 1;
acquiring, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
obtaining, by a processor, N third text semantic vectors C_i (i=1~N) from the N second text semantic vectors by using one second model or a plurality of second models connected in series, wherein the third text semantic vector C_i is obtained based on the association relationship between the second text semantic vector B_i corresponding to the third text semantic vector C_i and the other N-1 second text semantic vectors;
obtaining, by a processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i.
2. The method of claim 1, wherein the obtaining input text data and N historical behavioral text data of the user comprises:
acquiring the input text data of the user;
Based on the input text data, the N historical behavioral text data are determined.
3. The method of claim 2, wherein the determining the N historical behavioral text data based on the input text data comprises:
acquiring M candidate historical behavior text data of the user in a preset historical time period, wherein M is an integer greater than or equal to N;
and according to the relativity of the input text data and the M candidate historical behavior text data, determining N candidate historical behavior text data in the M candidate historical behavior text data as the N historical behavior text data.
4. The method of claim 1, wherein the first model comprises any one of the following models: word2vector model, item2vector model, BERT model.
5. The method of claim 1, wherein the second model comprises a first linear layer, a first multi-headed attention layer, a first residual normalization layer, a first feedforward neural network layer, and a second residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the first multi-headed attention layer comprises a first sub-linear layer, a dot product attention layer, a first splice layer, and a second sub-linear layer connected in sequence along the input-to-output direction of the first multi-headed attention layer.
6. The method of claim 5, wherein the dot-product attention layer comprises a first dot-product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot-product layer sequentially connected in an input-to-output direction of the dot-product attention layer, wherein the first dot-product layer is configured to perform dot-product operations on a first key vector and a first query vector output by the first sub-linear layer, and wherein the second dot-product layer is configured to perform dot-product operations on an output result of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
7. The method of claim 1, wherein the second model comprises a second linear layer, a second multi-headed attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the second multi-headed attention layer comprises a third sub-linear layer, a cosine-operation attention layer, a second splice layer, and a fourth sub-linear layer connected in sequence along the input-to-output direction of the second multi-headed attention layer.
8. The method of claim 7, wherein the cosine-operation attention layer comprises a cosine-operation layer, a second mask layer, a second Softmax-function activation layer, and a third dot-product layer sequentially connected along an input-to-output direction of the cosine-operation attention layer, wherein the cosine-operation layer is configured to perform a cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and wherein the third dot-product layer is configured to perform a dot-product operation on an output result of the second Softmax-function activation layer and a second value vector output by the third sub-linear layer.
9. The method of claim 1, wherein the obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i comprises:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i.
10. The method according to claim 9, wherein the obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i comprises:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
11. The method of claim 1, wherein the obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i comprises:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i.
12. The method of claim 11, wherein the obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i comprises:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
performing nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
13. The method according to claim 9 or 11, wherein the third model includes a difference processing layer, a third splicing layer, an activation layer, and a third linear layer sequentially connected along the input-to-output direction of the third model, and the respectively acquiring the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i by using the third model comprises:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i, and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer to reduce the dimension of the activation vector to 1, obtaining a dimension-reduction value;
determining the dimension-reduction value as the correlation w_i between the first text semantic vector and the third text semantic vector C_i.
14. A data processing apparatus, comprising:
the system comprises a first acquisition module, a second acquisition module and a first processing module, wherein the first acquisition module is configured to acquire input text data of a user and N pieces of historical behavior text data, the historical behavior text data correspond to historical behaviors of the user, and N is an integer greater than or equal to 1;
a second acquisition module configured to acquire, by a processor, a first text semantic vector corresponding to the input text data and N second text semantic vectors corresponding to the N historical behavioral text data, respectively, using a first model;
a third obtaining module configured to obtain, by the processor, N third text semantic vectors C_i (i=1~N) from the N second text semantic vectors by using one second model or a plurality of second models connected in series, wherein the third text semantic vector C_i is obtained based on the association relationship between the second text semantic vector B_i corresponding to the third text semantic vector C_i and the other N-1 second text semantic vectors;
a fourth obtaining module configured to obtain, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i.
15. The apparatus of claim 14, wherein the obtaining input text data and N historical behavioral text data of the user comprises:
acquiring the input text data of the user;
based on the input text data, the N historical behavioral text data are determined.
16. The apparatus of claim 15, wherein the determining the N historical behavioral text data based on the input text data comprises:
acquiring M candidate historical behavior text data of the user in a preset historical time period, wherein M is an integer greater than or equal to N;
And according to the relativity of the input text data and the M candidate historical behavior text data, determining N candidate historical behavior text data in the M candidate historical behavior text data as the N historical behavior text data.
17. The apparatus of claim 14, wherein the first model comprises any one of the following models: word2vector model, item2vector model, BERT model.
18. The apparatus of claim 14, wherein the second model comprises a first linear layer, a first multi-headed attention layer, a first residual normalization layer, a first feedforward neural network layer, and a second residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the first multi-headed attention layer comprises a first sub-linear layer, a dot product attention layer, a first splice layer, and a second sub-linear layer connected in sequence along the input-to-output direction of the first multi-headed attention layer.
19. The apparatus of claim 18, wherein the dot-product attention layer comprises a first dot-product layer, a scaling layer, a first mask layer, a first Softmax function activation layer, and a second dot-product layer sequentially connected in an input-to-output direction of the dot-product attention layer, wherein the first dot-product layer is configured to perform dot-product operations on a first key vector and a first query vector output by the first sub-linear layer, and wherein the second dot-product layer is configured to perform dot-product operations on an output result of the first Softmax function activation layer and a first value vector output by the first sub-linear layer.
20. The apparatus of claim 14, wherein the second model comprises a second linear layer, a second multi-headed attention layer, a third residual normalization layer, a second feedforward neural network layer, and a fourth residual normalization layer connected in sequence along an input-to-output direction of the second model, wherein the second multi-headed attention layer comprises a third sub-linear layer, a cosine-operation attention layer, a second splice layer, and a fourth sub-linear layer connected in sequence along the input-to-output direction of the second multi-headed attention layer.
21. The apparatus of claim 20, wherein the cosine-operation attention layer comprises a cosine-operation layer, a second mask layer, a second Softmax-function activation layer, and a third dot-product layer sequentially connected along an input-to-output direction of the cosine-operation attention layer, wherein the cosine-operation layer is configured to perform a cosine operation on a second key vector and a second query vector output by the third sub-linear layer, and wherein the third dot-product layer is configured to perform a dot-product operation on an output result of the second Softmax-function activation layer and a second value vector output by the third sub-linear layer.
22. The apparatus of claim 14, wherein the obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i comprises:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i.
23. The apparatus of claim 22, wherein the obtaining the interest vector of the user based on the N correlations w_i and the N third text semantic vectors C_i comprises:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as the interest vector of the user.
24. The apparatus of claim 14, wherein the obtaining, by the processor, the interest vector of the user based on the first text semantic vector and the N third text semantic vectors C_i comprises:
respectively acquiring, by using a third model, the N correlations w_i (i=1~N) between the first text semantic vector and the N third text semantic vectors C_i;
obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i.
25. The apparatus of claim 24, wherein the obtaining the interest vector of the user based on the first text semantic vector, the N correlations w_i, and the N third text semantic vectors C_i comprises:
determining the correlation w_i as the weight corresponding to the third text semantic vector C_i;
determining the weighted sum of the N third text semantic vectors C_i as an intermediate interest vector of the user;
splicing the first text semantic vector and the intermediate interest vector to obtain a first spliced vector;
linearizing the first spliced vector to obtain a linearized vector;
performing nonlinear processing on the linearized vector through a Tanh function to obtain the interest vector of the user.
26. The apparatus according to claim 22 or 24, wherein the third model includes a difference processing layer, a third splicing layer, an activation layer, and a third linear layer sequentially connected along the input-to-output direction of the third model, and the respectively acquiring the N correlations w_i between the first text semantic vector and the N third text semantic vectors C_i by using the third model comprises:
inputting the first text semantic vector and the third text semantic vector C_i into the difference processing layer to obtain a difference vector;
inputting the first text semantic vector, the third text semantic vector C_i, and the difference vector into the third splicing layer to obtain a second spliced vector;
inputting the second spliced vector into the activation layer to obtain an activation vector;
inputting the activation vector into the third linear layer to reduce the dimension of the activation vector to 1, obtaining a dimension-reduction value;
determining the dimension-reduction value as the correlation w_i between the first text semantic vector and the third text semantic vector C_i.
27. An electronic device comprising a memory and a processor; wherein the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of claims 1-13.
28. A readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any of claims 1-13.
CN202010247366.4A 2020-03-31 2020-03-31 Data processing method, device, electronic equipment and computer readable storage medium Active CN111460302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247366.4A CN111460302B (en) 2020-03-31 2020-03-31 Data processing method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010247366.4A CN111460302B (en) 2020-03-31 2020-03-31 Data processing method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111460302A CN111460302A (en) 2020-07-28
CN111460302B true CN111460302B (en) 2023-08-08

Family

ID=71684341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247366.4A Active CN111460302B (en) 2020-03-31 2020-03-31 Data processing method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111460302B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511339A (en) * 2020-11-16 2022-05-17 阿里巴巴集团控股有限公司 Data processing method and device, electronic equipment and readable storage medium
CN114169418B (en) * 2021-11-30 2023-12-01 北京百度网讯科技有限公司 Label recommendation model training method and device and label acquisition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009187384A (en) * 2008-02-07 2009-08-20 Ntt Resonant Inc Retrieval device, retrieval method, retrieval program, and recording medium
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN108446374A (en) * 2018-03-16 2018-08-24 北京三快在线科技有限公司 User view prediction technique, device, electronic equipment, storage medium
CN109032375A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Candidate text sort method, device, equipment and storage medium
CN109697282A (en) * 2017-10-20 2019-04-30 阿里巴巴集团控股有限公司 A kind of the user's intension recognizing method and device of sentence
CN110928997A (en) * 2019-12-04 2020-03-27 北京文思海辉金信软件有限公司 Intention recognition method and device, electronic equipment and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009187384A (en) * 2008-02-07 2009-08-20 Ntt Resonant Inc Retrieval device, retrieval method, retrieval program, and recording medium
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN109697282A (en) * 2017-10-20 2019-04-30 阿里巴巴集团控股有限公司 A kind of the user's intension recognizing method and device of sentence
CN108446374A (en) * 2018-03-16 2018-08-24 北京三快在线科技有限公司 User view prediction technique, device, electronic equipment, storage medium
CN109032375A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Candidate text sort method, device, equipment and storage medium
CN110928997A (en) * 2019-12-04 2020-03-27 北京文思海辉金信软件有限公司 Intention recognition method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN111460302A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN109871485B (en) Personalized recommendation method and device
CN110874439B (en) Recommendation method based on comment information
EP4258132A1 (en) Recommendation method, recommendation network, and related device
CN110852110A (en) Target sentence extraction method, question generation method, and information processing apparatus
CN113343125A (en) Academic-precision-recommendation-oriented heterogeneous scientific research information integration method and system
CN111460302B (en) Data processing method, device, electronic equipment and computer readable storage medium
US11636411B2 (en) Apparatus for determining role fitness while eliminating unwanted bias
CN113822776A (en) Course recommendation method, device, equipment and storage medium
WO2021159787A1 (en) Content processing method and apparatus, computer-readable storage medium and computer device
CN110321483A (en) A kind of online course content of platform recommended method, device, system and storage medium based on user's sequence sexual behaviour
Abinaya et al. Enhancing top-N recommendation using stacked autoencoder in context-aware recommender system
Wadikar et al. Book recommendation platform using deep learning
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
Su et al. Hybrid recommender system based on deep learning model
Latha et al. Product recommendation using enhanced convolutional neural network for e-commerce platform
Paul et al. A weighted hybrid recommendation approach for user’s contentment using natural language processing
CN110659701B (en) Information processing method, information processing apparatus, electronic device, and medium
Coscrato et al. Recommendation uncertainty in implicit feedback recommender systems
Fuentes Mastering Predictive Analytics with scikit-learn and TensorFlow: Implement machine learning techniques to build advanced predictive models using Python
CN115994632A (en) Click rate prediction method, device, equipment and readable storage medium
US20220019856A1 (en) Predicting neural network performance using neural network gaussian process
An et al. Neural user embedding from browsing events
CN115203516A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
Zhang et al. Recommendation based on collaborative filtering by convolution deep learning model based on label weight nearest neighbor
Cheng et al. Mining correlation features of user financial behavior based on attention mechanism and dual channel

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant