CN111008667A - Feature extraction method and device and electronic equipment - Google Patents


Info

Publication number: CN111008667A (granted as CN111008667B)
Application number: CN201911244245.8A
Authority: CN (China)
Legal status: Granted; Active
Other languages: Chinese (zh)
Prior art keywords: behavior, sample, user, information, feature extraction
Inventors: 张文迪 (Zhang Wendi), 崔正文 (Cui Zhengwen)
Assignee (original and current): Beijing IQIYI Science and Technology Co Ltd
Application filed by Beijing IQIYI Science and Technology Co Ltd, with priority to CN201911244245.8A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention provide a feature extraction method and apparatus, and an electronic device, in the technical field of machine learning. The method comprises the following steps: obtaining behavior information of a behavior generated while a user uses a client; obtaining object information of the object operated on by the behavior; obtaining a behavior feature of the behavior from the behavior information and an object feature of the object from the object information; and performing feature fusion on the behavior feature and the object feature to obtain a user feature of the user. Applying the scheme provided by the embodiments of the invention to extract features improves the accuracy of the extracted user features.

Description

Feature extraction method and device and electronic equipment
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method and an apparatus for feature extraction, and an electronic device.
Background
To provide better service to users, application software generally needs to extract features of its users, analyze the users according to the extracted features, and recommend information in a targeted manner based on the analysis results.
In the prior art, extracting a user's features requires collecting behavior information generated while the user uses the application software, for example, information indicating behaviors such as watching a live broadcast, reading news, or listening to rock music; feature extraction is then performed on the collected behavior information, and the extraction result is taken as the user's features.
However, when user features are extracted in this manner, only the user's behavior during use of the software is considered, which is one-sided, so the accuracy of the extracted user features is low.
Disclosure of Invention
The embodiment of the invention aims to provide a feature extraction method, a feature extraction device and electronic equipment so as to improve the accuracy of extracted user features. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a feature extraction method, where the method includes:
behavior information of behaviors generated in the process that a user uses a client is obtained;
obtaining object information of an object operated by the behavior;
and acquiring the behavior characteristics of the behavior according to the behavior information, acquiring the object characteristics of the object according to the object information, and performing characteristic fusion on the behavior characteristics and the object characteristics to acquire the user characteristics of the user.
In an embodiment of the present invention, the obtaining behavior characteristics of the behavior according to the behavior information, obtaining object characteristics of the object according to the object information, and performing characteristic fusion on the behavior characteristics and the object characteristics to obtain the user characteristics of the user includes:
inputting the behavior information and the object information into a pre-trained user feature extraction model for feature extraction to obtain the user features of the user, wherein the user feature extraction model is a model obtained by performing parameter adjustment on an initial model of the user feature extraction model according to parameter adjustment information, and the user feature extraction model is used for: extracting behavior features and object features, and fusing the extracted features to obtain user features; the parameter adjustment information comprises: a forward result with the highest confidence among the model output results, and a preset number of reverse results randomly selected from the results other than the forward result; the model output results are: output results obtained by inputting sample behavior information and sample object information into the initial model as sample data, wherein the sample behavior information is: information of behaviors generated in the process of sample users using the client, and the sample object information is: information of the objects operated on by the sample behaviors.
In one embodiment of the invention, the initial model comprises a user feature extraction layer, a dimensionality raising layer and a normalization layer;
the user feature extraction model is obtained by training the initial model multiple times, until a preset training end condition is reached, in the following manner:
obtaining sample behavior information generated in the process of using the client by the sample user;
obtaining sample object information of an object operated by the sample behavior;
inputting the sample behavior information and the sample object information as sample data into the user feature extraction layer to obtain a user feature extraction result;
inputting the user feature extraction result into the dimension increasing layer for dimension increasing to obtain a dimension increasing result of a first dimension;
inputting the dimension-increasing result into the normalization layer for normalization processing to obtain a normalization result;
determining, among the normalization results, the result with the highest confidence as a forward result, and randomly determining a preset number of the remaining results as reverse results;
and taking the determined forward result and the determined reverse result as parameter adjusting information to carry out parameter adjustment on the initial model, thereby realizing one-time training on the initial model.
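The normalization and result-selection steps above amount to a softmax over the up-dimensioned output followed by negative sampling. A minimal sketch under that reading, assuming the model output is a NumPy array of raw scores; the function name `select_tuning_info` and all variable names are illustrative, not from the patent:

```python
import numpy as np

def select_tuning_info(logits, num_negatives, rng=None):
    """Pick parameter-adjustment info from one model output: normalize the
    up-dimensioned result (softmax), take the highest-confidence entry as
    the forward (positive) result, and randomly pick `num_negatives` other
    entries as reverse (negative) results. Names are illustrative."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # normalization layer (softmax)
    forward = int(np.argmax(probs))           # highest-confidence result
    others = [i for i in range(len(probs)) if i != forward]
    reverse = rng.choice(others, size=num_negatives, replace=False)
    return forward, sorted(int(i) for i in reverse), probs
```

The forward result and the sampled reverse results together form the parameter-adjustment information for one training iteration.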
In an embodiment of the present invention, the positive sample behavior information among the sample behavior information includes: sample behavior information corresponding to sample behaviors whose duration is greater than a first preset threshold;
the negative sample behavior information among the sample behavior information includes: sample behavior information corresponding to sample behaviors whose duration is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
In an embodiment of the present invention, the negative sample behavior information in the sample behavior information includes at least one of the following information:
sample behavior information corresponding to sample behaviors whose duration is greater than zero and less than the second preset threshold;
and sample behavior information randomly selected from the sample behavior information corresponding to sample behaviors whose duration is 0.
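The duration-based positive/negative selection rule above can be sketched as follows. The function name, the record layout, and the sampling rate for zero-duration behaviors are illustrative assumptions, not from the patent:

```python
import random

def label_samples(behaviors, t_pos, t_neg, neg_sample_rate=0.5):
    """Split behavior records into positive / negative samples by duration.

    behaviors: list of dicts with a 'duration' key (e.g. seconds).
    t_pos: first preset threshold  (duration >  t_pos      -> positive)
    t_neg: second preset threshold (0 < duration < t_neg   -> negative)
    Zero-duration behaviors become negatives by random selection, per the
    text; the sampling rate here is an illustrative choice.
    """
    assert t_pos >= t_neg  # the first threshold is not less than the second
    positives = [b for b in behaviors if b["duration"] > t_pos]
    negatives = [b for b in behaviors if 0 < b["duration"] < t_neg]
    zero = [b for b in behaviors if b["duration"] == 0]
    negatives += random.sample(zero, int(len(zero) * neg_sample_rate))
    return positives, negatives
```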
In an embodiment of the present invention, the performing feature fusion on the behavior feature and the object feature to obtain the user feature of the user includes:
reducing the dimension of the behavior characteristic to obtain a behavior characteristic of a second dimension;
combining the behavior characteristic of the second dimension with the object characteristic to obtain a combined characteristic;
and reducing the dimension of the combined feature to obtain a combined feature of a third dimension as a user feature.
In a second aspect, an embodiment of the present invention provides a feature extraction apparatus, including:
the behavior information acquisition module is used for acquiring behavior information of behaviors generated in the process that a user uses the client;
the object information obtaining module is used for obtaining the object information of the object operated by the behavior;
and the user characteristic obtaining module is used for obtaining the behavior characteristics of the behaviors according to the behavior information, obtaining the object characteristics of the object according to the object information, and performing characteristic fusion on the behavior characteristics and the object characteristics to obtain the user characteristics of the user.
In an embodiment of the present invention, the user characteristic obtaining module is specifically configured to:
inputting the behavior information and the object information into a pre-trained user feature extraction model for feature extraction to obtain the user features of the user, wherein the user feature extraction model is a model obtained by performing parameter adjustment on an initial model of the user feature extraction model according to parameter adjustment information, and the user feature extraction model is used for: extracting behavior features and object features, and fusing the extracted features to obtain user features; the parameter adjustment information comprises: a forward result with the highest confidence among the model output results, and a preset number of reverse results randomly selected from the results other than the forward result; the model output results are: output results obtained by inputting sample behavior information and sample object information into the initial model as sample data, wherein the sample behavior information is: information of behaviors generated in the process of sample users using the client, and the sample object information is: information of the objects operated on by the sample behaviors.
In one embodiment of the invention, the initial model comprises a user feature extraction layer, a dimensionality raising layer and a normalization layer;
the user feature extraction model is obtained by training the initial model multiple times, until a preset training end condition is reached, in the following manner:
obtaining sample behavior information generated in the process of using the client by the sample user;
obtaining sample object information of an object operated by the sample behavior;
inputting the sample behavior information and the sample object information as sample data into the user feature extraction layer to obtain a user feature extraction result;
inputting the user feature extraction result into the dimension increasing layer for dimension increasing to obtain a dimension increasing result of a first dimension;
inputting the dimension-increasing result into the normalization layer for normalization processing to obtain a normalization result;
determining, among the normalization results, the result with the highest confidence as a forward result, and randomly determining a preset number of the remaining results as reverse results;
and taking the determined forward result and the determined reverse result as parameter adjusting information to carry out parameter adjustment on the initial model, thereby realizing one-time training on the initial model.
In an embodiment of the present invention, the positive sample behavior information among the sample behavior information includes: sample behavior information corresponding to sample behaviors whose duration is greater than a first preset threshold;
the negative sample behavior information among the sample behavior information includes: sample behavior information corresponding to sample behaviors whose duration is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
In an embodiment of the present invention, the negative sample behavior information in the sample behavior information includes at least one of the following information:
sample behavior information corresponding to sample behaviors whose duration is greater than zero and less than the second preset threshold;
and sample behavior information randomly selected from the sample behavior information corresponding to sample behaviors whose duration is 0.
In an embodiment of the present invention, the user characteristic obtaining module is specifically configured to:
acquiring behavior characteristics of the behaviors according to the behavior information, and acquiring object characteristics of the objects according to the object information;
reducing the dimension of the behavior characteristic to obtain a behavior characteristic of a second dimension;
combining the behavior characteristic of the second dimension with the object characteristic to obtain a combined characteristic;
and reducing the dimension of the combined feature to obtain a combined feature of a third dimension as a user feature.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor and the communication interface complete communication between the memory and the processor through the communication bus;
a memory for storing a computer program; a processor for implementing the method steps of any of the first aspect when executing a program stored in the memory.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method steps of any of the first aspects.
In a fifth aspect, embodiments of the present invention also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of any of the first aspects.
The embodiment of the invention has the following beneficial effects:
according to the technical scheme, when the scheme provided by the embodiment of the invention is applied to feature extraction, the object information of the object operated by the behavior is obtained by obtaining the behavior information of the behavior generated in the process that the user uses the client, the behavior feature of the behavior is obtained according to the behavior information, the object feature of the object is obtained according to the object information, and the behavior feature and the object feature are subjected to feature fusion to obtain the user feature of the user. The scheme provided by the embodiment of the invention not only considers the behavior characteristics of the user, but also considers the object characteristics of the object selected by the user, and the two types of characteristics are fused to obtain the user characteristics. Therefore, by applying the feature extraction scheme provided by the embodiment of the invention, the considered features are abundant in types and are not single any more, so that the accuracy of the extracted user features can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a user characteristic obtaining method according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of another feature extraction method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an initial model according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a model training process according to an embodiment of the present invention;
fig. 7 and fig. 8 are schematic diagrams of a video display page according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a feature extraction apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a feature extraction method, a feature extraction device and electronic equipment, which are respectively described in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention. First, the executing entity of the scheme is described: it may be the client itself, for example video software, reading software, or music software; it may also be a server providing services for the client, or a device providing an operating environment for the client, for example a computer, a tablet computer, or a mobile phone. The method comprises the following steps 101-103.
Step 101, behavior information of behaviors generated in the process of using the client by the user is obtained.
Step 102, obtaining object information of the object operated by the behavior.
And 103, acquiring behavior characteristics of the behaviors according to the behavior information, acquiring object characteristics of the objects according to the object information, and performing characteristic fusion on the behavior characteristics and the object characteristics to obtain user characteristics of the user.
The scheme provided by the embodiment of the invention considers not only the behavior features of the user but also the object features of the objects the user selects, and fuses the two types of features to obtain the user features. The considered features are therefore of various types rather than single, so the accuracy of the extracted user features can be improved.
The above-described feature extraction method is described in detail below.
The behavior of the user while using the client may be watching a live broadcast, reading news, listening to rock music, and the like. The behavior information may include the behavior's identifier, duration, generation time, frequency, and so on. The behavior feature is obtained from the behavior information and may be expressed as a vector, a matrix, a sequence, and the like.
Specifically, for a behavior, the frequency may be the proportion of that behavior among all behaviors the user generates while using the client. For example, if the client is video software and the user watches 1 movie, 3 TV series, and 6 short videos, the frequency of the movie-watching behavior is 10%. The frequency may also be the number of times the user generates the behavior per unit time; for example, if the user clicks on short videos 5 times within 1 hour, the frequency of the short-video-watching behavior is 5.
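The two frequency definitions above can be sketched directly; the function names are illustrative:

```python
def proportion_frequency(counts, behavior):
    """Frequency as the share of one behavior among all behaviors generated.
    counts: mapping from behavior name to number of occurrences."""
    return counts[behavior] / sum(counts.values())

def rate_frequency(num_events, hours):
    """Frequency as the number of times a behavior occurs per unit time."""
    return num_events / hours

# The text's example: 1 movie, 3 TV series, 6 short videos -> 10% for movies.
counts = {"movie": 1, "tv_series": 3, "short_video": 6}
```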
In one embodiment of the invention, the behavior of the user can be judged according to the object clicked by the user in the process of using the client. For example, assuming that the object clicked by the user is a movie, the behavior of the user is judged to be watching the movie.
The object is determined according to the behavior of the user. For example, if the behavior of the user is watching a video, the object is the video watched; if the behavior is reading news, the object is the article read. When the object is a video, the object information may be the name of the video, the cover frame of the video, the size of the storage space the video occupies, and the like; when the object is an article, the object information may be the article's name, release time, number of characters, and the like. The object feature is obtained from the object information: for example, when the object information is a cover frame of a video, the object feature may be a color feature or depth feature of the cover frame; when the object information is the name of an article, the object feature may be a semantic feature or character-string feature of the name. Object features can be expressed as vectors, matrices, sequences, and the like.
In one embodiment of the invention, the object operated by the user behavior can be determined according to the historical information stored in the client. For example, in the case of video software, a viewing history of a video viewed by a user is usually saved, and the video viewed by the user can be determined according to the viewing history.
In an embodiment of the present invention, when behavior features of behaviors are obtained according to behavior information, the behavior information of a user may be input into a behavior feature extraction model to perform feature extraction, so as to obtain the behavior features. The behavior feature extraction model may be a model obtained by training a neural network model in advance and used for extracting the behavior feature of the user, where the neural network model may be a recurrent neural network model, a deep neural network model, or the like.
In an embodiment of the present invention, statistical classification may also be performed on the behavior information of the user to obtain the behavior features. For example, for video software, a user watching a domestic movie, a Japanese-Korean movie, or a European-American movie may all be classified under watching movies, and watching movies is taken as the behavior feature of the user.
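The statistical classification described above can be read as a category map that collapses fine-grained behavior identifiers into coarse behavior features. The mapping and names below are a hypothetical illustration of the movie example, not from the patent:

```python
# Hypothetical mapping from raw behavior identifiers to coarse categories.
CATEGORY = {
    "watch_domestic_movie": "watch_movie",
    "watch_japanese_korean_movie": "watch_movie",
    "watch_european_american_movie": "watch_movie",
    "read_news": "read_news",
}

def classify_behaviors(behavior_log):
    """Collapse a raw behavior log into the set of coarse behavior features."""
    return sorted({CATEGORY[b] for b in behavior_log})
```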
In one embodiment of the invention, after the behavior feature is obtained, One-hot (one-bit-effective) encoding can be performed on it. The dimensionality of the One-hot encoded behavior feature may be 5000, 10000, and so on.
Specifically, in the One-hot encoded behavior feature, each dimension may represent one behavior: the element of the dimension corresponding to a behavior the user actually generated takes the value 1, and the elements of the other dimensions take the value 0. The behavior feature is thereby converted into a sparse, high-dimensional coded feature, which is convenient for subsequent processing.
For example, behavior features are expressed in the form of vectors, and it is assumed that behaviors generated by a user in the process of using a client are as follows: viewing the film, wherein the dimensionality of the behavior characteristics obtained after the One-hot coding is 5000 dimensionalities, and the elements representing the behavior of viewing the film are elements of the third dimension, so that the obtained behavior characteristics after the One-hot coding are as follows:
[0 0 1 0 0……0]
the vector dimension is 5000 dimensions, the value of the element in the third dimension is 1, and the values of the elements in other dimensions are 0.
In an embodiment of the present invention, when the object feature of the object is obtained according to the object information, the object information may be input into the object feature extraction model for feature extraction, so as to obtain the object feature. For example, when the object information is a cover frame of a video, the cover frame may be input to a pre-trained object feature extraction model to obtain an object feature of the video. The object feature extraction model is a model trained in advance and used for extracting features of object information, and specifically may be a convolutional neural network model, a cyclic neural network model, a deep network model, or the like.
In an embodiment of the present invention, when the behavior feature and the object feature are fused to obtain the user feature of the user, the two features may be merged directly. The merging may be splicing (concatenation), multiplication, weighted calculation, and the like.
Specifically, splicing may connect the two features end to end; for example, when the behavior feature and the object feature are both represented as vectors, the object feature may be appended after the behavior feature.
Assume the behavior feature is [0 1 0 1 1] and the object feature is [1 0 0 0 1]. Splicing the behavior feature and the object feature gives:
[0 1 0 1 1 1 0 0 0 1]
and the spliced feature is used as the user feature.
The merging may also superimpose the two features; for example, the behavior feature may be added, dimension by dimension, to the values of the object feature to realize feature fusion.
The weighting calculation may be to multiply the behavior feature, the fusion weight, and the object feature according to a preset fusion weight to realize feature fusion.
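The three merging options mentioned (splicing, superposition, weighting) can be sketched as follows. The exact weighting formula is not specified in the text, so the multiplicative form here is one possible reading; the function name and mode strings are illustrative:

```python
def fuse(behavior_feat, object_feat, mode="concat", weight=1.0):
    """Merge a behavior feature and an object feature into a user feature."""
    if mode == "concat":    # splicing: connect the two features end to end
        return behavior_feat + object_feat
    if mode == "add":       # superposition: element-wise addition
        return [b + o for b, o in zip(behavior_feat, object_feat)]
    if mode == "weighted":  # one reading of "multiply feature, weight, feature"
        return [b * weight * o for b, o in zip(behavior_feat, object_feat)]
    raise ValueError(mode)
```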
As can be seen from the above, when the scheme provided by this embodiment is applied to feature extraction, behavior information of a behavior generated while the user uses the client is obtained, object information of the object operated on by the behavior is obtained, a behavior feature is obtained from the behavior information and an object feature from the object information, and the two features are fused to obtain the user feature. The scheme considers both the behavior features of the user and the object features of the objects the user selects, and fuses the two types of features into the user feature. The considered features are thus of various types rather than single, so the accuracy of the extracted user features can be improved.
Referring to fig. 2, in an embodiment of the present invention, when performing feature fusion on the behavior feature and the object feature to obtain a user feature of a user, the following steps 201 to 203 may be implemented.
And step 201, performing dimension reduction on the behavior characteristics to obtain the behavior characteristics of a second dimension.
The second dimension may be 32 dimensions, 64 dimensions, 86 dimensions, and the like, which is not limited in this embodiment of the present invention. For example, assume the original behavior signature is a 1000-dimensional vector:
[0 0 0 1 0 0……0]
wherein, the value of the fourth dimension element is 1, and the values of the other dimension elements are 0.
Assuming that the second dimension is 8, the vector is converted into the 8-dimensional vector:
[0 0 0 1 0 0 0 0]
Converting the high-dimensional behavior feature into a low-dimensional one in this way makes it convenient to combine the behavior feature with the object feature subsequently.
Step 202, merging the behavior feature of the second dimension with the object feature to obtain a merged feature.
The merging method may be splicing, multiplication, weighting calculation, and the like. The specific combining method is the same as the step 103, and is not described herein again.
And 203, reducing the dimension of the combined feature to obtain a combined feature of a third dimension as the user feature.
The third dimension may be 32 dimensions, 64 dimensions, 86 dimensions, and the like, which is not limited in this embodiment of the present invention.
After the behavior feature of the second dimension is merged with the object feature, the dimension of the merged feature increases again, yielding a high-dimensional merged feature. Reducing the dimension of this high-dimensional merged feature yields a low-dimensional merged feature that serves as the user feature. This makes it convenient to measure similarity between features and saves computing resources in scenarios where objects are recommended using the user feature.
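Steps 201 to 203 above can be sketched as the following pipeline. The specific sizes (a 1000-dimensional one-hot behavior feature, 8-dimensional second and third dimensions) and the use of random linear projections are illustrative assumptions; the patent does not prescribe a particular projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, out_dim):
    # Illustrative linear projection standing in for a learned dimension-reduction layer.
    w = rng.standard_normal((x.shape[0], out_dim))
    return x @ w

# Step 201: reduce the 1000-dimensional behavior feature to the second dimension (8 here).
behavior = np.zeros(1000)
behavior[3] = 1.0                      # one-hot: the fourth element is 1
behavior_low = project(behavior, 8)

# Step 202: merge with the object feature by splicing (concatenation).
object_feature = rng.standard_normal(8)
merged = np.concatenate([behavior_low, object_feature])   # 16-dimensional

# Step 203: reduce the merged feature to the third dimension (8 here) -> user feature.
user_feature = project(merged, 8)
print(user_feature.shape)              # (8,)
```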
Referring to fig. 3, in an embodiment of the present invention, for the above step 103, the user characteristics may be obtained according to the following step 1031.
And step 1031, inputting the behavior information and the object information into a pre-trained user feature extraction model for feature extraction, and obtaining the user features of the user.
The user feature extraction model is: a model obtained by performing parameter adjustment on an initial model of the user feature extraction model according to parameter adjustment information. The user feature extraction model is used to: extract the behavior feature and the object feature, and fuse the extracted features to obtain the user feature.
The user feature extraction model may be a DNN (Deep Neural Network) model, with, for example, a 3-hidden-layer or 4-hidden-layer structure. The model may also be a CNN (Convolutional Neural Network) model or an RNN (Recurrent Neural Network) model.
The parameter adjustment information includes: the forward result with the highest confidence among the model output results, and a preset number of reverse results randomly selected from the results other than the forward result. The preset number may be 50, 100, 200, or the like.
The model output results are: the output obtained by inputting the sample behavior information and the sample object information into the initial model as sample data.
The sample behavior information is: information of behaviors generated while a sample user uses the client. The sample object information is: information of the objects operated on by the sample behaviors.
Specifically, the behavior information of the user and the corresponding object information are input into the user feature extraction model, so that the user features can be directly obtained, the feature extraction efficiency is improved, and the computing resources are saved.
Referring to fig. 4, in an embodiment of the present invention, the initial model includes a user feature extraction layer 401, an upscaling layer 402, and a normalization layer 403.
The user feature extraction layer 401 is configured to: and performing feature extraction on the behavior information to obtain behavior features, performing feature extraction on the object information to obtain object features, and performing feature fusion on the behavior features and the object features to obtain user features.
The dimension-raising layer 402 is used for: and performing dimension raising on the user features extracted by the user feature extraction layer 401 to obtain a dimension raising result of the first dimension.
The normalization layer 403 is used to: and performing normalization processing on the dimension-increasing results obtained by the dimension-increasing layer 402 to obtain the confidence coefficient of each result in the dimension-increasing results, wherein the sum of the confidence coefficients of the dimension-increasing results is 1. Specifically, the normalization process may be performed using a Softmax regression function.
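The Softmax normalization performed by layer 403 can be sketched as follows; this is a minimal standalone implementation for illustration, not the patented code.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability, then normalize so the sum is 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

confidences = softmax([2.0, 1.0, 0.1])
print(round(sum(confidences), 6))      # 1.0
```

The resulting confidences are all positive and sum to 1, matching the property stated above.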
Among the normalized results, the result with the highest confidence may be used as the forward result, and results other than the forward result may be used as reverse results. The parameters of the initial model are then adjusted using the forward result and the reverse results, thereby training the model.
The following describes the training process of the above-mentioned user feature extraction model in detail.
Referring to fig. 5, the initial model is trained multiple times through the following steps 501 to 507 until a preset training end condition is reached, so as to obtain the user feature extraction model. The preset training end condition may be that the number of training iterations reaches a preset threshold, which may be 5000, 10000, or the like. Alternatively, the preset training end condition may be that the loss of the trained model is smaller than a preset loss threshold, which may be 0.02, 0.01, 0.005, or the like. The loss of the model can be obtained from the difference between the reverse results and the forward result among the model's output results.
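The two alternative end conditions can be sketched as a simple predicate; the default thresholds here are the example values mentioned above and are assumptions, not fixed by the patent.

```python
def reached_end_condition(iterations, loss, max_iterations=10000, loss_threshold=0.01):
    # Training stops when either the iteration-count criterion
    # or the loss criterion is met.
    return iterations >= max_iterations or loss < loss_threshold

print(reached_end_condition(10000, 0.5))   # True
```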
Step 501, obtaining sample behavior information generated in the process that a sample user uses a client.
The obtaining manner of the sample behavior information is the same as the obtaining manner of the behavior information in step 101, and is not described herein again.
Step 502, obtaining sample object information of the object operated by the sample behavior.
The obtaining method of the sample object information is the same as the obtaining method of the object information in step 102, and is not described herein again.
Step 503, inputting the sample behavior information and the sample object information as sample data into the user feature extraction layer to obtain a user feature extraction result.
Specifically, the user feature extraction layer may perform feature extraction on the input sample behavior information to obtain sample behavior features, perform feature extraction on the sample object information to obtain sample object features, and perform feature fusion on the sample behavior features and the sample object features to obtain a user feature extraction result of the sample user.
The user feature extraction layer can also perform dimension reduction on the extracted sample behavior features, and fuse the dimension-reduced sample behavior features and the sample object features.
The user feature extraction layer can also perform dimension reduction on the fused features again, and the fused features after dimension reduction are used as user feature extraction results.
Step 504, inputting the user feature extraction result into a dimension-increasing layer for dimension-increasing to obtain a dimension-increasing result of the first dimension.
The first dimension may be 1000 dimensions, 5000 dimensions, 10000 dimensions, etc., and the first dimension may also be equal to the second dimension.
In an embodiment of the present invention, when performing dimension raising, the user feature extraction result may be multiplied by a preset weight matrix to obtain the dimension-raising result of the first dimension.
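The weight-matrix multiplication can be sketched as follows; the 64-dimensional extraction result, the first dimension of 1000, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# 64-dimensional user feature extraction result (size assumed for illustration).
extraction_result = rng.standard_normal(64)

# Preset weight matrix mapping 64 dimensions up to the first dimension (1000 here).
weight_matrix = rng.standard_normal((64, 1000))

raised = extraction_result @ weight_matrix   # dimension raising by matrix multiplication
print(raised.shape)                          # (1000,)
```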
Step 505, inputting the dimension-raising result into the normalization layer for normalization processing to obtain a normalization result.
In one embodiment of the invention, the normalization layer may normalize the upscaled result using a Softmax regression function to obtain a confidence level for each result.
Wherein the confidence of each result can be understood as: the result is a probability of the user's feature. The sum of the confidence levels of all the results in the normalized result is 1.
Step 506, among the normalized results, determine the result with the highest confidence as the forward result, and randomly determine a preset number of the remaining results as reverse results.
The result with the highest confidence has the largest probability of being the user feature, and is therefore taken as the forward result.
For results other than the forward result, the probability that they are the user feature is small; a preset number of them can be randomly selected as reverse results. The preset number may be 50, 100, 200, or the like. In this way, not all results other than the forward result need be used as reverse results, which avoids the large data volume of reverse results slowing model training, saves computing resources, and speeds up training.
For example, assume that the normalized result includes 5 results, and the confidence of each result is shown in table 1 below:
TABLE 1
Result        P1    P2    P3    P4    P5
Confidence    0.1   0.4   0.2   0.1   0.2
In the table, P1 through P5 represent the normalized results; 0.1 is the confidence of result P1, 0.4 is the confidence of result P2, and so on, with 0.2 being the confidence of result P5. As the table shows, result P2 has the highest confidence, so P2 is taken as the forward result. Assuming the preset number is 3, 3 results are randomly selected from P1, P3, P4, and P5 as reverse results; for example, the reverse results may be P1, P3, and P5.
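Using the confidences from Table 1, the selection of the forward result and the random reverse results can be sketched as:

```python
import random

# Confidences from Table 1.
confidences = {"P1": 0.1, "P2": 0.4, "P3": 0.2, "P4": 0.1, "P5": 0.2}

# Forward result: the single result with the highest confidence.
forward = max(confidences, key=confidences.get)

# Reverse results: a preset number (3 here) drawn at random from the remaining results.
remaining = [r for r in confidences if r != forward]
reverse = random.sample(remaining, 3)

print(forward)                               # P2
```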
Step 507, using the determined forward result and reverse results as parameter adjustment information to adjust the parameters of the initial model, thereby completing one training pass of the initial model.
Specifically, the parameters of the initial model may be adjusted according to the forward result and the reverse results to obtain a parameter-adjusted model, completing one training pass of the initial model. The parameter-adjusted model is then taken as the initial model and the training step is executed again, until the training end condition is reached and the trained user feature extraction model is finally obtained.
In an embodiment of the present invention, the sample behavior information as sample data can be divided into positive sample behavior information and negative sample behavior information according to the duration of the behavior:
the positive sample behavior information in the sample behavior information includes: and sample behavior information corresponding to the sample behavior of which the duration is greater than the first preset threshold.
The negative sample behavior information in the sample behavior information comprises: and sample behavior information corresponding to the sample behavior of which the duration of the sample behavior is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
The first preset threshold may be 3 minutes, 15 minutes, 30 minutes, etc. Correspondingly, the second preset threshold may be 1 minute, 5 minutes, 15 minutes, etc.
Specifically, the sample behavior information may include the duration of the sample behavior. A sample behavior whose duration is greater than the first preset threshold indicates a high level of interest from the sample user, so its sample behavior information is representative and may be used as positive sample behavior information. A sample behavior whose duration is less than the second preset threshold indicates a low level of interest from the sample user, so its sample behavior information is less representative and may be used as negative sample behavior information.
Taking video watching as an example, suppose the first preset threshold is 15 min and the second preset threshold is 5 min. When the duration of a sample behavior exceeds 15 min, the sample user's interest in the watched video is high, so the sample behavior information of that behavior is used as positive sample behavior information; if the duration is less than 5 min, the sample user's interest in the watched video is low, so the sample behavior information of that behavior is used as negative sample behavior information.
In an embodiment of the present invention, the negative sample behavior information in the sample behavior information includes at least one of the following information:
sample behavior information corresponding to the sample behavior of which the duration of the sample behavior is greater than zero and less than a second preset threshold;
and randomly selecting sample behavior information from the sample behavior information corresponding to the sample behavior with the duration of 0.
For the sample behaviors with the duration longer than zero and less than the second preset threshold, it is indicated that the sample user has interest in the sample behaviors, but the interest degree is low, so that the sample behavior information corresponding to the sample behaviors is used as negative sample behavior information. For the sample behavior with the duration of 0, it indicates that the sample user is not interested in the behavior, so the sample behavior information corresponding to the sample behavior is also used as the negative sample behavior information.
For sample behaviors with a duration of 0, negative sample behavior information may be selected from them at random. The number of randomly selected negative samples may be preset, for example 50, 100, or 200. The number may also be set with reference to the number of positive sample behavior information items: for example, it may be kept equal to the number of positive samples, or stand in a preset ratio to it.
Assuming that there are sample behavior information a1, a2, A3, a4, a5, a6, a7, A8, a9, the durations of sample behaviors corresponding to the respective sample behavior information are as shown in table 2 below:
TABLE 2
[Table 2 is rendered as an image in the original document; it lists the duration of the sample behavior corresponding to each of A1 through A9.]
Under the conditions that the first preset threshold is 10min and the second preset threshold is 5min, the sample behavior information corresponding to the sample behavior with the duration longer than the first preset threshold is as follows: a3, a4 and a9, wherein the sample behavior information corresponding to the sample behaviors with the duration longer than 0 and less than the second preset threshold is as follows: a5 and a8, the sample behavior information corresponding to the sample behavior with the duration of 0 is: a1, a6 and a 7.
Therefore, in Table 2 above, the positive sample behavior information is A3, A4, and A9, and the negative sample behavior information is A5 and A8, together with sample behavior information randomly selected from A1, A6, and A7.
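Under the stated thresholds (10 min and 5 min), the positive/negative division can be sketched as follows. The numeric durations are hypothetical stand-ins, since Table 2 appears only as an image in the original; only the resulting groupings are given in the text.

```python
import random

# Hypothetical durations in minutes, chosen to reproduce the groupings
# described in the text (the exact values in Table 2 are not available).
durations = {"A1": 0, "A2": 8, "A3": 12, "A4": 20, "A5": 3,
             "A6": 0, "A7": 0, "A8": 4, "A9": 30}
FIRST_THRESHOLD = 10   # minutes
SECOND_THRESHOLD = 5   # minutes

positive = [k for k, d in durations.items() if d > FIRST_THRESHOLD]
short = [k for k, d in durations.items() if 0 < d < SECOND_THRESHOLD]
zero = [k for k, d in durations.items() if d == 0]

# Negative samples: all short-duration behaviors plus randomly selected
# zero-duration behaviors (one here).
negative = short + random.sample(zero, 1)

print(sorted(positive))                      # ['A3', 'A4', 'A9']
```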
According to the sample division scheme provided by the embodiment, the positive and negative sample behavior information is divided according to the duration of the sample behavior generated by the sample user, so that the negative sample behavior information is richer, and the accuracy of the model obtained by training the positive and negative sample behavior information is higher.
Referring to fig. 6, fig. 6 is a diagram illustrating a complete process of model training.
Firstly, obtaining sample behavior features corresponding to the sample behavior information, and obtaining sample object features from the sample object information of the object operated on by the sample behavior;
reducing the dimension of the sample behavior characteristics to obtain dimension-reduced sample behavior characteristics, and then performing characteristic fusion on the dimension-reduced sample behavior characteristics and the sample object characteristics to obtain fused characteristics;
performing dimension reduction again on the fused features by using three ReLU activation layers in the model to obtain the user feature extraction result;
performing dimension increasing on the user feature extraction result by using a dimension increasing layer in the model to obtain a dimension increasing result;
and carrying out normalization processing on the dimension-increasing results by utilizing a normalization layer in the model to obtain the confidence coefficient of each result, determining the result with the highest confidence coefficient as a forward result, randomly selecting a preset number of results except the forward result as reverse results, and carrying out parameter adjustment on the model based on the determined forward result and the reverse result to realize one-time training of the model.
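The forward pass described above (fusion, three ReLU layers, dimension raising, normalization) can be sketched with NumPy as follows; all layer widths and the random weights are illustrative assumptions rather than the patented architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(2)
fused = rng.standard_normal(128)             # fused sample features (width assumed)

# Three ReLU layers reduce the fused features to the user feature extraction result.
w1 = rng.standard_normal((128, 64))
w2 = rng.standard_normal((64, 32))
w3 = rng.standard_normal((32, 16))
extracted = relu(relu(relu(fused @ w1) @ w2) @ w3)

# Dimension-raising layer followed by softmax normalization.
w_up = rng.standard_normal((16, 1000))
raised = extracted @ w_up
confidence = np.exp(raised - raised.max())
confidence /= confidence.sum()

forward_index = int(confidence.argmax())     # index of the forward result
print(confidence.shape)                      # (1000,)
```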
In an embodiment of the present invention, the feature extraction scheme may be applied in the field of object recommendation, for example, video recommendation, article recommendation, song recommendation, and the like may be performed by using the feature extraction scheme. Specifically, the method comprises the following steps:
behavior information of a click behavior generated by a user is obtained, and object information of an object clicked by the user is obtained;
obtaining user characteristics according to the behavior information and the object information;
and determining objects similar to the user characteristics in the object library, and recommending the user according to the determined objects.
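The recommendation step above can be sketched as follows, assuming cosine similarity is used to measure closeness between the user feature and the object features in the library (the text does not fix a specific similarity measure); all vectors and object names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-dimensional user feature and object library.
user = np.array([0.2, 0.9, 0.1])
library = {
    "video_a": np.array([0.1, 1.0, 0.0]),
    "video_b": np.array([1.0, 0.0, 0.3]),
    "video_c": np.array([0.3, 0.8, 0.2]),
}

# Recommend the object whose feature is most similar to the user feature.
best = max(library, key=lambda name: cosine(user, library[name]))
print(best)                                  # video_a
```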
For example, as shown in fig. 7, on the video display page, assume the click behavior generated by the user is clicking a short video and the clicked object is an entertainment video. The user's features are obtained from this information, video recommendation is then performed according to those features, and the video display page shown in fig. 8 is displayed to the user, realizing video recommendation to the user.
Fig. 9 is a schematic structural diagram of a feature extraction apparatus according to an embodiment of the present invention, where the apparatus includes:
a behavior information obtaining module 901, configured to obtain behavior information of a behavior generated in a process of using a client by a user;
an object information obtaining module 902, configured to obtain object information of an object operated by the behavior;
a user characteristic obtaining module 903, configured to obtain the behavior characteristic of the behavior according to the behavior information, obtain the object characteristic of the object according to the object information, and perform characteristic fusion on the behavior characteristic and the object characteristic to obtain the user characteristic of the user.
In an embodiment of the present invention, the user characteristic obtaining module 903 is specifically configured to:
inputting the behavior information and the object information into a pre-trained user feature extraction model for feature extraction to obtain the user features of the user, wherein the user feature extraction model is: a model obtained by performing parameter adjustment on an initial model of the user feature extraction model according to parameter adjustment information, and the user feature extraction model is used to: extract behavior features and object features and fuse the extracted features to obtain user features, wherein the parameter adjustment information includes: the forward result with the highest confidence among the model output results and a preset number of randomly selected reverse results other than the forward result, and the model output results are: the output obtained by inputting sample behavior information and sample object information into the initial model as sample data, wherein the sample behavior information is: information of behaviors generated while a sample user uses the client, and the sample object information is: information of the objects operated on by the sample behaviors.
In one embodiment of the invention, the initial model comprises a user feature extraction layer, a dimensionality raising layer and a normalization layer;
training the initial model for multiple times until a preset training end condition is reached to obtain the user feature extraction model by the following mode:
obtaining sample behavior information generated in the process of using the client by the sample user;
obtaining sample object information of an object operated by the sample behavior;
inputting the sample behavior information and the sample object information as sample data into the user feature extraction layer to obtain a user feature extraction result;
inputting the user feature extraction result into the dimension increasing layer for dimension increasing to obtain a dimension increasing result of a first dimension;
inputting the dimension-increasing result into the normalization layer for normalization processing to obtain a normalization result;
determining the result with the highest confidence coefficient as a forward result in the normalization results, and randomly determining a preset number of results in the results except the forward result as reverse results;
and taking the determined forward result and the determined reverse result as parameter adjusting information to carry out parameter adjustment on the initial model, thereby realizing one-time training on the initial model.
In an embodiment of the present invention, the positive sample behavior information in the sample behavior information includes: sample behavior information corresponding to the sample behavior of which the duration is greater than a first preset threshold value;
the negative sample behavior information in the sample behavior information comprises: and sample behavior information corresponding to the sample behavior of which the duration of the sample behavior is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
In an embodiment of the present invention, the negative sample behavior information in the sample behavior information includes at least one of the following information:
sample behavior information corresponding to the sample behavior of which the duration of the sample behavior is greater than zero and less than the second preset threshold;
and randomly selecting sample behavior information from the sample behavior information corresponding to the sample behavior with the duration of 0.
In an embodiment of the present invention, the user characteristic obtaining module 903 is specifically configured to:
acquiring behavior characteristics of the behaviors according to the behavior information, and acquiring object characteristics of the objects according to the object information;
reducing the dimension of the behavior characteristic to obtain a behavior characteristic of a second dimension;
combining the behavior characteristic of the second dimension with the object characteristic to obtain a combined characteristic;
and reducing the dimension of the combined feature to obtain a combined feature of a third dimension as a user feature.
According to the technical scheme above, when the scheme provided by this embodiment is applied to feature extraction, behavior information of a behavior generated while the user uses the client is obtained, together with object information of the object operated on by the behavior; the behavior feature of the behavior is obtained from the behavior information, the object feature of the object is obtained from the object information, and the behavior feature and the object feature are fused to obtain the user feature of the user. The scheme provided by this embodiment considers both the behavior features of the user and the object features of the objects selected by the user, and fuses the two types of features to obtain the user features. It can be seen that, with the feature extraction scheme provided by the above embodiment, the features considered are of various types rather than a single type, so the accuracy of the extracted user features can be improved.
The embodiment of the present invention further provides an electronic device. As shown in fig. 10, the device includes a processor 1001, a communication interface 1002, a memory 1003, and a communication bus 1004, where the processor 1001, the communication interface 1002, and the memory 1003 communicate with one another through the communication bus 1004;
a memory 1003 for storing a computer program;
the processor 1001 is configured to implement the steps of the feature extraction method provided in the embodiment of the present invention when executing the program stored in the memory 1003.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned feature extraction methods.
In a further embodiment, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the feature extraction methods of the above embodiments.
According to the technical scheme above, when the scheme provided by this embodiment is applied to feature extraction, behavior information of a behavior generated while the user uses the client is obtained, together with object information of the object operated on by the behavior; the behavior feature of the behavior is obtained from the behavior information, the object feature of the object is obtained from the object information, and the behavior feature and the object feature are fused to obtain the user feature of the user. The scheme provided by this embodiment considers both the behavior features of the user and the object features of the objects selected by the user, and fuses the two types of features to obtain the user features.
The electronic device, the readable storage medium and the computer program product provided by the embodiment of the invention can quickly and accurately realize the feature extraction method provided by the embodiment of the invention. Compared with the prior art, the scheme provided by the embodiment of the invention considers the behavior characteristics of the user and the object characteristics of the object selected by the user, fuses the two types of characteristics and then extracts the characteristics to obtain the user characteristics. Therefore, by applying the feature extraction scheme provided by the embodiment of the invention, the considered features are abundant in types and are not single any more, so that the accuracy of the extracted user features can be improved.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A method of feature extraction, the method comprising:
behavior information of behaviors generated in the process that a user uses a client is obtained;
obtaining object information of an object operated by the behavior;
and acquiring the behavior characteristics of the behavior according to the behavior information, acquiring the object characteristics of the object according to the object information, and performing characteristic fusion on the behavior characteristics and the object characteristics to acquire the user characteristics of the user.
2. The method according to claim 1, wherein obtaining the behavior feature of the behavior according to the behavior information, obtaining the object feature of the object according to the object information, and performing feature fusion on the behavior feature and the object feature to obtain the user feature of the user comprises:
inputting the behavior information and the object information into a pre-trained user feature extraction model for feature extraction to obtain the user feature of the user, wherein the user feature extraction model is: a model obtained by adjusting the parameters of an initial model of the user feature extraction model according to parameter adjustment information, and the user feature extraction model is configured to: extract behavior features and object features and fuse the extracted features to obtain user features; the parameter adjustment information comprises: a forward result having the highest confidence among model output results and a preset number of reverse results selected at random from the results other than the forward result, the model output results being: output results obtained by inputting sample behavior information and sample object information into the initial model as sample data, wherein the sample behavior information is: information of sample behaviors generated while sample users use the client, and the sample object information is: information of objects operated on by the sample behaviors.
3. The method according to claim 2, wherein
the initial model comprises a user feature extraction layer, a dimension-raising layer, and a normalization layer;
the initial model is trained multiple times, until a preset training end condition is reached, to obtain the user feature extraction model in the following manner:
obtaining sample behavior information generated while a sample user uses the client;
obtaining sample object information of an object operated on by the sample behavior;
inputting the sample behavior information and the sample object information into the user feature extraction layer as sample data to obtain a user feature extraction result;
inputting the user feature extraction result into the dimension-raising layer for dimension raising to obtain a dimension-raised result of a first dimension;
inputting the dimension-raised result into the normalization layer for normalization to obtain a normalization result;
determining the result with the highest confidence among the normalization results as a forward result, and randomly determining a preset number of the results other than the forward result as reverse results; and
using the determined forward result and reverse results as parameter adjustment information to adjust the parameters of the initial model, thereby completing one training pass over the initial model.
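One training pass of claim 3 can be sketched as below. All sizes, the linear layers, and the softmax used as the normalization layer are assumptions for illustration; the claim specifies only the layer sequence and the forward/reverse result selection, not the layer internals or the loss used for parameter adjustment.

```python
import numpy as np

rng = np.random.default_rng(1)

IN_DIM, FEAT_DIM, UP_DIM = 10, 4, 50   # assumed sizes; the claim fixes none
W_extract = rng.normal(size=(IN_DIM, FEAT_DIM)) * 0.1  # user feature extraction layer
W_up = rng.normal(size=(FEAT_DIM, UP_DIM)) * 0.1       # dimension-raising layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def one_training_step(sample, num_negatives=5):
    feat = sample @ W_extract        # user feature extraction result
    raised = feat @ W_up             # dimension-raised result of the first dimension
    probs = softmax(raised)          # normalization layer output
    forward = int(np.argmax(probs))  # highest-confidence forward result
    # A preset number of reverse results, drawn at random from all other results.
    others = np.setdiff1d(np.arange(UP_DIM), [forward])
    reverse = rng.choice(others, size=num_negatives, replace=False)
    return forward, reverse, probs

forward, reverse, probs = one_training_step(rng.normal(size=IN_DIM))
assert forward not in reverse and len(reverse) == 5
assert np.isclose(probs.sum(), 1.0)
```

The forward and reverse results would then feed a sampled-softmax-style loss to adjust `W_extract` and `W_up`; the claims leave that update rule open.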
4. The method according to claim 3, wherein
positive sample behavior information in the sample behavior information comprises: sample behavior information corresponding to sample behaviors whose duration is greater than a first preset threshold; and
negative sample behavior information in the sample behavior information comprises: sample behavior information corresponding to sample behaviors whose duration is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
5. The method according to claim 4, wherein the negative sample behavior information in the sample behavior information comprises at least one of the following:
sample behavior information corresponding to sample behaviors whose duration is greater than zero and less than the second preset threshold; and
sample behavior information selected at random from the sample behavior information corresponding to sample behaviors whose duration is zero.
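The duration-based sample split of claims 4-5 can be sketched as a labeling function. The threshold values here are illustrative placeholders; the claims fix only their ordering (first ≥ second), not their magnitudes.

```python
FIRST_THRESHOLD = 30.0   # seconds; hypothetical value for the first preset threshold
SECOND_THRESHOLD = 5.0   # hypothetical; must not exceed the first threshold

def label_sample(duration):
    """Claims 4-5: assign a sample behavior to positives/negatives by duration."""
    if duration > FIRST_THRESHOLD:
        return "positive"            # long engagement: positive sample
    if 0 < duration < SECOND_THRESHOLD:
        return "negative"            # brief but non-zero engagement
    if duration == 0:
        return "candidate_negative"  # claim 5: sampled at random into negatives
    return None                      # between the thresholds: left unused

assert label_sample(60.0) == "positive"
assert label_sample(2.0) == "negative"
assert label_sample(0.0) == "candidate_negative"
assert label_sample(10.0) is None
```

Leaving zero-duration behaviors as random-sampled candidates (rather than taking them all) keeps the negative set from being dominated by never-clicked items.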
6. The method according to any one of claims 1 to 5, wherein performing feature fusion on the behavior feature and the object feature to obtain the user feature of the user comprises:
reducing the dimension of the behavior feature to obtain a behavior feature of a second dimension;
combining the behavior feature of the second dimension with the object feature to obtain a combined feature; and
reducing the dimension of the combined feature to obtain a combined feature of a third dimension as the user feature.
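The reduce-combine-reduce fusion of claim 6 can be sketched with linear projections. The projection matrices and all dimension sizes are assumptions; the claim only names a "second" and a "third" dimension, not how the reduction is performed.

```python
import numpy as np

rng = np.random.default_rng(2)

BEHAVIOR_DIM, OBJECT_DIM = 64, 16   # assumed input feature sizes
SECOND_DIM, THIRD_DIM = 16, 8       # the claim's "second" and "third" dimensions

# Hypothetical linear dimension-reduction layers.
W_reduce1 = rng.normal(size=(BEHAVIOR_DIM, SECOND_DIM)) * 0.1
W_reduce2 = rng.normal(size=(SECOND_DIM + OBJECT_DIM, THIRD_DIM)) * 0.1

def fuse(behavior_feat, object_feat):
    reduced = behavior_feat @ W_reduce1                # step 1: reduce behavior feature
    combined = np.concatenate([reduced, object_feat])  # step 2: combine with object feature
    return combined @ W_reduce2                        # step 3: reduce to the user feature

user_feat = fuse(rng.normal(size=BEHAVIOR_DIM), rng.normal(size=OBJECT_DIM))
assert user_feat.shape == (THIRD_DIM,)
```

Reducing the behavior feature first keeps the two inputs at comparable scale before concatenation, so neither dominates the combined feature.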
7. A feature extraction apparatus, characterized in that the apparatus comprises:
a behavior information obtaining module, configured to obtain behavior information of a behavior generated while a user uses a client;
an object information obtaining module, configured to obtain object information of an object operated on by the behavior; and
a user feature obtaining module, configured to obtain a behavior feature of the behavior according to the behavior information, obtain an object feature of the object according to the object information, and perform feature fusion on the behavior feature and the object feature to obtain a user feature of the user.
8. The apparatus according to claim 7, wherein the user feature obtaining module is specifically configured to:
input the behavior information and the object information into a pre-trained user feature extraction model for feature extraction to obtain the user feature of the user, wherein the user feature extraction model is: a model obtained by adjusting the parameters of an initial model of the user feature extraction model according to parameter adjustment information, and the user feature extraction model is configured to: extract behavior features and object features and fuse the extracted features to obtain user features; the parameter adjustment information comprises: a forward result having the highest confidence among model output results and a preset number of reverse results selected at random from the results other than the forward result, the model output results being: output results obtained by inputting sample behavior information and sample object information into the initial model as sample data, wherein the sample behavior information is: information of sample behaviors generated while sample users use the client, and the sample object information is: information of objects operated on by the sample behaviors.
9. The apparatus according to claim 8, wherein
the initial model comprises a user feature extraction layer, a dimension-raising layer, and a normalization layer;
the initial model is trained multiple times, until a preset training end condition is reached, to obtain the user feature extraction model in the following manner:
obtaining sample behavior information generated while a sample user uses the client;
obtaining sample object information of an object operated on by the sample behavior;
inputting the sample behavior information and the sample object information into the user feature extraction layer as sample data to obtain a user feature extraction result;
inputting the user feature extraction result into the dimension-raising layer for dimension raising to obtain a dimension-raised result of a first dimension;
inputting the dimension-raised result into the normalization layer for normalization to obtain a normalization result;
determining the result with the highest confidence among the normalization results as a forward result, and randomly determining a preset number of the results other than the forward result as reverse results; and
using the determined forward result and reverse results as parameter adjustment information to adjust the parameters of the initial model, thereby completing one training pass over the initial model.
10. The apparatus according to claim 9, wherein
positive sample behavior information in the sample behavior information comprises: sample behavior information corresponding to sample behaviors whose duration is greater than a first preset threshold; and
negative sample behavior information in the sample behavior information comprises: sample behavior information corresponding to sample behaviors whose duration is less than a second preset threshold, wherein the first preset threshold is not less than the second preset threshold.
11. The apparatus according to claim 10, wherein the negative sample behavior information in the sample behavior information comprises at least one of the following:
sample behavior information corresponding to sample behaviors whose duration is greater than zero and less than the second preset threshold; and
sample behavior information selected at random from the sample behavior information corresponding to sample behaviors whose duration is zero.
12. The apparatus according to any one of claims 7 to 11, wherein the user feature obtaining module is specifically configured to:
obtain the behavior feature of the behavior according to the behavior information, and obtain the object feature of the object according to the object information;
reduce the dimension of the behavior feature to obtain a behavior feature of a second dimension;
combine the behavior feature of the second dimension with the object feature to obtain a combined feature; and
reduce the dimension of the combined feature to obtain a combined feature of a third dimension as the user feature.
13. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
CN201911244245.8A 2019-12-06 2019-12-06 Feature extraction method and device and electronic equipment Active CN111008667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244245.8A CN111008667B (en) 2019-12-06 2019-12-06 Feature extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244245.8A CN111008667B (en) 2019-12-06 2019-12-06 Feature extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111008667A true CN111008667A (en) 2020-04-14
CN111008667B CN111008667B (en) 2023-06-02

Family

ID=70115141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244245.8A Active CN111008667B (en) 2019-12-06 2019-12-06 Feature extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111008667B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881343A (en) * 2020-07-07 2020-11-03 Oppo广东移动通信有限公司 Information pushing method and device, electronic equipment and computer readable storage medium
CN112149839A (en) * 2020-10-09 2020-12-29 北京百度网讯科技有限公司 High-dimensional feature representation learning method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140279808A1 (en) * 2013-03-15 2014-09-18 Futurewei Technologies, Inc. Using dynamic object modeling and business rules to dynamically specify and modify behavior
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
CN108681743A (en) * 2018-04-16 2018-10-19 腾讯科技(深圳)有限公司 Image object recognition methods and device, storage medium
CN109002488A (en) * 2018-06-26 2018-12-14 北京邮电大学 A kind of recommended models training method and device based on first path context
CN109344314A (en) * 2018-08-20 2019-02-15 腾讯科技(深圳)有限公司 A kind of data processing method, device and server
CN109446990A (en) * 2018-10-30 2019-03-08 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110121088A (en) * 2019-04-17 2019-08-13 北京奇艺世纪科技有限公司 A kind of customer attribute information determines method, apparatus and electronic equipment
CN110278414A (en) * 2019-06-28 2019-09-24 Oppo广东移动通信有限公司 Image processing method, device, server and storage medium
CN110442788A (en) * 2019-07-23 2019-11-12 北京奇艺世纪科技有限公司 A kind of information recommendation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
田瑞娟: "Application of image stitching and fusion technology in network video surveillance systems", 《兵工自动化》 *
陈冬祥 et al.: "A web browsing behavior authentication method integrating multiple factors", 《计算机科学》 *

Also Published As

Publication number Publication date
CN111008667B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN106326391B (en) Multimedia resource recommendation method and device
CN110377740B (en) Emotion polarity analysis method and device, electronic equipment and storage medium
WO2022041982A1 (en) Data recommendation method and apparatus, computer device, and storage medium
CN106874253A (en) Recognize the method and device of sensitive information
CN113590970B (en) Personalized digital book recommendation system and method based on reader preference, computer and storage medium
CN110717069B (en) Video recommendation method and device
CN113204655B (en) Multimedia information recommendation method, related device and computer storage medium
US20230004608A1 (en) Method for content recommendation and device
WO2023087914A1 (en) Method and apparatus for selecting recommended content, and device, storage medium and program product
CN111259195A (en) Video recommendation method and device, electronic equipment and readable storage medium
CN111008667A (en) Feature extraction method and device and electronic equipment
CN111291217B (en) Content recommendation method, device, electronic equipment and computer readable medium
CN112507153A (en) Method, computing device, and computer storage medium for image retrieval
US10949773B2 (en) System and methods thereof for recommending tags for multimedia content elements based on context
CN113297486A (en) Click rate prediction method and related device
CN112950291A (en) Model deviation optimization method, device, equipment and computer readable medium
CN110971973A (en) Video pushing method and device and electronic equipment
CN111104550A (en) Video recommendation method and device, electronic equipment and computer-readable storage medium
CN115577171A (en) Information pushing method and device, electronic equipment and storage medium
CN114595346A (en) Training method of content detection model, content detection method and device
CN111353052B (en) Multimedia object recommendation method and device, electronic equipment and storage medium
CN117150053A (en) Multimedia information recommendation model training method, recommendation method and device
CN109902169B (en) Method for improving performance of film recommendation system based on film subtitle information
CN112785095A (en) Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium
CN110659419B (en) Method and related device for determining target user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant