CN115423037B

CN115423037B - User classification method and system based on big data

Info

Publication number: CN115423037B
Application number: CN202211183765.4A
Authority: CN
Inventors: 马萃; 锁海娇
Original assignee: Hubei Central China Technology Development Of Electric Power Co ltd
Current assignee: Hubei Central China Technology Development Of Electric Power Co ltd
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2023-10-13
Anticipated expiration: 2042-09-27
Also published as: CN115423037A

Abstract

The embodiments of the application provide a big data based user classification method and system. A feature difference priority is obtained for each of P first session sets; it reflects the positive contribution of that session set to the personalized behavior label to be obtained. The behavior description knowledge fields corresponding to the first session sets are then optimized according to these priorities: repeated and invalid behavior description knowledge fields are cleaned out, and fields that make no positive contribution to the label, or a contribution smaller than expected, are filtered out. This yields the behavior description knowledge fields corresponding to Q second session sets that influence the personalized behavior label. The personalized behavior label of the user behavior log is then determined from the behavior description knowledge fields corresponding to the Q second session sets. This ensures the reliability of the personalized behavior label and improves the reliability of user classification; because Q is smaller than P, the volume of behavior description knowledge fields to be processed is reduced, which increases analysis efficiency and lowers computational cost.

Description

User classification method and system based on big data

Technical Field

The application relates to the field of internet business, in particular to a user classification method and system based on big data.

Background

User classification is an indispensable part of service pushing in the Internet field, and reasonable, accurate user classification can increase the conversion rate of service pushing. Business interaction inevitably involves various sessions, for example: information searches, product reviews and associated clicks during e-commerce shopping; item use, in-game communication and purchase history during gaming; and author rewards, pay-per-view purchases and comments during video browsing. Such user behaviors reflect a user's preferences in a given situation, so user behavior data can be analyzed along certain dimensions to classify users and thereby make service pushing more accurate. However, the accuracy of the classification results produced by prior-art user classification still needs to be improved.

Disclosure of Invention

The present invention aims to provide a user classification method and system based on big data, which solve the above problems.

The embodiment of the invention is realized as follows:

In a first aspect, an embodiment of the present disclosure provides a user classification method based on big data, including:

receiving a user behavior log of a user to be classified;

determining, according to the user behavior log, P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set;

for each first session set, determining a feature difference priority corresponding to the first session set according to a behavior description knowledge field of the first session set;

optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the characteristic difference priority corresponding to each first session set to obtain Q second session sets and behavior description knowledge fields corresponding to each second session set, wherein Q is smaller than P;

according to the behavior description knowledge field corresponding to each second session set, determining the personalized behavior label of the user behavior log;

and determining the category attribute corresponding to the personalized behavior label according to a preset personalized label mapping relation.

In the embodiment of the application, a feature difference priority is acquired that reflects the positive incentive of each first session set for the personalized behavior label to be obtained, and the behavior description knowledge fields respectively corresponding to the P first session sets are optimized accordingly, so that repeated and invalid behavior description knowledge fields can be cleaned. For example, behavior description knowledge fields that provide no positive incentive for the personalized behavior label, or a positive incentive smaller than expected, are filtered out; on this basis, the behavior description knowledge fields corresponding to the Q second session sets that affect the personalized behavior label are obtained. The personalized behavior label of the user behavior log is then determined from the behavior description knowledge fields corresponding to the Q second session sets. This ensures the reliability of the personalized behavior label and further improves the reliability of user classification; and because Q is smaller than P, the amount of behavior description knowledge field data to be processed is reduced, which improves analysis efficiency and lowers computational cost.
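The overall flow of the first aspect can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: each behavior description knowledge field is modeled as a plain list of floats, the feature difference priority is replaced by a hypothetical score (mean absolute value), the optimization is simplified to keeping the Q highest-priority fields, and all function names, label names and the mapping are invented for the example.

```python
from typing import Dict, List, Sequence

Field = List[float]  # one behavior description knowledge field (assumed: a feature vector)


def priority(field: Sequence[float]) -> float:
    """Hypothetical feature difference priority: mean absolute value of the field."""
    return sum(abs(x) for x in field) / len(field)


def prune(fields: List[Field], q: int) -> List[Field]:
    """Keep the q highest-priority fields (Q < P), preserving their original order."""
    ranked = sorted(range(len(fields)), key=lambda i: priority(fields[i]), reverse=True)
    keep = sorted(ranked[:q])
    return [fields[i] for i in keep]


def behavior_label(fields: List[Field], label_names: Sequence[str]) -> str:
    """Average the kept fields and take the strongest dimension as the label."""
    dim = len(fields[0])
    mean = [sum(f[d] for f in fields) / len(fields) for d in range(dim)]
    return label_names[max(range(dim), key=lambda d: mean[d])]


def classify(fields: List[Field], q: int, label_names: Sequence[str],
             label_to_category: Dict[str, str]) -> str:
    """P fields -> Q fields -> personalized behavior label -> category attribute."""
    label = behavior_label(prune(fields, q), label_names)
    return label_to_category[label]


# Illustrative data only: P = 4 session sets, two of which carry almost no signal.
example_fields = [[0.9, 0.1], [0.0, 0.0], [0.8, 0.3], [0.01, 0.02]]
example_labels = ["bargain_hunter", "reviewer"]
example_mapping = {"bargain_hunter": "price-sensitive", "reviewer": "content-contributor"}
example_category = classify(example_fields, 2, example_labels, example_mapping)
```

The hard top-Q selection is a simplification chosen for readability; the optional embodiments describe the optimization in terms of knowledge field arrays rather than an explicit ranking.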

Optionally, for each first session set, determining, according to a behavior description knowledge field of the first session set, a feature difference priority corresponding to the first session set includes:

for each first session set, performing dimension reduction operation on the behavior description knowledge field corresponding to the first session set to obtain a dimension reduction knowledge field corresponding to the first session set;

and determining the feature difference priority corresponding to the first session set according to the dimension reduction knowledge field corresponding to each first session set.

In the embodiment of the application, the dimension reduction operation lowers the dimension of the behavior description knowledge field and reduces its size for subsequent processing, which makes it easier to acquire the feature difference priority corresponding to the first session set more accurately.
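As a hedged illustration of a dimension reduction operation, the sketch below projects a field onto fewer dimensions with a fixed weight matrix. The text does not specify which operation is used; a linear projection is an assumption, and `reduce_dimension` and the weights are hypothetical.

```python
from typing import List


def reduce_dimension(field: List[float], projection: List[List[float]]) -> List[float]:
    """Project a D-dimensional field to d dimensions.

    `projection` holds d rows of D weights each (assumed to be learned elsewhere).
    """
    return [sum(w * x for w, x in zip(row, field)) for row in projection]


# Illustrative weights: average neighbouring pairs, 4 dimensions -> 2 dimensions.
pairwise_mean = [[0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]]
reduced = reduce_dimension([1.0, 2.0, 3.0, 4.0], pairwise_mean)
```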

Optionally, the determining the feature difference priority of the first session set according to the dimension-reduction knowledge field corresponding to each first session set includes:

performing global unified processing on the dimension reduction knowledge fields corresponding to each first session set to obtain dimension reduction knowledge fields after global unified processing;

performing field compression on the dimension reduction knowledge field subjected to global unified processing to obtain a first transition knowledge field, wherein the field rank corresponding to the first transition knowledge field is smaller than the field rank corresponding to the dimension reduction knowledge field;

and determining the characteristic difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set.

In the embodiment of the application, the dimension reduction knowledge fields corresponding to the first session sets can be unified, in other words constrained to a common range; field compression then shortens the field rank of the globally unified dimension reduction knowledge fields, which reduces the computation and time needed to determine the feature difference priority corresponding to each first session set.

Optionally, the performing global unified processing on the dimension reduction knowledge field corresponding to each first session set to obtain a dimension reduction knowledge field after global unified processing includes:

determining a global unified coefficient corresponding to each first session set according to the dimension reduction knowledge field corresponding to each first session set;

and carrying out global unified processing on the dimension reduction knowledge fields corresponding to each first session set according to the global unified coefficient corresponding to each first session set to obtain the dimension reduction knowledge fields after the global unified processing.

In the embodiment of the application, the dimension reduction knowledge fields corresponding to the first session sets can be weighted according to the determined global unified coefficients, so that each globally unified dimension reduction knowledge field is more reliable, which in turn increases the reliability of the feature difference priority corresponding to each first session set.
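One possible reading of global unified processing is a per-field normalization. The sketch below assumes, purely for illustration, that the global unified coefficient of a session set is the reciprocal of the L2 norm of its dimension reduction knowledge field; the text does not define the coefficient, and the function names are hypothetical.

```python
import math
from typing import List


def global_unified_coefficient(field: List[float]) -> float:
    """Assumed coefficient: 1 / L2 norm of the field (0.0 for an all-zero field)."""
    norm = math.sqrt(sum(x * x for x in field))
    return 1.0 / norm if norm > 0 else 0.0


def unify(fields: List[List[float]]) -> List[List[float]]:
    """Weight every field by its own coefficient so all fields share a common range."""
    unified = []
    for field in fields:
        coefficient = global_unified_coefficient(field)
        unified.append([x * coefficient for x in field])
    return unified


unified_fields = unify([[3.0, 4.0], [0.0, 0.0]])
```

After this step every non-zero field has unit length, which is one way of constraining the fields to a common range.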

Optionally, the performing field compression on the dimension reduction knowledge field after the global unified processing to obtain a first transition knowledge field includes:

determining a rank reduction ratio corresponding to the globally uniformly processed dimension reduction knowledge field according to the globally uniformly processed dimension reduction knowledge field;

and carrying out field compression on the dimension reduction knowledge field after global unified processing through the rank reduction proportion to obtain the first transition knowledge field.

In the embodiment of the application, field compression is applied to the globally unified dimension reduction knowledge fields according to the rank reduction ratio, so that different globally unified dimension reduction knowledge fields can be compressed at multiple levels and repeated, invalid knowledge fields are reduced. The resulting first transition knowledge field is reliable and compact, which helps increase the accuracy of the feature difference priority corresponding to the first session set and makes that priority easier to obtain.
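Field compression by a rank reduction ratio can be sketched as follows. The sketch assumes that "field rank" means the length of the field and that compression averages consecutive windows; the way the ratio is derived here (from a hypothetical target rank) is likewise an assumption, since the text only says the ratio is determined from the field itself.

```python
from typing import List


def rank_reduction_ratio(field: List[float], target_rank: int) -> int:
    """Assumed rule: how many input entries are merged into one output entry."""
    return max(1, len(field) // target_rank)


def compress(field: List[float], ratio: int) -> List[float]:
    """Average each window of `ratio` consecutive entries into a single entry."""
    compressed = []
    for start in range(0, len(field), ratio):
        window = field[start:start + ratio]
        compressed.append(sum(window) / len(window))
    return compressed


unified_field = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
ratio = rank_reduction_ratio(unified_field, target_rank=3)
first_transition_field = compress(unified_field, ratio)
```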

Optionally, the determining, according to the first transitional knowledge field corresponding to each first session set, a feature difference priority corresponding to each first session set includes:

performing expansion processing on each first transition knowledge field, and performing field compression on the first transition knowledge fields after expansion processing to obtain second transition knowledge fields corresponding to each first transition knowledge field;

and determining the feature difference priority corresponding to each first session set according to each second transition knowledge field.

In the embodiment of the application, expanding the first transition knowledge field enriches the field space, so that the resulting second transition knowledge field carries more complete information, which increases the reliability of the feature difference priority obtained for each first session set.

Optionally, optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set to obtain Q second session sets and behavior description knowledge fields corresponding to each second session set, including:

determining a first knowledge field array corresponding to the characteristic difference priority according to the characteristic difference priority corresponding to each first session set, wherein the array size of the first knowledge field array is P multiplied by Q;

performing overturn processing on the first knowledge field array to obtain a second knowledge field array with an array size of Q multiplied by P;

and optimizing the behavior description knowledge fields respectively corresponding to the P first session sets according to the second knowledge field array and the session set array corresponding to the behavior description knowledge fields of the first session sets to obtain the Q second session sets and the behavior description knowledge fields corresponding to each second session set.

In the embodiment of the application, flipping the array size transposes the knowledge fields corresponding to the feature difference priorities in the first knowledge field array, so that the resulting second knowledge field array (Q multiplied by P) can be combined with the session set array. The resulting product is used to optimize the behavior description knowledge field of each first session set in the session set array: repeated and invalid behavior description knowledge fields are cleaned, and the behavior description knowledge fields corresponding to the Q second session sets that affect the personalized behavior label are obtained, which saves computation and increases analysis efficiency.
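The array flip and combination can be illustrated with plain nested lists. The sketch assumes that the "overturn processing" is a matrix transpose and that the combination with the session set array is a matrix product, so a Q × P array times a P × D session set array gives Q fields of dimension D. The helper names and the example values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Flip an R x C array into a C x R array."""
    return [list(column) for column in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an R x K array with a K x C array."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


def optimize_fields(first_array: Matrix, session_array: Matrix) -> Matrix:
    """first_array: P x Q priorities; session_array: P x D fields; result: Q x D fields."""
    second_array = transpose(first_array)  # Q x P
    return matmul(second_array, session_array)


# P = 3 first session sets, Q = 2 second session sets, D = 2 field entries.
priorities = [[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]]  # the third session set contributes to neither output
sessions = [[1.0, 2.0],
            [3.0, 4.0],
            [5.0, 6.0]]
second_session_fields = optimize_fields(priorities, sessions)
```

With one-hot priority columns the product simply selects fields; with fractional priorities each second session set becomes a weighted mixture of first session sets.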

Optionally, the determining, according to the behavior description knowledge field corresponding to each second session set, a personalized behavior tag of the user behavior log includes:

taking each second session set as a first session set and updating the value of P to the number of these first session sets, then cyclically executing the steps from determining the characteristic difference priority corresponding to each first session set according to the behavior description knowledge field of the first session set onward; and, when the number of cycles meets a preset condition, determining the label attribution confidence result corresponding to the user behavior log according to the behavior description knowledge fields corresponding to the second session sets obtained in the last cycle;

and determining the personalized behavior label of the user behavior log according to the label attribution confidence result.

In the embodiment of the application, repeated and invalid behavior description knowledge fields can be removed over multiple rounds from the behavior description knowledge fields corresponding to the user behavior log, so that they are cleaned accurately and the behavior description knowledge fields that affect the personalized behavior label are retained. On this basis, an accurate personalized behavior label can be obtained from the label attribution confidence results of the user behavior log with respect to each personalized behavior label.
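The cyclic execution can be sketched as a loop in which the output of one round becomes the input of the next. The sketch makes several assumptions: the preset condition is a fixed number of rounds, the optimization step is a hypothetical stand-in that drops the weakest field, and the label attribution confidence result is a softmax over the averaged fields.

```python
import math
from typing import Callable, List

Fields = List[List[float]]


def drop_weakest(fields: Fields) -> Fields:
    """Hypothetical optimization step: remove the field with the smallest L1 magnitude."""
    if len(fields) <= 1:
        return fields
    weakest = min(range(len(fields)), key=lambda i: sum(abs(x) for x in fields[i]))
    return [f for i, f in enumerate(fields) if i != weakest]


def run_rounds(fields: Fields, optimize: Callable[[Fields], Fields], rounds: int) -> Fields:
    """Each round's second session sets become the next round's first session sets."""
    for _ in range(rounds):
        fields = optimize(fields)  # P is implicitly updated to len(fields)
    return fields


def label_confidences(fields: Fields) -> List[float]:
    """Assumed confidence result: softmax over the per-dimension mean of the fields."""
    dim = len(fields[0])
    pooled = [sum(f[d] for f in fields) / len(fields) for d in range(dim)]
    peak = max(pooled)
    exps = [math.exp(v - peak) for v in pooled]
    total = sum(exps)
    return [e / total for e in exps]


log_fields = [[2.0, 0.1], [0.0, 0.0], [1.5, 0.2], [0.1, 0.0]]
final_fields = run_rounds(log_fields, drop_weakest, rounds=2)
confidences = label_confidences(final_fields)
best_label_index = max(range(len(confidences)), key=lambda i: confidences[i])
```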

Optionally, the steps are implemented through a user classification model, and the user classification model is obtained through training by the following steps:

acquiring a training user behavior log;

inputting the training user behavior log into a user classification model to be trained, wherein the user classification model comprises a plurality of optimization modules;

outputting a first estimated behavior description knowledge field through an optimization module of the user classification model to be trained, and determining a first estimated label attribution result corresponding to the training user behavior log;

the optimization module is used for determining the characteristic difference priority of the first estimated session set according to the behavior description knowledge fields of the first estimated session set corresponding to the training user behavior log, and optimizing the behavior description knowledge fields corresponding to the M first estimated session sets respectively according to the characteristic difference priority corresponding to each first estimated session set to obtain N target estimated session sets and first estimated behavior description knowledge fields corresponding to each target estimated session set;

inputting the training user behavior log into a preset learning model, outputting a second estimated behavior description knowledge field through each optimization module of the learning model, and determining a second estimated label attribution result corresponding to the training user behavior log;

determining an estimated cost value of the user classification model to be trained according to the first estimated behavior description knowledge field, the second estimated behavior description knowledge field, the first estimated label attribution result and the second estimated label attribution result, and training the user classification model to be trained until convergence through the estimated cost value.

In the embodiment of the application, the second estimated behavior description knowledge fields and the second estimated label attribution result output by the preset learning model are compared with the first estimated behavior description knowledge fields and the first estimated label attribution result output by the user classification model to be trained. In this way the user classification model performs high-quality knowledge extraction from the learning model and an accurate estimated cost value is obtained; training the user classification model to be trained with this estimated cost value improves its inference accuracy.

Optionally, the determining the estimated cost value of the user classification model to be trained according to the first estimated behavior description knowledge field, the second estimated behavior description knowledge field, the first estimated tag attribution result and the second estimated tag attribution result includes:

for each optimization module in the user classification model to be trained, determining a first cost value corresponding to the optimization module according to a first estimated behavior description knowledge field and a second estimated behavior description knowledge field corresponding to the optimization module;

determining a second cost value of the user classification model to be trained according to the first estimated label attribution result and the second estimated label attribution result;

and determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module.

In the embodiment of the application, the first cost value of each optimization module when estimating behavior description knowledge fields is obtained from the first and second estimated behavior description knowledge fields corresponding to that module, and the second cost value of the user classification model to be trained when estimating the final label attribution result is obtained from the first and second estimated label attribution results. Combining the first and second cost values gives an estimated cost value that reflects both the optimization modules and the final label attribution result of the user classification model to be trained. Training the user classification model to be trained with this estimated cost value increases the inference accuracy of the optimization modules in the final user classification model, the accuracy of the label attribution result it outputs, and ultimately the reliability of the personalized behavior label it outputs.
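The combination of per-module first cost values with a model-level second cost value can be sketched as below. The concrete cost functions are assumptions: mean squared error between the fields of the model to be trained and of the learning model, and a Kullback-Leibler divergence between the two label attribution results. The sketch also assumes both models' fields already have the same shape, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def field_cost(student_fields: Matrix, teacher_fields: Matrix) -> float:
    """Assumed first cost value of one optimization module: mean squared error."""
    total, count = 0.0, 0
    for s_row, t_row in zip(student_fields, teacher_fields):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


def label_cost(student_probs: List[float], teacher_probs: List[float],
               eps: float = 1e-12) -> float:
    """Assumed second cost value: KL divergence from teacher to student probabilities."""
    return sum(t * math.log((t + eps) / (s + eps))
               for s, t in zip(student_probs, teacher_probs))


def estimated_cost(student_modules: List[Matrix], teacher_modules: List[Matrix],
                   student_probs: List[float], teacher_probs: List[float]) -> float:
    """Sum of the first cost values over all modules plus the second cost value."""
    first_costs = [field_cost(s, t) for s, t in zip(student_modules, teacher_modules)]
    return sum(first_costs) + label_cost(student_probs, teacher_probs)
```

An unweighted sum is used for simplicity; weighting the two terms differently would be an equally plausible design.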

Optionally, the determining, according to the first estimated behavior description knowledge field and the second estimated behavior description knowledge field corresponding to the optimization module, the first cost value corresponding to the optimization module includes:

determining M recovery estimated behavior description knowledge fields according to each first estimated behavior description knowledge field;

wherein M is the number of first estimated session sets corresponding to the training user behavior log;

and determining the first cost value corresponding to the optimization module according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields.

In the embodiment of the present application, the first estimated behavior description knowledge fields output by the user classification model to be trained are the estimated behavior description knowledge fields that remain after repeated and invalid behavior description knowledge fields have been removed, so their number is smaller than the number of first estimated session sets. The number of second estimated behavior description knowledge fields output by the learning model is the same as the number of first estimated session sets. Therefore, determining M recovery estimated behavior description knowledge fields from the first estimated behavior description knowledge fields restores their number so that it matches the number of second estimated behavior description knowledge fields. Each recovery estimated behavior description knowledge field can then be compared one by one with the corresponding second estimated behavior description knowledge field to obtain the cost value between them, which completes the knowledge refinement of the user classification model to be trained; a reliable and accurate first cost value is then obtained from these per-field cost values.

Optionally, the determining M recovery estimated behavior description knowledge fields according to each first estimated behavior description knowledge field includes:

performing global unified processing on a second knowledge field array corresponding to the first estimated behavior description knowledge field to obtain a first estimated dimension reduction knowledge field after global unified processing, and performing overturn processing on the knowledge field array corresponding to the first estimated dimension reduction knowledge field to obtain a third knowledge field array;

performing field compression on the third knowledge field array to obtain a second estimated dimension reduction knowledge field, and performing expansion processing on the second estimated dimension reduction knowledge field to obtain a third estimated dimension reduction knowledge field;

and performing field compression on the knowledge field array corresponding to the third estimated dimension reduction knowledge field, performing overturn processing on the knowledge field array after field compression to obtain a fourth knowledge field array, and determining the M recovery estimated behavior description knowledge fields according to the fourth knowledge field array, wherein the array size of the fourth knowledge field array comprises M fields, and the field rank in the array size of the fourth knowledge field array is the field rank corresponding to the behavior description knowledge field of the first estimated session set.

In the embodiment of the application, processing such as global unified processing, array size flipping and field compression reverses each step that the optimization module applied to the behavior description knowledge fields corresponding to the first estimated session sets, and restores the number of first estimated behavior description knowledge fields. M recovery estimated behavior description knowledge fields are thus obtained, matching the second estimated behavior description knowledge fields in number.
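The effect of the recovery, going from N fields back to M fields, can be shown with a deliberately reduced sketch. It omits the global unified processing, compression and expansion steps and assumes that the recovery amounts to multiplying the N × D fields by a hypothetical M × N recovery array; in a real model those weights would be produced by the steps described above.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an R x K array with a K x C array."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


def recover_fields(first_estimated_fields: Matrix, recovery_array: Matrix) -> Matrix:
    """first_estimated_fields: N x D; recovery_array: M x N; result: M x D."""
    return matmul(recovery_array, first_estimated_fields)


# N = 2 pruned fields are mapped back to M = 3 fields of the same dimension D = 2.
pruned = [[1.0, 2.0],
          [3.0, 4.0]]
recovery_weights = [[1.0, 0.0],
                    [0.0, 1.0],
                    [0.5, 0.5]]  # the removed session set is rebuilt as a mixture
recovered = recover_fields(pruned, recovery_weights)
```

Once the count is restored to M, each recovered field can be compared with the learning model's field at the same position.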

Optionally, the determining the M recovery estimated behavior description knowledge fields according to the fourth knowledge field array includes:

performing global unified processing on the fourth knowledge field array, and performing field compression on the fourth knowledge field array subjected to global unified processing for a plurality of times to obtain a fifth knowledge field array;

and determining the M recovery estimated behavior description knowledge fields according to the fifth knowledge field array and the fourth knowledge field array.

In the embodiment of the application, global unified processing and multiple rounds of field compression on the fourth knowledge field array complete the information of each estimated behavior description knowledge field in it and yield the fifth knowledge field array. Splicing the estimated behavior description knowledge fields of the fifth and fourth knowledge field arrays then forms a residual connection over the estimated behavior description knowledge fields, which prevents network degradation and yields accurate recovery estimated behavior description knowledge fields.
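A residual connection between the fourth and fifth knowledge field arrays can be sketched as follows. The sketch assumes the usual additive form of a residual connection (element-wise sum of input and transformed output); the text speaks of "splicing", so addition is an interpretation. The `refine` callable stands in for the global unified processing and field compression and is hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def residual_connect(fourth_array: Matrix, refine: Callable[[Matrix], Matrix]) -> Matrix:
    """Return fourth_array + refine(fourth_array), element by element."""
    fifth_array = refine(fourth_array)
    return [[a + b for a, b in zip(row4, row5)]
            for row4, row5 in zip(fourth_array, fifth_array)]


def halve(array: Matrix) -> Matrix:
    """Stand-in refinement that keeps the array shape unchanged."""
    return [[0.5 * x for x in row] for row in array]


recovered_with_residual = residual_connect([[2.0, 4.0], [6.0, 8.0]], halve)
```

Because the input is carried through unchanged, the refinement only has to learn a correction, which is the usual argument for why residual connections counter degradation in deep networks.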

Optionally, the determining, according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields, the first cost value corresponding to the optimization module includes:

determining a first sub-cost value according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields;

performing recovery processing on the recovery estimated behavior description knowledge field to obtain a first target estimated behavior description knowledge field associated with the recovery estimated behavior description knowledge field, and obtaining a third estimated tag attribution result corresponding to the first target estimated behavior description knowledge field;

performing recovery processing on the second estimated behavior description knowledge field to obtain a second target estimated behavior description knowledge field associated with the second estimated behavior description knowledge field, and obtaining a fourth estimated tag attribution result corresponding to the second target estimated behavior description knowledge field;

determining a second sub-cost value according to the third estimated tag attribution result and the fourth estimated tag attribution result;

and determining the first cost value according to the first sub-cost value and the second sub-cost value.

In the embodiment of the application, the first sub-cost value of the optimization module when estimating behavior description knowledge fields is obtained from the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields. Recovery processing allows the recovery estimated behavior description knowledge fields to be analyzed, giving the second sub-cost value incurred when they are output. Training the user classification model with the first cost value obtained from the first and second sub-cost values increases the reliability of the recovery estimated behavior description knowledge fields output by the optimization module.

Optionally, the determining the second sub-cost value according to the third estimated tag attribution result and the fourth estimated tag attribution result includes:

determining a third sub-cost value according to the third estimated tag attribution result and a first reference tag attribution result corresponding to the third estimated tag attribution result;

determining a fourth sub-cost value according to the fourth estimated tag attribution result and a second reference tag attribution result corresponding to the fourth estimated tag attribution result;

and determining the second sub-cost value according to the third sub-cost value and the fourth sub-cost value.

In the embodiment of the application, the learning model and the user classification model to be trained correspond to different reference label attribution results, so the targets of their estimated label attribution results differ. Computing the cost values against these different reference label attribution results increases the reliability of the third and fourth sub-cost values, so that the second sub-cost value can be obtained accurately.

Optionally, the determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module includes:

determining a confidence estimation cost value corresponding to the user classification model to be trained according to the first estimation label attribution result and the reference label attribution result corresponding to the training user behavior log;

and determining the estimated cost value according to the first cost value, the second cost value and the confidence estimated cost value corresponding to each optimization module.

In the embodiment of the application, the confidence estimated cost value, that is, the cost between the first estimated tag attribution result output by the user classification model to be trained and the actual reference tag attribution result, is obtained from the first estimated tag attribution result and the reference tag attribution result corresponding to the training user behavior log. Training the user classification model to be trained with this cost increases the reliability of the estimated tag attribution results it outputs.
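The three-part estimated cost value can be sketched as a simple sum. The sketch assumes that the confidence estimated cost value is a cross-entropy between the first estimated tag attribution result (a probability list) and the index of the reference tag, and that the terms are added without weights; neither choice is stated in the text, and the function names are hypothetical.

```python
import math
from typing import List


def confidence_cost(estimated_probs: List[float], reference_index: int,
                    eps: float = 1e-12) -> float:
    """Assumed confidence estimated cost value: cross-entropy against the reference tag."""
    return -math.log(max(estimated_probs[reference_index], eps))


def total_estimated_cost(first_costs: List[float], second_cost: float,
                         estimated_probs: List[float], reference_index: int) -> float:
    """First cost values of all optimization modules + second cost + confidence cost."""
    return sum(first_costs) + second_cost + confidence_cost(estimated_probs, reference_index)


# Illustrative numbers: two optimization modules, a perfectly confident correct prediction.
example_total = total_estimated_cost([0.1, 0.2], 0.3, [1.0, 0.0], reference_index=0)
```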

carrying out knowledge field analysis on the training user behavior log through a preset machine learning model, and determining a fifth estimated label attribution result corresponding to the training user behavior log;

determining a third cost value of the user classification model to be trained according to the fifth estimated tag attribution result and the first estimated tag attribution result;

and determining the estimated cost value according to the first cost value, the second cost value and the third cost value.

In the embodiment of the application, obtaining the third cost value between the estimated label attribution results output by the user classification model to be trained and by the preset machine learning model enriches the training process and prevents the training of the user classification model from relying on a single signal.

In a second aspect, an embodiment of the present application provides a user classification system, including a processor and a memory connected to each other, where the memory stores a computer program, which when executed by the processor, implements a method as provided in the first aspect of the embodiment of the present application.

In the following description, other features will be partially set forth. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will in part discover these features or will be able to ascertain them through production or use thereof. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations that are set forth in the detailed examples described below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

The methods, systems, and/or programs in the accompanying drawings will be described further in terms of exemplary embodiments. These exemplary embodiments will be described in detail with reference to the drawings. These exemplary embodiments are non-limiting exemplary embodiments, wherein reference numerals represent similar mechanisms throughout the several views of the drawings.

Fig. 1 is a block diagram of a big data based user classification system according to some embodiments of the application.

Fig. 2 is a flow chart of a big data based user classification method according to some embodiments of the application.

Fig. 3 is a schematic diagram of an architecture of a user classification device according to an embodiment of the present application.

Detailed Description

In order to better understand the above technical solutions, the following detailed description of the technical solutions of the present application is made by using the accompanying drawings and specific embodiments, and it should be understood that the specific features of the embodiments and the embodiments of the present application are detailed descriptions of the technical solutions of the present application, and not limiting the technical solutions of the present application, and the technical features of the embodiments and the embodiments of the present application may be combined with each other without conflict.

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. It will be apparent, however, to one skilled in the art that the application can be practiced without these details. In other instances, well known methods, procedures, systems, components, and/or circuits have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present application.

These and other features, together with the functions, acts, and combinations of parts and economies of manufacture of the related elements of structure, all of which form part of this application, may become more apparent upon consideration of the following description with reference to the accompanying drawings. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It should be understood that the drawings are not to scale.

The present application uses flowcharts to illustrate operations performed by a system according to embodiments of the present application. It should be clearly understood that the operations of a flowchart need not be performed in the order shown. Rather, they may be performed in reverse order or concurrently. Additionally, at least one other operation may be added to a flowchart, and one or more operations may be deleted from it.

Fig. 1 is a schematic architecture diagram of a user classification system 100 according to some embodiments of the application. The user classification system 100 comprises a user classification device 110, a memory 120, a processor 130, and a communication unit 140. The memory 120, the processor 130, and the communication unit 140 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The user classification device 110 includes at least one software function module that may be stored in the memory 120 in the form of software or firmware, or built into an Operating System (OS) of the user classification system 100. The processor 130 is configured to execute executable modules stored in the memory 120, such as the software function modules and computer programs included in the user classification device 110.

The Memory 120 may be, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction. The communication unit 140 is used for establishing a communication connection between the user classification system 100 and the terminal device through a network, and for transceiving user behavior data through the network.

The processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like.

It is to be understood that the configuration shown in fig. 1 is merely illustrative and that user classification system 100 may also include more or fewer components than shown in fig. 1 or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.

Fig. 2 is a flowchart illustrating a big data based user classification method according to some embodiments of the present application, which is applied to the user classification system 100 in fig. 1, and may include the following steps S1 to S6. Some alternative embodiments will be described on the basis of the following steps S1-S6, which should be understood as examples and should not be interpreted as essential features for implementing the present solution.

Step S1, receiving a user behavior log of a user to be classified.

In the embodiment of the present application, the user behavior log is a set of behavior data generated when a user performs service interaction; it may be collected in real time by the user classification system 100 (for example, a server) and compiled periodically. The behavior data may be game operations, prop purchases, and chat exchanges in a game scenario; product browsing history, consumption history, and click history in an e-commerce shopping process; or rewards, comments, clicks, and the like made while browsing content on a video or novel platform. In the user behavior log, the various types of behaviors are stored in separate partitions.

Step S2, according to the user behavior log, determining P first session sets corresponding to the user behavior log and behavior description knowledge fields corresponding to each first session set.

In the embodiment of the present application, the first session sets may be a plurality of user behavior data sets obtained by dividing the user behavior log according to a preset partition manner, each session set corresponding to one partition of the user behavior log. For example, in a game application scenario, session set 1 corresponds to the prop purchase behavior data of the user behavior log, session set 2 corresponds to the movement track data for map 1, session set 3 corresponds to the movement track data for map 2, session set 4 corresponds to the operation data for BOSS1, session set 5 corresponds to the upgrade data for prop 1, and so on. In other internet application scenarios, the session sets may be divided according to the needs of the actual scenario. The behavior description knowledge field (or vector) corresponding to each first session set is the behavior description knowledge field of the partition of the user behavior log corresponding to that first session set.
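The partitioning described above can be sketched as a simple grouping of log records. This is only a minimal illustration: the record layout, the field names (`type`, `target`), and the `partition_key` rule are hypothetical and are not taken from the application.

```python
from collections import defaultdict

def split_into_session_sets(behavior_log, partition_key):
    """Group log records into session sets, one per partition."""
    session_sets = defaultdict(list)
    for record in behavior_log:
        session_sets[partition_key(record)].append(record)
    return dict(session_sets)

# Hypothetical game log records; the field names are illustrative only.
behavior_log = [
    {"type": "prop_purchase", "target": "prop 1", "amount": 30},
    {"type": "movement", "target": "map 1", "path": [(0, 0), (1, 2)]},
    {"type": "movement", "target": "map 2", "path": [(5, 5), (6, 4)]},
    {"type": "boss_fight", "target": "BOSS1", "actions": ["dodge", "attack"]},
    {"type": "prop_purchase", "target": "prop 2", "amount": 12},
]

def partition_key(record):
    """All purchases form one set; other behaviors are split per target."""
    if record["type"] == "prop_purchase":
        return "prop_purchase"
    return f'{record["type"]}:{record["target"]}'

first_session_sets = split_into_session_sets(behavior_log, partition_key)
P = len(first_session_sets)   # number of first session sets
```

In a real system each session set would then be encoded into a behavior description knowledge field (vector); that encoding is outside the scope of this sketch.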

The P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set may be obtained directly by the user classification system 100 (for example, a server), or may be obtained through a pre-trained user classification model. The user classification model may be, for example, any neural network. It includes a plurality of optimization modules, which optimize the behavior description knowledge fields and output the behavior description knowledge fields corresponding to the second session sets, and a classification module, which determines the label attribution confidence result corresponding to the user behavior log. In the embodiment of the application, after the user behavior log is obtained, it is input into the user classification model, which processes it to obtain the P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set.

Step S3, for each first session set, determining the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set.

The feature difference priority indicates the forward excitation (positive contribution) of each first session set on the obtained personalized behavior label, and each first session set corresponds to one feature difference priority. For example, the feature difference priority may be a score, a score corresponding to a degree of importance, or a proportion corresponding to the forward excitation.

For example, suppose the user behavior log is a game user behavior log. If the behavior description knowledge field of one first session set corresponding to the user behavior log is the behavior data of the user purchasing props, then, because prop purchases directly reflect the user's behavioral tendencies and help classify the user, the forward excitation of that first session set on the obtained personalized behavior label can be determined to be high, and the score corresponding to that first session set is therefore high. If the behavior description knowledge field of one first session set is the content the user typed in the in-game chat box, then, because the chat content may have little relevance to the game, the forward excitation of that first session set on the obtained personalized behavior label is low; the knowledge field corresponding to that first session set may even be regarded as a redundant, repeated, or invalid knowledge field, so a lower score is given. Optionally, for each first session set, knowledge field extraction may be performed on the behavior description knowledge field of the first session set, the degree of excitation of that behavior description knowledge field on the obtained personalized behavior label may be determined from the extraction result, and the feature difference priority corresponding to the first session set may then be determined from that degree of excitation.

Optionally, in another embodiment, when the user behavior log is processed through the user classification model, after the behavior description knowledge field corresponding to each first session set is determined, the user classification model may use the optimization module to compress the behavior description knowledge field corresponding to each of the P first session sets multiple times and obtain the excitation proportion (degree of influence) corresponding to each first session set from the compression result, and may apply operations such as linear transformations to the behavior description knowledge field corresponding to each first session set multiple times through the optimization module, so as to obtain the feature difference priority corresponding to each first session set.

Step S4, optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, wherein Q is smaller than P. Q is the number of second session sets obtained, and equally the number of behavior description knowledge fields corresponding to them. The second session sets are the session sets obtained by cleaning repeated and invalid behavior description knowledge fields out of the behavior description knowledge fields corresponding to the first session sets according to the feature difference priority corresponding to each first session set. Q may be related to P, e.g., Q = P/2.

The behavior description knowledge field corresponding to each second session set is a behavior description knowledge field from which repeated and invalid behavior description knowledge fields have been removed. In an alternative embodiment, after the feature difference priority corresponding to each first session set is obtained, the repeated and invalid behavior description knowledge fields among those corresponding to the P first session sets are determined according to the feature difference priorities, and the behavior description knowledge fields corresponding to the P first session sets are then optimized accordingly, for example by aggregating the behavior description knowledge fields, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set. If the user behavior log is processed through the user classification model, then after the user classification model obtains the feature difference priority corresponding to each first session set, it optimizes the behavior description knowledge fields corresponding to the P first session sets according to those priorities to obtain the Q second session sets and the behavior description knowledge field corresponding to each second session set.

In addition, when the user behavior log is processed through the user classification model, each optimization module executes the above steps. For example, after the first optimization module optimizes the behavior description knowledge fields corresponding to the first session sets and obtains Q second session sets and the behavior description knowledge field corresponding to each second session set, the first session sets are replaced by these second session sets and the value of P is updated to the number of second session sets. The second optimization module then optimizes again the behavior description knowledge field corresponding to each second session set output by the first optimization module, and so on in a loop; for instance, the third optimization module may optimize again the behavior description knowledge field corresponding to each second session set output by the second optimization module, yielding new second session sets and new behavior description knowledge fields corresponding to them. The number of second session sets output by each optimization module, and the number of behavior description knowledge fields corresponding to them, is smaller than the corresponding number output by the previous optimization module.

In this way, through the operation of the plurality of optimization modules, repeated and invalid behavior description knowledge fields in the user behavior log are filtered out, leaving the behavior description knowledge fields that have forward excitation on the obtained personalized behavior label.
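The chaining of optimization modules, where each module's output becomes the next module's input and the number of session sets shrinks at every stage, can be sketched as follows. The `merge_pairs` module is only a stand-in that averages adjacent vectors so that the counts are easy to follow; it is not the priority-based optimization of the application, and all names here are hypothetical.

```python
def merge_pairs(fields):
    """Stand-in optimization module: average adjacent vectors, halving the count."""
    merged = []
    for i in range(0, len(fields) - 1, 2):
        a, b = fields[i], fields[i + 1]
        merged.append([(x + y) / 2 for x, y in zip(a, b)])
    return merged

def run_cascade(fields, modules):
    """Feed each module's output to the next and record the number of session sets."""
    counts = [len(fields)]
    for module in modules:
        # The second session sets of one module act as the first session sets of the next.
        fields = module(fields)
        counts.append(len(fields))
    return fields, counts

first_fields = [[float(i), float(i + 1)] for i in range(8)]   # P = 8 toy vectors
final_fields, counts = run_cascade(first_fields, [merge_pairs] * 3)
```

Here `counts` records how many session sets remain after each module, which makes the strictly decreasing sequence required above easy to check.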

Step S5, determining the personalized behavior label of the user behavior log according to the behavior description knowledge field corresponding to each second session set.

The personalized behavior label represents the behavior classification matched with the user behavior log. For example, in a game scenario, the personalized behavior label matched with the user behavior log may be 1, where label 1 indicates that the user is a conservative spender. The label may of course take other forms, for example label A, and the matched user behavior classifications may be set according to actual conditions, for example impulsive spending, preference for a specific hero, preference for a specific map, preference for a specific prop, and the like. In other application scenarios, for example a video platform, the behavior classification corresponding to a label may be a preference for films of a specific category, for a specific author, for films of a specific country, and the like; other application scenarios are not listed one by one. As one implementation, knowledge field extraction may be performed on the behavior description knowledge field corresponding to each second session set, and the personalized behavior label of the user behavior log may be determined from the result.

Step S6, determining the category attribute corresponding to the personalized behavior label according to a preset personalized label mapping relation.

In the embodiment of the application, the one-to-one mapping relation between the personalized behavior tags and the user classifications is stored in advance, and the corresponding class attributes can be directly obtained through the mapping relation after the behavior tags are obtained.
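A pre-stored one-to-one mapping of this kind can be as simple as a lookup table. The sketch below is illustrative only: the label values, the category names, and the helper `category_for` are hypothetical.

```python
# Hypothetical mapping; labels and category names are illustrative only.
LABEL_TO_CATEGORY = {
    1: "conservative spender",
    2: "impulsive spender",
    3: "specific-map preference",
}

def category_for(label, mapping=LABEL_TO_CATEGORY):
    """Return the category attribute for a label, or None if the label is unmapped."""
    return mapping.get(label)

category = category_for(1)
```

Returning `None` for an unmapped label is a design choice of this sketch; a deployment might instead raise an error or fall back to a default category.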

By obtaining the feature difference priority, which reflects the forward excitation of each first session set on the obtained personalized behavior label, and optimizing the behavior description knowledge fields corresponding to the P first session sets accordingly, repeated and invalid behavior description knowledge fields can be cleaned out. For example, behavior description knowledge fields that have no forward excitation on the obtained personalized behavior label, or whose forward excitation is lower than expected, are filtered out; on this basis, the behavior description knowledge fields corresponding to the Q second session sets that do influence the obtained personalized behavior label are obtained. The personalized behavior label of the user behavior log is then determined from the behavior description knowledge fields corresponding to the Q second session sets. This ensures the reliability of the personalized behavior label and thus improves the reliability of user classification; and because Q is smaller than P, the volume of behavior description knowledge fields to be processed is reduced, which improves analysis efficiency and lowers computational cost.

In the embodiment of the present application, step S3 may be executed by the user classification model. The following takes execution by an optimization module of the user classification model as an example, which may include the following steps:

Step S31, for each first session set, performing a dimension reduction operation on the behavior description knowledge field corresponding to the first session set to obtain the dimension reduction knowledge field corresponding to the first session set.

For each first session set, an optimization module in the user classification model performs a dimension reduction operation on the behavior description knowledge field corresponding to the first session set, thereby obtaining the dimension reduction knowledge field corresponding to each first session set. The dimension reduction operation may be performed by an encoder arranged in the optimization module. The behavior description knowledge fields corresponding to the first session sets, obtained by processing the user behavior log through the user classification model, may be represented as a knowledge field array (which can be understood as a knowledge field matrix), in which one knowledge field (vector) corresponds to the behavior description knowledge field of one first session set. The array size of the knowledge field array corresponding to the first session sets may be P×D, where P is the number of behavior description knowledge fields, P < D, and D is the field rank (the highest order of the effective knowledge fields in the array) of the behavior description knowledge fields obtained by processing the user behavior log through the user classification model. The encoder of the optimization module performs the dimension reduction operation on each behavior description knowledge field in the P×D knowledge field array to obtain the dimension reduction knowledge field corresponding to each first session set. The number of dimension reduction knowledge fields equals the number of behavior description knowledge fields corresponding to the first session sets.

Step S32, determining the feature difference priority corresponding to the first session set according to the dimension reduction knowledge field corresponding to each first session set.

After the dimension reduction knowledge field corresponding to each first session set is obtained, the optimization module processes these dimension reduction knowledge fields again to obtain the feature difference priority corresponding to each first session set. The specific process may include:

Step S321, performing global unified processing on the dimension reduction knowledge field corresponding to each first session set to obtain the dimension reduction knowledge fields after global unified processing.

After the optimization module outputs the dimension reduction knowledge fields corresponding to the first session sets, global unified processing is performed on the dimension reduction knowledge field corresponding to each first session set, yielding one globally unified dimension reduction knowledge field for each dimension reduction knowledge field. The number of dimension reduction knowledge fields after global unified processing equals the number of dimension reduction knowledge fields corresponding to the first session sets. Global unified processing is a process of normalizing the dimension reduction knowledge fields, which may be performed using a norm function.

Step S321 may include:

Step S3211, determining a global unified coefficient corresponding to each first session set according to the dimension-reduction knowledge field corresponding to each first session set.

The global unified coefficient is used for carrying out unified weighting operation on the dimension reduction knowledge field corresponding to the first session set. After the dimension reduction knowledge fields corresponding to each first session set are obtained, knowledge field extraction is carried out on the dimension reduction knowledge fields corresponding to each first session set, repeated and invalid knowledge fields in each dimension reduction knowledge field are determined, and then global unified coefficients corresponding to each dimension reduction knowledge field are determined based on the repeated and invalid knowledge fields in each dimension reduction knowledge field.

Step S3212, performing global unified processing on the dimension reduction knowledge fields corresponding to each first session set according to the global unified coefficient corresponding to each first session set, so as to obtain the dimension reduction knowledge fields after the global unified processing.
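Steps S3211 and S3212 can be sketched as a per-vector normalization. The sketch assumes that the global unified coefficient is simply the reciprocal of each vector's L2 norm; the application derives the coefficient from repeated and invalid knowledge fields and only says that a norm function may be used, so this is a simplified stand-in with hypothetical function names.

```python
import math

def global_unified_coefficient(field, eps=1e-12):
    """Assumed coefficient: reciprocal of the vector's L2 norm (eps avoids division by zero)."""
    return 1.0 / (math.sqrt(sum(x * x for x in field)) + eps)

def globally_unify(fields):
    """Scale every dimension reduction knowledge field by its own coefficient."""
    unified = []
    for field in fields:
        coeff = global_unified_coefficient(field)
        unified.append([x * coeff for x in field])
    return unified

reduced_fields = [[3.0, 4.0], [0.5, 0.0], [1.0, 1.0]]   # toy dimension reduction knowledge fields
unified_fields = globally_unify(reduced_fields)
```

After this step every vector has approximately unit length, so later scoring is not dominated by fields that merely have larger magnitudes.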

Step S322, performing field compression on the dimension reduction knowledge fields after global unified processing to obtain first transition knowledge fields.

The field rank corresponding to a first transition knowledge field is smaller than the field rank corresponding to the dimension reduction knowledge field; the first transition knowledge field is the knowledge field after field compression. Specifically, the optimization module may perform field compression (for example, through a fully connected mapping, i.e. a linear function operation in a linear layer) on each globally unified dimension reduction knowledge field output by the norm function, obtaining the first transition knowledge field corresponding to each globally unified dimension reduction knowledge field. The number of first transition knowledge fields equals the number of dimension reduction knowledge fields after global unified processing, and the field rank of each first transition knowledge field is smaller than the field rank of the behavior description knowledge field to which it corresponds; for example, it may be one half of that field rank. As an embodiment, step S322 may include the following steps:

Step S3221, determining the rank reduction ratio corresponding to the dimension reduction knowledge field after the global unified processing according to the dimension reduction knowledge field after the global unified processing.

In this embodiment, the rank reduction ratio indicates a compression ratio in a field rank compression process corresponding to the dimension reduction knowledge field. Specifically, knowledge field extraction is performed on each dimension reduction knowledge field after global unified processing, repeated and invalid dimension reduction knowledge fields in each dimension reduction knowledge field after global unified processing are determined, and then the rank reduction ratio corresponding to each dimension reduction knowledge field after global unified processing is determined based on the repeated and invalid dimension reduction knowledge fields in each dimension reduction knowledge field after global unified processing.

Step S3222, performing field compression on the dimension reduction knowledge field after global unified processing according to the rank reduction ratio, to obtain the first transition knowledge field.

For example, the field rank of the dimension reduction knowledge field corresponding to each first session set may be reduced by a linear function based on the rank reduction ratio corresponding to each dimension reduction knowledge field after global unified processing, so as to obtain a first transition knowledge field corresponding to each dimension reduction knowledge field after global unified processing.
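The field compression of step S3222 can be illustrated with a bias-free linear map whose output size is the input size multiplied by the rank reduction ratio. The weights below are random placeholders rather than trained parameters, the ratio 0.5 is taken from the one-half example mentioned earlier, and `make_linear_layer` and `compress` are hypothetical helper names.

```python
import random

def make_linear_layer(in_dim, out_dim, seed=0):
    """Random placeholder weights of shape out_dim x in_dim (a trained layer would be used in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]

def compress(field, weights):
    """Apply the linear function: one dot product per output component."""
    return [sum(w * x for w, x in zip(row, field)) for row in weights]

D = 8                         # field rank before compression
rank_reduction_ratio = 0.5    # assumed ratio
weights = make_linear_layer(D, int(D * rank_reduction_ratio))

unified_field = [0.1 * i for i in range(D)]                 # toy globally unified field
first_transition_field = compress(unified_field, weights)   # field rank D/2
```

In the application the ratio itself is derived from the repeated and invalid knowledge fields; here it is fixed by hand to keep the example short.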

Step S323, determining the feature difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set. For example, according to the first transitional knowledge field corresponding to each first session set, deep knowledge field transformation can be performed on each first transitional knowledge field, so as to obtain the feature difference priority corresponding to the first session set.

The step S323 may include:

Step S3231, performing expansion processing on each first transition knowledge field, and performing field compression on the expanded first transition knowledge fields to obtain the second transition knowledge field corresponding to each first transition knowledge field.

For example, each first transition knowledge field can be expanded through an activation function, and the expanded first transition knowledge fields can then be compressed through a linear function, yielding the second transition knowledge field corresponding to each first transition knowledge field. The field rank of each second transition knowledge field is smaller than the field rank of the first transition knowledge field to which it corresponds; for example, the field rank of the second transition knowledge field is P/2. If the array size of the knowledge field array corresponding to the first session sets is P×D, and the array size of the knowledge field array corresponding to the first transition knowledge fields is P×D/2, then the array size of the knowledge field array corresponding to the second transition knowledge fields is P×P/2.

Step S3232, determining the feature difference priority corresponding to each first session set according to each second transition knowledge field.

For example, after the knowledge field array corresponding to the second transition knowledge fields is obtained, each second transition knowledge field is classified through a normalized exponential function (softmax) and the score corresponding to each second transition knowledge field is determined, thereby obtaining the feature difference priority corresponding to each second transition knowledge field and hence to each first session set.
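Steps S3231 and S3232 can be sketched as an activation, a linear compression to P/2 components, and a normalized exponential function. The choice of ReLU as the activation, the random placeholder weights, and the application of the normalized exponential function to each row separately are assumptions of this sketch; the application does not fix any of them.

```python
import math
import random

def relu(v):
    """Assumed activation function."""
    return [max(0.0, x) for x in v]

def linear(v, weights):
    """Bias-free linear function: one dot product per output component."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def softmax(v):
    """Normalized exponential function, shifted by the maximum for numerical stability."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

P, D = 4, 8
rng = random.Random(0)
first_transition = [[rng.uniform(-1, 1) for _ in range(D // 2)] for _ in range(P)]   # P x D/2
weights = [[rng.uniform(-0.5, 0.5) for _ in range(D // 2)] for _ in range(P // 2)]    # maps D/2 to P/2

second_transition = [linear(relu(f), weights) for f in first_transition]   # P x P/2
priorities = [softmax(f) for f in second_transition]                        # P x P/2 scores
```

Each row of `priorities` holds the scores of one first session set; together the rows form the P×P/2 array of feature difference priorities.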

As an embodiment, step S4 may include the steps of:

step S41, according to the feature difference priority corresponding to each first session set, determining a first knowledge field array corresponding to the feature difference priority.

Wherein the array size of the first knowledge field array is p×q. In this embodiment, the first knowledge field array includes a feature difference priority corresponding to each first session set, and the array size of the knowledge field array corresponding to the obtained feature difference priority is p×p/2, where the knowledge field array is the first knowledge field array.

Step S42, performing overturn processing on the first knowledge field array to obtain a second knowledge field array with an array size of Q×P.

In the embodiment of the application, the overturn processing flips or transposes the space coordinates of the knowledge fields corresponding to each feature difference priority in the knowledge field array. Performing overturn processing on the first knowledge field array exchanges its rows and columns, giving a second knowledge field array with an array size of Q×P. When the first knowledge field array is overturned, the knowledge fields corresponding to the feature difference priorities in each row become, one by one, the knowledge fields corresponding to the feature difference priorities in each column, and together they form the second knowledge field array.

Step S43, optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the second knowledge field array and the session set array corresponding to the behavior description knowledge fields of the first session sets to obtain Q second session sets and the behavior description knowledge fields corresponding to each second session set.

The session set array corresponding to the behavior description knowledge fields of the first session sets may be the knowledge field array corresponding to the dimension reduction knowledge fields obtained after the dimension reduction operation. For example, the second knowledge field array may be multiplied by the session set array corresponding to the behavior description knowledge fields of the first session sets, so that the behavior description knowledge fields corresponding to the P first session sets are aggregated into Q second session sets and the behavior description knowledge field corresponding to each second session set. For example, if the session set array corresponding to the behavior description knowledge fields of the first session sets is a P×D knowledge field array and the second knowledge field array is a Q×P knowledge field array, where Q is P/2, then multiplying the second knowledge field array by the session set array yields a knowledge field array of size Q×D, and the knowledge fields of this Q×D knowledge field array are used as the behavior description knowledge fields corresponding to the second session sets.
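
The array sizes in steps S42 and S43 can be traced with a short sketch. It assumes plain matrix transposition for the overturn processing and an ordinary matrix product for the aggregation; the function names and the small P = 4, Q = 2, D = 3 arrays are hypothetical.

```python
def transpose(m):
    """Exchange rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def aggregate_session_sets(first_array, session_set_array):
    # Step S42: overturn the P x Q first array into the Q x P second array.
    second_array = transpose(first_array)
    # Step S43: (Q x P) @ (P x D) -> Q x D fields for the second session sets.
    return matmul(second_array, session_set_array)

# P = 4 first session sets, Q = 2 second session sets, field rank D = 3.
first_array = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
session_set_array = [
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [0.0, 3.0, 1.0],
]
merged = aggregate_session_sets(first_array, session_set_array)
print(len(merged), len(merged[0]))  # Q rows, D columns
```

Each row of the result is a priority-weighted mixture of the P original fields, which is why first session sets with a low feature difference priority contribute little to the second session sets.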

In the embodiment of the present application, for step S5, the user classification model includes a plurality of optimization modules. After the first optimization module has optimized the P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, the second session sets output by the first optimization module are taken as new first session sets, the number of new first session sets is taken as the new P, and the steps starting from determining, for each first session set, the feature difference priority corresponding to the first session set according to its behavior description knowledge field are repeated. The loop stops when the number of cycles satisfies a preset condition, for example when it reaches a predetermined number. For example, the first optimization module optimizes the P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set to generate a first output; the first output is used as the input of the second optimization module, which optimizes it to obtain a second output; the second output is used as the input of the third optimization module, which optimizes it to obtain a third output; and the third output is used as the behavior description knowledge field corresponding to each second session set obtained last time.
Finally, the label attribution confidence result corresponding to the user behavior log is determined according to the behavior description knowledge field corresponding to each second session set obtained last time, and the personalized behavior label of the user behavior log is determined according to the label attribution confidence result. In the embodiment of the application, when determining the label attribution confidence result corresponding to the user behavior log, a target behavior description knowledge field corresponding to the user behavior log is first determined according to the behavior description knowledge field of each second session set obtained last time. In the above example, the behavior description knowledge field of each second session set obtained last time is the behavior description knowledge field of each second session set output by the third optimization module. The target behavior description knowledge field is the behavior description knowledge field obtained after performing a dimension reduction operation on the behavior description knowledge field of each second session set obtained last time, and it comprises the behavior description knowledge field of each second session set. The label attribution confidence result corresponding to the user behavior log is then determined according to the target behavior description knowledge field. The label attribution confidence result indicates the confidence that the user behavior log corresponds to each label attribution result. For example, the target behavior description knowledge field may be input into the user classification model and classified by the classification module provided therein, to obtain the label attribution confidence result corresponding to the user behavior log. The personalized behavior label of the user behavior log is then determined according to the label attribution confidence result: the label attribution result with the maximum confidence is the personalized behavior label of the user behavior log.
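
The cascade of optimization modules and the final label selection can be illustrated as follows. This is only a sketch: it assumes that every optimization module halves the number of session sets (Q = P/2, as in the earlier example), and the label names and confidence values are hypothetical.

```python
def session_set_counts(p, num_modules):
    """Number of session sets before and after each optimization module,
    assuming each module outputs Q = P/2 second session sets."""
    counts = [p]
    for _ in range(num_modules):
        counts.append(counts[-1] // 2)
    return counts

def pick_label(confidence):
    """Return the label whose attribution confidence is the largest."""
    return max(confidence, key=confidence.get)

# Three optimization modules applied one after another to 64 first session sets.
counts = session_set_counts(64, 3)

# Hypothetical label attribution confidence result for one user behavior log.
confidence = {"label_a": 0.15, "label_b": 0.70, "label_c": 0.15}
label = pick_label(confidence)
print(counts, label)
```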

The following describes a training process for a user classification model, comprising the steps of:

Step S100, obtaining a training user behavior log.

The training user behavior log is a training sample, and can be any user behavior log corresponding to user classification.

Step S200, a training user behavior log is input into a user classification model to be trained, the training user behavior log is processed through the user classification model to be trained, a first estimated behavior description knowledge field output by each optimization module is determined, and a first estimated label attribution result corresponding to the training user behavior log is determined.

Each optimization module is used for determining the feature difference priority of each first estimated session set according to the behavior description knowledge fields of the first estimated session sets corresponding to the training user behavior log, and for optimizing the behavior description knowledge fields corresponding to the M first estimated session sets according to the feature difference priority of each first estimated session set, to obtain N target estimated session sets and the first estimated behavior description knowledge field corresponding to each target estimated session set. The first estimated label attribution result indicates the likelihood that the training user behavior log corresponds to each user behavior. The first estimated behavior description knowledge fields are output by the optimization modules: the estimated behavior description knowledge field corresponding to each target estimated session set is output by one optimization module. After the training user behavior log is input into the user classification model to be trained, the user classification model to be trained first partitions the training user behavior log to obtain the M first estimated session sets corresponding to the training user behavior log, and determines the behavior description knowledge field corresponding to each first estimated session set. The behavior description knowledge field corresponding to each first estimated session set is then processed through each optimization module in the user classification model to be trained to obtain the first estimated behavior description knowledge field output by each optimization module, and the first estimated label attribution result corresponding to the training user behavior log is determined according to the first estimated behavior description knowledge field generated by the last optimization module.

Step S300, the training user behavior log is input into a preset learning model, the training user behavior log is processed through the learning model, a second estimated behavior description knowledge field output by each optimization module in the learning model is determined, and a second estimated label attribution result corresponding to the training user behavior log is determined.

The learning model may be a neural network model derived from the user classification model, for example the teacher network in a teacher-student network model in which the user classification model to be trained is the student network. The number of optimization modules in the learning model corresponds to the number of optimization modules in the user classification model. The learning model comprises a first cost value, a second cost value, a third cost value and a fourth cost value, as well as an analysis module for determining a third estimated label attribution result corresponding to the recovery estimated behavior description knowledge field and a fourth estimated label attribution result corresponding to the second estimated behavior description knowledge field. The second estimated behavior description knowledge field is an estimated behavior description knowledge field output by an optimization module in the learning model, and each optimization module in the learning model corresponds to its own second estimated behavior description knowledge field. The second estimated label attribution result is obtained by the learning model and reflects the likelihood that the training user behavior log corresponds to each behavior classification. After the training user behavior log is input into the learning model, the learning model may first partition the training user behavior log to obtain the M first estimated session sets corresponding to the training user behavior log, and determine the behavior description knowledge field corresponding to each first estimated session set.
Then, each optimization module in the learning model performs, one by one, a dimension reduction operation on the behavior description knowledge fields corresponding to the M first estimated session sets, the second estimated behavior description knowledge field output by each optimization module is determined, and the second estimated label attribution result corresponding to the training user behavior log is determined according to the second estimated behavior description knowledge field output by the last optimization module. Each optimization module obtains M second estimated behavior description knowledge fields.

Step S400, determining the estimated cost value of the user classification model to be trained according to the first estimated behavior description knowledge field, the second estimated behavior description knowledge field, the first estimated label attribution result and the second estimated label attribution result, and training the user classification model to be trained according to the estimated cost value until convergence.

The convergence condition may be that training reaches a preset number of iterations or that the inference accuracy reaches a preset accuracy. A cost value corresponding to the estimated behavior description knowledge fields may be determined according to each first estimated behavior description knowledge field and each second estimated behavior description knowledge field. A cost value between the two estimated label attribution results may also be determined according to the first estimated label attribution result and the second estimated label attribution result. The estimated cost value of the user classification model to be trained is then determined according to the cost value corresponding to the estimated behavior description knowledge fields and the cost value between the two estimated label attribution results, and the user classification model to be trained is trained through the estimated cost value until convergence.

Wherein, step S400 may include:

Step S401, for each optimization module in the user classification model to be trained, determining a first cost value corresponding to the optimization module according to the first estimated behavior description knowledge field and the second estimated behavior description knowledge field corresponding to the optimization module.

For each optimization module in the user classification model to be trained, the corresponding optimization module in the learning model is first determined. A cost value between the first estimated behavior description knowledge field output by the optimization module and the second estimated behavior description knowledge field output by the corresponding optimization module in the learning model is then determined, and this cost value is used as the first cost value corresponding to the optimization module.

Step S402, determining a second cost value of the user classification model to be trained according to the first estimated label attribution result and the second estimated label attribution result.

A cost value between the two estimated label attribution results is determined according to the first estimated label attribution result and the second estimated label attribution result, and this cost value is taken as the second cost value of the user classification model to be trained.

Step S403, determining an estimated cost value according to the first cost value and the second cost value corresponding to each optimization module.

The first cost value corresponding to each optimization module and the second cost value can together be used as the estimated cost value.
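
Steps S401 to S403 resemble a feature-level and output-level distillation objective, which can be sketched as below. The embodiment does not name the cost functions, so mean squared error for the knowledge fields and Kullback-Leibler divergence for the label attribution results are assumptions, as is the plain sum that combines them. The sketch also assumes that the fields of the model to be trained have already been brought to the same array size as those of the learning model.

```python
import math

def mse(a, b):
    """Mean squared error between two equally shaped lists of vectors."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) between two probability vectors."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student))

def estimated_cost(student_fields, teacher_fields, student_probs, teacher_probs):
    # Step S401: one first cost value per optimization module (MSE assumed).
    first_costs = [mse(s, t) for s, t in zip(student_fields, teacher_fields)]
    # Step S402: second cost value between the two label attribution results.
    second_cost = kl_divergence(teacher_probs, student_probs)
    # Step S403: combine them (a plain sum is assumed here).
    return sum(first_costs) + second_cost

# One optimization module, two fields of rank 2 (illustrative values).
student_fields = [[[0.0, 1.0], [1.0, 0.0]]]
teacher_fields = [[[0.0, 1.0], [1.0, 2.0]]]
total = estimated_cost(student_fields, teacher_fields, [0.5, 0.5], [0.5, 0.5])
print(total)
```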

Specifically, step S401 may include:

Step S4011, determining M recovery estimated behavior description knowledge fields according to each first estimated behavior description knowledge field, where M is the number of first estimated session sets corresponding to the training user behavior log.

A recovery estimated behavior description knowledge field is an estimated behavior description knowledge field obtained by restoring the number of first estimated behavior description knowledge fields, so the number of recovery estimated behavior description knowledge fields is greater than the number of first estimated behavior description knowledge fields. The number of first estimated behavior description knowledge fields output by each optimization module is smaller than the number of behavior description knowledge fields of the first estimated session sets, whereas the number of second estimated behavior description knowledge fields output by each optimization module in the learning model is equal to the number of behavior description knowledge fields of the first estimated session sets. Therefore, after the first estimated behavior description knowledge fields are obtained, their number is restored to obtain M recovery estimated behavior description knowledge fields, so that the number of recovery estimated behavior description knowledge fields matches the number of second estimated behavior description knowledge fields.

Step S4012, determining the first cost value corresponding to the optimization module according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields.

For each of the M recovery estimated behavior description knowledge fields, the second estimated behavior description knowledge field corresponding to the recovery estimated behavior description knowledge field may be obtained first, and a cost value between the two is then determined. The first cost value is determined according to the cost values between every such pair (a recovery estimated behavior description knowledge field and its corresponding second estimated behavior description knowledge field), and is taken as the first cost value of the optimization module in the user classification model to be trained that produced the recovery estimated behavior description knowledge fields. The first cost value corresponding to each optimization module in the user classification model to be trained is determined according to these two steps.

As an embodiment, step S4011 may include:

Step S40111, performing global unified processing on the second knowledge field array corresponding to the first estimated behavior description knowledge field to obtain a first estimated dimension reduction knowledge field after global unified processing, and performing overturn processing on the knowledge field array corresponding to the first estimated dimension reduction knowledge field to obtain a third knowledge field array.

Step S40112, performing field compression on the third knowledge field array to obtain a second estimated dimension-reduction knowledge field, and performing expansion processing on the second estimated dimension-reduction knowledge field to obtain a third estimated dimension-reduction knowledge field.

Step S40113, performing field compression on the knowledge field array corresponding to the third estimated dimension reduction knowledge field, performing overturn processing on the knowledge field array after field compression to obtain a fourth knowledge field array, and determining the M recovery estimated behavior description knowledge fields according to the fourth knowledge field array.

The fourth knowledge field array contains M fields, and its field rank is the field rank of the behavior description knowledge fields of the first estimated session sets. The recovery of the first estimated behavior description knowledge fields output by the second optimization module and by the third optimization module in the user classification model to be trained may follow the recovery process described above for the first estimated behavior description knowledge fields output by the first optimization module.
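
The effect of steps S40111 to S40113 on the array sizes can be traced with a deliberately simplified sketch. It omits the global unified processing and collapses the compression, expansion and compression sequence into a single linear map over the session-set axis, so it is an assumption-laden illustration rather than the recovery procedure of the embodiment; the function names and the duplication weights are hypothetical.

```python
def transpose(m):
    """Exchange rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def recover_fields(first_fields, recovery_weight):
    """Restore N fields of rank D (N x D) to M fields of rank D (M x D)."""
    flipped = transpose(first_fields)            # overturn: D x N
    expanded = matmul(flipped, recovery_weight)  # (D x N) @ (N x M) -> D x M
    return transpose(expanded)                   # overturn back: M x D

# N = 2 fields from an optimization module, restored to M = 4 fields.
first_fields = [[1, 2, 3], [4, 5, 6]]
# Hypothetical weights that copy each field twice; real weights would be learned.
recovery_weight = [[1, 1, 0, 0], [0, 0, 1, 1]]
recovered = recover_fields(first_fields, recovery_weight)
print(recovered)
```

After recovery there are M fields of the original field rank, so each one can be paired with a second estimated behavior description knowledge field from the learning model.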

As an embodiment, determining the M recovery estimated behavior description knowledge fields according to the fourth knowledge field array in step S40113 may include: performing global unified processing on the fourth knowledge field array, and performing field compression a plurality of times on the fourth knowledge field array after the global unified processing to obtain a fifth knowledge field array; and determining the M recovery estimated behavior description knowledge fields according to the fifth knowledge field array and the fourth knowledge field array.

Wherein, determining the first cost value corresponding to the optimization module according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields may further include:

A: determining a first sub-cost value according to the recovery estimated behavior description knowledge field and the second estimated behavior description knowledge field.

B: carrying out recovery processing on the recovery estimated behavior description knowledge field to obtain a first target estimated behavior description knowledge field associated with the recovery estimated behavior description knowledge field, and determining a third estimated label attribution result corresponding to the first target estimated behavior description knowledge field.

The first sub-cost value may be the cost value introduced in step S4011 and step S4012.

The embodiment of the application also provides an analysis module, which analyzes the recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields, determines the likelihood that a recovery estimated behavior description knowledge field is an estimated behavior description knowledge field output by the learning model, and determines the likelihood that a second estimated behavior description knowledge field is an estimated behavior description knowledge field output by the learning model. The first target estimated behavior description knowledge field is the estimated behavior description knowledge field obtained by compressing the field rank of the recovery estimated behavior description knowledge field. After the recovery estimated behavior description knowledge fields are obtained for each optimization module in the user classification model, they may be input to the analysis module, which performs knowledge field analysis on each of them; for example, the field rank of each recovery estimated behavior description knowledge field is compressed and thereby reduced, giving the first target estimated behavior description knowledge field associated with each recovery estimated behavior description knowledge field. The analysis module then classifies each first target estimated behavior description knowledge field and determines the likelihood that it is an estimated behavior description knowledge field output by the learning model, which is taken as the third estimated label attribution result corresponding to the first target estimated behavior description knowledge field.

C: carrying out recovery processing on the second estimated behavior description knowledge field to obtain a second target estimated behavior description knowledge field associated with the second estimated behavior description knowledge field, and determining a fourth estimated label attribution result corresponding to the second target estimated behavior description knowledge field.

The second target estimated behavior description knowledge field is the estimated behavior description knowledge field obtained by compressing the field rank of the second estimated behavior description knowledge field. The fourth estimated label attribution result indicates the likelihood that each second target estimated behavior description knowledge field is an estimated behavior description knowledge field output by the learning model. After the second estimated behavior description knowledge fields output by each optimization module in the learning model are input into the analysis module, the analysis module processes each of them; for example, the field rank of each second estimated behavior description knowledge field is compressed and thereby reduced, giving the second target estimated behavior description knowledge field associated with each second estimated behavior description knowledge field. The analysis module can then classify the second target estimated behavior description knowledge fields to obtain the likelihood that each of them is an estimated behavior description knowledge field output by the learning model, which is taken as the fourth estimated label attribution result corresponding to the second target estimated behavior description knowledge field.

D: determining a second sub-cost value according to the third estimated label attribution result and the fourth estimated label attribution result, and determining the first cost value according to the first sub-cost value and the second sub-cost value.

Wherein determining the second sub-cost value according to the third estimated label attribution result and the fourth estimated label attribution result may include:

D1: determining a third sub-cost value according to the third estimated label attribution result and the first reference label attribution result corresponding to the third estimated label attribution result.

The third estimated label attribution result indicates the likelihood that each first target estimated behavior description knowledge field is an estimated behavior description knowledge field output by the learning model. Each first target estimated behavior description knowledge field is obtained from a first estimated behavior description knowledge field output by an optimization module in the user classification model to be trained, so the likelihood that it is an estimated behavior description knowledge field output by the learning model is, as a reference, extremely low. A cost value is calculated against the third estimated label attribution result on this basis, the first cost value is determined according to the obtained cost value, and the optimization module in the user classification model to be trained is then trained through the first cost value, which improves the reliability of the first estimated behavior description knowledge field output by the optimization module.

D2: determining a fourth sub-cost value according to the fourth estimated label attribution result and a second reference label attribution result corresponding to the fourth estimated label attribution result.

The fourth estimated label attribution result indicates the likelihood that each second target estimated behavior description knowledge field is an estimated behavior description knowledge field output by the learning model. Each second target estimated behavior description knowledge field is obtained from a second estimated behavior description knowledge field output by an optimization module in the learning model, so the likelihood that it is an estimated behavior description knowledge field output by the learning model is, as a reference, extremely high. A cost value is calculated against the fourth estimated label attribution result on this basis, the first cost value is determined according to the obtained cost value, and the optimization module in the user classification model to be trained is then trained through the first cost value, so that its output is supervised and the reliability of the first estimated behavior description knowledge field it outputs is increased.

D3: determining the second sub-cost value according to the third sub-cost value and the fourth sub-cost value.

Wherein the third sub-cost value and the fourth sub-cost value may be summed, and the result of the summation is taken as the second sub-cost value. In addition, after the second sub-cost value and the first sub-cost value are obtained, the two may together be taken as the first cost value, or they may be summed to give the first cost value. Further, the first cost value corresponding to each optimization module in the user classification model to be trained can be determined according to the four steps A to D, and each optimization module can be trained through its corresponding first cost value.
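
Steps A to D can be sketched as an adversarial-style cost in which the analysis module plays the role of a discriminator. The reference targets (close to 0 for fields from the model to be trained, close to 1 for fields from the learning model) follow the description above; the use of binary cross-entropy, the averaging over fields and the function names are assumptions made for illustration.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy between one predicted likelihood and its reference."""
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def second_sub_cost(third_results, fourth_results):
    # D1: recovered fields should be judged unlikely to come from the learning model.
    third = sum(bce(p, 0.0) for p in third_results) / len(third_results)
    # D2: fields of the learning model should be judged likely to come from it.
    fourth = sum(bce(p, 1.0) for p in fourth_results) / len(fourth_results)
    # D3: the two sub-cost values are summed.
    return third + fourth

def first_cost(first_sub_cost, third_results, fourth_results):
    # D: sum of the first and second sub-cost values (one of the stated options).
    return first_sub_cost + second_sub_cost(third_results, fourth_results)

# Likelihoods output by the analysis module for M = 2 field pairs (illustrative).
uncertain = second_sub_cost([0.5, 0.5], [0.5, 0.5])
confident = second_sub_cost([0.1, 0.1], [0.9, 0.9])
print(uncertain, confident, first_cost(0.25, [0.1, 0.1], [0.9, 0.9]))
```

An analysis module that cannot tell the two sources apart yields a larger second sub-cost value than one that separates them confidently, which is the signal used during training.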

In addition, determining the estimated cost value according to the first cost value corresponding to each optimization module and the second cost value may include:

Step S501, determining, according to the first estimated label attribution result and the reference label attribution result corresponding to the training user behavior log, the confidence estimated cost value corresponding to the user classification model to be trained. The estimated cost value comprises this confidence estimated cost value, which indicates the cost value between the first estimated label attribution result output by the user classification model to be trained and the reference label attribution result corresponding to the training user behavior log. The reference label attribution result corresponding to the training user behavior log is the actual attribution result of the training user behavior log. The cost is calculated from the first estimated label attribution result and the reference label attribution result corresponding to the training user behavior log, giving the confidence estimated cost value corresponding to the user classification model to be trained.

Step S502, determining an estimated cost value according to the first cost value, the second cost value and the confidence estimated cost value corresponding to each optimization module.

The first cost value corresponding to each optimization module in the user classification model to be trained, the determined second cost value and the determined confidence estimated cost value may together be taken as the estimated cost value.

As an embodiment, determining the estimated cost value according to the first cost value corresponding to each optimization module and the second cost value may include:

Step S601, carrying out knowledge field analysis on the training user behavior log through a preset machine learning model, and determining a fifth estimated label attribution result corresponding to the training user behavior log.

The preset machine learning model can be a neural network model obtained by conventional means that can classify knowledge field data. The fifth estimated label attribution result may indicate the likelihood that the training user behavior log corresponds to each behavior classification. Knowledge field analysis can be performed on the training user behavior log through the preset machine learning model, and the fifth estimated label attribution result corresponding to the training user behavior log output by the preset machine learning model is determined.

Step S602, determining the third cost value of the user classification model to be trained according to the fifth estimated label attribution result and the first estimated label attribution result.

A cost value can be calculated from the fifth estimated label attribution result and the first estimated label attribution result, and this cost value is used as the third cost value of the user classification model to be trained.

Step S603, determining an estimated cost value according to the first cost value, the second cost value, and the third cost value.

The first cost value, the second cost value, and the third cost value may together be taken as the estimated cost value; alternatively, the first cost value, the second cost value, the third cost value, and the confidence estimated cost value of the above embodiments may be combined and taken as the estimated cost value. In addition, as an embodiment, the user classification model to be trained may be trained by using any one or more of the cost values of the above embodiments, such as the first cost value, the second cost value, the third cost value, and the confidence estimated cost value, as the estimated cost value, which is not limited by the embodiments of the present application.
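Because any one or more of the cost values may serve as the estimated cost value, the choice can be treated as a configuration option. The following sketch assumes an unweighted sum over the selected terms; the dictionary keys and the function name `combine_costs` are hypothetical.

```python
def combine_costs(costs, selected):
    """Sum only the cost values chosen for training.

    costs: mapping from a cost name to its value.
    selected: names of the cost values to include (one or more).
    """
    if not selected:
        raise ValueError("at least one cost value must be selected")
    missing = [name for name in selected if name not in costs]
    if missing:
        raise KeyError(f"unknown cost values: {missing}")
    return sum(costs[name] for name in selected)


# Hypothetical cost values for one training step.
available = {"first": 0.5, "second": 0.25, "third": 0.125, "confidence": 1.0}
partial = combine_costs(available, ["first", "third"])
full = combine_costs(available, list(available))
```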

Referring to FIG. 3, an architecture diagram of a user classification device 110 according to an embodiment of the invention is provided. The user classification device 110 may be used to execute the user classification method based on big data, wherein the user classification device 110 includes:

The log receiving module 111 is configured to receive a user behavior log of a user to be classified.

The knowledge field extraction module 112 is configured to determine P first session sets corresponding to the user behavior log and a behavior description knowledge field corresponding to each first session set according to the user behavior log.

The priority determining module 113 is configured to determine, for each first session set, a feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set.

The optimization module 114 is configured to optimize the behavior description knowledge fields corresponding to the P first session sets respectively according to the feature difference priority corresponding to each first session set, so as to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, where Q is smaller than P.

The tag determining module 115 is configured to determine, according to the behavior description knowledge field corresponding to each second session set, a personalized behavior tag of the user behavior log.

The classification module 116 is configured to determine a class attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relationship.

The log receiving module 111 may be configured to perform step S1; the knowledge field extraction module 112 may be configured to perform step S2; the priority determining module 113 may be configured to perform step S3; the optimization module 114 may be configured to perform step S4; the tag determining module 115 may be configured to perform step S5; the classification module 116 may be configured to perform step S6.
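The correspondence between modules 111 to 116 and steps S1 to S6 can be pictured as a simple pipeline. The sketch below is only a toy wiring: every callable is a hypothetical placeholder, and the optimization step is approximated by keeping the Q highest-priority fields, which is an assumed selection rule and not the optimization described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Field = List[float]  # stand-in for one behavior description knowledge field


@dataclass
class UserClassificationDevice:
    """Toy wiring of modules 111-116; all callables are hypothetical."""
    extract_fields: Callable[[Sequence[str]], List[Field]]  # module 112, step S2
    priority_of: Callable[[Field], float]                   # module 113, step S3
    label_of: Callable[[List[Field]], str]                  # module 115, step S5
    label_to_category: Dict[str, str]                       # module 116, step S6
    q: int                                                  # number of second session sets

    def classify(self, behavior_log: Sequence[str]) -> str:
        # Module 111 / step S1: the log is received as the method argument.
        fields = self.extract_fields(behavior_log)  # P fields
        if not self.q < len(fields):
            raise ValueError("Q must be smaller than P")
        priorities = [self.priority_of(f) for f in fields]
        # Module 114 / step S4 (assumed rule): keep the Q highest priorities,
        # preserving the original session order.
        ranked = sorted(range(len(fields)), key=lambda i: priorities[i], reverse=True)
        kept = [fields[i] for i in sorted(ranked[: self.q])]
        label = self.label_of(kept)
        return self.label_to_category[label]


# Hypothetical placeholders: field = entry length, priority = that length.
device = UserClassificationDevice(
    extract_fields=lambda log: [[float(len(entry))] for entry in log],
    priority_of=lambda field: field[0],
    label_of=lambda fields: "active" if sum(f[0] for f in fields) > 10 else "casual",
    label_to_category={"active": "high-engagement user", "casual": "low-engagement user"},
    q=2,
)
category = device.classify(["login", "browse catalog", "pay"])
```

Passing the per-module behavior in as callables keeps the wiring independent of how each module is actually realized, which mirrors the statement that the modules may be integrated or exist alone.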

Since the user classification method based on big data provided in the embodiments of the present application has been described in detail above, and the principle of the user classification device 110 is the same as that of the method, the execution principle of each module of the user classification device 110 is not described in detail here.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

It is to be understood that, for terms that are not explicitly defined in the above description, a person skilled in the art can unambiguously ascertain their meaning from the above disclosure. To those skilled in the art, the foregoing disclosure of the embodiments of the present application is clear and complete. It should be appreciated that any derivation and analysis by those skilled in the art of technical terms not explained above is based on the description of the present application, and therefore the above does not bear on the assessment of the inventiveness of the overall scheme.

While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements and adaptations of the application may occur to one skilled in the art. Such modifications, improvements, and adaptations are suggested by the present disclosure and are therefore intended to be within the spirit and scope of the exemplary embodiments of the present disclosure.

It should also be appreciated that in the foregoing description of at least one embodiment of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of at least one embodiment of the application. This method of disclosure, however, is not intended to imply that the subject matter of the application requires more features than are recited in the claims. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.

Claims

1. A big data based user classification method, characterized by being applied to a classification system, the method comprising:

receiving a user behavior log of a user to be classified; then, according to the user behavior log, determining P first session sets corresponding to the user behavior log and behavior description knowledge fields corresponding to each first session set;

for each first session set, determining a feature difference priority corresponding to the first session set according to a behavior description knowledge field of the first session set; optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the characteristic difference priority corresponding to each first session set to obtain Q second session sets and behavior description knowledge fields corresponding to each second session set, wherein Q is smaller than P;

according to the behavior description knowledge field corresponding to each second session set, determining the personalized behavior label of the user behavior log; determining a category attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relation;

wherein for each first session set, determining, according to a behavior description knowledge field of the first session set, a feature difference priority corresponding to the first session set includes:

determining the feature difference priority corresponding to the first session set according to the dimension reduction knowledge field corresponding to each first session set;

wherein determining the feature difference priority corresponding to the first session set according to the dimension reduction knowledge field corresponding to each first session set includes:

performing field compression on the dimension reduction knowledge field subjected to global unified processing to obtain a first transition knowledge field; wherein, the field rank corresponding to the first transition knowledge field is smaller than the field rank corresponding to the dimension reduction knowledge field;

2. The method of claim 1, wherein performing global unified processing on the dimension reduction knowledge field corresponding to each first session set to obtain a globally unified processed dimension reduction knowledge field, includes:

3. The method according to claim 1 or 2, wherein the performing field compression on the globally unified dimension-reduced knowledge field to obtain a first transitional knowledge field includes:

performing field compression on the dimension reduction knowledge field after global unified processing through the rank reduction proportion to obtain the first transition knowledge field;

the determining the feature difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set includes:

4. The method of claim 3, wherein optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set to obtain Q second session sets and the behavior description knowledge fields corresponding to each second session set includes:

performing overturn processing on the first knowledge field array to obtain a second knowledge field array, wherein the size of the second knowledge field array is Q multiplied by P;

optimizing the behavior description knowledge fields respectively corresponding to the P first session sets according to the second knowledge field array and the session set array corresponding to the behavior description knowledge fields of the first session sets to obtain the Q second session sets and the behavior description knowledge fields corresponding to each second session set;

the step of determining the personalized behavior label of the user behavior log according to the behavior description knowledge field corresponding to each second session set comprises the following steps:

taking the second session sets as the first session sets, and updating the value of P to the number of these first session sets;

circularly executing the steps of determining the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set for each first session set;

when the number of the loops meets a preset condition, determining a label attribution confidence result corresponding to the user behavior log according to behavior description knowledge fields corresponding to each second session set obtained last time;

5. The method of claim 4, wherein the method is implemented by a user classification model, the user classification model being trained by:

acquiring a training user behavior log;

inputting the training user behavior log into a user classification model to be trained, wherein the user classification model comprises a plurality of optimization modules;

6. The method of claim 5, wherein the determining the estimated cost value of the user classification model to be trained based on the first estimated behavioral description knowledge field, the second estimated behavioral description knowledge field, the first estimated tag attribution result, and the second estimated tag attribution result comprises:

determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module;

the determining, according to the first estimated behavior description knowledge field and the second estimated behavior description knowledge field corresponding to the optimization module, the first cost value corresponding to the optimization module includes:

performing field compression on a knowledge field array corresponding to the third estimated dimension reduction knowledge field, performing overturn processing on the knowledge field array after field compression to obtain a fourth knowledge field array, and determining the M recovery estimated behavior description knowledge fields according to the fourth knowledge field array, wherein the array size of the fourth knowledge field array comprises M fields, the field rank in the array size of the fourth knowledge field array is the field rank corresponding to the behavior description knowledge field of the first estimated session set, and M is the number of the first estimated session set corresponding to the training user behavior log;

determining a first cost value corresponding to the optimization module according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields;

wherein said determining said M recovery estimated behavior description knowledge fields according to said fourth knowledge field array comprises:

determining the M recovery estimated behavior description knowledge fields according to the fifth knowledge field array and the fourth knowledge field array;

wherein the determining, according to the M recovery estimated behavior description knowledge fields and the second estimated behavior description knowledge fields, the first cost value corresponding to the optimization module includes:

7. The method of claim 6, wherein the determining a second cost value based on the third estimated tag attribution result and the fourth estimated tag attribution result comprises:

determining the second cost value according to the third cost value and the fourth cost value;

wherein the determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module includes:

determining the estimated cost value according to the first cost value, the second cost value and the confidence estimated cost value corresponding to each optimization module;

8. A user classification system comprising a processor and a memory interconnected, said memory having stored therein a computer program which, when executed by said processor, implements the method of any of claims 1-7.