CN115423037A - Big data-based user classification method and system

Info

Publication number: CN115423037A
Application number: CN202211183765.4A
Authority: CN
Inventors: 马萃; 锁海娇
Original assignee: Individual
Current assignee: Hubei Central China Technology Development Of Electric Power Co ltd
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2022-12-02
Anticipated expiration: 2042-09-27
Also published as: CN115423037B

Abstract

The big data-based user classification method and system provided by the embodiments of the invention acquire, for each of P first session sets, a feature difference priority that reflects the forward excitation (positive contribution) of that session set to the personalized behavior tag to be obtained. The behavior description knowledge fields corresponding to the first session sets are optimized according to these priorities: repeated and invalid behavior description knowledge fields are cleaned, and fields that provide no forward excitation, or less forward excitation than expected, are filtered out. This yields the behavior description knowledge fields corresponding to Q second session sets that influence the personalized behavior tag. The personalized behavior tag of the user behavior log is then determined from the behavior description knowledge fields corresponding to the Q second session sets. This ensures the reliability of the personalized behavior tag and improves the reliability of user classification; because Q is less than P, the amount of behavior description knowledge field data to be processed is reduced, which increases analysis efficiency and reduces computation consumption.

Description

Big data-based user classification method and system

Technical Field

The application relates to the field of internet services, in particular to a user classification method and system based on big data.

Background

User classification is an indispensable step for service push in the internet field, and reasonable and accurate user classification can increase the conversion rate of service push. During service interaction, users inevitably generate various sessions, such as information searches, product reviews and associated clicks during e-commerce shopping; item usage, chat exchanges and consumption history during gaming; and author incentive behavior, paid browsing behavior, comment messages and the like during video browsing. Such user behaviors can reflect the user's preferences in a given situation, so user behavior data can be analyzed along certain dimensions and the user classified accordingly, to meet the accuracy requirements of service push.

Disclosure of Invention

The present invention aims to provide a big data-based user classification method and system that address the above problems.

The embodiment of the invention is realized as follows:

In a first aspect, an embodiment of the present disclosure provides a big data-based user classification method, including:

receiving a user behavior log of a user to be classified;

determining, according to the user behavior log, P first session sets corresponding to the user behavior log and a behavior description knowledge field corresponding to each first session set;

for each first session set, determining the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set;

optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, wherein Q is less than P;

determining a personalized behavior tag of the user behavior log according to a behavior description knowledge field corresponding to each second session set;

and determining the category attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relation.

In the embodiment of the application, a feature difference priority is acquired for each first session set; it reflects the forward excitation (positive contribution) of that session set to the personalized behavior tag to be obtained. Optimizing the behavior description knowledge fields corresponding to the P first session sets according to these priorities cleans repeated and invalid behavior description knowledge fields, for example by filtering out fields that provide no forward excitation, or less forward excitation than expected, to the personalized behavior tag. This yields the behavior description knowledge fields corresponding to the Q second session sets that affect the personalized behavior tag, from which the personalized behavior tag of the user behavior log is then determined. The reliability of the personalized behavior tag is thereby ensured and the reliability of user classification improved; because Q is less than P, the amount of behavior description knowledge field data to be processed is reduced, which increases analysis efficiency and reduces computation consumption.
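The overall flow can be illustrated with a minimal Python sketch. It is not the claimed implementation: a hard top-Q selection stands in for the optimization step, each behavior description knowledge field is assumed to be a list of numbers, and `select_top_q`, `classify`, the `tagger` callable and the tag-to-category dictionary are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Field = List[float]


def select_top_q(fields: Sequence[Field], priorities: Sequence[float], q: int) -> List[Field]:
    """Keep the q fields with the highest priority, preserving their original order."""
    if len(fields) != len(priorities):
        raise ValueError("one priority is required per field")
    if not 0 < q < len(fields):
        raise ValueError("Q must satisfy 0 < Q < P")
    # Indices sorted by descending priority (assumed stand-in for the optimization step).
    ranked = sorted(range(len(fields)), key=lambda i: priorities[i], reverse=True)
    kept = sorted(ranked[:q])
    return [fields[i] for i in kept]


def classify(
    fields: Sequence[Field],
    priorities: Sequence[float],
    q: int,
    tagger: Callable[[List[Field]], str],  # hypothetical: maps the kept fields to a tag
    tag_to_category: Dict[str, str],       # preset personalized tag mapping relation
) -> str:
    """Reduce P fields to Q, derive a tag, then look up its category attribute."""
    kept = select_top_q(fields, priorities, q)
    tag = tagger(kept)
    return tag_to_category[tag]
```

Because only Q of the P fields reach the tagger, the downstream work shrinks in proportion, which is the efficiency argument made in the paragraph above.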

Optionally, for each first session set, determining a feature difference priority corresponding to the first session set according to a behavior description knowledge field of the first session set includes:

for each first session set, performing dimensionality reduction operation on the behavior description knowledge fields corresponding to the first session set to obtain dimensionality reduction knowledge fields corresponding to the first session set;

and determining the feature difference priority corresponding to each first session set according to the dimensionality reduction knowledge field corresponding to that first session set.

In the embodiment of the application, the dimensionality reduction operation reduces the dimension, and hence the size, of the behavior description knowledge fields, which simplifies subsequent processing and helps obtain the feature difference priority corresponding to each first session set more accurately.

Optionally, the determining, according to the dimension reduction knowledge field corresponding to each first session set, the feature difference priority of the first session set includes:

carrying out global unified processing on the dimensionality reduction knowledge fields corresponding to each first session set to obtain globally unified processed dimensionality reduction knowledge fields;

performing field compression on the dimensionality reduction knowledge field subjected to global unified processing to obtain a first transition knowledge field, wherein the field rank corresponding to the first transition knowledge field is smaller than the field rank corresponding to the dimensionality reduction knowledge field;

and determining the feature difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set.

In the embodiment of the application, the global unified processing brings the dimension reduction knowledge fields corresponding to the first session sets onto a common scale, in other words confines them to a bounded range. Field compression then shortens the field rank of the globally unified dimension reduction knowledge fields, which reduces the computation and time needed to determine the feature difference priority corresponding to each first session set.

Optionally, the performing global unified processing on the dimension reduction knowledge field corresponding to each first session set to obtain a dimension reduction knowledge field after global unified processing includes:

determining a global unified coefficient corresponding to each first session set according to the dimensionality reduction knowledge field corresponding to each first session set;

and carrying out global unified processing on the dimensionality reduction knowledge fields corresponding to each first session set according to the global unified coefficient corresponding to each first session set to obtain the dimensionality reduction knowledge fields subjected to global unified processing.

In the embodiment of the application, the dimension reduction knowledge fields corresponding to the first session sets are weighted according to the determined global unified coefficients, so the globally unified dimension reduction knowledge fields are more reliable, which in turn increases the reliability of the feature difference priority corresponding to each first session set.
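The text does not define the global unified coefficient. As one possible reading, the sketch below assumes the coefficient of a field is its L2 norm and that global unified processing divides the field by that coefficient, which confines every component to the range [-1, 1]. The function name `global_unify` and the `eps` guard are hypothetical.

```python
import math
from typing import List, Sequence


def global_unify(fields: Sequence[Sequence[float]], eps: float = 1e-12) -> List[List[float]]:
    """Scale each field by its own coefficient (assumed here to be the L2 norm)."""
    unified = []
    for field in fields:
        # Assumed global unified coefficient: Euclidean length of the field.
        coeff = math.sqrt(sum(x * x for x in field))
        # eps avoids division by zero for an all-zero field.
        unified.append([x / (coeff + eps) for x in field])
    return unified
```

Other normalizations (for example subtracting the mean and dividing by the standard deviation) would fit the description equally well; the choice here is only for brevity.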

Optionally, the performing field compression on the globally and uniformly processed dimension reduction knowledge field to obtain a first transition knowledge field includes:

determining a rank reduction ratio corresponding to the globally and uniformly processed dimension reduction knowledge field according to the globally and uniformly processed dimension reduction knowledge field;

and performing field compression on the globally and uniformly processed dimension reduction knowledge field according to the rank reduction ratio to obtain the first transition knowledge field.

In the embodiment of the application, field compression is performed on the globally unified dimension reduction knowledge fields according to the rank reduction ratio. Different globally unified dimension reduction knowledge fields can thus be compressed at multiple levels, which removes repeated and invalid knowledge fields and yields the first transition knowledge field. This keeps the first transition knowledge field reliable and compact, improves the accuracy of the feature difference priority corresponding to each first session set, and makes that priority easier to obtain.
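A minimal sketch of field compression follows, assuming that the field rank is the length of a field, that the rank reduction ratio is an integer divisor of that length, and that compression is a linear map given by a weight matrix. None of these details is fixed by the text, and `target_rank` and `compress` are hypothetical names.

```python
from typing import List, Sequence


def target_rank(rank: int, reduction_ratio: int) -> int:
    """Field rank after compression, assuming an integer rank reduction ratio."""
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be at least 1")
    return max(1, rank // reduction_ratio)


def compress(field: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Apply a linear map; the output rank equals the number of weight rows."""
    for row in weights:
        if len(row) != len(field):
            raise ValueError("each weight row must match the field rank")
    return [sum(w * x for w, x in zip(row, field)) for row in weights]
```

For example, a field of rank 4 with a ratio of 2 is compressed to rank 2; with averaging weights `[[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]` the field `[1, 2, 3, 4]` becomes `[1.5, 3.5]`. In a trained model the weights would be learned rather than fixed.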

Optionally, the determining, according to the first transition knowledge field corresponding to each first session set, the feature difference priority corresponding to each first session set includes:

expanding each first transition knowledge field, and performing field compression on the expanded first transition knowledge fields to obtain second transition knowledge fields corresponding to each first transition knowledge field;

and determining the feature difference priority corresponding to each first session set according to each second transition knowledge field.

In the embodiment of the application, expanding the first transition knowledge field enriches the field space, so the resulting second transition knowledge field carries more complete information, which increases the reliability of the feature difference priority obtained for each first session set.

Optionally, the optimizing, according to the feature difference priority corresponding to each first session set, the behavior description knowledge fields corresponding to the P first session sets respectively to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set includes:

determining a first knowledge field array corresponding to the feature difference priorities according to the feature difference priority corresponding to each first session set, wherein the array size of the first knowledge field array is P × Q;

flipping (transposing) the first knowledge field array to obtain a second knowledge field array with an array size of Q × P;

and optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the second knowledge field arrays and the session set arrays corresponding to the behavior description knowledge fields of the first session sets to obtain the Q second session sets and the behavior description knowledge fields corresponding to each second session set.

In the embodiment of the application, flipping the array swaps the two dimensions of the first knowledge field array, so that the resulting second knowledge field array (Q × P) can be multiplied by the session set array. Optimizing the behavior description knowledge fields of the first session sets through this product cleans repeated and invalid behavior description knowledge fields and yields the behavior description knowledge fields corresponding to the Q second session sets that affect the personalized behavior tag, which saves computation and increases analysis efficiency.
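The flip-and-multiply step can be sketched in plain Python. The sketch assumes that the flip is a matrix transpose and that the session set array stores one behavior description knowledge field of D numbers per row (P × D); `merge_sessions` and the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(matrix: Sequence[Sequence[float]]) -> Matrix:
    """Swap rows and columns: a P x Q array becomes Q x P."""
    return [list(col) for col in zip(*matrix)]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def merge_sessions(first_array: Sequence[Sequence[float]],
                   session_array: Sequence[Sequence[float]]) -> Matrix:
    """Reduce P fields to Q fields: (Q x P) @ (P x D) -> (Q x D)."""
    if len(first_array) != len(session_array):
        raise ValueError("both arrays must have P rows")
    second_array = transpose(first_array)  # Q x P
    return matmul(second_array, session_array)
```

With P = 3 and Q = 2, the weights `[[1, 0], [0, 0.5], [0, 0.5]]` keep the first field unchanged and average the second and third, so three fields are merged into two.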

Optionally, the determining a personalized behavior tag of the user behavior log according to the behavior description knowledge field corresponding to each second session set includes:

taking the second session sets as new first session sets and updating the value of P to the number of these new first session sets; cyclically executing the step of determining, for each first session set, the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set, together with the steps that follow it; and, when the number of cycles meets a preset condition, determining the label attribution confidence result corresponding to the user behavior log according to the behavior description knowledge fields corresponding to the second session sets obtained in the last cycle;

and determining the personalized behavior label of the user behavior log according to the label attribution confidence result.

In the embodiment of the application, the removal of repeated and invalid behavior description knowledge fields corresponding to the user behavior log can be repeated, so that such fields are cleaned accurately and the behavior description knowledge fields that affect the personalized behavior tag are obtained. On this basis, an accurate personalized behavior tag can be obtained from the label attribution confidence result of the user behavior log with respect to the personalized behavior tags.
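The repeated reduction can be sketched as a simple loop. The sketch assumes the preset condition is a fixed number of rounds and that the label attribution confidence result is a mapping from candidate tags to confidence scores, from which the highest-scoring tag is chosen; `iterative_optimize`, `optimize_once` and `pick_tag` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence


def iterative_optimize(fields: Sequence, optimize_once: Callable[[List], List],
                       rounds: int) -> List:
    """Apply the reduction step repeatedly; each output feeds the next round."""
    current = list(fields)
    for _ in range(rounds):
        reduced = optimize_once(current)
        if not 0 < len(reduced) < len(current):
            raise ValueError("each round must return fewer (but at least one) fields")
        # The second session sets become the new first session sets (P is updated).
        current = reduced
    return current


def pick_tag(confidences: Dict[str, float]) -> str:
    """Choose the tag with the highest attribution confidence (assumed rule)."""
    return max(confidences, key=confidences.get)
```

A round that halves the number of fields takes 8 fields to 4 and then to 2 after two rounds.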

Optionally, the above steps are implemented by a user classification model, and the user classification model is obtained by training through the following steps:

acquiring a training user behavior log;

inputting the training user behavior log into a user classification model to be trained, wherein the user classification model comprises a plurality of optimization modules;

outputting a first presumed behavior description knowledge field through an optimization module of the user classification model to be trained, and determining a first presumed label attribution result corresponding to the training user behavior log;

the optimization module is used for determining the feature difference priority of each first presumed session set according to the behavior description knowledge field of the first presumed session set corresponding to the training user behavior log, and optimizing the behavior description knowledge fields corresponding to the M first presumed session sets according to the feature difference priority corresponding to each first presumed session set, to obtain N target presumed session sets and the first presumed behavior description knowledge field corresponding to each target presumed session set;

inputting the training user behavior log into a preset learning model, outputting a second presumed behavior description knowledge field through each optimization module of the learning model, and determining a second presumed label attribution result corresponding to the training user behavior log;

and determining the estimated cost value of the user classification model to be trained according to the first presumed behavior description knowledge field, the second presumed behavior description knowledge field, the first presumed label attribution result and the second presumed label attribution result, and training the user classification model to be trained with the estimated cost value until convergence.

In the embodiment of the application, the second presumed behavior description knowledge field and second presumed label attribution result output by the preset learning model are compared with the first presumed behavior description knowledge field and first presumed label attribution result output by the user classification model to be trained. In this way the user classification model performs high-quality knowledge extraction from the learning model and an accurate estimated cost value is obtained; training the user classification model to be trained with this estimated cost value improves its inference accuracy.

Optionally, the determining, according to the first presumed behavior description knowledge field, the second presumed behavior description knowledge field, the first presumed label attribution result, and the second presumed label attribution result, the estimated cost value of the user classification model to be trained includes:

for each optimization module in the user classification model to be trained, determining a first cost value corresponding to the optimization module according to the first presumed behavior description knowledge field and the second presumed behavior description knowledge field corresponding to the optimization module;

determining a second cost value of the user classification model to be trained according to the first presumed label attribution result and the second presumed label attribution result;

and determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module.

In the embodiment of the application, the first presumed behavior description knowledge field and the second presumed behavior description knowledge field corresponding to an optimization module give the first cost value of that optimization module in presuming behavior description knowledge fields. The first presumed label attribution result and the second presumed label attribution result give the second cost value of the user classification model to be trained in presuming the final label attribution result. The first cost values and the second cost value together give an estimated cost value that covers both the optimization modules and the final label attribution result of the user classification model to be trained. Training the user classification model to be trained with this estimated cost value improves the accuracy of the optimization modules in the final user classification model, improves the accuracy of the label attribution result output by the model, and ultimately improves the reliability of the personalized behavior tag output by the user classification model.
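The way the cost values combine can be sketched as follows. The text does not name the distance measures, so the sketch assumes a mean squared error between knowledge fields for each first cost value, a Kullback-Leibler divergence between label distributions for the second cost value, and an unweighted sum for the estimated cost value. All function names are hypothetical.

```python
import math
from typing import Sequence


def mse(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Assumed first cost value: mean squared error between two sets of fields."""
    squared = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(squared) / len(squared)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Assumed second cost value: KL(p || q) between two label distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def estimated_cost(first_costs: Sequence[float], second_cost: float) -> float:
    """Sum the per-module first cost values and the model-level second cost value."""
    return sum(first_costs) + second_cost
```

Here `p` would be the second presumed label attribution result from the learning model and `q` the first presumed label attribution result from the model being trained; a weighted sum is an equally plausible way to combine the terms.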

Optionally, the determining, according to the first presumed behavior description knowledge field and the second presumed behavior description knowledge field corresponding to the optimization module, the first cost value corresponding to the optimization module includes:

determining M recovered presumed behavior description knowledge fields according to the first presumed behavior description knowledge fields, where M is the number of first presumed session sets corresponding to the training user behavior log;

and determining the first cost value corresponding to the optimization module according to the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge fields.

In the embodiment of the present application, the first presumed behavior description knowledge fields output by the user classification model to be trained are the presumed behavior description knowledge fields that remain after repeated and invalid behavior description knowledge fields have been removed, so their number is smaller than the number of first presumed session sets. The number of second presumed behavior description knowledge fields output by the learning model equals the number of first presumed session sets. Obtaining M recovered presumed behavior description knowledge fields therefore restores the number of knowledge fields output by the user classification model to be trained so that it matches the number of second presumed behavior description knowledge fields. The recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge fields are then compared one by one to obtain the cost value between each pair, which completes the knowledge refinement of the user classification model to be trained; a reliable and accurate first cost value is obtained from these pairwise cost values.
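The count mismatch and its repair can be illustrated with a short sketch. It assumes that recovery is a linear map from N fields back to M fields through an M × N array and that the pairwise cost is a mean squared error; the actual recovery procedure described in the following paragraphs is more involved, and `restore_fields` and `per_field_cost` are hypothetical names.

```python
from typing import List, Sequence


def restore_fields(student_fields: Sequence[Sequence[float]],
                   restore_array: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map N fields (N x D) back to M fields (M x D) with an M x N array."""
    cols = list(zip(*student_fields))
    return [[sum(w * x for w, x in zip(row, col)) for col in cols]
            for row in restore_array]


def per_field_cost(restored: Sequence[Sequence[float]],
                   teacher: Sequence[Sequence[float]]) -> List[float]:
    """Compare recovered and teacher fields one by one (assumed: mean squared error)."""
    if len(restored) != len(teacher):
        raise ValueError("both sides must contain M fields")
    return [sum((x - y) ** 2 for x, y in zip(r, t)) / len(r)
            for r, t in zip(restored, teacher)]
```

With N = 2 and M = 3, the recovered side again has three fields, so each can be paired with one of the three fields output by the learning model.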

Optionally, the determining M recovered presumed behavior description knowledge fields according to each first presumed behavior description knowledge field includes:

performing global unified processing on the second knowledge field array corresponding to the first presumed behavior description knowledge field to obtain a globally unified first presumed dimension reduction knowledge field, and flipping the knowledge field array corresponding to the first presumed dimension reduction knowledge field to obtain a third knowledge field array;

performing field compression on the third knowledge field array to obtain a second presumed dimension-reduction knowledge field, and performing expansion processing on the second presumed dimension-reduction knowledge field to obtain a third presumed dimension-reduction knowledge field;

and performing field compression on the knowledge field array corresponding to the third presumed dimension reduction knowledge field, flipping the compressed knowledge field array to obtain a fourth knowledge field array, and determining the M recovered presumed behavior description knowledge fields according to the fourth knowledge field array, wherein the fourth knowledge field array contains M fields and its field rank equals the field rank of the behavior description knowledge fields of the first presumed session sets.

In this embodiment of the present application, through global unified processing, array flipping, field compression and similar operations, the optimization module reverses each processing step that was applied to the behavior description knowledge fields corresponding to the first presumed session sets. This restores the number of first presumed behavior description knowledge fields and yields M recovered presumed behavior description knowledge fields, so that the recovered presumed behavior description knowledge fields match the second presumed behavior description knowledge fields in number.

Optionally, the determining the M recovered presumed behavior description knowledge fields according to the fourth knowledge field array includes:

performing global unified processing on the fourth knowledge field array, and performing field compression multiple times on the globally unified fourth knowledge field array to obtain a fifth knowledge field array;

and determining the M recovered presumed behavior description knowledge fields according to the fifth knowledge field array and the fourth knowledge field array.

In the embodiment of the application, the global unified processing and the repeated field compression applied to the fourth knowledge field array refine the information of each presumed behavior description knowledge field in that array and yield the fifth knowledge field array. Splicing the presumed behavior description knowledge fields of the fifth knowledge field array with those of the fourth knowledge field array then forms a residual connection between the two arrays, which prevents network degradation and yields accurate recovered presumed behavior description knowledge fields.
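A residual connection of this kind can be sketched as follows, under the assumption that the splicing is an element-wise addition of two arrays of the same shape, as in a standard residual block. If the splicing is a concatenation instead, the combination step would differ. The `refine` callable stands in for the global unified processing and repeated field compression and is hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def residual(fourth_array: Sequence[Sequence[float]],
             refine: Callable[[Sequence[Sequence[float]]], Matrix]) -> Matrix:
    """Return fourth_array + refine(fourth_array), element by element."""
    fifth_array = refine(fourth_array)
    if len(fifth_array) != len(fourth_array):
        raise ValueError("refine must preserve the number of fields")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(fourth_array, fifth_array)]
```

Because the input is added back to the refined output, the original information is preserved even if the refinement contributes little, which is the usual argument for why residual connections counter degradation in deep networks.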

Optionally, the determining, according to the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge field, the first cost value corresponding to the optimization module includes:

determining a first sub-cost value according to the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge field;

performing recovery processing on the recovered presumed behavior description knowledge field to obtain a first target presumed behavior description knowledge field associated with the recovered presumed behavior description knowledge field, and acquiring a third presumed label attribution result corresponding to the first target presumed behavior description knowledge field;

performing recovery processing on the second presumed behavior description knowledge field to obtain a second target presumed behavior description knowledge field associated with the second presumed behavior description knowledge field, and acquiring a fourth presumed label attribution result corresponding to the second target presumed behavior description knowledge field;

determining a second sub-cost value according to the third presumed label attribution result and the fourth presumed label attribution result;

and determining the first cost value according to the first sub-cost value and the second sub-cost value.

In the embodiment of the application, the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge field give the first sub-cost value of the optimization module in presuming behavior description knowledge fields. Recovery processing of the recovered presumed behavior description knowledge field allows that field to be analyzed and gives the second sub-cost value incurred when the recovered presumed behavior description knowledge field is output. Training the user classification model with the first cost value obtained from the first and second sub-cost values improves the reliability of the recovered presumed behavior description knowledge fields output by the optimization module.

Optionally, the determining a second sub-cost value according to the third presumed label attribution result and the fourth presumed label attribution result includes:

determining a third sub-cost value according to the third presumed label attribution result and a first reference label attribution result corresponding to the third presumed label attribution result;

determining a fourth sub-cost value according to the fourth presumed label attribution result and a second reference label attribution result corresponding to the fourth presumed label attribution result;

and determining the second sub-cost value according to the third sub-cost value and the fourth sub-cost value.

In the embodiment of the application, the learning model and the user classification model to be trained correspond to different reference label attribution results, so their presumed label attribution results have different targets. Computing the sub-cost values against the respective reference label attribution results increases the reliability of the third and fourth sub-cost values, so that the second sub-cost value can be obtained more accurately.

Optionally, the determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module includes:

determining a confidence estimated cost value corresponding to the user classification model to be trained according to the first presumed label attribution result and the reference label attribution result corresponding to the training user behavior log;

and determining the estimated cost value according to the first cost value, the second cost value and the confidence estimated cost value corresponding to each optimization module.

In the embodiment of the application, the first presumed label attribution result and the reference label attribution result corresponding to the training user behavior log give the confidence estimated cost value, that is, the cost between the first presumed label attribution result output by the user classification model to be trained and the actual reference label attribution result. Training the user classification model to be trained with this cost increases the reliability of the presumed label attribution results output by the user classification model.
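As a sketch of this additional term, the confidence estimated cost value is assumed below to be a cross-entropy between the presumed label distribution and the index of the reference label, and the estimated cost value is assumed to be an unweighted sum of all terms. Both choices, and the names `cross_entropy` and `total_cost`, are illustrative only.

```python
import math
from typing import Sequence


def cross_entropy(predicted: Sequence[float], reference_index: int,
                  eps: float = 1e-12) -> float:
    """Assumed confidence cost: negative log-probability of the reference label."""
    return -math.log(predicted[reference_index] + eps)


def total_cost(first_costs: Sequence[float], second_cost: float,
               confidence_cost: float) -> float:
    """Combine per-module first costs, the second cost and the confidence cost."""
    return sum(first_costs) + second_cost + confidence_cost
```

The cost is small when the model assigns high probability to the reference label and grows as that probability falls, so minimizing it pulls the presumed label attribution result toward the reference.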

Optionally, the determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module includes:

analyzing the knowledge fields of the training user behavior log through a preset machine learning model, and determining a fifth presumed label attribution result corresponding to the training user behavior log;

determining a third cost value of the user classification model to be trained according to the fifth presumed label attribution result and the first presumed label attribution result;

and determining the estimated cost value according to the first cost value, the second cost value and the third cost value.

In the embodiment of the application, the fifth presumed label attribution result determined by the preset machine learning model and the first presumed label attribution result give the third cost value between the presumed label attribution results output by the user classification model to be trained and by the preset machine learning model. This enriches the training process and avoids training the user classification model against a single signal.

In a second aspect, an embodiment of the present application provides a user classification system, which includes a processor and a memory that are connected to each other, where the memory stores a computer program that, when executed by the processor, implements the method as provided in the first aspect of the embodiment of the present application.

In the following description, other features will be set forth in part. These features will be in part apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or use. The features of the present application may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations particularly pointed out in the detailed examples which follow.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

The methods, systems, and/or programs of the figures will be further described in accordance with the exemplary embodiments. These exemplary embodiments will be described in detail with reference to the drawings. These exemplary embodiments are non-limiting exemplary embodiments in which reference numerals represent similar mechanisms throughout the various views of the drawings.

FIG. 1 is a block diagram of a big data based user classification system, shown in accordance with some embodiments of the present application.

FIG. 2 is a flow diagram illustrating a big data based user classification method according to some embodiments of the present application.

Fig. 3 is a schematic structural diagram of a user classification device according to an embodiment of the present application.

Detailed Description

In order to better understand the technical solutions, the technical solutions of the present application are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features in the embodiments and examples of the present application are detailed descriptions of the technical solutions of the present application, and are not limitations of the technical solutions of the present application, and the technical features in the embodiments and examples of the present application may be combined with each other without conflict.

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant guidance. It will be apparent, however, to one skilled in the art that the present application may be practiced without these specific details. In other instances, well-known methods, procedures, systems, components, and/or circuits have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present application.

These and other features and functions of the present application, the methods of operation, the functions of the related structural elements, the combination of parts, and the economies of manufacture may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this application. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It should be understood that the drawings are not to scale.

Flowcharts are used herein to illustrate the implementations performed by systems according to embodiments of the present application. It should be expressly understood that the implementations in the flowcharts need not be performed in the order shown. Rather, they may be performed in reverse order or simultaneously. In addition, at least one other implementation may be added to a flowchart, and one or more implementations may be deleted from a flowchart.

Fig. 1 is an architectural diagram of a user classification system 100 according to some embodiments of the present application. The user classification system 100 includes a user classification device 110, a memory 120, a processor 130, and a communication unit 140. The memory 120, the processor 130, and the communication unit 140 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The user classification device 110 includes at least one software function module, which may be stored in the memory 120 in the form of software or firmware, or solidified in an Operating System (OS) of the user classification system 100. The processor 130 is used to execute executable modules stored in the memory 120, such as the software function modules and computer programs included in the user classification device 110.

The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction. The communication unit 140 is used to establish a communication connection between the user classification system 100 and a terminal device through a network, and to send and receive user behavior data through the network.

The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like.

It will be appreciated that the configuration shown in FIG. 1 is merely illustrative, and that the user classification system 100 may include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1. The components shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.

Fig. 2 is a flowchart of a big data-based user classification method according to some embodiments of the present application, which is applied to the user classification system 100 in fig. 1, and may specifically include the following steps S1 to S6. On the basis of the following steps S1 to S6, some alternative embodiments will be described, which should be understood as examples and should not be understood as technical features essential for implementing the present solution.

Step S1, receiving a user behavior log of a user to be classified.

In the embodiment of the present application, the user behavior log is a set of behavior data generated when the user performs service interaction; it may be obtained in real time by the server 100 or assembled periodically. The behavior data may be game operations, item purchases, and chat exchanges in a game scenario; product browsing, consumption, and click histories in an e-commerce shopping scenario; or rewards, comments, clicks, etc. made while browsing content on a video or novel platform. In the user behavior log, the various types of behaviors are stored in separate partitions.

Step S2, determining, according to the user behavior log, P first session sets corresponding to the user behavior log and a behavior description knowledge field corresponding to each first session set.

In the embodiment of the application, the first session sets may be a plurality of user behavior data sets obtained by dividing the user behavior log according to a preset partition mode, with each session set corresponding to one partition in the user behavior log. For example, in a game application scenario, session set 1 corresponds to the prop purchase behavior data of the user behavior log, session set 2 corresponds to the action track data for map 1, session set 3 corresponds to the action track data for map 2, session set 4 corresponds to the operation data for BOSS1, session set 5 corresponds to the upgrade data for prop 1, … In other internet application scenarios, the session sets may be divided according to actual scene requirements. The behavior description knowledge field (or vector) corresponding to each first session set is the behavior description knowledge field of the partition of the user behavior log corresponding to that first session set.
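The partitioning described above can be sketched as a simple grouping of log records by partition. This is an illustration only: the record layout, the `partition` key, and the partition names are assumptions, not part of the disclosed method.

```python
from collections import defaultdict


def split_into_session_sets(behavior_log, partition_key="partition"):
    """Group log records by partition; each group is one first session set."""
    session_sets = defaultdict(list)
    for record in behavior_log:
        session_sets[record[partition_key]].append(record)
    return dict(session_sets)


# Hypothetical game user behavior log.
log = [
    {"partition": "prop_purchase", "item": "prop 1", "price": 30},
    {"partition": "map_1_track", "x": 3, "y": 7},
    {"partition": "prop_purchase", "item": "prop 2", "price": 12},
    {"partition": "boss_1_ops", "skill": "dodge"},
]

first_session_sets = split_into_session_sets(log)
P = len(first_session_sets)  # number of first session sets
```

Each resulting group would then be encoded into one behavior description knowledge field (vector); the encoding itself is left to the user classification model.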

The P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set may be directly obtained by the server 100 or obtained by a pre-trained user classification model. The user classification model may be, for example, an arbitrary neural network, and includes a plurality of optimization modules for performing optimization processing on the behavior description knowledge fields, and after the optimization, behavior description knowledge fields corresponding to the second session set are obtained. In the embodiment of the application, after the user behavior logs are obtained, the user behavior logs are input into the user classification model, and the user behavior logs are processed through the user classification model to obtain P first session sets corresponding to the user behavior logs and behavior description knowledge fields corresponding to each first session set.

Step S3, for each first session set, determining the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set.

The feature difference priority indicates the forward excitation (positive contribution) of each first session set to the obtained personalized behavior tag; each first session set corresponds to one feature difference priority. For example, the feature difference priority may be a score, such as a score that corresponds to a degree of importance or to a proportion of forward excitation.

For example, suppose the user behavior log is a game user behavior log. If the behavior description knowledge field of a first session set describes the user's prop purchase behavior, then, because prop purchases directly reflect the user's behavior trend and help classify the user, the forward excitation of that first session set to the obtained personalized behavior tag can be determined to be high, and the score corresponding to that first session set is therefore high. If the behavior description knowledge field of a first session set describes the user's conversation content in the game chat box, then, because the conversation content is only weakly related to the game, the forward excitation of that first session set to the obtained personalized behavior tag is low; the knowledge field corresponding to that first session set may even be regarded as a redundant, repeated, or invalid knowledge field and given a low score. Optionally, for each first session set, knowledge field extraction may be performed on the behavior description knowledge field of the first session set, the degree of excitation of that behavior description knowledge field to the obtained personalized behavior tag is determined according to the extraction result, and the feature difference priority corresponding to the first session set is then determined according to that degree of excitation.
Optionally, in another embodiment, when the user classification model is used to process the user behavior log, after the behavior description knowledge field corresponding to each first session set is determined, the user classification model may, through the optimization module, compress the behavior description knowledge field corresponding to each of the P first session sets multiple times and obtain, according to the compression result, an excitation proportion (influence degree) corresponding to each first session set. Then, according to the excitation proportion corresponding to each first session set, the optimization module performs multiple operations, such as linear transformations, on the behavior description knowledge field corresponding to each first session set to obtain the feature difference priority corresponding to each first session set.

Step S4, optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, wherein Q is less than P. Q is the number of second session sets obtained, and also the number of behavior description knowledge fields corresponding to them. The second session sets are obtained by cleaning the repeated and invalid behavior description knowledge fields from the behavior description knowledge fields corresponding to the first session sets according to the feature difference priority corresponding to each first session set. Q may be associated with P, e.g., Q = P/2.

The behavior description knowledge field corresponding to each second session set is the behavior description knowledge field that remains after the repeated and invalid behavior description knowledge fields are removed. In an optional embodiment, after the feature difference priority corresponding to each first session set is obtained, the repeated and invalid behavior description knowledge fields among the behavior description knowledge fields corresponding to the P first session sets are determined according to those feature difference priorities. The behavior description knowledge fields corresponding to the P first session sets are then optimized according to the determined repeated and invalid behavior description knowledge fields, for example by aggregating the behavior description knowledge fields, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set. If the user behavior log is processed through the user classification model, then after the user classification model obtains the feature difference priority corresponding to each first session set, it optimizes the behavior description knowledge fields corresponding to the P first session sets according to those feature difference priorities, thereby obtaining Q second session sets and the behavior description knowledge field corresponding to each second session set.
In addition, when the user behavior log is processed through the user classification model, each optimization module executes the above steps. For example, after the first optimization module optimizes the behavior description knowledge fields corresponding to the first session sets, Q second session sets and the behavior description knowledge field corresponding to each second session set are obtained. The second session sets are then treated as the new first session sets, and the value of P is updated to the number of these new first session sets. The second optimization module then optimizes again the behavior description knowledge fields corresponding to the second session sets output by the first optimization module, and so on in a loop; for instance, the third optimization module can optimize the behavior description knowledge fields corresponding to the second session sets output by the second optimization module to obtain new second session sets and the behavior description knowledge fields corresponding to them. The number of second session sets output by each optimization module, and the number of behavior description knowledge fields corresponding to them, is less than the number output by the previous optimization module.

In this way, the repeated and invalid behavior description knowledge fields in the user behavior log are filtered out through the operation of the plurality of optimization modules, and the behavior description knowledge fields that have forward excitation on the obtained personalized behavior tag are retained.

Step S5, determining the personalized behavior tag of the user behavior log according to the behavior description knowledge field corresponding to each second session set.

The personalized behavior tag represents a behavior classification matched with the user behavior log. For example, in a game scenario, the personalized behavior tag matched with the user behavior log is 1, and tag 1 indicates that the user is of the consumption-conservative type. It is easy to understand that the tag may also take other forms of expression, for example, tag a, and that the matched user behavior classification may be set according to the actual situation; for example, it may also be a consumption-impulsive type, a specific hero preference, a specific map preference, a specific prop preference, and the like. In other application scenarios, for example, a video platform, the behavior classification corresponding to the tag may be a preference for a specific category of movies, a specific author, movies from a specific country, and the like; the other application scenarios are not illustrated one by one. As an embodiment, knowledge field extraction may be performed on the behavior description knowledge field corresponding to each second session set, and the personalized behavior tag of the user behavior log may be determined according to the result.

Step S6, determining the category attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relation.

In the embodiment of the application, a one-to-one mapping relation between the personalized behavior tag and the user classification is stored in advance, and after the behavior tag is obtained, the corresponding category attribute can be directly obtained through the mapping relation.
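The pre-stored one-to-one mapping can be sketched as a lookup table. Only the pairing of tag 1 with the consumption-conservative type comes from the description above; the other entries, the table name `TAG_TO_CATEGORY`, and the function `category_for_tag` are hypothetical.

```python
# Preset personalized tag mapping relation (entries other than "1" are illustrative).
TAG_TO_CATEGORY = {
    "1": "consumption-conservative",
    "2": "consumption-impulsive",
    "3": "specific map preference",
}


def category_for_tag(tag, mapping=TAG_TO_CATEGORY):
    """Return the category attribute for a personalized behavior tag."""
    if tag not in mapping:
        raise KeyError(f"no category configured for tag {tag!r}")
    return mapping[tag]


category = category_for_tag("1")
```

Raising on an unknown tag, rather than returning a default, is a design choice of this sketch: a one-to-one mapping implies every valid tag has an entry.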

In the above, by obtaining the feature difference priority that reflects the forward excitation of each first session set to the obtained personalized behavior tag, the behavior description knowledge fields corresponding to the P first session sets are optimized, and the repeated and invalid behavior description knowledge fields can be cleaned. For example, behavior description knowledge fields that have no forward excitation, or less forward excitation than expected, on the obtained personalized behavior tag are filtered out; on this basis, the behavior description knowledge fields corresponding to the Q second session sets that affect the obtained personalized behavior tag are obtained. The personalized behavior tag of the user behavior log is then determined through the behavior description knowledge fields corresponding to the Q second session sets. This ensures the reliability of the personalized behavior tag and thereby improves the reliability of user classification; and because Q is less than P, the amount of behavior description knowledge field data to be processed is reduced, which increases analysis efficiency and lowers computation cost.

In this embodiment of the application, step S3 may be executed by the user classification model, specifically by an optimization module thereof, and may include, for example, the following steps:

Step S31, for each first session set, performing a dimension reduction operation on the behavior description knowledge field corresponding to the first session set to obtain the dimension reduction knowledge field corresponding to the first session set.

For each first session set, a dimension reduction operation can be performed on the behavior description knowledge field corresponding to the first session set through an optimization module in the user classification model, so that a dimension reduction knowledge field corresponding to each first session set is obtained. The dimension reduction operation can be performed by an encoder arranged in the optimization module. When the user behavior log is processed by the user classification model, the behavior description knowledge fields corresponding to the obtained first session sets can be represented as a knowledge field array (which can be understood as a knowledge field matrix), in which each knowledge field (vector) corresponds to the behavior description knowledge field of one first session set. The array size of the knowledge field array corresponding to the first session sets may be P × D, where P is the number of behavior description knowledge fields, P is less than D, and D is the field rank (the highest order of the effective knowledge field in the array) of the behavior description knowledge fields obtained by the user classification model processing the user behavior log. The encoder of the optimization module performs the dimension reduction operation on each behavior description knowledge field in the knowledge field array of size P × D to obtain the dimension reduction knowledge field corresponding to each first session set. The number of dimension reduction knowledge fields equals the number of behavior description knowledge fields corresponding to the first session sets.

Step S32, determining the feature difference priority corresponding to each first session set according to the dimension reduction knowledge field corresponding to that first session set.

After the dimension reduction knowledge field corresponding to each first session set is obtained, the dimension reduction knowledge field corresponding to each first session set is processed again through the optimization module, and the feature difference priority corresponding to the first session set is obtained. The specific process can comprise the following steps:

Step S321, performing global unified processing on the dimension reduction knowledge field corresponding to each first session set to obtain the dimension reduction knowledge fields after global unified processing.

After the optimization module outputs the dimension reduction knowledge fields corresponding to the first session sets, global unified processing is performed on the dimension reduction knowledge field corresponding to each first session set to obtain the corresponding globally unified dimension reduction knowledge field. The number of globally unified dimension reduction knowledge fields equals the number of dimension reduction knowledge fields corresponding to the first session sets. The global unified processing is a process of standardizing the dimension reduction knowledge fields, which can be performed using a norm function.

Step S321 may include:

Step S3211, determining a global unification coefficient corresponding to each first session set according to the dimension reduction knowledge field corresponding to each first session set.

The global unification coefficient is used for performing a uniform weighting operation on the dimension reduction knowledge field corresponding to the first session set. After the dimension reduction knowledge field corresponding to each first session set is obtained, knowledge field extraction is performed on each dimension reduction knowledge field, the repeated and invalid knowledge fields in each dimension reduction knowledge field are determined, and the global unification coefficient corresponding to each dimension reduction knowledge field is then determined based on those repeated and invalid knowledge fields.

Step S3212, performing global unified processing on the dimension reduction knowledge field corresponding to each first session set according to the global unification coefficient corresponding to each first session set, to obtain the dimension reduction knowledge fields after global unified processing.
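Steps S3211 and S3212 can be sketched as computing one coefficient per first session set and then scaling that set's field by it. The source only says that a norm function may be used, so taking the coefficient as the reciprocal of the L2 norm is an assumption of this sketch, as are the function names.

```python
import math


def global_unification_coefficients(fields, eps=1e-12):
    """One coefficient per session set: here 1 / L2 norm of its field (assumption)."""
    return [1.0 / (math.sqrt(sum(v * v for v in row)) + eps) for row in fields]


def globally_unify(fields):
    """Weight every dimension reduction knowledge field by its own coefficient."""
    coefficients = global_unification_coefficients(fields)
    return [[c * v for v in row] for c, row in zip(coefficients, fields)]


# Two hypothetical dimension reduction knowledge fields (P = 2, field rank 2).
fields = [[3.0, 4.0], [0.0, 2.0]]
unified = globally_unify(fields)  # each row now has (approximately) unit length
```

After this step all fields are on a comparable scale, so the later scoring is not dominated by fields with large magnitudes.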

Step S322, performing field compression on the globally unified dimension reduction knowledge fields to obtain first transition knowledge fields.

The field rank corresponding to the first transition knowledge field is less than the field rank corresponding to the dimension reduction knowledge field. The first transition knowledge field is the knowledge field after field compression. Specifically, the optimization module may perform field compression (for example, through a fully connected mapping, i.e., a linear function in a linear layer) on each globally unified dimension reduction knowledge field output by the norm function, so as to obtain the first transition knowledge field corresponding to each globally unified dimension reduction knowledge field. The number of first transition knowledge fields equals the number of globally unified dimension reduction knowledge fields, and the field rank of each first transition knowledge field is smaller than the field rank of its corresponding behavior description knowledge field; for example, it is one half of that field rank. As an embodiment, step S322 may include the following steps:

Step S3221, determining a rank reduction ratio corresponding to each globally unified dimension reduction knowledge field according to that globally unified dimension reduction knowledge field.

In this embodiment, the rank reduction ratio represents the compression ratio used when compressing the field rank of the dimension reduction knowledge field. Specifically, knowledge field extraction is performed on each globally unified dimension reduction knowledge field, the repeated and invalid dimension reduction knowledge fields in each globally unified dimension reduction knowledge field are determined, and the rank reduction ratio corresponding to each globally unified dimension reduction knowledge field is then determined based on those repeated and invalid dimension reduction knowledge fields.

Step S3222, performing field compression on the globally unified dimension reduction knowledge fields according to the rank reduction ratio to obtain the first transition knowledge fields.

For example, the field rank of the dimension-reduced knowledge field corresponding to each first session set may be reduced by a linear function based on a rank reduction ratio corresponding to each globally unified dimension-reduced knowledge field, so as to obtain a first transition knowledge field corresponding to each globally unified dimension-reduced knowledge field.
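Field compression with a rank reduction ratio can be sketched as a linear map from field rank D to D × ratio. In this sketch the weights are random placeholders standing in for learned parameters, and the ratio is fixed at one half rather than derived from the fields as in step S3221; both are simplifying assumptions, and `compress_fields` is a hypothetical name.

```python
import random


def make_linear(in_rank, out_rank, seed=0):
    """Placeholder weight matrix of shape in_rank x out_rank (would be learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_rank)] for _ in range(in_rank)]


def compress_fields(fields, rank_reduction_ratio=0.5, seed=0):
    """Map each field of rank D to a first transition field of rank D * ratio."""
    in_rank = len(fields[0])
    out_rank = max(1, int(in_rank * rank_reduction_ratio))
    weights = make_linear(in_rank, out_rank, seed)
    return [
        [sum(row[i] * weights[i][j] for i in range(in_rank)) for j in range(out_rank)]
        for row in fields
    ]


# P = 4 globally unified fields of field rank D = 8.
unified_fields = [[1.0] * 8 for _ in range(4)]
first_transition = compress_fields(unified_fields)  # array size P x D/2 = 4 x 4
```

The number of fields (rows) is unchanged; only the field rank (columns) shrinks.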

Step S323, determining a feature difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set. For example, according to the first transition knowledge field corresponding to each first session set, deep knowledge field transformation may be performed on each first transition knowledge field to obtain the feature difference priority corresponding to the first session set.

Step S323 may include:

Step S3231, performing expansion processing on each first transition knowledge field, and performing field compression on the expanded first transition knowledge fields to obtain a second transition knowledge field corresponding to each first transition knowledge field.

For example, the first transition knowledge field may be expanded through an activation function, and the expanded first transition knowledge field may then be field-compressed through a linear function to obtain the second transition knowledge field corresponding to each first transition knowledge field. The field rank of each second transition knowledge field is smaller than the field rank of its corresponding first transition knowledge field; for example, the field rank of the second transition knowledge field is P/2. If the array size of the knowledge field array corresponding to the first session sets is P × D and the array size of the knowledge field array corresponding to the first transition knowledge fields is P × D/2, then the array size of the knowledge field array corresponding to the second transition knowledge fields is P × P/2.

Step S3232, determining a feature difference priority corresponding to each first session set according to each second transition knowledge field.

For example, after the knowledge field array corresponding to the second transition knowledge fields is obtained, each second transition knowledge field in the array is scored through the normalized exponential function (softmax) to determine the score corresponding to each second transition knowledge field, thereby obtaining the feature difference priority corresponding to each second transition knowledge field and hence to each first session set.
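Steps S3231 and S3232 can be sketched as activation, a linear map down to rank Q, and a softmax. Several details here are assumptions not fixed by the text: ReLU as the activation function, random placeholder weights in place of learned ones, and applying the softmax separately to each row (one row per first session set).

```python
import math
import random


def relu(x):
    """Assumed activation function used for the expansion processing."""
    return x if x > 0.0 else 0.0


def softmax(values):
    """Normalized exponential function, shifted by the max for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def feature_difference_priorities(first_transition, q, seed=0):
    """Return a P x q array of scores, one row per first session set."""
    rng = random.Random(seed)
    in_rank = len(first_transition[0])
    # Placeholder linear layer (in_rank x q); would be learned in a real model.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(q)] for _ in range(in_rank)]
    priorities = []
    for row in first_transition:
        expanded = [relu(v) for v in row]
        second_transition = [
            sum(expanded[i] * weights[i][j] for i in range(in_rank)) for j in range(q)
        ]
        priorities.append(softmax(second_transition))
    return priorities


# P = 4 first transition fields of rank 3, compressed to Q = P/2 = 2 scores each.
scores = feature_difference_priorities([[0.2, -0.1, 0.4]] * 4, q=2)
```

The resulting P × Q array has the same shape as the first knowledge field array used in step S4.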

As an embodiment, step S4 may include the steps of:

Step S41, determining, according to the feature difference priority corresponding to each first session set, a first knowledge field array corresponding to the feature difference priorities.

The array size of the first knowledge field array is P × Q. In this embodiment, the first knowledge field array contains the feature difference priority corresponding to each first session set; with Q = P/2, its array size is P × P/2.

Step S42, flipping the first knowledge field array to obtain a second knowledge field array with an array size of Q × P.

In the embodiment of the present application, the flipping process transposes the knowledge field array, i.e., swaps the row and column coordinates of the entries corresponding to the different feature difference priorities. Flipping the first knowledge field array reverses its rows and columns, yielding a second knowledge field array with an array size of Q × P. When the first knowledge field array is flipped, the entries corresponding to the feature difference priorities in each row become, one by one, the entries of the corresponding column of the second knowledge field array.

Step S43, optimizing the behavior description knowledge fields corresponding to the P first session sets according to the second knowledge field array and the session set array corresponding to the behavior description knowledge fields of the first session sets, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set.

The session set array corresponding to the behavior description knowledge fields of the first session sets may be the knowledge field array corresponding to the dimension reduction knowledge fields obtained after the dimension reduction operation. For example, the second knowledge field array may be multiplied by the session set array corresponding to the behavior description knowledge fields of the first session sets (a matrix product), so as to perform optimized aggregation on the behavior description knowledge fields corresponding to the P first session sets and obtain Q second session sets and the behavior description knowledge field corresponding to each second session set. For example, if the session set array corresponding to the behavior description knowledge fields of the first session sets is a P × D knowledge field array and the second knowledge field array is a Q × P knowledge field array, where Q is P/2, then multiplying the second knowledge field array by the session set array yields a knowledge field array of size Q × D, and the knowledge fields of the Q × D knowledge field array are used as the behavior description knowledge fields corresponding to the second session sets.
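
The array sizes in steps S41 to S43 can be checked with a small Python sketch. It is a toy illustration only, assuming P = 4, Q = 2 and D = 3; the helper names `transpose` and `matmul` and all numeric values are hypothetical and are not taken from the application.

```python
def transpose(matrix):
    """Interchange rows and columns (the flipping process of step S42)."""
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]

# First knowledge field array: P x Q feature difference priorities (P = 4, Q = 2).
first_array = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
]

# Session set array: P x D behavior description knowledge fields (D = 3).
session_fields = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
]

second_array = transpose(first_array)                         # Q x P
second_session_fields = matmul(second_array, session_fields)  # Q x D
```

Each of the Q output rows is a priority-weighted combination of the P input rows, which is one way to read the optimized aggregation described above.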

In this embodiment of the application, for step S5, the user classification model includes a plurality of optimization modules. After the first optimization module performs optimization processing on the P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, the second session sets output by the first optimization module may be taken as new first session sets, with their number taken as the new P, and the step of determining, for each first session set, the feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set is then repeated. The cycle stops when a preset condition is met, for example when a predetermined number of cycles is reached. For example, the first optimization module performs optimization processing on the P first session sets corresponding to the user behavior log and the behavior description knowledge field corresponding to each first session set to generate a first output; the first output is used as the input of the second optimization module, which performs optimization processing on it to obtain a second output; the second output is used as the input of the third optimization module, which performs optimization processing on it to obtain a third output. Cycling in this manner, the output of the third optimization module is used as the behavior description knowledge field corresponding to each second session set obtained in the last cycle.

Finally, a tag attribution confidence result corresponding to the user behavior log is determined according to the behavior description knowledge field corresponding to each second session set obtained in the last cycle, and the personalized behavior tag of the user behavior log is determined according to the tag attribution confidence result. In the embodiment of the application, when the tag attribution confidence result corresponding to the user behavior log is determined, a target behavior description knowledge field corresponding to the user behavior log is first determined according to the behavior description knowledge fields of the second session sets obtained in the last cycle. In the foregoing example, these are the behavior description knowledge fields of the second session sets output by the third optimization module. The target behavior description knowledge field is obtained by performing dimensionality reduction on the behavior description knowledge field of each second session set obtained in the last cycle, and it comprises the behavior description knowledge field of each second session set. The tag attribution confidence result corresponding to the user behavior log is then determined according to the target behavior description knowledge field. The tag attribution confidence result indicates the confidence that the user behavior log corresponds to each tag attribution result. For example, the target behavior description knowledge field may be input into the user classification model and classified by the classification module provided therein to obtain the tag attribution confidence result corresponding to the user behavior log.

The personalized behavior tag of the user behavior log is then determined according to the tag attribution confidence result, wherein the tag with the maximum value in the tag attribution confidence result is the personalized behavior tag of the user behavior log.
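
The cascading of optimization modules and the final choice of the tag with the maximum confidence can be sketched in Python as below. This is a toy sketch under stated assumptions: `halve` is a hypothetical stand-in for an optimization module (it simply averages adjacent pairs so that Q = P/2), and the tag names and confidence values are invented for illustration.

```python
def halve(fields):
    """Toy optimization module: average adjacent pairs, so Q = P / 2.

    Assumes an even number of input fields; a trailing odd field is dropped.
    """
    return [
        [(a + b) / 2 for a, b in zip(fields[i], fields[i + 1])]
        for i in range(0, len(fields) - 1, 2)
    ]

def run_modules(fields, modules):
    """Feed the output of each optimization module into the next one."""
    for module in modules:
        fields = module(fields)
    return fields

def pick_tag(confidences):
    """Return the tag whose attribution confidence is largest."""
    return max(confidences, key=confidences.get)

# P = 8 hypothetical behavior description knowledge fields of dimension 2.
fields = [[float(i), float(i)] for i in range(8)]

# Three cascaded modules: 8 -> 4 -> 2 -> 1 fields.
final_fields = run_modules(fields, [halve, halve, halve])

# Hypothetical tag attribution confidence result from a classification module.
confidences = {"tag_a": 0.2, "tag_b": 0.7, "tag_c": 0.1}
personalized_tag = pick_tag(confidences)
```

A fixed list of three modules corresponds to the "predetermined number of times" stopping condition; other preset conditions would replace the simple `for` loop.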

The following introduces a training process of the user classification model, which includes the following steps:

Step S100, acquiring a training user behavior log.

The training user behavior log, i.e. the training sample, may be a user behavior log corresponding to any user classification.

Step S200, inputting the training user behavior logs into a user classification model to be trained, processing the training user behavior logs through the user classification model to be trained, determining a first presumption behavior description knowledge field output by each optimization module, and determining a first presumption label attribution result corresponding to the training user behavior logs.

The optimization module is used for determining the feature difference priority of each first presumed session set according to the behavior description knowledge field of the first presumed session set corresponding to the training user behavior log, and for optimizing the behavior description knowledge fields corresponding to the M first presumed session sets according to the feature difference priority corresponding to each first presumed session set, to obtain N target presumed session sets and the first presumed behavior description knowledge field corresponding to each target presumed session set. The first presumed behavior description knowledge field is the presumed behavior description knowledge field, output by an optimization module, that corresponds to each target presumed session set, and each optimization module outputs one first presumed behavior description knowledge field. After the training user behavior log is input into the user classification model to be trained, the model first partitions the training user behavior log to obtain the M first presumed session sets corresponding to the training user behavior log and determines the behavior description knowledge field corresponding to each first presumed session set. The behavior description knowledge field corresponding to each first presumed session set is then processed through each optimization module in the user classification model to be trained to obtain the first presumed behavior description knowledge field output by each optimization module, and the first presumed tag attribution result corresponding to the training user behavior log is determined according to the first presumed behavior description knowledge field generated by the last optimization module.

Step S300, inputting the training user behavior logs into a preset learning model, processing the training user behavior logs through the learning model, determining a second presumption behavior description knowledge field output by each optimization module in the learning model, and determining a second presumption label attribution result corresponding to the training user behavior logs.

The learning model may be a neural network model obtained according to the user classification model, for example in a teacher-student network model in which the user classification model to be trained serves as the student network. The number of optimization modules in the learning model corresponds to the number of optimization modules in the user classification model. The learning model comprises a first cost value, a second cost value, a third cost value and a fourth cost value, and an analysis module is used for determining a third presumed tag attribution result corresponding to the recovered presumed behavior description knowledge field and a fourth presumed tag attribution result corresponding to the second presumed behavior description knowledge field. The second presumed behavior description knowledge field is the presumed behavior description knowledge field output by an optimization module in the learning model, and each optimization module in the learning model likewise corresponds to one second presumed behavior description knowledge field. The second presumed tag attribution result is the likelihood, obtained by the learning model, that the training user behavior log corresponds to each behavior classification. After the training user behavior log is input into the learning model, the learning model may partition the training user behavior log to obtain the M first presumed session sets corresponding to the training user behavior log and determine the behavior description knowledge field corresponding to each first presumed session set.

A dimensionality reduction operation is then performed, through each optimization module in the learning model in turn, on the behavior description knowledge fields corresponding to the M first presumed session sets; the second presumed behavior description knowledge field output by each optimization module is determined, and the second presumed tag attribution result corresponding to the training user behavior log is determined according to the second presumed behavior description knowledge field output by the last optimization module. Each optimization module obtains M second presumed behavior description knowledge fields.

And step S400, determining the estimated cost value of the user classification model to be trained according to the first estimated behavior description knowledge field, the second estimated behavior description knowledge field, the first estimated label attribution result and the second estimated label attribution result, and training the user classification model to be trained until convergence through the estimated cost value.

The convergence condition may be that the training reaches a preset number of iterations or that the presumption accuracy reaches a preset accuracy. A cost value corresponding to the presumed behavior description knowledge fields may be determined based on each first presumed behavior description knowledge field and each second presumed behavior description knowledge field. A cost value between the two presumed tag attribution results may also be determined based on the first presumed tag attribution result and the second presumed tag attribution result. The estimated cost value of the user classification model to be trained is then determined according to the cost value corresponding to the presumed behavior description knowledge fields and the cost value between the two presumed tag attribution results, and the user classification model to be trained is trained through the estimated cost value until convergence.

Wherein, step S400 may include:

Step S401, for each optimization module in the user classification model to be trained, determining a first cost value corresponding to the optimization module according to the first presumed behavior description knowledge field and the second presumed behavior description knowledge field corresponding to the optimization module.

For each optimization module in the user classification model to be trained, the corresponding optimization module in the learning model may be determined; a cost value between the first presumed behavior description knowledge field corresponding to the optimization module and the second presumed behavior description knowledge field corresponding to the corresponding optimization module in the learning model is then determined, and this cost value is used as the first cost value corresponding to the optimization module.

Step S402, determining a second cost value of the user classification model to be trained according to the first presumed tag attribution result and the second presumed tag attribution result.

A cost value between the two presumed tag attribution results is determined according to the first presumed tag attribution result and the second presumed tag attribution result, and this cost value is used as the second cost value of the user classification model to be trained.

Step S403, determining the presumed cost value according to the first cost value corresponding to each optimization module and the second cost value.

The first cost value corresponding to each optimization module and the second cost value may together be used as the presumed cost value.

Step S401 may specifically include:

Step S4011, determining M recovered presumed behavior description knowledge fields according to each first presumed behavior description knowledge field, where M is the number of first presumed session sets corresponding to the training user behavior log.

A recovered presumed behavior description knowledge field is a presumed behavior description knowledge field obtained after the first presumed behavior description knowledge fields are recovered in number; the number of recovered presumed behavior description knowledge fields is larger than the number of first presumed behavior description knowledge fields. The number of first presumed behavior description knowledge fields output by each optimization module is less than the number of behavior description knowledge fields of the first presumed session sets, whereas the number of second presumed behavior description knowledge fields output by each optimization module in the learning model is equal to that number. Therefore, after each first presumed behavior description knowledge field is obtained, its number is restored to obtain the M recovered presumed behavior description knowledge fields, so that the number of recovered presumed behavior description knowledge fields corresponds to the number of second presumed behavior description knowledge fields.

Step S4012, determining the first cost value corresponding to the optimization module according to the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge field.

For each of the M recovered presumed behavior description knowledge fields, the second presumed behavior description knowledge field corresponding to it may first be obtained, and a cost value between the two is determined. The first cost value is then determined according to the cost values between every such pair (a recovered presumed behavior description knowledge field and its corresponding second presumed behavior description knowledge field), and is used as the first cost value corresponding to the optimization module in the user classification model to be trained. Following the above two steps, the first cost value corresponding to each optimization module in the user classification model to be trained is determined respectively.
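
The pairwise cost described in step S4012 can be sketched in Python as follows. The application does not name a specific cost function, so the mean squared error used here is an assumption, as are the function names `mse` and `first_cost_value` and the sample values.

```python
def mse(u, v):
    """Mean squared error between two equally long knowledge fields (assumed cost)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def first_cost_value(recovered_fields, second_fields):
    """Average the pairwise cost over the M (recovered, second) field pairs."""
    if len(recovered_fields) != len(second_fields):
        raise ValueError("both inputs must contain M knowledge fields")
    pair_costs = [mse(r, s) for r, s in zip(recovered_fields, second_fields)]
    return sum(pair_costs) / len(pair_costs)

# M = 3 hypothetical recovered fields (from the model to be trained)
# and the M corresponding second fields (from the learning model).
recovered = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0]]
second = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]

cost = first_cost_value(recovered, second)
```

Averaging over the M pairs is one possible way to combine the pairwise cost values; a sum would serve equally well for the purpose of this sketch.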

As an embodiment, step S4011 may include:

Step S40111, performing global unified processing on the second knowledge field array corresponding to the first presumed behavior description knowledge field to obtain a first presumed dimension-reduction knowledge field after global unified processing, and performing flip processing on the knowledge field array corresponding to the first presumed dimension-reduction knowledge field to obtain a third knowledge field array.

Step S40112, performing field compression on the third knowledge field array to obtain a second presumed dimension-reduction knowledge field, and performing expansion processing on the second presumed dimension-reduction knowledge field to obtain a third presumed dimension-reduction knowledge field.

Step S40113, performing field compression on the knowledge field array corresponding to the third presumed dimension-reduction knowledge field, flipping the compressed knowledge field array to obtain a fourth knowledge field array, and determining the M recovered presumed behavior description knowledge fields according to the fourth knowledge field array.

The array size of the fourth knowledge field array comprises M fields, and the field rank in the array size of the fourth knowledge field array is the field rank corresponding to the behavior description knowledge field of the first presumption session set. The process of recovering the first inferred behavior description knowledge field output by the second optimization module and the first inferred behavior description knowledge field output by the third optimization module in the user classification model to be trained may refer to the above process of recovering the first inferred behavior description knowledge field output by the first optimization module.

As an embodiment, determining the M recovered presumed behavior description knowledge fields according to the fourth knowledge field array in step S40113 may include: performing global unified processing on the fourth knowledge field array, and performing field compression multiple times on the fourth knowledge field array after global unified processing to obtain a fifth knowledge field array; and determining the M recovered presumed behavior description knowledge fields based on the fifth knowledge field array and the fourth knowledge field array.

Wherein, determining the first cost value corresponding to the optimization module according to the M recovered presumed behavior description knowledge fields and the second presumed behavior description knowledge field may further include:

A: determining a first sub-cost value according to the recovered presumed behavior description knowledge field and the second presumed behavior description knowledge field.

B: recovering the recovered presumed behavior description knowledge field to obtain a first target presumed behavior description knowledge field associated with the recovered presumed behavior description knowledge field, and determining a third presumed tag attribution result corresponding to the first target presumed behavior description knowledge field.

The first sub-cost value may be the cost value introduced in step S4011 and step S4012.

The embodiment of the present application further provides an analysis module that analyzes the recovered presumed behavior description knowledge field and the second presumed behavior description knowledge field, determines the likelihood that the recovered presumed behavior description knowledge field is a presumed behavior description knowledge field output by the learning model, and determines the likelihood that the second presumed behavior description knowledge field is a presumed behavior description knowledge field output by the learning model. The first target presumed behavior description knowledge field is a presumed behavior description knowledge field obtained by compressing the field rank corresponding to the recovered presumed behavior description knowledge field. After the recovered presumed behavior description knowledge fields are obtained for each optimization module in the user classification model, they may be input to the analysis module, and the analysis module performs knowledge field analysis on each recovered presumed behavior description knowledge field, for example by compressing, and thereby lowering, the field rank corresponding to each recovered presumed behavior description knowledge field, to obtain the first target presumed behavior description knowledge field associated with each recovered presumed behavior description knowledge field. Each first target presumed behavior description knowledge field may then be classified by the analysis module to determine the likelihood that it is a presumed behavior description knowledge field output by the learning model, and this likelihood is used as the third presumed tag attribution result corresponding to the first target presumed behavior description knowledge field.

C: recovering the second presumed behavior description knowledge field to obtain a second target presumed behavior description knowledge field associated with the second presumed behavior description knowledge field, and determining a fourth presumed tag attribution result corresponding to the second target presumed behavior description knowledge field.

The second target presumed behavior description knowledge field is a presumed behavior description knowledge field obtained by compressing the field rank corresponding to the second presumed behavior description knowledge field. The fourth presumed tag attribution result indicates the likelihood that each second target presumed behavior description knowledge field is a presumed behavior description knowledge field output by the learning model. After each second presumed behavior description knowledge field output by each optimization module in the learning model is obtained, it is input to the analysis module, and each second presumed behavior description knowledge field is recovered through the analysis module, for example by compressing, and thereby lowering, the field rank corresponding to each second presumed behavior description knowledge field, to obtain the second target presumed behavior description knowledge field associated with each second presumed behavior description knowledge field. Each second target presumed behavior description knowledge field may then be classified by the analysis module to obtain the likelihood that it is a presumed behavior description knowledge field output by the learning model, and this likelihood is used as the fourth presumed tag attribution result corresponding to the second target presumed behavior description knowledge field.

D: determining a second sub-cost value according to the third presumed tag attribution result and the fourth presumed tag attribution result, and determining the first cost value according to the first sub-cost value and the second sub-cost value.

Wherein determining the second sub-cost value according to the third presumed tag attribution result and the fourth presumed tag attribution result may include:

D1: determining a third sub-cost value according to the third presumed tag attribution result and a first reference tag attribution result corresponding to the third presumed tag attribution result.

The third presumed tag attribution result represents the likelihood that each first target presumed behavior description knowledge field is a presumed behavior description knowledge field output by the learning model. Each first target presumed behavior description knowledge field is obtained from the first presumed behavior description knowledge field output by an optimization module in the user classification model to be trained, so the likelihood that it is a presumed behavior description knowledge field output by the learning model is extremely low. A cost value is calculated on the third presumed tag attribution result, the first cost value is determined according to the obtained cost value, and the optimization module in the user classification model to be trained is trained through the first cost value, which improves the reliability of the first presumed behavior description knowledge field output by the optimization module.

D2: determining a fourth sub-cost value according to the fourth presumed tag attribution result and a second reference tag attribution result corresponding to the fourth presumed tag attribution result.

The fourth presumed tag attribution result represents the likelihood that each second target presumed behavior description knowledge field is a presumed behavior description knowledge field output by the learning model. Each second target presumed behavior description knowledge field is obtained from the second presumed behavior description knowledge field output by an optimization module in the learning model, so the likelihood that it is a presumed behavior description knowledge field output by the learning model is extremely high. A cost value is calculated on the fourth presumed tag attribution result, the first cost value is determined according to the obtained cost value, and the optimization module in the user classification model to be trained is trained through the first cost value, so that supervised learning is realized on the output of the optimization module in the user classification model to be trained and the reliability of the first presumed behavior description knowledge field output by the optimization module is increased.

D3: determining the second sub-cost value according to the third sub-cost value and the fourth sub-cost value.

The third sub-cost value and the fourth sub-cost value may be summed, and the sum used as the second sub-cost value. In addition, after the second sub-cost value and the first sub-cost value are obtained, they may together be used as the first cost value, or their sum may be used as the first cost value. Further, the first cost value corresponding to each optimization module in the user classification model to be trained can be determined according to the four steps A, B, C and D, and each optimization module can be trained through its corresponding first cost value.
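
Steps D1 to D3 can be illustrated with a short Python sketch. The application does not specify the cost function, so binary cross-entropy is assumed here; the reference values 0.0 (field derived from the model to be trained) and 1.0 (field derived from the learning model) follow from the "extremely low" and "extremely high" likelihoods described above. All function and variable names are hypothetical.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy between one likelihood and its reference (assumed cost)."""
    prob = min(max(prob, eps), 1.0 - eps)  # keep log() finite
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def second_sub_cost(third_results, fourth_results):
    """Sum of the cost from D1 (reference 0.0) and the cost from D2 (reference 1.0)."""
    cost_d1 = sum(bce(p, 0.0) for p in third_results) / len(third_results)
    cost_d2 = sum(bce(p, 1.0) for p in fourth_results) / len(fourth_results)
    return cost_d1 + cost_d2

# Hypothetical likelihoods produced by the analysis module.
third_results = [0.5, 0.5]   # for fields derived from the model to be trained
fourth_results = [0.5, 0.5]  # for fields derived from the learning model

cost_from_step_a = 0.25      # hypothetical cost value obtained in step A
cost_from_step_d = second_sub_cost(third_results, fourth_results)

# One of the options described above: sum the two to get the first cost value.
first_cost = cost_from_step_a + cost_from_step_d
```

When the analysis module cannot tell the two sources apart (all likelihoods 0.5), the cost from step D equals 2·ln 2; it falls as the likelihoods move toward their reference values.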

In addition, the step of determining the presumed cost value according to the first cost value corresponding to each optimization module and the second cost value may include:

Step S501, determining a confidence estimated cost value corresponding to the user classification model to be trained according to the first presumed tag attribution result and the reference tag attribution result corresponding to the training user behavior log.

The presumed cost value comprises the confidence estimated cost value between the first presumed tag attribution result and the reference tag attribution result corresponding to the training user behavior log. The confidence estimated cost value indicates the cost value between the first presumed tag attribution result output by the user classification model to be trained and the reference tag attribution result corresponding to the training user behavior log. The reference tag attribution result corresponding to the training user behavior log is the actual attribution result corresponding to the training user behavior log. A cost calculation is performed according to the first presumed tag attribution result and the reference tag attribution result corresponding to the training user behavior log to determine the confidence estimated cost value corresponding to the user classification model to be trained.

Step S502, determining the presumed cost value according to the first cost value corresponding to each optimization module, the second cost value and the confidence estimated cost value.

The first cost value corresponding to each optimization module in the user classification model to be trained, the determined second cost value and the determined confidence estimated cost value may together be used as the presumed cost value.

As an embodiment, determining the presumed cost value according to the first cost value corresponding to each optimization module and the second cost value may include:

Step S601, performing knowledge field analysis on the training user behavior log through a preset machine learning model, and determining a fifth presumed tag attribution result corresponding to the training user behavior log.

The preset machine learning model may be a neural network model obtained by conventional means that can classify knowledge field data. The fifth presumed tag attribution result may indicate the likelihood that the training user behavior log corresponds to each behavior classification. Knowledge field analysis is performed on the training user behavior log through the preset machine learning model, and the fifth presumed tag attribution result corresponding to the training user behavior log, as output by the preset machine learning model, is determined.

Step S602, determining a third cost value of the user classification model to be trained according to the fifth presumed tag attribution result and the first presumed tag attribution result.

A cost value is calculated from the fifth presumed tag attribution result and the first presumed tag attribution result, and this cost value is used as the third cost value of the user classification model to be trained.

Step S603, determining a presumed cost value based on the first cost value, the second cost value, and the third cost value.

The first cost value, the second cost value and the third cost value may together be taken as the presumed cost value; alternatively, the first cost value, the second cost value, the third cost value and the confidence estimated cost value of the above embodiments may be combined and taken as the presumed cost value. In addition, as an implementation manner, the user classification model to be trained may also be trained by using any one or more of the cost values of the above implementation manners, such as the first cost value, the second cost value, the third cost value and the confidence estimated cost value, as the presumed cost value; the embodiments of the present application are not limited in this respect.
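
One way to combine the cost values described above is a weighted sum, sketched below in Python. The weights are an assumption introduced for illustration (the application only says the values are combined); with all weights equal to 1.0 the function reduces to a plain sum. The function name `presumed_cost` and the sample numbers are hypothetical.

```python
def presumed_cost(module_costs, second_cost, third_cost=0.0,
                  confidence_cost=0.0, weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the cost values into a single training cost.

    module_costs    -- one cost value per optimization module (step S401)
    second_cost     -- cost between the two tag attribution results (step S402)
    third_cost      -- cost against the preset machine learning model (step S602)
    confidence_cost -- cost against the reference tag attribution result (step S501)
    weights         -- hypothetical weighting factors, not from the application
    """
    w1, w2, w3, w4 = weights
    return (w1 * sum(module_costs) + w2 * second_cost
            + w3 * third_cost + w4 * confidence_cost)

# Three optimization modules with hypothetical per-module cost values.
total = presumed_cost([0.5, 0.25, 0.25], second_cost=1.0,
                      third_cost=0.5, confidence_cost=2.0)

# Using only a subset of the cost values, as the text also permits.
subset = presumed_cost([0.5, 0.25, 0.25], second_cost=1.0)
```

Leaving `third_cost` and `confidence_cost` at their defaults corresponds to training with only the per-module cost values and the second cost value.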

Referring to Fig. 3, which is a schematic diagram of the architecture of a user classification device 110 according to an embodiment of the present invention, the user classification device 110 may be configured to perform the big data-based user classification method, wherein the user classification device 110 includes:

a log receiving module 111, configured to receive a user behavior log of a user to be classified;

a knowledge field extraction module 112, configured to determine, according to the user behavior log, P first session sets corresponding to the user behavior log and a behavior description knowledge field corresponding to each first session set;

a priority determination module 113, configured to determine, for each first session set, a feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set;

an optimization module 114, configured to optimize the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, where Q is less than P;

a tag determination module 115, configured to determine a personalized behavior tag of the user behavior log according to the behavior description knowledge field corresponding to each second session set; and

a classification module 116, configured to determine a category attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relationship.

The log receiving module 111 may be configured to perform step S1; the knowledge field extraction module 112 may be configured to perform step S2; the priority determination module 113 may be configured to perform step S3; the optimization module 114 may be configured to perform step S4; the tag determination module 115 may be configured to perform step S5; the classification module 116 may be configured to perform step S6.
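The way the six modules chain steps S1 to S6 can be sketched as follows. This is an illustrative skeleton under strong simplifying assumptions, not the implementation of the embodiments: the optimization of step S4 is replaced here by simply keeping the Q highest-priority fields, and the class name, the callables, the toy session splitting, and the tag and category names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Field = List[float]  # stand-in for a behavior description knowledge field


@dataclass
class UserClassificationDevice:
    extract_fields: Callable[[Sequence[str]], List[Field]]  # module 112, S2
    priority_of: Callable[[Field], float]                   # module 113, S3
    tag_of: Callable[[List[Field]], str]                    # module 115, S5
    tag_to_category: Dict[str, str]                         # module 116, S6
    q: int                                                  # Q, must be < P

    def classify(self, behavior_log: Sequence[str]) -> str:
        # S1: the log is received as the argument.
        # S2: one field per first session set, P fields in total.
        fields = self.extract_fields(behavior_log)
        p = len(fields)
        if not 0 < self.q < p:
            raise ValueError("Q must satisfy 0 < Q < P")
        # S3 + S4 (simplified stand-in): keep the Q highest-priority fields.
        kept = sorted(fields, key=self.priority_of, reverse=True)[: self.q]
        # S5: personalized behavior tag; S6: preset tag-to-category mapping.
        return self.tag_to_category[self.tag_of(kept)]


def toy_extract(log: Sequence[str]) -> List[Field]:
    # Hypothetical: every two events form a session; field = [buys, views].
    sessions = [list(log[i:i + 2]) for i in range(0, len(log), 2)]
    return [[float(s.count("buy")), float(s.count("view"))] for s in sessions]


def toy_tag(fields: List[Field]) -> str:
    buys = sum(f[0] for f in fields)
    views = sum(f[1] for f in fields)
    return "frequent_buyer" if buys > views else "browser"


device = UserClassificationDevice(
    extract_fields=toy_extract,
    priority_of=lambda f: f[0],  # hypothetical priority: number of purchases
    tag_of=toy_tag,
    tag_to_category={"frequent_buyer": "high_value", "browser": "low_value"},
    q=2,
)
category = device.classify(["view", "view", "buy", "buy", "view", "buy"])
```

In this toy run P is 3 and Q is 2, so one low-priority session set is discarded before the tag is determined, which mirrors the reduction from P to Q described for the optimization module 114.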

The big data-based user classification method provided in the embodiments of the present invention has been described in detail in the foregoing embodiments. Because the principle of the user classification device 110 is the same as that of the method, the implementation principle of each module of the user classification device 110 is not described again here.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

It should be understood that technical terms not explained above can be clearly determined by those skilled in the art from the above disclosure, and that the embodiments disclosed above will be clear to those skilled in the art. It should also be understood that any derivation and analysis of such unexplained technical terms by those skilled in the art is based on the contents described in the present application, and therefore does not constitute an inventive judgment on the overall scheme.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not limiting of the present application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are suggested by the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.

It should also be appreciated that in the foregoing description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of at least one embodiment of the invention. However, this method of disclosure does not imply that the claimed subject matter requires more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single disclosed embodiment.

Claims

1. A big data-based user classification method, applied to a classification system, comprising the following steps:

receiving a user behavior log of a user to be classified; then according to the user behavior log, determining P first session sets corresponding to the user behavior log and a behavior description knowledge field corresponding to each first session set;

for each first session set, determining a feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set; optimizing the behavior description knowledge fields corresponding to the P first session sets according to the feature difference priority corresponding to each first session set, to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set, wherein Q is less than P;

determining a personalized behavior tag of the user behavior log according to a behavior description knowledge field corresponding to each second session set; and determining the category attribute corresponding to the personalized behavior tag according to a preset personalized tag mapping relation.

2. The method of claim 1, wherein the determining, for each first session set, a feature difference priority corresponding to the first session set according to the behavior description knowledge field of the first session set comprises:

and determining the feature difference priority corresponding to each first session set according to the dimension-reduced knowledge field corresponding to each first session set.

3. The method of claim 2, wherein the determining the feature difference priority corresponding to each first session set according to the dimension-reduced knowledge field corresponding to each first session set comprises:

performing global unified processing on the dimension-reduced knowledge field corresponding to each first session set to obtain a globally and uniformly processed dimension-reduced knowledge field;

performing field compression on the globally and uniformly processed dimension-reduced knowledge field to obtain a first transition knowledge field, wherein the field rank corresponding to the first transition knowledge field is smaller than the field rank corresponding to the dimension-reduced knowledge field;

4. The method of claim 1, wherein the globally and uniformly processing the dimension-reduced knowledge field corresponding to each first session set to obtain a globally and uniformly processed dimension-reduced knowledge field comprises:

5. The method according to claim 3 or 4, wherein the performing field compression on the globally and uniformly processed dimension-reduced knowledge field to obtain a first transition knowledge field comprises:

performing field compression on the globally and uniformly processed dimension-reduced knowledge field according to the rank reduction ratio to obtain a first transition knowledge field;

the determining the feature difference priority corresponding to each first session set according to the first transition knowledge field corresponding to each first session set includes:

and determining the feature difference priority corresponding to each first session set according to each second transition knowledge field.

6. The method according to claim 5, wherein the optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the feature difference priority corresponding to each first session set to obtain Q second session sets and the behavior description knowledge field corresponding to each second session set comprises:

turning the first knowledge field array to obtain a second knowledge field array, wherein the size of the second knowledge field array is QxP;

optimizing the behavior description knowledge fields corresponding to the P first session sets respectively according to the second knowledge field arrays and the session set arrays corresponding to the behavior description knowledge fields of the first session sets to obtain the Q second session sets and the behavior description knowledge fields corresponding to each second session set;

the determining a personalized behavior tag of the user behavior log according to the behavior description knowledge field corresponding to each second session set includes:

taking the second session sets as the first session sets, and updating the value of P to the number of those first session sets;

repeatedly executing the step of determining the feature difference priority corresponding to each first session set according to the behavior description knowledge field of the first session set;

when the number of iterations meets a preset condition, determining a label attribution confidence result corresponding to the user behavior log according to the behavior description knowledge fields corresponding to the second session sets obtained in the last iteration;

7. The method of claim 6, wherein the method is implemented by a user classification model, and the user classification model is trained by the following steps:

acquiring a training user behavior log;

inputting the training user behavior log into a user classification model to be trained, wherein the user classification model comprises a plurality of optimization modules;

8. The method of claim 1, wherein the determining the estimated cost value of the user classification model to be trained according to the first estimated behavior description knowledge field, the second estimated behavior description knowledge field, the first estimated label attribution result, and the second estimated label attribution result comprises:

determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module;

wherein the determining a first cost value corresponding to the optimization module according to the first and second estimated behavior description knowledge fields corresponding to the optimization module comprises:

performing field compression on a knowledge field array corresponding to the third estimated dimension-reduced knowledge field, performing turnover processing on the knowledge field array after the field compression to obtain a fourth knowledge field array, and determining the M restored estimated behavior description knowledge fields according to the fourth knowledge field array, wherein the array size of the fourth knowledge field array includes M fields, the field rank in the array size of the fourth knowledge field array is the field rank corresponding to a behavior description knowledge field of a first estimated session set, and M is the number of first estimated session sets corresponding to the training user behavior log;

determining a first cost value corresponding to the optimization module according to the M restored estimated behavior description knowledge fields and the second estimated behavior description knowledge field;

wherein the determining the M restored estimated behavior description knowledge fields according to the fourth knowledge field array comprises:

determining the M restored estimated behavior description knowledge fields according to the fifth knowledge field array and the fourth knowledge field array;

wherein the determining a first cost value corresponding to the optimization module according to the M restored estimated behavior description knowledge fields and the second estimated behavior description knowledge field comprises:

determining a second cost value according to the third estimated label attribution result and the fourth estimated label attribution result;

9. The method of claim 8, wherein the determining a second cost value according to the third estimated label attribution result and the fourth estimated label attribution result comprises:

determining a fourth cost value according to the fourth estimated label attribution result and a second reference label attribution result corresponding to the fourth estimated label attribution result;

determining the second cost value according to the third cost value and the fourth cost value;

wherein the determining the estimated cost value according to the first cost value and the second cost value corresponding to each optimization module comprises:

determining the estimated cost value according to the first cost value, the second cost value and the confidence estimated cost value corresponding to each optimization module;

determining a third cost value of the user classification model to be trained according to the fifth estimated label attribution result and the first estimated label attribution result;

10. A user classification system comprising an interconnected processor and memory, in which is stored a computer program which, when executed by the processor, carries out the method of any one of claims 1 to 9.