CN113836370A - User group classification method and device, storage medium and computer equipment - Google Patents

User group classification method and device, storage medium and computer equipment

Info

Publication number
CN113836370A
Authority
CN
China
Prior art keywords
behavior
user
sequence
instruction
frequent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111412279.0A
Other languages
Chinese (zh)
Other versions
CN113836370B (en)
Inventor
陶景龙
王启凡
魏国富
殷钱安
余贤喆
周晓勇
梁淑云
刘胜
马影
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd filed Critical Information and Data Security Solutions Co Ltd
Priority to CN202111412279.0A priority Critical patent/CN113836370B/en
Priority to PCT/CN2021/135899 priority patent/WO2023092646A1/en
Publication of CN113836370A publication Critical patent/CN113836370A/en
Application granted granted Critical
Publication of CN113836370B publication Critical patent/CN113836370B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user group classification method and device, a storage medium and computer equipment. The method comprises the following steps: acquiring behavior data of a user group and preprocessing it to obtain a behavior sequence data set that takes the user name of each user as the main object; extracting the frequent behavior instruction combinations in the behavior sequence data set with an association analysis algorithm and counting their frequencies to obtain a frequent instruction combination feature table; calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set with a sequence alignment algorithm to obtain a sequence similarity feature table; counting the frequency of the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table; and classifying the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table with a semi-supervised classification algorithm to obtain user groups of different categories, thereby improving classification efficiency.

Description

User group classification method and device, storage medium and computer equipment
Technical Field
The invention relates to the technical field of big data processing, in particular to a user group classification method, a user group classification device, a storage medium and computer equipment.
Background
User group classification is an important step in the development of any industry whose operations are centered on users. For platforms with very large user populations, such as e-commerce, public resource management and information security management, grouping independent user objects is both difficult and important. Compared with the traditional approach of building features from user attributes, classifying user groups with the users' operation behaviors as the original features is more novel and more effective. Once the user group has been divided according to operation behaviors, the classification data can be applied to downstream work such as accurate recommendation, user activation and retention, and group management.
In the prior art, most group classification methods based on user operation behaviors add tags to a data set according to business logic, using attributes such as the basic attributes of the operation behaviors, user behavior tracks and user social connections, and then classify the user groups with a supervised machine learning algorithm. However, such methods cannot be applied to scenarios in which users have no social relationships and user operations have no behavior tracks, and adding tags to a user group is labor-intensive and inefficient. Therefore, although such methods appear effective, their practical application scenarios are very limited, the required labor cost is high, and the model training efficiency is low.
Disclosure of Invention
In view of this, the present application provides a user group classification method, device, storage medium and computer device, and mainly aims to solve the technical problems in the prior art that the application scenario of the user group classification method is limited, the required labor cost is high, and the model training efficiency is low.
According to a first aspect of the present invention, there is provided a method for classifying a user group, the method comprising:
acquiring behavior data of a user group, and preprocessing the behavior data of the user group to obtain a behavior sequence dataset which takes the user name of each user as a main object, wherein each user name corresponds to a behavior sequence, and each behavior sequence comprises at least one behavior instruction;
extracting the frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm and counting their frequencies to obtain a frequent instruction combination feature table;
calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set through a sequence alignment algorithm to obtain a sequence similarity feature table;
carrying out frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table;
and classifying and analyzing the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by adopting a semi-supervised classification algorithm to obtain user groups with different categories.
According to a second aspect of the present invention, there is provided an apparatus for classifying a user group, the apparatus comprising:
the user data acquisition module is used for acquiring behavior data of a user group and preprocessing the behavior data of the user group to obtain a behavior sequence data set taking the user name of each user as a main object, wherein each user name corresponds to one behavior sequence, and each behavior sequence comprises at least one behavior instruction;
the frequent item feature extraction module is used for extracting the frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm and counting their frequencies to obtain a frequent instruction combination feature table;
the similarity feature extraction module is used for calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set through a sequence alignment algorithm to obtain a sequence similarity feature table;
the instruction frequency characteristic extraction module is used for carrying out frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency characteristic table;
and the user group classification module is used for classifying and analyzing the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by adopting a semi-supervised classification algorithm to obtain user groups with different categories.
According to a third aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of classifying a user population.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method of classifying a user group when executing the program.
According to the user group classification method and device, storage medium and computer equipment provided by the invention, the behavior habit attributes shared among users are mined by counting the frequency of each operation behavior and of each frequent operation behavior combination in the user group. By calculating the sequence matching scores and inter-sequence similarity scores among the behavior sequences in the user group, the strength of the potential connection between each user and the user group can be quantified, which compensates for the missing behavior relation attributes among users who have no social relationships. Because the method mines the behavior habit attributes, behavior relation attributes and potential connection attributes of the users in this way, it can be applied to scenarios in which users have no social relationships and user operations have no behavior tracks, which expands the application range of user group classification. In addition, by adopting a semi-supervised classification algorithm, the method reduces the workload of adding classification labels to the user group and improves both the training efficiency of the user group classification model and the classification efficiency of the user group.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the present application are more readily apparent, specific embodiments of the present application are described below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart illustrating a method for classifying a user group according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a sample behavior sequence dataset according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a sample frequent instruction combination feature table according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a sample sequence similarity feature table provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a sample behavior instruction frequency characteristic table according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a sample classification result of a user group according to an embodiment of the present invention;
FIG. 7 is a scatter plot diagram illustrating a classification result of a user group according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a method for classifying user groups according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram illustrating a classification apparatus for a user group according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
In one embodiment, as shown in FIG. 1, a method for classifying a user group is provided. The method is described by taking its application to a computer device such as a server as an example, and includes the following steps:
101. Acquiring behavior data of the user group, and preprocessing the behavior data of the user group to obtain a behavior sequence data set taking the user name of each user as the main object.
The behavior data of the user group refers to data about the operation behaviors of a plurality of users (usually a large number of users) in a system. It is obtained by a system or platform whose operations are centered on users, by parsing registration information, log information and the like. An operation behavior refers to an operation instruction triggered by the user at a given operation time point, for example logging in, browsing the main page, browsing a sub-page, interacting with a component in a page, or placing an order for a commodity. In this embodiment, to facilitate data processing, each operation instruction triggered by the user may be converted into an instruction code; for example, the "login" instruction may be converted into the instruction code "h" and the "browse home page" instruction into the instruction code "f".
Specifically, the computer device may obtain the behavior data of the user group to be processed through the data management center of a system or platform. The user group mainly refers to multiple users registered on the same system or platform, and the behavior data of the user group mainly includes the user name of each user, the behavior instructions of each user and the operation time of each behavior instruction. The computer device may then perform preprocessing operations such as data cleaning and data processing on the obtained behavior data, encode each behavior instruction, and sort the encoded behavior instructions by operation time to form the behavior sequence of each user. Finally, the computer device may list the behavior sequences of all users in the user group in a data table that takes the user name of each user as the main object, forming the behavior sequence data set of the user group.
In this embodiment, the behavior sequence data set includes at least two fields: the user name and the behavior sequence corresponding to the user name. Because this embodiment classifies the user group with a semi-supervised classification algorithm, the classification labels of the user group may be incomplete; that is, some users in the behavior sequence data set have a classification label and the others do not.
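The preprocessing in step 101 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `CODE_BOOK` dictionary (the source only gives "login" as "h" and "browse home page" as "f"; the "place order" mapping is invented), the `(user_name, instruction, timestamp)` record layout, and the choice to drop unknown instructions as a stand-in for data cleaning. The field names `account`, `opt_seq` and `label` follow the sample described for FIG. 2.

```python
from collections import defaultdict

# Hypothetical character dictionary. Only "login" -> "h" and
# "browse home page" -> "f" come from the text; "place order" is invented.
CODE_BOOK = {"login": "h", "browse home page": "f", "place order": "B"}


def build_behavior_sequences(records, labels=None):
    """Turn (user_name, instruction, timestamp) records into one row per user.

    Users missing from `labels` get label None (no classification label).
    """
    labels = labels or {}
    per_user = defaultdict(list)
    for user, instruction, timestamp in records:
        code = CODE_BOOK.get(instruction)
        if code is None:
            # Assumed cleaning rule: drop instructions that cannot be encoded.
            continue
        per_user[user].append((timestamp, code))

    dataset = []
    for user in sorted(per_user):
        # Order each user's encoded instructions by operation time.
        events = sorted(per_user[user], key=lambda event: event[0])
        sequence = "".join(code for _, code in events)
        dataset.append(
            {"account": user, "opt_seq": sequence, "label": labels.get(user)}
        )
    return dataset


records = [
    ("17185", "place order", 3),
    ("17185", "login", 1),
    ("17185", "login", 2),
    ("17187", "login", 1),
    ("17187", "browse home page", 2),
    ("17187", "unknown click", 3),
]
dataset = build_behavior_sequences(records, labels={"17185": 1})
```

Representing a missing label as `None` is one possible convention; the sample data in the source uses a special marker instead.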
102. Extracting the frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm and counting their frequencies to obtain a frequent instruction combination feature table.
The association analysis algorithm refers to an unsupervised learning algorithm for finding out some association between data in a data set, and the algorithm can find out the relationship between data and data in large-scale data, such as finding out a frequent item set (a set of items that often appear together) and an association rule (suggesting that a strong relationship may exist between two items) in the data set, and the like, wherein common association analysis algorithms mainly include an Apriori algorithm, an FP-growth algorithm, and the like.
Specifically, the computer device may find the frequent item sets in the behavior sequence data set by using an association analysis algorithm such as the Apriori algorithm or the FP-growth algorithm, then count the frequency of each frequent item in the behavior sequence data set, and finally form a frequent instruction combination feature table that uses the user name and the frequent items as field names. In this embodiment, a frequent item is specifically a frequent behavior instruction combination, that is, a set of behavior instructions that frequently appear together in the behavior sequence data set. For example, a "login" instruction is typically followed directly by a "browse home page" instruction; if the "login" instruction is encoded as "h" and the "browse home page" instruction as "f", then "hf" is a frequent behavior instruction combination. The association analysis algorithm finds all frequent behavior instruction combinations in the behavior sequence data set, and counting the frequency of each combination in each behavior sequence then yields the frequent instruction combination feature table. Through the frequency of the frequent behavior instruction combinations, this embodiment can mine the daily behavior habits of each user and the overall daily behavior trend of the user group, which provides a strong basis for classifying user groups that have no social relationships. It should be noted that a frequent behavior instruction combination consists of at least two behavior instructions that appear together, and that different combinations may have different lengths.
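As an illustration of frequent item set mining, the following is a minimal Apriori sketch. It is a simplified version under stated assumptions, not the patent's implementation: each behavior sequence is reduced to the set of instruction codes it contains (so order and repetition are ignored), and `min_support` is a hypothetical threshold expressed as a fraction of users.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support.

    Each transaction is treated as a set of instruction codes.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= s for s in sets) / n

    current = {frozenset([item]) for s in sets for item in s}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


sequences = ["hfb", "hfhf", "hb", "fb"]
frequent = apriori(sequences, min_support=0.5)
# Keep only combinations of at least two instructions, as the text requires.
combos = {"".join(sorted(c)) for c in frequent if len(c) >= 2}
```

With these four toy sequences, every pair of instructions reaches the 0.5 support threshold, while the three-instruction set does not.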
103. Calculating sequence matching scores and inter-sequence similarity scores among the behavior sequences in the behavior sequence data set through a sequence alignment algorithm to obtain a sequence similarity feature table.
A sequence alignment algorithm is an algorithm for mining the similarity between every two behavior sequences in a data set. Generally, a sequence alignment algorithm describes the relationship between sequences with two indexes: identity and similarity. Sequence alignment algorithms are mainly divided into global and local sequence alignment algorithms; common ones include the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, the FASTA algorithm and the BLAST algorithm.
Specifically, the computer device may calculate, by a global sequence alignment algorithm and/or a local sequence alignment algorithm, the identity and the similarity between each behavior sequence in the behavior sequence data set and the other behavior sequences, where the identity may be expressed as a sequence matching score array and the similarity as an inter-sequence similarity score array. The computer device may then calculate the maximum value, minimum value, average value, standard deviation and variance of each sequence matching score array and each inter-sequence similarity score array, quantifying the similarity features between sequences to form a sequence similarity feature table. Through the sequence similarity features, this embodiment can mine the behavior relations and potential connections among user behaviors, which provides another strong basis for classifying user groups that have no social relationships and no behavior tracks. It should be noted that, in this embodiment, the global or the local sequence alignment algorithm may be used alone to calculate the identity and the similarity between each behavior sequence and the other behavior sequences, or both may be used together to calculate the global identity and similarity and the local identity and similarity respectively, so as to improve the accuracy of sequence alignment.
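A global alignment score and a percent identity can be computed roughly as follows. This is a minimal Needleman-Wunsch sketch; the scoring values (match +1, mismatch -1, linear gap -1) and the definition of percent identity as identical columns divided by alignment length are assumptions, since the source does not specify them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (score, percent_identity)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two symbols
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back one optimal alignment to count identical columns.
    i, j, matches, length = len(a), len(b), 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1

    identity = 100.0 * matches / length if length else 0.0
    return score[len(a)][len(b)], identity


same_score, same_identity = needleman_wunsch("hhB", "hhB")
gap_score, gap_identity = needleman_wunsch("hfb", "hb")
```

When several optimal alignments exist, the traceback picks one of them, so the percent identity can depend on the tie-breaking order used here.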
104. Counting the frequency of the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table.
Specifically, the computer device may find each behavior instruction in the behavior sequence data set by combining data processing manners such as deduplication, and then count the frequency of occurrence of each behavior instruction in the behavior sequence data set to form a behavior instruction frequency feature table with the user name and the behavior instruction as field names. In the embodiment, by the characteristic of the frequency of the behavior instruction, the behavior inertia of each user and the overall behavior inertia of the user group can be mined, so that a powerful basis is provided for the classification of the user group without social relationship.
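The frequency statistics in step 104 amount to a per-user count over a deduplicated instruction vocabulary. The sketch below assumes the data set is held as a plain mapping from user name to behavior sequence string; the function name is hypothetical.

```python
from collections import Counter


def instruction_frequency_table(dataset):
    """Build the instruction vocabulary and a per-user count for each instruction."""
    # Merge and deduplicate all instructions that occur in any sequence.
    vocabulary = sorted({code for seq in dataset.values() for code in seq})
    table = {}
    for user, seq in dataset.items():
        counts = Counter(seq)
        # Instructions the user never issued are kept as explicit zeros.
        table[user] = {code: counts.get(code, 0) for code in vocabulary}
    return vocabulary, table


vocabulary, freq_table = instruction_frequency_table(
    {"17185": "hhB", "17187": "hbfh"}
)
```

Keeping explicit zeros gives every user the same columns, which is what a downstream classifier expects from a feature table.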
105. Classifying the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by adopting a semi-supervised classification algorithm to obtain user groups of different categories.
A semi-supervised classification algorithm obtains an initial model from labeled training data, predicts the unlabeled training data with the initial model, and then iteratively retrains the model according to the prediction results to obtain a data classification result. The algorithm proceeds as follows: first, a model is trained on the existing labeled training data and used to predict the unlabeled data; next, the unlabeled samples that the model predicts with higher confidence are added to the training set together with their predicted labels; if the resulting training set and model meet the preset requirements, they are output, and otherwise the model is retrained until the requirements are met. Common semi-supervised classification algorithms include the semi-supervised Support Vector Machine (SVM) and the semi-supervised Logistic Regression model (LR).
Specifically, the computer device may first train on the frequent instruction combination features, the sequence similarity features and the behavior instruction frequency features of the users in the user group who have classification labels, obtaining an initial classification model. It then uses the initial model to predict classification labels for the users who have none, and finally retrains the initial model on the behavior data of all users together with their classification labels. This process is iterated until the model parameters and the classification results meet the preset requirements, yielding the user group classification model and the user groups of different categories. By adopting a semi-supervised classification algorithm, this embodiment removes a considerable part of the workload of adding classification labels to user data, which improves the training efficiency of the user classification model and reduces labor cost.
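The iterative procedure described above can be sketched as a self-training loop. To stay dependency-free, this example swaps in a nearest-centroid classifier in place of the semi-supervised SVM or logistic regression that the text mentions; the confidence measure (a distance ratio), the `threshold` value and the stopping rule are all assumptions, and at least one labeled user is assumed to exist.

```python
import math


def fit_centroids(X, y):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence); confidence is 0.5 on a tie, near 1 when clear."""
    dists = sorted((math.dist(x, centroid), label) for label, centroid in model.items())
    best_dist, best_label = dists[0]
    if len(dists) == 1:
        return best_label, 1.0
    total = best_dist + dists[1][0]
    return best_label, (1.0 if total == 0 else dists[1][0] / total)


def self_train(X, y, threshold=0.75, max_iter=10):
    """Self-training: y holds a class label per user, or None if unlabeled."""
    y = list(y)
    for _ in range(max_iter):
        labeled = [(x, label) for x, label in zip(X, y) if label is not None]
        model = fit_centroids([x for x, _ in labeled], [l for _, l in labeled])
        added = 0
        for i, x in enumerate(X):
            if y[i] is None:
                label, confidence = predict(model, x)
                if confidence >= threshold:  # accept only confident pseudo-labels
                    y[i] = label
                    added += 1
        if added == 0:  # nothing new to learn from: stop iterating
            break
    # Refit once more so the returned model reflects every accepted label.
    labeled = [(x, label) for x, label in zip(X, y) if label is not None]
    model = fit_centroids([x for x, _ in labeled], [l for _, l in labeled])
    return model, y


X = [[0, 0], [0, 1], [10, 10], [10, 11], [1, 0], [9, 10], [5, 5.5]]
y = [1, 1, 2, 2, None, None, None]
model, final_labels = self_train(X, y)
```

In this toy run the two points near a class centre are pseudo-labeled in the first round, while the point midway between the classes never reaches the confidence threshold and stays unlabeled.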
It is understood that, after the behavior sequence data set with the user name as the main object is obtained, the order of generating the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table based on the behavior sequence data set may be adjusted according to the actual situation, that is, the order of the step 102, the step 103 and the step 104 may be adjusted according to the actual need, and the present embodiment is not limited specifically herein.
According to the user group classification method provided by this embodiment, the behavior habit attributes shared among users are mined by counting the frequency of each operation behavior and of each frequent operation behavior combination in the user group. By calculating the sequence matching scores and inter-sequence similarity scores among the behavior sequences in the user group, the strength of the potential connection between each user and the user group is quantified, which compensates for the missing behavior relation attributes among users who have no social relationships. Because the method mines the behavior habit attributes, behavior relation attributes and potential connection attributes of the users in this way, it can be applied to scenarios in which users have no social relationships and user operations have no behavior tracks, which expands the application range of user group classification. In addition, by adopting a semi-supervised classification algorithm, the method reduces the workload of adding classification labels to the user group and improves both the training efficiency of the user group classification model and the classification efficiency of the user group.
In an embodiment, step 101 may further include the following steps. First, the behavior data of the user group is acquired, where the behavior data includes the user name of each user, at least one behavior instruction of each user and the operation time of each behavior instruction. Next, the behavior instructions of each user are encoded with a preset character dictionary, and the encoded behavior instructions are sorted by operation time to obtain the behavior sequence of each user. Finally, a behavior sequence data set taking the user name of each user as the main object is generated from the user names and the behavior sequences. In this embodiment, the behavior data of the user group further includes the classification labels of some users; that is, some users in the user group have a classification label and the others do not, and the behavior sequence data set accordingly also contains a classification label field. For example, FIG. 2 shows a sample of a behavior sequence data set. As shown in FIG. 2, account refers to the user name, such as "17185" and "17187"; opt_seq refers to the behavior sequence, such as "hhB" and "hbfhbbhbbbbbbbhbbf", in which each letter (such as "h" or "B") denotes a behavior instruction and the instructions are arranged in time order; and label refers to the classification label, which is a category value such as "1" or "2" for users who have a classification label and a special marker such as "NAN" for users who do not.
According to the embodiment, the behavior data of the user group is arranged into the behavior sequence data set, so that feature extraction and classification analysis can be conveniently performed on the behavior data of the user group subsequently, and the data processing efficiency is improved.
In an embodiment, the step 102 may further include the following steps: firstly, extracting frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm to obtain a frequent instruction combination list containing a plurality of frequent behavior instruction combinations, and then counting the frequency of each frequent behavior instruction combination in the frequent instruction combination list in the behavior sequence data set to obtain a frequent instruction combination feature table taking user names and frequent behavior instruction combinations as field names. In this embodiment, the computer device may specifically use the FP-Growth algorithm to extract frequent instruction combinations of all behavior sequences in the behavior sequence data set, so as to obtain a list of frequent instruction combinations with different lengths. For example, fig. 3 shows a sample schematic diagram of a frequent instruction combination feature table, as shown in fig. 3, account refers to a user name, such as "17744.0", "17763.0", etc., other field names refer to frequent behavior instruction combinations, such as "FD", "AC", etc., and numbers under each frequent behavior instruction combination refer to the frequency of occurrence of the frequent behavior instruction combination, such as "8", "16", "9", etc. In this embodiment, the frequent instruction combination list may provide a feature of frequent behavior instruction combination frequency, and through the feature, the daily behavior habit of each user and the overall daily behavior trend of the user group may be mined, so as to provide a basis for the classification accuracy of the user group without social relationship.
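Once a list of frequent instruction combinations is available, the per-user feature table can be filled by counting each combination in each behavior sequence. The sketch below assumes a combination is counted as a contiguous substring with overlapping occurrences included; the source does not say how occurrences are counted, so this is only one plausible reading. The function names and the toy sequences are hypothetical.

```python
def count_occurrences(sequence, combo):
    """Count occurrences of combo as a contiguous substring, overlaps included."""
    count = 0
    start = sequence.find(combo)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also counted.
        start = sequence.find(combo, start + 1)
    return count


def frequent_combination_table(dataset, combos):
    """One row per user name, one column per frequent instruction combination."""
    return {
        user: {combo: count_occurrences(seq, combo) for combo in combos}
        for user, seq in dataset.items()
    }


combo_table = frequent_combination_table(
    {"17744": "FDFDAC", "17763": "ACAC"}, combos=["FD", "AC"]
)
```

Python's built-in `str.count` would skip overlapping matches, which is why the loop advances one position at a time.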
In an embodiment, the step 103 may further include the following steps: firstly, calculating a global sequence matching score array and a global sequence similarity score array among behavior sequences in a behavior sequence data set by a global sequence comparison algorithm, respectively calculating a maximum value, a minimum value, an average value, a standard deviation and a variance of the global sequence matching score array and the global sequence similarity score array to obtain a global sequence similarity feature table, then calculating a local sequence matching score array and a local sequence similarity score array among the behavior sequences in the behavior sequence data set by a local sequence comparison algorithm, respectively calculating a maximum value, a minimum value, an average value, a standard deviation and a variance of the local sequence matching score array and the local sequence similarity score array to obtain a local sequence similarity feature table, and finally taking a user name of each user as an associated field, and performing association and combination on the global sequence similarity feature table and the local sequence similarity feature table to obtain a sequence similarity feature table. 
In this embodiment, the computer device may specifically use the Needleman-Wunsch global sequence alignment algorithm and the Smith-Waterman local sequence alignment algorithm to calculate, between the behavior sequence of each user and the behavior sequences of all other users, a global score (sequence matching score) array, a global percent identity (percentage of similarity between sequences) array, a local score array, and a local percent identity array. The maximum value, the minimum value, the average value, the standard deviation and the variance of each array are then calculated to output a global sequence similarity feature table and a local sequence similarity feature table, and the two tables are finally associated and merged through the user name field to obtain the sequence similarity feature table. For example, fig. 4 shows a sample sequence similarity feature table. In fig. 4, account refers to the user name, such as "17744.0" and "17763.0", and the other field names refer to the maximum value, the minimum value, the average value, the standard deviation and the variance of each array, such as "Ioc_score_min" and "Ioc_score_std". In this embodiment, the sequence similarity feature table provides sequence similarity as a feature. Through this feature, the behavior relationships and potential connections between users can be mined, which improves the classification accuracy for user groups that have neither social relationships nor behavior tracks.
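As a minimal sketch of this embodiment, the following code computes the Needleman-Wunsch global score and the Smith-Waterman local score between each user's sequence and every other user's sequence, and then summarizes each score array. Several details are assumptions not taken from the source: the scoring scheme (match +1, mismatch -1, linear gap -1), the use of population rather than sample standard deviation and variance, and the hypothetical column prefixes glo_score and loc_score. The percent identity arrays are omitted for brevity, and at least two users are required.

```python
from statistics import mean, pstdev, pvariance

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scoring scheme, not from the source


def global_score(a, b):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def local_score(a, b):
    """Smith-Waterman local alignment score (best-scoring pair of substrings)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(0, diag, prev[j] + GAP, curr[j - 1] + GAP))
            best = max(best, curr[j])
        prev = curr
    return best


def summarize(prefix, values):
    """Maximum, minimum, average, standard deviation and variance of one array."""
    return {
        f"{prefix}_max": max(values),
        f"{prefix}_min": min(values),
        f"{prefix}_mean": mean(values),
        f"{prefix}_std": pstdev(values),
        f"{prefix}_var": pvariance(values),
    }


def similarity_feature_table(sequences):
    """One row per user, comparing that user's sequence with all other users."""
    rows = []
    for user, seq in sequences.items():
        others = [s for u, s in sequences.items() if u != user]
        row = {"account": user}
        row.update(summarize("glo_score", [global_score(seq, o) for o in others]))
        row.update(summarize("loc_score", [local_score(seq, o) for o in others]))
        rows.append(row)
    return rows


# Hypothetical encoded behavior sequences keyed by user name.
rows = similarity_feature_table({"u1": "ABC", "u2": "ABD", "u3": "ABC"})
```

Because both tables are built in one pass keyed by the same user name, the final association and merging step is implicit here; with separately computed tables it would be a join on the account field.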
In an embodiment, the step 104 may specifically include the following steps: first, all behavior instructions in the behavior sequence data set are merged and deduplicated to obtain a behavior instruction list containing all behavior instructions; then, the frequency with which each behavior instruction in the list appears in the behavior sequence data set is counted to obtain a behavior instruction frequency feature table that takes the user name and the behavior instructions as field names. For example, fig. 5 shows a sample behavior instruction frequency feature table. In fig. 5, account refers to the user name, such as "17744.0" and "17763.0"; the other field names refer to behavior instructions, such as "A", "B" and "C"; and the number under each behavior instruction refers to how often that instruction occurs, such as "0", "4" and "0". In this embodiment, the behavior instruction frequency feature table provides behavior instruction frequency as a feature. Through this feature, the behavior inertia of each user and the overall behavior inertia of the user group can be mined, which further improves the classification accuracy for user groups that have no social relationships.
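The merge, deduplicate and count procedure of this embodiment can be sketched as follows. The function name and the sample sequences are hypothetical, and the sketch assumes that each behavior instruction has already been encoded as a single character.

```python
from collections import Counter


def instruction_frequency_table(sequences):
    """One row per user: the account plus the count of every known instruction."""
    # Merge all sequences and deduplicate to get the behavior instruction list.
    instructions = sorted(set("".join(sequences.values())))
    rows = []
    for user, seq in sequences.items():
        counts = Counter(seq)  # missing instructions count as 0
        rows.append({"account": user, **{ins: counts[ins] for ins in instructions}})
    return rows


# Hypothetical encoded behavior sequences keyed by user name.
freq_rows = instruction_frequency_table({"17744.0": "BBBBD", "17763.0": "ACA"})
```

Every row has the same columns, so an instruction that a user never issued appears with a count of 0, as in the sample values quoted from fig. 5.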
In an embodiment, the step 105 may specifically include the following steps: first, with the user name of each user as the association field, the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table are associated and merged to obtain a feature integration data table; then, the feature integration data table is classified by a semi-supervised support vector machine algorithm to obtain a user group classification data table, from which user groups of different categories are obtained. For example, fig. 6 shows a sample user group classification data table. In fig. 6, account refers to the user name, the other field names refer to features such as behavior instructions and frequent behavior instruction combinations, and label refers to the classification label. Further, the classification result can be observed more intuitively by plotting the user group classification data as a scatter diagram; fig. 7 shows such a scatter diagram of the classification result. In this embodiment, adopting a semi-supervised classification algorithm removes a considerable part of the workload of adding classification labels to the user data, which improves the training efficiency of the user classification model and reduces labor cost.
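The association and merging of the three feature tables on the user name can be sketched as follows. The function name is hypothetical, and two points are assumptions: the sketch uses inner-join semantics (a user missing from any table is dropped), and it assumes that feature column names do not collide across tables.

```python
def merge_on_account(*tables):
    """Join row lists on the "account" field, keeping users present in every table."""
    common = set.intersection(*({row["account"] for row in t} for t in tables))
    merged = {}
    for table in tables:
        for row in table:
            if row["account"] in common:
                merged.setdefault(row["account"], {}).update(row)
    return list(merged.values())


# Hypothetical fragments of the three feature tables.
d0 = [{"account": "17744.0", "AC": 1}, {"account": "17763.0", "AC": 3}]
d1 = [{"account": "17763.0", "glo_score_max": 3}, {"account": "17744.0", "glo_score_max": 1}]
d2 = [
    {"account": "17744.0", "A": 0, "B": 4},
    {"account": "17763.0", "A": 2, "B": 0},
    {"account": "17999.0", "A": 1, "B": 1},
]
integrated = merge_on_account(d0, d1, d2)
```

Whether unmatched users should instead be kept with missing values filled in is a design choice the source leaves open.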
In one embodiment, the user group comprises labeled users and unlabeled users, wherein the behavior data of the labeled users comprises a classification label. The step 105 may specifically include the following steps: first, a support vector machine model is trained according to the features of the labeled users in the feature integration data table and the classification labels of the labeled users to obtain an initial user classification model; then, the features of the unlabeled users in the feature integration data table are input into the initial user classification model to obtain classification labels for the unlabeled users; next, the initial user classification model is optimized according to the features of the unlabeled users in the feature integration data table and the classification labels obtained for them, so as to obtain a user classification model; finally, the features of all users of the user group in the feature integration data table are input into the user classification model to obtain user groups of different categories.
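The train, pseudo-label and optimize loop of this embodiment can be illustrated with the following simplified sketch. It is a single self-training pass around a linear support vector machine fitted by batch subgradient descent, not a full TSVM (which would also revise the pseudo-labels iteratively). The binary labels in {-1, +1}, the hyperparameters, the function names, the toy feature vectors, and the choice to "optimize" by retraining on the labeled plus pseudo-labeled data are all assumptions made for illustration.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM (hinge loss, L2 penalty) via batch subgradient descent; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [reg * wi for wi in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(model, X):
    """Classify each feature vector by the sign of the decision function."""
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def self_training_svm(X_labeled, y_labeled, X_unlabeled):
    """Initial model -> pseudo-labels for unlabeled users -> retrained model."""
    initial = train_linear_svm(X_labeled, y_labeled)
    pseudo = predict(initial, X_unlabeled)
    return train_linear_svm(X_labeled + X_unlabeled, y_labeled + pseudo)


# Hypothetical two-feature rows from the feature integration data table.
X_labeled = [[-2.0, -2.0], [-3.0, -1.0], [2.0, 2.0], [3.0, 1.0]]
y_labeled = [-1, -1, 1, 1]
X_unlabeled = [[-2.0, -3.0], [3.0, 3.0]]
model = self_training_svm(X_labeled, y_labeled, X_unlabeled)
groups = predict(model, X_labeled + X_unlabeled)
```

In practice the features would normally be scaled before training, and more than two categories would require a multi-class scheme such as one-versus-rest.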
Further, as a refinement and an extension of the specific implementation of the above embodiment, and in order to fully explain the implementation process of the embodiment, a method for classifying a user group is provided. As shown in fig. 8, the method includes the following steps:
step 1, acquiring behavior data of a user group, wherein the behavior data comprises the user name, the behavior instructions and the operation time of each behavior instruction of each user, together with incomplete group classification labels (only some users are labeled);
step 2, data cleaning and processing, which mainly comprises encoding the behavior sequences by using a preset character dictionary and generating a behavior sequence data set with the user name as the main object;
step 3, frequent item set feature statistics, namely calculating and counting the frequent behavior items in the behavior sequence data of all users through the FP-Growth algorithm, and using the results as feature fields to obtain a data table D0;
step 4, sequence similarity feature calculation, namely calculating sequence similarity over the behavior sequences of all users by using the Needleman-Wunsch algorithm and the Smith-Waterman algorithm, which are a global sequence comparison algorithm and a local sequence comparison algorithm respectively, each yielding a score (sequence matching score) array and a percentIdentity (percentage of similarity between sequences) array; calculating the maximum value, the minimum value, the average value, the standard deviation and the variance of each score array and each percentIdentity array; and outputting these statistics as feature columns to obtain a data table D1;
step 5, counting the occurrence frequency of each instruction in the behavior sequences of all main objects, and using the occurrence frequencies as feature fields to obtain a data table D2;
step 6, performing feature engineering on the feature field data tables D0, D1 and D2, and arranging them into a model input format DX;
step 7, obtaining the user group classification by using a TSVM semi-supervised classification algorithm.
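Step 4 above pairs each score with a percentIdentity value. The source does not define how percentIdentity is computed, so the following sketch assumes one common definition: the number of identical aligned positions divided by the length of the alignment (including gap columns), taken from one optimal Needleman-Wunsch alignment. The function name and the scoring scheme (match +1, mismatch -1, gap -1) are likewise assumptions. When several optimal alignments exist, the value can depend on which one the traceback follows.

```python
def global_percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of one optimal global alignment of a and b (assumed definition)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


identity = global_percent_identity("ABC", "AC")  # aligns as ABC / A-C
```

The local (Smith-Waterman) counterpart would start the traceback at the highest-scoring cell and stop at the first cell with score 0.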
According to the user group classification method provided by this embodiment, global and local sequence similarity comparisons are performed on the behavior data of the user group and the results are processed into statistical features. In this way, the potential connection between each user and all other users can be quantified, the missing behavior relationship attributes between users who have no social connection can be compensated for, and potential connection attributes between users are added. By performing frequency statistics on the behavior instructions and on the frequent behavior instruction combinations of the user group, the behavior habit attributes shared within the user group can be mined, which improves the accuracy of user group classification. Finally, using a semi-supervised classification algorithm reduces the work of manually adding labels, which improves the degree of automation and the operating efficiency of user group classification.
Further, as a specific implementation of the method shown in fig. 1 to fig. 8, the present embodiment provides a user group classification apparatus. As shown in fig. 9, the apparatus includes: a user data acquisition module 21, a frequent item feature extraction module 22, a similarity feature extraction module 23, an instruction frequency feature extraction module 24 and a user group classification module 25.
The user data acquisition module 21 may be configured to obtain behavior data of a user group, and pre-process the behavior data of the user group to obtain a behavior sequence data set in which the user name of each user is a main object, where each user name corresponds to one behavior sequence, and each behavior sequence includes at least one behavior instruction;
the frequent item feature extraction module 22 is configured to extract the frequent behavior instruction combinations in the behavior sequence data set and perform frequency statistics on them by using an association analysis algorithm to obtain a frequent instruction combination feature table;
the similarity feature extraction module 23 is configured to calculate, through a sequence comparison algorithm, a sequence matching score and an inter-sequence similarity score between behavior sequences in the behavior sequence data set to obtain a sequence similarity feature table;
the instruction frequency feature extraction module 24 is configured to perform frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table;
the user group classification module 25 may be configured to perform classification analysis on the frequent instruction combination feature table, the sequence similarity feature table, and the behavior instruction frequency feature table by using a semi-supervised classification algorithm, so as to obtain user groups with different categories.
In a specific application scenario, the user data acquisition module 21 is specifically configured to obtain behavior data of a user group, where the behavior data of the user group includes a user name of each user, at least one behavior instruction of each user, and an operation time of each behavior instruction; encode the behavior instructions of each user by using a preset character dictionary; sort the encoded behavior instructions according to the operation time of the behavior instructions to obtain a behavior sequence of each user; and generate a behavior sequence data set taking the user name of each user as a main object according to the user name of each user and the behavior sequence of each user.
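The encode, sort and group operations of the user data acquisition module can be sketched as follows. The function name, the instruction names ("login", "query", "download"), the dictionary contents and the timestamp format are hypothetical; the sketch assumes that each behavior instruction maps to one character and that operation times sort correctly as given (here, ISO-style strings).

```python
from collections import defaultdict


def build_sequences(records, dictionary):
    """Build one encoded behavior sequence per user from raw behavior records.

    records: iterable of (user_name, behavior_instruction, operation_time).
    dictionary: preset mapping from behavior instruction to a single character.
    """
    per_user = defaultdict(list)
    for user, instruction, op_time in records:
        per_user[user].append((op_time, dictionary[instruction]))
    return {
        # Sort each user's encoded instructions by operation time only.
        user: "".join(code for _, code in sorted(events, key=lambda e: e[0]))
        for user, events in per_user.items()
    }


# Hypothetical preset character dictionary and raw behavior records.
dictionary = {"login": "A", "query": "B", "download": "C"}
records = [
    ("17744.0", "login", "2021-11-01 09:00:00"),
    ("17744.0", "download", "2021-11-01 09:05:00"),
    ("17763.0", "query", "2021-11-01 08:30:00"),
    ("17744.0", "query", "2021-11-01 08:59:00"),
]
dataset = build_sequences(records, dictionary)
```

The resulting mapping from user name to encoded string is the kind of behavior sequence data set that the feature extraction modules consume.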
In a specific application scenario, the frequent item feature extraction module 22 is specifically configured to extract a frequent behavior instruction combination in the behavior sequence data set by using an association analysis algorithm, so as to obtain a frequent instruction combination list including a plurality of frequent behavior instruction combinations; and counting the frequency of each frequent behavior instruction combination in the frequent instruction combination list in the behavior sequence data set to obtain a frequent instruction combination feature table taking the user name and the frequent behavior instruction combination as field names.
In a specific application scenario, the similarity feature extraction module 23 is specifically configured to calculate a global sequence matching score array and a global inter-sequence similarity score array between behavior sequences in the behavior sequence data set by using a global sequence comparison algorithm; calculate the maximum value, the minimum value, the average value, the standard deviation and the variance of each of the global sequence matching score array and the global inter-sequence similarity score array to obtain a global sequence similarity feature table; calculate a local sequence matching score array and a local inter-sequence similarity score array between behavior sequences in the behavior sequence data set by using a local sequence comparison algorithm; calculate the maximum value, the minimum value, the average value, the standard deviation and the variance of each of the local sequence matching score array and the local inter-sequence similarity score array to obtain a local sequence similarity feature table; and, with the user name of each user as the association field, associate and merge the global sequence similarity feature table and the local sequence similarity feature table to obtain a sequence similarity feature table.
In a specific application scenario, the instruction frequency feature extraction module 24 may be specifically configured to perform merging and deduplication processing on all behavior instructions in the behavior sequence data set to obtain a behavior instruction list including all behavior instructions; and count the frequency of each behavior instruction in the behavior instruction list in the behavior sequence data set to obtain a behavior instruction frequency feature table with the user name and the behavior instructions as field names.
In a specific application scenario, the user group classification module 25 is specifically configured to perform association and merging on the frequent instruction combination feature table, the sequence similarity feature table, and the behavior instruction frequency feature table by using a user name of each user as an association field to obtain a feature integration data table; and classifying and analyzing the feature integration data table through a semi-supervised support vector machine algorithm to obtain user groups with different categories.
In a specific application scenario, the user group comprises labeled users and unlabeled users, and the behavior data of the labeled users comprises a classification label; the user group classification module 25 is further specifically configured to train a support vector machine model according to the features of the labeled users in the feature integration data table and the classification labels of the labeled users, so as to obtain an initial user classification model; input the features of the unlabeled users in the feature integration data table into the initial user classification model to obtain classification labels for the unlabeled users; optimize the initial user classification model according to the features of the unlabeled users in the feature integration data table and the classification labels obtained for them to obtain a user classification model; and input the features of all users of the user group in the feature integration data table into the user classification model to obtain user groups of different categories.
It should be noted that other corresponding descriptions of the functional modules related to the classification device for a user group provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 8, and are not described herein again.
Based on the method shown in fig. 1 to 8, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the method for classifying a user group shown in fig. 1 to 8.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the implementation scenarios of the present application.
Based on the method shown in fig. 1 to 8 and the embodiment of the user group classification apparatus shown in fig. 9, in order to achieve the above object, the present embodiment further provides an entity device for classifying a user group, which may specifically be a personal computer, a server, a smart phone, a tablet computer, a smart watch, or another network device. The entity device includes a storage medium and a processor; the storage medium is configured to store a computer program, and the processor is configured to execute the computer program to implement the method shown in fig. 1 to 8.
Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), etc.
Those skilled in the art will appreciate that the structure of the entity device for classifying a user group provided in the present embodiment does not limit the entity device, which may include more or fewer components, combine some components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above entity device, and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication among the components within the storage medium, as well as communication with other hardware and software in the entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. In the technical solution of this embodiment, behavior data of a user group is acquired and preprocessed to obtain a behavior sequence data set with the user name of each user as a main object; frequent behavior instruction combinations in the behavior sequence data set are extracted and their frequencies counted by using an association analysis algorithm to obtain a frequent instruction combination feature table; sequence matching scores and inter-sequence similarity scores among the behavior sequences in the behavior sequence data set are calculated by a sequence comparison algorithm to obtain a sequence similarity feature table; frequency statistics are performed on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table; and the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table are classified by a semi-supervised classification algorithm to obtain user groups of different categories. Compared with the prior art, the method mines the behavior habit attributes, behavior relationship attributes and potential connection attributes of the users in the user group, so that it can be widely applied to scenarios in which there are no social relationships among users and no behavior tracks of user operations, which expands the application range of the user group classification method.
In addition, the method also reduces the workload of adding the classification labels to the user group, and effectively improves the training efficiency of the user group classification model and the classification efficiency of the user group.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers used in the present application are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be conceived by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. A method for classifying a user group, the method comprising:
acquiring behavior data of a user group, and preprocessing the behavior data of the user group to obtain a behavior sequence data set taking the user name of each user as a main object, wherein each user name corresponds to a behavior sequence, and each behavior sequence comprises at least one behavior instruction;
extracting and carrying out frequency statistics on frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm to obtain a frequent instruction combination feature table;
calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set through a sequence comparison algorithm to obtain a sequence similarity feature table;
performing frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table;
and carrying out classification analysis on the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by adopting a semi-supervised classification algorithm to obtain user groups with different categories.
2. The method according to claim 1, wherein the acquiring the behavior data of the user group and preprocessing the behavior data of the user group to obtain a behavior sequence dataset with the user name of each user as a main object comprises:
acquiring behavior data of a user group, wherein the behavior data of the user group comprises a user name of each user, at least one behavior instruction of each user and operation time of each behavior instruction;
coding the behavior instruction of each user by using a preset character dictionary;
sequencing the coded behavior instructions according to the operation time of the behavior instructions to obtain a behavior sequence of each user;
and generating a behavior sequence data set taking the user name of each user as a main object according to the user name of each user and the behavior sequence of each user.
3. The method of claim 1, wherein the extracting and carrying out frequency statistics on frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm to obtain a frequent instruction combination feature table comprises:
extracting frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm to obtain a frequent instruction combination list containing a plurality of frequent behavior instruction combinations;
and counting the frequency of each frequent behavior instruction combination in the frequent instruction combination list in the behavior sequence data set to obtain a frequent instruction combination feature table taking the user name and the frequent behavior instruction combination as field names.
4. The method of claim 1, wherein the calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set through a sequence comparison algorithm to obtain a sequence similarity feature table comprises:
calculating a global sequence matching score array and a global inter-sequence similarity score array among all behavior sequences in the behavior sequence data set through a global sequence comparison algorithm;
respectively calculating the maximum value, the minimum value, the average value, the standard deviation and the variance of the global sequence matching score array and the global inter-sequence similarity score array to obtain a global sequence similarity feature table;
calculating a local sequence matching score array and a local inter-sequence similarity score array among all behavior sequences in the behavior sequence data set through a local sequence comparison algorithm;
respectively calculating the maximum value, the minimum value, the average value, the standard deviation and the variance of the local sequence matching score array and the local inter-sequence similarity score array to obtain a local sequence similarity feature table;
and taking the user name of each user as an association field, and associating and combining the global sequence similarity feature table and the local sequence similarity feature table to obtain a sequence similarity feature table.
5. The method of claim 1, wherein the performing frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table comprises:
merging and de-duplicating all the behavior instructions in the behavior sequence data set to obtain a behavior instruction list containing all the behavior instructions;
and counting the frequency of each behavior instruction in the behavior instruction list in the behavior sequence data set to obtain a behavior instruction frequency feature table taking the user name and the behavior instruction as field names.
6. The method according to claim 1, wherein the classifying and analyzing the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by using a semi-supervised classification algorithm to obtain user groups with different categories comprises:
taking the user name of each user as an association field, and performing association combination on the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table to obtain a feature integrated data table;
and carrying out classification analysis on the feature integration data table through a semi-supervised support vector machine algorithm to obtain user groups with different categories.
7. The method of claim 6, wherein the user group comprises labeled users and unlabeled users, and the behavior data of the labeled users comprises a classification label; and the carrying out classification analysis on the feature integration data table through a semi-supervised support vector machine algorithm to obtain user groups with different categories comprises:
training a support vector machine model according to the features of the labeled users in the feature integration data table and the classification labels of the labeled users to obtain an initial user classification model;
inputting the features of the unlabeled users in the feature integration data table into the initial user classification model to obtain the classification labels of the unlabeled users;
optimizing the initial user classification model according to the features of the unlabeled users in the feature integration data table and the classification labels of the unlabeled users to obtain a user classification model;
and inputting the features of all users in the user group in the feature integration data table into the user classification model to obtain user groups with different categories.
8. An apparatus for classifying a user group, the apparatus comprising:
the user data acquisition module is used for acquiring behavior data of a user group and preprocessing the behavior data of the user group to obtain a behavior sequence dataset which takes the user name of each user as a main object, wherein each user name corresponds to one behavior sequence, and each behavior sequence comprises at least one behavior instruction;
the frequent item feature extraction module is used for extracting and carrying out frequency statistics on the frequent behavior instruction combinations in the behavior sequence data set by using an association analysis algorithm to obtain a frequent instruction combination feature table;
the similarity feature extraction module is used for calculating sequence matching scores and inter-sequence similarity scores among all behavior sequences in the behavior sequence data set through a sequence comparison algorithm to obtain a sequence similarity feature table;
the instruction frequency feature extraction module is used for carrying out frequency statistics on the behavior instructions in the behavior sequence data set to obtain a behavior instruction frequency feature table;
and the user group classification module is used for classifying and analyzing the frequent instruction combination feature table, the sequence similarity feature table and the behavior instruction frequency feature table by adopting a semi-supervised classification algorithm to obtain user groups with different categories.
9. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by the processor.
CN202111412279.0A 2021-11-25 2021-11-25 User group classification method and device, storage medium and computer equipment Active CN113836370B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111412279.0A CN113836370B (en) 2021-11-25 2021-11-25 User group classification method and device, storage medium and computer equipment
PCT/CN2021/135899 WO2023092646A1 (en) 2021-11-25 2021-12-07 Method and apparatus for classifying user group, and storage medium and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111412279.0A CN113836370B (en) 2021-11-25 2021-11-25 User group classification method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN113836370A CN113836370A (en) 2021-12-24
CN113836370B CN113836370B (en) 2022-03-01

Family

ID=78971392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111412279.0A Active CN113836370B (en) 2021-11-25 2021-11-25 User group classification method and device, storage medium and computer equipment

Country Status (2)

Country Link
CN (1) CN113836370B (en)
WO (1) WO2023092646A1 (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140101580A1 (en) * 2012-10-09 2014-04-10 Ebay Inc. Visual mining of user behavior patterns
CN106657410A (en) * 2017-02-28 2017-05-10 国家电网公司 Detection method for abnormal behaviors based on user access sequence
CN109086816A (en) * 2018-07-24 2018-12-25 重庆富民银行股份有限公司 User behavior analysis system based on a Bayesian classification algorithm
CN110472050A (en) * 2019-07-24 2019-11-19 阿里巴巴集团控股有限公司 Clique clustering method and device
CN110837862A (en) * 2019-11-06 2020-02-25 腾讯科技(深圳)有限公司 User classification method and device
CN110879856A (en) * 2019-11-27 2020-03-13 国家计算机网络与信息安全管理中心 Social group classification method and system based on multi-feature fusion
CN111274907A (en) * 2020-01-16 2020-06-12 支付宝(杭州)信息技术有限公司 Method and apparatus for determining a category label of a user using a category identification model
US20200195672A1 (en) * 2018-12-18 2020-06-18 Fortinet, Inc. Analyzing user behavior patterns to detect compromised nodes in an enterprise network
CN111694718A (en) * 2020-05-27 2020-09-22 平安普惠企业管理有限公司 Method and device for identifying abnormal behavior of intranet user, computer equipment and readable storage medium
CN112116464A (en) * 2020-05-21 2020-12-22 上海金融期货信息技术有限公司 Abnormal transaction behavior analysis method and system based on event sequence frequent item set
CN113011886A (en) * 2021-02-19 2021-06-22 腾讯科技(深圳)有限公司 Method and device for determining account type and electronic equipment
CN113378892A (en) * 2021-05-20 2021-09-10 南京光普信息技术有限公司 Multi-sequence comparison classification method based on mobile phone app use behavior data
CN113468432A (en) * 2021-08-02 2021-10-01 东莞市汇学汇玩教育科技有限公司 Mobile internet-based user behavior big data analysis method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488385B (en) * 2020-04-07 2023-08-15 腾讯科技(深圳)有限公司 Data processing method and device based on artificial intelligence and computer equipment
CN112541745B (en) * 2020-12-22 2024-04-09 平安银行股份有限公司 User behavior data analysis method and device, electronic equipment and readable storage medium
CN112632351B (en) * 2020-12-28 2024-01-16 北京百度网讯科技有限公司 Classification model training method, classification method, device and equipment
CN113239249A (en) * 2021-06-04 2021-08-10 腾讯科技(深圳)有限公司 Object association identification method and device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
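MO CHEN et al.: "A research on user behavior sequence analysis based on social networking service use-case model", 《SCIENCE AND TECHNOLOGY》 *
陈恩红 (Chen Enhong) et al.: "A survey of research and applications of user sequential behavior analysis" (用户序列行为分析研究与应用综述), 《Journal of Anhui University》 (安徽大学学报) *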

Also Published As

Publication number Publication date
CN113836370B (en) 2022-03-01
WO2023092646A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
CN112199375B (en) Cross-modal data processing method and device, storage medium and electronic device
CN108629043B (en) Webpage target information extraction method, device and storage medium
CN111639516B (en) Analysis platform based on machine learning
CN110457672B (en) Keyword determination method and device, electronic equipment and storage medium
CN112528025A (en) Text clustering method, device and equipment based on density and storage medium
CN111104514A (en) Method and device for training document label model
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN113722438A (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN112446209A (en) Method, equipment and device for setting intention label and storage medium
CN112926341A (en) Text data processing method and device
CN113836370B (en) User group classification method and device, storage medium and computer equipment
CN114970553B (en) Information analysis method and device based on large-scale unmarked corpus and electronic equipment
CN116340516A (en) Entity relation cluster extraction method, device, equipment and storage medium
CN115238676A (en) Method and device for identifying hot spots of bidding demands, storage medium and electronic equipment
CN115203208A (en) Value range table matching method, device, equipment and storage medium
CN112069807A (en) Text data theme extraction method and device, computer equipment and storage medium
CN117093717B (en) Similar text aggregation method, device, equipment and storage medium thereof
CN114971744B (en) User portrait determination method and device based on sparse matrix
CN115618968B (en) New idea discovery method and device, electronic device and storage medium
CN111881190B (en) Key data mining system based on customer portrait
CN113792549B (en) User intention recognition method, device, computer equipment and storage medium
CN113139039B (en) Dialogue data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant