CN111652278B - User behavior detection method, device, electronic equipment and medium - Google Patents

Info

Publication number
CN111652278B
Authority
CN
China
Prior art keywords
user
data
data set
branch
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010370028.XA
Other languages
Chinese (zh)
Other versions
CN111652278A (en)
Inventor
梁翰鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010370028.XA
Publication of CN111652278A
Application granted
Publication of CN111652278B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence and discloses a user behavior detection method comprising the following steps: acquiring an initial user data set, and standardizing and vectorizing the initial user data set to obtain a standard user data set; branching the standard user data set to obtain a branch combination set, and calculating the degree of dependence of each branch combination in the branch combination set; selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set; training a user behavior prediction model using the training data set; and obtaining user data to be predicted, and analyzing user behavior using the user behavior prediction model according to the user data to be predicted. Furthermore, the invention relates to blockchain technology: the data used for model training and prediction may be stored in a blockchain. The invention also provides a user behavior detection device, electronic equipment and a computer-readable storage medium. The invention can reduce the amount of computation required for model training and thereby improve the efficiency of model training.

Description

User behavior detection method, device, electronic equipment and medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method, an apparatus, an electronic device, and a medium for detecting user behavior.
Background
In recent years, with the development of and breakthroughs in deep learning technology, constructing a deep learning model that predicts user behavior from user data and provides personalized suggestions to users has become a goal in the pursuit of full-scenario intelligent living. However, such a model needs a large amount of data for training, so model training consumes a large amount of computation and takes a long time.
Disclosure of Invention
The invention provides a user behavior detection method, a user behavior detection device, electronic equipment and a medium, which can reduce the amount of computation required for model training and thereby improve the efficiency of model training.
In order to achieve the above object, the present invention provides a user behavior detection method, including:
acquiring an initial user data set, and performing standardization processing and vectorization processing on the initial user data set to obtain a standard user data set;
performing branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set, and calculating the degree of dependence of each branch combination in the branch combination set;
selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set;
training a user behavior prediction model using the training data set;
and obtaining user data to be predicted, and analyzing user behavior using the user behavior prediction model according to the user data to be predicted.
Optionally, the standardization processing of the initial user data set includes:
acquiring numerical data from the initial user data set, calculating the deviation from the mean and the standard deviation of the acquired numerical data, and replacing each numerical datum with the ratio of its deviation to the standard deviation;
the vectorization processing of the initial user data set includes:
acquiring character-type data from the initial user data set, performing word segmentation on the character-type data using the jieba word segmentation tool, and encoding the segmented character-type data using one-hot encoding.
Optionally, performing branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set includes:
dividing the standard data set into different branch combinations according to each item of user standard feature data corresponding to each user feature in the standard user data set;
and selecting the branch combinations whose Gini coefficient reaches a preset value and aggregating them to obtain the branch combination set.
Optionally, selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set includes:
sorting the branch combinations in the branch combination set according to the degree of dependence, and selecting the preset number of branch combinations within a preset ranking range and aggregating them to obtain the training data set.
Optionally, obtaining the user data to be predicted and analyzing user behavior according to the user data to be predicted using the trained user behavior prediction model includes:
inputting the user data to be predicted into the user behavior prediction model, and outputting a predicted value of at least one predicted behavior of the user to be predicted;
sorting the predicted values from high to low;
and determining the predicted behaviors whose predicted values rank within a preset number of top positions as the user behavior.
Optionally, calculating the degree of dependence of each branch combination in the branch combination set includes:
calculating the degree of dependence of each branch combination in the branch combination set using the following formula:
wherein C represents the degree of dependence of the branch combination, X_i represents the number of users corresponding to the i-th type of user standard feature data in the branch combination, N represents the number of all users in the branch combination, Z represents the number of types of user standard feature data corresponding to the branch combination, and i indexes the different types of user standard feature data.
Optionally, training the user behavior prediction model using the training data set includes:
selecting the user features to which the user standard feature data in the branch combination set corresponds as decision nodes of the user behavior prediction model;
selecting the corresponding user standard feature data in the branch combination set as split values of the user behavior prediction model;
and constructing the user behavior prediction model using the training data set, the decision nodes and the split values.
In order to solve the above-mentioned problems, the present invention also provides a user behavior detection apparatus, the apparatus comprising:
a processing module, used for acquiring an initial user data set, and performing standardization processing and vectorization processing on the initial user data set to obtain a standard user data set;
a generation module, used for performing branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set, calculating the degree of dependence of each branch combination in the branch combination set, and selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set;
a model training module, used for training a user behavior prediction model using the training data set;
and a behavior prediction module, used for acquiring user data to be predicted and analyzing user behavior using the user behavior prediction model according to the user data to be predicted.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the user behavior detection method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium including a storage data area storing data created according to use of a blockchain node and a storage program area storing a computer program, the computer-readable storage medium storing therein at least one instruction to be executed by a processor in an electronic device to implement the above-mentioned user behavior detection method.
In the embodiment of the invention, an initial user data set is obtained, and standardization processing and vectorization processing are performed on it to obtain a standard user data set, which removes irrelevant data, reduces the data size and reduces the occupation of data resources; branch processing is performed on the standard user data set based on the Gini coefficient to obtain a branch combination set, and the degree of dependence of each branch combination in the branch combination set is calculated, which combines the data and reduces the data volume; a preset number of branch combinations are selected from the branch combination set according to the degree of dependence to obtain a training data set, which screens the data and further reduces the data volume; a user behavior prediction model is trained using the training data set; and user data to be predicted is obtained, and user behavior is analyzed using the user behavior prediction model according to the user data to be predicted. The invention can therefore reduce the data volume required for model training, which reduces the amount of computation and improves the efficiency of model training.
Drawings
FIG. 1 is a flowchart illustrating a user behavior detection method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a user behavior detection apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing the user behavior detection method according to an embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The execution body of the user behavior detection method provided by the embodiment of the application includes at least one of a server, a terminal, and other electronic devices that can be configured to execute the method provided by the embodiment of the application. In other words, the user behavior detection method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Referring to fig. 1, a flowchart of a user behavior detection method according to an embodiment of the invention is shown. In this embodiment, the user behavior detection method includes:
S1, acquiring an initial user data set, and performing standardization processing and vectorization processing on the initial user data set to obtain a standard user data set.
In the embodiment of the present invention, the initial user data set includes user data of a plurality of users, for example, the user data includes: user personal information data (sex, age, etc.), user vehicle data (vehicle age, vehicle price, etc.), user vehicle policy information (premium, kinds of risks, etc.), and the like. The initial user data set may be obtained from a vehicle insurance database of an insurance company.
Further, user feature data is the data corresponding to each feature of a user, for example: the user feature data corresponding to the gender feature of a certain user is male, and the user feature data corresponding to the age feature of a certain user is 25 years old.
Further, since the initial user data set includes duplicate data and missing data, in order to prevent the duplicate data and missing data from affecting the subsequent model training, the embodiment of the present invention performs duplicate-data deletion and missing-data filling on the initial user data set. The de-duplication process deletes all data of the same attribute after its first occurrence, for example: if the first, second and third items in the initial user data set are all the user age, the second and third items are deleted. The missing-data filling may use a mean estimation method, which fills a missing value with the mean of the n items of data that precede it in the user data.
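The cleaning step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the representation of a record as a list of `(attribute, value)` pairs, and the use of `None` for a missing value are all assumptions made for the example.

```python
from statistics import mean


def drop_repeated_attributes(items):
    """Keep only the first occurrence of each attribute (hypothetical helper)."""
    seen = set()
    kept = []
    for name, value in items:
        if name not in seen:
            seen.add(name)
            kept.append((name, value))
    return kept


def fill_missing_with_preceding_mean(values, n):
    """Replace each None with the mean of up to n preceding known values."""
    filled = []
    for v in values:
        if v is None:
            known = [x for x in filled if x is not None][-n:]
            # With no preceding data there is nothing to average; keep the gap.
            v = mean(known) if known else None
        filled.append(v)
    return filled


record = [("age", 25), ("age", 26), ("sex", "male"), ("age", 27)]
deduplicated = drop_repeated_attributes(record)
premiums = fill_missing_with_preceding_mean([10, 20, None, 40], n=2)
```

Here the second and third `age` entries are dropped, and the missing premium is filled with the mean of the two values before it.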
Further, to reduce the amount of data used for subsequent model training, the data in the initial user data set needs to be compressed. The embodiment of the invention therefore performs standardization processing and vectorization processing on the initial user data set, converts the user feature data into user standard feature data, and aggregates all the user standard feature data to obtain the standard user data set.
In detail, the standardization processing of the initial user data set according to the embodiment of the present invention includes:
acquiring numerical data from the initial user data set, calculating the deviation from the mean and the standard deviation of the acquired numerical data, and replacing each numerical datum with the ratio of its deviation to the standard deviation.
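Replacing each value with the ratio of its deviation from the mean to the standard deviation is what is usually called z-score standardization. A minimal sketch follows; the function name is hypothetical, and the choice of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the text does not say which is meant.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # All values are identical, so every deviation is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [25, 26, 27]
standard_ages = standardize(ages)
```

After this step the column has mean 0 and unit standard deviation, so features measured on different scales (age, vehicle price, premium) become comparable.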
Further, the vectorization processing of the initial user data set according to the embodiment of the present invention includes:
acquiring character-type data from the initial user data set, performing word segmentation on the character-type data using the jieba word segmentation tool, and encoding the segmented character-type data using one-hot encoding.
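The encoding half of this step can be sketched as below. To stay self-contained the example starts from token lists that are assumed to have already been produced by a word segmenter such as jieba; the token values, function names and the first-seen vocabulary ordering are illustrative assumptions.

```python
def build_vocabulary(token_lists):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(token, vocab):
    """Return a vector with a single 1 at the token's vocabulary index."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


# Hypothetical output of a segmenter for two insurance-type strings.
segmented = [
    ["vehicle", "damage", "insurance"],
    ["spontaneous", "combustion", "insurance"],
]
vocab = build_vocabulary(segmented)
encoded = one_hot("insurance", vocab)
```

Each token becomes a vector as long as the vocabulary, so in practice the vocabulary size drives the width of the resulting standard feature data.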
By carrying out standardization processing and vectorization processing on the initial user data set, the data size is reduced, the training time consumption of the follow-up model can be effectively reduced, and the training efficiency of the follow-up model is further improved.
S2, performing branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set, and calculating the degree of dependence of each branch combination in the branch combination set.
In the embodiment of the invention, the standardization and vectorization processes only transform the user feature data and do not affect the user features themselves, so the standard user data set contains the user standard feature data corresponding to the different user features.
Further, since the same user feature includes a plurality of items of user standard feature data, in order to determine the importance of the different items of user standard feature data, the embodiment of the present invention performs branch processing on the standard data set based on the Gini coefficient.
In detail, the branching process includes:
S21, dividing the standard data set into different branch combinations according to the different user standard feature data corresponding to each user feature in the standard user data set.
In detail, for each item of user standard feature data corresponding to each user feature, the standard data set is divided into two branch data sets according to whether or not a user's data takes that value, and the two branch data sets together form the branch combination corresponding to that item of user standard feature data. For example: if the user standard feature data corresponding to the user age feature comprises three age values, 25, 26 and 27 years old, the standard data set is divided into three branch combinations: the branch data set whose age is 25 together with the branch data set whose age is not 25; the branch data set whose age is 26 together with the branch data set whose age is not 26; and the branch data set whose age is 27 together with the branch data set whose age is not 27.
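The age example above can be sketched as a one-versus-rest split per feature value. The record layout (a list of dictionaries) and the function name are assumptions made for illustration.

```python
def branch_combinations(records, feature):
    """Build one (D1, D2) pair per distinct value of the given feature.

    D1 holds the records whose feature equals the value,
    D2 holds all remaining records.
    """
    combos = {}
    for value in sorted({r[feature] for r in records}):
        d1 = [r for r in records if r[feature] == value]
        d2 = [r for r in records if r[feature] != value]
        combos[(feature, value)] = (d1, d2)
    return combos


users = [{"age": 25}, {"age": 26}, {"age": 27}, {"age": 25}]
combos = branch_combinations(users, "age")
```

A feature with k distinct values therefore yields k branch combinations, each a binary partition of the full data set.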
S22, selecting the branch combinations whose Gini coefficient reaches a preset value and aggregating them to obtain the branch combination set.
Further, the formula for calculating the Gini coefficient is as follows:
wherein D is the standard data set, A is a user feature in the standard data set, a_i is an item of user standard feature data corresponding to the user feature A, D_1 is the branch data set in which the user standard feature data is a_i, D_2 is the branch data set in which the user standard feature data is not a_i, Gini(D_1) is the Gini coefficient of D_1, Gini(D_2) is the Gini coefficient of D_2, and Gini(D, A | a_i) is the Gini coefficient of the branch combination corresponding to a_i.
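The symbol definitions above match the weighted Gini index used for binary splits in CART decision trees. Under that assumption (this is a reconstruction from the listed symbols, not a rendering taken from the patent), the formula would read as follows, where p_k, the proportion of class k within a branch data set, is an additional assumed symbol:

```latex
\operatorname{Gini}(D, A \mid a_i)
  = \frac{|D_1|}{|D|}\,\operatorname{Gini}(D_1)
  + \frac{|D_2|}{|D|}\,\operatorname{Gini}(D_2),
\qquad
\operatorname{Gini}(D_j) = 1 - \sum_{k} p_k^{2}
```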
For example: when the branch combination is obtained by splitting on the value male of the user gender feature, A represents the user gender feature, a_i represents the user standard feature data male, D_1 represents the branch data set of users in the standard data set who are male, and D_2 represents the branch data set of users in the standard data set who are not male.
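A possible way to score such a male / not-male branch combination is sketched below. It assumes the standard CART definitions (Gini impurity of a branch as one minus the sum of squared class proportions, and a size-weighted sum over the two branches); the behavior labels and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one branch data set: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(d1_labels, d2_labels):
    """Size-weighted Gini of a branch combination (D1, D2)."""
    n = len(d1_labels) + len(d2_labels)
    if n == 0:
        return 0.0
    return (len(d1_labels) / n) * gini(d1_labels) + (len(d2_labels) / n) * gini(d2_labels)


# Hypothetical behavior labels for male users (D1) and all other users (D2).
male = ["buy", "buy", "skip"]
not_male = ["skip", "skip"]
score = split_gini(male, not_male)
```

Branch combinations whose score reaches the preset value would then be kept for the branch combination set.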
S23, calculating the degree of dependence of each branch combination in the branch combination set.
In detail, the calculation formula of the degree of dependence is:
wherein C represents the degree of dependence of the branch combination, X_i represents the number of users corresponding to the i-th type of user standard feature data in the branch combination, N represents the number of all users in the branch combination, Z represents the number of types of user standard feature data corresponding to the branch combination, and i indexes the different types of user standard feature data.
For example: when the branch combination is obtained by splitting on the user standard feature data corresponding to the user gender feature, Z = 2, because the user standard feature data for gender has two categories (male and female); i indexes these categories (i = 1 for male, i = 2 for female); X_1 is the number of users in the branch combination whose user standard feature data is male, and X_2 is the number of users whose user standard feature data is female.
S3, selecting a preset number of branch combinations from the branch combination set according to the degree of dependence, and obtaining a training data set.
In detail, in the embodiment of the present invention, the branch combinations in the branch combination set are sorted according to the degree of dependence, and the preset number of branch combinations within a preset ranking range are selected and aggregated to obtain the training data set. For example: the branch combinations in the branch combination set are ranked by degree of dependence from largest to smallest, and the top 5 branch combinations are selected to obtain the training data set.
Constructing the training data set from the branch combinations with the best degree of dependence gives the trained model higher stability and reduces the consumption of computing resources.
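The ranking and selection in step S3 can be sketched as follows. The dependence values are made-up numbers, since how they are computed is defined by the formula in S23; the function name and the dictionary layout are likewise assumptions.

```python
def select_top_branches(dependence_by_branch, k):
    """Rank branch combinations by degree of dependence and keep the top k."""
    ranked = sorted(
        dependence_by_branch.items(),
        key=lambda item: item[1],
        reverse=True,  # largest degree of dependence first
    )
    return [branch for branch, _ in ranked[:k]]


# Hypothetical degrees of dependence keyed by (feature, value).
dependence = {
    ("sex", "male"): 0.62,
    ("age", 25): 0.48,
    ("age", 26): 0.71,
    ("vehicle_age", 3): 0.55,
}
selected = select_top_branches(dependence, k=2)
```

The preset number k trades training cost against coverage: a smaller k means less data to train on but fewer candidate splits for the model.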
S4, training a user behavior prediction model using the training data set.
Preferably, the user behavior prediction model can be constructed using a random forest regression model.
In detail, the user features to which the user standard feature data in the branch combination set corresponds are selected as decision nodes of the user behavior prediction model; the corresponding user standard feature data in the branch combination set is selected as the split values of the user behavior prediction model; and the user behavior prediction model is constructed using the training data set, the decision nodes and the split values.
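The relationship between a selected branch combination, a decision node and its split value can be sketched with a small data structure. This is only an illustration of the mapping described above, not a random forest implementation; the class name, method name and equality-based routing are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionNode:
    """One tree node: the user feature it tests and the split value it tests for."""
    feature: str
    split_value: object

    def matches(self, record):
        # Records whose feature equals the split value follow the D1 branch.
        return record[self.feature] == self.split_value


def nodes_from_branches(branches):
    """Turn (feature, value) branch combinations into decision nodes."""
    return [DecisionNode(feature, value) for feature, value in branches]


nodes = nodes_from_branches([("sex", "male"), ("age", 26)])
routes = [node.matches({"sex": "male", "age": 25}) for node in nodes]
```

In a real tree ensemble each such node would also hold child nodes or leaf predictions; those are omitted here.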
Further, the user standard feature data of the user behavior feature in the training data set is used as the label set. The user standard feature data of the user behavior feature may be the data corresponding to the consumption behavior of the user, that is, the amounts the user spends on different types of consumption, for example: when the user purchases fruit, 10 yuan is spent on apples and 20 yuan on pears; when the user purchases vehicle insurance, 2000 yuan is spent on vehicle damage insurance and 1000 yuan on vehicle spontaneous combustion insurance. Further, the user standard feature data in the training data set other than the data contained in the label set is used as the training set, and a model error function is constructed to train the model, wherein the model error function can be represented by the following formula:
wherein W is the model error value, n is the number of users in the training data set, A_t is the expected value of the user standard feature data of the user behavior feature, E_t is the predicted value of the user standard feature data of the user behavior feature, and t indexes the different users.
In the embodiment of the invention, training is stopped when the model error value reaches the preset threshold value, and the user behavior prediction model is obtained.
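The stopping rule can be sketched as below. The error function used here is the mean squared error over the n users, which is an assumption: it is one common choice that fits the symbols W, n, A_t and E_t, but it may not be the function the patent actually uses. The function names and the example amounts are also hypothetical.

```python
def model_error(expected, predicted):
    """Mean squared error between expected (A_t) and predicted (E_t) values.

    Assumed form of the error function; other choices such as mean
    absolute error would fit the same symbols.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    n = len(expected)
    return sum((a - e) ** 2 for a, e in zip(expected, predicted)) / n


def should_stop(error, threshold):
    """Stop training once the error has fallen to the preset threshold."""
    return error <= threshold


# Hypothetical premium amounts for two users.
error = model_error([2000, 1000], [1900, 1100])
stop = should_stop(error, threshold=10000)
```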
S5, obtaining user data to be predicted, and analyzing user behaviors according to the user data to be predicted by using a trained user behavior prediction model.
In another embodiment of the present invention, training data for the user behavior prediction model and prediction data may be stored in a blockchain.
The user data to be predicted in the embodiment of the invention comprises relevant personal data of the user, for example: personal data of the user, consumption history data of the user, and the like. The user data to be predicted can be acquired through a webpage or an APP according to user authorization.
The embodiment of the invention inputs the user data to be predicted into the user behavior prediction model and outputs a predicted value for at least one predicted behavior of the user to be predicted. The predicted values are then sorted from high to low, and the predicted behaviors whose predicted values rank within a preset number of top positions are determined as the predicted user behavior; preferably, the preset number is five. For example: when the user wants to purchase fruit, the user behavior prediction model can automatically predict the user's purchasing behavior from the user data to obtain probability values for purchasing fruit at different prices, and the purchase suggestions for the fruit with the five highest predicted values are pushed to the user. These suggestions are the predicted user purchase suggestions, that is, the user behavior.
In the embodiment of the invention, an initial user data set is obtained, and standardization processing and vectorization processing are performed on it to obtain a standard user data set, which removes irrelevant data, reduces the data size and reduces the occupation of data resources; branch processing is performed on the standard user data set based on the Gini coefficient to obtain a branch combination set, and the degree of dependence of each branch combination in the branch combination set is calculated, which combines the data and reduces the data volume; a preset number of branch combinations are selected from the branch combination set according to the degree of dependence to obtain a training data set, which screens the data and further reduces the data volume; a user behavior prediction model is trained using the training data set; and user data to be predicted is obtained, and user behavior is analyzed using the user behavior prediction model according to the user data to be predicted. The invention can therefore reduce the data volume required for model training, which reduces the amount of computation and improves the efficiency of model training.
FIG. 2 shows a functional block diagram of the user behavior detection apparatus of the present invention.
The user behavior detection apparatus 100 of the present invention may be installed in an electronic device. Depending on the functions implemented, the apparatus may include a processing module 101, a generation module 102, a model training module 103 and a behavior prediction module 104. A module of the present invention may also be referred to as a unit, meaning a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
The processing module 101 is configured to obtain an initial user data set, and perform normalization processing and vectorization processing on the initial user data set to obtain a standard user data set.
In the embodiment of the present invention, the initial user data set includes user data of a plurality of users, for example, the user data includes: user personal information data (sex, age, etc.), user vehicle data (vehicle age, vehicle price, etc.), user vehicle policy information (premium, kinds of risks, etc.), and the like. The initial user data set may be obtained from a vehicle insurance database of an insurance company.
Further, user feature data is the data corresponding to each feature of a user, for example: the user feature data corresponding to the gender feature of a certain user is male, and the user feature data corresponding to the age feature of a certain user is 25 years old.
Further, since the initial user data set includes duplicate data and missing data, in order to prevent the duplicate data and missing data from affecting the subsequent model training, the embodiment of the present invention performs duplicate-data deletion and missing-data filling on the initial user data set. The de-duplication process deletes all data of the same attribute after its first occurrence, for example: if the first, second and third items in the initial user data set are all the user age, the second and third items are deleted. The missing-data filling may use a mean estimation method, which fills a missing value with the mean of the n items of data that precede it in the user data.
Further, to reduce the amount of data used for subsequent model training, the data in the initial user data set needs to be compressed. The embodiment of the invention therefore performs standardization processing and vectorization processing on the initial user data set, converts the user feature data into user standard feature data, and aggregates all the user standard feature data to obtain the standard user data set.
In detail, the standardization processing of the initial user data set according to the embodiment of the present invention includes:
acquiring numerical data from the initial user data set, calculating the deviation from the mean and the standard deviation of the acquired numerical data, and replacing each numerical datum with the ratio of its deviation to the standard deviation.
Further, the vectorization processing of the initial user data set according to the embodiment of the present invention includes:
acquiring character-type data from the initial user data set, performing word segmentation on the character-type data using the jieba word segmentation tool, and encoding the segmented character-type data using one-hot encoding.
By carrying out standardization processing and vectorization processing on the initial user data set, the data size is reduced, the training time consumption of the follow-up model can be effectively reduced, and the training efficiency of the follow-up model is further improved.
The generation module 102 is configured to perform branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set, calculate the degree of dependence of each branch combination in the branch combination set, and select a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set.
In the embodiment of the invention, the standardization and vectorization processes only transform the user feature data and do not affect the user features themselves, so the standard user data set contains the user standard feature data corresponding to the different user features.
Further, since the same user feature includes multiple items of user standard feature data, in order to determine the importance of the different user standard feature data, the embodiment of the present invention performs branch processing on the standard data set based on the Gini coefficient.
In detail, the branching process includes:
S21, dividing the standard data set into different branch combinations according to the different user standard feature data corresponding to each user feature in the standard user data set.
In detail, for each item of user standard feature data corresponding to each user feature, the standard data set is divided into two branch data sets according to whether or not a record has that value, and the two branch data sets are combined to obtain the branch combination corresponding to that user standard feature data. For example, if the user standard feature data corresponding to the user age feature comprises three age values, 25, 26, and 27, the standard data set is divided into three branch combinations: the branch data set whose age is 25 together with the branch data set whose age is not 25; the branch data set whose age is 26 together with the branch data set whose age is not 26; and the branch data set whose age is 27 together with the branch data set whose age is not 27.
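The binary split described in step S21 can be sketched as follows. Records are assumed to be dictionaries keyed by feature name; the function name and the return shape are hypothetical.

```python
def branch_combinations(records, feature):
    """Map each distinct value of `feature` to (records with it, records without it)."""
    combinations = {}
    for value in sorted({record[feature] for record in records}):
        matching = [r for r in records if r[feature] == value]
        non_matching = [r for r in records if r[feature] != value]
        combinations[value] = (matching, non_matching)
    return combinations
```

With ages 25, 26, and 27 present in the data, this yields three branch combinations, one per age value, matching the example above.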
S22, selecting the branch combinations whose Gini coefficient reaches a preset value and aggregating them to obtain the branch combination set.
Further, the formula for calculating the Gini coefficient is as follows:
where D is the standard data set, A is a user feature in the standard data set, a_i is an item of user standard feature data corresponding to that user feature, D_1 is the branch data set in which the user standard feature data is a_i, D_2 is the branch data set of the standard data set in which the user standard feature data is not a_i, Gini(D_1) is the Gini coefficient of D_1, Gini(D_2) is the Gini coefficient of D_2, and Gini(D, A|a_i) is the Gini coefficient of the branch combination corresponding to a_i.
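The formula itself does not survive in this text. The definitions are consistent with the size-weighted Gini coefficient used in CART-style decision trees, Gini(D, A|a_i) = |D_1|/|D| * Gini(D_1) + |D_2|/|D| * Gini(D_2), and the sketch below assumes that form. It further assumes each record carries a discrete label and that Gini(D_k) is one minus the sum of squared label proportions; both assumptions, like the function names, are not stated in the source.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of one branch data set: 1 - sum of squared label shares."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_of_branch_combination(labels_d1, labels_d2):
    """Size-weighted Gini coefficient of the two branch data sets D_1 and D_2."""
    total = len(labels_d1) + len(labels_d2)
    if total == 0:
        return 0.0
    weight_d1 = len(labels_d1) / total
    weight_d2 = len(labels_d2) / total
    return weight_d1 * gini(labels_d1) + weight_d2 * gini(labels_d2)
```

Under this form, a split that separates the labels cleanly scores close to 0, while a split that leaves both branches mixed scores higher.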
S23, calculating the degree of dependence of each branch combination in the branch combination set.
In detail, the calculation formula of the degree of dependence is:
where C represents the degree of dependence of the branch combination, X_i represents the number of users corresponding to each type of user standard feature data in the branch combination, N represents the number of all users in the branch combination, Z represents the types of user standard feature data corresponding to the branch combination, and i indexes the different types of user standard feature data.
In detail, in the embodiment of the present invention, the branch combinations in the branch combination set are sorted by degree of dependence, and a preset number of branch combinations within a preset ranking range are selected and aggregated to obtain the training data set.
Because the training data set is built from the branch combinations with the best degree of dependence, the trained model is more stable and the consumption of computing resources is reduced.
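The ranking and selection step might be sketched as below. The degree of dependence is taken as an already computed score per branch combination, because its formula is not reproduced in this text; treating a higher score as better, and selecting from the top of the ranking, are assumptions, as is the function name.

```python
def select_training_branches(dependence_by_branch, preset_number):
    """Return the identifiers of the top-ranked branch combinations."""
    ranked = sorted(
        dependence_by_branch.items(),
        key=lambda item: item[1],
        reverse=True,  # highest degree of dependence first (assumed ordering)
    )
    return [branch for branch, _ in ranked[:preset_number]]
```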
The model training module 103 is configured to train a user behavior prediction model using the training data set.
Preferably, the user behavior prediction model can be constructed by using a random forest regression model.
In detail, the user feature of the corresponding user standard feature data in the branch combination set is selected as a decision node of the user behavior prediction model; the corresponding user standard feature data in the branch combination set is selected as the split value of the user behavior prediction model; and the user behavior prediction model is constructed using the training data set together with the decision nodes and split values.
Further, the user standard feature data of the user behavior features in the training data set is used as a label set. This data may be the user standard feature data corresponding to the user's consumption behavior features, namely the amounts the user spends on different consumption types. The remaining user standard feature data in the training data set, that is, everything not contained in the label set, is used as the training set, and a model error function is constructed to train the model. The model error function can be expressed by the following formula:
where W is the model error value, n is the number of users in the training data set, A_t is the expected value of the user standard feature data of the user behavior features, E_t is the predicted value of the user standard feature data of the user behavior features, and t indexes the different users.
In the embodiment of the invention, training is stopped when the model error value reaches the preset threshold value, and the user behavior prediction model is obtained.
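The stopping rule can be illustrated with the sketch below. The model error formula is not reproduced in this text, so the mean squared error over the n users is used here purely as a stand-in assumption; `predict` and `update` are hypothetical callbacks for whatever model is being trained, and `max_rounds` is an added safety limit that the source does not mention.

```python
def mean_squared_error(expected, predicted):
    """Assumed error function: average of (A_t - E_t)^2 over all users t."""
    n = len(expected)
    return sum((a - e) ** 2 for a, e in zip(expected, predicted)) / n


def train_until_threshold(expected, predict, update, threshold, max_rounds=1000):
    """Call update() until the error is at or below threshold; return (rounds, error)."""
    error = mean_squared_error(expected, predict())
    rounds = 0
    while error > threshold and rounds < max_rounds:
        update()  # one training step of the underlying model
        rounds += 1
        error = mean_squared_error(expected, predict())
    return rounds, error
```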
The behavior prediction module 104 is configured to obtain user data to be predicted, and analyze a user behavior according to the user data to be predicted by using the user behavior prediction model.
In another embodiment of the present invention, training data for the user behavior prediction model and prediction data may be stored in a blockchain.
The user data to be predicted in the embodiment of the invention comprises data relating to the user, for example the user's personal data and the user's consumption history data. The user data to be predicted can be acquired through a web page or an app, subject to user authorization.
The embodiment of the invention inputs the user data to be predicted into the user behavior prediction model and outputs a predicted value for at least one predicted behavior of the user to be predicted. The predicted values are then sorted from high to low, and the predicted behaviors whose predicted values rank within a preset number of top positions are determined to be the user's predicted behavior. Preferably, the preset number may be five, that is, the top five.
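Selecting the top-ranked behaviors can be sketched with the standard library's `heapq.nlargest`. The mapping from behavior name to predicted value and the function name are illustrative assumptions; the default of five follows the preferred value in the text.

```python
import heapq


def top_predicted_behaviors(predicted_values, preset_number=5):
    """Return behavior names ordered from highest to lowest predicted value."""
    return heapq.nlargest(
        preset_number,
        predicted_values,          # iterates over the behavior names
        key=predicted_values.get,  # ranks each name by its predicted value
    )
```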
Fig. 3 is a schematic structural diagram of an electronic device for implementing the user behavior detection method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a user behavior detection program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a removable hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the user behavior detection program, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device using various interfaces and lines, and executes the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (e.g., the user behavior detection program) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection and communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 shows only an electronic device with certain components. A person skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include one or more direct current or alternating current power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described here.
Further, the electronic device 1 may also comprise a network interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard, and may optionally be a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.
The user behavior detection program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
acquiring an initial user data set, and carrying out standardization processing and vectorization processing on the initial user data set to obtain a standard user data set;
performing branch processing on the standard user data set based on the Gini coefficient to obtain a branch combination set, and calculating the degree of dependence of each branch combination in the branch combination set;
selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set;
training a user behavior prediction model using the training data set;
obtaining user data to be predicted, and analyzing user behavior using the user behavior prediction model according to the user data to be predicted.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is merely a division by logical function, and other manners of division are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
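The cryptographic linking of data blocks described above can be illustrated with a toy sketch. It covers only the hash chaining and a validity check, not consensus or distribution; the block layout, the use of SHA-256 over a JSON payload, and the function names are all assumptions made for illustration.

```python
import hashlib
import json


def make_block(transactions, previous_hash):
    """Build a block whose hash covers its transactions and the previous hash."""
    payload = json.dumps(
        {"transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return {
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        recomputed = make_block(block["transactions"], block["previous_hash"])["hash"]
        if block["hash"] != recomputed:
            return False  # block contents were altered after hashing
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Because each block's hash depends on the previous block's hash, altering any stored transaction invalidates the check for that block.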
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A method for detecting user behavior, the method comprising:
Acquiring an initial user data set, deleting repeated data of the initial user data set, filling missing data of the initial user data set, and carrying out standardization and vectorization on the initial user data set to obtain a standard user data set;
Branching is carried out on the standard user data set based on the Gini coefficient, a branch combination set is obtained, and the degree of dependence of each branch combination in the branch combination set is calculated;
selecting a preset number of branch combinations from the branch combination sets according to the degree of dependence to obtain a training data set;
Training a user behavior prediction model by utilizing the training data set;
And obtaining user data to be predicted, and analyzing user behaviors by utilizing the user behavior prediction model according to the user data to be predicted.
2. The user behavior detection method of claim 1, wherein the normalizing the initial user data set comprises:
acquiring numerical data from the initial user data set, calculating the deviation of each numerical value from the mean and the standard deviation of the numerical data, and replacing each numerical value with the ratio of its deviation to the standard deviation;
Vectorizing the initial user data set includes:
And acquiring character-type data from the initial user data set, performing word segmentation on the character-type data using the jieba word segmentation tool, and encoding the segmented character-type data using one-hot encoding.
3. The method for detecting user behavior according to claim 1, wherein branching the standard user data set based on the Gini coefficient to obtain a branch combination set comprises:
Dividing the standard data set into different branch combinations according to each user standard characteristic data corresponding to each user characteristic in the standard user data set;
And selecting the branch combinations whose Gini coefficient reaches a preset value and aggregating them to obtain the branch combination set.
4. The method for detecting user behavior according to claim 1, wherein selecting a predetermined number of branch combinations from the branch combination sets according to the degree of dependence to obtain a training data set comprises:
and sequencing the branch combinations in the branch combination set according to the degree of dependence, and selecting the branch combinations with the preset quantity in the preset ranking range for summarizing to obtain the training data set.
5. The method for detecting user behavior according to claim 1, wherein the obtaining the user data to be predicted, analyzing the user behavior according to the user data to be predicted by using a trained user behavior prediction model, comprises:
inputting the user data to be predicted into the user behavior prediction model, and outputting a predicted value of at least one predicted behavior of the user to be predicted;
Ordering the predicted values from high to low;
and determining the predicted behaviors whose predicted values rank within a preset number of top positions as the user behavior.
6. The user behavior detection method of claim 1, wherein the calculating the degree of dependence of each of the set of branch combinations comprises:
the degree of dependence of each branch combination in the set of branch combinations is calculated using the following formula:
where C represents the degree of dependence of the branch combination, X_i represents the number of users corresponding to each type of user standard feature data in the branch combination, N represents the number of all users in the branch combination, Z represents the types of user standard feature data corresponding to the branch combination, and i indexes the different types of user standard feature data.
7. The method of claim 1, wherein training a user behavior prediction model using the training data set comprises:
Selecting the user characteristics of the corresponding user standard characteristic data in the branch combination set as decision nodes of the user behavior prediction model;
Selecting the corresponding user standard characteristic data in the branch combination set as the split value of the user behavior prediction model;
And constructing the user behavior prediction model by using the training data set and the decision node and the split value.
8. A user behavior detection apparatus, the apparatus comprising:
the processing module is used for acquiring an initial user data set, deleting repeated data of the initial user data set, filling missing data of the initial user data set, standardizing and vectorizing the initial user data set to obtain a standard user data set;
The generation module is used for carrying out branch processing on the standard user data set based on the Gini coefficient, obtaining a branch combination set and calculating the degree of dependence of each branch combination in the branch combination set; selecting a preset number of branch combinations from the branch combination set according to the degree of dependence to obtain a training data set;
the model training module is used for training a user behavior prediction model by utilizing the training data set;
and the behavior prediction module is used for acquiring user data to be predicted and analyzing the user behavior by utilizing the user behavior prediction model according to the user data to be predicted.
9. An electronic device, the electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the user behavior detection method of any one of claims 1 to 7.
10. A computer readable storage medium comprising a stored data area storing data created according to use of blockchain nodes and a stored program area storing a computer program, characterized in that the computer program when executed by a processor implements the user behavior detection method according to any of claims 1 to 7.
CN202010370028.XA 2020-04-30 2020-04-30 User behavior detection method, device, electronic equipment and medium Active CN111652278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370028.XA CN111652278B (en) 2020-04-30 2020-04-30 User behavior detection method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010370028.XA CN111652278B (en) 2020-04-30 2020-04-30 User behavior detection method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111652278A CN111652278A (en) 2020-09-11
CN111652278B true CN111652278B (en) 2024-04-30

Family

ID=72342541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370028.XA Active CN111652278B (en) 2020-04-30 2020-04-30 User behavior detection method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111652278B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269793B (en) * 2020-09-16 2024-06-25 连尚(新昌)网络科技有限公司 Method and equipment for detecting user type based on blockchain
CN112215336B (en) * 2020-09-30 2024-02-09 招商局金融科技有限公司 Data labeling method, device, equipment and storage medium based on user behaviors
CN112148577B (en) * 2020-10-09 2024-05-07 平安科技(深圳)有限公司 Data anomaly detection method and device, electronic equipment and storage medium
CN112099739B (en) * 2020-11-10 2021-02-23 大象慧云信息技术有限公司 Classified batch printing method and system for paper invoices
CN112541745B (en) * 2020-12-22 2024-04-09 平安银行股份有限公司 User behavior data analysis method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106384282A (en) * 2016-06-14 2017-02-08 平安科技(深圳)有限公司 Method and device for building decision-making model
CN107590224A (en) * 2017-09-04 2018-01-16 北京京东尚科信息技术有限公司 User preference analysis method and device based on big data
CN108427669A (en) * 2018-02-27 2018-08-21 华青融天(北京)技术股份有限公司 Abnormal behaviour monitoring method and system
CN108710609A (en) * 2018-05-07 2018-10-26 南京邮电大学 A kind of analysis method of social platform user information based on multi-feature fusion
CN110717509A (en) * 2019-09-03 2020-01-21 中国平安人寿保险股份有限公司 Data sample analysis method and device based on tree splitting algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150134401A1 (en) * 2013-11-09 2015-05-14 Carsten Heuer In-memory end-to-end process of predictive analytics
US10496927B2 (en) * 2014-05-23 2019-12-03 DataRobot, Inc. Systems for time-series predictive data analytics, and related methods and apparatus

Also Published As

Publication number Publication date
CN111652278A (en) 2020-09-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant