CN114169451A - Behavior data classification processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114169451A
Authority
CN
China
Prior art keywords
data
user behavior
behavior data
historical user
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111506939.1A
Other languages
Chinese (zh)
Inventor
兰珣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111506939.1A
Publication of CN114169451A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a behavior data classification processing method, apparatus, device and storage medium, and relates to the technical field of data processing. The method comprises the following steps: receiving user behavior data sent by a first client, and determining target data according to the user behavior data; inputting the target data into a trained random forest model to obtain the result of each classification tree in the trained random forest model; determining a classification result corresponding to the user behavior data according to the proportion of abnormal results among the results of all the classification trees; and sending the classification result to a second client for display. The method improves the accuracy of classifying user behavior data.

Description

Behavior data classification processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a behavior data classification processing method, apparatus, device, and storage medium.
Background
As web pages and software offer more and more content that users can click on and submit, user behavior in web pages and software becomes increasingly unpredictable, which tends to cause security problems.
Currently, classification and pre-classification of user behavior that is prone to causing security problems is typically done manually.
However, because user behavior is extremely varied, manual classification is prone to missed detections and errors, which results in low classification accuracy.
Disclosure of Invention
The application provides a behavior data classification processing method, a behavior data classification processing device, behavior data classification processing equipment and a storage medium, which are used for solving the problem of low accuracy of manual classification.
In a first aspect, the present application provides a behavior data classification processing method, including:
receiving user behavior data sent by a first client, and determining target data according to the user behavior data; inputting target data into the trained random forest model to obtain the result of each classification tree in the trained random forest model; determining a classification result corresponding to the user behavior data according to the proportion of the abnormal results in the results of each classification tree to all the results; and sending the classification result to the second client side for displaying.
In a possible implementation manner, before receiving the user behavior data sent by the first client, the method further includes: acquiring historical user behavior data; receiving assignment information sent by a third client, and assigning values to the historical user behavior data according to the assignment information to obtain assigned historical user behavior data, wherein the assigned historical user behavior data comprises normal data and abnormal data; determining standard historical user behavior data according to the assigned historical user behavior data; and performing random forest model training by using the standard historical user behavior data to obtain a trained random forest model.
In one possible implementation, determining the standard historical user behavior data according to the assigned historical user behavior data includes: calculating to obtain new abnormal data by taking all the abnormal data as basic data, and adding the new abnormal data into the abnormal data until the data volume of the abnormal data reaches a first preset proportion of assigned historical user behavior data; and determining the assigned historical user behavior data with the abnormal data volume reaching a first preset proportion as standard historical user behavior data.
In a possible implementation manner, calculating new abnormal data by taking all the abnormal data as basic data includes: taking any abnormal data in all the abnormal data as basic data, taking another abnormal data in all the abnormal data as auxiliary data, and performing a difference on the basic data and the auxiliary data to obtain new abnormal data.
In one possible implementation, performing random forest model training by using the standard historical user behavior data to obtain a trained random forest model includes: sampling a preset amount of data from the standard historical user behavior data with replacement to serve as a training set, and taking the remaining data that were not sampled as a test set; training an initial random forest model by using the training set to obtain a random forest model to be determined; and if the accuracy of the classification result obtained by inputting the test set into the random forest model to be determined is smaller than a second preset proportion, re-executing the step of determining standard historical user behavior data according to the assigned historical user behavior data, and continuing to execute the steps of establishing the training set and the test set and training the model until the accuracy of the obtained classification result exceeds the second preset proportion, so as to obtain the trained random forest model.
In a second aspect, the present application provides a behavior data classification processing apparatus, including:
the target data determining module is used for receiving the user behavior data sent by the first client and determining the target data according to the user behavior data; the result obtaining module is used for inputting the target data into the trained random forest model to obtain the result of each classification tree in the trained random forest model; the classification result obtaining module is used for determining a classification result corresponding to the user behavior data according to the proportion of the abnormal results in the results of each classification tree to all the results; and the classification result sending module is used for sending the classification result to the second client side for displaying.
In a possible implementation manner, the behavior data classification processing apparatus further includes:
the behavior data acquisition module is used for acquiring historical user behavior data; the assigned data acquisition module is used for receiving assignment information sent by a third client, and assigning historical user behavior data according to the assignment information to obtain assigned historical user behavior data, wherein the assigned historical user behavior data comprises normal data and abnormal data; the standard data determining module is used for determining standard historical user behavior data according to the assigned historical user behavior data; and the model obtaining module is used for carrying out random forest model training by using the standard historical user behavior data to obtain a trained random forest model.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the behavioral data classification processing method of the first aspect as described above.
In a fourth aspect, the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the method for classifying behavioral data according to the first aspect is implemented.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the behavioural data classification processing method as described in the first aspect above.
With the behavior data classification processing method, apparatus, device and storage medium provided by the application, the user behavior data is input into the trained random forest model to obtain the results of all classification trees in the model, and the classification result is determined according to the proportion of abnormal results among the results of all the classification trees. The trained random forest model is therefore not limited to outputting a yes-or-no result. Because the results of all classification trees are taken into account, the user behavior data can be classified at a finer granularity, which improves the classification accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario of a behavior data classification processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a behavior data classification processing method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a behavior data classification processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a behavior data classification processing apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
As the amount of software and the number of websites on the network grow, users enjoy more and more freedom online. For the provider of a network service, the growing range of content that users can submit means that user behavior becomes more unpredictable and the risk of damage to the service becomes higher. At present, user behaviors are divided into normal operations and abnormal operations by manual inspection. However, because user behaviors are numerous and follow different paths, manual classification easily misses abnormal operations or classifies normal operations as abnormal, so the classification accuracy is low.
To address the low accuracy of manual classification, the application proposes the following technical idea: the obtained user behavior data is input into a random forest model to obtain the result of each classification tree of the model, and the user behavior data is classified according to the proportion of abnormal results among the results of all the classification trees, thereby improving the classification accuracy.
The method and apparatus are applied to scenarios in which user behavior data is classified. In the technical solution of the present disclosure, the acquisition, storage, application and other handling of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.
Fig. 1 is a schematic view of an application scenario of a behavior data classification processing method according to an embodiment of the present application. As shown in Fig. 1, this scenario includes: a first client 101, a server 102 and a second client 103.
In a specific implementation process, the first client 101 is configured to collect user behavior data and send the collected user behavior data to the server 102.
The server 102 is configured to input the user behavior data into the trained random forest model to obtain results of each classification tree in the random forest model, determine a classification result corresponding to the user behavior data according to the results of each classification tree in the random forest model, and send the classification result to the second client 103.
The second client 103 is configured to receive the classification result and display the classification result.
The first client 101 may be any device having functions of collecting data and sending data, including but not limited to a computer, a server, a tablet, a mobile phone, a Personal Digital Assistant (PDA), a notebook, and the like. The server 102 may be implemented by a cluster of one or more servers with stronger processing power and higher security, and may be replaced by a computer with stronger computing power, a notebook computer, or the like. The second client 103 may include a tablet, a mobile phone, a palm computer, a computer connected to a display, a notebook, and the like.
The connection among the first client 101, the server 102 and the second client 103 may be a wired connection or a wireless connection. The network used may include various types of wired and wireless networks, such as but not limited to: the Internet, a local area network, Wireless Fidelity (Wi-Fi), a Wireless Local Area Network (WLAN), a cellular communication network (General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), 2G/3G/4G/5G cellular networks), a satellite communication network, and so on.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the behavior data classification processing method. In other possible embodiments of the present application, the foregoing architecture may include more or less components than those shown in the drawings, or combine some components, or split some components, or arrange different components, which may be determined according to practical application scenarios, and is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a behavior data classification processing method according to an embodiment of the present application. The execution subject of the embodiment of the present application may be the server 102 in fig. 1, or may also be other electronic devices such as a computer and/or a mobile phone, which is not limited in this embodiment. As shown in fig. 2, the method includes:
S201: Receiving user behavior data sent by the first client, and determining target data according to the user behavior data.
In this step, the user behavior data may be user behavior data generated after the user performs operations such as browsing, inputting, and/or clicking on the first client. Determining the target data according to the user behavior data may be intercepting part of the user behavior data in the user behavior data to obtain intercepted user behavior data, and then encoding the intercepted user behavior data to obtain the target data.
For example, user behavior data may include entering an interface, the time of entering the interface, a click operation on a link, the time at which the click operation on the link occurred, a request action, a request time, the time intervals between individual behaviors, the time of exiting an interface, and so forth. Intercepting part of the user behavior data may mean intercepting all actions between two request actions, or intercepting all operations between entering an interface and exiting it. Encoding the intercepted user behavior data may mean applying WOE (Weight of Evidence) encoding to it.
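By way of illustration only, WOE encoding of a categorical field may be sketched in Python as follows. The sketch assumes one common convention, WOE = ln(share of normal records in the category / share of abnormal records in the category), fitted on labeled historical records; some references use the inverse ratio. The names `fit_woe` and `encode`, the smoothing constant and the sample categories are hypothetical and are not taken from the application.

```python
import math
from collections import Counter

def fit_woe(values, labels, smoothing=0.5):
    """Build a category -> WOE table from labeled records (0 = normal, 1 = abnormal)."""
    normal = Counter(v for v, y in zip(values, labels) if y == 0)
    abnormal = Counter(v for v, y in zip(values, labels) if y == 1)
    total_normal = sum(normal.values())
    total_abnormal = sum(abnormal.values())
    categories = set(normal) | set(abnormal)
    k = len(categories)
    table = {}
    for category in categories:
        # Additive smoothing (an assumption) keeps both shares above zero,
        # so the logarithm is always defined.
        share_normal = (normal[category] + smoothing) / (total_normal + smoothing * k)
        share_abnormal = (abnormal[category] + smoothing) / (total_abnormal + smoothing * k)
        table[category] = math.log(share_normal / share_abnormal)
    return table

def encode(values, table, default=0.0):
    """Replace each category with its WOE value; unseen categories get `default`."""
    return [table.get(v, default) for v in values]
```

Under this convention a positive value marks a category seen mostly in normal records and a negative value marks one seen mostly in abnormal records.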
S202: Inputting the target data into the trained random forest model to obtain the result of each classification tree in the trained random forest model.
In this step, the trained random forest model includes a plurality of classification trees, and the result of each classification tree is either "normal behavior" or "abnormal behavior".
S203: Determining a classification result corresponding to the user behavior data according to the proportion of abnormal results among the results of all the classification trees.
In this step, the abnormal result may be the "abnormal behavior" result in the above-described step S202.
For example, the risk level may be determined from the proportion of abnormal results among all results, e.g., 0% to 25% is no risk, 25% to 50% is low risk, 50% to 75% is medium risk, and 75% to 100% is high risk. Specifically, if the trained random forest model includes 100 classification trees and 30 of them give abnormal results, the proportion of abnormal results is 30%, and the classification result of the user behavior data can be determined as low risk; if 60 of the 100 classification trees give abnormal results, the proportion is 60%, and the classification result can be determined as medium risk. The above criteria for risk levels are merely illustrative and are not intended to limit the embodiments of the present application; other implementations will readily occur to those skilled in the art upon reading the embodiments of the present application.
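The mapping from tree votes to a risk level described above may be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names are hypothetical, and since the text does not say to which band a boundary value such as exactly 25% belongs, the sketch assumes half-open bands [lower, upper).

```python
# (upper bound, label) pairs; each band is assumed to be half-open: [lower, upper).
RISK_BANDS = [(0.25, "no risk"), (0.50, "low risk"), (0.75, "medium risk")]

def abnormal_ratio(tree_results):
    """Fraction of classification trees whose result is 'abnormal'."""
    if not tree_results:
        raise ValueError("at least one tree result is required")
    abnormal_votes = sum(1 for result in tree_results if result == "abnormal")
    return abnormal_votes / len(tree_results)

def risk_level(ratio):
    """Map the proportion of abnormal results to an illustrative risk level."""
    for upper, label in RISK_BANDS:
        if ratio < upper:
            return label
    return "high risk"

# 30 abnormal votes out of 100 trees fall in the 25%-50% band.
votes = ["abnormal"] * 30 + ["normal"] * 70
print(risk_level(abnormal_ratio(votes)))
```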
S204: Sending the classification result to the second client for display.
In this step, the classification result obtained in step S203 may be sent to the second client, and the second client displays the classification result on a screen.
As can be seen from the description of the above embodiment, in the embodiment of the present application, the user behavior data is input into the trained random forest model to obtain the results of all classification trees in the model, and the classification result is determined according to the proportion of abnormal results among the results of all the classification trees. The trained random forest model is therefore not limited to outputting a yes-or-no result. Because the results of all the classification trees are taken into account, the user behavior data can be classified at a finer granularity, which improves the classification accuracy.
In a possible implementation manner, before the step S201 receives the user behavior data sent by the first client, the method further includes:
S301: Obtaining historical user behavior data.
In this step, the historical user behavior data may be obtained from a data collecting device, which may include a server, a computer, and the like, or by recording user behavior data within a preset time period. The user behavior data may be recorded by using a buried point (event tracking) method, and the recorded user behavior data is stored.
S302: Receiving assignment information sent by the third client, and assigning values to the historical user behavior data according to the assignment information to obtain assigned historical user behavior data, wherein the assigned historical user behavior data comprises normal data and abnormal data.
In this step, assigning values to the historical user behavior data may include converting data in the historical user behavior data into corresponding numerical values, and may further include adding to the historical user behavior data a label indicating whether the data is abnormal data.
For example, the values obtained by converting the data in the historical user behavior data are not particularly limited in the present application. Request time data with a request duration of 30 seconds or less may be converted to 1, request time data with a request duration of 30 seconds to 40 seconds to 2, and request time data with a request duration of 40 seconds to 60 seconds to 3. Adding the label may include adding "user behavior data is normal" or "user behavior data is abnormal" to the historical user behavior data, where "user behavior data is normal" may be replaced with "0" and "user behavior data is abnormal" may be replaced with "1". It may also include adding risk level data, such as "no risk", "low risk", "medium risk", or "high risk", to the historical user behavior data, and the risk level data may likewise be replaced with numbers, such as "0", "1", "2", and "3", respectively.
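The assignment in this example may be sketched as follows. The field names `request_time` and `label` and the function names are hypothetical. The text does not say how the shared boundaries (exactly 40 seconds) or durations above 60 seconds are handled, so the sketch assumes upper-inclusive bins and raises an error above 60 seconds.

```python
# (upper bound in seconds, assigned value); bins are assumed to be upper-inclusive.
REQUEST_TIME_BINS = [(30, 1), (40, 2), (60, 3)]

NORMALITY_CODES = {"user behavior data is normal": 0, "user behavior data is abnormal": 1}
RISK_CODES = {"no risk": 0, "low risk": 1, "medium risk": 2, "high risk": 3}

def assign_request_time(seconds):
    """Convert a request duration into the illustrative values 1, 2 or 3."""
    for upper, value in REQUEST_TIME_BINS:
        if seconds <= upper:
            return value
    # The example gives no value for durations above 60 seconds.
    raise ValueError("no value is defined for request durations above 60 seconds")

def assign_record(record, label):
    """Return a copy of `record` with a numeric request time and a numeric label."""
    assigned = dict(record)
    assigned["request_time"] = assign_request_time(record["request_time"])
    assigned["label"] = NORMALITY_CODES[label]
    return assigned
```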
S303: Determining standard historical user behavior data according to the assigned historical user behavior data.
In this step, determining the standard historical user behavior data may include increasing the abnormal data in the assigned historical user behavior data by using the SMOTE (Synthetic Minority Over-sampling Technique) method, and using the assigned historical user behavior data with the increased abnormal data as the standard historical user behavior data.
For example, if the total data volume of the current assigned historical user behavior data is 100, of which 90 are normal data and 10 are abnormal data, the SMOTE method can be used to increase the abnormal data to 90, so that the total data volume reaches 180. The amount by which the abnormal data is increased can be preset: generation stops once the abnormal data reaches a preset proportion of the total data volume or a preset number.
S304: Performing random forest model training by using the standard historical user behavior data to obtain a trained random forest model.
In this step, performing random forest model training may include constructing a classification tree in the random forest model by using different data types, where the data types are data types in the standard historical user behavior data, such as request time and request object.
It can be known from the description of the above embodiment that, because the historical user behavior data has less abnormal data, the embodiment of the present application assigns values to the historical user behavior data, and adds the assigned abnormal data in the historical user behavior data to obtain the standard historical user behavior data, thereby ensuring the accuracy of the model obtained by training, and further improving the classification accuracy.
In a possible implementation manner, the determining the standard historical user behavior data according to the assigned historical user behavior data in step S303 includes:
S3031: Calculating new abnormal data by taking all the abnormal data as basic data, and adding the new abnormal data to the abnormal data until the data volume of the abnormal data reaches a first preset proportion of the assigned historical user behavior data.
In a possible implementation manner, calculating new abnormal data by taking all the abnormal data as basic data specifically includes: taking any abnormal data in all the abnormal data as basic data, taking another abnormal data in all the abnormal data as auxiliary data, and performing a difference on the basic data and the auxiliary data to obtain new abnormal data.
In this step, since the abnormal data has assignments, new abnormal data can be obtained by performing a difference on the two abnormal data, where the difference can be a difference for each data type of the abnormal data to obtain difference results for all data types, and the new abnormal data can be obtained by combining all the difference results.
Specifically, for example, there are currently 10 abnormal data, where the content of the first abnormal data includes:
first request operation 1, first request time 30 seconds, second request operation 2, second request time 10 seconds, exit operation 0.
The contents of the second anomaly data include:
first request operation 1, first request time 20 seconds, second request operation 2, second request time 5 seconds, exit operation 0.
The difference may be performed between the first abnormal data and the second abnormal data, and the difference method for performing the difference on the first request time may be to use the first abnormal data as basic data and the second abnormal data as auxiliary data, and bring the data of the same data type in the first abnormal data and the second abnormal data into the following formula to obtain a value of the new abnormal data of the same data type:
$$x_k = x_i + \gamma\,(x_i - x_j)$$
where $x_k$ denotes the value of the new abnormal data for a given data type, $x_i$ denotes the value of the first abnormal data (the basic data) for that data type, $x_j$ denotes the value of the second abnormal data (the auxiliary data) for that data type, and $\gamma$ represents a random number between 0 and 1.
The difference results in new anomaly data such as:
first request operation 1, first request time 14 seconds, second request operation 2, second request time 7 seconds, exit operation 0.
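The per-data-type difference may be sketched as follows. This is a sketch under stated assumptions: it assumes the update x_k = x_i + γ·(x_i − x_j), with x_i taken from the basic data, x_j from the auxiliary data, and a single γ drawn per new record. The field names and the function name `synthesize` are hypothetical. The classic SMOTE formulation instead interpolates toward the neighbour, x_i + γ·(x_j − x_i); switching between the two is a one-line change.

```python
import random

def synthesize(base, aux, gamma=None, rng=random):
    """Create one new record from `base` and `aux`, field by field.

    Assumed update per field: x_k = x_i + gamma * (x_i - x_j),
    where x_i comes from `base` and x_j from `aux`.
    """
    if base.keys() != aux.keys():
        raise ValueError("both records must have the same data types")
    if gamma is None:
        gamma = rng.random()  # random number in [0, 1)
    return {key: base[key] + gamma * (base[key] - aux[key]) for key in base}

first = {"first_request_operation": 1, "first_request_time": 30,
         "second_request_operation": 2, "second_request_time": 10,
         "exit_operation": 0}
second = {"first_request_operation": 1, "first_request_time": 20,
          "second_request_operation": 2, "second_request_time": 5,
          "exit_operation": 0}

new_record = synthesize(first, second, gamma=0.5)
```

Fields on which the two records agree, such as the operation codes here, are unchanged in the new record, because their difference is zero.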
In a possible implementation manner, taking any abnormal data in all abnormal data as basic data, and taking another abnormal data in the abnormal data as auxiliary data specifically include: and finding the other abnormal data which is closest to the abnormal data according to the values of various data types in any abnormal data to serve as auxiliary data.
S3032: Determining the assigned historical user behavior data in which the data volume of the abnormal data has reached the first preset proportion as the standard historical user behavior data.
In this step, the total amount of abnormal data is tracked as new abnormal data is created. When the total amount of abnormal data reaches the first preset proportion of the total amount of the assigned historical user behavior data, the creation of new abnormal data stops, and the assigned historical user behavior data is determined as the standard historical user behavior data.
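Steps S3031 and S3032 taken together may be sketched as the following loop. Several points are assumptions rather than statements of the application: only the original abnormal records serve as basic and auxiliary data, "closest" is measured by squared Euclidean distance over all data types, the per-field update is x_i + γ·(x_i − x_j), and the proportion is measured against the total of normal plus abnormal records. All names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance over the data types of record `a`."""
    return sum((a[key] - b[key]) ** 2 for key in a)

def oversample(normal, abnormal, target_ratio, rng=random):
    """Add synthetic abnormal records until they make up `target_ratio` of all records."""
    if len(abnormal) < 2:
        raise ValueError("at least two abnormal records are required")
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must lie strictly between 0 and 1")
    originals = list(abnormal)
    result = list(abnormal)
    while len(result) / (len(result) + len(normal)) < target_ratio:
        base = rng.choice(originals)
        # Auxiliary data: the original abnormal record closest to the base record.
        others = [record for record in originals if record is not base]
        aux = min(others, key=lambda record: squared_distance(base, record))
        gamma = rng.random()
        result.append({key: base[key] + gamma * (base[key] - aux[key]) for key in base})
    return result

normal = [{"request_time": float(i)} for i in range(90)]
abnormal = [{"request_time": 100.0 + i} for i in range(10)]
grown = oversample(normal, abnormal, target_ratio=0.5, rng=random.Random(0))
```

With 90 normal records, 10 abnormal records and a target proportion of one half, the loop stops once there are 90 abnormal records and 180 records in total.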
From the description of the above embodiment, the embodiment of the application provides a method for increasing the data volume of abnormal data, and the overfitting problem caused by random sampling can be effectively alleviated by increasing the data volume of the abnormal data, so that the classification of the random forest model obtained by training is more accurate.
In a possible implementation manner, the training of the random forest model by using the standard historical user behavior data in step S304 to obtain a trained random forest model specifically includes:
S3041: Sampling a preset amount of data from the standard historical user behavior data with replacement to serve as a training set, and taking the remaining data that were never sampled as a test set.
In this step, sampling with replacement means that a data item is not removed from the standard historical user behavior data after it is drawn, so the same item may be drawn again in subsequent draws.
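Sampling with replacement and the resulting train/test split may be sketched as follows. The function name `bootstrap_split` and the parameter `n_draws` (the preset amount) are hypothetical.

```python
import random

def bootstrap_split(records, n_draws, rng=random):
    """Draw `n_draws` records with replacement as the training set.

    Records that were never drawn form the test set.
    """
    indices = [rng.randrange(len(records)) for _ in range(n_draws)]
    drawn = set(indices)
    train_set = [records[i] for i in indices]  # may contain repeated records
    test_set = [record for i, record in enumerate(records) if i not in drawn]
    return train_set, test_set

records = list(range(100))
train_set, test_set = bootstrap_split(records, n_draws=100, rng=random.Random(0))
```

Because a record can be drawn more than once, some records are typically never drawn even when the number of draws equals the number of records; those records make up the test set.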
S3042: Train the initial random forest model using the training set to obtain a random forest model to be determined.
In this step, the number of classification trees may be preset before model training is performed; by default, the number of classification trees is the maximum possible value.
S3043: If the accuracy of the classification result obtained by inputting the test set into the random forest model to be determined is smaller than a second preset proportion, re-execute the step of determining standard historical user behavior data according to the assigned historical user behavior data, and continue to execute the steps of establishing the training set and the test set and training the model, until the accuracy of the obtained classification result exceeds the second preset proportion, so as to obtain the trained random forest model.
In this step, the classification result may include a risk level, i.e., "no risk", "low risk", "medium risk", or "high risk", and may also include "user behavior data is normal" or "user behavior data is abnormal". The obtained classification result is compared with the actual data of the test set to determine whether the classification is correct.
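The retraining loop of steps S3041 to S3043 can be sketched as follows. All names are hypothetical, and the four callables stand in for the steps described above. Two points are assumptions rather than part of the text: the loop accepts a model whose accuracy equals the second preset proportion (the text says "smaller than" for retrying and "exceeds" for stopping, which leaves equality open), and a `max_rounds` limit is added so the sketch always terminates.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels of the test set."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def train_until_accurate(build_standard_data, split, fit, predict,
                         threshold, max_rounds=10):
    """Repeat data preparation, splitting and training until accuracy is high enough.

    build_standard_data() -> data      (step of determining the standard data)
    split(data) -> (train, test)       (test is a list of (features, label) pairs)
    fit(train) -> model
    predict(model, features) -> label
    """
    for round_no in range(1, max_rounds + 1):
        data = build_standard_data()
        train, test = split(data)
        model = fit(train)
        predicted = [predict(model, features) for features, _ in test]
        acc = accuracy(predicted, [label for _, label in test])
        if acc >= threshold:  # assumed handling of the equality case
            return model, acc, round_no
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```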
As can be seen from the description of the above embodiments, the embodiments of the present application provide a specific method for training a random forest model, so that a trained random forest model can be obtained through training when no such model is available beforehand.
Fig. 3 is a schematic diagram of a behavior data classification processing apparatus according to an embodiment of the present application. As shown in fig. 3, the behavior data classification processing apparatus 300 includes a target data determining module 301, a result obtaining module 302, a classification result obtaining module 303, and a classification result sending module 304.
The target data determining module 301 is configured to receive user behavior data sent by the first client, and determine target data according to the user behavior data.
The result obtaining module 302 is configured to input the target data into the trained random forest model, and obtain the result of each classification tree in the trained random forest model.
The classification result obtaining module 303 is configured to determine a classification result corresponding to the user behavior data according to the ratio of abnormal results to all results among the results of the classification trees.
The classification result sending module 304 is configured to send the classification result to the second client for display.
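The way the classification result obtaining module 303 turns the per-tree results into a classification result can be illustrated with the sketch below. The function names and the cutoff values 0.25, 0.5 and 0.75 are purely hypothetical and are chosen only to show how a ratio of abnormal results could be mapped to the risk levels mentioned earlier.

```python
def abnormal_ratio(tree_results):
    """Share of classification trees whose result is "abnormal"."""
    if not tree_results:
        raise ValueError("tree_results must not be empty")
    return sum(1 for r in tree_results if r == "abnormal") / len(tree_results)


def risk_level(ratio, cutoffs=((0.75, "high risk"),
                               (0.5, "medium risk"),
                               (0.25, "low risk"))):
    """Map the ratio to a risk level; the cutoffs are illustrative assumptions."""
    for lower_bound, label in cutoffs:
        if ratio >= lower_bound:
            return label
    return "no risk"
```

For instance, if 6 of 10 classification trees report an abnormal result, the ratio is 0.6, which these illustrative cutoffs map to "medium risk".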
Fig. 4 is a schematic diagram of a behavior data classification processing apparatus according to an embodiment of the present application. As shown in fig. 4, the behavior data classification processing apparatus 300 further includes a behavior data obtaining module 305, an assignment data obtaining module 306, a standard data determining module 307, and a model obtaining module 308.
The behavior data obtaining module 305 is configured to obtain historical user behavior data.
The assignment data obtaining module 306 is configured to receive assignment information sent by the third client, and assign values to the historical user behavior data according to the assignment information to obtain assigned historical user behavior data, where the assigned historical user behavior data includes normal data and abnormal data.
The standard data determining module 307 is configured to determine standard historical user behavior data according to the assigned historical user behavior data.
The model obtaining module 308 is configured to perform random forest model training using the standard historical user behavior data to obtain a trained random forest model.
With continued reference to fig. 4, the standard data determining module 307 is specifically configured to calculate new abnormal data based on all the abnormal data, and add the new abnormal data to the abnormal data until the amount of abnormal data reaches the first preset proportion of the assigned historical user behavior data; and to determine the assigned historical user behavior data in which the amount of abnormal data has reached the first preset proportion as the standard historical user behavior data.
With continued reference to fig. 4, the standard data determining module 307 is further configured to take any abnormal data among all the abnormal data as basic data, take another abnormal data among all the abnormal data as auxiliary data, and take the difference between the basic data and the auxiliary data to obtain new abnormal data.
With continued reference to fig. 4, the model obtaining module 308 is further configured to extract a preset amount of data from the standard historical user behavior data with replacement as a training set, and use the remaining data that was never extracted as a test set; to train the initial random forest model using the training set to obtain a random forest model to be determined; and, if the accuracy of the classification result obtained by inputting the test set into the random forest model to be determined is smaller than a second preset proportion, to re-execute the step of determining standard historical user behavior data according to the assigned historical user behavior data and continue to execute the steps of establishing the training set and the test set and training the model, until the accuracy of the obtained classification result exceeds the second preset proportion, so as to obtain the trained random forest model.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. For example, referring to fig. 5, the electronic device 500 may include a processor 501 and a memory 502 communicatively coupled to the processor 501.
The memory 502 stores computer-executable instructions.
The processor 501 executes computer-executable instructions stored in the memory 502 to implement the data classification processing method provided by any of the embodiments described above.
Optionally, the memory 502 may be separate from or integrated with the processor 501. When the memory 502 is a device separate from the processor 501, the electronic device may further include a bus for connecting the memory 502 and the processor 501.
The present application further provides a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the technical solution of the data classification processing method in any of the above embodiments is implemented. The implementation principle and beneficial effects are similar to those of the data classification processing method, to which reference may be made, and are not described herein again.
The present application further provides a computer program product including a computer program. When the computer program is executed by a processor, the technical solution of the data classification processing method in any of the above embodiments is implemented. The implementation principle and beneficial effects are similar to those of the data classification processing method, to which reference may be made, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a logical division, and other divisions are possible in practice; a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in another form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to implement the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one unit. The unit formed by the modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM, and may further comprise a non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A behavior data classification processing method is applied to a server and comprises the following steps:
receiving user behavior data sent by a first client, and determining target data according to the user behavior data;
inputting the target data into a trained random forest model to obtain the result of each classification tree in the trained random forest model;
determining a classification result corresponding to the user behavior data according to the proportion of abnormal results in the results of all the classification trees to all the results;
and sending the classification result to a second client side for displaying.
2. The method of claim 1, wherein before receiving the user behavior data sent by the first client, the method further comprises:
acquiring historical user behavior data;
receiving assignment information sent by a third client, and assigning values to the historical user behavior data according to the assignment information to obtain assigned historical user behavior data, wherein the assigned historical user behavior data comprises normal data and abnormal data;
determining standard historical user behavior data according to the assigned historical user behavior data;
and performing random forest model training by using the standard historical user behavior data to obtain the trained random forest model.
3. The method of claim 2, wherein determining standard historical user behavior data from the assigned historical user behavior data comprises:
calculating new abnormal data based on all the abnormal data, and adding the new abnormal data into the abnormal data until the data volume of the abnormal data reaches a first preset proportion of the assigned historical user behavior data;
and determining the assigned historical user behavior data with the abnormal data volume reaching the first preset proportion as standard historical user behavior data.
4. The method of claim 3, wherein calculating new abnormal data based on all the abnormal data comprises:
taking any abnormal data in all the abnormal data as basic data, taking another abnormal data in all the abnormal data as auxiliary data, and taking the difference between the basic data and the auxiliary data to obtain new abnormal data.
5. A method as claimed in any one of claims 2 to 4, wherein the training of a random forest model using the standard historical user behavior data to obtain the trained random forest model comprises:
extracting a preset amount of data from the standard historical user behavior data with replacement to serve as a training set, and taking the remaining data which are not extracted to serve as a test set;
training an initial random forest model by using the training set to obtain a random forest model to be determined;
and if the accuracy of the classification result obtained by inputting the test set into the random forest model to be determined is smaller than a second preset proportion, re-executing the step of determining standard historical user behavior data according to the assigned historical user behavior data, and continuing to execute the steps of establishing the training set and the test set and training the model until the accuracy of the obtained classification result exceeds the second preset proportion, so as to obtain the trained random forest model.
6. A behavior data classification processing apparatus, comprising:
the target data determining module is used for receiving the user behavior data sent by the first client and determining the target data according to the user behavior data;
the result obtaining module is used for inputting the target data into the trained random forest model to obtain the result of each classification tree in the trained random forest model;
a classification result obtaining module, configured to determine a classification result corresponding to the user behavior data according to a ratio of abnormal results to all results in the results of each classification tree;
and the classification result sending module is used for sending the classification result to a second client side for displaying.
7. The apparatus of claim 6, further comprising:
the behavior data acquisition module is used for acquiring historical user behavior data;
the assigned data acquisition module is used for receiving assignment information sent by a third client, and assigning the historical user behavior data according to the assignment information to obtain assigned historical user behavior data, wherein the assigned historical user behavior data comprises normal data and abnormal data;
the standard data determining module is used for determining standard historical user behavior data according to the assigned historical user behavior data;
and the model obtaining module is used for carrying out random forest model training by using the standard historical user behavior data to obtain the trained random forest model.
8. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored by the memory to implement the behavioral data classification processing method of any one of claims 1 to 5.
9. A computer-readable storage medium having stored therein computer-executable instructions for implementing the behavioral data classification processing method according to any one of claims 1 to 5 when executed by a processor.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the behavioural data classification processing method as claimed in any one of claims 1 to 5.
CN202111506939.1A 2021-12-10 2021-12-10 Behavior data classification processing method, device, equipment and storage medium Pending CN114169451A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111506939.1A CN114169451A (en) 2021-12-10 2021-12-10 Behavior data classification processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111506939.1A CN114169451A (en) 2021-12-10 2021-12-10 Behavior data classification processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114169451A true CN114169451A (en) 2022-03-11

Family

ID=80485595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111506939.1A Pending CN114169451A (en) 2021-12-10 2021-12-10 Behavior data classification processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114169451A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996769A (en) * 2022-08-08 2022-09-02 西安晟昕科技发展有限公司 Data preprocessing and storing method

Similar Documents

Publication Publication Date Title
CN109345417B (en) Online assessment method and terminal equipment for business personnel based on identity authentication
CN108876464B (en) Cheating behavior detection method and device, service equipment and storage medium
CN113268641B (en) User data processing method based on big data and big data server
CN110830234A (en) User traffic distribution method and device
CN110704677A (en) Program recommendation method and device, readable storage medium and terminal equipment
CN109271315B (en) Script code detection method, script code detection device, computer equipment and storage medium
CN111273891A (en) Business decision method and device based on rule engine and terminal equipment
CN114169451A (en) Behavior data classification processing method, device, equipment and storage medium
CN111475494A (en) Mass data processing method, system, terminal and storage medium
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
CN112465565B (en) User portrait prediction method and device based on machine learning
CN113010785A (en) User recommendation method and device
CN110737509A (en) Thermal migration processing method and device, storage medium and electronic equipment
CN110704614A (en) Information processing method and device for predicting user group type in application
CN113032278B (en) Application running mode, and method and device for confirming grade of terminal equipment
CN114265740A (en) Error information processing method, device, equipment and storage medium
CN110297989B (en) Test method, device, equipment and medium for anomaly detection
CN114513686A (en) Method and device for determining video information and storage medium
CN111263351A (en) Service processing method, service processing device, electronic device and storage medium
CN113301597B (en) Network analysis method and equipment
CN113905400B (en) Network optimization processing method and device, electronic equipment and storage medium
CN111400174B (en) Method and device for determining application efficiency of data source and server
CN112261484B (en) Target user identification method and device, electronic equipment and storage medium
CN114265537B (en) Display method and device for use mode of data center monitoring system
CN113051128B (en) Power consumption detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination