WO2021189975A1 - Machine behavior recognition method, apparatus, device, and computer-readable storage medium - Google Patents

Machine behavior recognition method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2021189975A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
machine behavior
sample data
target
machine
Application number
PCT/CN2020/136324
Inventor
Zhang Qiulei (张秋蕾)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
                  • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/22 Matching criteria, e.g. proximity measures
              • G06F18/23 Clustering techniques
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
                • G06F18/243 Classification techniques relating to the number of classes
                  • G06F18/24323 Tree-organised classifiers
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning

Definitions

  • This application relates to the field of information security technology, and in particular to a machine behavior recognition method, device, equipment, and computer-readable storage medium.
  • Information security risk control measures can be used to identify the behavior of external or internal machines accessing a server.
  • Traditional information security risk control measures usually set rules based on statistical knowledge combined with expert experience, and use those rules to identify the behavior of machines accessing the server externally. The inventor realized that this approach is limited by the depth and breadth of knowledge of the security experts who write the rules, and is inflexible.
  • Moreover, new network attacks keep emerging, and machine behavior varies accordingly.
  • Traditional information security risk control measures therefore cannot accurately identify machine behavior, and data security needs to be improved. How to improve the recognition accuracy of machine behavior, and thereby data security, is a problem that urgently needs to be solved.
  • An embodiment of the present application provides a machine behavior recognition method, including:
  • the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
  • the first machine behavior recognition result and the second machine behavior recognition result are input to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
  • An embodiment of the present application also provides a machine behavior recognition device, and the machine behavior recognition device includes:
  • the machine behavior recognition module is used to input the target data into the machine behavior recognition model to obtain the machine behavior recognition result of the target data.
  • An embodiment of the present application also provides a computer device, the computer device including a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein, when the computer program is executed by the processor, the following steps are implemented:
  • the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
  • the first machine behavior recognition result and the second machine behavior recognition result are input to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
  • The embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the following steps are implemented:
  • the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
  • the first machine behavior recognition result and the second machine behavior recognition result are input to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
  • FIG. 1 is a schematic flowchart of a machine behavior recognition method provided by an embodiment of the present application;
  • FIG. 2 is a hierarchical schematic diagram of a machine behavior recognition model in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of another machine behavior recognition method provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of the sub-steps of the machine behavior recognition method in FIG. 3;
  • FIG. 5 is a schematic block diagram of a machine behavior recognition device provided by an embodiment of the present application;
  • FIG. 6 is a schematic structural block diagram of a computer device provided by an embodiment of the present application.
  • The embodiments of the present application provide a machine behavior recognition method, device, equipment, and computer-readable storage medium.
  • The machine behavior recognition method can be applied to terminal devices such as mobile phones, tablet computers, notebook computers, desktop computers, personal digital assistants, and wearable devices.
  • The machine behavior recognition method can also be applied to servers.
  • A server can be a single server or a server cluster composed of multiple servers.
  • FIG. 1 is a schematic flowchart of a machine behavior recognition method provided by an embodiment of the present application. As shown in Fig. 1, the machine behavior recognition method includes steps S101 to S105.
  • Step S101: Obtain the target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer.
  • Specifically, obtain the current access request to the server and its current access time point, and extract the IP address, user account, and user permission from the current access request; then, based on the IP address, obtain multiple historical access requests and the access time point of each, yielding multiple historical access time points.
  • The IP address of each historical access request is the same as the IP address in the current access request. The IP address, user account, user permission, current access time point, and each historical access time point together form one piece of target data to be recognized.
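Assembling one piece of target data from the current request and its same-IP history can be sketched as follows. This is a minimal illustration assuming an in-memory list of historical requests; the `AccessRequest` and `TargetData` classes and the `build_target_data` function are hypothetical names, not part of the application.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class AccessRequest:
    # Hypothetical request shape; field names are illustrative only.
    ip: str
    account: str
    permission: str
    time: datetime


@dataclass
class TargetData:
    # One piece of target data: identity fields plus current and historical access times.
    ip: str
    account: str
    permission: str
    current_time: datetime
    history_times: List[datetime] = field(default_factory=list)


def build_target_data(current: AccessRequest, history: List[AccessRequest]) -> TargetData:
    """Keep only historical requests with the same IP as the current request."""
    same_ip_times = sorted(r.time for r in history if r.ip == current.ip)
    return TargetData(
        ip=current.ip,
        account=current.account,
        permission=current.permission,
        current_time=current.time,
        history_times=same_ip_times,
    )
```

Sorting the historical time points is a convenience for later interval-based features; the application does not say whether they are ordered.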
  • The machine behavior recognition model is stored in a terminal device or a server and is obtained by fusion training of a preset random forest tree model and a preset gradient descent tree model. As shown in FIG. 2, the model includes a Principal Component Analysis (PCA) layer, a Gradient Boosting Decision Tree (GBDT) model layer, a Random Forest (RF) model layer, and a Logistic Regression (LR) model layer.
  • PCA Principal Component Analysis
  • GBDT Gradient Boosting Decision Tree
  • RF Random Forest
  • LR Logistic Regression
  • The PCA layer is connected in series with the GBDT model layer;
  • the RF model layer is connected in parallel with the PCA layer and the GBDT model layer;
  • the RF model layer is connected in series with the LR model layer;
  • the GBDT model layer is connected in series with the LR model layer.
  • The PCA layer is used to reduce the dimensionality of the target data.
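The layer topology above can be written as plain function composition. This sketch assumes each layer is a callable; `recognize` and the stub layers in the usage are hypothetical, and the score order [uncertain, machine, non-machine] follows steps S103 and S104.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scores = List[float]  # one score each for [uncertain, machine, non-machine]


def recognize(
    x: Vector,
    pca: Callable[[Vector], Vector],
    gbdt: Callable[[Vector], Scores],
    rf: Callable[[Vector], Scores],
    lr: Callable[[Scores, Scores], str],
) -> str:
    """PCA -> GBDT in series; RF in parallel on the raw input; both feed the LR layer."""
    first = gbdt(pca(x))  # branch 1: dimension-reduced features into the GBDT layer
    second = rf(x)        # branch 2: raw, un-reduced features into the RF layer
    return lr(first, second)
```

The key design point is that only the GBDT branch sees reduced features, while the RF branch works on the original feature vector.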
  • Step S102: Input the target data into the principal component analysis layer for processing, to obtain the principal component features of the target data.
  • Because the gradient descent tree model is not well suited to high-dimensional feature data, the dimensionality of the target data needs to be reduced. The target data is therefore input into the principal component analysis layer to obtain its principal component features, which reduces the number of dimensions and prepares the data for subsequent processing by the gradient descent tree model layer.
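The application does not say how the PCA layer is implemented. As an illustration only, this pure-Python sketch finds the top-k principal components by power iteration with deflation on the sample covariance matrix, a standard textbook technique swapped in for whatever library the authors used; `pca_fit` and `pca_transform` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pca_fit(data: Matrix, k: int, iters: int = 200, seed: int = 0) -> Tuple[List[float], Matrix]:
    """Return (mean, components): column means and the top-k principal directions."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        # Start from a random unit vector and repeatedly multiply by the covariance matrix.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to direction v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, components


def pca_transform(x: Sequence[float], mean: Sequence[float], components: Matrix) -> List[float]:
    """Project one sample onto the principal directions (its principal component features)."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * xi for c, xi in zip(comp, centered)) for comp in components]
```

The sign of each component is arbitrary, as with any PCA implementation; in practice a library such as scikit-learn would normally be used instead.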
  • Step S103: Input the principal component features into the gradient descent tree model layer for processing, to obtain the first machine behavior recognition result of the target data.
  • Specifically, the gradient descent tree model layer processes the principal component features of the target data and outputs the probability that the target data belongs to the uncertain behavior class, the probability that it belongs to the machine behavior class, and the probability that it belongs to the non-machine behavior class; the first machine behavior recognition result of the target data is output according to these three probabilities.
  • For example, if these probabilities are 75%, 60%, and 95% respectively, the non-machine behavior class has the highest probability, so the first machine behavior recognition result is that the target data belongs to the non-machine behavior class.
  • As another example, if these probabilities are 98%, 60%, and 30% respectively, the uncertain behavior class has the highest probability, so the first machine behavior recognition result is that the target data belongs to the uncertain behavior class.
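Both base models pick the class with the highest score. The example figures (75% + 60% + 95%) do not sum to 100%, so they behave like independent per-class scores rather than a normalized distribution; taking the maximum works either way. A minimal sketch with a hypothetical `recognition_result` helper:

```python
from typing import Sequence

# Score order used in steps S103 and S104.
CLASSES = ("uncertain", "machine", "non-machine")


def recognition_result(scores: Sequence[float]) -> str:
    """Return the class whose score is highest."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best]
```

With the figures from steps S103 and S104, `[0.75, 0.60, 0.95]` maps to the non-machine behavior class and `[0.55, 0.93, 0.70]` to the machine behavior class.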
  • Step S104: Input the target data into the random forest tree model layer for processing, to obtain the second machine behavior recognition result of the target data.
  • Specifically, the target data, without dimensionality reduction, is input into the random forest tree model layer, which outputs the probabilities that the target data belongs to the uncertain behavior class, the machine behavior class, and the non-machine behavior class; the second machine behavior recognition result is output according to these three probabilities.
  • For example, if these probabilities are 55%, 93%, and 70% respectively, the machine behavior class has the highest probability, so the second machine behavior recognition result is that the target data belongs to the machine behavior class.
  • As another example, if these probabilities are 98%, 60%, and 30% respectively, the uncertain behavior class has the highest probability, so the second machine behavior recognition result is that the target data belongs to the uncertain behavior class.
  • Step S105: Input the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing, to obtain the machine behavior recognition result of the target data.
  • After the first and second machine behavior recognition results of the target data are obtained, they are input into the logistic regression model layer for fusion processing, as follows: when the first and second results differ, the target data is classified as the uncertain behavior class; when both results classify the target data as the machine behavior class, the target data is classified as the machine behavior class; and when both results classify the target data as the non-machine behavior class, the target data is classified as the non-machine behavior class.
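Although step S105 names a logistic regression layer, the fusion described here is a decision rule. The sketch below implements exactly that rule; the case where both results are "uncertain" is not spelled out in the text, and treating it as uncertain is an assumption. `fuse` is a hypothetical name.

```python
UNCERTAIN, MACHINE, NON_MACHINE = "uncertain", "machine", "non-machine"


def fuse(first: str, second: str) -> str:
    """Combine the GBDT-branch result and the RF-branch result as described in step S105."""
    if first != second:
        return UNCERTAIN  # disagreement -> uncertain behavior class
    # Agreement: machine/machine -> machine, non-machine/non-machine -> non-machine.
    # uncertain/uncertain is not covered by the text; it stays uncertain here (assumption).
    return first
```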
  • The machine behavior recognition result is uploaded to the blockchain for storage.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
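The idea of "data blocks linked by cryptographic methods" can be illustrated with a toy append-only hash chain for recognition results. This sketch covers only the linking and the tamper check; it has no distribution, consensus, or point-to-point transport, and `make_block`, `append`, and `verify` are hypothetical names rather than any blockchain platform's API.

```python
import hashlib
import json
import time
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON so the same content always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(prev_hash: str, payload: Dict[str, Any], timestamp: float) -> Dict[str, Any]:
    body = {"prev_hash": prev_hash, "payload": payload, "timestamp": timestamp}
    return {**body, "hash": _digest(body)}


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any],
           timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Append a block whose prev_hash points at the current tip of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(make_block(prev, payload, time.time() if timestamp is None else timestamp))
    return chain[-1]


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited block breaks its own hash or the next link."""
    prev = GENESIS
    for block in chain:
        body = {k: block[k] for k in ("prev_hash", "payload", "timestamp")}
        if block["prev_hash"] != prev or block["hash"] != _digest(body):
            return False
        prev = block["hash"]
    return True
```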
  • In summary, the machine behavior recognition method obtains the target data to be recognized and a machine behavior recognition model that includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer; inputs the target data into the principal component analysis layer to obtain its principal component features; inputs the principal component features into the gradient descent tree model layer to obtain the first machine behavior recognition result; at the same time, inputs the target data into the random forest tree model layer to obtain the second machine behavior recognition result; and finally inputs the first and second machine behavior recognition results into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data. This improves the recognition accuracy of machine behavior and thereby data security.
  • FIG. 3 is a schematic flowchart of another machine behavior recognition method provided by an embodiment of the present application.
  • As shown in FIG. 3, the machine behavior recognition method includes steps S201 to S208.
  • Step S201: Obtain a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set.
  • The machine behavior data set includes only machine behavior data;
  • the non-machine behavior data set includes only non-machine behavior data;
  • the uncertain behavior data set includes only uncertain behavior data, that is, data for which it cannot be determined whether it represents machine behavior or non-machine behavior.
  • The machine behavior data set, non-machine behavior data set, and uncertain behavior data set may be obtained as follows: obtain the server's log data set, machine behavior recognition rules, and non-machine behavior recognition rules; extract the machine behavior data set from the log data set according to the machine behavior recognition rules; extract the non-machine behavior data set from the log data set according to the non-machine behavior recognition rules; and remove the machine behavior data set and the non-machine behavior data set from the log data set to obtain the uncertain behavior data set.
  • The machine behavior recognition rules are determined based on characteristic information of machine behavior;
  • the non-machine behavior recognition rules are determined based on characteristic information of non-machine behavior;
  • the characteristic information of both machine and non-machine behavior is derived from the experience of security experts.
  • Characteristic information of machine behavior includes, but is not limited to, access at regular intervals, access to honeypot links, and remote logins.
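The rule-based split into three data sets can be sketched as below. The log record fields (`hit_honeypot`, `intervals`, `passed_captcha`), the 5% regularity tolerance, and the non-machine rule are all hypothetical stand-ins for the expert-written rules, which the application does not disclose.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def is_regular(intervals: Sequence[float], tolerance: float = 0.05) -> bool:
    """Hypothetical 'regular visits' test: gaps between consecutive requests barely
    vary relative to their mean (coefficient of variation <= tolerance)."""
    if len(intervals) < 3:
        return False
    m = mean(intervals)
    return m > 0 and pstdev(intervals) / m <= tolerance


def machine_rule(r: Record) -> bool:
    # Two machine traits named in the text: honeypot links and regular visits.
    # (A remote-login check is omitted from this sketch.)
    return bool(r.get("hit_honeypot")) or is_regular(r.get("intervals", []))


def non_machine_rule(r: Record) -> bool:
    # Hypothetical placeholder for an expert-written non-machine rule.
    return bool(r.get("passed_captcha"))


def partition(log: List[Record], machine: Rule = machine_rule, non_machine: Rule = non_machine_rule):
    """Return (machine_set, non_machine_set, uncertain_set). Each rule is applied to the
    whole log, as in the text; the uncertain set is whatever neither rule matched."""
    machine_set = [r for r in log if machine(r)]
    non_machine_set = [r for r in log if non_machine(r)]
    uncertain_set = [r for r in log if not machine(r) and not non_machine(r)]
    return machine_set, non_machine_set, uncertain_set
```

Because both rules are applied independently, a record could match both; the text does not say how such overlaps are resolved.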
  • Step S202: Determine a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set.
  • The target sample data set includes machine behavior data, non-machine behavior data, and uncertain behavior data in equal numbers.
  • The target sample data set may be determined as follows: cluster the samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories; determine the distribution information of the machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories; and, when the distribution information meets a preset distribution condition, determine the target sample data set according to the sample data sets of the multiple categories.
  • By clustering the samples in these three data sets, machine behavior data and non-machine behavior data can be further identified within the uncertain behavior data set.
  • A clustering algorithm may be used to cluster the samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories. When the distribution information does not satisfy the preset distribution condition, the parameters of the clustering algorithm are updated, and the samples are clustered again with the updated algorithm to obtain new sample data sets of multiple categories.
  • The parameters of the clustering algorithm include the number of cluster categories and the amount of data participating in the clustering.
  • Clustering algorithms include, but are not limited to, K-Means, Mini-Batch K-Means, mean shift, and the density-based clustering algorithm DBSCAN.
  • Requiring the distribution of machine behavior data, non-machine behavior data, and uncertain behavior data in the clustered sample data sets to meet the preset distribution condition improves the accuracy of the sample data sets.
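The re-clustering loop might look like the following sketch, which uses plain Lloyd's K-Means (one of the listed algorithms) and varies only the number of clusters; the application also allows changing the amount of data that participates. `kmeans`, `cluster_until`, the `k_values` range, and the `condition` callback are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-Means; returns the cluster index of each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_until(
    points: List[Point],
    condition: Callable[[List[int]], bool],
    k_values: Sequence[int] = range(2, 9),
) -> Optional[Tuple[int, List[int]]]:
    """Re-cluster with an updated parameter (here: the number of clusters) until the
    distribution condition holds; returns (k, labels), or None if no k satisfies it."""
    for k in k_values:
        if k > len(points):
            break
        labels = kmeans(points, k)
        if condition(labels):
            return k, labels
    return None
```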
  • The sample data sets of the multiple categories include: a first sample data set that contains only uncertain behavior data; a second sample data set that contains machine behavior data and uncertain behavior data but no non-machine behavior data; a third sample data set that contains non-machine behavior data; and a fourth sample data set.
  • The preset distribution condition is as follows:
  • the first ratio, of the number of samples in the first sample data set to the total number of samples, is within a first preset ratio range;
  • the second ratio, of the number of samples in the second sample data set to the total number of samples, is within a second preset ratio range;
  • the third ratio, of the number of samples in the third sample data set to the total number of samples, is within a third preset ratio range;
  • the fourth ratio, of the number of samples in the fourth sample data set to the total number of samples, is within a fourth preset ratio range.
  • The sum of the first, second, third, and fourth ratios is 1. The first, second, third, and fourth preset ratio ranges can be set based on actual conditions, and this application does not specifically limit them.
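Checking the preset distribution condition comes down to four ratio comparisons. In this sketch the ratio ranges are caller-supplied placeholders, since the application leaves them to actual conditions; `meets_distribution` is a hypothetical name.

```python
from typing import Sequence, Tuple


def meets_distribution(counts: Sequence[int], ranges: Sequence[Tuple[float, float]]) -> bool:
    """counts[i]: size of the (i+1)-th sample data set; ranges[i]: its (low, high) preset ratio range."""
    total = sum(counts)
    if total == 0 or len(counts) != len(ranges):
        return False
    ratios = [c / total for c in counts]  # these sum to 1 by construction
    return all(lo <= r <= hi for r, (lo, hi) in zip(ratios, ranges))
```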
  • The target sample data set may be determined from the sample data sets of the multiple categories as follows: mark the sample data in the first and fourth sample data sets as uncertain behavior samples to obtain a first candidate sample data set; mark the sample data in the second sample data set as machine behavior samples to obtain a second candidate sample data set; mark the sample data in the third sample data set as non-machine behavior samples to obtain a third candidate sample data set; and extract a preset number of samples from each of the first, second, and third candidate sample data sets to obtain the target sample data set.
  • The preset number can be set based on actual conditions and is not specifically limited in this application.
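Labeling the four clustered sets and drawing an equal number of samples per class can be sketched as follows. Random sampling without replacement is an assumption (the text only says a preset number is extracted), and `build_target_set` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def build_target_set(sets: Sequence[list], per_class: int, seed: int = 0) -> List[Tuple[object, str]]:
    """sets = (first, second, third, fourth) sample data sets from clustering.
    Returns (sample, label) pairs with exactly per_class samples for each label."""
    first, second, third, fourth = sets
    candidates = {
        "uncertain": list(first) + list(fourth),  # first candidate sample data set
        "machine": list(second),                  # second candidate sample data set
        "non-machine": list(third),               # third candidate sample data set
    }
    rng = random.Random(seed)
    target: List[Tuple[object, str]] = []
    for label, pool in candidates.items():
        if len(pool) < per_class:
            raise ValueError(f"not enough {label} samples")
        target.extend((s, label) for s in rng.sample(pool, per_class))
    return target
```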
  • Step S203: Perform fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set, to obtain the machine behavior recognition model.
  • The model parameters of the preset random forest tree model and the preset gradient descent tree model can be set based on actual conditions and are not specifically limited in this application.
  • As shown in FIG. 4, step S203 includes sub-steps S2031 to S2035.
  • Obtain the ratio coefficient of the verification sample data set to the target sample data set, and split the target sample data set into a verification sample data set and a training sample data set according to the ratio coefficient.
  • The ratio coefficient can be set based on actual conditions and is not specifically limited in this application; for example, it may be 0.2.
  • For example, if the target sample data set includes 1000 pieces of sample data and the ratio coefficient is 0.2, the target sample data set is split into a verification sample data set of 200 pieces of sample data and a training sample data set of 800 pieces of sample data.
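The 0.2 split from the example can be reproduced as below. Shuffling before splitting is an assumption, since the text does not say how samples are assigned to each set; `split` is a hypothetical name.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split(samples: List[T], ratio: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Return (verification_set, training_set) with len(verification) = round(len * ratio)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_verify = round(len(shuffled) * ratio)
    return shuffled[:n_verify], shuffled[n_verify:]
```

For 1000 samples and a ratio coefficient of 0.2 this yields 200 verification samples and 800 training samples, matching the example.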
  • Each time, one piece of training sample data is selected from the training sample data set and its principal component features are used to train the preset gradient descent tree model, until that model converges or the number of training iterations reaches a set number. At the same time, one piece of training sample data is selected from the training sample data set each time and used to train the preset random forest tree model, until that model converges or the number of training iterations reaches the set number. The trained models are the target gradient descent tree model and the target random forest tree model.
  • The set number can be set based on actual conditions and is not specifically limited in this application.
  • One piece of verification sample data is selected at a time, and its principal component features are input into the target gradient descent tree model to obtain the first prediction category of its machine recognition result; repeating this process yields the first prediction category for every piece of verification sample data in the verification sample data set. The first prediction category of each piece of verification sample data is compared with its label category; the number of pieces whose first prediction category equals the label category is counted as the first number, and the ratio of the first number to the total number of samples in the verification sample data set is taken as the first accuracy rate of the target gradient descent tree model. The verification sample data whose first prediction category differs from the label category is collected to obtain the first error sample data set of the target gradient descent tree model.
  • Likewise, each piece of verification sample data is input into the target random forest tree model to obtain the second prediction category of its machine recognition result, for every piece in the verification sample data set. The second prediction categories are compared with the label categories; the number of pieces whose second prediction category equals the label category is counted as the second number, and the ratio of the second number to the total number of samples in the verification sample data set is taken as the second accuracy rate of the target random forest tree model. The verification sample data whose second prediction category differs from the label category is collected to obtain the second error sample data set of the target random forest tree model.
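Both accuracy rates and both error sample data sets come from the same loop over the verification set. This sketch assumes each verification sample carries a unique ID so that the error sets can later be intersected; `evaluate` is a hypothetical name.

```python
from typing import Callable, Hashable, List, Set, Tuple

Sample = Tuple[Hashable, object, str]  # (sample_id, features, label_category)


def evaluate(model: Callable[[object], str], verification: List[Sample]) -> Tuple[float, Set[Hashable]]:
    """Return (accuracy, error_ids) for one trained model on the verification set."""
    correct = 0
    errors: Set[Hashable] = set()
    for sample_id, features, label in verification:
        if model(features) == label:  # prediction category matches label category
            correct += 1
        else:
            errors.add(sample_id)     # goes into this model's error sample data set
    return correct / len(verification), errors
```

Running it once with the target gradient descent tree model gives the first accuracy rate and first error sample data set, and once with the target random forest tree model gives the second pair.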
  • The error sample data set of the target gradient descent tree model is the first error sample data set;
  • the error sample data set of the target random forest tree model is the second error sample data set.
  • After the first error sample data set and the second error sample data set are obtained, determine their intersection and record the number of error sample data it contains as the target number; record the number of error sample data in the first error sample data set as the first total number, and the number in the second error sample data set as the second total number.
  • The ratio of the target number to the first total number gives the first similarity, and the ratio of the target number to the second total number gives the second similarity; the average of the first and second similarities is used as the similarity between the first error sample data set and the second error sample data set.
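The similarity is the mean of the two overlap ratios, similarity = (|E1 ∩ E2| / |E1| + |E1 ∩ E2| / |E2|) / 2, where E1 and E2 are the first and second error sample data sets. A minimal sketch; returning 0.0 when either set is empty is an assumption, because the text does not cover that case. `error_set_similarity` is a hypothetical name.

```python
def error_set_similarity(first: set, second: set) -> float:
    """Average of |first & second| / |first| and |first & second| / |second|."""
    if not first or not second:
        return 0.0  # assumption: the empty case is not defined in the text
    target = len(first & second)  # the "target number"
    first_similarity = target / len(first)
    second_similarity = target / len(second)
    return (first_similarity + second_similarity) / 2
```

For example, with 4 errors in E1, 8 in E2, and 2 shared, the similarities are 0.5 and 0.25, averaging 0.375.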
  • When the similarity and the first and second accuracy rates meet the preset conditions, as determined by comparing them with a preset similarity and a preset accuracy rate, the target random forest tree model and the target gradient descent tree model are fusion-trained according to the training sample data set to obtain the machine behavior recognition model.
  • The preset similarity and the preset accuracy rate can be set based on actual conditions and are not specifically limited in this application.
  • Otherwise, the model parameters of the preset random forest tree model and the preset gradient descent tree model are adjusted, and the gradient descent tree model and random forest tree model with the adjusted parameters are trained according to the training sample data set to obtain a new target gradient descent tree model and target random forest tree model; steps S2033 and S2034 are then performed again.
  • The target random forest tree model and the target gradient descent tree model may be fusion-trained according to the training sample data set as follows: select one piece of training sample data from the training sample data set at a time; input it into the target gradient descent tree model and the target random forest tree model to obtain the first machine behavior recognition result and the second machine behavior recognition result, respectively; and train a preset logistic regression model on the first and second machine behavior recognition results until the trained logistic regression model satisfies a preset constraint, which yields the machine behavior recognition model.
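Building the logistic regression training rows from the two trained base models can be sketched as follows. Reducing each base model's three class scores to its machine-class score, and encoding labels as 1 (machine) or 0 (other), are assumptions made so that x1 and x2 are scalars. `stacking_rows` is a hypothetical name, and the `gbdt` callable is assumed to apply the PCA layer internally.

```python
from typing import Callable, List, Sequence, Tuple

Scores = Sequence[float]  # [uncertain, machine, non-machine], as in steps S103 and S104
MACHINE = 1               # index of the machine behavior class in Scores


def stacking_rows(
    samples: Sequence[object],
    labels: Sequence[int],
    gbdt: Callable[[object], Scores],
    rf: Callable[[object], Scores],
) -> List[Tuple[float, float, int]]:
    """Return one (x1, x2, y) row per training sample: the gradient descent tree model's
    machine-class score, the random forest tree model's machine-class score, and the label."""
    rows = []
    for sample, y in zip(samples, labels):
        rows.append((gbdt(sample)[MACHINE], rf(sample)[MACHINE], y))
    return rows
```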
  • the preset constraint condition is built on the fused output $h(x) = w_0 + w_1 x_1 + w_2 x_2$, where:
  • $x_1$ is the first machine behavior recognition result
  • $x_2$ is the second machine behavior recognition result
  • $h(x_i)$ is the machine behavior recognition result output by the LR model
  • $w_0$ is the bias
  • $w_1$ is the weight coefficient of the GBDT model
  • $w_2$ is the weight coefficient of the random forest tree model
  • $y_i$ is the labeled machine behavior recognition result of the training data
  • $w_j$ is one of $w_0$, $w_1$ and $w_2$
  • $\gamma$ is a coefficient, which may be set to 0.05; the term $\sum_i \left(h(x_i) - y_i\right)^2$ sums the squared differences between the output and the labeled machine behavior recognition results over all samples, keeping the classification results output by the LR model as consistent with the labeled results as possible, and, when the outputs agree with the labels, $w_1$ and $w_2$ are kept as small as possible to reduce computational load or overfitting.
  • Step S204 Obtain target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer.
  • obtain the current access request to the server and its current access time point, and extract the IP address, user account and user permission from the current access request; based on the IP address, obtain multiple historical access requests and the access time point of each, giving multiple historical access time points.
  • the IP address of each historical access request is the same as the IP address in the current access request; the IP address, user account, user permission, current access time point and each historical access time point together form one piece of target data to be recognized.
  • Step S205 Input the target data to the principal component analysis layer for processing to obtain principal component characteristics of the target data.
  • since the gradient descent tree model is not suited to high-dimensional feature data, the target data must first be reduced in dimensionality: the target data is input into the principal component analysis layer to obtain its principal component features, which lowers its number of dimensions before it is passed to the gradient descent tree model layer.
  • Step S206 Input the principal component features to the gradient descent tree model layer for processing, and obtain the first machine behavior recognition result of the target data.
  • the principal component features of the target data are input into the gradient descent tree model layer for processing to obtain the probability that the target data is classified as the uncertain behavior class, the probability that it is classified as the machine behavior class and the probability that it is classified as the non-machine behavior class, and the first machine behavior recognition result of the target data is output according to these three probabilities.
  • Step S207 Input the target data to the random forest tree model layer for processing, and obtain a second machine behavior recognition result of the target data.
  • the target data without dimensionality reduction is input into the random forest tree model layer for processing to obtain the probabilities that the target data is classified as the uncertain behavior class, the machine behavior class and the non-machine behavior class, and the second machine behavior recognition result of the target data is output according to these three probabilities.
  • Step S208 Input the first machine behavior recognition result and the second machine behavior recognition result to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data
  • after the first and second machine behavior recognition results of the target data are obtained, they are input into the logistic regression model layer for fusion processing: when the two results differ, the machine behavior recognition result of the target data is the uncertain behavior class; when both results classify the target data as the machine behavior class, the final result is the machine behavior class; and when both results classify the target data as the non-machine behavior class, the final result is the non-machine behavior class.
  • the machine behavior recognition method obtains a machine behavior data set, a non-machine behavior data set and an uncertain behavior data set and determines the target sample data set from them; it then performs fusion training on the preset random forest tree model and the preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model; the target data is input into the principal component analysis layer to obtain its principal component features, which are input into the gradient descent tree model layer to obtain the first machine behavior recognition result of the target data; the target data is also input into the random forest tree model layer to obtain the second machine behavior recognition result; and finally the first and second machine behavior recognition results are input into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data, which greatly improves the accuracy with which the machine behavior recognition model recognizes machine behavior.
  • FIG. 5 is a schematic block diagram of a machine behavior recognition device provided by an embodiment of the present application.
  • the machine behavior recognition device 300 includes: an acquisition module 310, a first machine behavior recognition module 320, a second machine behavior recognition module 330, and a fusion module 340, wherein:
  • the acquisition module 310 is used to acquire target data to be identified and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer ;
  • the first machine behavior recognition module 320 is configured to input the target data into the principal component analysis layer for processing to obtain principal component features of the target data, and to input the principal component features into the gradient descent tree model layer for processing to obtain the first machine behavior recognition result of the target data;
  • the second machine behavior recognition module 330 is configured to input the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
  • the fusion module 340 is configured to input the first machine behavior recognition result and the second machine behavior recognition result to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
  • the machine behavior recognition device 300 further includes:
  • the acquiring module 310 is also used to acquire machine behavior data sets, non-machine behavior data sets, and uncertain behavior data sets;
  • the determining module is used to determine the target sample data set according to the machine behavior data set, non-machine behavior data set and uncertain behavior data set;
  • the model training module is used to perform fusion training on the preset random forest tree model and the preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.
  • the obtaining module 310 is further configured to:
  • the determining module is further used for:
  • a target sample data set is determined according to the sample data sets of the multiple categories.
  • the sample data sets of the multiple categories include a first sample data set, a second sample data set, a third sample data set, and a fourth sample data set
  • the first sample data set contains only uncertain behavior data
  • the second sample data set includes machine behavior data and uncertain behavior data, but does not include non-machine behavior data
  • the third sample data set includes non-machine behavior data and uncertain behavior data, but does not include machine behavior data
  • the fourth sample data set includes machine behavior data, non-machine behavior data, and uncertain behavior data
  • the determining module is further used for:
  • a preset number of sample data are extracted from the first candidate sample data set, the second candidate sample data set, and the third candidate sample data set, respectively, to obtain the target sample data set.
  • model training module is further used to:
  • the preset logistic regression model is trained until the trained logistic regression model satisfies the preset constraint conditions, and the machine behavior recognition model is obtained.
  • the apparatus provided in the foregoing embodiment may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 6.
  • FIG. 6 is a schematic block diagram of the structure of a computer device provided by an embodiment of the present application.
  • the computer equipment can be a server or a terminal.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may be volatile or non-volatile.
  • the non-volatile storage medium can store an operating system and a computer program.
  • the computer program includes program instructions, and when the program instructions are executed, the processor can execute any machine behavior recognition method.
  • the processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
  • the internal memory provides an environment for the operation of the computer program in the non-volatile storage medium.
  • the processor can execute any machine behavior identification method.
  • the network interface is used for network communication, such as sending assigned tasks.
  • FIG. 6 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the processor is configured to run a computer program stored in a memory to implement the following steps:
  • the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
  • the first machine behavior recognition result and the second machine behavior recognition result are input to the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
  • before acquiring the target data to be recognized and the machine behavior recognition model, the method further includes:
  • fusion training is performed on the preset random forest tree model and the preset gradient descent tree model to obtain the machine behavior recognition model.
  • the acquiring a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set includes:
  • the determining the target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set includes:
  • a target sample data set is determined according to the sample data sets of the multiple categories.
  • the sample data sets of the multiple categories include a first sample data set, a second sample data set, a third sample data set, and a fourth sample data set
  • the first sample data set contains only uncertain behavior data
  • the second sample data set includes machine behavior data and uncertain behavior data, but does not include non-machine behavior data
  • the third sample data set includes non-machine behavior data and uncertain behavior data, but does not include machine behavior data
  • the fourth sample data set includes machine behavior data, non-machine behavior data, and uncertain behavior data
  • determining a target sample data set according to the multiple types of sample data sets includes:
  • a preset number of sample data are extracted from the first candidate sample data set, the second candidate sample data set, and the third candidate sample data set, respectively, to obtain the target sample data set.
  • the fusion training of a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain a machine behavior recognition model includes:
  • fusion training is performed on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain a machine behavior recognition model.
  • the fusion training of the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain a machine behavior recognition model includes:
  • the preset logistic regression model is trained until the trained logistic regression model satisfies the preset constraint conditions, and the machine behavior recognition model is obtained.
  • the embodiments of the present application also provide a computer-readable storage medium, and the computer-readable storage medium may be volatile or non-volatile.
  • a computer program is stored on the computer-readable storage medium, and the computer program includes program instructions.
  • for the method implemented when the program instructions are executed, refer to the embodiments of the machine behavior recognition method of this application.
  • the computer-readable storage medium may be the internal storage unit of the computer device described in the foregoing embodiment, for example, the hard disk or memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card or a flash card equipped on the computer device.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, etc.; the storage data area may store Data created by the use of nodes, etc.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

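A machine behavior recognition method, apparatus, device and computer-readable storage medium, relating to the technical field of security protection. The method includes: acquiring target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer (S101); inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data (S102); inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data (S103); inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data (S104); and inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data (S105), where the machine behavior recognition result can be uploaded to a blockchain for storage. The method improves the accuracy of machine behavior recognition.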

Description

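Machine behavior recognition method, apparatus, device and computer-readable storage medium
This application claims priority to Chinese patent application No. CN202010888899.0, filed with the China Patent Office on August 28, 2020 and entitled "Machine behavior recognition method, apparatus, device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of information security, and in particular to a machine behavior recognition method, apparatus, device and computer-readable storage medium.
Background
With the rapid development of Internet technology, more and more enterprises rely on it for business processing, enterprise management and document management. The data these activities require, such as customer identity information, confidential enterprise documents and financial data, is usually stored on servers, so ensuring the security of that data is extremely important.
At present, machine behavior accessing a server from outside or inside can be identified through information security risk-control measures. Traditional measures usually set rules based on statistical knowledge combined with expert experience, and use those rules to identify machine behavior accessing the server from outside. The inventor realized that this approach is limited by the depth and breadth of the knowledge of the security experts who write the rules and is inflexible; meanwhile, new network attacks keep emerging and machine behavior varies widely, so traditional risk-control measures cannot identify machine behavior accurately, and data security needs to be improved. Therefore, how to improve the accuracy of machine behavior recognition, and thereby data security, is a problem that urgently needs to be solved.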
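Summary
An embodiment of this application provides a machine behavior recognition method, including:
acquiring target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer;
inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
An embodiment of this application further provides a machine behavior recognition apparatus, which includes:
an acquisition module, configured to acquire target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model is determined by fusion training of a random forest tree model and a gradient descent tree model;
a machine behavior recognition module, configured to input the target data into the machine behavior recognition model to obtain the machine behavior recognition result of the target data.
An embodiment of this application further provides a computer device, including a processor, a memory, and a computer program stored in the memory and executable by the processor, where the computer program, when executed by the processor, implements the following steps:
acquiring target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer;
inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the following steps:
acquiring target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer;
inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.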
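Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a machine behavior recognition method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the layer structure of the machine behavior recognition model in an embodiment of this application;
FIG. 3 is a schematic flowchart of a machine behavior recognition method provided by an embodiment of this application;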
FIG. 4 is a schematic flowchart of the sub-steps of the machine behavior recognition method in FIG. 3;
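FIG. 5 is a schematic block diagram of a machine behavior recognition apparatus provided by an embodiment of this application;
FIG. 6 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
The realization of the objectives, functional features and advantages of this application will be further described with reference to the embodiments and the drawings.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the steps be performed in the described order. For example, some operations/steps may be decomposed, combined or partially merged, so the actual execution order may change according to the actual situation.
Embodiments of this application provide a machine behavior recognition method, apparatus, device and computer-readable storage medium. The machine behavior recognition method can be applied to a terminal device, which may be an electronic device such as a mobile phone, tablet computer, notebook computer, desktop computer, personal digital assistant or wearable device; it can also be applied to a server, which may be a single server or a server cluster composed of multiple servers.
Some implementations of this application are described in detail below with reference to the drawings. Where no conflict arises, the following embodiments and the features in them may be combined with each other.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a machine behavior recognition method provided by an embodiment of this application. As shown in FIG. 1, the machine behavior recognition method includes steps S101 to S105.
Step S101: Acquire target data to be recognized and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer.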
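Exemplarily, a current access request to the server is obtained together with its current access time point, and the IP address, user account and user permission are extracted from the current access request. Based on the IP address, multiple historical access requests and the access time point of each are obtained, giving multiple historical access time points, where each historical access request has the same IP address as the current access request. The IP address, user account, user permission, current access time point and every historical access time point together form one piece of target data to be recognized.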
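A minimal sketch of how one piece of target data could be assembled, assuming requests are plain records holding only the four fields the text names. The names `AccessRequest`, `TargetRecord` and `build_target_record` are hypothetical, and sorting the historical time points chronologically is a choice of this sketch rather than something the description specifies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessRequest:
    # Hypothetical request record holding only the fields named in the text.
    ip: str
    account: str
    permission: str
    timestamp: float  # access time point, e.g. seconds since the epoch


@dataclass
class TargetRecord:
    ip: str
    account: str
    permission: str
    current_time: float
    history_times: List[float] = field(default_factory=list)


def build_target_record(current: AccessRequest,
                        history: List[AccessRequest]) -> TargetRecord:
    """Combine the current request with the time points of same-IP history."""
    # Only historical requests from the same IP address contribute time points.
    same_ip_times = sorted(r.timestamp for r in history if r.ip == current.ip)
    return TargetRecord(current.ip, current.account, current.permission,
                        current.timestamp, same_ip_times)
```

How the variable-length list of time points is turned into fixed-size model features is not described, so the sketch stops at the raw record.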
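The machine behavior recognition model is stored in the terminal device or server and is determined by fusion training of a preset random forest tree model and a preset gradient descent tree model. It includes a Principal Component Analysis (PCA) layer, a gradient descent tree model (Gradient Boost Decision Tree, GBDT) layer, a Random Forest (RF) model layer and a Logistic Regression (LR) model layer. As shown in FIG. 2, the PCA layer is connected in series with the GBDT model layer; the RF model layer is connected in parallel with the PCA layer and the GBDT model layer; the RF model layer and the GBDT model layer are each connected in series with the LR model layer; and the PCA layer is used to reduce the dimensionality of the target data.
Step S102: Input the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
Because the gradient descent tree model is not well suited to high-dimensional feature data, the target data first needs dimensionality reduction. The target data is therefore input into the principal component analysis layer to obtain its principal component features, which reduces its dimensionality before it is passed to the gradient descent tree model layer.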
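The description does not say how many principal components the PCA layer keeps. As a pure-Python sketch only, the code below extracts the first principal component by power iteration on the sample covariance matrix and projects the data onto it; this is one standard way to compute PCA, not necessarily the implementation behind the patent, and a real system would normally use a linear-algebra library. The function names are hypothetical.

```python
import math
from typing import List


def top_principal_component(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Return the unit-length first principal component of `rows` (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data: X^T X / (n - 1).
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))  # zero only for constant data
        v = [t / norm for t in w]
    return v


def project(rows: List[List[float]], component: List[float]) -> List[float]:
    """Project centered rows onto one component: one reduced feature per row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]
```

For points lying on the line y = 2x, the component found is proportional to (1, 2), so each two-dimensional row collapses to a single feature without losing information.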
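Step S103: Input the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
The principal component features of the target data are input into the gradient descent tree model layer, which outputs the probability that the target data belongs to the uncertain behavior class, the probability that it belongs to the machine behavior class and the probability that it belongs to the non-machine behavior class; the first machine behavior recognition result of the target data is output according to these three probabilities.
For example, if the probabilities of the uncertain, machine and non-machine behavior classes are 75%, 60% and 95% respectively, the non-machine behavior class has the highest probability, so the first machine behavior recognition result is that the target data belongs to the non-machine behavior class. As another example, if these probabilities are 98%, 60% and 30% respectively, the uncertain behavior class has the highest probability, so the first machine behavior recognition result is that the target data belongs to the uncertain behavior class.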
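The examples above pick the class with the highest score. Note that the example scores do not sum to 100%, so they behave like per-class scores rather than a normalized distribution; taking the maximum works either way. A minimal sketch, with hypothetical class-name strings:

```python
from typing import Dict


def recognition_result(probs: Dict[str, float]) -> str:
    """Return the class whose score is highest."""
    # max() returns the first maximal key, so ties follow dictionary order.
    return max(probs, key=lambda label: probs[label])
```

The same rule yields the second machine behavior recognition result from the random forest tree model's three scores.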
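Step S104: Input the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
Because the gradient descent tree model is not suited to high-dimensional feature data, the target data is reduced with PCA before it enters that model, and part of its latent information is lost in the process. The target data without dimensionality reduction is therefore input into the random forest tree model layer, which outputs the probabilities that the target data belongs to the uncertain behavior class, the machine behavior class and the non-machine behavior class; the second machine behavior recognition result of the target data is output according to these three probabilities.
For example, if the probabilities of the uncertain, machine and non-machine behavior classes are 55%, 93% and 70% respectively, the machine behavior class has the highest probability, so the second machine behavior recognition result is that the target data belongs to the machine behavior class. As another example, if these probabilities are 98%, 60% and 30% respectively, the uncertain behavior class has the highest probability, so the second machine behavior recognition result is that the target data belongs to the uncertain behavior class.
Step S105: Input the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
After the first and second machine behavior recognition results of the target data are obtained, they are input into the logistic regression model layer for fusion processing: when the two results differ, the machine behavior recognition result of the target data is the uncertain behavior class; when both results are the machine behavior class, the final result is the machine behavior class; and when both results are the non-machine behavior class, the final result is the non-machine behavior class.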
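The fusion rule just described can be written as a small function. This is a sketch of the stated decision table only; the case where both branches output the uncertain class is not spelled out in the text, and returning the shared label for it is an assumption.

```python
UNCERTAIN, MACHINE, NON_MACHINE = "uncertain", "machine", "non_machine"


def fuse(first: str, second: str) -> str:
    """Combine the GBDT (first) and RF (second) recognition results."""
    if first != second:
        # The two branches disagree: classify as uncertain behavior.
        return UNCERTAIN
    # Both machine -> machine; both non-machine -> non-machine.
    # Both uncertain -> uncertain (assumption: not covered by the text).
    return first
```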
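In one embodiment, the machine behavior recognition result is uploaded to a blockchain for storage. The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and so on. Uploading the machine behavior recognition result to a blockchain for storage helps ensure its security.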
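To illustrate only the "blocks linked by cryptographic methods" part of that description, here is a toy hash chain built with the standard `hashlib` module. It has no distribution, peers or consensus mechanism, so it is not a blockchain platform; the block layout and function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def _block_hash(prev_hash: str, payload: Dict) -> str:
    # Hash the previous block's hash together with a canonical JSON payload.
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_result(chain: List[Dict], payload: Dict) -> None:
    """Append a recognition result as a block linked to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev_hash, "payload": payload,
                  "hash": _block_hash(prev_hash, payload)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every link; an edited payload or broken link fails."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != _block_hash(prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous hash, altering any stored result invalidates every later link, which is the tamper-evidence property the text relies on.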
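The machine behavior recognition method provided in the above embodiment acquires the target data to be recognized and a machine behavior recognition model containing a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer; inputs the target data into the principal component analysis layer to obtain its principal component features; inputs the principal component features into the gradient descent tree model layer to obtain the first machine behavior recognition result of the target data; meanwhile inputs the target data into the random forest tree model layer to obtain the second machine behavior recognition result; and finally inputs the first and second machine behavior recognition results into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data. This greatly improves the accuracy of machine behavior recognition and the security of the data.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of another machine behavior recognition method provided by an embodiment of this application.
As shown in FIG. 3, the machine behavior recognition method includes steps S201 to S208.
Step S201: Acquire a machine behavior data set, a non-machine behavior data set and an uncertain behavior data set.
The machine behavior data set contains only machine behavior data, the non-machine behavior data set contains only non-machine behavior data, and the uncertain behavior data set contains only uncertain behavior data, that is, data that cannot be determined to be either machine behavior or non-machine behavior.
In one embodiment, the three data sets may be acquired as follows: obtain the server's log data set, machine behavior recognition rules and non-machine behavior recognition rules; extract the machine behavior data set from the log data set according to the machine behavior recognition rules; extract the non-machine behavior data set from the log data set according to the non-machine behavior recognition rules; and remove the machine behavior data set and the non-machine behavior data set from the log data set to obtain the uncertain behavior data set. The machine behavior recognition rules are determined from feature information of machine behavior, and the non-machine behavior recognition rules from feature information of non-machine behavior; both kinds of feature information are summarized from the experience and knowledge of security experts. Feature information of machine behavior includes, but is not limited to, access at regular intervals, access to honeypot links and logins from an unusual location. With these two sets of rules, machine behavior data, non-machine behavior data and uncertain behavior data can be extracted from large volumes of log data, which makes it easier to determine training samples accurately later.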
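A minimal sketch of the rule-based split, assuming each log entry is a dictionary and each rule is a predicate on one entry. The description does not say what happens when an entry matches both rule sets; giving machine rules precedence here is an assumption, and the example rules in the usage note (`path`, `captcha_passed`) are hypothetical fields, not the experts' actual rules.

```python
from typing import Callable, Dict, List, Tuple

LogRecord = Dict[str, object]
Rule = Callable[[LogRecord], bool]


def split_logs(logs: List[LogRecord], machine_rules: List[Rule],
               non_machine_rules: List[Rule]
               ) -> Tuple[List[LogRecord], List[LogRecord], List[LogRecord]]:
    """Return (machine, non_machine, uncertain) data sets from the log data set."""
    machine, non_machine, uncertain = [], [], []
    for rec in logs:
        if any(rule(rec) for rule in machine_rules):
            machine.append(rec)        # assumption: machine rules checked first
        elif any(rule(rec) for rule in non_machine_rules):
            non_machine.append(rec)
        else:
            uncertain.append(rec)      # what remains after removing both sets
    return machine, non_machine, uncertain
```

For instance, a honeypot rule could be `lambda r: r.get("path") == "/honeypot"`; everything matching neither rule set becomes uncertain behavior data.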
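Step S202: Determine a target sample data set according to the machine behavior data set, the non-machine behavior data set and the uncertain behavior data set.
The target sample data set includes machine behavior data, non-machine behavior data and uncertain behavior data, in equal quantities.
In one embodiment, the target sample data set may be determined as follows: cluster the samples in the machine behavior data set, the non-machine behavior data set and the uncertain behavior data set to obtain sample data sets of multiple categories; determine the distribution information of machine behavior data, non-machine behavior data and uncertain behavior data in these category sample data sets; and, when the distribution information satisfies a preset distribution condition, determine the target sample data set according to the sample data sets of multiple categories. Clustering the samples in the three data sets makes it possible to further label machine behavior data and non-machine behavior data within the uncertain behavior data set.
In one embodiment, a clustering algorithm may be used to cluster the samples in the three data sets to obtain sample data sets of multiple categories. When the distribution information does not satisfy the preset distribution condition, the parameters of the clustering algorithm are updated, and the samples are clustered again with the updated algorithm to obtain new sample data sets of multiple categories. The parameters of the clustering algorithm include the number of clusters and the amount of data participating in clustering; clustering algorithms include, but are not limited to, K-Means, Mini Batch KMeans, mean-shift clustering and density-based clustering (DBSCAN). Adjusting the clustering parameters until the distribution of machine, non-machine and uncertain behavior data across the resulting category sample data sets satisfies the preset distribution condition improves the accuracy of the sample data sets.
The sample data sets of multiple categories include: a first sample data set containing only uncertain behavior data; a second sample data set containing machine behavior data and uncertain behavior data but no non-machine behavior data; a third sample data set containing non-machine behavior data and uncertain behavior data but no machine behavior data; and a fourth sample data set containing machine behavior data, non-machine behavior data and uncertain behavior data. The preset distribution condition is that the first ratio (the number of samples in the first sample data set to the total number of samples) lies within a first preset ratio range, the second ratio (the number of samples in the second sample data set to the total) lies within a second preset ratio range, the third ratio (the number of samples in the third sample data set to the total) lies within a third preset ratio range, and the fourth ratio (the number of samples in the fourth sample data set to the total) lies within a fourth preset ratio range, where the four ratios sum to 1. The four preset ratio ranges can be set according to actual conditions and are not specifically limited in this application.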
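A sketch of the distribution check, assuming each cluster is represented by the original labels of its members and that clusters of the same kind are pooled into the first to fourth sample data sets. Two points are assumptions of this sketch, not statements of the description: a cluster with machine data but no non-machine data is counted as the second kind even if it has no uncertain data (and symmetrically for the third kind), and the ratio ranges are given as a dictionary of hypothetical `(low, high)` bounds.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_type(labels: Sequence[str]) -> int:
    """Map a cluster's original labels to sample data set 1, 2, 3 or 4."""
    has_machine = "machine" in labels
    has_non_machine = "non_machine" in labels
    if has_machine and has_non_machine:
        return 4
    if has_machine:
        return 2  # machine (+ uncertain), no non-machine data
    if has_non_machine:
        return 3  # non-machine (+ uncertain), no machine data
    return 1      # uncertain data only


def distribution_ok(clusters: List[Sequence[str]],
                    ranges: Dict[int, Tuple[float, float]]) -> bool:
    """Check that each sample data set's share of all samples is within range."""
    total = sum(len(c) for c in clusters)
    counts = {t: 0 for t in (1, 2, 3, 4)}
    for c in clusters:
        counts[cluster_type(c)] += len(c)
    # The four ratios counts[t] / total sum to 1 by construction.
    return all(low <= counts[t] / total <= high
               for t, (low, high) in ranges.items())
```

When this returns False, the text's remedy is to change the clustering parameters (number of clusters, amount of data) and cluster again.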
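In one embodiment, when the distribution information satisfies the preset distribution condition, the target sample data set may be determined from the sample data sets of multiple categories as follows: label the sample data in the first and fourth sample data sets as uncertain behavior samples to obtain a first candidate sample data set; label the sample data in the second sample data set as machine behavior samples to obtain a second candidate sample data set; label the sample data in the third sample data set as non-machine behavior samples to obtain a third candidate sample data set; and extract a preset number of samples from each of the first, second and third candidate sample data sets to obtain the target sample data set. The preset number can be set according to actual conditions and is not specifically limited in this application.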
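A sketch of the relabeling and extraction step. The description says only that a preset number of samples is "extracted" from each candidate set; drawing them uniformly at random with a fixed seed is an assumption made here, as is the function name `build_target_set`. Drawing the same number from each set gives the equal class counts stated for the target sample data set.

```python
import random
from typing import List, Sequence, Tuple


def build_target_set(first: Sequence, second: Sequence, third: Sequence,
                     fourth: Sequence, preset_number: int,
                     seed: int = 0) -> List[Tuple[object, str]]:
    """Relabel the four sample data sets and draw `preset_number` per class."""
    rng = random.Random(seed)
    candidates = {
        "uncertain": list(first) + list(fourth),  # first candidate set
        "machine": list(second),                  # second candidate set
        "non_machine": list(third),               # third candidate set
    }
    target = []
    for label, samples in candidates.items():
        # random.sample raises ValueError if a set has fewer than preset_number items.
        for sample in rng.sample(samples, preset_number):
            target.append((sample, label))
    return target
```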
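Step S203: Perform fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.
The model parameters of the preset random forest tree model and the preset gradient descent tree model can be set according to actual conditions and are not specifically limited in this application.
In one embodiment, as shown in FIG. 4, step S203 includes sub-steps S2031 to S2035.
S2031: Split the target sample data set into a validation sample data set and a training sample data set.
Obtain the proportion coefficient of the validation sample data set in the target sample data set, and split the target sample data set into a validation sample data set and a training sample data set according to that coefficient. The coefficient can be set according to actual conditions and is not specifically limited in this application; for example, it may be 0.2. Exemplarily, if the target sample data set includes 1000 samples and the coefficient is 0.2, the target sample data set is split into a validation sample data set of 200 samples and a training sample data set of 800 samples.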
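A minimal sketch of S2031 with the example coefficient of 0.2. Shuffling before splitting is an assumption (the text does not say how samples are assigned), and `split_validation` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_validation(samples: Sequence[T], ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Return (validation, training), with round(len * ratio) validation samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_val = round(len(shuffled) * ratio)
    return shuffled[:n_val], shuffled[n_val:]
```

With 1000 samples this gives the 200 / 800 split from the example.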
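S2032: Train the preset gradient descent tree model and the preset random forest tree model respectively on the training sample data set to obtain a target gradient descent tree model and a target random forest tree model.
Specifically, one training sample is selected from the training sample data set at a time and reduced with the principal component analysis algorithm to obtain its principal component features, and the preset gradient descent tree model is trained on these features until it converges or the number of training iterations reaches a set number. At the same time, one training sample is selected from the training sample data set at a time and used directly to train the preset random forest tree model until it converges or the number of training iterations reaches the set number. The set number can be set according to actual conditions and is not specifically limited in this application.
S2033: Determine, according to the validation sample data set, a first accuracy and a first error sample data set of the target gradient descent tree model, and a second accuracy and a second error sample data set of the target random forest tree model.
Specifically, one validation sample is selected from the validation sample data set at a time and reduced with the principal component analysis algorithm to obtain its principal component features, which are input into the target gradient descent tree model to obtain a first predicted category of the machine recognition result for that sample; repeating this yields a first predicted category for every validation sample. The first predicted category of each validation sample is compared with its labeled category. The number of validation samples whose first predicted category equals the labeled category is counted as the first number, the total number of samples in the validation sample data set is counted, and the ratio of the first number to the total is taken as the first accuracy of the target gradient descent tree model. The validation samples whose first predicted category differs from the labeled category are collected to form the first error sample data set of the target gradient descent tree model.
Similarly, one validation sample is selected at a time and input into the target random forest tree model to obtain a second predicted category of the machine recognition result; repeating this yields a second predicted category for every validation sample. Each second predicted category is compared with the labeled category; the number of validation samples whose second predicted category equals the labeled category is counted as the second number, and the ratio of the second number to the total number of samples in the validation sample data set is taken as the second accuracy of the target random forest tree model. The validation samples whose second predicted category differs from the labeled category are collected to form the second error sample data set of the target random forest tree model.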
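Both evaluations follow the same pattern, so one helper covers them. In this sketch a model is any callable returning a predicted category (for the GBDT branch, the caller would apply PCA before calling it); recording errors as sample indices rather than copies is an assumption that makes the intersection in S2034 straightforward.

```python
from typing import Callable, Sequence, Set, Tuple


def evaluate(model: Callable[[object], str], samples: Sequence[object],
             labels: Sequence[str]) -> Tuple[float, Set[int]]:
    """Return (accuracy, indices of misclassified validation samples)."""
    errors = {i for i, (x, y) in enumerate(zip(samples, labels)) if model(x) != y}
    correct = len(samples) - len(errors)  # samples whose prediction matches the label
    return correct / len(samples), errors
```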
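S2034: Determine the similarity between the first error sample data set and the second error sample data set.
Specifically, denote the error sample data set of the target gradient descent tree model as the first error sample data set and that of the target random forest tree model as the second error sample data set. Obtain the intersection of the two sets and count the error samples it contains, recorded as the target number; count the error samples in the first error sample data set, recorded as the first total number, and those in the second error sample data set, recorded as the second total number. The ratio of the target number to the first total number is the first similarity, and the ratio of the target number to the second total number is the second similarity. The average of the first and second similarities is taken as the similarity between the first error sample data set and the second error sample data set.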
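With target number t, first total number n1 and second total number n2, the similarity is (t / n1 + t / n2) / 2. A direct sketch follows; the description does not cover an empty error set, so returning 0.0 in that case is an assumption.

```python
from typing import Hashable, Set


def error_set_similarity(first_errors: Set[Hashable],
                         second_errors: Set[Hashable]) -> float:
    """Average of |A & B| / |A| and |A & B| / |B| for the two error sets."""
    if not first_errors or not second_errors:
        return 0.0  # assumption: undefined in the text, treated as no overlap
    target_number = len(first_errors & second_errors)
    first_similarity = target_number / len(first_errors)
    second_similarity = target_number / len(second_errors)
    return (first_similarity + second_similarity) / 2
```

For error sets {1, 2, 3, 4} and {3, 4, 5}, t = 2, so the similarity is (2/4 + 2/3) / 2 = 7/12. A low value means the two models tend to make different mistakes, which is what makes fusing them worthwhile.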
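S2035: When the similarity is less than or equal to a preset similarity and both the first accuracy and the second accuracy are greater than or equal to a preset accuracy, perform fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model.
That is, when the similarity does not exceed the preset similarity and the accuracies of both the target random forest tree model and the target gradient descent tree model are at least the preset accuracy, fusion training is performed on the two models according to the training sample data set to obtain the machine behavior recognition model. The preset similarity and the preset accuracy can be set according to actual conditions and are not specifically limited in this application.
In one embodiment, when the similarity is greater than the preset similarity, or the accuracy of the target random forest tree model or of the target gradient descent tree model is less than the preset accuracy, the model parameters of the preset random forest tree model and the preset gradient descent tree model are adjusted, the gradient descent tree model and the random forest tree model with the adjusted parameters are trained on the training sample data set to obtain a new target gradient descent tree model and target random forest tree model, and steps S2033 and S2034 are then performed again.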
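The gate in S2035 and the retry loop above can be sketched as follows. How parameters are "adjusted" is not specified, so this sketch assumes a caller-supplied sequence of candidate settings and a hypothetical `train_and_evaluate` callback that retrains both models and reruns S2033 and S2034, returning (similarity, GBDT accuracy, RF accuracy).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Params = Dict[str, object]  # hypothetical hyperparameter bundle


def ready_for_fusion(similarity: float, first_accuracy: float,
                     second_accuracy: float, preset_similarity: float,
                     preset_accuracy: float) -> bool:
    """S2035 gate: low error overlap and both models accurate enough."""
    return (similarity <= preset_similarity
            and first_accuracy >= preset_accuracy
            and second_accuracy >= preset_accuracy)


def select_parameters(candidates: Iterable[Params],
                      train_and_evaluate: Callable[[Params], Tuple[float, float, float]],
                      preset_similarity: float,
                      preset_accuracy: float) -> Optional[Params]:
    """Try candidate settings in turn until the S2035 gate passes."""
    for params in candidates:
        similarity, acc_gbdt, acc_rf = train_and_evaluate(params)
        if ready_for_fusion(similarity, acc_gbdt, acc_rf,
                            preset_similarity, preset_accuracy):
            return params
    return None  # no candidate met both thresholds
```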
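In one embodiment, fusion training of the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model may proceed as follows: select one training sample from the training sample data set at a time; input the selected training sample into the target random forest tree model and the target gradient descent tree model to obtain a first machine behavior recognition result and a second machine behavior recognition result; and train a preset logistic regression model according to the first and second machine behavior recognition results until the trained logistic regression model satisfies a preset constraint condition, yielding the machine behavior recognition model.
The preset constraint condition is:
$h(x) = w_0 + w_1 x_1 + w_2 x_2$
[Equation image PCTCN2020136324-appb-000001: the training objective, combining the squared-error term defined below with a penalty on the coefficients $w_j$ weighted by $\gamma$]
where $x_1$ is the first machine behavior recognition result, $x_2$ is the second machine behavior recognition result, $h(x_i)$ is the machine behavior recognition result output by the LR model, $w_0$ is the bias, $w_1$ is the weight coefficient of the GBDT model, $w_2$ is the weight coefficient of the random forest tree model, $y_i$ is the labeled machine behavior recognition result of the training data, $w_j$ is one of $w_0$, $w_1$ and $w_2$, and $\gamma$ is a coefficient, which may be set to 0.05;
$\sum_i \left(h(x_i) - y_i\right)^2$ is the sum, over all samples, of the squared differences between the output machine behavior recognition results and the labeled machine behavior recognition results. It keeps the classification results output by the LR model as consistent with the labeled results as possible, while, when the outputs agree with the labels, the $\gamma$ term keeps $w_1$ and $w_2$ as small as possible to reduce computational load or overfitting.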
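The exact penalty is in an equation image that is not reproduced here, so the sketch below assumes the common squared (L2) form and minimizes $\sum_i (h(x_i) - y_i)^2 + \gamma \sum_j w_j^2$ over $w_0, w_1, w_2$ by plain batch gradient descent. It also assumes the recognition results and labels are encoded as numbers (for example 1 for machine, 0 for non-machine). Although the layer is called logistic regression, this follows the linear $h(x)$ and squared-error objective as written. Learning rate and step count are illustrative.

```python
from typing import Sequence, Tuple

Weights = Tuple[float, float, float]


def objective(w: Weights, x1: Sequence[float], x2: Sequence[float],
              y: Sequence[float], gamma: float = 0.05) -> float:
    """Squared error of h(x) = w0 + w1*x1 + w2*x2 plus assumed L2 penalty."""
    w0, w1, w2 = w
    sq = sum((w0 + w1 * a + w2 * b - t) ** 2 for a, b, t in zip(x1, x2, y))
    return sq + gamma * (w0 ** 2 + w1 ** 2 + w2 ** 2)


def fit_fusion_weights(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float], gamma: float = 0.05,
                       lr: float = 0.01, steps: int = 5000) -> Weights:
    """Minimize `objective` by batch gradient descent from all-zero weights."""
    w0 = w1 = w2 = 0.0
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for a, b, t in zip(x1, x2, y):
            r = w0 + w1 * a + w2 * b - t  # residual h(x_i) - y_i
            g0 += 2 * r
            g1 += 2 * r * a
            g2 += 2 * r * b
        # Gradient of gamma * (w0^2 + w1^2 + w2^2); w_j ranges over all three.
        g0 += 2 * gamma * w0
        g1 += 2 * gamma * w1
        g2 += 2 * gamma * w2
        w0, w1, w2 = w0 - lr * g0, w1 - lr * g1, w2 - lr * g2
    return w0, w1, w2
```

If the labels simply copy the first branch's output, the fitted $w_1$ ends up close to 1 and $w_2$ close to 0, shrunk slightly toward zero by the $\gamma$ term.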
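Step S204: Acquire target data to be recognized and the machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer and a logistic regression model layer.
Exemplarily, a current access request to the server is obtained together with its current access time point, and the IP address, user account and user permission are extracted from the current access request. Based on the IP address, multiple historical access requests and the access time point of each are obtained, giving multiple historical access time points, where each historical access request has the same IP address as the current access request. The IP address, user account, user permission, current access time point and every historical access time point together form one piece of target data to be recognized.
Step S205: Input the target data into the principal component analysis layer for processing to obtain principal component features of the target data.
Because the gradient descent tree model is not well suited to high-dimensional feature data, the target data first needs dimensionality reduction. The target data is therefore input into the principal component analysis layer to obtain its principal component features, which reduces its dimensionality before it is passed to the gradient descent tree model layer.
Step S206: Input the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data.
The principal component features of the target data are input into the gradient descent tree model layer, which outputs the probability that the target data belongs to the uncertain behavior class, the probability that it belongs to the machine behavior class and the probability that it belongs to the non-machine behavior class; the first machine behavior recognition result of the target data is output according to these three probabilities.
Step S207: Input the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data.
Because the gradient descent tree model is not suited to high-dimensional feature data, the target data is reduced with PCA before it enters that model, and part of its latent information is lost in the process. The target data without dimensionality reduction is therefore input into the random forest tree model layer, which outputs the probabilities that the target data belongs to the uncertain behavior class, the machine behavior class and the non-machine behavior class; the second machine behavior recognition result of the target data is output according to these three probabilities.
Step S208: Input the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.
After the first and second machine behavior recognition results of the target data are obtained, they are input into the logistic regression model layer for fusion processing: when the two results differ, the machine behavior recognition result of the target data is the uncertain behavior class; when both results are the machine behavior class, the final result is the machine behavior class; and when both results are the non-machine behavior class, the final result is the non-machine behavior class.
The machine behavior recognition method provided in the above embodiment acquires a machine behavior data set, a non-machine behavior data set and an uncertain behavior data set and determines a target sample data set from them; performs fusion training on the preset random forest tree model and the preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model; inputs the target data into the principal component analysis layer to obtain its principal component features, and inputs these features into the gradient descent tree model layer to obtain the first machine behavior recognition result of the target data; meanwhile inputs the target data into the random forest tree model layer to obtain the second machine behavior recognition result; and finally inputs the first and second machine behavior recognition results into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data. This greatly improves the accuracy with which the machine behavior recognition model recognizes machine behavior.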
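Referring to FIG. 5, FIG. 5 is a schematic block diagram of a machine behavior recognition apparatus provided by an embodiment of this application.
As shown in FIG. 5, the machine behavior recognition apparatus 300 includes an acquisition module 310, a first machine behavior recognition module 320, a second machine behavior recognition module 330, and a fusion module 340, where:
the acquisition module 310 is configured to obtain the target data to be identified and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
the first machine behavior recognition module 320 is configured to input the target data into the principal component analysis layer for processing to obtain the principal component features of the target data, and to input the principal component features into the gradient descent tree model layer for processing to obtain the first machine behavior recognition result of the target data;
the second machine behavior recognition module 330 is configured to input the target data into the random forest tree model layer for processing to obtain the second machine behavior recognition result of the target data;
the fusion module 340 is configured to input the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain the machine behavior recognition result of the target data.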
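In one embodiment, the machine behavior recognition apparatus 300 further includes a determining module and a model training module, where:
the acquisition module 310 is further configured to obtain a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set;
the determining module is configured to determine a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set;
the model training module is configured to perform fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.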
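In one embodiment, the acquisition module 310 is further configured to:
obtain a log data set of the server, machine behavior recognition rules, and non-machine behavior recognition rules;
extract a machine behavior data set from the log data set according to the machine behavior recognition rules;
extract a non-machine behavior data set from the log data set according to the non-machine behavior recognition rules;
remove the machine behavior data set and the non-machine behavior data set from the log data set to obtain an uncertain behavior data set.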
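The following is a minimal sketch of the rule-based split. It assumes that log entries are dictionaries and that each recognition rule is a predicate over an entry. The example rules (`requests_per_minute`, `passed_captcha`) are hypothetical placeholders, because the application does not list its rules.

As the text describes, each set is extracted independently. An entry matched by both rule sets therefore lands in both. Everything matched by neither becomes uncertain behavior data.

```python
from typing import Any, Callable, Dict, List, Tuple

LogEntry = Dict[str, Any]
Rule = Callable[[LogEntry], bool]


def split_by_rules(logs: List[LogEntry],
                   machine_rules: List[Rule],
                   non_machine_rules: List[Rule]
                   ) -> Tuple[List[LogEntry], List[LogEntry], List[LogEntry]]:
    """Return (machine, non_machine, uncertain) data sets."""
    machine_idx = {i for i, e in enumerate(logs) if any(r(e) for r in machine_rules)}
    non_machine_idx = {i for i, e in enumerate(logs) if any(r(e) for r in non_machine_rules)}
    machine = [logs[i] for i in sorted(machine_idx)]
    non_machine = [logs[i] for i in sorted(non_machine_idx)]
    # Removing both extracted sets from the log leaves the uncertain set.
    uncertain = [e for i, e in enumerate(logs)
                 if i not in machine_idx and i not in non_machine_idx]
    return machine, non_machine, uncertain


if __name__ == "__main__":
    logs = [
        {"id": 1, "requests_per_minute": 300, "passed_captcha": False},
        {"id": 2, "requests_per_minute": 4, "passed_captcha": True},
        {"id": 3, "requests_per_minute": 20, "passed_captcha": False},
    ]
    machine, human, unsure = split_by_rules(
        logs,
        machine_rules=[lambda e: e["requests_per_minute"] > 100],
        non_machine_rules=[lambda e: e["passed_captcha"]],
    )
    print([e["id"] for e in machine], [e["id"] for e in human], [e["id"] for e in unsure])
    # [1] [2] [3]
```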
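In one embodiment, the determining module is further configured to:
cluster the samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories;
determine distribution information of the machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories;
when it is determined that the distribution information satisfies a preset distribution condition, determine the target sample data set according to the sample data sets of the multiple categories.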
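The clustering algorithm and the "preset distribution condition" are not specified. This sketch assumes that cluster assignments are already available (for example, from k-means) and only computes the distribution information for each cluster.

The `satisfies_condition` check is a hypothetical choice. It requires every cluster to contain uncertain data, which is what the four composition patterns enumerated for the sample data sets have in common. A cluster holding only machine data, for instance, would match none of them.

```python
from collections import Counter
from typing import Dict, Sequence

MACHINE, NON_MACHINE, UNCERTAIN = "machine", "non_machine", "uncertain"


def cluster_distribution(cluster_ids: Sequence[int],
                         source_labels: Sequence[str]) -> Dict[int, Counter]:
    """Count, per cluster, how many samples came from each source data set."""
    if len(cluster_ids) != len(source_labels):
        raise ValueError("cluster_ids and source_labels must have the same length")
    dist: Dict[int, Counter] = {}
    for cid, label in zip(cluster_ids, source_labels):
        dist.setdefault(cid, Counter())[label] += 1
    return dist


def satisfies_condition(dist: Dict[int, Counter]) -> bool:
    # Hypothetical preset condition: every cluster contains uncertain data,
    # i.e. falls into one of the four composition patterns in the text.
    return all(counts[UNCERTAIN] > 0 for counts in dist.values())
```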
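In one embodiment, the sample data sets of the multiple categories include a first, a second, a third, and a fourth sample data set:
- the first sample data set contains only uncertain behavior data;
- the second contains machine behavior data and uncertain behavior data but no non-machine behavior data;
- the third contains non-machine behavior data and uncertain behavior data but no machine behavior data;
- the fourth contains machine behavior data, non-machine behavior data, and uncertain behavior data.
The determining module is further configured to:
label the sample data in the first sample data set and the fourth sample data set as sample data of the uncertain behavior class to obtain a first candidate sample data set;
label the sample data in the second sample data set as sample data of the machine behavior class to obtain a second candidate sample data set;
label the sample data in the third sample data set as sample data of the non-machine behavior class to obtain a third candidate sample data set;
extract a preset number of sample data from each of the first, second, and third candidate sample data sets to obtain the target sample data set.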
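The relabeling and sampling described above could be sketched as follows. The sketch assumes that cluster assignments are given and that each sample is tagged with the data set it came from.

Labels follow the cluster, not the original data set. For example, a machine behavior sample that falls into a cluster of the fourth kind is relabeled as uncertain. Capping the draw at the pool size, when a candidate set is smaller than the preset number, is an assumption.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

MACHINE, NON_MACHINE, UNCERTAIN = "machine", "non_machine", "uncertain"


def cluster_label(kinds: Set[str]) -> str:
    """Map a cluster's set of source kinds to its training label."""
    if kinds == {UNCERTAIN} or kinds == {MACHINE, NON_MACHINE, UNCERTAIN}:
        return UNCERTAIN        # first and fourth sample data sets
    if kinds == {MACHINE, UNCERTAIN}:
        return MACHINE          # second sample data set
    if kinds == {NON_MACHINE, UNCERTAIN}:
        return NON_MACHINE      # third sample data set
    raise ValueError(f"composition {sorted(kinds)} is not one of the four described cases")


def build_target_samples(samples: Sequence[object], cluster_ids: Sequence[int],
                         source_labels: Sequence[str], per_class: int,
                         seed: int = 0) -> List[Tuple[object, str]]:
    """Relabel every sample by its cluster, then draw up to per_class per label."""
    kinds_by_cluster: Dict[int, Set[str]] = {}
    for cid, src in zip(cluster_ids, source_labels):
        kinds_by_cluster.setdefault(cid, set()).add(src)
    candidates: Dict[str, list] = {UNCERTAIN: [], MACHINE: [], NON_MACHINE: []}
    for sample, cid in zip(samples, cluster_ids):
        candidates[cluster_label(kinds_by_cluster[cid])].append(sample)
    rng = random.Random(seed)
    target: List[Tuple[object, str]] = []
    for label, pool in candidates.items():
        chosen = rng.sample(pool, min(per_class, len(pool)))
        target.extend((s, label) for s in chosen)
    return target
```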
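In one embodiment, the model training module is further configured to:
split the target sample data set into a validation sample data set and a training sample data set;
train the preset gradient descent tree model and the preset random forest tree model separately on the training sample data set to obtain a target gradient descent tree model and a target random forest tree model;
determine, on the validation sample data set, a first accuracy and a first error sample data set of the target gradient descent tree model, and a second accuracy and a second error sample data set of the target random forest tree model;
determine the similarity between the first error sample data set and the second error sample data set;
when it is determined that the similarity is less than or equal to a preset similarity and both the first accuracy and the second accuracy are greater than or equal to a preset accuracy, perform fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model.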
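The application defines neither the similarity measure between the two error sample sets nor the threshold values. This sketch makes the following assumptions:

- the similarity is the Jaccard index of the sets of misclassified validation samples;
- the thresholds take hypothetical values;
- models are plain callables that map a sample to a label.

The check expresses the idea that the two base models should each be accurate enough while making largely different mistakes. That difference is what makes combining them worthwhile.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Model = Callable[[object], str]


def train_validation_split(data: Sequence, val_ratio: float = 0.2,
                           seed: int = 0) -> Tuple[list, list]:
    """Shuffle indices and return (validation, training) lists."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(data) * val_ratio)
    return [data[i] for i in idx[:n_val]], [data[i] for i in idx[n_val:]]


def evaluate(model: Model, validation: List[Tuple[object, str]]) -> Tuple[float, Set[int]]:
    """Accuracy and the indices of misclassified validation samples."""
    if not validation:
        raise ValueError("validation set is empty")
    errors = {i for i, (x, y) in enumerate(validation) if model(x) != y}
    return 1.0 - len(errors) / len(validation), errors


def jaccard(a: Set[int], b: Set[int]) -> float:
    # Two empty error sets share no mistakes; treating that as 0.0 is an assumption.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ready_for_fusion(gbdt: Model, forest: Model, validation: List[Tuple[object, str]],
                     max_similarity: float = 0.5, min_accuracy: float = 0.8) -> bool:
    acc1, err1 = evaluate(gbdt, validation)
    acc2, err2 = evaluate(forest, validation)
    return (jaccard(err1, err2) <= max_similarity
            and acc1 >= min_accuracy and acc2 >= min_accuracy)
```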
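In one embodiment, the model training module is further configured to:
select one training sample from the training sample data set at a time;
input the selected training sample into the target random forest tree model and the target gradient descent tree model for processing to obtain a first machine behavior recognition result and a second machine behavior recognition result;
train a preset logistic regression model according to the first and second machine behavior recognition results until the trained logistic regression model satisfies a preset constraint condition, thereby obtaining the machine behavior recognition model.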
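The following is a dependency-free sketch of this fusion-training step. It assumes:

- the two base-model results are one-hot encoded;
- the "preset logistic regression model" is a multinomial (softmax) logistic regression, updated by stochastic gradient descent one training sample at a time;
- the stopping rule (mean cross-entropy below `tol`, or `max_epochs` reached) stands in for the unspecified preset constraint condition.

All names and hyperparameters are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

CLASSES = ("uncertain", "machine", "non_machine")


def one_hot_pair(first: str, second: str) -> List[float]:
    """Encode the (first, second) base-model results as a 6-dim vector."""
    vec = [0.0] * (2 * len(CLASSES))
    vec[CLASSES.index(first)] = 1.0
    vec[len(CLASSES) + CLASSES.index(second)] = 1.0
    return vec


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class StackingLogisticRegression:
    def __init__(self, n_features: int, lr: float = 0.5, seed: int = 0):
        self.w = [[0.0] * n_features for _ in CLASSES]
        self.b = [0.0] * len(CLASSES)
        self.lr = lr
        self.rng = random.Random(seed)

    def proba(self, x: Sequence[float]) -> List[float]:
        z = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(self.w, self.b)]
        return softmax(z)

    def predict(self, x: Sequence[float]) -> str:
        p = self.proba(x)
        return CLASSES[p.index(max(p))]

    def fit(self, data: List[Tuple[Tuple[str, str], str]],
            max_epochs: int = 1000, tol: float = 1e-3) -> "StackingLogisticRegression":
        """data: [((first_result, second_result), true_label), ...]"""
        samples = [(one_hot_pair(f, s), CLASSES.index(y)) for (f, s), y in data]
        for _ in range(max_epochs):
            self.rng.shuffle(samples)
            loss = 0.0  # accumulated from pre-update predictions within the epoch
            for x, y in samples:  # one training sample per update
                p = self.proba(x)
                loss -= math.log(max(p[y], 1e-12))
                for k in range(len(CLASSES)):
                    grad = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit k)
                    self.b[k] -= self.lr * grad
                    for j, xj in enumerate(x):
                        self.w[k][j] -= self.lr * grad * xj
            if loss / len(samples) < tol:  # stand-in "preset constraint condition"
                break
        return self
```

As a toy check, the model can be trained on the nine possible result pairs, labeled by the agreement rule described for step S208. It should then reproduce that rule, because the rule is linearly separable in this encoding. With real data, the labels would come from the target sample data set instead.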
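It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, the specific working processes of the apparatus, modules, and units described above may refer to the corresponding processes in the foregoing embodiments of the machine behavior recognition method, and are not repeated here.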
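The apparatus provided by the above embodiments may be implemented in the form of a computer program, which may run on a computer device as shown in FIG. 6.
Referring to FIG. 6, FIG. 6 is a schematic structural block diagram of a computer device provided by an embodiment of this application. The computer device may be a server or a terminal.
As shown in FIG. 6, the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may be volatile or non-volatile.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions which, when executed, cause the processor to perform any of the machine behavior recognition methods.
The processor provides computing and control capabilities and supports the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium. When executed by the processor, the computer program causes the processor to perform any of the machine behavior recognition methods.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of this application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.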
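In one embodiment, the processor is configured to run the computer program stored in the memory to implement the following steps:
obtaining target data to be identified and a machine behavior recognition model, where the machine behavior recognition model includes a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data;
inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain a machine behavior recognition result of the target data.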
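In one embodiment, before the target data to be identified and the machine behavior recognition model are obtained, the steps further include:
obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set;
determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set;
performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.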
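In one embodiment, obtaining the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set includes:
obtaining a log data set of the server, machine behavior recognition rules, and non-machine behavior recognition rules;
extracting a machine behavior data set from the log data set according to the machine behavior recognition rules;
extracting a non-machine behavior data set from the log data set according to the non-machine behavior recognition rules;
removing the machine behavior data set and the non-machine behavior data set from the log data set to obtain an uncertain behavior data set.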
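In one embodiment, determining the target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set includes:
clustering the samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories;
determining distribution information of the machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories;
when it is determined that the distribution information satisfies a preset distribution condition, determining the target sample data set according to the sample data sets of the multiple categories.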
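In one embodiment, the sample data sets of the multiple categories include a first, a second, a third, and a fourth sample data set:
- the first sample data set contains only uncertain behavior data;
- the second contains machine behavior data and uncertain behavior data but no non-machine behavior data;
- the third contains non-machine behavior data and uncertain behavior data but no machine behavior data;
- the fourth contains machine behavior data, non-machine behavior data, and uncertain behavior data.
Determining the target sample data set according to the sample data sets of the multiple categories includes:
labeling the sample data in the first sample data set and the fourth sample data set as sample data of the uncertain behavior class to obtain a first candidate sample data set;
labeling the sample data in the second sample data set as sample data of the machine behavior class to obtain a second candidate sample data set;
labeling the sample data in the third sample data set as sample data of the non-machine behavior class to obtain a third candidate sample data set;
extracting a preset number of sample data from each of the first, second, and third candidate sample data sets to obtain the target sample data set.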
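In one embodiment, performing fusion training on the preset random forest tree model and the preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model includes:
splitting the target sample data set into a validation sample data set and a training sample data set;
training the preset gradient descent tree model and the preset random forest tree model separately on the training sample data set to obtain a target gradient descent tree model and a target random forest tree model;
determining, on the validation sample data set, a first accuracy and a first error sample data set of the target gradient descent tree model, and a second accuracy and a second error sample data set of the target random forest tree model;
determining the similarity between the first error sample data set and the second error sample data set;
when it is determined that the similarity is less than or equal to a preset similarity and both the first accuracy and the second accuracy are greater than or equal to a preset accuracy, performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model.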
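In one embodiment, performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model includes:
selecting one training sample from the training sample data set at a time;
inputting the selected training sample into the target random forest tree model and the target gradient descent tree model for processing to obtain a first machine behavior recognition result and a second machine behavior recognition result;
training a preset logistic regression model according to the first and second machine behavior recognition results until the trained logistic regression model satisfies a preset constraint condition, thereby obtaining the machine behavior recognition model.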
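It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, the specific working process of the computer device described above may refer to the corresponding process in the foregoing embodiments of the machine behavior recognition method, and is not repeated here.
From the description of the above embodiments, those skilled in the art can clearly understand that this application may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application or in certain parts of the embodiments.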
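An embodiment of this application further provides a computer-readable storage medium, which may be volatile or non-volatile. A computer program including program instructions is stored on the computer-readable storage medium. For the method implemented when the program instructions are executed, refer to the embodiments of the machine behavior recognition method of this application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as the hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like. The data storage area may store data created through the use of blockchain nodes, and the like.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked using cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.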
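It should be understood that the terms used in the specification of this application are for the purpose of describing particular embodiments only and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the specification and the appended claims of this application refers to, and includes, any and all possible combinations of one or more of the associated listed items. It should be noted that, herein, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or system that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments. The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.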

Claims (20)

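  1. A machine behavior recognition method, comprising:
    obtaining target data to be identified and a machine behavior recognition model, wherein the machine behavior recognition model comprises a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
    inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
    inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
    inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data; and
    inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain a machine behavior recognition result of the target data.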
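  2. The machine behavior recognition method according to claim 1, wherein before the obtaining target data to be identified and a machine behavior recognition model, the method further comprises:
    obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set;
    determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set; and
    performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.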
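  3. The machine behavior recognition method according to claim 2, wherein the obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set comprises:
    obtaining a log data set of a server, machine behavior recognition rules, and non-machine behavior recognition rules;
    extracting the machine behavior data set from the log data set according to the machine behavior recognition rules;
    extracting the non-machine behavior data set from the log data set according to the non-machine behavior recognition rules; and
    removing the machine behavior data set and the non-machine behavior data set from the log data set to obtain the uncertain behavior data set.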
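  4. The machine behavior recognition method according to claim 2, wherein the determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set comprises:
    clustering samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories;
    determining distribution information of machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories; and
    when it is determined that the distribution information satisfies a preset distribution condition, determining the target sample data set according to the sample data sets of the multiple categories.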
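  5. The machine behavior recognition method according to claim 4, wherein the sample data sets of the multiple categories comprise a first sample data set, a second sample data set, a third sample data set, and a fourth sample data set; the first sample data set contains only uncertain behavior data; the second sample data set contains machine behavior data and uncertain behavior data but no non-machine behavior data; the third sample data set contains non-machine behavior data and uncertain behavior data but no machine behavior data; the fourth sample data set contains machine behavior data, non-machine behavior data, and uncertain behavior data; and the determining the target sample data set according to the sample data sets of the multiple categories comprises:
    labeling sample data in the first sample data set and the fourth sample data set as sample data of an uncertain behavior class to obtain a first candidate sample data set;
    labeling sample data in the second sample data set as sample data of a machine behavior class to obtain a second candidate sample data set;
    labeling sample data in the third sample data set as sample data of a non-machine behavior class to obtain a third candidate sample data set; and
    extracting a preset number of sample data from each of the first candidate sample data set, the second candidate sample data set, and the third candidate sample data set to obtain the target sample data set.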
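  6. The machine behavior recognition method according to claim 2, wherein the performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model comprises:
    splitting the target sample data set into a validation sample data set and a training sample data set;
    training the preset gradient descent tree model and the preset random forest tree model separately according to the training sample data set to obtain a target gradient descent tree model and a target random forest tree model;
    determining, according to the validation sample data set, a first accuracy and a first error sample data set of the target gradient descent tree model and a second accuracy and a second error sample data set of the target random forest tree model;
    determining a similarity between the first error sample data set and the second error sample data set; and
    when it is determined that the similarity is less than or equal to a preset similarity and both the first accuracy and the second accuracy are greater than or equal to a preset accuracy, performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model.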
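  7. The machine behavior recognition method according to claim 6, wherein the performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model comprises:
    selecting one piece of training sample data from the training sample data set at a time;
    inputting the selected training sample data into the target random forest tree model and the target gradient descent tree model for processing to obtain a first machine behavior recognition result and a second machine behavior recognition result; and
    training a preset logistic regression model according to the first machine behavior recognition result and the second machine behavior recognition result until the trained logistic regression model satisfies a preset constraint condition, to obtain the machine behavior recognition model.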
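  8. A machine behavior recognition apparatus, comprising:
    an acquisition module configured to obtain target data to be identified and a machine behavior recognition model, wherein the machine behavior recognition model comprises a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
    a first machine behavior recognition module configured to input the target data into the principal component analysis layer for processing to obtain principal component features of the target data, and to input the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
    a second machine behavior recognition module configured to input the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data; and
    a fusion module configured to input the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain a machine behavior recognition result of the target data.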
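  9. A computer device, comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein when the computer program is executed by the processor, the following steps are implemented:
    obtaining target data to be identified and a machine behavior recognition model, wherein the machine behavior recognition model comprises a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
    inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
    inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
    inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data; and
    inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain a machine behavior recognition result of the target data.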
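  10. The computer device according to claim 9, wherein when the computer program is executed by the processor, the following steps are further implemented:
    obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set;
    determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set; and
    performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.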
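  11. The computer device according to claim 10, wherein the obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set comprises:
    obtaining a log data set of a server, machine behavior recognition rules, and non-machine behavior recognition rules;
    extracting the machine behavior data set from the log data set according to the machine behavior recognition rules;
    extracting the non-machine behavior data set from the log data set according to the non-machine behavior recognition rules; and
    removing the machine behavior data set and the non-machine behavior data set from the log data set to obtain the uncertain behavior data set.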
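  12. The computer device according to claim 10, wherein the determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set comprises:
    clustering samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories;
    determining distribution information of machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories; and
    when it is determined that the distribution information satisfies a preset distribution condition, determining the target sample data set according to the sample data sets of the multiple categories.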
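  13. The computer device according to claim 12, wherein the sample data sets of the multiple categories comprise a first sample data set, a second sample data set, a third sample data set, and a fourth sample data set; the first sample data set contains only uncertain behavior data; the second sample data set contains machine behavior data and uncertain behavior data but no non-machine behavior data; the third sample data set contains non-machine behavior data and uncertain behavior data but no machine behavior data; the fourth sample data set contains machine behavior data, non-machine behavior data, and uncertain behavior data; and the determining the target sample data set according to the sample data sets of the multiple categories comprises:
    labeling sample data in the first sample data set and the fourth sample data set as sample data of an uncertain behavior class to obtain a first candidate sample data set;
    labeling sample data in the second sample data set as sample data of a machine behavior class to obtain a second candidate sample data set;
    labeling sample data in the third sample data set as sample data of a non-machine behavior class to obtain a third candidate sample data set; and
    extracting a preset number of sample data from each of the first candidate sample data set, the second candidate sample data set, and the third candidate sample data set to obtain the target sample data set.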
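  14. The computer device according to claim 10, wherein the performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model comprises:
    splitting the target sample data set into a validation sample data set and a training sample data set;
    training the preset gradient descent tree model and the preset random forest tree model separately according to the training sample data set to obtain a target gradient descent tree model and a target random forest tree model;
    determining, according to the validation sample data set, a first accuracy and a first error sample data set of the target gradient descent tree model and a second accuracy and a second error sample data set of the target random forest tree model;
    determining a similarity between the first error sample data set and the second error sample data set; and
    when it is determined that the similarity is less than or equal to a preset similarity and both the first accuracy and the second accuracy are greater than or equal to a preset accuracy, performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model.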
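  15. The computer device according to claim 14, wherein the performing fusion training on the target random forest tree model and the target gradient descent tree model according to the training sample data set to obtain the machine behavior recognition model comprises:
    selecting one piece of training sample data from the training sample data set at a time;
    inputting the selected training sample data into the target random forest tree model and the target gradient descent tree model for processing to obtain a first machine behavior recognition result and a second machine behavior recognition result; and
    training a preset logistic regression model according to the first machine behavior recognition result and the second machine behavior recognition result until the trained logistic regression model satisfies a preset constraint condition, to obtain the machine behavior recognition model.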
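  16. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
    obtaining target data to be identified and a machine behavior recognition model, wherein the machine behavior recognition model comprises a principal component analysis layer, a gradient descent tree model layer, a random forest tree model layer, and a logistic regression model layer;
    inputting the target data into the principal component analysis layer for processing to obtain principal component features of the target data;
    inputting the principal component features into the gradient descent tree model layer for processing to obtain a first machine behavior recognition result of the target data;
    inputting the target data into the random forest tree model layer for processing to obtain a second machine behavior recognition result of the target data; and
    inputting the first machine behavior recognition result and the second machine behavior recognition result into the logistic regression model layer for fusion processing to obtain a machine behavior recognition result of the target data.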
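  17. The computer-readable storage medium according to claim 16, wherein when the computer program is executed by the processor, the following steps are further implemented:
    obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set;
    determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set; and
    performing fusion training on a preset random forest tree model and a preset gradient descent tree model according to the target sample data set to obtain the machine behavior recognition model.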
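  18. The computer-readable storage medium according to claim 17, wherein the obtaining a machine behavior data set, a non-machine behavior data set, and an uncertain behavior data set comprises:
    obtaining a log data set of a server, machine behavior recognition rules, and non-machine behavior recognition rules;
    extracting the machine behavior data set from the log data set according to the machine behavior recognition rules;
    extracting the non-machine behavior data set from the log data set according to the non-machine behavior recognition rules; and
    removing the machine behavior data set and the non-machine behavior data set from the log data set to obtain the uncertain behavior data set.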
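  19. The computer-readable storage medium according to claim 17, wherein the determining a target sample data set according to the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set comprises:
    clustering samples in the machine behavior data set, the non-machine behavior data set, and the uncertain behavior data set to obtain sample data sets of multiple categories;
    determining distribution information of machine behavior data, non-machine behavior data, and uncertain behavior data in the sample data sets of the multiple categories; and
    when it is determined that the distribution information satisfies a preset distribution condition, determining the target sample data set according to the sample data sets of the multiple categories.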
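  20. The computer-readable storage medium according to claim 19, wherein the sample data sets of the multiple categories comprise a first sample data set, a second sample data set, a third sample data set, and a fourth sample data set; the first sample data set contains only uncertain behavior data; the second sample data set contains machine behavior data and uncertain behavior data but no non-machine behavior data; the third sample data set contains non-machine behavior data and uncertain behavior data but no machine behavior data; the fourth sample data set contains machine behavior data, non-machine behavior data, and uncertain behavior data; and the determining the target sample data set according to the sample data sets of the multiple categories comprises:
    labeling sample data in the first sample data set and the fourth sample data set as sample data of an uncertain behavior class to obtain a first candidate sample data set;
    labeling sample data in the second sample data set as sample data of a machine behavior class to obtain a second candidate sample data set;
    labeling sample data in the third sample data set as sample data of a non-machine behavior class to obtain a third candidate sample data set; and
    extracting a preset number of sample data from each of the first candidate sample data set, the second candidate sample data set, and the third candidate sample data set to obtain the target sample data set.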
PCT/CN2020/136324 2020-08-28 2020-12-15 Machine behavior recognition method, apparatus, device, and computer-readable storage medium WO2021189975A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010888899.0A CN112052891A (zh) 2020-08-28 Machine behavior recognition method, apparatus, device, and computer-readable storage medium
CN202010888899.0 2020-08-28

Publications (1)

Publication Number Publication Date
WO2021189975A1 (zh) 2021-09-30

Family

ID=73607582

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136324 WO2021189975A1 (zh) 2020-08-28 2020-12-15 Machine behavior recognition method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112052891A (zh)
WO (1) WO2021189975A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
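CN112052891A (zh) * 2020-08-28 2020-12-08 平安科技(深圳)有限公司 Machine behavior recognition method, apparatus, device, and computer-readable storage medium
CN113608946B (zh) * 2021-08-10 2023-09-12 国家计算机网络与信息安全管理中心 Machine behavior recognition method based on feature engineering and representation learning
CN115168916B (zh) * 2022-07-26 2023-01-13 北京大数据先进技术研究院 Trusted evidence-preservation method and system for digital objects in mobile terminal applications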

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
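CN107330445A (zh) * 2017-05-31 2017-11-07 北京京东尚科信息技术有限公司 Method and apparatus for predicting user attributes
WO2018096789A1 (en) * 2016-11-22 2018-05-31 Mitsubishi Electric Corporation Method for training neuron network and active learning system
CN109598331A (zh) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 Fraud recognition model training method, fraud recognition method, and apparatus
CN110517071A (zh) * 2019-08-15 2019-11-29 中国平安财产保险股份有限公司 Machine-model-based information prediction method, apparatus, device, and storage medium
CN111259985A (zh) * 2020-02-19 2020-06-09 腾讯科技(深圳)有限公司 Business-security-based classification model training method, apparatus, and storage medium
CN112052891A (zh) * 2020-08-28 2020-12-08 平安科技(深圳)有限公司 Machine behavior recognition method, apparatus, device, and computer-readable storage medium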

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068942B2 (en) * 2018-10-19 2021-07-20 Cerebri AI Inc. Customer journey management engine
CN111401440B (zh) * 2020-03-13 2023-03-31 重庆第二师范学院 Target classification and recognition method, apparatus, computer device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
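CN114389834A (zh) * 2021-11-26 2022-04-22 浪潮通信信息系统有限公司 Method, apparatus, device, and product for identifying abnormal calls at an API gateway
CN114389834B (zh) * 2021-11-26 2024-04-30 浪潮通信信息系统有限公司 Method, apparatus, device, and product for identifying abnormal calls at an API gateway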

Also Published As

Publication number Publication date
CN112052891A (zh) 2020-12-08

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20927150

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20927150

Country of ref document: EP

Kind code of ref document: A1