WO2021120775A1 - Method and device for detecting data abnormality - Google Patents

Publication number
WO2021120775A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
output
neural network
output vector
machine learning
Prior art date
Application number
PCT/CN2020/118432
Other languages
French (fr)
Chinese (zh)
Inventor
臧大卫
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2021120775A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • the present invention relates to the technical field of data processing, in particular to a method and device for detecting data abnormalities.
  • This application provides a data abnormality detection method and device to improve the accuracy and precision of data detection.
  • according to the detection sample data, determine a first detection feature value of the object to be tested corresponding to the first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic;
  • the first detection feature value corresponding to the first machine learning model is input into the trained first machine learning model to obtain the first output vector of the object to be tested, and the second detection feature value corresponding to the rule algorithm is input into the rule algorithm to obtain the second output vector of the object to be tested;
  • according to the output risk index, the abnormality determination result of the object to be tested is determined.
  • the second output vector includes at least one output identifier; and inputting the second detection feature value of the object to be tested into the rule algorithm to obtain the second output vector of the object to be tested includes:
  • all output identifiers are combined into the second output vector in a predetermined order.
  • the first machine learning model is a neural network model
  • the second machine learning model is a logistic regression model
  • the neural network model is trained in the following manner:
  • according to the training sample data, selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • the first training feature value is input into the initial neural network model, and the loss function is calculated according to the obtained machine risk index and the abnormality determination result of the training object;
  • when the loss function is less than a preset threshold, the corresponding first parameter is determined as the first parameter of the neural network model, and the trained neural network model is obtained;
  • the logistic regression model is trained in the following manner:
  • according to the training sample data, selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the neural network model and the logistic regression model are trained in the following manner:
  • according to the training sample data, selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • according to the training sample data, selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the first machine learning model includes a plurality of different machine learning sub-models.
  • it further includes:
  • the rationality of the judgment logic is determined according to the relationship between the judgment logic and other judgment logics, and the weight parameters corresponding to the judgment logic.
  • a data abnormality detection device includes:
  • the acquiring unit is used to acquire the detection sample data of the object to be tested;
  • the processing unit is configured to determine, according to the detection sample data, a first detection feature value of the object to be tested corresponding to the first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic;
  • the calculation unit is configured to input the first detection feature value corresponding to the first machine learning model into the trained first machine learning model to obtain the first output vector of the object to be tested, and to input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain the second output vector of the object to be tested;
  • An output unit configured to input the first output vector and the second output vector into the trained second machine learning model to determine the output risk index of the object to be tested;
  • the determining unit is configured to determine the abnormal determination result of the object to be tested according to the output risk index.
  • the second output vector includes at least one output identifier; the calculation unit is specifically configured to:
  • All output identifiers are combined into the second output vector in a predetermined order.
  • the first machine learning model is a neural network model
  • the second machine learning model is a logistic regression model
  • it further includes a training unit for training the neural network model in the following manner:
  • according to the training sample data, selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • the first training feature value is input into the initial neural network model, and the loss function is calculated according to the obtained machine risk index and the abnormality determination result of the training object;
  • when the loss function is less than a preset threshold, the corresponding first parameter is determined as the first parameter of the neural network model, and the trained neural network model is obtained;
  • the training unit is also used to train the logistic regression model in the following manner:
  • according to the training sample data, selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the training unit is further configured to train the neural network model and the logistic regression model in the following manner:
  • according to the training sample data, selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • according to the training sample data, selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the first machine learning model includes a plurality of different machine learning sub-models.
  • an analysis unit is further included for:
  • the rationality of the judgment logic is determined according to the relationship between the judgment logic and other judgment logics, and the weight parameters corresponding to the judgment logic.
  • the embodiment of the present invention also provides an electronic device, including:
  • At least one processor and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method as described above.
  • the embodiment of the present invention also provides a non-transitory computer-readable storage medium, the non-transitory computer-readable storage medium storing computer instructions, and the computer instructions are used to make the computer execute the method as described above.
  • the risk control system determines the first detection feature value of the object to be tested corresponding to the first machine learning model and the second detection feature value corresponding to the rule algorithm according to the detection sample data .
  • the rule algorithm contains at least one judgment logic.
  • the first detection feature value corresponding to the first machine learning model is input into the trained machine learning model to obtain the first output vector of the object to be tested.
  • the second detection feature value corresponding to the rule algorithm is input into the rule algorithm to obtain the second output vector of the object to be tested.
  • the first output vector and the second output vector are input into the trained second machine learning model to determine the output risk index of the object to be tested, and based on the output risk index, determine the abnormality determination result of the object to be tested.
  • The machine learning algorithm and the rule algorithm are closely connected: the output of the first machine learning model and the output of the rule algorithm are input into the second machine learning model, which effectively combines the two. As a result, the accuracy and precision are higher than those of a machine learning model alone, and the recall is also better than that of a general machine learning model system.
  • FIG. 1 is an architecture diagram of a data anomaly detection system provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for detecting data anomaly according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a rule tree provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a rule tree of a rule algorithm that needs to be optimized according to an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of a method for detecting data risk anomalies according to a specific embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a data anomaly detection device provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 shows an architecture diagram of a data anomaly detection system provided by an embodiment of the present application. It includes five subsystems: a transaction collection component, a historical feature calculation component, a rule sub-engine, a deep sub-engine, and an output module.
  • The transaction collection component collects the detection sample data of the object to be tested through a MySQL proxy or a Kafka queue, applies preliminary filter conditions, uses key-field comparison to filter out low-risk objects and channels that do not require risk control, and then sends the data over TCP socket communication to the historical feature calculation component, the rule sub-engine, and the deep sub-engine.
  • The historical feature calculation component updates the context information and the statistics information according to the information of the object to be tested.
  • The context information stores the information of the user's last specific behavior; the statistics information includes statistics in multiple dimensions such as card number, merchant, and mobile phone number.
  • The rule sub-engine obtains all the features required for rule calculation from the historical feature calculation component, traverses all the rule trees, records the calculation results of the judgment logic in all the rule trees in in-order traversal order, and sends them to the output module.
  • The deep sub-engine loads the trained neural network model and requests the required features from the historical feature calculation component on demand; it combines the features, applies One-Hot encoding to obtain the input of the neural network model, runs the forward propagation algorithm of the neural network model, and sends the output to the output module.
  • The output module loads the trained logistic regression model, concatenates the outputs of the rule sub-engine and the deep sub-engine, and feeds them into the logistic regression model for regression calculation to obtain a risk index between 0 and 1; if the risk index is greater than the preset risk threshold, the transaction is determined to be a risky transaction and is stored in the risk transaction table.
  • One-Hot Encoding is a code system in which there are as many bits as there are states, and only one bit is 1, and the others are all 0. In the embodiment of the present invention, it is used to convert the detection sample data into the current feature value and then input the machine learning model.
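  • As a minimal sketch of the One-Hot encoding described above: the function name `one_hot` and the `CHANNELS` feature are hypothetical illustrations, not taken from the embodiment.

```python
def one_hot(value, states):
    """Return a list with one bit per state; only the bit for `value` is 1."""
    if value not in states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if s == value else 0 for s in states]


# Hypothetical categorical feature: the channel a transaction arrived through.
CHANNELS = ["pos", "online", "atm"]

encoded = one_hot("online", CHANNELS)
print(encoded)  # [0, 1, 0]
```

  • The encoded list has as many positions as there are states, so several encoded features can be concatenated into one input vector for the model.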
  • TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.
  • an embodiment of the present invention provides a data anomaly detection method.
  • the data anomaly detection method provided by the embodiment of the present invention includes the following steps:
  • Step 201 Obtain the detection sample data of the object to be tested.
  • The detection sample data includes historical detection sample data and current detection sample data of the object to be tested.
  • The object to be tested can be a transaction, a user, a bank account, etc.
  • The current detection sample data and historical detection sample data in the embodiment of the present invention may be a user's transaction sequence. By inputting the user's current transaction sequence into the data anomaly detection system, the risk of the current transaction can be predicted.
  • The historical detection sample data is the detection sample data of the object to be tested within a historical time period.
  • The historical time period is a time period before the current time point corresponding to the object to be tested. For example, if the current time point is 10:00 am on June 3, 2019, the historical time period may run from 10:00 am on June 3, 2018 to 10:00 am on June 3, 2019.
  • The length of the historical time period can be selected according to needs and the required accuracy: the longer the historical time period, the higher the detection accuracy but the greater the amount of calculation required; the shorter the historical time period, the smaller the amount of calculation required but the lower the accuracy.
  • Step 202 According to the detection sample data, determine a first detection feature value of the object to be tested corresponding to the first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic.
  • The first machine learning model can be selected according to requirements, and can be a neural network model, a PCA (principal component analysis) model, and so on.
  • the neural network model is used as the first machine learning model in the embodiment of the present invention.
  • Step 203 Input the first detection feature value corresponding to the first machine learning model into the trained first machine learning model to obtain the first output vector of the object to be tested, and input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain the second output vector of the object to be tested.
  • For the neural network model, it is necessary to determine, according to the detection sample data, the historical feature value corresponding to the historical feature and the current feature value corresponding to the real-time feature. Specifically, for a specific object to be tested, its historical feature value and real-time feature value are combined as needed, One-Hot encoded, and then input into the neural network model.
  • For the rule algorithm, the corresponding second detection feature value is calculated according to the detection sample data, and then the second detection feature value is evaluated according to the judgment logic.
  • Step 204 Input the first output vector and the second output vector into the trained second machine learning model, and determine the output risk index of the object to be tested.
  • the second machine learning model can also be selected as needed, and can be a logistic regression model, a neural network model, and the like.
  • a logistic regression model is used as the second machine learning model.
  • Step 205 Determine the abnormality determination result of the object to be tested according to the output risk index.
  • If the risk index is greater than the risk threshold, the risk is high, that is, the object to be tested is abnormal. In that case, the relevant personnel can be notified by email, internal company process documents, etc. Conversely, if the risk index is less than or equal to the risk threshold, the object to be tested is normal.
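  • Steps 202 to 205 can be sketched end to end as follows. This is an illustration under stated assumptions, not the embodiment's implementation: `detect`, `toy_first_model`, `toy_second_model`, the weights, and the 0.5 threshold are all hypothetical stand-ins for the trained models and the preset risk threshold.

```python
import math


def detect(first_features, second_features, first_model, rules, second_model,
           risk_threshold=0.5):
    """Run steps 202-205 for one object and return (risk_index, is_abnormal)."""
    c = first_model(first_features)                             # first output vector
    d = [1 if rule(second_features) else 0 for rule in rules]   # second output vector
    risk_index = second_model(c + d)                            # step 204
    return risk_index, risk_index > risk_threshold              # step 205


# Hypothetical stand-ins for the trained models and the rule set.
def toy_first_model(x):
    return [sum(x) / len(x)]


def toy_second_model(v, weights=(1.0, 2.0), bias=-1.0):
    z = sum(w * x for w, x in zip(weights, v)) + bias
    return 1.0 / (1.0 + math.exp(-z))


rules = [lambda f: f["A"] + f["B"] > 8]

risk, abnormal = detect([0.2, 0.4], {"A": 5, "B": 4},
                        toy_first_model, rules, toy_second_model)
print(round(risk, 3), abnormal)
```

  • The sketch keeps the rule results as a vector of per-rule flags rather than a single verdict, which is what lets the second model weigh each judgment logic separately.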
  • The embodiment of the present invention introduces the machine learning algorithm while using the rule algorithm, and closely integrates the two.
  • For this purpose, the output of the rule algorithm needs to be transformed.
  • The second output vector is calculated from the rule algorithm, and the second output vector includes at least one output identifier.
  • All output identifiers are combined into the second output vector in a predetermined order.
  • The output identifier is used to digitize the judgment result. Since a judgment result in the rule algorithm is generally either risky or risk-free, it is digitized as 1 or 0: if the judgment result is risky, the corresponding output identifier is 1; if the judgment result is risk-free, the corresponding output identifier is 0. In addition, to increase accuracy and to facilitate subsequent optimization of the rule algorithm, the embodiment of the present invention does not use the overall judgment result of the rule algorithm as its rule output result; instead, a rule output result is determined for each judgment logic in the rule algorithm, and all the rule output results are combined as the second output vector.
  • the rule algorithm contains two rules: "A+B>8" and "C
  • the traditional rule algorithm will only output one result, 1 or 0.
  • The rule algorithm traverses all the judgment logics in the rules in a predetermined order, which may be in-order, pre-order, post-order, etc.
  • a determination result is generated for each determination logic, and then the corresponding output identifier is determined according to the corresponding relationship between the determination result and the output identifier.
  • Figure 3 is a schematic diagram of the rule trees of the above rules. As shown in Figure 3, each rule corresponds to a rule tree. The first rule tree contains one judgment logic, and the second rule tree contains three judgment logics. Therefore, the second output vector d corresponding to the rule algorithm contains 4 output identifiers, denoted as [s_1, s_2, s_3, s_4].
  • The first judgment logic is whether A+B>8 holds, with two possible judgment results, yes or no: if yes, the corresponding output identifier s_1 is 1; if not, the output identifier s_1 is 0.
  • The second judgment logic is whether C is included in the second detection feature value of the object to be tested. If it is, the corresponding output identifier s_2 is 1; if not, the corresponding output identifier s_2 is 0. Analyzing the third logic C
  • The fourth judgment logic is whether D>(EF) holds. If it does, the corresponding output identifier s_4 is 1; if not, the corresponding output identifier s_4 is 0. After all the judgment logic is traversed, the final second output vector is obtained, and each element in the second output vector is 1 or 0.
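  • The construction of the second output vector can be sketched as an in-order traversal that records one flag per judgment logic. The `Node` class and `in_order_flags` function are hypothetical; only "A+B>8" and the presence check on C come from the text, while the third judgment logic and the operator between E and F in the fourth are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # `test` is a judgment logic; None marks a hypothetical connective node.
    test: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order_flags(node, features, out):
    """Append 1/0 for every judgment logic, visiting left, node, right."""
    if node is None:
        return out
    in_order_flags(node.left, features, out)
    if node.test is not None:
        out.append(1 if node.test(features) else 0)
    in_order_flags(node.right, features, out)
    return out


rule_1 = Node(test=lambda f: f["A"] + f["B"] > 8)
rule_2 = Node(
    left=Node(test=lambda f: "C" in f),
    right=Node(
        left=Node(test=lambda f: f.get("C", 0) < 5),          # assumed third logic
        right=Node(test=lambda f: f["D"] > f["E"] - f["F"]),  # assumed operator
    ),
)

features = {"A": 5, "B": 4, "C": 7, "D": 1, "E": 6, "F": 2}
d = []
for tree in (rule_1, rule_2):
    in_order_flags(tree, features, d)
print(d)  # [1, 1, 0, 0]
```

  • Because the traversal order is fixed, position k in the vector always refers to the same judgment logic, which is what allows a downstream weight to be tied to it.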
  • The first machine learning model is also adapted to the input requirements of the second machine learning model.
  • The following description takes a neural network model as the first machine learning model as an example.
  • The output of a traditional neural network model is the risk index, and the risk index y_t can be calculated by the following formula (formula 1):
  • where x is the first detection feature value of the object to be tested corresponding to the neural network model;
  • b_a to b_d are the bias vectors of the neural network model;
  • W_a to W_d are the weight matrices of the neural network model;
  • σ is the sigmoid function;
  • ReLU is the activation function.
  • Since the input of the second machine learning model is a vector, the j-dimensional output vector in formula 1 is taken instead, that is, the first output vector c satisfies the following formula (formula 2):
  • c is the first output vector corresponding to the neural network model.
  • Formula 1 yields a single value, that is, the risk index, whereas formula 2 yields a vector, that is, the first output vector c.
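  • As a hedged illustration of how the two formulas relate, assume a four-layer feed-forward network whose layers are indexed a to d, with c taken as the activation of the last hidden layer. The layer count, the intermediate symbols W_b, W_c, b_b, b_c, and the exact nesting are assumptions consistent with the listed symbols, not a reproduction of the original formulas.

```latex
% Formula 1 (assumed form): scalar risk index from the full network
y_t = \sigma\Big( W_d \,\mathrm{ReLU}\big( W_c \,\mathrm{ReLU}( W_b \,\mathrm{ReLU}( W_a x + b_a ) + b_b ) + b_c \big) + b_d \Big)

% Formula 2 (assumed form): j-dimensional vector taken before the final sigmoid layer
c = \mathrm{ReLU}\big( W_c \,\mathrm{ReLU}( W_b \,\mathrm{ReLU}( W_a x + b_a ) + b_b ) + b_c \big)
```

  • Under this assumption, y_t = σ(W_d c + b_d), so the only difference between the two is whether the final sigmoid layer is applied.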
  • the output of the first machine learning model and the output of the rule algorithm are used as the input of the second machine learning model.
  • the first machine learning model and the rule algorithm are combined through the second machine learning model, so that the machine learning model and the rule algorithm can effectively complement each other.
  • The following introduction takes a logistic regression model as the second machine learning model as an example.
  • the logistic regression model regresses the output of the neural network model and the output of the rule algorithm to obtain the final prediction of the risk of the object to be tested.
  • the logistic regression model uses the following formula to calculate the output risk index:
  • where y is the output risk index calculated by the logistic regression model;
  • b_0 is the bias vector corresponding to the logistic regression model;
  • c is the first output vector of the neural network model;
  • d is the second output vector of the rule algorithm;
  • W_0 is the weight matrix corresponding to the logistic regression model, which includes i weight values; the number of weight values is equal to the sum of the number of elements in the first output vector and the number of elements in the second output vector.
  • Each weight parameter in the weight matrix W_0 corresponds to the weight of one input of the logistic regression model.
  • That is, each output identifier s corresponds to a weight parameter w.
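  • A minimal sketch of this computation, assuming the usual logistic regression form y = σ(W_0 · [c, d] + b_0) with one weight per element of the concatenated input and a scalar bias. The function name `output_risk_index` and all numeric values are hypothetical.

```python
import math


def output_risk_index(c, d, weights, bias):
    """Concatenate c and d, apply one weight per element, then a sigmoid."""
    inputs = list(c) + list(d)
    if len(weights) != len(inputs):
        raise ValueError("need one weight per element of c and d")
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


c = [0.2, 0.7]                     # first output vector (neural network)
d = [1, 0, 1]                      # second output vector (rule flags)
W0 = [0.5, -0.3, 1.2, 0.4, 0.8]    # hypothetical weights, len(c) + len(d) values
b0 = -1.0                          # hypothetical bias

y = output_risk_index(c, d, W0, b0)
print(round(y, 3))
```

  • The last three weights line up with the three output identifiers in d, which is the correspondence between each s and its w noted above.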
  • The first machine learning model in the embodiment of the present invention may include a plurality of different machine learning sub-models, which further increases the accuracy of risk judgment and widens the range of applicable scenarios.
  • the embodiment of the present invention further includes:
  • the rationality of the judgment logic is determined according to the relationship between the judgment logic and other judgment logics, and the weight parameters corresponding to the judgment logic.
  • the weight parameter corresponding to each rule algorithm that has been calculated is stored in the logistic regression model.
  • The user sends an analysis request through a front-end user interface, such as a client or a browser, and the analysis request contains a rule set consisting of one or more rules.
  • After receiving the request, the system's rule-assisted analysis master controller parses the rule set, determines all the judgment logic in the rule set, and determines the weight parameter of each judgment logic in the logistic regression model. Then, according to the relationship between the judgment logic and other judgment logics, and the weight parameters corresponding to the judgment logic, the rationality of the judgment logic is determined.
  • Fig. 4 shows a rule tree of a rule algorithm that needs to be optimized in an embodiment of the present invention.
  • After receiving the request, the rule-assisted analysis master controller parses the rule set and loads the weight parameters of the rule tree from the logistic regression model; it then uses each judgment logic as metadata, analyzes the rule tree, and identifies the judgment logic that can be optimized.
  • The rule algorithm contains two rule trees, and each rule tree contains one or more judgment logics. The judgment logic nodes are used as metadata to analyze the rule tree, as shown in the rule tree on the left in Figure 4. If w_1 ≪ w_2, it is recommended to prune the node corresponding to w_1 and keep only the right branch.
  • The nodes corresponding to w_4 and w_8 belong to similar structures. If w_4 ≪ w_8, it is recommended to use the structure corresponding to w_8.
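  • The weight comparison can be sketched as follows. The text gives no numeric criterion for one weight being much smaller than another, so the `ratio` parameter, the use of absolute values, and the function name `pruning_suggestions` are assumptions.

```python
def pruning_suggestions(sibling_pairs, weights, ratio=10.0):
    """Return (prune, keep) pairs where one node's weight dwarfs its sibling's."""
    suggestions = []
    for left, right in sibling_pairs:
        w_left, w_right = abs(weights[left]), abs(weights[right])
        if w_left * ratio < w_right:
            suggestions.append((left, right))    # prune left, keep right
        elif w_right * ratio < w_left:
            suggestions.append((right, left))    # prune right, keep left
    return suggestions


# Hypothetical weights loaded from the trained logistic regression model.
weights = {"w1": 0.02, "w2": 1.5, "w3": 0.9, "w4": 0.7}

print(pruning_suggestions([("w1", "w2"), ("w3", "w4")], weights))
# [('w1', 'w2')]
```

  • Pairs with comparable weights produce no suggestion, so the output lists only the nodes that contribute little relative to their sibling.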
  • the rule-assisted analysis master also sends the current batch metadata to the historical rule analysis module.
  • The historical rule analysis module searches the historical rule library for structures similar to the metadata of the current batch. For a batch of similar historical metadata, it first selects one or a group of exactly identical historical metadata and uses it as a basis to convert the weights of that batch of historical metadata and the weights of the current batch of metadata so that the two are comparable. It then analyzes the interchangeability of the current batch metadata: if, for a certain metadata, the historical rule library contains a similar structure with greater weight, replacing the current structure with that one is recommended. The current batch analysis results and historical batch analysis results are sent to the suggestion generation module, which generates visual results and descriptive suggestions and returns them to the front-end interface.
  • Since the embodiment of the present invention contains at least two machine learning models, one or more first machine learning models can be trained separately, and then all their output vectors and the output of the rule algorithm are combined to train the second machine learning model. It is also possible to train all the first machine learning models and the second machine learning model jointly.
  • The following takes the neural network model and the logistic regression model as examples.
  • the neural network model is trained in the following ways:
  • According to the training sample data, select the training object corresponding to the first training feature of the neural network model, and determine the first training feature value corresponding to the first training feature;
  • When the loss function is less than the preset threshold, the corresponding first parameter is determined as the first parameter of the neural network model, and the trained neural network model is obtained.
  • The logistic regression model is trained in the following ways:
  • According to the training sample data, select the training object corresponding to the second training feature of the rule algorithm, and determine the second training feature value corresponding to the second training feature;
  • The first output vector and the second output vector are input into the initial logistic regression model, and the loss function is calculated according to the obtained output risk index and the abnormality determination result of the training object.
  • When the loss function is less than the preset threshold, the corresponding second parameter is determined as the second parameter of the logistic regression model, and the trained logistic regression model is obtained.
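  • The stopping criterion for the logistic regression model can be sketched with plain gradient descent. The text does not name the loss function or the optimizer, so cross-entropy loss, the learning rate, the 0.2 threshold, and the toy data are all assumptions; `train_logistic` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, loss_threshold=0.2, max_epochs=10000):
    """Gradient descent that stops once the mean cross-entropy loss is below the threshold."""
    n, m = len(samples[0]), len(samples)
    w, b, loss = [0.0] * n, 0.0, float("inf")
    for _ in range(max_epochs):
        preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in samples]
        clipped = [min(max(p, 1e-12), 1.0 - 1e-12) for p in preds]  # avoid log(0)
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(labels, clipped)) / m
        if loss < loss_threshold:
            break  # keep the current parameters as the trained second parameter
        for j in range(n):
            w[j] -= lr * sum((p - y) * x[j]
                             for p, y, x in zip(preds, labels, samples)) / m
        b -= lr * sum(p - y for p, y in zip(preds, labels)) / m
    return w, b, loss


# Each sample is a concatenated [c, d]: one network output followed by two rule flags.
samples = [[0.9, 1, 1], [0.8, 1, 0], [0.1, 0, 0], [0.2, 0, 1]]
labels = [1, 1, 0, 0]  # abnormality determination results of the training objects

w, b, loss = train_logistic(samples, labels)
print(loss < 0.2)
```

  • The learned weights on the rule-flag positions are the per-judgment-logic weight parameters that the rule-assisted analysis later compares.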
  • the neural network model and logistic regression model are trained in the following ways:
  • the training sample data select the training object corresponding to the first training feature of the neural network model, and determine the first training feature value corresponding to the first training feature;
  • the training sample data select the training object corresponding to the second training feature of the rule algorithm, and determine the second training feature value corresponding to the second training feature;
  • the first output vector and the second output vector are input to the initial logistic regression model, and the loss function is calculated according to the obtained output risk index and the abnormal determination result of the training object.
  • when the loss function is less than the preset threshold, the corresponding first parameter is determined to be the first parameter of the neural network model, yielding the trained neural network model, and the corresponding second parameter is determined to be the second parameter of the logistic regression model, yielding the trained logistic regression model.
  • the first machine learning model is a neural network model
  • the second machine learning model is a logistic regression model.
  • Fig. 5 shows a schematic flowchart of a method for detecting data risk anomalies in a specific embodiment.
  • the core of the data risk anomaly detection method is a dual-engine model, which includes four parts: a rule sub-engine, a deep sub-engine, an output module, and a rule-assisted analysis module, of which:
  • the rule sub-engine contains a set of rules. For the transaction to be tested, it traverses all the rules in the rule set and evaluates the risk of the transaction. As shown in Figure 5, the two rules, "A+B>8" and "C
  • (D>(EF))", the engine traverses the rule tree in order, and records the calculation results of all the judgment logic nodes in order. As the output of the rule sub-engine d [s 1 ,s 2 ,s 3 ,s 4 ].
  • the deep sub-engine uses the trained neural network model to evaluate the risk of the transaction under test.
  • the historical feature and the real-time feature are combined as needed to perform One-Hot Encoding, and then input the neural network model, and output the vector c.
  • the output module uses the trained logistic regression model to regress the output of the rule sub-engine and the depth sub-engine to obtain the final prediction of the transaction risk.
  • the rule-assisted analysis module receives front-end instructions to compare multiple rules and assist in rule formulation. Analyze the weights of multiple judgment logic nodes within a single rule, analyze the weights of judgment logic nodes among multiple rules, analyze the weights of similar rules in the historical rule library, generate visual results, and give suggestions for improving existing rules.
  • the embodiment of the present invention also provides a data abnormality detection device, as shown in FIG. 6, including:
  • the obtaining unit 601 is configured to obtain test sample data of the object to be tested
  • the processing unit 602 is configured to determine, according to the detection sample data, a first detection feature value of the object to be tested corresponding to the first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm contains at least one judgment logic;
  • the calculation unit 603 is configured to input the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain the first output vector of the object to be tested, and to input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain the second output vector of the object to be tested;
  • the output unit 604 is configured to input the first output vector and the second output vector into the trained second machine learning model to determine the output risk index of the object to be tested;
  • the determining unit 605 is configured to determine the abnormal determination result of the object to be tested according to the output risk index.
  • the second output vector includes at least one output identifier; the calculation unit is specifically configured to:
  • All output identifiers are combined into the second output vector in a predetermined order.
  • the first machine learning model is a neural network model
  • the second machine learning model is a logistic regression model
  • it further includes a training unit 606, configured to train the neural network model in the following manner:
  • the training sample data selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • the first training feature value is input into the initial neural network model, and the loss function is calculated according to the obtained machine risk index and the abnormal determination result of the training object.
  • when the loss function is less than a preset threshold, the corresponding first parameter is determined to be the first parameter of the neural network model, and the trained neural network model is obtained;
  • the training unit is also used to train the logistic regression model in the following manner:
  • the training sample data selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the training unit 606 is further configured to train the neural network model and the logistic regression model in the following manner:
  • the training sample data selecting a training object corresponding to the first training feature of the neural network model, and determining the first training feature value corresponding to the first training feature;
  • the training sample data selecting a training object corresponding to the second training feature of the rule algorithm, and determining a second training feature value corresponding to the second training feature;
  • the first machine learning model includes a plurality of different machine learning sub-models.
  • an analysis unit 607 is further included, configured to:
  • the rationality of the judgment logic is determined according to the relationship between the judgment logic and other judgment logics, and the weight parameters corresponding to the judgment logic.
  • the present invention also provides an electronic device, as shown in FIG. 7, including:
  • It includes a processor 701, a memory 702, a transceiver 703, and a bus interface 704, wherein the processor 701, the memory 702 and the transceiver 703 are connected through the bus interface 704;
  • the processor 701 is configured to read a program in the memory 702 and execute the following method:
  • the detection sample data determine that the object to be tested corresponds to a first detection feature value of the first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic;
  • the first detection feature value corresponding to the first machine learning model is input to the trained machine learning model to obtain the first output vector of the object to be tested, and the second detection feature value corresponding to the rule algorithm is input to the In the rule algorithm, the second output vector of the object to be tested is obtained;
  • the abnormal determination result of the object to be tested is determined.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Abstract

A method and device for detecting data abnormality, for use in increasing the accuracy and precision of data detection. The method comprises: acquiring detection sample data of an object to be tested (201); determining, on the basis of the detection sample data, a first detection eigenvalue of said object corresponding to a first machine learning model and a second detection eigenvalue corresponding to a rule algorithm, the rule algorithm comprising at least one determination logic (202); inputting the first detection eigenvalue corresponding to the first machine learning model into a trained machine learning model to produce a first output vector of said object, and inputting the second detection eigenvalue corresponding to the rule algorithm into the rule algorithm to produce a second output vector of said object (203); inputting the first output vector and the second output vector into a trained second machine learning model, determining an output risk index of said object (204); and determining an abnormality ascertainment result of said object on the basis of the output risk index (205).

Description

Method and device for detecting data abnormality
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201911317683.2, entitled "A data anomaly detection method and device", filed with the Chinese Patent Office on December 19, 2019, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for detecting data abnormalities.
Background
The rapid development of the Internet and Internet finance has brought unprecedented challenges to risk control systems. Fraudulent transactions take increasingly diverse forms, are highly concealed, and are difficult to uncover, so traditional rule-engine risk control methods are increasingly inadequate. The rapid development of deep learning in recent years offers another approach: developing a deep engine that builds models through deep learning to mine hidden information and identify fraudulent transactions. This approach has already produced good results.
Formulating rules to detect abnormal data still has irreplaceable advantages in some scenarios. However, current abnormal data detection mostly uses deep learning algorithms alone, and its accuracy and precision need further improvement.
Summary of the invention
This application provides a data abnormality detection method and device for improving the accuracy and precision of data detection.
A data abnormality detection method provided by an embodiment of the present invention includes:
acquiring detection sample data of an object to be tested;
determining, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic;
inputting the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and inputting the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
inputting the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
determining an abnormality determination result of the object to be tested according to the output risk index.
In an optional embodiment, the second output vector includes at least one output identifier, and inputting the second detection feature value of the object to be tested into the rule algorithm to obtain the second output vector of the object to be tested includes:
determining the correspondence between judgment results and output identifiers;
for each judgment logic in the rule algorithm, making a judgment according to the judgment logic using the corresponding second detection feature value to obtain a corresponding judgment result, and determining the corresponding output identifier according to the judgment result;
combining all output identifiers into the second output vector in a predetermined order.
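The steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the mapping `RESULT_TO_IDENTIFIER`, the function name `second_output_vector`, and the sample judgment logic `C < 3` are assumptions made for the example (only `A+B>8` appears in the source).

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
JudgmentLogic = Callable[[Features], bool]

# Hypothetical correspondence between judgment results and output identifiers.
RESULT_TO_IDENTIFIER = {True: 1, False: 0}


def second_output_vector(judgments: List[JudgmentLogic], features: Features) -> List[int]:
    """Evaluate each judgment logic and collect the identifiers in list order."""
    return [RESULT_TO_IDENTIFIER[bool(judge(features))] for judge in judgments]


# The list order plays the role of the "predetermined order".
judgments = [
    lambda f: f["A"] + f["B"] > 8,  # rule taken from the source text
    lambda f: f["C"] < 3,           # hypothetical second judgment logic
]
vector = second_output_vector(judgments, {"A": 5, "B": 4, "C": 7})
```

Keeping the order fixed matters because each position of the vector later receives its own weight in the second machine learning model.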
In an optional embodiment, the first machine learning model is a neural network model, and the second machine learning model is a logistic regression model.
In an optional embodiment, the neural network model is trained in the following manner:
acquiring training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model, calculating a loss function according to the obtained machine risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model to obtain the trained neural network model;
the logistic regression model is trained in the following manner:
obtaining a first output vector of the training object from the trained neural network model;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, calculating a loss function according to the obtained output risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding second parameter as the second parameter of the logistic regression model to obtain the trained logistic regression model.
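The second stage of this separate training (fitting the logistic regression on the concatenated output vectors until the loss drops below a preset threshold) might look like the sketch below. It assumes the first output vectors have already been produced by a trained neural network. The cross-entropy loss, the plain gradient descent update, the sample data, and all names are assumptions for illustration; the source does not specify a loss function or an optimizer.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    inputs: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    loss_threshold: float = 0.1,
    max_epochs: int = 5000,
) -> Tuple[List[float], float, float]:
    """Batch gradient descent that stops once the mean loss is below the threshold."""
    n = len(inputs[0])
    weights, bias, loss = [0.0] * n, 0.0, float("inf")
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * n, 0.0, 0.0
        for x, y in zip(inputs, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        loss /= len(inputs)
        if loss < loss_threshold:  # stop: current parameters become the "second parameter"
            break
        weights = [w - lr * g / len(inputs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(inputs)
    return weights, bias, loss


# Illustrative data: first output vectors (from the network) and second output vectors (from the rules).
first_vectors = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]
second_vectors = [[1, 1], [1, 0], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]  # 1 = abnormal, 0 = normal

inputs = [c + d for c, d in zip(first_vectors, second_vectors)]  # concatenate per sample
weights, bias, final_loss = train_logistic_regression(inputs, labels)
```

Because the neural network is frozen in this variant, only the logistic regression weights and bias change during this stage.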
In an optional embodiment, the neural network model and the logistic regression model are trained in the following manner:
acquiring training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model to obtain a first output vector of the training object;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, calculating a loss function according to the obtained output risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model to obtain the trained neural network model, and determining the corresponding second parameter as the second parameter of the logistic regression model to obtain the trained logistic regression model.
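A rough sketch of such joint training is given below, under several assumptions not found in the source: the neural network is reduced to a single tanh layer, the loss is cross-entropy, the update is per-sample gradient descent, and every name and data value is hypothetical. The point it illustrates is that one loss, computed from the output risk index, updates both the network parameters (`W1`, `b1`) and the logistic regression parameters (`v`, `u`, `b2`).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, d):
    """Return the first output vector c and the output risk index p."""
    W1, b1, v, u, b2 = params
    c = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = sum(vj * cj for vj, cj in zip(v, c)) + sum(uk * dk for uk, dk in zip(u, d)) + b2
    return c, sigmoid(z)


def mean_loss(params, samples):
    total = 0.0
    for x, d, y in samples:
        _, p = forward(params, x, d)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train_jointly(samples, hidden=2, lr=0.3, loss_threshold=0.1, max_epochs=5000, seed=0):
    rng = random.Random(seed)
    n, m = len(samples[0][0]), len(samples[0][1])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # regression weights on c
    u = [0.0] * m                                        # regression weights on d
    b2 = 0.0
    for _ in range(max_epochs):
        if mean_loss((W1, b1, v, u, b2), samples) < loss_threshold:
            break  # both parameter sets are fixed at this point
        for x, d, y in samples:
            c, p = forward((W1, b1, v, u, b2), x, d)
            dz = p - y  # gradient of the loss with respect to the logit
            for j in range(hidden):
                dpre = dz * v[j] * (1 - c[j] ** 2)  # back-propagate through tanh
                v[j] -= lr * dz * c[j]
                b1[j] -= lr * dpre
                for i in range(n):
                    W1[j][i] -= lr * dpre * x[i]
            for k in range(m):
                u[k] -= lr * dz * d[k]
            b2 -= lr * dz
    return W1, b1, v, u, b2


# (first training feature values, second output vector from the rules, label)
samples = [
    ([0.9, 0.1], [1, 1], 1),
    ([0.8, 0.3], [1, 0], 1),
    ([0.1, 0.7], [0, 0], 0),
    ([0.2, 0.9], [0, 1], 0),
]
params = train_jointly(samples)
```

Compared with separate training, the network here is never evaluated against its own "machine risk index"; it is shaped only by how useful its output vector is to the final regression.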
In an optional embodiment, the first machine learning model includes a plurality of different machine learning sub-models.
In an optional embodiment, the method further includes:
obtaining all judgment logics in the rule algorithm;
obtaining, from the second machine learning model, the weight parameter corresponding to each judgment logic;
for each judgment logic, determining the rationality of the judgment logic according to the relationship between the judgment logic and other judgment logics and the weight parameter corresponding to the judgment logic.
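One simple way to act on these weight parameters is sketched below. The source does not define how rationality is decided, so the criterion used here (flag a judgment logic for review when the absolute value of its weight falls below a cutoff) is purely an assumption, as are the function name, the cutoff, and the weight values. The labels s1 to s4 follow the notation used for the rule sub-engine output.

```python
from typing import Dict, List


def flag_low_weight_judgments(weights: Dict[str, float], min_abs_weight: float = 0.2) -> List[str]:
    """Return judgment logics whose learned weight is close to zero (hypothetical criterion)."""
    return [name for name, w in weights.items() if abs(w) < min_abs_weight]


# Hypothetical weights read from the trained second machine learning model.
rule_weights = {"s1": 1.7, "s2": 0.05, "s3": -0.9, "s4": 0.6}
candidates = flag_low_weight_judgments(rule_weights)
```

A fuller analysis would also compare each weight against those of related judgment logics, as the text requires; this sketch covers only the weight itself.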
A data abnormality detection device includes:
an acquiring unit, configured to acquire detection sample data of an object to be tested;
a processing unit, configured to determine, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, where the rule algorithm includes at least one judgment logic;
a calculation unit, configured to input the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and to input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
an output unit, configured to input the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
a determining unit, configured to determine an abnormality determination result of the object to be tested according to the output risk index.
In an optional embodiment, the second output vector includes at least one output identifier, and the calculation unit is specifically configured to:
determine the correspondence between judgment results and output identifiers;
for each judgment logic in the rule algorithm, make a judgment according to the judgment logic using the corresponding second detection feature value to obtain a corresponding judgment result, and determine the corresponding output identifier according to the judgment result;
combine all output identifiers into the second output vector in a predetermined order.
In an optional embodiment, the first machine learning model is a neural network model, and the second machine learning model is a logistic regression model.
In an optional embodiment, the device further includes a training unit, configured to train the neural network model in the following manner:
acquiring training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model, calculating a loss function according to the obtained machine risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model to obtain the trained neural network model;
the training unit is further configured to train the logistic regression model in the following manner:
obtaining a first output vector of the training object from the trained neural network model;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, calculating a loss function according to the obtained output risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding second parameter as the second parameter of the logistic regression model to obtain the trained logistic regression model.
In an optional embodiment, the training unit is further configured to train the neural network model and the logistic regression model in the following manner:
acquiring training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model to obtain a first output vector of the training object;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, calculating a loss function according to the obtained output risk index and the abnormality determination result of the training object, and, when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model to obtain the trained neural network model, and determining the corresponding second parameter as the second parameter of the logistic regression model to obtain the trained logistic regression model.
In an optional embodiment, the first machine learning model includes a plurality of different machine learning sub-models.
In an optional embodiment, the device further includes an analysis unit, configured to:
obtain all judgment logics in the rule algorithm;
obtain, from the second machine learning model, the weight parameter corresponding to each judgment logic;
for each judgment logic, determine the rationality of the judgment logic according to the relationship between the judgment logic and other judgment logics and the weight parameter corresponding to the judgment logic.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described above.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the method described above.
In the embodiment of the present invention, to detect abnormality of an object to be tested, the risk control system determines, according to the detection sample data, a first detection feature value of the object corresponding to the first machine learning model and a second detection feature value corresponding to the rule algorithm, where the rule algorithm contains at least one judgment logic. The first detection feature value is input into the trained machine learning model to obtain the first output vector of the object to be tested. Separately, the second detection feature value is input into the rule algorithm to obtain the second output vector of the object to be tested. The first output vector and the second output vector are input into the trained second machine learning model to determine the output risk index of the object, and the abnormality determination result of the object is determined based on the output risk index. In the embodiment of the present invention, the machine learning algorithm and the rule algorithm are tightly coupled: the output of the first machine learning model and the output of the rule algorithm are both input into the second machine learning model, which effectively combines them. As a result, accuracy and precision are both higher than when a machine learning model is used alone, and recall is also better than that of a typical machine learning model system.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a data abnormality detection system provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data abnormality detection method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a rule tree provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a rule tree of a rule algorithm that needs to be optimized, provided by an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a data risk abnormality detection method provided by a specific embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data abnormality detection device provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Please refer to FIG. 1, which shows an architecture diagram of a data abnormality detection system provided by an embodiment of the present application. The system includes five subsystems: a transaction collection component, a historical feature calculation component, a rule sub-engine, a deep sub-engine, and an output module. The transaction collection component collects the detection sample data of the object to be tested through a MySQL proxy or a Kafka queue, applies preliminary condition filtering, filters out low-risk objects and channels that do not require risk control by comparing key fields, and then sends the data over TCP socket communication to the historical feature calculation component, the rule sub-engine, and the deep sub-engine.
历史特征计算组件将根据待测对象的信息更新上下文和统计量,上下文信息存储了用户的上次特定行为的信息;统计量信息包含了如卡号、商户、手机号等等多个维度的统计量信息。The historical feature calculation component will update the context and statistics according to the information of the object to be tested. The context information stores the information of the user's last specific behavior; the statistics information includes statistics in multiple dimensions such as card number, merchant, mobile phone number, etc. information.
规则子引擎从历史特征计算组件获取规则计算所需的所有特征,遍历所有规则树,并将所有规则树中判断逻辑的计算结果按照中序遍历的顺序记录下来,发送给输出模块。The rule sub-engine obtains all the features required for the rule calculation from the historical feature calculation component, traverses all the rule trees, records the calculation results of the judgment logic in all the rule trees in the order of the middle order traversal, and sends them to the output module.
深度子引擎载入已训练的神经网络模型,按需向历史特征计算模块发送所需特征;将特征进行交互计算、One-Hot(独热)编码,得到神经网络模型的输入;输入神经网络模型进行前向传播算法,将输出发送至输出模块。The deep sub-engine loads the trained neural network model, and sends the required features to the historical feature calculation module on demand; interactively calculates the features, One-Hot (one hot) encoding, and gets the input of the neural network model; enters the neural network model Perform the forward propagation algorithm and send the output to the output module.
输出模块载入已训练的逻辑回归模型,将规则子引擎和深度子引擎的输出进行拼接,输入逻辑回归模型进行回归计算,得到0-1之间的风险指数;若风险指数大于预先设定的风险阈值,则判定该笔交易为风险交易,存入风险交易表。The output module loads the trained logistic regression model, splices the output of the rule sub-engine and the depth sub-engine, and enters the logistic regression model for regression calculation to obtain a risk index between 0-1; if the risk index is greater than the preset Risk threshold, then the transaction is determined to be a risk transaction and stored in the risk transaction table.
It should be noted that the application scenario mentioned above is shown only to facilitate understanding of the spirit and principle of the present application, and the embodiments of the present application are not limited in this respect. On the contrary, the embodiments of the present application can be applied to any applicable scenario.
Some concepts involved in the embodiments of the present application are introduced below.
One-Hot Encoding is a coding scheme that uses as many bits as there are states, in which exactly one bit is 1 and all the others are 0. In the embodiments of the present invention, it is used to convert the detection sample data into current feature values, which are then input into the machine learning model.
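The definition above can be illustrated with a minimal sketch. The feature name and the list of states (`CHANNELS`) are hypothetical and are not taken from the application; the sketch only shows the "one bit per state, exactly one bit set" property.

```python
def one_hot(value, states):
    """Return one bit per state; only the bit for `value` is 1."""
    if value not in states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if s == value else 0 for s in states]


# Hypothetical categorical feature: the channel a transaction arrived on.
CHANNELS = ["pos", "online", "atm"]

encoded = one_hot("online", CHANNELS)
print(encoded)  # [0, 1, 0]
```

Raising an error for an unknown state is one possible design choice; a production encoder might instead reserve an extra "other" bit.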
TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.
To monitor system data and improve the accuracy of anomaly detection, an embodiment of the present invention provides a data anomaly detection method. As shown in FIG. 2, the method includes the following steps:
Step 201: Obtain detection sample data of the object to be tested.
The detection sample data includes historical detection sample data and current detection sample data of the object to be tested. The object to be tested may be a transaction, a user, a bank account, or the like.
In the embodiments of the present invention, the current detection sample data and the historical detection sample data may be a user's transaction sequence. The risk of the current transaction is predicted by inputting the user's current transaction sequence into the data anomaly detection system.
The historical detection sample data consists of the detection samples of the object to be tested within a historical time period. The historical time period is a time period before the current time point corresponding to the object to be tested. For example, if the current time point is 10:00 a.m. on June 3, 2019, the historical time period may run from 10:00 a.m. on June 3, 2018 to 10:00 a.m. on June 3, 2019. In a specific implementation, the length of the historical time period can be chosen according to need and required accuracy: the longer the historical time period, the higher the detection accuracy but the greater the amount of calculation required; the shorter the historical time period, the smaller the amount of calculation required but the lower the accuracy.
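As a rough sketch of how such a historical time period might be applied, the snippet below keeps only the samples that fall in the window before the current time point. The sample layout (a dict with `id` and `time` keys) and the function name are assumptions made for illustration; the window length is the tunable trade-off described above.

```python
from datetime import datetime, timedelta


def historical_window(samples, now, length):
    """Keep samples whose timestamp falls in [now - length, now)."""
    start = now - length
    return [s for s in samples if start <= s["time"] < now]


# Current time point from the example in the text: 10:00 a.m., June 3, 2019.
now = datetime(2019, 6, 3, 10, 0)
samples = [
    {"id": 1, "time": datetime(2018, 5, 1, 9, 0)},      # before the window
    {"id": 2, "time": datetime(2018, 12, 24, 18, 30)},  # inside the window
    {"id": 3, "time": datetime(2019, 6, 3, 9, 59)},     # inside the window
]

recent = historical_window(samples, now, timedelta(days=365))
print([s["id"] for s in recent])  # [2, 3]
```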
Step 202: According to the detection sample data, determine a first detection feature value of the object to be tested corresponding to a first machine learning model, and a second detection feature value corresponding to a rule algorithm, where the rule algorithm contains at least one judgment logic.
In a specific implementation, the first machine learning model can be selected as required, and may be a neural network model, a PCA (principal component analysis) model, or the like. Preferably, a neural network model is used as the first machine learning model in the embodiments of the present invention.
Step 203: Input the first detection feature value corresponding to the first machine learning model into the trained first machine learning model to obtain a first output vector of the object to be tested, and input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested.
For the neural network model, the historical feature values corresponding to the historical features and the current feature values corresponding to the real-time features are determined from the detection sample data. Specifically, for a given object to be tested, its historical feature values and real-time feature values are combined as needed, One-Hot encoded, and then input into the neural network model.
For the rule algorithm, a corresponding second detection feature value is calculated from the detection sample data for each of the one or more judgment logics in the rule algorithm, and the second detection feature value is then evaluated according to that judgment logic.
Step 204: Input the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested.
The second machine learning model can also be selected as required, and may be a logistic regression model, a neural network model, or the like. Preferably, a logistic regression model is used as the second machine learning model in the embodiments of the present invention.
Step 205: Determine an abnormality determination result of the object to be tested according to the output risk index.
If the risk index is greater than the risk threshold, the risk is high, that is, the object to be tested is abnormal. In this case, the responsible personnel can be notified by e-mail, internal company process tickets, or similar means. If the risk index is less than or equal to the risk threshold, the object to be tested is normal.
In the embodiments of the present invention, to detect anomalies in the object to be tested, the risk control system determines, from the detection sample data, the first detection feature value of the object corresponding to the first machine learning model and the second detection feature value corresponding to the rule algorithm, where the rule algorithm contains at least one judgment logic. The first detection feature value is input into the trained first machine learning model to obtain the first output vector of the object to be tested, and the second detection feature value is input into the rule algorithm to obtain the second output vector. The first output vector and the second output vector are input into the trained second machine learning model to determine the output risk index of the object to be tested, and the abnormality determination result is determined according to the output risk index. The embodiments of the present invention thus couple the machine learning algorithm tightly with the rule algorithm: the output of the first machine learning model and the output of the rule algorithm are both fed into the second machine learning model, which effectively combines them. The resulting accuracy and precision are both higher than those obtained with a machine learning model alone, and the recall is also better than that of a typical machine learning model system.
A traditional rule algorithm has only two possible outputs, risky or risk-free, that is, 1 or 0, so the confidence of the rule algorithm cannot be quantified. The embodiments of the present invention therefore introduce a machine learning algorithm alongside the rule algorithm and couple the two tightly. To fit the inputs and outputs of the machine learning algorithm, the output of the rule algorithm must be transformed. In the embodiments of the present invention, a second output vector containing at least one output identifier is computed from the rule algorithm. In step 203 above, inputting the second detection feature value of the object to be tested into the rule algorithm to obtain the second output vector of the object to be tested includes:
determining the correspondence between determination results and output identifiers;
for each judgment logic in the rule algorithm, making a determination according to the judgment logic using the corresponding second detection feature value to obtain a corresponding determination result, and determining the corresponding output identifier according to the determination result;
combining all output identifiers, in a predetermined order, into the second output vector.
Specifically, the embodiments of the present invention use output identifiers to digitize the determination results. Since a determination result in the rule algorithm is generally either risky or risk-free, the results are digitized as 1 and 0: in general, if the determination result is risky, the corresponding output identifier is 1; if it is risk-free, the corresponding output identifier is 0. In addition, to increase accuracy and to facilitate later optimization of the rule algorithm, the embodiments of the present invention do not use the overall judgment of the rule algorithm as its output. Instead, one rule output result is determined for each judgment logic in the rule algorithm, and all rule output results are combined into the second output vector.
For example, suppose the rule algorithm contains two rules: "A+B>8" and "C|(D>(E-F))". In a traditional rule algorithm, the transaction is judged to be risky as soon as either rule is satisfied, so only a single result, 1 or 0, is output.
In the embodiments of the present invention, the rule algorithm traverses all judgment logics in the rules in a predetermined order, which may be in-order, pre-order, post-order, and so on. A determination result is generated for each judgment logic, and the corresponding output identifier is then determined according to the correspondence between determination results and output identifiers.
The rules "A+B>8" and "C|(D>(E-F))" above are again taken as an example. FIG. 3 is a schematic diagram of the rule trees for these rules. As shown in FIG. 3, each rule corresponds to one rule tree. The first rule tree contains one judgment logic and the second contains three, so the second output vector d of this rule algorithm contains four output identifiers, denoted [s1, s2, s3, s4]. From left to right in FIG. 3, the first judgment logic tests whether A+B>8 holds; if it holds, the corresponding output identifier s1 is 1, otherwise s1 is 0. The second judgment logic tests whether the second detection feature value of the object to be tested contains C; if so, s2 is 1, otherwise s2 is 0. The third judgment logic tests whether C|(D>(E-F)) holds; if so, s3 is 1, otherwise s3 is 0. The fourth judgment logic tests whether D>(E-F) holds; if so, s4 is 1, otherwise s4 is 0. After all judgment logics have been traversed, the final second output vector is obtained, each element of which is 1 or 0.
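The traversal described above can be sketched as follows. FIG. 3 is not reproduced here, so the tree shape is an assumption inferred from the order s1 to s4 given in the text: the OR node records its own result between its left child (C) and its right child (D>(E-F)), which is an in-order traversal. The class names `Leaf` and `Or`, the function name, and the feature values are all hypothetical.

```python
class Leaf:
    """A single judgment logic evaluated directly on the feature values."""

    def __init__(self, predicate):
        self.predicate = predicate

    def evaluate(self, features, flags):
        result = bool(self.predicate(features))
        flags.append(int(result))
        return result


class Or:
    """An OR node whose own flag is recorded between its two children."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, features, flags):
        left = self.left.evaluate(features, flags)
        slot = len(flags)
        flags.append(0)  # reserve the in-order position of this node
        # Both children are always evaluated so every judgment logic gets a flag.
        right = self.right.evaluate(features, flags)
        result = left or right
        flags[slot] = int(result)
        return result


def second_output_vector(rule_trees, features):
    """Traverse every rule tree and collect one 0/1 flag per judgment logic."""
    flags = []
    for tree in rule_trees:
        tree.evaluate(features, flags)
    return flags


# The two rules from the text: "A+B>8" and "C|(D>(E-F))".
RULES = [
    Leaf(lambda f: f["A"] + f["B"] > 8),
    Or(Leaf(lambda f: f["C"]), Leaf(lambda f: f["D"] > f["E"] - f["F"])),
]

features = {"A": 5, "B": 4, "C": False, "D": 3, "E": 10, "F": 2}
d = second_output_vector(RULES, features)
print(d)  # [1, 0, 0, 0]
```

Unlike a traditional rule engine, the sketch deliberately avoids short-circuit evaluation, because the downstream model needs a flag for every judgment logic.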
In the embodiments of the present invention, not only is the rule algorithm adapted, but the first machine learning algorithm is also adapted to the input requirements of the second machine learning algorithm. The following description takes a neural network model as the first machine learning algorithm.
A traditional neural network model outputs a risk index. This risk index $y_t$ can be calculated with the following formula:
$$y_t = \sigma\left(W_d\left(W_c\,\mathrm{ReLU}\left(W_b\,\mathrm{ReLU}\left(W_a \cdot x + b_a\right) + b_b\right) + b_c\right) + b_d\right) \quad \text{(Formula 1)}$$
where $x$ is the first detection feature value of the object to be tested for the neural network model; $b_a$ to $b_d$ are the bias vectors of the neural network model; $W_a$ to $W_d$ are the weight matrices of the neural network model; $\sigma$ is the sigmoid function, which is fixed; and ReLU is the activation function.
In the embodiments of the present invention, because the input of the second machine learning algorithm must be a vector, only the j-dimensional output vector inside Formula 1 is taken; that is, the first output vector $c$ satisfies the following formula:
$$c = \sigma\left(W_c\,\mathrm{ReLU}\left(W_b\,\mathrm{ReLU}\left(W_a \cdot x + b_a\right) + b_b\right) + b_c\right) \quad \text{(Formula 2)}$$
where $c$ is the first output vector corresponding to the neural network model.
Comparing Formula 1 with Formula 2 shows that Formula 1 yields a single value, the risk index, whereas Formula 2 yields a vector, the first output vector $c$.
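The difference between Formula 1 and Formula 2 can be sketched in plain Python. The layer sizes and every numeric parameter below are toy values chosen only for illustration (here j = 2); they are not from the application, and the helper names are assumptions.

```python
import math


def affine(W, x, b):
    """Return W·x + b for a weight matrix W (list of rows) and vectors x, b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def pre_activation_c(x, p):
    """Shared part of both formulas: W_c ReLU(W_b ReLU(W_a x + b_a) + b_b) + b_c."""
    h1 = relu(affine(p["W_a"], x, p["b_a"]))
    h2 = relu(affine(p["W_b"], h1, p["b_b"]))
    return affine(p["W_c"], h2, p["b_c"])


def risk_index(x, p):
    """Formula 1: W_d and b_d reduce the result to a single risk value y_t."""
    return sigmoid(affine(p["W_d"], pre_activation_c(x, p), p["b_d"]))[0]


def first_output_vector(x, p):
    """Formula 2: the j-dimensional first output vector c."""
    return sigmoid(pre_activation_c(x, p))


# Toy parameters, for illustration only.
params = {
    "W_a": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], "b_a": [0.0, 0.1],
    "W_b": [[1.0, -1.0], [0.5, 0.5]], "b_b": [0.0, 0.0],
    "W_c": [[0.7, 0.2], [-0.3, 0.9]], "b_c": [0.1, -0.1],
    "W_d": [[1.2, -0.8]], "b_d": [0.05],
}

x = [1.0, 0.0, 1.0]  # e.g. a One-Hot encoded feature vector
c = first_output_vector(x, params)
y_t = risk_index(x, params)
print(len(c), 0.0 < y_t < 1.0)  # 2 True
```

In this sketch the vector c is what would be passed on to the second machine learning model, while y_t is only what a standalone network would report.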
In the embodiments of the present invention, the output of the first machine learning model and the output of the rule algorithm are used as the input of the second machine learning model. The second machine learning model combines the first machine learning model with the rule algorithm, so that the two complement each other effectively. The following description takes a logistic regression model as the second machine learning model.
In a specific implementation, the logistic regression model performs a regression over the output of the neural network model and the output of the rule algorithm to obtain the final prediction of the risk of the object to be tested. In an optional embodiment, the logistic regression model calculates the output risk index with the following formula:
$$y = \sigma\left(W_0\,[c, d] + b_0\right) \quad \text{(Formula 3)}$$
where $y$ is the output risk index calculated by the logistic regression model; $b_0$ is the bias of the logistic regression model; $c$ is the first output vector of the neural network model; $d$ is the second output vector of the rule algorithm; and $W_0$ is the weight matrix of the logistic regression model, containing $i$ weight values, where $i$ equals the number of elements in the first output vector plus the number of elements in the second output vector.
In Formula 3, each weight parameter in the weight matrix $W_0$ is the weight of one input of the logistic regression model. For the second output vector of the rule algorithm, each output identifier s corresponds to one weight parameter w. The higher w is, the more important the judgment logic behind that output identifier is, and the more accurate risk judgments based on it are. Conversely, if w is low or negative, the judgment logic performs poorly and the rule needs to be adjusted.
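Formula 3 and the threshold comparison of step 205 can be sketched together. All numbers below (the vectors, the weights `W0`, the bias `b0`, and `RISK_THRESHOLD`) are illustrative assumptions rather than values from the application.

```python
import math


def output_risk_index(c, d, W0, b0):
    """Formula 3: y = sigmoid(W0 · [c, d] + b0)."""
    z = list(c) + list(d)  # concatenation [c, d]
    if len(W0) != len(z):
        raise ValueError("W0 needs one weight per element of [c, d]")
    s = sum(w * v for w, v in zip(W0, z)) + b0
    return 1.0 / (1.0 + math.exp(-s))


c = [0.9, 0.2]     # first output vector (neural network), toy values
d = [1, 0, 0, 0]   # second output vector (rule flags s1..s4), toy values
W0 = [1.5, -0.5, 2.0, 0.3, 0.8, -0.4]  # i = len(c) + len(d) = 6 weights
b0 = -2.0

y = output_risk_index(c, d, W0, b0)

RISK_THRESHOLD = 0.5  # hypothetical preset threshold
is_abnormal = y > RISK_THRESHOLD
print(is_abnormal)  # True

# The weights after the first len(c) entries belong to the judgment logics.
rule_weights = W0[len(c):]
```

The slice `rule_weights` is the part of the weight matrix that the text proposes to read back when assessing individual judgment logics; in this toy example the last one is negative and would be a candidate for review.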
Further, the first machine learning model in the embodiments of the present invention may include multiple different machine learning sub-models, which further increases the accuracy of risk judgment, covers a wider range of scenarios, and improves precision.
It follows from the above analysis that, in the logistic regression model, the weight parameters corresponding to the rule algorithm can serve as the basis for adjusting the judgment logics in the rule algorithm. Further, the embodiments of the present invention also include:
obtaining all judgment logics in the rule algorithm;
obtaining, from the second machine learning model, the weight parameter corresponding to each judgment logic;
for each judgment logic, determining the rationality of the judgment logic according to the relationship between that judgment logic and the other judgment logics and according to the weight parameter corresponding to that judgment logic.
In a specific implementation, the logistic regression model stores the computed weight parameters corresponding to each rule algorithm. When the rule algorithm needs to be evaluated for rationality or optimized, the user sends an analysis request through a front-end user interface, such as a client or a browser. The analysis request contains a rule set consisting of one or more rules. After receiving the request, the rule-assisted analysis controller of the system parses the rule set, determines all judgment logics in the rule set, and determines the weight parameter of each judgment logic in the logistic regression model. It then determines the rationality of each judgment logic according to its relationship with the other judgment logics and its weight parameter.
FIG. 4 shows the rule trees of a rule algorithm that needs to be optimized in an embodiment of the present invention. After receiving the request, the rule-assisted analysis controller parses the rule set and loads the weight parameters of the rule trees in the logistic regression model. Taking each judgment logic as a unit of metadata, it performs intra-tree analysis to find judgment logics that can be optimized. As shown in FIG. 4, the rule algorithm contains two rule trees, each of which contains one or more judgment logics. Taking the judgment logic nodes as metadata, intra-tree analysis proceeds as follows: for the rule tree on the left in FIG. 4, if w1 ≤ w2, it is recommended to prune the node corresponding to w1 and keep only the right branch. Inter-tree comparative analysis can also be performed, comparing the weights of nodes with similar structures: in the two rule trees in FIG. 4, the nodes corresponding to w4 and w8 have similar structures, so if w4 ≤ w8, it is recommended to adopt the structure corresponding to w8.
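The two comparisons described above can be sketched as a small suggestion generator. FIG. 4 is not reproduced here, so which nodes are siblings and which are "similar structures" is supplied by hand; the pairings, the weight values, and the wording of the suggestions are all assumptions for illustration.

```python
def rule_suggestions(weights, sibling_pairs, similar_pairs):
    """Compare node weights and return human-readable suggestions.

    sibling_pairs: (left, right) nodes inside one rule tree.
    similar_pairs: (current, other) nodes with similar structure in two trees.
    """
    suggestions = []
    for left, right in sibling_pairs:
        if weights[left] <= weights[right]:
            suggestions.append(f"prune {left}, keep {right}")
    for current, other in similar_pairs:
        if weights[current] <= weights[other]:
            suggestions.append(f"replace {current} with the structure of {other}")
    return suggestions


# Hypothetical weights read back from the logistic regression model.
weights = {"w1": 0.2, "w2": 0.9, "w4": 0.5, "w8": 0.7}

result = rule_suggestions(
    weights,
    sibling_pairs=[("w1", "w2")],
    similar_pairs=[("w4", "w8")],
)
print(result)
# ['prune w1, keep w2', 'replace w4 with the structure of w8']
```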
The rule-assisted analysis controller also sends the current batch of metadata to the historical rule analysis module, which searches the historical rule library for structures similar to the current batch of metadata. For a batch of similar historical metadata, one item or one group of exactly identical historical metadata is first selected and used as a baseline to convert between the weights of that batch of historical metadata and the weights of the current batch, so that the two become comparable. The replaceability of the current batch of metadata is then analyzed: if, for some item of metadata, the historical rule library contains a similar structure with a larger weight, it is recommended to replace the structure. The analysis results for the current batch and the historical batches are sent to the suggestion generation module, which generates visual results and descriptive suggestions and returns them to the front-end interface.
Further, since the embodiments of the present invention contain at least two machine learning models, these models can be trained in two ways. The one or more first machine learning models can each be trained separately, after which all of their output vectors, together with the output of the rule algorithm, are used to train the second machine learning model. Alternatively, all first machine learning models and the second machine learning model can be combined and trained jointly. The following description takes a neural network model and a logistic regression model as an example.
For separate training, the neural network model is trained as follows:
obtaining training sample data within a historical time period;
according to the training sample data, selecting a first training feature of the training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model, and calculating a loss function from the resulting machine risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, taking the first parameters at that point as the first parameters of the neural network model, thereby obtaining the trained neural network model.
The logistic regression model is trained as follows:
obtaining the first output vector of the training object from the trained neural network model;
according to the training sample data, selecting a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain the second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function from the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, taking the second parameters at that point as the second parameters of the logistic regression model, thereby obtaining the trained logistic regression model.
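The second stage of separate training can be sketched as follows. The text does not specify the loss function or the optimizer, so binary cross-entropy with plain gradient descent is an assumption here, as are the learning rate, the epoch cap, and the toy data; each input row stands for a concatenated [c, d] of one training object, and each label for its known abnormality determination result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(inputs, labels, loss_threshold=0.1,
                              lr=0.5, max_epochs=10000):
    """Gradient descent that stops once the mean loss drops below the threshold."""
    n, m = len(inputs), len(inputs[0])
    w, b = [0.0] * m, 0.0
    loss = float("inf")
    eps = 1e-12  # guards log(0)
    for _ in range(max_epochs):
        preds = [predict(w, b, x) for x in inputs]
        loss = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                    for p, y in zip(preds, labels)) / n
        if loss < loss_threshold:
            break  # keep the current parameters as the trained parameters
        errors = [p - y for p, y in zip(preds, labels)]
        for j in range(m):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, inputs)) / n
        b -= lr * sum(errors) / n
    return w, b, loss


# Toy rows: one neural-network output followed by two rule flags.
inputs = [[0.9, 1, 1], [0.8, 1, 0], [0.2, 0, 0], [0.1, 0, 1]]
labels = [1, 1, 0, 0]  # 1 = abnormal, 0 = normal

w, b, loss = train_logistic_regression(inputs, labels)
print(loss < 0.1)  # True
```

The learned entries of `w` that correspond to rule flags are the per-judgment-logic weights that the rule-assisted analysis later reads back.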
For joint training, the neural network model and the logistic regression model are trained as follows:
obtaining training sample data within a historical time period;
according to the training sample data, selecting a first training feature of the training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model to obtain the first output vector of the training object;
according to the training sample data, selecting a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain the second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function from the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, taking the first parameters at that point as the first parameters of the neural network model, thereby obtaining the trained neural network model, and taking the second parameters at that point as the second parameters of the logistic regression model, thereby obtaining the trained logistic regression model.
为了更清楚地理解本发明,以具体实施例对上述流程进行详细描述。具体实施例第一机器学习模型为神经网络模型,第二机器学习模型为逻辑回归模型。图5示出了具体实施例中数据风险异常检测方法的流程示意图。如图5所示,数据风险异常检测方法的核心为双引擎模型,包括规则子引擎、深度子引擎、输出模块、规则辅助分析模块四部分,其中:In order to understand the present invention more clearly, specific embodiments are used to describe the foregoing process in detail. Specific embodiments The first machine learning model is a neural network model, and the second machine learning model is a logistic regression model. Fig. 5 shows a schematic flowchart of a method for detecting data risk anomalies in a specific embodiment. As shown in Figure 5, the core of the data risk anomaly detection method is a dual-engine model, which includes four parts: a rule sub-engine, a deep sub-engine, an output module, and a rule-assisted analysis module, of which:
规则子引擎包含一组规则集,对于待测交易,遍历规则集中的所有规则,评估该笔交易的风险。如图5所示的两条规则,“A+B>8”和“C|(D>(E-F))”,引擎中序遍历规则树,将所有的判断逻辑节点的计算结果按序记录,作为规则子引擎的输出d=[s 1,s 2,s 3,s 4]。 The rule sub-engine contains a set of rules. For the transaction to be tested, it traverses all the rules in the rule set and evaluates the risk of the transaction. As shown in Figure 5, the two rules, "A+B>8" and "C|(D>(EF))", the engine traverses the rule tree in order, and records the calculation results of all the judgment logic nodes in order. As the output of the rule sub-engine d=[s 1 ,s 2 ,s 3 ,s 4 ].
深度子引擎使用已训练的神经网络模型对待测交易的风险进行评估。对于待测交易,将历史特征和即时特征按需组合,进行One-Hot Encoding,再输入神经网络模型,输出向量c。The deep sub-engine uses the trained neural network model to evaluate the risk of the transaction under test. For the transaction to be tested, the historical feature and the real-time feature are combined as needed to perform One-Hot Encoding, and then input the neural network model, and output the vector c.
The output module uses the trained logistic regression model to perform regression on the outputs of the rule sub-engine and the deep sub-engine, obtaining the final prediction of the transaction's risk.
In addition, the rule-assisted analysis module receives front-end instructions to compare multiple rules and assist in rule formulation. It analyzes the weights of the judgment logic nodes within a single rule, the weights of judgment logic nodes across multiple rules, and the weights of similar rules in the historical rule library, generates visualized results, and gives suggestions for improving existing rules.
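The in-order traversal performed by the rule sub-engine can be illustrated with a minimal Python sketch. The `Node` class, the choice of which tree nodes count as judgment logic nodes (one for "A+B>8"; the operand C, the OR node, and the comparison for "C|(D>(E-F))"), and the 1/0 encoding of results are all assumptions made for illustration; this passage does not fix them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]


@dataclass
class Node:
    """One node of a rule tree; `judge` is set only on judgment logic nodes."""
    label: str
    judge: Optional[Callable[[Features], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node: Optional[Node], tx: Features, out: List[int]) -> None:
    """Visit left subtree, node, right subtree; record 1/0 at judgment nodes."""
    if node is None:
        return
    in_order(node.left, tx, out)
    if node.judge is not None:
        out.append(1 if node.judge(tx) else 0)
    in_order(node.right, tx, out)


def rule_engine_output(rules: List[Node], tx: Features) -> List[int]:
    """Traverse every rule in the rule set and concatenate the node results."""
    d: List[int] = []
    for root in rules:
        in_order(root, tx, d)
    return d


# Rule 1: A + B > 8 (assumed to contribute one judgment node).
rule1 = Node("A+B>8", judge=lambda t: t["A"] + t["B"] > 8)

# Rule 2: C | (D > (E - F)) (assumed to contribute three judgment nodes).
c_node = Node("C", judge=lambda t: bool(t["C"]))
cmp_node = Node("D>(E-F)", judge=lambda t: t["D"] > t["E"] - t["F"])
rule2 = Node(
    "C|(D>(E-F))",
    judge=lambda t: bool(t["C"]) or t["D"] > t["E"] - t["F"],
    left=c_node,
    right=cmp_node,
)

# Hypothetical transaction feature values.
tx = {"A": 5, "B": 4, "C": 0, "D": 2, "E": 10, "F": 3}
d = rule_engine_output([rule1, rule2], tx)
print(d)  # [1, 0, 0, 0]
```

Recording intermediate nodes rather than only each rule's final verdict gives the downstream model one input per judgment logic, which is what later makes per-node weights available for rule analysis.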
An embodiment of the present invention also provides a data anomaly detection device which, as shown in Fig. 6, includes:
an obtaining unit 601, configured to obtain detection sample data of an object to be tested;
a processing unit 602, configured to determine, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, the rule algorithm containing at least one judgment logic;
a calculation unit 603, configured to input the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and to input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
an output unit 604, configured to input the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
a determination unit 605, configured to determine an abnormality determination result of the object to be tested according to the output risk index.
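The flow through units 601 to 605 can be sketched end to end as follows. This is a toy illustration only: the feature names (`channel`, `amount`, `night`), the fixed weights standing in for the two trained models, and the threshold comparison used for the final determination are hypothetical and are not taken from the source.

```python
import math
from typing import Dict, List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def first_model(features: List[float]) -> List[float]:
    """Stand-in for the trained first model; returns the first output vector c."""
    weights = [[0.8, -0.4, 0.1], [-0.3, 0.9, 0.5]]  # hypothetical parameters
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def rule_algorithm(tx: Dict) -> List[float]:
    """Two hypothetical judgment logics; returns the second output vector d."""
    return [1.0 if tx["amount"] > 1000 else 0.0, 1.0 if tx["night"] else 0.0]


def second_model(c: List[float], d: List[float]) -> float:
    """Stand-in for the trained logistic regression; returns the risk index."""
    weights, bias = [1.2, 0.7, 1.5, 0.9], -1.0  # hypothetical parameters
    z = bias + sum(w * v for w, v in zip(weights, c + d))
    return 1.0 / (1.0 + math.exp(-z))


def detect(tx: Dict, threshold: float = 0.5) -> Dict:
    x = one_hot(tx["channel"], ["pos", "online", "atm"])  # first detection feature values
    c = first_model(x)                                     # first output vector
    d = rule_algorithm(tx)                                 # second output vector
    risk = second_model(c, d)                              # output risk index
    return {"risk": risk, "abnormal": risk >= threshold}   # assumed decision rule


high = detect({"channel": "online", "amount": 5000, "night": True})
low = detect({"channel": "pos", "amount": 20, "night": False})
print(high["abnormal"], low["abnormal"])  # True False
```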
In an optional embodiment, the second output vector contains at least one output identifier, and the calculation unit is specifically configured to:
determine a correspondence between determination results and output identifiers;
for each judgment logic in the rule algorithm, make a determination according to the judgment logic using the corresponding second detection feature value to obtain a corresponding determination result, and determine the corresponding output identifier according to the determination result;
combine all output identifiers into the second output vector in a predetermined order.
In an optional embodiment, the first machine learning model is a neural network model, and the second machine learning model is a logistic regression model.
In an optional embodiment, the device further includes a training unit 606, configured to train the neural network model in the following manner:
obtaining training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model, and calculating a loss function according to the resulting machine risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model;
The training unit is further configured to train the logistic regression model in the following manner:
obtaining a first output vector of the training object from the trained neural network model;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
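The second stage of this staged training, fitting the logistic regression on the concatenated output vectors until the loss drops below the preset threshold, might look like the sketch below. It assumes the neural network has already been trained and frozen, so its outputs appear as toy values; the data, the cross-entropy loss, plain gradient descent, the learning rate, and the threshold of 0.2 are all illustrative choices rather than details given in the source.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Output risk index for one concatenated vector [c, d]."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def log_loss(weights: List[float], bias: float,
             xs: List[List[float]], ys: List[int]) -> float:
    """Mean binary cross-entropy between risk indices and abnormality labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def train_logistic_regression(xs: List[List[float]], ys: List[int],
                              loss_threshold: float = 0.2, lr: float = 0.5,
                              max_epochs: int = 10000) -> Tuple[List[float], float]:
    weights, bias = [0.0] * len(xs[0]), 0.0
    n = len(xs)
    for _ in range(max_epochs):
        if log_loss(weights, bias, xs, ys) < loss_threshold:
            break  # the current values become the second parameters
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            err = predict(weights, bias, x) - y
            grad_b += err / n
            for j, v in enumerate(x):
                grad_w[j] += err * v / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy first output vectors (from the frozen network) and rule output vectors.
first_outputs = [[0.9], [0.8], [0.2], [0.1], [0.7], [0.3]]
rule_outputs = [[1, 1], [1, 0], [0, 0], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0]

inputs = [c + d for c, d in zip(first_outputs, rule_outputs)]
weights, bias = train_logistic_regression(inputs, labels)
final_loss = log_loss(weights, bias, inputs, labels)
print(final_loss < 0.2)
```

Because the network's parameters are fixed before this stage begins, only the logistic regression weights change here, so rule changes can be absorbed by refitting the second model alone.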
In an optional embodiment, the training unit 606 is further configured to train the neural network model and the logistic regression model in the following manner:
obtaining training sample data within a historical time period;
selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
inputting the first training feature value into an initial neural network model to obtain a first output vector of the training object;
selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model, and determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
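In the joint variant, a single loss computed on the output risk index drives the updates of both parameter sets. The sketch below is a deliberately tiny illustration: the "neural network" is reduced to one tanh unit producing a scalar c, gradients are derived by hand, and the data, initial values, learning rate, and threshold are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(u, a, w_c, w_d, b, x, d):
    """Return (c, p): the first model's output and the output risk index."""
    c = math.tanh(a + sum(ui * xi for ui, xi in zip(u, x)))
    p = sigmoid(b + w_c * c + sum(wj * dj for wj, dj in zip(w_d, d)))
    return c, p


def joint_train(xs: List[List[float]], ds: List[List[float]], ys: List[int],
                loss_threshold: float = 0.3, lr: float = 0.2,
                max_epochs: int = 50000):
    # Small non-zero starting values so gradients can flow through both models.
    u = [0.1 if i % 2 == 0 else -0.1 for i in range(len(xs[0]))]  # first parameters
    a = 0.0
    w_c, w_d, b = 0.1, [0.0] * len(ds[0]), 0.0                    # second parameters
    n = len(xs)
    loss = float("inf")
    for _ in range(max_epochs):
        g_u, g_a = [0.0] * len(u), 0.0
        g_wc, g_wd, g_b = 0.0, [0.0] * len(w_d), 0.0
        loss = 0.0
        for x, d, y in zip(xs, ds, ys):
            c, p = forward(u, a, w_c, w_d, b, x, d)
            loss -= (y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)) / n
            err = p - y                        # dLoss/dz for cross-entropy + sigmoid
            back = err * w_c * (1.0 - c * c)   # gradient reaching the tanh unit
            g_wc += err * c / n
            g_b += err / n
            g_a += back / n
            for j, dj in enumerate(d):
                g_wd[j] += err * dj / n
            for i, xi in enumerate(x):
                g_u[i] += back * xi / n
        if loss < loss_threshold:
            break  # first and second parameters are fixed at the same time
        u = [ui - lr * g for ui, g in zip(u, g_u)]
        a -= lr * g_a
        w_c -= lr * g_wc
        w_d = [wj - lr * g for wj, g in zip(w_d, g_wd)]
        b -= lr * g_b
    return (u, a), (w_c, w_d, b), loss


# Toy first training feature values, rule output vectors, and labels.
xs = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9], [0.8, 0.3], [0.1, 0.7]]
ds = [[1.0], [0.0], [0.0], [0.0], [1.0], [0.0]]
ys = [1, 1, 0, 0, 1, 0]

first_params, second_params, loss = joint_train(xs, ds, ys)
print(loss < 0.3)
```

Compared with staged training, the network here never receives its own loss signal; it is shaped only by what helps the combined prediction.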
In an optional embodiment, the first machine learning model includes a plurality of different machine learning sub-models.
In an optional embodiment, the device further includes an analysis unit 607, configured to:
obtain all judgment logic in the rule algorithm;
obtain, from the second machine learning model, the weight parameter corresponding to each judgment logic;
for each judgment logic, determine the rationality of the judgment logic according to the relationship between the judgment logic and the other judgment logic and according to the weight parameter corresponding to the judgment logic.
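One way such a weight-based review could work is sketched below. The criterion used, flagging a judgment logic whose share of the absolute weight within its rule falls below a cutoff, is an assumption chosen for illustration; the source does not state how rationality is scored. The rule names, weights, and the `min_share` parameter are hypothetical.

```python
from typing import Dict, List


def review_rules(node_weights: Dict[str, Dict[str, float]],
                 min_share: float = 0.1) -> List[str]:
    """Flag judgment logics that carry little weight relative to their rule.

    `node_weights` maps a rule name to {judgment logic: weight read from the
    second machine learning model}.
    """
    flagged: List[str] = []
    for rule, nodes in node_weights.items():
        total = sum(abs(w) for w in nodes.values())
        for name, w in nodes.items():
            share = abs(w) / total if total > 0 else 0.0
            if share < min_share:
                flagged.append(f"{rule}:{name}")
    return flagged


# Hypothetical weights taken from a trained logistic regression.
weights = {
    "rule1": {"A+B>8": 1.2},
    "rule2": {"C": 1.8, "C|(D>(E-F))": 0.9, "D>(E-F)": 0.05},
}
flagged = review_rules(weights)
print(flagged)  # ['rule2:D>(E-F)']
```

A flagged node is only a candidate for review: a small weight may mean the condition is redundant with a sibling node rather than useless on its own.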
Based on the same principle, the present invention also provides an electronic device which, as shown in Fig. 7, includes:
a processor 701, a memory 702, a transceiver 703, and a bus interface 704, where the processor 701, the memory 702, and the transceiver 703 are connected through the bus interface 704;
the processor 701 is configured to read a program in the memory 702 and execute the following method:
obtaining detection sample data of an object to be tested;
determining, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, the rule algorithm containing at least one judgment logic;
inputting the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and inputting the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
inputting the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
determining an abnormality determination result of the object to be tested according to the output risk index.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and any combination of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (16)

  1. A data anomaly detection method, characterized by comprising:
    obtaining detection sample data of an object to be tested;
    determining, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, the rule algorithm containing at least one judgment logic;
    inputting the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and inputting the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
    inputting the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
    determining an abnormality determination result of the object to be tested according to the output risk index.
  2. The method according to claim 1, characterized in that the second output vector contains at least one output identifier; and the inputting of the second detection feature value of the object to be tested into the rule algorithm to obtain the second output vector of the object to be tested comprises:
    determining a correspondence between determination results and output identifiers;
    for each judgment logic in the rule algorithm, making a determination according to the judgment logic using the corresponding second detection feature value to obtain a corresponding determination result, and determining the corresponding output identifier according to the determination result;
    combining all output identifiers into the second output vector in a predetermined order.
  3. The method according to claim 1, characterized in that the first machine learning model is a neural network model, and the second machine learning model is a logistic regression model.
  4. The method according to claim 3, characterized in that the neural network model is trained in the following manner:
    obtaining training sample data within a historical time period;
    selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
    inputting the first training feature value into an initial neural network model, and calculating a loss function according to the resulting machine risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model;
    the logistic regression model is trained in the following manner:
    obtaining a first output vector of the training object from the trained neural network model;
    selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
    inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
    inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
  5. The method according to claim 3, characterized in that the neural network model and the logistic regression model are trained in the following manner:
    obtaining training sample data within a historical time period;
    selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
    inputting the first training feature value into an initial neural network model to obtain a first output vector of the training object;
    selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
    inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
    inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model, and determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
  6. The method according to claim 1, characterized in that the first machine learning model includes a plurality of different machine learning sub-models.
  7. The method according to any one of claims 1 to 6, characterized by further comprising:
    obtaining all judgment logic in the rule algorithm;
    obtaining, from the second machine learning model, the weight parameter corresponding to each judgment logic;
    for each judgment logic, determining the rationality of the judgment logic according to the relationship between the judgment logic and the other judgment logic and according to the weight parameter corresponding to the judgment logic.
  8. A data anomaly detection device, characterized by comprising:
    an obtaining unit, configured to obtain detection sample data of an object to be tested;
    a processing unit, configured to determine, according to the detection sample data, a first detection feature value of the object to be tested corresponding to a first machine learning model and a second detection feature value corresponding to a rule algorithm, the rule algorithm containing at least one judgment logic;
    a calculation unit, configured to input the first detection feature value corresponding to the first machine learning model into the trained machine learning model to obtain a first output vector of the object to be tested, and to input the second detection feature value corresponding to the rule algorithm into the rule algorithm to obtain a second output vector of the object to be tested;
    an output unit, configured to input the first output vector and the second output vector into a trained second machine learning model to determine an output risk index of the object to be tested;
    a determination unit, configured to determine an abnormality determination result of the object to be tested according to the output risk index.
  9. The device according to claim 8, characterized in that the second output vector contains at least one output identifier; and the calculation unit is specifically configured to:
    determine a correspondence between determination results and output identifiers;
    for each judgment logic in the rule algorithm, make a determination according to the judgment logic using the corresponding second detection feature value to obtain a corresponding determination result, and determine the corresponding output identifier according to the determination result;
    combine all output identifiers into the second output vector in a predetermined order.
  10. The device according to claim 8, characterized in that the first machine learning model is a neural network model, and the second machine learning model is a logistic regression model.
  11. The device according to claim 10, characterized by further comprising a training unit, configured to train the neural network model in the following manner:
    obtaining training sample data within a historical time period;
    selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
    inputting the first training feature value into an initial neural network model, and calculating a loss function according to the resulting machine risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model;
    the training unit is further configured to train the logistic regression model in the following manner:
    obtaining a first output vector of the training object from the trained neural network model;
    selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
    inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
    inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
  12. The device according to claim 10, characterized in that the training unit is further configured to train the neural network model and the logistic regression model in the following manner:
    obtaining training sample data within a historical time period;
    selecting, according to the training sample data, a first training feature of a training object corresponding to the neural network model, and determining a first training feature value corresponding to the first training feature;
    inputting the first training feature value into an initial neural network model to obtain a first output vector of the training object;
    selecting, according to the training sample data, a second training feature of the training object corresponding to the rule algorithm, and determining a second training feature value corresponding to the second training feature;
    inputting the second training feature value into the rule algorithm to obtain a second output vector of the training object;
    inputting the first output vector and the second output vector into an initial logistic regression model, and calculating a loss function according to the resulting output risk index and the abnormality determination result of the training object; when the loss function is less than a preset threshold, determining the corresponding first parameter as the first parameter of the neural network model, to obtain the trained neural network model, and determining the corresponding second parameter as the second parameter of the logistic regression model, to obtain the trained logistic regression model.
  13. The device according to claim 8, characterized in that the first machine learning model includes a plurality of different machine learning sub-models.
  14. The device according to any one of claims 8 to 13, characterized by further comprising an analysis unit, configured to:
    obtain all judgment logic in the rule algorithm;
    obtain, from the second machine learning model, the weight parameter corresponding to each judgment logic;
    for each judgment logic, determine the rationality of the judgment logic according to the relationship between the judgment logic and the other judgment logic and according to the weight parameter corresponding to the judgment logic.
  15. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 7.
  16. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 7.
PCT/CN2020/118432 2019-12-19 2020-09-28 Method and device for detecting data abnormality WO2021120775A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911317683.2A CN111126622B (en) 2019-12-19 2019-12-19 Data anomaly detection method and device
CN201911317683.2 2019-12-19

Publications (1)

Publication Number Publication Date
WO2021120775A1 true WO2021120775A1 (en) 2021-06-24

Family

ID=70500935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118432 WO2021120775A1 (en) 2019-12-19 2020-09-28 Method and device for detecting data abnormality

Country Status (2)

Country Link
CN (1) CN111126622B (en)
WO (1) WO2021120775A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126622B (en) * 2019-12-19 2023-11-03 中国银联股份有限公司 Data anomaly detection method and device
CN113839904B (en) * 2020-06-08 2023-08-22 北京梆梆安全科技有限公司 Security situation awareness method and system based on intelligent network-connected automobile
CN111816312B (en) * 2020-09-14 2021-02-26 杭州憶盛医疗科技有限公司 Health state detection method and equipment based on model interpretation and readable storage medium
CN111883222B (en) * 2020-09-28 2020-12-22 平安科技(深圳)有限公司 Text data error detection method and device, terminal equipment and storage medium
CN112491820B (en) * 2020-11-12 2022-07-29 新华三技术有限公司 Abnormity detection method, device and equipment
CN112529695A (en) * 2020-12-23 2021-03-19 招联消费金融有限公司 Credit risk determination method, credit risk determination device, computer equipment and storage medium
CN112907949B (en) * 2021-01-20 2022-11-22 北京百度网讯科技有限公司 Traffic anomaly detection method, model training method and device
CN112819156A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107645403A (en) * 2016-07-22 2018-01-30 阿里巴巴集团控股有限公司 Terminal rule engine apparatus, terminal rule operation method
CN107766418A (en) * 2017-09-08 2018-03-06 广州汪汪信息技术有限公司 A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN108038701A (en) * 2018-03-20 2018-05-15 杭州恩牛网络技术有限公司 A kind of integrated study is counter to cheat test method and system
CN108197280A (en) * 2018-01-10 2018-06-22 上海电气集团股份有限公司 Mining ability evaluation method based on industrial equipment data
CN110414554A (en) * 2019-06-18 2019-11-05 浙江大学 One kind being based on the improved Stacking integrated study fish identification method of multi-model
CN111126622A (en) * 2019-12-19 2020-05-08 中国银联股份有限公司 Data anomaly detection method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794192B (en) * 2015-04-17 2018-06-08 南京大学 Multistage method for detecting abnormality based on exponential smoothing, integrated study model
US10061300B1 (en) * 2017-09-29 2018-08-28 Xometry, Inc. Methods and apparatus for machine learning predictions and multi-objective optimization of manufacturing processes
CN108896996B (en) * 2018-05-11 2019-09-20 中南大学 A kind of Pb-Zn deposits absorbing well, absorption well water sludge interface ultrasonic echo signal classification method based on random forest
US20190378050A1 (en) * 2018-06-12 2019-12-12 Bank Of America Corporation Machine learning system to identify and optimize features based on historical data, known patterns, or emerging patterns
CN109615020A (en) * 2018-12-25 2019-04-12 深圳前海微众银行股份有限公司 Characteristic analysis method, device, equipment and medium based on machine learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI QIAO: "Research and Application of Model Fusion Algorithm", MASTER THESIS, 15 June 2017 (2017-06-15), pages 1 - 40, XP009528696 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363644A (en) * 2019-06-17 2019-10-22 深圳壹账通智能科技有限公司 Exception information recognition methods, device, computer equipment and storage medium
CN113486302A (en) * 2021-07-12 2021-10-08 浙江网商银行股份有限公司 Data processing method and device
CN113709223A (en) * 2021-08-18 2021-11-26 上海明略人工智能(集团)有限公司 Method and device for monitoring platform abnormity, electronic equipment and readable storage medium
CN114051026A (en) * 2021-10-12 2022-02-15 青岛民航凯亚系统集成有限公司 Cloud commanding and dispatching and airport local sharing interaction management system and method
CN114500038A (en) * 2022-01-24 2022-05-13 深信服科技股份有限公司 Network security detection method and device, electronic equipment and readable storage medium
CN114878651B (en) * 2022-05-18 2024-03-08 西安热工研究院有限公司 Condensate water quality deterioration early warning system and method based on big data
CN114878651A (en) * 2022-05-18 2022-08-09 西安热工研究院有限公司 Big data based condensate water quality deterioration early warning system and method
CN114996821A (en) * 2022-06-28 2022-09-02 中建八局装饰工程有限公司 Curtain wall cavity air tightness judgment method
CN116245256A (en) * 2023-04-23 2023-06-09 湖州新江浩电子有限公司 Multi-factor-combined capacitor quality prediction method, system and storage medium
CN117350548A (en) * 2023-12-04 2024-01-05 国网浙江省电力有限公司宁波供电公司 Power distribution equipment potential safety hazard investigation method
CN117350548B (en) * 2023-12-04 2024-04-16 国网浙江省电力有限公司宁波供电公司 Power distribution equipment potential safety hazard investigation method
CN117556362A (en) * 2024-01-10 2024-02-13 玻尔兹曼(广州)科技有限公司 Measurement data abnormity supervision system and method based on data analysis
CN117556362B (en) * 2024-01-10 2024-04-30 玻尔兹曼(广州)科技有限公司 Measurement data abnormity supervision system and method based on data analysis

Also Published As

Publication number Publication date
CN111126622B (en) 2023-11-03
CN111126622A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
WO2021120775A1 (en) Method and device for detecting data abnormality
CN108960303B (en) Unmanned aerial vehicle flight data anomaly detection method based on LSTM
JP2018538587A (en) Risk assessment method and system
US11650968B2 (en) Systems and methods for predictive early stopping in neural network training
WO2021098384A1 (en) Data abnormality detection method and apparatus
JP6595718B2 (en) Credit score model training method, credit score calculation method, apparatus and server
US20190220924A1 (en) Method and device for determining key variable in model
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
WO2020057283A1 (en) Unsupervised model evaluation method and device, server and readable storage medium
CN115051929A (en) Network fault prediction method and device based on self-supervision target perception neural network
CN111126566B (en) Abnormal furniture layout data detection method based on GAN model
Wang et al. A knowledge discovery case study of software quality prediction: Isbsg database
CN112738098A (en) Anomaly detection method and device based on network behavior data
CN111027318A (en) Industry classification method, device, equipment and storage medium based on big data
CN113486595A (en) Intelligent blowout early warning method, system, equipment and storage medium
CN109933579B (en) Local K neighbor missing value interpolation system and method
CN108121993A (en) A kind of data processing method and device
CN109978038A (en) A kind of cluster abnormality determination method and device
CN111026661A (en) Method and system for comprehensively testing usability of software
JP2021012600A (en) Method for diagnosis, method for learning, learning device, and program
RU2764873C1 (en) Method for detecting abnormalities in information and communication systems
CN114510518B (en) Self-adaptive aggregation method and system for massive structured data and electronic equipment
CN113542276B (en) Method and system for detecting intrusion target of hybrid network
CN115965823B (en) Online difficult sample mining method and system based on Focal loss function
CN116661954B (en) Virtual machine abnormality prediction method, device, communication equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20901087; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20901087; Country of ref document: EP; Kind code of ref document: A1)