CN116010792A - Method and device for testing robustness of model - Google Patents

Method and device for testing robustness of model

Info

Publication number
CN116010792A
Authority
CN
China
Prior art keywords
samples
sample
correct
risk
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211737623.8A
Other languages
Chinese (zh)
Inventor
李志峰
崔世文
孟昌华
王维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211737623.8A
Publication of CN116010792A

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the specification provides a method and a device for testing robustness of a model, wherein the method comprises the following steps: acquiring a plurality of first samples, wherein each first sample comprises a plurality of user behaviors in sequence, and each user behavior comprises a plurality of behavior attributes; performing first processing on the plurality of first samples to obtain a plurality of second samples, wherein the first processing on a third sample included in the plurality of first samples includes: confirming at least one first attribute from the behavior attributes of the user behavior included in the third sample; setting the first attribute as a first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample; determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.

Description

Method and device for testing robustness of model
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for testing model robustness.
Background
Risk control is a concern in many industries. In electronic payment and transaction service enterprises, for example, operators often need to analyze, based on a sequence of a user's behaviors, whether there is a risk of illegal operations, particularly illegal transactions. In recent years, using machine learning models to identify the risk that user behavior may carry has achieved good results. However, training and test samples for user behavior sequences often do not completely cover the sample space, because sequence samples often suffer from data sparsity. As a result, the recognition performance of a trained model may be unstable in actual operation.
Therefore, a new solution for testing model robustness is needed in order to evaluate more accurately the stability of a model's recognition performance in actual operation.
Disclosure of Invention
The embodiments of this specification aim to provide a new method for testing the robustness of a model. With this method, simulated operating-environment disturbance and interference-behavior disturbance are applied to user behavior sequence samples to obtain simulated environment disturbance samples and interference behavior samples. Testing a user behavior risk identification model with these samples makes it possible to measure accurately a robustness index of the model for identifying risky user behavior in actual operation, thereby overcoming the deficiencies of the prior art.
According to a first aspect, there is provided a method of testing robustness of a model, comprising:
acquiring a plurality of first samples, wherein each first sample comprises a plurality of user behaviors in sequence, and each user behavior comprises a plurality of behavior attributes;
performing first processing on the plurality of first samples to obtain a plurality of second samples, wherein the first processing on a third sample included in the plurality of first samples includes:
confirming at least one first attribute from the behavior attributes of the user behaviors included in the third sample; setting the first attribute as a preset first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample;
determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
In one possible implementation manner, confirming at least one first attribute from the behavior attributes of the user behaviors included in the third sample includes:
confirming, based on a random algorithm, at least one first attribute from the behavior attributes of the user behaviors included in the third sample.
In one possible implementation, determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples includes:
inputting the plurality of first samples into the first risk recognition model, acquiring recognition results corresponding to the plurality of first samples, determining the correct recognition quantity and the incorrect recognition quantity corresponding to the plurality of first samples according to the risk labels and the recognition results corresponding to the plurality of first samples, and determining the first correct recognition rate of the first risk recognition model according to the correct recognition quantity and the incorrect recognition quantity corresponding to the plurality of first samples.
In one possible implementation, determining the first indicator for the first risk identification model according to the first correct identification rate and the second correct identification rate includes:
a first indicator for the first risk identification model is determined based on a difference between the first correct identification rate and the second correct identification rate.
In one possible embodiment, the method further comprises:
performing second processing on the plurality of first samples respectively to obtain a plurality of fifth samples, wherein the second processing on a third sample included in the plurality of first samples comprises:
confirming at least one second attribute from the behavior attributes of the user behaviors included in the third sample; adding a preset offset value to the value of the second attribute to obtain a sixth sample, wherein the plurality of fifth samples comprise the sixth sample;
determining a third correct recognition rate of the first risk recognition model based on the plurality of fifth samples; and determining a first index aiming at the first risk identification model according to the first correct identification rate and the third correct identification rate.
In one possible embodiment, the method further comprises:
performing third processing on the plurality of first samples respectively to obtain a plurality of seventh samples, wherein the third processing on a third sample included in the plurality of first samples comprises:
confirming at least one third attribute from the behavior attributes of the user behaviors included in the third sample; applying random disturbance to the value of the third attribute to obtain an eighth sample, wherein the plurality of seventh samples comprise the eighth sample;
determining a fourth correct recognition rate of the first risk recognition model based on the plurality of eighth samples; and determining the robustness of the first risk identification model according to the first correct identification rate and the fourth correct identification rate.
According to a second aspect, there is provided a method of testing robustness of a model, comprising:
acquiring a plurality of first samples, wherein each first sample comprises a plurality of user behaviors in sequence, and each user behavior comprises a plurality of behavior attributes;
performing fourth processing on the plurality of first samples respectively to obtain a plurality of ninth samples, wherein the fourth processing on a third sample included in the plurality of first samples comprises:
determining a non-critical behavior subsequence in a third sample based on a predetermined critical behavior subsequence, and applying predetermined disturbance to the non-critical behavior subsequence in the third sample to obtain a tenth sample, wherein the tenth sample is included in the plurality of ninth samples;
determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a fifth correct recognition rate of the first risk recognition model based on the plurality of tenth samples; and determining a first index for the first risk identification model according to the first correct identification rate and the fifth correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
In a possible implementation, the non-critical behavior subsequence includes a first subsequence preceding the critical behavior subsequence in the third sample;
applying a predetermined perturbation to a non-critical behavior sub-sequence in the third sample, comprising:
inserting a number of user actions determined randomly in said first sub-sequence.
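The fourth processing can be illustrated with a short Python sketch. It is only one possible reading: it assumes each behavior is a dictionary with a hypothetical `category` attribute, that the critical behavior subsequence is matched as a contiguous run of categories, and that the interfering behaviors are drawn from a hypothetical pool `fillers`. None of these names come from the specification.

```python
import copy
import random
from typing import Any, Dict, List, Optional

Behavior = Dict[str, Any]


def find_subsequence(categories: List[str], key: List[str]) -> int:
    """Return the start index of the first contiguous occurrence of `key`, or -1."""
    for start in range(len(categories) - len(key) + 1):
        if categories[start:start + len(key)] == key:
            return start
    return -1


def perturb_before_critical(sample: List[Behavior], critical: List[str],
                            fillers: List[Behavior], max_insert: int = 3,
                            rng: Optional[random.Random] = None) -> List[Behavior]:
    """Insert a random number of filler behaviors before the critical subsequence."""
    rng = rng or random.Random()
    start = find_subsequence([b["category"] for b in sample], critical)
    if start < 0:
        # No critical subsequence found: return an unmodified copy.
        return copy.deepcopy(sample)
    prefix = copy.deepcopy(sample[:start])   # the "first subsequence"
    rest = copy.deepcopy(sample[start:])     # critical subsequence and what follows
    for _ in range(rng.randint(1, max_insert)):
        position = rng.randint(0, len(prefix))  # any slot inside the prefix
        prefix.insert(position, copy.deepcopy(rng.choice(fillers)))
    return prefix + rest


first_sample = [{"category": "login"}, {"category": "browse"},
                {"category": "add_payee"}, {"category": "transfer"}]
critical = ["add_payee", "transfer"]
fillers = [{"category": "click"}, {"category": "view_help"}]
tenth_sample = perturb_before_critical(first_sample, critical, fillers,
                                       rng=random.Random(7))
```

Because insertions are confined to the prefix, the critical behavior subsequence stays intact and contiguous, which matches the idea that the sequence changes while the risky part of it does not.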
According to a third aspect, there is provided an apparatus for testing robustness of a model, comprising:
a first sample acquisition unit configured to acquire a plurality of first samples each including a plurality of user behaviors in sequence, the user behaviors including a plurality of behavior attributes;
a first processing unit configured to perform first processing on the plurality of first samples to obtain a plurality of second samples, wherein the first processing unit includes a first sub-processing unit for first processing on a third sample included in the plurality of first samples:
the first sub-processing unit is configured to confirm at least one first attribute from the behavior attributes of the user behaviors included in the third sample; setting the first attribute as a preset first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample;
a first result determination unit configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
According to a fourth aspect, there is provided an apparatus for testing robustness of a model, comprising:
a second sample acquiring unit configured to acquire a plurality of first samples, each of the first samples including a plurality of user behaviors in sequence, the user behaviors including a plurality of behavior attributes;
a second processing unit configured to perform fourth processing on the plurality of first samples, respectively, to obtain a plurality of ninth samples, wherein the second processing unit includes a second sub-processing unit configured to perform fourth processing on a third sample included in the plurality of first samples:
the second sub-processing unit is configured to determine a non-critical behavior subsequence in the third sample based on a predetermined critical behavior subsequence, and apply a predetermined disturbance to the non-critical behavior subsequence in the third sample to obtain a tenth sample, wherein the tenth sample is included in the plurality of ninth samples;
a second result determination unit configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a fifth correct recognition rate of the first risk recognition model based on the plurality of tenth samples; and determining a first index for the first risk identification model according to the first correct identification rate and the fifth correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first and second aspects.
According to a sixth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which when executing the executable code implements the methods of the first and second aspects.
By using one or more of the methods, the devices, the computing equipment and the storage media in the aspects, the robustness index of the identification model aiming at the risk behaviors of the user in actual operation can be accurately tested.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic diagram of a method of testing model robustness according to an embodiment of the present description;
FIG. 2 shows a flow chart of a method of testing model robustness according to an embodiment of the present description;
FIG. 3 shows a schematic diagram of a method of testing model robustness according to an embodiment of the present description;
FIG. 4 shows a schematic diagram of a method of testing model robustness according to another embodiment of the present description;
FIG. 5 shows a schematic diagram of a method of testing model robustness according to a further embodiment of the present description;
FIG. 6 illustrates a flow chart of a method of testing model robustness in accordance with another embodiment of the present description;
FIG. 7 shows a schematic diagram of a method of testing model robustness according to yet another embodiment of the present description;
FIG. 8 shows a block diagram of an apparatus for testing model robustness according to an embodiment of the present disclosure;
Fig. 9 is a block diagram showing an apparatus for testing robustness of a model according to another embodiment of the present specification.
Detailed Description
The present invention will be described below with reference to the drawings.
As previously mentioned, there is a need for operational risk control in many industries. For example, in electronic payment and transaction service enterprises, operators often need to analyze, according to a sequence of a user's operation behaviors, whether there is a risk of illegal operations, particularly illegal transactions. In general, the sequence of a user's behaviors within a client application can be very complex and may include, for example, registering, logging in, clicking, accessing, submitting requests, purchasing, and so on. To improve recognition efficiency, a common technical solution in recent years has been to train a machine-learning-based risk recognition model to recognize the potential risks carried by user operation behaviors. Conventionally, the recognition performance of the trained machine learning model is tested and verified with a test sample set. However, the training and test sample sets of a risk recognition model, that is, sets of user operation sequence samples, often suffer from data sparsity: the samples in these sets typically do not cover the entire sample space, or all possibilities, of user operation sequence data. Thus, a model performance index (e.g., recognition accuracy) measured with the test sample set alone is not necessarily stable during actual operation of the model. Therefore, a solution is needed that can more accurately test the stability, or robustness, of the model's recognition performance in actual operation.
To solve the above technical problem, the inventors studied the factors that cause a model's recognition performance to be unstable in actual operation. The research found that, during long-term operation of a model, two types of disturbance to user behavior sequence data can lead to unstable recognition performance. The first type is disturbance of behavior data caused by non-human factors in model operation, also called environmental disturbance. For example, some collected behavior attributes (or behavior features) may be missing because of environment abnormalities, the value distribution of some behavior attributes may drift over time, and the collection of some behavior attributes may be noisy. The second type is disturbance of behavior data caused by human factors in model operation, also called artificial disturbance. For example, a bad actor, for illicit purposes, may insert interfering user behaviors into a user behavior sequence so that the sequence changes while the execution result remains essentially the same, which makes it harder for the recognition model to identify the risk of the behavior.
Comprehensively testing the recognition performance of a model under the above disturbances makes it possible to judge more accurately the robustness (or stability) of the recognition model in actual operation. To this end, the embodiments of this specification provide a method for testing the robustness of a model. Fig. 1 is a schematic diagram of a method for testing model robustness according to an embodiment of the present specification. As shown in fig. 1a, in one example, simulated environmental disturbance may be applied to a plurality of user behavior sequence samples to obtain a plurality of simulated environmental disturbance samples. The risk recognition model is then tested with the original user behavior sequence samples and the environmental disturbance samples to obtain the model's correct recognition rate on the original user behavior sequence samples (for example, correct recognition rate A) and its correct recognition rate on the environmental disturbance samples (for example, correct recognition rate B), so that a robustness index of the model in operation may be determined according to correct recognition rate A and correct recognition rate B. In one example, the robustness index may be determined according to the difference between correct recognition rate A and correct recognition rate B. In another example, as shown in fig. 1b, simulated artificial disturbance may be applied to a plurality of user behavior sequence samples to obtain a plurality of simulated artificial disturbance samples. The risk recognition model is then tested with the original behavior sequence samples and the artificial disturbance samples to obtain correct recognition rate A on the original user behavior sequence samples and the correct recognition rate on the artificial disturbance samples (for example, correct recognition rate C), so that a robustness index of the model in operation may be determined according to correct recognition rate A and correct recognition rate C.
In another example, the robustness index of the model in operation may also be determined by combining correct recognition rate A, correct recognition rate B, and correct recognition rate C.
The method has the following advantages. On the one hand, by applying environmental disturbance similar to that of the actual operating environment to user behavior sequence samples and testing with the disturbed samples, the method can measure more accurately the recognition performance of the model under the environmental disturbances common in actual operation, and thus test the stability, or robustness, of the model's performance under such conditions. On the other hand, by applying artificial disturbance similar to that occurring in actual operation to user behavior sequence samples and testing with the disturbed samples, the method can measure more accurately the recognition performance of the model when artificial disturbance occurs in actual operation, and thus test the stability of the model's recognition performance under artificial disturbance. In summary, the method makes it possible to test the actual recognition performance of the model under environmental and artificial disturbances, both of which strongly affect recognition performance in actual operation, so that the stability of the model in actual operation can be confirmed more comprehensively and accurately.
The detailed procedure of the method is further described below. Fig. 2 shows a flow chart of a method of testing model robustness according to an embodiment of the present description. As shown in fig. 2, the method at least comprises the following steps:
Step S21, a plurality of first samples are obtained, wherein each first sample comprises a plurality of operation behaviors in sequence, and each operation behavior comprises a plurality of behavior attributes;
step S23, performing first processing on the plurality of first samples respectively to obtain a plurality of second samples.
Step S25, determining a first correct recognition rate of a first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
First, in step S21, a plurality of first samples are obtained, each first sample comprising a plurality of operation behaviors in sequence, and each operation behavior comprising several behavior attributes.
In this step, the first sample is a user behavior sequence sample, and each user behavior sequence sample may include a plurality of user operation behaviors in turn. In different embodiments, the plurality of user actions may be, for example, user operation actions or actions directed to different specific applications, services, user terminals, operation interfaces. Each user behavior may include several behavior attributes. In different embodiments, the behavior attributes of the same or different operational behaviors obtained may be different. In one example, the behavior attributes may include, for example, one or more of a behavior object, a behavior time, a behavior identification, a behavior category, a behavior implementer, and the like.
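As a minimal illustration of the data layout described above, a first sample could be represented as an ordered list of behaviors, each mapping attribute names to values, together with a risk label. The class names (`UserBehavior`, `SequenceSample`) and attribute names (`behavior_category`, `behavior_time`) are hypothetical and chosen only for this sketch; the binary label is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class UserBehavior:
    # Attribute name -> value, e.g. behavior object, time, or category.
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SequenceSample:
    # User behaviors in the order in which they occurred.
    behaviors: List[UserBehavior]
    # Risk label of the whole sequence (assumed binary in this sketch).
    risk_label: bool


sample = SequenceSample(
    behaviors=[
        UserBehavior({"behavior_category": "login", "behavior_time": 1000}),
        UserBehavior({"behavior_category": "click", "behavior_time": 1012}),
        UserBehavior({"behavior_category": "purchase", "behavior_time": 1030}),
    ],
    risk_label=False,
)
```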
In different embodiments, different specific ways of extracting or intercepting user behavior may also be employed. This specification focuses on the processing performed after the user behavior sequences are obtained; it neither focuses on nor limits what kind of object a user behavior is directed at or the specific manner in which user behaviors are extracted.
According to one embodiment, the user behavior may be a user business behavior for a target business. Thus, in one embodiment, a plurality of user behavior sequences may be obtained, wherein each user behavior sequence comprises a sequential plurality of user business behaviors for a target business. In a specific embodiment, the target service may be a service with a known risk, such as a known service with a risk of fraudulent use of an account, fraud, credit escrow, etc.
After the plurality of first samples are acquired, a first process may be performed on the plurality of first samples, respectively, to obtain a plurality of second samples in step S23.
As previously mentioned, environmental disturbances in the operation of a model are often a cause of unstable recognition performance. Specifically, in some scenarios, the acquisition of user behavior attribute data may fail while the model is running, for example because of excessive load pressure. In such cases, some of the collected user behavior attributes are null values or default values without practical meaning, which is also called attribute missing or feature missing; this can bias the model's recognition results and affect its recognition performance. To simulate user behavior sequence data with missing attributes, in this step the first processing is performed on the plurality of first samples, respectively, to obtain a plurality of second samples. In particular, the first processing for each sample (e.g., a third sample) included in the plurality of first samples may include sub-step S231: confirming at least one first attribute from the behavior attributes of the user behaviors included in the third sample; and setting the first attribute to a preset first default value, wherein the first default value is used to indicate that the value of the first attribute is missing, so as to obtain a fourth sample, the plurality of second samples comprising the fourth sample. In one embodiment, at least one first attribute may be confirmed, based on a random algorithm, from the behavior attributes of the user behaviors included in the third sample. Fig. 3 shows a schematic diagram of a method of testing model robustness according to an embodiment of the present description. As shown in fig. 3, the first processing may be performed on a plurality of user behavior sequences (first samples) to obtain a plurality of environment disturbance sequences (second samples) simulating attribute missing.
Specifically, from the behavior attributes of all the user behaviors included in each user behavior sequence, one or more target attributes (first attributes) may be randomly determined, and the value of each first attribute is set to a preset default value, where the default value is used to indicate that the value of the corresponding attribute is missing. In different embodiments, the default value may be a different specific value. In one example, it may be a null value or a missing-value indicator (e.g., denoted "None").
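The first processing can be sketched in Python as follows. The sketch assumes that each behavior is a dictionary of attributes and that the random choice is made over individual (behavior, attribute) slots; the specification only says the attributes are determined randomly, so this granularity, the function name `mask_attributes`, and the example attribute names are assumptions.

```python
import copy
import random
from typing import Any, Dict, List, Optional

Behavior = Dict[str, Any]

MISSING = None  # preset default value that marks an attribute value as missing


def mask_attributes(sample: List[Behavior], n_attrs: int = 1,
                    rng: Optional[random.Random] = None) -> List[Behavior]:
    """Return a copy of `sample` with `n_attrs` randomly chosen attribute values set to MISSING."""
    rng = rng or random.Random()
    perturbed = copy.deepcopy(sample)  # leave the original first sample untouched
    # Enumerate every (behavior index, attribute name) slot in the sequence.
    slots = [(i, name) for i, behavior in enumerate(perturbed) for name in behavior]
    for i, name in rng.sample(slots, min(n_attrs, len(slots))):
        perturbed[i][name] = MISSING
    return perturbed


first_sample = [
    {"category": "login", "time": 1000},
    {"category": "purchase", "time": 1030},
]
second_sample = mask_attributes(first_sample, n_attrs=2, rng=random.Random(0))
```

Copying the sample before masking keeps the original first samples available for computing the first correct recognition rate.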
Then, at step S25, a first correct recognition rate for the first risk recognition model may be determined based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
In the step, the correct recognition rates (a first correct recognition rate and a second correct recognition rate) of the first risk recognition model for the first samples and the second samples are determined according to the first samples and the second samples respectively, and then the robustness index of the first risk recognition model is determined according to the first correct recognition rate and the second correct recognition rate. In different embodiments, the first risk recognition model may be a different specific model that recognizes whether the user behavior sequence is risky, which is not limited in this specification.
In different embodiments, the specific manner in which the first correct recognition rate of the first risk recognition model is determined may be different based on the plurality of first samples. For example, in one embodiment, the plurality of first samples may be input into the first risk recognition model, recognition results corresponding to the plurality of first samples may be obtained, the number of correct recognitions and the number of incorrect recognitions corresponding to the plurality of first samples may be determined according to the risk labels and the recognition results corresponding to the plurality of first samples, and the first correct recognition rate of the first risk recognition model may be determined according to the number of correct recognitions and the number of incorrect recognitions corresponding to the plurality of first samples. In an embodiment, a specific manner of determining the second correct recognition rate of the first risk recognition model according to the plurality of second samples may be similar to that of the previous embodiment, and will not be described again.
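The computation in this embodiment can be sketched as below. The model is represented by an arbitrary callable that returns a boolean risk prediction; `toy_model` is a stand-in used only to make the example runnable and has nothing to do with any real risk recognition model.

```python
from typing import Callable, Sequence


def correct_recognition_rate(model: Callable[[object], bool],
                             samples: Sequence[object],
                             labels: Sequence[bool]) -> float:
    """Share of samples whose recognition result matches the risk label."""
    if len(samples) != len(labels) or len(samples) == 0:
        raise ValueError("samples and labels must be non-empty and equally long")
    n_correct = sum(1 for s, y in zip(samples, labels) if model(s) == y)
    n_incorrect = len(samples) - n_correct
    return n_correct / (n_correct + n_incorrect)


def toy_model(sequence: Sequence[str]) -> bool:
    # Stand-in rule: flag any sequence with more than two behaviors as risky.
    return len(sequence) > 2


samples = [
    ["login", "click", "purchase"],
    ["login"],
    ["login", "click", "click", "transfer"],
    ["login", "purchase"],
]
labels = [True, False, False, False]
rate = correct_recognition_rate(toy_model, samples, labels)
```

The same function can be applied unchanged to the second samples, using the risk labels of the first samples they were derived from, to obtain the second correct recognition rate.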
A first indicator may be used to indicate the robustness of the first risk identification model. In different embodiments, the first index may be a different specific index, and the specific manner in which the first index is determined may also be different. In one embodiment, a first indicator for the first risk identification model may be determined based on a difference between a first correct identification rate and a second correct identification rate. In this embodiment, the smaller the absolute value of the first index, the smaller the recognition performance gap of the model in the disturbance and non-disturbance situation, and the better the robustness of the model.
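A small sketch of the difference-based embodiment follows. The function name `robustness_index` and the example rates are illustrative only; the specification leaves the exact form of the index open.

```python
def robustness_index(clean_rate: float, perturbed_rate: float) -> float:
    """Difference between the rate on original samples and on disturbed samples."""
    return clean_rate - perturbed_rate


# Two hypothetical models with the same rate on the original samples.
index_model_a = robustness_index(0.75, 0.5)     # large drop under disturbance
index_model_b = robustness_index(0.75, 0.625)   # smaller drop under disturbance

# The model with the smaller absolute index is the more robust of the two.
more_robust = "A" if abs(index_model_a) < abs(index_model_b) else "B"
```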
In the actual operating scenarios of some models, the value distribution of certain behavior attributes may drift naturally as the model's running time increases. For example, suppose behavior attribute 1 indicates how long the actor performing the behavior has been online or registered; the value of attribute 1 then tends to grow naturally over time. That is, the value distribution of the attribute drifts over time, and such drift may gradually degrade the model's recognition performance. Testing the model's actual recognition performance under attribute value drift therefore allows its robustness in the actual operating environment to be determined more accurately. Thus, in one embodiment, second processing may further be performed on the plurality of first samples, respectively, to obtain a plurality of fifth samples, wherein the second processing for a third sample included in the plurality of first samples includes: confirming at least one second attribute from the behavior attributes of the user behaviors included in the third sample; and adding a preset offset value to the value of the second attribute to obtain a sixth sample, wherein the plurality of fifth samples include the sixth sample. A third correct recognition rate of the first risk recognition model is then determined based on the plurality of fifth samples, and the first index for the first risk recognition model is determined according to the first correct recognition rate and the third correct recognition rate. In different embodiments, different offset values may be added to the values of different specific attributes, which is not limited in this specification.
In a specific embodiment, as shown in Fig. 4, the second processing may be performed on the plurality of user behavior sequences, respectively, to obtain a plurality of environment disturbance sequences that simulate attribute drift. Specifically, among the behavior attributes included in each user behavior sequence, one or more target attributes (second attributes) may be randomly determined, and a preset offset value is added to the value of each second attribute. The offset value may be a different specific value in different embodiments.
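The second processing (adding a preset offset value to a randomly chosen attribute to simulate drift) might be sketched as follows. This is a minimal Python sketch under assumptions: samples are lists of attribute dictionaries, the `offsets` table and the attribute name `account_age_days` are hypothetical, and only numeric values are shifted.

```python
import copy
import random
from typing import Any, Dict, List, Optional

Behavior = Dict[str, Any]   # assumed layout: attribute name -> value
Sample = List[Behavior]


def apply_attribute_drift(
    sample: Sample,
    offsets: Dict[str, float],
    num_attributes: int = 1,
    rng: Optional[random.Random] = None,
) -> Sample:
    """Return a copy of `sample` with a preset offset added to randomly chosen attributes.

    `offsets` maps each attribute that may drift to its preset offset value.
    """
    rng = rng or random.Random()
    # Candidate second attributes: those with a preset offset that occur in the sample.
    candidates = sorted(name for name in offsets if any(name in b for b in sample))
    chosen = rng.sample(candidates, min(num_attributes, len(candidates)))
    drifted = copy.deepcopy(sample)  # leave the first sample untouched
    for behavior in drifted:
        for name in chosen:
            value = behavior.get(name)
            # Shift numeric values only; booleans and missing values are left as they are.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                behavior[name] = value + offsets[name]
    return drifted
```

Applying this function to every first sample would yield the plurality of fifth samples used to compute the third correct recognition rate.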
In the actual operating scenarios of other models, the acquisition of some behavior attribute values may itself be subject to error. For example, the evaluation criteria for attribute 2 may differ to varying degrees between acquisitions, so that the acquired value of attribute 2 contains errors across acquisitions (even if the attribute itself has not changed, or has not changed substantially). Such acquisition errors can also make the model's recognition inaccurate. Testing the model's actual recognition performance under attribute value errors therefore allows its robustness in the actual operating environment to be determined more accurately. Thus, in one embodiment, third processing may further be performed on the plurality of first samples, respectively, to obtain a plurality of seventh samples, wherein the third processing for a third sample included in the plurality of first samples includes: confirming at least one third attribute from the behavior attributes of the user behaviors included in the third sample; and applying a random disturbance to the value of the third attribute to obtain an eighth sample, wherein the plurality of seventh samples include the eighth sample. A fourth correct recognition rate of the first risk recognition model is then determined based on the plurality of seventh samples, and the robustness of the first risk recognition model is determined according to the first correct recognition rate and the fourth correct recognition rate. In a specific embodiment, as shown in Fig. 5, the third processing may be performed on the plurality of user behavior sequences, respectively, to obtain a plurality of environment disturbance sequences that simulate attribute errors.
Specifically, for the behavior attributes included in each user behavior in each user behavior sequence, one or more target attributes (third attributes) may be randomly determined, and a random disturbance (for example, expressed as a function R()) may be applied to the value of each third attribute. In different embodiments, different types of random disturbance may be applied to different specific attributes. In one example, the random disturbance may be applied to the value of the third attribute based on a random number within a preset value range.
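One possible reading of the random disturbance R() is additive noise drawn uniformly from a preset range. The Python sketch below follows that reading; the uniform distribution, the `max_abs_noise` parameter, and the dictionary-based sample layout are assumptions for illustration, not details given in the specification.

```python
import copy
import random
from typing import Any, Dict, List, Optional, Sequence

Behavior = Dict[str, Any]   # assumed layout: attribute name -> value
Sample = List[Behavior]


def apply_random_disturbance(
    sample: Sample,
    numeric_attributes: Sequence[str],
    max_abs_noise: float,
    rng: Optional[random.Random] = None,
) -> Sample:
    """Return a copy of `sample` in which one randomly chosen numeric attribute
    of each behavior is shifted by a random number in [-max_abs_noise, max_abs_noise]."""
    rng = rng or random.Random()
    noisy = copy.deepcopy(sample)  # leave the first sample untouched
    for behavior in noisy:
        present = [name for name in numeric_attributes if name in behavior]
        if not present:
            continue  # nothing to disturb in this behavior
        target = rng.choice(present)  # the third attribute for this behavior
        behavior[target] += rng.uniform(-max_abs_noise, max_abs_noise)
    return noisy
```

Other disturbance types (for example, multiplicative noise or category swaps) could be substituted per attribute, in line with the statement that different attributes may receive different types of disturbance.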
As described above, in addition to environmental disturbance factors, human disturbance factors may also affect the recognition performance of the recognition model. For example, a malicious actor often performs operation B (e.g., an operation to withdraw the account balance) immediately after performing an illegitimate operation A (e.g., an operation to obtain a user account). From the acquired user behavior samples, the recognition model learns the operation sequence of operation A followed by operation B as a risk behavior subsequence, and identifies the risk of user behavior sequences according to that subsequence. However, malicious actors can bypass the model's recognition by inserting other interfering operations between operations A and B, so that the actual behavior sequence does not contain the risk behavior subsequence learned by the model while the actual result of the operations is the same. To test the model's actual recognition performance under such human disturbance, and thereby better determine its robustness in the actual operating environment, an embodiment of another aspect further provides a method for testing the robustness of a model. Fig. 6 shows a flow chart of a method for testing model robustness according to another embodiment of this specification. As shown in Fig. 6, the method includes at least the following steps:
In step S61, a plurality of first samples are obtained, each of the first samples comprising a plurality of user behaviors in sequence, the user behaviors comprising a plurality of behavior attributes.
The first sample, the user behavior, the behavior attribute, and the manner of obtaining the first samples in this step are similar to those in step S21; see the description of step S21, which is not repeated here.
In step S63, fourth processing is performed on the plurality of first samples, respectively, to obtain a plurality of ninth samples.
In this step, the fourth processing is performed on each of the plurality of first samples to obtain the plurality of ninth samples. Specifically, the fourth processing for each sample (e.g., a third sample) included in the plurality of first samples may include sub-step S631: determining a non-critical behavior subsequence in the third sample based on a predetermined critical behavior subsequence, and applying a predetermined disturbance to the non-critical behavior subsequence in the third sample to obtain a tenth sample, wherein the tenth sample is included in the plurality of ninth samples. A critical behavior subsequence is a subsequence that can serve as a basis for judging whether a user behavior sequence is risky.
In different embodiments, the predetermined critical behavior subsequence may differ, and so may the specific manner of determining it, which is not limited in this specification. In one example, a plurality of critical behavior subsequences may be set manually in advance according to historical behavior records; manual setting allows human judgment to compensate for the shortcomings of machine learning. For example, if the risky training samples contain a large number of behavior sequences in which operations A and B occur consecutively, as described above, the model may learn that the operation sequence A-B is the risky critical behavior subsequence. A person who understands the actual meaning of the operations, however, can recognize that operation B within it (e.g., the operation of withdrawing the account balance) is the critical behavior subsequence with the higher risk. In another example, critical behavior subsequences may be extracted from historical behavior samples by other machine learning models. Alternatively, candidate critical subsequences may first be extracted by a machine learning model and then screened or trimmed manually to obtain the critical behavior subsequences. In this way, machine learning improves the efficiency of extracting critical behavior subsequences, and manual screening improves the accuracy of the extraction results.
In different embodiments, the predetermined disturbance applied to the non-critical behavior subsequence may differ. In one embodiment, the non-critical behavior subsequence includes a first subsequence that precedes the critical behavior subsequence in the third sample. The disturbance may then be applied to the non-critical behavior subsequence by inserting a number of randomly determined user behaviors into the first subsequence. In a specific embodiment, as shown in Fig. 7, the fourth processing may be performed on the plurality of user behavior sequences, respectively, to obtain a plurality of artificial disturbance sequences. Specifically, in the example shown in Fig. 7, the predetermined critical behavior subsequence includes, for example, "behavior 5", and one or more other user behaviors may be inserted immediately before "behavior 5" in each user behavior sequence to obtain the plurality of artificial disturbance sequences.
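The Fig. 7 style of disturbance, inserting extra behaviors immediately before the critical behavior subsequence, could be sketched in Python as follows. The `action` attribute used to match behaviors, the pool of filler behaviors, and the `max_insertions` bound are hypothetical choices for this sketch.

```python
import copy
import random
from typing import Any, Dict, List, Optional, Sequence

Behavior = Dict[str, Any]   # assumed layout; "action" names the operation
Sample = List[Behavior]


def find_subsequence(actions: Sequence[str], critical: Sequence[str]) -> int:
    """Index of the first contiguous occurrence of `critical` in `actions`, or -1."""
    if not critical:
        return -1
    for start in range(len(actions) - len(critical) + 1):
        if list(actions[start:start + len(critical)]) == list(critical):
            return start
    return -1


def insert_before_critical(
    sample: Sample,
    critical_actions: Sequence[str],
    filler_behaviors: Sequence[Behavior],
    max_insertions: int = 3,
    rng: Optional[random.Random] = None,
) -> Sample:
    """Insert 1..max_insertions randomly chosen filler behaviors right before
    the critical behavior subsequence; the critical subsequence is left intact."""
    rng = rng or random.Random()
    start = find_subsequence([b.get("action") for b in sample], critical_actions)
    if start < 0:
        return copy.deepcopy(sample)  # no critical subsequence: nothing to disturb
    count = rng.randint(1, max_insertions)
    inserted = [copy.deepcopy(rng.choice(filler_behaviors)) for _ in range(count)]
    # Everything before `start` is the first (non-critical) subsequence.
    return copy.deepcopy(sample[:start]) + inserted + copy.deepcopy(sample[start:])
```

Because only the non-critical part changes, the risk label of the disturbed sample can reasonably be kept equal to that of the original sample when computing the fifth correct recognition rate.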
In step S65, a first correct recognition rate of the first risk recognition model is determined based on the plurality of first samples; a fifth correct recognition rate of the first risk recognition model is determined based on the plurality of ninth samples; and a first index for the first risk recognition model is determined according to the first correct recognition rate and the fifth correct recognition rate, wherein the first index is used to indicate the robustness of the first risk recognition model.
In this embodiment, the specific manner of determining the first correct recognition rate and the fifth correct recognition rate from the plurality of first samples and the plurality of ninth samples, respectively, and the specific manner of determining the first index for the first risk recognition model from the first and fifth correct recognition rates, are similar to those in step S25; see the description of step S25, which is not repeated here.
In different embodiments, the robustness index of the model may also be determined by combining the recognition rate obtained on the undisturbed user behavior sequences with the recognition rates obtained on disturbance sequences produced by different environmental and/or artificial disturbances. For example, in one embodiment, the robustness index of the model may be determined by combining one or more of the correct recognition rates A, B, C, D, and E obtained in Figs. 3, 4, 5, and 7. In one example, the robustness index may be the average of the differences between the correct recognition rate A and each of one or more of the correct recognition rates B, C, D, and E.
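The averaged index in the last example can be written out directly. In the Python sketch below, the dictionary keys naming the disturbance types and the numeric rates are made-up illustration values, not results from the specification.

```python
from typing import Mapping


def combined_robustness_index(baseline_rate: float, disturbed_rates: Mapping[str, float]) -> float:
    """Average of (baseline rate - disturbed rate) over the supplied disturbance types."""
    if not disturbed_rates:
        raise ValueError("at least one disturbed recognition rate is required")
    differences = [baseline_rate - rate for rate in disturbed_rates.values()]
    return sum(differences) / len(differences)


# Hypothetical rates: A is the undisturbed rate, B to D come from three disturbance types.
rate_a = 1.0
rates = {"B_missing": 0.75, "C_drift": 0.5, "D_noise": 0.25}
print(combined_robustness_index(rate_a, rates))  # 0.5
```

Averaging treats every disturbance type equally; a weighted average would be an alternative if some disturbances are considered more likely in practice.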
According to an embodiment of still another aspect, there is also provided an apparatus for testing robustness of a model. Fig. 8 shows a block diagram of an apparatus for testing robustness of a model according to an embodiment of the present specification. As shown in fig. 8, the apparatus 800 includes:
a first sample acquiring unit 801 configured to acquire a plurality of first samples, each of the first samples including a plurality of operation behaviors in sequence, the operation behaviors including a plurality of behavior attributes;
a first processing unit 802 configured to perform a first process on the plurality of first samples, to obtain a plurality of second samples, where the first processing unit includes a first sub-processing unit for performing a first process on a third sample included in the plurality of first samples:
the first sub-processing unit is configured to confirm at least one first attribute from the behavior attributes of the operation behaviors included in the third sample; setting the first attribute as a preset first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample;
a first result determination unit 803 configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
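To make the division of work among units 801 to 803 concrete, the following Python sketch chains the first processing (setting a randomly chosen attribute to a default value that marks it as missing) with the rate comparison. Using `None` as the first default value, the dictionary-based sample layout, and the function names are assumptions for illustration only.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence

Behavior = Dict[str, Any]   # assumed layout: attribute name -> value
Sample = List[Behavior]
MISSING = None              # assumed first default value marking a missing attribute


def mask_attributes(sample: Sample, num_attributes: int = 1,
                    rng: Optional[random.Random] = None) -> Sample:
    """First processing: set randomly chosen attributes to the missing marker."""
    rng = rng or random.Random()
    names = sorted({name for behavior in sample for name in behavior})
    chosen = set(rng.sample(names, min(num_attributes, len(names))))
    return [{k: (MISSING if k in chosen else v) for k, v in b.items()} for b in sample]


def recognition_rate(model: Callable[[Sample], int],
                     samples: Sequence[Sample], labels: Sequence[int]) -> float:
    """Share of samples whose recognition result matches the risk label."""
    return sum(model(s) == y for s, y in zip(samples, labels)) / len(samples)


def run_missing_value_test(model: Callable[[Sample], int],
                           first_samples: Sequence[Sample], labels: Sequence[int],
                           rng: Optional[random.Random] = None) -> float:
    """Roles of units 802 and 803: disturb the samples, then compare the two rates."""
    rng = rng or random.Random()
    second_samples = [mask_attributes(s, rng=rng) for s in first_samples]
    first_rate = recognition_rate(model, first_samples, labels)
    second_rate = recognition_rate(model, second_samples, labels)
    return first_rate - second_rate  # first index
```

Here the caller plays the role of the first sample acquiring unit 801 by supplying `first_samples` and their risk labels.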
According to an embodiment of still another aspect, there is also provided an apparatus for testing robustness of a model. Fig. 9 is a block diagram showing an apparatus for testing robustness of a model according to another embodiment of the present specification. As shown in fig. 9, the apparatus 900 includes:
a second sample acquiring unit 901 configured to acquire a plurality of first samples, each of the first samples including a plurality of operation behaviors in sequence, the operation behaviors including a plurality of behavior attributes;
a second processing unit 902 configured to perform fourth processing on the plurality of first samples, to obtain a plurality of eleventh samples, where the second processing unit includes a second sub-processing unit for fourth processing on a twelfth sample included in the plurality of first samples:
the second sub-processing unit is configured to determine a non-critical behavior sub-sequence included in a twelfth sample based on a predetermined critical behavior sub-sequence, and apply a predetermined disturbance to the non-critical behavior sub-sequence to obtain a thirteenth user sample;
a second result determination unit 903 configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a fifth correct recognition rate of the first risk recognition model based on the plurality of eleventh samples; and determining a first index for the first risk identification model according to the first correct identification rate and the fifth correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
Yet another aspect of the present description provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform any of the methods described above.
In yet another aspect, the present description provides a computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, performs any of the methods described above.
It should be understood that terms such as "first" and "second" are used herein merely to distinguish similar concepts for simplicity of description and have no other limiting effect.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The foregoing embodiments illustrate the general principles of the present invention in further detail and are not to be construed as limiting its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method of testing model robustness, comprising:
acquiring a plurality of first samples, wherein each first sample comprises a plurality of user behaviors in sequence, and each user behavior comprises a plurality of behavior attributes;
performing first processing on the plurality of first samples to obtain a plurality of second samples, wherein the first processing on a third sample included in the plurality of first samples includes:
confirming at least one first attribute from the behavior attributes of the user behaviors included in the third sample; setting the first attribute as a preset first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample;
determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
2. The method of claim 1, wherein confirming at least one first attribute from the behavior attributes of the user behaviors included in the third sample comprises:
confirming, based on a random algorithm, at least one first attribute from the behavior attributes of the user behaviors included in the third sample.
3. The method of claim 1, wherein determining a first correct recognition rate for a first risk recognition model based on the plurality of first samples comprises:
inputting the plurality of first samples into the first risk recognition model, acquiring recognition results corresponding to the plurality of first samples, determining the correct recognition quantity and the incorrect recognition quantity corresponding to the plurality of first samples according to the risk labels and the recognition results corresponding to the plurality of first samples, and determining the first correct recognition rate of the first risk recognition model according to the correct recognition quantity and the incorrect recognition quantity corresponding to the plurality of first samples.
4. The method of claim 1, wherein determining a first indicator for the first risk identification model based on a first correct identification rate and a second correct identification rate comprises:
a first indicator for the first risk identification model is determined based on a difference between the first correct identification rate and the second correct identification rate.
5. The method of claim 1, further comprising:
performing second processing on the plurality of first samples, respectively, to obtain a plurality of fifth samples, wherein the second processing on a third sample included in the plurality of first samples comprises:
confirming at least one second attribute from the behavior attributes of the user behaviors included in the third sample; adding a preset offset value to the value of the second attribute to obtain a sixth sample, wherein the plurality of fifth samples comprise the sixth sample;
determining a third correct recognition rate of the first risk recognition model based on the plurality of fifth samples; and determining a first index aiming at the first risk identification model according to the first correct identification rate and the third correct identification rate.
6. The method of claim 1, further comprising:
performing third processing on the plurality of first samples, respectively, to obtain a plurality of seventh samples, wherein the third processing on a third sample included in the plurality of first samples comprises:
confirming at least one third attribute from the behavior attributes of the user behaviors included in the third sample; applying a random disturbance to the value of the third attribute to obtain an eighth sample, wherein the plurality of seventh samples comprise the eighth sample;
determining a fourth correct recognition rate of the first risk recognition model based on the plurality of seventh samples; and determining the robustness of the first risk recognition model according to the first correct recognition rate and the fourth correct recognition rate.
7. A method of testing model robustness, comprising:
acquiring a plurality of first samples, wherein each first sample comprises a plurality of user behaviors in sequence, and each user behavior comprises a plurality of behavior attributes;
performing fourth processing on the plurality of first samples, respectively, to obtain a plurality of ninth samples, wherein the fourth processing on a third sample included in the plurality of first samples comprises:
determining a non-critical behavior subsequence in a third sample based on a predetermined critical behavior subsequence, and applying predetermined disturbance to the non-critical behavior subsequence in the third sample to obtain a tenth sample, wherein the tenth sample is included in the plurality of ninth samples;
determining a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a fifth correct recognition rate of the first risk recognition model based on the plurality of ninth samples; and determining a first index for the first risk recognition model according to the first correct recognition rate and the fifth correct recognition rate, wherein the first index is used for indicating the robustness of the first risk recognition model.
8. The method of claim 7, wherein the non-critical behavior subsequence comprises a first subsequence preceding the critical behavior subsequence in the third sample;
applying a predetermined perturbation to a non-critical behavior sub-sequence in the third sample, comprising:
inserting a number of randomly determined user behaviors into the first subsequence.
9. An apparatus for testing robustness of a model, comprising:
a first sample acquisition unit configured to acquire a plurality of first samples each including a plurality of user behaviors in sequence, the user behaviors including a plurality of behavior attributes;
a first processing unit configured to perform first processing on the plurality of first samples to obtain a plurality of second samples, wherein the first processing unit includes a first sub-processing unit for first processing on a third sample included in the plurality of first samples:
the first sub-processing unit is configured to confirm at least one first attribute from the behavior attributes of the user behaviors included in the third sample; setting the first attribute as a preset first default value, wherein the first default value is used for indicating that the value of the first attribute is missing, so as to obtain a fourth sample, and the plurality of second samples comprise the fourth sample;
a first result determination unit configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determining a second correct recognition rate of the first risk recognition model based on the plurality of second samples; and determining a first index for the first risk identification model according to the first correct identification rate and the second correct identification rate, wherein the first index is used for indicating the robustness of the first risk identification model.
10. An apparatus for testing robustness of a model, comprising:
a second sample acquiring unit configured to acquire a plurality of first samples, each of the first samples including a plurality of user behaviors in sequence, the user behaviors including a plurality of behavior attributes;
a second processing unit configured to perform fourth processing on the plurality of first samples, respectively, to obtain a plurality of ninth samples, wherein the second processing unit includes a second sub-processing unit configured to perform the fourth processing on a third sample included in the plurality of first samples:
the second sub-processing unit is configured to determine a non-critical behavior subsequence in the third sample based on a predetermined critical behavior subsequence, and apply a predetermined disturbance to the non-critical behavior subsequence in the third sample to obtain a tenth sample, wherein the tenth sample is included in the plurality of ninth samples;
a second result determination unit configured to determine a first correct recognition rate of the first risk recognition model based on the plurality of first samples; determine a fifth correct recognition rate of the first risk recognition model based on the plurality of ninth samples; and determine a first index for the first risk recognition model according to the first correct recognition rate and the fifth correct recognition rate, wherein the first index is used for indicating the robustness of the first risk recognition model.
11. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-8.
12. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-8.
CN202211737623.8A 2022-12-30 2022-12-30 Method and device for testing robustness of model Pending CN116010792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211737623.8A CN116010792A (en) 2022-12-30 2022-12-30 Method and device for testing robustness of model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211737623.8A CN116010792A (en) 2022-12-30 2022-12-30 Method and device for testing robustness of model

Publications (1)

Publication Number Publication Date
CN116010792A true CN116010792A (en) 2023-04-25

Family

ID=86020689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211737623.8A Pending CN116010792A (en) 2022-12-30 2022-12-30 Method and device for testing robustness of model

Country Status (1)

Country Link
CN (1) CN116010792A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination