WO2020232902A1 - Abnormal object identification method, apparatus, computing device, and storage medium

Info

Publication number
WO2020232902A1
Authority
WO
WIPO (PCT)
Prior art keywords
object data
identified
score
predetermined rule
correction value
Prior art date
Application number
PCT/CN2019/103604
Other languages
English (en)
French (fr)
Inventor
孙家棣
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020232902A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2433 - Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This application relates to the field of network monitoring technology, and in particular to an abnormal object identification method, device, computing device, and computer non-volatile readable storage medium.
  • the purpose of this application is to provide an abnormal object identification method, device, computing device, and computer non-volatile readable storage medium.
  • an abnormal object recognition method including:
  • each of the object data corresponds to an object
  • each of the object data includes a plurality of features and a feature value corresponding to each feature
  • the sample set also includes a correction value labeled in advance for each object data;
  • the multiple features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each object data are used to train a machine learning model to obtain an object score correction value prediction model;
  • a predetermined rule that the object data to be identified meets is obtained from a plurality of preset rules, wherein each predetermined rule corresponds to a feature and a score
  • the score of the object data to be identified is determined according to the predetermined rules satisfied by the object data to be identified, the score corresponding to each predetermined rule satisfied by the object data to be identified, and the correction value;
  • an abnormal object is identified among the objects corresponding to the object data to be identified.
  • an abnormal object recognition device including:
  • the first obtaining module is configured to obtain a sample set including a plurality of object data, wherein each of the object data corresponds to an object, and each of the object data includes a plurality of features and a feature value corresponding to each feature,
  • the sample set also includes correction values pre-marked for each object data;
  • the training module is configured to use multiple features of the object data in the sample set, the feature value corresponding to each feature, and the correction value of the object data to train the machine learning model to obtain the object score correction value prediction model;
  • the second acquiring module is configured to acquire at least one object data to be identified
  • the input module is configured to input the object data to be identified into the object score correction value prediction model to obtain the correction value output by the object score correction value prediction model corresponding to each of the object data to be identified;
  • the third acquisition module is configured to, for each of the object data to be identified, acquire a predetermined rule that the object data to be identified satisfies from a plurality of preset rules according to the feature and feature value in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
  • the determining module is configured to, for each of the object data to be identified, determine the score of the object data to be identified according to the predetermined rule satisfied by the object data to be identified, the score corresponding to each predetermined rule satisfied by the object data to be identified, and the correction value;
  • the identification module is configured to identify an abnormal object among the objects corresponding to the object data to be identified according to the score of the object data to be identified.
  • a computing device including a memory and a processor, wherein the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor executes the steps of the above abnormal object identification method.
  • a computer non-volatile readable storage medium storing computer-readable instructions, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above abnormal object identification method.
  • the technical solution provided by the embodiments of the present application may include the following beneficial effects: the above-mentioned abnormal object identification method, device, computing device, and computer non-volatile readable storage medium first use a sample set to train an object score correction value prediction model, then use the model to obtain the correction value of the object data to be identified, and finally obtain the score of the object data to be identified based on how the features and feature values in the object data to be identified satisfy the predetermined rules, and identify abnormal objects based on the score, so that the recognition result quantifies the abnormality of the object well, the accuracy of identifying abnormal objects is improved, and the interpretability of the recognition results is improved.
  • Fig. 1 is a schematic diagram showing a system architecture of an abnormal object recognition method in an abnormal traffic recognition application scenario according to an exemplary embodiment
  • Fig. 2 is a schematic diagram showing the system architecture of an abnormal object recognition method in an application scenario of identifying wool-pulling (abuse of promotional rewards) by gangs within a group according to an exemplary embodiment
  • Fig. 3 is a flowchart showing a method for identifying abnormal objects according to an exemplary embodiment
  • FIG. 4 is a flowchart showing details of step 370 in an embodiment according to the embodiment corresponding to FIG. 3;
  • FIG. 5 is a flowchart of a method for determining a score corresponding to a predetermined rule according to an embodiment shown in the embodiment corresponding to FIG. 3;
  • Fig. 6 is a block diagram showing a device for identifying abnormal objects according to an exemplary embodiment
  • Fig. 7 is a block diagram showing an example of a computing device implementing the above method for identifying abnormal objects according to an exemplary embodiment
  • Fig. 8 shows a non-volatile computer readable storage medium for realizing the above abnormal object identification method according to an exemplary embodiment.
  • This application first provides a method for identifying abnormal objects.
  • An object refers to a computer-related device with certain associated data or anything that exists or runs on a computer device or network platform that can be used as a target. For example, it can be a data object, terminal object, account object, etc.
  • An abnormal object refers to an object that meets certain conditions and is considered abnormal.
  • the identification of abnormal objects refers to the process of finding out possible abnormal objects.
  • the method for identifying abnormal objects provided in the present application can be applied to various scenarios in the field of network security; for example, it can be used to identify abnormal traffic, and can also be used to monitor wool-pulling behavior, that is, the abuse of promotional rewards.
  • the implementation terminal of this application can be any device that has the function of computing and processing data.
  • the device can be connected to an external device to receive or send information. It can be a portable mobile device, such as a smartphone, a tablet, a laptop, or a PDA (Personal Digital Assistant); a fixed device, such as computer equipment, a field terminal, a desktop computer, a server, or a workstation; or a collection of multiple devices, such as the physical infrastructure of cloud computing.
  • Fig. 1 is a schematic diagram showing a system architecture of an abnormal object identification method in an abnormal traffic identification application scenario according to an exemplary embodiment.
  • the architecture between the server 110 and the user terminal 120 can be a C/S architecture, that is, a Client/Server architecture, or a B/S architecture, that is, a Browser/Server architecture.
  • the abnormal object identification method provided by the present application can be run on the server 110, and can also run on a terminal other than the server 110.
  • Fig. 2 is a schematic diagram showing the system architecture of an abnormal object recognition method in an application scenario of identifying wool-pulling by gangs within a group according to an exemplary embodiment.
  • the smart phone 230 may be connected to the base station 220 via a cellular network, and then communicate with the server 210 via the base station.
  • the smartphone 230 is installed with an App (Application) provided by the operator of the server 210.
  • when the user of the smartphone 230 uses the App for the first time, the user needs to register with the server 210, and the server 210 assigns an account to the user; a bank card can be bound to the account.
  • the user of the smart phone 230 can further use the App to perform more interactive behaviors with the server 210, which is a typical running mode of the App at present.
  • These apps can generally establish chat groups and transfer funds within the group.
  • When App operators run promotional activities that pay out money, such as red envelopes for registering or rebates for participating in events, criminals may register a large number of accounts or use other methods to collect the operator's activity rewards, a practice known as wool-pulling. The criminals can then transfer the funds obtained in this way by sending red envelopes within a group, causing economic losses to the App operator, so it is necessary to identify wool-pulling behavior in order to counter it in a targeted way.
  • Fig. 3 is a flow chart showing a method for identifying abnormal objects according to an exemplary embodiment. As shown in Figure 3, it includes the following steps:
  • Step 310 Obtain a sample set including multiple object data.
  • each of the object data corresponds to an object
  • each of the object data includes a plurality of features and a feature value corresponding to each feature
  • the sample set further includes a correction value previously labeled for each object data.
  • the correction value is the value used to convert the score that the scoring rules give the object data into the expert score, where the expert score is the score assigned to the object data in advance by judging it on the basis of expert experience.
  • Object data is data related to the object, which can be related to the properties of the object or data generated by the operation of the object.
  • in the abnormal traffic identification scenario, for example, the object data can be data related to the IP address of the traffic generator, the corresponding object can be the IP address of the traffic generator, and the features included in the object data can be the number of visits to the same IP address, the number of accounts accessed using the same IP address, the number of Wi-Fi names used by terminals accessing the same IP address, etc.
  • the feature value corresponding to each feature is the actual value of the corresponding feature.
  • the method for identifying abnormal objects provided in this application can be applied to the scenario of identifying wool-pulling by gangs within a group.
  • the object data can be data related to bank cards that send red envelopes.
  • The multiple features included in each object data can be the following: the ratio of the number of registered mobile phone numbers of the red-envelope-sending bank cards in the group to the number of red-envelope-sending bank cards in the group; the ratio of the activity red envelope income frequency share of the red-envelope-sending bank cards to the number of red-envelope-sending bank cards, where the activity red envelope income frequency share of a red-envelope-sending bank card is the ratio of its activity red envelope income frequency to the in-and-out transaction frequency of the account bound to the bank card; and the number of registered mobile phone numbers of the red-envelope-receiving bank cards in the group.
  • Each mobile phone number can be registered as an account, each registered account can be bound to one or more bank cards, and each bank card can be bound to different registered accounts, so a red-envelope-sending bank card can have multiple registered mobile phone numbers.
  • the feature value corresponding to each feature is the actual value of the corresponding feature, and will not be repeated here.
  • the multiple features included in the same object data and the feature value corresponding to each feature are stored in a mapping table, where each feature is a key in the mapping table and the feature value corresponding to the feature is the value.
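  • As an illustration of the mapping table, a minimal Python sketch follows. The feature names, the values, and the correction value are hypothetical, chosen for the abnormal traffic scenario; the source names the features only in prose.

```python
# Hypothetical object data for one traffic-generating IP address.
# Each key is a feature; each mapped value is that feature's value.
object_data = {
    "visits_to_ip": 12,
    "accounts_using_ip": 7,
    "wifi_names_using_ip": 3,
}

# A sample-set entry pairs the object data with its pre-labeled
# correction value (0.05 is an invented number for illustration).
sample = {"object_data": object_data, "correction": 0.05}

for feature, value in sample["object_data"].items():
    print(f"{feature} = {value}")
```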
  • Step 320 Use the multiple features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each object data to train a machine learning model to obtain an object score correction value prediction model.
  • the trained machine learning model can be a variety of models, such as a logistic regression model, a neural network model, etc.
  • the training process of the machine learning model can be as follows: take the multiple features of an object data and the feature value corresponding to each feature as input to the machine learning model, and obtain the correction value output by the machine learning model. The output correction value is compared with the correction value labeled for the object data. If the two are inconsistent, the coefficients or weights of the machine learning model are adjusted, until, for most of the object data in the sample set, the correction value output by the machine learning model is the same as or close to the correction value labeled for that object data.
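  • The compare-and-adjust loop described above can be sketched as follows. This is only an illustration under stated assumptions: a plain linear model trained by stochastic gradient descent stands in for the logistic regression or neural network models the text mentions, and the feature names, normalised feature values, and labeled correction values are all invented.

```python
def train_correction_model(samples, epochs=2000, lr=0.1):
    """Fit a linear model so that bias + sum(w[f] * x[f]) approximates the
    labeled correction value of each sample (stochastic gradient descent)."""
    features = sorted(samples[0][0])
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            predicted = bias + sum(weights[f] * x[f] for f in features)
            error = predicted - target   # compare output with the label
            bias -= lr * error           # adjust coefficients on mismatch
            for f in features:
                weights[f] -= lr * error * x[f]
    return weights, bias


def predict_correction(model, x):
    """Return the correction value the trained model outputs for x."""
    weights, bias = model
    return bias + sum(weights[f] * x[f] for f in weights)


# Hypothetical, already normalised features with invented labels.
SAMPLES = [
    ({"visit_rate": 0.0, "account_rate": 0.0}, 0.0),
    ({"visit_rate": 1.0, "account_rate": 0.0}, 0.3),
    ({"visit_rate": 0.0, "account_rate": 1.0}, 0.1),
    ({"visit_rate": 1.0, "account_rate": 1.0}, 0.4),
]

model = train_correction_model(SAMPLES)
print(round(predict_correction(model, {"visit_rate": 1.0, "account_rate": 1.0}), 3))
```

  • The same `predict_correction` call would then supply the correction value for each object data to be identified in step 340.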
  • Step 330 Obtain at least one object data to be identified.
  • the object data to be identified is data generated by the object to be identified, and similar to the aforementioned object data, it can also include corresponding features and feature values corresponding to each feature.
  • for the abnormal traffic identification scenario, the data to be identified can be data related to the IP address of the traffic generator, and for the scenario of identifying wool-pulling by gangs within a group, the data to be identified can be data related to bank cards that send and receive red envelopes.
  • at intervals of a predetermined time period, the object data to be identified whose data changed during that period is acquired.
  • each time the data changes, the object data to be identified whose data has changed is acquired.
  • Step 340 Input the object data to be identified into the object score correction value prediction model, and obtain the correction value output by the object score correction value prediction model corresponding to each of the object data to be identified.
  • the object data to be identified can also include corresponding features and the feature value corresponding to each feature, so the object score correction value prediction model can output a corresponding correction value for the input object data to be identified. Since the object score correction value prediction model has been trained, the correction value it outputs for each of the object data to be identified can be considered reliable and accurate to a certain extent.
  • Step 350 For each of the object data to be identified, according to the feature and feature value in the object data to be identified, a predetermined rule that the object data to be identified meets is obtained from a plurality of preset rules.
  • each predetermined rule corresponds to a feature and a score.
  • the predetermined rule is used to screen data generated by possible abnormal objects in the object data to be identified.
  • the feature in the object data to be identified is used to determine the corresponding predetermined rule, and the feature value is used to determine whether the object data to be identified meets that rule; that is, the predetermined rules satisfied by the object data to be identified are obtained by judging whether the feature value corresponding to each feature conforms to the predetermined rule corresponding to that feature.
  • for the abnormal traffic identification scenario, the predetermined rules corresponding to the features may be that the number of visits to the same IP address is greater than 8, the number of accounts that use the same IP address is greater than 6, and the number of Wi-Fi names used by terminals that access using the same IP address is greater than 7.
  • if the object data includes the features: the ratio of the number of registered mobile phone numbers of the red-envelope-sending bank cards in the group to the number of red-envelope-sending bank cards in the group, the ratio of the activity red envelope income frequency share of the red-envelope-sending bank cards to the number of red-envelope-sending bank cards, and the number of registered mobile phone numbers of the red-envelope-receiving bank cards in the group, then the predetermined rules corresponding to the features can be: the ratio of the number of registered mobile phone numbers of the red-envelope-sending bank cards in the group to the number of red-envelope-sending bank cards in the group is greater than or equal to 7; the ratio of the activity red envelope income frequency share of the red-envelope-sending bank cards to the number of red-envelope-sending bank cards is greater than or equal to 0.99; and the number of registered mobile phone numbers of the red-envelope-receiving bank cards in the group is greater than or equal to 7.
  • the score corresponding to each predetermined rule is stored in a predetermined rule-score correspondence table established in advance based on experience, and the score corresponding to the predetermined rule is obtained by searching the predetermined rule-score correspondence table.
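  • A minimal sketch of step 350 follows, assuming rules are simple threshold comparisons. The thresholds 8, 6, and 7 come from the abnormal traffic example in the text; the rule names, feature keys, and scores in the correspondence table are hypothetical.

```python
import operator

# Each rule is tied to one feature and compares its value with a reference.
# Thresholds follow the abnormal-traffic example in the text; the rule names
# and feature keys are hypothetical.
RULES = {
    "many_visits": ("visits_to_ip", operator.gt, 8),
    "many_accounts": ("accounts_using_ip", operator.gt, 6),
    "many_wifi_names": ("wifi_names_using_ip", operator.gt, 7),
}

# Rule-score correspondence table; the scores are invented for illustration.
RULE_SCORES = {"many_visits": 0.5, "many_accounts": 0.3, "many_wifi_names": 0.2}


def satisfied_rules(object_data):
    """Return the names of the rules whose feature is present and whose
    comparison against the reference value holds."""
    return [
        name
        for name, (feature, compare, reference) in RULES.items()
        if feature in object_data and compare(object_data[feature], reference)
    ]


data = {"visits_to_ip": 12, "accounts_using_ip": 7, "wifi_names_using_ip": 3}
matched = satisfied_rules(data)
scores = [RULE_SCORES[name] for name in matched]
print(matched, scores)
```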
  • Step 360 For each of the object data to be identified, the score of the object data to be identified is determined according to the predetermined rule that the object data to be identified meets, the score corresponding to each predetermined rule that the object data to be identified meets, and the correction value.
  • step 360 specifically includes the following steps:
  • the score of the object data to be identified is obtained with a formula in which n is the number of predetermined rules that the object data to be identified meets, i is the sequence number of a predetermined rule that the object data to be identified meets, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified meets, y is the score of the object data to be identified, and the remaining term is the deviation correction value of the object data to be identified.
  • the above-mentioned deviation correction value is the offset added to the score obtained from the predetermined rules in order to correct it into the score of the object data to be identified.
  • the characteristic of the above formula is that, through accumulation, the score corresponding to each predetermined rule satisfied by the object data to be identified is reflected in the final score of the object data to be identified. Therefore, the advantage of this embodiment is that accumulating the scores corresponding to the satisfied predetermined rules fully and objectively reflects how far the object data to be identified satisfies the predetermined rules, which quantifies the abnormality of the object and improves the accuracy of identifying abnormal objects.
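  • The formula itself does not appear in this text. One reading consistent with the definitions above, offered here as an assumption rather than as the patent's exact expression, is y = x_1 + ... + x_n + b, where b is a symbol introduced here for the deviation correction value. A sketch of that reading:

```python
def additive_score(rule_scores, deviation_correction):
    """Assumed reading: accumulate the scores x_i of the satisfied rules,
    then add the deviation correction value as an offset."""
    return sum(rule_scores) + deviation_correction


# Two satisfied rules with invented scores and an invented correction value.
print(additive_score([0.5, 0.3], 0.05))
```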
  • the correction value output by the object score correction value prediction model corresponding to each of the object data to be identified is a growth coefficient, and determining, for each object data to be identified, the score of the object data to be identified according to the predetermined rules met by the object data to be identified, the score corresponding to each predetermined rule met by the object data to be identified, and the correction value includes:
  • obtaining the score of the object data to be identified with a formula in which n is the number of predetermined rules that the object data to be identified meets, i is the sequence number of a predetermined rule that the object data to be identified meets, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified meets, y is the score of the object data to be identified, and k is the growth coefficient.
  • the growth coefficient refers to the ratio of the increase required to convert the score of the object data to be recognized obtained by using a predetermined rule into the actual score of the object data to be recognized.
  • the advantage of this embodiment is that the quantification of the degree of abnormality of the object corresponding to the object data to be identified is achieved through each predetermined rule, and at the same time, the score obtained using the predetermined rule is further modified by using the growth coefficient, so that the obtained object data to be identified The score is more objective and improves the accuracy of identifying abnormal objects.
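  • The growth-coefficient formula is likewise not reproduced in this text. Because the growth coefficient is described as the ratio of the increase needed to turn the rule-based score into the actual score, one plausible reading, assumed here, is y = (1 + k) * (x_1 + ... + x_n). A sketch under that assumption:

```python
def growth_score(rule_scores, growth_coefficient):
    """Assumed reading: scale the accumulated rule scores by (1 + k),
    where k is the growth coefficient output by the prediction model."""
    return (1.0 + growth_coefficient) * sum(rule_scores)


# Invented rule scores and growth coefficient.
print(growth_score([0.5, 0.3], 0.1))
```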
  • each predetermined rule includes a reference value
  • step 360 specifically includes the following steps:
  • the score of the object data to be identified is determined according to the predetermined rule that the object data to be identified meets, the score corresponding to each predetermined rule that the object data to be identified meets, and the correction value, using a formula in which n is the number of predetermined rules that the object data to be identified meets, i is the sequence number of a predetermined rule that the object data to be identified meets, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified meets, each satisfied rule additionally contributes the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the score of the object data to be identified, and the remaining term is the correction value of the object data to be identified.
  • the finally obtained score of the object data to be identified is related not only to the predetermined rules satisfied by the object data to be identified and the correction value, but also to the amount by which the feature value of the feature corresponding to each predetermined rule deviates from the reference value of that rule: the greater the deviation, the greater the contribution of that rule's score to the score of the object data to be identified, that is, the higher the final score.
  • not only is the score of each predetermined rule satisfied by the object to be identified reflected in the score of the object data to be identified, but the actual degree to which the object to be identified satisfies the predetermined rule is also reflected in the final score, so the score of the object data to be identified is further refined and quantified, making it more credible and improving the accuracy of identifying abnormal objects.
  • the reference value included in the predetermined rule is the limit against which the feature value of the feature in the predetermined rule is judged. For example, for an application scenario of abnormal traffic identification, if the predetermined rule corresponding to a feature is that the number of accounts accessed using the same IP address is greater than 6, the reference value included in the predetermined rule is 6.
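  • The reference-value formula is not reproduced in this text either. The sketch below assumes one reading that matches the description: each satisfied rule's score x_i is weighted by (1 + r_i), where r_i = (feature value - reference value) / reference value, and the correction value is added at the end. Both the weighting form and the additive correction are assumptions, and the numbers are invented.

```python
def weighted_score(satisfied, correction):
    """satisfied: list of (rule_score, feature_value, reference_value).
    Assumes every reference value is nonzero."""
    total = 0.0
    for rule_score, feature_value, reference_value in satisfied:
        # Relative amount by which the feature value exceeds the reference.
        relative_excess = (feature_value - reference_value) / reference_value
        total += rule_score * (1.0 + relative_excess)
    return total + correction


# Rule "accounts using the same IP address > 6" met with a value of 9.
print(weighted_score([(0.3, 9, 6)], 0.05))
```

  • Under this reading a rule that is only just satisfied contributes roughly its base score, and the contribution grows in proportion to how far the feature value exceeds the reference value.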
  • Step 370 According to the score of the object data to be identified, an abnormal object is identified among the objects corresponding to the object data to be identified.
  • the identifying an abnormal object among the objects corresponding to each object data to be identified according to the score of the object data to be identified includes:
  • the machine learning model is first trained, and then the trained machine learning model is used to obtain the correction value, and then the predetermined rule and correction value satisfied by the object data to be recognized are obtained.
  • FIG. 4 is a flowchart showing details of step 370 in an embodiment according to the embodiment corresponding to FIG. 3.
  • the object data to be identified further includes at least one exempt attribute and an exempt attribute value corresponding to each exempt attribute, as shown in FIG. 4, including the following steps:
  • Step 371 Obtain the object data to be identified whose score is greater than a predetermined score threshold as candidate abnormal object data.
  • the predetermined score threshold is 0.8, and if a score of the object data to be identified is 0.83, since the score 0.83 of the object data to be identified is greater than the predetermined score threshold 0.8, the object data to be identified will be regarded as candidate abnormal object data.
  • Step 372 Filter out, from the candidate abnormal object data, the candidate abnormal object data whose exempt attribute value corresponding to an exempt attribute is less than the preset exempt attribute value threshold corresponding to that exempt attribute, and take the objects corresponding to the candidate abnormal object data that remain after filtering as abnormal objects.
  • the exempt attribute is an attribute that can greatly reduce the possibility that the object corresponding to an object data is an abnormal object, and the corresponding exempt attribute value is the actual value of the exempt attribute.
  • the exempt attribute can be the historical red envelope winning amount of the bank card. If the historical red envelope winning amount of the bank card is less than the corresponding threshold, that is, the amount is sufficiently small, the object corresponding to the candidate abnormal object data is less likely to be involved in wool-pulling, and the candidate abnormal object data can be filtered out.
  • the advantage of this embodiment is that it gives objects whose object data meets the predetermined rules but which are unlikely to be abnormal a way to avoid being identified as abnormal objects, which further improves the accuracy of identifying abnormal objects.
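  • Steps 371 and 372 can be sketched as a threshold test followed by an exemption filter. The 0.8 threshold and the 0.83 score come from the example in the text; the object identifiers, attribute name, and amounts are hypothetical, and treating a candidate as exempt when any one exempt attribute falls below its threshold is an assumption.

```python
def identify_abnormal(scored, score_threshold, exempt_thresholds):
    """scored: list of (object_id, score, exempt_attributes).
    Returns the ids of objects identified as abnormal."""
    abnormal = []
    for object_id, score, exempt_attributes in scored:
        # Step 371: keep only candidates whose score exceeds the threshold.
        if score <= score_threshold:
            continue
        # Step 372: drop candidates with an exempt attribute value below
        # the threshold preset for that attribute.
        exempt = any(
            name in exempt_attributes and exempt_attributes[name] < limit
            for name, limit in exempt_thresholds.items()
        )
        if not exempt:
            abnormal.append(object_id)
    return abnormal


SCORED = [
    ("card_a", 0.83, {"historical_winnings": 500.0}),
    ("card_b", 0.91, {"historical_winnings": 2.0}),   # exempt: tiny winnings
    ("card_c", 0.40, {"historical_winnings": 900.0}),  # below score threshold
]
print(identify_abnormal(SCORED, 0.8, {"historical_winnings": 10.0}))
```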
  • FIG. 5 is a flowchart of a method for determining a score corresponding to a predetermined rule according to an embodiment shown in the embodiment corresponding to FIG. 3. As shown in Figure 5, it includes the following steps:
  • Step 510 Obtain a positive sample containing multiple object data.
  • the positive sample is a collection of multiple object data that meets a predetermined condition, and the predetermined condition is used to filter the object data that the corresponding object is more likely to be an abnormal object.
  • the object data also includes the following features: whether the bank card has historically sent or received red envelopes in the group, the historical red envelope winning amount of the bank card, and the in-and-out transaction frequency of the account bound to the bank card.
  • a positive sample can be the data of bank cards that have historically sent or received red envelopes in the group, whose historical red envelope winning amount ranks in the top 20% in descending order, and whose bound account's in-and-out transaction frequency ranks in the top 20% in descending order.
  • Step 520 Determine the number of object data in the positive sample as the first number.
  • a counter is embedded in the implementation terminal of the present application, and by using the counter, the number of object data in the positive sample can be obtained.
  • Step 530 For each predetermined rule in the plurality of predetermined rules, determine the number of object data satisfying the predetermined rule in the positive sample as the second number.
  • the implementation terminal of the present application is provided with a counter. For each predetermined rule in the plurality of predetermined rules, the counter is increased by 1 whenever an object data in the positive sample is determined to satisfy the predetermined rule, until all object data in the positive sample has been judged against the predetermined rule; the value of the counter at that point is the second number obtained for the predetermined rule.
  • Step 540 For each predetermined rule in a plurality of predetermined rules, determine the ratio of the second number and the first number corresponding to the predetermined rule.
  • Step 550 Regarding each predetermined rule of the plurality of predetermined rules, the ratio determined for the predetermined rule is used as the score corresponding to the predetermined rule.
  • the advantage of this embodiment is that, by using object data with a higher probability of corresponding to an abnormal object as the positive sample, the score corresponding to each predetermined rule is determined from the proportion of object data in the positive sample that satisfies that rule, which increases the credibility and accuracy of the score obtained for each predetermined rule.
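  • Steps 510 to 550 amount to computing, for each rule, the fraction of the positive sample that satisfies it. A minimal sketch, with a hypothetical rule name, feature key, and positive sample:

```python
def rule_scores_from_positive_sample(positive_sample, rules):
    """Score each rule as (second number) / (first number)."""
    first_number = len(positive_sample)              # step 520
    if first_number == 0:
        raise ValueError("positive sample must not be empty")
    scores = {}
    for name, is_satisfied in rules.items():
        # Step 530: count the object data that satisfies this rule.
        second_number = sum(1 for data in positive_sample if is_satisfied(data))
        # Steps 540 and 550: the ratio is used as the rule's score.
        scores[name] = second_number / first_number
    return scores


POSITIVE_SAMPLE = [
    {"accounts_using_ip": 9},
    {"accounts_using_ip": 7},
    {"accounts_using_ip": 4},
    {"accounts_using_ip": 10},
]
SCORING_RULES = {"many_accounts": lambda data: data["accounts_using_ip"] > 6}
print(rule_scores_from_positive_sample(POSITIVE_SAMPLE, SCORING_RULES))
```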
  • the present application also provides an abnormal object recognition device, and the following are device embodiments of the present application.
  • Fig. 6 is a block diagram showing a device for identifying abnormal objects according to an exemplary embodiment. As shown in FIG. 6, the apparatus 600 includes:
  • the first obtaining module 610 is configured to obtain a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
  • the training module 620 is configured to train a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value of the object data, to obtain an object score correction value prediction model;
  • the second obtaining module 630 is configured to obtain at least one object data to be identified;
  • the input module 640 is configured to input the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
  • the third obtaining module 650 is configured to, for each object data to be identified, obtain from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
  • the determining module 660 is configured to determine, for each object data to be identified, the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
  • the identification module 670 is configured to identify an abnormal object among the objects corresponding to the object data to be identified according to the score of the object data to be identified.
  • In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a growth coefficient, and the determining module is further configured to:
  • for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value:
  • Figure PCTCN2019103604-appb-000004
  • where n is the number of predetermined rules that the object data to be identified satisfies,
  • i is the index of a predetermined rule that the object data to be identified satisfies,
  • x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies,
  • y is the score of the object data to be identified, and
  • k is the growth coefficient.
  • In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a deviation correction value, and the determining module is further configured to:
  • for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value:
  • Figure PCTCN2019103604-appb-000005
  • where n is the number of predetermined rules that the object data to be identified satisfies,
  • i is the index of a predetermined rule that the object data to be identified satisfies,
  • x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies,
  • y is the score of the object data to be identified, and
  • μ is the deviation correction value of the object data to be identified.
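  • In one embodiment, the identification module is further configured to:
  • obtain, as abnormal objects, the objects corresponding to object data to be identified whose score is greater than a predetermined score threshold.
  • In one embodiment, the object data to be identified further includes at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and identifying abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified includes:
  • obtaining object data to be identified whose score is greater than the predetermined score threshold as candidate abnormal object data;
  • filtering out, from the candidate abnormal object data, the candidate abnormal object data whose exemption attribute value is smaller than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.
  • In one embodiment, the score corresponding to each predetermined rule is determined in the following manner:
  • obtaining a positive sample containing a plurality of object data;
  • determining the number of object data in the positive sample as the first number;
  • for each predetermined rule in the plurality of predetermined rules, determining the number of object data in the positive sample that satisfy the predetermined rule as the second number;
  • for each predetermined rule in the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
  • for each predetermined rule in the plurality of predetermined rules, using the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.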
  • In one embodiment, each predetermined rule includes a reference value, and the determining module is further configured to:
  • for each object data to be identified, determine the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value:
  • Figure PCTCN2019103604-appb-000006
  • where n is the number of predetermined rules that the object data to be identified satisfies,
  • i is the index of a predetermined rule that the object data to be identified satisfies,
  • x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies,
  • ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule,
  • y is the score of the object data to be identified, and
  • μ is the correction value of the object data to be identified.
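  • According to the third aspect of the present application, a computing device is also provided that performs all or some of the steps of any of the abnormal object identification methods described above. The computing device includes:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor; wherein
  • the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the abnormal object identification method shown in any one of the above exemplary embodiments.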
  • the computing device 700 according to this embodiment of the present application will be described below with reference to FIG. 7.
  • the computing device 700 shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the computing device 700 is represented in the form of a general-purpose computing device.
  • the components of the computing device 700 may include, but are not limited to: the aforementioned at least one processing unit 710, the aforementioned at least one storage unit 720, and a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710).
  • the storage unit stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the steps according to the various exemplary embodiments of the present application described in the "Embodiment Method" section of this specification.
  • the storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 721 and/or a cache storage unit 722, and may further include a read-only storage unit (ROM) 723.
  • the storage unit 720 may also include a program/utility tool 724 having a set of (at least one) program modules 725. Such program modules 725 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
  • the bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • the computing device 700 may also communicate with one or more external devices 900 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the computing device 700, and/or with any device (such as a router, modem, etc.) that enables the computing device 700 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 750.
  • the computing device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 760.
  • As shown in the figure, the network adapter 760 communicates with other modules of the computing device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computing device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the exemplary embodiments described herein can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to execute the method according to the embodiments of the present application.
  • According to the fourth aspect of the present application, a computer non-volatile readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification.
  • In some possible implementations, various aspects of the present application can also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present application described in the "Exemplary Method" section of this specification.
  • Referring to FIG. 8, a computer non-volatile readable storage medium 800 for implementing the above method according to an embodiment of the present application is described. It may take the form of a portable compact disk read-only memory (CD-ROM), includes program code, and can run on a terminal device such as a personal computer.
  • However, the program product of this application is not limited to this. In this document, the computer non-volatile readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of this application can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • Where a remote computing device is involved, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

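This application relates to the field of network monitoring and discloses an abnormal object identification method, apparatus, computing device, and storage medium. The method includes: obtaining a sample set including a plurality of object data; training a machine learning model with the sample set to obtain an object score correction value prediction model; obtaining object data to be identified; inputting the object data to be identified into the object score correction value prediction model to obtain correction values; obtaining, from a plurality of preset rules, the predetermined rules satisfied by each object data to be identified; for each object data to be identified, determining its score according to the predetermined rules it satisfies, the scores corresponding to those rules, and the correction value; and identifying abnormal objects among the objects corresponding to the object data to be identified according to those scores. This method quantifies the degree of abnormality of an object, improves the accuracy of identifying abnormal objects, and improves the interpretability of the identification results.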

Description

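Abnormal object identification method, apparatus, computing device, and storage medium
This application is based on and claims priority to Chinese patent application No. CN 201910435976.4, filed on May 23, 2019 and entitled "Abnormal object identification method, apparatus, medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of network monitoring technology, and in particular to an abnormal object identification method, apparatus, computing device, and computer non-volatile readable storage medium.
Background
With the development of the mobile Internet, networks have entered the lives of the vast majority of people, and network security has become increasingly important. A network platform generally serves many users, so an anomaly can cause huge losses. For example, behaviors such as access by illegitimate users and intrusion by abnormal traffic are highly harmful and spread quickly, and identifying the objects that produce these abnormal behaviors is very difficult. The prior art mainly uses a series of rules to identify the objects that produce a behavior, and uses those rules only to divide the objects into two types, abnormal and non-abnormal. The inventor of this application realized that the prior art has the following defect: for an object identified as abnormal, the degree of its abnormality cannot be well defined, which results in low accuracy in identifying abnormal objects and low interpretability of the identification results.
Summary
In the field of network monitoring technology, in order to solve the above technical problem, the purpose of this application is to provide an abnormal object identification method, apparatus, computing device, and computer non-volatile readable storage medium.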
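In a first aspect, an abnormal object identification method is provided, including:
obtaining a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
training a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each object data, to obtain an object score correction value prediction model;
obtaining at least one object data to be identified;
inputting the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
for each object data to be identified, obtaining from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
for each object data to be identified, determining the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
identifying abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.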
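In a second aspect, an abnormal object identification apparatus is provided, including:
a first obtaining module configured to obtain a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
a training module configured to train a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value of the object data, to obtain an object score correction value prediction model;
a second obtaining module configured to obtain at least one object data to be identified;
an input module configured to input the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
a third obtaining module configured to, for each object data to be identified, obtain from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
a determining module configured to determine, for each object data to be identified, the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
an identification module configured to identify abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.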
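In a third aspect, a computing device is provided, including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the above abnormal object identification method.
In a fourth aspect, a computer non-volatile readable storage medium storing computer-readable instructions is provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the above abnormal object identification method.
The technical solutions provided by the embodiments of this application may have the following beneficial effects. The above abnormal object identification method, apparatus, computing device, and computer non-volatile readable storage medium first train an object score correction value prediction model with a sample set, then use that model to obtain the correction value of the object data to be identified, and finally obtain the score of the object data to be identified from how its features and feature values satisfy the predetermined rules and identify abnormal objects from the score. As a result, the identification result quantifies the degree of abnormality of an object well, which improves the accuracy of identifying abnormal objects and the interpretability of the identification results.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit this application.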
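Brief Description of the Drawings
Fig. 1 is a schematic diagram of the system architecture of an abnormal object identification method in an abnormal traffic identification scenario, according to an exemplary embodiment;
Fig. 2 is a schematic diagram of the system architecture of an abnormal object identification method in a scenario of identifying gang bonus-abuse ("wool-pulling") behavior within a group, according to an exemplary embodiment;
Fig. 3 is a flowchart of an abnormal object identification method according to an exemplary embodiment;
Fig. 4 is a flowchart of the details of step 370 in an embodiment based on the embodiment corresponding to Fig. 3;
Fig. 5 is a flowchart of a method for determining the scores corresponding to predetermined rules in an embodiment based on the embodiment corresponding to Fig. 3;
Fig. 6 is a block diagram of an abnormal object identification apparatus according to an exemplary embodiment;
Fig. 7 is an example block diagram of a computing device that implements the above abnormal object identification method, according to an exemplary embodiment;
Fig. 8 shows a computer non-volatile readable storage medium that implements the above abnormal object identification method, according to an exemplary embodiment.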
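Detailed Description
Exemplary embodiments will now be described in detail, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
In addition, the drawings are only schematic illustrations of this application and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and their repeated description is therefore omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities.
This application first provides an abnormal object identification method. An object is a computer-related device that has certain associated data, or anything that exists or runs on a computer device or network platform and can be treated as a target, for example a data object, a terminal object, or an account object. An abnormal object is an object that satisfies certain conditions and is regarded as abnormal. Identifying abnormal objects means the process of finding possible abnormal objects. The method provided by this application can be applied in a variety of network security scenarios; for example, it can be used to identify abnormal traffic, and it can also be used to monitor bonus abuse ("wool-pulling", that is, exploiting promotional rewards). The implementation terminal of this application can be any device capable of computing and processing data. The device can be connected to external devices to receive or send information. It can be a portable mobile device, such as a smartphone, tablet computer, notebook computer, or PDA (Personal Digital Assistant); a fixed device, such as a computer device, field terminal, desktop computer, server, or workstation; or a collection of multiple devices, such as the physical infrastructure of cloud computing.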
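Fig. 1 is a schematic diagram of the system architecture of an abnormal object identification method in an abnormal traffic identification scenario, according to an exemplary embodiment. As shown in Fig. 1, the system includes a server 110 and a plurality of user terminals 120, connected through communication links. In the scenario shown in Fig. 1, the architecture between the server 110 and the user terminals 120 may be a C/S (Client/Server) architecture or a B/S (Browser/Server) architecture. Whichever architecture is used, if a large number of user terminals 120 access the server 110 illegitimately, they bring huge traffic to the server 110 and may bring it down, causing losses. It is therefore necessary to identify the sources of this abnormal traffic, that is, to treat the identified sources of abnormal traffic as abnormal objects. In the embodiment shown in Fig. 1, the abnormal object identification method provided by this application may run on the server 110 or on a terminal other than the server 110.
Fig. 2 is a schematic diagram of the system architecture of an abnormal object identification method in a scenario of identifying gang bonus-abuse behavior within a group, according to an exemplary embodiment. As shown in Fig. 2, the system includes a server 210, a base station 220, and a smartphone 230. In Fig. 2, the smartphone 230 can connect to the base station 220 through a cellular network and then communicate with the server 210 via the base station. The smartphone 230 has installed an App (Application) provided by the operator of the server 210. When the user of the smartphone 230 uses the App for the first time, the user must register with the server 210, and the server 210 assigns the user an account number, which can be bound to a payment account. With that account number, the user can continue to use the App and interact further with the server 210; this is how a typical App currently operates. Such Apps generally allow chat groups to be created, and money can be transferred within a group. When the App operator runs a promotion that involves paying out money, such as a red envelope for registering or a rebate for participating in an activity, wrongdoers may exploit the operator, for example by registering a large number of accounts to collect the promotional rewards, and may then move the proceeds by sending red envelopes within a group, causing economic losses to the App operator. It is therefore necessary to identify bonus-abuse behavior so that it can be countered in a targeted way.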
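Fig. 3 is a flowchart of an abnormal object identification method according to an exemplary embodiment. As shown in Fig. 3, the method includes the following steps:
Step 310: obtain a sample set including a plurality of object data.
Each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data.
The correction value is the value used to transform the score of an object data obtained with the scoring rules into the expert score, where the expert score is the score of the object data determined in advance by judging the object data according to expert experience. Object data is data related to an object; it may concern the object's own attributes or be data produced while the object runs. For example, in the abnormal traffic identification scenario, the object data may be data related to the IP address of the traffic source, and the corresponding object may be that IP address. The features included in the object data may be the number of visits from the same IP address, the number of accounts that visit from the same IP address, the number of Wi-Fi names used by terminals that visit from the same IP address, and so on. The feature value corresponding to each feature is the actual value of that feature.
In one embodiment, the abnormal object identification method provided by this application can be applied to the scenario of identifying gang bonus-abuse behavior within a group. The object data may be data related to the bank cards that send red envelopes, and the plurality of features included in each object data may include the following: the ratio of the number of mobile numbers registered to red-envelope-sending bank cards in the group to the number of red-envelope-sending bank cards in the group; the ratio of the sending bank cards' promotional red-envelope income frequency share to the number of sending bank cards, where that share is the ratio of the cards' promotional red-envelope income frequency to the frequency of transactions into and out of the accounts bound to the cards; and the number of mobile numbers registered to red-envelope-receiving bank cards in the group. Each mobile number can be registered as an account number, each registered account number can be bound to one or more bank cards, and each bank card can also be bound to different registered account numbers, so a red-envelope-sending bank card can have multiple registered mobile numbers. Correspondingly, the feature value corresponding to each feature is the actual value of that feature, which is not described again here.
In one embodiment, the plurality of features included in the same object data and the feature value corresponding to each feature are stored in a mapping table: each feature is a key in the mapping table, and the feature value corresponding to the feature is the value.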
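Step 320: train a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each object data, to obtain an object score correction value prediction model.
The machine learning model to be trained can be one of many kinds, for example a logistic regression model or a neural network model. Specifically, training can proceed as follows: the plurality of features of one object data and the feature value corresponding to each feature are input to the machine learning model, and the model outputs a correction value. That output is compared with the correction value labeled for the object data. If the two differ, the coefficients or weights of the machine learning model are adjusted. This is repeated until, for most of the object data in the sample set, the correction value output by the model is the same as or close to the correction value labeled for the object data.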
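The adjust-until-outputs-match loop described for step 320 can be sketched as follows. The source names logistic regression and neural networks as candidate models; this sketch swaps in a plain linear model trained by per-sample gradient descent purely to keep the example short. The feature names `f1` and `f2`, the labeled correction values, the learning rate, and the epoch count are all assumptions, and features are assumed to be scaled to roughly the range 0 to 1.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, float], float]  # (feature map, labeled correction value)
Model = Tuple[Dict[str, float], float]   # (weights per feature, bias)


def train_correction_model(samples: List[Sample], feature_names: List[str],
                           lr: float = 0.1, epochs: int = 2000) -> Model:
    """Fit a linear stand-in for the correction value prediction model."""
    weights = {name: 0.0 for name in feature_names}
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            predicted = bias + sum(weights[n] * features[n] for n in feature_names)
            error = predicted - target
            # Adjust the coefficients whenever the output differs from the label.
            for n in feature_names:
                weights[n] -= lr * error * features[n]
            bias -= lr * error
    return weights, bias


def predict_correction(model: Model, features: Dict[str, float]) -> float:
    weights, bias = model
    return bias + sum(w * features[n] for n, w in weights.items())


# Hypothetical sample set: feature values with expert-labeled correction values.
samples = [
    ({"f1": 0.0, "f2": 0.0}, 0.05),
    ({"f1": 1.0, "f2": 0.0}, 0.15),
    ({"f1": 0.0, "f2": 1.0}, 0.25),
    ({"f1": 1.0, "f2": 1.0}, 0.35),
]
model = train_correction_model(samples, ["f1", "f2"])
```

After training, `predict_correction(model, features)` plays the role of step 340 for new object data. A real deployment would more likely use an established library and hold out validation data, which this sketch omits.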
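Step 330: obtain at least one object data to be identified.
Object data to be identified is data produced by an object to be identified. Like the object data described above, it can include features and a feature value corresponding to each feature.
For example, in the abnormal traffic identification scenario, the object data to be identified may be data related to the IP address of the traffic source; in the scenario of identifying gang bonus-abuse behavior within a group, it may be data related to the bank cards that send and receive red envelopes.
In one embodiment, at every predetermined time interval, the object data to be identified whose data changed during that interval are obtained.
In one embodiment, each time data changes, the object data to be identified whose data changed is obtained.
Step 340: input the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified.
As described above, the object data to be identified can also include features and a feature value corresponding to each feature, so the object score correction value prediction model can output a corresponding correction value for the object data to be identified that is input to it. In addition, because the object score correction value prediction model has been trained, the correction value it outputs for each object data to be identified can be regarded as reliable and accurate to a certain extent.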
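Step 350: for each object data to be identified, obtain from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified.
Each predetermined rule corresponds to a feature and a score.
The predetermined rules are used to screen the object data to be identified for data produced by possibly abnormal objects.
In one embodiment, a feature in the object data to be identified determines the corresponding predetermined rule, and the feature value determines whether the object data to be identified satisfies that rule. That is, the predetermined rules satisfied by the object data to be identified are obtained by checking whether the feature value of each feature conforms to the predetermined rule corresponding to that feature.
For example, in the abnormal traffic identification scenario, if the features included in the object data are the number of visits from the same IP address, the number of accounts that visit from the same IP address, and the number of Wi-Fi names used by terminals that visit from the same IP address, the predetermined rules corresponding to these features may be: the number of visits from the same IP address is greater than 8; the number of accounts that visit from the same IP address is greater than 6; and the number of Wi-Fi names used by terminals that visit from the same IP address is greater than 7.
In the scenario of identifying gang bonus-abuse behavior within a group, if the features included in the object data are the ratio of the number of mobile numbers registered to red-envelope-sending bank cards in the group to the number of red-envelope-sending bank cards in the group, the ratio of the sending bank cards' promotional red-envelope income frequency share to the number of sending bank cards, and the number of mobile numbers registered to red-envelope-receiving bank cards in the group, then the predetermined rules corresponding to these features may be: the first ratio is greater than or equal to 7; the second ratio is greater than or equal to 0.99; and the number of mobile numbers registered to red-envelope-receiving bank cards in the group is greater than or equal to 7.
In one embodiment, the score corresponding to each predetermined rule is stored in a rule-to-score correspondence table built in advance from experience, and the score corresponding to a predetermined rule is obtained by looking it up in that table.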
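The rule matching of step 350 can be illustrated with the abnormal-traffic thresholds quoted above (greater than 8, 6, and 7). This is a sketch under assumptions: the feature names, the `PredeterminedRule` class, and the score values stored with each rule are hypothetical stand-ins for the rule-to-score correspondence table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PredeterminedRule:
    feature: str      # feature the rule applies to (the mapping-table key)
    threshold: float  # limit that the feature value is compared against
    score: float      # score looked up for this rule (hypothetical values)

    def is_satisfied_by(self, object_data: Dict[str, float]) -> bool:
        value = object_data.get(self.feature)
        return value is not None and value > self.threshold


def satisfied_rules(object_data: Dict[str, float],
                    rules: List[PredeterminedRule]) -> List[PredeterminedRule]:
    """Return the predetermined rules that the object data satisfies."""
    return [rule for rule in rules if rule.is_satisfied_by(object_data)]


RULES = [
    PredeterminedRule("visits_from_ip", 8, 0.6),
    PredeterminedRule("accounts_from_ip", 6, 0.8),
    PredeterminedRule("wifi_names_from_ip", 7, 0.5),
]

# Object data stored as a feature -> feature value mapping.
candidate = {"visits_from_ip": 15, "accounts_from_ip": 6, "wifi_names_from_ip": 9}
matched = satisfied_rules(candidate, RULES)
```

Here the account-count rule is not matched because 6 is not strictly greater than 6, which mirrors the "greater than" wording of the traffic example; the group scenario uses "greater than or equal to" instead and would need `>=`.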
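Step 360: for each object data to be identified, determine the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value.
In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a deviation correction value, and step 360 specifically includes the following step:
for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: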
Figure PCTCN2019103604-appb-000001
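where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and μ is the deviation correction value of the object data to be identified.
The deviation correction value is the value needed to correct, by adding an offset, the score obtained with the predetermined rules into the score of the object data to be identified. The characteristic of the above formula is that, through summation, the score of every predetermined rule satisfied by an object data to be identified is reflected in the final score of that object data. The benefit of this embodiment is therefore that summing the scores of the satisfied predetermined rules reflects comprehensively and objectively how the object data to be identified satisfies the predetermined rules, which quantifies the degree of abnormality of the object and improves the accuracy of identifying abnormal objects.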
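The formula for this embodiment survives only as an image reference. One reading that is consistent with the symbol definitions and with the description of the deviation correction value as an added offset is the following; it is an assumption, not a transcription of the original figure.

```latex
y = \sum_{i=1}^{n} x_i + \mu
```

As a worked example with hypothetical values, an object data that satisfies two rules with scores 0.6 and 0.5 and receives a deviation correction value of -0.1 would obtain y = 0.6 + 0.5 - 0.1 = 1.0 under this reading.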
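In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a growth coefficient, and determining, for each object data to be identified, the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value includes:
for each object data to be identified, obtaining the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: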
Figure PCTCN2019103604-appb-000002
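where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and k is the growth coefficient.
The growth coefficient is the proportion by which the score of the object data to be identified obtained with the predetermined rules must be increased to become the actual score to be obtained. The benefit of this embodiment is that the predetermined rules quantify the degree of abnormality of the object corresponding to the object data to be identified, while the growth coefficient further corrects the score obtained with the predetermined rules, so that the resulting score is more objective and abnormal objects are identified more accurately.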
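The formula for the growth-coefficient embodiment is likewise only an image reference. Because the growth coefficient is described as the proportion by which the rule-based score must be increased, one plausible reading (an assumption, not confirmed by the text) is:

```latex
y = (1 + k) \sum_{i=1}^{n} x_i
```

With hypothetical rule scores summing to 1.1 and k = 0.2, this reading gives y = 1.2 × 1.1 = 1.32. If the original figure instead multiplies the sum by k directly, the same inputs would give 0.22, so the interpretation of k should be checked against the published figure.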
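In one embodiment, each predetermined rule includes a reference value, and step 360 specifically includes the following step:
for each object data to be identified, determine the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: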
Figure PCTCN2019103604-appb-000003
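where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the score of the object data to be identified, and μ is the correction value of the object data to be identified.
In the above formula, the final score of the object data to be identified depends not only on the predetermined rules it satisfies and on the correction value, but also on how far the feature value of the feature corresponding to each predetermined rule departs from the rule's reference value. The larger that departure, the more the score of the corresponding predetermined rule contributes to the score of the object data to be identified, that is, the higher the final score.
In this embodiment, the score of the object data to be identified reflects not only the scores of the predetermined rules that the object satisfies, but also the degree to which the object actually satisfies those rules. This quantifies the score of the object data to be identified more finely, makes the resulting score more credible, and improves the accuracy of identifying abnormal objects.
In one embodiment, the reference value included in a predetermined rule is the limit that the rule uses to judge the feature value of the feature. For example, in the abnormal traffic identification scenario, if the predetermined rule corresponding to a feature is that the number of accounts that visit from the same IP address is greater than 6, then the reference value included in that predetermined rule is 6.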
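The reference-value embodiment can be sketched as below. The definition of ρ follows the text: (feature value minus reference value) divided by the reference value. How ρ enters the sum is not recoverable from the extracted text, so the weighting `x_i * (1 + rho_i)` and the added correction value are assumptions chosen only to match the stated behavior that a larger departure from the reference value yields a higher score.

```python
from typing import List, Tuple


def relative_excess(feature_value: float, reference_value: float) -> float:
    """rho: (feature value - reference value) / reference value."""
    if reference_value == 0:
        raise ValueError("reference value must be non-zero")
    return (feature_value - reference_value) / reference_value


def weighted_score(matches: List[Tuple[float, float, float]],
                   correction: float) -> float:
    """Score from (rule score x_i, feature value, reference value) triples.

    Assumed combination: y = sum(x_i * (1 + rho_i)) + correction.
    """
    total = sum(x * (1.0 + relative_excess(value, reference))
                for x, value, reference in matches)
    return total + correction


# Hypothetical satisfied rules: accounts per IP is 9 against a reference of 6,
# and visits per IP is 16 against a reference of 8.
matches = [(0.8, 9, 6), (0.6, 16, 8)]
y = weighted_score(matches, correction=-0.4)
```

In this example ρ is 0.5 for the first rule and 1.0 for the second, so each rule contributes 1.2 and the corrected total is about 2.0.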
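Step 370: identify abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.
In one embodiment, identifying abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified includes:
obtaining, as abnormal objects, the objects corresponding to object data to be identified whose score is greater than a predetermined score threshold.
In summary, the abnormal object identification method provided by the embodiment of Fig. 3 first trains a machine learning model, then uses the trained model to obtain correction values, then obtains the score of each object data to be identified from the predetermined rules it satisfies and its correction value, and finally identifies abnormal objects from those scores. As a result, the identification result quantifies the degree of abnormality of an abnormal object well, which improves the accuracy of identifying abnormal objects and the interpretability of the identification results.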
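Fig. 4 is a flowchart of the details of step 370 in an embodiment based on the embodiment corresponding to Fig. 3. In the embodiment of Fig. 4, the object data to be identified further includes at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute. As shown in Fig. 4, the step includes the following:
Step 371: obtain the object data to be identified whose score is greater than a predetermined score threshold as candidate abnormal object data.
For example, if the predetermined score threshold is 0.8 and the score of an object data to be identified is 0.83, then because 0.83 is greater than 0.8, that object data is taken as candidate abnormal object data.
Step 372: filter out, from the candidate abnormal object data, the candidate abnormal object data whose exemption attribute value is smaller than the preset exemption attribute value threshold corresponding to that exemption attribute, and take the objects corresponding to the remaining candidate abnormal object data as abnormal objects.
An exemption attribute is an attribute that can greatly reduce the likelihood that the object corresponding to an object data is abnormal, and the exemption attribute value is the actual value of the exemption attribute. For example, in the scenario of identifying gang bonus-abuse behavior within a group, the exemption attribute can be the bank card's historical red-envelope winnings. If those winnings are below the corresponding threshold, that is, the amount the bank card has won from red envelopes is small enough, then the object corresponding to the candidate abnormal object data is unlikely to be involved in bonus abuse, and the candidate abnormal object data can be filtered out.
The benefit of this embodiment is that objects whose object data satisfy the predetermined rules but which are unlikely to be abnormal have a way to avoid being identified as abnormal, which further improves the accuracy of identifying abnormal objects.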
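Steps 371 and 372 amount to a threshold filter followed by an exemption filter. The sketch below uses the 0.8 score threshold from the example above; the record layout, the attribute name `historical_winnings`, its threshold of 100, and the choice to exempt a candidate when any one exemption attribute falls below its threshold are assumptions made for illustration.

```python
from typing import Dict, List


def identify_abnormal(candidates: List[dict], score_threshold: float,
                      exemption_thresholds: Dict[str, float]) -> List[str]:
    """Return the ids of objects identified as abnormal."""
    abnormal = []
    for item in candidates:
        # Step 371: keep only object data scoring above the threshold.
        if item["score"] <= score_threshold:
            continue
        # Step 372: drop candidates whose exemption attribute value is below
        # the preset threshold for that attribute. A missing attribute is
        # treated as not exempt.
        exempt = any(
            item["exemptions"].get(attr, float("inf")) < limit
            for attr, limit in exemption_thresholds.items()
        )
        if not exempt:
            abnormal.append(item["object_id"])
    return abnormal


# Hypothetical scored object data for three bank cards.
scored = [
    {"object_id": "card_A", "score": 0.83, "exemptions": {"historical_winnings": 5000}},
    {"object_id": "card_B", "score": 0.91, "exemptions": {"historical_winnings": 20}},
    {"object_id": "card_C", "score": 0.40, "exemptions": {"historical_winnings": 9000}},
]
result = identify_abnormal(scored, 0.8, {"historical_winnings": 100})
```

Here `card_B` scores above the threshold but is filtered out because its winnings are small, and `card_C` never becomes a candidate, leaving only `card_A`.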
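Fig. 5 is a flowchart of a method for determining the scores corresponding to predetermined rules in an embodiment based on the embodiment corresponding to Fig. 3. As shown in Fig. 5, the method includes the following steps:
Step 510: obtain a positive sample containing a plurality of object data.
In one embodiment, the positive sample is a set of object data that satisfy a predetermined condition, where the predetermined condition is used to select, from the object data, those whose corresponding objects are more likely to be abnormal.
For example, in the scenario of identifying gang bonus-abuse behavior within a group, the object data further includes the following features: whether the bank card has historically sent or received red envelopes in the group, the historical amount of red envelopes won by the bank card, and the frequency of transactions into and out of the account bound to the bank card. The positive sample can then be the data of bank cards that have historically sent or received red envelopes in the group, whose historical winnings rank in the top 20% in descending order, and whose bound-account transaction frequency ranks in the top 20% in descending order.
Step 520: determine the number of object data in the positive sample as the first number.
In one embodiment, a counter is embedded in the implementation terminal of this application, and the number of object data in the positive sample can be obtained with that counter.
Step 530: for each predetermined rule in the plurality of predetermined rules, determine the number of object data in the positive sample that satisfy the predetermined rule as the second number.
In one embodiment, the implementation terminal of this application is provided with a counter. For each predetermined rule in the plurality of predetermined rules, the counter is increased by 1 whenever an object data in the positive sample is found to satisfy the predetermined rule. Once all object data in the positive sample have been checked against the predetermined rule, the value held by the counter is the second number obtained for that predetermined rule.
Step 540: for each predetermined rule in the plurality of predetermined rules, determine the ratio of the second number corresponding to the predetermined rule to the first number.
Step 550: for each predetermined rule in the plurality of predetermined rules, use the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.
The benefit of this embodiment is that object data whose corresponding objects are more likely to be abnormal are used as the positive sample, and the score corresponding to each predetermined rule is then determined entirely from the proportion of the positive sample that satisfies that rule, which improves the credibility and accuracy of the score obtained for each predetermined rule.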
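This application also provides an abnormal object identification apparatus. The following are apparatus embodiments of this application.
Fig. 6 is a block diagram of an abnormal object identification apparatus according to an exemplary embodiment. As shown in Fig. 6, the apparatus 600 includes:
a first obtaining module 610 configured to obtain a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
a training module 620 configured to train a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value of the object data, to obtain an object score correction value prediction model;
a second obtaining module 630 configured to obtain at least one object data to be identified;
an input module 640 configured to input the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
a third obtaining module 650 configured to, for each object data to be identified, obtain from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
a determining module 660 configured to determine, for each object data to be identified, the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
an identification module 670 configured to identify abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.
In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a growth coefficient, and the determining module is further configured to:
for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: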
Figure PCTCN2019103604-appb-000004
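where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and k is the growth coefficient.
In one embodiment, the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a deviation correction value, and the determining module is further configured to:
for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: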
Figure PCTCN2019103604-appb-000005
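where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and μ is the deviation correction value of the object data to be identified.
In one embodiment, the identification module is further configured to:
obtain, as abnormal objects, the objects corresponding to object data to be identified whose score is greater than a predetermined score threshold.
In one embodiment, the object data to be identified further includes at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and identifying abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified includes:
obtaining object data to be identified whose score is greater than the predetermined score threshold as candidate abnormal object data;
filtering out, from the candidate abnormal object data, the candidate abnormal object data whose exemption attribute value is smaller than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.
In one embodiment, the score corresponding to each predetermined rule is determined in the following manner:
obtaining a positive sample containing a plurality of object data;
determining the number of object data in the positive sample as the first number;
for each predetermined rule in the plurality of predetermined rules, determining the number of object data in the positive sample that satisfy the predetermined rule as the second number;
for each predetermined rule in the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
for each predetermined rule in the plurality of predetermined rules, using the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.
In one embodiment, each predetermined rule includes a reference value, and the determining module is further configured to:
for each object data to be identified, determine the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: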
Figure PCTCN2019103604-appb-000006
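where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the score of the object data to be identified, and μ is the correction value of the object data to be identified.
According to the third aspect of this application, a computing device is also provided that performs all or some of the steps of any of the abnormal object identification methods described above. The computing device includes:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the abnormal object identification method shown in any one of the above exemplary embodiments.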
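Those skilled in the art will understand that various aspects of this application can be implemented as a system, a method, or a program product. Therefore, various aspects of this application can take the form of a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to here as a "circuit", "module", or "system".
The computing device 700 according to this embodiment of the application is described below with reference to Fig. 7. The computing device 700 shown in Fig. 7 is only an example and should not limit the function or scope of use of the embodiments of this application.
As shown in Fig. 7, the computing device 700 takes the form of a general-purpose computing device. The components of the computing device 700 may include, but are not limited to: the at least one processing unit 710, the at least one storage unit 720, and a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710).
The storage unit stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the steps according to the various exemplary embodiments of this application described in the "Embodiment Method" section of this specification.
The storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 721 and/or a cache storage unit 722, and may further include a read-only storage unit (ROM) 723.
The storage unit 720 may also include a program/utility tool 724 having a set of (at least one) program modules 725. Such program modules 725 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The computing device 700 may also communicate with one or more external devices 900 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the computing device 700, and/or with any device (such as a router, modem, etc.) that enables the computing device 700 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 750. The computing device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 760. As shown in the figure, the network adapter 760 communicates with other modules of the computing device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computing device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.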
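From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to execute the method according to the embodiments of this application.
According to the fourth aspect of this application, a computer non-volatile readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible implementations, various aspects of this application can also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of this application described in the "Exemplary Method" section of this specification.
Referring to Fig. 8, a computer non-volatile readable storage medium 800 for implementing the above method according to an embodiment of this application is described. It may take the form of a portable compact disk read-only memory (CD-ROM), includes program code, and can run on a terminal device such as a personal computer. However, the program product of this application is not limited to this. In this document, the computer non-volatile readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
The program product can use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the readable medium can be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the methods according to the exemplary embodiments of this application and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed synchronously or asynchronously, for example in multiple modules.
It should be understood that this application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.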

Claims (28)

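  1. An abnormal object identification method, characterized in that the method includes:
    obtaining a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
    training a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each object data, to obtain an object score correction value prediction model;
    obtaining at least one object data to be identified;
    inputting the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
    for each object data to be identified, obtaining from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
    for each object data to be identified, determining the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
    identifying abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.
  2. The method according to claim 1, characterized in that the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a growth coefficient, and the determining, for each object data to be identified, of the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value includes:
    for each object data to be identified, obtaining the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: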
    Figure PCTCN2019103604-appb-100001
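    where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and k is the growth coefficient.
  3. The method according to claim 1, characterized in that the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a deviation correction value, and the determining, for each object data to be identified, of the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value includes:
    for each object data to be identified, obtaining the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: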
    Figure PCTCN2019103604-appb-100002
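    where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and μ is the deviation correction value of the object data to be identified.
  4. The method according to claim 1, characterized in that the identifying of abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified includes:
    obtaining, as abnormal objects, the objects corresponding to object data to be identified whose score is greater than a predetermined score threshold.
  5. The method according to claim 1, characterized in that the object data to be identified further includes at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and the identifying of abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified includes:
    obtaining object data to be identified whose score is greater than a predetermined score threshold as candidate abnormal object data;
    filtering out, from the candidate abnormal object data, the candidate abnormal object data whose exemption attribute value is smaller than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.
  6. The method according to claim 1, characterized in that the score corresponding to each predetermined rule is determined in the following manner:
    obtaining a positive sample containing a plurality of object data;
    determining the number of object data in the positive sample as the first number;
    for each predetermined rule in the plurality of predetermined rules, determining the number of object data in the positive sample that satisfy the predetermined rule as the second number;
    for each predetermined rule in the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
    for each predetermined rule in the plurality of predetermined rules, using the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.
  7. The method according to claim 1, characterized in that each predetermined rule includes a reference value, and the determining, for each object data to be identified, of the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value includes:
    for each object data to be identified, determining the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: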
    Figure PCTCN2019103604-appb-100003
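    where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the score of the object data to be identified, and μ is the correction value of the object data to be identified.
  8. An abnormal object identification apparatus, characterized in that the apparatus includes:
    a first obtaining module configured to obtain a sample set including a plurality of object data, wherein each object data corresponds to an object, each object data includes a plurality of features and a feature value corresponding to each feature, and the sample set further includes a correction value labeled in advance for each object data;
    a training module configured to train a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value of the object data, to obtain an object score correction value prediction model;
    a second obtaining module configured to obtain at least one object data to be identified;
    an input module configured to input the object data to be identified into the object score correction value prediction model to obtain the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified;
    a third obtaining module configured to, for each object data to be identified, obtain from a plurality of preset rules the predetermined rules that the object data to be identified satisfies, according to the features and feature values in the object data to be identified, wherein each predetermined rule corresponds to a feature and a score;
    a determining module configured to determine, for each object data to be identified, the score of the object data to be identified according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value;
    an identification module configured to identify abnormal objects among the objects corresponding to the object data to be identified according to the scores of the object data to be identified.
  9. The apparatus according to claim 8, characterized in that the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a growth coefficient, and the determining module is further configured to:
    for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: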
    Figure PCTCN2019103604-appb-100004
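    where n is the number of predetermined rules that the object data to be identified satisfies, i is the index of a predetermined rule that the object data to be identified satisfies, x_i is the score corresponding to the i-th predetermined rule that the object data to be identified satisfies, y is the score of the object data to be identified, and k is the growth coefficient.
  10. The apparatus according to claim 8, characterized in that the correction value, output by the object score correction value prediction model, corresponding to each object data to be identified is a deviation correction value, and the determining module is further configured to:
    for each object data to be identified, obtain the score of the object data to be identified using the following formula, according to the predetermined rules that it satisfies, the score corresponding to each predetermined rule that it satisfies, and the correction value: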
    Figure PCTCN2019103604-appb-100005
    其中,n为待识别对象数据满足的预定规则的数目,i为待识别对象数据满足的预定规则的序号,x i为待识别对象数据满足的第i个预定规则对应的分数,y为待识别对象数据的评分,μ为待识别对象数据的偏差修正值。
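Claims 9 and 10 describe two kinds of correction value: a growth coefficient k and a deviation correction value μ. The formulas themselves appear only as figures and are not reproduced in this text. The sketch below therefore assumes the most natural readings of the variable definitions, namely y = k·Σ x_i for the growth coefficient and y = Σ x_i + μ for the deviation correction value; both forms and both function names are assumptions, not the claimed formulas.

```python
from typing import Iterable


def rating_with_growth(rule_scores: Iterable[float], k: float) -> float:
    """ASSUMED form for claim 9: y = k * sum(x_i), with k the growth coefficient."""
    return k * sum(rule_scores)


def rating_with_deviation(rule_scores: Iterable[float], mu: float) -> float:
    """ASSUMED form for claim 10: y = sum(x_i) + mu, with mu the deviation correction value."""
    return sum(rule_scores) + mu
```

Under these assumptions, a multiplicative correction scales with the number and weight of satisfied rules, whereas an additive correction shifts every rating by the same predicted amount regardless of how many rules are satisfied.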
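  11. The apparatus according to claim 8, wherein the identification module is further configured to:
    take, as abnormal objects, the objects corresponding to the object data to be identified whose rating is greater than a predetermined rating threshold.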
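  12. The apparatus according to claim 8, wherein the object data to be identified further comprises at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified, comprises:
    taking the object data to be identified whose rating is greater than a predetermined rating threshold as candidate abnormal object data;
    filtering out, from the candidate abnormal object data, the candidate abnormal object data for which the exemption attribute value of an exemption attribute is less than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.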
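  13. The apparatus according to claim 8, wherein the score corresponding to each predetermined rule is determined as follows:
    obtaining a positive sample containing a plurality of pieces of object data;
    determining the number of pieces of object data in the positive sample as a first number;
    for each of the plurality of predetermined rules, determining the number of pieces of object data in the positive sample that satisfy the predetermined rule as a second number;
    for each of the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
    for each of the plurality of predetermined rules, taking the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.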
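  14. The apparatus according to claim 8, wherein each predetermined rule comprises a reference value, and the determining module is further configured to:
    for each piece of object data to be identified, determine the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100006
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the rating of the object data to be identified, and μ is the correction value of the object data to be identified.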
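  15. A computing device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform:
    obtaining a sample set comprising a plurality of pieces of object data, wherein each piece of object data corresponds to an object, each piece of object data comprises a plurality of features and a feature value corresponding to each feature, and the sample set further comprises a correction value labeled in advance for each piece of object data;
    training a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each piece of object data, to obtain an object score correction value prediction model;
    obtaining at least one piece of object data to be identified;
    inputting the object data to be identified into the object score correction value prediction model, to obtain the correction value output by the object score correction value prediction model for each piece of object data to be identified;
    for each piece of object data to be identified, obtaining, from among a plurality of predetermined rules, the predetermined rules satisfied by that object data according to the features and feature values in that object data, wherein each predetermined rule corresponds to a feature and a score;
    for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value;
    identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified.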
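  16. The computing device according to claim 15, wherein the correction value output by the object score correction value prediction model for each piece of object data to be identified is a growth coefficient, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, obtaining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100007
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, y is the rating of the object data to be identified, and k is the growth coefficient.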
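  17. The computing device according to claim 15, wherein the correction value output by the object score correction value prediction model for each piece of object data to be identified is a deviation correction value, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, obtaining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100008
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, y is the rating of the object data to be identified, and μ is the deviation correction value of the object data to be identified.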
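  18. The computing device according to claim 15, wherein identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified, comprises:
    taking, as abnormal objects, the objects corresponding to the object data to be identified whose rating is greater than a predetermined rating threshold.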
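  19. The computing device according to claim 15, wherein the object data to be identified further comprises at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified, comprises:
    taking the object data to be identified whose rating is greater than a predetermined rating threshold as candidate abnormal object data;
    filtering out, from the candidate abnormal object data, the candidate abnormal object data for which the exemption attribute value of an exemption attribute is less than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.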
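  20. The computing device according to claim 15, wherein the score corresponding to each predetermined rule is determined as follows:
    obtaining a positive sample containing a plurality of pieces of object data;
    determining the number of pieces of object data in the positive sample as a first number;
    for each of the plurality of predetermined rules, determining the number of pieces of object data in the positive sample that satisfy the predetermined rule as a second number;
    for each of the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
    for each of the plurality of predetermined rules, taking the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.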
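  21. The computing device according to claim 15, wherein each predetermined rule comprises a reference value, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, determining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100009
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the rating of the object data to be identified, and μ is the correction value of the object data to be identified.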
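  22. A computer non-volatile readable storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform:
    obtaining a sample set comprising a plurality of pieces of object data, wherein each piece of object data corresponds to an object, each piece of object data comprises a plurality of features and a feature value corresponding to each feature, and the sample set further comprises a correction value labeled in advance for each piece of object data;
    training a machine learning model using the plurality of features of the object data in the sample set, the feature value corresponding to each feature, and the correction value corresponding to each piece of object data, to obtain an object score correction value prediction model;
    obtaining at least one piece of object data to be identified;
    inputting the object data to be identified into the object score correction value prediction model, to obtain the correction value output by the object score correction value prediction model for each piece of object data to be identified;
    for each piece of object data to be identified, obtaining, from among a plurality of predetermined rules, the predetermined rules satisfied by that object data according to the features and feature values in that object data, wherein each predetermined rule corresponds to a feature and a score;
    for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value;
    identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified.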
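  23. The computer non-volatile readable storage medium according to claim 22, wherein the correction value output by the object score correction value prediction model for each piece of object data to be identified is a growth coefficient, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, obtaining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100010
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, y is the rating of the object data to be identified, and k is the growth coefficient.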
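  24. The computer non-volatile readable storage medium according to claim 22, wherein the correction value output by the object score correction value prediction model for each piece of object data to be identified is a deviation correction value, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, obtaining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100011
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, y is the rating of the object data to be identified, and μ is the deviation correction value of the object data to be identified.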
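  25. The computer non-volatile readable storage medium according to claim 22, wherein identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified, comprises:
    taking, as abnormal objects, the objects corresponding to the object data to be identified whose rating is greater than a predetermined rating threshold.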
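  26. The computer non-volatile readable storage medium according to claim 22, wherein the object data to be identified further comprises at least one exemption attribute and an exemption attribute value corresponding to each exemption attribute, and identifying abnormal objects among the objects corresponding to the respective object data to be identified, according to the ratings of the object data to be identified, comprises:
    taking the object data to be identified whose rating is greater than a predetermined rating threshold as candidate abnormal object data;
    filtering out, from the candidate abnormal object data, the candidate abnormal object data for which the exemption attribute value of an exemption attribute is less than the preset exemption attribute value threshold corresponding to that exemption attribute, and taking the objects corresponding to the remaining candidate abnormal object data as abnormal objects.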
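  27. The computer non-volatile readable storage medium according to claim 22, wherein the score corresponding to each predetermined rule is determined as follows:
    obtaining a positive sample containing a plurality of pieces of object data;
    determining the number of pieces of object data in the positive sample as a first number;
    for each of the plurality of predetermined rules, determining the number of pieces of object data in the positive sample that satisfy the predetermined rule as a second number;
    for each of the plurality of predetermined rules, determining the ratio of the second number corresponding to the predetermined rule to the first number;
    for each of the plurality of predetermined rules, taking the ratio determined for the predetermined rule as the score corresponding to the predetermined rule.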
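  28. The computer non-volatile readable storage medium according to claim 22, wherein each predetermined rule comprises a reference value, and the step of, for each piece of object data to be identified, determining the rating of that object data according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value comprises:
    for each piece of object data to be identified, determining the rating of that object data using the following formula, according to the predetermined rules it satisfies, the score corresponding to each predetermined rule it satisfies, and the correction value:
    Figure PCTCN2019103604-appb-100012
    where n is the number of predetermined rules satisfied by the object data to be identified, i is the index of a predetermined rule satisfied by the object data to be identified, x_i is the score corresponding to the i-th predetermined rule satisfied by the object data to be identified, ρ is the ratio of the difference between the feature value of the feature corresponding to the predetermined rule and the reference value of the predetermined rule to the reference value of the predetermined rule, y is the rating of the object data to be identified, and μ is the correction value of the object data to be identified.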
PCT/CN2019/103604 2019-05-23 2019-08-30 Abnormal object identification method, apparatus, computing device, and storage medium WO2020232902A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910435976.4A CN110348471B (zh) 2019-05-23 2019-05-23 Abnormal object identification method, apparatus, medium, and electronic device
CN201910435976.4 2019-05-23

Publications (1)

Publication Number Publication Date
WO2020232902A1 (zh) 2020-11-26

Family

ID=68173956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103604 WO2020232902A1 (zh) 2019-05-23 2019-08-30 Abnormal object identification method, apparatus, computing device, and storage medium

Country Status (2)

Country Link
CN (1) CN110348471B (zh)
WO (1) WO2020232902A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666123A (zh) * 2022-03-21 2022-06-24 阿里云计算有限公司 Abnormal object identification method and apparatus
CN114866486A (zh) * 2022-03-18 2022-08-05 广州大学 Packet-based encrypted traffic classification system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
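CN111985703B (zh) * 2020-08-12 2022-07-29 支付宝(杭州)信息技术有限公司 User identity status prediction method, apparatus, and device
CN114419528B (zh) * 2022-04-01 2022-07-08 浙江口碑网络技术有限公司 Anomaly identification method, apparatus, computer device, and computer-readable storage medium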

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015058616A1 (zh) * 2013-10-23 2015-04-30 腾讯科技(深圳)有限公司 Method and apparatus for identifying malicious websites
US20150205862A1 (en) * 2011-03-18 2015-07-23 Jean-Charles Campagne Method and device for recognizing and labeling peaks, increases, or abnormal or exceptional variations in the throughput of a stream of digital documents
CN107153971A (zh) * 2017-05-05 2017-09-12 北京京东尚科信息技术有限公司 Method and apparatus for identifying device cheating in app promotion
US20190044967A1 (en) * 2018-09-12 2019-02-07 Intel Corporation Identification of a malicious string
CN109509048A (zh) * 2017-09-15 2019-03-22 北京京东尚科信息技术有限公司 Malicious order identification method, apparatus, electronic device, and storage medium
CN109685536A (zh) * 2017-10-18 2019-04-26 北京京东尚科信息技术有限公司 Method and apparatus for outputting information
CN109740352A (zh) * 2018-12-28 2019-05-10 微梦创科网络科技(中国)有限公司 Account processing method, apparatus, and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
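CN109636430A (zh) * 2017-10-09 2019-04-16 北京京东尚科信息技术有限公司 Object identification method and system thereof
CN109639633B (zh) * 2018-11-02 2021-11-12 平安科技(深圳)有限公司 Abnormal traffic data identification method, apparatus, medium, and electronic device
CN109522304B (zh) * 2018-11-23 2021-05-18 中国联合网络通信集团有限公司 Abnormal object identification method and apparatus, and storage medium
CN109787960B (zh) * 2018-12-19 2022-09-02 中国平安人寿保险股份有限公司 Abnormal traffic data identification method, apparatus, medium, and electronic device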

Also Published As

Publication number Publication date
CN110348471A (zh) 2019-10-18
CN110348471B (zh) 2023-09-01

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19929736; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19929736; Country of ref document: EP; Kind code of ref document: A1)