CN107341716B

CN107341716B - Malicious order identification method and device and electronic equipment

Info

Publication number: CN107341716B
Application number: CN201710560874.6A
Authority: CN
Inventors: 钱春江; 余文喆; 杜红光
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2017-07-11
Filing date: 2017-07-11
Publication date: 2020-12-25
Anticipated expiration: 2037-07-11
Also published as: CN107341716A

Abstract

The embodiment of the invention provides a method and a device for identifying malicious orders and electronic equipment, wherein the method comprises the following steps: acquiring data of order behaviors to be identified; analyzing the data of the order behaviors to be identified by using an analysis model to obtain malicious scores of the order behaviors to be identified, wherein the analysis model is obtained by performing model training according to preset data of the order behaviors; and judging whether the order behavior to be identified belongs to malicious order behavior or not according to the malicious scores. By using the analysis model to analyze the data of the order behavior to be identified, the success rate of malicious order identification can be improved and the scope of malicious order identification can be enlarged.

Description

Malicious order identification method and device and electronic equipment

Technical Field

The present invention relates to the field of network technologies, and in particular, to a method and an apparatus for identifying malicious orders, and an electronic device.

Background

With the rise of internet e-commerce, the security of online shopping has received increasing attention. Many malicious users exploit loopholes or price differences on e-commerce platforms to place fraudulent orders in bulk and to snatch orders, which disadvantages, and may even cause losses to, both the e-commerce operators and the large body of consumers with normal demand.

However, the inventor finds that the prior art has at least the following problems in the process of implementing the invention:

Existing e-commerce platforms adopt targeted identification at each individual link, for example a method dedicated to identifying whether access is too frequent, or a method dedicated to identifying whether consignee addresses are similar. These identification methods are independent of one another, and each judges whether a user's order behavior is malicious on the basis of a limited function, so malicious users can easily bypass them and carry out malicious order behaviors without being discovered. It can be seen that, as malicious order makers improve their anti-monitoring strategies, the existing malicious order identification technology has a low identification success rate and a narrow identification range.

Disclosure of Invention

The embodiment of the invention aims to provide a method and a device for identifying malicious orders and electronic equipment, so as to improve the success rate and range of identifying malicious orders. The specific technical scheme is as follows:

a method of malicious order identification, the method comprising:

acquiring data of order behaviors to be identified;

analyzing the data of the order behaviors to be identified by using an analysis model to obtain malicious scores of the order behaviors to be identified, wherein the analysis model is obtained by performing model training according to preset data of the order behaviors;

and judging whether the order behavior to be identified belongs to malicious order behavior or not according to the malicious scores.

Optionally, before analyzing the data of the order behavior to be identified by using the analysis model to obtain the malicious score of the order behavior to be identified, the method further includes:

judging whether the user type to which the data of the order behaviors to be identified belongs is a new user or not, wherein the user type comprises a new user or an old user, the new user is a user of which the historical order behavior number is smaller than a preset first threshold value, and the old user is a user of which the historical order behavior number is larger than or equal to the first threshold value;

the analyzing the data of the order behavior to be identified by using the analysis model to obtain the malicious score of the order behavior to be identified comprises the following steps:

when the user type of the data of the order behaviors to be identified is judged to be a new user, calculating to obtain a first similarity between the order behaviors to be identified and malicious order behaviors marked by a first analysis submodel, and taking the first similarity as the malicious score of the order behaviors to be identified, wherein the first analysis submodel is one of the analysis models and is an analysis submodel, obtained by performing K-means cluster analysis on the data of the historical order behaviors of sample users, of classes formed by normal order behaviors of different levels and classes formed by malicious order behaviors of different levels;

or when the user type to which the data of the order behaviors to be identified belongs is judged to be an old user, calculating to obtain a second similarity between the order behaviors to be identified and the malicious order behaviors marked by the first analysis sub-model;

inputting the data of the order behaviors to be identified into a second analysis submodel corresponding to the user to which the order behaviors to be identified belong, calculating to obtain a third similarity between the order behaviors to be identified and the historical order behaviors of the user to which the order behaviors to be identified belong, performing score aggregation on the second similarity and the third similarity, and taking the result of the score aggregation as the malicious score of the order behaviors to be identified, wherein the second analysis submodel is one of the analysis models and is an analysis submodel corresponding to each sample user, which is obtained by performing logistic regression training by using the data of the individual historical order behaviors of the sample user for each sample user.

Optionally, the determining, according to the malicious score, whether the order behavior to be identified belongs to a malicious order behavior includes:

when the user type to which the data of the order behaviors to be identified belongs is a new user and the malicious score is greater than a preset second threshold value, determining that the order behaviors to be identified belong to malicious order behaviors;

or when the user type to which the data of the order behavior to be identified belongs is a new user and the malicious score is smaller than or equal to the second threshold value, determining that the order behavior to be identified does not belong to a malicious order behavior;

or when the user type to which the data of the order behavior to be identified belongs is an old user and the malicious score is greater than a preset third threshold value, determining that the order behavior to be identified belongs to a malicious order behavior;

or when the user type to which the data of the order behavior to be identified belongs is an old user and the malicious score is smaller than or equal to the third threshold value, determining that the order behavior to be identified does not belong to a malicious order behavior.

Optionally, the order behavior includes one or more of the following:

the IP address of the order access, the geographic location of the IP address, the equipment used by the order request, the goods type of the order, the quantity of each order, the order time, the payment method, the third-level address of the consignee, the name of the consignee and the telephone of the consignee.

An apparatus for malicious order identification, the apparatus comprising:

the data acquisition module is used for acquiring data of order behaviors to be identified;

the score obtaining module is used for analyzing the data of the order behaviors to be identified by utilizing an analysis model to obtain the malicious scores of the order behaviors to be identified, wherein the analysis model is obtained by performing model training according to the preset data of the order behaviors;

and the behavior judgment module is used for judging whether the order behavior to be identified belongs to malicious order behavior according to the malicious scores.

Optionally, the apparatus further includes a type determining module, and the score obtaining module includes: a first score obtaining sub-module and a second score obtaining sub-module;

the type judging module is used for judging whether the user type to which the data of the order behaviors to be identified belongs is a new user or not, wherein the user type comprises a new user or an old user, the new user is a user of which the historical order behavior number is smaller than a preset first threshold value, and the old user is a user of which the historical order behavior number is larger than or equal to the first threshold value; if the user type to which the data of the order behavior to be identified belongs is a new user, triggering the first grading obtaining sub-module, and if the user type to which the data of the order behavior to be identified belongs is an old user, triggering the second grading obtaining sub-module;

the first score obtaining sub-module is used for calculating and obtaining a first similarity between the order behaviors to be identified and malicious order behaviors marked by a first analysis sub-model, and taking the first similarity as the malicious score of the order behaviors to be identified, wherein the first analysis sub-model is one of the analysis models and is an analysis sub-model, obtained by performing K-means cluster analysis on the data of the historical order behaviors of sample users, of classes formed by normal order behaviors of different levels and classes formed by malicious order behaviors of different levels;

the second grading obtaining sub-module is used for calculating and obtaining a second similarity between the order behaviors to be identified and the malicious order behaviors marked by the first analysis sub-model;

Optionally, the behavior determining module includes: the device comprises a first grading judgment sub-module, a first behavior determination sub-module, a second behavior determination sub-module and a second grading judgment sub-module;

the first scoring judgment sub-module is configured to judge whether the malicious score is greater than a preset second threshold value or not, trigger the first behavior determination sub-module if the malicious score is greater than the second threshold value, and trigger the second behavior determination sub-module if the malicious score is less than or equal to the second threshold value;

the first behavior determining submodule is used for determining that the order behavior to be identified belongs to malicious order behavior;

the second behavior determining submodule is used for determining that the order behavior to be identified does not belong to malicious order behavior;

the second scoring judgment sub-module is configured to judge whether the malicious score is greater than a preset third threshold value or not, trigger the first behavior determination sub-module if the malicious score is greater than the third threshold value, and trigger the second behavior determination sub-module if the malicious score is less than or equal to the third threshold value.

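Optionally, the order behavior includes one or more of the following:

the IP address of the order access, the geographic location of the IP address, the equipment used by the order request, the goods type of the order, the quantity of each order, the order time, the payment method, the third-level address of the consignee, the name of the consignee and the telephone of the consignee.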

In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing any malicious order identification method when executing the program stored in the memory.

In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform any one of the above described methods for malicious order identification.

In yet another aspect of the present invention, the present invention also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any of the above described methods for malicious order identification.

In the scheme provided by the embodiment of the invention, the received data of the order behaviors to be identified can be analyzed by utilizing an analysis model to obtain the malicious scores of the order behaviors to be identified, wherein the analysis model is obtained by performing model training according to the preset data of the order behaviors, and whether the order behaviors belong to the malicious order behaviors or not is judged according to the malicious scores. Therefore, when the embodiment of the invention is applied, the analysis model is obtained based on the data training of the preset order behavior, so that the analysis model can be expanded according to the requirement, the analysis model has self-adaptability and a wide analysis range, the success rate of malicious order recognition is improved, and the malicious order recognition range is expanded. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

Fig. 1 is a block diagram of a system for malicious order identification according to an embodiment of the present invention;

Fig. 2 is a first flowchart illustrating a malicious order identification method according to an embodiment of the present invention;

Fig. 3 is a second flowchart illustrating a malicious order identification method according to an embodiment of the present invention;

Fig. 4 is a diagram illustrating a separation result of clustering using K-means according to an embodiment of the present invention;

Fig. 5 is a third flowchart illustrating a malicious order identification method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a malicious order identification apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a malicious order identification apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a malicious order identification apparatus according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In the prior art, malicious orders are identified in a targeted manner in various links, for example, an IP address having access to the orders is identified, and when a rapid increase in the number of orders within a period of time of the same IP address is detected, the orders can be determined as malicious orders or suspicious orders, and further identified. However, when a malicious user utilizes a malicious means to make the access IP addresses of each malicious order different, the above method cannot identify the malicious orders, and thus the existing method has a low success rate of identification.

Based on the above, the inventor considers that the historical order behaviors of the user include the order habits of the user, considers the utilization of statistical learning and machine learning, constructs a multi-dimensional and self-adaptive analysis model to calculate the similarity between the order behaviors to be identified and the historical order behaviors, and determines whether the order behaviors to be identified belong to malicious order behaviors or not through the calculated similarity so as to improve the success rate of malicious order identification.

Based on the above consideration, the invention provides a method for identifying malicious orders, which analyzes the order behaviors to be identified by using an analysis model constructed based on historical order behaviors, obtains the malicious scores of the order behaviors to be identified, and judges whether the order behaviors to be identified belong to the malicious order behaviors or not according to the malicious scores. When the analysis model is constructed, the data of the preset order behaviors are adopted, and new data can be added or unnecessary data can be deleted according to needs, so that the analysis model has multiple dimensions and self-adaptability, the order behaviors are prevented from being identified from a single dimension, the success rate of malicious order identification can be improved, and the malicious order identification range can be enlarged.

Fig. 1 is a block diagram of a system for identifying malicious orders according to an embodiment of the present invention.

After an order request behavior is received, it is judged whether enough data of the historical order behaviors of the user requesting the order is stored. If so, the order behavior is analyzed by using both the personal order behavior model and the group order behavior model; if not, it is analyzed by using the group order behavior model only;

wherein the training and analyzing comprises: personal behavior training and analysis, group behavior training and analysis;

personal behavior training and analysis: performing model training by using data of a user's personal order behavior to obtain a personal order behavior model, and analyzing the user's to-be-identified order behavior by using the personal order behavior model to obtain the similarity between the user's to-be-identified order behavior and the user's historical order behavior;

group behavior training and analysis: performing model training by using data of all personal order behaviors of a sample user to obtain group order behavior models with different malicious order behavior grades, and analyzing the order behaviors to be identified by using the group order behavior models to obtain the malicious grades of the order behaviors to be identified;

and comprehensively analyzing to obtain a score result of the order behavior to be identified by using the obtained similarity and the malicious level.

Fig. 2 is a first flowchart of a malicious order identification method according to an embodiment of the present invention, including:

S201: acquiring data of order behaviors to be identified.

Specifically, in this embodiment, the order behavior includes: the IP address of the order access, the geographic location of the IP address, the equipment used by the order request, the goods type of the order, the quantity of each order, the order time, the payment method, the third-level address of the consignee, the name of the consignee and the telephone of the consignee.

The information contained in the order behavior is directly related to the order behavior, and when the information in the order behavior is suspicious, the order behavior can be judged to belong to malicious order behavior, or the order behavior is classified as suspicious order behavior and is further identified. Therefore, whether the order behavior belongs to malicious order behaviors or not can be identified according to the change condition of the information in the order behavior.

S202: analyzing the data of the order behaviors to be identified by using the analysis model to obtain the malicious scores of the order behaviors to be identified.

The analysis model is obtained by performing model training according to preset data of order behaviors.

In this embodiment, the preset order behavior may include: the type of goods (most students buy electronic products and most middle-aged people buy healthcare products), the time of the order (office workers often place orders at night or on weekends), and the quantity of each order (non-malicious persons often do not buy in bulk at once for luxury goods).

The trained analysis model can be used to obtain the probability that the order behavior to be identified belongs to malicious order behavior, and the obtained probability is used as the malicious score of the order behavior to be identified. Alternatively, the trained analysis model can be used to obtain the distance by which the order behavior to be identified deviates from normal order behavior, and the obtained distance is used as the malicious score. In either case, whether the order behavior to be identified belongs to malicious order behavior can be judged by further analyzing the obtained malicious score.

S203: judging whether the order behavior to be identified belongs to malicious order behavior or not according to the malicious scores.

Specifically, the obtained malicious score may be compared with a preset threshold to determine whether the order behavior to be identified belongs to malicious order behavior: when the malicious score is greater than the threshold, it is determined that the order behavior to be identified belongs to malicious order behavior, and when the malicious score is less than or equal to the threshold, it is determined that it does not. Alternatively, when the malicious score is 1, it is determined that the order behavior to be identified belongs to malicious order behavior, and when the malicious score is 0, it is determined that it does not.
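As a minimal sketch of the threshold comparison in S203, and not a definitive implementation, the judgment could be written in Python as follows. The function name `is_malicious_order` and the example threshold of 0.6 are hypothetical; the description does not fix a threshold value.

```python
def is_malicious_order(malicious_score: float, threshold: float) -> bool:
    """Judge an order behavior from its malicious score.

    Returns True only when the score is strictly greater than the
    threshold; a score equal to the threshold is treated as not
    malicious, matching the 'less than or equal to' branch of S203.
    """
    return malicious_score > threshold


# Hypothetical threshold, chosen only for illustration.
EXAMPLE_THRESHOLD = 0.6

print(is_malicious_order(0.72, EXAMPLE_THRESHOLD))  # score above the threshold
print(is_malicious_order(0.60, EXAMPLE_THRESHOLD))  # score equal to the threshold
```

The binary variant (score 1 means malicious, score 0 means not malicious) is covered by the same comparison with any threshold between 0 and 1.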

After judging whether the order behavior to be identified belongs to the malicious order behavior by utilizing the malicious score, the order behavior to be identified can be used as a new training sample to obtain a new analysis model, so that the identification accuracy of the analysis model is improved.

As can be seen from the above, in the scheme provided in this embodiment, an analysis model is constructed according to data of preset order behaviors to analyze the order behaviors to be recognized, so as to obtain a malicious score of the order behaviors to be recognized, and whether the order behaviors to be recognized belong to malicious order behaviors is determined according to the malicious score. Compared with the prior art, in the scheme provided by the embodiment, the order behaviors of the user can be analyzed by using the statistical learning and machine learning analysis models, wherein when the analysis models are constructed, the preset data of the order behaviors are adopted, and new data can be added or unnecessary data can be deleted according to needs, so that the analysis models have multiple dimensions and self-adaptability, the order behaviors are prevented from being identified from a single dimension, the success rate and the range of malicious order identification can be improved, and the security monitoring can be performed on the e-commerce system of a company.

In an embodiment of the present invention, referring to fig. 3, a second flowchart of a malicious order identification method is provided, including:

S301: acquiring data of order behaviors to be identified.

This step is the same as S201 in the above embodiment, and is not described herein again.

S302: judging whether the user type to which the data of the order behavior to be identified belongs is a new user; if so, executing S3031, and if not, executing S3032.

The user type comprises a new user or an old user, the new user is a user with the historical order behavior number smaller than a preset first threshold, and the old user is a user with the historical order behavior number larger than or equal to the preset first threshold.

For example, the first threshold may be 20, which is not limited in the present application. When the order behaviors to be identified of the user A are received, if the number of the historical order behaviors of the user A is less than 20, determining the user A as a new user; and when the order behaviors to be identified of the user B are received, if the number of the historical order behaviors of the user B is more than or equal to 20, determining that the user B is an old user.

Whether the user type to which the data of the order behavior to be identified belongs is a new user or not is judged, so that different analyses can be performed according to different user types to obtain a more accurate analysis result.
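The branch in S302 can be sketched as below. This is an illustrative sketch only: the names `classify_user` and `next_step` are hypothetical, and the value 20 is simply the example first threshold given above, which the application does not limit.

```python
FIRST_THRESHOLD = 20  # example value from the description; not limiting


def classify_user(historical_order_count: int,
                  first_threshold: int = FIRST_THRESHOLD) -> str:
    """Return 'new' when the number of historical order behaviors is
    smaller than the first threshold, otherwise 'old'."""
    return "new" if historical_order_count < first_threshold else "old"


def next_step(historical_order_count: int) -> str:
    """Select the step that S302 branches to for this user."""
    if classify_user(historical_order_count) == "new":
        return "S3031"  # group model only
    return "S3032"      # group model plus the user's personal model


print(classify_user(19), next_step(19))
print(classify_user(20), next_step(20))
```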

S3031: calculating to obtain a first similarity between the order behaviors to be identified and the malicious order behaviors marked by the first analysis sub-model, and taking the first similarity as the malicious score of the order behaviors to be identified.

The first analysis submodel is one of the analysis models. It is obtained by performing K-means cluster analysis on the data of the historical order behaviors of the sample users, and consists of classes formed by normal order behaviors of different levels and classes formed by malicious order behaviors of different levels. It serves as the group order behavior model and is used for analyzing the order behaviors of users as a group. By performing cluster analysis on the data of the order behaviors of the sample users, malicious order behaviors can be separated from normal order behaviors, and classes of malicious order behaviors at different levels can be obtained according to the results of manual marking.

For example, a malicious orderer typically uses a cloud machine to simulate browser access for a large amount of frequent order accesses, and although the order accesses have different user names and different consignee telephones, the malicious orderer can be identified by similarity in three dimensions, namely the geographic location of the IP address, the order item type and the third-level address of the consignee.

Through clustering analysis, new user order behaviors can also be identified. For example, when a malicious order maker performs malicious order behaviors by using the newly added machine, the malicious order behaviors of the newly added machine may also show similarities with the separated malicious order behaviors and be identified.

The first analysis submodel uses the K-means algorithm to perform cluster analysis on the training samples $X = \{x^{(1)}, \dots, x^{(m)}\}$, where $X$ contains the historical order behaviors of the sample users, including both malicious order behaviors and normal order behaviors; $x^{(m)}$ denotes the $m$-th order behavior in the training samples and includes the preset data of each dimension of that order behavior; and $m$ denotes the number of order behaviors in the training samples and is a natural number greater than 0. For example, if the historical order behaviors of 300 sample users are collected as training samples in the cluster analysis, then $m$ is 300. The larger the value of $m$, that is, the larger the number of training samples, the more accurate the cluster analysis, but the larger the amount of data to be processed; in practical applications, the value of $m$ can be adjusted according to different scenarios or personal experience.

$k$ samples in $X$ are randomly selected as the cluster centroid points $U = \{\mu_1, \mu_2, \dots, \mu_k\}$, $1 < k \le m$.

For each training sample $x^{(i)}$, the class to which it should belong is calculated using formula (1).

$$C^{(i)} = \arg\min_{j} \left\| x^{(i)} - \mu_j \right\|^2 \qquad (1)$$

where $x^{(i)}$ denotes the $i$-th order behavior in the training samples, $1 \le i \le m$; $\mu_j$ denotes the $j$-th cluster centroid point, $1 \le j \le k$; and $C^{(i)}$ denotes the class to which $x^{(i)}$ belongs. The difference between $x^{(i)}$ and every cluster centroid point in $U$ is calculated; when the difference between the training sample $x^{(i)}$ and the cluster centroid point $\mu_j$ is the smallest, the training sample $x^{(i)}$ is confirmed to belong to class $j$, the class in which the cluster centroid point $\mu_j$ lies.

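After all training samples belonging to class $j$ are obtained, the centroid point of class $j$ is recalculated using formula (2).

$$\mu_j = \frac{\sum_{i=1}^{m} 1\{C^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{C^{(i)} = j\}} \qquad (2)$$

where $\mu_j$ denotes the centroid of class $j$, $x^{(i)}$ denotes the $i$-th order behavior in the training samples, $C^{(i)}$ denotes the class to which $x^{(i)}$ currently belongs, and $1\{C^{(i)} = j\}$ equals 1 when $x^{(i)}$ currently belongs to class $j$ and 0 otherwise, so that $\mu_j$ is the mean of the samples currently assigned to class $j$.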

And repeating the calculation processes of the formula (1) and the formula (2) until the first analysis submodel converges.

Wherein, the convergence condition of the first analysis submodel may be:

the difference between every cluster centroid point before and after recalculation is smaller than a preset threshold; or,

for each class, the sum of the squared differences of all samples in the class and the centroid point thereof is less than another preset threshold;

or, other convergence criteria.
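The iteration of formula (1) and formula (2) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: order behaviors are assumed to be already encoded as numeric vectors, convergence uses the first criterion above (centroid movement below a threshold `tol`), an empty class keeps its previous centroid, and the names `k_means`, `squared_distance`, `tol`, `max_iter` and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2 used in formula (1)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples: Sequence[Sequence[float]], k: int,
            tol: float = 1e-9, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Cluster samples into k classes; return (centroids, class per sample)."""
    rng = random.Random(seed)
    # Randomly select k samples as the initial cluster centroid points.
    centroids = [list(x) for x in rng.sample(list(samples), k)]
    assignments: List[int] = []
    for _ in range(max_iter):
        # Formula (1): assign each sample to the class of its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in samples
        ]
        # Formula (2): recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [x for x, c in zip(samples, assignments) if c == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty class keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: the squared movement of every centroid is below tol.
        shift = max(squared_distance(a, b)
                    for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, assignments


# Toy data: two well-separated groups of two-dimensional points.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_centroids, toy_classes = k_means(toy, k=2)
print(toy_centroids, toy_classes)
```

The class indices returned by the sketch carry no meaning by themselves; as described above, which classes contain malicious order behaviors is determined from the results of manual marking.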

By performing cluster analysis on the data of the historical order behaviors of the sample users and using the results of manual marking, classes of normal order behaviors and of malicious order behaviors at different levels can be obtained. For example, Table 1 shows one such grading: the correspondence between each malicious level and the probability of belonging to a malicious order, and between each normal level and the probability of not belonging to a malicious order.

TABLE 1

Malicious level	Probability of belonging to a malicious order
Malicious level 0	50%-60%
Malicious level 1	60%-70%
Malicious level 2	70%-80%
Malicious level 3	80%-90%
Malicious level 4	90%-100%

Normal level	Probability of not belonging to a malicious order
Normal level 0	50%-60%
Normal level 1	60%-70%
Normal level 2	70%-80%
Normal level 3	80%-90%
Normal level 4	90%-100%

The similarity between the order behavior to be identified and the centroid point of each obtained class of malicious order behaviors of a different level is then calculated, and the obtained similarities are used to express the malicious level of the order behavior to be identified.

Fig. 4 is a schematic diagram of a separation result of clustering analysis using K-means, where the order behaviors in the training sample are only divided into two types, one type is represented by dots, and the other type is represented by triangles, which represent malicious order behaviors and normal order behaviors, respectively, and the type containing the malicious order behaviors can be determined by the result of manual labeling.

In one implementation, to obtain the first similarity between the order behavior to be identified and the malicious order behaviors marked by the first analysis submodel, the similarity between the order behavior to be identified and the centroid point of each class of marked malicious order behaviors of a different level may be calculated separately; the calculated similarities are then weighted and summed, and the summation result is used as the first similarity. The greater the first similarity, that is, the more similar the order behavior to be identified is to the malicious order behaviors marked by the first analysis submodel, the more likely the order behavior to be identified is to belong to malicious order behavior, so the first similarity can be used as the malicious score.

The similarity may be calculated by using the Euclidean distance or the Pearson similarity, or by using another algorithm.
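The two measures named above can be sketched as follows, assuming the order behavior and the centroid point are already encoded as numeric vectors of equal length. The function names are hypothetical. Converting a Euclidean distance d into a similarity as 1 / (1 + d) is one common convention adopted here as an assumption; the description does not specify the conversion. Returning 0.0 for a constant vector, where the Pearson coefficient is undefined, is likewise a choice made only for this sketch.

```python
import math
from typing import Sequence


def pearson_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined for a constant vector; 0.0 is a convention of this sketch.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map the Euclidean distance d to a similarity 1 / (1 + d) in (0, 1]."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


order = [1.0, 2.0, 3.0]      # hypothetical encoded order behavior
centroid = [2.0, 4.0, 6.0]   # hypothetical centroid of a malicious class
print(pearson_similarity(order, centroid))
print(euclidean_similarity(order, centroid))
```

Note that the Pearson similarity compares the shape of the two vectors and ignores scale, whereas the distance-based measure is sensitive to scale, so the features would normally need to be normalized before the latter is used.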

For example, classes of malicious order behavior of three different levels are obtained, namely malicious level 0, malicious level 1 and malicious level 2, where the centroid point of the malicious level 0 class is μ_o, the centroid point of the malicious level 1 class is μ_p, and the centroid point of the malicious level 2 class is μ_q.

Using the Pearson similarity, the similarity between the order behavior to be identified and the centroid point μ_o is obtained as A_o, the similarity with the centroid point μ_p as A_p, and the similarity with the centroid point μ_q as A_q.

The malicious score of the new user's order behavior to be identified can be calculated by using formula (3).

S＝0.1A_o+0.3A_p+0.6A_q (3)

Where S represents the malicious score. In practical applications, the weights of A_o, A_p and A_q can be adjusted according to different scenarios or personal experience.

The malicious score of the order behavior to be identified of the new user reflects the similarity between the order behavior to be identified and the malicious order behavior, and whether the order behavior to be identified belongs to the malicious order behavior can be accurately judged according to the similarity.
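Formula (3) can be worked through with a short sketch. The weights 0.1, 0.3 and 0.6 come from formula (3); the function name and the three similarity values are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Weights from formula (3) for malicious levels 0, 1 and 2.
LEVEL_WEIGHTS = (0.1, 0.3, 0.6)


def new_user_malicious_score(similarities: Sequence[float],
                             weights: Sequence[float] = LEVEL_WEIGHTS) -> float:
    """Weighted sum S = 0.1*A_o + 0.3*A_p + 0.6*A_q of the centroid similarities."""
    if len(similarities) != len(weights):
        raise ValueError("one similarity is needed per malicious level")
    return sum(w * a for w, a in zip(weights, similarities))


# Hypothetical similarities A_o, A_p, A_q of one order to the three centroid points.
score = new_user_malicious_score([0.2, 0.5, 0.9])
print(round(score, 2))  # 0.1*0.2 + 0.3*0.5 + 0.6*0.9 = 0.71
```

Because the highest malicious level carries the largest weight, an order that resembles the level 2 centroid raises the score more than one that resembles the level 0 centroid.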

S3032: calculating a second similarity between the order behavior to be identified and the malicious order behavior marked by the first analysis sub-model; inputting the data of the order behavior to be identified into the second analysis sub-model corresponding to the user to which the order behavior belongs, and calculating a third similarity between the order behavior to be identified and the historical order behavior of that user; and performing score summation on the second similarity and the third similarity, the result of which is taken as the malicious score of the order behavior to be identified.

The step of calculating the second similarity between the order behavior to be identified and the malicious order behavior marked by the first analysis submodel is consistent with the step in S3031, and is not described herein again.

The second analysis sub-model is one of the analysis models. For each sample user, it is the analysis sub-model corresponding to that user, obtained by performing logistic regression training on the data of the user's personal historical order behaviors; that is, it is a personal order behavior model. Each user has a corresponding second analysis sub-model, which analyzes the user's order behaviors and determines whether they conform to the user's past order habits. The user's order behavior preferences are observed and counted over time, and if an order behavior deviates from the user's past order habits, that order behavior can be examined further.

For example, user A usually places orders only at about 10 pm and purchases a number of USB flash drives and camera accessories. If it is detected that user A purchases a large number of lipsticks in the middle of the day, and the number of orders is large even though each order contains few lipsticks, the order behavior of user A purchasing the lipsticks needs to be analyzed.

Using formula (4), logistic regression is trained on the historical order behavior data of each user to construct the second analysis sub-model corresponding to that user.

f(x)＝θ^Tx (4)

Where θ represents a model parameter, i.e. a regression coefficient, and x represents data of each preset dimension of the user's historical order behavior, which can be represented by a matrix (5).

Where x₁₁, x₂₁, …, x_n1 represent the data of each preset dimension in one historical order behavior of the user, that is, information such as the order quantity, the IP address of order access and the payment method. There are n dimensions, where n is a natural number greater than 0. For example, if the data of the preset order behavior includes only three dimensions, namely the order quantity, the IP address of order access and the payment method, then n is 3; if it includes five dimensions, namely the order quantity, the payment method, the IP address of order access, the geographic location of the IP address and the third-level address of the consignee, then n is 5. j represents the number of the user's historical order behaviors used when constructing the second analysis sub-model, and is a natural number greater than 0. For example, if the second analysis sub-model is constructed using 20 historical order behaviors of the user, j is 20. The larger j is, that is, the more historical order behaviors of the user are used, the more accurate the second analysis sub-model corresponding to the user becomes, but the larger the amount of data to be processed. In practical applications, the value of j can be adjusted according to different scenarios or personal experience.
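A reconstruction of matrix (5) from the definitions above is sketched here. It assumes that each column holds one historical order behavior and each row holds one preset dimension, which matches the description of x₁₁, x₂₁, …, x_n1 as the n dimensions of a single order behavior; the exact layout is an assumption.

```latex
x = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1j} \\
x_{21} & x_{22} & \cdots & x_{2j} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nj}
\end{bmatrix} \tag{5}
```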

Based on the second analysis submodel corresponding to the user, the probability that the order behavior to be identified of the user is similar to the historical order behavior of the user can be obtained by using the formula (6).

Where x represents the data of the user's order behavior to be identified, σ represents the S-shaped growth curve (Sigmoid), θ represents the model parameter, and h_θ(x) and P(y = 1 | x) denote the probability that the order behavior corresponding to the data x is similar to the user's historical order behavior. The obtained probability expresses the similarity between the user's order behavior to be identified and the user's historical order behavior.
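Given these symbol definitions and the linear form f(x) = θ^T x of formula (4), formula (6) presumably takes the standard logistic regression form below. This is a reconstruction under that assumption, not a verbatim copy of the original formula.

```latex
h_\theta(x) = \sigma(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}} = P(y = 1 \mid x) \tag{6}
```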

The matrix (5) can be mapped onto a distributed system, and the logistic regression training can be completed using Spark's machine learning library (MLlib) to obtain the model parameter θ.

The data of the user's order behavior to be identified is input into the trained second analysis sub-model corresponding to the user, and the third similarity between the user's order behavior to be identified and the user's historical order behavior is obtained using formula (6).
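The scoring step can be sketched as follows, assuming the model parameter θ has already been trained and the order behavior has been encoded as a numeric vector of the same length. The function names are hypothetical, and the Sigmoid is the standard logistic function assumed for the second analysis sub-model.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """S-shaped Sigmoid curve, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def third_similarity(theta: Sequence[float], x: Sequence[float]) -> float:
    """Probability that order x resembles the user's history: sigmoid(theta^T x)."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same number of dimensions")
    return sigmoid(sum(t * v for t, v in zip(theta, x)))
```

A value close to 1 indicates that the order matches the user's past habits, and a value close to 0 indicates a deviation from them.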

The greater the third similarity, that is, the greater the probability that the order behavior to be identified is similar to the user's historical order behavior, the less likely the order behavior to be identified is malicious. Therefore, when the second similarity and the third similarity are combined, the negative of the third similarity and the second similarity may be weighted and summed to obtain the malicious score. Alternatively, the third similarity may be used to obtain the probability that the order behavior to be identified is not similar to the user's historical order behavior, and this dissimilarity probability and the second similarity are weighted and summed to obtain the malicious score.

For example, classes of malicious order behavior of three different levels are obtained, namely malicious level 0, malicious level 1 and malicious level 2, where the centroid point of the malicious level 0 class is μ_o, the centroid point of the malicious level 1 class is μ_p, and the centroid point of the malicious level 2 class is μ_q.

Using the Pearson similarity, the similarity between the order behavior to be identified and the centroid point μ_o is obtained as A′_o, the similarity with the centroid point μ_p as A′_p, and the similarity with the centroid point μ_q as A′_q.

The second similarity may be calculated using equation (7).

S₁＝0.1A′_o+0.4A′_p+0.5A′_q (7)

Where S₁ represents the second similarity. In practical applications, the weights of A′_o, A′_p and A′_q can be adjusted according to different scenarios or personal experience.

The third similarity is obtained by equation (8).

Where S₂ represents the third similarity.

Score summation is then performed on the second similarity and the third similarity to obtain the malicious score of the order behavior to be identified.

Specifically, the malicious score can be calculated by using formula (9).

S＝0.4S₁+0.6(1-S₂) (9)

In practical application, the weight value can be adjusted according to different scenes or personal experience.
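Formulas (7) and (9) can be combined in a short worked sketch. The weights are those given in the formulas; the function names, the centroid similarities and the value of S₂ are hypothetical inputs used only for illustration.

```python
from typing import Sequence


def second_similarity(similarities: Sequence[float],
                      weights: Sequence[float] = (0.1, 0.4, 0.5)) -> float:
    """Formula (7): S1 = 0.1*A'_o + 0.4*A'_p + 0.5*A'_q."""
    if len(similarities) != len(weights):
        raise ValueError("one similarity is needed per malicious level")
    return sum(w * a for w, a in zip(weights, similarities))


def old_user_malicious_score(s1: float, s2: float,
                             w1: float = 0.4, w2: float = 0.6) -> float:
    """Formula (9): S = 0.4*S1 + 0.6*(1 - S2)."""
    return w1 * s1 + w2 * (1.0 - s2)


s1 = second_similarity([0.2, 0.5, 0.8])  # hypothetical A'_o, A'_p, A'_q
s2 = 0.9                                 # hypothetical third similarity
score = old_user_malicious_score(s1, s2)
print(round(s1, 3), round(score, 3))     # 0.62 0.308
```

In this example the order moderately resembles the malicious classes (S₁ = 0.62), but it closely matches the user's own history (S₂ = 0.9), so the term 1 − S₂ pulls the combined score down to 0.308.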

The score aggregation may be performed using a score accumulator or may be performed using an online adjustable polynomial function.

The malicious score of the old user's order behavior to be identified reflects both the similarity between that order behavior and malicious order behavior and the degree to which it deviates from the user's personal order habits. Combining these two factors yields a more accurate identification result.

As can be seen from the above, in the scheme provided by this embodiment, for the order behavior to be identified of the new user, the malicious score is directly calculated by using the first analysis submodel; and for the order behavior to be identified of the old user, calculating a second similarity by using the first analysis submodel, calculating a third similarity by using the second analysis submodel corresponding to the old user, and obtaining a malicious score by combining the second similarity and the third similarity. Compared with the prior art, in the scheme provided by the embodiment, different analyses are performed on the order behavior to be identified of the new user and the order behavior to be identified of the old user, so that more accurate malicious scores can be obtained, and the success rate of malicious order identification is further improved.

In an embodiment of the present invention, referring to fig. 5, a third flowchart of a malicious order identification method is provided, including:

S301: acquiring data of order behaviors to be identified.

S301, S302, S3031 and S3032 are described in detail in the above embodiments, and are not described herein again.

S3041: for the order behavior to be identified of a new user, judging whether the malicious score is greater than a preset second threshold; if so, executing S3042; if not, executing S3043.

The second threshold is set to measure the malicious score of the new user's order behavior to be identified. The higher the malicious score, the more likely the new user's order behavior to be identified is malicious. When the malicious score is greater than the second threshold, the new user's order behavior to be identified can be determined to be malicious order behavior; when the malicious score is less than or equal to the second threshold, it can be determined not to be malicious order behavior.

In one implementation, the malicious score of the new user's order behavior to be identified, calculated using the Pearson similarity, ranges from 0 to 1. In this case, the second threshold may be set to 0.5: the malicious score of the new user's order behavior to be identified is compared with 0.5, and whether that order behavior is malicious is judged from the comparison result. Judging by numerical comparison yields a more accurate judgment result.

In practical applications, the second threshold may be adjusted according to different similarity calculation methods.

S3042: determining that the order behavior to be identified belongs to malicious order behavior.

In one implementation, when the malicious score of the order behavior to be identified of the new user is greater than a second threshold value, determining that the order behavior to be identified of the new user belongs to malicious order behavior; or when the malicious score of the order behavior to be identified of the old user is larger than the third threshold, determining that the order behavior to be identified of the old user belongs to the malicious order behavior.

After the order behavior to be identified is determined to be malicious order behavior, the analysis model may be updated using that order behavior to improve its identification accuracy, and the order behavior of the user to which it belongs may be closely monitored or subjected to other subsequent processing.

S3043: determining that the order behavior to be identified does not belong to malicious order behavior.

In one implementation, when the malicious score of the order behavior to be identified of the new user is smaller than or equal to a second threshold, determining that the order behavior to be identified of the new user does not belong to the malicious order behavior; or when the malicious score of the order behavior to be identified of the old user is smaller than or equal to the third threshold, determining that the order behavior to be identified of the old user does not belong to the malicious order behavior.

After the order behavior to be identified is determined not to be malicious order behavior, the analysis model may be updated using that order behavior to improve its identification accuracy.

S3044: for the order behavior to be identified of an old user, judging whether the malicious score is greater than a preset third threshold; if so, executing S3042; if not, executing S3043.

The third threshold is set to measure the malicious score of the old user's order behavior to be identified. The higher the malicious score, the more likely the old user's order behavior to be identified is malicious. When the malicious score is greater than the third threshold, the old user's order behavior to be identified can be determined to be malicious order behavior; when the malicious score is less than or equal to the third threshold, it can be determined not to be malicious order behavior.

In one implementation, the malicious score of the old user's order behavior to be identified, calculated using the Pearson similarity, ranges from 0 to 1. In this case, the third threshold may be set to 0.5: the malicious score of the old user's order behavior to be identified is compared with 0.5, and whether that order behavior is malicious is judged from the comparison result. Judging by numerical comparison yields a more accurate judgment result.

In practical applications, the third threshold may be adjusted according to different similarity calculation methods.

As can be seen from the above, in the scheme provided in this embodiment, for the order behavior to be identified of the new user, the obtained malicious score is compared with the second threshold value, so as to determine whether the order behavior to be identified of the new user belongs to a malicious order behavior; and for the order behaviors to be identified of the old user, comparing the obtained malicious score with a third threshold value to judge whether the order behaviors to be identified of the old user belong to malicious order behaviors. Compared with the prior art, in the scheme provided by the embodiment, when comparing the malicious scores, the obtained malicious scores are compared with different threshold values according to different user types to which the order behaviors to be identified belong, so as to judge whether the order behaviors to be identified belong to the malicious order behaviors, and a more accurate comparison result can be obtained, so that the success rate of malicious order identification is improved.
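The threshold comparison in S3041 to S3044 can be sketched as a single function. The default of 0.5 for both thresholds follows the implementation described above; the function and parameter names are hypothetical.

```python
def is_malicious(score: float, is_new_user: bool,
                 second_threshold: float = 0.5,
                 third_threshold: float = 0.5) -> bool:
    """Compare the malicious score with the threshold for the user's type.

    New users are judged against the second threshold (S3041) and old users
    against the third threshold (S3044). A score strictly greater than the
    threshold means malicious (S3042); otherwise not malicious (S3043).
    """
    threshold = second_threshold if is_new_user else third_threshold
    return score > threshold
```

Keeping the two thresholds as separate parameters allows each to be tuned independently, for example when the two user types use different similarity calculations.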

In an embodiment of the present invention, when the analysis model is constructed by using the data of the preset order behavior, the order behavior may include one or a combination of the following: the IP address of the order access, the geographic location of the IP address, the equipment used by the order request, the goods type of the order, the quantity of each order, the order time, the payment method, the third-level address of the consignee, the name of the consignee and the telephone of the consignee.

The data contained in the order behaviors can be used as a basis for judging whether the order behaviors to be identified belong to malicious order behaviors, and the data can be combined, added or deleted according to different scenes.
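One possible way to hold these dimensions is a record whose fields are all optional, since the order behavior may include one or a combination of them. The class name, field names and field types below are illustrative assumptions; only the list of dimensions comes from the text above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class OrderBehavior:
    """One order behavior; every dimension is optional (illustrative sketch)."""
    ip_address: Optional[str] = None                     # IP address of the order access
    ip_geolocation: Optional[str] = None                 # geographic location of the IP address
    device: Optional[str] = None                         # equipment used by the order request
    goods_type: Optional[str] = None                     # goods type of the order
    quantity: Optional[int] = None                       # quantity of each order
    order_time: Optional[str] = None                     # order time
    payment_method: Optional[str] = None                 # payment method
    consignee_third_level_address: Optional[str] = None  # third-level address of the consignee
    consignee_name: Optional[str] = None                 # name of the consignee
    consignee_phone: Optional[str] = None                # telephone of the consignee

    def present_dimensions(self) -> Dict[str, Any]:
        """Return only the dimensions that were actually supplied."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Hypothetical order that uses three of the ten dimensions.
order = OrderBehavior(ip_address="192.0.2.10", quantity=3, payment_method="card")
```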

As can be seen from the above, in the scheme provided by this embodiment, the order behaviors include important data for identifying malicious orders, and the malicious order behaviors can be accurately identified according to the data, so that the success rate of identifying malicious orders is improved.

Corresponding to the malicious order identification method, the embodiment of the invention also provides a malicious order identification device.

Fig. 6 is a schematic structural diagram of a malicious order identification apparatus according to an embodiment of the present invention, including: a data acquisition module 601, a score acquisition module 602 and a behavior judgment module 603.

The data acquisition module 601 is configured to acquire data of order behaviors to be identified;

a score obtaining module 602, configured to analyze the data of the order behavior to be identified by using an analysis model, and obtain a malicious score of the order behavior to be identified, where the analysis model is obtained by performing model training according to preset data of the order behavior;

and a behavior judging module 603, configured to judge whether the order behavior to be identified belongs to a malicious order behavior according to the malicious score.

In an embodiment of the present invention, referring to fig. 7, a second structural diagram of a malicious order identification apparatus is provided, including: a data acquisition module 701, a type judgment module 702, a first score obtaining sub-module 7031, a second score obtaining sub-module 7032, and a behavior judgment module 704.

The data acquisition module 701 is the same as the data acquisition module 601 in the above embodiment, and details are not repeated here.

The type judgment module 702 is configured to determine the user type to which the data of the order behavior to be identified belongs, where the user type is either a new user or an old user, a new user being a user whose number of historical order behaviors is smaller than a preset first threshold and an old user being a user whose number of historical order behaviors is greater than or equal to the first threshold; if the user type to which the data of the order behavior to be identified belongs is a new user, the first score obtaining sub-module 7031 is triggered, and if it is an old user, the second score obtaining sub-module 7032 is triggered;

the first score obtaining sub-module 7031 is configured to calculate a first similarity between the order behavior to be identified and the malicious order behavior marked by a first analysis sub-model, and use the first similarity as the malicious score of the order behavior to be identified, where the first analysis sub-model is one of the analysis models and is an analysis sub-model obtained by performing K-means cluster analysis on data of historical order behaviors of sample users, yielding classes of different levels of normal order behavior and malicious order behavior;

the second score obtaining sub-module 7032 is configured to calculate and obtain a second similarity between the order behavior to be identified and the malicious order behavior marked by the first analysis sub-model, and use the second similarity as a first malicious score of the order behavior to be identified; inputting the data of the order behaviors to be identified into a second analysis submodel corresponding to the user to which the order behaviors to be identified belong, calculating to obtain a third similarity between the order behaviors to be identified and the historical order behaviors of the user to which the order behaviors to be identified belong, generating a second malicious score of the order behaviors to be identified according to the third similarity, performing score summation on the first malicious score and the second malicious score, and taking the result of the score summation as the malicious score of the order behaviors to be identified, wherein the second analysis submodel is one of the analysis models, and is an analysis submodel corresponding to each sample user, which is obtained by performing logistic regression training by using the data of the individual historical order behaviors of the sample user.

The behavior judgment module 704 is the same as the behavior judgment module 603 in the above embodiment, and details are not repeated here.
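The dispatch performed by the type judgment module can be sketched as follows. The function names are hypothetical, and the first threshold of 20 used in the example calls is an illustrative value, not one given in the text.

```python
def user_type(history_count: int, first_threshold: int) -> str:
    """Return 'new' when the user has fewer historical order behaviors than
    the first threshold, otherwise 'old'."""
    if history_count < 0:
        raise ValueError("history_count must not be negative")
    return "new" if history_count < first_threshold else "old"


def select_score_submodule(history_count: int, first_threshold: int) -> int:
    """Return the reference numeral of the score obtaining sub-module to trigger:
    7031 for a new user, 7032 for an old user."""
    return 7031 if user_type(history_count, first_threshold) == "new" else 7032


print(user_type(3, 20), user_type(20, 20))  # new old
```

A user whose count of historical order behaviors equals the first threshold is treated as an old user, matching the "greater than or equal to" condition above.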

In an embodiment of the present invention, referring to fig. 8, a third structural diagram of a malicious order identification apparatus is provided, wherein the behavior judgment module 704 includes: a first scoring judgment sub-module 7041, a first behavior determination sub-module 7042, a second behavior determination sub-module 7043, and a second scoring judgment sub-module 7044.

The first scoring judgment sub-module 7041 is configured to, when the user type to which the data of the order behavior to be identified belongs is a new user, judge whether the malicious score is greater than a preset second threshold, trigger the first behavior determination sub-module 7042 if the malicious score is greater than the second threshold, and trigger the second behavior determination sub-module 7043 if the malicious score is less than or equal to the second threshold;

a first behavior determining sub-module 7042, configured to determine that the order behavior to be identified belongs to a malicious order behavior;

a second behavior determining sub-module 7043, configured to determine that the order behavior to be identified does not belong to a malicious order behavior;

the second scoring judgment sub-module 7044 is configured to, when the user type to which the data of the order behavior to be identified belongs is an old user, judge whether the malicious score is greater than a preset third threshold, trigger the first behavior determination sub-module 7042 if the malicious score is greater than the third threshold, and trigger the second behavior determination sub-module 7043 if the malicious score is less than or equal to the third threshold.

An embodiment of the present invention further provides an electronic device, as shown in fig. 9, which includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with one another through the communication bus 904;

a memory 903 for storing computer programs;

the processor 901 is configured to implement the following steps when executing the program stored in the memory 903:

acquiring data of order behaviors to be identified;

The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to execute the method for malicious order identification described in any one of the above embodiments.

In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of malicious order identification as in any of the above embodiments.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of malicious order identification, the method comprising:

acquiring data of order behaviors to be identified;

judging whether the order behavior to be identified belongs to malicious order behavior or not according to the malicious score;

before analyzing the data of the order behavior to be identified by using the analysis model to obtain the malice score of the order behavior to be identified, the method further comprises:

2. The method of claim 1, wherein the determining whether the order behavior to be identified belongs to a malicious order behavior according to the malicious score comprises:

3. The method according to claim 1 or 2, wherein the order behavior comprises one or a combination of the following:

4. An apparatus for malicious order identification, the apparatus comprising:

the behavior judging module is used for judging whether the order behavior to be identified belongs to malicious order behavior according to the malicious scores;

the apparatus further comprises a type judging module, and the score obtaining module comprises: a first score obtaining sub-module and a second score obtaining sub-module;

5. The apparatus of claim 4, wherein the behavior judging module comprises: a first scoring judgment sub-module, a first behavior determination sub-module, a second behavior determination sub-module and a second scoring judgment sub-module;

6. The apparatus of claim 4 or 5, wherein the order behavior comprises one or more of the following:

7. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1 to 3 when executing a program stored in the memory.