CN114493250A - Abnormal behavior detection method, computing device and readable storage medium - Google Patents

Info

Publication number
CN114493250A
Authority
CN
China
Prior art keywords
behavior data
data sample
sample
behavior
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210083460.XA
Other languages
Chinese (zh)
Inventor
邓永国
范光亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Cheerbright Technologies Co Ltd
Original Assignee
Beijing Cheerbright Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Cheerbright Technologies Co Ltd
Priority to CN202210083460.XA
Publication of CN114493250A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an abnormal behavior detection method, a computing device and a readable storage medium, wherein the method comprises the following steps: acquiring a behavior data sample set, wherein the behavior data sample set comprises a plurality of behavior data samples; inputting the behavior data samples in the behavior data sample set into a trained self-encoder for processing to obtain output data; determining a sample error corresponding to each behavior data sample based on the input behavior data sample and the output data; determining a first threshold value based on sample errors corresponding to all behavior data samples; obtaining a first risk score corresponding to each behavior data sample based on the sample error of each behavior data sample and a first threshold; determining whether the behavioral data sample is abnormal based at least on the first risk score.

Description

Abnormal behavior detection method, computing device and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an abnormal behavior detection method, a computing device, and a readable storage medium.
Background
In the field of machine learning, labeled data is extremely valuable. In practice, labeling data can incur substantial labor and material costs and can degrade the user experience. For example, on some automotive media platforms, users submit purchase-intent leads, that is, they leave contact information such as a mobile phone number for a vehicle model they are interested in, and such data has always been key information for automobile dealers and automotive media platforms. Training on unlabeled data therefore has wide application in practice.
In the field of abnormal user behavior detection, existing machine-learning-based methods perform feature processing on business scenario data, train a model, and assess risk according to the model's results. Their drawbacks are that business personnel cannot learn the specific reasons behind the results the model gives, and that the results are not sufficiently reliable.
Therefore, a method for detecting abnormal behavior by using unlabeled data is needed to improve the accuracy of detection.
Disclosure of Invention
To this end, the present invention provides an abnormal behavior detection method, a computing device and a readable storage medium in an effort to solve or at least alleviate at least one of the problems identified above.
According to an aspect of the present invention, there is provided an abnormal behavior detection method, executed in a computing device, the method comprising the steps of: acquiring a behavior data sample set, wherein the behavior data sample set comprises a plurality of behavior data samples; inputting the behavior data samples in the behavior data sample set into a trained self-encoder for processing to obtain output data; determining a sample error corresponding to each behavior data sample based on the input behavior data sample and the output data; determining a first threshold value based on sample errors corresponding to all behavior data samples; obtaining a first risk score corresponding to each behavior data sample based on the sample error of each behavior data sample and a first threshold; determining whether the behavioral data sample is abnormal based at least on the first risk score.
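The sample-error step above can be illustrated with a minimal Python sketch. It assumes the sample error is the root mean square error between an input sample and the self-encoder's output for it, which is the error named by the scoring formula in this disclosure. The function names `sample_rmse` and `sample_errors` and the example data are hypothetical, and the self-encoder is represented only by outputs that have already been computed.

```python
import math
from typing import List, Sequence


def sample_rmse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Root mean square error between one input sample and its reconstruction."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("input and output must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(x, x_hat)]
    return math.sqrt(sum(squared) / len(squared))


def sample_errors(
    inputs: Sequence[Sequence[float]], outputs: Sequence[Sequence[float]]
) -> List[float]:
    """One error per behavior data sample, in input order."""
    return [sample_rmse(x, x_hat) for x, x_hat in zip(inputs, outputs)]


# Hypothetical data: the first sample is reconstructed exactly, the second poorly.
inputs = [[0.0, 0.0], [3.0, 4.0]]
outputs = [[0.0, 0.0], [0.0, 0.0]]
errors = sample_errors(inputs, outputs)
```

A sample that the self-encoder reconstructs poorly receives a larger error, which is what the later threshold and scoring steps rely on.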
Optionally, in the abnormal behavior detection method according to the present invention, the method further includes: evaluating the behavior data sample set based on a preset abnormal behavior detection strategy to obtain a second risk score corresponding to each behavior data sample; and performing fusion processing on the first risk score and the second risk score to obtain a third risk score corresponding to each behavior data sample.
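The text does not specify how the first and second risk scores are fused. As one possible illustration only, the sketch below assumes a weighted average of two scores that both lie in [0, 1]; the function name `fuse_scores` and the weight `model_weight` are hypothetical and not taken from the disclosure.

```python
def fuse_scores(model_score: float, strategy_score: float, model_weight: float = 0.5) -> float:
    """Assumed fusion rule: weighted average of the first and second risk scores."""
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must lie in [0, 1]")
    return model_weight * model_score + (1.0 - model_weight) * strategy_score


# Hypothetical scores for one behavior data sample.
third_score = fuse_scores(model_score=0.8, strategy_score=0.4)
```

Other fusion rules, such as taking the maximum of the two scores, would fit the same description equally well.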
Optionally, in the abnormal behavior detection method according to the present invention, the step of determining whether the behavior data sample is abnormal based on at least the first risk score includes: determining whether the behavioral data sample is abnormal based on the third risk score.
Optionally, in the abnormal behavior detection method according to the present invention, the step of determining whether the behavior data sample is abnormal based on the third risk score includes: determining a threshold value of a risk assessment grade based on the third risk scores corresponding to all the behavior data samples; and determining the risk evaluation grade corresponding to the behavior data sample based on the threshold value of the risk evaluation grade.
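How the grade thresholds are derived from the third risk scores is not stated here. The following sketch assumes, purely for illustration, that the thresholds are nearest-rank quantiles of all third risk scores and that there are three grades. The quantile levels, the grade labels, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence


def grade_thresholds(scores: Sequence[float], quantiles: Sequence[float]) -> List[float]:
    """Nearest-rank quantiles of all third risk scores (an assumed rule)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("scores must not be empty")
    # Rank ceil(q * n), converted to a 0-based index and clamped to the list.
    return [ordered[min(n - 1, max(0, math.ceil(q * n) - 1))] for q in quantiles]


def assign_grade(
    score: float, thresholds: Sequence[float], labels: Sequence[str] = ("low", "medium", "high")
) -> str:
    """Count how many thresholds the score exceeds and map that count to a label."""
    exceeded = sum(1 for t in thresholds if score > t)
    return labels[exceeded]


# Hypothetical third risk scores for ten behavior data samples.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
thresholds = grade_thresholds(scores, quantiles=(0.5, 0.8))
grades = [assign_grade(s, thresholds) for s in scores]
```

Deriving the thresholds from the score distribution, rather than fixing them in advance, keeps the share of samples in each grade stable when the scores drift.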
Optionally, in the abnormal behavior detection method according to the present invention, before the step of inputting the behavior data samples in the behavior data sample set into a trained self-encoder for processing, the method further includes: preprocessing the behavior data sample set.
Optionally, in the abnormal behavior detection method according to the present invention, the behavior data sample includes at least one behavior data, wherein the step of preprocessing the behavior data sample set includes: if the behavior data included in one behavior data sample in the behavior data sample set has missing values, and the number of the missing values exceeds a first preset value of the total number of the behavior data, discarding the behavior data sample; and if the behavior data included in one behavior data sample in the behavior data sample set has missing values, but the number of the missing values does not exceed a first preset value of the total number of the behavior data, filling the missing values of the behavior data sample.
Optionally, in the abnormal behavior detection method according to the present invention, the step of filling missing values of the behavior data sample includes: if the behavior data corresponding to the missing value is a continuous variable in the behavior data sample set, taking the behavior data samples in the set that have a value for that behavior data, calculating the mean of those values, and using the mean as the missing value; and if the behavior data corresponding to the missing value is a discrete variable in the behavior data sample set, taking the behavior data samples in the set that have a value for that behavior data and using the mode of those values as the missing value.
Optionally, in the abnormal behavior detection method according to the present invention, the step of determining the first threshold based on sample errors corresponding to all behavior data samples includes: sorting the sample errors corresponding to all behavior data samples in descending order and taking the sample error at the position given by the second preset value as the first threshold.
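The threshold selection can be sketched as follows. The sketch assumes that the second preset value is a fraction of the sample count, so that, for example, 0.05 selects the error ranked at the top 5% in descending order. That interpretation, the name `first_threshold`, and the parameter `top_fraction` are assumptions made for illustration.

```python
import math
from typing import Sequence


def first_threshold(errors: Sequence[float], top_fraction: float = 0.05) -> float:
    """Sort errors from large to small and return the one at the assumed rank."""
    if not errors:
        raise ValueError("errors must not be empty")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must lie in (0, 1]")
    ordered = sorted(errors, reverse=True)
    rank = max(1, math.ceil(top_fraction * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample errors.
errors = [0.1, 0.9, 0.3, 0.5]
threshold_default = first_threshold(errors)
threshold_half = first_threshold(errors, top_fraction=0.5)
```

With only four samples the default fraction selects the largest error; a larger fraction moves the threshold down the sorted list.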
Optionally, in the abnormal behavior detection method according to the present invention, the first risk score corresponding to each behavior data sample is calculated by the following formula: $\mathrm{model\_score}(x_i) = \mathrm{sigmoid}(\log(\mathrm{RMSE}_i) - \log(\mathrm{threshold}))$, where $\mathrm{model\_score}(x_i)$ is the first risk score corresponding to behavior data sample $x_i$, $\mathrm{RMSE}_i$ is the root mean square error corresponding to behavior data sample $x_i$, and $\mathrm{threshold}$ is the first threshold.
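A minimal sketch of the first risk score follows. It assumes the operator between the two logarithms is subtraction, in which case the score is algebraically equal to RMSE / (RMSE + threshold). The function names are hypothetical, and returning 0 for a zero error is a choice made in this sketch, not something the text prescribes.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def first_risk_score(rmse: float, threshold: float) -> float:
    """sigmoid(log(rmse) - log(threshold)), assuming subtraction between the logs."""
    if rmse < 0.0 or threshold <= 0.0:
        raise ValueError("rmse must be non-negative and threshold positive")
    if rmse == 0.0:
        # log(0) is undefined; the score tends to 0 as the error tends to 0.
        return 0.0
    return sigmoid(math.log(rmse) - math.log(threshold))


# Hypothetical values: an error equal to the threshold, and one twice as large.
score_at_threshold = first_risk_score(0.5, 0.5)
score_double = first_risk_score(1.0, 0.5)
```

Under this reading, a sample whose error equals the first threshold scores exactly 0.5, and the score approaches 1 as the error grows.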
Alternatively, in the abnormal behavior detection method according to the present invention, the behavior data samples include behavior data in a predetermined period.
Optionally, in the abnormal behavior detection method according to the present invention, the behavior data includes at least one of: the mobile phone number left by the user; the number of searches in recent days for the lead vehicle series, that is, the vehicle series for which the user left contact information; the number of times the lead vehicle series was browsed in recent days; the average duration of each browse of the lead vehicle series; and the number of leads submitted with the mobile phone number in recent days.
Optionally, in the abnormal behavior detection method according to the present invention, the preset abnormal behavior detection strategy includes at least one of: the number of times the same behavior data occurs within a first predetermined time; and whether the user browsed pages related to the lead vehicle series (the vehicle series for which the user left contact information) within a second predetermined time before the user behavior.
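The two strategy rules can be turned into a second risk score in many ways, and the text does not fix one. The sketch below assumes each rule yields a binary flag and that the score is the average of the flags. The window lengths, the repeat limit `max_repeats`, and all names are hypothetical; timestamps are plain seconds.

```python
from typing import Sequence


def strategy_score(
    submission_times: Sequence[float],
    browse_times: Sequence[float],
    event_time: float,
    first_window: float = 3600.0,
    second_window: float = 86400.0,
    max_repeats: int = 3,
) -> float:
    """Average of two assumed rule flags; 0.0 is least risky, 1.0 most risky."""
    # Rule 1: how often the same behavior data (e.g. the same phone number)
    # occurred within the first predetermined time up to the event.
    repeats = sum(1 for t in submission_times if event_time - first_window <= t <= event_time)
    too_frequent = repeats > max_repeats
    # Rule 2: whether related pages were browsed within the second
    # predetermined time before the event.
    browsed = any(event_time - second_window <= t <= event_time for t in browse_times)
    return (float(too_frequent) + float(not browsed)) / 2.0


# Hypothetical events: five submissions in one minute and no browsing at all.
risky = strategy_score([0, 10, 20, 30, 40], [], event_time=40, first_window=60)
# One submission preceded by a page view.
normal = strategy_score([40], [30], event_time=40)
```

Because each flag corresponds to a named rule, a score produced this way can be explained to business personnel rule by rule.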
According to another aspect of the invention, there is provided a computing device comprising: one or more processors; and a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the above-described abnormal behavior detection method.
According to yet another aspect of the present invention, there is also provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the above-described abnormal behavior detection method.
According to the abnormal behavior detection method of the present invention, the behavior data samples in the sample set are evaluated with a self-encoder model, and the risk contained in the behavior data is scored based on the sample errors and a threshold determined from those errors. Abnormal behavior is thereby detected effectively, which helps clean the platform's data. The risk score also gives a clear and intuitive view of the risk level of each behavior data sample and provides effective data support for refined, differentiated, precise marketing.
In addition, under a large number of application scenes, data samples are not labeled, and the unsupervised anomaly detection model is reasonable and effective, so that a large number of manual labeling works can be saved.
Owing to the complexity of practical application scenarios, an unsupervised model used alone often performs poorly. The invention therefore introduces an abnormal behavior detection strategy to evaluate the behavior data samples and proposes a risk scoring scheme that combines the self-encoder with that strategy, which increases the reliability of abnormal behavior detection and greatly reduces the misjudgment of normal users that an unsupervised model might cause.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of an abnormal behavior detection method 200 according to one embodiment of the present invention; and
FIG. 3 shows a flow diagram of an abnormal behavior detection method 300 according to another embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In many industries, a user's lead, that is, the contact information such as a mobile phone number that the user leaves in order to inquire about a product of interest, is very important information. Taking the automobile industry as an example, car-purchase leads, in which a user leaves a mobile phone number or similar data for a vehicle model of interest, have long been a key concern of automotive media platforms and dealers. A genuine lead largely reflects the user's intention to buy a car and is a very valuable marketing clue for the platform and the dealer. However, false leads from malicious users or competitors occupy a significant amount of platform resources, and manually screening and filtering them creates unnecessary cost. Although a platform can prevent false leads by requiring an SMS verification code, doing so greatly reduces users' willingness to leave their information. How to filter false leads effectively without adding an SMS verification code has therefore become a key research topic for automotive media platforms.
The invention provides an abnormal behavior detection method based on a self-encoder algorithm. A self-encoder is constructed and trained without supervision on user information and on multidimensional data describing the user's behavior before leaving contact information, such as browsing behavior for different vehicle series, browsing duration, the types of vehicle series browsed, and search behavior. The behavior data generated by a user is scored by the model alone, or by the model in combination with an abnormal behavior detection strategy, which improves the accuracy of detecting abnormal lead submissions.
The invention provides an abnormal behavior detection method, which is executed in a computing device. FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the invention. It should be noted that the computing device 100 shown in fig. 1 is only an example, and in practice, the computing device for implementing the abnormal behavior detection method of the present invention may be any type of device, and the hardware configuration thereof may be the same as the computing device 100 shown in fig. 1 or different from the computing device 100 shown in fig. 1. In practice, the computing device implementing the abnormal behavior detection method of the present invention may add or delete hardware components of the computing device 100 shown in fig. 1, and the present invention does not limit the specific hardware configuration of the computing device.
A block diagram of a computing device 100 is shown in FIG. 1. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. Program data 124 includes instructions; in the computing device 100 according to the present invention, program data 124 includes instructions for performing the abnormal behavior detection method 200 or 300 of the present invention.
The computing device 100 also includes a storage device 132, the storage device 132 including removable storage 136 and non-removable storage 138, the removable storage 136 and the non-removable storage 138 each connected to the storage interface bus 134. In the present invention, the data related to each event occurring during the program execution process and the time information indicating the occurrence of each event may be stored in the storage device 132, and the operating system 120 is adapted to manage the storage device 132. The storage device 132 may be a magnetic disk.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server, a web server, etc., or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, the operating system 120 of the computing device 100 is configured to perform the abnormal behavior detection method in accordance with the present invention.
FIG. 2 shows a flow diagram of an abnormal behavior detection method 200 according to one embodiment of the present invention. The method 200 is suitable for execution in the computing device 100 as described above. As shown in FIG. 2, the method 200 begins at step S210.
In step S210, a behavior data sample set is obtained, the behavior data sample set including a plurality of behavior data samples.
According to embodiments of the present invention, the behavior data samples may include behavior data over a predetermined period, for example, behavior data collected over one day. The behavior data includes at least one of: the mobile phone number left by the user; the number of searches for the lead vehicle series in recent days; the number of times the lead vehicle series was browsed in recent days; the average duration of each browse of the lead vehicle series; and the number of leads submitted with the mobile phone number in recent days. The behavior data may also include: the number of times vehicle series were browsed; the duration of browsing vehicle series; the number of times and the duration of browsing word-of-mouth reviews of vehicle series; the types of vehicle series browsed; the number and frequency of vehicle series searches; the number and frequency of leads submitted with the same mobile phone number; and forum posting and replying behavior, such as the vehicle series the user posted or replied about in the forum, the number of the user's posts or replies about a given vehicle series, and the type of vehicle series to which the forum section containing those posts and replies belongs. Here, a lead denotes the act of a user leaving contact information, and the lead vehicle series denotes the vehicle series that the user is interested in and for which the user left that information.
Examples of collected behavioral data samples are shown in the following table:
Table 1. Examples of behavior data samples (table image not reproduced).
The behavior data samples shown in Table 1 include behavior data such as the mobile phone number left by the user, the number of searches for the lead vehicle series in the last 3 days, the number of times the lead vehicle series was browsed in the last 3 days, the average duration of each browse, the number of leads submitted with the mobile phone number in the last 3 days, and the number of leads submitted with the mobile phone number within 1 hour. In this example, the first two rows, i.e., the samples for mobile phone numbers 130****5678 and 130****1234, have a value for every item of behavior data: the data is complete, with no missing values. The third row, i.e., the sample for mobile phone number 130****0000, has a value only for the number of leads submitted within 1 hour; the values of the other behavior data are missing. Behavior data with missing values is handled in the data preprocessing stage.
It should be noted that the abnormal behavior detection method of the present invention is applicable to the behavior data sample set of the present invention, and may also be applicable to other behavior data sample sets, for example, but not limited to, user behavior data sample sets in other industries besides the automobile industry.
In step S220, the behavior data samples in the behavior data sample set are input into a trained self-encoder for processing, so as to obtain output data.
Optionally, the behavior data sample set is preprocessed before the behavior data samples in the behavior data sample set are input to the trained self-encoder.
Specifically, the behavior data samples are processed according to their sparsity, which includes handling missing values in the behavior data samples. Some data samples have missing values. For example, the behavior data sample in the third row of Table 1, i.e., the sample for mobile phone number 130****0000, has a value only for the number of leads submitted within 1 hour; the values of all the other behavior data are missing.
The missing values of the behavior data samples in the sample set are processed as follows. First, it is determined whether a behavior data sample has missing values; if not, the sample is left unprocessed. If the sample has missing values and their number exceeds the first predetermined value, expressed as a proportion of the total number of behavior data items in the sample, the sample is discarded. Such a large number of missing values indicates that the data quality of the sample is low, so discarding it improves the quality of the sample set and thus the accuracy of abnormal behavior detection. If the sample has missing values but their number does not exceed the first predetermined value, the missing values of the sample are filled.
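The keep, discard, or fill decision described above can be sketched as follows, assuming that a missing value is represented by `None` and that the first predetermined value is a fraction such as 0.5. The function name `triage_sample` and the example rows are hypothetical.

```python
from typing import Optional, Sequence


def triage_sample(
    sample: Sequence[Optional[float]], first_predetermined_value: float = 0.5
) -> str:
    """Return 'keep', 'discard', or 'fill' for one behavior data sample."""
    if not sample:
        raise ValueError("sample must contain at least one behavior data item")
    missing = sum(1 for value in sample if value is None)
    if missing == 0:
        return "keep"  # no missing values: leave the sample unprocessed
    if missing / len(sample) > first_predetermined_value:
        return "discard"  # too sparse to be informative
    return "fill"  # few enough gaps to fill


# Hypothetical samples with 0, 1, and 3 missing values out of 4.
decisions = [
    triage_sample([2.0, 5.0, 30.0, 1.0]),
    triage_sample([2.0, None, 30.0, 1.0]),
    triage_sample([None, None, None, 1.0]),
]
```

A sample whose missing proportion equals the first predetermined value exactly is filled rather than discarded, since the condition for discarding is that the value is exceeded.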
To fill a missing value, it is first determined which type of variable the corresponding behavior data is in the behavior data sample set. If the behavior data is a continuous variable, the behavior data samples in the set that have a value for that behavior data are taken, the mean of those values is calculated, and the mean is used as the missing value. For example, suppose that in a behavior data sample set the value of a given behavior data item is 6 in behavior data sample 1, 10 in behavior data sample 2, and missing in behavior data sample 3 (the remaining samples are omitted here). If the number of missing values in behavior data sample 3 does not exceed the first predetermined value, and the behavior data is a continuous variable in the set, then the mean of 6, 10, and the values of the same behavior data in the remaining samples is calculated and filled in as the missing value in behavior data sample 3.
If the behavior data corresponding to the missing value is a discrete variable in the behavior data sample set, the behavior data samples in the set that have a value for that behavior data are taken, and the mode of those values is used as the missing value. For example, suppose that in a behavior data sample set the value of a given behavior data item is 8 in behavior data sample 1, 10 in behavior data sample 2, and missing in behavior data sample 3 (the remaining samples are omitted here). If the number of missing values in behavior data sample 3 does not exceed the first predetermined value, and the behavior data is a discrete variable in the set, then the mode of 8, 10, and the values of the same behavior data in the remaining samples is taken and filled in as the missing value in behavior data sample 3.
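The mean and mode filling rules can be sketched with Python's `statistics` module. The sketch assumes samples are rows of equal length with `None` marking a missing value, and that the caller states which column indices are continuous; the function name `fill_missing` and the data are hypothetical. On Python 3.8 and later, `statistics.mode` returns the first of several equally common values.

```python
from statistics import mean, mode
from typing import List, Optional, Sequence, Set


def fill_missing(
    samples: Sequence[Sequence[Optional[float]]], continuous_columns: Set[int]
) -> List[List[Optional[float]]]:
    """Fill None with the column mean (continuous) or the column mode (discrete)."""
    filled = [list(row) for row in samples]
    for j, column in enumerate(zip(*samples)):
        observed = [value for value in column if value is not None]
        if not observed or len(observed) == len(column):
            continue  # nothing to fill from, or nothing to fill
        fill_value = mean(observed) if j in continuous_columns else mode(observed)
        for row in filled:
            if row[j] is None:
                row[j] = fill_value
    return filled


# Column 0 is continuous (values 6, 10, 8); column 1 is discrete (values 8, 10, 10).
samples = [[6.0, 8], [10.0, 10], [None, None], [8.0, 10]]
filled = fill_missing(samples, continuous_columns={0})
```

Here the third sample receives the mean 8.0 for the continuous column and the mode 10 for the discrete column. In the full procedure this step would run only on samples that passed the discard check.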
Discarding behavior data samples with too many missing values and filling those with few missing values eliminates low-quality samples, improves the quality of the samples with few missing values, avoids abnormal behavior detection errors caused by missing values, and improves the accuracy of abnormal behavior detection.
Regarding the selection of the first predetermined value, it may for example be set to 50%, and it may of course also be set according to the actual business scenario. For example, if the number of samples in the collected behavior data sample set is small, the first predetermined value is set to a large percentage, such as 80%, in order to retain as many behavior data samples as possible. If the number of samples in the collected behavior data sample set is large, the first predetermined value is set to a small percentage, such as 20%, in order to reject as many insufficiently informative data samples as possible. The present invention does not limit the specific value of the first predetermined value.
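The discard-and-fill preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess`, the dict-based sample layout, the use of `None` for a missing value, and the choice to compute fill statistics over the retained samples are all assumptions made for the example. The parameter `max_missing_ratio` stands in for the first predetermined value.

```python
from statistics import mean, mode


def preprocess(samples, continuous_fields, max_missing_ratio=0.5):
    """Drop samples with too many missing values, then fill the rest.

    samples: list of dicts (field name -> value), None marks a missing value.
    continuous_fields: names treated as continuous; all others are discrete.
    max_missing_ratio: hypothetical stand-in for the first predetermined value.
    """
    fields = list(samples[0].keys())
    # Keep a sample only if its missing count does not exceed the ratio.
    kept = [
        s for s in samples
        if sum(s[f] is None for f in fields) <= max_missing_ratio * len(fields)
    ]
    # Fill statistics come from samples that do have a value for the field
    # (assumption: computed over the retained samples).
    fill = {}
    for f in fields:
        observed = [s[f] for s in kept if s[f] is not None]
        if observed:
            fill[f] = mean(observed) if f in continuous_fields else mode(observed)
    # Mean for continuous fields, mode for discrete fields.
    return [
        {f: (fill.get(f) if s[f] is None else s[f]) for f in fields}
        for s in kept
    ]
```

With two fields and the 50% example value, a sample missing one field is filled and a sample missing both fields is discarded.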
Preprocessing the behavior data sample set may further include: processing outliers in the behavior data samples, feature derivation, data segmentation, data encoding, and the like.
According to one embodiment of the invention, abnormal values of behavior data in a behavior data sample are processed. Whether the value corresponding to each behavior data included in a data sample is an abnormal value may be defined by a person skilled in the art according to the specific business; for example, if the number of contact-information submissions made with the submitted mobile phone number within 1 hour is greater than 60, the value corresponding to that behavior data is determined to be an abnormal value.
According to an embodiment of the present invention, where the original features are insufficient to build a good self-encoding model, feature derivation is required, that is, new features are constructed, where a feature refers to behavior data in a behavior data sample. Feature derivation may use counting, for example counting the number of searches in the last week for the car series for which contact information was left; it may analyze existing features in more depth, for example the home location of the submitted mobile phone number and whether that number belongs to a virtual operator; and it may add data filled in by the user as features, for example the user's age, gender, education level, and occupation.
According to one embodiment of the invention, the features are segmented according to specific business requirements; for example, … are segmented by the number of times a certain car series was browsed in the last three days into 0-5 times, 5-10 times, and 10-15 times.
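As an illustration of such segmentation, the sketch below maps a browsing count to one of the example ranges. The function name `segment`, the half-open treatment of the shared boundaries (5 falls into 5-10), and the "other" label for counts outside all ranges are assumptions; the text does not specify how boundary values are handled.

```python
def segment(count, edges=(0, 5, 10, 15)):
    """Return a label such as '5-10' for the half-open bin [lo, hi) holding count.

    Counts outside every bin get the label 'other' (an assumption).
    """
    for lo, hi in zip(edges, edges[1:]):
        if lo <= count < hi:
            return f"{lo}-{hi}"
    return "other"
```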
In step S230, based on the input behavior data samples and the output data, a sample error corresponding to each behavior data sample is determined. Wherein the sample error may be a root mean square error.
The sample error corresponding to each behavior data sample can be constructed by the following formula:
RMSE_i = sqrt( (1/N) * Σ_{t=1..N} (x_t - p_t)^2 )
where RMSE_i is the root mean square error of sample x_i, x_t is the t-th feature of sample x_i (the t-th input of the encoder), p_t is the corresponding t-th output of the self-encoder, and N is the total number of features.
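A per-sample root mean square error between the inputs and the reconstructed outputs might be computed as in the following sketch. It assumes the standard definition of root mean square error over the N features of one sample; the function name `sample_rmse` and the list-based representation are illustrative only.

```python
from math import sqrt


def sample_rmse(inputs, outputs):
    """Root mean square error between one sample's features and its reconstruction."""
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must be non-empty and equally long")
    n = len(inputs)
    # Mean of squared differences over the N features, then the square root.
    return sqrt(sum((x - p) ** 2 for x, p in zip(inputs, outputs)) / n)
```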
Alternatively, the sample error may be an average error.
Subsequently, in step S240, a first threshold is determined based on the sample errors corresponding to all behavior data samples.
Specifically, the sample errors corresponding to all the behavior data samples are sorted in a positive order from large to small, and the second predetermined value of the sample errors is taken as a first threshold. Optionally, a 95% quantile value is chosen, i.e. the second predetermined value is chosen to be 95% N, where N is the number of all behavioral data samples. And if the sample error between a certain input behavior data sample and the output data is larger than a first threshold value, the behavior data sample is determined to be abnormal behavior.
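One way to obtain such a threshold is sketched below. It follows the 95% quantile reading of the passage, sorts the errors in ascending order, and uses a nearest-rank rule; the sort direction, the nearest-rank rule, and the function name `first_threshold` are assumptions of this example rather than details given in the text.

```python
import math


def first_threshold(errors, quantile=0.95):
    """Nearest-rank quantile of the sample errors, used as the first threshold."""
    if not errors:
        raise ValueError("errors must not be empty")
    ordered = sorted(errors)  # ascending order (assumption)
    # Smallest error such that at least `quantile` of all errors are <= it.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def is_abnormal(error, threshold):
    """A sample is flagged when its error is strictly greater than the threshold."""
    return error > threshold
```

With errors 1 through 100, this rule returns 95, so the five largest errors are flagged.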
However, over time the distribution of positive and negative samples changes, which may change the distribution of sample errors and thus make the first risk score derived by the model unstable. To reduce false positives, the invention adopts a dynamically adjusted threshold: every predetermined period, the first threshold is re-determined from a specific quantile of the sample errors of the full set of samples in that period, so that the proportion of anomalies in each predetermined period is dynamically adjusted and the result of the self-encoding model remains stable relative to the changing business data. For example, every day the first threshold is re-determined from the 95% quantile of the root mean square errors of that day's full data, so that the first threshold is dynamically adjusted according to the abnormal proportion determined for that day.
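The periodic re-determination of the threshold can be illustrated with a small sketch that recomputes the quantile separately for each day. The record layout `(day, error)`, the nearest-rank quantile rule, and the function names are hypothetical choices for the example.

```python
import math
from collections import defaultdict


def nearest_rank_quantile(values, q):
    """Nearest-rank quantile over values sorted in ascending order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def daily_thresholds(records, quantile=0.95):
    """Map each day to the threshold computed from that day's errors only.

    records: iterable of (day, sample_error) pairs (hypothetical layout).
    """
    by_day = defaultdict(list)
    for day, err in records:
        by_day[day].append(err)
    return {day: nearest_rank_quantile(errs, quantile) for day, errs in by_day.items()}
```

Because each day's threshold is taken from that day's own error distribution, the fraction of samples flagged per day stays roughly constant even if the overall error level drifts.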
According to another embodiment of the present invention, the self-encoder employed may be obtained by training. The self-encoder may be a variational self-encoder, which comprises an encoder, a decoder, and a loss function. The difference between the behavior data samples input into the variational self-encoder and the output data it produces is calculated by the loss function, and this reconstruction loss can be reduced by gradient descent, thereby reducing the difference between the input data and the output data.
In many application scenarios, for example on an automobile media platform, users usually leave contact information because they want to consult car-purchasing information such as the price of a car. In such scenarios most of the submitted information is real, and only a small number of malicious users leave false information such as fake mobile phone numbers. The number of positive samples (i.e., behavior data samples of abnormal submissions) is therefore far smaller than the number of negative samples (i.e., behavior data samples of normal submissions). Because of this large imbalance, the trained model fits the negative samples better than the positive samples and overfits the negative samples that make up most of the data, so the overall effect of the model is poor and its generalization capability is greatly reduced.
In this regard, positive and negative samples may be distinguished by the determined first threshold: behavior data samples whose sample error is greater than the first threshold are determined to be positive samples, and behavior data samples whose sample error is less than the first threshold are determined to be negative samples. Sample balance can then be achieved by oversampling to increase the number of minority-class samples, or by undersampling to decrease the number of majority-class samples. It is also possible to use different penalty weights for positive and negative samples, e.g., a high weight for the minority class and a low weight for the majority class. Dynamically adjusting the first threshold reduces false positives, i.e., cases in which a negative sample is wrongly classified as a positive sample, and improves the accuracy of the model.
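Oversampling the minority class can be sketched as below. Random duplication with replacement is only one possible oversampling scheme and is an assumption here, as are the function name `oversample` and the fixed seed used for reproducibility.

```python
import random


def oversample(positives, negatives, seed=0):
    """Duplicate randomly chosen minority samples until both classes are equal in size."""
    if not positives or not negatives:
        raise ValueError("both classes need at least one sample")
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    minority = pos if len(pos) < len(neg) else neg
    originals = list(minority)  # draw only from the original minority samples
    target = max(len(pos), len(neg))
    while len(minority) < target:
        minority.append(rng.choice(originals))
    return pos, neg
```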
In step S250, a first risk score corresponding to each behavior data sample is obtained based on the sample error of each behavior data sample and the first threshold determined in the previous step.
The first risk score for each behavior data sample may be calculated by the following equation:
model_score(x_i) = sigmoid(log(RMSE_i) - log(threshold))
where model_score(x_i) is the first risk score corresponding to behavior data sample x_i, RMSE_i is the root mean square error corresponding to behavior data sample x_i, and threshold is the first threshold.
The abnormal value obtained from the self-encoder is subjected to a logarithmic transformation and a sigmoid function transformation, which map the first risk score obtained from the self-encoder model into the interval [0,1], so that the first risk score and risk scores obtained by other methods can be fused on the same numerical scale.
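The first risk score formula can be written directly in code, as in the sketch below. It assumes natural logarithms and strictly positive inputs (neither is stated in the text), and the function name mirrors the formula only for readability.

```python
import math


def model_score(rmse, threshold):
    """sigmoid(log(rmse) - log(threshold)), assuming rmse > 0 and threshold > 0."""
    if rmse <= 0 or threshold <= 0:
        raise ValueError("rmse and threshold must be positive")
    z = math.log(rmse) - math.log(threshold)
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions the expression simplifies algebraically to rmse / (rmse + threshold), so a sample whose error equals the threshold scores 0.5, and a sample whose error is three times the threshold scores 0.75.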
In step S260, it is determined whether the behavioral data sample is abnormal based at least on the first risk score.
According to one embodiment of the invention, whether a behavioral data sample is abnormal may be determined based only on the first risk score. Firstly, determining a threshold value of a risk assessment grade based on first risk scores corresponding to all the behavior data samples, and secondly, determining a risk assessment grade corresponding to the behavior data samples based on the determined threshold value of the risk assessment grade. According to an embodiment of the present invention, the risk assessment level may be composed of a plurality of levels, and then the risk assessment level to which the behavior data sample belongs is determined according to the threshold value of each risk assessment level.
Alternatively, the thresholds of the risk assessment levels may be set according to business needs, or may be determined with reference to the 3σ criterion. For example, if the first risk scores follow a normal distribution, the thresholds may be set to μ - σ, μ, and μ + σ, giving four levels, where μ is the mean of all first risk scores and σ is their standard deviation: behavior data samples with a first risk score between 0 and μ - σ are lowest risk, those between μ - σ and μ are lower risk, those between μ and μ + σ are higher risk, and those between μ + σ and 1 are highest risk.
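The four-level grading based on μ and σ might look like the following sketch. The use of the population standard deviation, the assignment of a score lying exactly on a cut point to the higher level, and the label strings are assumptions of the example.

```python
from statistics import mean, pstdev


def risk_levels(scores):
    """Assign one of four levels using the cut points mu - sigma, mu, mu + sigma."""
    mu, sigma = mean(scores), pstdev(scores)  # population std (assumption)
    cuts = (mu - sigma, mu, mu + sigma)
    labels = ("lowest", "lower", "higher", "highest")
    # The number of cut points at or below the score selects the level.
    return [labels[sum(s >= c for c in cuts)] for s in scores]
```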
After step S260, steps as shown in fig. 3 may also be included, and fig. 3 shows an abnormal behavior detection method 300 according to another embodiment of the present invention. The method 300 begins at step S310.
In step S310, the behavior data sample set is evaluated based on a predetermined abnormal behavior detection policy, and a second risk score corresponding to each behavior data sample is obtained.
In a contact-information submission scenario on an automobile media platform, the predetermined abnormal behavior detection strategy comprises at least one of: the number of times the same behavior data appears within a first predetermined time, and whether pages related to the car series involved in the behavior were browsed within a second predetermined time before the user behavior. For example, the more times the same user equipment submits contact information for the same car series within 10 minutes, the more likely it is that the submissions from that user equipment are abnormal behavior. As another example, if the user of a given mobile phone number has not browsed any page related to a car series within the 1 hour before submitting contact information for that car series, the submission is likely to be abnormal. Therefore, some abnormal behaviors can be effectively screened out through the predetermined abnormal behavior detection strategy. The specific values of the first predetermined time and the second predetermined time may be set by those skilled in the art according to the specific application scenario, and the present invention is not limited in this respect.
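A feature for the first kind of policy, the number of repeated submissions within a time window, can be sketched with a sliding window over timestamps. Representing events as timestamps in seconds, the 600-second default (the 10-minute example), and the function name are assumptions made for illustration.

```python
def max_submissions_in_window(timestamps, window_seconds=600):
    """Largest number of events that fall inside any window shorter than window_seconds."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(ts):
        # Advance the window start until it is within window_seconds of t.
        while t - ts[start] >= window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best
```

A higher count for the same device and the same car series would then feed into the rule-based evaluation as one feature value.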
Specifically, the second risk score may be generated by the following formula:
Figure BDA0003474074160000141
where rules_score(x_i) is the second risk score of sample x_i, and r_j(x_i) is the feature value of sample x_i under the j-th abnormal behavior detection policy.
The behavior data sample set is evaluated through the predetermined abnormal behavior detection strategy, and the resulting abnormal value is subjected to a logarithmic transformation and a sigmoid function transformation to map the second risk score into the interval [0,1], so that the second risk score and risk scores obtained by other methods can be fused on the same numerical scale.
Subsequently, in step S320, the first risk score and the second risk score are subjected to fusion processing, so as to obtain a third risk score corresponding to each behavior data sample.
According to one embodiment of the invention, the third risk score is constructed by the following formula:
final_score(xi)=a*model_score(xi)+b*rules_score(xi)
a+b=1
where final_score(x_i) is the third risk score of sample x_i, model_score(x_i) is the first risk score of sample x_i, rules_score(x_i) is the second risk score of sample x_i, a is the weight assigned to the first risk score, and b is the weight assigned to the second risk score.
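The weighted fusion can be sketched as follows. Only the constraint a + b = 1 comes from the text; the default weight of 0.5 and the check that a lies in [0, 1] are assumptions added for the example.

```python
def final_score(model_score, rules_score, a=0.5):
    """Weighted fusion a * model_score + b * rules_score with b = 1 - a."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1] so that a + b = 1 with b >= 0")
    b = 1.0 - a
    return a * model_score + b * rules_score
```

Because both inputs lie in [0,1] and the weights sum to 1, the fused score also lies in [0,1].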
According to the method, whether a behavior data sample is abnormal behavior, and its abnormal behavior level, are judged from the final score obtained by fusing the risk score obtained from the self-encoder with the risk score obtained from the predetermined abnormal behavior detection strategy. This combines the advantages of the self-encoding model and the abnormal behavior detection strategy, gives the result a degree of interpretability, and improves the reliability and accuracy of the result.
According to an embodiment of the present invention, the step of step S260 includes: determining whether the behavioral data sample is abnormal based on the third risk score. Whether the behavioral data sample is abnormal may be determined in conjunction with the first risk score and the second risk score. First, a threshold value of each risk assessment level is determined based on the third risk scores corresponding to all the behavior data samples. And secondly, determining the risk evaluation grade corresponding to the behavior data sample based on the threshold value of each risk evaluation grade.
Alternatively, the thresholds of the risk assessment levels may be set according to business needs, or may be determined with reference to the 3σ criterion. For example, if the third risk scores follow a normal distribution, the thresholds may be set to μ - σ, μ, and μ + σ, giving four levels, where μ is the mean of all third risk scores and σ is their standard deviation: behavior data samples with a third risk score between 0 and μ - σ are lowest risk, those between μ - σ and μ are lower risk, those between μ and μ + σ are higher risk, and those between μ + σ and 1 are highest risk.
According to the abnormal behavior detection method, the behavior data samples in the data sample set are evaluated based on the self-encoding model, and the risk contained in the behavior data is scored based on the sample errors and the threshold determined from them. Abnormal behavior is thereby detected effectively, which helps clean the platform data; the risk level of each behavior data sample can be understood clearly and intuitively from its risk score, providing effective data support for refined, differentiated precision marketing.
In addition, under a large number of application scenes, data samples are not labeled, and the unsupervised anomaly detection model is reasonable and effective, so that a large number of manual labeling works can be saved.
Owing to the complexity of practical application scenarios, applying an unsupervised model alone often performs poorly. An abnormal behavior detection strategy is therefore introduced to evaluate the behavior data samples, and a risk scoring scheme that integrates the self-encoder and the abnormal behavior detection strategy is provided, which improves the reliability of abnormal behavior detection and greatly reduces the false positives that an unsupervised model may cause.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the abnormal behavior detection method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media includes readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Additionally, some of the embodiments are described herein as a method or combination of method elements that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. An abnormal behavior detection method, executed in a computing device, the method comprising the steps of:
acquiring a behavior data sample set, wherein the behavior data sample set comprises a plurality of behavior data samples;
inputting the behavior data samples in the behavior data sample set into a trained self-encoder for processing to obtain output data;
determining a sample error corresponding to each behavior data sample based on the input behavior data sample and the output data;
determining a first threshold value based on sample errors corresponding to all behavior data samples;
obtaining a first risk score corresponding to each behavior data sample based on the sample error of each behavior data sample and the first threshold;
determining whether the behavioral data sample is abnormal based at least on the first risk score.
2. The method of claim 1, further comprising the step of:
evaluating the behavior data sample set based on a preset abnormal behavior detection strategy to obtain a second risk score corresponding to each behavior data sample;
and performing fusion processing on the first risk score and the second risk score to obtain a third risk score corresponding to each behavior data sample.
3. The method of claim 2, wherein the step of determining whether the behavioral data sample is abnormal based at least on the first risk score comprises:
determining whether a behavioral data sample is abnormal based on the third risk score.
4. The method of claim 3, wherein the step of determining whether the behavioral data sample is abnormal based on the third risk score comprises:
determining a threshold value of a risk assessment grade based on the third risk scores corresponding to all the behavior data samples;
and determining the risk evaluation grade corresponding to the behavior data sample based on the threshold value of the risk evaluation grade.
5. The method of any of claims 1-4, wherein prior to the step of inputting behavior data samples in the set of behavior data samples into a trained self-encoder for processing, further comprising:
and preprocessing the behavior data sample set.
6. The method of claim 5, the behavioral data sample comprising at least one behavioral data, wherein the step of preprocessing the set of behavioral data samples comprises:
if the behavior data included in one behavior data sample in the behavior data sample set has missing values, and the number of the missing values exceeds a first preset value of the total number of the behavior data, discarding the behavior data sample;
and if the behavior data included in one behavior data sample in the behavior data sample set has missing values, and the number of the missing values does not exceed a first preset value of the total number of the behavior data, filling the missing values of the behavior data sample.
7. The method of claim 6, wherein the step of filling missing values of the behavior data sample comprises:
if the behavior data corresponding to the missing value included in the behavior data sample belongs to the continuous variable in the behavior data sample set, taking the behavior data sample corresponding to the behavior data in the behavior data sample set, carrying out mean value calculation on the value corresponding to the behavior data, and taking the result of the mean value calculation as the missing value;
and if the behavior data corresponding to the missing value included in the behavior data sample belongs to the discrete variable in the behavior data sample set, taking the behavior data sample corresponding to the behavior data in the behavior data sample set and taking the mode of the value corresponding to the behavior data as the missing value.
8. The method of any of claims 1 to 7, wherein the step of determining the first threshold based on the sample errors for all behavior data samples comprises:
and taking the sample errors of the second preset value as a first threshold value according to the sequence of the sample errors corresponding to all the behavior data samples from large to small.
9. A computing device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-8.
10. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
CN202210083460.XA 2022-01-17 2022-01-17 Abnormal behavior detection method, computing device and readable storage medium Pending CN114493250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083460.XA CN114493250A (en) 2022-01-17 2022-01-17 Abnormal behavior detection method, computing device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083460.XA CN114493250A (en) 2022-01-17 2022-01-17 Abnormal behavior detection method, computing device and readable storage medium

Publications (1)

Publication Number Publication Date
CN114493250A true CN114493250A (en) 2022-05-13

Family

ID=81474215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083460.XA Pending CN114493250A (en) 2022-01-17 2022-01-17 Abnormal behavior detection method, computing device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114493250A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001953A (en) * 2022-05-30 2022-09-02 中国第一汽车股份有限公司 Electric vehicle data quality evaluation method, device, terminal and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001953A (en) * 2022-05-30 2022-09-02 中国第一汽车股份有限公司 Electric vehicle data quality evaluation method, device, terminal and storage medium
CN115001953B (en) * 2022-05-30 2023-11-14 中国第一汽车股份有限公司 Electric automobile data quality evaluation method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination