CN111880986A - Data detection method and device - Google Patents


Info

Publication number
CN111880986A
Authority
CN
China
Legal status
Pending
Application number
CN202010632128.5A
Other languages
Chinese (zh)
Inventor
冉君尧
陈成
Current Assignee
Asiainfo Technologies (chengdu) Inc
Original Assignee
Asiainfo Technologies (chengdu) Inc

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention provide a data detection method and device in the field of computer technology. They make the significance level, as determined by a data detection device, of the number of times a target object executes a target task within a second preset time period more reasonable. This in turn improves the accuracy with which the device determines whether an event of abnormal execution of the target task exists. The method includes: determining standard values corresponding to the numbers of times a reference object executes the target task in each of a plurality of time intervals of a first preset time period; acquiring the numbers of times the target object executes the target task in each of a plurality of time intervals of the second preset time period; performing a chi-square test to determine the significance level of the number of times the target object executes the target task within the second preset time period; and, if this significance level is less than a preset significance level threshold, determining that an event of abnormal execution of the target task exists.

Description

Data detection method and device
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a data detection method and device.
Background
At present, in the field of computer technology, an anomaly detection device may obtain (or compile statistics from) the service log of a time interval to determine whether an anomaly occurred in that interval. For example, suppose the anomaly detection device finds from the service log that a salesperson (in practice, the device used by the salesperson) processed business A 10 times in the time interval, while the threshold for the number of business-A transactions in that interval is 8. Because the count exceeds the threshold, the device determines that the salesperson's handling of business A in that interval is anomalous. The anomaly may be, for example, that the salesperson illegally obtained information related to business A or that such information was leaked.
However, in this method the threshold for the number of transactions of a given type of business (for example, business A) in a time interval is set from subjective experience rather than derived from objective data, so the resulting anomaly detection may be unreasonable.
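The fixed-threshold rule described in the background can be sketched as follows, mainly to make its weakness concrete: the per-business limit is a hand-set constant rather than something learned from data. The function name `fixed_threshold_alerts`, the dictionary layout, and the clerk IDs are hypothetical illustrations, not taken from the patent.

```python
from typing import Dict, List, Tuple


def fixed_threshold_alerts(counts: Dict[Tuple[str, str], int],
                           thresholds: Dict[str, int]) -> List[Tuple[str, str]]:
    """Prior-art style rule (hypothetical sketch): flag (salesperson, business)
    pairs whose transaction count in one time interval exceeds a hand-set,
    per-business threshold."""
    alerts = []
    for (salesperson, business), n in sorted(counts.items()):
        limit = thresholds.get(business)  # no threshold configured -> never flagged
        if limit is not None and n > limit:
            alerts.append((salesperson, business))
    return alerts
```

With the numbers from the example, `fixed_threshold_alerts({("clerk-1", "A"): 10, ("clerk-2", "A"): 7}, {"A": 8})` flags only `("clerk-1", "A")`. Whether 8 is the right limit is exactly the subjective choice the invention aims to replace.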
Disclosure of Invention
The embodiments of the invention provide a data detection method and device that make the significance level, as determined by a data detection device, of the number of times a target object executes a target task within a second preset time period more reasonable. This in turn improves the accuracy with which the data detection device determines whether an event of abnormal execution of the target task exists.
In a first aspect, an embodiment of the present invention provides a data detection method, including: determining, based on a target neural network model, standard values corresponding to the numbers of times a reference object executes a target task in each of a plurality of time intervals of a first preset time period; acquiring the numbers of times a target object executes the target task in each of a plurality of time intervals of a second preset time period; normalizing the reference object's standard values to obtain a normalized value for each time interval of the first preset time period, and normalizing the target object's counts to obtain a normalized value for each time interval of the second preset time period; performing a chi-square test, with the reference object's normalized values as the standard quantities and the target object's normalized values as the observed quantities, to determine the significance level of the numbers of times the target object executes the target task in the second preset time period; if this significance level is less than a preset significance level threshold, determining that an event of abnormal execution of the target task exists; and if it is greater than or equal to the preset significance level threshold, determining that no such event exists.
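The steps of the first aspect can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the patent's implementation. First, the passage does not give the normalization formula, so the hypothetical helper `scale_to_total` rescales both series to the target object's total count, which puts the standard and observed quantities on the same order of magnitude. Second, the "significance level" of the counts is read as the chi-square test's p-value. Third, the 0.05 default threshold is only an example. All function names are illustrative. The p-value is computed from the regularized incomplete gamma function, so only the standard library is needed.

```python
import math
from typing import List, Sequence, Tuple


def scale_to_total(values: Sequence[float], total: float) -> List[float]:
    """Assumed normalization: rescale a series so that it sums to `total`."""
    s = float(sum(values))
    if s <= 0:
        raise ValueError("series must have a positive sum")
    return [v * total / s for v in values]


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by power series (x < a + 1)."""
    term = 1.0 / a
    total = term
    n = a
    for _ in range(10_000):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for a chi-square variable with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


def detect_abnormal(reference_standard: Sequence[float],
                    target_counts: Sequence[float],
                    threshold: float = 0.05) -> Tuple[bool, float, float]:
    """Return (abnormal event?, chi-square statistic, p-value) for one target object."""
    if len(reference_standard) != len(target_counts) or len(target_counts) < 2:
        raise ValueError("need two series with the same number (>= 2) of intervals")
    total = float(sum(target_counts))
    expected = scale_to_total(reference_standard, total)  # standard quantities
    observed = scale_to_total(target_counts, total)       # observed quantities
    if any(e <= 0 for e in expected):
        raise ValueError("every standard quantity must be positive")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf(stat, len(target_counts) - 1)
    return p_value < threshold, stat, p_value
```

For example, `detect_abnormal([2e6, 1e6, 1e6], [100, 100, 200])` rescales the reference values to (200, 100, 100). The statistic is then 150 with 2 degrees of freedom, and the p-value e^-75 is far below 0.05, so the sketch reports an abnormal event. If SciPy is available, `scipy.stats.chi2.sf(stat, df)` computes the same tail probability.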
In a second aspect, an embodiment of the present invention provides a data detection apparatus, including a determining module and an obtaining module. The determining module is configured to determine, based on the target neural network model, standard values corresponding to the numbers of times the reference object executes the target task in each of a plurality of time intervals of a first preset time period. The obtaining module is configured to acquire the numbers of times the target object executes the target task in each of a plurality of time intervals of a second preset time period. The determining module is further configured to: normalize the reference object's standard values to obtain a normalized value for each time interval of the first preset time period, and normalize the target object's counts to obtain a normalized value for each time interval of the second preset time period; perform a chi-square test, using the reference object's normalized values as the standard quantities and the target object's normalized values as the observed quantities, to determine the significance level of the numbers of times the target object executes the target task in the second preset time period; and determine that an event of abnormal execution of the target task exists if this significance level is less than a preset significance level threshold, or that no such event exists if it is greater than or equal to the preset significance level threshold.
In a third aspect, an embodiment of the present invention provides another data detection apparatus, including a processor, a memory, a bus, and a communication interface. The memory stores computer-executable instructions, and the processor is connected to the memory through the bus. When the data detection apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the data detection apparatus performs the data detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which includes instructions that, when executed on a data detection apparatus, cause the data detection apparatus to perform a data detection method provided in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product including instructions that, when run on a computer, cause the computer to execute the data detection method according to the first aspect or any implementation of the first aspect.
In the data detection method and device provided by the embodiments of the invention, the data detection device determines, based on the target neural network model, standard values corresponding to the numbers of times the reference object executes the target task in each of a plurality of time intervals of a first preset time period. It also acquires the numbers of times the target object executes the target task in each of a plurality of time intervals of a second preset time period. The device then normalizes the reference object's standard values to obtain a normalized value for each time interval of the first preset time period, and normalizes the target object's counts to obtain a normalized value for each time interval of the second preset time period. Next, it takes the reference object's normalized values as the standard quantities and the target object's normalized values as the observed quantities, and performs a chi-square test to determine the significance level of the numbers of times the target object executes the target task in the second preset time period. If this significance level is less than the preset significance level threshold, the device determines that an event of abnormal execution of the target task exists. If it is greater than or equal to the threshold, the device determines that no such event exists. In the embodiments of the invention, the device normalizes both the standard values that the neural network model predicts for the reference object and the counts acquired for the target object. It therefore obtains two highly reliable data sets of the same order of magnitude (the standard quantities and the observed quantities). Performing the chi-square test on these data sets makes the significance level determined by the device more reasonable, and thereby improves the accuracy with which the device determines whether an event of abnormal execution of the target task exists.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a hardware schematic diagram of a server according to an embodiment of the present invention;
Fig. 2 is a first schematic diagram of a data detection method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a training process of a target neural network model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a neural network model according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the numbers of times that a reference object executes a target task in a plurality of time intervals of a preset time period according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of part of a chi-square test critical-value table according to an embodiment of the present invention;
Fig. 7 is a second schematic diagram of a data detection method according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the distribution of the normalized values of a reference object over a plurality of time intervals of a preset time period and the distribution of the normalized values of a target object over a plurality of time intervals of the preset time period according to an embodiment of the present invention;
Fig. 9 is a first schematic structural diagram of a data detection apparatus according to an embodiment of the present invention;
Fig. 10 is a second schematic structural diagram of a data detection apparatus according to an embodiment of the present invention.
Detailed Description
The terms "comprising" and "having," and any variations thereof, as used in the description of the present application, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements, but may optionally include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present a related concept in a concrete manner.
The term "and/or" as used herein covers either or both of the two alternatives.
In the description of the present application, "a plurality" means two or more unless otherwise specified.
Some concepts related to a data detection method and apparatus provided by the embodiments of the present invention are explained below.
A neural network model is a computational model formed by a large number of interconnected nodes (or neurons). It may output different results depending on its connection pattern and/or the weight values in the network. In the embodiments of the present invention, the data detection device may use a neural network model to predict the standard values corresponding to the numbers of times the reference object executes the target task in each of the plurality of time intervals of the first preset time period.
For example, in the embodiments of the present invention, the numbers of times the reference object executes the target task in the time intervals of the first preset time period may be on the order of millions, whereas the numbers of times the target object executes the target task in the time intervals of the second preset time period may be on the order of hundreds. When evaluation indicators differ in order of magnitude, using the raw values directly may weaken the effect of the indicator with the smaller magnitude (here, the target object's counts in the time intervals of the second preset time period). To ensure that the evaluation result is valid and reliable, the evaluation indicators therefore need to be normalized. In the embodiments of the invention, the data detection device normalizes both the standard values corresponding to the reference object's counts in the time intervals of the first preset time period and the target object's counts in the time intervals of the second preset time period. This yields a normalized value for the reference object in each time interval of the first preset time period and a normalized value for the target object in each time interval of the second preset time period.
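One way to make the two series comparable is shown below. It is offered only as an illustrative assumption, because this passage does not specify the normalization formula: both series are rescaled to a common total, here the target object's total count T. Let s_i be the reference object's standard value and o_i the target object's count in interval i, for k intervals:

```latex
T = \sum_{j=1}^{k} o_j, \qquad
\tilde{s}_i = \frac{T\, s_i}{\sum_{j=1}^{k} s_j}, \qquad
\tilde{o}_i = \frac{T\, o_i}{\sum_{j=1}^{k} o_j} = o_i
```

For example, s = (2×10^6, 10^6, 10^6) and o = (100, 100, 200) give T = 400 and s̃ = (200, 100, 100). The reference series now has the same order of magnitude as the target series but keeps its shape: half of the executions still fall in the first interval.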
The chi-square test is a hypothesis-testing method used mainly to compare two or more sample rates (composition ratios) and to analyze the association between two categorical variables. Its basic idea is to measure how far the actual frequencies (the observed quantities) deviate from the theoretical frequencies (the standard quantities). The degree of deviation determines the magnitude of the chi-square value: a larger chi-square value indicates a larger deviation between the standard quantities and the observed quantities, and a smaller chi-square value indicates a smaller deviation. In the embodiments of the invention, the data detection device uses a chi-square test to determine the significance level of the numbers of times the target object executes the target task in the second preset time period, and from it determines whether an event of abnormal execution of the target task exists.
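The statistic described above is the standard Pearson chi-square statistic. In the block below, E_i is the standard quantity and O_i the observed quantity of interval i, for k intervals, and the block gives the statistic together with its tail probability. Reading the patent's "significance level" as this tail probability p is an interpretation; the passage does not state it explicitly.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad
\nu = k - 1, \qquad
p = P\left(\chi^2_{\nu} \ge \chi^2\right)
```

With 24 hourly intervals, ν = 23. As a small worked case, E = (100, 100, 100) and O = (130, 100, 70) give χ² = 30²/100 + 0 + 30²/100 = 18 with ν = 2. For ν = 2 the tail probability has the closed form p = e^(-χ²/2) = e^-9 ≈ 1.2×10^-4. Equivalently, χ² can be compared with the critical value for ν and the chosen threshold, read from a chi-square table such as the one in Fig. 6.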
To address the problems in the background art, the embodiments of the present invention provide a data detection method and apparatus. The data detection device determines, based on a target neural network model, standard values corresponding to the numbers of times a reference object executes a target task in each of a plurality of time intervals of a first preset time period. It also acquires the numbers of times a target object executes the target task in each of a plurality of time intervals of a second preset time period. It then normalizes the reference object's standard values to obtain a normalized value for each time interval of the first preset time period, and normalizes the target object's counts to obtain a normalized value for each time interval of the second preset time period. Taking the reference object's normalized values as the standard quantities and the target object's normalized values as the observed quantities, the device performs a chi-square test to determine the significance level of the numbers of times the target object executes the target task in the second preset time period. If the significance level is less than the preset significance level threshold, the device determines that an event of abnormal execution of the target task exists. If it is greater than or equal to the threshold, the device determines that no such event exists. Normalizing the model-predicted standard values of the reference object together with the acquired counts of the target object gives the device two highly reliable data sets of the same order of magnitude (the standard quantities and the observed quantities). Running the chi-square test on them makes the significance level determined by the device more reasonable, and thereby improves the accuracy with which the device determines whether an event of abnormal execution of the target task exists.
An embodiment of the present invention provides a data detection apparatus, which may be a server. Fig. 1 is a hardware schematic diagram of a server that performs the data detection method provided by the embodiments of the present invention. As shown in Fig. 1, the server 10 may include a processor 101, a memory 102, a network interface 103, and the like.
The processor 101 is a core component of the server 10, and the processor 101 is configured to run an operating system of the server 10 and application programs (including a system application program and a third-party application program) on the server 10, so as to implement the data detection method performed by the server 10.
In an embodiment of the present invention, the processor 101 may be a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, transistor logic device, hardware component, or any combination thereof, which is capable of implementing or executing various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the embodiment of the present invention; a processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like.
Optionally, the processor 101 of the server 10 includes one or more CPUs, which are single-core CPUs (single-CPUs) or multi-core CPUs (multi-CPUs).
The memory 102 includes, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical memory, or the like. The memory 102 holds the code of the operating system.
Optionally, the processor 101 reads instructions stored in the memory 102 to implement the data detection method in the embodiments of the present invention, or the processor 101 implements the method by using instructions stored internally. Where the processor 101 implements the data detection method by reading instructions saved in the memory, the memory stores the instructions for implementing the data detection method provided by the embodiments of the present invention.
The network interface 103 is a wired interface, such as a Fiber Distributed Data Interface (FDDI) interface or a Gigabit Ethernet (GE) interface. Alternatively, the network interface 103 is a wireless interface. The network interface 103 is used for the server 10 to communicate with other devices.
The memory 102 is used for storing a plurality of time intervals within a historical time period. Optionally, the memory 102 is further configured to store the number of times that the reference object performs the target task in a plurality of time intervals in the historical time period, respectively, and the like. The at least one processor 101 further executes the method according to the embodiments of the present invention according to the plurality of time intervals in the historical time period saved in the memory 102 and the number of times the reference object executes the target task in each of the plurality of time intervals in the historical time period. For more details of the above functions implemented by the processor 101, reference is made to the following description of various method embodiments.
Optionally, the server 10 further includes a bus 104, and the processor 101 and the memory 102 are connected to each other through the bus 104 or are otherwise communicatively connected.
Optionally, the server 10 further includes an input/output interface 105. The input/output interface 105 is configured to connect to an input device and to receive the numbers of times the target object executes the target task in each of a plurality of time intervals of a second preset time period, as entered by the user through the input device. Input devices include, but are not limited to, a keyboard, a touch screen, a microphone, and the like. The input/output interface 105 is also used to connect to an output device and to output the data detection result of the processor 101 (that is, whether an event of abnormal execution of the target task exists). Output devices include, but are not limited to, a display, a printer, and the like.
The data detection method and device provided by the embodiments of the invention apply to scenarios in which an operator, or the operator's data detection device, determines whether an event of abnormal execution of a target task exists. When the data detection device needs to determine whether such an event exists (specifically, whether the target object has executed the target task abnormally), it may examine the target object's data according to the method provided by the embodiments of the present invention. Here, the target object's data are the numbers of times the target object executes the target task in each of a plurality of time intervals of a second preset time period.
As shown in fig. 2, the data detection method provided in the embodiment of the present invention may include S101 to S105:
S101: The data detection device determines, based on the target neural network model, standard values corresponding to the number of times the reference object executes the target task in each of a plurality of time intervals of a first preset time period.
It should be understood that the reference object may be one salesclerk or a plurality of salesclerks, and the salesclerks may belong to one or more business halls. When the reference object is a plurality of salesclerks, the number of times the reference object executes the target task in a time interval of the first preset time period is the sum of the numbers of times those salesclerks execute the target task in that time interval. The number of times the reference object executes the target task may be the number of business transactions handled by the salesclerk(s), for example, the number of times a salesclerk processes a communication service package.
It can be understood that, to speed up training (convergence) of the target neural network model, its inputs may be normalized. Consequently, the values determined by the target neural network model (its outputs) are also normalized values, namely the standard values corresponding to the number of times the reference object executes the target task in each of the time intervals of the first preset time period.
The first preset time period may be 1 day, 1 week, 1 month, or the like, and a time interval may be 1 hour, 1 day, 1 week, or the like; the embodiments of the present invention impose no particular limitation. The first preset time period must, however, be longer than each of its time intervals: for example, when the first preset time period is 1 day, each time interval should be shorter than 1 day (e.g., 1 hour).
In one implementation of step S101, if the target neural network model has already been trained, the data detection device may use it directly to determine the standard values corresponding to the number of times the reference object executes the target task in each of the time intervals of the first preset time period.
In another implementation of the embodiments of the present invention, the data detection device may first train a neural network on historical log data to obtain the target neural network model. The historical log data comprises a plurality of time intervals within a historical time period and the number of times the reference object executed the target task in each of those time intervals.
In the embodiments of the present invention, the number of time intervals per day in the historical time period is the same as in the first preset time period. For example, assume the historical time period is M days and each time interval is 1 hour, so each day has 24 intervals; the time intervals in the historical time period are then the 24 hours of each of the M days, with 0:00-1:00 being one interval. Likewise, if the first preset time period is N days, its time intervals are the 24 hours of each of the N days. The number of time intervals per day is therefore 24 for both the historical time period and the first preset time period.
As shown in FIG. 3, the process by which the data detection device trains on the historical log data to obtain the target neural network model specifically includes S201 to S204:
S201: The data detection device acquires historical log data.
For example, take days as the granularity of the historical time period and hours as the granularity of the time intervals. Assume the historical time period runs from May 1, 2020 to May 30, 2020 and each 1-hour span is one time interval, so each day contains 24 time intervals: 0:00-1:00, 1:00-2:00, ..., 23:00-24:00 (24:00 being 0:00 of the next day). The data detection device thus obtains the 24 time intervals of each of the 30 days from May 1 to May 30, 2020, together with the number of times the target task was executed in each of those 24 time intervals on each day.
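The aggregation in S201 can be sketched as follows. This is a minimal illustration that assumes the raw log is simply a list of `datetime` timestamps, one per execution of the target task; the record format and the helper name `hourly_counts` are hypothetical, not taken from the source.

```python
from datetime import date, datetime, timedelta

def hourly_counts(timestamps, start, end):
    """Count executions per one-hour interval for every day from start to end (inclusive).

    Returns {day: [count for 0:00-1:00, 1:00-2:00, ..., 23:00-24:00]}.
    """
    counts = {}
    day = start
    while day <= end:
        counts[day] = [0] * 24          # 24 one-hour time intervals per day
        day += timedelta(days=1)
    for ts in timestamps:
        day_counts = counts.get(ts.date())
        if day_counts is not None:      # ignore records outside the historical period
            day_counts[ts.hour] += 1    # hour h falls in the interval h:00-(h+1):00
    return counts

# Hypothetical log: one timestamp per execution of the target task
logs = [datetime(2020, 5, 1, 9, 15), datetime(2020, 5, 1, 9, 40), datetime(2020, 5, 2, 16, 5)]
table = hourly_counts(logs, date(2020, 5, 1), date(2020, 5, 30))
print(len(table), table[date(2020, 5, 1)][9])  # 30 2
```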
S202: The data detection device configures the relevant parameters of the neural network model.
It should be understood that the neural network model includes an input layer, an output layer and at least one hidden layer, and its relevant parameters may include the number of input-layer nodes, the number of hidden layers, the number of nodes in each hidden layer and the number of output-layer nodes. Specifically, the data detection device may determine a batch size for the initial neural network model (for convenience, the neural network model before training is hereinafter called the initial neural network model); the batch size equals the number of input-layer nodes. For example, if the batch size determined by the data detection device is a × 24 (24 one-hour intervals per day), the input layer also contains a × 24 nodes, where a is an integer greater than or equal to 1.
It can be understood that the relevant parameters of the neural network model may also include the weights and biases between nodes (between input-layer and hidden-layer nodes, between nodes of successive hidden layers, and between hidden-layer and output-layer nodes), the activation functions between layers, and the number of training (iteration) rounds of the neural network model.
S203: The data detection device trains the neural network model until the target number of training iterations is reached.
One training pass of the neural network model is described below as an example.
Assume the batch size determined by the data detection device is 10 × 24; the input layer of the initial neural network model then has 240 nodes. Further assume there is 1 hidden layer with 5 nodes and the output layer has 24 nodes, which yields the neural network model shown schematically in FIG. 4.
It is further assumed that $w^{(1)}_{ij}$ denotes the weight between the i-th node of the input layer and the j-th node of the hidden layer, $w^{(2)}_{jk}$ denotes the weight between the j-th node of the hidden layer and the k-th node of the output layer, $b^{(1)}_j$ denotes the bias of the j-th node of the hidden layer, $b^{(2)}_k$ denotes the bias of the k-th node of the output layer, $x_i$ denotes the input value of the i-th node of the input layer, $u_j$ denotes the input value of the j-th node of the hidden layer, $z_k$ denotes the input value of the k-th node of the output layer, $h_j$ denotes the output value of the j-th node of the hidden layer after the activation function $\theta(x)$, and $y_k$ denotes the output value of the k-th node of the output layer, where i, j and k are integers with 1 ≤ i ≤ 240, 1 ≤ j ≤ 5 and 1 ≤ k ≤ 24.
Thus, the input value of the j-th node of the hidden layer satisfies:
$$u_j = \sum_{i=1}^{240} w^{(1)}_{ij} x_i + b^{(1)}_j$$
Specifically, the input value of the 1st node of the hidden layer satisfies:
$$u_1 = \sum_{i=1}^{240} w^{(1)}_{i1} x_i + b^{(1)}_1$$
and the output value of the 1st node of the hidden layer after the activation function $\theta(x)$ is:
$$h_1 = \theta(u_1) = \theta\left(\sum_{i=1}^{240} w^{(1)}_{i1} x_i + b^{(1)}_1\right)$$
The input value of the k-th node of the output layer satisfies:
$$z_k = \sum_{j=1}^{5} w^{(2)}_{jk} h_j + b^{(2)}_k$$
Specifically, the input value of the 1st node of the output layer satisfies:
$$z_1 = \sum_{j=1}^{5} w^{(2)}_{j1} h_j + b^{(2)}_1$$
It should be noted that, since the output layer has no activation function, its input is also its output, i.e., $z_k = y_k$. The output value of the k-th node of the output layer therefore satisfies:
$$y_k = \sum_{j=1}^{5} w^{(2)}_{jk}\,\theta\left(\sum_{i=1}^{240} w^{(1)}_{ij} x_i + b^{(1)}_j\right) + b^{(2)}_k$$
that is, the output value of the 1st node of the output layer satisfies:
$$y_1 = \sum_{j=1}^{5} w^{(2)}_{j1}\,\theta\left(\sum_{i=1}^{240} w^{(1)}_{ij} x_i + b^{(1)}_j\right) + b^{(2)}_1$$
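The forward pass of the 240-5-24 network in S203 can be written compactly in matrix form. This is a minimal NumPy sketch; the source does not name the activation function θ, so `tanh` is assumed here, and the random weights are placeholders rather than trained values.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, theta=np.tanh):
    """Forward pass of a 240-5-24 network with a linear output layer.

    x:  (240,) input values x_i
    W1: (240, 5) weights w_ij^(1); b1: (5,) hidden biases b_j^(1)
    W2: (5, 24) weights w_jk^(2); b2: (24,) output biases b_k^(2)
    """
    u = x @ W1 + b1      # input value u_j of each hidden node
    h = theta(u)         # hidden output h_j = theta(u_j)
    y = h @ W2 + b2      # output layer has no activation, so y_k = z_k
    return y

rng = np.random.default_rng(0)
x = rng.random(240)                                   # placeholder normalized inputs
W1, b1 = rng.normal(0, 0.1, (240, 5)), np.zeros(5)    # placeholder weights
W2, b2 = rng.normal(0, 0.1, (5, 24)), np.zeros(24)
y = forward(x, W1, b1, W2, b2)
print(y.shape)  # (24,)
```

The matrix products compute all 5 hidden sums and all 24 output sums at once, which is equivalent to evaluating the per-node formulas one index at a time.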
It should be understood that the above weights and biases may be zero-initialized (set to 0) before the first training pass and then updated by an optimizer (e.g., the Adam optimizer) with a configured learning rate. The updated weights and biases serve as the initial values for the next training pass. After the preset number of training iterations has been completed in the same way, the weights and biases reach their optimized values and the target neural network model is obtained. Test data fed to the input layer of the trained target neural network model then yields predictions closer to the true values.
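The iterative training in S203 can be sketched as a plain NumPy loop with Adam updates. This sketch rests on several assumptions not stated in the source: a mean-squared-error loss, a `tanh` hidden activation, the standard Adam constants (0.9, 0.999, 1e-8), and synthetic input/target pairs. It also deliberately departs from the zero initialization described above and uses small random weights instead, because with all-zero weights every hidden node receives identical updates and the hidden nodes never become distinct.

```python
import numpy as np

def train_mlp(X, Y, hidden=5, lr=0.01, iterations=500, seed=0):
    """Fit a one-hidden-layer network (tanh hidden layer, linear output)
    to minimise mean squared error, updating weights and biases with Adam."""
    rng = np.random.default_rng(seed)
    p = {
        # Small random weights (assumption, see lead-in); biases start at zero.
        "W1": rng.normal(0.0, 0.1, (X.shape[1], hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, Y.shape[1])),
        "b2": np.zeros(Y.shape[1]),
    }
    m = {k: np.zeros_like(a) for k, a in p.items()}  # Adam first moments
    s = {k: np.zeros_like(a) for k, a in p.items()}  # Adam second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, iterations + 1):
        # Forward pass
        h = np.tanh(X @ p["W1"] + p["b1"])
        y = h @ p["W2"] + p["b2"]
        err = y - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        g_y = 2.0 * err / err.size
        g_h = (g_y @ p["W2"].T) * (1.0 - h ** 2)  # tanh'(u) = 1 - tanh(u)^2
        grads = {"W2": h.T @ g_y, "b2": g_y.sum(axis=0),
                 "W1": X.T @ g_h, "b1": g_h.sum(axis=0)}
        # Adam update with bias correction
        for k in p:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            s[k] = beta2 * s[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            s_hat = s[k] / (1 - beta2 ** t)
            p[k] -= lr * m_hat / (np.sqrt(s_hat) + eps)
    return p, losses

# Synthetic data only: 30 samples of 240 normalized inputs and 24 normalized targets
rng = np.random.default_rng(1)
X = rng.random((30, 240))
Y = rng.random((30, 24))
params, losses = train_mlp(X, Y)
print(losses[-1] < losses[0])  # True
```

How training samples are formed from the historical log (which 240 inputs map to which 24 targets) is not specified in the source, so the data above is purely illustrative.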
S204: The data detection device inputs the standard values corresponding to the plurality of time intervals of the first preset time period into the target neural network model and determines the standard values corresponding to the number of times the reference object executes the target task in each of those time intervals.
It should be understood that, after training of the target neural network model is complete, standard values (i.e., normalized input values) corresponding to time information with the same batch size, namely the plurality of time intervals of the first preset time period (for example, the 24 hours of each of 10, or a, days), may be input to the target neural network model, which then predicts the standard value corresponding to the number of times the reference object executes the target task in each of those time intervals.
In one implementation of the embodiments of the present invention, the data detection device may denormalize the standard values corresponding to the number of times the reference object executes the target task in each of the plurality of time intervals of the first preset time period, so as to determine the actual number of times the reference object executes the target task in each of those time intervals.
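Denormalization simply inverts whatever scaling was applied to the model's data. The source does not say which normalization is used for the model, so this minimal sketch assumes min-max scaling with the minimum and maximum taken from the historical counts; the helper names and the sample numbers are hypothetical.

```python
def minmax_scale(values):
    """Scale raw counts to [0, 1]; also return (lo, hi) so the scaling can be inverted."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against a constant series
    return [(v - lo) / span for v in values], (lo, hi)

def denormalize(scaled, lo, hi):
    """Invert min-max scaling: map model outputs back to execution counts."""
    span = (hi - lo) or 1
    return [s * span + lo for s in scaled]

history = [0, 0, 120, 1500, 900]          # hypothetical hourly counts
scaled, (lo, hi) = minmax_scale(history)
predicted = [0.0, 0.5, 1.0]               # hypothetical model outputs (standard values)
print(denormalize(predicted, lo, hi))     # [0.0, 750.0, 1500.0]
```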
For example, FIG. 5 shows the number of times the reference object executes the target task in each of the plurality of time intervals of the first preset time period, as determined by the data detection device after denormalizing the corresponding standard values. Here the plurality of time intervals are the 24 time intervals described above.
As can be seen from FIG. 5, the reference object executes the target task zero times in the first 5 time intervals (0:00-1:00, 1:00-2:00, 2:00-3:00, 3:00-4:00 and 4:00-5:00). From the 6th time interval (5:00-6:00) the count rises, reaching its maximum over the 24 time intervals (above 1,500,000, the first peak in the figure) at about the 10th time interval (9:00-10:00). It then declines from the 11th time interval (10:00-11:00), rises again from the 14th time interval (13:00-14:00) to a second peak in the 17th time interval (16:00-17:00), and then declines through the last, 24th, time interval (23:00-24:00).
S102: The data detection device obtains the number of times the target object executes the target task in each of a plurality of time intervals of a second preset time period.
It should be understood that, as with the reference object described above, the target object may be one salesclerk or a plurality of salesclerks, and the salesclerks may belong to one or more business halls. When the target object is a plurality of salesclerks, the number of times the target object executes the target task in a time interval of the second preset time period is the sum of the numbers of times those salesclerks execute the target task in that time interval.
In the embodiments of the present invention, the first preset time period and the second preset time period may differ. For example, the first preset time period may run from June 1, 2020 to June 30, 2020, and the second preset time period may be any B days in July 2020 (i.e., between July 1, 2020 and July 31, 2020), where B is an integer with 1 ≤ B ≤ 31.
It should be noted that the embodiments of the present invention do not limit the execution order of S101 and S102: S101 may be performed before S102, S102 before S101, or both simultaneously.
S103: The data detection device standardizes the number of times the reference object executes the target task in each of the plurality of time intervals of the first preset time period to obtain the normalized values of the reference object for those time intervals, and standardizes the number of times the target object executes the target task in each of the plurality of time intervals of the second preset time period to obtain the normalized values of the target object for those time intervals.
It can be understood that the scale of the reference object may differ from that of the target object, so the number of times the reference object executes the target task may differ in order of magnitude from the number of times the target object does. For example, when the reference object is a plurality of salesclerks and the target object is a single salesclerk, the reference object's counts may be in the millions while the target object's are in the hundreds. Moreover, the standard value corresponding to the reference object's count in each time interval of the first preset time period is the output obtained by feeding normalized inputs into the target neural network model (its denormalized counterpart being the actual count), whereas the target object's count in each time interval of the second preset time period is a raw count; the two therefore represent different physical quantities.
The data detection device therefore standardizes both the standard values corresponding to the reference object's counts in the time intervals of the first preset time period (hereinafter the target data of the reference object) and the target object's counts in the time intervals of the second preset time period (hereinafter the target data of the target object), so that the two sets of target data are on the same order of magnitude, which improves the validity and reliability of the data analysis and detection.
Specifically, the data detection device may standardize the target data of the reference object and of the target object by min-max normalization or by z-score standardization.
With min-max normalization, the normalized value of the reference object for each time interval of the first preset time period satisfies:
$$y'_k = \frac{y_k - y_{\min}}{y_{\max} - y_{\min}}$$
where $y'_k$ is the normalized value of the reference object for the k-th time interval of the first preset time period; $y_k$ is the standard value corresponding to the number of times the reference object executes the target task in the k-th time interval of the first preset time period; $y_{\min}$ and $y_{\max}$ are the standard values corresponding to the minimum and the maximum of those counts over the plurality of time intervals of the first preset time period; and k is a positive integer less than or equal to 24.
With z-score standardization, the normalized value of the reference object for each time interval of the first preset time period satisfies:
$$y'_k = \frac{y_k - \mu}{\sigma}$$
where μ is the mean and σ the standard deviation of the standard values corresponding to the number of times the reference object executes the target task over the plurality of time intervals of the first preset time period.
It should be noted that the process of the data normalization processing of the target data of the target object is the same as or similar to the process of the data normalization processing of the target data of the reference object, and is not repeated here.
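Both standardization formulas of S103 can be sketched in a few lines. This minimal sketch assumes the population standard deviation for σ (the source does not say whether the population or sample form is meant); the sample series are hypothetical and only illustrate how counts in the millions and counts in the hundreds become directly comparable.

```python
import statistics

def min_max(values):
    """y'_k = (y_k - y_min) / (y_max - y_min)"""
    y_min, y_max = min(values), max(values)
    return [(y - y_min) / (y_max - y_min) for y in values]

def z_score(values):
    """y'_k = (y_k - mu) / sigma, with sigma as the population standard deviation (assumption)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(y - mu) / sigma for y in values]

reference = [0, 500_000, 1_500_000, 1_000_000]   # hypothetical counts in the millions
target = [0, 50, 150, 100]                       # hypothetical counts in the hundreds
print(min_max(reference) == min_max(target))     # True: same shape, same scale after normalization
```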
S104: Taking the normalized values of the reference object for the plurality of time intervals of the first preset time period as the standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as the observed quantities, the data detection device performs a chi-square test to determine the significance level of the number of times the target object executes the target task within the second preset time period.
With reference to the foregoing description of the chi-square test, it should be understood that performing the chi-square test on the standard quantities and the observed quantities measures how far the target object's normalized values for the time intervals of the second preset time period deviate from the reference object's normalized values for the time intervals of the first preset time period, and thereby determines the significance level of the number of times the target object executes the target task within the second preset time period.
In one implementation of the embodiments of the present invention, determining this significance level by the chi-square test specifically includes Step 1 and Step 2:
Step 1: The data detection device takes the normalized values of the reference object for the plurality of time intervals of the first preset time period as the standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as the observed quantities, and determines the chi-square value and the degrees of freedom of the chi-square test.
Specifically, the chi-square value of the chi-square test satisfies:
$$\chi^2 = \sum_{i=1}^{l} \frac{\left(f(i) - F(i)\right)^2}{F(i)}$$
where $\chi^2$ is the chi-square value of the chi-square test, f(i) is the normalized value of the target object for the i-th time interval of the second preset time period, F(i) is the normalized value of the reference object for the i-th time interval of the first preset time period, and l is the number of time intervals, a positive integer greater than or equal to 1.
The degrees of freedom of the chi-square test satisfy:
$$df = (h-1)\times(l-1)$$
where df is the degrees of freedom of the chi-square test, h is the number of objects, and l is the number of time intervals.
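The Step 1 quantities can be computed directly from the two formulas. In this minimal sketch, intervals whose standard quantity F(i) is 0 are skipped to avoid division by zero; that handling is an assumption (with min-max normalization the minimum interval maps to exactly 0, and the source does not say how this case is treated). The sample values are hypothetical.

```python
def chi_square(observed, expected):
    """chi^2 = sum over the l time intervals of (f(i) - F(i))^2 / F(i).

    observed: normalized values f(i) of the target object (observed quantities)
    expected: normalized values F(i) of the reference object (standard quantities)
    Intervals with F(i) == 0 are skipped (assumption, see lead-in).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same time intervals")
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected) if F != 0)

def degrees_of_freedom(h, l):
    """df = (h - 1) * (l - 1) for h objects and l time intervals."""
    return (h - 1) * (l - 1)

print(degrees_of_freedom(2, 24))                          # 23
print(chi_square([0.2, 0.5, 1.0], [0.25, 0.5, 1.0]))      # approximately 0.01
```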
In the embodiments of the present invention, the data detection device may treat the reference object as one complete object and the target object as another, so h is 2; assuming there are 24 time intervals, l is 24. Table 1 below gives an example of the normalized values of the reference object for the time intervals of the first preset time period (the standard quantities) and the normalized values of the target object for the time intervals of the second preset time period (the observed quantities).
TABLE 1
[Table 1 image not reproduced: standard quantity (reference object) and observed quantity (target object) for each of the 24 time intervals]
From the formulas in Step 1 and Table 1, the chi-square value and the degrees of freedom of the chi-square test are 0.379 and 23, respectively.
Step 2: The data detection device looks up the significance level corresponding to the chi-square value and the degrees of freedom in a chi-square critical value table.
FIG. 6 shows part of a chi-square critical value table. Since the chi-square test in Step 1 has 23 degrees of freedom and a chi-square value of 0.379, which is less than 22.337 (the critical value for 23 degrees of freedom at a significance level of 0.5), the significance level α of the number of times the target object executes the target task within the second preset time period is greater than 0.5.
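Instead of reading a printed critical value table, the significance level can be computed as the upper-tail probability of the chi-square distribution. This minimal sketch uses only the standard library: it evaluates the regularized lower incomplete gamma function P(df/2, χ²/2) by its power series and returns 1 - P. It is intended for moderate chi-square values; a library routine such as SciPy's `chi2.sf` would be the usual choice where available. The helper name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df, tol=1e-12, max_terms=10_000):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a            # n = 0 term of the series
    n = 1
    while n < max_terms:
        term *= z / (a + n)           # next term: z^n / (a (a+1) ... (a+n))
        total += term
        if term < tol * total:
            break
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

alpha = chi2_sf(0.379, 23)            # example from Step 1
print(alpha > 0.5)                    # True
print(alpha < 0.05)                   # False: no abnormal event at a 0.05 threshold
```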
S105: If the significance level of the number of times the target object executes the target task within the second preset time period is less than a preset significance level threshold, the data detection device determines that an event of abnormal execution of the target task exists; if the significance level is greater than or equal to the preset significance level threshold, the data detection device determines that no such event exists.
It should be understood that the smaller the significance level, the larger the difference (deviation) between the target object's execution of the target task within the second preset time period and the reference object's execution of the target task within the first preset time period. Determining that an event of abnormal execution exists means that the data detection device judges the target object's execution of the target task within the second preset time period to be abnormal; likewise, determining that no such event exists means that the device judges it to be normal.
Continuing the example in Step 2 and assuming a preset significance level threshold of 0.05: since α > 0.5 ≥ 0.05, the data detection device determines that there is no event of abnormal execution of the target task by the target object within the second preset time period.
In the embodiments of the present invention, the data detection device determines, based on the target neural network model, the standard values corresponding to the number of times the reference object executes the target task in each time interval of the first preset time period, and obtains the number of times the target object executes the target task in each time interval of the second preset time period. It then standardizes both sets of data to obtain the normalized values of the reference object and of the target object for their respective time intervals. Taking the reference object's normalized values as the standard quantities and the target object's normalized values as the observed quantities, it performs a chi-square test to determine the significance level of the number of times the target object executes the target task within the second preset time period. If this significance level is less than the preset significance level threshold, the data detection device determines that an event of abnormal execution of the target task exists; if it is greater than or equal to the threshold, the device determines that no such event exists.
Because the model-derived standard values for the reference object and the observed counts for the target object are both standardized, the data detection device obtains two highly reliable data sets (the standard quantities and the observed quantities) on the same order of magnitude. Determining the significance level by a chi-square test on these data sets makes the resulting significance level more reasonable, which in turn improves the accuracy with which the data detection device determines whether an event of abnormal execution of the target task exists.
As shown in FIG. 7, in one implementation, the data detection method provided by the embodiments of the present invention includes S301 to S305:
S301: The data detection device determines, based on the target neural network model, standard values corresponding to the number of times the reference object executes the target task in each of a plurality of time intervals of a first preset time period.
S302: The data detection device obtains the number of times each of a plurality of target objects executes the target task within a second preset time period.
The number of times a target object executes the target task within the second preset time period comprises the number of times it executes the target task in each of the plurality of time intervals of the second preset time period.
S303: The data detection device standardizes the standard values corresponding to the number of times the reference object executes the target task in each time interval of the first preset time period to obtain the reference object's normalized values for those time intervals, and standardizes the number of times each target object executes the target task within the second preset time period to obtain each target object's normalized values for the second preset time period.
The normalized values of a target object for the second preset time period comprise its normalized values for each of the plurality of time intervals of that period.
S304: The data detection device determines the significance level of the number of times each target object executes the target task within the second preset time period.
It should be noted that the descriptions of the processes of S301 to S304 are the same as or similar to the descriptions of the processes of S101 to S104, and are not repeated herein.
S305: Among the significance levels of the target objects, the data detection device determines that the target objects corresponding to the N significance levels that are less than the preset significance level threshold have an event of abnormal execution of the target task.
Wherein N is a positive integer greater than or equal to 1.
It should be understood that when the N significance levels are smaller than the preset significance level threshold, the data detection device may determine that the target objects (i.e., the N target objects) corresponding to the N significance levels have an event that abnormally performs the target task.
For example, Table 2 below gives example significance levels of the number of times each of a plurality of target objects executes the target task within the second preset time period.
TABLE 2
[Table 2 image not reproduced: significance level for each target object]
Assuming a preset significance level threshold of 0.05, the data detection device determines that target object 3 and target object 4 have an event of abnormal execution of the target task, i.e., N is 2.
In another implementation of the embodiments of the present invention, after S304 the data detection device may instead determine the target objects corresponding to the M smallest significance levels as objects with abnormal execution of the target task, where M is a positive integer greater than or equal to 1.
For example, with reference to Table 2 and assuming M is 3, the data detection device may determine target object 2, target object 3 and target object 4 to be objects with abnormal execution of the target task.
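Both selection rules, the threshold rule of S305 and the M-smallest alternative, can be sketched as follows. The significance levels below are hypothetical values chosen only to reproduce the outcomes described in the text; they are not the values in Table 2.

```python
def below_threshold(levels, threshold=0.05):
    """S305: target objects whose significance level is below the preset threshold."""
    return sorted(obj for obj, alpha in levels.items() if alpha < threshold)

def smallest_m(levels, m):
    """Alternative rule: the m target objects with the smallest significance levels."""
    return sorted(sorted(levels, key=levels.get)[:m])

# Hypothetical significance levels keyed by target object number (not Table 2's data)
levels = {1: 0.62, 2: 0.21, 3: 0.03, 4: 0.01}
print(below_threshold(levels))   # [3, 4]  -> N = 2
print(smallest_m(levels, 3))     # [2, 3, 4]
```

The threshold rule flags a variable number of objects, while the M-smallest rule always flags a fixed number, which may suit a fixed audit budget.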
In connection with the description of the above embodiments, it should be understood that, in the case where the data detection apparatus determines the chi-square value and the degree of freedom corresponding to the chi-square test by using the normalized values corresponding to the plurality of time intervals of the reference object in the first preset time period as the standard quantities and the normalized values corresponding to the plurality of time intervals of the target object in the second preset time period as the observed quantities, the degree of freedom corresponding to any one target object (specifically, any one target object and the reference object are subjected to the chi-square test) is the same (that is, when the number of the plurality of time intervals is 24, the degree of freedom corresponding to any one target object is 23), and it can be determined from the chi-square test critical value table that the greater the chi-square value is the same, the smaller the significance level is.
Therefore, in one implementation of this embodiment, the data detection device may instead be configured with a preset chi-square value threshold. The decision uses the chi-square value of the number of times the target object executes the target task within the second preset time period. Specifically, this is the chi-square value computed from the target object's normalized values for the time intervals of the second preset time period against the reference object's normalized values for the time intervals of the first preset time period.

* If this chi-square value is greater than the preset chi-square value threshold, the device determines that an event of abnormally executing the target task exists.
* If it is less than or equal to the threshold, the device determines that no such event exists.
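The chi-square value threshold rule can be sketched in Python as follows. The sketch assumes the normalized values are per-interval proportions, and that the reference object's values are strictly positive. The function names, the four-interval profiles, and the object labels are hypothetical, and the values are invented for illustration. The threshold of 1.0 mirrors the example that follows.

```python
from typing import Dict, List, Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic sum((O - E)^2 / E) over all time intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same time intervals")
    if any(e <= 0 for e in expected):
        raise ValueError("standard (expected) quantities must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def flag_by_chi_square(targets: Dict[str, Sequence[float]],
                       reference: Sequence[float],
                       threshold: float) -> List[str]:
    """Objects whose chi-square value against the reference exceeds the threshold."""
    return sorted(obj for obj, observed in targets.items()
                  if chi_square_statistic(observed, reference) > threshold)


# Hypothetical four-interval profiles, for illustration only.
reference = [0.1, 0.4, 0.4, 0.1]
targets = {
    "object A": [0.1, 0.4, 0.4, 0.1],  # same shape as the reference: value 0
    "object B": [0.4, 0.1, 0.1, 0.4],  # inverted shape: value about 2.25
}
print(flag_by_chi_square(targets, reference, threshold=1.0))  # ['object B']
```

Because every target object shares the same degrees of freedom, comparing chi-square values against one threshold ranks objects the same way as comparing their significance levels.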
For example, Table 3 below lists example chi-square values of the number of times each target object executes the target task within the second preset time period.
TABLE 3
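[Table 3 image not reproduced: chi-square values of the number of times each target object executes the target task within the second preset time period]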
Assuming the preset chi-square value threshold is 1.0000, the data detection device determines that target object 2 has an event of abnormally executing the target task.
In another implementation of this embodiment, the data detection device may also analyze why the target object may have abnormally executed the target task. The analysis uses a distribution plot of two sets of normalized values: the reference object's values for the time intervals of the first preset time period, and the target object's values for the time intervals of the second preset time period.
As shown in Fig. 8, assume that curve 1 is the distribution of the reference object's normalized values over the time intervals of the first preset time period. Assume that curve 2 is the distribution of the target object's normalized values over the time intervals of the second preset time period. Fig. 8 shows:

* Curve 1 has two peaks, in the 9th time interval (08:00-09:00) and the 16th time interval (15:00-16:00), and one trough, in the 12th time interval (11:00-12:00).
* Curve 2 has two peaks, in the 12th time interval (11:00-12:00) and the 18th time interval (17:00-18:00), and one trough, in the 16th time interval (15:00-16:00).
Curve 1 in Fig. 8 shows that both peaks of the reference object fall within working hours (08:00-09:00 and 15:00-16:00), and its trough falls near a rest period (11:00-12:00). By contrast, the two peaks of curve 2 fall at times that should be near rest periods (11:00-12:00 and 17:00-18:00), and its trough falls within working hours (15:00-16:00).

From this, the data detection device may infer that the target object performs other tasks during the working hours in which it should perform the target task, and performs the target task outside working hours. This explains why the target object has an event of abnormally executing the target task.
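The peak-and-trough reasoning above can be sketched in Python as follows. The sketch assumes 24 one-hour time intervals in which interval k covers hour k-1 to hour k, as in the Fig. 8 example. The set of working-hour intervals, the function names, and the synthetic curves are hypothetical. The synthetic curves only reproduce the peak and trough positions described for Fig. 8, not its actual values.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical working hours: 08:00-11:00 and 13:00-17:00 (intervals 9-11 and 14-17).
WORKING_INTERVALS: Set[int] = set(range(9, 12)) | set(range(14, 18))


def local_extrema(values: Sequence[float]) -> Tuple[List[int], List[int]]:
    """1-based interval numbers of strict interior local maxima (peaks) and minima (troughs)."""
    peaks, troughs = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            peaks.append(i + 1)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            troughs.append(i + 1)
    return peaks, troughs


def interval_to_hours(interval: int) -> str:
    """Interval k covers hour k-1 to hour k, e.g. interval 9 -> '08:00-09:00'."""
    return f"{interval - 1:02d}:00-{interval:02d}:00"


def activity_shifted(reference: Sequence[float], target: Sequence[float],
                     working: Set[int] = WORKING_INTERVALS) -> bool:
    """True when the reference peaks inside working hours while the target
    peaks outside them and has a trough inside them."""
    ref_peaks, _ = local_extrema(reference)
    tgt_peaks, tgt_troughs = local_extrema(target)
    return (bool(ref_peaks) and all(p in working for p in ref_peaks)
            and bool(tgt_peaks) and all(p not in working for p in tgt_peaks)
            and any(t in working for t in tgt_troughs))


def synthetic_curve(peaks: Sequence[int], troughs: Sequence[int]) -> List[float]:
    """Flat 24-interval curve with raised peaks and lowered troughs (illustration only)."""
    curve = [0.05] * 24
    for k in peaks:
        curve[k - 1] = 0.20
    for k in troughs:
        curve[k - 1] = 0.01
    return curve


curve1 = synthetic_curve(peaks=[9, 16], troughs=[12])   # reference object
curve2 = synthetic_curve(peaks=[12, 18], troughs=[16])  # target object
print([interval_to_hours(k) for k in local_extrema(curve2)[0]])  # ['11:00-12:00', '17:00-18:00']
print(activity_shifted(curve1, curve2))  # True
```

Strict comparisons are used so that flat stretches of a curve are not reported as peaks or troughs. Real data would likely need smoothing or a minimum prominence before this test is meaningful.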
In the embodiments of the present invention, the data detection device and similar devices may be divided into functional modules according to the foregoing method examples. For example, each function may have its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. Note that the module division in the embodiments is schematic and is only a logical division of functions; an actual implementation may divide the modules differently.
When each functional module corresponds to one function, Fig. 9 shows a possible structure of the data detection apparatus in the foregoing embodiments. As shown in Fig. 9, the data detection apparatus 20 may include a determining module 201 and an obtaining module 202.
The determining module 201 is configured to determine, based on the target neural network model, standard values corresponding to the number of times the reference object executes the target task in each of a plurality of time intervals of a first preset time period.
The obtaining module 202 is configured to obtain the number of times the target object executes the target task in each of a plurality of time intervals of a second preset time period.
The determining module 201 is further configured to perform data normalization on two series:

* the standard values corresponding to the number of times the reference object executes the target task in the plurality of time intervals of the first preset time period, to obtain the reference object's normalized values for those time intervals;
* the number of times the target object executes the target task in the plurality of time intervals of the second preset time period, to obtain the target object's normalized values for those time intervals.
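The embodiment does not specify which data normalization is used. One option consistent with comparing a standard profile against an observed profile is to scale each series so that its interval values sum to 1. The Python sketch below assumes that choice. The function name is hypothetical, and other normalizations (for example min-max scaling) would change the magnitudes of the resulting chi-square values.

```python
from typing import List, Sequence


def normalize_to_proportions(counts: Sequence[float]) -> List[float]:
    """Scale per-interval counts (or model-predicted standard values) to sum to 1."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = float(sum(counts))
    if total == 0:
        raise ValueError("at least one time interval must have a positive count")
    return [c / total for c in counts]


# Hypothetical counts for three time intervals.
print(normalize_to_proportions([2, 6, 2]))  # [0.2, 0.6, 0.2]
```

Both series are scaled the same way, so under this assumption the comparison reflects how task executions are distributed across the time intervals, not the total volume in each period.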
The determining module 201 is further configured to perform a chi-square test, with the reference object's normalized values for the plurality of time intervals of the first preset time period as standard quantities and the target object's normalized values for the plurality of time intervals of the second preset time period as observed quantities. The test determines the significance level of the number of times the target object executes the target task within the second preset time period.
The determining module 201 is further configured to compare that significance level with a preset significance level threshold. If the significance level of the number of times the target object executes the target task within the second preset time period is less than the threshold, the module determines that an event of abnormally executing the target task exists. If it is greater than or equal to the threshold, the module determines that no such event exists.
Optionally, the determining module 201 is specifically configured to determine a chi-square value and degrees of freedom of the chi-square test. The reference object's normalized values for the plurality of time intervals of the first preset time period serve as standard quantities, and the target object's normalized values for the plurality of time intervals of the second preset time period serve as observed quantities. The module then determines, from a chi-square critical value table, the significance level corresponding to the chi-square value and the degrees of freedom.
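The table lookup can be sketched in Python for the 24-interval case, which has 23 degrees of freedom. The critical values are the standard upper-tail chi-square critical values for 23 degrees of freedom. The dictionary and function names are hypothetical.

A critical value table only brackets the significance level. The sketch therefore returns the smallest tabulated level whose critical value the chi-square value reaches, which means the actual significance level is at most that value.

```python
from typing import Dict, Optional

# Upper-tail chi-square critical values for 23 degrees of freedom (24 time intervals),
# as listed in standard chi-square critical value tables: {significance level: critical value}.
CRITICAL_VALUES_DF23: Dict[float, float] = {
    0.10: 32.007,
    0.05: 35.172,
    0.025: 38.076,
    0.01: 41.638,
    0.005: 44.181,
}


def significance_bound(chi_value: float, table: Dict[float, float]) -> Optional[float]:
    """Smallest tabulated significance level whose critical value chi_value reaches.

    The actual significance level is then at most the returned value. None means the
    chi-square value is below every tabulated critical value, i.e. the significance
    level exceeds the largest tabulated level.
    """
    reached = [alpha for alpha, critical in table.items() if chi_value >= critical]
    return min(reached) if reached else None


print(significance_bound(36.0, CRITICAL_VALUES_DF23))  # 0.05
print(significance_bound(20.0, CRITICAL_VALUES_DF23))  # None
```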
Optionally, the determining module 201 is further configured to perform neural network training on historical log data to obtain the target neural network model. The historical log data includes a plurality of time intervals in a historical time period and the number of times the reference object executed the target task in each of those time intervals.
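The embodiment does not specify the network architecture or the log format. The Python sketch below covers only data preparation. It assumes the historical log yields one timestamp per execution of the target task by the reference object, and turns those timestamps into (date, time interval, count) training samples over 24 one-hour intervals. The function name and the input format are hypothetical, and the neural network training itself is not shown.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def hourly_training_rows(timestamps: Iterable[datetime]) -> List[Tuple[str, int, int]]:
    """Turn execution timestamps into (date, time interval, count) samples.

    Each day is split into 24 one-hour intervals; interval k covers hour k-1 to hour k.
    Intervals with no executions get an explicit count of 0.
    """
    counts: Counter = Counter()
    days = set()
    for ts in timestamps:
        day = ts.date().isoformat()
        days.add(day)
        counts[(day, ts.hour + 1)] += 1
    return [(day, k, counts[(day, k)]) for day in sorted(days) for k in range(1, 25)]


# Hypothetical log timestamps for the reference object.
log = [datetime(2020, 6, 1, 8, 15), datetime(2020, 6, 1, 8, 45), datetime(2020, 6, 1, 15, 5)]
rows = hourly_training_rows(log)
print(rows[8], rows[15])  # ('2020-06-01', 9, 2) ('2020-06-01', 16, 1)
```

Zero-count intervals are kept explicitly so that the model also learns the quiet hours of the reference object.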
Optionally, the determining module 201 is further configured to determine the significance level of the number of times each of a plurality of target objects executes the target task within the second preset time period. It then determines that the target objects with the N significance levels smaller than the preset significance level threshold have an event of abnormally executing the target task, where N is a positive integer greater than or equal to 1.
When an integrated unit is used, Fig. 10 shows a possible structure of the data acquisition device in the foregoing embodiments. As shown in Fig. 10, the data acquisition device 30 may include a processing module 301 and a communication module 302.

* The processing module 301 may control and manage the actions of the data acquisition device 30. For example, it may support the data acquisition device 30 in executing S101, S103, S104, and S105 in the foregoing method embodiments.
* The communication module 302 may support communication between the data acquisition device 30 and other entities. For example, it may support the data acquisition device 30 in executing S102 in the foregoing method embodiment.

Optionally, as shown in Fig. 10, the data acquisition device 30 may further include a storage module 303 configured to store program code and data of the data acquisition device 30.
The processing module 301 may be a processor or a controller, for example the processor 101 shown in Fig. 1. The communication module 302 may be a transceiver, a transceiver circuit, a communication interface, or the like, for example the network interface 103 shown in Fig. 1. The storage module 303 may be a memory, for example the memory 102 shown in Fig. 1.
When the processing module 301 is a processor, the communication module 302 is a transceiver, and the storage module 303 is a memory, the processor, the transceiver, and the memory may be connected through a bus. The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. It may be divided into an address bus, a data bus, a control bus, and so on.
It should be understood that, in the embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an execution order. The execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly as a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave).

The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The foregoing descriptions are merely specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data detection, the method comprising:
determining, based on a target neural network model, standard values corresponding to the number of times a reference object executes a target task in each of a plurality of time intervals of a first preset time period;
acquiring the number of times a target object executes the target task in each of a plurality of time intervals of a second preset time period;
performing data normalization on the standard values corresponding to the number of times the reference object executes the target task in the plurality of time intervals of the first preset time period, to obtain normalized values of the reference object for the plurality of time intervals of the first preset time period, and performing data normalization on the number of times the target object executes the target task in the plurality of time intervals of the second preset time period, to obtain normalized values of the target object for the plurality of time intervals of the second preset time period;
performing a chi-square test, with the normalized values of the reference object for the plurality of time intervals of the first preset time period as standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as observed quantities, to determine a significance level of the number of times the target object executes the target task in the second preset time period;
if the significance level of the number of times the target object executes the target task in the second preset time period is smaller than a preset significance level threshold, determining that an event of abnormally executing the target task exists; and if the significance level is greater than or equal to the preset significance level threshold, determining that no event of abnormally executing the target task exists.
2. The method according to claim 1, wherein performing the chi-square test to determine the significance level of the number of times the target object executes the target task in the second preset time period specifically comprises:
determining a chi-square value and degrees of freedom of the chi-square test, with the normalized values of the reference object for the plurality of time intervals of the first preset time period as standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as observed quantities;
determining, from a chi-square critical value table, the significance level corresponding to the chi-square value and the degrees of freedom.
3. The method of claim 2, further comprising:
performing neural network training on historical log data to obtain the target neural network model, wherein the historical log data includes the plurality of time intervals within a historical time period and the number of times the reference object executed the target task in each of the plurality of time intervals within the historical time period.
4. The method according to any one of claims 1 to 3, further comprising:
determining a significance level of the number of times each of a plurality of target objects executes the target task within the second preset time period;
determining that, among the significance levels of the number of times the target objects execute the target task within the second preset time period, the target objects corresponding to the N significance levels smaller than the preset significance level threshold have an event of abnormally executing the target task, where N is a positive integer greater than or equal to 1.
5. A data detection apparatus, comprising a determining module and an obtaining module, wherein:
the determining module is configured to determine, based on a target neural network model, standard values corresponding to the number of times a reference object executes a target task in each of a plurality of time intervals of a first preset time period;
the obtaining module is configured to obtain the number of times a target object executes the target task in each of a plurality of time intervals of a second preset time period;
the determining module is further configured to perform data normalization on the standard values corresponding to the number of times the reference object executes the target task in the plurality of time intervals of the first preset time period, to obtain normalized values of the reference object for the plurality of time intervals of the first preset time period, and to perform data normalization on the number of times the target object executes the target task in the plurality of time intervals of the second preset time period, to obtain normalized values of the target object for the plurality of time intervals of the second preset time period;
the determining module is further configured to perform a chi-square test, with the normalized values of the reference object for the plurality of time intervals of the first preset time period as standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as observed quantities, to determine a significance level of the number of times the target object executes the target task in the second preset time period;
the determining module is further configured to determine that an event of abnormally executing the target task exists if the significance level of the number of times the target object executes the target task within the second preset time period is smaller than a preset significance level threshold, and to determine that no event of abnormally executing the target task exists if the significance level is greater than or equal to the preset significance level threshold.
6. The apparatus of claim 5,
the determining module is specifically configured to determine a chi-square value and degrees of freedom of the chi-square test, with the normalized values of the reference object for the plurality of time intervals of the first preset time period as standard quantities and the normalized values of the target object for the plurality of time intervals of the second preset time period as observed quantities, and to determine, from a chi-square critical value table, the significance level corresponding to the chi-square value and the degrees of freedom.
7. The apparatus of claim 6,
the determining module is further configured to perform neural network training on historical log data to obtain the target neural network model, wherein the historical log data includes the plurality of time intervals within a historical time period and the number of times the reference object executed the target task in each of the plurality of time intervals within the historical time period.
8. The apparatus according to any one of claims 5 to 7,
the determining module is further configured to determine the significance level of the number of times each of a plurality of target objects executes the target task within the second preset time period, and to determine that, among these significance levels, the target objects corresponding to the N significance levels smaller than the preset significance level threshold have an event of abnormally executing the target task, where N is a positive integer greater than or equal to 1.
9. A data detection apparatus, characterized in that the data detection apparatus comprises a processor, a memory, a bus, and a communication interface; the memory is configured to store computer-executable instructions, and when the data detection apparatus runs, the processor executes the computer-executable instructions stored in the memory to cause the data detection apparatus to perform the data detection method according to any one of claims 1 to 3.
10. A computer readable storage medium having instructions stored therein, which when run on a data detection apparatus, cause the data detection apparatus to perform a data detection method according to any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010632128.5A CN111880986A 2020-07-03 2020-07-03 Data detection method and device

Publications (1)

Publication Number Publication Date
CN111880986A 2020-11-03

Family

ID=73150215


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination