CN114708003A - Abnormal data detection method, device and equipment and readable storage medium - Google Patents
- Publication number
- CN114708003A (application CN202210458381.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- data
- clustering
- abnormal
- price
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0609—Buyer or seller confidence or verification
Abstract
The invention relates to the field of data processing, and in particular to an abnormal data detection method, device, equipment and readable storage medium. The method comprises: acquiring first information, wherein the first information is commodity sales data information of at least one store; sending the first information to a data preprocessing model to obtain second information, wherein the second information is the price characteristic data information and sales characteristic data information obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering; and verifying the third information to obtain the abnormal commodity sales data screened by the model after its parameters have been verified. By combining two clustering algorithms, the method uses the strengths of each to offset the weaknesses of the other, so that abnormal data can be identified efficiently and accurately.
Description
Technical Field
The invention relates to the field of data processing, in particular to an abnormal data detection method, an abnormal data detection device, abnormal data detection equipment and a readable storage medium.
Background
In recent years, internet technology has developed rapidly, and the e-commerce industry has entered a period of fast growth. Online shopping is increasingly popular because it is convenient, quick, time-saving and labor-saving, and delivers to the home. As the scale of each platform expands and the number of commodities grows, illegal operating practices such as false price marking and order brushing (fabricated transactions) have also appeared. These practices seriously violate e-commerce law, so commodity data need to be identified accurately. Given the large number of commodities, checking and screening them purely by hand involves a huge workload and is prone to omissions and errors. There is therefore a need for a data detection method that can accurately locate abnormal commodities, reduce the cost of manual intervention, and reduce the error rate.
Disclosure of Invention
The present invention provides an abnormal data detection method, apparatus, equipment and readable storage medium to address the above problems. To achieve this purpose, the technical solution adopted by the invention is as follows:
In a first aspect, the present application provides an abnormal data detection method, the method comprising: acquiring first information, wherein the first information is commodity sales data information of at least one store; sending the first information to a data preprocessing model to obtain second information, wherein the second information is the price characteristic data information and sales characteristic data information obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering; and sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a second aspect, an embodiment of the present application provides an abnormal data detection apparatus, including:
a first acquisition unit, configured to acquire first information, wherein the first information is commodity sales data information of at least one store;
a first processing unit, configured to send the first information to a data preprocessing model to obtain second information, wherein the second information is the price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
a second processing unit, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
and a third processing unit, configured to send the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a third aspect, an embodiment of the present application provides abnormal data detection equipment, which includes a memory and a processor. The memory is used for storing a computer program; the processor is used for implementing the steps of the above abnormal data detection method when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the above abnormal data detection method.
The invention has the beneficial effects that:
Features are extracted from the commodity sales records, and two different clustering algorithms are applied in succession, so that abnormal commodities can be located accurately while manual intervention and the error rate are reduced. A high-efficiency clustering method first processes the data and effectively reduces the amount of data that needs to be examined; a high-accuracy clustering method then processes the data retained by the first clustering. Combining the two algorithms uses the strengths of each to offset the weaknesses of the other, so that abnormal data can be identified efficiently and accurately.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart illustrating an abnormal data detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an abnormal data detection apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of abnormal data detection equipment according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Example 1
As shown in fig. 1, the present embodiment provides an abnormal data detection method, which includes step S1, step S2, step S3, and step S4.
Step S1, acquiring first information, wherein the first information is commodity sales data information of at least one store;
Step S2, sending the first information to a data preprocessing model to obtain second information, wherein the second information is the price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
Step S3, sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
Step S4, sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
It is understood that the above-mentioned abnormal data is abnormal data in the commodity sales data.
In this embodiment, features are extracted from the commodity sales records and two different clustering algorithms are applied in succession, so that abnormal commodities can be located accurately while manual intervention and the error rate are reduced. The high-efficiency clustering method performs the first pass and reduces the amount of data to be examined; the high-accuracy clustering method then further processes the data retained by the first pass. Combining the two algorithms uses the strengths of each to offset the weaknesses of the other.
In a specific embodiment of the present disclosure, the step S2 includes a step S21, a step S22, and a step S23.
Step S21, performing data processing on the commodity sales data information: removing invalid data from the first information and filling incomplete data in the first information with mean values, to obtain first sub information;
Step S22, calculating characteristic data of the first information based on the first sub information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
Step S23, normalizing the characteristic data of the first information and smoothing the normalized characteristic data, to obtain the preprocessed first information, that is, the second information.
In these steps, the commodity sales data are preprocessed by removing invalid data and filling incomplete data with mean values. The filling method sums the data for the other months, computes their mean, and uses that mean as the fill value for the incomplete data. This reduces the errors introduced when characteristic data are extracted and improves the accuracy of the clustering.
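The filling, normalization and smoothing of steps S21 and S23 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say which normalization or smoothing method is used, so min-max scaling and a centered moving average are assumptions, and the function names and the `window` parameter are hypothetical. The preset feature formula of step S22 is not given in the source and is left out.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (other-month) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot fill a series that has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges (assumed choice)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def preprocess(monthly_values, window=3):
    """Fill, normalize, then smooth one monthly series."""
    filled = fill_missing_with_mean(monthly_values)
    return moving_average(min_max_normalize(filled), window)

# Hypothetical monthly sales for one commodity; the second month is missing.
monthly_sales = [10.0, None, 30.0, 20.0]
filled = fill_missing_with_mean(monthly_sales)
normalized = min_max_normalize(filled)
smoothed = preprocess(monthly_sales)
```

Here the missing month is filled with the mean of the three observed months before scaling, so the gap does not distort the normalized range.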
In a specific embodiment of the present disclosure, the step S3 includes a step S31, a step S32, and a step S33.
Step S31, sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is the abnormal data information in the price characteristic data information;
Step S32, mapping the first clustering information to the corresponding sales characteristic data information in the second information to obtain second sub information, wherein the second sub information comprises the sales characteristic data information corresponding to the first clustering information;
Step S33, sending the second sub information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering.
In these steps, the price characteristic data information is clustered first. The data retained by the first clustering are then mapped to the sales characteristic data to select the input of the second clustering, and the second clustering is performed. The first clustering thus screens the data efficiently, and the second clustering examines only the data corresponding to what the first clustering retained, so the advantages of both algorithms are combined while efficiency is preserved.
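The data flow of steps S31 to S33 can be sketched as below. The two detectors are deliberately trivial placeholders (a fixed cutoff) standing in for the first and second clustering modules; `two_stage_screen`, `flag_above` and the commodity identifiers are hypothetical names used only for illustration.

```python
def two_stage_screen(price_features, sales_features, first_detector, second_detector):
    """Run the second detector only on commodities flagged by the first.

    price_features / sales_features: dict mapping commodity id -> feature value.
    Each detector takes such a dict and returns the set of ids it flags.
    """
    # S31: first clustering on price features.
    price_flagged = first_detector(price_features)
    # S32: map the flagged commodities to their sales features.
    candidates = {cid: sales_features[cid]
                  for cid in price_flagged if cid in sales_features}
    # S33: second clustering on the reduced candidate set only.
    return second_detector(candidates)

def flag_above(limit):
    """Placeholder detector: flags ids whose value exceeds `limit`."""
    return lambda features: {cid for cid, v in features.items() if v > limit}

price = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.2}
sales = {"a": 0.95, "b": 0.3, "c": 0.99, "d": 0.1}
abnormal = two_stage_screen(price, sales, flag_above(0.5), flag_above(0.9))
```

In this toy run, commodity "a" has an extreme sales value but is never examined by the second detector because its price feature passed the first one. That is the trade-off of the cascade: less data in the second stage, at the cost of relying on the first stage to retain every relevant commodity.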
In a specific embodiment of the present disclosure, the step S31 includes a step S311, a step S312, a step S313, and a step S314.
Step S311, traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data according to the method for generating a clustering feature tree in the BIRCH algorithm, to obtain a price clustering feature tree;
Step S312, obtaining at least one clustering feature cluster based on the price clustering feature tree, and obtaining the threshold range corresponding to each clustering feature cluster;
Step S313, analyzing all the threshold ranges, and taking the smallest of them as the normal threshold range used to judge normal points;
Step S314, determining abnormal points in the price clustering feature tree based on the normal threshold range, and determining the abnormal data information in the price characteristic data information based on the abnormal points.
It can be understood that, in the above steps, the parameters of the BIRCH algorithm are set using the preset first initial parameters, and a clustering feature tree is generated from the price characteristic data information. The price characteristic data information is then clustered on the basis of this tree to obtain at least one cluster. The size range of each cluster is analyzed, the smallest range is selected as the threshold range of normal data, and data falling outside that range are judged to be abnormal. The data can thus be processed efficiently and quickly, and the amount of data to be handled by the second clustering is reduced.
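The idea behind steps S311 to S314 can be sketched with clustering features (count, linear sum, squared sum), which is how BIRCH summarizes a cluster. This is a flat, one-dimensional simplification under several assumptions: there is no tree and no node splitting; `threshold` stands in for the first initial parameter; the "threshold range" of a cluster is read as its radius; and only clusters with at least `min_points` members are considered when picking the smallest range. None of these choices is stated in the source.

```python
import math

class CF:
    """Clustering feature for 1-D data: count n, linear sum ls, squared sum ss."""
    def __init__(self, x):
        self.n, self.ls, self.ss = 1, x, x * x

    def centroid(self):
        return self.ls / self.n

    def radius(self, extra=None):
        """RMS distance of members to the centroid, optionally as if `extra` were added."""
        n, ls, ss = self.n, self.ls, self.ss
        if extra is not None:
            n, ls, ss = n + 1, ls + extra, ss + extra * extra
        return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

    def add(self, x):
        self.n, self.ls, self.ss = self.n + 1, self.ls + x, self.ss + x * x

def build_cf_entries(points, threshold):
    """Absorb each point into the nearest entry if the radius stays within threshold."""
    entries = []
    for x in points:
        nearest = min(entries, key=lambda e: abs(e.centroid() - x), default=None)
        if nearest is not None and nearest.radius(extra=x) <= threshold:
            nearest.add(x)
        else:
            entries.append(CF(x))
    return entries

def flag_price_outliers(points, threshold, min_points=2):
    """Flag points farther than the smallest dense-cluster radius from any dense centroid."""
    entries = build_cf_entries(points, threshold)
    dense = [e for e in entries if e.n >= min_points]
    if not dense:
        return list(points)  # no dense cluster: nothing can be called normal
    normal_range = min(e.radius() for e in dense)
    centroids = [e.centroid() for e in dense]
    return [x for x in points
            if min(abs(x - c) for c in centroids) > normal_range]

prices = [0.20, 0.22, 0.21, 0.19, 0.18, 0.60, 0.66, 0.95]
entries = build_cf_entries(prices, threshold=0.05)
flagged = flag_price_outliers(prices, threshold=0.05)
```

Under this reading, using the smallest range makes the first pass flag generously: points on the fringe of a tight cluster are passed on together with isolated ones, and the second clustering is left to refine the result.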
In a specific embodiment of the present disclosure, the step S33 includes steps S331, S332, S333, and S334.
Step S331, performing data processing based on a preset second initial parameter and the data information in the second sub information, wherein the data information in the second sub information is converted into coordinate data points in a spatial coordinate system, and the mutual reachability distance between every pair of coordinate data points is obtained;
Step S332, generating a weighted distance graph based on the mutual reachability distances, and generating a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
Step S333, converting the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and constructing the hierarchical cluster structure from those components;
Step S334, compressing the hierarchical cluster structure, and classifying the data information in the second sub information based on the compressed hierarchical cluster structure, to obtain the abnormal data in the second sub information.
It can be understood that, in the above steps, the parameters of the clustering algorithm are set using the second initial parameter, and the second sub information is transformed into spatial coordinates. The mutual reachability distance between every pair of coordinate points is calculated, which increases the robustness of the algorithm to noise. A minimum spanning tree is then constructed from the mutual reachability distances, and the coordinate points are clustered to obtain the abnormal data in the second sub information, so that the abnormal data information in the sales characteristic data information can be determined more accurately.
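The source does not name the second algorithm, although steps S331 to S334 resemble the published HDBSCAN procedure. The sketch below covers S331 and S332 (mutual reachability distance and its minimum spanning tree) and then, as a loudly labelled simplification, replaces the hierarchy construction and compression of S333 and S334 with a single distance cut. The parameters `k`, `cut` and `min_cluster_size` are hypothetical stand-ins for the second initial parameter.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, k):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)); zero on the diagonal."""
    core = core_distances(points, k)
    n = len(points)
    return [[0.0 if i == j else
             max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix; returns edges (i, j, w)."""
    n = len(weights)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, weights[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v], parent[v] = weights[u][v], u
    return edges

def flag_outliers(points, k, cut, min_cluster_size=3):
    """Simplified stand-in for S333/S334: drop MST edges heavier than `cut`,
    then flag every point left in a component smaller than min_cluster_size."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j, w in minimum_spanning_tree(mutual_reachability(points, k)):
        if w <= cut:
            root[find(i)] = find(j)
    sizes = {}
    for i in range(len(points)):
        sizes[find(i)] = sizes.get(find(i), 0) + 1
    return [i for i in range(len(points)) if sizes[find(i)] < min_cluster_size]

# Hypothetical 2-D sales feature points: four close together, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
mst = minimum_spanning_tree(mutual_reachability(points, k=2))
outliers = flag_outliers(points, k=2, cut=2.0)
```

Because the mutual reachability distance is never smaller than either endpoint's core distance, a point in a sparse region is pushed away from every other point, which is the noise robustness the text refers to.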
In a specific embodiment of the present disclosure, the step S4 includes steps S41, S42, S43, S44, S45, and S46.
Step S41, acquiring third sub information, wherein the third sub information comprises normal sales data information of historical commodities and abnormal sales data information of historical commodities;
Step S42, dividing the third sub information into a test set and a verification set, and sending the test set of the third sub information to the second clustering module for processing to obtain historical abnormal commodity sales data;
Step S43, comparing the historical abnormal commodity sales data with the verification set to obtain verification result information;
Step S44, performing grey relational analysis on the verification result information and all initial parameters in the anomaly detection model to obtain the degree of association between the verification result information and each initial parameter;
Step S45, adjusting the initial parameters based on the verification result information and the degrees of association to obtain an anomaly detection model with adjusted initial parameters, wherein, if the verification result indicates that the test set is inconsistent with the verification set, the initial parameter with the greatest degree of association with the verification result is adjusted until the verification result indicates that the test set is consistent with the verification set;
Step S46, sending the first information to the anomaly detection model with adjusted initial parameters for a second anomaly detection, and screening the third information against the data from the second anomaly detection to obtain the screened abnormal commodity sales data.
In these steps, historical data are divided and sent to the anomaly detection model for detection. Grey relational analysis between the detection results and the initial parameters yields the degree of association of each parameter, and the parameters are adjusted according to those degrees to obtain the adjusted anomaly detection model. A second round of abnormal data screening then yields the screened abnormal commodity sales data.
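The grey relational analysis of step S44 can be illustrated with the commonly used grey relational coefficient, averaged over the sequence to give a degree of association. The source does not state how verification results and parameter values are arranged into sequences, so the sketch assumes one value per trial run, already scaled to [0, 1]; the parameter names and the distinguishing coefficient `rho = 0.5` are likewise assumptions.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """Grey relational degree of each candidate sequence to the reference.

    reference: list of floats (e.g. verification results per trial run).
    candidates: dict name -> list of floats of the same length.
    Sequences are assumed to be normalized to a common scale beforehand.
    """
    deltas = {name: [abs(r - c) for r, c in zip(reference, seq)]
              for name, seq in candidates.items()}
    all_deltas = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return {name: 1.0 for name in candidates}
    return {name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in seq) / len(seq)
            for name, seq in deltas.items()}

# Hypothetical normalized sequences over three trial runs.
verification_result = [0.2, 0.5, 0.9]
parameter_sequences = {
    "threshold": [0.2, 0.5, 0.9],         # hypothetical parameter name
    "min_cluster_size": [0.9, 0.5, 0.1],  # hypothetical parameter name
}
degrees = grey_relational_degrees(verification_result, parameter_sequences)
most_related = max(degrees, key=degrees.get)
```

In the sense of step S45, `most_related` would be the parameter to adjust first when the detection result on the test set disagrees with the verification set.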
Example 2
As shown in FIG. 2, the present embodiment provides an abnormal data detection apparatus, which includes a first acquisition unit 701, a first processing unit 702, a second processing unit 703 and a third processing unit 704.
A first acquisition unit 701, configured to acquire first information, wherein the first information is commodity sales data information of at least one store;
a first processing unit 702, configured to send the first information to a data preprocessing model to obtain second information, wherein the second information is the price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
a second processing unit 703, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
and a third processing unit 704, configured to send the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In one embodiment of the present disclosure, the first processing unit 702 includes a first processing subunit 7021, a second processing subunit 7022, and a third processing subunit 7023.
A first processing subunit 7021, configured to perform data processing on the commodity sales data information, remove invalid data from the first information, and fill incomplete data in the first information with mean values, to obtain first sub information;
a second processing subunit 7022, configured to calculate characteristic data of the first information based on the first sub information and a preset calculation formula, wherein the characteristic data of the first information includes price characteristic data and sales characteristic data;
a third processing subunit 7023, configured to normalize the characteristic data of the first information and smooth the normalized characteristic data, to obtain the preprocessed first information.
In a specific embodiment of the present disclosure, the second processing unit 703 includes a first clustering subunit 7031, a fourth processing subunit 7032, and a second clustering subunit 7033.
A first clustering subunit 7031, configured to send the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is the abnormal data information in the price characteristic data information;
a fourth processing subunit 7032, configured to map the first clustering information to the corresponding sales characteristic data information in the second information to obtain second sub information, wherein the second sub information includes the sales characteristic data information corresponding to the first clustering information;
and a second clustering subunit 7033, configured to send the second sub information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering.
In a specific embodiment of the present disclosure, the first clustering subunit 7031 includes a third clustering subunit 70311, a fourth clustering subunit 70312, a fifth clustering subunit 70313, and a sixth clustering subunit 70314.
A third clustering subunit 70311, configured to traverse the price characteristic data information based on preset first initial parameter information, and process the price characteristic data according to the method for generating a clustering feature tree in the BIRCH algorithm, to obtain a price clustering feature tree;
a fourth clustering subunit 70312, configured to obtain at least one clustering feature cluster based on the price clustering feature tree, and obtain the threshold range corresponding to each clustering feature cluster;
a fifth clustering subunit 70313, configured to analyze all the threshold ranges and take the smallest of them as the normal threshold range used to judge normal points;
a sixth clustering subunit 70314, configured to determine abnormal points in the price clustering feature tree based on the normal threshold range, and determine the abnormal data information in the price characteristic data information based on the abnormal points.
In a specific embodiment of the present disclosure, the second clustering subunit 7033 includes a seventh clustering subunit 70331, an eighth clustering subunit 70332, a ninth clustering subunit 70333, and a tenth clustering subunit 70334.
A seventh clustering subunit 70331, configured to perform data processing based on a preset second initial parameter and the data information in the second sub information, wherein the data information in the second sub information is converted into coordinate data points in a spatial coordinate system, and the mutual reachability distance between every pair of coordinate data points is obtained;
an eighth clustering subunit 70332, configured to generate a weighted distance graph based on the mutual reachability distances, and generate a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
a ninth clustering subunit 70333, configured to convert the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and construct the hierarchical cluster structure from those components;
a tenth clustering subunit 70334, configured to compress the hierarchical cluster structure, and classify the data information in the second sub information based on the compressed hierarchical cluster structure, to obtain the abnormal data in the second sub information.
In a specific embodiment of the present disclosure, the third processing unit 704 includes a first obtaining sub-unit 7041, a fifth processing sub-unit 7042, a sixth processing sub-unit 7043, a seventh processing sub-unit 7044, an eighth processing sub-unit 7045, and a ninth processing sub-unit 7046.
A first obtaining subunit 7041, configured to obtain third sub information, where the third sub information includes normal sales data information of historical commodities and abnormal sales data information of historical commodities;
a fifth processing subunit 7042, configured to divide the third sub information into a test set and a verification set, and send the test set of the third sub information to the second clustering module for processing, so as to obtain historical abnormal commodity sales data;
a sixth processing subunit 7043, configured to compare the historical abnormal commodity sales data with the verification set, to obtain verification result information;
a seventh processing subunit 7044, configured to perform grey relational analysis on the verification result information and all initial parameters in the anomaly detection model, to obtain the degree of association between the verification result information and each initial parameter;
an eighth processing subunit 7045, configured to adjust the initial parameters based on the verification result information and the degrees of association, to obtain an anomaly detection model with adjusted initial parameters, wherein if the verification result is that the test set is inconsistent with the verification set, the initial parameter with the largest degree of association with the verification result is adjusted until the verification result is that the test set is consistent with the verification set;
a ninth processing subunit 7046, configured to send the first information to the anomaly detection model with the adjusted initial parameters for a second round of abnormality detection, and screen the third information according to the data obtained from the second detection, so as to obtain screened abnormal commodity sales data.
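The association-degree step performed by subunits 7044 and 7045 can be sketched as follows. This is a minimal illustration that assumes the commonly used form of grey relational analysis (min-max scaling, a distinguishing coefficient `rho` of 0.5, and the mean relational coefficient as the degree of association). The parameter names `threshold` and `branching_factor` and all numbers are hypothetical and do not come from the disclosure.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Degree of association between a reference sequence and each candidate sequence."""

    def scale(seq):
        # Min-max scaling so that sequences with different units are comparable.
        lo, hi = min(seq), max(seq)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in seq]

    ref = scale(reference)
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, scale(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate tracks the reference exactly.
        return {name: 1.0 for name in deltas}

    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Hypothetical trial runs: mismatch counts between detected and labelled
# abnormal records, together with the parameter values used in each run.
mismatches = [2, 4, 6, 8]
trial_parameters = {
    "threshold": [1.0, 2.0, 3.0, 4.0],
    "branching_factor": [50, 20, 40, 10],
}
grades = grey_relational_grades(mismatches, trial_parameters)
most_associated = max(grades, key=grades.get)
```

Under these assumptions `most_associated` names the parameter whose trial values track the verification result most closely, which is the parameter the eighth processing subunit would adjust first.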
It should be noted that, regarding the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
Example 3
Corresponding to the above method embodiment, this embodiment of the present disclosure further provides an abnormal data detecting apparatus. The abnormal data detecting apparatus described below and the abnormal data detection method described above may be read with reference to each other.
Fig. 3 is a block diagram illustrating an abnormal data detecting apparatus 800 according to an exemplary embodiment. As shown in Fig. 3, the abnormal data detecting apparatus 800 may include a processor 801 and a memory 802. The abnormal data detecting apparatus 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the abnormal data detecting apparatus 800, so as to complete all or part of the steps in the abnormal data detection method. The memory 802 is used to store various types of data to support operation of the abnormal data detecting apparatus 800; such data may include, for example, instructions for any application or method operating on the abnormal data detecting apparatus 800, as well as application-related data such as contact data, messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the abnormal data detecting apparatus 800 and other apparatuses.
The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the abnormal data detecting apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above abnormal data detection method.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions which, when executed by a processor, implement the steps of the above-described abnormal data detection method. For example, the computer readable storage medium may be the above-described memory 802 including program instructions executable by the processor 801 of the abnormal data detecting apparatus 800 to perform the above-described abnormal data detecting method.
Example 4
Corresponding to the above method embodiment, this embodiment of the present disclosure further provides a readable storage medium. The readable storage medium described below and the abnormal data detection method described above may be read with reference to each other.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the abnormal data detection method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An abnormal data detection method, comprising:
acquiring first information, wherein the first information is commodity sales data information of at least one store;
sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
and sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is abnormal commodity sales data screened by the model after the parameters are verified.
2. The abnormal data detection method according to claim 1, wherein sending the first information to a data preprocessing model to obtain second information, the second information being information obtained by preprocessing the first information, comprises:
carrying out data processing on the commodity sales data information, clearing invalid data in the first information, and carrying out mean value filling on incomplete data in the first information to obtain first sub information;
calculating to obtain characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and normalizing the characteristic data of the first information, and smoothing the characteristic data after normalization to obtain the preprocessed first information.
3. The abnormal data detection method according to claim 1, wherein sending the second information to an anomaly detection model for abnormal data detection to obtain third information comprises:
sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
performing data corresponding mapping on the first clustering information and the sales characteristic data information in the second information to obtain second sub-information, wherein the second sub-information comprises the sales characteristic data information corresponding to the first clustering information;
and sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information.
4. The abnormal data detection method according to claim 3, wherein sending the price feature data information in the second information to a first clustering module for clustering to obtain first clustering information comprises:
traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data according to a generation method of a clustering characteristic tree in a BIRCH algorithm to obtain a price clustering characteristic tree;
obtaining at least one clustering feature cluster based on the price clustering feature tree, and obtaining a threshold range corresponding to each clustering feature cluster;
analyzing all the threshold value ranges, and taking the minimum threshold value range in all the threshold value ranges as a normal threshold value range for judging a normal point;
and determining abnormal points in the price clustering feature tree based on the normal threshold range, and judging abnormal data information in the price feature data information based on the abnormal points.
5. An abnormal data detecting apparatus, comprising:
a first acquisition unit configured to acquire first information including commodity sales data information of at least one store;
the first processing unit is used for sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
the second processing unit is used for sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
and the third processing unit is used for sending the third information to the verification module for processing to obtain fourth information, and the fourth information is abnormal commodity sales data screened by the model after the parameters are verified.
6. The abnormal data detecting apparatus according to claim 5, wherein the apparatus comprises:
the first processing subunit is used for carrying out data processing on the commodity sales data information, eliminating invalid data in the first information, and carrying out mean value filling on incomplete data in the first information to obtain first sub information;
the second processing subunit is used for calculating to obtain the characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and the third processing subunit is used for carrying out normalization processing on the feature data of the first information and carrying out smoothing processing on the feature data after the normalization processing to obtain the preprocessed first information.
7. The abnormal data detecting apparatus according to claim 5, wherein the apparatus comprises:
the first clustering subunit is used for sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
the fourth processing subunit is configured to perform data corresponding mapping on the first clustering information and the sales characteristic data information in the second information to obtain second sub-information, where the second sub-information includes the sales characteristic data information corresponding to the first clustering information;
and the second clustering subunit is used for sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information.
8. The abnormal data detecting apparatus according to claim 7, wherein the apparatus comprises:
the third clustering subunit is used for traversing the price characteristic data information based on preset first initial parameter information and processing the price characteristic data according to a generation method of a clustering characteristic tree in a BIRCH algorithm to obtain a price clustering characteristic tree;
the fourth clustering subunit is used for obtaining at least one clustering feature cluster based on the price clustering feature tree and obtaining a threshold range corresponding to each clustering feature cluster;
the fifth clustering subunit is used for analyzing all the threshold value ranges, and taking the minimum threshold value range in all the threshold value ranges as a normal threshold value range for judging normal points;
and the sixth clustering subunit is used for determining abnormal points in the price clustering feature tree based on the normal threshold range and judging abnormal data information in the price feature data information based on the abnormal points.
9. An abnormal data detecting apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the abnormal data detecting method according to any one of claims 1 to 4 when executing the computer program.
10. A readable storage medium, characterized by: the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the abnormal data detecting method according to any one of claims 1 to 4.
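The preprocessing recited in claims 2 and 6 (mean value filling, normalization, then smoothing) can be illustrated with a minimal sketch. The claims do not specify the normalization or smoothing method, so min-max scaling and a centred moving average are assumptions made here, as are the function names, the window size, and the sample prices. The "preset calculation formula" for the feature data is not given in this section and is therefore omitted.

```python
def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window=3):
    """Centred moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical price series for one commodity with one incomplete record.
prices = [10.0, None, 14.0, 12.0, 30.0]
filled = fill_missing_with_mean(prices)
normalized = min_max_normalize(filled)
smoothed = moving_average(normalized)
```

Here the incomplete record is filled with 16.5, the mean of the four known prices, before the series is scaled and smoothed.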
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210458381.2A CN114708003B (en) | 2022-04-27 | 2022-04-27 | Abnormal data detection method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114708003A (en) | 2022-07-05 |
CN114708003B (en) | 2023-11-10 |
Family
ID=82177116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210458381.2A (Active; CN114708003B (en)) | Abnormal data detection method, device, equipment and readable storage medium | 2022-04-27 | 2022-04-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114708003B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809448A (en) * | 2014-12-30 | 2016-07-27 | 阿里巴巴集团控股有限公司 | Account transaction clustering method and system thereof |
US9454785B1 (en) * | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
CN106529968A (en) * | 2016-09-29 | 2017-03-22 | 深圳大学 | Customer classification method and system thereof based on transaction data |
KR101834260B1 (en) * | 2017-01-18 | 2018-03-06 | 한국인터넷진흥원 | Method and Apparatus for Detecting Fraudulent Transaction |
CN107918905A (en) * | 2017-11-22 | 2018-04-17 | 阿里巴巴集团控股有限公司 | Abnormal transaction identification method, apparatus and server |
CN109389453A (en) * | 2017-08-11 | 2019-02-26 | 苏宁云商集团股份有限公司 | A kind of price analysis method and device |
CN110046889A (en) * | 2019-03-20 | 2019-07-23 | 腾讯科技(深圳)有限公司 | A kind of detection method, device and the server of abnormal behaviour main body |
CN110400220A (en) * | 2019-07-23 | 2019-11-01 | 上海氪信信息技术有限公司 | A kind of suspicious transaction detection method of intelligence based on semi-supervised figure neural network |
US20200314159A1 (en) * | 2019-03-29 | 2020-10-01 | Paypal, Inc. | Anomaly detection for streaming data |
US20210333280A1 (en) * | 2020-04-23 | 2021-10-28 | YatHing Biotechnology Company Limited | Methods related to the diagnosis of prostate cancer |
CN113988148A (en) * | 2020-07-10 | 2022-01-28 | 华为技术有限公司 | Data clustering method, system, computer equipment and storage medium |
CN114077872A (en) * | 2021-11-29 | 2022-02-22 | 税友软件集团股份有限公司 | Data anomaly detection method and related device |
CN114186626A (en) * | 2021-12-09 | 2022-03-15 | 中国建设银行股份有限公司 | Abnormity detection method and device, electronic equipment and computer readable medium |
CN114548276A (en) * | 2022-02-22 | 2022-05-27 | Oppo广东移动通信有限公司 | Method and device for clustering data, electronic equipment and storage medium |
CN115510982A (en) * | 2022-09-29 | 2022-12-23 | 联想(北京)有限公司 | Clustering method, device, equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114708003B (en) | 2023-11-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||