CN114708003A - Abnormal data detection method, device and equipment and readable storage medium - Google Patents

Publication number
CN114708003A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210458381.2A
Other languages
Chinese (zh)
Other versions
CN114708003B (en)
Inventor
范华琦
刘恒
周杲
蒋挺
向吴优
陈赛
庞苏川
杨柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
Original Assignee
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jiaoda Big Data Technology Co ltd and Southwest Jiaotong University
Priority to CN202210458381.2A
Publication of CN114708003A
Application granted
Publication of CN114708003B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of data processing, and in particular to an abnormal data detection method, device, equipment and readable storage medium. The method comprises: acquiring first information, wherein the first information is commodity sales data information of at least one store; sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by screening the second information through two rounds of clustering; and verifying the third information to obtain the abnormal commodity sales data screened by the model after its parameters have been verified. Combining the two clustering algorithms lets each offset the other's weaknesses, so that abnormal data can be identified efficiently and accurately.

Description

Abnormal data detection method, device and equipment and readable storage medium
Technical Field
The invention relates to the field of data processing, in particular to an abnormal data detection method, an abnormal data detection device, abnormal data detection equipment and a readable storage medium.
Background
In recent years, internet technology has developed rapidly, and the e-commerce industry has entered a fast lane of development. Online shopping is increasingly popular because it is convenient, fast, time-saving, labor-saving and delivers to the door. As platforms keep growing and the number of commodities keeps increasing, illegal practices such as false price marking and order brushing (faking orders to inflate sales) have also appeared. These practices seriously violate e-commerce law, so commodity data need to be identified accurately. Given the large number of commodities, purely manual checking and screening involves a huge workload and is prone to omissions and errors. There is therefore a need for a data detection method that can accurately locate abnormal commodities, reduce the cost of manual intervention, and reduce the error rate.
Disclosure of Invention
The present invention provides an abnormal data detection method, device, equipment and readable storage medium to address the above problems. To achieve this purpose, the technical scheme adopted by the invention is as follows:
In a first aspect, the present application provides an abnormal data detection method, the method including: acquiring first information, wherein the first information is commodity sales data information of at least one store; sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by screening the second information through two rounds of clustering; and sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a second aspect, an embodiment of the present application provides an abnormal data detection apparatus, including:
a first acquisition unit, configured to acquire first information, wherein the first information is commodity sales data information of at least one store;
a first processing unit, configured to send the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
a second processing unit, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
and a third processing unit, configured to send the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a third aspect, an embodiment of the present application provides an abnormal data detection apparatus, which includes a memory and a processor. The memory is used for storing a computer program; the processor is used for realizing the steps of the abnormal data detection method when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the above abnormal data detection method.
The invention has the beneficial effects that:
The method extracts features from commodity sales data and applies two different clustering algorithms in succession, so that abnormal commodities can be located accurately while manual intervention and the error rate are reduced. A high-efficiency clustering method first processes the data, which effectively reduces the amount of data that needs further detection; a high-accuracy clustering method then processes the data retained by the first clustering. Combining the two algorithms lets each offset the other's weaknesses, so that abnormal data can be identified efficiently and accurately.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart illustrating an abnormal data detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an abnormal data detecting apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an abnormal data detecting apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Example 1
As shown in fig. 1, the present embodiment provides an abnormal data detection method, which includes step S1, step S2, step S3, and step S4.
Step S1, acquiring first information, wherein the first information is commodity sales data information of at least one store;
Step S2, sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
Step S3, sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
Step S4, sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
It is understood that the above-mentioned abnormal data is abnormal data in the commodity sales data.
In this embodiment, features are extracted from the commodity sales data and two different clustering algorithms are applied in succession, so that abnormal commodities can be located accurately while manual intervention and the error rate are reduced. A high-efficiency clustering method performs the first pass and effectively reduces the amount of data to be detected; a high-accuracy clustering method then further processes the data retained by the first pass. Combining the two algorithms lets each offset the other's weaknesses, so that abnormal data can be identified efficiently and accurately.
In a specific embodiment of the present disclosure, the step S2 includes a step S21, a step S22, and a step S23.
Step S21, performing data processing on the commodity sales data information, removing invalid data from the first information, and filling incomplete data in the first information with a mean value to obtain first sub information;
Step S22, calculating feature data of the first information based on the first sub information and a preset calculation formula, wherein the feature data of the first information comprises price feature data and sales feature data;
Step S23, normalizing the feature data of the first information, and smoothing the normalized feature data to obtain the preprocessed first information (that is, the second information).
In these steps the commodity sales data are preprocessed: invalid data are removed and incomplete data are filled with a mean value. The fill value is obtained by summing the corresponding data of the other months and taking their mean. This reduces the errors introduced when the feature data are extracted and improves the accuracy of the clustering.
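As an illustration of steps S21 to S23, the following is a minimal Python sketch, not the patented implementation. The source does not specify the normalization or smoothing method, so min-max scaling and a centered moving average are assumptions made here; the function names and the sample values are hypothetical.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace missing (None) months with the mean of the known months (step S21)."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no valid data to compute a fill value")
    fill = mean(known)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Assumed normalization: scale to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def moving_average(values, window=3):
    """Assumed smoothing: centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical monthly sales for one commodity; None marks incomplete data.
monthly_sales = [120, None, 150, 90, None, 180]
filled = fill_with_mean(monthly_sales)
normalized = min_max_normalize(filled)
smoothed = moving_average(normalized, window=3)
```

Filling before normalizing keeps the fill value on the same scale as the observed months, which is one way to limit the error the filled entries add to the later clustering.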
In a specific embodiment of the present disclosure, the step S3 includes a step S31, a step S32, and a step S33.
Step S31, sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is the abnormal data information in the price characteristic data information;
Step S32, mapping the first clustering information onto the sales characteristic data information in the second information to obtain second sub information, wherein the second sub information comprises the sales characteristic data information corresponding to the first clustering information;
Step S33, sending the second sub information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering.
The price characteristic data information is clustered first. The result of the first clustering is then mapped onto the sales characteristic data to select the data for the second clustering, and the second clustering is performed on that selection only. The first clustering thus screens the full data set efficiently, while the second clustering examines only the data corresponding to what the first clustering flagged, so the strengths of both algorithms are combined without sacrificing efficiency.
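The data flow of steps S31 to S33 can be sketched as follows. This is only an illustration of the mapping between the two stages: the two detectors are passed in as callables, and the threshold-based stand-ins used in the example are hypothetical placeholders, not the clustering modules described in the source.

```python
def two_stage_screen(price_features, sales_features, first_stage, second_stage):
    """Run two screening stages in sequence.

    price_features / sales_features: dicts keyed by commodity id.
    first_stage / second_stage: callables returning the set of ids they flag.
    """
    # S31: screen the full price feature set with the fast first stage.
    price_flagged = first_stage(price_features)
    # S32: map the flagged ids onto the sales features (the second sub information).
    candidates = {cid: sales_features[cid]
                  for cid in price_flagged if cid in sales_features}
    # S33: run the slower second stage on the reduced candidate set only.
    return second_stage(candidates)

# Hypothetical normalized features and stand-in detectors.
price = {"a": 0.10, "b": 0.90, "c": 0.95, "d": 0.20}
sales = {"a": 0.90, "b": 0.80, "c": 0.10, "d": 0.95}
flag_high_price = lambda feats: {k for k, v in feats.items() if v > 0.8}
flag_high_sales = lambda feats: {k for k, v in feats.items() if v > 0.7}

abnormal = two_stage_screen(price, sales, flag_high_price, flag_high_sales)
```

In this example commodities "a" and "d" have high sales values but are never examined by the second stage, because the first stage did not flag their prices; this is the reduction in second-stage workload that the passage describes.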
In a specific embodiment of the present disclosure, the step S31 includes a step S311, a step S312, a step S313, and a step S314.
Step S311, traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data according to the generation method of the clustering feature tree in the BIRCH algorithm to obtain a price clustering feature tree;
Step S312, obtaining at least one clustering feature cluster based on the price clustering feature tree, and obtaining the threshold range corresponding to each clustering feature cluster;
Step S313, analyzing all the threshold ranges, and taking the minimum threshold range among them as the normal threshold range for judging normal points;
Step S314, determining abnormal points in the price clustering feature tree based on the normal threshold range, and determining the abnormal data information in the price characteristic data information based on the abnormal points.
It can be understood that, in the above steps, the parameters of the BIRCH algorithm are set using the preset first initial parameter, and a clustering feature tree is generated from the price characteristic data information. The price characteristic data information is then clustered based on the clustering feature tree to obtain at least one cluster. The size range of each cluster is analyzed and the minimum range is selected as the threshold range of normal data; the threshold range of abnormal data is then inferred from it, which yields the abnormal data. In this way the data can be processed efficiently and quickly, and the amount of data to be processed in the second clustering is reduced.
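To make the clustering feature idea concrete, the following Python sketch builds BIRCH-style clustering features (count, linear sum, squared sum) for one-dimensional price features. It is a simplification under stated assumptions: a flat list of entries stands in for the leaf level of a full clustering feature tree, and the outlier rule (flagging entries with fewer than `min_size` members) is a stand-in heuristic, not the minimum-threshold-range rule of steps S313 and S314, whose exact formula the source does not give. All names and sample values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class CF:
    """Clustering feature for 1-D data: count, linear sum, squared sum."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def added(self, x):
        """Return a new CF with point x absorbed (CFs are additive)."""
        return CF(self.n + 1, self.ls + x, self.ss + x * x)

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # max(..., 0.0) guards against tiny negative values from rounding.
        return sqrt(max(self.ss / self.n - self.centroid ** 2, 0.0))

def build_cf_entries(points, threshold):
    """Single pass over the data; flat stand-in for a CF tree's leaf entries."""
    entries, members = [], []
    for idx, x in enumerate(points):
        best = None
        if entries:
            # Nearest existing entry by centroid distance.
            best = min(range(len(entries)),
                       key=lambda i: abs(entries[i].centroid - x))
            trial = entries[best].added(x)
            if trial.radius > threshold:
                best = None               # would grow too wide: open a new entry
            else:
                entries[best] = trial
                members[best].append(idx)
        if best is None:
            entries.append(CF().added(x))
            members.append([idx])
    return entries, members

def sparse_entry_outliers(members, min_size=2):
    """Stand-in rule: indices of points in entries smaller than min_size."""
    return sorted(i for group in members if len(group) < min_size for i in group)

prices = [0.50, 0.52, 0.49, 0.51, 0.95, 0.48, 0.53]   # hypothetical normalized prices
entries, members = build_cf_entries(prices, threshold=0.05)
outliers = sparse_entry_outliers(members)
```

Because each entry stores only three numbers, absorbing a point costs constant time and the data are read once, which is the efficiency property the passage relies on for the first screening stage.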
In a specific embodiment of the present disclosure, the step S33 includes steps S331, S332, S333, and S334.
Step S331, performing data processing based on a preset second initial parameter and the data information in the second sub information, wherein the data information in the second sub information is converted into coordinate data points in a spatial coordinate system, and the mutual reachable distance between the coordinate data points is obtained;
Step S332, generating a weighted distance graph based on the mutual reachable distances, and generating a minimum spanning tree of the mutual reachable distances based on the weighted distance graph;
Step S333, converting the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachable distances, and constructing the hierarchical cluster structure from those components;
Step S334, compressing the hierarchical cluster structure, and classifying the data information in the second sub information based on the compressed hierarchical cluster structure to obtain the abnormal data in the second sub information.
It can be understood that, in the above steps, the parameters of the clustering algorithm are set using the second initial parameter, and the second sub information is converted into points in a spatial coordinate system. The mutual reachable distance between the coordinate points is calculated, which increases the robustness of the algorithm to noise. A minimum spanning tree is then constructed from the mutual reachable distances, and the coordinate points are clustered to obtain the abnormal data in the second sub information, so that the abnormal data in the sales characteristic data information can be determined more accurately.
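Steps S331 and S332 can be illustrated with the sketch below. The sequence of mutual reachable distance, minimum spanning tree, hierarchy and compression resembles density-based hierarchical clustering of the HDBSCAN kind, although the source does not name the algorithm. The definition of core distance as the distance to the k-th nearest other point, the parameter `k`, and the sample points are assumptions for illustration; the hierarchy construction and compression of steps S333 and S334 are not covered.

```python
from math import dist

def core_distances(points, k):
    """Assumed definition: distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(ds[k - 1])
    return cores

def mutual_reachability(points, k):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)), as a dense matrix."""
    cores = core_distances(points, k)
    n = len(points)
    return [[0.0 if i == j else max(cores[i], cores[j], dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix; returns (u, v, weight) edges."""
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, weights[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges

# Hypothetical 2-D sales feature points: a tight group and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
mst = minimum_spanning_tree(mutual_reachability(points, k=2))
heaviest = max(mst, key=lambda e: e[2])
```

Taking the maximum with the core distances pushes sparse points further away from everything else, so an isolated point such as the last one attaches to the tree through a single heavy edge; removing heavy edges first is what separates such points when the hierarchy is built.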
In a specific embodiment of the present disclosure, the step S4 includes steps S41, S42, S43, S44, S45, and S46.
Step S41, acquiring third sub information, wherein the third sub information comprises normal sales data information of historical commodities and abnormal sales data information of historical commodities;
Step S42, dividing the third sub information into a test set and a verification set, and sending the test set of the third sub information to the second clustering module for processing to obtain historical abnormal commodity sales data;
Step S43, comparing the historical abnormal commodity sales data with the verification set to obtain verification result information;
Step S44, performing grey correlation analysis on the verification result information and all initial parameters in the anomaly detection model to obtain the degree of correlation between the verification result information and each initial parameter;
Step S45, adjusting the initial parameters based on the verification result information and the degrees of correlation to obtain an anomaly detection model with adjusted initial parameters, wherein, if the verification result is that the test set is inconsistent with the verification set, the initial parameter with the largest degree of correlation with the verification result is adjusted until the verification result is that the test set is consistent with the verification set;
Step S46, sending the first information to the anomaly detection model with the adjusted initial parameters for a second abnormality detection, and screening the third information against the data from the second abnormality detection to obtain the screened abnormal commodity sales data.
In the above steps, historical commodity sales data are divided into sets and sent to the anomaly detection model for detection. Grey correlation analysis is performed on the detection result and the initial parameters to obtain the degree of correlation between them, and the parameters are adjusted according to that degree to obtain the adjusted anomaly detection model. The adjusted model then performs a second round of abnormal data screening to obtain the screened abnormal commodity sales data.
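The grey correlation analysis of step S44 can be sketched with the commonly used grey relational degree, in which the coefficient for parameter i at trial k is (Δmin + ρ·Δmax) / (Δi(k) + ρ·Δmax), with Δi(k) the absolute difference between the normalized reference and comparison series and ρ usually set to 0.5. The source does not state which variant is used, so this formula, the choice of the verification score as the reference series, the parameter names and the sample values are all assumptions.

```python
def normalize(series):
    """Min-max scale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]

def grey_relational_degrees(reference, candidates, rho=0.5):
    """Grey relational degree of each candidate series to the reference series."""
    ref = normalize(reference)
    deltas = {name: [abs(r - c) for r, c in zip(ref, normalize(series))]
              for name, series in candidates.items()}
    flat = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate tracks the reference exactly.
        return {name: 1.0 for name in deltas}
    return {name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
            for name, ds in deltas.items()}

# Hypothetical verification scores over four trial runs, and the value each
# (hypothetically named) initial parameter took in those runs.
accuracy = [0.60, 0.70, 0.80, 0.90]
params = {
    "birch_threshold": [0.1, 0.2, 0.3, 0.4],
    "min_cluster_size": [5, 9, 6, 8],
}
degrees = grey_relational_degrees(accuracy, params)
most_related = max(degrees, key=degrees.get)   # candidate to adjust first (step S45)
```

Under these sample values the first parameter moves in step with the verification score and receives a degree close to 1, so it would be the one adjusted first in step S45.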
Example 2
As shown in fig. 2, the present embodiment provides an abnormal data detection apparatus, which includes a first obtaining unit 701, a first processing unit 702, a second processing unit 703 and a third processing unit 704.
A first obtaining unit 701, configured to acquire first information, wherein the first information is commodity sales data information of at least one store;
a first processing unit 702, configured to send the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
a second processing unit 703, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by screening the second information through two rounds of clustering;
and a third processing unit 704, configured to send the third information to the verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In one embodiment of the present disclosure, the first processing unit 702 includes a first processing subunit 7021, a second processing subunit 7022, and a third processing subunit 7023.
A first processing subunit 7021, configured to perform data processing on the commodity sales data information, remove invalid data in the first information, and perform mean value filling on incomplete data in the first information to obtain first sub information;
a second processing subunit 7022, configured to obtain feature data of the first information through calculation based on the first sub information and a preset calculation formula, where the feature data of the first information includes price feature data and sales feature data;
a third processing subunit 7023, configured to perform normalization on the feature data of the first information, and perform smoothing on the feature data after the normalization to obtain the preprocessed first information.
In a specific embodiment of the present disclosure, the second processing unit 703 includes a first clustering subunit 7031, a fourth processing subunit 7032, and a second clustering subunit 7033.
A first clustering subunit 7031, configured to send the price characteristic data information in the second information to a first clustering module for clustering, so as to obtain first clustering information, where the first clustering information is abnormal data information in the price characteristic data information;
a fourth processing subunit 7032, configured to perform data corresponding mapping on the first clustering information and the sales characteristic data information in the second information to obtain second sub information, where the second sub information includes the sales characteristic data information corresponding to the first clustering information;
and a second clustering subunit 7033, configured to send the second sub information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is the abnormal commodity sales data obtained by screening the second information through two rounds of clustering.
In a specific embodiment of the present disclosure, the first clustering subunit 7031 includes a third clustering subunit 70311, a fourth clustering subunit 70312, a fifth clustering subunit 70313, and a sixth clustering subunit 70314.
A third clustering subunit 70311, configured to traverse the price feature data information based on preset first initial parameter information, and process the price feature data according to a generation method of a clustering feature tree in a BIRCH algorithm to obtain a price clustering feature tree;
a fourth clustering subunit 70312, configured to obtain at least one clustering feature cluster based on the price clustering feature tree, and obtain a threshold range corresponding to each clustering feature cluster;
a fifth clustering subunit 70313, configured to analyze all the threshold ranges, and use a minimum threshold range in all the threshold ranges as a normal threshold range for determining a normal point;
a sixth clustering subunit 70314, configured to determine an abnormal point in the price clustering feature tree based on the normal threshold range, and determine abnormal data information in the price feature data information based on the abnormal point.
In a specific embodiment of the present disclosure, the second clustering subunit 7033 includes a seventh clustering subunit 70331, an eighth clustering subunit 70332, a ninth clustering subunit 70333, and a tenth clustering subunit 70334.
A seventh clustering subunit 70331, configured to perform data processing based on a preset second initial parameter and data information in the second sub information, where the data information in the second sub information is converted into coordinate data points in a spatial coordinate system, and a mutual reachable distance between each coordinate data point is obtained based on each coordinate data point;
an eighth clustering subunit 70332, configured to generate a weighted distance map based on the mutual reachable distances, and generate a minimum spanning tree of the mutual reachable distances based on the weighted distance map;
a ninth clustering subunit 70333, configured to convert the minimum spanning tree into a component of a hierarchical cluster structure according to the mutual reachable distance, and construct a hierarchical cluster structure based on the component of the hierarchical cluster structure;
a tenth clustering subunit 70334, configured to compress the hierarchical cluster structure, and classify the data information in the second sub information based on the compressed hierarchical cluster structure, to obtain abnormal data in the second sub information.
In a specific embodiment of the present disclosure, the third processing unit 704 includes a first obtaining sub-unit 7041, a fifth processing sub-unit 7042, a sixth processing sub-unit 7043, a seventh processing sub-unit 7044, an eighth processing sub-unit 7045, and a ninth processing sub-unit 7046.
A first obtaining subunit 7041, configured to obtain third sub information, where the third sub information includes normal sales data information of historical commodities and abnormal sales data information of historical commodities;
a fifth processing subunit 7042, configured to divide the third sub information into a test set and a verification set, and send the test set of the third sub information to the second clustering module for processing, so as to obtain historical abnormal commodity sales data;
a sixth processing subunit 7043, configured to compare the historical abnormal commodity sales data with the verification set, to obtain verification result information;
a seventh processing subunit 7044, configured to perform gray correlation analysis on the verification result information and all initial parameters in the anomaly detection model to obtain correlation degrees between the verification result information and all initial parameters;
an eighth processing subunit 7045, configured to adjust the initial parameters based on the verification result information and the degrees of correlation to obtain an anomaly detection model with adjusted initial parameters, wherein, if the verification result is that the test set is inconsistent with the verification set, the initial parameter with the largest degree of correlation with the verification result is adjusted until the verification result is that the test set is consistent with the verification set;
a ninth processing subunit 7046, configured to send the first information to the anomaly detection model with the adjusted initial parameters for a second abnormality detection, and screen the third information against the data from the second abnormality detection to obtain the screened abnormal commodity sales data.
It should be noted that, regarding the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
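The gray correlation analysis performed by the seventh processing subunit 7044 and the parameter selection performed by the eighth processing subunit 7045 can be illustrated with a minimal sketch. It assumes the common formulation of gray relational analysis (min-max scaling, resolution coefficient 0.5), which the text above does not specify. The parameter names `threshold` and `branching_factor`, the trial values, and the mismatch counts are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def normalize(seq: List[float]) -> List[float]:
    """Min-max scale a sequence to [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_degrees(
    reference: List[float],
    candidates: Dict[str, List[float]],
    rho: float = 0.5,  # resolution coefficient; 0.5 is a conventional choice (assumption)
) -> Dict[str, float]:
    """Return the gray relational degree of each candidate sequence to the reference."""
    ref = normalize(reference)
    # Absolute differences between the scaled reference and each scaled candidate.
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, normalize(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate coincides with the reference after scaling.
        return {name: 1.0 for name in deltas}
    degrees = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        degrees[name] = sum(coeffs) / len(coeffs)
    return degrees


# Hypothetical trial runs: mismatches between detected and labeled anomalies,
# together with the parameter values used in each run.
mismatches = [6.0, 5.0, 4.0, 2.0]
trial_parameters = {
    "threshold": [0.9, 0.75, 0.6, 0.3],          # hypothetical parameter
    "branching_factor": [50.0, 20.0, 40.0, 30.0],  # hypothetical parameter
}

degrees = grey_relational_degrees(mismatches, trial_parameters)
# The parameter most closely associated with the verification result
# is the one that would be adjusted first.
parameter_to_adjust = max(degrees, key=degrees.get)
print(degrees, parameter_to_adjust)
```

In this sketch the parameter whose trial values track the mismatch counts most closely receives the highest degree and is selected for adjustment; how the adjustment itself is carried out is left open here, as in the text.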
Example 3
Corresponding to the above method embodiment, this embodiment of the present disclosure further provides an abnormal data detection apparatus; the apparatus described below and the abnormal data detection method described above may be read with reference to each other.
Fig. 3 is a block diagram illustrating an abnormal data detecting apparatus 800 according to an exemplary embodiment. As shown in Fig. 3, the abnormal data detecting apparatus 800 may include a processor 801 and a memory 802. The abnormal data detecting apparatus 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the abnormal data detecting apparatus 800, so as to complete all or part of the steps in the abnormal data detection method. The memory 802 is used to store various types of data to support the operation of the abnormal data detecting apparatus 800; such data may include, for example, instructions for any application or method operating on the abnormal data detecting apparatus 800, as well as application-related data such as contact data, messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the abnormal data detecting apparatus 800 and other apparatuses. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the abnormal data detecting apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above abnormal data detection method.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions which, when executed by a processor, implement the steps of the above-described abnormal data detection method. For example, the computer readable storage medium may be the above-described memory 802 including program instructions executable by the processor 801 of the abnormal data detecting apparatus 800 to perform the above-described abnormal data detecting method.
Example 4
Corresponding to the above method embodiment, this embodiment of the present disclosure further provides a readable storage medium; the readable storage medium described below and the abnormal data detection method described above may be read with reference to each other.
A readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, carries out the steps of the abnormal data detection method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An abnormal data detection method, comprising:
acquiring first information, wherein the first information is commodity sales data information of at least one store;
sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
and sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is abnormal commodity sales data screened by the model after the parameters are verified.
2. The abnormal data detection method according to claim 1, wherein sending the first information to a data preprocessing model to obtain second information, the second information being information obtained by preprocessing the first information, comprises:
carrying out data processing on the commodity sales data information, clearing invalid data in the first information, and carrying out mean value filling on incomplete data in the first information to obtain first sub information;
calculating to obtain characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and normalizing the characteristic data of the first information, and smoothing the characteristic data after normalization to obtain the preprocessed first information.
3. The abnormal data detection method according to claim 1, wherein sending the second information to the anomaly detection model for abnormal data detection to obtain third information comprises:
sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
performing data corresponding mapping on the first clustering information and the sales characteristic data information in the second information to obtain second sub-information, wherein the second sub-information comprises the sales characteristic data information corresponding to the first clustering information;
and sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information.
4. The abnormal data detection method according to claim 3, wherein sending the price feature data information in the second information to a first clustering module for clustering to obtain first clustering information comprises:
traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data according to a generation method of a clustering characteristic tree in a BIRCH algorithm to obtain a price clustering characteristic tree;
obtaining at least one clustering feature cluster based on the price clustering feature tree, and obtaining a threshold range corresponding to each clustering feature cluster;
analyzing all the threshold value ranges, and taking the minimum threshold value range in all the threshold value ranges as a normal threshold value range for judging a normal point;
and determining abnormal points in the price clustering feature tree based on the normal threshold range, and judging abnormal data information in the price feature data information based on the abnormal points.
5. An abnormal data detecting apparatus, comprising:
a first acquisition unit configured to acquire first information including commodity sales data information of at least one store;
the first processing unit is used for sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
the second processing unit is used for sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
and the third processing unit is used for sending the third information to the verification module for processing to obtain fourth information, and the fourth information is abnormal commodity sales data screened by the model after the parameters are verified.
6. The abnormal data detecting apparatus according to claim 5, wherein the apparatus comprises:
the first processing subunit is used for carrying out data processing on the commodity sales data information, eliminating invalid data in the first information, and carrying out mean value filling on incomplete data in the first information to obtain first sub information;
the second processing subunit is used for calculating to obtain the characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and the third processing subunit is used for carrying out normalization processing on the feature data of the first information and carrying out smoothing processing on the feature data after the normalization processing to obtain the preprocessed first information.
7. The abnormal data detecting apparatus according to claim 5, wherein the apparatus comprises:
the first clustering subunit is used for sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
the fourth processing subunit is configured to perform data corresponding mapping on the first clustering information and the sales characteristic data information in the second information to obtain second sub-information, where the second sub-information includes the sales characteristic data information corresponding to the first clustering information;
and the second clustering subunit is used for sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information.
8. The abnormal data detecting apparatus according to claim 7, wherein the apparatus comprises:
the third clustering subunit is used for traversing the price characteristic data information based on preset first initial parameter information and processing the price characteristic data according to a generation method of a clustering characteristic tree in a BIRCH algorithm to obtain a price clustering characteristic tree;
the fourth clustering subunit is used for obtaining at least one clustering feature cluster based on the price clustering feature tree and obtaining a threshold range corresponding to each clustering feature cluster;
the fifth clustering subunit is used for analyzing all the threshold value ranges, and taking the minimum threshold value range in all the threshold value ranges as a normal threshold value range for judging normal points;
and the sixth clustering subunit is used for determining abnormal points in the price clustering feature tree based on the normal threshold range and judging abnormal data information in the price feature data information based on the abnormal points.
9. An abnormal data detecting apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the abnormal data detecting method according to any one of claims 1 to 4 when executing the computer program.
10. A readable storage medium, characterized by: the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the abnormal data detecting method according to any one of claims 1 to 4.
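As an illustration of the preprocessing recited in claims 2 and 6, the following minimal sketch clears invalid records, mean-fills incomplete ones, normalizes, and smooths a single price series. It is a sketch under stated assumptions, not the claimed implementation: treating negative values as invalid, using min-max normalization, and using a centered moving average for smoothing are all assumptions, and the feature calculation by the "preset calculation formula" is omitted because the claims do not give it. All function names are hypothetical.

```python
from typing import List, Optional


def drop_invalid(values: List[Optional[float]]) -> List[Optional[float]]:
    """Remove invalid records; here a negative value counts as invalid (assumption)."""
    return [v for v in values if v is None or v >= 0]


def mean_fill(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the present entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot mean-fill a series with no observed values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical price records: one missing entry and one invalid (negative) entry.
raw_prices = [10.0, None, 14.0, -1.0, 12.0]

filled = mean_fill(drop_invalid(raw_prices))
normalized = min_max_normalize(filled)
smoothed = moving_average(normalized, window=3)
print(filled, normalized, smoothed)
```

The order of operations follows the claim wording (cleaning, filling, normalization, then smoothing); a real deployment would apply the same steps per commodity and per store to both the price and the sales feature data.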
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210458381.2A CN114708003B (en) 2022-04-27 2022-04-27 Abnormal data detection method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210458381.2A CN114708003B (en) 2022-04-27 2022-04-27 Abnormal data detection method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114708003A (en) 2022-07-05
CN114708003B (en) 2023-11-10

Family

ID=82177116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210458381.2A Active CN114708003B (en) 2022-04-27 2022-04-27 Abnormal data detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114708003B (en)

Also Published As

Publication number Publication date
CN114708003B (en) 2023-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant