CN117235651B - Enterprise information data optimization management system based on Internet of things - Google Patents

Enterprise information data optimization management system based on Internet of things

Info

Publication number
CN117235651B
Authority
CN
China
Prior art keywords
sales
sales data
type
data sequence
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311506793.XA
Other languages
Chinese (zh)
Other versions
CN117235651A (en
Inventor
唐宁
罗志明
邱少明
李正华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Jingtai Information System Co ltd
Original Assignee
Hunan Jingtai Information System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Jingtai Information System Co ltd filed Critical Hunan Jingtai Information System Co ltd
Priority to CN202311506793.XA priority Critical patent/CN117235651B/en
Publication of CN117235651A publication Critical patent/CN117235651A/en
Application granted granted Critical
Publication of CN117235651B publication Critical patent/CN117235651B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce

Abstract

The invention relates to the technical field of data processing, and in particular to an enterprise information data optimization management system based on the Internet of Things. The system comprises a sales data acquisition module, a segmented node acquisition module, an optimization K value acquisition module and an anomaly detection module, which operate as follows: acquiring each type of sales data sequence of each sales location; acquiring a plurality of segment nodes of each type of sales data sequence of each sales location; obtaining a plurality of real interval ranges according to the segment nodes, calculating an initial K value and a weight for each real interval range, and optimizing the K value of each real interval range according to its weight and initial K value; and performing anomaly detection according to the optimized K value of each real interval range to obtain abnormal data. The adaptive K value effectively improves the accuracy of anomaly detection.

Description

Enterprise information data optimization management system based on Internet of things
Technical Field
The invention relates to the technical field of data processing, in particular to an enterprise information data optimization management system based on the Internet of things.
Background
Conventional enterprise information data management methods often fail to efficiently process large-scale, high-frequency, multi-source data streams from Internet of Things devices. Take sales data as an example. Sales data is critical to an enterprise's business and decision-making, but in an Internet of Things environment its volume and complexity keep increasing, and an enterprise may face anomalies in its sales data, such as abnormal orders, supply chain problems, fraudulent transactions, or inventory management problems, which may negatively affect its business and profits. Anomaly detection on sales data is therefore required.
The conventional LOF (local outlier factor) algorithm is a method for detecting outliers in a data set. It detects outliers by analyzing the difference between the distribution density of each datum and the distribution density of the other data in its local neighborhood. The K value in the LOF algorithm defines the size of the local neighborhood, so whether the K value is set reasonably affects the accuracy of the algorithm's outlier detection.
Sales data comes in multiple types, and the types do not all vary in the same way: some types of sales data are densely distributed and others are sparsely distributed. Performing anomaly detection on such data with a fixed K value therefore reduces the accuracy of anomaly detection.
Disclosure of Invention
The invention provides an enterprise information data optimization management system based on the Internet of things, which aims to solve the existing problems: how to accurately detect abnormal data in sales data.
The enterprise information data optimization management system based on the Internet of things adopts the following technical scheme:
the embodiment of the invention provides an enterprise information data optimization management system based on the Internet of things, which comprises the following modules:
the sales data acquisition module is used for acquiring each type of sales data sequence of each sales position in the enterprise information database;
the segmented node acquisition module is used for obtaining a plurality of change nodes of each type of sales data sequence of each sales position according to the fluctuation consistency condition of the data in each area in the sales data sequence, obtaining the confidence degree of each change node according to the time interval between each change node in each type of sales data sequence of each sales position and the change nodes of other sales data sequences, and obtaining a plurality of segment nodes of each type of sales data sequence of each sales position according to the confidence degree of each change node;
the optimizing K value acquisition module is used for dividing each type of sales data sequence of each sales position into a plurality of real interval ranges by utilizing the segmentation nodes, and obtaining an initial K value of each real interval range according to the fluctuation condition of data in each real interval range; acquiring a plurality of virtual nodes of each type of sales data sequence of each sales location, and dividing each type of sales data sequence of each sales location into a plurality of virtual interval ranges by using the virtual nodes; obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each type of sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other types of sales data sequences of other sales locations, obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range;
the anomaly detection module is used for carrying out anomaly detection on the data in each type of sales data sequence of each sales position according to the optimized K value of each real interval range to obtain anomaly data.
Preferably, the method for obtaining a plurality of change nodes of each type of sales data sequence of each sales location according to the fluctuation consistency of data in each area in the sales data sequence includes the following specific steps:
for each type of sales data sequence of any sales location, a reference size C is preset and the first sales datum of the sales data sequence is taken as the first reference point. The change node of the first reference point is acquired, together with its position order W in the sales data sequence, and the sum of W and the reference size C is recorded as the cut-off flag value of that change node. The cut-off flag value is compared with the length of the sales data sequence. When the cut-off flag value is greater than the length of the sales data sequence, the cycle ends. When the cut-off flag value is less than or equal to the length of the sales data sequence, the change node of the first reference point is taken as the second reference point, the change node of the second reference point and its position order W in the sales data sequence are acquired, the sum of W and the reference size C is recorded as the cut-off flag value of that change node, and this cut-off flag value is again compared with the length of the sales data sequence. The process repeats in this way until the cycle ends.
Preferably, the method for obtaining the change node of the first reference point includes the following specific steps:
a first window of size 1 × C is set with its left side aligned with the first reference point, and the data in the first window are acquired from the sales data sequence. The last datum in the first window is taken as the first candidate change node, and its preference degree is obtained according to the first window and compared with a preset preference threshold Y1. When the preference degree of the first candidate change node is greater than or equal to Y1, the first candidate change node is taken as the change node. When it is less than Y1, a second window of size 1 × (C+1) is set with its left side aligned with the first reference point, the data in the second window are acquired from the sales data sequence, the last datum in the second window is taken as the second candidate change node, and its preference degree is obtained according to the second window;
similarly, when the preference degree of the (n-1)-th candidate change node is less than the preset preference threshold, an n-th window of size 1 × (C+n-1) is set with its left side aligned with the first reference point, the data in the n-th window are acquired from the sales data sequence, the last datum in the n-th window is taken as the n-th candidate change node, and its preference degree is obtained according to the n-th window. This continues until the preference degree of a candidate change node is greater than or equal to the preset preference threshold, which yields the change node of the first reference point.
Preferably, the obtaining the preference degree of the first candidate change node according to the first window includes the following specific methods:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the variance of all sales data before the i-th sales datum in the first window; the variance of all sales data before the (i-1)-th sales datum in the first window; H, the number of sales data in the first window; the preset reference size; a linear normalization function; and the preference degree of the first candidate change node.
Preferably, the confidence level of each change node is obtained according to the time interval between each change node in each type of sales data sequence of each sales location and the change nodes of other sales data sequences, and a plurality of segment nodes of each type of sales data sequence of each sales location are obtained according to the confidence level of each change node, including the following specific methods:
for any sales location, take the k-th change node in the j-th type of sales data sequence; in the z-th type of sales data sequence, the change node with the smallest time interval to that change node is acquired and recorded as the reference node of that change node on the z-th type. In this way, the reference node of each change node on each type is acquired;
the confidence level of each change node is obtained according to the reference node of each change node on each type as follows:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the number of types of sales data; exp(), an exponential function with the natural constant as its base; and the confidence level of the k-th change node in the j-th type of sales data sequence of each sales location;
and taking the change node with the confidence degree larger than the preset confidence threshold value as a segmentation node.
Preferably, the obtaining the initial K value of each real interval range according to the fluctuation condition of the data in each real interval range includes the following specific methods:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the m-th sales datum in each real interval range of each type of sales data sequence of each sales location; the mean of all sales data in that real interval range; the number of sales data in that real interval range; a preset adjustment parameter; a preset reference value; the initial K value of that real interval range; and the round-up (ceiling) operator.
Preferably, the method for obtaining the plurality of virtual nodes of each type of sales data sequence of each sales location includes the following specific steps:
acquiring the positions of the segment nodes in each type of sales data sequence of each sales position, and forming a node position set from the positions of all segment nodes in all types of sales data sequences of all sales positions; each element in the node position set is called a node position;
the sales data at each node location is obtained in each type of sales data sequence for each sales location, and recorded as a virtual node of each type of sales data sequence for each sales location.
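The construction of virtual nodes can be illustrated with a short Python sketch. It assumes 1-based positions, a dictionary keyed by hypothetical (location, type) pairs, and that each node closes the range it belongs to; the text does not say whether a node ends one range or starts the next, so that choice is an assumption.

```python
def virtual_node_positions(segment_nodes_by_sequence):
    """Union of segment-node positions over all sequences.

    Keys are (location, type) pairs; values are lists of 1-based positions.
    """
    positions = set()
    for nodes in segment_nodes_by_sequence.values():
        positions.update(nodes)
    return sorted(positions)


def split_at(sequence, positions):
    """Split `sequence` into consecutive ranges, each ending at a node position.

    Assumption: a node is the last element of its range (1-based, inclusive).
    """
    ranges, start = [], 0
    for p in positions:
        if start < p <= len(sequence):
            ranges.append(sequence[start:p])
            start = p
    if start < len(sequence):
        ranges.append(sequence[start:])
    return ranges


# Hypothetical segment nodes for three sequences.
segs = {("A", "x"): [3], ("A", "y"): [5], ("B", "x"): [3, 7]}
positions = virtual_node_positions(segs)
virtual_ranges = split_at(list(range(1, 10)), positions)
```

Because the node position set contains every sequence's own segment nodes, each virtual interval range produced this way lies inside exactly one real interval range of the same sequence.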
Preferably, the method for obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each type of sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other types of sales data sequences of other sales locations includes the following specific steps:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the s-th sales datum within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the mean of the s-th sales data within that virtual interval range of the b-th type of sales data sequence over all sales locations; the mean of the s-th sales data within that virtual interval range over all types of sales data sequences of the a-th sales location; the number of sales locations; the number of types of sales data; the number of sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the similarity of each virtual interval range of the b-th type of sales data sequence of the a-th sales location; and the product (continued multiplication) operator.
Preferably, the obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range, includes the following specific steps:
obtaining a virtual interval range in each real interval range of each type of sales data sequence of each sales location, averaging the similarity of all virtual interval ranges in each real interval range of each type of sales data sequence of each sales location, and then taking the reciprocal to obtain the weight of each real interval range of each type of sales data sequence of each sales location;
the optimized K value of each real interval range is obtained according to the weight of each real interval range:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the initial K value of each real interval range of each type of sales data sequence of each sales location; the weight of that real interval range; and the optimized K value of that real interval range.
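As a minimal sketch of the weighting step only, the following Python function averages the similarities of the virtual interval ranges that fall inside each real interval range and takes the reciprocal. The argument layout (sorted 1-based end positions for both kinds of range) is an assumption made for illustration, and the sketch deliberately stops at the weight, since the expression that combines the weight with the initial K value is not reproduced in this text.

```python
def weights_for_real_ranges(real_ends, virtual_ends, virtual_similarity):
    """Weight of each real range: reciprocal of the mean similarity of the
    virtual ranges it contains.

    real_ends / virtual_ends: sorted 1-based end positions of the ranges
    (hypothetical layout). virtual_similarity: one value per virtual range.
    """
    weights = []
    v = 0
    for end in real_ends:
        sims = []
        # Collect every virtual range that ends inside this real range.
        while v < len(virtual_ends) and virtual_ends[v] <= end:
            sims.append(virtual_similarity[v])
            v += 1
        if not sims or sum(sims) == 0:
            raise ValueError("real range needs virtual ranges with positive similarity")
        weights.append(1.0 / (sum(sims) / len(sims)))
    return weights


# Two real ranges, each containing two virtual ranges.
weights = weights_for_real_ranges([5, 9], [3, 5, 7, 9], [0.5, 0.5, 0.2, 0.2])
```

A real range whose virtual ranges are less similar to the other sequences receives a larger weight under this rule.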
Preferably, the anomaly detection is performed on the data in each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data, which comprises the following specific steps:
taking the optimized K value of each real interval range of each type of sales data sequence of each sales location as the optimized K value of each sales data in each type of sales data sequence of each sales location;
the optimized K value of each sales data is used as a K value, and the LOF algorithm is utilized to process the sales data in each type of sales data sequence of each sales position, so as to obtain an outlier set and an aggregation point set;
sales data in the outlier set is determined to be outlier data.
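The detection step can be sketched in plain Python as an LOF computation in which every datum carries its own K value. This is a simplified illustration, not the patented implementation: it works on one-dimensional values, takes exactly K neighbours without handling ties, adds a small constant to avoid division by zero, and uses a score threshold of 1.5 to form the outlier set. The threshold and the example data are assumptions.

```python
def lof_scores(values, k_per_point):
    """Local outlier factor for 1-D data with a separate k for each point."""
    n = len(values)

    def neighbours(i):
        # Indices of the k nearest other points, and the k-distance of point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: abs(values[i] - values[j]))
        k = min(k_per_point[i], n - 1)
        chosen = order[:k]
        return chosen, abs(values[i] - values[chosen[-1]])

    neigh, kdist = zip(*(neighbours(i) for i in range(n)))

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(kdist[j], abs(values[i] - values[j])) for j in neigh[i]]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)

    density = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(density[j] for j in neigh[i]) / (len(neigh[i]) * density[i])
            for i in range(n)]


values = [10, 11, 12, 11, 10, 12, 50, 11, 10, 12]
k_per_point = [3] * 5 + [4] * 5   # e.g. two real interval ranges with different K
scores = lof_scores(values, k_per_point)
THRESHOLD = 1.5                   # assumed cut-off between outliers and aggregation points
outliers = [i for i, s in enumerate(scores) if s > THRESHOLD]
```

In this toy sequence only the value 50 ends up in the outlier set; all other points score close to 1 and fall into the aggregation point set.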
The technical scheme of the invention has the beneficial effects that:
Each type of sales data sequence of each sales location is acquired, and a plurality of real interval ranges are obtained according to the fluctuation consistency of the data in each area of the sales data sequence. Because the sequence is divided into real interval ranges, a K value need not be set for every sales datum; it is set only once for each real interval range containing a plurality of data, which effectively improves processing efficiency. The initial K value of each real interval range is set according to the data fluctuation within that range, which effectively prevents false detection of abnormal data caused by differences in data fluctuation. The weight of each real interval range is obtained according to the fluctuation consistency between the data in that range and the data in other real interval ranges in the same time period, so the weight reflects whether the data fluctuation in the range is caused by abnormal data or by the characteristics of the data. Correcting the initial K value with the weight yields the optimized K value of each real interval range; this correction effectively avoids a poorly set initial K value caused by the fluctuation of abnormal data, and thus the missed detection of abnormal data that such a setting would cause. Compared with conventional anomaly detection algorithms, the method sets the K value adaptively and thereby improves the accuracy of anomaly detection.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a block diagram of an enterprise information data optimization management system based on the internet of things.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description is given below of the specific implementation, structure, characteristics and effects of the enterprise information data optimization management system based on the internet of things according to the invention by combining the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the enterprise information data optimization management system based on the internet of things provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a block diagram of an enterprise information data optimization management system based on internet of things according to an embodiment of the present invention is shown, where the system includes the following modules:
a sales data acquisition module 101 for acquiring each type of sales data sequence of each sales location.
It should be noted that conventional enterprise information data management methods generally cannot effectively process large-scale, high-frequency, multi-source data streams from Internet of Things devices. Take sales data as an example. Sales data is critical to an enterprise's business and decision-making; in an Internet of Things environment its volume and complexity keep increasing, and an enterprise may face anomalies in its sales data, such as abnormal orders, supply chain problems, fraudulent transactions, or inventory management problems, which may negatively affect its business and profits. Anomaly detection on sales data is therefore required.
To implement the enterprise information data optimization management system based on the Internet of Things provided by this embodiment, each type of sales data sequence of each sales location must first be collected. The specific process is as follows: sales data of each type at each point in time are collected for each sales location in the enterprise information database; the types of sales data are not limited to any particular set. The sales data of each type at each sales location are then arranged in chronological order to obtain each type of sales data sequence of each sales location.
At this point, each type of sales data sequence of each sales location has been obtained by the above method.
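The collection step amounts to grouping records by sales location and data type and ordering each group by time. The Python sketch below shows one way to do this; the record layout `(location, data_type, timestamp, value)` and the type names `order_amount` and `units_sold` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_sales_sequences(records):
    """Group raw records into one time-ordered sequence per (location, type).

    `records` is an iterable of (location, data_type, timestamp, value)
    tuples; this layout is an assumption, not taken from the source.
    """
    grouped = defaultdict(list)
    for location, data_type, timestamp, value in records:
        grouped[(location, data_type)].append((timestamp, value))
    # Sort each group by timestamp and keep only the values.
    return {
        key: [value for _, value in sorted(items, key=lambda tv: tv[0])]
        for key, items in grouped.items()
    }


records = [
    ("store_A", "order_amount", 2, 130.0),
    ("store_A", "order_amount", 1, 120.0),
    ("store_A", "units_sold", 1, 12),
    ("store_B", "order_amount", 1, 80.0),
]
sequences = build_sales_sequences(records)
```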
The segmented node obtaining module 102 is configured to obtain a plurality of change nodes of each type of sales data sequence of each sales location, calculate a confidence level of each change node, and obtain a plurality of segment nodes of each type of sales data sequence of each sales location according to the confidence level of each change node.
It should be noted that the conventional LOF algorithm is a method for detecting outliers in a data set. It detects outliers by analyzing the difference between the distribution density of each datum and the distribution density of the other data in its local neighborhood. The K value in the LOF algorithm defines the size of the local neighborhood, so whether the K value is set reasonably affects the accuracy of the algorithm's outlier detection. Sales data comes in multiple types, and the types do not all vary in the same way: some types of sales data are densely distributed and others are sparsely distributed. Performing anomaly detection on such data with a fixed K value therefore reduces the accuracy of anomaly detection.
It should be further noted that the distribution density of the data differs within each type of sales data sequence of each sales location: data with larger fluctuation are sparsely distributed, and data with smaller fluctuation are densely distributed. Data with a similar degree of fluctuation therefore call for the same K value, so the sales data sequence is divided into sections according to the degree of fluctuation of the data in each section.
Specifically, for each type of sales data sequence of any sales location, a reference size C is preset and the first sales datum of the sales data sequence is taken as the first reference point. The change node of the first reference point is acquired, together with its position order W in the sales data sequence, and the sum of W and the reference size C is recorded as the cut-off flag value of that change node. The cut-off flag value is compared with the length of the sales data sequence. When the cut-off flag value is greater than the length of the sales data sequence, the cycle ends. When the cut-off flag value is less than or equal to the length of the sales data sequence, the change node of the first reference point is taken as the second reference point, the change node of the second reference point and its position order W in the sales data sequence are acquired, the sum of W and the reference size C is recorded as the cut-off flag value of that change node, and this cut-off flag value is again compared with the length of the sales data sequence. The process repeats in this way until the cycle ends. In this embodiment, C is taken as 20 by way of example; other values may be used in other embodiments, and this embodiment imposes no particular limit.
Similarly, a plurality of change nodes are acquired for each type of sales data sequence of each sales location.
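The outer cycle described above can be sketched in Python as follows. The search for a single change node is passed in as a caller-supplied function, `next_change_node`, which is a hypothetical name. Two points are assumptions because the text does not settle them: the last change node found before the cycle ends is kept in the result, and the cycle also stops if no change node can be found.

```python
def find_change_nodes(sequence, C, next_change_node):
    """Collect change nodes by restarting the search from the latest one.

    Positions are 1-based, as in the text. `next_change_node(sequence, ref)`
    returns the position W of the change node for reference point `ref`,
    or None if none exists (the None case is an assumption).
    """
    length = len(sequence)
    change_nodes = []
    ref = 1  # the first sales datum is the first reference point
    while True:
        W = next_change_node(sequence, ref)
        if W is None:
            break
        if W <= ref:
            raise ValueError("change node must lie after its reference point")
        change_nodes.append(W)
        cutoff = W + C  # cut-off flag value
        if cutoff > length:
            break  # too little data left for another window
        ref = W  # the change node becomes the next reference point
    return change_nodes


C = 20


def toy_next(sequence, ref):
    # Stand-in search: always accept the last datum of the first window.
    W = ref + C - 1
    return W if W <= len(sequence) else None


nodes = find_change_nodes([0.0] * 100, C, toy_next)
```

With a sequence of length 100 and C = 20, this toy search yields change nodes at positions 20, 39, 58, 77 and 96; the cut-off flag value of the last one, 116, exceeds the sequence length and ends the cycle.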
Further, the method for obtaining the change node of the first reference point comprises the following steps:
A first window of size 1 × C is set with its left side aligned with the first reference point, and the data in the first window are acquired from the sales data sequence. The last datum in the first window is taken as the first candidate change node, and its preference degree is obtained according to the first window and compared with a preset preference threshold Y1. When the preference degree of the first candidate change node is greater than or equal to Y1, the first candidate change node is taken as the change node. When it is less than Y1, a second window of size 1 × (C+1) is set with its left side aligned with the first reference point, the data in the second window are acquired from the sales data sequence, the last datum in the second window is taken as the second candidate change node, and its preference degree is obtained according to the second window.
Similarly, when the preference degree of the (n-1)-th candidate change node is less than the preset preference threshold, an n-th window of size 1 × (C+n-1) is set with its left side aligned with the first reference point, the data in the n-th window are acquired from the sales data sequence, the last datum in the n-th window is taken as the n-th candidate change node, and its preference degree is obtained according to the n-th window. This continues until the preference degree of a candidate change node is greater than or equal to the preset preference threshold, which yields the change node of the first reference point. In this embodiment, Y1 is taken as 0.7 by way of example; other values may be used in other embodiments, and this embodiment imposes no particular limit.
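The window-growing search for one change node can be sketched in Python as below. The preference degree is supplied by the caller as `preference(window)`, a hypothetical callable returning a value in [0, 1]; the `toy_preference` function used in the example is a stand-in for illustration and is not the patent's preference formula. Returning None when the window would run past the end of the sequence is also an assumption.

```python
def change_node_for(sequence, ref, C, Y1, preference):
    """Grow a window from reference point `ref` until the preference degree
    of its last element reaches the threshold Y1.

    `ref` is 1-based. Returns the 1-based position of the change node, or
    None when the window runs past the end of the sequence (assumption).
    """
    size = C  # the first window has size 1 x C
    while ref - 1 + size <= len(sequence):
        window = sequence[ref - 1 : ref - 1 + size]
        if preference(window) >= Y1:
            return ref + size - 1  # position of the last datum in the window
        size += 1  # the n-th window has size 1 x (C + n - 1)
    return None


def toy_preference(window):
    # Stand-in only: 1.0 when the last value departs from the mean of the
    # earlier values by more than 5, otherwise 0.0.
    head = window[:-1]
    mean = sum(head) / len(head)
    return 1.0 if abs(window[-1] - mean) > 5 else 0.0


sequence = [10] * 8 + [30] * 8
node = change_node_for(sequence, ref=1, C=4, Y1=0.7, preference=toy_preference)
```

Here the window grows from size 4 to size 9 before the candidate at position 9, the first value of 30, passes the threshold.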
Further, the method for obtaining the preference degree of the first candidate change node according to the first window comprises the following steps:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the variance of all sales data before the i-th sales datum in the first window, which reflects the fluctuation of the sales data before the i-th sales datum in the first window; the variance of all sales data before the (i-1)-th sales datum in the first window, which reflects the fluctuation of the sales data before the (i-1)-th sales datum in the first window; a term reflecting the difference between the fluctuation of the sales data before the i-th sales datum and the fluctuation of the sales data before the (i-1)-th sales datum in the first window, where a larger value means lower fluctuation consistency of the data, that is, a greater difference in the distribution density of the data; H, the number of sales data in the first window; the preset reference size; a linear normalization function; and the preference degree of the first candidate change node.
Further, the method for calculating the confidence level of each change node comprises the following steps:
For any sales location, take the k-th change node in the j-th type of sales data sequence; in the z-th type of sales data sequence, the change node with the smallest time interval to that change node is acquired and recorded as the reference node of that change node on the z-th type. Similarly, the reference node of each change node is acquired.
The confidence level of each change node is obtained according to the reference node of each change node as follows:
[Formula not reproduced in the extracted text.] In the formula, the quantities are: the number of types of sales data; exp(), an exponential function with the natural constant as its base; a term reflecting the distance between the k-th change node in the j-th type of sales data sequence of each sales location and its reference nodes in the other types of sales data sequences, that is, the difference in segmentation that would result if the change node were used as a segment node, where a larger value means the change node is less likely to be a segment node and its confidence level is therefore lower; and the confidence level of the k-th change node in the j-th type of sales data sequence of each sales location.
Further, a change node with the confidence degree larger than the preset confidence threshold Y2 is used as a segmentation node. In this embodiment, Y2 is taken as an example of 0.8, and other values may be taken in other embodiments, and the embodiment is not particularly limited.
Thus, the segment nodes in each type of sales data sequence of each sales location are obtained; the segment nodes divide each sequence so that data with consistent fluctuation fall within the same range.
An optimized K value obtaining module 103 is configured to: divide each type of sales data sequence of each sales location into a plurality of real interval ranges according to its segment nodes, and calculate an initial K value for each real interval range; divide each type of sales data sequence of each sales location into a plurality of virtual interval ranges according to the segment nodes of all types at all sales locations, and calculate the similarity of each virtual interval range; obtain the weight of each real interval range of each type at each sales location according to the similarity of each virtual interval range; and obtain the optimized K value of each real interval range of each type at each sales location according to the initial K value and the weight.
Specifically, each type of sales data sequence of each sales location is divided into several real interval ranges by its segment nodes.
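A minimal sketch of this division, assuming segment nodes are given as 0-based positions and that each node starts a new half-open interval `[start, end)`; the text does not state which side of the boundary a node belongs to, so that convention is an assumption, and the function name is hypothetical.

```python
def split_by_nodes(sequence, node_positions):
    """Split a sequence into consecutive (start, end) intervals at the given positions."""
    # Positions at the very start or beyond the end do not create a new interval.
    cuts = sorted({p for p in node_positions if 0 < p < len(sequence)})
    bounds = [0] + cuts + [len(sequence)]
    return list(zip(bounds, bounds[1:]))

# Example: a 10-point sequence with segment nodes at positions 3 and 7.
real_intervals = split_by_nodes(list(range(10)), [3, 7])
```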
It should be noted that the fluctuation consistency of the sales data within each real interval range is high, so the sales data within one real interval range can share the same K value. A K value therefore needs to be set only for each real interval range rather than for each sales data, which effectively reduces the amount of computation.
It should further be noted that sales data inherently fluctuate to different degrees, so judging abnormality directly from the data density in a fixed K-neighborhood gives low accuracy. Some sales data fluctuate strongly without being abnormal; to reduce such misjudgments, the distribution density of more neighboring data should be taken into account when judging them.
Further, the initial K value calculation method of each real interval range is as follows:
wherein the formula uses the m-th sales data in each real interval range of each type of sales data sequence of each sales location, the mean of all sales data in that real interval range, and the number of sales data in that real interval range. The term built from these quantities reflects the fluctuation of the data in each real interval range: the larger its value, the larger the K value required, that is, the more neighborhood data must be referenced to judge whether the data in that range are abnormal. The formula also uses a preset adjustment parameter and a preset reference value, and its result is the initial K value of each real interval range in each type of sales data sequence of each sales location. This embodiment takes 0.5 and 10, respectively, as example values of the preset adjustment parameter and the preset reference value; other embodiments may use other values, and this embodiment imposes no particular limitation. The symbol ⌈ ⌉ denotes rounding up.
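The formula itself is not recoverable from this text, so the following is only a sketch of one plausible reading. It assumes the fluctuation term is the population variance of the interval, and that the initial K value is the reference value plus the adjustment parameter times that variance, rounded up. Both the combination and the function name are assumptions, not the patented formula.

```python
import math

def initial_k(interval_data, adjust=0.5, reference=10):
    """Assumed initial K: ceil(reference + adjust * population variance)."""
    n = len(interval_data)
    mean = sum(interval_data) / n
    # Population variance: mean squared deviation from the interval mean.
    variance = sum((x - mean) ** 2 for x in interval_data) / n
    # ASSUMPTION: additive combination; the source formula may differ.
    return math.ceil(reference + adjust * variance)
```

Under this reading a perfectly flat interval keeps the reference value as its K, and K grows with the fluctuation of the interval.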
It should be noted that setting the K value of each real interval range only according to its degree of fluctuation gives low detection accuracy. A large degree of fluctuation is likely to be caused by abnormal data, and setting a large K value for such data may cause abnormal data to be missed. To prevent this problem, the initial K value needs to be adjusted.
It should further be noted that large fluctuations in sales data are generally caused by external interference. External interference does not usually affect only one type of sales data at one sales location; it should cause large fluctuations in several types of sales data at several sales locations, and these fluctuations should be similar to one another. The initial K value can therefore be corrected by analyzing how consistent the fluctuations of the various types of sales data at the various sales locations are within each real interval range.
Further, the positions of the segment nodes in each type of sales data sequence of each sales location are obtained, and the positions of all segment nodes in all types of sales data sequences of all sales locations are formed into a node position set. Each element in the set of node locations is referred to as a node location.
The sales data at each node location is obtained in each type of sales data sequence for each sales location, and recorded as a virtual node of each type of sales data sequence for each sales location.
Each type of sales data sequence of each sales location is then divided into a plurality of virtual interval ranges by all of its virtual nodes.
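The construction of the node position set and the virtual interval ranges can be sketched as below. The sketch assumes that all sequences share one time index and one length, so a single list of virtual intervals applies to every sequence; the dictionary keyed by (location, type) and the function name are hypothetical.

```python
def virtual_intervals(seq_len, segment_nodes_by_sequence):
    """Union the segment-node positions of all sequences and split at every one."""
    positions = set()
    for nodes in segment_nodes_by_sequence.values():
        positions.update(nodes)
    # Assumed convention: each node position starts a new half-open interval.
    cuts = sorted(p for p in positions if 0 < p < seq_len)
    bounds = [0] + cuts + [seq_len]
    return list(zip(bounds, bounds[1:]))

# Example: three sequences of length 10 with different segment nodes.
nodes = {("A", "qty"): [3], ("A", "revenue"): [7], ("B", "qty"): [3, 5]}
intervals = virtual_intervals(10, nodes)
```

Because the virtual nodes are a superset of each sequence's own segment nodes, every virtual interval lies entirely inside one real interval of that sequence.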
The method for obtaining the similarity of each virtual interval range comprises the following steps:
wherein the formula uses: the s-th sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the mean, over all sales locations, of the s-th sales data within each virtual interval range of the b-th type of sales data sequence; the mean, over all types of sales data sequences of the a-th sales location, of the s-th sales data within each virtual interval range; the number of sales locations; the number of types of sales data; and the number of sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location. The result is the similarity of each virtual interval range of the b-th type of sales data sequence of the a-th sales location. The symbol ∏ denotes a continued product.
It should be noted that the similarity calculation follows the calculation of the Pearson correlation coefficient, which is prior art and is not described in detail here.
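For reference, the standard Pearson correlation coefficient between two equal-length series can be computed as in the sketch below. This is the textbook coefficient only, not the similarity formula of this embodiment, which combines means over sales locations and over types in a way that is not recoverable from this text. Returning 0.0 for a constant series is a hypothetical convention.

```python
import math

def pearson(xs, ys):
    """Textbook Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # hypothetical convention: undefined correlation treated as 0
    return cov / (norm_x * norm_y)
```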
Further, the method for obtaining the weight of each real interval range according to the similarity of each virtual interval range comprises the following steps:
The virtual interval ranges contained in each real interval range of each type of sales data sequence of each sales location are obtained; the similarities of all these virtual interval ranges are averaged, and the reciprocal of the average is taken as the weight of that real interval range.
The weight of each real interval range is thus obtained, and the initial K value of each real interval range is corrected by this weight.
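The weight computation described above (reciprocal of the mean similarity of the virtual interval ranges inside a real interval range) can be sketched as follows. Intervals are assumed to be half-open `(start, end)` pairs, similarities are assumed to be positive so that the reciprocal is defined, and the function name is hypothetical.

```python
def real_interval_weights(real_intervals, virtual_intervals, similarities):
    """Weight of each real interval = 1 / mean similarity of the virtual intervals it contains."""
    weights = []
    for r_start, r_end in real_intervals:
        inside = [sim
                  for (v_start, v_end), sim in zip(virtual_intervals, similarities)
                  if r_start <= v_start and v_end <= r_end]
        mean_similarity = sum(inside) / len(inside)
        weights.append(1.0 / mean_similarity)  # assumes mean_similarity > 0
    return weights

# Example: the first real interval contains two virtual intervals, the second one.
weights = real_interval_weights(
    [(0, 5), (5, 10)],
    [(0, 3), (3, 5), (5, 10)],
    [0.5, 1.0, 0.25],
)
```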
Further, the optimized K value for each real interval range is:
wherein the terms of the formula are the initial K value of each real interval range of each type of sales data sequence of each sales location, the weight of that real interval range, and the resulting optimized K value of that real interval range.
The anomaly detection module 104 is configured to perform anomaly detection on each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data.
Specifically, the optimized K value of each real section range of each type of sales data sequence of each sales location is taken as the optimized K value of each sales data in each type of sales data sequence of each sales location.
Taking the optimized K value of each sales data as its K value, the sales data in each type of sales data sequence of each sales location are processed with the LOF algorithm to obtain an outlier set and an aggregation point set.
Sales data in the outlier set is determined to be outlier data.
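A minimal pure-Python sketch of this step is given below. It implements the standard LOF definitions (k-distance, reachability distance, local reachability density) on one-dimensional sales values and, as one possible reading of "the optimized K value of each sales data", scores each point with the LOF computed for that point's own K. Using only the sales value as the distance feature, the outlier threshold of 1.5, and all function names are assumptions.

```python
def lof_scores(values, k):
    """Standard LOF scores for 1-D values with one neighbourhood size k."""
    n = len(values)
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        kd = order[k - 1][0]                      # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for d, j in order if d <= kd])  # ties are kept
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], abs(values[i] - values[j])) for j in neighbours[i]]
        lrd.append(len(reach) / (sum(reach) + 1e-12))  # epsilon guards duplicate values
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def detect_outliers(values, k_per_point, threshold=1.5):
    """Return indices whose LOF, computed with that point's own K, exceeds the threshold."""
    cache, outliers = {}, []
    for i, k in enumerate(k_per_point):
        k = max(1, min(k, len(values) - 1))       # K must be between 1 and n-1
        if k not in cache:
            cache[k] = lof_scores(values, k)
        if cache[k][i] > threshold:               # threshold is a hypothetical choice
            outliers.append(i)
    return outliers

sales = [10, 11, 10, 12, 11, 50, 10, 11]
outlier_indices = detect_outliers(sales, [3, 3, 3, 3, 4, 4, 4, 4])
```

Points not returned by `detect_outliers` correspond to the aggregation point set. Scores are cached per distinct K, so the cost grows with the number of distinct optimized K values rather than with the number of sales data.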
The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. The enterprise information data optimization management system based on the Internet of things is characterized by comprising the following modules:
the sales data acquisition module is used for acquiring each type of sales data sequence of each sales position in the enterprise information database;
the segmented node acquisition module is used for obtaining a plurality of changed nodes of each type of sales data sequence of each sales position according to the fluctuation consistency condition of the data in each area in the sales data sequence, obtaining the confidence degree of each changed node according to the time interval between each changed node in each type of sales data sequence of each sales position and the changed nodes of other sales data sequences, and obtaining a plurality of segmented nodes of each type of sales data sequence of each sales position according to the confidence degree of each changed node;
the optimizing K value acquisition module is used for dividing each type of sales data sequence of each sales position into a plurality of real interval ranges by utilizing the segmentation nodes, and obtaining an initial K value of each real interval range according to the fluctuation condition of data in each real interval range; acquiring a plurality of virtual nodes of each type of sales data sequence of each sales location, and dividing each type of sales data sequence of each sales location into a plurality of virtual interval ranges by using the virtual nodes; obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each type of sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other types of sales data sequences of other sales locations, obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range;
the anomaly detection module is used for carrying out anomaly detection on data in each type of sales data sequence of each sales position according to the optimized K value of each real interval range to obtain anomaly data;
wherein obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range, comprises the following specific steps:
obtaining the virtual interval ranges within each real interval range of each type of sales data sequence of each sales location, averaging the similarities of all virtual interval ranges within that real interval range, and taking the reciprocal of the average as the weight of each real interval range of each type of sales data sequence of each sales location;
the optimized K value of each real interval range is obtained according to the weight value of each real interval range and is as follows:
wherein the terms of the formula are the initial K value of each real interval range of each type of sales data sequence of each sales location, the weight of that real interval range, and the resulting optimized K value of that real interval range.
2. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining a plurality of change nodes of each type of sales data sequence of each sales location according to the fluctuation consistency of data in each area in the sales data sequence comprises the following specific steps:
for each type of sales data sequence of any sales location, presetting a reference size C; taking the first sales data of the sales data sequence as a first datum point, obtaining the change node of the first datum point, and obtaining the position index W of that change node in the sales data sequence; recording the sum of the position index of the change node of the first datum point and the reference size C as the cut-off mark value of that change node, and comparing the cut-off mark value with the length of the sales data sequence; when the cut-off mark value is greater than the length of the sales data sequence, ending the loop; when the cut-off mark value is less than or equal to the length of the sales data sequence, taking the change node of the first datum point as a second datum point, obtaining the change node of the second datum point and its position index W in the sales data sequence, recording the sum of that position index and the reference size C as the cut-off mark value of the change node of the second datum point, and comparing it with the length of the sales data sequence; when the cut-off mark value is greater than the length of the sales data sequence, ending the loop; and so on, until the loop ends.
3. The system for optimizing and managing enterprise information data based on the internet of things according to claim 2, wherein the method for obtaining the change node of the first datum point comprises the following specific steps:
setting a first window of size 1×C, aligning the left side of the first window with the first datum point, and obtaining the data of the sales data sequence within the first window; taking the last data in the first window as a first candidate change node, and obtaining the preference degree of the first candidate change node according to the first window; comparing the preference degree of the first candidate change node with a preset preference threshold Y1; when the preference degree of the first candidate change node is greater than or equal to the preset preference threshold, taking the first candidate change node as the change node; when the preference degree of the first candidate change node is less than the preset preference threshold, setting a second window of size 1×(C+1), aligning the left side of the second window with the first datum point, obtaining the data of the sales data sequence within the second window, taking the last data in the second window as a second candidate change node, and obtaining the preference degree of the second candidate change node according to the second window;
similarly, when the preference degree of the (n-1)-th candidate change node is less than the preset preference threshold, setting an n-th window of size 1×(C+n-1), aligning the left side of the n-th window with the first datum point, obtaining the data of the sales data sequence within the n-th window, taking the last data in the n-th window as the n-th candidate change node, and obtaining the preference degree of the n-th candidate change node according to the n-th window, until the preference degree of a candidate change node is greater than or equal to the preset preference threshold, whereupon the search ends and the change node of the first datum point is obtained.
4. The system for optimizing and managing enterprise information data based on the internet of things according to claim 3, wherein the obtaining the preference degree of the first candidate change node according to the first window comprises the following specific methods:
wherein the formula uses the variance of all sales data before the i-th sales data in the first window and the variance of all sales data before the (i-1)-th sales data in the first window; H represents the number of sales data in the first window; C represents the preset reference size; the normalization function denotes linear normalization; and the result of the formula is the preference degree of the first candidate change node.
5. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the obtaining the confidence level of each change node according to the time interval between each change node in each type of sales data sequence of each sales location and the change nodes of other sales data sequences, and obtaining the plurality of segment nodes of each type of sales data sequence of each sales location according to the confidence level of each change node comprises the following specific methods:
for any sales location, taking the k-th change node in the j-th type of sales data sequence as the current change node; in the sales data sequence of the z-th type, recording the change node whose time interval to the current change node is the smallest as the reference node of the current change node in the z-th type; and obtaining the reference node of each change node in each type;
the method for obtaining the confidence level of each node according to the reference node of each change node on each type comprises the following steps:
wherein the formula uses the number of types of sales data; exp() denotes the exponential function with the natural constant as its base; and the result of the formula is the confidence level of the k-th change node in the j-th type of sales data sequence of each sales location;
and taking each change node whose confidence level is greater than the preset confidence threshold as a segment node.
6. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the initial K value of each real interval range according to the fluctuation condition of the data in each real interval range comprises the following specific steps:
wherein the formula uses the m-th sales data in each real interval range of each type of sales data sequence of each sales location, the mean of all sales data in that real interval range, the number of sales data in that real interval range, a preset adjustment parameter, and a preset reference value; the result of the formula is the initial K value of each real interval range in each type of sales data sequence of each sales location; and the symbol ⌈ ⌉ denotes rounding up.
7. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the plurality of virtual nodes of each type of sales data sequence of each sales location comprises the following specific steps:
acquiring the positions of the segment nodes in each type of sales data sequence of each sales position, and forming a node position set from the positions of all segment nodes in all types of sales data sequences of all sales positions; each element in the node position set is called a node position;
the sales data at each node location is obtained in each type of sales data sequence for each sales location, and recorded as a virtual node of each type of sales data sequence for each sales location.
8. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other sales data sequences of other sales locations comprises the following specific steps:
wherein the formula uses: the s-th sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the mean, over all sales locations, of the s-th sales data within each virtual interval range of the b-th type of sales data sequence; the mean, over all types of sales data sequences of the a-th sales location, of the s-th sales data within each virtual interval range; the number of sales locations; the number of types of sales data; and the number of sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the result of the formula is the similarity of each virtual interval range of the b-th type of sales data sequence of the a-th sales location; and the symbol ∏ denotes a continued product.
9. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for performing anomaly detection on data in each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data comprises the following specific steps:
taking the optimized K value of each real interval range of each type of sales data sequence of each sales location as the optimized K value of each sales data in each type of sales data sequence of each sales location;
the optimized K value of each sales data is used as a K value, and the LOF algorithm is utilized to process the sales data in each type of sales data sequence of each sales position, so as to obtain an outlier set and an aggregation point set;
sales data in the outlier set is determined to be outlier data.
CN202311506793.XA 2023-11-14 2023-11-14 Enterprise information data optimization management system based on Internet of things Active CN117235651B (en)

Publications (2)

Publication Number Publication Date
CN117235651A CN117235651A (en) 2023-12-15
CN117235651B true CN117235651B (en) 2024-02-02

Family

ID=89088410

Country Status (1)

Country Link
CN (1) CN117235651B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577275A (en) * 2022-11-11 2023-01-06 山东产业技术研究院智能计算研究院 Time sequence data anomaly monitoring system and method based on LOF and isolated forest
CN116089846A (en) * 2023-04-03 2023-05-09 北京智蚁杨帆科技有限公司 New energy settlement data anomaly detection and early warning method based on data clustering
CN116644373A (en) * 2023-07-27 2023-08-25 深圳恒邦新创科技有限公司 Automobile flow data analysis management system based on artificial intelligence
WO2023174002A1 (en) * 2022-03-18 2023-09-21 华为技术有限公司 System monitoring method and apparatus
CN116873156A (en) * 2023-09-05 2023-10-13 山东航宇游艇发展有限公司 Intelligent monitoring method for power abnormality of natural gas ship based on big data
CN116957634A (en) * 2023-09-19 2023-10-27 贵昌集团有限公司 Information intelligent acquisition processing method for electronic commerce platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102246303B1 (en) * 2021-03-04 2021-04-29 한국과학기술원 Real-time outlier detection method and apparatus in multidimensional data stream

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An improved LOF outlier detection algorithm; Zhou Peng et al.; Computer Technology and Development (12); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant