CN117235651B

CN117235651B - Enterprise information data optimization management system based on Internet of things

Info

Publication number: CN117235651B
Application number: CN202311506793.XA
Authority: CN
Inventors: 唐宁; 罗志明; 邱少明; 李正华
Original assignee: Hunan Jingtai Information System Co ltd
Current assignee: Hunan Jingtai Information System Co ltd
Priority date: 2023-11-14
Filing date: 2023-11-14
Publication date: 2024-02-02
Anticipated expiration: 2043-11-14
Also published as: CN117235651A

Abstract

The invention relates to the technical field of data processing, and in particular to an enterprise information data optimization management system based on the Internet of things. The system comprises a sales data acquisition module, a segmented node acquisition module, an optimized K value acquisition module and an anomaly detection module, which are used for: acquiring each type of sales data sequence of each sales location; acquiring a plurality of segment nodes of each type of sales data sequence of each sales location; obtaining a plurality of real interval ranges according to the segment nodes, calculating an initial K value and a weight of each real interval range, and optimizing the K value of each real interval range according to its weight and initial K value; and performing anomaly detection according to the optimized K value of each real interval range to obtain abnormal data. The adaptive K value effectively improves the accuracy of anomaly detection.

Description

Enterprise information data optimization management system based on Internet of things

Technical Field

The invention relates to the technical field of data processing, in particular to an enterprise information data optimization management system based on the Internet of things.

Background

Conventional enterprise information data management methods often fail to efficiently process large-scale, high-frequency, multi-source data streams from internet of things devices. Taking sales data in enterprise information data as an example, sales data is critical to the business and decision of an enterprise, but in an internet of things environment, the volume and complexity of sales data are increasing, and an enterprise may face abnormal situations in sales data, such as abnormal orders, supply chain problems, fraudulent transactions, or inventory management problems, which may negatively affect the business and profits of an enterprise. Therefore, abnormality detection of sales data is required.

The conventional LOF (local outlier factor) algorithm is a method for detecting outliers in a data set. The algorithm detects outliers by analyzing the difference between the distribution density of each data point and the distribution density of the other data in its local neighborhood. The K value in the LOF algorithm defines the size of the local neighborhood, so whether the K value is set reasonably affects the accuracy of the algorithm's outlier detection.

Sales data comes in multiple types, and the types do not all vary in the same way: some types of sales data are densely distributed, while others are sparsely distributed. Performing anomaly detection on all of them with a fixed K value therefore reduces the accuracy of anomaly detection.

Disclosure of Invention

The invention provides an enterprise information data optimization management system based on the Internet of things, which aims to solve the existing problem of how to accurately detect abnormal data in sales data.

The enterprise information data optimization management system based on the Internet of things adopts the following technical scheme:

the embodiment of the invention provides an enterprise information data optimization management system based on the Internet of things, which comprises the following modules:

the sales data acquisition module is used for acquiring each type of sales data sequence of each sales position in the enterprise information database;

the segmented node acquisition module is used for obtaining a plurality of changed nodes of each type of sales data sequence of each sales position according to the fluctuation consistency condition of the data in each area in the sales data sequence, obtaining the confidence degree of each changed node according to the time interval between each changed node in each type of sales data sequence of each sales position and the changed nodes of other sales data sequences, and obtaining a plurality of segmented nodes of each type of sales data sequence of each sales position according to the confidence degree of each changed node;

the optimizing K value acquisition module is used for dividing each type of sales data sequence of each sales position into a plurality of real interval ranges by utilizing the segmentation nodes, and obtaining an initial K value of each real interval range according to the fluctuation condition of data in each real interval range; acquiring a plurality of virtual nodes of each type of sales data sequence of each sales location, and dividing each type of sales data sequence of each sales location into a plurality of virtual interval ranges by using the virtual nodes; obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each type of sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other types of sales data sequences of other sales locations, obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range;

the anomaly detection module is used for carrying out anomaly detection on the data in each type of sales data sequence of each sales position according to the optimized K value of each real interval range to obtain anomaly data.

Preferably, the method for obtaining a plurality of change nodes of each type of sales data sequence of each sales location according to the fluctuation consistency of data in each area in the sales data sequence includes the following specific steps:

for each type of sales data sequence of any sales location, presetting a reference size C and taking the first sales data of the sales data sequence as a first datum point; acquiring the change node of the first datum point and its position order W in the sales data sequence; recording the sum of the position order W and the reference size C as the cut-off mark value of the change node of the first datum point, and comparing this cut-off mark value with the length of the sales data sequence; when the cut-off mark value is larger than the length of the sales data sequence, ending the cycle; when the cut-off mark value is smaller than or equal to the length of the sales data sequence, taking the change node of the first datum point as a second datum point, acquiring the change node of the second datum point and its position order W in the sales data sequence, recording the sum of this position order and the reference size C as the cut-off mark value of the change node of the second datum point, and comparing this cut-off mark value with the length of the sales data sequence; when the cut-off mark value is larger than the length of the sales data sequence, ending the cycle; and so on until the cycle ends.

Preferably, the method for obtaining the change node of the first datum point includes the following specific steps:

setting a first window with a size of 1×C, aligning the left side of the first window with the first datum point, acquiring the data in the first window in the sales data sequence, taking the last data in the first window as a first candidate change node, obtaining the preference degree of the first candidate change node according to the first window, and comparing the preference degree of the first candidate change node with a preset preference threshold Y1; when the preference degree of the first candidate change node is greater than or equal to the preset preference threshold, taking the first candidate change node as the change node; when the preference degree of the first candidate change node is less than the preset preference threshold, setting a second window with a size of 1×(C+1), aligning the left side of the second window with the first datum point, acquiring the data in the second window in the sales data sequence, taking the last data in the second window as a second candidate change node, and obtaining the preference degree of the second candidate change node according to the second window;

similarly, when the preference degree of the (n-1)-th candidate change node is smaller than the preset preference threshold, setting an n-th window with a size of 1×(C+n-1), aligning the left side of the n-th window with the first datum point, acquiring the data in the n-th window in the sales data sequence, taking the last data in the n-th window as the n-th candidate change node, and obtaining the preference degree of the n-th candidate change node according to the n-th window; the process ends when the preference degree of a candidate change node is greater than or equal to the preset preference threshold, and that candidate change node is the change node of the first datum point.

Preferably, the obtaining the preference degree of the first candidate change node according to the first window includes the following specific methods:

wherein the terms of the formula are: the variance of all sales data before the i-th sales data in the first window; the variance of all sales data before the (i-1)-th sales data in the first window; H, the number of sales data in the first window; C, the preset reference size; a linear normalization function; and the preference degree of the first candidate change node.

Preferably, the confidence level of each change node is obtained according to the time interval between each change node in each type of sales data sequence of each sales location and the change nodes of other sales data sequences, and a plurality of segment nodes of each type of sales data sequence of each sales location are obtained according to the confidence level of each change node, including the following specific methods:

for any sales location, taking the k-th change node in the j-th type of sales data sequence, acquiring, in the z-th type of sales data sequence, the change node with the nearest time interval to that change node, and recording it as the reference node of that change node on the z-th type; in the same way, acquiring a reference node of each change node on each type;

the method for obtaining the confidence level of each change node according to the reference node of each change node on each type comprises the following steps:

wherein the terms of the formula are: the number of types of sales data; exp(), an exponential function with the natural constant as its base; and the confidence level of the k-th change node in the j-th type of sales data sequence of each sales location;

and taking the change node with the confidence degree larger than the preset confidence threshold value as a segmentation node.

Preferably, the obtaining the initial K value of each real interval range according to the fluctuation condition of the data in each real interval range includes the following specific methods:

wherein the terms of the formula are: the m-th sales data in each real interval range of each type of sales data sequence of each sales location; the mean of all sales data in that real interval range; the number of sales data in that real interval range; a preset adjustment parameter; a preset reference value; the round-up operator; and the initial K value of each real interval range of each type of sales data sequence of each sales location.

Preferably, the method for obtaining the plurality of virtual nodes of each type of sales data sequence of each sales location includes the following specific steps:

acquiring the positions of the segment nodes in each type of sales data sequence of each sales position, and forming a node position set from the positions of all segment nodes in all types of sales data sequences of all sales positions; each element in the node position set is called a node position;

the sales data at each node location is obtained in each type of sales data sequence for each sales location, and recorded as a virtual node of each type of sales data sequence for each sales location.
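The two steps above can be sketched in a few lines. This is only an illustration under assumptions the text does not state: positions are 0-based indices, every sequence has the same length, and the keys `("store_a", "revenue")` and so on are hypothetical names for (sales location, type) pairs.

```python
from typing import Dict, List, Sequence, Tuple

def node_position_set(
    segment_nodes: Dict[Tuple[str, str], List[int]]
) -> List[int]:
    """Union of segment-node positions over every (location, type) sequence."""
    positions = set()
    for nodes in segment_nodes.values():
        positions.update(nodes)
    return sorted(positions)

def virtual_nodes(series: Sequence[float], positions: List[int]) -> List[float]:
    """Sales data found at each node position of one sequence."""
    return [series[p] for p in positions if p < len(series)]

# Hypothetical segment nodes of three sequences.
segment_nodes = {
    ("store_a", "revenue"): [3, 9],
    ("store_a", "orders"): [4, 9],
    ("store_b", "revenue"): [6],
}
positions = node_position_set(segment_nodes)
values_at_nodes = virtual_nodes(list(range(100, 112)), positions)
```

Because the node position set contains the segment nodes of every sequence, each sequence is cut at its own segment nodes and at those of all other sequences, so the virtual interval ranges nest inside the real interval ranges.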

Preferably, the method for obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each type of sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other types of sales data sequences of other sales locations includes the following specific steps:

wherein the terms of the formula are: the s-th sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the mean of the s-th sales data within each virtual interval range of the b-th type of sales data sequences of all sales locations; the mean of the s-th sales data within each virtual interval range of all types of sales data sequences of the a-th sales location; the number of sales locations; the number of types of sales data; the number of sales data within each virtual interval range of the b-th type of sales data sequence of the a-th sales location; the continued-product operator; and the similarity of each virtual interval range of the b-th type of sales data sequence of the a-th sales location.

Preferably, the obtaining the weight of each real interval range according to the similarity of each virtual interval range, and optimizing the initial K value according to the weight of each real interval range to obtain the optimized K value of each real interval range, includes the following specific steps:

obtaining a virtual interval range in each real interval range of each type of sales data sequence of each sales location, averaging the similarity of all virtual interval ranges in each real interval range of each type of sales data sequence of each sales location, and then taking the reciprocal to obtain the weight of each real interval range of each type of sales data sequence of each sales location;

the optimized K value of each real interval range is obtained according to the weight of each real interval range:

wherein the terms of the formula are: the initial K value of each real interval range of each type of sales data sequence of each sales location; the weight of each real interval range of each type of sales data sequence of each sales location; and the optimized K value of each real interval range of each type of sales data sequence of each sales location.
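The weight step described above (average the similarities of the virtual interval ranges inside a real interval range, then take the reciprocal) can be sketched as follows. The tuple layouts and the requirement that the mean similarity be positive are assumptions made for the illustration; the sketch stops at the weight and does not attempt the optimized K value.

```python
from typing import List, Tuple

Virtual = Tuple[int, int, float]  # (start index, end index, similarity)

def real_interval_weight(real: Tuple[int, int], virtuals: List[Virtual]) -> float:
    """Reciprocal of the mean similarity of the virtual intervals inside `real`."""
    start, end = real
    inside = [sim for (v_start, v_end, sim) in virtuals
              if start <= v_start and v_end <= end]
    if not inside:
        raise ValueError("no virtual interval lies inside the real interval")
    mean_similarity = sum(inside) / len(inside)
    if mean_similarity <= 0:  # assumption: similarities are positive
        raise ValueError("mean similarity must be positive")
    return 1.0 / mean_similarity

# Hypothetical similarities: two virtual intervals fall inside the real one.
virtuals = [(0, 3, 0.5), (4, 9, 0.25), (10, 15, 0.9)]
weight = real_interval_weight((0, 9), virtuals)
```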

Preferably, the anomaly detection is performed on the data in each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data, which comprises the following specific steps:

taking the optimized K value of each real interval range of each type of sales data sequence of each sales location as the optimized K value of each sales data in each type of sales data sequence of each sales location;

the optimized K value of each sales data is used as a K value, and the LOF algorithm is utilized to process the sales data in each type of sales data sequence of each sales position, so as to obtain an outlier set and an aggregation point set;

sales data in the outlier set is determined to be outlier data.
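A minimal pure-Python sketch of LOF with a separate K value per data point is given below. It is one possible reading of the steps above, not the patented implementation: the data is treated as one-dimensional values, each point uses its own K for its neighborhood and k-distance, and the score threshold of 1.5 that separates the outlier set from the aggregation point set is an assumed value.

```python
from typing import List, Sequence

def lof_scores(values: Sequence[float], ks: Sequence[int]) -> List[float]:
    """Local outlier factor of each value, using ks[p] neighbors for point p."""
    n = len(values)
    k_dist: List[float] = []
    neighbors: List[List[int]] = []
    for p in range(n):
        dists = sorted((abs(values[p] - values[q]), q) for q in range(n) if q != p)
        k = max(1, min(ks[p], n - 1))
        kd = dists[k - 1][0]                      # distance to the k-th neighbor
        k_dist.append(kd)
        neighbors.append([q for d, q in dists if d <= kd])
    lrd: List[float] = []
    for p in range(n):
        # Reachability distance from p to each neighbor o.
        reach = [max(k_dist[o], abs(values[p] - values[o])) for o in neighbors[p]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate values
    return [
        sum(lrd[o] for o in neighbors[p]) / (len(neighbors[p]) * lrd[p])
        for p in range(n)
    ]

values = [10, 11, 10, 12, 11, 50, 10, 11]
ks = [3] * len(values)        # in practice, each point inherits its interval's K
scores = lof_scores(values, ks)
THRESHOLD = 1.5               # assumed cut-off between the two sets
outlier_set = [i for i, s in enumerate(scores) if s > THRESHOLD]
aggregation_set = [i for i, s in enumerate(scores) if s <= THRESHOLD]
```

In this example only the value 50 lands in the outlier set; the remaining values all score 1.0.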

The technical scheme of the invention has the beneficial effects that:

Each type of sales data sequence of each sales location is obtained, and a plurality of real interval ranges are obtained according to the fluctuation consistency of the data in each area of the sales data sequence. With the sequence divided into real interval ranges, a K value does not need to be set for each sales data; it only needs to be set for each real interval range containing a plurality of data, which effectively improves processing efficiency. The initial K value of each real interval range is set according to the fluctuation of the data within it, which effectively prevents false detection of abnormal data caused by differences in data fluctuation. The weight of each real interval range is obtained according to the fluctuation consistency between the data in that real interval range and the data in other real interval ranges in the same time period; the weight reflects whether the data fluctuation in the real interval range is caused by abnormal data or by the characteristics of the data. The initial K value is corrected by the weight to obtain the optimized K value of each real interval range. This correction effectively avoids a poorly set initial K value caused by the fluctuation of abnormal data, and therefore also avoids the missed detection of abnormal data that a poorly set initial K value would cause. Compared with the traditional anomaly detection algorithm, the method sets the K value adaptively and thereby improves the accuracy of anomaly detection.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a block diagram of an enterprise information data optimization management system based on the internet of things.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description is given below of the specific implementation, structure, characteristics and effects of the enterprise information data optimization management system based on the internet of things according to the invention by combining the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the enterprise information data optimization management system based on the internet of things provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a block diagram of an enterprise information data optimization management system based on internet of things according to an embodiment of the present invention is shown, where the system includes the following modules:

A sales data acquisition module 101, configured to acquire each type of sales data sequence of each sales location.

It should be noted that, the conventional enterprise information data management method generally cannot effectively process large-scale, high-frequency and multi-source data streams from the internet of things device. Taking sales data in enterprise information data as an example, sales data is critical to the business and decision of an enterprise, in an internet of things environment, the volume and complexity of sales data are increasing, and an enterprise may face abnormal situations in sales data, such as abnormal orders, supply chain problems, fraudulent transactions, or inventory management problems, which may negatively affect the business and profits of the enterprise. Therefore, abnormality detection of sales data is required.

To realize the enterprise information data optimization management system based on the Internet of things provided by this embodiment, each type of sales data sequence of each sales location needs to be collected. The specific process is as follows: sales data of each type at each point in time is collected for each sales location from the enterprise information database, where the types of sales data are not limited to any particular aspects; the sales data of each type of each sales location is then arranged in time order to obtain each type of sales data sequence of each sales location.

Thus, each type of sales data sequence of each sales location is obtained by the above method.

The segmented node acquisition module 102 is configured to obtain a plurality of change nodes of each type of each sales location according to each type of sales data sequence of each sales location, calculate the confidence level of each change node, and obtain a plurality of segment nodes of each type of each sales location according to the confidence level of each change node.

It should be noted that the conventional LOF (local outlier factor) algorithm is a method for detecting outliers in a data set. The algorithm detects outliers by analyzing the difference between the distribution density of each data point and the distribution density of the other data in its local neighborhood. The K value in the LOF algorithm defines the size of the local neighborhood, so whether the K value is set reasonably affects the accuracy of the algorithm's outlier detection. Sales data comes in multiple types, and the types do not all vary in the same way: some types of sales data are densely distributed, while others are sparsely distributed. Performing anomaly detection on all of them with a fixed K value therefore reduces the accuracy of anomaly detection.

It should be further noted that the distribution density of the data differs within each type of sales data sequence of each sales location: data with larger fluctuation is sparsely distributed, and data with smaller fluctuation is densely distributed. Data with a similar degree of fluctuation should therefore share the same K value. The sales data sequence is thus divided into intervals according to the degree of fluctuation of the data in each part of the sales data sequence.

Specifically, for each type of sales data sequence of any sales location, a reference size C is preset and the first sales data of the sales data sequence is taken as a first datum point. The change node of the first datum point is acquired, together with its position order W in the sales data sequence. The sum of the position order W and the reference size C is recorded as the cut-off mark value of the change node of the first datum point and is compared with the length of the sales data sequence. When the cut-off mark value is larger than the length of the sales data sequence, the cycle ends. When the cut-off mark value is smaller than or equal to the length of the sales data sequence, the change node of the first datum point is taken as a second datum point, the change node of the second datum point and its position order W in the sales data sequence are acquired, the sum of this position order and the reference size C is recorded as the cut-off mark value of the change node of the second datum point, and this cut-off mark value is compared with the length of the sales data sequence; when it is larger than the length of the sales data sequence, the cycle ends, and so on until the cycle ends. In this embodiment, C is taken as 20 as an example; other values may be used in other embodiments, and this embodiment is not particularly limited.

Similarly, a plurality of change nodes of each type of sales data sequence of each sales location are acquired.
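The datum-point cycle above can be sketched as follows. This is a minimal illustration rather than the patented implementation: `find_change_node` is a hypothetical callable standing in for the window-based search, indices are 0-based while the position order W is taken as 1-based, and stopping when no further node is found is an assumption the text does not spell out.

```python
from typing import Callable, List, Optional, Sequence

Finder = Callable[[Sequence[float], int, int], Optional[int]]

def collect_change_nodes(series: Sequence[float], c: int, find_change_node: Finder) -> List[int]:
    """Return the 0-based indices of the change nodes found by the cycle."""
    nodes: List[int] = []
    datum = 0  # the first sales data is the first datum point
    while True:
        node = find_change_node(series, datum, c)
        if node is None or node <= datum:  # assumption: stop if no progress is possible
            break
        nodes.append(node)
        position_w = node + 1       # 1-based position order W
        cutoff = position_w + c     # cut-off mark value
        if cutoff > len(series):
            break                   # the cycle ends
        datum = node                # the change node becomes the next datum point
    return nodes

def accept_first_window(series: Sequence[float], datum: int, c: int) -> Optional[int]:
    """Hypothetical stand-in: always accept the last item of the first window."""
    end = datum + c - 1
    return end if end < len(series) else None

change_nodes = collect_change_nodes(list(range(12)), 4, accept_first_window)
```

With a 12-item sequence and C = 4, the stand-in finder yields nodes at indices 3, 6 and 9; the cycle stops after index 9 because its cut-off mark value 14 exceeds the sequence length.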

Further, the method for obtaining the change node of the first datum point comprises the following steps:

A first window with a size of 1×C is set, and the left side of the first window is aligned with the first datum point. The data in the first window is acquired from the sales data sequence, the last data in the first window is taken as a first candidate change node, and the preference degree of the first candidate change node is obtained according to the first window and compared with a preset preference threshold Y1. When the preference degree of the first candidate change node is greater than or equal to the preset preference threshold, the first candidate change node is taken as the change node. When the preference degree of the first candidate change node is less than the preset preference threshold, a second window with a size of 1×(C+1) is set, the left side of the second window is aligned with the first datum point, the data in the second window is acquired from the sales data sequence, the last data in the second window is taken as a second candidate change node, and the preference degree of the second candidate change node is obtained according to the second window.

Similarly, when the preference degree of the (n-1)-th candidate change node is smaller than the preset preference threshold, an n-th window with a size of 1×(C+n-1) is set, the left side of the n-th window is aligned with the first datum point, the data in the n-th window is acquired from the sales data sequence, the last data in the n-th window is taken as the n-th candidate change node, and the preference degree of the n-th candidate change node is obtained according to the n-th window. The process ends when the preference degree of a candidate change node is greater than or equal to the preset preference threshold, and that candidate change node is the change node of the first datum point. In this embodiment, Y1 is taken as 0.7 as an example; other values may be used in other embodiments, and this embodiment is not particularly limited.
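The growing-window search can be sketched as below. The preference function is passed in as a parameter; the `jump_preference` used in the example is a hypothetical stand-in chosen only to make the sketch runnable, not the preference degree defined by the invention. Returning `None` when the sequence runs out is likewise an assumption.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

def find_change_node(
    series: Sequence[float],
    datum: int,
    c: int,
    preference: Callable[[Sequence[float]], float],
    threshold: float = 0.7,  # Y1 in this embodiment
) -> Optional[int]:
    """Grow a window anchored at `datum` until its last item is preferred enough."""
    size = c  # the first window holds C values; the n-th holds C + n - 1
    while datum + size <= len(series):
        window = series[datum:datum + size]
        if preference(window) >= threshold:
            return datum + size - 1  # index of the window's last value
        size += 1
    return None  # assumption: no change node if the sequence is exhausted

def jump_preference(window: Sequence[float]) -> float:
    """Hypothetical stand-in: scaled jump of the last value from the earlier mean."""
    return min(1.0, abs(window[-1] - mean(window[:-1])) / 4.0)

series = [5, 5, 5, 5, 5, 5, 9, 9, 9]
node = find_change_node(series, datum=0, c=4, preference=jump_preference)
```

Here the windows of size 4, 5 and 6 contain only the value 5 and score 0; the window of size 7 ends on the first 9, scores 1.0, and its last index (6) is returned.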

Further, the method for obtaining the preference degree of the first candidate change node according to the first window comprises the following steps:

wherein the terms of the formula are as follows. The first term is the variance of all sales data before the i-th sales data in the first window, which reflects the fluctuation of the sales data before the i-th sales data in the first window. The second term is the variance of all sales data before the (i-1)-th sales data in the first window, which reflects the fluctuation of the sales data before the (i-1)-th sales data in the first window. The term combining these two variances reflects the difference between the fluctuation of the sales data before the i-th sales data and the fluctuation of the sales data before the (i-1)-th sales data in the first window; the larger this value, the lower the fluctuation consistency of the data, that is, the larger the difference in the distribution density of the data. H is the number of sales data in the first window, C is the preset reference size, the normalization term is a linear normalization function, and the result is the preference degree of the first candidate change node.
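The two variance terms described above can be computed as in the sketch below. It covers only the ingredients named in the text, not the complete preference-degree formula. Three choices are assumptions: "before the i-th sales data" is read as including the i-th value, the population variance is used, and the change between consecutive variances is taken as an absolute difference.

```python
from statistics import pvariance
from typing import List, Sequence

def prefix_variances(window: Sequence[float]) -> List[float]:
    """Variance of the first i values of the window, for i = 1..H."""
    return [pvariance(window[:i]) for i in range(1, len(window) + 1)]

def variance_changes(window: Sequence[float]) -> List[float]:
    """Absolute change between consecutive prefix variances."""
    variances = prefix_variances(window)
    return [abs(curr - prev) for prev, curr in zip(variances, variances[1:])]

changes = variance_changes([2, 2, 2, 8])
```

For the window `[2, 2, 2, 8]` the prefix variance stays at 0 until the last value arrives and then rises to 6.75, so the final change is large while the earlier ones are 0.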

Further, the method for calculating the confidence level of each change node comprises the following steps:

For any sales location, take the k-th change node in the j-th type of sales data sequence. In the z-th type of sales data sequence, the change node with the nearest time interval to that change node is acquired and recorded as the reference node of that change node on the z-th type. Similarly, a reference node of each change node is acquired.

The method for obtaining the confidence level of each change node according to the reference node of each change node comprises the following steps:

wherein the terms of the formula are as follows. One term is the number of types of sales data, and exp() is an exponential function with the natural constant as its base. The distance term reflects the distance between the k-th change node in the j-th type of sales data sequence of each sales location and its reference nodes in the other types of sales data sequences. When a change node is used as a segment node to segment its type of sales data sequence, this term reflects the segmentation difference between the change node and its reference nodes; the larger the value, the less likely the change node is a segment node, that is, the lower the confidence level of the change node. The result is the confidence level of the k-th change node in the j-th type of sales data sequence of each sales location.
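Finding the reference nodes of one change node can be sketched as a nearest-neighbor lookup in time. The type names and node times below are hypothetical, and breaking ties in favor of the earlier node is an assumption.

```python
from typing import Dict, List

def reference_nodes(
    change_nodes: Dict[str, List[int]], own_type: str, k: int
) -> Dict[str, int]:
    """For the k-th change node of `own_type`, the nearest node in every other type."""
    node_time = change_nodes[own_type][k]
    return {
        other: min(nodes, key=lambda t: abs(t - node_time))
        for other, nodes in change_nodes.items()
        if other != own_type and nodes
    }

# Hypothetical change-node times for three types at one sales location.
change_nodes = {"revenue": [5, 14, 30], "orders": [3, 12, 20], "returns": [15, 40]}
refs = reference_nodes(change_nodes, "revenue", 1)
```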

Further, each change node whose confidence level is greater than the preset confidence threshold Y2 is taken as a segment node. In this embodiment, Y2 is taken as 0.8 as an example; other values may be used in other embodiments, and this embodiment is not particularly limited.

Thus, the segment nodes in each type of sales data sequence of each sales location are obtained, and the segment nodes allow data with consistent fluctuation to be grouped into the same range.
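The thresholding step can be illustrated with an assumed confidence function. The form used here, exp(-mean gap / scale), is a guess that only matches the behavior described in the text (confidence falls as the distance to the reference nodes grows); it is not the formula of the invention, and `scale` is a hypothetical tuning parameter.

```python
import math
from typing import List

def confidence(node_time: int, other_types: List[List[int]], scale: float = 10.0) -> float:
    """Assumed form: exp(-mean gap to the nearest node of each other type / scale)."""
    gaps = [min(abs(t - node_time) for t in nodes) for nodes in other_types]
    return math.exp(-(sum(gaps) / len(gaps)) / scale)

def select_segment_nodes(
    own_nodes: List[int], other_types: List[List[int]], threshold: float = 0.8
) -> List[int]:
    """Keep change nodes whose confidence exceeds the threshold (Y2)."""
    return [t for t in own_nodes if confidence(t, other_types) > threshold]

own_nodes = [14, 30]                    # hypothetical change-node times of one type
other_types = [[3, 12, 20], [15, 41]]   # change-node times of the other types
kept = select_segment_nodes(own_nodes, other_types)
```

The node at time 14 has nearby change nodes in both other types (gaps 2 and 1) and is kept; the node at time 30 has none within 10 time steps and is dropped.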

An optimized K value acquisition module 103, configured to: obtain a plurality of real interval ranges of each type of each sales location according to the plurality of segment nodes of each type of each sales location, and calculate an initial K value of each real interval range; obtain a plurality of virtual interval ranges of each type of each sales location according to the segment nodes of all types of all sales locations, and calculate the similarity of each virtual interval range; obtain the weight of each real interval range of each type of each sales location according to the similarity of each virtual interval range; and obtain the optimized K value of each real interval range of each type of each sales location according to the initial K value and the weight.

Specifically, each type of sales data sequence of each sales location is segmented into several real interval ranges by means of segmentation nodes.
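Splitting a sequence at its segment nodes can be sketched as follows. Whether a segment node belongs to the interval on its left or on its right is not stated in the text; this sketch assumes it closes the interval on its left, and uses 0-based indices.

```python
from typing import List, Sequence

def split_at_nodes(series: Sequence[float], node_indices: Sequence[int]) -> List[List[float]]:
    """Split `series` so that each interval ends at a segment node (inclusive)."""
    intervals: List[List[float]] = []
    start = 0
    for idx in sorted(set(node_indices)):
        if start <= idx < len(series):
            intervals.append(list(series[start:idx + 1]))
            start = idx + 1
    if start < len(series):
        intervals.append(list(series[start:]))  # data after the last node
    return intervals

real_intervals = split_at_nodes(list(range(10)), [3, 6])
```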

It should be noted that the sales data within each real interval range fluctuates consistently, so the same K value is used for all sales data in a real interval range. A K value therefore does not need to be set for each sales data; it only needs to be set for each real interval range, which effectively reduces the amount of calculation.

It should be further noted that sales data inherently differs in fluctuation, so judging abnormality directly from the data density within a fixed K-neighborhood gives low accuracy. Some sales data fluctuates strongly without being abnormal, so to reduce false detections of such data, the distribution density of more data must be taken into account when judging abnormality.

Further, the initial K value calculation method of each real interval range is as follows:

wherein the symbols denote, in order: the m-th sales data in each real interval range of each type of sales data sequence of each sales location; the mean of all sales data in that real interval range; the number of sales data in that real interval range; the degree of fluctuation of the data in that real interval range (the larger this value, the larger the K value required, that is, the more neighborhood data must be referenced to judge the abnormality of the data in the range); a preset adjustment parameter; a preset reference value; and the initial K value of that real interval range. The rounding-up symbol denotes rounding up. This embodiment is described taking the adjustment parameter as 0.5 and the reference value as 10 by way of example; other embodiments may take other values, and this embodiment imposes no particular limitation.

It should be noted that setting the K value of each real interval range only according to its degree of fluctuation gives low detection accuracy. Large fluctuation may itself be caused by abnormal data, and setting a large K value for abnormal data may cause the abnormal data to be missed. To prevent this problem, the initial K value needs to be adjusted.

It should be further noted that large fluctuation in sales data is generally caused by external interference. Such interference does not usually affect only one type of sales data at one sales location; it should cause large fluctuation in several types of sales data at several sales locations, and those fluctuations are consistent with one another. The initial K value can therefore be corrected by analyzing how consistently the various types of sales data of the various sales locations fluctuate within each real interval range.

Further, the positions of the segment nodes in each type of sales data sequence of each sales location are obtained, and the positions of all segment nodes in all types of sales data sequences of all sales locations are formed into a node position set. Each element in the set of node locations is referred to as a node location.

Each type of sales data sequence of each sales location is then divided into several virtual interval ranges, using all node positions in the node position set as virtual nodes.
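Building the node position set and dividing a sequence into virtual interval ranges can be sketched as follows. The sketch assumes that all sequences share the same time index, that a node position is an index into the sequence, and that a node starts the next range; the location and type labels, function names and data are hypothetical.

```python
def node_position_set(segment_nodes_by_sequence):
    """Merge the segment node positions of every sequence into one sorted set."""
    positions = set()
    for nodes in segment_nodes_by_sequence.values():
        positions.update(nodes)
    return sorted(positions)


def split_at(sequence, nodes):
    """Split a sequence into consecutive ranges; each node index starts a new range."""
    inner = sorted(n for n in set(nodes) if 0 < n < len(sequence))
    bounds = [0] + inner + [len(sequence)]
    return [sequence[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical segment nodes, keyed by (sales location, sales data type).
segment_nodes_by_sequence = {
    ("location_1", "type_1"): [3, 6],
    ("location_1", "type_2"): [3],
    ("location_2", "type_1"): [4, 6],
}
virtual_nodes = node_position_set(segment_nodes_by_sequence)
print(virtual_nodes)  # [3, 4, 6]

sales = [5, 6, 5, 20, 22, 21, 7, 6]  # sequence of ("location_1", "type_1")
virtual_ranges = split_at(sales, virtual_nodes)
print(virtual_ranges)  # [[5, 6, 5], [20], [22, 21], [7, 6]]
```

Because the node position set contains every sequence's own segment nodes, each virtual interval range lies entirely inside one real interval range of that sequence.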

The method for obtaining the similarity of each virtual interval range comprises the following steps:

wherein the symbols denote, in order: the s-th sales data within each virtual interval range of the b-th type sales data sequence of the a-th sales location; the mean of the s-th sales data within each virtual interval range of the b-th type sales data sequences of all sales locations; the mean of the s-th sales data within each virtual interval range of all types of sales data sequences of the a-th sales location; the number of sales locations; the number of types of sales data; the number of sales data within each virtual interval range of the b-th type sales data sequence of the a-th sales location; and the similarity of each virtual interval range of the b-th type sales data sequence of the a-th sales location. The continuous multiplication symbol denotes a product.

It should be noted that the similarity calculation follows the Pearson correlation coefficient; since the Pearson correlation coefficient is prior art, its calculation is not described here.
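For reference, the standard Pearson correlation coefficient that the similarity calculation is modeled on can be sketched as below. This is the textbook coefficient for two equal-length series, not the similarity formula of this embodiment, which additionally involves means over all sales locations and all types and a continuous product. Returning 0.0 for a constant series is an assumption of this sketch.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / (dev_x * dev_y)


# Two hypothetical series that rise together, and one that moves oppositely.
same_direction = pearson([1, 2, 3], [2, 4, 6])
opposite_direction = pearson([1, 2, 3], [3, 2, 1])
```

Values near 1 indicate that two series fluctuate in the same way, and values near -1 that they fluctuate in opposite ways.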

Further, the method for obtaining the weight of each real interval range according to the similarity of each virtual interval range comprises the following steps:

The virtual interval ranges contained in each real interval range of each type of sales data sequence of each sales location are obtained; the similarities of all virtual interval ranges within that real interval range are averaged, and the reciprocal of the average is taken as the weight of that real interval range.
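The weight of a real interval range, as the reciprocal of the mean similarity of the virtual interval ranges it contains, can be sketched as follows. Representing ranges as half-open index pairs, the function name and the similarity values are hypothetical; the source does not say how a zero mean similarity is handled, so the sketch simply raises an error in that case.

```python
def real_range_weight(real_bounds, virtual_ranges):
    """Weight of one real range: reciprocal of the mean similarity of its virtual ranges.

    real_bounds is (start, end); virtual_ranges is a list of (start, end, similarity).
    """
    lo, hi = real_bounds
    sims = [sim for start, end, sim in virtual_ranges if lo <= start and end <= hi]
    if not sims:
        raise ValueError("no virtual interval range lies inside the real interval range")
    mean_sim = sum(sims) / len(sims)
    if mean_sim == 0:
        raise ValueError("mean similarity is zero; the source does not cover this case")
    return 1.0 / mean_sim


# Hypothetical virtual ranges of one sequence with made-up similarities.
virtual = [(0, 3, 0.5), (3, 4, 0.25), (4, 6, 0.75), (6, 8, 0.9)]
weight = real_range_weight((0, 6), virtual)
print(weight)  # 2.0
```

Under this definition, a real interval range whose fluctuation is weakly shared by other sequences (low similarity) receives a larger weight.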

The weight of each real interval range is thus obtained and is used to correct the initial K value of that real interval range.

Further, the optimized K value for each real interval range is:

The anomaly detection module 104 is configured to perform anomaly detection on each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data.

Specifically, the optimized K value of each real section range of each type of sales data sequence of each sales location is taken as the optimized K value of each sales data in each type of sales data sequence of each sales location.

Taking the optimized K value of each sales data as its K value, the sales data in each type of sales data sequence of each sales location are processed with the LOF (Local Outlier Factor) algorithm to obtain an outlier set and a set of clustered (non-outlier) points.

Sales data in the outlier set are determined to be abnormal data.
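The LOF step can be illustrated with a brute-force sketch for one-dimensional data. Several points are assumptions rather than part of the source: a single K is used for the whole input (the embodiment assigns an optimized K per real interval range and does not detail how differing K values enter LOF), the cut-off of 1.5 on the LOF score is hypothetical, and the data are made up.

```python
def lof_scores(values, k):
    """Local Outlier Factor of each value in a 1-D list, with neighborhood size k."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")
    k_dist, neighbours = [], []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        kd = dists[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        neighbours.append([(d, j) for d, j in dists if d <= kd])  # ties included
    lrd = []  # local reachability density
    for i in range(n):
        total = sum(max(k_dist[j], d) for d, j in neighbours[i])
        lrd.append(len(neighbours[i]) / total if total > 0 else float("inf"))
    scores = []
    for i in range(n):
        if lrd[i] == float("inf"):
            scores.append(1.0)  # point sits among exact duplicates
            continue
        ratios = [lrd[j] / lrd[i] for _, j in neighbours[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores


def split_outliers(values, k, threshold=1.5):
    """Split values into (outliers, clustered) by an assumed LOF score threshold."""
    scores = lof_scores(values, k)
    outliers = [v for v, s in zip(values, scores) if s > threshold]
    clustered = [v for v, s in zip(values, scores) if s <= threshold]
    return outliers, clustered


sales = [10, 11, 12, 11.5, 10.5, 30]  # hypothetical sales data
outliers, clustered = split_outliers(sales, k=2)
print(outliers)  # [30]
```

Scores close to 1 mean a point is about as dense as its neighbors; the value 30 scores far above the assumed threshold and is the only point placed in the outlier set.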

The above describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, or improvement made within the principles of the present invention shall fall within its scope of protection.

Claims

1. The enterprise information data optimization management system based on the Internet of things is characterized by comprising the following modules:

the anomaly detection module is used for carrying out anomaly detection on data in each type of sales data sequence of each sales position according to the optimized K value of each real interval range to obtain anomaly data;

the method for obtaining the weight of each real interval range according to the similarity of each virtual interval range, and obtaining the optimized K value of each real interval range by optimizing the initial K value by using the weight of each real interval range comprises the following specific steps:

the optimized K value of each real interval range is obtained according to the weight value of each real interval range and is as follows:

2. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining a plurality of change nodes of each type of sales data sequence of each sales location according to the fluctuation consistency of data in each area in the sales data sequence comprises the following specific steps:

3. The system for optimizing and managing enterprise information data based on the internet of things according to claim 2, wherein the method for obtaining the change node of the first reference point comprises the following specific steps:

4. The system for optimizing and managing enterprise information data based on the internet of things according to claim 3, wherein the obtaining the preference degree of the first candidate change node according to the first window comprises the following specific methods:

5. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the obtaining the confidence level of each change node according to the time interval between each change node in each type of sales data sequence of each sales location and the change nodes of other sales data sequences, and obtaining the plurality of segment nodes of each type of sales data sequence of each sales location according to the confidence level of each change node comprises the following specific methods:

for any sales location, consider the k-th change node in the j-th type sales data sequence; in the z-th type sales data sequence, the change node with the nearest time interval to that change node is acquired and recorded as the reference node of that change node on the z-th type; in this way, a reference node of each change node on each type is acquired;

6. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the initial K value of each real interval range according to the fluctuation condition of the data in each real interval range comprises the following specific steps:

wherein the symbols denote, in order: the m-th sales data in each real interval range of each type of sales data sequence of each sales location; the mean of all sales data in that real interval range; the number of sales data in that real interval range; a preset adjustment parameter; a preset reference value; and the initial K value of that real interval range; the rounding-up symbol denotes rounding up.

7. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the plurality of virtual nodes of each type of sales data sequence of each sales location comprises the following specific steps:

8. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for obtaining the similarity of each virtual interval range according to the similarity of the fluctuation condition of each virtual interval range of each sales data sequence of each sales location and the fluctuation condition of the virtual interval ranges of other sales data sequences of other sales locations comprises the following specific steps:

wherein the symbols denote, in order: the s-th sales data within each virtual interval range of the b-th type sales data sequence of the a-th sales location; the mean of the s-th sales data within each virtual interval range of the b-th type sales data sequences of all sales locations; the mean of the s-th sales data within each virtual interval range of all types of sales data sequences of the a-th sales location; the number of sales locations; the number of types of sales data; the number of sales data within each virtual interval range of the b-th type sales data sequence of the a-th sales location; and the similarity of each virtual interval range of the b-th type sales data sequence of the a-th sales location; the continuous multiplication symbol denotes a product.

9. The system for optimizing and managing enterprise information data based on the internet of things according to claim 1, wherein the method for performing anomaly detection on data in each type of sales data sequence of each sales location according to the optimized K value of each real interval range to obtain anomaly data comprises the following specific steps:

sales data in the outlier set are determined to be abnormal data.