CN116467653A - Loom abnormal data processing method based on probability distribution and XGBoost decision algorithm

Info

Publication number: CN116467653A
Application number: CN202310345641.XA
Authority: CN
Inventors: 戴宁; 徐开心; 胡旭东; 沈春娅; 袁嫣红; 向忠; 汝欣
Original assignee: Zhejiang Sci Tech University ZSTU
Current assignee: Zhejiang Sci Tech University ZSTU
Priority date: 2023-04-03
Filing date: 2023-04-03
Publication date: 2023-07-21

Abstract

The invention discloses a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm, which belongs to the field of intelligent manufacturing of weaving workshops in textile industry, and comprises the following steps: collecting original data of a loom at fixed time; calculating a variation difference between each adjacent data point; calculating adaptive regression reference thresholds under different time windows; updating the trusted intervals at different moments according to the self-adaptive regression reference threshold; determining data points with data values outside the range of the trusted interval as abnormal data; constructing a Bayesian network abnormal data identification model based on probability distribution; determining loom parameters which cause abnormal data points to generate data abnormality through a Bayesian network abnormal data identification model based on probability distribution; constructing a loom missing data repair model based on an XGBoost decision method; training a loom missing data repair model based on an XGBoost decision method; and repairing the abnormal data points through a trained loom missing data repair model based on the XGBoost decision method.

Description

Loom abnormal data processing method based on probability distribution and XGBoost decision algorithm

Technical Field

The invention belongs to the field of intelligent manufacturing of weaving workshops in the textile industry, and particularly relates to a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm.

Background

Intelligent manufacturing in China's textile industry has made remarkable progress. Interconnection and exchange of high-quality textile data over the Internet of Things is a precondition for intelligent manufacturing functions in the industry such as data analysis and intelligent scheduling. It is therefore necessary to analyze the causes and forms of abnormal textile data and then formulate an abnormal data processing method that fits the textile production scenario. Among textile equipment, the loom has a complex weaving process and generates a huge amount of operating information. As the last processing equipment in forming woven cloth, its production condition directly affects the final quality of the cloth. Processing abnormal loom data and improving the accuracy of weaving data is therefore of great significance for the high-quality development of intelligent textile manufacturing.

At present, abnormal data processing mainly consists of cleaning and repairing abnormal data, with abnormal data identified and cleaned by analyzing its characteristics in the industry. Because of the complex textile production environment, network communication faults, hardware damage to measuring sensors and other factors, textile big data is mixed with a large amount of abnormal and outlier data. As a result, data goes missing, data availability is low, and production information is inaccurate, which ultimately affects the quality of the woven cloth. This problem has not been effectively solved.

Disclosure of Invention

An embodiment of the invention aims to provide a loom abnormal data processing method based on probability distribution and an XGBoost decision algorithm, which can solve the technical problem that, because of the complex textile production environment, network communication faults, damage to measuring sensor hardware and other factors, textile big data is mixed with a large amount of abnormal and outlier data, so that data goes missing, data availability is low, production information is inaccurate, and the quality of the woven cloth is ultimately affected.

In order to solve the technical problems, the invention is realized as follows:

the embodiment of the invention provides a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm, which comprises the following steps:

S101: collecting original data of a loom at fixed time intervals;

S102: calculating a variation difference between each pair of adjacent data points;

S103: calculating adaptive regression reference thresholds under different time windows;

S104: updating the trusted intervals at different moments according to the adaptive regression reference threshold;

S105: determining data points with data values outside the range of the trusted interval as abnormal data;

S106: constructing a Bayesian network abnormal data identification model based on probability distribution;

S107: determining the loom parameters which cause the abnormal data points to be abnormal through the Bayesian network abnormal data identification model based on probability distribution;

S108: constructing a loom missing data repair model based on an XGBoost decision method;

S109: training the loom missing data repair model based on the XGBoost decision method;

S110: repairing the abnormal data points through the trained loom missing data repair model based on the XGBoost decision method.

In the embodiment of the invention, an adaptive regression threshold is first set to determine the trusted interval for the change of each item of original loom data, which narrows the range in which abnormal data must be located. A Bayesian network abnormal data identification model based on probability distribution is then constructed to locate the abnormal data accurately and to determine the loom parameters that cause the abnormal data points to be abnormal. Finally, a loom missing data repair model based on the XGBoost decision method is constructed to repair the data gaps left by the abnormal data. Missing data is thus filled in and repaired while the data is cleaned, which addresses the low availability of sampled data with missing values, improves the accuracy of loom production information and the quality of the finished cloth, and improves the production benefit of the enterprise.

Drawings

Fig. 1 is a schematic flow chart of a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm provided by the embodiment of the invention.

Fig. 2 is a network relation diagram for identifying abnormal data of a loom according to an embodiment of the present invention.

The achievement of the object, functional features and advantages of the present invention will be further described with reference to the embodiments, referring to the accompanying drawings.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention provides a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm through a specific embodiment and application scene thereof by combining the attached drawings.

Referring to fig. 1, a schematic flow chart of a loom abnormal data processing method based on probability distribution and XGBoost decision algorithm is shown.

S101: Collecting original data of the loom at fixed time intervals.

During loom operation, abnormal communication, sensor faults, unstable network signals and similar problems in the networked data acquisition process introduce abnormal interference data into the originally collected information and lower the reliability of the collected data. The collected original loom data therefore contains both abnormal and normal data, and it provides the data basis for the subsequent abnormal data processing. A person skilled in the art may set the timing for collecting the original loom data as needed; this scheme does not limit the timing mode.

TABLE 1

The function of the loom is to interweave the warp yarn of the beam and the weft yarn of the weft yarn tube in the longitudinal and transverse directions to form a cloth. Since the weaving machine has a complicated machining process and a long machining time per beam, the amount of data generated by the weaving machine during operation is enormous. Loom data can be classified into 2 categories: static data and dynamic data. Static data refers to data which can be set and predicted in advance and does not change in a short period, such as equipment attributes, product process information and the like; dynamic data refers to data which changes in real time along with the working state of equipment and the processing technology requirements, such as loom weaving output, beating-up times, loom states, running time, running speed, running efficiency, warp and weft stop times and the like. Among these 2 types of data, dynamic data changes frequently and reflects the overall production of the loom. In the actual operation process of the loom, the variation amplitude of each dynamic data is different, but the dynamic data has time correlation, and the variation of each data is mutually connected. The correlation between the parameters of each loom and time is shown in table 1, the loom output during normal operation of the loom will have a continuous increasing trend along with the accumulation of time, the transition time of the loom speed due to the change of the loom state is shorter, and the collected value will have short discrete low frequency jump.

In the actual operation process, the changes among the data are related. The weaving output and the speed of the loom at unit moment keep synchronous positive correlation change, the output value of the loom should synchronously increase in the running state of the loom, the speed of the loom is larger than 0, and similarly, the output value of the loom in the stopping state should be kept unchanged, and the speed of the loom is equal to 0. The theoretical relation among the weaving yield, the running time, the running speed and the running state is as follows:

wherein M represents the weaving length of the loom, unit cm, T represents the weaving time of the loom, unit min, W represents the weft density parameter value of cloth, namely the number of weft yarns on the length of 1cm of cloth, S represents the running speed of the loom when the loom is weaving, namely the beating-up times of the loom for 1 minute, K represents the working state of the loom at the current moment, wherein 1 is running, and 0 is stopping.
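The relation itself is not reproduced in this text. A dimensionally consistent reading of the symbol definitions above, offered as an assumption rather than as the formula of the embodiment, is M = K · S · T / W at constant speed S: picks per minute times minutes, divided by picks per centimetre, gives centimetres. The function name and the example values below are hypothetical.

```python
def weaving_length_cm(speed_picks_per_min, minutes, weft_density_per_cm, running):
    """Fabric length in cm woven over `minutes` at constant speed.

    Assumed relation (the formula is not reproduced in the text):
        M = K * S * T / W
    where K is 1 while the loom runs and 0 while it is stopped.
    """
    if weft_density_per_cm <= 0:
        raise ValueError("weft density must be positive")
    k = 1 if running else 0
    return k * speed_picks_per_min * minutes / weft_density_per_cm


# 600 picks/min for 10 min at 20 picks/cm.
print(weaving_length_cm(600, 10, 20, running=True))   # 300.0
print(weaving_length_cm(600, 10, 20, running=False))  # 0.0
```

A stopped loom (K = 0) adds no length, which matches the statement that the output value stays unchanged while the speed is 0.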

S102: A variation difference between each pair of adjacent data points is calculated.

It should be noted that, because different factors act on the loom during operation, the data values of any two adjacent data points in the collected original data are not exactly the same. In theory, if the working conditions of the loom stay the same during continuous operation, the variation difference between adjacent data points stays within a certain range. Comparing the variation differences between adjacent data points therefore gives a preliminary location for abnormal data and lays the foundation for its subsequent accurate positioning.
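The variation difference of S102 can be sketched as a first-order difference of consecutive readings. The function name and the sample beat-up counts are hypothetical and serve only to show how a spike stands out in the difference sequence.

```python
def adjacent_differences(values):
    """Return [x[1] - x[0], x[2] - x[1], ...] for a list of readings."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical cumulative beat-up counts sampled once per minute;
# the fourth reading is a corrupted value.
beat_ups = [1200, 1800, 2400, 9000, 3600]
print(adjacent_differences(beat_ups))  # [600, 600, 6600, -5400]
```

Under steady working conditions the differences stay near 600, so the pair 6600 and -5400 marks the neighbourhood of the corrupted reading.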

S103: an adaptive regression reference threshold is calculated for different time windows.

Compared with the prior art, the self-adaptive regression threshold can be updated continuously along with the original data, automatically correct the reference threshold under different time periods, avoid the situation that normal data fluctuation caused by irregular communication faults between the acquisition end and the bottom equipment terminal is misjudged as abnormal data, and improve the positioning accuracy of the abnormal data.

TABLE 2

As shown in table 2, when calculating the adaptive regression reference threshold, the update time of each data value is considered in the calculation to obtain the data change difference sequence per unit time. And then, carrying out moving average on the variation difference value sequence with the time sequence characteristics under the n-dimensional time window, and finally obtaining the self-adaptive regression reference threshold under different time windows.

In one possible implementation, S103 specifically includes:

s1031: acquiring a data change difference sequence in unit time;

s1032: according to the time sequence, calculating a sliding average value of the data change difference value under the current time window as a regression reference threshold under the current time window, wherein the current time window is an n-dimensional time window, and the current window comprises n unit times;

the regression reference threshold is calculated in the following way:

where F(i) represents the reference threshold in the i-th time window, x_k represents the data value of the parameter at time k, t_k represents the data update time of the parameter at time k, and n represents the length of the time window.

The n unit times are a time window, the first time window is the 1 st to n unit times, and the second time window is the n+1 to 2n unit times.
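The threshold formula is not reproduced in this text, so the following is only a sketch under stated assumptions: the per-unit-time change is taken as |x_k - x_{k-1}| / (t_k - t_{k-1}), and F(i) is taken as the mean of those changes over the i-th block of n unit times (1 to n, n+1 to 2n, and so on). The use of the absolute value and both function names are assumptions.

```python
def rate_differences(values, times):
    """Absolute change per unit time between consecutive samples (assumed form)."""
    rates = []
    for k in range(1, len(values)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("update times must be strictly increasing")
        rates.append(abs(values[k] - values[k - 1]) / dt)
    return rates


def window_thresholds(rates, n):
    """Mean rate for each consecutive block of n entries; the last block may be shorter."""
    if n <= 0:
        raise ValueError("window length must be positive")
    thresholds = []
    for start in range(0, len(rates), n):
        block = rates[start:start + n]
        thresholds.append(sum(block) / len(block))
    return thresholds


values = [0, 2, 4, 7, 10, 12, 14]
times = [0, 1, 2, 3, 4, 6, 7]   # one update arrived a minute late
rates = rate_differences(values, times)
print(rates)                        # [2.0, 2.0, 3.0, 3.0, 1.0, 2.0]
print(window_thresholds(rates, 2))  # [2.0, 3.0, 1.5]
```

Dividing by the elapsed update time keeps a late update from looking like a jump, which is the misjudgment the adaptive threshold is meant to avoid. A trailing window that moves one step at a time would be a straightforward variation.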

S104: and updating the trusted intervals at different moments according to the adaptive regression reference threshold.

In one possible embodiment, the upper and lower limits of the trusted interval are respectively:

where H_t represents the upper limit of the trusted interval at time t, L_t represents the lower limit of the trusted interval at time t, y_{t-a} represents the legal data value before time t, and C represents the amplitude coefficient.

It should be noted that the adaptive regression reference threshold makes it possible to determine how far different data values deviate from the reference threshold and to calculate the upper and lower limits of the trusted interval. A data value within the upper and lower limits of the trusted interval can be determined to be, or treated as equivalent to, a normal data value.

S105: data points with data values outside the range of the trusted interval are determined as abnormal data.

It can be understood that the upper and lower limits of the trusted interval calculated in S104 are used to judge whether a data value is normal: a data value within the trusted interval is normal data, and a data value outside it is abnormal data. The data point corresponding to an abnormal data value is determined to be abnormal data, which gives the preliminary location of the abnormal data.
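Because the interval formulas are not reproduced in this text, the sketch below assumes the simplest form consistent with the symbol definitions: H_t = y_{t-a} + C · F and L_t = y_{t-a} - C · F, where y_{t-a} is the most recent value judged legal. Treating the first reading as legal, and the function name, are further assumptions.

```python
def flag_abnormal(values, thresholds, amplitude):
    """Flag readings outside [y_prev - C*F, y_prev + C*F] (assumed interval form).

    `thresholds[k]` is the reference threshold F that applies to `values[k]`.
    The first reading is taken as the initial legal value (an assumption).
    """
    flags = [False]
    last_legal = values[0]
    for value, f in zip(values[1:], thresholds[1:]):
        upper = last_legal + amplitude * f
        lower = last_legal - amplitude * f
        abnormal = not (lower <= value <= upper)
        flags.append(abnormal)
        if not abnormal:
            last_legal = value  # only legal values move the interval
    return flags


readings = [100, 102, 104, 150, 108, 110]
print(flag_abnormal(readings, [2.0] * 6, amplitude=3.0))
# [False, False, False, True, False, False]
```

Since an abnormal value never becomes the centre of the next interval, a single spike cannot drag the interval along with it. The amplitude coefficient has to leave room for the steps skipped after a flagged point; here 108 is still accepted relative to 104.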

S106: and constructing a Bayesian network abnormal data identification model based on probability distribution.

In one possible implementation, S106 specifically includes:

s1061: a sample dataset D is created, the sample dataset D comprising the fabric yield (x ₁ ) Number of beats up (x) ₂ ) Run time (x) ₃ ) Efficiency of operation (x ₄ ) Speed of operation (x) ₅ ) Loom status (x) ₆ ) And data type (x ₇ ) The composition of the sample dataset D is:

the dimension of D is i× 7,i, which represents the number of time-series acquisition points for loom data.

S1062: performing discretization processing on the sample data set by adopting an equal-frequency bin discretization method;

It should be noted that, because a Bayesian network handles discrete data well, the sample elements of the sample dataset D need to be discretized. The loom state is a binary element reflecting whether the loom is running or stopped, and the data type is a ternary element indicating whether the loom data at each moment is a normal data point, a deviation abnormal point or an inactive abnormal point; these two elements need no discretization. The five types of data, namely weaving yield, number of beat-ups, running time, running efficiency and running speed, are discretized with the equal-frequency binning method, which helps to locate abnormal data more accurately.

Specifically, the weaving yield (x₁), number of beat-ups (x₂), running time (x₃), running efficiency (x₄) and running speed (x₅) are discretized.
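Equal-frequency binning can be sketched as follows: each value is replaced by the index of the quantile bin it falls into, so every bin holds about the same number of samples. This rank-based version is one simple way to do it, not necessarily the procedure of the embodiment; the function name and sample speeds are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 so that bins hold near-equal counts.

    Bins are assigned by rank, so tied values can land in different bins
    when a bin boundary falls inside a run of equal values.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = rank * n_bins // len(values)
    return labels


# Hypothetical running speeds (rpm); 0 means the loom was stopped.
speeds = [0, 580, 600, 0, 610, 590]
print(equal_frequency_bins(speeds, 3))  # [0, 1, 2, 0, 2, 1]
```

Unlike equal-width bins, the bin edges follow the data, so a parameter that spends most of its time in a narrow range still spreads over all discrete levels.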

In one possible implementation, after S1062, further includes:

S1062A: Introducing correlation coefficients among all parameters into the Bayesian network, wherein the correlation coefficient is calculated as follows:

ρ(a, b) = cov(a, b) / (σ_a · σ_b)

where a and b represent the elements whose degree of correlation is to be obtained, cov(a, b) represents the covariance between the elements, σ_a and σ_b represent the standard deviations of the elements, and ρ(a, b) represents the correlation coefficient between elements a and b.

The correlation coefficient among the parameters of the loom is the basis for determining the relationship structure of the Bayesian network, and the Pearson correlation function is used for carrying out correlation analysis on each network node so as to determine the relationship structure of the probability distribution network.
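The Pearson correlation coefficient named above is the covariance of two series divided by the product of their standard deviations. A minimal standard-library sketch follows; the function name and the sample series are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


running_time = [1, 2, 3, 4]
beat_ups = [600, 1200, 1800, 2400]   # grows in step with running time
print(pearson(running_time, beat_ups))
```

Node pairs with a coefficient close to 1 are candidates for a direct edge when the relationship structure of the network is laid out.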

S1063: determining a network structure of a Bayesian network anomaly data recognition model based on probability distribution;

the network structure is a Bayesian network, and the relation of the Bayesian network is as follows:

B＝(J,T)

wherein J represents a network structure diagram describing the association relation among elements, and comprises element nodes and a relation pointing line, and T represents a relation data set describing probability distribution among the element nodes in the network.

S1064: according to the prior probability P (x _i I D), i=1, 2, ··,6, training the probability distribution among the sub-nodes, determining the conditional probability distribution P (x) of the evidence nodes and sub-nodes in the loom anomaly recognition network ₇ ＝m _j |x _i )。

S1065: calculating the total probability of each node:

wherein P (x) ₇ ＝m _j ) The combined full probability of the establishment of the conditions of all data types is represented, namely, the result probability under the common influence of six data types, namely, the weaving yield, the beating-up times, the running time, the running efficiency, the running speed and the loom state.

S1066: and determining the data condition type of the loom according to the full probability distribution of each node.

S1067: under the condition that the output result of the child node is abnormal data, the posterior probability of each evidence node is inferred according to the probability distribution of the abnormal data, and finally the loom parameters generating data abnormality are positioned, wherein the posterior probability is calculated in the following way:

where P(x_i | x₇) represents the probability that the condition of parent node x_i holds given the known result of the child node (the data type), that is, the posterior probability.
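The total-probability and posterior formulas are not reproduced in this text. The sketch below therefore uses the textbook forms for a single evidence node, P(x₇ = m) = Σ_v P(x₇ = m | x_i = v) · P(x_i = v) and Bayes' rule for P(x_i = v | x₇ = m); how the embodiment combines all six evidence nodes is not recoverable here. The probability tables are invented for illustration.

```python
def total_probability(prior, likelihood, outcome):
    """P(x7 = outcome) by the law of total probability over one evidence node."""
    return sum(likelihood[v][outcome] * p for v, p in prior.items())


def posterior(prior, likelihood, outcome):
    """P(x_i = v | x7 = outcome) for every state v, by Bayes' rule."""
    evidence = total_probability(prior, likelihood, outcome)
    if evidence == 0:
        raise ValueError("outcome has zero probability")
    return {v: likelihood[v][outcome] * p / evidence for v, p in prior.items()}


# Invented tables: discretized running speed as the evidence node.
prior = {"low": 0.2, "normal": 0.8}
likelihood = {
    "low": {"normal": 0.5, "abnormal": 0.5},
    "normal": {"normal": 0.95, "abnormal": 0.05},
}
print(total_probability(prior, likelihood, "abnormal"))
print(posterior(prior, likelihood, "abnormal"))
```

With these numbers an abnormal result makes the "low" speed state far more probable than its prior of 0.2, which is the kind of inference used to point at the parameter responsible for the anomaly.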

S1068: the sample data set is input into a Bayesian network abnormal data identification model based on probability distribution for training.

S107: loom parameters that cause abnormal data points to produce data anomalies are determined by a Bayesian network anomaly data recognition model based on probability distributions.

In one possible implementation, after S107, the method further includes:

S111: The abnormal data of the loom parameters that caused the data anomaly at the abnormal data points is cleared.

It should be noted that, through the previous positioning of the abnormal data and the positioning of the loom parameters, the accurate position of the abnormal data is finally determined, the determined abnormal data is cleared, the influence of the abnormal data in the subsequent repairing process of the abnormal data point is avoided, and then the cleared missing data is repaired, so as to improve the overall quality of the data.

S108: and constructing a loom missing data repair model based on an XGBoost decision method.

XGBoost is a decision tree ensemble algorithm based on extreme gradient boosting. Its basic idea is to integrate a plurality of base learners by gradient descent so as to gradually reduce the residual between each repair result of the model and the actual loom value. Because loom data is multi-dimensional, the XGBoost decision method, compared with the prior art, makes full use of the correlation among the data of each loom dimension to repair missing data. Using a plurality of base learners lets the repaired values approach the actual loom values step by step and avoids incomplete consideration in the repair process, which improves the repair effect.

In one possible implementation, S108 is specifically:

s1081: constructing a loom missing data repair model based on an XGBoost decision method by taking a regression tree as a base learner, wherein the expression of the loom missing data repair model based on the XGBoost decision method is as follows:

wherein, the liquid crystal display device comprises a liquid crystal display device,representing the loom data restoration result at the moment i, x _i Representing known associated input samples of loom parameters to be repaired at instant i, N represents the number of base learners, f _k Representing the kth base learner.
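To illustrate the additive form, in which the prediction is the sum of N base learners each fitted to the residual left by the previous ones, here is a deliberately small sketch. It is plain gradient boosting on squared error with one-split regression trees over a single feature, not XGBoost itself (no regularization, no second-order terms), and every name in it is hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression tree with a single split on one feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values equal: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_learners, learning_rate):
    """Fit base learners one after another, each on the current residuals."""
    learners = []
    predictions = [0.0] * len(ys)
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    # The final model is the (shrunken) sum over all base learners.
    return lambda x: sum(learning_rate * f(x) for f in learners)


model = fit_boosted([1, 2, 3, 4], [10.0, 10.0, 20.0, 20.0],
                    n_learners=20, learning_rate=0.5)
print(model(1), model(4))  # close to 10 and 20
```

Each round halves the remaining residual in this toy case, which shows how the repair result approaches the actual value step by step as base learners are added.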

S109: and training a loom missing data repair model based on an XGBoost decision method.

In one possible implementation, S109 specifically includes:

s1091: and taking normal data points of all parameters of the loom obtained after the abnormality recognition as characteristic samples of a loom missing data repair model based on an XGBoost decision method, constructing a characteristic sample set, and dividing the characteristic sample set into a training set and a test set according to a preset proportion.

It should be noted that using the normal data points of all loom parameters obtained after anomaly recognition as the feature sample set that drives the model avoids the overfitting caused by the high-dimensional, sparse character of the original data and improves the data repair capability of the model.

Optionally, the preset ratio of training set to test set is 7:3.

s1092: and selecting a regression tree as a base learner of the loom missing data restoration model based on the XGBoost decision method, and setting a target loss function of the loom missing data restoration model based on the XGBoost decision method as a mean square loss regression.

S1093: and (3) adjusting the learning rate, the number of base learners and the regression tree depth of the loom missing data restoration model based on the XGBoost decision method according to the error minimum principle.

It should be noted that data repair is a regression problem. When the base learner (booster) and the target loss function (objective) of the model are selected, a regression tree is chosen as the base learner of the loom missing data repair model and the target loss function is set to mean squared error regression. The learning rate (learning_rate), the number of base learners (n_estimators) and the regression tree depth (max_depth) are adjusted according to the minimum-error principle. To prevent overfitting to the sample data during training, the fraction of randomly sampled rows (subsample) and the fraction of randomly sampled features (colsample_bytree) used when building each base learner are also set, which improves repair quality and reduces repair error.
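The settings described in S1091 to S1093 can be written down as a small configuration plus a 7:3 split. The parameter names are those of the XGBoost scikit-learn wrapper; the numeric values are placeholders rather than tuned settings of the embodiment, and the split helper is hypothetical.

```python
import random

# Placeholder values; only the parameter names come from the description.
REPAIR_MODEL_PARAMS = {
    "booster": "gbtree",              # regression trees as base learners
    "objective": "reg:squarederror",  # mean squared error regression
    "learning_rate": 0.1,
    "n_estimators": 200,              # number of base learners
    "max_depth": 4,                   # regression tree depth
    "subsample": 0.8,                 # fraction of rows sampled per tree
    "colsample_bytree": 0.8,          # fraction of features sampled per tree
}


def split_train_test(samples, train_ratio=0.7, seed=0):
    """Shuffle a copy of the samples and split them by the preset ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # round() avoids losing a sample to floating-point error (720 * 0.7).
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(720))
print(len(train), len(test))  # 504 216
```

If the `xgboost` package is installed, the dictionary can be unpacked into `xgboost.XGBRegressor(**REPAIR_MODEL_PARAMS)`; that step is left out here so the sketch runs with the standard library alone.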

S1094: and training the loom missing data repair model based on the XGBoost decision method through a training set.

S1095: and under the condition that the iteration times of the loom missing data repair model based on the XGBoost decision method reach the number of the base learners, outputting an output result of the loom missing data repair model based on the XGBoost decision method.

S1096: and judging whether the output result is between the upper limit and the lower limit of the trusted interval of each corresponding parameter of the loom, if so, entering S1097, otherwise, entering S1098.

S1097: and (3) carrying out abnormal value verification on the output result, judging whether the output result passes the abnormal value verification, and if the output result passes the abnormal value verification, entering S1099, otherwise, entering S1098.

S1098: and (4) adjusting the learning rate, the number of base learners and the regression tree depth, returning to S1094, and retraining the loom missing data repair model based on the XGBoost decision method.

It should be noted that the model is first trained with the initialization parameters. When the number of training iterations reaches the maximum number of base learners, the training result and error are output. The result is then checked against the trusted interval of the corresponding loom parameter and verified again for outliers with the abnormal data recognition method. If either check fails, the learning rate (learning_rate) and the number of base learners (n_estimators) are adjusted and the model is retrained, until the output lies within the trusted interval and passes the outlier check. Parameter tuning is then complete, and training of the loom missing data repair model based on the XGBoost decision method is finished.

S1099: sample information related to loom data to be repaired in the test set is randomly input, deviation of a target output result and corresponding loom real parameters is compared, and training of a loom missing data repair model based on the XGBoost decision method is finished under the condition that the deviation of the target output result and the corresponding loom real parameters is within a preset range.

It can be understood that the output of the trained loom missing data repair model based on the XGBoost decision method has already been verified above. To guard against unexpected situations, the test set is used to verify the model further. If this verification passes, the loom missing data repair model based on the XGBoost decision method is shown to be generally applicable, and the further verification improves the reliability of the trained model.

It can be understood that the trained loom missing data repair model based on XGBoost decision method is utilized to repair the positioned abnormal data points, the missing positions are filled, the availability of the sampled data of the loom is improved, and the repair success rate of the abnormal data is improved.

In one possible implementation, after S110, the method further includes:

s112: and carrying out reliability verification on the repaired data.

In order to avoid the situation that the repair is inaccurate or does not meet the preset condition after the positioned abnormal data is repaired, the reliability of the repaired data is verified to verify the reliability of the repair effect.

In one possible implementation, S112 is specifically:

s1121: using average absolute error index MAE, root mean square error index RMSE and fitting coefficient index R ² And (3) carrying out reliability verification on the repaired data:

wherein y is _i Representing the actual value of the test sample in the test set,representing the predictive value of the test sample,/->Mean of test samples, MAE, RMSE and R ² The range of values is [0,1]The smaller the MAE and RMSE values, the closer the repair result is to the true value, R ² The closer to 1 the value of (c) is, the higher the repair accuracy of the loom missing data repair model based on the XGBoost decision method on the loom missing data is.
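The three indices can be computed with the standard library as sketched below, using their usual definitions; the function names and sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]
repaired = [1, 2, 3, 6]   # one repaired value is off by 2
print(mae(actual, repaired))   # 0.5
print(rmse(actual, repaired))  # 1.0
print(r_squared(actual, repaired))
```

MAE and RMSE are expressed in the units of the repaired parameter, so values for different loom parameters are comparable only after the data have been put on a common scale.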

In practical use, most existing looms serving weaving production in textile enterprises, such as Toyota, Tsudakoma and Picanol looms, provide information transmission interfaces for external data acquisition and communication. The effectiveness of the proposed abnormal data processing method is verified with data collected from a Picanol OMNIPLUS-340 air jet loom at a textile enterprise in Shijia. Existing acquisition setups mainly use an external data acquisition terminal to collect the production data of the loom. To match the work-and-rest pattern of an actual weaving workshop, which runs in shifts, the data of one complete shift were collected at a frequency of once per minute, giving 720 sample groups in total. Each group contains 7 types of loom data: weaving yield, number of beat-ups, running time, running efficiency, running speed, loom state and abnormal condition.

The trusted interval of each loom parameter change is delimited by setting an adaptive regression threshold. Four parameters with a wide range of variation that are easily disturbed by the environment, namely weaving yield, number of beat-ups, running efficiency and running speed, are used as verification data for the processing effect of the method. Analysis of how these data change shows that, under normal conditions, the instantaneous variation of weaving yield and number of beat-ups is relatively small, whereas loom speed and efficiency can jump sharply within a short time because the loom switches rapidly between running and stopped states. The time window for loom speed and efficiency therefore needs to be longer than the window for weaving yield and number of beat-ups. However, a window that is too long lowers the real-time update rate of the reference threshold, while a window that is too short contains too little historical data to reflect the trend of the data. The figure shows the recognition of abnormal points for each loom parameter under different time window lengths.

Experiments determined the time window lengths for weaving yield, number of beat-ups, running efficiency and running speed to be 30, 23, 48 and 57, respectively. With the trusted interval of each parameter determined, 25 weaving yield abnormal points were identified, a recognition rate of 32.05%; 17 beat-up count abnormal points, a recognition rate of 21.15%; 10 running efficiency abnormal points, a recognition rate of 13.33%; and 17 running speed abnormal points, a recognition rate of 26.98%. Delimiting the trusted region of data change with an adaptive regression threshold also makes it possible to estimate the overall trend of the loom parameters, narrow the range in which abnormal loom data must be located, and identify abnormal data preliminarily. However, most of the abnormal data points identified through the trusted region alone are global deviation abnormal points. Local deviation abnormal points and inactive abnormal data points fall inside the trusted region because their fluctuation is not pronounced, so they cannot be identified effectively. The effectiveness of the trusted region delimited by the adaptive regression reference threshold and its ability to identify abnormal data therefore leave considerable room for improvement.

Table 3 Correlation coefficients between parameters (x₁: weaving output; x₂: number of beat-ups; x₃: running time; x₄: running efficiency; x₅: running speed; x₆: loom state)

	x₁	x₂	x₃	x₄	x₅	x₆
x₁	1.00	0.97	0.96	0.34	0.09	0.08
x₂	0.97	1.00	0.99	0.35	0.09	0.08
x₃	0.96	0.99	1.00	0.34	0.09	0.08
x₄	0.34	0.35	0.34	1.00	0.11	0.12
x₅	0.09	0.09	0.09	0.11	1.00	0.93
x₆	0.08	0.08	0.08	0.12	0.93	1.00

Setting the adaptive regression threshold can effectively locate global deviation anomalies, but it performs poorly on local deviation anomalies and on inactive abnormal data whose fluctuation amplitude is not pronounced. Therefore, a Bayesian network abnormal data identification model based on the correlation among parameters is built to further locate the time points at which abnormal data occur and the anomaly types. First, the correlations among six types of loom parameters (weaving output, number of beat-ups, running time, running efficiency, running speed and loom state) are analyzed, and the degree of correlation among the parameters is determined as the basis for constructing the structure of the anomaly identification network. The correlation coefficients between the parameters are shown in Table 3.

The correlation coefficients in Table 3 show that the three parameters loom running time, number of beat-ups and weaving output are correlated with one another. Among them, the correlation between running time and number of beat-ups is the highest, with a coefficient of 0.99. The loom state and the running speed are also correlated, with a coefficient of 0.93.
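The text does not state which correlation coefficient is used. Assuming the Pearson coefficient, a pairwise value such as those in Table 3 could be computed as in the sketch below. The function name `pearson` and the sample readings are hypothetical and are not data from the experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: running time (min) and number of beat-ups.
running_time = [10, 20, 30, 40, 50]
beat_ups = [6100, 12050, 18200, 23900, 30100]
print(round(pearson(running_time, beat_ups), 4))
```

Applying the function to every pair of parameter columns gives a symmetric matrix with ones on the diagonal, the same shape as Table 3.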

Fig. 2 shows the network relationship diagram for identifying abnormal loom data according to an embodiment of the present invention.

As can be seen from Fig. 2, whether the loom data are abnormal is determined by six types of evidence nodes: weaving output, number of beat-ups, running time, running efficiency, running speed and loom state.
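The kind of inference such a network performs can be illustrated with the law of total probability and Bayes' theorem. The sketch below is a deliberate simplification, not the network of Fig. 2: each evidence node is treated on its own with only two states (deviating or not), and every probability is a made-up illustrative number. The function name `posterior_given_abnormal` and the dictionary `nodes` are hypothetical.

```python
def posterior_given_abnormal(prior_dev, p_abn_if_dev, p_abn_if_ok):
    """P(parameter deviates | record is abnormal) by Bayes' theorem.

    prior_dev     : prior probability that the parameter deviates
    p_abn_if_dev  : P(record abnormal | parameter deviates)
    p_abn_if_ok   : P(record abnormal | parameter does not deviate)
    """
    # Law of total probability over the two states of the evidence node.
    p_abnormal = p_abn_if_dev * prior_dev + p_abn_if_ok * (1.0 - prior_dev)
    return p_abn_if_dev * prior_dev / p_abnormal


# Illustrative (invented) probabilities for two evidence nodes.
nodes = {
    "weaving output": (0.10, 0.80, 0.05),
    "running speed": (0.05, 0.60, 0.10),
}
posteriors = {name: posterior_given_abnormal(*p) for name, p in nodes.items()}

# The parameter with the largest posterior is the most likely source of the anomaly.
suspect = max(posteriors, key=posteriors.get)
print(posteriors, suspect)
```

Ranking the evidence nodes by posterior probability is one plausible way to locate the parameter responsible for an abnormal record; the full network would condition on all six nodes jointly.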

Table 4 Abnormal data identification results of each method

To verify the effectiveness of the identification network for abnormal loom data, the method is compared with a decision tree and the K-nearest-neighbor algorithm. Table 4 gives the abnormal point identification results of each method for the four loom parameters weaving output, number of beat-ups, running efficiency and running speed. As Table 4 shows, the probability-distribution-based identification method proposed here achieves identification rates of 98.71%, 95.00%, 98.67% and 98.41% for the four parameters, respectively, with an average of 97.70%, which is higher than both comparison methods because the proposed method makes full use of the correlation between the data.

Table 5 XGBoost model parameters

The missing loom data points were repaired with the XGBoost decision method; the parameter settings of the model are shown in Table 5. To analyze quantitatively how reliably the model repairs missing loom data, the feature sample set was randomly split into a training set and a test set at a ratio of 7:3, and the missing data points were repaired using Hermite interpolation, cubic spline interpolation and the method proposed here. Five observation points at different moments were randomly selected from the four parameters weaving output, number of beat-ups, running efficiency and running speed for data repair.

Comparing the fit between the filled values and the real data shows that the values repaired by the XGBoost decision method are closer to the actual loom parameter values, whereas the values obtained by the two comparison methods deviate more from the actual values. The interpolation methods draw feature information from a single dimension and can rely only on the trend of the data before and after the gap. The XGBoost decision method, by contrast, takes the correlation among the multidimensional loom parameters into account and uses the information from the existing correlated parameters to repair the missing data more accurately.
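The idea of repairing a missing value from a correlated parameter with an additive ensemble of regression trees can be sketched in plain Python. The code below is a simplified stand-in and not XGBoost itself: it is ordinary gradient boosting with depth-1 regression trees and squared-error loss, without the second-order gradients and regularization that XGBoost adds. The names `fit_stump` and `fit_boosted`, the hyperparameter values and the sample records are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (a single split) by least squares."""
    overall = sum(residuals) / len(residuals)
    best = None
    for threshold in sorted(set(xs))[:-1]:          # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:                                # all inputs identical
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_learners=200, learning_rate=0.3):
    """Additive model: base + learning_rate * sum of the base learners."""
    base = sum(ys) / len(ys)
    learners = []
    preds = [base] * len(ys)
    for _ in range(n_learners):
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(f(x) for f in learners)

    return predict


# Hypothetical normal records: number of beat-ups -> weaving output.
beat_ups = [100, 200, 300, 400, 500, 600]
output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predict = fit_boosted(beat_ups, output)

# Fill a missing weaving output from the recorded number of beat-ups.
repaired = predict(350)
print(repaired)
```

Because the trees are piecewise constant, the repaired value for 350 beat-ups equals the fitted value of the nearest lower training point. A real model would use several correlated parameters as input features and tune the learning rate, number of learners and tree depth.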

Table 6 Repair evaluation results for each parameter

Table 6 gives the evaluation results of Hermite interpolation, cubic spline interpolation and the method proposed here for the repair of each loom parameter. According to the evaluation indexes in the table, the proposed method repairs missing loom data better than the other two methods: its MAE and RMSE are the smallest, and the R² values for weaving output, number of beat-ups, running efficiency and running speed are 0.9649, 0.9563, 0.9832 and 0.9736, respectively. With the accuracy of the XGBoost-based repair of missing loom data verified, the trained repair model is used to repair the missing data of each loom parameter, thereby improving the overall quality of the loom data.
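For reference, the three evaluation indexes can be computed as in the sketch below, which assumes their standard textbook definitions (mean absolute error, root mean squared error and coefficient of determination). The function name `repair_metrics` and the sample values are hypothetical and unrelated to the figures in Table 6.

```python
import math


def repair_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for repaired values against true values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total_var = sum((a - mean_actual) ** 2 for a in actual)
    mae = abs_err / n
    rmse = math.sqrt(sq_err / n)
    r2 = 1.0 - sq_err / total_var
    return mae, rmse, r2


# Hypothetical true and repaired values.
mae, rmse, r2 = repair_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(mae, rmse, r2)
```

Smaller MAE and RMSE and an R² closer to 1 indicate a repair closer to the true values, which is how the three methods in Table 6 are ranked.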

The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims

1. A loom abnormal data processing method based on probability distribution and XGBoost decision algorithm is characterized by comprising the following steps:

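S101: collecting raw data of a loom at fixed time intervals;

S102: calculating the change difference between each pair of adjacent data points;

S103: calculating adaptive regression reference thresholds under different time windows;

S104: updating the trusted intervals at different moments according to the adaptive regression reference thresholds;

S105: determining data points whose data values lie outside the trusted interval as abnormal data;

S106: constructing a Bayesian network abnormal data identification model based on probability distribution;

S107: determining, through the Bayesian network abnormal data identification model based on probability distribution, the loom parameters that cause the data anomalies of the abnormal data points;

S108: constructing a loom missing data repair model based on an XGBoost decision method;

S109: training the loom missing data repair model based on the XGBoost decision method;

S110: repairing the abnormal data points through the trained loom missing data repair model based on the XGBoost decision method.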

2. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, wherein S103 specifically comprises:

S1031: acquiring a sequence of data change differences per unit time;

S1032: calculating, in time order, the moving average of the data change differences under the current time window as the regression reference threshold under the current time window, wherein the current time window is an n-dimensional time window comprising n unit times;

the regression reference threshold is calculated in the following way:

3. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, wherein the S104 specifically is:

S1041: calculating, according to the adaptive regression reference threshold, the upper limit and the lower limit of the trusted interval of the loom parameter at time t:

wherein $H_t$ represents the upper limit of the trusted interval at time t, $L_t$ represents the lower limit of the trusted interval at time t, $y_{t-a}$ represents the legal data value before time t, and C represents the amplitude coefficient.

4. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, wherein the S106 specifically comprises:

S1061: creating a sample data set D, the sample data set D comprising weaving output ($x_1$), number of beat-ups ($x_2$), running time ($x_3$), running efficiency ($x_4$), running speed ($x_5$), loom state ($x_6$) and data type ($x_7$), the composition of the sample data set D being as follows:

wherein the dimension of D is $i \times 7$, and i represents the number of time-series acquisition points of the loom data;

S1063: determining the network structure of the Bayesian network abnormal data identification model based on probability distribution;

the network structure is a Bayesian network, and the relational expression of the Bayesian network is as follows:

B = (J, T)

wherein J represents a network structure diagram describing the association relations among elements, comprising element nodes and directed relation edges, and T represents a relation data set describing the probability distribution among the element nodes in the network;

S1064: according to the prior probabilities $P(x_i \mid D)$, $i = 1, 2, \ldots, 6$, training the probability distribution among the child nodes, and determining the conditional probability distribution $P(x_7 = m_j \mid x_i)$ between the evidence nodes and the child node in the loom anomaly identification network;

S1065: calculating the total probability of each node:

wherein $P(x_7 = m_j)$ represents the joint total probability that each data type condition holds, that is, the resulting probability under the joint influence of the six data items weaving output, number of beat-ups, running time, running efficiency, running speed and loom state;

S1066: determining the type of the loom data condition according to the total probability distribution of each node;

S1067: when the output result of the child node is abnormal data, inferring the posterior probability of each evidence node according to the probability distribution of the abnormal data, and finally locating the loom parameter that generates the data anomaly, wherein the posterior probability is calculated as follows:

wherein $P(x_i \mid x_7)$ represents the probability that the condition of parent node $x_i$ holds given the known result of the child node data type, namely the posterior probability;

S1068: inputting the sample data set into the Bayesian network abnormal data identification model based on probability distribution for training.

5. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 4, further comprising, after S1062:

S1062A: introducing a correlation coefficient among all parameters into the Bayesian network, wherein the calculation mode of the correlation coefficient is as follows:

6. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, further comprising, after S107:

S111: clearing the loom parameters that cause the data anomalies of the abnormal data points.

7. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, wherein S108 specifically is:

S1081: constructing the loom missing data repair model based on the XGBoost decision method with a regression tree as the base learner, wherein the expression of the loom missing data repair model based on the XGBoost decision method is as follows:

$$\hat{y}_i = \sum_{k=1}^{N} f_k(x_i)$$

wherein $\hat{y}_i$ represents the loom data repair result at moment i, $x_i$ represents the known associated input samples of the loom parameter to be repaired at moment i, N represents the number of the base learners, and $f_k$ represents the k-th base learner.

8. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 7, wherein S109 specifically comprises:

S1091: taking the normal data points of each loom parameter obtained after anomaly identification as feature samples of the loom missing data repair model based on the XGBoost decision method, constructing a feature sample set, and dividing the feature sample set into a training set and a test set according to a preset ratio;

S1092: selecting a regression tree as the base learner of the loom missing data repair model based on the XGBoost decision method, and setting the objective loss function of the loom missing data repair model based on the XGBoost decision method to the mean squared error regression loss;

S1093: adjusting the learning rate, the number of base learners and the regression tree depth of the loom missing data repair model based on the XGBoost decision method according to the principle of minimum error;

S1094: training the loom missing data repair model based on the XGBoost decision method with the training set;

S1095: outputting the output result of the loom missing data repair model based on the XGBoost decision method when the number of iterations of the loom missing data repair model based on the XGBoost decision method reaches the number of the base learners;

S1096: judging whether the output result lies between the upper limit and the lower limit of the trusted interval of the corresponding loom parameter; if so, proceeding to S1097, otherwise proceeding to S1098;

S1097: performing outlier verification on the output result and judging whether the output result passes the outlier verification; if so, proceeding to S1099, otherwise proceeding to S1098;

S1098: adjusting the learning rate, the number of the base learners and the regression tree depth, returning to S1094, and retraining the loom missing data repair model based on the XGBoost decision method;

S1099: randomly inputting sample information from the test set that is related to the loom data to be repaired, comparing the deviation between the target output result and the corresponding real loom parameter, and ending the training of the loom missing data repair model based on the XGBoost decision method when that deviation is within a preset range.

9. The loom anomaly data processing method based on the probability distribution and XGBoost decision algorithm according to claim 1, further comprising, after S110:

S112: performing reliability verification on the repaired data.

10. The loom anomaly data processing method based on probability distribution and XGBoost decision algorithm according to claim 9, wherein S112 specifically is:

wherein $y_i$ represents the actual value of a test sample in the test set, $\hat{y}_i$ represents the predicted value of the test sample, and $\bar{y}$ represents the mean of the test samples; the value range of MAE, RMSE and $R^2$ is [0, 1]; the smaller the MAE and RMSE values, the closer the repair result is to the true value; and the closer the value of $R^2$ is to 1, the higher the accuracy with which the loom missing data repair model based on the XGBoost decision method repairs missing loom data.