CN116401570B

CN116401570B - Intelligent processing system for printing quality monitoring big data

Info

Publication number: CN116401570B
Application number: CN202310602031.3A
Authority: CN
Inventors: 张永财
Original assignee: Foshan Litian Packaging Printing Co ltd
Current assignee: Foshan Litian Packaging Printing Co ltd
Priority date: 2023-05-26
Filing date: 2023-05-26
Publication date: 2023-08-11
Anticipated expiration: 2043-05-26
Also published as: CN116401570A

Abstract

The application relates to the technical field of data processing, and in particular to an intelligent processing system for printing quality monitoring big data. The system performs the following steps: acquiring the intervention distance of any two observation sample data according to an intervention factor; obtaining all levels of each suspicious discrete point; obtaining all combination modes of any two suspicious discrete points for each matching level number within the matching level number range; calculating the unmatched degree of each combination mode from the matching results of all level pairs in that combination mode and a penalty term; obtaining the optimal level number of every two suspicious discrete points; clustering all observation sample data according to the optimal cluster number and the intervention distance to obtain the cluster centers of a plurality of clusters; and taking the cluster centers of all clusters as the initial parameters of a Gaussian mixture model to obtain a soft classification result. Because the initial input parameters are acquired more accurately, the multidimensional data clustering result is more accurate and the utilization rate of the enterprise's historical printing quality data is improved.

Description

Intelligent processing system for printing quality monitoring big data

Technical Field

The application relates to the technical field of data processing, in particular to an intelligent processing system for printing quality monitoring big data.

Background

Pharmaceutical packaging involves product safety, so its quality requirements are very stringent. For example, text such as medicine inspection codes, brand and manufacturer, main treatment functions, and adverse reactions must be printed clearly and in a conspicuous position; the package must be complete and undamaged; and the packaging material must be waterproof and resistant to high temperature. Current medicine printing quality monitoring generally uses one of three methods: comparison against a standard model, definition of product features, or a hybrid of the two. Existing printing quality detection methods are quite mature.

Many types of defects arise in the printing process, so recording and collecting defect data during production is indispensable. Only by continuously updating the defect library can a more complete feature library and higher-quality product templates be obtained, and analyzing the defect data allows production equipment, process parameters, fault repair and the like to be optimized in time. However, while printing defects are collected, the process parameters and equipment states on the production line must also be acquired and stored so that they can be analyzed together with the specific defects, and the reliability of the analysis results rests on verification against a large number of data sets. As time passes, the defect data become more and more complex, and it becomes difficult to retrieve or collectively analyze the defect types that may arise under similar equipment, materials and process parameters, or the causes of the same defect type under different parameters. As a result, the efficiency of production equipment maintenance and process optimization is relatively low. Efficient management and processing of historical defect data and the corresponding multidimensional equipment and process data are therefore essential for production enterprises.

Disclosure of Invention

The application provides an intelligent processing system for printing quality monitoring big data, which aims to solve the existing problems.

The intelligent processing system for the printing quality monitoring big data adopts the following technical scheme:

the application provides an intelligent processing system for printing quality monitoring big data, which comprises:

the observation sample data acquisition module acquires all observation sample data;

the intervention distance acquisition module is used for acquiring intervention factors of any two observation sample data according to the difference of the time attributes, and performing intervention on the difference of all the attributes of the observation sample data according to the intervention factors to acquire the intervention distance of any two observation sample data;

the suspicious discrete point and level acquisition module is used for acquiring all suspicious discrete points according to the variability of the average intervention distance of all observation sample data; obtaining all levels of each suspicious discrete point;

the optimal clustering quantity acquisition module is used for acquiring all combination modes of any two suspicious discrete points for any matching level number within the matching level number range; setting a penalty term according to the quantity of observation sample data contained in all levels that are not matched in each combination mode, and calculating the unmatched degree of each combination mode according to the matching results of all level pairs in that combination mode and the penalty term; recording the smallest unmatched degree among all combination modes of a matching level number as the unmatched degree of that matching level number; recording the matching level number with the smallest unmatched degree within the matching level number range as the optimal level number of the two suspicious discrete points; and obtaining the optimal level number of every two suspicious discrete points among all suspicious discrete points, and taking the mode of all the optimal level numbers as the optimal clustering quantity;

the soft classification analysis module clusters all the observation sample data according to the optimal cluster number and the intervention distance of all the observation sample data to obtain a plurality of cluster centers of the clusters, takes the cluster centers of all the clusters as initial parameters of a Gaussian mixture model to obtain a soft classification result, and analyzes the technological parameters of the defect product according to the soft classification result.

Further, the method for obtaining the intervention factors of any two observation sample data according to the difference of the time attributes comprises the following specific steps:

the calculation formula of the intervention factor is as follows:

wherein $Q(\alpha,\beta)$ represents the intervention factor of the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $t_\alpha$ represents the time attribute of the $\alpha$-th observation sample data, $t_\beta$ represents the time attribute of the $\beta$-th observation sample data, $T$ represents the execution period of the enterprise's production line, and $|\cdot|$ represents taking the absolute value.

Further, the intervention distance for obtaining any two observation sample data comprises the following specific steps:

according to the intervention factors, the intervention is carried out on the differences of all the attributes of the observation sample data, the intervention distance of any two observation sample data is obtained, and the calculation formula of the intervention distance is as follows:

wherein $D(\alpha,\beta)$ represents the intervention distance between the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $Q(\alpha,\beta)$ represents the intervention factor of the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $N$ represents the number of attributes of the observation sample data, $A_{\alpha,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\alpha$-th observation sample data, $A_{\beta,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\beta$-th observation sample data, and $\max()$ represents taking the maximum value.

Further, the method for obtaining all suspicious discrete points comprises the following specific steps:

for any one observation sample data, calculating the intervention distance between the observation sample data and all other observation sample data, further obtaining the average value of the intervention distances between the observation sample data and all other observation sample data, and recording the average value as the average intervention distance of the observation sample data;

obtaining the average intervention distance of all observation sample data, sorting all observation sample data by average intervention distance from large to small, generating an elbow graph, obtaining the maximum inflection point of the elbow graph, and marking all observation sample data before the maximum inflection point as suspicious discrete points.

Further, the obtaining all the levels of each suspicious discrete point comprises the following specific steps:

for any one suspicious discrete point, marking a set consisting of intervention distances of the suspicious discrete point and all observation sample data which do not belong to the suspicious discrete point as an intervention distance set of the suspicious discrete point, carrying out hierarchical clustering on the intervention distance set of the suspicious discrete point by using a hierarchical clustering method to obtain all levels of the suspicious discrete point, and marking all observation sample data corresponding to all intervention distances of each level of the suspicious discrete point as all observation sample data contained in each level of the suspicious discrete point; all levels of all suspicious discrete points are obtained, as well as all observation sample data contained by all levels.

Further, the method for obtaining all the combination modes of any two suspicious discrete points in any matching layer level number in the matching layer level number range comprises the following specific steps:

for any two suspicious discrete points, the suspicious discrete point with the smaller number of levels is marked as suspicious discrete point $a$, the suspicious discrete point with the larger number of levels is marked as suspicious discrete point $b$, $[1, M_a]$ is marked as the matching level number range, and every integer in the matching level number range is taken as a matching level number, where $M_a$ represents the number of levels of suspicious discrete point $a$;

for any matching level number $r$, $r$ levels among all levels of suspicious discrete point $a$ and $r$ levels among all levels of suspicious discrete point $b$ are combined in pairs, and there are $f_r$ possible combination modes in total; each combination mode comprises $r$ pairs of levels, and in each pair one level belongs to suspicious discrete point $a$ and the other belongs to suspicious discrete point $b$.

Further, the calculating the unmatched degree of each combination mode includes the following specific steps:

the calculation formula of the unmatched degree of each combination mode is as follows:

wherein $V(a,b,r,i)$ represents the unmatched degree of the $i$-th combination mode of suspicious discrete point $a$ and suspicious discrete point $b$ at matching level number $r$; $c_{r,i,j,a}$ represents the level belonging to suspicious discrete point $a$ in the $j$-th pair of levels of the $i$-th combination mode at matching level number $r$; $c_{r,i,j,b}$ represents the level belonging to suspicious discrete point $b$ in the same pair; $N(c_{r,i,j,a})$ and $N(c_{r,i,j,b})$ represent the numbers of observation sample data contained in level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$ respectively; $P(c_{r,i,j,a}, c_{r,i,j,b})$ represents the number of observation sample data matched between level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$; $E_{r,i,a}$ represents the number of observation sample data contained in all levels of suspicious discrete point $a$ that are not selected in the $i$-th combination mode at matching level number $r$; $E_{r,i,b}$ represents the number of observation sample data contained in all levels of suspicious discrete point $b$ that are not selected in the $i$-th combination mode at matching level number $r$; and $(E_{r,i,a}+E_{r,i,b})$ is the penalty term of the $i$-th combination mode at matching level number $r$.
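The formula for the unmatched degree is not reproduced in this text, so the following is only an illustrative sketch built from the quantities defined above. It assumes, as a labeled guess, that each level pair contributes the number of observation samples the two levels do not share, N(c_a) + N(c_b) - 2P(c_a, c_b), and that the penalty term (E_a + E_b) is added to the sum. The function name and the representation of levels as sets of sample identifiers are hypothetical.

```python
def unmatched_degree(pairs, levels_a, levels_b):
    """Illustrative unmatched degree of one combination mode.

    pairs     -- list of (index into levels_a, index into levels_b)
    levels_a  -- list of sets of sample ids, one set per level of point a
    levels_b  -- list of sets of sample ids, one set per level of point b
    The per-pair form below is an ASSUMPTION; the source formula is missing.
    """
    used_a = {ia for ia, _ in pairs}
    used_b = {ib for _, ib in pairs}

    mismatch = 0
    for ia, ib in pairs:
        n_a = len(levels_a[ia])                    # N(c_{r,i,j,a})
        n_b = len(levels_b[ib])                    # N(c_{r,i,j,b})
        matched = len(levels_a[ia] & levels_b[ib])  # P(c_{r,i,j,a}, c_{r,i,j,b})
        mismatch += n_a + n_b - 2 * matched        # assumed per-pair term

    # Penalty: samples sitting in levels that were not selected at all.
    e_a = sum(len(lv) for k, lv in enumerate(levels_a) if k not in used_a)
    e_b = sum(len(lv) for k, lv in enumerate(levels_b) if k not in used_b)
    return mismatch + e_a + e_b


levels_a = [{1, 2, 3}, {4, 5}]
levels_b = [{1, 2}, {3, 4}, {5}]
v = unmatched_degree([(0, 0), (1, 1)], levels_a, levels_b)
```

Under this assumed form every term is a count of observation samples, so the pair mismatch and the penalty can be added directly; a different normalization in the source formula would change the values but not the structure of the loop.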

The technical scheme of the application has the following beneficial effects. When a Gaussian mixture model is used to cluster multidimensional data, the initial parameters are difficult to obtain, and the reliability of conventional dimension reduction falls as the number of dimensions grows. The application therefore measures the degree of difference of each attribute of the observation sample data when calculating the distance, so that differences across dimensions are measured on a uniform scale. In line with production requirements and expectations, an intervention factor is obtained from the difference of the time attributes and used to adjust the distance, giving the intervention distance, so that observation sample data with similar time attributes, that is, adjacent in time sequence, tend to fall into the same cluster in the subsequent clustering process.

Existing algorithms for clustering multidimensional data are sensitive to discrete points and depend on manually set thresholds. To address this, the application screens suspicious discrete points through the variability of the average intervention distance between each observation sample data and all other observation sample data, performs hierarchical clustering and matching on the intervention distance sets of the suspicious discrete points, calculates the unmatched degree of each combination mode for each matching level number, and obtains the optimal cluster number with the highest hierarchical matching consistency across the intervention distance sets of all suspicious discrete points. The optimal cluster number is used as the hyperparameter k of the k-means algorithm and the intervention distance as the distance in the k-means algorithm, which yields the cluster centers of a plurality of clusters. The cluster centers of all clusters are used as the initial parameters of a Gaussian mixture model to obtain a soft classification result. The initial input parameters of the Gaussian mixture model are thus obtained without depending on any subjective threshold, which makes the multidimensional data clustering result more accurate, improves the utilization rate of the enterprise's historical printing quality data, and facilitates the subsequent correlation analysis of defect forming factors.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a system block diagram of a print quality monitoring big data intelligent processing system of the present application.

Detailed Description

In order to further describe the technical means and effects adopted by the application to achieve the preset aim, the following detailed description refers to the specific implementation, structure, characteristics and effects of a print quality monitoring big data intelligent processing system according to the application with reference to the attached drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

The following specifically describes a specific scheme of the intelligent processing system for monitoring printing quality and big data.

Referring to fig. 1, an intelligent processing system for monitoring print quality according to an embodiment of the present application is shown, which includes the following modules:

the observation sample data acquisition module 101 is configured to obtain observation sample data according to historical data of printing defects of an enterprise medicine box.

The printing process comprises pre-printing control, printing control and post-printing control, and each independent process flow, such as ink printing, film coating, die cutting and box gluing, has various types of defects of different degrees. In each process flow, multiple sensors are used to acquire equipment process parameters, defects are identified from the equipment process parameters, and production monitoring logs are generated. For example, the defect pattern is obtained with a pattern sensor, and equipment process parameters such as temperature, humidity, vibration frequency and current are obtained with a temperature sensor, a humidity sensor, a vibration sensor, a current sensor and the like.

And for any process flow, retrieving the production monitoring logs with defects in all the production monitoring logs of the process flow from a storage server of the enterprise, wherein the production monitoring logs comprise time, defect types, defect patterns, various equipment parameters, various process parameters and the like, each production monitoring log is recorded as one piece of observation sample data, the observation sample data is multi-dimensional data, and each dimension data of the observation sample data is recorded as an attribute of the observation sample data.

The intervention distance acquisition module 102 is configured to obtain an intervention factor of any two observation sample data according to the difference of the time attributes, and obtain an intervention distance of any two observation sample data according to the intervention factor.

It should be noted that clustering the enterprise's production monitoring logs facilitates the subsequent correlation analysis of defect forming factors, and the Gaussian mixture model is a clustering algorithm widely used in industry. The Gaussian mixture model is originally a generative model: given a known model, it outputs sample data that conform to the assumed Gaussian distributions. Clustering with a Gaussian mixture model is the inverse process, in which the samples are known and the model is unknown, so the initial parameters must be estimated. Gaussian mixture model clustering is similar to k-means clustering, but k-means only estimates the cluster centers and assigns the surrounding points to them (hard clustering), whereas the Gaussian mixture model calculates the probability of each data point belonging to each cluster and estimates both the cluster centers and the covariances (soft clustering). Therefore, the k-means algorithm is used to obtain reference cluster centers, which serve as the initial parameters of the Gaussian mixture model. The model parameters (mean, covariance and mixing coefficient) are then updated iteratively by expectation maximization until convergence, and the sub-model with the maximum posterior probability is determined for each sample, giving an accurate classification result. The k value of the k-means algorithm and the resulting cluster centers directly determine the accuracy of the initial parameters of the Gaussian mixture model clustering and have an important influence on the final clustering result.
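The relationship between initial cluster centers and soft classification can be illustrated with a minimal one-dimensional sketch. It is a simplification, not the embodiment itself: the data are scalar, the initial means are passed in directly (standing in for centers that k-means would supply), and the names `fit_gmm_1d`, `init_means` and `min_var` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(data, means, variances, weights):
    """E-step: posterior probability of each component for each sample."""
    rows = []
    for x in data:
        dens = [w * gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
        total = sum(dens)
        if total == 0.0:  # sample far from every component: fall back to uniform
            rows.append([1.0 / len(dens)] * len(dens))
        else:
            rows.append([d / total for d in dens])
    return rows


def fit_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, started from the given cluster centers."""
    n, k = len(data), len(init_means)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    means, variances, weights = list(init_means), [grand_var] * k, [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(data, means, variances, weights)
        for c in range(k):  # M-step: update mean, variance, mixing coefficient
            mass = sum(row[c] for row in resp)
            if mass < 1e-12:
                continue
            means[c] = sum(row[c] * x for row, x in zip(resp, data)) / mass
            spread = sum(row[c] * (x - means[c]) ** 2 for row, x in zip(resp, data)) / mass
            variances[c] = max(spread, min_var)
            weights[c] = mass / n
    return means, variances, weights, responsibilities(data, means, variances, weights)


data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
means, variances, weights, soft = fit_gmm_1d(data, init_means=[1.2, 4.5])
# Each row of `soft` holds the membership probabilities of one sample.
```

Each row of `soft` sums to one, which is what distinguishes this soft result from the hard assignment of k-means; poorly chosen initial means can steer the iteration to a different local optimum, which is why the quality of the initial centers matters.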

1. Obtaining the intervention factor of any two observation sample data according to the difference of the time attributes.

Before multidimensional data are clustered, they usually need to undergo dimension reduction. The core quantity of conventional clustering algorithms is the Euclidean distance between data points, that is, the difference between multidimensional data is represented by the Euclidean norm. However, when the data have many dimensions, the Euclidean norm measures the difference between data less accurately, so this embodiment reduces the difference between two multidimensional data points to a single value using the idea of the mean square error.

It should be further noted that defective products are caused by abnormal operation of the printing equipment (such as jitter or unstable voltage) or by poorly set process parameters (such as excessive humidity, or a temperature that is too high or too low) in a single printing process. Within the same batch or consecutive batches, the parameter settings are identical or close, so defects that are close in time sequence are likely to share a common cause. In other words, if two defective products occur under the same parameter settings, most parameters are consistent within that short time and only the abnormal parameters need to be found, which is similar to a controlled-variable analysis. Therefore, when the defect data are clustered, defective products that are locally close in time sequence should be assigned to the same cluster as far as possible to allow commonality analysis; when the mean square error of the difference between multidimensional data is calculated, the time-sequence difference needs to be treated separately and the weight of local time-sequence similarity needs to be increased.

In this embodiment, an intervention factor is set according to a time attribute difference of any two observation sample data, and a calculation formula of the intervention factor is as follows:

2. Obtaining the intervention distance of any two observation sample data according to the intervention factor.

wherein $D(\alpha,\beta)$ represents the intervention distance between the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $Q(\alpha,\beta)$ represents the intervention factor of the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $N$ represents the number of attributes of the observation sample data, $A_{\alpha,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\alpha$-th observation sample data, $A_{\beta,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\beta$-th observation sample data, $A_{\alpha,\varepsilon}-A_{\beta,\varepsilon}$ represents the difference between the $\varepsilon$-th attributes of the $\alpha$-th and $\beta$-th observation sample data, and $\max()$ represents taking the maximum value.

The difference of the $\varepsilon$-th attribute of the two observation sample data is converted into a degree of difference, so that differences across the dimensions of the multidimensional data are measured on a uniform scale. The smaller the time attribute difference between the $\alpha$-th and $\beta$-th observation sample data, the smaller the intervention factor. The intervention factor obtained from the time attribute difference adjusts the attribute differences of the two observation sample data, so that for observation sample data with similar time attributes, that is, adjacent in time sequence, the intervention distance $D(\alpha,\beta)$ is smaller. In effect, for observation sample data within the same execution period, the intervention factor reduces the distance between them, so that they tend to be assigned to the same cluster in the subsequent clustering process. When the time attribute difference of two observation sample data is large, that is, they are more than one execution period apart in time sequence, the intervention factor equals 1 and no longer has any intervening effect.
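The two formulas themselves are not reproduced in this text, so the sketch below only illustrates the behaviour described: a factor that shrinks with the time difference and saturates at 1 beyond one execution period, applied to a mean of squared, max-normalized attribute differences. Both functional forms are assumptions chosen to satisfy those stated properties, and the function and parameter names are hypothetical.

```python
def intervention_factor(t_a, t_b, period):
    """ASSUMED form: |t_a - t_b| / T, capped at 1 beyond one execution period."""
    return min(abs(t_a - t_b) / period, 1.0)


def intervention_distance(attrs_a, attrs_b, t_a, t_b, period):
    """ASSUMED form: Q * mean over attributes of (relative difference)^2.

    attrs_a, attrs_b -- equal-length lists of numeric attribute values
    Each difference is divided by the larger magnitude of the two values,
    so every attribute contributes on the same dimensionless scale.
    """
    total = 0.0
    for x, y in zip(attrs_a, attrs_b):
        scale = max(abs(x), abs(y))
        relative = 0.0 if scale == 0 else (x - y) / scale
        total += relative ** 2
    return intervention_factor(t_a, t_b, period) * total / len(attrs_a)


# Two logs half an execution period apart versus three periods apart.
near = intervention_distance([10.0, 2.0], [5.0, 2.0], t_a=0, t_b=5, period=10)
far = intervention_distance([10.0, 2.0], [5.0, 2.0], t_a=0, t_b=30, period=10)
```

With identical attribute values, the pair that is closer in time receives the smaller distance, which is the effect the intervention factor is meant to have on the subsequent clustering.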

The objective of this step is to optimize the difference measure between multidimensional data for the subsequent clustering. Comparing the attributes through the mean square error and unifying the difference measure as a degree gives a more accurate result than the traditional Euclidean-norm measure of multidimensional data difference. The step also supervises the clustering process according to the logic described above, so that data that are locally close in time sequence are assigned to the same cluster as far as possible, achieving the expected processing result for the defect record data.

The suspicious discrete point and level acquisition module 103 is configured to obtain all suspicious discrete points according to the average intervention distances of all observation sample data, and cluster the intervention distance set of each suspicious discrete point to obtain all levels of each suspicious discrete point.

The only core parameter of k-means is k, the number of initial cluster centers. Existing approaches such as the elbow method and the silhouette coefficient method essentially evaluate a one-dimensional quantity against k and select the k at the most obvious inflection point, and they differ little from one another. However, multidimensional data cannot avoid the dimension reduction process, whose errors inevitably produce more discrete points, and these discrete points can cause several distorted pseudo clusters. When k is obtained by the conventional elbow method, the point at which the clustering result changes from varying sharply to remaining stable is simply taken as the optimal k. For multidimensional data with a large data volume and complicated dimensions, however, the discrete points are more numerous, the inflection point is often unclear, the discrete points are more sensitive to the iteration over k, and the clustering result fluctuates more severely and struggles to reach a stable state, so k is difficult to estimate accurately. Some scholars have therefore proposed applying the ISODATA algorithm to multidimensional data clustering. Its idea is as follows: when the number of samples belonging to a class is too small, the class is removed; when the number of samples belonging to a class is too large and their dispersion is high, the class is divided into two subclasses; a better k value is obtained by continuously splitting, merging and updating the classification result. This, however, requires setting several subjective thresholds, such as a variance threshold for the clustering result and a minimum distance between different cluster centers, and heavy dependence on thresholds does not give high accuracy.
Therefore, in this embodiment, a part of the data is marked as suspicious discrete points that do not participate in the clustering process, which avoids their interference with clustering. The suspicious discrete points are instead used as the route to the optimal k: they are given the role of observers, from which hypotheses are deduced.

1. Obtaining all suspicious discrete points according to the average intervention distance of all observation sample data.

In this embodiment, for any one observation sample data, the intervention distance between the observation sample data and all other observation sample data is calculated, so as to obtain an average value of the intervention distances between the observation sample data and all other observation sample data, and the average value is recorded as the average intervention distance of the observation sample data; the larger the average intervention distance, the more the observed sample data deviates from all clusters.

The maximum inflection point of the elbow graph is known in the art, and will not be described in detail here.
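As a rough sketch of this screening step, the code below computes the average intervention distance of each sample from a precomputed distance matrix, sorts the samples from large to small, and locates an elbow. The source does not state how the maximum inflection point is found; taking the position with the largest discrete second difference is an assumption made here, and the function names are hypothetical. At least three samples are needed.

```python
def average_distances(dist):
    """Mean distance from each sample to all other samples.

    dist -- square matrix (list of lists) of pairwise intervention distances
    """
    n = len(dist)
    return [sum(dist[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def suspicious_points(avg):
    """Indices of samples lying before the elbow of the sorted curve.

    The elbow is ASSUMED to be the position with the largest discrete
    second difference of the descending curve.
    """
    order = sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)
    curve = [avg[i] for i in order]
    best_k, best_bend = 1, float("-inf")
    for k in range(1, len(curve) - 1):
        bend = curve[k - 1] - 2.0 * curve[k] + curve[k + 1]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return order[:best_k]


avg = [1.8, 9.0, 2.0, 8.0, 1.9, 1.7]
flagged = suspicious_points(avg)  # samples 1 and 3 stand far from the rest
```

Any other elbow detector could be substituted in `suspicious_points` without changing the rest of the procedure, since only the set of flagged indices is passed on.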

2. Clustering the intervention distance set of each suspicious discrete point to obtain all levels of each suspicious discrete point.

It should be noted that multidimensional data always have to undergo dimension reduction, so, owing to dimension reduction errors, a certain amount of discrete data appears in the clustering result under any clustering rule. To ensure that the discrete data can play their role stably, a batch of abnormal data whose parameters all deviate from normal values can be deliberately added to the sample data set in a preliminary step. All the discrete data are then far from one another, but because the other data contain an as yet unknown number of clusters, the differences between a discrete data point and the other data points necessarily fall into several levels.

In this embodiment, for any one suspicious discrete point, a set formed by the intervention distances between the suspicious discrete point and all observation sample data not belonging to the suspicious discrete point is recorded as an intervention distance set of the suspicious discrete point, hierarchical clustering is performed on the intervention distance set of the suspicious discrete point by using a hierarchical clustering method to obtain all levels of the suspicious discrete point, and all observation sample data corresponding to all intervention distances of each level of the suspicious discrete point is recorded as all observation sample data contained in each level of the suspicious discrete point; all levels of all suspicious discrete points are obtained, as well as all observation sample data contained by all levels.
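The source does not specify the linkage or the rule used to cut the hierarchy into levels. As a stand-in, the sketch below sorts the one-dimensional intervention distances of a suspicious discrete point and starts a new level wherever the gap between neighbours is large relative to the mean gap, which is how single-linkage clusters separate in one dimension. The cut rule and the `gap_factor` parameter are assumptions introduced only for illustration.

```python
def split_into_levels(distances, gap_factor=2.0):
    """Group samples into levels by their distance to one suspicious point.

    distances  -- dict mapping sample id -> intervention distance to the point
    gap_factor -- HYPOTHETICAL cut parameter: a new level starts when the gap
                  to the previous distance exceeds gap_factor * mean gap
    Returns a list of sets of sample ids, ordered from nearest to farthest.
    """
    items = sorted(distances.items(), key=lambda kv: kv[1])
    if not items:
        return []
    if len(items) == 1:
        return [{items[0][0]}]

    gaps = [items[i + 1][1] - items[i][1] for i in range(len(items) - 1)]
    mean_gap = sum(gaps) / len(gaps)

    levels = [{items[0][0]}]
    for (sample_id, _), gap in zip(items[1:], gaps):
        if gap > gap_factor * mean_gap:
            levels.append(set())  # large jump: open the next level
        levels[-1].add(sample_id)
    return levels


levels = split_into_levels(
    {"s1": 0.10, "s2": 0.12, "s3": 0.11, "s4": 0.50, "s5": 0.52, "s6": 0.90}
)
```

Representing each level as a set of sample identifiers makes the later comparison between two suspicious discrete points a matter of set intersection.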

The optimal clustering number obtaining module 104 is configured to obtain an optimal layer number of any two suspicious discrete points according to the matching condition of all layer numbers of any two suspicious discrete points, and further obtain an optimal clustering number according to the optimal layer number of each two suspicious discrete points.

It should be noted that when the levels of all suspicious discrete points are consistent with one another, the number of levels with the highest consistency is the most realistic number of clusters, that is, the optimal hyperparameter k of the k-means algorithm. It is therefore necessary to compare the hierarchical clustering results of the intervention distance sets of all suspicious discrete points to obtain the optimal hyperparameter k.
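The final selection can be sketched in two small steps: for each pair of suspicious discrete points, keep the matching level number with the smallest unmatched degree, then take the mode of those per-pair results as k. The function names are hypothetical, and breaking ties toward the smaller value is an assumption the source does not address.

```python
from collections import Counter


def best_level_number(unmatched_by_r):
    """Matching level number with the smallest unmatched degree.

    unmatched_by_r -- dict mapping r -> unmatched degree of that r
    Ties are broken toward the smaller r (an assumption).
    """
    return min(sorted(unmatched_by_r), key=lambda r: unmatched_by_r[r])


def optimal_cluster_count(optimal_by_pair):
    """Mode of the optimal level numbers over all pairs of suspicious points.

    optimal_by_pair -- dict mapping (point a, point b) -> optimal level number
    """
    counts = Counter(optimal_by_pair.values())
    top = max(counts.values())
    return min(k for k, c in counts.items() if c == top)


per_pair = {
    ("p1", "p2"): best_level_number({1: 40, 2: 12, 3: 5}),
    ("p1", "p3"): best_level_number({1: 38, 2: 9, 3: 7}),
    ("p2", "p3"): best_level_number({1: 35, 2: 6, 3: 6, 4: 11}),
}
k = optimal_cluster_count(per_pair)
```

In this toy input two of the three pairs agree on three levels, so the mode, and hence k, is 3.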

In this embodiment, for any two suspicious discrete points, the suspicious discrete point with the smaller number of levels is denoted as suspicious discrete point a, and the suspicious discrete point with the larger number of levels is denoted as suspicious discrete point b. All integers in $[1, M_a]$ are taken as the matching layer numbers, where $M_a$ represents the number of levels of suspicious discrete point a.

For any matching layer number $r$, $r$ levels among all levels of suspicious discrete point a and $r$ levels among all levels of suspicious discrete point b are combined in pairs, giving $f_r$ possible combination modes in total, where $f_r$ is computed from $M_a$, $M_b$, and $r$; $M_a$ represents the number of levels of suspicious discrete point a, $M_b$ represents the number of levels of suspicious discrete point b, and $(\cdot)!$ denotes the factorial.

For each combination mode, the j-th pair of levels of the combination mode is denoted as $(c_{j,a}, c_{j,b})$, where $c_{j,a}$ represents the level belonging to suspicious discrete point a in the j-th pair of levels, and $c_{j,b}$ represents the level belonging to suspicious discrete point b in the j-th pair of levels.
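The pairing of levels can be illustrated with a short enumeration. The sketch assumes one reading of the text: r levels are chosen from each point without regard to order and then paired one-to-one, which gives C(M_a, r) · C(M_b, r) · r! combination modes. The expression for the count is not reproduced in this text, so this count, the function name `combination_modes`, and the sample levels are assumptions.

```python
from itertools import combinations, permutations
from math import comb, factorial


def combination_modes(levels_a, levels_b, r):
    """Yield every way to pick r levels from each point and pair them one-to-one.

    Each mode is a list of (index into levels_a, index into levels_b) pairs.
    """
    for chosen_a in combinations(range(len(levels_a)), r):
        for chosen_b in combinations(range(len(levels_b)), r):
            # Each ordering of chosen_b gives a different one-to-one pairing.
            for order_b in permutations(chosen_b):
                yield list(zip(chosen_a, order_b))


levels_a = [[1, 2], [3, 4, 5]]      # M_a = 2 (hypothetical levels)
levels_b = [[3, 4], [1, 2], [0]]    # M_b = 3 (hypothetical levels)
modes = list(combination_modes(levels_a, levels_b, r=2))

# Count under the assumed reading: C(M_a, r) * C(M_b, r) * r!
expected = comb(len(levels_a), 2) * comb(len(levels_b), 2) * factorial(2)
```

The number of modes grows quickly with the number of levels, so exhaustive enumeration of this kind is only practical when each point has a handful of levels.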

The unmatched degree of each combination mode is calculated, wherein the calculation formula is as follows:

wherein $V(a,b,r,i)$ represents the unmatched degree of the i-th combination mode of suspicious discrete point a and suspicious discrete point b at matching layer number $r$; $c_{r,i,j,a}$ and $c_{r,i,j,b}$ represent the levels belonging to suspicious discrete point a and to suspicious discrete point b, respectively, in the j-th pair of levels of the i-th combination mode at matching layer number $r$; $N(c_{r,i,j,a})$ and $N(c_{r,i,j,b})$ represent the numbers of observation sample data contained in level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$; $P(c_{r,i,j,a}, c_{r,i,j,b})$ represents the number of observation sample data matched between level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$; and $E_{r,i,a}$ and $E_{r,i,b}$ represent the numbers of observation sample data contained in all unselected levels of suspicious discrete point a and of suspicious discrete point b, respectively, in the i-th combination mode at matching layer number $r$.

The per-pair matching term of the formula represents the matching result between level $c_{r,i,j,a}$ of suspicious discrete point a and level $c_{r,i,j,b}$ of suspicious discrete point b in the j-th pair of levels of the i-th combination mode at matching layer number $r$. The smaller its value, the better the levels of the two suspicious discrete points match, and the higher the consistency between the layering results of the two suspicious discrete points at matching layer number $r$. Meanwhile, a smaller matching layer number $r$ means fewer matched levels, which makes a good match and high consistency more likely and could cause $r$ to converge prematurely to a local optimum. The penalty term $(E_{r,i,a}+E_{r,i,b})$ is therefore added; it is the total amount of observation sample data left unmatched between the two suspicious discrete points at matching layer number $r$. That is, when the matching term reaches its minimum, the total amount of remaining unmatched observation sample data must also be as small as possible.
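To make the roles of N, P, and the penalty term concrete, the sketch below scores one combination mode. The formula itself is not reproduced in this text, so the additive form used here (samples that appear in only one level of each pair, plus the samples in unselected levels) is purely an illustrative assumption, as is reading P as the number of samples shared by the two levels. The function name and sample levels are hypothetical.

```python
def unmatched_degree(levels_a, levels_b, pairs):
    """Assumed score for one combination mode; lower means more consistent.

    pairs is a list of (index into levels_a, index into levels_b).
    NOTE: this additive form is an illustrative assumption, not the source formula.
    """
    mismatch = 0
    for ia, ib in pairs:
        set_a, set_b = set(levels_a[ia]), set(levels_b[ib])
        p = len(set_a & set_b)                        # P: samples shared by both levels
        mismatch += len(set_a) + len(set_b) - 2 * p   # N_a + N_b - 2P: unshared samples
    used_a = {ia for ia, _ in pairs}
    used_b = {ib for _, ib in pairs}
    # Penalty term E_a + E_b: samples in the levels that were not selected.
    e_a = sum(len(lv) for i, lv in enumerate(levels_a) if i not in used_a)
    e_b = sum(len(lv) for i, lv in enumerate(levels_b) if i not in used_b)
    return mismatch + (e_a + e_b)


levels_a = [[1, 2], [3, 4, 5]]      # hypothetical levels of point a
levels_b = [[3, 4], [1, 2], [0]]    # hypothetical levels of point b
good = unmatched_degree(levels_a, levels_b, [(0, 1), (1, 0)])
bad = unmatched_degree(levels_a, levels_b, [(0, 0), (1, 1)])
single = unmatched_degree(levels_a, levels_b, [(1, 0)])
```

In this example the single-pair mode matches its two levels well but leaves most samples unselected, so the penalty keeps it from scoring better than the well-matched two-pair mode.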

The unmatched degrees of all combination modes are calculated, and the smallest unmatched degree among all combination modes at matching layer number $r$ is recorded as the unmatched degree of matching layer number $r$. Among all matching layer numbers in the range $[1, M_a]$, the one with the smallest unmatched degree is recorded as the optimal layer number of suspicious discrete point a and suspicious discrete point b.
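The two-stage minimisation (first over combination modes for a fixed r, then over r) can be sketched as below. The scoring function `assumed_score` is a stand-in with an assumed additive form, since the source formula is not reproduced in this text; the selection logic around it follows the paragraph above. All names and sample levels are hypothetical.

```python
from itertools import combinations, permutations


def assumed_score(levels_a, levels_b, pairs):
    """Illustrative stand-in for the unmatched degree (an assumption, not the source formula)."""
    mismatch = sum(len(set(levels_a[i]) ^ set(levels_b[j])) for i, j in pairs)
    used_a = {i for i, _ in pairs}
    used_b = {j for _, j in pairs}
    penalty = sum(len(lv) for i, lv in enumerate(levels_a) if i not in used_a)
    penalty += sum(len(lv) for j, lv in enumerate(levels_b) if j not in used_b)
    return mismatch + penalty


def optimal_layer_number(levels_a, levels_b, score=assumed_score):
    """Return (best r, its unmatched degree) over r in [1, M_a]."""
    if len(levels_a) > len(levels_b):      # a is the point with fewer levels
        levels_a, levels_b = levels_b, levels_a
    best = None
    for r in range(1, len(levels_a) + 1):
        # Unmatched degree of r: the minimum over all combination modes at this r.
        v_r = min(
            score(levels_a, levels_b, list(zip(ca, pb)))
            for ca in combinations(range(len(levels_a)), r)
            for cb in combinations(range(len(levels_b)), r)
            for pb in permutations(cb)
        )
        if best is None or v_r < best[1]:  # ties keep the smaller r
            best = (r, v_r)
    return best


levels_a = [[1, 2], [3, 4, 5]]      # hypothetical levels of point a
levels_b = [[3, 4], [1, 2], [0]]    # hypothetical levels of point b
best_r, best_v = optimal_layer_number(levels_a, levels_b)
```

How ties between matching layer numbers are broken is not stated in the text; keeping the smaller r is a choice made for this sketch.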

The optimal layer number of every two suspicious discrete points among all suspicious discrete points is obtained, and the mode of all the optimal layer numbers is taken as the optimal cluster number.
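Taking the mode over all pairs is a one-liner with a counter. In the sketch below, `pair_optimum` is a hypothetical callable that returns the optimal layer number of two suspicious discrete points, and the lookup table holds made-up values for illustration only.

```python
from collections import Counter
from itertools import combinations


def optimal_cluster_number(points, pair_optimum):
    """Mode of the optimal layer numbers over all pairs of suspicious discrete points.

    Requires at least two points. With equal counts, the value seen first wins.
    """
    counts = Counter(pair_optimum(a, b) for a, b in combinations(points, 2))
    return counts.most_common(1)[0][0]


# Hypothetical per-pair optimal layer numbers for four suspicious points.
table = {(0, 1): 3, (0, 2): 3, (0, 3): 2, (1, 2): 3, (1, 3): 4, (2, 3): 3}
k = optimal_cluster_number([0, 1, 2, 3], lambda a, b: table[(a, b)])
```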

The soft classification analysis module 105 is configured to cluster all observation sample data according to the optimal cluster number and the intervention distances of all observation sample data to obtain the cluster centers of a plurality of clusters, and to obtain a soft classification result by using the cluster centers of all clusters as initial parameters of the Gaussian mixture model.

The optimal cluster number is taken as the hyperparameter k of the k-means algorithm, and the intervention distance is taken as the distance used in the k-means algorithm. The k-means algorithm is run to cluster all observation sample data, giving a clustering result of k clusters. The mean of all observation sample data in each cluster in each dimension is taken as the cluster center of that cluster, giving the cluster centers of the plurality of clusters.
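A minimal k-means sketch with a pluggable distance is shown below. The intervention distance formula is not restated here, so `distance` is a caller-supplied callable and the example uses a Manhattan distance purely as a stand-in. Initialising the centers with the first k samples, and evaluating the distance between a sample and a mean center, are both assumptions of this sketch.

```python
def kmeans(samples, k, distance, iterations=100):
    """Lloyd-style k-means using a caller-supplied distance.

    samples: list of equal-length numeric lists.
    distance: callable(x, y) -> float, standing in for the intervention distance.
    Initial centers are simply the first k samples (an assumption of this sketch).
    """
    centers = [list(s) for s in samples[:k]]
    labels = None
    for _ in range(iterations):
        # Assign each sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: distance(s, centers[c])) for s in samples
        ]
        if new_labels == labels:
            break                       # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:                 # keep the old center if a cluster is empty
                # Cluster center: per-dimension mean of the member samples.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def manhattan(x, y):
    """Stand-in distance for the example only."""
    return sum(abs(a - b) for a, b in zip(x, y))


samples = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centers, labels = kmeans(samples, 2, manhattan)
```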

The cluster centers of all clusters are taken as initial parameters of the Gaussian mixture model, and the Gaussian mixture model is run iteratively on all observation sample data. The posterior probability of each observation sample data under each Gaussian distribution is calculated, and the soft classification result, namely a plurality of categories, is obtained according to the maximum posterior probability.
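The initialisation step can be sketched with a small expectation-maximisation loop whose means start at the k-means cluster centers. The text does not specify the covariance structure, starting variances, or mixing weights, so the diagonal covariance, unit starting variances, and equal starting weights below are assumptions, as are the function name and sample data.

```python
import math


def gmm_soft_classify(samples, init_means, iterations=50, min_var=1e-6):
    """EM for a diagonal-covariance Gaussian mixture started from given means.

    Returns resp, where resp[n][c] is the posterior probability that sample n
    belongs to component c, computed in the last E-step.
    """
    k, dim, n = len(init_means), len(samples[0]), len(samples)
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]   # assumed starting variances
    weights = [1.0 / k] * k                       # assumed equal starting weights
    resp = []
    for _ in range(iterations):
        # E-step: posterior of each component for each sample, via log densities.
        resp = []
        for x in samples:
            logs = []
            for c in range(k):
                ll = math.log(weights[c])
                for d in range(dim):
                    v = variances[c][d]
                    ll -= 0.5 * (math.log(2 * math.pi * v) + (x[d] - means[c][d]) ** 2 / v)
                logs.append(ll)
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue                          # leave a vanishing component unchanged
            weights[c] = nc / n
            for d in range(dim):
                mu = sum(r[c] * x[d] for r, x in zip(resp, samples)) / nc
                var = sum(r[c] * (x[d] - mu) ** 2 for r, x in zip(resp, samples)) / nc
                means[c][d] = mu
                variances[c][d] = max(var, min_var)   # floor avoids zero variance
    return resp


samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
resp = gmm_soft_classify(samples, init_means=[[0.0, 0.5], [10.0, 10.5]])
# Category of each sample: the component with the maximum posterior probability.
hard = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

The rows of `resp` are the soft classification; `hard` collapses each row to a single category by maximum posterior probability.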

This embodiment aims to obtain the initial parameters of the Gaussian mixture model so as to facilitate the subsequent correlation analysis of defect formation factors; the Gaussian mixture model itself is known in the art and is not described in detail herein.

The observation sample data of each category are stored separately on one server. When an enterprise analyzes the process parameters of defective products, the observation sample data on the same server are retrieved directly for analysis. Because most parameters of the observation sample data on the same server are consistent, the causes of two different abnormal defects must lie among the inconsistent parameters; comparing these with the parameters of samples having the same defect then gives a rough indication of the causes of the different defects.
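The comparison described above amounts to listing the parameters on which two records from the same category disagree. The sketch below is a minimal illustration; the parameter names, values, and the `tolerance` argument are hypothetical and not taken from the source.

```python
def differing_parameters(sample_x, sample_y, tolerance=0.0):
    """Names of process parameters whose values differ between two samples."""
    return [
        name for name in sample_x
        if name in sample_y and abs(sample_x[name] - sample_y[name]) > tolerance
    ]


# Two hypothetical records from the same category that show different defects.
streak = {"ink_viscosity": 18.0, "press_speed": 120.0, "humidity": 55.0}
blur = {"ink_viscosity": 18.0, "press_speed": 150.0, "humidity": 55.0}
candidates = differing_parameters(streak, blur)
```

The resulting list only narrows down where to look; as the text notes, it gives a rough indication of the cause rather than a confirmed one.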

By soft-classifying the observation sample data and storing the observation sample data of each category on a separate server, the observation sample data stored on each server share certain commonalities. For any analysis, process optimization, or quality inspection requirement, the observation sample data on the same server are retrieved directly for analysis, which greatly improves the convenience of analysis and the data utilization rate.

The system comprises an observation sample data acquisition module, an intervention distance acquisition module, a suspicious discrete point and level acquisition module, an optimal clustering quantity acquisition module and a soft classification analysis module.

At present, when a Gaussian mixture model is used to cluster multidimensional data, the initial parameters are difficult to obtain, and the reliability of conventional dimension reduction decreases as the number of dimensions increases. The application therefore measures the degree of attribute difference between observation sample data and calculates a distance, so that differences across the dimensions of the multidimensional data are measured on a uniform scale. In combination with production requirements and expectations, an intervention factor is obtained from the difference of time attributes, and the distance is adjusted according to the intervention factor to obtain the intervention distance, so that observation sample data with similar time attributes, that is, adjacent in time sequence, fall into the same cluster in the subsequent clustering.

Existing algorithms for clustering multidimensional data are sensitive to discrete points and depend on manually set thresholds. To address this, the application screens suspicious discrete points according to the variability of the average intervention distance between each observation sample data and all other observation sample data, performs hierarchical clustering and matching on the intervention distance sets of the suspicious discrete points, and calculates the unmatched degree of each combination mode at each matching layer number. The optimal cluster number, at which the hierarchical matching consistency of the intervention distance sets of all suspicious discrete points is highest, is thereby obtained. The optimal cluster number is taken as the optimal hyperparameter k of the k-means algorithm and the intervention distance as the distance in the k-means algorithm to obtain the cluster centers of a plurality of clusters, and the cluster centers of all clusters are taken as the initial parameters of the Gaussian mixture model to obtain the soft classification result. The initial input parameters of the Gaussian mixture model are thus obtained without depending on any subjective threshold, which makes the multidimensional data clustering result more accurate, improves the utilization rate of the enterprise's historical printing quality data, and facilitates the subsequent correlation analysis of defect formation factors.

The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the application.

Claims

1. An intelligent processing system for printing quality monitoring big data, the system comprising:

the soft classification analysis module clusters all observation sample data according to the optimal cluster number and the intervention distances of all observation sample data to obtain the cluster centers of a plurality of clusters, takes the cluster centers of all clusters as initial parameters of a Gaussian mixture model to obtain a soft classification result, and analyzes the process parameters of defective products according to the soft classification result;

the method for obtaining all the levels of each suspicious discrete point comprises the following specific steps:

2. The intelligent processing system for printing quality monitoring big data according to claim 1, wherein obtaining the intervention factor of any two observation sample data according to the difference of time attributes comprises the following specific steps:

the calculation formula of the intervention factor is as follows:

3. The intelligent processing system for printing quality monitoring big data according to claim 1, wherein obtaining the intervention distance of any two observation sample data comprises the following specific steps:

wherein $D(\alpha,\beta)$ represents the intervention distance between the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $Q(\alpha,\beta)$ represents the intervention factor of the $\alpha$-th observation sample data and the $\beta$-th observation sample data, $N$ represents the number of attributes of the observation sample data, $A_{\alpha,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\alpha$-th observation sample data, $A_{\beta,\varepsilon}$ represents the $\varepsilon$-th attribute of the $\beta$-th observation sample data, and $\max()$ represents taking the maximum value.

4. The intelligent processing system for printing quality monitoring big data according to claim 1, wherein obtaining all suspicious discrete points comprises the following specific steps:

5. The intelligent processing system for printing quality monitoring big data according to claim 1, wherein obtaining all combination modes of any two suspicious discrete points at any matching layer number within the matching layer number range comprises the following specific steps:

for any matching layer number $r$, $r$ levels among all levels of suspicious discrete point a and $r$ levels among all levels of suspicious discrete point b are combined in pairs, giving $f_r$ combination modes; each combination mode comprises $r$ pairs of levels, and in each pair one level belongs to suspicious discrete point a and the other belongs to suspicious discrete point b.

6. The intelligent processing system for printing quality monitoring big data according to claim 1, wherein calculating the unmatched degree of each combination mode comprises the following specific steps:

wherein $V(a,b,r,i)$ represents the unmatched degree of the i-th combination mode of suspicious discrete point a and suspicious discrete point b at matching layer number $r$; $c_{r,i,j,a}$ and $c_{r,i,j,b}$ represent the levels belonging to suspicious discrete point a and to suspicious discrete point b, respectively, in the j-th pair of levels of the i-th combination mode at matching layer number $r$; $N(c_{r,i,j,a})$ and $N(c_{r,i,j,b})$ represent the numbers of observation sample data contained in level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$; $P(c_{r,i,j,a}, c_{r,i,j,b})$ represents the number of observation sample data matched between level $c_{r,i,j,a}$ and level $c_{r,i,j,b}$; $E_{r,i,a}$ and $E_{r,i,b}$ represent the numbers of observation sample data contained in all unselected levels of suspicious discrete point a and of suspicious discrete point b, respectively, in the i-th combination mode at matching layer number $r$; and $(E_{r,i,a}+E_{r,i,b})$ represents the penalty term of the i-th combination mode at matching layer number $r$.