CN112241805A - Defect prediction using historical inspection data - Google Patents


Info

Publication number
CN112241805A
CN112241805A (Application CN201910859146.4A)
Authority
CN
China
Prior art keywords
defect
data
defects
plant
product
Prior art date
Legal status
Pending
Application number
CN201910859146.4A
Other languages
Chinese (zh)
Inventor
H·K·曹
B·T·阮
K·N·彭
Current Assignee
Inspectorio Co ltd
Original Assignee
Inspectorio Co ltd
Priority date
Filing date
Publication date
Application filed by Inspectorio Co ltd filed Critical Inspectorio Co ltd
Publication of CN112241805A

Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F16/906 Clustering; Classification
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06Q10/06395 Quality analysis or management
    • G06Q50/04 Manufacturing
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • General Factory Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present disclosure relates to defect prediction using historical inspection data. Defect prediction using historical inspection data is provided. In various embodiments, historical inspection data for a plant is received. The inspection data includes an indication of a defect in one or more products produced in the factory. A plurality of features are extracted from the inspection data. The plurality of features is provided to a defect prediction model. The defect prediction model includes a trained classifier or collaborative filter. An indication of a plurality of defects that are likely to occur in the one or more products is obtained from the defect prediction model.

Description

Defect prediction using historical inspection data
Technical Field
Embodiments of the present disclosure relate to defect prediction, and more particularly, to defect prediction using historical inspection data.
Disclosure of Invention
According to embodiments of the present disclosure, methods and computer program products for defect prediction are provided. In various embodiments, historical inspection data for a plant is received. The inspection data includes an indication of a defect in one or more products produced by the factory. A plurality of features are extracted from the inspection data. The plurality of features are provided to a defect prediction model. The defect prediction model includes a trained classifier or collaborative filter. An indication of a plurality of defects that may be present in one or more products is obtained from a defect prediction model.
In some embodiments, the defect prediction model includes a trained classifier and a collaborative filter configured to provide a consensus output. In some embodiments, the defect prediction model includes a trained classifier and a collaborative filter configured to provide a collective output.
In some embodiments, the indication of defects in the one or more products includes an indication of defects in a predetermined product type, product line, or product category. In some embodiments, the indication of defects in the one or more products includes a plurality of defect names and a defect rate corresponding to each of the plurality of defect names.
In some embodiments, the plurality of features includes: a property of a past inspection of the plant, a property of one or more products, or a property of a defect in one or more products.
In some embodiments, the trained classifier includes an artificial neural network. In some embodiments, the artificial neural network comprises a deep neural network. In some embodiments, the collaborative filter includes a neighborhood model or a latent factor model. In some embodiments, the plurality of defects includes a predetermined number of most likely defects.
In some embodiments, the method further comprises preprocessing the data. In some embodiments, preprocessing the data includes aggregating the data. In some embodiments, preprocessing the data further comprises filtering the data. In some embodiments, extracting the plurality of features from the data includes applying, for each indication of a defect, a mapping from a defect name to one or more standardized defect names from a predetermined nomenclature. In some embodiments, the historical inspection data comprises a plurality of product names, and extracting the plurality of features from the data comprises applying a mapping from each of the plurality of product names to a standardized product name from the predetermined nomenclature.
In some embodiments, the method further comprises: anonymizing the historical inspection data of the plant. In some embodiments, the data also includes a performance history of the plant. In some embodiments, the data also includes geographic information of the plant. In some embodiments, the data also includes product data for the plant. In some embodiments, the data also includes brand data for the inspected products of the plant. In some embodiments, the data spans a predetermined time window.
In some embodiments, providing the plurality of features to the defect prediction model includes sending the plurality of features to a remote defect prediction server, and obtaining the indication of the plurality of defects from the defect prediction model includes receiving the indication of the plurality of defects from the defect prediction server. In some embodiments, extracting the plurality of features includes applying a dimensionality reduction algorithm. In some embodiments, the indication of the plurality of defects that may occur includes a list of the plurality of defects that may occur at the factory. In some embodiments, the list includes a defect name, a defect rate, and a defect description for each of the plurality of defects. In some embodiments, the list includes a list of a plurality of defects that may occur in a particular purchase order, product style, product line, or product category. In some embodiments, obtaining the indication of the plurality of defects further comprises providing the indication in a report to a user. In some embodiments, providing the indication to the user includes sending the indication to a mobile or web application. In some embodiments, the sending is performed via a wide area network.
In some embodiments, the trained classifier comprises a support vector machine. In some embodiments, obtaining the indication from the defect prediction model includes applying a gradient boosting algorithm.
In some embodiments, the method further comprises: measuring performance of a defect prediction model by comparing indications of a plurality of defects to ground truth indications of the plurality of defects; parameters of the defect prediction model are optimized according to the performance. In some embodiments, optimizing the parameters of the defect prediction model includes modifying the hyper-parameters of the trained machine learning model. In some embodiments, optimizing the parameters of the defect prediction model includes replacing the first machine learning algorithm with a second machine learning algorithm that includes hyper-parameters configured to improve performance of the defect prediction model.
Drawings
Fig. 1 is a schematic diagram of an example system for defect prediction, according to an embodiment of the present disclosure.
FIG. 2 illustrates a process for defect prediction according to an embodiment of the present disclosure.
Figs. 3A-3B illustrate a framework for defect prediction according to an embodiment of the present disclosure.
Fig. 4 illustrates a framework for defect prediction according to an embodiment of the present disclosure.
FIG. 5 illustrates a process for training a defect prediction system according to an embodiment of the present disclosure.
Fig. 6 illustrates an exemplary process for feature extraction according to an embodiment of the present disclosure.
Fig. 7 illustrates an exemplary process for feature extraction according to an embodiment of the present disclosure.
FIG. 8 depicts a compute node according to an embodiment of the present disclosure.
Detailed Description
Quality control in a plant typically amounts to reducing the number of defects found there. For some defects, this requires addressing the root cause, while other defects (such as systematic and recurring ones) are handled by reactive corrective action, without regard to the broader trends in factory performance that may have led to the defect. For example, defective products may be removed from the production cycle after a failed quality inspection, but the cause of the defect remains unknown and little is learned about problems that may arise in future planned production and manufacturing.
Furthermore, brands or retailers are often limited by the data at their disposal when investigating the performance, quality control, or defects of a plant. Brands and retailers often have access only to plant performance data obtained by their internal teams and do not know similar data from other plants, brands, or retailers. Even within a plant, data obtained through self-inspection procedures and third-party inspection and quality control interventions may be unavailable for investigating the performance or defects of the plant due to various circumstances (e.g., non-digital record keeping, manual information gathering processes, or siloed data).
To address these and other shortcomings, the present disclosure provides a framework for predicting defects that may occur at a factory, such as a textile or apparel factory. In embodiments of the present disclosure, a machine learning method is used to predict defects in a plant before they occur, so that the root causes of the defects may be investigated and addressed, thereby improving the overall quality of the plant. Knowing in advance that a defect is likely to occur enables a factory, brand, retailer, or business partner thereof to transition from a reactive quality control approach, which responds only after a quality issue is found, to a proactive approach in which corrective action can be taken before the defect occurs or before an inspection takes place.
For example, for defects predicted at the factory, root cause analysis may be performed. Such analysis may include analyzing the frequency of a given defect occurring in a plant or similar plant within a particular product category over the past several months. Further, the history and corresponding corrective and preventative actions may be reviewed. In addition, additional information may be obtained from the factory for further reference.
Further, the present disclosure provides for obtaining and analyzing data across multiple plants, brands, retailers, and inspection services when training a defect prediction model to accurately predict defects that may occur at a particular plant. In embodiments of the present disclosure, data from quality control or intervention activities performed by various services or personnel at various plants, brands, or retailers may be input into a defect prediction model in order to train the model to predict defects that may occur at a particular location, and may be used as input data to the defect prediction model to obtain an indication of defects that may occur.
Using data from multiple plants, brands, retailers, and inspection services allows for the generation of a robust defect prediction model whereby a large amount of data and analysis can be leveraged to provide accurate defect prediction for a plant that otherwise has relatively little data to proactively address quality issues.
It should be appreciated that the defect prediction system has many applications. The user may obtain an account with the service, enter its data into the service, and obtain a defect prediction result from the service. Services may be accessed via mobile or Web applications. Obtaining data from multiple users allows for leveraging of larger amounts of data to provide more robust predictions, thereby increasing user collaboration and facilitating proactive quality assurance strategies.
In some embodiments, inspectors may use the defect prediction system to obtain, prior to an inspection, a visualization of the defects they are most likely to find. In some embodiments, a mobile application is provided that displays the necessary steps and procedures for the inspector, facilitating the completion of the specified inspection. One of the procedures in an inspection is the workmanship process, in which an inspector performs the inspection according to a given workflow to ensure the quality of all products. In many cases, most defects are found during this process. The output of the defect prediction model may be provided in such mobile applications as part of the workmanship section. This allows the inspector to visualize the defects they are most likely to find.
In some embodiments, defect prediction may be used by a factory, brand, or retailer to implement preventative actions during a production planning phase of manufacturing. Knowing which defects may occur will enable the factory to provide a solution and implement actions that mitigate the effects of or prevent defects from occurring. The brand or retailer may also use the defect prediction to ensure that preventative action is implemented as part of the production plan.
In many cases, during the production planning phase of manufacturing, brands/retailers raise issues associated with the production plan and potential defects or problems that may arise at the factory. Previous factory inspection performance can be used to suggest preventative actions for upcoming production. When the factory submits a response to a particular issue, it may use insights about defects that are likely to occur at the factory for a particular product category in order to proactively suggest the actions necessary to correct and prevent the problems. These steps may help both brands/retailers and factories reduce potential risk in the later production phases.
In an embodiment of the present disclosure, historical inspection data for a factory is input into a defect prediction model, and a list of the top k defects most likely to occur at the factory is obtained. The inspection data may include information about the plant and/or specific product lines or product categories within the plant. The inspection data may include information about defects observed in the factory, including defect names and types, the number of defects observed in total, and the distribution of defects among inspected products. The list of defects obtained may include defects that may be observed in subsequent inspections of the factory, product line or product category within the factory. The list may include predictions about the types of defects found, the total number of types of defects found, and the distribution of each defect among the products of the plant, and will be useful for planning future actions to be taken at the plant.
As used herein, the term defect refers to any flaw, fault, or imperfection in the production cycle of a plant. In other words, a defect refers to an observable, undesirable deviation from a predetermined production quality standard. Defects may be found at various levels of the production cycle, such as in the factory as a whole, or in a particular product category, product line, product, or production method of the factory. Defects may be present in various features of a product or product line, or during various stages of a production or inspection cycle, such as design, process, packaging, manufacturing, or documentation. Defects may be quantified over a range of discrete or continuous values, or may be measured as a binary value (e.g., whether a defect is present). Defects may be found in various ways, for example, by inspectors during inspection, by specialized internal quality control teams at the factory, or by personnel responsible for the stage of production where the defect is found. Finding defects and preventing them from occurring are essential components of quality control in the manufacturing process.
In an embodiment of the present disclosure, data related to a plant is received. The data may include historical inspection data for the plant, indications of defects in one or more products produced in the plant, other attributes of defects at the plant, and/or other attributes of the plant. In some embodiments, features are extracted from the data. In some embodiments, the data is preprocessed. In some embodiments, the defect names are mapped to terms in the corresponding nomenclature. In some embodiments, the features are provided to a defect prediction model. In some embodiments, the defect prediction model includes a machine learning model (e.g., a neural network or collaborative filter). In some embodiments, an indication of a plurality of defects that may occur in one or more products of a plant is obtained from a predictive model. In some embodiments, the obtained indication includes a list of defects that may occur at the factory.
Referring now to fig. 1, shown is a schematic diagram of an exemplary system for defect prediction in accordance with an embodiment of the present disclosure. The system 100 includes a defect collection server 106, a defect prediction model 108, a defect prediction server 118, and an inspection quality/compliance platform 102. In some embodiments, the system 100 has three phases of operation: a training phase, a prediction phase and an update phase.
In the training phase, the defect collection server 106 generates an initial data set by collecting historical inspection data 104. The historical inspection data 104 may be input into the defect collection server 106 once via batch insertion. The historical inspection data 104 is then combined with the brand data 110, the factory data 112, the master product data 114, and the master defect data 116 to form an initial training data set. A plurality of relevant features may then be extracted from the historical inspection data and the other input data. A plurality of machine learning models are trained on the initial training data set, and the performance of each model is evaluated. The performance of the machine learning models is compared, and the model with the most desirable performance is selected as the defect prediction model 108 and deployed on the defect prediction server 118. In some embodiments, multiple models are deployed to the defect prediction server 118, in which case a consensus result is obtained from the multiple models during the prediction phase. Similarly, the top results from multiple models may be combined to provide an aggregate result. An Application Programming Interface (API) may be constructed to allow web or mobile applications to interact with the defect prediction server 118 and the defect collection server 106 by providing data and querying the prediction server to obtain a defect prediction. In some embodiments, the defect prediction server comprises a remote server.
In the prediction phase, the inspection quality/compliance platform 102, which may be adapted to integrate with web or mobile applications (e.g., via an API), may be used to query and provide data to the defect prediction server 118 and obtain defect prediction results from the defect prediction server 118. In some embodiments, the defect prediction result includes a list of the k defects that are most likely to occur at the factory or product line.
In the update phase, the inspection quality/compliance platform 102 may be used to provide new data to the defect collection server 106. In some embodiments, new inspection data is periodically entered into the defect collection server 106 as inspections are performed at the factory. Features may be extracted from the new data and input into the defect prediction model 108, resulting in an updated defect prediction for a particular plant or product line. In some embodiments, new data for a particular plant or product is used to update the prediction of defects for that plant or product. In some embodiments, new data for a particular plant or product is used to update the prediction of defects for a different plant or product.
In some embodiments, the defect prediction model may be tested against new data and updated to improve performance. In some embodiments, the new data is provided in the form of new inspection data and/or customer feedback on previous predictions. The customer feedback may include a ground truth report for the defects that indicates the accuracy of a previous prediction, for example that a prediction made by the model was incorrect, along with the corrected result. In some embodiments, new data may be collected and combined with brand data, factory data, master product data, and master defect data to form a new data set. It should be appreciated that the new data set may be constructed similarly to the data set described above. In some embodiments, an existing training data set may be added to the new data set. In some embodiments, the performance of the defect prediction model is measured on the new data set. In some embodiments, the defect prediction model is updated if its performance is below a certain threshold. The threshold may be heuristically selected, or may be adaptively calculated during training. In some embodiments, updating the defect prediction model includes modifying the features extracted from the input data. In some embodiments, updating the defect prediction model includes modifying parameters of a machine learning model in the defect prediction model. In some embodiments, a new machine learning model may be selected to perform defect prediction. It should be appreciated that the method of retraining the predictive model may be similar to the method used in training the defect prediction system, as described above. The process of retraining the predictive model may be repeated multiple times until the performance of the model on the new data set reaches an acceptable threshold. The updated defect prediction model is then deployed to the defect prediction server, and the existing training data set may be updated to include the new data.
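As a rough illustration of the update-phase check described above, the following Python sketch assumes hypothetical evaluate and retrain helpers and an arbitrary performance threshold; it is not the patent's implementation.

```python
# Hedged sketch of the update-phase logic; function names and the threshold
# value are illustrative assumptions, not components defined in the patent.
def maybe_retrain(model, new_features, new_ground_truth, evaluate, retrain,
                  threshold=0.7):
    """Re-evaluate the deployed model on newly collected inspection data
    and retrain it when performance drops below an acceptable threshold."""
    score = evaluate(model, new_features, new_ground_truth)  # e.g. precision@k
    if score < threshold:
        model = retrain(model, new_features, new_ground_truth)
    return model, score
```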
Referring now to FIG. 2, a process for defect prediction is shown, according to an embodiment of the present disclosure. In some embodiments, input data 201 is provided to defect prediction system 202 and defect prediction results 206 are obtained. In some embodiments, the input data 201 includes an identification of a given factory, product category, and/or customer, brand, or retailer. In some embodiments, the input data 201 includes various data (e.g., inspection data, factory data) that may be used for defect prediction. In some embodiments, defect prediction system 202 includes a remote defect prediction server. In some embodiments, defect prediction system 202 includes a trained classifier. In some embodiments, defect prediction system 202 includes a collaborative filter. In some embodiments, defect prediction system 202 employs a machine learning model to predict defects that may occur at the factory. In some embodiments, defect prediction system 202 receives input data 201 and performs data processing step 203. In some embodiments, the data processing step 203 includes mapping terms used in the input data to terms in a standardized nomenclature. In some embodiments, all available relevant data is collected and processed at 203. In some embodiments, feature extraction step 204 is performed by defect prediction system 202 to extract various features. In some embodiments, the feature extraction step 204 is performed on the data processed in step 203. In some embodiments, a feature vector is output. In some embodiments, the features extracted at 204 are provided to a defect prediction model at 205. In some embodiments, the defect prediction model comprises a trained machine learning model. In some embodiments, the defect prediction model outputs the prediction results 206. In some embodiments, the prediction results 206 include a list of defects that may occur at the factory. In some embodiments, the list of defects is limited to providing the k most likely defects.
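To make the flow of FIG. 2 concrete, the following is a minimal Python sketch of steps 203 through 206; the preprocess, extract_features, and model callables are hypothetical placeholders rather than components defined by the patent.

```python
# Minimal sketch of the prediction flow in FIG. 2; helpers are hypothetical.
def predict_defects(input_data, preprocess, extract_features, model, k=10):
    """Return the k most likely defects for a factory/product category."""
    clean = preprocess(input_data)        # step 203: normalize terms, aggregate
    features = extract_features(clean)    # step 204: build the feature vector
    defect_rates = model(features)        # step 205: predicted rate per defect
    ranked = sorted(defect_rates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]                     # step 206: top-k defect predictions
```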
In some embodiments, the defect prediction model includes a trained classifier. In some embodiments, the trained classifier is a deep neural network. In some embodiments, the defect prediction model applies a collaborative filtering method to the input data. In some embodiments, the collaborative filtering method uses a neighborhood model or a latent factor model. Other suitable techniques for predictive models in accordance with the present disclosure include factorization machines, neural factorization machines, field-aware factorization machines, deep factorization machines, and deep & cross networks.
In some embodiments, the trained classifier is a random decision forest. However, it should be appreciated that various other classifiers are suitable for use in accordance with the present disclosure, including linear classifiers, Support Vector Machines (SVMs), gradient-boosted classifiers, or neural networks such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs).
Suitable artificial neural networks include, but are not limited to, feed-forward neural networks, radial basis function networks, self-organizing maps, learning vector quantization, recurrent neural networks, Hopfield networks, Boltzmann machines, echo state networks, long short-term memory, bidirectional recurrent neural networks, hierarchical recurrent neural networks, stochastic neural networks, modular neural networks, associative neural networks, deep belief networks, convolutional neural networks, convolutional deep belief networks, large memory storage and retrieval neural networks, deep Boltzmann machines, deep stacking networks, tensor deep stacking networks, spike-and-slab restricted Boltzmann machines, compound hierarchical deep models, deep coding networks, multi-layer kernel machines, or deep Q networks.
Various metrics may be used to measure the performance of the learning model. In some embodiments, the metrics used to measure performance include precision @ k and recall @ k. However, it should be appreciated that other metrics may be suitable for use, such as accuracy, recall, AUC, and F-1 score.
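For reference, a minimal Python sketch of the precision@k and recall@k metrics mentioned above, computed from a ranked list of predicted defects and the set of defects actually observed in a later inspection (the inputs are assumed to be defect identifiers):

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the top-k predicted defects that actually occurred."""
    top_k = set(predicted[:k])
    return len(top_k & set(relevant)) / k

def recall_at_k(predicted, relevant, k):
    """Fraction of the defects that actually occurred that appear in the top-k."""
    top_k = set(predicted[:k])
    return len(top_k & set(relevant)) / max(len(set(relevant)), 1)
```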
In embodiments of the present disclosure, data may be obtained in various formats. Data may be structured or unstructured and may include information stored in multiple media. The data may be manually entered into the computer or may be automatically obtained from a file by the computer. It should be appreciated that a variety of methods are known for obtaining data via a computer, including, but not limited to, parsing written or text documents using optical character recognition, text parsing techniques (e.g., looking up key/value pairs using regular expressions), and/or natural language processing, crawling web pages, and obtaining various measured values from databases (e.g., relational databases), XML files, CSV files, or JSON objects.
In some embodiments, factory or inspection data may be obtained directly from a data management system. In some embodiments, the data management system is configured to store information related to the plant and/or the inspections. The data management system may collect and store various types of information relating to the plant and the inspections, such as information relating to purchase orders, inspection bookings, assignments, reports, corrective and preventative actions (CAPA), inspection results, and other data obtained during inspections. It should be appreciated that a large amount of data may be available, and in some embodiments, only a subset of the available data is used as input to the predictive model.
As used herein, an inspection booking refers to a request for a future inspection on a proposed date. The inspection booking may be initiated by a vendor, brand, or retailer and may contain information for the purchase orders corresponding to the future inspection. As used herein, an assignment refers to an inspection booking that has been confirmed. The assignment may include a confirmation of the proposed date of the booking, as well as an identification of the assigned inspector and information related to the booking.
The data may be obtained via data pipes that collect data from various sources of plant and inspection data. The data pipeline may be implemented via an Application Programming Interface (API) with permissions to access and obtain desired data and compute various characteristics of the data. The API may be internal facing, e.g., it may provide access to an internal database containing plant or inspection data, or external facing, e.g., it may provide access to plant or inspection data from an external brand, retailer, or plant. In some embodiments, the data is provided by an entity wishing to obtain a prediction from a predictive model. The provided data may be input into the model in order to obtain prediction results, and may also be stored to train and test various predictive models.
In embodiments of the present disclosure, data may be sent and received via a mobile or web application. The user may have an account with a service adapted to send data via a mobile or web application and receive results from the prediction server. The data may be sent manually or automatically. The data may be automatically entered into the server after a triggering event, such as a test, or may be automatically entered at regular intervals (e.g., monthly, every 180 days). Similarly, information may be sent to the user via a mobile or web application. The information may include the prediction results from the prediction server. This information may be sent to the user upon request, or may be sent automatically. This information may be sent automatically after a triggering event, such as a change in an existing prediction result or a reconfiguration of a prediction model in the prediction system, or may be sent automatically at regular intervals. It should be appreciated that various other methods and data transfer schemes may also be used to send and receive information via an application.
The mobile application may be implemented on a smartphone, tablet, or other mobile device, and may run on various operating systems, such as iOS, Android, or Windows. In various embodiments, the defect prediction results are sent to a mobile or web application via a wide area network.
In accordance with the present disclosure, data may be obtained at various levels. Data may be retrieved for a particular purchase order, product line, product style, product category, department within a plant, or plant for a brand or retailer. Data may also be obtained for multiple products, product lines, categories, or departments within a plant, and data may be obtained for multiple products, product lines, product categories, or departments across multiple plants. It should be appreciated that while certain examples are described in terms of data or results relating to a factory or product line, this is meant to encompass specific products, purchase orders, styles, or other classifications. Similarly, while various examples described herein relate to data for a plant, it should be appreciated that the present disclosure applies to brands, retailers, or other business entities involved in manufacturing or producing a product.
As used herein, style of a product or product line refers to the unique appearance of an item based on a corresponding design. A style may have a unique Identification (ID) within a particular brand, retailer, or factory. The style ID may be used as an identification feature by which other measurements may be aggregated to extract meaningful features related to inspection results and defect prediction.
In some embodiments, the obtained data is anonymized so that the ordinary user cannot obtain identifying information of the factory, brand, or retailer.
The obtained data may also be aggregated, and statistical analysis may be performed on the data. According to embodiments of the present disclosure, data may be aggregated and analyzed in various ways, including, but not limited to, summing the values of a given measurement over a given time window (e.g., 7 days, 14 days, 180 days, or a year), obtaining the maximum, minimum, mean, median, and mode of the distribution of the given measurement values within the given time window, and obtaining a measure of the prevalence of certain values or ranges of values in the data. For any feature or measurement of the data, the variance, standard deviation, skewness, kurtosis, higher-order moments, and various percentile values (e.g., 5%, 10%, 25%, 50%, 75%, 90%, 95%, 99%) of the distribution of the feature or measurement over a given time window may also be measured.
The data may also be filtered before being aggregated or statistical or aggregate analysis performed. The data may be grouped by certain characteristics and statistical analysis may be performed on subsets of the data having these characteristics. For example, the above metric may be calculated for data associated with only a particular test type or with tests exceeding a minimum sample size.
Aggregation and statistical analysis may also be performed on data resulting from previous aggregation or statistical analysis. For example, statistics for a given measurement over a given period of time may be measured over multiple consecutive time windows, and the resulting values may be analyzed to obtain values regarding their variation over time. For example, the average plant failure rate may be calculated for various consecutive 7-day windows, and the change in average failure rate may be measured over the 7-day window.
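The following pandas sketch illustrates the kind of filtering, per-factory aggregation, and rolling-window analysis described above; the column names and example records are assumptions for illustration only.

```python
import pandas as pd

# Hedged sketch of the aggregation described above; columns are assumed.
inspections = pd.DataFrame({
    "factory": ["F1", "F1", "F1", "F2"],
    "date": pd.to_datetime(["2019-01-02", "2019-01-05", "2019-01-09", "2019-01-03"]),
    "failed": [1, 0, 1, 0],
    "sample_size": [80, 120, 100, 60],
})

# Filter first (e.g. minimum sample size), then aggregate per factory.
large = inspections[inspections["sample_size"] >= 80]
stats = large.groupby("factory")["failed"].agg(["mean", "std", "min", "max"])

# Rolling 7-day failure rate per factory, then its change over time.
rolling = (inspections.sort_values("date")
           .set_index("date")
           .groupby("factory")["failed"]
           .rolling("7D").mean())
change = rolling.groupby(level="factory").diff()

print(stats)
print(change)
```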
In an embodiment of the present disclosure, the historical inspection data includes information related to the results of past inspections (e.g., whether the inspection passed, information related to defects discovered during the inspection) and information obtained through the inspection process (e.g., a general profile and performance report of the plant). Examples of suitable data for predicting defects that may occur in a factory include: data obtained from previous inspections at the same plant, data obtained from inspections at other plants producing products or product lines similar to the subject of a future inspection, data obtained from plants across multiple inspections, data known in advance about a future inspection (e.g., geographic location, time, entity performing the inspection, and/or type of inspection), data related to the business operations of the plant, data related to the product quality of the plant, general information about the plant, data related to the sustainability of the plant or other similar plants, and/or data related to the performance of the plant or other similar plants. The data may include information obtained from customer reviews of products or product lines similar to those produced by the plant, and/or customer reviews of products or product lines originating from the plant. It should be appreciated that for some metrics, a plant may be divided into various departments, with a separate metric value obtained for each department.
Examples of data related to defect prediction include: the number of orders placed at the plant, the quantity of orders, the quality of orders, the monetary value of orders, general information about orders, a description of each product of the plant (e.g., the product's Stock Keeping Unit (SKU), dimensions, style, color, quantity, and packaging method), the financial performance of the plant, the number of items inspected at the plant during inspection processes such as workmanship, packaging, and measurement, information about the Acceptable Quality Limits (AQLs) of processes at the plant (e.g., the sample quantity used for quality testing), inspection results of past inspections at the plant, inspection results of past inspections of a particular product or product line, inspection results of other plants having similar products, inspection results of past inspections of business partners of the plant, values of various metrics collected during inspection, the geographical location of the plant, the size of the plant, the operating conditions and operating hours of the plant, and aggregations and statistical measures of the data mentioned above.
The historical inspection data may also include specific information about defects found during inspection. This may include the number of defects found, the number of defective units, the name of the defect, the type of defect, the category of defect, the rate of defects in the inspected goods, the severity of the defect, and the distribution and/or severity of the defect types among the inspected products. In some embodiments, the defect category corresponds to the inspection process (e.g., workmanship, packaging, or measurement) in which the defect is found. In some embodiments, defects are classified by product line, product category, or severity level (minor/major/critical).
The historical inspection data may include a list of all defects found in the factory during inspection. The list may refer to defects using defect names given by a particular factory, or it may use defect names corresponding to names in a standardized nomenclature. An average defect rate may then be calculated for a particular defect or factory within a given time window. The inspection data may also include a list of all categories and product lines of the plant, as well as all possible defects that may be found in the plant.
Information about the plant may be obtained from the inspection data, such as the plant location, the plant profile, and/or product information related to the product being inspected, such as the product name, product line, and product category. An example plant profile includes the plant's headcount, business areas, address, and/or contacts. A measure of overall plant performance may also be obtained by estimating the rates of different defects and the overall inspection failure rate during a given time window.
In embodiments of the present disclosure, for each defect found, various metrics corresponding to the defect may be obtained. For example, the sample size measured at the time of defect discovery, the type of inspection performed (e.g., internal inspection, third-party inspection), the total available quantity of the product, product line, or product category being inspected, the number of different product styles, and the number of defective articles measured may be obtained. In various embodiments, the type of inspection includes a self-inspection, a DUPRO (during production) inspection, an FRI (final random inspection), a pre-production inspection, a first in-line production inspection, a second in-line production inspection, and/or a re-inspection. For a particular defect, an average rate of occurrence of the defect during a particular time window may be obtained.
It should be appreciated that data may be collected over various time windows, for example, the last 7, 14, 30, 60, or 90 days, or a particular 7, 14, 30, 60, or 90 day window. Data may be collected from a plurality of factories, departments within factories, brands, retailers, product categories, product lines, and products. Data may be collected on various scales, such as the scale of a product category, product line, or product within a particular plant or group of plants, department within a plant, and within a plant or across multiple plants. In some embodiments, the inspection data and corresponding defect data are time stamped.
It should be appreciated that a large number of features may be extracted by various methods, such as manual feature extraction, whereby features having significant correlation with the target variable (e.g., the defects likely to occur) are calculated or extracted from the obtained data. The features may be extracted directly from the data, or the data may require processing and/or further computation to format it in such a way that the desired metrics can be extracted. For example, given the results of various tests conducted at the factory in the last year, it may be desirable to calculate the percentage of test failures within that time period. In some embodiments, extracting features results in feature vectors, which can be pre-processed by applying dimensionality reduction algorithms (such as principal component analysis and linear discriminant analysis) or by inputting the feature vectors into a neural network, thereby reducing the size of the vectors and improving the performance of the overall system.
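As one example of the dimensionality reduction mentioned above, a short sketch using principal component analysis from scikit-learn; the feature matrix here is random placeholder data with assumed dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hedged sketch: reduce a batch of extracted feature vectors with PCA.
rng = np.random.default_rng(0)
feature_vectors = rng.random((200, 64))   # 200 inspections, 64 raw features

pca = PCA(n_components=16)                # keep 16 principal components
reduced = pca.fit_transform(feature_vectors)
print(reduced.shape)                      # (200, 16)
```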
In embodiments of the present disclosure, neural networks may be used for defect prediction. The prediction of defects using neural networks can be expressed as follows:
suppose that for a given plant and product category, n attributes $\{x_1, x_2, \ldots, x_n\}$ can be extracted, and $D = \{d_1, d_2, \ldots, d_M\}$ is the list of all possible defects that can be found during inspection, where M is the total number of defects. Given the feature vector $x = \{x_1, x_2, \ldots, x_n\}$ for the plant and product category, a vector of estimated defect rates $\hat{r} = (\hat{r}_1, \hat{r}_2, \ldots, \hat{r}_M)$ can be determined via a function $f$:

$$\hat{r} = f(x),$$

where $\hat{r}_i$ is the predicted defect rate of the i-th defect to be found during the next inspection of that factory, $i = 1, 2, \ldots, M$, and $\hat{r}_i \in [0, 1]$. It will be appreciated that, once the vector $\hat{r}$ has been calculated, the first K defects that may occur at the factory are easily extracted from $\hat{r}$ by sorting all elements of $\{\hat{r}_1, \ldots, \hat{r}_M\}$ and selecting the top K indices.
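A small numpy sketch of the top-K extraction just described, i.e., sorting the predicted rate vector and keeping the K largest entries (the rates shown are invented values):

```python
import numpy as np

# Hedged sketch of extracting the top-K defects from the predicted rates r_hat.
r_hat = np.array([0.02, 0.31, 0.07, 0.55, 0.11])   # predicted rate per defect
K = 3
top_k_indices = np.argsort(r_hat)[::-1][:K]        # indices of the K largest rates
print(top_k_indices, r_hat[top_k_indices])
```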
In embodiments of the present disclosure, although one factory may perform multiple inspections of the same product or product category, each inspection is associated with a unique factory. For a given product category and a specific inspection, a list $\{s_1, s_2, \ldots, s_M\}$ of the actual rates of each defect occurring at the factory can be defined. In addition, the terms $r_i^{(k)}$ and $\hat{r}_i^{(k)}$ can be defined as the actual defect rate and the predicted defect rate, respectively, of the i-th defect during the k-th inspection in the training dataset. It should be appreciated that various recommendation methods may be used to learn the function $f(x)$, such as deep and wide neural networks, factorization machines, and neural factorization machines.
To train the neural network, various loss functions may be used. In some embodiments, a pointwise approach to the learning-to-rank problem is employed, where the loss is defined as

$$\mathcal{L} = \sum_{k=1}^{N} \sum_{i=1}^{M} \mathrm{CE}\!\left(r_i^{(k)}, \hat{r}_i^{(k)}\right) + \lambda \lVert w \rVert^2,$$

where N is the number of inspections used in the data set, M is the total number of defects, w is the weight vector, λ is the regularization constant used in the training process, and CE is the cross-entropy error function between $r_i^{(k)}$ and $\hat{r}_i^{(k)}$, given by:

$$\mathrm{CE}(r, \hat{r}) = -\left[r \log \hat{r} + (1 - r)\log(1 - \hat{r})\right].$$
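A minimal numpy sketch of the pointwise cross-entropy loss reconstructed above; the exact weighting and regularization used in practice may differ.

```python
import numpy as np

# Hedged sketch of the pointwise loss with L2 regularization described above.
def cross_entropy(r, r_hat, eps=1e-12):
    r_hat = np.clip(r_hat, eps, 1 - eps)   # avoid log(0)
    return -(r * np.log(r_hat) + (1 - r) * np.log(1 - r_hat))

def pointwise_loss(R, R_hat, w, lam):
    """R, R_hat: (N inspections, M defects) actual and predicted rates."""
    return cross_entropy(R, R_hat).sum() + lam * np.sum(w ** 2)
```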
feature extraction for input to the neural network may be performed by extracting attributes from historical inspection data (such as those described in table 1) within a given time window and using the extracted attributes as inputs to the neural network. These features can be converted into various forms. In some embodiments, features that can be represented as categorical variables are converted into a single heat vector using single heat encoding. Features such as defect descriptions written in english or other languages may be processed to transform them into vectors. In some embodiments, all numbers and stop words are removed from the description, and then, using the bag of words approach, each defect description may be transformed into a high-dimensional vector, where each element of the vector is the number of occurrences of a particular word in the defect description. In some embodiments, a bag of words approach is used. Wallach's Topic Modeling: examples of suitable methods are described in Beyond band-of-words (https:// doi. org/10.1145/1143844.1143967).
By combining various text, categorical, and other features together over a given time window, a unique vector of dimension L can be obtained. To predict the K most likely defects to occur in the next inspection, a vector of dimension L may be obtained for each of the M defects, and the M vectors may be concatenated to form an M × L matrix. This matrix may be referred to as a "feature image" of the plant and product category and may be used as input data to the neural network.

A pair of factory and product category thus corresponds to a list of M feature vectors of dimension L. These vectors can be used to predict the probability of occurrence of all M defects. The M vectors can be concatenated into a two-dimensional matrix of size M × L, which can be considered an M × L image or "feature image" of the factory and product category pair.
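A small numpy sketch of assembling the M × L feature image from per-defect feature vectors; the sizes and random data are illustrative only.

```python
import numpy as np

# Hedged sketch: stack M per-defect feature vectors (each of dimension L)
# into the M x L "feature image" described above.
M, L = 50, 128
rng = np.random.default_rng(0)
per_defect_vectors = [rng.random(L) for _ in range(M)]
feature_image = np.stack(per_defect_vectors)   # shape (M, L)
print(feature_image.shape)
```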
After the feature extraction process, various deep learning methods may be applied to learn a suitable model for defect prediction. In some embodiments, a deep and wide neural network (DWN2) may be used. Using DWN2, given an input vector $x = \{x_1, x_2, \ldots, x_n\}$, all categorical variables are converted into corresponding embedding vectors, and the resulting vectors are concatenated. The concatenated vector may pass through several hidden layers with various activation functions. In some embodiments, the hidden layers are fully connected. In some embodiments, stochastic gradient descent may be used to learn the model parameters, but it should be appreciated that various optimization methods may be used depending on the loss function used in the network.

The input to the defect prediction model is a feature vector of dimension L. For a given factory and product category, the probability of occurrence of each of the M defects can be calculated, and these likelihood values can be sorted to extract the most likely defects.
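The sketch below shows a simplified embedding-plus-MLP network in the spirit of the deep and wide model described above, written in PyTorch; the layer sizes, vocabulary sizes, and the omission of a separate wide component are simplifying assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of an embedding + fully connected network with sigmoid outputs
# over M defects; all dimensions below are illustrative choices.
class DefectNet(nn.Module):
    def __init__(self, n_factories, n_categories, n_numeric, n_defects, emb_dim=16):
        super().__init__()
        self.factory_emb = nn.Embedding(n_factories, emb_dim)
        self.category_emb = nn.Embedding(n_categories, emb_dim)
        self.deep = nn.Sequential(
            nn.Linear(2 * emb_dim + n_numeric, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_defects),
        )

    def forward(self, factory_id, category_id, numeric):
        x = torch.cat([self.factory_emb(factory_id),
                       self.category_emb(category_id), numeric], dim=1)
        return torch.sigmoid(self.deep(x))   # predicted rate per defect, in [0, 1]

model = DefectNet(n_factories=100, n_categories=20, n_numeric=32, n_defects=50)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent
```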
Referring now to Figs. 3A-3B, a framework for defect prediction is shown, in accordance with an embodiment of the present disclosure. The framework 300 includes a deep neural network 304. Input data 302 includes features extracted from historical inspection data, as well as plant and product information, but it should be appreciated that various combinations of features and feature types may be used to generate the input data. The input data 302 is sent to the neural network 304, which outputs the vector 306 of predicted defect rates $\hat{r}$ for the plant or product category. In some embodiments, a sigmoid activation function is used in the neural network to ensure that each $\hat{r}_i$ lies in the range [0, 1].
The defect prediction may be transformed into a recommendation problem to match the defects to the particular plant and/or product line in which they may be found. In embodiments of the present disclosure, recommendation algorithms such as Collaborative Filtering (CF) may be used to predict defects that may occur in a plant. Various methods of applying collaborative filtering techniques to input data may be used to generate defect prediction results, for example, memory-based methods such as neighborhood-based CF, item/user-based top-N recommendations, model-based methods, context-aware CF, hybrid methods, and latent-factor-based models.
In embodiments of the present disclosure, collaborative filtering may be used to predict defects by using various neighborhood models. In a factory-oriented neighborhood model, the rates of various defects can be estimated based on the known defect rates of many factory inspections within a given time window. In a defect-oriented neighborhood model, the rates of various defects may be estimated based on known defect rates at the same factory for similar defects and/or products. In the neighborhood model, a function for measuring the similarity between two items may be selected. It should be appreciated that various similarity metrics may be used in accordance with the present disclosure, such as Euclidean distance, Manhattan distance, Pearson correlation, and vector cosine. By calculating a similarity measure between each pair of defects, a defect rate $r_{Fi}$ can be calculated for each defect i in each factory F, which represents the estimated occurrence of the defect at the next factory inspection. The defect rate $r_{Fi}$ may represent a weighted average of the calculated defect rates of the neighborhood defects.
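A minimal sketch of a defect-oriented neighborhood estimate, using cosine similarity and a similarity-weighted average of known defect rates; the vectors and rates are invented, and cosine is only one of the similarity choices listed above.

```python
import numpy as np

# Hedged sketch of a neighborhood-model estimate for a target defect.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def estimate_rate(target_vec, neighbors):
    """neighbors: list of (feature_vector, known_defect_rate) pairs."""
    sims = np.array([cosine(target_vec, v) for v, _ in neighbors])
    rates = np.array([r for _, r in neighbors])
    return float(np.dot(sims, rates) / (np.abs(sims).sum() + 1e-12))

neighbors = [(np.array([1.0, 0.2]), 0.10), (np.array([0.9, 0.4]), 0.25)]
print(estimate_rate(np.array([1.0, 0.3]), neighbors))
```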
In embodiments of the present disclosure, collaborative filtering based on a latent factor model may be used to predict defects that may occur at a plant. In this model, plant F is associated with a plant factor vector $x_F$, and defect i is associated with a defect factor vector $y_i$. The predicted defect rate $\hat{r}_{Fi}$, representing the predicted rate of defect i in factory F, can be calculated as the inner product of the two latent factor vectors $x_F$ and $y_i$:

$$\hat{r}_{Fi} = x_F^{\top} y_i.$$

During the training process of the learning model, parameter estimation can be realized by solving the optimization problem

$$\min_{x_*,\, y_*} \sum_{(F, i)\ \mathrm{known}} \left(r_{Fi} - x_F^{\top} y_i\right)^2 + \lambda\left(\lVert x_F \rVert^2 + \lVert y_i \rVert^2\right),$$

where λ is a regularization parameter. This optimization problem can be solved using stochastic gradient descent to obtain the most suitable parameters of the model.
Referring now to fig. 4, a framework for defect prediction is shown, according to an embodiment of the present disclosure. Framework 400 uses a collaborative filtering approach for defect prediction. Using historical inspection data over a given time window, a factory defect table 410 can be generated that indicates the defect rate for each defect at each factory under consideration. In some embodiments, the defect rate may take a value within the range [0,1.0], or NA if the defect rate of a defect for a particular factory is unknown. In some embodiments, the table 410 is combined with additional information 420, and the additional information 420 may include factory information, product information, inspection information, brand information, and/or defect information. The factory defect table 410 and/or the additional information 420 may be input into the collaborative filtering model 430. The collaborative filtering model 430 may be deployed as a defect prediction model on the defect prediction server described above. The collaborative filtering model 430 may output an estimated defect rate vector 440 for each plant that indicates the predicted defect rate for each defect measured in the table 410. The defect rate indicated in vector 440 may correspond to a list of defects that may be found in the next inspection of the factory. In some embodiments, the vector 440 indicates defects that may occur or be found in the next inspection of the factory for a particular brand/retailer and/or product category.
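A short pandas sketch of building the factory defect table of FIG. 4 from raw inspection records, with unknown entries left as NaN (the "NA" above); the column names and records are assumptions for illustration.

```python
import pandas as pd

# Hedged sketch of the factory-defect table; data and column names are invented.
records = pd.DataFrame({
    "factory": ["F1", "F1", "F2"],
    "defect":  ["broken stitch", "stain", "broken stitch"],
    "defect_rate": [0.12, 0.03, 0.08],
})
factory_defect_table = records.pivot_table(
    index="factory", columns="defect", values="defect_rate", aggfunc="mean")
print(factory_defect_table)   # missing (factory, defect) pairs appear as NaN
```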
Referring now to FIG. 5, a process for training a defect prediction system is shown, in accordance with an embodiment of the present disclosure. The steps of process 500 may be performed to train a defect prediction model. In some embodiments, the model is deployed on a prediction server. The steps of process 500 may be performed locally at a plant site, may be performed by a remote server (e.g., a cloud server), or may be shared between a local computing device and a remote server. At 501, an initial training data set is created. In some embodiments, the training data set may include historical inspection data for a large number of plants. In some embodiments, the training data set includes historical inspection data obtained over a particular time window (e.g., 3 months, 6 months, 9 months). In some embodiments, the initial training data set includes information about defects found during historical inspections. It should be appreciated that the data may include the various features described above. The data may then be pre-processed at 503. In some embodiments, preprocessing the data includes mapping terms used in the data to a standardized nomenclature. Relevant features may then be extracted from the data at 505. As discussed above, the relevant features may include features related to historical inspections and observed defects. At 507, multiple machine learning models (e.g., collaborative filtering, deep neural networks) may be trained on the training dataset, and the performance of each model is evaluated using the methods described above (e.g., measuring precision@k and recall@k). The hyper-parameters of each model may be configured to optimize the performance of the model. The most useful features for performing the prediction may be selected. At 509, the model with the most desirable performance is selected. At 511, the selected model is deployed onto a prediction server, where it can be used to provide defect prediction results for new input data, such as data from a web or mobile application.
In some embodiments, the initial training data set may be divided into a training data set, a test data set, and a validation data set. In some embodiments, the initial training data set is divided into a training data set and a test data set. In some embodiments, cross-validation techniques are used to estimate the performance of each defect prediction model. The performance results may be verified by subjecting the trained defect prediction model to new inspection data.
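The following is one possible sketch of the data split described above, assuming the pre-processed inspections are available as a feature matrix X and per-inspection defect labels y; the split fractions and the use of scikit-learn's train_test_split are illustrative choices, not prescribed by the disclosure.

```python
from sklearn.model_selection import train_test_split

def split_inspections(X, y, test_size=0.2, val_size=0.1, seed=0):
    """Split features X and defect labels y into train / validation / test portions."""
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    # val_size is expressed as a fraction of the full data set,
    # so rescale it relative to the remaining train+validation portion.
    rel_val = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=rel_val, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

A time-based split (training on older inspections and testing on the most recent window) would be an equally reasonable alternative given that the data are historical.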
In some embodiments, defect names, product names, and any other factory-specific terms that appear in the obtained data may be mapped to one or more terms of a predefined nomenclature. Because many factories and inspection services use different names to classify and label defects and products, when combining and comparing data from multiple sources it may be necessary to map these variant names to one or more predefined terms or names. Mapping the terms used by multiple factories to a particular name also prevents redundancy in the obtained data, such as listing what is actually a single type of defect under two different defect types. Even within a factory, different terms may be used to refer to the same product or defect, as the brands, retailers, or inspection services used by the factory may change over time. Moreover, standardizing the obtained data to a common nomenclature allows new business partners of a factory or retailer to better understand and evaluate its performance without first learning the specific terms used to measure that performance. Accordingly, the present disclosure provides for processing the obtained data to combine equivalent terms and to map terms used across multiple data sources to a predetermined nomenclature.
In some embodiments, mapping terms to names includes compiling a list of possible names to which a term may be mapped. In some embodiments, various descriptors may be associated with each term. For example, when mapping defects to names, a master list of primary defects may be created, where each primary defect is associated with various defect data (e.g., a primary defect category, a primary defect name, and a primary defect description). The entries for each data type may vary based on the product being described. Using the mapping, any defect can be associated with one or more primary defects. It should be appreciated that the nomenclature can be updated or extended when a new data type is created or measured. It should also be appreciated that a similar process may be used to map product names or other data that differs from source to source to a standardized name.
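A minimal sketch of such a mapping is shown below, assuming the master list of primary defects and the factory-specific synonyms are kept as simple dictionaries; all identifiers, names, and descriptions are hypothetical.

```python
# Master list of primary defects: each entry carries a category, name, and description.
PRIMARY_DEFECTS = {
    "PD-001": {"category": "stitching", "name": "broken stitch",
               "description": "Stitch line interrupted or thread broken."},
    "PD-002": {"category": "appearance", "name": "stain",
               "description": "Visible dirt, oil, or discoloration on the product."},
}

# Mapping from factory/brand-specific terms to one or more primary defect ids.
DEFECT_NAME_MAP = {
    "broken stitching": ["PD-001"],
    "thread break":     ["PD-001"],
    "oil mark":         ["PD-002"],
    "dirty spot":       ["PD-002"],
}

def map_defect(raw_name):
    """Return the primary defect record(s) for a factory-specific defect name."""
    key = raw_name.strip().lower()
    return [PRIMARY_DEFECTS[pid] for pid in DEFECT_NAME_MAP.get(key, [])]

print(map_defect("Thread break"))   # -> [record for the 'broken stitch' primary defect]
```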
Referring now to fig. 6, an exemplary process for feature extraction is illustrated, in accordance with an embodiment of the present disclosure. In the example of FIG. 6, the defect data is obtained from two brands, brand A and brand B. For each brand, each defect is mapped to one or more primary defects in the primary defect list.
Referring now to fig. 7, an exemplary process for feature extraction is illustrated, in accordance with an embodiment of the present disclosure. In the example of FIG. 7, lists of main product lines, main product categories, and main product names may be defined. In some embodiments, the nomenclature is hierarchical, whereby certain terms are associated with a particular parent term, which itself may be associated with its own parent term. For example, certain main product names may be associated with certain main product categories, and certain main product categories may be associated with certain main product lines. In some embodiments, mapping a term to a standardized term in the nomenclature may include selecting a value of a first category (e.g., a main product line) and then selecting a value of a second category (e.g., a main product category associated with that main product line) from the available possibilities associated with the first category. Similarly, a value of a third category may be selected from among the available possibilities associated with the second category (e.g., a main product name may be selected from the available main product names associated with the selected main product category). In some embodiments, a term is mapped directly to the most specific standardized term, thereby determining the values of its parent terms.
Each brand or retailer may have different definitions for product lines, product categories, and product items. To improve the performance of the defect prediction model, these definitions may be standardized in various embodiments by constructing a common set of product lines, product categories, and product items. For example, a general list of distinct main product lines (e.g., footwear, apparel, durable goods, etc.) may be defined, intended to cover all possible cases. Each main product line is broken down into different product categories, and each of these product categories is divided into a plurality of product items. In this way, it is ensured that each product item belongs to a unique product line and a unique product category. Once established, the master product definitions may be used to map the corresponding product lines, product categories, and product items from a given brand or retailer. When calculating the feature vectors for a given plant and product category, the main product line and the main product category are used.
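The following sketch illustrates one way the master product hierarchy and the brand-to-master mapping could be represented, assuming a direct mapping from brand-specific product names to the most specific master item so that the parent category and line follow from the hierarchy; the lines, categories, items, and brand names are hypothetical.

```python
# Master product hierarchy: line -> category -> items. Entries are illustrative only.
MASTER_PRODUCTS = {
    "apparel": {
        "tops": ["t-shirt", "polo shirt"],
        "bottoms": ["jeans", "shorts"],
    },
    "footwear": {
        "athletic": ["running shoe"],
        "casual": ["sandal"],
    },
}

# Brand-specific product names mapped directly to the most specific master item;
# the parent category and product line are then implied by the hierarchy.
BRAND_ITEM_MAP = {"Tee Basic": "t-shirt", "Trail Runner X": "running shoe"}

def resolve_product(brand_item):
    """Return (line, category, item) for a brand-specific product name."""
    item = BRAND_ITEM_MAP[brand_item]
    for line, categories in MASTER_PRODUCTS.items():
        for category, items in categories.items():
            if item in items:
                return line, category, item
    raise KeyError(f"{item!r} is not in the master product hierarchy")

print(resolve_product("Trail Runner X"))   # ('footwear', 'athletic', 'running shoe')
```

Because each item appears under exactly one category and line, the lookup is unambiguous, which mirrors the uniqueness property described above.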
In some embodiments, any defects found in the product may be assigned to a primary product line, a primary product category, a primary product name, a primary defect category, and a primary defect description. The mapped data may then be used to train a predictive model and obtain a prediction result.
Table 1 lists a number of features that can be extracted from the inspection data using the methods described above. The main-product and main-defect features are indicated by asterisks.
TABLE 1
According to embodiments of the present disclosure, a defect prediction model provides an indication of a plurality of defects that may occur in one or more products. In some embodiments, the indication includes a list of defects that may occur at the factory. In some embodiments, the list includes the top K defects most likely to occur at the factory. It should be appreciated that the defects most likely to occur at the factory may be understood as the defects most likely to be found at the next inspection. It should also be appreciated that the list of defects may be specific to a product, product line, style, product category, department within a plant, or plant, and each individual defect may include an indication of the level of granularity at which it was obtained. In some embodiments, the received indication is specific to a particular brand's or retailer's purchase order. The list may also include the name of each defect. In some embodiments, the defect names used in the standard nomenclature are mapped back to the names used by the particular factory/brand/retailer receiving the report. The value of K may be selected by a user or may be predetermined. In some embodiments, all defects having a probability above a particular threshold are received from the defect prediction model. The threshold may be selected in various ways, for example, by a user, predetermined by the defect prediction system, or adaptively learned during training. In some embodiments, the defect likelihood score for a defect for a factory and a product category may be interpreted as the predicted probability of that defect occurring at the factory for that product category. For example, a score of 0.5 means that there is a 50% chance that the defect will occur at the factory for that product category.
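As an illustration of the top-K and threshold selection described above, the following sketch ranks hypothetical defect likelihood scores and returns either the K most likely defects or all defects scoring at or above a threshold; the scores and defect names are invented for the example.

```python
def select_defects(scores, k=None, threshold=None):
    """scores: dict mapping defect name -> predicted likelihood in [0, 1].
    Returns (name, score) pairs, most likely first, after applying k and/or threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(d, s) for d, s in ranked if s >= threshold]
    if k is not None:
        ranked = ranked[:k]
    return ranked

scores = {"stain": 0.72, "broken stitch": 0.50, "misprint": 0.18, "wrong label": 0.05}
print(select_defects(scores, k=2))             # the two most likely defects
print(select_defects(scores, threshold=0.4))   # all defects scoring at least 0.4
```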
The information provided for each defect in the report may include a number of different values. In some embodiments, the report indicates whether a defect is likely to occur. The probability of occurrence of a defect can be compared with a threshold value in the manner described above. In some embodiments, the report indicates a likelihood of a defect occurring. In some embodiments, the report includes an indication of the severity of the defect in the product. In some embodiments, the report includes an indication of the percentage of products that may have defects. In some embodiments, the report may include the number of different defects expected to be found within a particular product, the number of total defects expected to be found within available products, and/or the distribution of defects among available products and/or their severity. In some embodiments, a description of the defect is provided. This may guide the inspector in identifying and measuring a particular defect.
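Purely as an illustration of the report fields described above, the following sketch assembles one report entry from a defect's likelihood, severity, affected-product percentage, and description; the field names and values are hypothetical and not taken from the disclosure.

```python
# A hypothetical report entry built from the values described above.
def build_report_entry(name, likelihood, threshold, severity, pct_affected, description):
    return {
        "defect_name": name,
        "likely_to_occur": likelihood >= threshold,   # probability compared to a threshold
        "likelihood": likelihood,
        "severity": severity,
        "percent_of_products_affected": pct_affected,
        "description": description,
    }

entry = build_report_entry(
    "stain", likelihood=0.72, threshold=0.4, severity="minor", pct_affected=0.08,
    description="Visible dirt, oil, or discoloration on the product.")
print(entry)
```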
Referring now to FIG. 8, a schematic diagram of an example of a compute node is shown. The computing node 10 is only one example of a suitable computing node and is not intended to suggest any limitation as to the scope of use or functionality of the embodiments described herein. In any event, computing node 10 is capable of implementing and/or performing any of the functions set forth above.
In the computing node 10, there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations. As is well known, examples of computing systems, environments, and/or configurations that may be suitable for operation with computer system/server 12 include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in FIG. 8, the computer system/server 12 in the computing node 10 is shown in the form of a general purpose computing device. The components of computer system/server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Express (PCIe) bus, and Advanced Microcontroller Bus Architecture (AMBA).
Computer system/server 12 typically includes a variety of computer system readable media. Such media includes any available media that is accessible by computer system/server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer system/server 12 may also include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown, and commonly referred to as a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in memory 28; such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer system/server 12 may also communicate with one or more external devices 14, such as a keyboard, pointing device, display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any device (e.g., network card, modem, etc.) that enables computer system/server 12 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 22. Also, the computer system/server 12 may communicate with one or more networks, such as a Local Area Network (LAN), a general Wide Area Network (WAN), and/or a public network (e.g., the internet) via the network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components may be used in conjunction with the computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data archive storage systems, and the like.
The present disclosure may be embodied as systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The description of various embodiments of the present disclosure has been presented for purposes of illustration but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
1. A system, comprising:
a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising:
receiving historical inspection data for the plant, the inspection data including an indication of a defect in one or more products produced by the plant;
extracting a plurality of features from the inspection data;
providing the plurality of features to a defect prediction model, wherein the defect prediction model comprises a trained classifier or a collaborative filter;
obtaining from a defect prediction model an indication of a plurality of defects that are likely to occur in the one or more products.
2. The system of item 1, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide a consensus output.
3. The system of item 1, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide a collective output.
4. The system of item 1, wherein the indication of the defect in the one or more products comprises an indication of a defect in a predetermined product type, product line, or product category.
5. The system of item 1, wherein the indication of the defect in the one or more products comprises the plurality of defect names and a defect rate corresponding to each of the plurality of defect names.
6. The system of item 1, wherein the plurality of features comprises:
the attributes of the plant that were examined in the past,
an attribute of the one or more products, or
Attributes of defects in the one or more products.
7. The system of item 1, wherein the trained classifier comprises an artificial neural network.
8. The system of item 7, wherein the artificial neural network comprises a deep neural network.
9. The system of item 1, wherein the collaborative filter comprises a neighborhood model or a latent factor model.
10. The system of item 1, wherein the plurality of defects comprises a predetermined number of most likely defects.
11. The system of item 1, the method further comprising preprocessing the data.
12. The system of item 11, wherein preprocessing the data comprises aggregating the data.
13. The system of item 12, wherein preprocessing the data further comprises filtering the data.
14. The system of item 1, wherein extracting the plurality of features from the data comprises applying, for each of the indications of defects, a mapping from a defect name to one or more standardized defect names from a predetermined nomenclature.
15. The system of item 1, wherein the historical inspection data comprises a plurality of product names, and wherein extracting the plurality of features from the data comprises applying a mapping from each of the plurality of product names to a standardized product name from a predetermined nomenclature.
16. The system of item 1, wherein the method further comprises:
the historical inspection data of the plant is anonymized.
17. The system of item 1, wherein the data further comprises a performance history of the plant.
18. The system of item 1, wherein the data further comprises geographic information of the plant.
19. The system of item 1, wherein the data further comprises product data for the plant.
20. The system of item 1, wherein the data further comprises brand data for the tested product of the plant.
21. The system of item 1, wherein the data spans a predetermined time window.
22. The system of item 1, wherein
Providing the plurality of features to a defect prediction model includes sending the plurality of features to a remote defect prediction server, and
Obtaining the indication of the plurality of defects from the defect prediction model includes receiving the indication of the plurality of defects from a defect prediction server.
23. The system of item 1, wherein extracting the plurality of features comprises applying a dimensionality reduction algorithm.
24. The system of item 1, wherein the indication of the plurality of defects that may occur comprises a list of the plurality of defects that may occur at the factory.
25. The system of item 24, wherein the list comprises a defect name, a defect rate, and a defect description for each defect in the plurality of defects.
26. The system of item 24, wherein the list comprises a list of a plurality of defects that may occur in a particular purchase order, product style, product line, or product category.
27. The system of item 22, wherein obtaining the indication of the plurality of defects further comprises providing an indication of a report to a user.
28. The system of item 27, wherein providing the indication to the user comprises sending the indication to a mobile or web application.
29. The system of item 28, wherein the sending is performed via a wide area network.
30. The system of item 1, wherein the trained classifier comprises a support vector machine.
31. The system of item 1, wherein obtaining the indication from the defect prediction model comprises applying a gradient boosting algorithm.
32. The system of item 1, wherein the method further comprises:
measuring performance of a defect prediction model by comparing indications of a plurality of defects to ground truth indications of the plurality of defects;
optimizing parameters of the defect prediction model according to the performance.
33. The system of item 32, wherein optimizing the parameters of the defect prediction model comprises modifying hyper-parameters of the trained machine learning model.
34. The system of item 32, wherein optimizing the parameters of the defect prediction model comprises replacing the first machine learning algorithm with a second machine learning algorithm comprising hyper-parameters configured to improve performance of the defect prediction model.
35. A method, comprising:
receiving historical inspection data for the plant, the inspection data including an indication of a defect in one or more products produced by the plant;
extracting a plurality of features from the inspection data;
providing the plurality of features to a defect prediction model, wherein the defect prediction model comprises a trained classifier or a collaborative filter;
obtaining from a defect prediction model an indication of a plurality of defects that are likely to occur in the one or more products.
36. The method of item 35, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide a consensus output.
37. The method of item 35, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide a collective output.
38. The method of item 35, wherein the indication of defects in the one or more products comprises an indication of defects in a predetermined product type, product line, or product category.
39. The method of item 35, wherein the indication of the defect in the one or more products comprises the plurality of defect names and a defect rate corresponding to each of the plurality of defect names.
40. The method of item 35, wherein the plurality of features comprises:
the attributes of the plant that were examined in the past,
an attribute of the one or more products, or
Attributes of defects in the one or more products.
41. The method of item 35, wherein the trained classifier comprises an artificial neural network.
42. The method of item 41, wherein the artificial neural network comprises a deep neural network.
43. The method of item 35, wherein the collaborative filter comprises a neighborhood model or a latent factor model.
44. The method of item 35, wherein the plurality of defects comprises a predetermined number of most likely defects.
45. The method of item 35, further comprising preprocessing the data.
46. The method of item 45, wherein preprocessing the data comprises aggregating the data.
47. The method of item 46, wherein preprocessing the data further comprises filtering the data.
48. The method of item 35, wherein extracting the plurality of features from the data comprises applying, for each of the indications of defects, a mapping from a defect name to one or more standardized defect names from a predetermined nomenclature.
49. The method of item 35, wherein the historical inspection data comprises a plurality of product names, and wherein extracting the plurality of features from the data comprises applying a mapping from each of the plurality of product names to a standardized product name from a predetermined nomenclature.
50. The method of item 35, further comprising:
the historical inspection data of the plant is anonymized.
51. The method of item 35, wherein the data further includes a performance history of the plant.
52. The method of item 35, wherein the data further comprises geographic information of the plant.
53. The method of item 35, wherein the data further comprises product data for the plant.
54. The method of item 35, wherein the data further comprises brand data for the tested product of the plant.
55. The method of item 35, wherein the data spans a predetermined time window.
56. The method of item 35, wherein
Providing the plurality of features to a defect prediction model includes sending the plurality of features to a remote defect prediction server, and
Obtaining the indication of the plurality of defects from the defect prediction model includes receiving the indication of the plurality of defects from a defect prediction server.
57. The method of item 35, wherein extracting the plurality of features comprises applying a dimensionality reduction algorithm.
58. The method of item 35, wherein the indication of the plurality of defects that may occur comprises a list of the plurality of defects that may occur at the factory.
59. The method of item 58, wherein the list includes a defect name, a defect rate, and a defect description for each of the plurality of defects.
60. The method of item 58, wherein the list comprises a list of a plurality of defects that may occur in a particular purchase order, product style, product line, or product category.
61. The method of item 56, wherein obtaining the indication of the plurality of defects further comprises providing an indication of a report to a user.
62. The method of item 61, wherein providing the indication to the user comprises sending the indication to a mobile or web application.
63. The method of item 62, wherein the sending is performed via a wide area network.
64. The method of item 35, wherein the trained classifier comprises a support vector machine.
65. The method of item 35, wherein obtaining the indication from the defect prediction model comprises applying a gradient boosting algorithm.
66. The method of item 35, further comprising:
measuring performance of a defect prediction model by comparing indications of a plurality of defects to ground truth indications of the plurality of defects;
optimizing parameters of the defect prediction model according to the performance.
67. The method of item 66, wherein optimizing parameters of the defect prediction model comprises modifying hyper-parameters of the trained machine learning model.
68. The method of item 66, wherein optimizing parameters of the defect prediction model comprises replacing the first machine learning algorithm with a second machine learning algorithm comprising hyper-parameters configured to improve performance of the defect prediction model.
69. A computer program product for defect prediction, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions being executable by a processor to cause the processor to perform the method of any one of items 35-68.

Claims (10)

1. A system, comprising:
a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising:
receiving historical inspection data for the plant, the inspection data including an indication of a defect in one or more products produced by the plant;
extracting a plurality of features from the inspection data;
providing the plurality of features to a defect prediction model, wherein the defect prediction model comprises a trained classifier or a collaborative filter;
obtaining from a defect prediction model an indication of a plurality of defects that are likely to occur in the one or more products.
2. The system of claim 1, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide consensus outputs.
3. The system of claim 1, wherein the defect prediction model comprises a trained classifier and a collaborative filter configured to provide a collective output.
4. The system of claim 1, wherein the indication of defects in the one or more products comprises an indication of defects in a predetermined product type, product line, or product category.
5. The system of claim 1, wherein the indication of defects in one or more products comprises the plurality of defect names and a defect rate corresponding to each of the plurality of defect names.
6. The system of claim 1, wherein the plurality of features comprises:
the attributes of the plant that were examined in the past,
an attribute of the one or more products, or
Attributes of defects in the one or more products.
7. The system of claim 1, wherein the trained classifier comprises an artificial neural network.
8. The system of claim 7, wherein the artificial neural network comprises a deep neural network.
9. The system of claim 1, wherein the collaborative filter comprises a neighborhood model or a latent factor model.
10. The system of claim 1, wherein the plurality of defects comprises a predetermined number of most likely defects.
CN201910859146.4A 2019-07-19 2019-09-11 Defect prediction using historical inspection data Pending CN112241805A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962876239P 2019-07-19 2019-07-19
US62/876,239 2019-07-19

Publications (1)

Publication Number Publication Date
CN112241805A true CN112241805A (en) 2021-01-19

Family

ID=74168177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910859146.4A Pending CN112241805A (en) 2019-07-19 2019-09-11 Defect prediction using historical inspection data

Country Status (2)

Country Link
CN (1) CN112241805A (en)
CA (1) CA3053894A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255977A (en) * 2021-05-13 2021-08-13 江西鑫铂瑞科技有限公司 Intelligent factory production equipment fault prediction method and system based on industrial internet
US20220253870A1 (en) * 2021-02-10 2022-08-11 Research Blocks Technologies, Inc Blockchain network supplier verification

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113305645B (en) * 2021-06-22 2022-07-15 重庆邮电大学工业互联网研究院 Numerical control machine tool cutter residual life prediction method based on hybrid neural model
CN113706477B (en) * 2021-08-10 2024-02-13 南京旭锐软件科技有限公司 Defect category identification method, device, equipment and medium

Also Published As

Publication number Publication date
CA3053894A1 (en) 2021-01-19

Similar Documents

Publication Publication Date Title
US10977293B2 (en) Technology incident management platform
US20220075670A1 (en) Systems and methods for replacing sensitive data
US11037080B2 (en) Operational process anomaly detection
CN112116184A (en) Factory risk estimation using historical inspection data
US20210090021A1 (en) Analysis and correction of supply chain design through machine learning
CN112241805A (en) Defect prediction using historical inspection data
EP3591586A1 (en) Data model generation using generative adversarial networks and fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome
US20210097343A1 (en) Method and apparatus for managing artificial intelligence systems
US20210103858A1 (en) Method and system for model auto-selection using an ensemble of machine learning models
US11783338B2 (en) Systems and methods for outlier detection of transactions
WO2020257784A1 (en) Inspection risk estimation using historical inspection data
CN116034379A (en) Activity level measurement using deep learning and machine learning
Lee et al. An entropy decision model for selection of enterprise resource planning system
CN112926989B (en) Bank loan risk assessment method and equipment based on multi-view integrated learning
EP4367624A1 (en) Machine learning-based, predictive, digital underwriting system, digital predictive process and corresponding method thereof
US20240113936A1 (en) Method and system for artificial intelligence-based acceleration of zero-touch processing
US20230385707A1 (en) System for modelling a distributed computer system of an enterprise as a monolithic entity using a digital twin
CN117670445A (en) Credit evaluation method, apparatus, device and storage medium
Ding et al. A Framework of Data Quality Assurance using Machine Learning
Yoon et al. Validation and Implementation of Customer Classification System using Machine Learning
Prakhar et al. Bias Detection and Mitigation within Decision Support System: A Comprehensive Survey
Muzorewa Reliability prediction of household electro-mechanical appliances using current technology
CN114971697A (en) Data processing method, device, equipment, storage medium and product
CN114154712A (en) Data management method, data management device, equipment and storage medium
CN115409636A (en) Product risk prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40035239

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination