CN118056189A - Abnormal root cause analysis method and device - Google Patents

Abnormal root cause analysis method and device

Info

Publication number
CN118056189A
Authority
CN
China
Prior art keywords
data
product
production
root cause
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280003212.8A
Other languages
Chinese (zh)
Inventor
王瑜
沈鸿翔
贺王强
沈国梁
兰天
袁菲
汤玥
王海金
何德材
吴建民
王洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Beijing Zhongxiangying Technology Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Beijing Zhongxiangying Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd, Beijing Zhongxiangying Technology Co Ltd filed Critical BOE Technology Group Co Ltd
Publication of CN118056189A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Factory Administration (AREA)

Abstract

An abnormal root cause analysis method and device, the method comprising: obtaining to-be-processed product data corresponding to a target product, where the to-be-processed product data is obtained by fusing production data and detection data corresponding to the target product according to a first preset parameter; determining, according to the detection data, normal product data and abnormal product data in the to-be-processed product data; and inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information for a detection result of the target product, where the first influence factor comprises one or more items of the production data, and the first root cause analysis model indicates a tree model. This ensures the accuracy of determining the influence factors of the product and improves determination efficiency.

Description

Abnormal root cause analysis method and device
Technical Field
The application relates to the technical field of computers, in particular to an abnormal root cause analysis method and device.
Background
With the development of technology, the manufacturing industry has grown rapidly, and the number of products produced by enterprises keeps increasing. However, products often exhibit defects, damage, or unusable conditions, i.e., become defective products, during production or use. To protect enterprise revenue and product quality, the causes of product abnormalities need to be identified so that improvements can be made.
Currently, when searching for the cause of a product abnormality, an engineer typically analyzes the related data (e.g., production data) of the product to determine the cause. However, because this approach depends on the engineer's experience, it is both inaccurate and time-consuming.
Disclosure of Invention
In order to overcome the problems in the related art, the application provides an abnormal root cause analysis method and device.
According to a first aspect of the present application, there is provided a method of anomaly root cause analysis, the method comprising:
Obtaining product data to be processed corresponding to a target product; the product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to a first preset parameter;
According to the detection data, determining normal product data and abnormal product data in the product data to be processed;
Inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model.
According to a second aspect of the present application, there is provided a method of anomaly root cause analysis, comprising:
Obtaining sample data to be processed corresponding to a target object;
Determining positive samples and negative samples in the sample data to be processed; wherein the positive and negative samples each comprise a first parameter;
And inputting the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the judging result of the target object.
According to a third aspect of the present application, there is provided an abnormal root cause analysis system including a data management server, an analysis server, and a display;
The data management server is configured to store data and extract, convert or load the data; the data includes at least one of production data and inspection data;
The analysis server is configured to acquire to-be-processed product data corresponding to a target product from the data management server when a task request is received, and determine normal product data and abnormal product data in the to-be-processed product data according to detection data in the to-be-processed product data; inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model; the product data to be processed are obtained by fusing the production data and the detection data corresponding to the target product according to a first preset parameter;
the display is configured to display the first influence factor information through a visual interface.
According to a fourth aspect of the present application, there is provided an abnormal root cause analysis apparatus comprising:
The first data acquisition module is used for acquiring product data to be processed corresponding to a target product; the product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to a first preset parameter;
the first data processing module is used for determining normal product data and abnormal product data in the product data to be processed according to the detection data;
The first root cause determining module is used for inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of the detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model.
According to a fifth aspect of the present application, there is provided an abnormal root cause analysis apparatus comprising:
The second data acquisition module is used for acquiring sample data to be processed corresponding to the target object;
A second data processing module for determining positive and negative samples in the sample data to be processed; wherein the positive and negative samples each comprise a first parameter;
And the second root cause determining module is used for inputting the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the judging result of the target object.
According to a sixth aspect of the present application, there is provided an electronic device comprising:
A memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the anomaly root cause analysis method according to the above first aspect and the various possible designs of the first aspect.
According to a seventh aspect of the present application, there is provided an electronic device comprising:
A memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the anomaly root cause analysis method as described above in the second aspect and the various possible designs of the second aspect when the program is executed.
According to an eighth aspect of the present application, there is provided a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method for analyzing an abnormal root cause according to the above first aspect and the various possible designs of the first aspect.
According to a ninth aspect of the present application, there is provided a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method for analyzing an abnormal root cause according to the above second aspect and the various possible designs of the second aspect.
According to a tenth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of anomaly root cause analysis as described above for the first aspect and the various possible designs of the first aspect.
According to an eleventh aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of anomaly root cause analysis according to the above second aspect and the various possible designs of the second aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
According to the application, the to-be-processed product data corresponding to target products is classified based on the detection data corresponding to the target products, so as to obtain normal product data and abnormal product data. The normal product data indicates to-be-processed product data of target products whose detection results are normal, the abnormal product data indicates to-be-processed product data of target products whose detection results are abnormal, and both include production parameters. The normal product data and the abnormal product data are analyzed by the first root cause analysis model to determine, from the production data, the influence factor information affecting the detection result of the target product, i.e., the first influence factor information. The cause of a product abnormality can thus be determined when the detection result is abnormal, realizing automatic analysis of product data and automatic determination of the cause of the product abnormality, i.e., the root cause. This does not depend on manual determination, ensures the accuracy of determining the influence factors of the product, and improves determination efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a distributed computing environment in accordance with an exemplary embodiment of the present application.
FIG. 2 is a schematic diagram of software modules in an anomaly root cause analysis system according to an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram of a data management server according to an exemplary embodiment of the present application.
FIG. 4 is a flow chart illustrating a method of anomaly root cause analysis according to an exemplary embodiment of the present application.
FIG. 5 is a flow chart illustrating another method of anomaly root cause analysis according to an exemplary embodiment of the present application.
FIG. 6 is a flow chart illustrating yet another method of anomaly root cause analysis according to an exemplary embodiment of the present application.
Fig. 7 is a hardware configuration diagram of an electronic device in which an abnormal root cause analysis apparatus is located according to an exemplary embodiment of the present application.
FIG. 8 is a block diagram of an anomaly root cause analysis device according to an exemplary embodiment of the present application.
FIG. 9 is a block diagram of another anomaly root cause analysis device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
During the production or use of a product, abnormal conditions such as defects, damage, or inability to continue use may occur. Relevant personnel need to analyze the related data (e.g., production data) of the product to determine the cause of the product anomaly. However, because this approach depends on the engineer's experience, it is both inaccurate and time-consuming.
Taking a semiconductor or display panel related product as an example, various defects may occur during the manufacturing process. Examples of defects include particles, residues, line defects, holes, splatters, wrinkles, discoloration, and bubbles. Defects occurring in the manufacture of semiconductor electronic devices are difficult to track.
Although the present application is described in the specific context of industrial production (in particular, display panel production), the present application is not limited thereto. In fact, with appropriate adjustments and modifications, the embodiments of the present application may be used in other sample anomaly detection scenarios, and may be further generalized and applied to a general data analysis or machine learning platform.
In one aspect, the present application provides an anomaly root cause analysis system. In some embodiments, the anomaly root cause analysis system includes a distributed computing system including one or more networked computers configured to execute in parallel to perform at least one common task, and one or more computer-readable storage media storing instructions that cause the distributed computing system to perform the operations described herein. In some embodiments, the distributed computing system comprises: a data management server configured to store data and to extract, convert, or load data, wherein the data includes at least one of production data and inspection data; an analysis server configured to acquire data from the data management server upon receiving a task request and perform algorithmic analysis on the data to obtain the anomaly root cause (i.e., influence factor information, namely first influence factor information and/or second influence factor information); and a display configured to provide a visual interface to display the results of the abnormal root cause analysis. Optionally, the anomaly root cause analysis system is used for defect analysis in display panel manufacturing.
As used herein, the term "distributed computing system" generally refers to an interconnected computer network having a plurality of network nodes that connect a plurality of servers or hosts to one another or to an external network (e.g., the internet). The term "network node" generally refers to a physical network device. Example network nodes include routers, switches, hubs, bridges, load balancers, security gateways, or firewalls. A "host" generally refers to a physical computing device configured to implement, for example, one or more virtual machines or other suitable virtualized components. For example, the host may include a server having a hypervisor configured to support one or more virtual machines or other suitable types of virtual components.
FIG. 1 illustrates a distributed computing environment in some embodiments according to the application. Referring to FIG. 1, in a distributed computing environment, a plurality of autonomous computers/workstations, referred to as nodes, communicate with each other in a network, such as a LAN (local area network), to solve tasks, such as executing applications. Each computer node typically includes its own processor(s), memory, and communication links to other nodes. The computers may be located within a particular location (e.g., a clustered network) or may be connected through a wide area network (WAN) such as the internet. In such a distributed computing environment, different applications may share information and resources.
The networks in the distributed computing environment may include Local Area Networks (LANs) and Wide Area Networks (WANs). The network may include both wired technologies (e.g., Ethernet) and wireless technologies (e.g., Code Division Multiple Access (CDMA), Global System for Mobile (GSM), Universal Mobile Telephone Service (UMTS), Bluetooth, etc.).
The plurality of computing nodes are configured to join a resource group to provide a distributed service. A computing node in a distributed network may include any computing device, such as a computing device or a user device. The computing node may also include a data center. As used herein, a computing node may refer to any computing device or computing devices (i.e., a data center). The software modules may be executed on a single computing node (e.g., server) or distributed across multiple nodes in any suitable manner.
The distributed computing environment may also include one or more storage nodes for storing information related to execution of the software modules and/or output and/or other functions generated by execution of the software modules. One or more storage nodes communicate with each other in the network and with one or more computing nodes in the network.
FIG. 2 illustrates an anomaly root cause analysis system architecture in some embodiments according to the application. Referring to FIG. 2, the anomaly root cause analysis system includes a distributed computing system including one or more networked computers configured to execute in parallel to perform at least one common task, and one or more computer-readable storage media storing instructions that, when executed by the distributed computing system, cause the distributed computing system to perform the corresponding operational steps. In some embodiments, the distributed computing system includes a data management server configured to store data and to extract, convert, or load the data; an analysis server connected to the data management server and configured to acquire data from the data management server upon receipt of a task request and perform an analysis task; and a display configured to display the analysis task results through a visual interface. The analysis server includes a plurality of business servers (similar to backend servers) and a plurality of algorithm servers configured to obtain data directly from the data management server. In some embodiments, the distributed computing system further comprises a query engine coupled to the data management server and configured to obtain data directly from the data management server. Optionally, the query engine is based on Impala technology. As used herein, the term "connected to" in the context of the present application refers to a relationship having a direct flow of information or data from a first component to a second component of a system and/or from the second component to the first component of the system.
In some embodiments, the analysis server may acquire, when receiving a task request, to-be-processed product data corresponding to a target product from the data management server, and determine normal product data and abnormal product data in the to-be-processed product data according to detection data in the to-be-processed product data; inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of a target product, wherein the first influence factor comprises one or more of production data, and the first root cause analysis model indicates a tree model; the product data to be processed is obtained by fusing production data and detection data corresponding to the target product according to the first preset parameters; correspondingly, the display can display the first influence factor information through a visual interface.
In some embodiments, the data management server includes an ETL module configured to extract, convert, or load data from at least one data source into a database of the data management server. Upon receipt of an assigned task, the at least one algorithm server is configured to obtain the data to be analyzed directly from the data management server. In performing the anomaly analysis, the at least one algorithm server is configured to computationally analyze the data to be analyzed and to send the resulting data to the data management server. The at least one algorithm server deploys various general algorithms for anomaly root cause analysis, such as algorithms based on big data analysis, which may be algorithms based on specific machine learning models, such as one or more of decision trees, random forests, GBDT, LightGBM, XGBoost, CatBoost, naive Bayes, support vector machines, AdaBoost, neural network models, etc., or other statistical algorithm models, such as WOE & IV, Apriori, etc. The abnormal root cause analysis algorithms mentioned below are also included, without limitation herein. The at least one algorithm server is configured to analyze the data to identify the cause of the anomaly. In another embodiment, the algorithm server is further configured to infer or predict, based on the production data, whether an anomaly will occur. As used herein, the term "ETL module" refers to computer program logic configured to provide functions such as extracting, converting, or loading data. In some embodiments, the ETL module is stored on a storage node, loaded into memory, and executed by a processor. In some embodiments, the ETL module is stored on one or more storage nodes in the distributed network, loaded into one or more memories in the distributed network, and executed by one or more processors in the distributed network.
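As an illustration of one of the statistical models named above, the following is a minimal sketch of computing Weight of Evidence (WOE) and Information Value (IV) for a single candidate production parameter. It is not the patent's implementation; the data, column names, and smoothing constant are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical fused table: one row per product, with one categorical
# production parameter and a normal/abnormal label from the detection data.
df = pd.DataFrame({
    "equipment_id": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "is_abnormal":  [0,   1,   0,   0,   1,   1,   0,   0,   0,   1],
})

def woe_iv(frame, feature, label, eps=0.5):
    """WOE/IV of one categorical feature; eps smooths empty bins."""
    g = frame.groupby(feature)[label].agg(bad="sum", total="count")
    g["good"] = g["total"] - g["bad"]
    pct_bad = (g["bad"] + eps) / (g["bad"].sum() + eps * len(g))
    pct_good = (g["good"] + eps) / (g["good"].sum() + eps * len(g))
    g["woe"] = np.log(pct_good / pct_bad)
    g["iv"] = (pct_good - pct_bad) * g["woe"]
    return g, g["iv"].sum()

table, iv = woe_iv(df, "equipment_id", "is_abnormal")
print(table)       # per-bin WOE contributions
print("IV =", iv)  # higher IV => the parameter separates normal from abnormal better
```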
The data management server stores the data for the anomaly root cause analysis system. For example, the data management server stores the data required for algorithmic analysis by the algorithm server. In another example, the data management server stores the results of the algorithmic analysis. For algorithmic analysis and interactive display to the user, data from multiple data sources is cleaned and combined by the ETL module into the data to be analyzed. Examples of data for abnormal root cause analysis include product history data, process parameter data, detected abnormal location data, and the like. The amount of data in a manufacturing process (e.g., the manufacturing process of a display panel) is huge; for example, a factory site may generate hundreds of gigabytes of data per day. In order to meet the user's demand for defect analysis, it is necessary to increase the speed at which the algorithm server reads the production data. In one example, the data required for algorithmic analysis is stored in an Apache HBase-based database to improve efficiency and save storage space. In another example, the results of the algorithmic analysis and other auxiliary data are stored in an Apache Hive-based data warehouse. In another example, the data may be stored in an Apache Beam-based database (i.e., per the Apache Beam model). It is understood that the data management server may comprise one or more of an HBase-based database, a Hive-based data warehouse, and an Apache Beam-based database.
Apache Hive is an open-source data warehouse system built on top of Hadoop for querying and analyzing large structured and semi-structured datasets stored in Hadoop files. Hive is mainly used for batch processing, i.e., online analytical processing (OLAP).
Apache HBase is a non-relational, column-oriented distributed database running on top of the Hadoop Distributed File System (HDFS). It is an open-source NoSQL database that stores data in columns. HBase is mainly used for transactional workloads, i.e., online transaction processing (OLTP), and supports real-time processing.
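As a concrete, non-patent illustration of writing and reading such column-oriented records, the sketch below uses the happybase Python client for HBase; the Thrift host, table name, column families, and row key are all hypothetical.

```python
import happybase

# Connect to an HBase Thrift server (the host is an assumption for illustration).
connection = happybase.Connection("hbase-thrift.example.internal")
table = connection.table("product_fused_data")

# Row key = product serial number; one column family per data category.
table.put(b"PANEL-20220901-0001", {
    b"prod:equipment_id": b"EQ-07",   # production (history) data
    b"prod:chamber_temp": b"63.2",
    b"insp:defect_count": b"3",       # detection data
    b"insp:result": b"abnormal",
})

# A point lookup by the designated key is fast in a columnar KV store.
row = table.row(b"PANEL-20220901-0001")
print(row[b"insp:result"])
```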
Apache Beam is an open-source unified model that defines both batch and streaming data-parallel processing pipelines. Using the open-source Beam SDK, a program that defines the pipeline can be constructed.
In one example, the various components of the data management platform (e.g., data lakes, data warehouses) may take the form of, for example, an Apache Hadoop-based, Apache Hive-based, or Apache Beam-based distributed data store.
Fig. 3 illustrates a data management server in some embodiments according to the application. Referring to FIG. 3, in some embodiments, the data management server includes a distributed file system (DFS), such as the Hadoop Distributed File System (HDFS). The data management server is configured to store data collected from at least one data source. The data source may be a database in a factory production system, or may be another data source, without limitation herein. Typically, data produced during factory production is stored in relational databases (e.g., Oracle, MySQL, etc.), but applications based on relational database management system (RDBMS) grid computing have limited hardware scalability. When the data volume reaches a certain order of magnitude, the input/output bottleneck of the hard disk makes processing large amounts of data very inefficient. Parallel processing on a distributed file system can meet the challenges presented by increased data storage and computing requirements. In the abnormal root cause analysis process, data in the data source is first extracted to the data management server, which greatly speeds up processing. The data management server comprises a data lake, a data warehouse, and a NoSQL database. In some embodiments, the data management platform comprises multiple sets of data with different content and/or storage structures; in the present application, each set of data is defined as a "data layer", and the data lake, the data warehouse, and the NoSQL database are different data layers in the data management server.
The data lake is configured to store a first set of data formed by the ETL module extracting raw data from at least one data source, the first set of data having the same content as the raw data. In some embodiments, the ETL module first extracts raw data from at least one data source into the data management server, forming a first data layer (e.g., a data lake). A data lake is a centralized HDFS or Kudu database configured to store any structured or unstructured data. The data lake DL is configured to store the first set of data extracted from the at least one data source by the ETL module. The first set of data and the original data have the same content. Optionally, the dimensions and attributes of the original data are preserved in the first set of data. In some embodiments, the first set of data stored in the data lake comprises dynamically updated data. Optionally, the dynamically updated data includes data updated in real time in a Kudu-based database, or data updated periodically in a Hadoop distributed file system. In one example, the periodically updated data stored in the Hadoop distributed file system is stored in Apache Hive-based storage. In one example, the dynamically updated data includes both real-time update data and periodic update data. In one example, real-time updates occur at sub-minute intervals (excluding minute-level updates), while periodic updates occur at minute-level intervals or longer (including minute-level updates). It will be appreciated that the flow of data from the data source to the first data layer is a backup of data content between two data management systems.
The data warehouse is configured to store a second set of data formed by the ETL module cleaning and normalizing the first set of data. In some embodiments, the data management server includes a second data layer, such as a data warehouse DW. The data warehouse DW comprises an internal storage system configured to provide data in an abstract manner, for example in a table format (Table) or view format (View), without exposing the file system. The data warehouse DW may be based on Apache Hive. The ETL module ETLP is configured to extract, clean, convert, or load the first set of data to form the second set of data. Optionally, the second set of data is formed by subjecting the first set of data to cleaning and normalization. The data in the data warehouse layer can be understood as the data obtained by preprocessing the data in the data lake layer. Preprocessing includes cleaning of data, such as removing null values, de-duplication, removing unused fields, etc. Specifically, the server recognizes missing values ("NA", "/", "null", "unknown") and converts them to a unified missing-value form. Preprocessing also includes standardization of data; for example, the server detects different time field formats and performs unified standard format conversion, as sketched below.
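A minimal sketch of the cleaning and standardization just described, assuming pandas and hypothetical column names:

```python
import pandas as pd

raw = pd.DataFrame({
    "product_id": ["P1", "P2", "P2", "P3"],
    "chamber_temp": ["63.2", "NA", "NA", "/"],
    "start_time": ["2022/09/01 08:00", "01-09-2022 08:05", "01-09-2022 08:05", "unknown"],
})

# Unify the various missing-value spellings into one canonical form.
cleaned = raw.replace(["NA", "/", "null", "unknown"], pd.NA)

# De-duplicate records (dropping unused fields would go here as well).
cleaned = cleaned.drop_duplicates()

# Standardize heterogeneous time formats into one canonical datetime
# (format="mixed" requires pandas >= 2.0; unparsable entries become NaT).
cleaned["start_time"] = pd.to_datetime(
    cleaned["start_time"], errors="coerce", format="mixed", dayfirst=True
)
cleaned["chamber_temp"] = pd.to_numeric(cleaned["chamber_temp"], errors="coerce")
print(cleaned)
```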
In some embodiments, preprocessing further includes data fusion, i.e., the second set of data further comprises data obtained by summarizing and fusing the first set of data. Summarization refers to statistics, such as quantity summaries, percentage calculations, etc., over the same field or record in a data table. For example, in the display panel manufacturing process, the defect rate of one substrate (glass) can be calculated as the number of defective panels contained in the substrate divided by the total number of panels. Fusion refers to the fusion of data tables. For abnormal root cause analysis, abnormal content data and root cause result data are often generated in two separate data tables; through data table fusion, the abnormal content data table and the root cause result data table can be fused into one table according to the same index field in the two tables. In the production and manufacturing process, the production data table and the detection data table can be fused according to the same ID to form a complete data table for subsequent analysis. Furthermore, different data can be integrated or split based on different analysis subjects, improving subsequent data processing efficiency. It can be understood that moving data from the first data layer to the second data layer further processes the backed-up data to facilitate its management and presentation in the data management server. It should be noted that the preprocessing process in the present application may be performed in the data management server, or the preprocessing operations (such as cleaning and fusion) may be completed during data analysis and calculation (performed by the analysis server); the execution timing of preprocessing is not limited.
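A minimal sketch of the summarization step described above (defect rate per substrate), assuming pandas and hypothetical fields:

```python
import pandas as pd

panels = pd.DataFrame({
    "substrate_id": ["G1", "G1", "G1", "G2", "G2"],
    "panel_id": ["G1-P1", "G1-P2", "G1-P3", "G2-P1", "G2-P2"],
    "is_defective": [1, 0, 1, 0, 0],
})

# Defect rate of a substrate = defective panels / total panels on that substrate.
defect_rate = (
    panels.groupby("substrate_id")["is_defective"]
    .agg(defective="sum", total="count")
    .assign(defect_rate=lambda t: t["defective"] / t["total"])
)
print(defect_rate)
```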
The NoSQL database is configured to store a third set of data formed by the ETL module converting the second set of data. In some embodiments, the third set of data is key-value data. In some embodiments, the data management platform includes a third data layer (e.g., a NoSQL database). In some embodiments, the third data layer is a database, e.g., HBase or ClickHouse, storing NoSQL types available for computing processing. The ETL module is configured to convert the second set of data of the second data layer to form the third set of data. It will be appreciated that the change in data from the second data layer to the third data layer is a change in data storage structure, forming a NoSQL-type database structure, such as a columnar database structure like HBase. Compared with Hive, the NoSQL database responds to the front-end interface faster in calculation and query, and better handles users' demands for real-time data query and calculation. Thus, in some embodiments, the data acquired by the analysis server (e.g., the algorithm server) is data in the NoSQL database.
The second set of data may be converted to form the third set of data, for example, as follows. In one example, a first table is generated in the third data layer and a second table (e.g., an external table) is generated in the second data layer. The first table and the second table are configured to be synchronized, such that when data is written to the second table, the first table is updated simultaneously to include the corresponding data. In another example, a distributed computing processing module may be used to read data written into the second data layer. The MapReduce module in Hadoop may be used as the distributed computing processing module for reading data written into the second data layer. The data written into the second data layer may then be written into the third data layer. In one example, the HBase-based API may be used to write data into an HBase database. In another example, the MapReduce module, upon reading the data written into the second data layer, may generate HFile files that are bulk-loaded (Bulkload) into the third data layer.
It will be appreciated by those skilled in the art that the first, second, and third sets of data may be stored and queried based on one or more data tables.
Optionally, the data table of the third set of data may be the same table as that of the second set of data, or the data table of the second set of data may be split into a plurality of sub-tables. The plurality of sub-tables may be a plurality of sub-data tables having an index relationship. In one example of the present application, the data table of the third set of data includes a plurality of sub-data tables having an index relationship, formed by splitting the data table of the second set of data. The sub-data table splitting may be based on screening criteria of the user interaction interface, keys of the third set of data, and/or value information. Thus, a first index of the plurality of index relationships corresponds to a filter criterion of the front-end interface, e.g., to a user-defined analysis scope or criterion in a user interaction interface in communication with the data management server, thereby enabling faster data query and calculation. In some embodiments, the plurality of sub-data tables includes a first sub-table, a second sub-table, and a third sub-table; the first sub-table comprises data screening options presented by the visual interface; the second sub-table comprises a product serial number; and the third sub-table comprises data corresponding to the product serial number.
In some embodiments, the plurality of sub-data tables further comprises a fourth sub-table comprising manufacturing site information and/or equipment information, the third sub-table comprising codes or abbreviations for the manufacturing sites and/or equipment.
In some embodiments, the plurality of sub-tables has an index relationship between at least two sub-tables of the plurality of sub-tables. Optionally, splitting the data into the plurality of sub-tables is based on the screening criteria, keys of the third set of data, and/or value information. In some embodiments, the plurality of sub-tables includes a first sub-table (e.g., an attribute sub-table) that includes data filtering options (e.g., production time, production equipment, production engineering, etc.) presented by a visualization interface in a user interaction interface in communication with the data management server; a second sub-table comprising a product serial number (e.g., a substrate identification number or a lot identification number); and a third sub-table (e.g., a master sub-table) comprising values in the third set of data corresponding to the product serial number. The environmental factors described herein include ambient particulate conditions, equipment temperature, equipment pressure, etc. Optionally, the second sub-table may include different designated keys based on different topics, such as a substrate identification number or a lot identification number (i.e., a plurality of second sub-tables). Optionally, the values in the third set of data correspond to the substrate identification number through an index relationship between the third sub-table and the second sub-table. Optionally, the plurality of sub-tables further includes a fifth sub-table (e.g., a metadata sub-table) that includes values in the third set of data corresponding to the lot identification numbers. Optionally, the second sub-table further comprises a lot identification number; the value corresponding to the lot identification number in the third set of data may be obtained through an index relationship between the second sub-table and the fifth sub-table. Optionally, the plurality of sub-tables further comprises a fourth sub-table (e.g., a code generator sub-table) comprising manufacturing site information and/or equipment information. Optionally, the third sub-table comprises codes or abbreviations for manufacturing sites and/or equipment, from which the manufacturing site information and/or equipment information can be obtained through an index relationship between the third sub-table and the fourth sub-table. The third sub-table stores only codes for the manufacturing site and/or equipment, enabling a reduction in data storage.
For a columnar database (e.g., HBase), querying the data in the third data layer may be performed based on the designated key to quickly locate the data (e.g., value) to query. Thus, and as discussed in more detail below, the table stored in the third data layer may be divided into at least three sub-tables. The first sub-table corresponds to data range options (e.g., production time, production equipment, production engineering, etc.) in the user interface for user screening or definition. The second sub-table corresponds to a designated key (e.g., product ID). The third sub-table corresponds to values (e.g., production data and inspection data corresponding to the product ID). It will be appreciated that the range of products that the user needs to analyze can be determined through the first sub-table, so that the corresponding data (values) in the third sub-table can be queried based on the serial numbers (keys) of the corresponding products in the second sub-table. In one example, the third data layer utilizes an HBase-based NoSQL database; the designated key in the second sub-table may be a row key; and the fused data in the third sub-table (the column-family data corresponding to the row key) may be stored in the column-family data model. Optionally, the values in the third sub-table may be the fused production data and detection data. In addition, the third data layer may further include a fourth sub-table. Some characters in the third sub-table may be stored as codes, for example because of their length or for other reasons. The fourth sub-table includes the characters (e.g., device names, manufacturing sites) corresponding to the codes stored in the third sub-table. Indexing or querying between the first, second, and third sub-tables may be based on the codes. The fourth sub-table may be used to replace the codes with characters before the results are presented to the user interface.
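The lookup flow described above (screening options → designated key → values → code dictionary) can be sketched with plain Python dictionaries standing in for the four sub-tables; all table contents here are hypothetical:

```python
# First sub-table: screening options shown in the UI -> matching product IDs.
attribute_sub_table = {
    ("2022-09-01", "EQ-07"): ["PANEL-0001", "PANEL-0002"],
}

# Second sub-table: product serial numbers (designated keys / row keys).
key_sub_table = {"PANEL-0001", "PANEL-0002"}

# Third sub-table: row key -> fused production + detection values,
# with long strings stored as short codes.
main_sub_table = {
    "PANEL-0001": {"equipment": "E07", "defect_count": 3},
    "PANEL-0002": {"equipment": "E07", "defect_count": 0},
}

# Fourth sub-table: code -> human-readable characters, applied before display.
code_sub_table = {"E07": "Etching chamber #07, Fab B2"}

def query(date, equipment):
    rows = []
    for pid in attribute_sub_table.get((date, equipment), []):
        if pid in key_sub_table:                     # locate by designated key
            value = dict(main_sub_table[pid])
            value["equipment"] = code_sub_table[value["equipment"]]
            rows.append((pid, value))
    return rows

print(query("2022-09-01", "EQ-07"))
```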
In some embodiments, the data flows, data transformations, and data structures between the various components of the data management server are as described herein. In some embodiments, the raw data collected from the data sources includes production data and inspection data, where the production data includes history data and parameter data. The history data includes information on the specific processing that a product (e.g., a panel or a substrate) has undergone during manufacture; examples include factories, procedures, stations, equipment, chambers, card slots, and operators. The parameter data contains information on the specific environmental parameters, and their variations, experienced by the product (e.g., panel or substrate) during manufacture; examples include ambient particulate conditions, equipment temperatures, and equipment pressures. The defect information contains information on the quality of the inspected product; example product quality information includes defect type, defect location, defect size, and the like.
In some embodiments, various business data generated by a factory (e.g., data related to semiconductor electronic device manufacturing) is integrated into a plurality of data sources (e.g., Oracle databases). The ETL module ETLP extracts data from the multiple data sources into the data lake, for example using an ETL tool such as Sqoop, Kettle, Pentaho, or DataX. The data is then cleaned, converted, and loaded into the data warehouse and NoSQL databases. The data lake, data warehouse, and NoSQL databases store large amounts of data and analysis results using tools such as Kudu, Hive, and HBase.
Information generated during various stages of the manufacturing process is obtained by various sensors and inspection equipment and is then stored in a plurality of data sources. The root cause analysis system extracts this data and stores the extracted results into the data management server; the calculation and analysis results generated by the root cause analysis system are also stored in the data management server. Data synchronization (the flow of data) between the various data layers (tables) of the data management server is achieved by the ETL module. For example, the ETL module is configured to obtain parameter configuration templates for the synchronization process, including network permissions and database port configurations, inflow (destination) database and table names, outflow (source) database and table names, field correspondences, task types, scheduling periods, and the like. The ETL module configures the parameters of the synchronization process based on the parameter configuration template. The ETL module synchronizes the data and cleans the synchronized data based on the process configuration templates. The ETL module cleans the data through SQL statements to remove null values, remove outliers, and build correlations between the related tables. The data synchronization tasks include data synchronization between the multiple data sources and the data management server, and data synchronization between the various layers of the data management server.
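A sketch of what such a synchronization parameter configuration template might look like, expressed as a Python dict; every field name and value is hypothetical, chosen only to mirror the items listed above:

```python
# Hypothetical synchronization-task configuration template; the ETL module
# fills in and validates these fields before scheduling the task.
sync_task_config = {
    "network": {"permission_group": "etl-sync", "db_port": 1521},
    "source": {"database": "fab_oracle_prod", "table": "INSPECTION_RESULT"},  # outflow
    "target": {"database": "dw_hive", "table": "dw.inspection_result"},       # inflow
    "field_mapping": {"GLASS_ID": "substrate_id", "DEFECT_CNT": "defect_count"},
    "task_type": "batch",        # "batch" (offline) or "realtime"
    "schedule": "0 2 * * *",     # cron-style scheduling period
}

def validate(cfg):
    required = ["network", "source", "target", "field_mapping", "task_type", "schedule"]
    missing = [k for k in required if k not in cfg]
    if missing:
        raise ValueError(f"incomplete sync config, missing: {missing}")
    return cfg

validate(sync_task_config)
```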
In another example, data extraction to the data lake may be accomplished in real time or offline. In offline mode (which may correspond to the batch import below), data extraction tasks are scheduled periodically. Optionally, in offline mode, the extracted data may be stored in a Hadoop-based distributed file system store (e.g., a Hive-based database). In real-time mode (which may correspond to the real-time import below), the data extraction tasks may be performed by OGG (Oracle GoldenGate) in conjunction with Apache Kafka. Optionally, in real-time mode, the extracted data may be stored in a Kudu-based database. OGG reads the log files in the multiple data sources (e.g., Oracle databases) to capture added/deleted data. In another example, the topic information is read by Flink, with JSON selected as the synchronization field type; the data is parsed using a JAR package, and the parsed information is sent to the Kudu API to implement addition/deletion of Kudu table data. In one example, the front-end interface may perform display, query, and/or analysis based on data stored in the Kudu-based database. In another example, the front-end interface may perform display, query, and/or analysis based on data stored in any one or any combination of a Kudu-based database, a Hadoop distributed file system (e.g., an Apache Hive-based database), and/or an Apache HBase-based database. In another example, short-term data (e.g., generated within a few months) is stored in the Kudu-based database, while long-term data (e.g., all data generated in all cycles) is stored in the Hadoop distributed file system (e.g., an Apache Hive-based database). In another example, the ETL module is configured to extract data stored in the Kudu-based database into the Hadoop distributed file system (e.g., an Apache Hive-based database).
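A minimal sketch of the real-time path (Kafka topic → parse JSON → apply to Kudu), using the kafka-python client; the topic name, broker address, and the Kudu upsert helper are hypothetical stand-ins rather than the patent's actual components:

```python
import json
from kafka import KafkaConsumer  # kafka-python client

def upsert_into_kudu(record: dict) -> None:
    """Hypothetical stand-in for a Kudu client call that applies the
    add/delete change captured by OGG to the corresponding Kudu table."""
    print("applying to Kudu:", record)

consumer = KafkaConsumer(
    "ogg-production-data",                         # hypothetical topic fed by OGG
    bootstrap_servers="kafka.example.internal:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:                           # one record per captured change
    upsert_into_kudu(message.value)
```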
In some embodiments, for the second set of data in the second data layer, data fusion may be performed based on different topics. The fused data is highly thematic and highly aggregated, which greatly improves query speed. In one example, tables in the data warehouse may be used to construct tables with dependencies built according to different user needs or different topics, with names assigned to the tables according to their respective uses. Various topics may correspond to different data analysis requirements. In one example, a topic may correspond to an anomaly analysis attributed to one or more manufacturing node groups (e.g., one or more devices), and data fusion based on the topic may include fusion of history information of the manufacturing process and defect information. In another example, a topic may correspond to an anomaly analysis attributed to one or more parameter types, and data fusion based on the topic may include fusion of parameter characterization information and defect information. In another example, a topic may correspond to an anomaly analysis of one or more device operations (e.g., devices defined by respective operation sites at which respective operations are performed by respective devices), and data fusion based on the topic may include fusion of at least two of parameter characterization information, history information of the manufacturing process, and defect information. In another example, a topic may correspond to feature extraction of at least one type of parameter information to generate parameter feature information, wherein one or more of a maximum, minimum, average, and median are extracted for one type of parameter information. In one example of the present application, the at least one type of parameter information includes data of at least one device parameter, such as temperature, humidity, pressure, etc., and also includes data such as environmental particle levels.
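A sketch of the feature extraction described above (maximum, minimum, average, and median of one type of parameter per product), with hypothetical columns:

```python
import pandas as pd

# Hypothetical time series of one device parameter sampled while each product
# was inside the chamber.
samples = pd.DataFrame({
    "product_id": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "chamber_temp": [62.8, 63.4, 63.1, 70.2, 69.8, 71.0],
})

# Collapse each product's samples into parameter feature information.
features = samples.groupby("product_id")["chamber_temp"].agg(
    ["max", "min", "mean", "median"]
)
print(features)   # one row of features per product, ready for fusion
```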
In some embodiments, the data management server may be an Apache Beam-based database to enable parallel batch and stream processing of data. Optionally, Apache Beam receives data generated in real time by a data source, and the quality of the product is predicted or inferred in real time by a prediction algorithm through the connected analysis server, realizing real-time queries by the user on the interactive interface. Optionally, Apache Beam is used to interface with the data source and extract data within a preset production period in batches, which is stored into a database such as Hive, HBase, or ClickHouse for data accumulation; the accumulated data is then used by the analysis server, where an analysis algorithm performs abnormal root cause analysis, thereby accurately locating the cause of the abnormality and enabling timely tracing.
In some embodiments, an Apache Beam-based database implementation includes the following: first, the distributed computing system receives the Beam SDK class library components; second, a data pipeline (pipeline) is constructed and the data type of the key-value pairs is defined, where optionally the key is a sample (product) ID and the value is the corresponding production data and detection data; third, the data processing methods in the pipeline are defined, optionally calculating the total number of samples (products), the abnormal counts, the abnormal rate, and the arrival rate, and the related methods by which the ETL module processes data are defined in the pipeline; finally, the data flow direction at the end of the pipeline is defined, where optionally the data may flow to the analysis server (e.g., a business server or algorithm server). In addition, the user can edit and configure the flow by dragging components in the visual interface to combine data sources and configure data conversion and data calculation operators.
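A minimal sketch of such a pipeline using the Apache Beam Python SDK, computing the total sample count and the abnormal rate from hypothetical (product ID, record) key-value pairs; the in-memory source stands in for the real data source:

```python
import apache_beam as beam

# Hypothetical fused records keyed by product (sample) ID.
records = [
    ("P1", {"defect_count": 3}),
    ("P2", {"defect_count": 0}),
    ("P3", {"defect_count": 1}),
]

with beam.Pipeline() as p:
    labels = (
        p
        | "Create" >> beam.Create(records)  # stand-in for the real data source
        | "Label" >> beam.Map(lambda kv: int(kv[1]["defect_count"] > 0))
    )
    # Total number of samples (products).
    _ = (
        labels
        | "Count" >> beam.combiners.Count.Globally()
        | "PrintCount" >> beam.Map(lambda n: print("total:", n))
    )
    # Abnormal rate = mean of the 0/1 abnormal labels; in production the
    # pipeline end would flow to the analysis server instead of printing.
    _ = (
        labels
        | "Rate" >> beam.CombineGlobally(beam.combiners.MeanCombineFn())
        | "PrintRate" >> beam.Map(lambda r: print("abnormal rate:", r))
    )
```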
In some embodiments, the software modules further comprise a load balancing server connected to the analysis server. Optionally, a load balancing server (e.g., a first load balancing server) is configured to receive task requests and distribute them to one or more of the plurality of business servers to achieve load balancing among the business servers. Optionally, a load balancing server (e.g., a second load balancing server) is configured to distribute tasks from the plurality of business servers to one or more of the plurality of algorithm servers to achieve load balancing among the algorithm servers. Optionally, the load balancing server is based on Nginx technology. An anomaly root cause analysis method is described below; it should be understood that all or part of the steps of the method may be implemented based on the distributed computing system, the analysis server, or the algorithm server.
As shown in fig. 4, fig. 4 is a flowchart illustrating an abnormal root cause analysis method according to an exemplary embodiment of the present application, including the steps of:
Step 401, obtaining product data to be processed corresponding to a target product. The product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to first preset parameters.
In this embodiment, when it is necessary to determine the causes affecting the detection result of a product, the product is taken as a target product, and the production data and detection data of the target product are acquired.
Alternatively, the production data represents data related to the production of the target product, such as processing equipment through which the target product passes, production temperature of the target product, and the like. The production data includes a production parameter (i.e., a name of the production parameter) and a parameter value corresponding to the production parameter, for example, when the production data includes a production temperature and a specific value corresponding to the production temperature, the production temperature is the production parameter, and the specific value corresponding to the production temperature is the parameter value corresponding to the production temperature.
Optionally, the production data represents historical information and processing information of the product during the production process. The history parameters comprise information such as product ID, product basic attribute, process section, process station, equipment model and the like of the product in the production process. The processing parameters include processing information of the product in the equipment corresponding to different process sections and/or equipment models, such as pressure, temperature, dwell time, etc.
Optionally, the detection data indicates data related to detecting the target product; for example, the detection data indicates the detection result of the target product after the production process, which indicates whether the target product (i.e., the target product after the production process) is abnormal. The detection data includes a detection parameter (i.e., a detection parameter name) and a parameter value corresponding to the detection parameter. For example, when the detection data includes the detection result of the target product and its specific value (e.g., normal or abnormal), the detection result is the detection parameter, and the specific value (normal or abnormal) is the parameter value corresponding to the detection result of the target product.
Optionally, when the process section is finished, optical or electrical detection is performed on the product to detect whether the quality of the product meets the standard, so as to obtain a corresponding detection result, wherein the name of the detection result is a detection parameter, and the specific value of the detection result is a parameter value corresponding to the detection parameter. By detecting the data, it can be identified whether the product is defective, i.e. whether there is an abnormality, and which defects are present.
Specifically, the production data and the detection data of the target product are fused using a first preset parameter present in both the production data and the detection data, so as to obtain the product data of the target product; the number of target products is at least one. For example, the first preset parameter is a product identifier, and each piece of production data includes the product identifier and its corresponding parameter value, as does each piece of detection data. For each piece of production data (i.e., the production data corresponding to one target product), the parameter value corresponding to the product identifier in the production data is acquired, the detection data whose product-identifier parameter value equals that value is taken as the detection data to be fused, and the fusion, i.e., combination, of the production data and the detection data to be fused yields the product data of the target product corresponding to that production data. For example, the production data corresponding to a target product includes parameter 1 and its corresponding parameter value and parameter 2 and its corresponding parameter value; if parameter 1 is the first preset parameter, and the determined detection data to be fused includes parameter 1 and its corresponding parameter value and parameter 3 and its corresponding parameter value, then fusion yields product data consisting of parameter 1, parameter 2, and parameter 3, each with its corresponding parameter value. It will be appreciated that, typically, the production data and the detection data come from two tables, and fusing the two tables into one table by the first preset parameter facilitates subsequent data processing and calculation, as sketched below.
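A minimal sketch of this fusion step, assuming pandas, with the product identifier as the first preset parameter (column names hypothetical):

```python
import pandas as pd

production = pd.DataFrame({
    "product_id": ["P1", "P2"],      # first preset parameter (parameter 1)
    "chamber_temp": [63.2, 70.4],    # parameter 2
})
detection = pd.DataFrame({
    "product_id": ["P1", "P2"],      # same first preset parameter
    "defect_count": [0, 3],          # parameter 3
})

# Fuse the two tables into one on the shared preset parameter, so each row
# carries parameters 1, 2, and 3 for one target product.
to_be_processed = production.merge(detection, on="product_id", how="inner")
print(to_be_processed)
```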
Optionally, when obtaining the data of the product to be processed corresponding to the target product, the data of the product to be processed corresponding to the target product may be obtained through a data pipeline constructed based on the Apache Beam model.
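A minimal sketch of such a pipeline, assuming the Apache Beam Python SDK with small in-memory sources (a real pipeline would read the production and detection databases instead of beam.Create; the keys and fields are illustrative):

```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
    # Both sources are keyed by the first preset parameter (product identifier).
    production = pipeline | "Production" >> beam.Create(
        [("P001", {"temperature": 210.5})])
    detection = pipeline | "Detection" >> beam.Create(
        [("P001", {"defect_points": 0})])

    # CoGroupByKey joins the two sources on the product identifier,
    # producing the fused product data to be processed per target product.
    fused = (
        {"production": production, "detection": detection}
        | "Fuse" >> beam.CoGroupByKey()
        | "Show" >> beam.Map(print))
```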
Step 402, determining normal product data and abnormal product data in the product data to be processed according to the detection data.
In this embodiment, after the product data to be processed corresponding to the target products is obtained, it is determined, for each target product, whether the target product is normal according to the detection data (i.e., the parameter value corresponding to the detection parameter) in the product data to be processed corresponding to that target product. When the target product is normal, its product data to be processed is taken as normal product data; when the target product is abnormal, its product data to be processed is taken as abnormal product data. For example, the target products include a product 1, and the detection parameters corresponding to product 1 include a defect point detection result; if the parameter value corresponding to the defect point detection result indicates that the product has a defect point, product 1 is determined to be an abnormal product, and correspondingly, the product data to be processed corresponding to product 1 is abnormal product data. For another example, when the number of defect points or the defect ratio in the detection result reaches a preset threshold, the product is considered abnormal; otherwise, it is considered normal.
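A sketch of this classification step, assuming a hypothetical defect_points detection parameter and a preset threshold (values illustrative):

```python
import pandas as pd

# Fused product data from the previous step (illustrative stand-in values).
product_data = pd.DataFrame({
    "product_id": ["P001", "P002", "P003"],
    "temperature": [210.5, 198.0, 205.3],
    "defect_points": [0, 3, 1],
})

DEFECT_THRESHOLD = 1  # preset threshold; illustrative value

# A product is abnormal when its defect-point count reaches the threshold.
is_abnormal = product_data["defect_points"] >= DEFECT_THRESHOLD
abnormal_product_data = product_data[is_abnormal]
normal_product_data = product_data[~is_abnormal]
```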
Step 403, inputting the normal product data and the abnormal product data into a first root cause analysis model, and obtaining first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of production data, and the first root cause analysis model indicates a tree model.
In this embodiment, after obtaining normal product data and abnormal product data in product data to be processed, the normal product data and the abnormal product data are both input into a first root cause analysis model (such as a tree model), so that the first root cause analysis model analyzes the normal product data and the abnormal product data, so as to determine influence factor information affecting a detection result of a target product, that is, first influence factor information, by using production parameters in the production data.
Optionally, the first influence factor information includes the first influence factor and/or an influence score corresponding to the first influence factor. The influence score indicates the influence degree of the first influence factor on the detection result of the target product, for example, the detection result indicates that the target product is abnormal, and when the influence score corresponding to the first influence factor is higher, the influence degree of the first influence factor on the target product is higher, that is, the first influence factor is more likely to be the cause of influencing the target product abnormality.
It should be appreciated that the reasons leading to an abnormal product result are largely anomalies occurring during the production process; accordingly, the first influence factor must be reflected in the production data and may be the result of certain parameters in the production data acting alone or in combination. Thus, the first influence factor is ultimately presented as one or more parameters in the production data described above. However, since a large amount of data is generated during the production of the product, it is difficult to distinguish manually which data or which data combinations (see the dimension increase, dimension reduction, and fusion mentioned below) cause the final anomalies. Intelligent analysis of the production parameters causing the anomalies is therefore required through a root cause analysis model, so as to obtain the influence scores of all production parameters or combinations thereof, and thus determine the first influence factor information.
It should be noted that the first influence factor includes one or more of the production data, and the first root cause analysis model indicates a tree model.
In this embodiment, the normal product data and the abnormal product data are both input into the first root cause analysis model, i.e., the tree model. The first influence factor information may be obtained through multiple rounds of training of the tree model, or may be obtained by using the calculation principle of the tree model without training it.
In some embodiments, inputting the normal product data and the abnormal product data into a tree model to obtain the first influence factor information of the detection result of the target product includes calculating a purity index of the production data and determining the first influence factor information based on the purity index. In this embodiment, the tree model is a decision tree model. Its principle is that the purity index is calculated to determine the root node and the multi-level child nodes of the decision tree, and the importance of the influence factors corresponding to the nodes decreases step by step from the root node downward, so that the first influence factor information affecting the detection result can be obtained directly through the decision tree algorithm (by calculating the purity index) without training. In this embodiment, the influence score corresponding to the first influence factor may be determined by the tree model, i.e., by the purity index. The higher the purity of a production parameter, the lower the uncertainty and the higher the consistency, i.e., the higher the influence score. Thus, a production parameter with an influence score above a preset score threshold may be taken as the first influence factor. When the tree model is constructed by the ID3 algorithm, the information gain may be used as the purity index; when the tree model is constructed by the C4.5 algorithm, the information gain ratio may be used as the purity index; when the tree model is constructed by the CART algorithm, the Gini index may be used as the purity index.
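A sketch of this purity-based scoring with scikit-learn's CART implementation (Gini index); building a single tree computes the purity splits directly, without iterative training rounds. The stand-in data, parameter names, and score threshold are all illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 200 products x 4 production parameters; y is 1 = abnormal.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = rng.integers(0, 2, size=200)
parameter_names = ["pressure", "temperature", "dwell_time", "equipment_model"]

# Building the CART tree computes Gini-based purity importances per parameter.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

SCORE_THRESHOLD = 0.1  # preset score threshold; illustrative value
scores = dict(zip(parameter_names, tree.feature_importances_))
first_influence_factors = {p: s for p, s in scores.items() if s > SCORE_THRESHOLD}
```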
In some embodiments, inputting the normal product data and the abnormal product data into a tree model, and obtaining the first influence factor information of the detection result of the target product includes training the tree model, so as to obtain the first influence factor information, and detailed steps are described below.
As is apparent from the above description, the product data to be processed corresponding to the target products is classified based on the detection data corresponding to each target product, so as to obtain normal product data and abnormal product data: the normal product data indicates the product data to be processed of target products with a normal detection result, the abnormal product data indicates the product data to be processed of target products with an abnormal detection result, and both include production parameters. The normal product data and the abnormal product data are analyzed by the first root cause analysis model, so that the influence factor information affecting the detection result of the target product, i.e., the first influence factor information, is determined from the production parameters. The cause of a product abnormality can thus be determined when the detection result is abnormal, realizing automatic analysis of the product data and automatic determination of the cause of the product abnormality, i.e., the root cause, without depending on manual determination, which ensures the accuracy of determining the influence factors of the product and improves the determination efficiency.
As shown in fig. 5, fig. 5 is a flowchart illustrating another abnormal root cause analysis method according to an exemplary embodiment of the present application, which describes how to determine the first influence factor based on the foregoing embodiment. The process will be described in detail with reference to a specific embodiment and, as shown in fig. 5, includes the following steps:
Step 501, obtaining product data to be processed corresponding to a target product. The product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to first preset parameters.
In this embodiment, after the product data to be processed corresponding to the target product is obtained, the discrete data in the product data to be processed is encoded, so that the encoded data is more convenient for subsequent data analysis. The encoding process specifically includes: obtaining the type corresponding to the production parameter; and, in the case that the type corresponding to the production parameter is a preset discrete type, encoding the parameter value corresponding to the production parameter in the product data to be processed and taking the encoding result as the new parameter value of the production parameter.
Optionally, the production parameters in the product data to be processed include a process site name; the type corresponding to the process site name is obtained as a name type; the name type belongs to the preset discrete types, so it is determined that the parameter value corresponding to the process site name needs to be encoded, where the parameter value corresponding to the process site name indicates whether the target product passes through the process site corresponding to that name. The specific process is as follows: the parameter value corresponding to the process site name is acquired from the product data to be processed. In the case that the parameter value corresponding to the process site name indicates that the target product passes through the process site, the parameter value corresponding to the process site name is updated to a first code value, i.e., the first code value is taken as the parameter value corresponding to the process site name. In the case that the parameter value corresponding to the process site name indicates that the target product does not pass through the process site, the parameter value corresponding to the process site name is updated to a second code value, i.e., the second code value is taken as the parameter value corresponding to the process site name.
The first code value and the second code value may be set according to actual requirements, for example, the first code value is 0, and the second code value is 1.
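A sketch of this encoding, assuming a hypothetical process-site column whose raw discrete values record passage ("pass" / "not_pass"); the code values follow the example above:

```python
import pandas as pd

FIRST_CODE_VALUE = 0   # target product passed through the process site
SECOND_CODE_VALUE = 1  # target product did not pass through the process site

# Hypothetical column of discrete process-site values.
product_data = pd.DataFrame({"site_A": ["pass", "not_pass", "pass"]})

def encode_process_site(raw_value):
    """Replace a discrete process-site value with its code value."""
    return FIRST_CODE_VALUE if raw_value == "pass" else SECOND_CODE_VALUE

# The encoding result becomes the new parameter value of the production parameter.
product_data["site_A"] = product_data["site_A"].map(encode_process_site)
```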
Alternatively, when the type corresponding to the production parameter is obtained, the type may be obtained from a preset correlation table.
Optionally, the parameter value corresponding to the detected parameter may be encoded, and the processing procedure is similar to the process of encoding the parameter value corresponding to the production parameter, which is not described herein.
Alternatively, the data storage device in the present application may be implemented based on the aforementioned data management server, or may be stored using other storage devices or databases, which is not limited herein.
In some embodiments, the data storage device may be a distributed database. The parallel processing of a distributed database can meet the storage and processing requirements of mass data: a user can perform simple data processing through SQL queries, and complex processing can be realized by custom functions. Therefore, when analyzing mass data, extracting the data into the distributed database avoids damaging the original data and improves data analysis efficiency.
In the embodiment of the present application, the manner of extracting the data into the storage device, that is, the manner of obtaining the product data to be processed corresponding to the target product, includes one or more of the following: 1) manual import, where, for the data to be analyzed, a user can complete the import at one time through an interactive interface, so that the storage device acquires the data to be analyzed; 2) batch import, similar to manual import, where a user can call an API interface or an address of the distributed file system HDFS through an interactive interface and import a large amount of data in batches; 3) real-time import, realized by establishing a connection between the original database and the storage device in the analysis system, based on technologies such as Kafka.
Step 502, determining normal product data and abnormal product data in the product data to be processed according to the detection data.
Step 503, inputting the normal product data and the abnormal product data into the tree model to train the tree model.
Step 504, determining first influence factor information according to the trained tree model.
In the present embodiment, normal product data and abnormal product data are input into a tree model to train the tree model. The trained tree model can output production parameters affecting the detection result of the target product, and first influence factor information is obtained.
Training the tree model represents adjusting the number of production parameters and the weights corresponding to the production parameters. The first influence factor information is determined based on the weights of the production parameters. Specifically, during the training process, the number of production parameters is increased or decreased, and the weights of the production parameters are adjusted. Then, a certain number of production parameters are selected in descending order of weight, and the selected production parameters are taken as first influence factors; alternatively, the production parameters with weights above a preset weight threshold are taken as first influence factors.
The weight of a production parameter can be understood as the influence factor score corresponding to that production parameter. When the production parameter is a first influence factor, the influence factor score corresponding to the production parameter can be used as the influence score of the first influence factor.
In some embodiments, the tree model is trained based on a preset training algorithm. The preset training algorithm includes one or more of a backward search algorithm, a forward search algorithm, a bi-directional search algorithm, and a random search algorithm.
Illustratively, the backward search algorithms include the recursive feature elimination method (Recursive Feature Elimination), which uses a tree model to perform multiple rounds of training; after each round of training, the features with the smallest weight coefficients are eliminated, or a threshold is set and the features below the threshold are eliminated, and the next round of training is then performed on the new feature set, recursing continuously until the number of remaining features reaches the required number. Model training by a backward search algorithm can reduce the number of production parameters.
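A sketch of recursive feature elimination with a tree model, using scikit-learn's RFE; the stand-in data, target feature count, and step size are illustrative:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 200 products x 8 production parameters.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.integers(0, 2, size=200)

# Eliminate the lowest-weight production parameters round by round until
# the required number of features remains.
selector = RFE(estimator=DecisionTreeClassifier(random_state=0),
               n_features_to_select=3,   # required feature count; illustrative
               step=2)                   # features eliminated per round
selector.fit(X, y)

kept_parameter_indices = np.flatnonzero(selector.support_)
```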
Illustratively, the forward search algorithm first selects an optimal single feature as the first-round feature subset, then adds one feature on that basis to form new two-feature subsets and performs model training, selects the optimal two-feature subset, and continues iterative training and updating until the optimal feature subset is found. This also belongs to the greedy algorithms of heuristic search. Here, model training by the forward search algorithm may increase the number of production parameters.
Illustratively, the bi-directional search algorithm refers to both backward and forward searches being performed simultaneously until both find the same optimal feature subset.
Illustratively, a random search algorithm is to randomly generate a subset of features and then perform a forward or backward search.
In this embodiment, when the first influence factor is determined by using the tree model, the corresponding first influence factor, i.e. the feature subset, is obtained while the model is trained.
In some embodiments, the training of the tree model is accompanied by fusion processing of the production parameters, where the fusion processing indicates performing feature crossover and/or mutation on the production parameters to obtain new features, i.e., new production parameters.
Wherein the fusion processing comprises feature cross processing and/or fusion processing based on a genetic algorithm (GA).
The fusion processing based on the genetic algorithm proceeds as follows: first, a batch of feature sets is randomly generated, each feature set comprising one or more features (i.e., production parameters). After the tree model is trained, a first scoring is performed with the model test effect as the evaluation index. The feature selection results are then summarized; new feature sets are generated from each feature set through crossover, mutation, and similar operations, the population is iteratively updated under survival-of-the-fittest selection, and the feature set with the highest evaluation, i.e., the synthesis parameters, is finally obtained.
The feature cross processing proceeds as follows: individual features are combined (multiplied or combined via Cartesian product) to form composite features that help represent nonlinear relationships. By employing stochastic gradient descent, linear models can be trained effectively. Thus, supplementing an extended linear model with feature combinations is an efficient way to train on large-scale datasets, and many different kinds of feature combinations can be created.
Wherein [A x B]: the values of two features are multiplied to form a synthesis parameter. [A x B x C x D x E]: the values of five features are multiplied to form a synthesis parameter. [A x A]: the value of a single feature is squared to form a synthesis parameter.
The mutation processing indicates applying transformations such as taking the logarithm or the square of the production parameters to obtain new production parameters.
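A sketch of the crossing and mutation operations on two hypothetical production parameters A and B (names and values illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical production parameters A and B.
data = pd.DataFrame({"A": [1.2, 0.8, 1.5], "B": [3.0, 2.5, 2.8]})

# Feature crossing: multiply parameter values to form synthesis parameters.
data["A_x_B"] = data["A"] * data["B"]    # [A x B]
data["A_x_A"] = data["A"] ** 2           # [A x A]

# Mutation: transform a single production parameter, e.g. take its logarithm.
data["log_A"] = np.log(data["A"])
```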
Optionally, the tree model is a simple machine learning model, i.e., of low complexity, and can be regarded as a base model. A more complex integrated model may be formed by combining multiple base models. The complexity level can be understood as depending on the number of base models and/or the number of model parameters.
In some embodiments, the first root cause analysis model may also indicate at least one integrated tree model that is obtained by integrating a plurality of tree models, i.e. the integrated tree model comprises a plurality of tree models. It will be appreciated that an integrated tree model is also one type of tree model, but with a higher complexity relative to a single tree model. Inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of a target product, wherein the first influence factor information comprises: normal product data and abnormal product data are input into the integrated tree model to train the integrated tree model. And determining first influence factor information according to the trained integrated tree model.
In this embodiment, the weights of the production parameters are adjusted during the training of the integrated tree model. The trained integrated tree model can output production parameters affecting the detection result of the target product, and first influence factor information is obtained. The first influence factor information is determined according to the weight size of the production parameter. For example, a certain number of production parameters are selected in the order of the weights from high to low, and the selected production parameters are used as the first influencing factors. Or taking the production parameter with the weight higher than the preset weight threshold as a first influence factor.
Wherein the weight may be an L1 regularization term coefficient. Specifically, during the training process, feature selection can be achieved by tuning parameters based on L1 regularization. To avoid the over-fitting problem, an L1 regularization penalty term is typically introduced into the loss function, which can generate a sparse weight matrix, i.e., a sparse model used for feature selection. In the sparse model, only a few features contribute; most features do not contribute, or contribute only slightly, since their coefficients are 0 or very small and removing them leaves the model essentially unchanged. Attention can therefore be restricted to the features whose coefficients are non-zero, i.e., the relevant production parameters.
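A sketch of L1-regularized feature selection, here using an L1-penalized logistic regression as the sparse model (the penalty strength C and the stand-in data are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: 200 products x 8 production parameters.
rng = np.random.default_rng(1)
X = rng.random((200, 8))
y = rng.integers(0, 2, size=200)

# The L1 penalty drives most coefficients to zero, producing the sparse
# weight matrix described above; only non-zero coefficients mark
# candidate production parameters.
sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
sparse_model.fit(X, y)

candidate_indices = np.flatnonzero(np.abs(sparse_model.coef_.ravel()) > 1e-8)
```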
In some embodiments, the integration method includes at least one of boosting, bagging, and stacking. Specifically, boosting methods typically target homogeneous weak learners (i.e., homogeneous base models), learn them sequentially in a highly adaptive way (each base model depends on the previous one), and combine them according to some deterministic strategy. They specifically include adaptive boosting, AdaBoost (Adaptive Boosting), and gradient boosting, Gradient Boosting. AdaBoost increases the weight coefficients of wrongly predicted sample points while decreasing those of correctly predicted sample points to affect the error function, so that the model places its learning emphasis on the sample data with larger weight coefficients. Gradient Boosting works by changing the target value of the samples: each time, a weak model is constructed for the negative gradient of the loss function (i.e., the negative gradient value of each sample becomes its new target value), and the learned weak model is then integrated into the additive model as its latest term; weak models are constructed sequentially in this way until a threshold or other stopping condition is met.
Specifically, in bagging the training set of each individual weak learner is obtained by random sampling. Through 3 rounds of random sampling, 3 sampling sets can be obtained. On these 3 sampling sets, 3 weak learners can be trained independently, and the 3 weak learners are then combined through an ensemble strategy to obtain the final strong learner. For the random sampling, the bootstrap sampling method (Bootstrap sampling) is generally adopted: for an original training set of m samples, one sample is randomly collected each time, put into the sampling set, and then put back, so that it may still be collected in the next round; after m collections, a sampling set of m samples is finally obtained. Because of the random sampling, each sampling set differs from the original training set and from the other sampling sets, so that multiple different weak learners are obtained.
Specifically, the stacking method generally considers heterogeneous weak learners (i.e., heterogeneous base models): multiple different base models are learned in parallel and combined by training a meta-model that outputs the final prediction result based on the prediction results of the different weak models.
In some embodiments, the integrated tree model includes any one of a random forest model, an LGBM model, a GBDT model, an XGBoost model, and a CatBoost model.
In some embodiments, if the first root cause analysis model is a base model (such as a single tree model), then, since the model is simple, the number of production parameters and the weights corresponding to the production parameters need to be adjusted during training in order to ensure sufficiently accurate results, so as to obtain more accurate influence factor results. In other embodiments, if the first root cause analysis model is an integrated model (such as an integrated tree model), then, since the model itself is complex, a sufficiently accurate training effect can be obtained by conventional training, and no additional adjustment of the number of production parameters and weights is required. Optionally, in the case that the integrated tree model is an LGBM model, the decision tree initial parameter information of the integrated tree model includes one or more of: a number of decision tree leaves in the range of 2 to 500, a number of decision trees in the range of 25 to 325, a maximum decision tree depth in the range of 1 to 20, an L1 regularization term coefficient of 1.00E-10 to 1.00E-01, and an L2 regularization term coefficient of 1.00E-10 to 1.00E-01.
Specifically, when the decision tree information of the LGBM model is such that the number of decision tree leaves is in the range of 2 to 500, the number of decision trees is in the range of 25 to 325, the maximum decision tree depth is in the range of 1 to 20, the L1 regularization term coefficient is 1.00E-10 to 1.00E-01, and the L2 regularization term coefficient is 1.00E-10 to 1.00E-01, the prediction effect is better than with the default values.
In the case that the integrated tree model is a CatBoost model, the decision tree initial parameter information of the integrated tree model includes one or more of: a decision tree depth of 1 to 16, a maximum tree number of 25 to 300, and an L2 regularization term coefficient of 1 to 100.
Specifically, when the decision tree initial parameter information of the CatBoost model is such that the decision tree depth is 1 to 16, the maximum tree number is 25 to 300, and the L2 regularization term coefficient is 1 to 100, the prediction effect is better than with the default values.
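A sketch instantiating both integrated tree models with initial parameters inside the ranges stated above, assuming the lightgbm and catboost Python packages; the specific values are illustrative picks from those ranges, and the data is a random stand-in:

```python
import numpy as np
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

rng = np.random.default_rng(2)
X = rng.random((300, 10))          # 300 products x 10 production parameters
y = rng.integers(0, 2, size=300)   # 1 = abnormal, 0 = normal

# LGBM model: leaves in 2-500, trees in 25-325, max depth in 1-20,
# L1/L2 regularization term coefficients in 1.00E-10 to 1.00E-01.
lgbm = LGBMClassifier(num_leaves=63, n_estimators=200, max_depth=8,
                      reg_alpha=1e-3, reg_lambda=1e-3)
lgbm.fit(X, y)

# CatBoost model: depth in 1-16, maximum tree number in 25-300,
# L2 regularization term coefficient in 1-100.
cat = CatBoostClassifier(depth=8, iterations=200, l2_leaf_reg=10, verbose=False)
cat.fit(X, y)
```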
In some embodiments, the number of production parameters in the product data also affects the initial parameters of the integrated tree model: the more production parameters are input into the first root cause analysis model (i.e., the higher the dimension), the larger the number of decision tree leaves, the number of decision trees, the decision tree depth, and the maximum tree number in the integrated tree model should be for a better prediction effect.
In some embodiments, the first root cause analysis model indicates at least two integrated tree models.
Wherein each integrated tree model corresponds to a weight. The complexity of the model can be further increased by at least two integrated tree models, resulting in better results.
In some embodiments, for results output by multiple models (base model or integrated model), the final output results may be obtained by averaging or voting.
Specifically, the averaging method assigns a different weight to each model and takes the weighted average of their outputs; in the voting method, the multiple models each output a result, and the final result is predicted by a majority-rule voting scheme (the minority obeys the majority).
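A sketch of both combination rules (the weights, model count, and score values are illustrative; in practice the per-model scores would come from the trained models above):

```python
import numpy as np

def weighted_average(scores_per_model, weights):
    """Weighted average of the influence scores output by several models."""
    return np.average(np.asarray(scores_per_model), axis=0, weights=weights)

def majority_vote(predictions_per_model):
    """Minority obeys majority over the models' 0/1 predictions."""
    votes = np.asarray(predictions_per_model).sum(axis=0)
    return (2 * votes >= len(predictions_per_model)).astype(int)

# e.g. two models scoring the same three production parameters:
combined = weighted_average([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],
                            weights=[0.6, 0.4])
```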
In some embodiments, taking the final result of a first influence factor directly from the calculation result of a single model algorithm (a tree model, trained or not, an integrated tree model, multiple base models, etc.) as the influence score of the final first influence factor may not be accurate enough. Thus, at least two influence factor score calculation methods (e.g., purity index calculation and weight calculation of production parameters) can be weighted and combined to obtain the influence factor score of each production parameter. When a production parameter is determined to be a first influence factor, its influence factor score is the influence score of that first influence factor, thereby realizing the determination of the first influence factor information. For example, a certain number of production parameters are selected in descending order of influence factor score; the selected production parameters are the first influence factors, and their influence factor scores are the influence scores of the first influence factors.
The methods for calculating the influence factor score of a production parameter (i.e., the influence score of a first influence factor) further include correlation analysis indexes, distance indexes, consistency indexes, and the like.
In some embodiments, when the normal product data and the abnormal product data are input into the first root cause analysis model, the production parameters in the normal product data and the abnormal product data may first be subjected to dimension-increasing or dimension-reducing processing, and the processed production parameters are then input into the first root cause analysis model.
When dimension-increasing processing is performed on the production parameters, factor synthesis processing can be performed on the production parameters based on a dimension-increasing algorithm to obtain new synthesis parameters, i.e., new production parameters. When dimension-reducing processing is performed on the production parameters, correlated-factor combination processing can be performed on the production parameters based on a dimension-reducing algorithm, where the correlated-factor combination processing indicates dimension-reducing processing of production parameters between which correlation exists.
Optionally, the dimension-increasing algorithms include one-hot encoding, feature crossing (Feature Cross), and other algorithms that further mine the data's own regularities to obtain new parameters.
Optionally, the dimension-reducing algorithm mainly maps data points in the original high-dimensional space to a low-dimensional space, reducing the calculation cost while taking the correlation between features into account. That is, the dimension-reducing algorithm performs dimension-reducing processing on correlated production parameters: multiple correlated production parameters are combined to obtain one representative parameter, so that when the first influence factor is determined, the individual production parameters are no longer used; only the representative parameter corresponding to them is used.
Optionally, the dimension-reducing algorithm includes one or more of a principal component analysis (Principal Component Analysis, PCA) algorithm, a linear discriminant analysis (Linear Discriminant Analysis, LDA) algorithm, a multidimensional scaling (Multidimensional Scaling, MDS) algorithm, and manifold learning algorithms.
Optionally, principal component analysis maximizes the intrinsic information retained in the data after dimension reduction, and measures the importance of a direction by the magnitude of the data's variance in that direction. The principal components screen representative indicators through dimension reduction: multiple feature variables (i.e., production parameters) are combined into a few principal components, and these new comprehensive indicators contain most of the original information. That is, N-dimensional features are mapped onto k dimensions (k < N); the k-dimensional features, called principal components, are recombined k-dimensional features. This multi-factor combination method not only achieves dimension reduction but also takes the correlation and joint influence among features into account.
For example, in the defect diagnosis analysis of an OLED (Organic Light-Emitting Diode) product, i.e., when determining the first influence factor of an abnormality of the OLED product, all manufacturing-process- or equipment-related data of the whole factory are taken as production data, the relationship between the result variable and the cause variables is established through a decision tree or another method, and the data are converted into effective data supporting decisions, so that the cause of the OLED product abnormality, i.e., the first influence factor, is rapidly located. However, while multivariate mass production data provide a large amount of rich information, they also increase the complexity of the analysis; more importantly, there are interactive influences between many feature variables, and each feature considered in isolation is not comprehensive, so PCA is used for multi-factor combination analysis. The specific process is as follows:
Assume that there are M target products $\{X_1, X_2, \ldots, X_M\}$, each having N-dimensional production parameters $X_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(N)})^T$. The specific calculation flow is as follows:
In the first step, the mean value is removed, and all the features are centered.
Each production parameter (i.e., the parameter values corresponding to the production parameter) is averaged, e.g. $\bar{x}^{(j)} = \frac{1}{M}\sum_{i=1}^{M} x_i^{(j)}$. Then, for each production parameter, the average value of the production parameter is subtracted from the parameter value corresponding to that production parameter for each target product, i.e. $x_i^{(j)} \leftarrow x_i^{(j)} - \bar{x}^{(j)}$, thereby obtaining the new decentralized parameter values.
And secondly, solving a covariance matrix.
Covariance matrices are solved for the N-dimensional features, i.e., production parameters, decentralized in the first step. For example, if $N = 2$ with features $x_1$ and $x_2$, their covariance matrix is

$$C = \begin{pmatrix} \operatorname{cov}(x_1, x_1) & \operatorname{cov}(x_1, x_2) \\ \operatorname{cov}(x_2, x_1) & \operatorname{cov}(x_2, x_2) \end{pmatrix}$$
The diagonal entries are the variances corresponding to each production parameter, and the off-diagonal entries are the covariances. The covariance measures the degree to which two production parameters change together, i.e., the simultaneous variation of the feature variables. The larger the absolute value of the covariance, the greater the influence of the two on each other, and vice versa, so the correlation between two production parameters can be determined from it.
The covariance solving formula is as follows:

$$\operatorname{cov}(x, y) = \frac{1}{M - 1} \sum_{i=1}^{M} (x_i - \bar{x})(y_i - \bar{y})$$
And thirdly, solving eigenvalues and eigenvectors of the covariance matrix.
The eigenvalues and eigenvectors of the covariance matrix are solved from $C u = \lambda u$; there will be N eigenvalues $\lambda_i$, each with a corresponding eigenvector $u_i$.
And fourthly, projecting and dimension-reducing to form new features.
The eigenvalues are ordered from largest to smallest, and the largest top k are selected together with their corresponding eigenvectors, $\{(\lambda_1, u_1), \ldots, (\lambda_k, u_k)\}$. Then projection, i.e., dimension reduction, is performed: for each target product $X_i$ among the M target products, the original N-dimensional features $(x_i^{(1)}, \ldots, x_i^{(N)})$ are projected, and the new features after projection are $z_i = (u_1^T X_i, u_2^T X_i, \ldots, u_k^T X_i)$.
Moreover, each of the selected k new features corresponds to a linear combination of the N original features. The load occupied by each of the N original features in each new feature is calculated, and the original features with higher loads are selected and combined; that is, the new feature represents most of the information of those original features, and those original features have high similarity to it.
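The four PCA steps above can be sketched in NumPy as follows (the data matrix and k are illustrative; np.linalg.eigh is used because the covariance matrix is symmetric):

```python
import numpy as np

def pca_combine(X, k):
    """Project an M x N production-parameter matrix onto its top-k components."""
    # Step 1: remove the mean of every production parameter (column).
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the N features.
    C = np.cov(X_centered, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # Step 4: sort descending, keep the top-k eigenvectors, and project.
    order = np.argsort(eigenvalues)[::-1][:k]
    U = eigenvectors[:, order]          # N x k loading matrix
    return X_centered @ U, U            # new features and their loads

X_demo = np.random.default_rng(0).random((100, 6))  # 100 products, 6 parameters
Z, loadings = pca_combine(X_demo, k=2)
```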
Optionally, the linear discriminant analysis algorithm is a supervised-learning dimension reduction technique, i.e., each sample of its dataset (i.e., each set of production parameters) has a class output. Its main principle is that, after projection, the intra-class variance is smallest and the inter-class variance is largest.
Optionally, the multidimensional scaling algorithm is a statistical study method that classifies objects with multiple dimensions according to the similarity (small distance) or dissimilarity (large distance, i.e., computed distance) between their production parameters. The objects are represented graphically in a low-dimensional (two- or three-dimensional) space, using perceptual graphics that simply and clearly illustrate the relative relationships between the objects (e.g., production parameters).
Optionally, the manifold learning algorithm is a nonlinear dimension reduction method that keeps some "invariant feature quantity" shared by the high-dimensional and low-dimensional data in order to find the low-dimensional feature representation. The invariant feature quantity includes one or more of the Isomap geodesic distance, the LLE local reconstruction coefficients, the LE (Laplacian Eigenmaps) data adjacency relationship, and the LTSA local tangent space alignment.
In some embodiments, the dimension-increasing processing may increase the number of production parameters, and the dimension-reducing processing may decrease the number of production parameters. Therefore, whether to perform dimension-increasing or dimension-reducing processing on the production parameters may be determined according to the number of production parameters in the product data to be processed: in the case that the number of production parameters is smaller than a first preset threshold, factor synthesis processing is performed on the production parameters by a dimension-increasing algorithm, i.e., dimension-increasing processing is performed; in the case that the number of production parameters is greater than or equal to the first preset threshold, dimension-reducing processing is performed on the production parameters.
Specifically, when the number of production parameters is smaller than the first preset threshold, the available production parameters are few, which would affect the number and accuracy of the determined first influence factors; the first root cause analysis model therefore combines the production parameters through a corresponding dimension-increasing algorithm to obtain new parameters, i.e., synthesis parameters, and the first influence factors are then determined from both the production parameters and the synthesis parameters.
When the number of production parameters is greater than or equal to the first preset threshold, there are so many available production parameters that the efficiency of determining the first influence factor may be affected; the first root cause analysis model therefore performs correlated-factor combination processing on the production parameters using a dimension-reducing algorithm, so as to reduce the number of production parameters used to determine the first influence factor.
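A sketch of this dispatch on the number of production parameters; the threshold value, the pairwise-product dimension increase, and the SVD-based reduction are illustrative choices consistent with the feature-crossing and PCA descriptions above:

```python
import numpy as np

FIRST_PRESET_THRESHOLD = 50  # first preset threshold; illustrative value

def increase_dimension(X):
    # Dimension increase: append pairwise products (feature crossing).
    crosses = [X[:, i] * X[:, j]
               for i in range(X.shape[1]) for j in range(i + 1, X.shape[1])]
    return np.column_stack([X] + crosses)

def reduce_dimension(X, k=10):
    # Dimension reduction: keep the top-k principal components (cf. PCA above).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:min(k, X.shape[1])].T

def prepare_parameters(X):
    if X.shape[1] < FIRST_PRESET_THRESHOLD:
        return increase_dimension(X)  # too few parameters: synthesize new ones
    return reduce_dimension(X)        # too many: combine correlated ones
```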
In some embodiments, after obtaining the product data to be processed, a filtering algorithm may be used to screen production parameters in the product data to be processed, so as to remove production parameters that have little influence on the detection result, that is, remove production parameters with low probability of becoming the first influence factor, so as to improve the efficiency of subsequently determining the first influence factor.
Optionally, the filtering algorithm performs selection (Filter) based on the correlations between the individual features (i.e., production parameters) and the result variable (e.g., the detection result), together with an evaluation index. The correlation coefficient is taken as the importance of each feature dimension, and some unimportant production parameters are removed according to a set threshold or a set number of features. The evaluation index includes one or more of correlation analysis indexes (e.g., the Pearson correlation coefficient, the Spearman correlation coefficient, the maximal information coefficient (Maximal Information Coefficient, MIC), etc.), distance indexes, purity indexes, consistency indexes, and the like.
Specifically, the Pearson correlation coefficient measures the linear correlation between a feature and the result variable; its value interval is [-1, 1], with values closer to 1 indicating stronger positive linear correlation and values closer to -1 stronger negative linear correlation. It is suitable when both the feature and the result variable are continuous numerical variables. Monotonic nonlinear relationships, such as exponential functions, are measured with the Spearman correlation coefficient. For complex nonlinear relationships such as periodic functions, the maximal information coefficient is generally adopted to measure the degree of association between two groups of variables. Features, i.e., production parameters, with small correlation are thereby removed.
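A sketch of this Filter selection using the Pearson and Spearman coefficients from SciPy (the stand-in data and the set threshold are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
X = rng.random((200, 6))           # 6 production parameters; stand-in data
y = rng.integers(0, 2, size=200)   # result variable (0 = normal, 1 = abnormal)

CORRELATION_THRESHOLD = 0.1        # set threshold; illustrative value

kept = []
for j in range(X.shape[1]):
    r_linear, _ = pearsonr(X[:, j], y)   # linear correlation
    r_rank, _ = spearmanr(X[:, j], y)    # monotonic (rank) correlation
    if max(abs(r_linear), abs(r_rank)) > CORRELATION_THRESHOLD:
        kept.append(j)                   # parameter survives the filter
```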
Specifically, a good feature set (a feature set includes at least one production parameter) should make the distance between products belonging to the same class as small as possible and the distance between products belonging to different classes as large as possible. For the classification problem, i.e., when the result variable is discrete data, common distance indexes, i.e., calculation methods corresponding to similarity indexes, include the Euclidean distance and the like.
Specifically, in the case that one of the feature and the result variable is a discrete numerical variable, or after a continuous numerical variable is discretized, ANOVA analysis of variance, the T test, or the non-parametric Kruskal-Wallis test and Wilcoxon signed-rank test is performed, i.e., a significance check is carried out, so as to determine the index value corresponding to the consistency index. The null hypothesis that two or more groups of variables (i.e., production parameters) come from the same distribution is tested, i.e., whether there is a significant difference, yielding a p-value with value interval [0, 1]. The smaller the p-value, the larger the difference is considered to be, and the greater the influence of the feature on the result variable, i.e., the greater the feature importance.
When the feature and the result variable are both categorical data, i.e., discrete variables, the chi-square test is used. It measures the degree of deviation between the actual observed values and the theoretically inferred values of the samples (i.e., target products); the smaller the chi-square value, the smaller the deviation and the better the fit. Likewise, the smaller the p-value, the stronger the grounds for rejecting the null hypothesis, demonstrating a significant correlation.
Optionally, the purity of the result after division by a feature is used as the feature importance, i.e., the influence factor score; this is suitable when both the feature and the result variable are categorical data. After the product data to be processed corresponding to the target products is divided using a certain feature, the higher the purity of each data subset, the lower the uncertainty, the higher the consistency, and the more important the feature.
In this embodiment, the first influence factor is obtained by screening the production parameters by using the filtering algorithm, and meanwhile, scoring of the first influence factor is also achieved, so as to obtain the influence score corresponding to the first influence factor.
In some embodiments, after the first influence factors are obtained, in order to display the causes affecting the product result more intuitively, the first influence factors are ranked according to their corresponding influence scores.
In some embodiments, a mapping between the first impact factor and the production parameter may also be displayed.
In this embodiment, after the first influence factors are obtained, the production parameters corresponding to each first influence factor are determined, i.e., the production parameters related to that first influence factor, and the mapping relationship between the first influence factor and the corresponding production parameters is displayed, so that relevant personnel can intuitively and quickly determine how the corresponding first influence factor is obtained from the production parameters.
The mapping relationship indicates how the first influence factor is determined from the production parameters corresponding to it. The mapping relationship may be a mathematical relationship, e.g., a mathematical formula such as a Cartesian product, or related code; that is, when the related code is executed, the production parameters are processed accordingly to obtain the corresponding first influence factor.
The above process will now be described with reference to a specific example:
The target product includes a liquid crystal display panel. In manufacturing a liquid crystal display panel, the manufacturing stages of the display panel at least include an Array (Array) stage, a Color Film (CF) stage, a cell forming stage, and a module (module) stage. In the array stage, the thin film transistor array substrate is manufactured. In one example, during the array stage, a material layer is deposited and subjected to photolithography: for example, photoresist is deposited on the material layer, and the photoresist is exposed and then developed. Subsequently, the material layer is etched and the remaining photoresist is removed ("lift-off"). In the CF stage, the color film substrate is manufactured, which involves the following steps: coating, exposure, and development. In the cell forming stage, the array substrate and the color film substrate are assembled to form a cell. The cell forming stage includes several steps, including coating and rubbing of alignment layers, injection of liquid crystal material, cell sealant coating, vacuum-packing, cutting, grinding, and cell inspection. In the module stage, peripheral components and circuits are assembled to the panel. In one example, the module stage includes several steps, including assembly of a backlight, assembly of a printed circuit board, polarizer attachment, assembly of a chip on film, assembly of an integrated circuit, burn-in, and final inspection.
The target product comprises an organic light-emitting diode display panel. In fabricating an organic light-emitting diode display panel, the fabrication of the display panel includes at least four device processes: an array stage, an OLED stage, an EAC2 stage, and a module stage. In the array stage, the back plate of the display panel is fabricated, which, for example, includes the fabrication of a plurality of thin film transistors. In the OLED stage, a plurality of light-emitting elements (e.g., organic light-emitting diodes) are manufactured, an encapsulation layer is formed to encapsulate the plurality of light-emitting elements, and, optionally, a protective film is formed on the encapsulation layer. In the EAC2 stage, large glass (glass) is first cut into half glass (hglass) and then further cut into panels. Further, in the EAC2 stage, an inspection apparatus is used to inspect the panels to detect defects therein, such as dark spots and bright lines. In the module stage, the flexible printed circuit is bonded to the panel, for example, using chip-on-film technology, and a cover glass is formed on the surface of the panel. Optionally, a further inspection is performed to detect defects in the panel.
Accordingly, the data produced in the above production steps can be divided into production data and detection data. The production data comprise the history parameter data and processing parameter data of the target product during production and processing. The history parameters comprise the product ID, product basic attributes, the process sections the product passes through, site information, equipment models, and the like, in the production process; the processing parameters comprise the processing parameters of the target product in different process sections or equipment, such as pressure, temperature, and dwell time. At the end of each process section, optical or electrical detection is performed on the product to detect whether its quality meets the standard; detection data are formed based on the detection results of the detection equipment, identifying whether, and which, defects are generated in the product.
Optionally, the data containing a defect are defined as negative samples, i.e., abnormal product data, and the data containing no defect are defined as positive samples, i.e., normal product data. It will be appreciated that when a certain type of defect data is selected as the negative samples, the data of other defect types are also positive samples.
Alternatively, for a display panel with a larger area, a certain number of defect points may be tolerated. Therefore, positive and negative samples are distinguished by calculating a defective degree characterization value. For example, in the case that the sample is a display panel motherboard, the ratio of the total number of defective display panels belonging to the defect type among the plurality of display panels of the motherboard to the total number of those display panels (which may be referred to as the defective proportion of the sample) is used as the defective degree characterization value among the relevant production parameters of the sample; or the total number of defective display panels belonging to the defect type among the plurality of display panels of the motherboard is used as the defective degree characterization value among the production parameters of the sample. In this case, the greater the defective degree characterization value among the production parameters of the sample, the greater the characterized degree of belonging to the defect type. Also exemplarily, in the case that the sample is a display panel motherboard, the ratio of the total number of display panels other than the defective display panels belonging to the defect type among the plurality of display panels of the motherboard to the total number of those display panels is used as the defective degree characterization value among the production parameters of the sample; or the total number of display panels other than the defective display panels belonging to the defect type among the plurality of display panels of the motherboard is used as the defective degree characterization value among the production parameters of the sample. In this case, the smaller the defective degree characterization value among the production parameters of the sample, the greater the characterized degree of belonging to the defect type.
It will be appreciated that many products (e.g., display panels) are produced by production lines, each of which includes a plurality of process stations, each station being adapted to perform certain processes (e.g., cleaning, deposition, exposure, etching, alignment, inspection, etc.) on the product (including semi-finished products). Meanwhile, each process station typically has a plurality of sample production facilities (i.e., process equipment) for performing the same processing; of course, although the processing is theoretically identical, the actual processing effects are not exactly identical, because different pieces of process equipment differ in model, state, and so on. In this case, the production process of each sample passes through multiple process stations, and the process stations that different samples pass through during production may differ; samples passing through the same process station may also be processed by different sample production facilities there. Thus, in one production line, each sample production facility participates in the production of some of the samples but not all of them, i.e., each sample production facility participates in, and only in, the production of a portion of the samples.
Alternatively, the production parameters, i.e., the production parameters to be analyzed, may be the column dimension attributes other than the tag columns in the fused data table, including the sites passed through, equipment parameters, etc., in factory production. The production parameters may be all the dimension attributes in the fused data table, or a preliminary screening may be performed according to the user's selection.
Alternatively, a production parameter may be used directly as a first influence factor for evaluating the root cause of an event. When there are many production parameters, correlations between certain production parameters may affect the final significance evaluation, so synthesis parameters may be formed by combining production parameters, and the first influence factor determined from them. When there are few production parameters, the first influence factor can be determined using parameters obtained by mutating the production parameters, e.g., by taking the logarithm or the square.
The number of first influence factors may be greater than, equal to, or less than the number of production parameters.
As shown in fig. 6, fig. 6 is a flowchart illustrating yet another method of anomaly root cause analysis according to an exemplary embodiment of the present application, including the steps of:
Step 601, obtaining sample data to be processed corresponding to a target object.
Step 602, determining positive samples and negative samples in sample data to be processed. Wherein both the positive and negative samples comprise a first parameter.
And 603, inputting the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the judging result of the target object.
In this embodiment, the second influence factor information that causes the determination result of the target object may be determined according to the sample data to be processed corresponding to the target object, where the determination result corresponds to the target object. For example, when the target object is a device (e.g., production equipment or detection equipment), the determination result indicates whether the operating state of the device is normal; for another example, when the target object is a certain commodity, the determination result indicates whether the sales volume of the commodity is normal.
Wherein, the target object can be determined according to the actual use scene. The data contained in the sample data to be processed corresponding to the target object can also be determined according to the actual use condition.
The process of determining the second influence factor information is similar to the process of determining the first influence factor information, and will not be described in detail herein.
Corresponding to the embodiment of the method, the application also provides an embodiment of the abnormal root cause analysis device and the electronic equipment applied by the device.
The embodiment of the abnormal root cause analysis device can be applied to electronic equipment, such as a server or terminal equipment. The device embodiment can be realized by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the electronic equipment where it is located reading corresponding computer program instructions from a non-volatile memory into memory and running them. At the hardware level, as shown in fig. 7, which is a hardware structure diagram of the electronic equipment where the abnormal root cause analysis device of the present application is located, besides the processor 710, the memory 730, the network interface 720, and the non-volatile memory 740 shown in fig. 7, the electronic equipment where the abnormal root cause analysis device 731 of the embodiment is located may generally include other hardware according to the actual function of the electronic equipment, which is not described herein.
As shown in fig. 8, fig. 8 is a block diagram illustrating an abnormal root cause analysis apparatus according to an exemplary embodiment of the present application, the apparatus comprising:
the first data obtaining module 810 is configured to obtain product data to be processed corresponding to a target product. The product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to first preset parameters.
The first data processing module 820 is configured to determine normal product data and abnormal product data in the product data to be processed according to the detection data.
The first root cause determining module 830 is configured to input the normal product data and the abnormal product data into a first root cause analysis model, and obtain first influence factor information of a detection result of the target product, where the first influence factor includes one or more of the production data, and the first root cause analysis model indicates a tree model.
Optionally, the first root cause determining module is further configured to:
Inputting the normal product data and the abnormal product data into the first root cause analysis model, and calculating the purity index of the production data.
Determining the first influence factor information based on the purity index.
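The embodiments do not fix a particular purity index; one widely used candidate is Gini impurity, sketched below purely for illustration (a pure node scores 0, an evenly mixed binary node scores 0.5):

    from collections import Counter

    def gini_impurity(labels):
        # Gini impurity of a label set: 1 - sum of squared class proportions.
        n = len(labels)
        if n == 0:
            return 0.0
        counts = Counter(labels)
        return 1.0 - sum((c / n) ** 2 for c in counts.values())

    print(gini_impurity(["OK", "OK", "OK"]))        # 0.0, a pure node
    print(gini_impurity(["OK", "NG", "OK", "NG"]))  # 0.5, maximally mixed

A production parameter whose splits drive impurity down sharply separates normal from abnormal products well, which is what qualifies it as an influence factor.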
Optionally, the first root cause determining module is further configured to:
Inputting the normal product data and the abnormal product data into the tree model to train the tree model.
Determining the first influence factor information according to the trained tree model.
Optionally, the production data comprises production parameters. Training the tree model indicates adjusting the number of the production parameters and the weights corresponding to the production parameters.
The first influence factor information is determined according to the weight sizes of the production parameters.
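As an illustration of weight-based ranking, the sketch below trains a scikit-learn decision tree on synthetic data and reads the learned feature importances as the weights of the production parameters; the library choice, the parameter names, and the rule generating the labels are all assumptions:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    feature_names = ["temperature", "pressure", "line_speed"]
    X = rng.normal(size=(200, 3))    # rows: products; columns: production parameters
    y = (X[:, 0] > 0.8).astype(int)  # synthetic rule: abnormality driven by "temperature"

    tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

    # The importances act as the weights of the production parameters;
    # the largest ones are the candidate first influence factors.
    ranked = sorted(zip(feature_names, tree.feature_importances_),
                    key=lambda kv: kv[1], reverse=True)
    print(ranked)  # "temperature" should dominate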
Optionally, the first root cause analysis model indicates at least one integrated tree model, the integrated tree model being obtained by integrating a plurality of tree models.
Optionally, the first root cause determining module is further configured to:
Normal product data and abnormal product data are input into the integrated tree model to train the integrated tree model.
And determining first influence factor information according to the trained integrated tree model.
Optionally, the first root cause analysis model indicates at least two integrated tree models.
Optionally, in the case that the integrated tree model is an LGBM model, the decision tree information of the integrated tree model includes one or more of a decision tree leaf number in the range of 2 to 500, a decision tree number in the range of 25 to 325, a decision tree maximum depth in the range of 1 to 20, an L1 regularization coefficient of 1.00E-10 to 1.00E-01, and an L2 regularization coefficient of 1.00E-10 to 1.00E-01.
Optionally, in the case that the integrated tree model is a CatBoost model, the decision tree information of the integrated tree model includes one or more of a decision tree depth of 1 to 16, a maximum tree number of 25 to 300, and an L2 regularization coefficient of 1 to 100.
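For concreteness, the sketch below instantiates both integrated tree models with hyperparameters drawn from the ranges stated above; the specific values are arbitrary picks within those ranges, and the lightgbm and catboost Python packages are assumed:

    from lightgbm import LGBMClassifier
    from catboost import CatBoostClassifier

    lgbm = LGBMClassifier(
        num_leaves=63,      # decision tree leaves: 2 to 500
        n_estimators=200,   # number of decision trees: 25 to 325
        max_depth=8,        # maximum depth: 1 to 20
        reg_alpha=1e-3,     # L1 regularization coefficient: 1.00E-10 to 1.00E-01
        reg_lambda=1e-3,    # L2 regularization coefficient: 1.00E-10 to 1.00E-01
    )

    catb = CatBoostClassifier(
        depth=8,            # decision tree depth: 1 to 16
        iterations=150,     # maximum tree number: 25 to 300
        l2_leaf_reg=3,      # L2 regularization coefficient: 1 to 100
        verbose=False,
    )

In practice these values would be tuned, for example by a search over the stated ranges.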
Optionally, the first root cause determining module is further configured to:
Performing dimension-increasing or dimension-reducing processing on the production parameters in the normal product data and the abnormal product data, and inputting the processed production parameters into the first root cause analysis model.
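One plausible reading of this processing, sketched under the assumption of scikit-learn transformers, is principal component analysis for dimension reduction and polynomial feature expansion for dimension increase:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import PolynomialFeatures

    X = np.random.default_rng(1).normal(size=(100, 20))  # 20 production parameters

    X_reduced = PCA(n_components=5).fit_transform(X)            # 20 -> 5 dimensions
    X_expanded = PolynomialFeatures(degree=2).fit_transform(X)  # 20 -> 231 dimensions

Either result can then be fed to the first root cause analysis model in place of the raw production parameters.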
Optionally, the first root cause determining module is further configured to: perform fusion processing of the production parameters during the training of the tree model.
Optionally, the fusion processing indicates that a feature crossover and/or mutation is performed on the production parameters to obtain new production parameters.
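The embodiments do not name concrete crossover or mutation operators, so the sketch below shows two illustrative ones: multiplying a pair of production parameters into a new crossed parameter, and deriving a mutated variant of a single parameter:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"temperature": [231.5, 240.2], "pressure": [1.01, 1.07]})

    # Feature crossover: combine two production parameters into a new one.
    df["temp_x_pressure"] = df["temperature"] * df["pressure"]

    # Feature mutation: a transformed variant of an existing parameter.
    df["log_temperature"] = np.log(df["temperature"])

The new parameters join the originals during training, letting the tree model surface interaction effects as influence factors.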
Optionally, the first data acquisition module is specifically configured to:
One or more of manual import, batch import, and real-time import.
Optionally, the first data acquisition module is specifically configured to:
Acquiring the product data to be processed corresponding to the target product through a data pipeline constructed based on the Apache Beam model.
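A minimal sketch of such a pipeline, assuming the Apache Beam Python SDK and hypothetical CSV file names, could join production and detection records by serial number:

    import apache_beam as beam

    def key_by_serial(line):
        # Assumes the serial number is the first comma-separated field.
        fields = line.split(",")
        return fields[0], fields

    with beam.Pipeline() as p:
        production = (p
                      | "ReadProduction" >> beam.io.ReadFromText("production.csv")
                      | "KeyProduction" >> beam.Map(key_by_serial))
        detection = (p
                     | "ReadDetection" >> beam.io.ReadFromText("detection.csv")
                     | "KeyDetection" >> beam.Map(key_by_serial))
        fused = ({"production": production, "detection": detection}
                 | "FuseBySerial" >> beam.CoGroupByKey()
                 | "Write" >> beam.io.WriteToText("fused_product_data"))

The same Beam program can run on different back-end runners, which suits batch and real-time import paths alike.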
As shown in fig. 9, fig. 9 is a block diagram of another abnormal root cause analysis apparatus according to an exemplary embodiment of the present application, the apparatus including:
a second data obtaining module 910, configured to obtain sample data to be processed corresponding to the target object.
A second data processing module 920 is configured to determine positive and negative samples in the sample data to be processed. Wherein both the positive and negative samples comprise a first parameter.
The second root cause determining module 930 is configured to input the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the determination result of the target object.
In another embodiment, the application provides an anomaly root cause analysis system, which comprises a data management server, an analysis server and a display.
The data management server is configured to store data and to extract, transform, or load the data. The data includes at least one of production data and detection data.
The analysis server is configured to acquire the to-be-processed product data corresponding to the target product from the data management server when the task request is received, and determine normal product data and abnormal product data in the to-be-processed product data according to the detection data in the to-be-processed product data. And inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of production data, and the first root cause analysis model indicates a tree model. The product data to be processed is obtained by fusing the production data and the detection data corresponding to the target product according to the first preset parameters.
The display is configured to display the first influence factor information through a visual interface.
Optionally, the data management server includes a data lake, a data warehouse, a NoSQL database, and an ETL module.
The ETL module is configured to extract, transform, or load data.
The data lake is configured to store a first set of data formed by extracting, by the ETL module, raw data from at least one data source, the first set of data having the same content as the raw data.
The data warehouse is configured to store a second set of data formed by cleaning and normalizing the first set of data by the ETL module.
The NoSQL database is configured to store a third set of data formed by converting the second set of data by the ETL module.
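One way to picture the three stages, using hypothetical helper functions and pandas purely as a stand-in for real ETL tooling:

    import pandas as pd

    def extract(source_path):
        # Extract: pull raw data unchanged into the data lake (first set of data).
        return pd.read_csv(source_path)

    def cleanse_and_normalize(first_set):
        # Transform: cleanse and normalize into the data warehouse (second set).
        second_set = first_set.dropna().drop_duplicates()
        second_set.columns = [c.strip().lower() for c in second_set.columns]
        return second_set

    def to_nosql_documents(second_set):
        # Convert: reshape rows into documents for the NoSQL database (third set).
        return second_set.to_dict(orient="records")

The division of labor keeps the raw record untouched in the lake while serving the analysis server fast, query-shaped data from the NoSQL store.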
Optionally, the data table of the third set of data includes a plurality of sub-data tables having an index relationship formed by splitting the data table of the second set of data.
Optionally, the plurality of sub-data tables includes a first sub-table, a second sub-table, and a third sub-table.
The first sub-table includes data filtering options presented by the visual interface.
The second sub-table includes the product serial number.
The third sub-table includes data corresponding to the product serial number.
Optionally, the plurality of sub-data tables further comprises a fourth sub-table comprising manufacturing site information and/or equipment information, the third sub-table comprising codes or abbreviations for manufacturing sites and/or equipment.
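A hedged sketch of the split, with invented column names, shows how one wide second-set table can become indexed sub-tables linked by the product serial number:

    import pandas as pd

    wide = pd.DataFrame({
        "serial_no": ["P001", "P002"],
        "site": ["Fab-A", "Fab-B"],   # full manufacturing site information
        "site_code": ["FA", "FB"],    # code or abbreviation for the site
        "temperature": [231.5, 240.2],
        "result": ["OK", "NG"],
    })

    second_sub = wide[["serial_no"]]                            # product serial numbers
    third_sub = wide[["serial_no", "site_code",
                      "temperature", "result"]]                 # data per serial number
    fourth_sub = wide[["site_code", "site"]].drop_duplicates()  # code -> full site info

    # "serial_no" and "site_code" carry the index relationships among the sub-tables.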
In another embodiment, the present application provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the abnormal root cause analysis method described above.
In another embodiment, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the abnormal root cause analysis method described above.
The implementation process of the functions and roles of each module in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the application without undue burden.
The foregoing describes certain embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The foregoing is merely a description of preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within the scope of protection of the application.

Claims (24)

  1. An abnormal root cause analysis method, comprising:
    Obtaining product data to be processed corresponding to a target product; the product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to a first preset parameter;
    According to the detection data, determining normal product data and abnormal product data in the product data to be processed;
    Inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model.
  2. The method of claim 1, wherein inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first impact factor information for a detection result of the target product comprises:
    inputting the normal product data and the abnormal product data into the first root cause analysis model, and calculating to obtain the purity index of the production data;
    The first influence factor information is determined based on the purity index.
  3. The method according to claim 1 or 2, wherein inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product comprises:
    inputting the normal product data and the abnormal product data into the tree model to train the tree model;
    and determining the first influence factor information according to the trained tree model.
  4. A method according to claim 3, wherein the production data comprises production parameters; the training of the tree model indicates to adjust the number of the production parameters and the weight corresponding to the production parameters,
    The first influence factor information is determined according to a weight size of the production parameter.
  5. The method of claim 1, wherein the first root cause analysis model indicates at least one integrated tree model, the integrated tree model being obtained by integrating a plurality of tree models.
  6. The method of claim 5, wherein inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first impact factor information of a detection result of the target product, comprises:
    Inputting the normal product data and the abnormal product data into the integrated tree model to train the integrated tree model;
    And determining the first influence factor information according to the trained integrated tree model.
  7. The method of claim 5, wherein the first root cause analysis model is indicative of at least two integrated tree models.
  8. The method of any of claims 5 to 7, wherein in the case where the integrated tree model is an LGBM model, the decision tree information of the integrated tree model includes one or more of a decision tree leaf number in the range of 2 to 500, a decision tree number in the range of 25 to 325, a decision tree maximum depth in the range of 1 to 20, an L1 regularization coefficient of 1.00E-10 to 1.00E-01, and an L2 regularization coefficient of 1.00E-10 to 1.00E-01.
  9. The method according to any one of claims 5 to 7, wherein in the case that the integrated tree model is a CatBoost model, the decision tree information of the integrated tree model includes one or more of a decision tree depth of 1 to 16, a maximum tree number of 25 to 300, and an L2 regularization coefficient of 1 to 100.
  10. The method of claim 1, wherein inputting the normal product data and the abnormal product data into a first root cause analysis model comprises:
    And carrying out dimension increasing or dimension decreasing treatment on the production parameters in the normal product data and the abnormal product data, and inputting the production parameters subjected to dimension increasing or dimension decreasing treatment to the first root cause analysis model.
  11. The method of claim 1, wherein the production data comprises production parameters; the method further comprises the steps of:
    And carrying out fusion processing of production parameters in the training process of the tree model.
  12. The method of claim 11, wherein the fusion process indicates that a feature crossover and/or mutation is performed on the production parameters to obtain new production parameters.
  13. The method of claim 1, wherein the obtaining the product data to be processed corresponding to the target product comprises:
    One or more of manual import, batch import, and real-time import.
  14. The method of claim 13, wherein the obtaining the product data to be processed corresponding to the target product comprises:
    And acquiring the data to be processed corresponding to the target product through a data pipeline constructed based on the Apache Beam model.
  15. An abnormal root cause analysis method, comprising:
    Obtaining sample data to be processed corresponding to a target object;
    Determining positive samples and negative samples in the sample data to be processed; wherein the positive and negative samples each comprise a first parameter;
    And inputting the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the judging result of the target object.
  16. The abnormal root cause analysis system is characterized by comprising a data management server, an analysis server and a display;
    The data management server is configured to store data and extract, convert or load the data; the data includes at least one of production data and inspection data;
    The analysis server is configured to acquire to-be-processed product data corresponding to a target product from the data management server when a task request is received, and determine normal product data and abnormal product data in the to-be-processed product data according to detection data in the to-be-processed product data; inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of a detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model; the product data to be processed are obtained by fusing the production data and the detection data corresponding to the target product according to a first preset parameter;
    the display is configured to display the first influence factor information through a visual interface.
  17. The system of claim 16, wherein the data management server comprises a data lake, a data warehouse, a NoSQL database, and an ETL module;
    the ETL module is configured to extract, convert, or load data;
    The data lake is configured to store a first set of data formed by extracting, by the ETL module, raw data from at least one data source, the first set of data having the same content as the raw data;
    the data warehouse is configured to store a second set of data formed by cleansing and normalizing the first set of data by the ETL module;
    the NoSQL database is configured to store a third set of data formed by converting the second set of data by the ETL module.
  18. The system of claim 17, wherein the data table of the third set of data comprises a plurality of sub-data tables having an index relationship formed by splitting the data table of the second set of data.
  19. The system of claim 18, wherein the plurality of sub-data tables comprises a first sub-table, a second sub-table, and a third sub-table;
    the first sub-table comprises data screening options presented by the visual interface;
    The second sub-table includes a product serial number;
    and the third sub-table comprises data corresponding to the product serial number.
  20. The system of claim 19, wherein the plurality of sub-data tables further comprises a fourth sub-table comprising manufacturing site information and/or equipment information, the third sub-table comprising codes or abbreviations for the manufacturing site and/or equipment.
  21. An abnormal root cause analysis device, comprising:
    The first data acquisition module is used for acquiring product data to be processed corresponding to a target product; the product data to be processed are obtained by fusing production data and detection data corresponding to the target product according to a first preset parameter;
    the first data processing module is used for determining normal product data and abnormal product data in the product data to be processed according to the detection data;
    The first root cause determining module is used for inputting the normal product data and the abnormal product data into a first root cause analysis model to obtain first influence factor information of the detection result of the target product, wherein the first influence factor comprises one or more of the production data, and the first root cause analysis model indicates a tree model.
  22. An abnormal root cause analysis device, comprising:
    The second data acquisition module is used for acquiring sample data to be processed corresponding to the target object;
    A second data processing module for determining positive and negative samples in the sample data to be processed; wherein the positive and negative samples each comprise a first parameter;
    And the second root cause determining module is used for inputting the positive sample and the negative sample into a second root cause analysis model to obtain second influence factor information of the judging result of the target object.
  23. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the abnormal root cause analysis method of any one of claims 1 to 15.
  24. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the abnormal root cause analysis method of any one of claims 1 to 15.
CN202280003212.8A 2022-09-16 2022-09-16 Abnormal root cause analysis method and device Pending CN118056189A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/119262 WO2024055281A1 (en) 2022-09-16 2022-09-16 Abnormality root cause analysis method and apparatus

Publications (1)

Publication Number Publication Date
CN118056189A true CN118056189A (en) 2024-05-17

Family

ID=90273936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280003212.8A Pending CN118056189A (en) 2022-09-16 2022-09-16 Abnormal root cause analysis method and device

Country Status (2)

Country Link
CN (1) CN118056189A (en)
WO (1) WO2024055281A1 (en)

Also Published As

Publication number Publication date
WO2024055281A1 (en) 2024-03-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination