WO2022198680A1 - Data processing method and device, electronic device, and storage medium - Google Patents

Data processing method and device, electronic device, and storage medium

Info

Publication number
WO2022198680A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
combined feature
product sample
feature
product
Prior art date
Application number
PCT/CN2021/083429
Other languages
English (en)
French (fr)
Inventor
王瑜
任佳伟
贺王强
王海金
柴栋
吴建民
王洪
Original Assignee
京东方科技集团股份有限公司
北京中祥英科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司, 北京中祥英科技有限公司
Priority to CN202180000618.6A (publication CN115413349A)
Priority to PCT/CN2021/083429 (publication WO2022198680A1)
Priority to KR1020237002264A (publication KR20230161409A)
Priority to DE112021001736.5T (publication DE112021001736T5)
Publication of WO2022198680A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0221 Preprocessing measurements, e.g. data collection rate adjustment; Standardization of measurements; Time series or signal analysis, e.g. frequency analysis or wavelets; Trustworthiness of measurements; Indexes therefor; Measurements using easily measured parameters to estimate parameters difficult to measure; Virtual sensor creation; De-noising; Sensor fusion; Unconventional preprocessing inherently present in specific fault detection methods like PCA-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2137 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing

Definitions

  • the present disclosure relates to the technical field of data processing, and in particular, to a data processing method and device, an electronic device, and a storage medium.
  • the production line of industrial products includes several process equipment, and each process equipment may affect the yield of the product when abnormal operation or abnormal working parameters occur.
  • the production personnel need to locate the cause of the defective product.
  • the present disclosure provides a data processing method and device, an electronic device, and a storage medium to address the deficiencies of the related art.
  • a data processing method comprising:
  • each product sample in the set of product samples includes a first parameter and a second parameter; the first parameter is used to characterize the degree of defectiveness of the product sample, and the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passed;
  • the second parameter is processed based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, and each combined feature in the combined features of the specified dimension refers to a combination of original parameters related to product defects;
  • the impact score of each dimension's combined feature in the combined features of the specified dimension is obtained based on the first parameter and the combined features of the specified dimension; the impact score is used to represent the degree of influence of each combined feature on product defects;
  • the combined features are sorted according to the impact score to obtain at least one top-ranked combined feature, and the original parameter corresponding to the at least one combined feature is used as the cause of the product defect.
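The final sorting step can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the helper `rank_combined_features`, the feature names, and the scores are all hypothetical.

```python
def rank_combined_features(impact_scores, top_n=1):
    """Return the top_n (feature, score) pairs, highest impact score first."""
    ranked = sorted(impact_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical impact scores for three combined features.
impact_scores = {
    "combined_feature_1": 0.42,
    "combined_feature_2": 0.91,
    "combined_feature_3": 0.17,
}
top_features = rank_combined_features(impact_scores, top_n=2)
```

The original parameters behind each returned feature would then be reported as candidate causes of the defect.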
  • the dimensionality reduction algorithm includes at least one of the following: principal component analysis (PCA), linear discriminant analysis (LDA), locally linear embedding (LLE), and Laplacian eigenmaps (LEP).
  • the dimensionality reduction algorithm includes principal component analysis (PCA), and the second parameter is processed based on a preset dimensionality reduction algorithm to obtain the combination feature of the specified dimension of the product sample set, including:
  • the second parameter is processed based on the principal component analysis method PCA to obtain the K-dimensional combined feature of the product sample set; the K-dimensional combined feature is used as the combined feature of the specified dimension.
  • the dimensionality reduction algorithm includes principal component analysis (PCA), and the second parameter is processed based on a preset dimensionality reduction algorithm to obtain the combined feature of the specified dimension of the product sample set, including:
  • the intersection of the first combined feature and the K-dimensional combined feature is acquired, the R-dimensional combined feature of the product sample set is obtained, and the R-dimensional combined feature is used as the combined feature of the specified dimension.
  • the second parameter is processed based on the principal component analysis method PCA to obtain the K-dimensional combined feature of the product sample set, including:
  • each covariance value in the covariance matrix represents the degree of similarity between the two original parameters
  • the eigenvector includes the weight corresponding to each original parameter
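The text names only the covariance matrix and the eigenvectors, so the sketch below assumes the textbook PCA procedure: center the original parameters, build the covariance matrix, take the K leading eigenvectors, and project each sample onto them. To stay dependency-free it approximates the eigenvectors with power iteration and deflation, which is a stand-in chosen for this example and not something the source specifies. All function names and the sample data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Center the columns of rows (one row per product sample) and return
    the sample covariance matrix together with the centered data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def mat_vec(a, v):
    """Multiply matrix a by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_eigenpairs(cov, k, iters=200):
    """Approximate the k largest eigenpairs of a symmetric positive
    semi-definite matrix by power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        start = [float(i + 1) for i in range(d)]  # arbitrary non-zero start
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        pairs.append((eigenvalue, v))
        # Deflation: subtract the component that was just found.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
             for i in range(d)]
    return pairs


def pca_combined_features(rows, k):
    """Project each sample onto the k leading eigenvectors."""
    cov, centered = covariance_matrix(rows)
    pairs = top_eigenpairs(cov, k)
    features = [[sum(x * w for x, w in zip(c, vec)) for _, vec in pairs]
                for c in centered]
    return features, pairs


# Hypothetical data: 4 samples, 3 original parameters; the second
# parameter is exactly twice the first.
rows = [[1.0, 2.0, 0.1], [2.0, 4.0, 0.0], [3.0, 6.0, 0.1], [4.0, 8.0, 0.0]]
features, pairs = pca_combined_features(rows, k=2)
```

Each eigenvector in `pairs` holds one weight per original parameter, which is what links a combined feature back to the parameters it was built from.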
  • obtaining the impact score of each dimension combined feature in the combined feature of the specified dimension based on the first parameter and the combined feature of the specified dimension including:
  • Each dimension combined feature vector is obtained based on the combined feature of the specified dimension; the combined feature vector of each dimension includes the combined feature of the same dimension of each product sample;
  • the minimum value is used to represent the credibility threshold of the first parameter
  • the influence score of the corresponding combined feature is obtained according to the minimum value.
  • the purity index includes at least one of information gain, information gain rate and Gini coefficient.
  • the Gini coefficient is calculated using the following formula:
  • $\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( \frac{C_k}{|X|} \right)^2$
  • |X| represents the number of product samples in the data combination located on the specified side of the segmentation point when any value of a combined feature vector of the product sample set X is used as the segmentation point
  • K represents the number of classification categories of the defective product; here the value is 2
  • C_k represents the number of product samples of the k-th category in the data combination on the specified side of the segmentation point.
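A minimal sketch of the split scan described above, assuming the standard Gini impurity and a binary defect label (K = 2). Several details are assumptions made for illustration only: the "specified side" is taken to be the samples whose feature value is at or above the segmentation point, the `min_samples` guard is a hypothetical addition that keeps tiny subsets from trivially scoring zero, and turning the minimum into a score as `1 - minimum` is one possible choice that the source does not spell out.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def min_side_gini(feature_values, labels, min_samples=3):
    """Try every feature value as a segmentation point and return the
    smallest Gini impurity found on the side at or above the point."""
    best = None
    for split in feature_values:
        side = [y for x, y in zip(feature_values, labels) if x >= split]
        if len(side) < min_samples:  # hypothetical guard, not from the source
            continue
        g = gini(side)
        if best is None or g < best:
            best = g
    return best


# Hypothetical data: 1 marks a defective sample, 0 a good one.
labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]  # separates the classes
noisy = [0.5, 0.1, 0.9, 0.2, 0.8, 0.3]        # does not

informative_score = 1.0 - min_side_gini(informative, labels)
noisy_score = 1.0 - min_side_gini(noisy, labels)
```

Under these assumptions a combined feature that cleanly separates defective from good samples ends up with a higher score than one that does not.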
  • the method further includes:
  • for each combined feature of the at least one combined feature, at least two original parameters with the highest weights in that combined feature are displayed.
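Picking the original parameters to display can be sketched as below. The parameter names and weights are hypothetical, and ranking by absolute weight is an assumption: the source says only "highest weight", and eigenvector entries can be negative.

```python
def top_weighted_parameters(weights, parameter_names, n=2):
    """Return the n (name, weight) pairs with the largest absolute weight."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(parameter_names[i], weights[i]) for i in order[:n]]


# Hypothetical weights of one combined feature over three original parameters.
parameter_names = ["chamber_pressure", "etch_time", "humidity"]
weights = [0.45, 0.89, -0.02]
top_two = top_weighted_parameters(weights, parameter_names)
```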
  • a data processing method comprising:
  • the first parameter is used to characterize the degree of defectiveness of the product sample
  • a second parameter of each product sample in the product sample set is obtained; the second parameter is used to characterize the original parameter of the sample production equipment through which the product sample passes;
  • at least one combined feature is displayed on the third interface; the original parameter corresponding to the at least one combined feature is used as the cause of the product defect, and the at least one combined feature is acquired based on the first parameter and the second parameter.
  • the at least one combined feature is displayed on the third interface in descending or ascending order of the corresponding impact score; the impact score is used to represent the degree of influence of each combined feature on product defects.
  • At least one combined feature is displayed on the third interface, including:
  • for each combined feature of the at least one combined feature, at least two original parameters with the highest weights in that combined feature are displayed.
  • the method further includes acquiring the at least one combined feature according to the first parameter and the second parameter, specifically including:
  • the second parameter is processed based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, and each combined feature in the combined features of the specified dimension refers to a combination of original parameters related to product defects;
  • the impact score of each dimension's combined feature in the combined features of the specified dimension is obtained based on the first parameter and the combined features of the specified dimension; the impact score is used to represent the degree of influence of each combined feature on product defects;
  • the combined features are sorted according to the impact score to obtain at least one top-ranked combined feature, and the original parameter corresponding to the at least one combined feature is used as the cause of the product defect.
  • the method further includes:
  • the product sample includes a display panel motherboard; the display panel motherboard includes a plurality of display panels.
  • a data processing apparatus comprising:
  • a sample set acquisition module configured to acquire a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter is used to characterize the degree of defectiveness of the product sample, and the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passes;
  • a combined feature acquisition module configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, where each combined feature in the combined features of the specified dimension refers to a combination of original parameters related to product defects;
  • an impact score acquisition module configured to obtain the impact score of each dimension's combined feature in the combined features of the specified dimension based on the first parameter and the combined features of the specified dimension; the impact score is used to characterize the degree of influence of each combined feature on product defects;
  • a defect cause acquisition module configured to sort the combined features according to the impact score to obtain at least one top-ranked combined feature, and to use the original parameter corresponding to the at least one combined feature as the cause of the product defect.
  • a data processing apparatus comprising:
  • a first parameter obtaining module configured to obtain a first parameter of each product sample in the product sample set in response to a user's first input on the first interface; the first parameter is used to characterize the degree of defectiveness of the product sample;
  • a second parameter obtaining module configured to, in response to the user's second input on the second interface, obtain the second parameter of each product sample in the product sample set; the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passes;
  • a defect cause acquisition module configured to display at least one combined feature on the third interface in response to the user's third input on the second interface; the original parameter corresponding to the at least one combined feature is used as the cause of the product defect, and the at least one combined feature is obtained from the first parameter and the second parameter.
  • an electronic device comprising:
  • a memory for storing a computer program executable by the processor
  • the processor is configured to execute the computer program in the memory to implement the above method.
  • a computer-readable storage medium is provided, and when an executable computer program in the storage medium is executed by a processor, the above method can be implemented.
  • the solutions provided by the embodiments of the present disclosure can reduce the product sample data by acquiring the combined feature of the specified dimension of each product sample, and the combined feature of the specified dimension has a dimension smaller than the dimension of the parameters in each product sample.
  • the combination feature of the specified dimension can be a combination of similar original parameters, which can retain the original information of the product sample and make the similar parameters form an association, which is conducive to quickly locating the cause of product failure and improving the detection efficiency.
  • FIG. 1 is a block diagram of a data processing system according to an exemplary embodiment.
  • FIG. 2 is a block diagram of another data processing system according to an exemplary embodiment.
  • FIG. 3 is a block diagram of an electronic device according to an exemplary embodiment.
  • FIG. 4 is a block diagram of another electronic device according to an exemplary embodiment.
  • FIG. 5 is a flowchart of a data processing method according to an exemplary embodiment.
  • FIG. 6A is a schematic diagram of a first interface according to an exemplary embodiment.
  • FIG. 6B is a schematic diagram illustrating obtaining a product sample according to an exemplary embodiment.
  • FIG. 6C is a schematic diagram illustrating obtaining a first parameter distribution according to an exemplary embodiment.
  • FIG. 7A is a schematic diagram illustrating setting a defect type according to an exemplary embodiment.
  • FIG. 7B is a schematic diagram illustrating selecting a defect type according to an exemplary embodiment.
  • FIG. 8 is a schematic diagram illustrating setting a subordinate relationship according to an exemplary embodiment.
  • FIG. 9 is a schematic diagram illustrating a third interface displaying at least one combined feature according to an exemplary embodiment.
  • FIG. 10 is a flowchart showing another data processing method according to an exemplary embodiment.
  • FIG. 11 is a block diagram of a data processing apparatus according to an exemplary embodiment.
  • FIG. 12 is a block diagram of another data processing apparatus according to an exemplary embodiment.
  • the production line of industrial products includes several process equipment, and each process equipment may affect the yield of the product when abnormal operation or abnormal working parameters occur.
  • the production personnel need to locate the cause of the defective product.
  • the process equipment in the production line or the amount of data generated is relatively large, which increases the complexity of locating the cause, so that it takes a lot of time to locate the equipment that causes the failure.
  • Embodiments of the present disclosure provide a data processing system.
  • the data processing system 100 includes a data processing apparatus 300 , a display apparatus 200 and a distributed storage apparatus 400 .
  • the data processing apparatus 300 is connected to the display apparatus 200 and the distributed storage apparatus 400, respectively.
  • the distributed storage device 400 is used to store production data generated by a plurality of sample production equipment (or referred to as factory equipment).
  • the production data generated by the multiple sample production equipment includes the production records of that equipment; for example, a production record includes information on the sample production equipment that the samples passed through during production and information on the types of defects that occurred. Each sample passes through multiple pieces of sample production equipment during production, and each piece of sample production equipment participates in the production of some of the samples.
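One way to picture such a production record is sketched below. The `ProductionRecord` class, its field names, and the identifiers are hypothetical; the source does not define a record layout.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionRecord:
    """One sample's route through the line and the defects found on it."""
    sample_id: str
    equipment_ids: list
    defect_types: list = field(default_factory=list)


def samples_by_equipment(records):
    """Invert the records: which samples did each piece of equipment process?"""
    index = {}
    for record in records:
        for equipment_id in record.equipment_ids:
            index.setdefault(equipment_id, []).append(record.sample_id)
    return index


# Hypothetical records: every sample passes through several pieces of
# equipment, and every piece of equipment sees only some of the samples.
records = [
    ProductionRecord("S1", ["EQ_A", "EQ_B"], ["bright_line"]),
    ProductionRecord("S2", ["EQ_A", "EQ_C"]),
    ProductionRecord("S3", ["EQ_B", "EQ_C"], ["dark_line"]),
]
equipment_index = samples_by_equipment(records)
```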
  • the distributed storage device stores relatively complete data (eg, a database).
  • Distributed storage devices may include multiple hardware memories that are distributed in different physical locations (such as different factories or different production lines) and exchange information with each other through wireless transmission (such as a network), so that the data is physically distributed but logically constitutes a single database based on big data technology.
  • the raw data of a large number of different sample production equipment is stored in relational databases (such as Oracle, MySQL, etc.) of the corresponding manufacturing systems, such as YMS (Yield Management System), FDC (Fault Detection & Classification), and MES (Manufacturing Execution System). This raw data can be extracted from the original tables by data extraction tools (such as Sqoop, Kettle, etc.) and transmitted to distributed storage devices (such as the Hadoop Distributed File System, HDFS), which reduces the load on the sample production equipment and the manufacturing systems and makes the data easier for subsequent analysis equipment to read.
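The extraction step can be illustrated with a small stand-in. The sketch below does not use Sqoop, Kettle, Oracle, or HDFS: it reads a table from an in-memory SQLite database and writes it out as CSV text, which is the kind of flat file an extraction tool would hand to distributed storage. The table name `fdc_raw`, its columns, and the helper `extract_table_to_csv` are hypothetical.

```python
import csv
import io
import sqlite3


def extract_table_to_csv(conn, table, columns):
    """Dump the given columns of a table as CSV text.

    table and columns must be trusted identifiers; they are not escaped.
    """
    query = f"SELECT {', '.join(columns)} FROM {table} ORDER BY rowid"
    cursor = conn.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)   # header row
    writer.writerows(cursor)   # one CSV line per database row
    return buffer.getvalue()


# Hypothetical source table standing in for a manufacturing-system database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fdc_raw (sample_id TEXT, equipment_id TEXT, pressure REAL)")
conn.executemany(
    "INSERT INTO fdc_raw VALUES (?, ?, ?)",
    [("S1", "EQ_A", 1.5), ("S2", "EQ_B", 2.25)],
)
csv_text = extract_table_to_csv(conn, "fdc_raw", ["sample_id", "equipment_id", "pressure"])
```

Analysis jobs can then read the extracted copy without issuing further queries against the production database.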
  • the data in the distributed storage device can be stored in Hive tool or HBase database format.
  • for example, with the Hive tool, the above raw data is first stored in a data lake; after that, preprocessing such as data cleaning and data conversion can be performed in the Hive tool according to the application theme and scenario of the data, to obtain data warehouses with different themes (such as a production history theme, a detection data theme, and a device data theme) and data marts for different scenarios (such as device analysis scenarios and parameter analysis scenarios).
  • the above data marts can be connected to display devices, analysis devices, etc. through different API interfaces to realize data interaction with these devices.
  • the data volume of the above raw data is very large due to the fact that multiple sample production equipments of multiple factories are involved. For example, all sample production equipment may generate hundreds of gigabytes of raw data per day, or tens of gigabytes per hour.
  • RDBMS: Relational Database Management System
  • DFS: Distributed File System
  • the grid computing of RDBMS is to divide the problems that require huge computing power into many small parts, and then distribute these parts to many computers for processing separately, and finally combine the calculation results.
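The divide, distribute, and combine pattern described here can be sketched on a single machine. In the sketch below worker threads stand in for the separate computers, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(values, parts):
    """Split values into at most `parts` contiguous chunks."""
    if not values:
        return []
    size = (len(values) + parts - 1) // parts
    return [values[i:i + size] for i in range(0, len(values), size)]


def distributed_sum(values, parts=4):
    """Sum each chunk in its own worker, then combine the partial results."""
    chunks = split_into_parts(values, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


total = distributed_sum(list(range(1, 101)))
```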
  • Oracle RAC (Real Application Clusters) is the core technology of grid computing supported by Oracle Database, in which all servers have direct access to all data in the database.
  • the grid computing application system of an RDBMS cannot meet user requirements when the amount of data is large. For example, because the room for hardware expansion is limited, once the data grows to a large enough order of magnitude, the input/output bottleneck of the hard disk makes data processing very inefficient.
  • the Hive tool is a data warehouse tool based on Hadoop, which can be used for data extraction, transformation and loading (ETL) as well as complex analytical work.
  • the Hive tool does not have a special data storage format, nor does it build an index for the data. Users can freely organize the tables and process the data in the database. It can be seen that the parallel processing of distributed file management can meet the storage and processing requirements of massive data. Users can process simple data through SQL queries, and custom functions can be used for complex processing. Therefore, when analyzing the massive data of the factory, it is necessary to extract the data of the factory database into the distributed file system. On the one hand, it will not cause damage to the original data, and on the other hand, the data analysis efficiency is improved.
  • the distributed storage device 400 may be one memory, may be multiple memories, or may be a general term for multiple storage elements.
  • the memory may include random access memory (RAM), double data rate synchronous dynamic random access memory (DDR SDRAM), or non-volatile memory, such as disk storage, flash memory, etc.
  • the data processing apparatus 300 is configured to implement the data processing method described in any of the following embodiments.
  • the data processing device 300 may acquire production records of multiple sample production devices, determine the impact score corresponding to each sample production device according to those production records, and determine the original parameters related to product defects according to the impact scores.
  • the display device 200 is used for displaying an interface.
  • the interface may include the first interface, the second interface, the third interface, and the like described below.
  • the display device 200 may display the processing result of the data processing device 300 .
  • the display device may be a display, or a product including a display, such as a television, a computer (all-in-one or desktop), a tablet, a mobile phone, an electronic picture screen, and the like.
  • the display device may be any device that displays images, whether in motion (e.g., video) or stationary (e.g., still images), and whether text or images.
  • the embodiments may be implemented in or associated with a wide variety of electronic devices, such as, but not limited to, game consoles, television monitors, flat panel displays, computer monitors, automotive displays (e.g., odometer displays), navigators, cockpit controls and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, packaging, and aesthetic structures (e.g., a display of images of a piece of jewelry).
  • the display device described herein may include one or more displays, including one or more terminals with display capabilities, so that the data processing device can send its processed data (e.g., impact parameters) to the display device, which then displays it. That is, through the interface of the display device (i.e., the user interaction interface), the complete interaction between the user and the system for analyzing the causes of sample defects (controlling it and receiving its results) can be realized.
  • Embodiments of the present disclosure provide an electronic device.
  • the electronic device may be a computer or the like.
  • the electronic device 500 includes a data processing apparatus 300 and a display apparatus 200 .
  • the display device 200 is connected to the data processing device 300 .
  • the data processing apparatus 300 is configured to implement the data processing method described in any of the following embodiments.
  • the display device 200 is used for displaying an interface.
  • the display device 200 is used to display the processing result of the data processing device 300 .
  • the data processing device and the display device in the above-mentioned electronic device are similar to those in the above-mentioned data processing system; for their specific content, refer to the foregoing description, which will not be repeated here.
  • the data processing apparatus 300 includes a memory 301 and a processor 302 .
  • the memory 301 is connected to the processor 302 .
  • the processor and the memory are connected through, for example, an I/O interface, thereby enabling information exchange.
  • One or more computer programs executable on the processor 302 are stored in the memory 301 .
  • the data processing apparatus 300 implements the data processing method described in any of the following embodiments.
  • the above-mentioned processor 302 may be one processor, or may be a collective term for multiple processing elements.
  • the processor 302 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present disclosure, such as one or more microprocessors.
  • the processor 302 may be a programmable device; for example, the programmable device is a CPLD (complex programmable logic device), an EPLD (erasable programmable logic device), or an FPGA (field-programmable gate array).
  • the above-mentioned memory 301 may be one memory, or may be a collective term for a plurality of storage elements, and is used to store executable program code and the like. The memory 301 may include random access memory, and may also include non-volatile memory, such as magnetic disk memory, flash memory, and the like.
  • the memory 301 is used for storing the application code for executing the solution of the present disclosure, and the execution is controlled by the processor 302.
  • the processor 302 is configured to execute the application program code stored in the memory 301 to control the data processing apparatus 300 to implement the data processing method provided by any of the following embodiments of the present disclosure.
  • Embodiments of the present disclosure also provide a data processing method, for example, the data processing method can be applied to the above-mentioned electronic equipment, data processing system, and data processing apparatus. As shown in FIG. 5 , the data processing method includes steps 51 to 54:
  • in step 51, a product sample set is obtained; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter is used to characterize the degree of defectiveness of the product sample, and the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passed.
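A product sample set with its two kinds of parameters could be held in memory as sketched below. The `ProductSample` class, the parameter names, and the values are hypothetical and serve only to show how the first parameter (the defect degree) and the second parameter (the original equipment parameters) separate into a label vector and a feature matrix.

```python
from dataclasses import dataclass


@dataclass
class ProductSample:
    sample_id: str
    defect_degree: float   # first parameter
    raw_parameters: dict   # second parameter: parameter name -> value


def to_matrix(samples, parameter_names):
    """Split a sample set into first parameters and a row-per-sample matrix."""
    first = [s.defect_degree for s in samples]
    second = [[s.raw_parameters[name] for name in parameter_names] for s in samples]
    return first, second


# Hypothetical sample set with two original parameters per sample.
parameter_names = ["chamber_pressure", "etch_time"]
samples = [
    ProductSample("S1", 0.0, {"chamber_pressure": 1.5, "etch_time": 30.0}),
    ProductSample("S2", 0.8, {"chamber_pressure": 2.1, "etch_time": 42.0}),
]
first_parameters, second_parameters = to_matrix(samples, parameter_names)
```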
  • the electronic device may acquire a product sample set.
  • the product sample collection includes multiple product samples.
  • this embodiment can be used in a display panel production line; for example, for correlation analysis in the production process of display panels (such as liquid crystal display panels, organic light-emitting diode display panels, etc.).
  • display panels such as liquid crystal display panels, organic light emitting diode display panels, etc.
  • embodiments of the present disclosure may also be used for other products.
  • the product sample includes a display panel motherboard (glass) that includes a plurality of display panels.
  • the display panel motherboard further includes a substrate, and a plurality of display panels are disposed on the substrate.
  • the substrate may include a rigid substrate such as glass, or a flexible substrate such as PI (polyimide); it may also include films such as a buffer layer disposed on the rigid substrate or the flexible substrate.
  • a "defect" in the embodiments of the present disclosure refers to a quality defect in a product sample; such defects may cause the product sample to be downgraded or even scrapped, and may also cause it to be reworked or repaired. The defects of product samples in the present disclosure can be classified into different types as needed.
  • for example, defects can be classified according to their direct impact on the performance of the sample, such as bright-line defects, dark-line defects, firefly (hot spot) defects, etc.; alternatively, they can be classified according to their general cause, such as array process defects, color filter process defects, etc.; or they can be classified according to their severity, such as defects that lead to scrapping, defects that reduce quality, etc.
  • alternatively, the types of defects may not be distinguished; that is, a sample is considered defective as long as it has any defect, and non-defective otherwise.
  • the sample set has a single defect type, that is, the multiple samples included in the product sample set share the same defect type. In other words, the data processing method provided in this embodiment is carried out for one defect type at a time, and each run yields the causes (i.e., parameters) of that defect type.
  • the display device 200 may display a first interface 201 on which the user performs a first input, such as a time range from time T1 to time T2 (e.g., one day).
  • the data processing apparatus 300 obtains a set of product samples within the above-mentioned time range, and obtains a selection result whose effect is shown in FIG. 6B .
  • the user can also input a focus threshold (defect_ratio_glass) in the first interface 201 at the same time; the selected product samples are divided according to this threshold to obtain the first parameter, and the result is shown in FIG. 6C.
  • the electronic device may display a distribution map of the first parameter of each product sample.
  • the display device 200 displays an interface 202 on which the user makes an input, and the data processing apparatus 300 determines the defect type of the sample set in response to the input.
  • the above input specifies a defect type, which is the defect type to be analyzed.
  • a first input box may be displayed on the interface 202, and the user's first input on the interface 202 may be to enter the defect type directly in the first input box to determine the defect type of the product sample set.
  • the electronic device or data processing system may be preconfigured with a database including multiple defect types. A first selection box may be displayed on the interface 202, and the first selection box includes options for multiple defect types (e.g., defect type A, defect type B, defect type C, etc., in FIG. 7B); the user's first input on the interface 202 may be to select from the defect type options to determine the defect type of the sample set. It should be noted that this step can be selected according to actual needs, which is not limited here.
  • each product sample includes a first parameter and a second parameter.
  • the first parameter is used to characterize the degree to which the product sample exhibits the defect type acquired via the interface 202;
  • the second parameter is used to characterize the original parameter of the sample production equipment through which the product sample passes.
  • the first parameter of the product sample can indicate whether the sample is a good sample or a bad sample of the defect type; for example, according to the first parameter, it can be determined whether the product sample is a good sample (or positive sample) or a bad sample (or negative sample).
  • positive samples and negative samples in the multiple product samples can be obtained according to the first parameters of the multiple product samples in the product sample set.
  • the first parameter of the product sample is used to characterize the defectiveness of the product sample.
  • for example, the ratio of the total number of defective display panels belonging to the defect type among the multiple display panels of the display panel motherboard to the total number of the multiple display panels is taken as the characterization value of the defect degree in the first parameter of the product sample, and this ratio can be called the defect ratio of the sample; or, the total number of defective display panels belonging to the defect type among the multiple display panels of the display panel motherboard is taken as the characterization value of the defect degree in the first parameter of the sample.
  • in these cases, the larger the characterization value of the defect degree in the first parameter of the product sample, the greater the represented degree of the defect of that type.
  • as another example, the ratio of the total number of display panels other than the defective display panels belonging to the defect type, among the multiple display panels of the display panel motherboard, to the total number of the multiple display panels is taken as the characterization value of the defect degree in the first parameter of the sample; or, the total number of display panels other than the defective display panels belonging to the defect type is taken as the characterization value; or, the ratio of good pixels to the total number of pixels in the display panel is used.
  • in these cases, the smaller the characterization value of the defect degree in the first parameter of the sample, the greater the represented degree of the defect of that type.
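  • the defect ratio described above can be sketched as follows. This is only an illustrative sketch: the names `GlassSample`, `defect_ratio`, and `label_samples` are hypothetical, and treating a sample as a bad (negative) sample when its ratio is greater than or equal to the focus threshold `defect_ratio_glass` is an assumption, since the text does not state the comparison rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GlassSample:
    """Hypothetical container for one display panel motherboard."""
    glass_id: str
    panel_defective: List[bool]  # one flag per display panel on the motherboard


def defect_ratio(sample: GlassSample) -> float:
    """Defective panels of the analyzed defect type divided by all panels."""
    total = len(sample.panel_defective)
    if total == 0:
        raise ValueError("a motherboard must contain at least one display panel")
    return sum(sample.panel_defective) / total


def label_samples(samples: List[GlassSample], defect_ratio_glass: float) -> Dict[str, bool]:
    """Map glass_id -> True for a bad sample (assumed rule: ratio >= threshold)."""
    return {s.glass_id: defect_ratio(s) >= defect_ratio_glass for s in samples}
```

  • with the defect ratio as the first parameter, a larger value indicates a greater degree of the defect, which matches the first convention described above.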
  • each production line includes multiple process stations, each of which performs a certain kind of processing (such as cleaning, deposition, exposure, etching, cell alignment, inspection, etc.).
  • each process station usually has multiple sample production equipment (that is, process equipment) performing the same processing; although the processing is nominally the same, the actual processing effect differs because the models, states, etc. of the individual process equipment differ.
  • the production process of each sample needs to pass through multiple process stations, and different samples may pass through different process stations during production; samples passing through the same process station may also be processed by different sample production equipment. Therefore, in a production line, each sample production equipment participates in the production process of some samples but not of others, that is, each sample production equipment participates in the production process of only a subset of the samples.
  • the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passes, and may include: the name, model, or code of the sample production equipment through which the sample passes; the process station, production line, or factory where the sample production equipment is located; the time at which the sample production equipment processed the sample; and so on.
  • each product sample corresponds to multiple sample production equipment, so the second parameter represents multiple original parameters of the multiple sample production equipment through which the product sample passes.
  • a person skilled in the art can select an appropriate second parameter according to the specific scenario; as long as the R-dimensional combined feature or the subsequent influence score can be obtained from the second parameter, the corresponding solution falls within the protection scope of the present disclosure.
  • in step 52, the second parameter is processed based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set; each combined feature among the combined features of the specified dimension refers to a combination of original parameters related to the same defect.
  • the cause of a product defect may be the sample production equipment or the process parameters, that is, the original parameters may include equipment or process parameters.
  • the electronic device can also simultaneously display the second interface shown in FIG. 8: the user can make a second input on the second interface, and in response to the second input the electronic device can create the affiliation between sample production equipment and process parameters, such as the affiliation DataTag-Step-Process-Parameter, where DataTag can represent a product sample (such as GlassID), Step can represent the corresponding sample production equipment, Process can represent a processing step in the sample production equipment, and Parameter can represent an original parameter (such as temperature, pressure, flow, etc.) in this processing step. That is to say, at this time the electronic device can obtain the second parameter of the product sample; FIG. 8 also shows the affiliation of one of the second parameters.
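  • as an illustration only, the DataTag-Step-Process-Parameter affiliation can be held as a nested mapping and flattened into one row of original parameters per product sample. The nested-dictionary layout, the function name `flatten_second_parameter`, and the sample values are assumptions made for this sketch and do not come from the disclosure.

```python
from typing import Dict

# Hypothetical layout: DataTag -> Step -> Process -> Parameter -> value
Tree = Dict[str, Dict[str, Dict[str, Dict[str, float]]]]


def flatten_second_parameter(tree: Tree) -> Dict[str, Dict[str, float]]:
    """Turn the nested affiliation into one flat row per product sample.

    Each column is named "Step-Process-Parameter", so the row keeps the
    equipment and processing step that every original parameter belongs to.
    """
    rows: Dict[str, Dict[str, float]] = {}
    for data_tag, steps in tree.items():
        row: Dict[str, float] = {}
        for step, processes in steps.items():
            for process, parameters in processes.items():
                for name, value in parameters.items():
                    row[f"{step}-{process}-{name}"] = value
        rows[data_tag] = row
    return rows
```

  • the flat rows can then be stacked into an M x n matrix (M product samples, n original parameters) for the dimensionality reduction step.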
  • the user can click the analysis button.
  • the electronic device can analyze the causes of product defects based on the above R-dimensional combined feature and the first parameter, and finally display the third interface shown in FIG. 9.
  • the electronic device may acquire the combination feature of the specified dimension of the product sample set based on the second parameter. In other words, the electronic device may acquire the combined feature of the specified dimension of each product sample in the product sample set.
  • the electronic device may acquire the combined feature of a specified dimension based on a preset dimension reduction algorithm.
  • the above-mentioned dimensionality reduction algorithm includes at least one of the following: principal component analysis (PCA), linear discriminant analysis (LDA), locally linear embedding (LLE), and Laplacian eigenmaps (LEP).
  • the electronic device may select K principal component features from the second parameter, and obtain the second combined feature of each product sample based on the K principal component features; the cumulative contribution value of the K principal component features to the second parameter exceeds the preset contribution value threshold.
  • the electronic device may acquire the average value of each original parameter in the product sample set, and subtract the average value from each original parameter corresponding to each product sample to obtain a new value of each original parameter in each product sample.
  • taking principal component analysis (PCA) as an example, the product sample set includes M product samples, and each product sample (GLASS) has n-dimensional features $\{V_1, V_2, \ldots, V_n\}$. Each original parameter is averaged over all product samples, for example $\bar{V}_p = \frac{1}{M}\sum_{i=1}^{M} V_{ip}$, where $V_{ip}$ denotes the p-th original parameter of the i-th product sample. The average value is then subtracted from the corresponding original parameter of each GLASS to obtain the de-centered new values $\{X_1, X_2, \ldots, X_n\}$, with $X_{ip} = V_{ip} - \bar{V}_p$.
  • the electronic device can obtain the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix represents the degree of similarity between the two original parameters.
  • the covariance matrix corresponding to each product sample is shown in Table 2.
  • the diagonal entries of the covariance matrix are the variances of the individual original parameters, and the off-diagonal entries are covariances. A covariance measures how two original parameters vary together: the greater its absolute value, the greater the influence of the two parameters on each other, and vice versa.
  • the electronic device can acquire the eigenvalues and eigenvectors of the covariance matrix, and acquire the cumulative contribution value corresponding to each eigenvalue.
  • the electronic device may acquire the eigenvalues and eigenvectors whose cumulative contribution value exceeds the preset contribution value threshold, and obtain K principal component features.
  • the electronic device can select the top k eigenvalues and eigenvectors whose cumulative contribution rate reaches 80%, $\{(\lambda_1, u_1), (\lambda_2, u_2), \ldots, (\lambda_k, u_k)\}$, that is, k principal component features are obtained.
  • the electronic device can obtain the projection of each original parameter on the feature vector in each product sample after updating the above-mentioned new value, and obtain the K-dimensional combined feature of the product sample set or the K-dimensional combined feature of each product sample ; the K-dimensional combined feature can be used as the combined feature of the corresponding specified dimension in this example.
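  • the k principal component features after projection are $F_j = u_j^{\mathsf{T}} X = \sum_{p=1}^{n} u_{jp} X_p$ for $j = 1, \ldots, k$, where $X_p$ is the p-th de-centered original parameter and $u_{j1}, \ldots, u_{jn}$ are the weights of the n original parameters in the j-th combined feature. The larger the weights of certain original parameters, the more of their information the j-th combined feature represents, and these original features have high similarity with each other.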
  • at least two original parameters with the highest weights can be selected and combined; if the combined feature is determined to be one of the at least one combined feature described in step 54, these at least two original parameters with the highest weights are displayed in the subsequent display process, so that the user can quickly locate the cause of the defect.
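  • the PCA procedure described above (de-centering, covariance matrix, eigen-decomposition, selection by cumulative contribution, projection) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function names `pca_combined_features` and `top_weighted_parameters` are hypothetical, the 0.8 default mirrors the 80% example in the text, and the weights are ranked by absolute value, which the text does not specify.

```python
import numpy as np


def pca_combined_features(V, contribution_threshold=0.8):
    """Return (F, U, cumulative) for an M x n matrix V of original parameters.

    F: M x k projected combined features, U: n x k eigenvectors (weights),
    cumulative: cumulative contribution of the k selected eigenvalues.
    """
    V = np.asarray(V, dtype=float)
    X = V - V.mean(axis=0)                        # de-centered new values
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]             # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # smallest k whose cumulative contribution reaches the threshold
    k = min(int(np.searchsorted(cumulative, contribution_threshold)) + 1, len(eigvals))
    U = eigvecs[:, :k]
    F = X @ U                                     # projection onto the eigenvectors
    return F, U, cumulative[:k]


def top_weighted_parameters(U, names, top=2):
    """For each combined feature, list the `top` parameters with largest |weight|."""
    return [[names[i] for i in np.argsort(-np.abs(U[:, j]))[:top]]
            for j in range(U.shape[1])]
```

  • `np.linalg.eigh` is used because a covariance matrix is symmetric; the sketch assumes that at least one original parameter actually varies, otherwise the contribution ratio is undefined.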
  • the electronic device may acquire the first combined characteristic.
  • the electronic device may acquire keywords corresponding to each original parameter in the second parameter.
  • the keyword is an explanation of the value of the original parameter, such as pressure, temperature or flow rate, etc.; in the specific implementation, a keyword can be understood as a name of the original parameter.
  • the electronic device can combine the keywords according to the preset process relationship.
  • the process relationship may include the installation position of the production equipment, the sequence in the production process, the included process steps, etc., which are not limited here.
  • this merging process can place original parameters that cause the same defect type into the same combination.
  • for example, the electronic device can merge the original parameters of the same or different processes in the same sample production equipment, such as merging the temperature parameters of different processes into one temperature-related combination. It is understandable that the merging process only divides the N original parameters into different groups and does not change the dimension of the second parameter, that is, the data of the N-dimensional original parameters is maintained.
  • Table 4 shows a combined data table based on keywords.
  • the combination S1 includes three parameters, namely Step1-Process1-Paramter1_value, Step2-Process2-Paramter2_value and Step3-Process3-Paramter3_value.
  • the combination St contains 2 parameters, namely Step(n-1)-Process(n-1)-Paramter(n-1)_value, Step n-Process n-Paramter n_value.
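  • the keyword-based merging can be sketched as a simple grouping step. The sketch assumes, purely for illustration, that the keyword is the last hyphen-separated part of a parameter name such as "Step1-Process1-temperature"; the function names `keyword_of` and `group_by_keyword` are hypothetical, and a real preset process relationship could group parameters by other rules.

```python
from collections import defaultdict
from typing import Dict, List


def keyword_of(parameter_name: str) -> str:
    """Hypothetical naming rule: the keyword follows the last hyphen."""
    return parameter_name.rsplit("-", 1)[-1]


def group_by_keyword(parameter_names: List[str]) -> Dict[str, List[str]]:
    """Divide the N original parameters into groups that share a keyword.

    Only the grouping changes; every original parameter is kept, so the
    dimension of the second parameter stays N.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in parameter_names:
        groups[keyword_of(name)].append(name)
    return dict(groups)
```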
  • the electronic device may obtain the intersection of the first combined feature and the above K-dimensional combined feature to obtain the R-dimensional combined feature of the product sample set, or the R-dimensional combined feature of each product sample in the product sample set, and the R-dimensional combined feature is used as the combined feature of the specified dimension.
  • the electronic device can compare the t combined features matched by keyword with the K-dimensional combined features one by one, to ensure that the parameters in each combination are similar in name (that is, in process equipment and process parameter category) and also correlated numerically; principal components that do not satisfy both conditions are removed, and the final R-dimensional combined feature is obtained.
  • although each dimension of the R-dimensional combined feature may include at least one original parameter, to simplify subsequent operations two original parameters can be selected for each dimension in this example; the effect is shown in Table 5.
  • Fr represents the rth combined feature in the R-dimensional combined feature, which contains the original parameters Step a-Process a-Paramter a_value and Step b-Process b-Paramter b_value.
  • two original parameters with larger weights can also be selected to represent each dimension of the K-dimensional combined feature. If these two original parameters both fall within any one of the t combined features, the corresponding combined feature is kept in the K-dimensional combined feature. After the comparisons are made one by one, some of the K-dimensional combined features are eliminated, and the R-dimensional combined features are obtained.
  • a person skilled in the art can select an appropriate solution according to the specific scenario, and the corresponding solution falls within the protection scope of the present disclosure.
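  • the comparison between the K-dimensional combined features and the t keyword combinations can be sketched as a filter. The function name `filter_components` and the input layout (a list of the top-weighted parameter names per principal component, and a mapping from keyword to group members) are assumptions for illustration.

```python
from typing import Dict, List


def filter_components(top_params_per_component: List[List[str]],
                      keyword_groups: Dict[str, List[str]]) -> List[int]:
    """Return indices of principal components to keep (the R-dimensional set).

    A component is kept when all of its top-weighted original parameters
    belong to one and the same keyword group.
    """
    kept = []
    for index, params in enumerate(top_params_per_component):
        if any(set(params) <= set(members) for members in keyword_groups.values()):
            kept.append(index)
    return kept
```

  • components whose top-weighted parameters span different keyword groups are dropped, so R is at most K.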
  • in step 53, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension of the combined features is obtained; the influence score is used to represent the degree of influence of each combined feature on product defects.
  • the electronic device may obtain a combined feature vector of each dimension based on a combined feature of a specified dimension (K-dimensional or R-dimensional); wherein the combined feature vector of each dimension includes a combined feature of the same dimension of each product sample.
  • the R-dimensional combined feature of the product sample is composed of the original parameters as elements, and the combined feature of each dimension of each product sample is extracted and reconstructed into a feature vector to obtain the above-mentioned combined feature vector of each dimension. For example, for any combination feature vector in a given product sample set X
  • the electronic device can calculate the purity indexes corresponding to the combined features of each dimension, and obtain the same number of purity indexes as the product samples in the product sample set; the purity indexes are used to indicate the degree of influence of each combined feature on product defects.
  • the yield purity index includes at least one of information gain, information gain ratio, and Gini coefficient.
  • the purity index can be characterized by "information entropy", and the smaller the information entropy, the higher the purity.
  • the purity index can also be characterized by a Gini coefficient, and the smaller the Gini coefficient, the higher the purity of the sample set.
  • the yield purity index of a sample production equipment over multiple product samples represents the purity of the defect type appearing on the product samples that passed through that equipment.
  • the lower the yield purity index of the sample production equipment, the higher the uncertainty as to whether samples passing through it exhibit the defect type, and the smaller the influence of the sample production equipment on the occurrence of the defect type; the higher the yield purity index, the lower this uncertainty, and the greater the influence of the sample production equipment on the occurrence of the defect type in the sample.
  • the embodiment of the present disclosure draws on the idea of constructing a decision tree, uses multiple sample production equipment as features, and sorts the features based on the purity index.
  • however, a decision tree is not actually constructed, and the technical problem to be solved is not the prediction problem that decision trees address; rather, the decision tree idea of increasing purity is combined with the yield analysis problem and, based on big data technology, used to solve the problem of rapidly locating the root cause of defects.
  • a certain combined feature is used as a child node in the decision tree, that is, as the feature attribute of a binary classification, to judge whether a value is the optimal cut point (cutpoint); the impurity measure used in the CART tree, the Gini coefficient, is used to calculate the influence and importance of each feature on the entire sample set.
  • in this calculation, one quantity represents the number of product samples in the data subset located on the specified side of the cut point when any combined feature vector in the product sample set X is used as the cut point; K represents the number of classification categories of product defects, which here is 2; and C_K represents the number of product samples of the K-th category in the data subset on the specified side of the cut point.
  • for example, when the value of a product sample is smaller than the cut point, the defect type of the product sample is considered not to be affected, and when the value is greater than or equal to the cut point, the defect type is considered to be affected; the specified side of the cut point referred to above (for example, the side on which the product sample is smaller than the cut point, for which the number of good products is counted) can be selected by a person skilled in the art according to the specific scenario, which is not limited here.
  • the contingency table shown in Table 6 can be obtained; according to the contingency table, M Gini coefficients can be obtained.
  • the electronic device may obtain the minimum value of the purity index corresponding to the combined feature of each dimension; the minimum value is used to represent the reliability threshold of the first parameter. That is, the smallest of the M Gini coefficients is taken, which corresponds to the optimal cut point (cutpoint).
  • the electronic device may obtain the influence score of the corresponding combined feature according to the minimum value; that is, the electronic device may obtain the influence score of the j-th dimension combined feature as (1 - minimum Gini coefficient).
  • in step 54, the combined features are sorted according to the influence score to obtain at least one top-ranked combined feature, and the original parameters corresponding to the at least one combined feature are used as the cause of the product defect.
  • the electronic device can sort the R-dimensional combined features according to the influence score, for example from large to small or from small to large, thereby obtaining the degree of influence of the original parameters in each combined feature on the first parameter of the product sample.
  • FIG. 9 shows the effect of displaying the two original parameters with the highest weights.
  • for example, the causes of the defect of product sample 1 may include step3-process3-param3 and step4-process4-param4.
  • the user can clearly locate the top-ranked combined features (that is, the original parameters), etc., so as to conduct targeted troubleshooting and processing, and improve the detection efficiency.
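  • the sorting in step 54 can be sketched as follows; the function name `rank_causes`, the dictionary inputs, and the example scores are hypothetical and serve only to show the ordering from large to small.

```python
from typing import Dict, List, Tuple


def rank_causes(scores: Dict[str, float],
                top_params: Dict[str, List[str]],
                top_n: int = 1) -> List[Tuple[str, float, List[str]]]:
    """Sort combined features by influence score (large to small).

    Returns the top_n features together with their highest-weighted
    original parameters, which are reported as candidate defect causes.
    """
    order = sorted(scores, key=scores.get, reverse=True)
    return [(name, scores[name], top_params[name]) for name in order[:top_n]]
```

  • for instance, with scores `{"F1": 0.62, "F2": 0.91}`, the feature F2 is listed first and its original parameters are shown to the user.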
  • the solution provided by the embodiments of the present disclosure reduces the dimension of the product sample data by acquiring combined features of a specified dimension for each product sample, the specified dimension being smaller than the number of parameters in each product sample; moreover, each combined feature can be a combination of similar original parameters, which retains the original information of the product sample while associating similar parameters, and thereby helps to quickly locate the cause of product defects and improve detection efficiency.
  • An embodiment of the present disclosure also provides a data processing method, see FIG. 10 , the method includes:
  • step 101 in response to the user's first input on the first interface, a first parameter of each product sample in the product sample set is obtained; the first parameter is used to characterize the degree of defectiveness of the product sample;
  • step 102 in response to the user's second input on the second interface, a second parameter of each product sample in the product sample set is obtained; the second parameter is used to characterize the original value of the sample production equipment through which the product sample passes. parameter;
  • step 103 in response to the user's third input on the second interface, at least one combined feature is displayed on the third interface; the original parameter corresponding to the at least one combined feature is used as the cause of the defective product, and the at least one combined feature The combined feature is obtained according to the first parameter and the second parameter.
  • the at least one combined feature is displayed on the third interface and arranged from large to small or from small to large according to the corresponding impact score; the impact score is used to represent the degree of influence of each combined feature on product defects.
  • displaying at least one combined feature on the third interface includes:
  • For each combined feature of the at least one combined feature display at least two original parameters with the highest weight in each combined feature.
  • the method further includes acquiring the at least one combined feature according to the first parameter and the second parameter, specifically including:
  • processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, where each combined feature among the combined features of the specified dimension refers to a combination of original parameters related to a product defect;
  • obtaining the impact score of each dimension of the combined features of the specified dimension based on the first parameter and the combined features of the specified dimension; the impact score is used to represent the degree of influence of each combined feature on product defects;
  • sorting the combined features according to the impact score to obtain at least one top-ranked combined feature, and using the original parameters corresponding to the at least one combined feature as the cause of the product defect.
  • the method further includes:
  • the method further includes:
  • the product sample includes a display panel motherboard; the display panel motherboard includes a plurality of display panels.
  • An embodiment of the present disclosure further provides a data processing apparatus, see FIG. 11 , the apparatus includes:
  • the sample set obtaining module 111 is used to obtain a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter is used to characterize the degree of defectiveness of the product sample, the The second parameter is used to characterize the original parameters of the sample production equipment through which the product sample passes;
  • the combined feature acquisition module 112 is configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, where each combined feature among the combined features of the specified dimension refers to a combination of original parameters related to a product defect;
  • the influence score obtaining module 113 is configured to obtain the influence score of each dimension of the combined features of the specified dimension based on the first parameter and the combined features of the specified dimension; the influence score is used to characterize the degree of influence of each combined feature on product defects;
  • the defective cause acquisition module 114 is configured to sort the combined features according to the impact score to obtain at least one combined feature ranked first, and use the original parameter corresponding to the at least one combined feature as the cause of the defective product.
  • the combined feature acquisition module includes:
  • the average value obtaining unit is used to obtain the average value of each original parameter over the product sample set, and to subtract the average value from the corresponding original parameter of each product sample to obtain the new value of each original parameter in each product sample;
  • a covariance acquisition unit configured to acquire the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix represents the degree of similarity between the two original parameters;
  • a contribution value obtaining unit used for obtaining the eigenvalue and eigenvector of the covariance matrix, and obtaining the cumulative contribution value corresponding to each eigenvalue;
  • the eigenvector includes the weight corresponding to each original parameter;
  • an eigenvalue acquiring unit configured to acquire eigenvalues and eigenvectors whose cumulative contribution value exceeds a preset contribution value threshold, and obtain K principal component features
  • the combined feature acquisition unit is used to acquire the projection of each original parameter in each product sample after updating the new value on the feature vector, and obtain the K-dimensional combined feature of the product sample set; the K-dimensional combined feature is used as The combined feature of the specified dimension.
  • the combined feature acquisition module further includes:
  • a first feature acquisition unit, configured to merge the original parameters based on the keywords in the second parameter to obtain the first combined feature of each product sample;
  • the combined feature acquisition unit is further configured to acquire the intersection of the first combined feature and the K-dimensional combined features, obtaining the R-dimensional combined features of the product sample set, and to use the R-dimensional combined features as the combined features of the specified dimension.
  • the influence score acquisition module includes:
  • a feature vector obtaining subunit configured to obtain a combined feature vector of each dimension based on the combined feature of the specified dimension; the combined feature vector of each dimension includes the combined feature of the same dimension of each product sample;
  • an index value calculation subunit, configured to calculate the purity index corresponding to the combined feature of each dimension, obtaining as many purity indices as there are product samples in the product sample set; the purity index represents the degree of influence of each combined feature on the product defect;
  • the minimum value obtaining subunit is used to obtain the minimum value of the purity index corresponding to the combined feature of each dimension; the minimum value is used to represent the credibility threshold of the first parameter;
  • the influence score obtaining subunit is configured to obtain the influence score of the corresponding combined feature according to the minimum value.
  • the purity index includes at least one of information gain, information gain rate, and Gini coefficient.
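  • the Gini coefficient is calculated by the following formula: $\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( |C_k| / |X| \right)^2$, where |X| is the number of product samples in the data group on the designated side of the cut point when any value of a combined feature vector in the product sample set X is taken as the cut point, K is the number of defect classification categories (here 2), and |C_k| is the number of product samples of the k-th category in the data group on the designated side of the cut point.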
  • An embodiment of the present disclosure further provides a data processing apparatus; referring to FIG. 12, the apparatus includes:
  • a first parameter obtaining module 121, configured to obtain the first parameter of each product sample in the product sample set in response to a first input by the user on the first interface; the first parameter characterizes the degree of defect of the product sample;
  • a second parameter acquisition module 122, configured to obtain the second parameter of each product sample in the product sample set in response to a second input by the user on the second interface; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
  • a defect cause acquisition module 123, configured to display at least one combined feature on the third interface in response to a third input by the user on the second interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.
  • the defect cause acquisition module includes:
  • an original parameter display unit, configured to display, for each combined feature in the at least one combined feature, at least two original parameters with the highest weights in that combined feature.
  • the defect cause acquisition module is further configured to obtain the at least one combined feature according to the first parameter and the second parameter, and specifically includes:
  • a combined feature acquisition unit, configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
  • an influence score obtaining unit, configured to obtain, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
  • a defect cause acquiring unit, configured to sort the combined features according to the influence score to obtain at least one top-ranked combined feature, and to take the original parameters corresponding to the at least one combined feature as the cause of the product defect.
  • the apparatus further includes:
  • the distribution diagram display module is used for displaying the distribution diagram of the first parameter of each product sample.
  • the apparatus further includes:
  • the affiliation display module is used to display the affiliation of each second parameter.
  • the product sample includes a display panel motherboard; the display panel motherboard includes a plurality of display panels.
  • an electronic device comprising:
  • a display;
  • a processor;
  • a memory for storing a computer program executable by the processor;
  • the processor is configured to execute the computer program in the memory to implement the steps of the method as shown in FIG. 1.
  • a computer-readable storage medium comprising an executable computer program, such as a memory comprising instructions; the executable computer program can be executed by a processor to implement the steps of the above-mentioned method.
  • the readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

Abstract

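A data processing method and apparatus, an electronic device, and a storage medium are provided. The method includes: obtaining a product sample set; processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set; obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; and sorting the combined features according to the influence score to obtain at least one top-ranked combined feature, the original parameters corresponding to the at least one combined feature being taken as the cause of the product defect.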

Description

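Data processing method and apparatus, electronic device, and storage medium
Technical Field
The present disclosure relates to the technical field of data processing, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.
Background
At present, a production line for industrial products includes a number of process devices, and each process device may affect product yield when it operates abnormally or its operating parameters are abnormal. When defective products are produced, production personnel need to locate the cause of the defect.
However, the number of process devices in a production line, or the volume of data they generate, is large, which makes locating the cause more complex, so that locating the equipment responsible for a defect consumes a great deal of time.
Summary
The present disclosure provides a data processing method and apparatus, an electronic device, and a storage medium to address deficiencies in the related art.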
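According to a first aspect of the embodiments of the present disclosure, a data processing method is provided, the method including:
obtaining a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter characterizes the degree of defect of the product sample, and the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
sorting the combined features according to the influence score to obtain at least one top-ranked combined feature, and taking the original parameters corresponding to the at least one combined feature as the cause of the product defect.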
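Optionally, the dimensionality reduction algorithm includes at least one of the following: principal component analysis (PCA), the linear dimensionality reduction method LDA, locally linear embedding (LLE), and Laplacian eigenmaps (LEP).
Optionally, the dimensionality reduction algorithm includes principal component analysis (PCA), and processing the second parameter based on the preset dimensionality reduction algorithm to obtain the combined features of the specified dimension for the product sample set includes:
processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set, and using the K-dimensional combined features as the combined features of the specified dimension.
Optionally, the dimensionality reduction algorithm includes principal component analysis (PCA), and processing the second parameter based on the preset dimensionality reduction algorithm to obtain the combined features of the specified dimension for the product sample set includes:
merging the original parameters based on the keywords in the second parameter to obtain the first combined feature of each product sample;
processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set;
obtaining the intersection of the first combined feature and the K-dimensional combined features to obtain the R-dimensional combined features of the product sample set, and using the R-dimensional combined features as the combined features of the specified dimension.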
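Optionally, processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set includes:
obtaining the average value of each original parameter over the product sample set, and subtracting that average value from the corresponding original parameter of each product sample to obtain the new value of each original parameter in the product sample set;
obtaining the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix characterizes the degree of similarity between the two original parameters;
obtaining the eigenvalues and eigenvectors of the covariance matrix and the cumulative contribution value corresponding to each eigenvalue; each eigenvector includes the weight corresponding to each original parameter;
obtaining the eigenvalues and eigenvectors whose cumulative contribution value exceeds a preset contribution value threshold, obtaining K principal component features;
obtaining the component, on the eigenvectors, of each original parameter after it has been updated to its new value, obtaining the K-dimensional combined features of the product sample set.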
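Optionally, obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature includes:
obtaining the combined feature vector of each dimension based on the combined features of the specified dimension; the combined feature vector of each dimension contains the combined feature of that same dimension for every product sample;
calculating the purity index corresponding to the combined feature of each dimension, obtaining as many purity indices as there are product samples in the product sample set; the purity index represents the degree of influence of each combined feature on the product defect;
obtaining the minimum value of the purity indices corresponding to the combined feature of each dimension; the minimum value characterizes the credibility threshold of the first parameter;
obtaining the influence score of the corresponding combined feature according to the minimum value.
Optionally, the purity index includes at least one of information gain, information gain ratio, and the Gini coefficient.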
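Optionally, the Gini coefficient is calculated by the following formula:
$$\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|X|} \right)^2$$
where |X| denotes the number of product samples in the data group on the designated side of the cut point when any value of a combined feature vector in the product sample set X is taken as the cut point; K denotes the number of defect classification categories, here 2; and |C_k| denotes the number of product samples of the k-th category in the data group on the designated side of the cut point.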
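Optionally, the method further includes:
for each combined feature in the at least one combined feature, displaying at least 2 original parameters with the highest weights in that combined feature.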
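According to a second aspect of the embodiments of the present disclosure, a data processing method is provided, the method including:
in response to a first input by the user on a first interface, obtaining the first parameter of each product sample in the product sample set; the first parameter characterizes the degree of defect of the product sample;
in response to a second input by the user on a second interface, obtaining the second parameter of each product sample in the product sample set; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
in response to a third input by the user on the second interface, displaying at least one combined feature on a third interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.
Optionally, the at least one combined feature displayed on the third interface is arranged in descending or ascending order of the corresponding influence score; the influence score characterizes the degree of influence of each combined feature on the product defect.
Optionally, displaying at least one combined feature on the third interface includes:
for each combined feature in the at least one combined feature, displaying at least 2 original parameters with the highest weights in that combined feature.
Optionally, the method further includes obtaining the at least one combined feature according to the first parameter and the second parameter, specifically including:
processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
sorting the combined features according to the influence score to obtain at least one top-ranked combined feature, and taking the original parameters corresponding to the at least one combined feature as the cause of the product defect.
Optionally, after obtaining the first parameter of each product sample in the product sample set, the method further includes:
displaying a distribution diagram of the first parameter of each product sample.
Optionally, after obtaining the second parameter of each product sample in the product sample set, the method further includes:
displaying the subordination relationship of each second parameter.
Optionally, the product sample includes a display panel mother board; the display panel mother board includes a plurality of display panels.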
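According to a third aspect of the embodiments of the present disclosure, a data processing apparatus is provided, the apparatus including:
a sample set obtaining module, configured to obtain a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter characterizes the degree of defect of the product sample, and the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
a combined feature acquisition module, configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
an influence score obtaining module, configured to obtain, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
a defect cause acquisition module, configured to sort the combined features according to the influence score to obtain at least one top-ranked combined feature, and to take the original parameters corresponding to the at least one combined feature as the cause of the product defect.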
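According to a fourth aspect of the embodiments of the present disclosure, a data processing apparatus is provided, the apparatus including:
a first parameter obtaining module, configured to obtain the first parameter of each product sample in the product sample set in response to a first input by the user on a first interface; the first parameter characterizes the degree of defect of the product sample;
a second parameter acquisition module, configured to obtain the second parameter of each product sample in the product sample set in response to a second input by the user on a second interface; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
a defect cause acquisition module, configured to display at least one combined feature on a third interface in response to a third input by the user on the second interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.
According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory to implement the above method.
According to a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when the executable computer program in the storage medium is executed by a processor, the above method can be implemented.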
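The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
As can be seen from the above embodiments, the solution provided by the embodiments of the present disclosure obtains combined features of a specified dimension for each product sample, where the dimension of these combined features is smaller than the dimension of the parameters in each product sample, which reduces the dimensionality of the product sample data. In addition, the combined features of the specified dimension may be combinations of similar original parameters, which preserves the original information of the product samples while associating similar parameters with one another; this helps to quickly locate the cause of a product defect and improves detection efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.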
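Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
FIG. 1 is a block diagram of a data processing system according to an exemplary embodiment.
FIG. 2 is a block diagram of another data processing system according to an exemplary embodiment.
FIG. 3 is a block diagram of an electronic device according to an exemplary embodiment.
FIG. 4 is a block diagram of another electronic device according to an exemplary embodiment.
FIG. 5 is a flowchart of a data processing method according to an exemplary embodiment.
FIG. 6A is a schematic diagram of a first interface according to an exemplary embodiment.
FIG. 6B is a schematic diagram of obtaining product samples according to an exemplary embodiment.
FIG. 6C is a schematic diagram of obtaining the distribution of the first parameter according to an exemplary embodiment.
FIG. 7A is a schematic diagram of setting a defect type according to an exemplary embodiment.
FIG. 7B is a schematic diagram of selecting a defect type according to an exemplary embodiment.
FIG. 8 is a schematic diagram of setting a subordination relationship according to an exemplary embodiment.
FIG. 9 is a schematic diagram of a third interface displaying at least one combined feature according to an exemplary embodiment.
FIG. 10 is a flowchart of another data processing method according to an exemplary embodiment.
FIG. 11 is a block diagram of a data processing apparatus according to an exemplary embodiment.
FIG. 12 is a block diagram of another data processing apparatus according to an exemplary embodiment.
Detailed Description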
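Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described below by way of example do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatuses consistent with some aspects of the present disclosure as detailed in the appended claims.
At present, a production line for industrial products includes a number of process devices, and each process device may affect product yield when it operates abnormally or its operating parameters are abnormal. When defective products are produced, production personnel need to locate the cause of the defect. However, the number of process devices in a production line, or the volume of data they generate, is large, which makes locating the cause more complex, so that locating the equipment responsible for a defect consumes a great deal of time.
An embodiment of the present disclosure provides a data processing system. As shown in FIG. 1, the data processing system 100 includes a data processing apparatus 300, a display apparatus 200, and a distributed storage apparatus 400. The data processing apparatus 300 is connected to the display apparatus 200 and to the distributed storage apparatus 400.
The distributed storage apparatus 400 is used to store production data generated by multiple pieces of sample production equipment (also called factory equipment). For example, the production data include the production records of the multiple pieces of sample production equipment; a production record includes information on the sample production equipment that multiple samples passed through during production and information on the defect types that occurred. Each sample passes through multiple pieces of sample production equipment during production, and each piece of sample production equipment participates in the production of some, and only some, of the samples.
The distributed storage apparatus stores relatively complete data (such as a database). It may include multiple hardware memories located at different physical locations (for example, in different factories or on different production lines) that exchange information with one another by wireless transmission (for example, over a network), so that the data are distributed but logically form one database based on big data technology.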
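Referring to FIG. 2, the raw data of a large number of different pieces of sample production equipment are stored in the corresponding manufacturing systems, for example in the relational databases (such as Oracle or MySQL) of systems such as YMS (Yield Management System), FDC (Fault Detection & Classification), and MES (Manufacturing Execution System). These raw data can be extracted table by table with data extraction tools (such as Sqoop or Kettle) and transferred to the distributed storage apparatus (such as the Hadoop Distributed File System, HDFS), which reduces the load on the sample production equipment and the manufacturing systems and makes it easier for the analysis equipment to read the data later.
The data in the distributed storage apparatus may be stored using the Hive tool or in HBase database format. For example, with the Hive tool the raw data are first stored in a data lake; preprocessing such as data cleaning and data conversion can then be carried out in Hive according to the application topic and scenario of the data, yielding data warehouses with different topics (such as production history, inspection data, and equipment data) and data marts with different scenarios (such as equipment analysis and parameter analysis). The data marts can then be connected to display devices, analysis devices, and so on through different API interfaces to exchange data with these devices.
Because multiple pieces of sample production equipment in multiple factories are involved, the volume of raw data is very large. For example, all the sample production equipment together may generate several hundred GB of raw data per day, and possibly tens of GB per hour.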
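In an embodiment, there are two main approaches to storing and computing over massive structured data: a grid computing approach based on a relational database management system (Relational Database Management System, RDBMS), and a big data approach based on a distributed file system (Distributed File System, DFS).
RDBMS grid computing divides a problem that requires very large computing power into many small parts, assigns these parts to many computers for separate processing, and finally combines the results. For example, Oracle RAC (Real Application Clusters) is the core grid computing technology supported by the Oracle database, in which every server can directly access all the data in the database. However, applications based on RDBMS grid computing cannot meet user requirements when the data volume is very large; for example, because the room for hardware expansion is limited, once the data grow to a sufficiently large order of magnitude, the disk input/output bottleneck makes data processing very inefficient.
DFS-based big data technology allows a large cluster to be built from many inexpensive hardware devices to process massive data. For example, Hive is a data warehouse tool built on Hadoop that can be used for extract-transform-load (ETL). Hive defines a simple SQL-like query language and also allows custom MapReduce mappers and reducers to carry out complex analysis that the default tools cannot handle. Hive has no dedicated data storage format and does not build indexes on the data; users can organize the tables freely and process the data in the database. Parallel processing over a distributed file system can therefore meet the storage and processing requirements of massive data: users can handle simple data with SQL queries and use custom functions for complex processing. Consequently, when analyzing a factory's massive data, the data in the factory databases need to be extracted into the distributed file system, which on the one hand leaves the raw data undamaged and on the other hand improves the efficiency of data analysis.
In an embodiment, the distributed storage apparatus 400 may be a single memory, multiple memories, or a collective term for multiple storage elements. For example, the memory may include random access memory (Random Access Memory, RAM) or double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM), and may also include non-volatile memory, such as disk storage or flash memory.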
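The data processing apparatus 300 is used to implement the data processing method described in any of the following embodiments. For example, the data processing apparatus 300 may obtain the production records of multiple pieces of sample production equipment, determine the influence score corresponding to each piece of sample production equipment from those production records, and determine, according to the influence scores, the original parameters related to the product defect.
The display apparatus 200 is used to display interfaces. For example, the interfaces may include the first interface, second interface, and third interface described below. For example, the display apparatus 200 may display the processing results of the data processing apparatus 300.
In an embodiment, the display apparatus may be a display, or a product that includes a display, such as a television, a computer (all-in-one or desktop), a tablet computer, a mobile phone, or an electronic picture frame. In an embodiment, the display apparatus may be any apparatus that displays an image, whether moving (for example, video) or fixed (for example, a still image), and whether text or picture. More specifically, the embodiments may be implemented in or associated with a variety of electronic devices, such as (but not limited to) game consoles, television monitors, flat panel displays, computer monitors, automotive displays (for example, odometer displays), navigators, cockpit controllers and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, packaging, and aesthetic structures (for example, a display of an image on a piece of jewelry).
In an embodiment, the display apparatus described herein may include one or more displays, including one or more terminals with a display function, so that the data processing apparatus can send its processed data (for example, influence parameters) to the display apparatus, which then displays them. In other words, through the interface of the display apparatus (that is, the user interaction interface), the user can fully interact with the sample defect cause analysis system (controlling it and receiving results).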
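An embodiment of the present disclosure provides an electronic device. For example, the electronic device may be a computer. As shown in FIG. 3, the electronic device 500 includes a data processing apparatus 300 and a display apparatus 200. The display apparatus 200 is connected to the data processing apparatus 300.
The data processing apparatus 300 is used to implement the data processing method described in any of the following embodiments. The display apparatus 200 is used to display interfaces. For example, the display apparatus 200 is used to display the processing results of the data processing apparatus 300.
It should be noted that the data processing apparatus and display apparatus in the electronic device are similar to those in the data processing system described above; for their details, refer to the preceding description, which is not repeated here.
In some embodiments, as shown in FIG. 4, the data processing apparatus 300 includes a memory 301 and a processor 302. The memory 301 is connected to the processor 302. In an embodiment, the processor and the memory are connected through, for example, an I/O interface, so that information can be exchanged.
The memory 301 stores one or more computer programs that can run on the processor 302.
When the processor 302 executes the computer program, the data processing apparatus 300 implements the data processing method described in any of the following embodiments.
In an embodiment, the processor 302 may be a single processor or a collective term for multiple processing elements. For example, the processor 302 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the programs of the present disclosure, such as one or more microprocessors. As another example, the processor 302 may be a programmable device, such as a CPLD (Complex Programmable Logic Device), an EPLD (Erasable Programmable Logic Device), or an FPGA (field-programmable gate array).
The memory 301 may be a single memory or a collective term for multiple storage elements, and is used to store executable program code and the like. The memory 301 may include random access memory and may also include non-volatile memory, such as disk storage or flash memory.
The memory 301 is used to store the application program code for executing the solution of the present disclosure, and execution is controlled by the processor 302. The processor 302 is used to execute the application program code stored in the memory 301 so as to control the data processing apparatus 300 to implement the data processing method provided by any of the following embodiments of the present disclosure.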
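An embodiment of the present disclosure further provides a data processing method; for example, the data processing method can be applied to the electronic device, data processing system, and data processing apparatus described above. As shown in FIG. 5, the data processing method includes steps 51 to 54:
In step 51, a product sample set is obtained; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter characterizes the degree of defect of the product sample, and the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes.
In this embodiment, the electronic device may obtain a product sample set, which includes multiple product samples. In an embodiment, this embodiment may be used on a display panel production line; for example, it may be used during the production of display panels (such as liquid crystal display panels or organic light-emitting diode display panels) to determine the correlation between each process device of the production line and a defect type. Of course, the embodiments of the present disclosure may also be used for other products. In an embodiment, the product sample includes a display panel mother board (glass), and the display panel mother board includes multiple display panels (panel). For example, the display panel mother board further includes a substrate on which the multiple display panels are arranged. In an embodiment, the substrate may include a rigid substrate such as glass, or a flexible substrate such as PI (polyimide); it may also include thin films, such as a buffer layer, arranged on the rigid or flexible substrate.
It should be noted that a "defect" in the embodiments of the present disclosure refers to a quality flaw in a product sample that may lower the quality of the sample or even cause it to be scrapped, or may require the sample to be reworked or repaired. In other words, product sample defects in the present disclosure can be divided into different types as needed. For example, defects may be classified by their direct effect on sample performance, such as bright-line defects, dark-line defects, and firefly defects (hot spot); or by their specific cause, such as signal-line short-circuit defects and alignment defects; or by their general cause, such as array process defects and color filter process defects; or by their severity, such as defects that lead to scrapping and defects that lower quality. Alternatively, defect types may not be distinguished at all: a sample is considered defective if it has any defect and defect-free otherwise.
The defect type of the sample set is a single defect type; that is, the multiple samples in the product sample set share the same defect type. The data processing method of this embodiment is thus carried out for one defect type at a time; in other words, each run obtains the cause (that is, the parameters) of one product defect type.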
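In an embodiment, referring to FIG. 6A, the display apparatus 200 may display a first interface 201, on which the user makes a first input, such as a time range from time T1 to time T2 (for example, one day). In response to this input, the data processing apparatus 300 obtains the product sample set within that time range, giving the selection result shown in FIG. 6B. The user may also enter a focus threshold (defect_ratio_glass) on the first interface 201 to divide the selected product samples and obtain the first parameter; the result is shown in FIG. 6C. Referring to FIG. 6C, after the focus threshold is set, the electronic device may display a distribution diagram of the first parameter of each product sample.
When the user selects different focus thresholds, the data table shown in Table 1 can be obtained.
Table 1: Data table when the selected defect is Defect_code1
GlassID     Check Step    Defect_Name    Ratio   END_TIME
GlassID 1   Check Step1   Defect_code1   0.022   2021-01-24 08:25:03
GlassID 2   Check Step1   Defect_code1   0.264   2021-01-28 07:43:11
GlassID m   Check Step1   Defect_code1   0.011   2021-02-11 20:37:45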
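In an embodiment, referring to FIG. 7A and FIG. 7B, the display apparatus 200 displays an interface 202, on which the user makes an input; in response, the data processing apparatus 300 determines the defect type of the sample set. For example, the input is a defect type, which is the defect type to be analyzed. For example, referring to FIG. 7A, the interface 202 may display a first input box, and the user's first input on the interface 202 may be to type the defect type directly into the first input box to determine the defect type of the product sample set. As another example, the electronic device or data processing system may be preconfigured with a database containing multiple defect types; referring to FIG. 7B, the interface 202 may display a first selection box containing options for multiple defect types (such as defect type A, defect type B, and defect type C in FIG. 7B), and the user's first input on the interface 202 may be to choose among these options to determine the defect type of the sample set. It should be noted that this step can be chosen according to actual needs and is not limited here.
Each product sample includes a first parameter and a second parameter. The first parameter characterizes the degree to which the product sample exhibits the defect type obtained through the interface 202; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes.
In an embodiment, the first parameter of a product sample may indicate whether the sample is a good sample or a defective sample with respect to the defect type; for example, from the first parameter of a product sample it can be determined whether that sample is a good sample (a positive sample) or a defective sample (a negative sample) for the defect type. For example, for the defect type of the sample set, the positive and negative samples among the multiple product samples can be obtained from their first parameters.
In an embodiment, the first parameter of a product sample characterizes the degree of defect of the product sample. For example, when the product sample is a display panel mother board, the ratio of the number of defective display panels of the defect type to the total number of display panels on the mother board serves as the defect degree value in the first parameter; this ratio may be called the defect ratio of the sample. Alternatively, the number of defective display panels of the defect type on the mother board serves as the defect degree value. In this case, the larger the defect degree value in the first parameter, the greater the degree of defect for that defect type.
In another embodiment, when the product sample is a display panel mother board, the ratio of the number of display panels other than the defective display panels of the defect type to the total number of display panels serves as the defect degree value in the first parameter of the sample; alternatively, the number of display panels other than the defective display panels of the defect type serves as the defect degree value. The value may also be the ratio of good pixels to the total number of pixels in a display panel. In this case, the smaller the defect degree value in the first parameter, the greater the degree of defect for that defect type.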
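The two conventions for the first parameter described above (a defect ratio where larger means worse, and a good ratio where smaller means worse) can be sketched as follows. This is only an illustration: the function names and the boolean-list input format are assumptions and do not come from the disclosure.

```python
def defect_ratio(panel_is_defective):
    """Fraction of panels on one mother board (glass) showing the analysed defect type."""
    panels = list(panel_is_defective)
    if not panels:
        raise ValueError("a mother board must contain at least one panel")
    return sum(1 for flag in panels if flag) / len(panels)


def good_ratio(panel_is_defective):
    """Complementary indicator: fraction of panels without the defect (smaller means worse)."""
    return 1.0 - defect_ratio(panel_is_defective)


# One mother board with four panels, one of which shows the defect type.
glass = [True, False, False, False]
print(defect_ratio(glass), good_ratio(glass))
```

Whichever convention is chosen has to be applied to every product sample in the set, since the later scoring step compares first parameters across samples.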
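It can be understood that many products (such as display panels) are produced on production lines. Each production line includes multiple process stations, and each process station performs a certain treatment on the product (including semi-finished products), such as cleaning, deposition, exposure, etching, cell assembly, or inspection. Each process station usually has several pieces of sample production equipment (that is, process devices) that perform the same treatment; although the treatment is the same in theory, the actual result is not exactly the same because the devices differ in model, condition, and so on. The production of each sample therefore passes through multiple process stations, and different samples may pass through different process stations; samples passing through the same process station may also be handled by different pieces of sample production equipment at that station. Consequently, in a production line, every piece of sample production equipment participates in the production of some samples but not of all samples; that is, each piece of sample production equipment participates in the production of some, and only some, of the samples.
In an embodiment, the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes and may include: the name, model, or code of the sample production equipment the sample passed through; the name of the process station, production line, or factory where the equipment is located; the time at which the equipment output the sample; and so on. Each product sample corresponds to multiple pieces of sample production equipment, so the second parameter contains multiple original parameters for the multiple pieces of equipment the sample passed through. Technicians can choose suitable second parameters for the specific scenario; as long as the R-dimensional combined features or the subsequent influence scores can be obtained from the second parameter, the corresponding solution falls within the protection scope of the present disclosure.
In step 52, the second parameter is processed based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect.
It should be noted that the cause of a product defect may be a piece of sample production equipment or a process parameter; that is, the original parameters may include equipment or parameters. After the sample production equipment is selected in step 51, the electronic device may also display the second interface shown in FIG. 8. The user can make a second input on the second interface, and in response the electronic device can establish the subordination relationship between sample production equipment and process parameters, such as a DataTag-Step-Process-Parameter relationship, where DataTag may denote a product sample (such as a GlassID), Step may denote the corresponding sample production equipment, Process may denote one processing step in that equipment, and Parameter may denote one original parameter in that processing step (such as temperature, pressure, or flow). In other words, the electronic device can now obtain the second parameter of each product sample, and FIG. 8 also shows the subordination relationship of one such second parameter.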
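As a rough illustration of the DataTag-Step-Process-Parameter relationship, the sketch below stores raw readings in a nested mapping and flattens them into one "Step-Process-Parameter" column per product sample. The record layout, the function names, and the example values are assumptions made for illustration only.

```python
from collections import defaultdict


def build_hierarchy(records):
    """records: iterable of (data_tag, step, process, parameter, value) tuples."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for data_tag, step, process, parameter, value in records:
        tree[data_tag][step][process][parameter] = value
    return tree


def flatten(tree, data_tag):
    """Return one flat 'Step-Process-Parameter' -> value mapping for a product sample."""
    return {
        f"{step}-{process}-{parameter}": value
        for step, processes in tree[data_tag].items()
        for process, parameters in processes.items()
        for parameter, value in parameters.items()
    }


# Hypothetical readings for one glass.
records = [
    ("Glass1", "Step1", "Process1", "Temperature", 350.0),
    ("Glass1", "Step1", "Process2", "Pressure", 1.2),
]
print(flatten(build_hierarchy(records), "Glass1"))
```

The flattened mapping is one row of the second-parameter matrix; stacking the rows of all product samples gives the M-by-n input used by the dimensionality reduction step.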
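In practice, after establishing the subordination relationship, the user can click the analysis button; the electronic device can then analyze the cause of the product defect based on the R-dimensional combined features and the first parameter, and finally display the third interface shown in FIG. 9.
In this embodiment, the electronic device may obtain the combined features of the specified dimension for the product sample set based on the second parameter. In other words, the electronic device may obtain the combined features of the specified dimension for each product sample in the product sample set.
In an example, the electronic device may obtain the combined features of the specified dimension based on a preset dimensionality reduction algorithm. The dimensionality reduction algorithm includes at least one of the following: principal component analysis (PCA), the linear dimensionality reduction method LDA, locally linear embedding (LLE), and Laplacian eigenmaps (LEP). Technicians can choose a suitable dimensionality reduction algorithm for the specific scenario, and the corresponding solution falls within the protection scope of the present disclosure.
For example, the electronic device may select K principal component features from the second parameter and obtain the second combined feature of each product sample based on them; the cumulative contribution value of these K features to the second parameter exceeds a preset contribution value threshold. For example, the electronic device may obtain the average value of each original parameter over the product sample set and subtract that average value from the corresponding original parameter of each product sample to obtain the new value of each original parameter. Centering the original parameters in this way not only reduces their magnitude but also exposes their range of variation, which benefits subsequent computation. For example, this embodiment may use principal component analysis (Principal Component Analysis, PCA), a multivariate statistical method, to obtain the combined features of the specified dimension.
For example, the product sample set includes M product samples, and each product sample has n-dimensional features $\{V_1, V_2, \ldots, V_n\}$,
Figure PCTCN2021083429-appb-000002
The average value of each original parameter is computed over all product samples (GLASS), for example
Figure PCTCN2021083429-appb-000003
Then the average value is subtracted from the original parameters of each GLASS to obtain the centered new values of the original parameters $\{X_1, X_2, \ldots, X_n\}$,
Figure PCTCN2021083429-appb-000004
Figure PCTCN2021083429-appb-000005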
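The formula images for the averaging and centering step are not reproduced in this text. A sketch that is consistent with the surrounding description is given below; the element notation $v_{mi}$ (the value of the i-th original parameter for the m-th product sample) is an assumption introduced here and may differ from the notation in the original figures.

```latex
% Assumed notation: v_{mi} is the i-th original parameter of the m-th product sample.
\bar{V}_i = \frac{1}{M} \sum_{m=1}^{M} v_{mi},
\qquad
X_i = V_i - \bar{V}_i ,
\qquad i = 1, 2, \ldots, n .
```

After this step every centered column $X_i$ has zero mean, which is what the covariance and eigenvalue steps that follow rely on.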
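Then, the electronic device may obtain the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix characterizes the degree of similarity between the two original parameters.
The covariance matrix is computed for the n-dimensional features of the above step.
For example, when n=2, the covariance matrix of $x_1$ and $x_2$ is as follows:
Figure PCTCN2021083429-appb-000006
Following the same idea, the covariance of any two parameters of the product samples can be obtained with the following formula:
Figure PCTCN2021083429-appb-000007
The covariance matrix corresponding to the product samples is shown in Table 2.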
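Table 2: Correlation coefficient matrix
        x_1     x_2     x_n
x_1     1.00    0.86    0.37
x_2     0.86    1.00    0.69
x_n     0.37    0.69    1.00
It should be noted that the diagonal of the covariance matrix holds the variance of each original parameter and the off-diagonal entries hold the covariances; a covariance measures the degree to which two original parameters vary together. The larger the absolute value of the covariance, the greater the influence of the two parameters on each other, and vice versa.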
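Table 2 has 1.00 on its diagonal, which is how a covariance matrix looks once each parameter has been scaled to unit variance, i.e. a correlation coefficient matrix. The sketch below computes both matrices with NumPy. The M-1 denominator is an assumption, since the formula image is not reproduced here, and the function name and sample data are hypothetical.

```python
import numpy as np


def covariance_and_correlation(V):
    """V: (M samples, n parameters). Returns (covariance, correlation) matrices."""
    V = np.asarray(V, dtype=float)
    X = V - V.mean(axis=0)                 # centre each original parameter
    cov = X.T @ X / (V.shape[0] - 1)       # assumed sample covariance (M - 1 denominator)
    std = np.sqrt(np.diag(cov))            # diagonal of cov holds the variances
    corr = cov / np.outer(std, std)        # rescale to unit variance: diagonal becomes 1
    return cov, corr


# Hypothetical readings of two parameters on four samples.
V = [[350.0, 1.2], [352.0, 1.3], [349.0, 1.1], [355.0, 1.5]]
cov, corr = covariance_and_correlation(V)
print(np.round(corr, 2))
```

A parameter that is constant across all samples has zero variance and would cause a division by zero here, so such columns would need to be dropped beforehand.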
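After that, the electronic device may obtain the eigenvalues and eigenvectors of the covariance matrix, and the cumulative contribution value corresponding to each eigenvalue.
For example, the eigenvalues and eigenvectors of the covariance matrix are found from $Cu = \lambda u$. There are n eigenvalues $\lambda$, each $\lambda_i$ corresponding to an eigenvector $u_i$, $i = 1, 2, \ldots, n$, with $\lambda_1 > \lambda_2 > \ldots > \lambda_n$. In this case
Figure PCTCN2021083429-appb-000008
is the eigenvector corresponding to the i-th component, as shown in Table 3.
Table 3: Relationship between eigenvalues and eigenvectors
Figure PCTCN2021083429-appb-000009
After that, the electronic device may obtain the eigenvalues and eigenvectors whose cumulative contribution value exceeds the preset contribution value threshold, obtaining K principal component features. Taking a contribution value threshold of 80% as an example, the electronic device may select the first k eigenvalues and eigenvectors whose cumulative contribution rate reaches 80%, $\{(\lambda_1, u_1), (\lambda_2, u_2), \ldots, (\lambda_k, u_k)\}$, thereby obtaining k principal component features.
Finally, the electronic device may obtain the projection onto the eigenvectors of each original parameter of each product sample after it has been updated to its new value, obtaining the K-dimensional combined features of the product sample set, or in other words the K-dimensional combined features of each product sample; these K-dimensional combined features can serve as the combined features of the specified dimension in this example.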
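The PCA steps described above (centering, covariance, eigen-decomposition, cumulative contribution threshold, projection) can be sketched with NumPy as follows. This is a minimal illustration rather than the disclosed implementation; the function name, the 0.8 default, and the synthetic data are assumptions.

```python
import numpy as np


def pca_combined_features(V, contribution_threshold=0.8):
    """V: (M samples, n >= 2 original parameters). Returns (F, U, eigenvalues, cumulative)."""
    V = np.asarray(V, dtype=float)
    X = V - V.mean(axis=0)                           # new (centred) values
    C = np.cov(X, rowvar=False)                      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # eigh: C is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]                # sort so that lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()  # cumulative contribution values
    # Smallest k whose cumulative contribution reaches the threshold.
    k = min(int(np.searchsorted(cumulative, contribution_threshold)) + 1, len(eigvals))
    U = eigvecs[:, :k]                               # columns hold the per-parameter weights
    F = X @ U                                        # projection: K-dimensional combined features
    return F, U, eigvals[:k], cumulative[:k]


# Synthetic data: the second parameter nearly duplicates the first, the third is noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
V = np.hstack([t, 2 * t + 0.01 * rng.normal(size=(50, 1)), 0.01 * rng.normal(size=(50, 1))])
F, U, lam, cum = pca_combined_features(V)
print(F.shape, np.round(cum, 3))
```

Each column of `U` is one eigenvector, so its entries are the weights of the original parameters within that combined feature; each row of `F` is the combined-feature representation of one product sample.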
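For example, for the new centered cause variables, the k principal component features after projection are:
Figure PCTCN2021083429-appb-000010
Moreover, each of the k principal component features satisfies the following formula:
Figure PCTCN2021083429-appb-000011
where
Figure PCTCN2021083429-appb-000012
Figure PCTCN2021083429-appb-000013
are the weights of the n original parameters in the j-th combined feature; that is, the j-th combined feature represents most of the information of these original features, and these original features are highly similar to one another. In an example, at least 2 original parameters with the highest weights may be selected and grouped together; if that combined feature is determined to be one of the at least one combined feature described in step 54, these highest-weight original parameters are shown in the subsequent display, making it easier for the user to quickly locate the cause of the defect.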
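Selecting the highest-weight original parameters of each combined feature can be sketched as below. Ranking by the absolute value of the weight is an assumption made here (the sign of an eigenvector is arbitrary); the function and the parameter names are hypothetical.

```python
import numpy as np


def top_weighted_parameters(U, names, top=2):
    """U: (n parameters, K components) weight matrix; names: the n original parameter names."""
    U = np.asarray(U, dtype=float)
    result = []
    for j in range(U.shape[1]):
        idx = np.argsort(-np.abs(U[:, j]))[:top]   # indices of the largest |weight| first
        result.append([(names[i], float(U[i, j])) for i in idx])
    return result


# Hypothetical weights of three original parameters in two combined features.
names = ["Step1-Process1-Temperature", "Step2-Process1-Temperature", "Step3-Process2-Pressure"]
U = [[0.1, 0.9], [-0.8, 0.3], [0.6, 0.2]]
for j, params in enumerate(top_weighted_parameters(U, names)):
    print(f"F{j + 1}:", params)
```

The returned pairs keep the signed weight, so a display layer can still show whether a parameter moves with or against the combined feature.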
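It should also be noted that some of the K-dimensional combined features obtained by PCA may be correlated only in the data, with no corresponding relationship in the process flow. To eliminate such cases, in another example the electronic device may obtain a first combined feature. For example, the electronic device may obtain the keyword corresponding to each original parameter in the second parameter. A keyword explains the value taken by an original parameter, such as pressure, temperature, or flow; in a specific implementation, a keyword can be understood as a name of the original parameter. The electronic device may then merge the keywords according to preset process relationships. The process relationships may include the installation position of the production equipment, the order within the production flow, the process steps involved, and so on, and are not limited here. In this merging, original parameters that cause the same defect type can be merged into the same group. Taking the merging of original parameters with the same keyword as an example, the electronic device may merge original parameters from the same or different processes of the same sample production equipment, for instance merging the temperature parameters of different processes into one group representing temperature. It can be understood that the merging only divides the N original parameters into different groups and does not change the dimension of the second parameter; that is, the data remain N-dimensional. Table 4 shows a data table after keyword-based merging.
Table 4: Data table obtained after keyword-based combination
Figure PCTCN2021083429-appb-000014
Referring to Table 4, group S1 contains 3 parameters, namely Step1-Process1-Paramter1_value, Step2-Process2-Paramter2_value, and Step3-Process3-Paramter3_value. Group St contains 2 parameters, namely Step(n-1)-Process(n-1)-Paramter(n-1)_value and Step n-Process n-Paramter n_value.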
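A minimal sketch of keyword-based merging follows, assuming the keyword appears as a substring of the parameter name. The matching rule, the function name, and the example names are assumptions; the disclosure leaves the process relationships used for merging open.

```python
from collections import defaultdict


def group_by_keyword(parameter_names, keywords):
    """Assign each original parameter to the first keyword found in its name."""
    groups = defaultdict(list)
    for name in parameter_names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                groups[keyword].append(name)
                break                      # a parameter joins one group at most
    return dict(groups)


# Hypothetical parameter names.
names = [
    "Step1-Process1-Temperature",
    "Step1-Process2-Temperature",
    "Step2-Process1-Pressure",
    "Step2-Process1-Flow",
]
print(group_by_keyword(names, ["temperature", "pressure"]))
```

Grouping only labels the parameters: the number of original parameters, and therefore the dimension of the second parameter, stays the same, as the text notes.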
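In this example, after obtaining the first combined feature and the K-dimensional combined features, the electronic device may obtain their intersection to obtain the R-dimensional combined features of the product sample set, or in other words the R-dimensional combined features of each product sample in the set, and use the R-dimensional combined features as the combined features of the specified dimension. For example, the electronic device may compare the t keyword-matched combined features with the K-dimensional combined features one by one, ensuring that each combination is similar in name (that is, in process equipment and process parameter category) and also correlated in numerical analysis; principal components that do not satisfy both conditions are removed, yielding the final R-dimensional combined features. Considering that each combined feature in the R-dimensional combined features may include at least one original parameter, to simplify subsequent computation, in this example two original parameters may be selected for each combined feature, as shown in Table 5.
Table 5: Correspondence between the R-dimensional combined features after combination and the focus threshold
Figure PCTCN2021083429-appb-000015
Referring to Table 5, Fr denotes the r-th combined feature among the R-dimensional combined features, which contains the original parameters Step a-Process a-Paramter a_value and Step b-Process b-Paramter b_value.
It should be noted that, in practice, the 2 original parameters with the largest weights may also be selected to represent each combined feature among the K-dimensional combined features; if these 2 original parameters both belong to any one of the t combined features, that combined feature is kept. After comparing them one by one, some of the K-dimensional combined features are removed and the R-dimensional combined features are obtained. Technicians can choose a suitable approach for the specific scenario, and the corresponding solution falls within the protection scope of the present disclosure.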
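The variant described in the last paragraph (keep a PCA combined feature only if its two highest-weight original parameters fall inside one keyword group) can be sketched as follows. The function name and the short parameter names are hypothetical.

```python
def filter_components(top_params_per_component, keyword_groups):
    """Return indices of PCA components whose top parameters all lie in one keyword group."""
    group_sets = [set(members) for members in keyword_groups.values()]
    kept = []
    for j, params in enumerate(top_params_per_component):
        # set(params) <= members is the subset test: every top parameter is in the group.
        if any(set(params) <= members for members in group_sets):
            kept.append(j)
    return kept


# Hypothetical top-2 parameters of three PCA components, and two keyword groups.
top_params = [["T1", "T2"], ["T1", "P1"], ["P1", "P2"]]
groups = {"temperature": ["T1", "T2", "T3"], "pressure": ["P1", "P2"]}
print(filter_components(top_params, groups))
```

Here the second component mixes a temperature and a pressure parameter, so it is treated as a purely numerical correlation and dropped; the remaining components form the R-dimensional combined features.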
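In step 53, the influence score of each dimension's combined feature is obtained based on the first parameter and the combined features of the specified dimension; the influence score characterizes the degree of influence of each combined feature on the product defect.
In this embodiment, the electronic device may obtain the combined feature vector of each dimension based on the combined features of the specified dimension (K-dimensional or R-dimensional); the combined feature vector of each dimension contains the combined feature of that same dimension for every product sample. In other words, the R-dimensional combined features of a product sample are features whose elements are built from the original parameters; the combined feature of each dimension is extracted from every product sample and reassembled into one feature vector, giving the combined feature vector of that dimension. For example, for any combined feature vector in a given product sample set X
Figure PCTCN2021083429-appb-000016
Then, the electronic device may calculate the purity index corresponding to the combined feature of each dimension, obtaining as many purity indices as there are product samples in the product sample set; the purity index represents the degree of influence of each combined feature on the product defect.
In some embodiments, the yield purity index includes at least one of information gain, information gain ratio, and the Gini coefficient.
In an embodiment, the purity index may be characterized by information entropy: the smaller the information entropy, the higher the purity. The lower the purity, the higher the uncertainty of the feature, that is, the smaller the influence of the sample production equipment on the occurrence of the defect type in the samples; the higher the purity, the lower the uncertainty of the feature, that is, the greater the influence of the sample production equipment on the occurrence of the defect type in the product samples. In some embodiments of the present disclosure, the purity index may also be characterized by the Gini coefficient: the smaller the Gini coefficient, the higher the purity of the sample set.
The yield purity index of a piece of sample production equipment over multiple product samples characterizes the purity with which the defect type occurs on those samples. For example, the lower the yield purity index of a piece of sample production equipment, the higher its uncertainty regarding the occurrence of the defect type and the smaller its influence on that occurrence; the higher the yield purity index, the lower the uncertainty and the greater the influence.
In an embodiment, for the process devices or sample production equipment a sample passes through during production, the attributes that cause the defect type need to be located in the manufacturing process of each step and in the large volume of data from the sample production equipment; that is, the branching node attributes of a decision tree are ranked by feature importance. The embodiments of the present disclosure therefore borrow the idea of decision tree construction, treat the multiple pieces of sample production equipment as features, and rank the features based on the purity index.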
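It should be noted that the embodiments of the present disclosure do not directly build a decision tree, and the technical problem solved is not the prediction problem that decision trees address. Rather, the idea of purity improvement in decision trees is combined with the yield analysis problem and, based on big data technology, used to quickly locate the root cause affecting the degree of defect. In other words, in this embodiment a combined feature is treated as a child node of a decision tree, that is, a binary-classification feature attribute, and it is judged whether a value is the optimal cut point (cutpoint). The Gini coefficient, the impurity measure used in CART trees, is used to compute the degree of influence and the importance of each feature for the whole sample set; the smaller the Gini coefficient, the lower the uncertainty and the better the value is as a cut point. In the defective-or-not classification problem of the embodiments of the present disclosure, the number of categories is K=2, and when the purity index is the Gini coefficient it is calculated as follows:
$$\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|X|} \right)^2$$
where |X| denotes the number of product samples in the data group on the designated side of the cut point when any value of a combined feature vector in the product sample set X is taken as the cut point; K denotes the number of defect classification categories, here 2; and |C_k| denotes the number of product samples of the k-th category in the data group on the designated side of the cut point. Taking a display panel as the product sample, in this example a product sample whose value is less than the cut point does not affect the defect type of the product sample, while a product sample whose value is greater than or equal to the cut point (the designated side of the cut point mentioned above) does affect the defect type; k=1 then denotes the number of good products affecting the defect type, and k=2 denotes the number of defective products affecting the defect type. Of course, in some scenarios the values less than the cut point correspond to the designated side; technicians can choose according to the specific scenario, which is not limited here.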
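For the M values of the combined feature vector Fj, each value in Fj (for example F11) is taken in turn as the cut point and the data are divided into two groups by magnitude. After the division, the group contains subbad defective Glass sheets and subgood defect-free Glass sheets, which gives the contingency table shown in Table 6; M Gini values can then be computed from the contingency tables.
Table 6: Contingency table
Group          Defective    Good
>= cutpoint    subbad       subgood
< cutpoint     B-subbad     G-subgood
After that, the electronic device may obtain the minimum value of the purity indices corresponding to the combined feature of each dimension. The smallest of the M Gini values is taken, which corresponds to the optimal cut point (cutpoint). In other words, this minimum value characterizes the credibility threshold of the first parameter.
Finally, the electronic device may obtain the influence score of the corresponding combined feature according to the minimum value; that is, the electronic device may obtain the influence score (score) of the j-th combined feature as (1 - the minimum Gini coefficient).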
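The scoring procedure above can be sketched as follows: each value of a combined feature vector is tried as the cut point, the Gini coefficient is computed on the designated (>= cutpoint) side, and the score is one minus the smallest Gini. The function names are hypothetical, and the boolean labels are assumed to come from comparing each sample's first parameter with the focus threshold.

```python
def gini_on_designated_side(values, is_defective, cutpoint):
    """Gini coefficient of the samples whose combined-feature value is >= cutpoint."""
    side = [bad for v, bad in zip(values, is_defective) if v >= cutpoint]
    total = len(side)                    # |X|; never zero when cutpoint is one of the values
    sub_bad = sum(1 for bad in side if bad)
    sub_good = total - sub_bad
    return 1.0 - (sub_bad / total) ** 2 - (sub_good / total) ** 2


def influence_score(values, is_defective):
    """Return (score, best cut point) for one combined feature vector F_j."""
    ginis = [gini_on_designated_side(values, is_defective, c) for c in values]
    best = min(range(len(values)), key=lambda i: ginis[i])   # first minimum wins
    return 1.0 - ginis[best], values[best]


# Hypothetical combined-feature values of four glasses and their defect labels.
values = [1, 2, 3, 4]
is_defective = [False, False, True, True]
print(influence_score(values, is_defective))
```

One caveat of taking the formula literally: when the largest value is used as the cut point, the designated side may hold a single sample and is then trivially pure, so an implementation might additionally require a minimum group size, which is a consideration beyond what the text specifies.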
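In step 54, the combined features are sorted according to the influence score to obtain at least one top-ranked combined feature, and the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect.
In this embodiment, the electronic device may sort the corresponding R-dimensional combined features according to the influence score, in descending or ascending order, thereby obtaining the degree of influence of the original parameters in each combined feature on the first parameter of the product samples; FIG. 9 shows the result for the 2 original parameters with the highest weights. Referring to FIG. 9, in the third interface, for the current defect type the causes of the defect may include, for product sample 1, step3-process3-param3 and step4-process4-param4. In this way, the user can clearly locate the top-ranked combined features (that is, original parameters) and investigate and handle the defect in a targeted manner, which improves detection efficiency.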
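The ranking in step 54 amounts to sorting the combined features by influence score and keeping the top entries. A minimal sketch follows; the function name, the feature labels, and the score values are hypothetical.

```python
def rank_combined_features(scores, top_n=1, descending=True):
    """scores: mapping of combined-feature label -> influence score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:top_n]


# Hypothetical influence scores of three combined features.
scores = {"F1": 0.62, "F2": 0.91, "F3": 0.75}
print(rank_combined_features(scores, top_n=2))
```

Each returned label can then be mapped to the highest-weight original parameters of that combined feature for display on the third interface.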
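Thus, the solution provided by the embodiments of the present disclosure obtains combined features of a specified dimension for each product sample, where the dimension of these combined features is smaller than the dimension of the parameters in each product sample, which reduces the dimensionality of the product sample data. In addition, the combined features of the specified dimension may be combinations of similar original parameters, which preserves the original information of the product samples while associating similar parameters with one another; this helps to quickly locate the cause of a product defect and improves detection efficiency.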
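An embodiment of the present disclosure further provides a data processing method; referring to FIG. 10, the method includes:
In step 101, in response to a first input by the user on a first interface, obtaining the first parameter of each product sample in the product sample set; the first parameter characterizes the degree of defect of the product sample;
In step 102, in response to a second input by the user on a second interface, obtaining the second parameter of each product sample in the product sample set; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
In step 103, in response to a third input by the user on the second interface, displaying at least one combined feature on a third interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.
In an embodiment, the at least one combined feature displayed on the third interface is arranged in descending or ascending order of the corresponding influence score; the influence score characterizes the degree of influence of each combined feature on the product defect.
In an embodiment, displaying at least one combined feature on the third interface includes:
for each combined feature in the at least one combined feature, displaying at least 2 original parameters with the highest weights in that combined feature.
In an embodiment, the method further includes obtaining the at least one combined feature according to the first parameter and the second parameter, specifically including:
processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
sorting the combined features according to the influence score to obtain at least one top-ranked combined feature, and taking the original parameters corresponding to the at least one combined feature as the cause of the product defect.
In an embodiment, after obtaining the first parameter of each product sample in the product sample set, the method further includes:
displaying a distribution diagram of the first parameter of each product sample.
In an embodiment, after obtaining the second parameter of each product sample in the product sample set, the method further includes:
displaying the subordination relationship of each second parameter.
In an embodiment, the product sample includes a display panel mother board; the display panel mother board includes a plurality of display panels.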
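An embodiment of the present disclosure further provides a data processing apparatus; referring to FIG. 11, the apparatus includes:
a sample set obtaining module 111, configured to obtain a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter characterizes the degree of defect of the product sample, and the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
a combined feature acquisition module 112, configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
an influence score obtaining module 113, configured to obtain, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
a defect cause acquisition module 114, configured to sort the combined features according to the influence score to obtain at least one top-ranked combined feature, and to take the original parameters corresponding to the at least one combined feature as the cause of the product defect.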
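In an embodiment, the combined feature acquisition module includes:
an average value obtaining unit, configured to obtain the average value of each original parameter over the product sample set and to subtract that average value from the corresponding original parameter of each product sample, obtaining the new value of each original parameter in the product sample set;
a covariance acquisition unit, configured to obtain the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix characterizes the degree of similarity between the two original parameters;
a contribution value obtaining unit, configured to obtain the eigenvalues and eigenvectors of the covariance matrix and the cumulative contribution value corresponding to each eigenvalue; each eigenvector includes the weight corresponding to each original parameter;
an eigenvalue acquiring unit, configured to obtain the eigenvalues and eigenvectors whose cumulative contribution value exceeds a preset contribution value threshold, obtaining K principal component features;
a combined feature acquisition unit, configured to obtain the projection onto the eigenvectors of each original parameter of each product sample after it has been updated to its new value, obtaining the K-dimensional combined features of the product sample set, and to use the K-dimensional combined features as the combined features of the specified dimension.
In an embodiment, the combined feature acquisition module further includes:
a first feature acquisition unit, configured to merge the original parameters based on the keywords in the second parameter to obtain the first combined feature of each product sample;
the combined feature acquisition unit is further configured to obtain the intersection of the first combined feature and the K-dimensional combined features, obtaining the R-dimensional combined features of the product sample set, and to use the R-dimensional combined features as the combined features of the specified dimension.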
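In an embodiment, the influence score obtaining module includes:
a feature vector obtaining subunit, configured to obtain the combined feature vector of each dimension based on the combined features of the specified dimension; the combined feature vector of each dimension contains the combined feature of that same dimension for every product sample;
an index value calculation subunit, configured to calculate the purity index corresponding to the combined feature of each dimension, obtaining as many purity indices as there are product samples in the product sample set; the purity index represents the degree of influence of each combined feature on the product defect;
a minimum value obtaining subunit, configured to obtain the minimum value of the purity indices corresponding to the combined feature of each dimension; the minimum value characterizes the credibility threshold of the first parameter;
an influence score obtaining subunit, configured to obtain the influence score of the corresponding combined feature according to the minimum value.
In an embodiment, the purity index includes at least one of information gain, information gain ratio, and the Gini coefficient.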
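In an embodiment, the Gini coefficient is calculated by the following formula:
$$\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|X|} \right)^2$$
where |X| denotes the number of product samples in the data group on the designated side of the cut point when any value of a combined feature vector in the product sample set X is taken as the cut point; K denotes the number of defect classification categories, here 2; and |C_k| denotes the number of product samples of the k-th category in the data group on the designated side of the cut point.
It can be understood that the apparatus provided by the embodiments of the present disclosure corresponds to the above method; for details, refer to the method embodiments, which are not repeated here.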
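An embodiment of the present disclosure further provides a data processing apparatus; referring to FIG. 12, the apparatus includes:
a first parameter obtaining module 121, configured to obtain the first parameter of each product sample in the product sample set in response to a first input by the user on a first interface; the first parameter characterizes the degree of defect of the product sample;
a second parameter acquisition module 122, configured to obtain the second parameter of each product sample in the product sample set in response to a second input by the user on a second interface; the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
a defect cause acquisition module 123, configured to display at least one combined feature on a third interface in response to a third input by the user on the second interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.
In an embodiment, the at least one combined feature displayed on the third interface is arranged in descending or ascending order of the corresponding influence score; the influence score characterizes the degree of influence of each combined feature on the product defect.
In an embodiment, the defect cause acquisition module includes:
an original parameter display unit, configured to display, for each combined feature in the at least one combined feature, at least 2 original parameters with the highest weights in that combined feature.
In an embodiment, the defect cause acquisition module is further configured to obtain the at least one combined feature according to the first parameter and the second parameter, and specifically includes:
a combined feature acquisition unit, configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
an influence score obtaining unit, configured to obtain, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
a defect cause acquiring unit, configured to sort the combined features according to the influence score to obtain at least one top-ranked combined feature, and to take the original parameters corresponding to the at least one combined feature as the cause of the product defect.
In an embodiment, the apparatus further includes:
a distribution diagram display module, configured to display a distribution diagram of the first parameter of each product sample.
In an embodiment, the apparatus further includes:
a subordination relationship display module, configured to display the subordination relationship of each second parameter.
In an embodiment, the product sample includes a display panel mother board; the display panel mother board includes a plurality of display panels.
It can be understood that the apparatus provided by the embodiments of the present disclosure corresponds to the above method; for details, refer to the method embodiments, which are not repeated here.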
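In an exemplary embodiment, an electronic device is further provided, including:
a display;
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory to implement the steps of the method shown in FIG. 1.
In an exemplary embodiment, a computer-readable storage medium including an executable computer program is further provided, such as a memory including instructions; the executable computer program can be executed by a processor to implement the steps of the above method. The readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed here. The present disclosure is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.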

Claims (20)

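  1. A data processing method, characterized in that the method includes:
    obtaining a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter characterizes the degree of defect of the product sample, and the second parameter characterizes the original parameters of the sample production equipment through which the product sample passes;
    processing the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension for the product sample set, where each combined feature among the combined features of the specified dimension is a combination of original parameters associated with the product defect;
    obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature; the influence score characterizes the degree of influence of each combined feature on the product defect;
    sorting the combined features according to the influence score to obtain at least one top-ranked combined feature, and taking the original parameters corresponding to the at least one combined feature as the cause of the product defect.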
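  2. The method according to claim 1, characterized in that the dimensionality reduction algorithm includes at least one of the following: principal component analysis (PCA), the linear dimensionality reduction method LDA, locally linear embedding (LLE), and Laplacian eigenmaps (LEP).
  3. The method according to claim 2, characterized in that the dimensionality reduction algorithm includes principal component analysis (PCA), and processing the second parameter based on the preset dimensionality reduction algorithm to obtain the combined features of the specified dimension for the product sample set includes:
    processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set, and using the K-dimensional combined features as the combined features of the specified dimension.
  4. The method according to claim 2, characterized in that the dimensionality reduction algorithm includes principal component analysis (PCA), and processing the second parameter based on the preset dimensionality reduction algorithm to obtain the combined features of the specified dimension for the product sample set includes:
    merging the original parameters based on the keywords in the second parameter to obtain the first combined feature of each product sample;
    processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set;
    obtaining the intersection of the first combined feature and the K-dimensional combined features to obtain the R-dimensional combined features of the product sample set, and using the R-dimensional combined features as the combined features of the specified dimension.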
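  5. The method according to claim 3 or 4, characterized in that processing the second parameter based on PCA to obtain the K-dimensional combined features of the product sample set includes:
    obtaining the average value of each original parameter over the product sample set, and subtracting that average value from the corresponding original parameter of each product sample to obtain the new value of each original parameter in the product sample set;
    obtaining the covariance of any two original parameters in the second parameter to obtain a covariance matrix; each covariance value in the covariance matrix characterizes the degree of similarity between the two original parameters;
    obtaining the eigenvalues and eigenvectors of the covariance matrix and the cumulative contribution value corresponding to each eigenvalue; each eigenvector includes the weight corresponding to each original parameter;
    obtaining the eigenvalues and eigenvectors whose cumulative contribution value exceeds a preset contribution value threshold, obtaining K principal component features;
    obtaining the component, on the eigenvectors, of each original parameter after it has been updated to its new value, obtaining the K-dimensional combined features of the product sample set.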
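  6. The method according to claim 1, characterized in that obtaining, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension's combined feature includes:
    obtaining the combined feature vector of each dimension based on the combined features of the specified dimension; the combined feature vector of each dimension contains the combined feature of that same dimension for every product sample;
    calculating the purity index corresponding to the combined feature of each dimension, obtaining as many purity indices as there are product samples in the product sample set; the purity index represents the degree of influence of each combined feature on the product defect;
    obtaining the minimum value of the purity indices corresponding to the combined feature of each dimension; the minimum value characterizes the credibility threshold of the first parameter;
    obtaining the influence score of the corresponding combined feature according to the minimum value.
  7. The method according to claim 6, characterized in that the purity index includes at least one of information gain, information gain ratio, and the Gini coefficient.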
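  8. The method according to claim 7, characterized in that the Gini coefficient is calculated by the following formula:
    $$\mathrm{Gini}(X) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|X|} \right)^2$$
    where |X| denotes the number of product samples in the data group on the designated side of the cut point when any value of a combined feature vector in the product sample set X is taken as the cut point; K denotes the number of defect classification categories, here 2; and |C_k| denotes the number of product samples of the k-th category in the data group on the designated side of the cut point.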
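  9. The method according to claim 1, characterized in that the method further includes:
    for each combined feature in the at least one combined feature, displaying at least 2 original parameters with the highest weights in that combined feature.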
  10. 一种数据处理方法,其特征在于,所述方法包括:
    响应于用户在第一界面的第一输入,获取产品样本集合中每个产品样本的第一参数;所述第一参数用于表征所述产品样本的不良程度;
    响应于用户在第二界面的第二输入,获取产品样本集合中每个产品样本的第二参数;所述第二参数用于表征所述产品样本经过的样本生产设备的原始参数;
    响应于用户在第二界面的第三输入,在第三界面上显示至少一个组合特征;所述至少一个组合特征对应的原始参数作为引起产品不良的原因,并且所述至少一个组合特征根据所述第一参数和所述第二参数获取。
  11. 根据权利要求10所述的方法,其特征在于,在第三界面上显示至少一个组合特征按照对应的影响分值从大到小或者从小到大的顺序依次排列;所述影响分值用于表征各个组合特征对产品不良的影响程度。
  12. 根据权利要求10或者11所述的方法,其特征在于,在第三界面上显示至少一个组合特征,包括:
    针对所述至少一个组合特征中的各组合特征,显示各组合特征中权重靠前的至少2个原始参数。
  13. 根据权利要求10所述的方法,其特征在于,所述方法还包括根据所述第一参数和所述第二参数获取所述至少一个组合特征,具体包括:
    基于预设的降维算法对所述第二参数进行处理,获得所述产品样本集合的指定维度的组合特征,所述指定维度的组合特征中的各组合特征是指与产品不良相关的原始参数的组合;
    基于所述第一参数和所述指定维度的组合特征获取所述指定维度的组合特征中各维组合特征的影响分值;所述影响分值用于表征各个组合特征对产品不良的影响程度;
    根据所述影响分值对所述各组合特征进行排序获得排序靠前的至少一个组合特征,将所述至少一个组合特征对应的原始参数作为引起产品不良的原因。
  14. 根据权利要求10所述的方法,其特征在于,获取产品样本集合中每个产品样本的第一参数之后,所述方法还包括:
    显示每个产品样本的第一参数的分布图。
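  15. The method according to claim 10, wherein after obtaining the second parameter of each product sample in the product sample set, the method further comprises:
    displaying the subordination relationship of each second parameter.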
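  16. The method according to claim 10, wherein the product sample comprises a display panel motherboard; the display panel motherboard comprises a plurality of display panels.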
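  17. A data processing apparatus, wherein the apparatus comprises:
    a sample set obtaining module, configured to obtain a product sample set; each product sample in the product sample set includes a first parameter and a second parameter; the first parameter is used to characterize the degree of defect of the product sample, and the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample has passed;
    a combined feature obtaining module, configured to process the second parameter based on a preset dimensionality reduction algorithm to obtain combined features of a specified dimension of the product sample set, wherein each combined feature among the combined features of the specified dimension refers to a combination of original parameters related to product defects;
    an influence score obtaining module, configured to obtain, based on the first parameter and the combined features of the specified dimension, the influence score of each dimension of combined feature among the combined features of the specified dimension; the influence score is used to characterize the degree of influence of each combined feature on product defects;
    a defect cause obtaining module, configured to sort the combined features according to the influence scores to obtain at least one top-ranked combined feature, and to take the original parameters corresponding to the at least one combined feature as the cause of the product defect.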
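  18. A data processing apparatus, wherein the apparatus comprises:
    a first parameter obtaining module, configured to obtain, in response to a first input of a user on a first interface, a first parameter of each product sample in a product sample set; the first parameter is used to characterize the degree of defect of the product sample;
    a second parameter obtaining module, configured to obtain, in response to a second input of the user on a second interface, a second parameter of each product sample in the product sample set; the second parameter is used to characterize the original parameters of the sample production equipment through which the product sample has passed;
    a defect cause obtaining module, configured to display, in response to a third input of the user on the second interface, at least one combined feature on a third interface; the original parameters corresponding to the at least one combined feature are taken as the cause of the product defect, and the at least one combined feature is obtained according to the first parameter and the second parameter.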
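  19. An electronic device, comprising:
    a display;
    a processor;
    a memory for storing a computer program executable by the processor;
    wherein the processor is configured to execute the computer program in the memory to implement the method according to any one of claims 1 to 16.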
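  20. A computer-readable storage medium, wherein when an executable computer program in the storage medium is executed by a processor, the method according to any one of claims 1 to 16 can be implemented.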
PCT/CN2021/083429 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium WO2022198680A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202180000618.6A CN115413349A (zh) 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium
PCT/CN2021/083429 WO2022198680A1 (zh) 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium
KR1020237002264A KR20230161409A (ko) 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium
DE112021001736.5T DE112021001736T5 (de) 2021-03-26 2021-03-26 Data processing methods and apparatuses, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083429 WO2022198680A1 (zh) 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022198680A1 (zh) 2022-09-29

Family

ID=83395084

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083429 WO2022198680A1 (zh) 2021-03-26 2021-03-26 Data processing method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
KR (1) KR20230161409A (zh)
CN (1) CN115413349A (zh)
DE (1) DE112021001736T5 (zh)
WO (1) WO2022198680A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120263376A1 (en) * 2011-04-12 2012-10-18 Sharp Laboratories Of America, Inc. Supervised and semi-supervised online boosting algorithm in machine learning framework
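CN110276410A (zh) * 2019-06-27 2019-09-24 京东方科技集团股份有限公司 Method and apparatus for determining the cause of a defect, electronic device, and storage medium
CN112269818A (zh) * 2020-11-25 2021-01-26 成都数之联科技有限公司 Method, system, apparatus, and medium for locating the root cause of equipment parameters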

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG WENXING: "Analysis and Application of Stubborn Low-yield Problem of Product Manufacturing Based on Data Mining", MASTER THESIS, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 July 2020 (2020-07-15), CN , pages 1 - 84, XP055971045, ISSN: 1674-0246, DOI: 10.27005/d.cnki.gdzku.2020.001277 *
郝文宁等 (HAO, WENNING ET AL.): "决策树分类实验 (Decision Tree Classification Experiment)", 数据分析与数据挖掘实验指导书 (EXPERIMENT MANUAL OF DATA ANALYSIS AND DATA MINING), 31 March 2016 (2016-03-31), XP09541532 *

Also Published As

Publication number Publication date
DE112021001736T5 (de) 2023-01-05
KR20230161409A (ko) 2023-11-27
CN115413349A (zh) 2022-11-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932299

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 21932299

Country of ref document: EP

Kind code of ref document: A1