WO2022227094A1 - Data processing method, apparatus, device, and storage medium - Google Patents

Data processing method, apparatus, device, and storage medium

Info

Publication number
WO2022227094A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
production
process information
samples
degree
Prior art date
Application number
PCT/CN2021/091775
Other languages
English (en)
French (fr)
Inventor
王瑜
王海金
贺王强
柴栋
雷一鸣
王洪
吴建民
Original Assignee
京东方科技集团股份有限公司
北京中祥英科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司, 北京中祥英科技有限公司
Priority to CN202180001029.XA (CN115623872A)
Priority to PCT/CN2021/091775 (WO2022227094A1)
Priority to US18/253,961 (US20240004375A1)
Publication of WO2022227094A1

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • G05B19/41875 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM] characterised by quality surveillance of production
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • G05B19/4184 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM] characterised by fault tolerance, reliability of production system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32187 Correlation between controlling parameters for influence on quality parameters
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32196 Store audit, history of inspection, control and workpiece data into database
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32368 Quality control

Definitions

  • the present disclosure relates to the technical field of data processing, and in particular, to a data processing method, apparatus, device, and storage medium.
  • the process steps that a product passes through, and the process parameters corresponding to those steps, affect the performance of the product and may cause the performance to fall short of the standard (that is, the product is defective, also referred to as bad).
  • the process steps include the equipment passed through during the production of the product. Therefore, for products with substandard performance, it is necessary to determine the reasons for the sudden failure of products from the process steps and process parameters.
  • a data processing method, comprising: obtaining a production record corresponding to each sample in a plurality of samples; the production record includes process information, a production time corresponding to the process information, and an index value; the process information is process parameters and/or process steps;
  • the index value is used to characterize the degree to which the sample exhibits a preset defect type; the plurality of samples include bad samples, and a bad sample is a sample whose index value is greater than a first threshold; according to the index values in the obtained production records and the times corresponding to the process information, determining a high-incidence time period of defects, which is a time period in which the distribution probability of bad samples is greater than a second threshold; and according to the high-incidence time period and the obtained production records, determining the degree of influence of the process information on the sudden defect.
  • the above-mentioned determining the degree of influence of the process information on the sudden defect according to the high-incidence time period and the obtained production records includes: determining the target distribution of the index values of the samples in the high-incidence time period over the production time corresponding to the process step; determining the difference value between the target distribution and a preset distribution, where the difference value is used to characterize the distribution probability of the bad samples that passed through the process step; and determining the degree of influence of the process step on the sudden defect according to the difference value.
  • the above-mentioned determining the target distribution of the index values of the samples in the high-incidence time period over the production time corresponding to the process step includes: quantifying the production time corresponding to the process step into a time value corresponding to the process step; and determining the target distribution of the index values of the samples in the high-incidence time period over the time values corresponding to the process step.
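The quantification step can be illustrated with a minimal sketch. The text does not say how production times are turned into numbers, so the choice here (hours elapsed since the earliest record) and the names `quantize_times` and `fmt` are assumptions made for illustration only.

```python
from datetime import datetime

def quantize_times(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Map production-time strings to hours elapsed since the earliest one.

    Assumption: the source only says the time is "quantified"; elapsed
    hours is one plausible numeric encoding, not the patent's definition.
    """
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    origin = min(parsed)
    return [(p - origin).total_seconds() / 3600.0 for p in parsed]

times = ["2021-04-30 08:00:00", "2021-04-30 09:30:00", "2021-04-30 12:00:00"]
print(quantize_times(times))
```

Any monotonic encoding (epoch seconds, shift number, and so on) would serve the same purpose of giving the later curve fit a numeric axis.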
  • the above-mentioned determining the target distribution of the index values of the samples in the high-incidence time period over the time values corresponding to the process step includes: using a polynomial curve fitting method to fit the index values against the time values corresponding to the process step, obtaining fitted index values; and determining the distribution of the fitted index values over the time values as the target distribution of the index values of the samples in the high-incidence time period over the time values corresponding to the process step.
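As a rough sketch of polynomial curve fitting, the following fits index values against time values by ordinary least squares, solving the normal equations with Gaussian elimination. The polynomial degree, the helper names `polyfit` and `polyval`, and the sample data are assumptions; the text does not specify them.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting (assumes a non-singular system,
    # i.e. at least degree + 1 distinct time values).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

time_values = [0.0, 1.0, 2.0, 3.0, 4.0]
index_values = [1.0, 2.0, 5.0, 10.0, 17.0]  # illustrative data following x^2 + 1
coeffs = polyfit(time_values, index_values, degree=2)
fitted_index_values = [polyval(coeffs, t) for t in time_values]
```

In practice a numerical library would normally be used for the fit; the hand-written solver only keeps the sketch dependency-free.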
  • the preset distribution is a standard normal distribution and the target distribution is a polynomial distribution; determining the difference value between the target distribution and the preset distribution includes: using a significance test to obtain the difference value between the target distribution and the standard normal distribution.
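The text does not name the significance test. One possibility, shown here purely as an assumption, is a Kolmogorov-Smirnov style distance between the standardized fitted values and the standard normal distribution; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def standard_normal_cdf(z):
    """CDF of N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_distance_to_standard_normal(values):
    """Largest gap between the empirical CDF of the standardized values
    and the standard normal CDF (assumed stand-in for the 'difference value')."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be equal")
    zs = sorted((v - mu) / sigma for v in values)
    n = len(zs)
    distance = 0.0
    for i, z in enumerate(zs):
        cdf = standard_normal_cdf(z)
        # Compare against the empirical CDF just before and at this point.
        distance = max(distance, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return distance

symmetric = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
skewed = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
print(ks_distance_to_standard_normal(symmetric))
print(ks_distance_to_standard_normal(skewed))
```

Under this reading, a larger distance means the index values over a step's production time look less like the reference distribution, which would point to that step as a more likely contributor.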
  • the above-mentioned determining the high-incidence time period of defects according to the index values in the obtained production records and the times corresponding to the process information includes: dividing the plurality of samples into good samples and bad samples according to the first threshold and the index values in the obtained production records; and determining, as the high-incidence time period, a time period in which the ratio of the number of bad samples to the total number of samples in the production records is greater than the second threshold; the time period is a time period corresponding to the process information.
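A minimal sketch of this step, assuming the production times have already been bucketed into period labels (here, calendar days). The function name, the record layout, and the threshold values are illustrative assumptions.

```python
from collections import defaultdict

def high_incidence_periods(records, first_threshold, second_threshold):
    """records: iterable of (period_label, index_value) pairs.

    A sample is bad when its index value exceeds first_threshold; a period is
    high-incidence when its bad/total ratio exceeds second_threshold.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for period, index_value in records:
        totals[period] += 1
        if index_value > first_threshold:
            bad[period] += 1
    return sorted(p for p in totals if bad[p] / totals[p] > second_threshold)

records = [
    ("2021-04-01", 0.1), ("2021-04-01", 0.2),
    ("2021-04-02", 0.9), ("2021-04-02", 0.8), ("2021-04-02", 0.1),
    ("2021-04-03", 0.7), ("2021-04-03", 0.2),
]
print(high_incidence_periods(records, first_threshold=0.5, second_threshold=0.5))
```

Note that the comparison is strict, matching "greater than the second threshold": a period with exactly half bad samples is not flagged at a threshold of 0.5.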
  • determining the degree of influence of the process information on the sudden defect according to the high-incidence time period and the obtained production records includes: using mutation point (change-point) detection to obtain a first production time, where the first production time is the time point at which the index value changes abruptly and is a time point within the high-incidence time period; obtaining the critical change point of a process parameter in the production records, and determining the time corresponding to the critical change point as a second production time; and determining the difference between the first production time and the second production time, and determining the degree of influence of the process parameter on the sudden defect according to the difference.
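The detection algorithm is not specified in the text. The sketch below assumes a simple single change-point detector that splits the index-value series into two segments with minimal within-segment squared error; the second production time is a placeholder value, and how the time difference maps to a degree of influence is left open because the text does not define it.

```python
def change_point(values):
    """Return the index k (1 <= k < len(values)) that splits the series into
    two segments with minimal total squared error around each segment mean.
    values[k] is the first observation after the shift. Needs >= 2 values."""
    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)
    return min(range(1, len(values)),
               key=lambda k: sse(values[:k]) + sse(values[k:]))

times = [0, 1, 2, 3, 4, 5, 6, 7]  # quantized production times (hours)
index_values = [0.1, 0.2, 0.1, 0.2, 0.9, 0.8, 0.9, 0.8]

first_production_time = times[change_point(index_values)]
second_production_time = 3  # assumed time of the parameter's critical change point
time_gap = abs(first_production_time - second_production_time)
print(first_production_time, time_gap)
```

Intuitively, a small gap means the parameter crossed its critical value shortly before the index values jumped, which makes that parameter a stronger suspect.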
  • obtaining the critical change point of the process parameter in the production record includes: obtaining the Gini coefficient of the process parameter in the production record; and determining the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter.
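One way to read this step is as a decision-tree style split search: every candidate threshold on the process parameter is scored by the weighted Gini impurity of the good/bad labels on either side, and the threshold with the smallest score is taken as the critical change point. This interpretation, and the names below, are assumptions rather than the patent's stated procedure.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = bad sample)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def critical_change_point(param_values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold`
    with the smallest weighted Gini impurity, or None if no split exists."""
    pairs = sorted(zip(param_values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal parameter values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

temperature = [200, 205, 210, 215, 220, 225]  # illustrative process parameter
is_bad = [0, 0, 0, 1, 1, 1]
print(critical_change_point(temperature, is_bad))
```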
  • the Gini coefficient of the process information in the production record is obtained; the Gini coefficient of the process information is used to represent the degree of correlation between the process information and the index value of the sample; according to the Gini coefficient of the process information, the degree of influence of the process information on the sudden defect is determined.
  • the production record further includes a sample identifier, and the above-mentioned acquiring the Gini coefficient of the process information in the production record includes: acquiring, from the production record, the process parameter and the index value corresponding to the sample identifier; and obtaining the Gini coefficient of the process parameter according to the process parameter and the index value corresponding to the sample identifier.
  • the production record further includes a sample identifier, and acquiring the Gini coefficient of the process information in the production record includes: acquiring, from the production record, the process step and the index value corresponding to the sample identifier; and obtaining the Gini coefficient of the process step according to the process step and the index value corresponding to the sample identifier.
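For a categorical process step (for example, which device a sample passed through), a Gini-based association score could be computed by grouping samples by step and averaging the Gini impurity of the good/bad labels within each group. This is a sketch under that assumption; the device IDs and function names are hypothetical.

```python
from collections import defaultdict

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = bad sample)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def step_gini(steps, labels):
    """Weighted Gini impurity of the labels after grouping samples by step.
    A lower value means the step separates good and bad samples more cleanly."""
    groups = defaultdict(list)
    for step, label in zip(steps, labels):
        groups[step].append(label)
    n = len(labels)
    return sum(len(g) * gini(g) for g in groups.values()) / n

devices = ["EQ1", "EQ1", "EQ1", "EQ1", "EQ2", "EQ2", "EQ2", "EQ2"]
is_bad = [1, 1, 1, 1, 0, 0, 0, 0]
print(step_gini(devices, is_bad))
```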
  • the method further includes: performing a chi-square test on the process steps in the production records to obtain a chi-square test value of the process step with respect to the index value of the sample, where the chi-square test value is used to characterize the degree of influence of the process step on the index value of the sample; and determining the degree of influence of the process step on the sudden defect according to a first preset weight, the chi-square test value, and the Gini coefficient of the process step.
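The chi-square test value can be sketched as the Pearson chi-square statistic of the contingency table between process step and good/bad label. The text does not give the formula that combines it with the Gini coefficient, so `weighted_influence` below is a hypothetical weighted sum that assumes both scores were already rescaled to [0, 1] with larger meaning more influence.

```python
from collections import Counter

def chi_square_statistic(steps, labels):
    """Pearson chi-square statistic for process step versus good/bad label."""
    n = len(steps)
    step_counts = Counter(steps)
    label_counts = Counter(labels)
    observed = Counter(zip(steps, labels))
    stat = 0.0
    for step, s_count in step_counts.items():
        for label, l_count in label_counts.items():
            expected = s_count * l_count / n
            stat += (observed[(step, label)] - expected) ** 2 / expected
    return stat

def weighted_influence(chi2_score, gini_score, weight):
    """Hypothetical combination; the actual rule is not given in the text."""
    return weight * chi2_score + (1.0 - weight) * gini_score

devices = ["EQ1"] * 4 + ["EQ2"] * 4
is_bad = [1, 1, 1, 1, 0, 0, 0, 0]
print(chi_square_statistic(devices, is_bad))
```

A statistic near zero indicates that the share of bad samples is about the same on every device, so the step is unlikely to explain the defect.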
  • the method further includes: performing a correlation test on the process parameters in the production records and the index values of the samples to obtain the influence parameters of the process parameters, where the influence parameters are used to characterize the degree of influence of the process parameters on the index values of the samples; and obtaining the degree of influence of the process parameter on the sudden defect according to a second preset weight, the influence parameter, and the Gini coefficient of the process parameter.
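The kind of correlation test is not stated. As one plausible choice, the sketch below computes the Pearson correlation coefficient between a process parameter and the index value; using Pearson (rather than, say, a rank correlation) and the name `pearson` are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

pressure = [1.0, 2.0, 3.0, 4.0]      # illustrative process parameter values
index_values = [0.2, 0.4, 0.6, 0.8]  # illustrative index values
print(pearson(pressure, index_values))
```

The absolute value of the coefficient could then serve as the influence parameter that is weighted together with the Gini coefficient.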
  • obtaining the production record corresponding to each sample in the plurality of samples includes: obtaining a first correspondence between the sample identifier and the index value of each sample in the plurality of samples, and obtaining a second correspondence among the sample identifier, the process information, and the production time corresponding to the process information of each sample; and establishing, according to the sample identifier, the first correspondence, and the second correspondence of each sample, a third correspondence among the process information of each sample, the production time corresponding to the process information, and the index value.
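Building the third correspondence amounts to joining the two source tables on the sample identifier. A minimal sketch, with hypothetical field names and sample data:

```python
def build_production_records(index_by_sample, process_rows):
    """Join the first correspondence (sample_id -> index value) with the
    second (sample_id, process_info, production_time) on the sample ID."""
    records = []
    for sample_id, process_info, production_time in process_rows:
        if sample_id not in index_by_sample:
            continue  # no index value known for this sample, so skip the row
        records.append({
            "sample_id": sample_id,
            "process_info": process_info,
            "production_time": production_time,
            "index_value": index_by_sample[sample_id],
        })
    return records

first = {"S1": 0.9, "S2": 0.1}
second = [
    ("S1", "EQ1", "2021-04-30 08:00:00"),
    ("S2", "EQ2", "2021-04-30 09:00:00"),
    ("S3", "EQ1", "2021-04-30 10:00:00"),  # dropped: no index value
]
print(build_production_records(first, second))
```

Dropping rows without an index value is a design choice of this sketch; keeping them with a missing value would be equally consistent with the text.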
  • a data processing method, comprising: receiving a sample screening condition input by a user on a condition selection interface; obtaining a production record of each sample in a plurality of samples corresponding to the sample screening condition; the production record includes process information, a production time corresponding to the process information, and an index value; the process information is process parameters and/or process steps; the index value is used to characterize the degree to which the sample exhibits a preset defect type; the plurality of samples include bad samples, and a bad sample is a sample whose index value is greater than a first threshold; according to the index values in the obtained production records and the times corresponding to the process information, determining a high-incidence time period of defects, which is a time period in which the distribution probability of bad samples is greater than a second threshold; according to the high-incidence time period and the obtained production records, determining the degree of influence of the process information on the sudden defect; and displaying the degree of influence of the process information on the sudden defect on an analysis result display interface.
  • the above-mentioned displaying, on the analysis result display interface, the degree of influence of the process information on the sudden defect includes: sorting the acquired degrees of influence of a plurality of pieces of process information on the sudden defect; and displaying the sorted degrees of influence of the process information on the sudden defect on the analysis result display interface.
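The sorting step can be sketched as ranking the computed scores in descending order before they are handed to the display layer. The descending order, the function name, and the example scores are assumptions.

```python
def rank_influences(influences):
    """Return (process_info, score) pairs, most influential first."""
    return sorted(influences.items(), key=lambda item: item[1], reverse=True)

scores = {"EQ1": 0.35, "chamber_pressure": 0.82, "EQ7": 0.10}
for name, score in rank_influences(scores):
    print(f"{name}: {score:.2f}")
```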
  • a data processing device, comprising: an acquisition module, a first determination module, and a second determination module, where the acquisition module is used to acquire a production record corresponding to each sample in a plurality of samples; the production record includes process information, a production time corresponding to the process information, and an index value; the process information is process parameters and/or process steps; the index value is used to characterize the degree to which the sample exhibits a preset defect type; the plurality of samples include bad samples, and a bad sample is a sample whose index value is greater than a first threshold;
  • the first determination module is used to determine the high-incidence time period of defects according to the index values in the obtained production records and the times corresponding to the process information, and the high-incidence time period is the time period in which the distribution probability of bad samples is greater than a second threshold;
  • the second determination module is used to determine the degree of influence of the process information on the sudden defect according to the high-incidence time period and the obtained production records.
  • the second determination module is specifically configured to: determine the target distribution of the index values of the samples in the high-incidence time period over the production time corresponding to the process step; determine the difference value between the target distribution and the preset distribution, where the difference value is used to characterize the distribution probability of the bad samples that passed through the process step; and determine the degree of influence of the process step on the sudden defect according to the difference value.
  • the second determination module is specifically configured to: convert the production time corresponding to the process step into a time value corresponding to the process step; and determine the target distribution of the index values of the samples in the high-incidence time period over the time values corresponding to the process step.
  • the second determination module is specifically configured to: use a polynomial curve fitting method to fit the index values against the time values corresponding to the process step, obtaining fitted index values; and determine the distribution of the fitted index values over the time values as the target distribution of the index values of the samples in the high-incidence time period over the time values corresponding to the process step.
  • the preset distribution is a standard normal distribution and the target distribution is a polynomial distribution; the second determination module is specifically configured to: obtain the difference value between the target distribution and the standard normal distribution by using a significance test.
  • the first determination module is specifically configured to: divide the plurality of samples into good samples and bad samples according to the first threshold and the index values in the obtained production records; and determine, as the high-incidence time period, a time period in which the ratio of the number of bad samples to the total number of samples in the production records is greater than the second threshold; the time period is a time period corresponding to the process information.
  • the acquisition module is further configured to: acquire a first production time by mutation point (change-point) detection, where the first production time is the time point at which the index value changes abruptly and is a time point within the high-incidence time period; and obtain the critical change point of a process parameter in the production records and determine the time corresponding to the critical change point as a second production time; the second determination module is specifically configured to: determine the difference between the first production time and the second production time, and determine the degree of influence of the process parameter on the sudden defect according to the difference.
  • the obtaining module is further configured to: obtain the Gini coefficient of the process parameter in the production record; and determine the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter.
  • the acquisition module is further used to: acquire the Gini coefficient of the process information in the production record; the Gini coefficient of the process information is used to characterize the degree of association between the process information and the index value of the sample; the second determination module is specifically used for: According to the Gini coefficient of the process information, the influence degree of the process information on the sudden failure is determined.
  • the production record further includes a sample identifier, and the acquisition module is specifically configured to: acquire, from the production record, the process parameter and the index value corresponding to the sample identifier; and obtain the Gini coefficient of the process parameter according to the process parameter and the index value corresponding to the sample identifier.
  • the production record further includes a sample identifier, and the acquisition module is specifically configured to: acquire, from the production record, the process step and the index value corresponding to the sample identifier; and obtain the Gini coefficient of the process step according to the process step and the index value corresponding to the sample identifier.
  • the data processing device further includes an inspection module for performing a chi-square test on the process steps in the production records to obtain a chi-square test value of the process step with respect to the index value of the sample, where the chi-square test value is used to characterize the degree of influence of the process step on the index value of the sample; the second determination module is specifically configured to: determine the degree of influence of the process step on the sudden defect according to the first preset weight, the chi-square test value, and the Gini coefficient of the process step.
  • the inspection module is further configured to: perform a correlation test on the process parameters in the production records and the index values of the samples to obtain the influence parameters of the process parameters, where the influence parameters are used to characterize the degree of influence of the process parameters on the index values of the samples; the second determination module is specifically configured to: obtain the degree of influence of the process parameter on the sudden defect according to the second preset weight, the influence parameter, and the Gini coefficient of the process parameter.
  • the above-mentioned acquisition module is specifically configured to: obtain a first correspondence between the sample identifier and the index value of each sample in the plurality of samples, and obtain a second correspondence among the sample identifier, the process information, and the production time corresponding to the process information of each sample; and establish, according to the sample identifier, the first correspondence, and the second correspondence of each sample, a third correspondence among the process information of each sample, the production time corresponding to the process information, and the index value.
  • a data processing device, comprising: a receiving module, an obtaining module, a determining module, and a display module; the receiving module is used for receiving sample screening conditions input by a user on a condition selection interface; the obtaining module is used for obtaining a production record of each sample in a plurality of samples corresponding to the sample screening conditions;
  • the determining module is used to determine the high-incidence time period of defects according to the index values in the obtained production records and the times corresponding to the process information;
  • the high-incidence time period is the time period in which the distribution probability of bad samples is greater than the second threshold; the determining module is further used to determine, according to the high-incidence time period and the obtained production records, the degree of influence of the process information on the sudden defect; the display module is used to display, on the analysis result display interface, the degree of influence of the process information on the sudden defect.
  • the data processing apparatus further includes: a sorting module, configured to sort the acquired degrees of influence of a plurality of pieces of process information on the sudden defect; the display module is specifically configured to: display the sorted degrees of influence of the process information on the sudden defect on the analysis result display interface.
  • an electronic device comprising a processor and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the executable instructions to implement any of the above aspects.
  • a computer-readable storage medium stores computer program instructions that, when executed on a processor, cause the processor to perform one or more steps in the data processing method described in any of the foregoing embodiments .
  • a computer program product includes computer program instructions that, when executed on a computer, cause the computer to perform one or more steps of the data processing method according to any of the above embodiments.
  • a computer program which, when executed on a computer, causes the computer to perform one or more steps of the data processing method described in any of the above embodiments.
  • FIG. 1 is a block diagram of a data processing system according to some embodiments.
  • FIG. 2 is a block diagram of a data processing system combined with data processing according to some embodiments
  • FIG. 3 is a block diagram of an electronic device according to some embodiments.
  • FIG. 4 is a flowchart of a data processing method according to some embodiments.
  • FIG. 5 is a distribution diagram of positive and negative samples according to some embodiments.
  • FIG. 6 is a distribution graph of index values of samples over production time according to some embodiments.
  • FIG. 7 is a graph comparing a target distribution and a standard normal distribution according to some embodiments.
  • FIG. 8 is a flowchart of a data processing method according to some embodiments.
  • FIG. 9 is a flowchart of another data processing method according to some embodiments.
  • FIG. 10 is a block diagram of a condition selection interface according to some embodiments.
  • FIG. 11 is a block diagram of a cause variable input interface according to some embodiments.
  • FIG. 12 is a structural diagram of an analysis result display interface according to some embodiments.
  • FIG. 13 is a structural diagram of a data processing apparatus 70 according to some embodiments.
  • FIG. 14 is a structural diagram of another data processing apparatus 80 according to some embodiments.
  • first and second are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • plural means two or more.
  • the expressions “coupled” and “connected” and their derivatives may be used.
  • the term “connected” may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact with each other.
  • the term “coupled” may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact.
  • the terms “coupled” or “communicatively coupled” may also mean that two or more components are not in direct contact with each other, but yet still co-operate or interact with each other.
  • the embodiments disclosed herein are not necessarily limited by the content herein.
  • the term “if” is optionally construed to mean “when” or “at” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrases “if it is determined that” or “if a [statement or event] is detected” are optionally interpreted to mean “upon determining” or “in response to determining” or “on detection of [recited condition or event]” or “in response to detection of [recited condition or event]”.
  • any production process that the product passes through, the equipment involved in the production process, and the configuration parameters of the equipment affect the performance of the product and may cause the performance to fall short of the standard (that is, the product is defective, also referred to as bad).
  • process parameters also known as process parameters
  • an embodiment of the present disclosure provides a data processing method that performs automatic diagnosis and analysis through data mining: it uses the data generated in each production process of the entire factory to obtain a production record corresponding to each sample in a plurality of samples; determines the high-incidence time period of defects according to the index values of the samples in the obtained production records and the times corresponding to the process information; and determines, according to the high-incidence time period and the obtained production records, the degree of influence of the process information on the sudden defect, converting it into a quantitative judgment indicator (such as a correlation quantification value). This improves detection efficiency, so that users can make comprehensive and rapid decisions and locate the cause of the sudden defect.
  • the data processing method provided by the embodiments of the present disclosure is applicable to the data processing system 10 shown in FIG. 1, and the data processing system 10 includes a data processing apparatus 100, a display apparatus 200, and a distributed storage apparatus 300.
  • the data processing apparatus 100 is coupled to the display apparatus 200 and the distributed storage apparatus 300, respectively.
  • the distributed storage device 300 is configured to store production data generated by a plurality of devices (or referred to as plant devices).
  • the production data generated by multiple devices includes the production records of the multiple devices; for example, the production records include the identifiers of the devices that the multiple samples passed through during the production process, the environmental parameters corresponding to the devices, the index values, and the production time.
  • the sample goes through at least one device during production.
  • the distributed storage device 300 stores relatively complete data (e.g., a database).
  • the distributed storage device 300 may include a plurality of hardware memories; different hardware memories are distributed in different physical locations (such as in different factories or on different production lines) and transfer information to one another through wireless transmission (such as a network), so that the data is physically distributed yet logically constitutes a database based on big data technology.
  • the raw data of a large number of different devices is stored in the relational databases (such as Oracle, MySQL, etc.) of the corresponding manufacturing systems, such as the yield management system (Yield Management System, YMS), fault detection and classification (Fault Detection & Classification, FDC), and the manufacturing execution system (Manufacturing Execution System, MES); the raw data can be extracted by data extraction tools (such as Sqoop, Kettle, etc.) and transferred to the distributed storage device 300, such as the Hadoop Distributed File System (HDFS).
  • the data in the distributed storage device 300 may be stored in the Hive tool or Hbase database format.
  • the above raw data is first stored in the database; after that, data cleaning, data conversion and other preprocessing can be continued in the Hive tool to obtain the production record data warehouse of the sample.
  • the data warehouse can be connected to the display device 200, the data processing device 100, etc. through different API interfaces to realize data interaction with these devices.
  • the display device 200 displays a selection page, and the selection page is used for the user to select filter conditions.
  • the filter conditions include result variables, cause variables, and filter conditions (for example, sample categories and preset time periods, etc.).
  • dimensional analysis and/or intelligent mining are used to perform defect diagnosis analysis, and the analysis result obtained by the data processing device 100 through the defect diagnosis analysis is displayed to the user on the analysis result display page of the display device 200.
  • the data volume of the above raw data is very large.
  • the raw data generated by all devices may be hundreds of gigabytes per day, and the data generated per hour may also be tens of gigabytes.
  • the Hive tool is a data warehouse tool based on Hadoop, which can be used for data extraction, transformation and loading (ETL) as well as for complex analytical work.
  • the Hive tool does not have a special data storage format, nor does it build an index for the data. Users can freely organize the tables and process the data in the database. It can be seen that the parallel processing of distributed file management can meet the storage and processing requirements of massive data. Users can process simple data through SQL queries, and custom functions can be used for complex processing. Therefore, when analyzing the massive data of the factory, it is necessary to extract the data of the factory database into the distributed file system. On the one hand, it will not cause damage to the original data, and on the other hand, the data analysis efficiency is improved.
  • a relational database can be any of Oracle, DB2, MySQL, Microsoft SQL Server, and Microsoft Access.
  • Distributed computing decomposes a computing task into multiple subtasks, and assigns multiple subtasks to multiple computer devices for simultaneous processing. Finally, the processing results obtained by each computer device are aggregated into a final result.
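  • As a minimal sketch of the decompose-and-aggregate idea described above, the following Python example splits a list of index values into chunks, counts the bad samples in each chunk concurrently, and sums the partial results. The function names are hypothetical, and threads on one machine stand in for the multiple computer devices of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bad(chunk, threshold):
    """Subtask: count the values in one chunk that exceed the threshold."""
    return sum(1 for value in chunk if value > threshold)


def distributed_bad_count(values, threshold, workers=4):
    """Split the task into subtasks, run them concurrently, aggregate the results."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: count_bad(c, threshold), chunks))
    return sum(partial)  # final result aggregated from the partial results
```

  • In a production deployment the same pattern would typically be delegated to the distributed framework itself (for example, a Hive query over HDFS) rather than hand-written.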
  • the distributed storage device 300 may be one memory, may be multiple memories, or may be a general term for multiple storage elements.
  • the memory may include random access memory (RAM), double data rate synchronous dynamic random access memory (DDR SDRAM), or non-volatile memory such as disk storage or flash memory.
  • the data processing apparatus 100 may be any terminal device, server, virtual machine or server cluster.
  • the display device 200 may be a display, or a product including a display, such as a television, a computer (all-in-one or desktop), a tablet computer, a mobile phone, an electronic picture frame, and the like.
  • the display device may be any device that displays images, whether in motion (e.g., video) or stationary (e.g., still images), and whether text or pictures.
  • the embodiments may be implemented in or associated with a wide variety of electronic devices, such as, but not limited to, game consoles, television monitors, flat panel displays, computer monitors, automotive displays (e.g., odometer displays), navigators, cockpit controls and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, packaging, and aesthetic structures (e.g., a display of images of a piece of jewelry).
  • the display device 200 described herein may include one or more displays, including one or more terminals with a display function, so that the data processing device 100 can send its processed data (e.g., impact parameters) to the display device 200, which then displays it. That is, through the interface of the display device 200 (i.e., the user interaction interface), full user interaction (controlling and receiving results) with the data processing system 10 can be achieved.
  • the functions of the data processing device 100, the display device 200 and the distributed storage device 300 can be integrated into one electronic device or two electronic devices, or they can be implemented separately by different devices.
  • the functions of the display device 200 and the distributed storage device 300 are not limited in this embodiment of the present disclosure.
  • the functions of the data processing apparatus 100 , the display apparatus 200 and the distributed storage apparatus 300 described above may all be implemented by the electronic device 30 as shown in FIG. 3 .
  • the electronic device 30 in FIG. 3 includes, but is not limited to, a processor 301, a memory 302, an input unit 303, an interface unit 304, a power supply 305, and the like.
  • electronic device 30 includes display 306 .
  • the processor 301 is the control center of the electronic device. It uses various interfaces and lines to connect the parts of the entire electronic device, and performs the functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 302 and calling the data stored in the memory 302, thereby monitoring the electronic device as a whole.
  • the processor 301 may include one or more processing units; optionally, the processor 301 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like.
  • the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 301.
  • Memory 302 may be used to store software programs as well as various data.
  • the memory 302 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required by at least one functional unit, and the like. Additionally, memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the memory 302 may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the input unit 303 may be a keyboard, a touch screen or other devices.
  • the interface unit 304 is an interface for connecting an external device to the electronic device 30 .
  • external devices may include wired or wireless headset ports, external power (or battery charger) ports, wired or wireless data ports, memory card ports, ports for connecting devices with identification modules, audio input/output (I/O) ports, video I/O ports, headphone ports, and more.
  • the interface unit 304 may be used to receive input (e.g., data information) from an external device and transmit the received input to one or more elements within the electronic device 30, or may be used to transfer data between the electronic device 30 and the external device.
  • the power source 305 (for example, a battery) can be used to supply power to various components.
  • the power source 305 can be logically connected to the processor 301 through a power management system, so that functions such as managing charging, discharging, and power consumption can be implemented through the power management system.
  • the display 306 is used to display information input by the user or information provided to the user (eg, data processed by the processor 301).
  • the display 306 may include a display panel, and the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • when the electronic device 30 is the display device 200, the electronic device 30 includes a display 306.
  • the computer instructions in the embodiments of the present disclosure may also be referred to as application program codes or systems, which are not specifically limited in the embodiments of the present disclosure.
  • the electronic device shown in FIG. 3 is only an example, which does not limit the electronic device to which the embodiments of the present disclosure are applicable. In actual implementation, the electronic device may include more or fewer components than those shown in FIG. 3.
  • FIG. 4 is a flowchart of a data processing method provided by an embodiment of the present disclosure. The method can be applied to the electronic device shown in FIG. 3 , and the method shown in FIG. 4 may include the following steps:
  • the electronic device obtains a production record corresponding to each of the multiple samples; the production record includes process information, production time corresponding to the process information, and an index value.
  • the process information is process parameters and/or process steps, and the index value is used to represent the degree of failure of the sample belonging to the preset failure type.
  • the multiple samples include bad samples, and the bad samples are samples whose index value is greater than the first threshold.
  • the first threshold may be preset according to experience, or the electronic device may determine the first threshold according to the distribution of the index value of each sample in the multiple samples.
  • for example, the sample is a glass from which panels are produced, the index value of the sample is the defective rate of the glass for the preset defect type, and the defective rate is the ratio of the number of defective panels produced from the glass to the total number of panels produced from the glass. If 90% of the plurality of glasses have a defective rate of 10%, the electronic device determines that the first threshold is 10%.
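  • A minimal sketch of deriving the first threshold from the distribution of index values is given below. It reads the 90%/10% example as "the smallest defective rate that at least 90% of the glasses do not exceed", which is an assumption; the function names and the `percent` parameter are hypothetical.

```python
def defective_rate(defective_panels, total_panels):
    """Defective rate of one glass: defective panels / all panels produced from it."""
    return defective_panels / total_panels


def first_threshold(rates, percent=90):
    """Smallest observed rate that at least `percent`% of the glasses do not exceed."""
    ordered = sorted(rates)
    k = max(1, (percent * len(ordered) + 99) // 100)  # integer ceil of percent% of n
    return ordered[k - 1]
```

  • Integer arithmetic is used for the rank `k` so that the result does not depend on floating-point rounding of the percentage.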
  • the process parameters include at least one of temperature, pressure or flow.
  • Process steps can be process identifications and/or equipment identifications.
  • a first correspondence between sample identifiers and index values, and a second correspondence between sample identifiers, process information, and production time corresponding to the process information are stored in the memory or the distributed storage system.
  • the electronic device obtains the first correspondence and the second correspondence from the memory or the distributed storage system, and uses the sample identifier to associate the index value in the first correspondence with the process information and the corresponding production time in the second correspondence. This yields a third correspondence between the process information of the sample, the production time corresponding to the process information, and the index value, that is, the production records of the multiple samples.
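  • The association through the sample identifier can be sketched as a simple join. The data layout (a dict for the first correspondence, a list of tuples for the second) and the field names are assumptions made for illustration only.

```python
def build_production_records(first, second):
    """Join the two correspondences on the sample identifier.

    first:  {sample_id: index_value}
    second: list of (sample_id, process_info, production_time)
    Returns the third correspondence as a list of dicts.
    """
    records = []
    for sample_id, process_info, production_time in second:
        if sample_id not in first:
            continue  # no index value recorded for this sample
        records.append({
            "sample_id": sample_id,
            "process_info": process_info,
            "production_time": production_time,
            "index_value": first[sample_id],
        })
    return records
```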
  • the electronic device acquires an identifier of a display panel of a specific model from the Hbase database, and obtains a production record corresponding to each display panel according to the acquired identifier of the display panel.
  • samples in the embodiments of the present disclosure may be display panels in a display panel production line; of course, the samples in the embodiments of the present disclosure may also be other products.
  • the production record corresponding to the sample may further include a display panel glass, and the display panel glass may be produced and processed into a plurality of display panels.
  • the preset bad type refers to the type of quality defect of the sample, and the quality defect of the sample may cause the performance of the sample to be lower than the performance threshold.
  • the present disclosure does not limit the way of classifying the quality defects (also known as defects) of the samples.
  • the defects can be classified into different types according to requirements. For example, they can be classified according to their direct impact on the performance of the sample, such as bright-line defects, dark-line defects, and firefly (hot spot) defects; alternatively, they can be classified according to their general causes, such as array process defects and color filter process defects; or they can be classified according to severity, such as defects that lead to scrap and defects that reduce quality.
  • the types of defects may not be distinguished, that is, as long as there is any defect in the sample, it is considered to be defective; otherwise, it is considered to be non-defective.
  • the defective type of each sample in the multiple samples of the present application is the same defective type.
  • the electronic device receives a production record corresponding to each of the multiple samples.
  • part of the data in the production record obtained by the electronic device is shown in Table 1 below.
  • Table 1 takes as an example the case where the index value of the sample is a statistical indicator such as thickness or an electrical parameter:
  • GlassID1 is the sample ID
  • Step 1 is the process steps that the sample represented by GlassID1 goes through in the production process
  • VTH is the bad type of the sample represented by GlassID1
  • -2.14833 is the index value
  • 2020-03-25 12:18:13 is the production time when the sample represented by GlassID1 goes through step 1 in the production process.
  • the device represented by device 1 is the device that the sample represented by GlassID1 passes through in step 1 in the production process.
  • n is a positive integer. The rest are similar to this and will not be repeated here.
  • part of the data in the production records obtained by the electronic device is shown in Table 2 below:
  • GlassID1 is the sample ID
  • step 1 is the process step that the sample represented by GlassID1 goes through in the production process
  • parameter 1 is the configuration parameter of the sample represented by GlassID1 when it goes through step 1
  • 457 is the value of parameter 1
  • the index value 0.022 is the index value of the sample represented by GlassID1
  • 2020-05-07 05:49:55 is the production time of the sample represented by GlassID1 going through step 1 in the production process. The rest are similar to this and will not be repeated here.
  • the value of a process parameter and the corresponding production time may be collected when triggered by a certain event. The value of the process parameter and the corresponding production time used for a sample in the embodiments of the present disclosure may be one of the multiple collected values of that process parameter for the sample, together with its corresponding production time.
  • the production records obtained by the electronic device may be data that has already been integrated into the above-mentioned form; alternatively, after receiving the original production data of the samples, the electronic device may integrate that data into the above-mentioned form according to the sample identifiers.
  • the production record form is not limited in this embodiment of the present disclosure.
  • the above-mentioned index values usually come from different initial sources than the process information. When the index values and the process information (such as process steps or process parameters) come from different data sources, the electronic device can associate the index values with the process information.
  • the electronic device determines a time period with high occurrence of defects according to the obtained index value in the production record and the time corresponding to the process information.
  • the electronic device divides the acquired samples into good samples and bad samples according to the first threshold and the index values in the acquired production records. The electronic device then determines, as the time period of high occurrence of defects, the time period in which the ratio of the number of bad samples to the total number of samples in the acquired production records is greater than the second threshold, wherein the time period is a time period of the production time corresponding to the process information.
  • the second threshold may be preset according to experience, or the electronic device may determine the second threshold according to the distribution of bad samples.
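  • The determination of the high-occurrence time period can be sketched as follows. The source does not fix the granularity of a "time period"; bucketing by calendar day is an assumption here, as are the function name and the record layout.

```python
from collections import defaultdict
from datetime import datetime


def high_incidence_days(records, first_threshold, second_threshold):
    """Return the days whose share of bad samples exceeds second_threshold.

    records: iterable of (production_time, index_value) pairs, where
    production_time is a 'YYYY-MM-DD HH:MM:SS' string.
    """
    total = defaultdict(int)
    bad = defaultdict(int)
    for production_time, index_value in records:
        day = datetime.strptime(production_time, "%Y-%m-%d %H:%M:%S").date()
        total[day] += 1
        if index_value > first_threshold:  # bad sample per the first threshold
            bad[day] += 1
    return sorted(day for day in total if bad[day] / total[day] > second_threshold)
```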
  • the good samples and bad samples divided by the electronic device are shown in FIG. 5, in which the horizontal axis is the time corresponding to the process step and the vertical axis is the index value of the samples.
  • the first threshold is 0.1. It can be understood that the horizontal axis can also be process steps. This embodiment of the present disclosure does not limit this.
  • the high-incidence time period determined by the electronic device based on the example in FIG. 5 is from April 30, 2020 to May 1, 2020.
  • the electronic device acquires the first production time by mutation point detection; the first production time is the mutation time point of the index value in the acquired production records.
  • the first production time is a time point in the time period of high occurrence of defects.
  • the electronic device obtains the time series $x_1, x_2, x_3, \ldots, x_n$ corresponding to the process parameters from the obtained production records, and uses Pettitt mutation point detection to obtain the first production time.
  • the Pettitt mutation point test is a nonparametric test that not only locates mutation points but also quantifies their level of statistical significance.
  • the method directly uses the rank-sum sequence to detect mutation points.
  • the electronic device calculates the statistic $U_{t,n}$, which in the standard form of the Pettitt test satisfies the following formula: $$U_{t,n} = U_{t-1,n} + \sum_{i=1}^{n} \operatorname{sgn}(x_t - x_i), \qquad U_{1,n} = \sum_{i=1}^{n} \operatorname{sgn}(x_1 - x_i)$$
  • $U_{t,n}$ is the statistic, $n$ is the number of time points corresponding to the process parameters in the obtained production records, $x_i$ is each value in $x_1, x_2, x_3, \ldots, x_n$, $t$ is an integer with $2 \le t \le n$, $i$ is a positive integer with $1 \le i \le n$, and $\operatorname{sgn}(\cdot)$ is the sign function.
  • if $k_t = \max_{1 \le t \le n} |U_{t,n}|$, then point $t$ is the mutation point, where $k_t$ is the largest absolute value among the $U_{t,n}$.
  • the detected mutation point is considered statistically significant if $P \le 0.05$, where in the standard Pettitt test $P \approx 2\exp\left(-6k_t^2 / (n^3 + n^2)\right)$.
  • the electronic device determines that the corresponding production time at point t is the first production time.
  • the first production time is the mutation time point of the index value (also known as the high incidence time point of bad).
  • the index value suddenly becomes smaller than the third threshold value -2.5 on March 21, 2020, and the samples whose index value is lower than -2.5 are bad samples.
  • the electronic device determines March 21, 2020 as the mutation time point of the index value.
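  • The Pettitt mutation point detection can be sketched in Python as below. This follows the textbook form of the test (recursive $U_{t,n}$ and the approximate significance $P \approx 2\exp(-6k_t^2/(n^3+n^2))$); whether the disclosure uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math


def sgn(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def pettitt(x):
    """Return (index of the mutation point, k_t, approximate P) for the series x."""
    n = len(x)
    u = 0
    best_t, k = 0, -1
    for t in range(n):
        u += sum(sgn(x[t] - xi) for xi in x)  # U_t = U_{t-1} + sum_i sgn(x_t - x_i)
        if abs(u) > k:                        # keep the first maximum of |U_t|
            best_t, k = t, abs(u)
    p = 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2))
    return best_t, k, min(p, 1.0)
```

  • The returned index is zero-based and marks the last point before the shift; the production time recorded at that index would then be taken as the first production time.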
  • S102 The electronic device determines the degree of influence of the process information on the sudden defect according to the high defect occurrence time period and the obtained production record.
  • the electronic device determines the degree of influence of the process information on the sudden failure through the following steps:
  • Step 1 The electronic device determines the target distribution of the index values of the samples in the high-incidence time period on the production time corresponding to the process step.
  • the electronic device converts the production times corresponding to the process step within the high-occurrence time period into numerical time values. Then, the electronic device determines the target distribution of the index values of the samples over these time values within the high-occurrence time period.
  • the electronic device uses polynomial curve fitting to fit the index values against the time values corresponding to the process step, obtaining fitted index values, and takes the distribution of the fitted index values over the time values as the target distribution of the index values over the time values corresponding to the process step.
  • the electronic device obtains the first difference value between the fitting index value and the index value of the sample, and in the case that the first difference value is less than or equal to the fourth threshold, the following step 2 is performed.
  • the first difference value is the largest difference value among the difference values of the fitted index value and the index value of the corresponding sample.
  • the fourth threshold can be set according to experience.
  • for example, the fourth threshold is 0.6. The electronic device quantifies the production times corresponding to the process step as $t_1, t_2, t_3, \ldots, t_m$ and applies quartic polynomial curve fitting to obtain the fitted index values $x'_1, x'_2, x'_3, \ldots, x'_m$.
  • the index values of the samples corresponding to $t_1, t_2, t_3, \ldots, t_m$ are $x_1, x_2, x_3, \ldots, x_m$, respectively.
  • $x_i$ is any index value among $x_1, x_2, x_3, \ldots, x_m$. If the maximum error between $x'_i$ and the corresponding $x_i$ is less than or equal to 0.6, the following step 2 is performed.
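  • A minimal pure-Python sketch of the quartic fit and the check against the fourth threshold follows. It solves the least-squares normal equations by Gaussian elimination; the function names are hypothetical, and at least five distinct time values are assumed so that the system is solvable.

```python
def polyfit(ts, xs, degree=4):
    """Least-squares coefficients c[0] + c[1]*t + ... + c[degree]*t**degree."""
    n = degree + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(x * t ** i for t, x in zip(ts, xs)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def fit_is_acceptable(ts, xs, fourth_threshold=0.6):
    """Return (ok, fitted values); ok is True if the largest error <= threshold."""
    coef = polyfit(ts, xs)
    fitted = [sum(c * t ** k for k, c in enumerate(coef)) for t in ts]
    first_difference = max(abs(f - x) for f, x in zip(fitted, xs))
    return first_difference <= fourth_threshold, fitted
```

  • Rescaling the time values to a small range (for example, days since the start of the period) before fitting keeps the normal equations well conditioned.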
  • Step 2 The electronic device determines a second difference value between the target distribution and the preset distribution; the second difference value is used to characterize the distribution probability of the defective samples that have passed through the process step.
  • the preset distribution is a distribution obtained by summarizing experience, and the preset distribution may be a standard normal distribution.
  • the electronic device uses a significance test to obtain the difference value between the target distribution and the standard normal distribution.
  • the electronic device evaluates the probability density function of the standard normal distribution over the same time series to obtain $s_1, s_2, s_3, \ldots, s_m$, and performs a significance test on $s_1, s_2, s_3, \ldots, s_m$ and $x'_1, x'_2, x'_3, \ldots, x'_m$.
  • the electronic device uses the Mann-Whitney U test, a nonparametric test method, to determine the difference between the standard normal distribution ($s_1, s_2, s_3, \ldots, s_m$) and the target distribution ($x'_1, x'_2, x'_3, \ldots, x'_m$). Specifically, it is assumed that the two samples come from two populations that are identical except for the population mean, and the purpose is to test whether the means of the two populations differ significantly.
  • the statistics $U_1$ and $U_2$ take the standard Mann-Whitney form for two samples of size $m$: $$U_1 = m \cdot m + \frac{m(m+1)}{2} - W_1, \qquad U_2 = m \cdot m + \frac{m(m+1)}{2} - W_2$$
  • $m$ is the number of time points corresponding to the process steps in the production record, $W_1$ is the rank sum of $s_1, s_2, s_3, \ldots, s_m$, and $W_2$ is the rank sum of $x'_1, x'_2, x'_3, \ldots, x'_m$.
  • the electronic device selects the smaller of $U_1$ and $U_2$ as $U$ and compares it with the preset critical value $U_a$.
  • when $U < U_a$, the above assumption is rejected, that is, the difference between the target distribution and the standard normal distribution is large.
  • when $U$ is greater than or equal to $U_a$, the above assumption is accepted and the two samples are considered to come from the same population, indicating that the difference between the target distribution and the standard normal distribution is small.
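  • The computation of $U$ from the rank sums can be sketched as below, assuming both sequences contain $m$ values and that tied values receive average ranks (the tie handling is an assumption; the source does not mention it). Function names are hypothetical.

```python
def rank_sums(s, x):
    """Rank sums W1 (for s) and W2 (for x) in the pooled sample; ties share ranks."""
    pooled = sorted([(v, 0) for v in s] + [(v, 1) for v in x], key=lambda p: p[0])
    w = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for _, group in pooled[i:j]:
            w[group] += avg_rank
        i = j
    return w[0], w[1]


def mann_whitney_u(s, x):
    """Return U = min(U1, U2) for two samples of equal size m."""
    if len(s) != len(x):
        raise ValueError("both samples must contain m values")
    m = len(s)
    w1, w2 = rank_sums(s, x)
    u1 = m * m + m * (m + 1) / 2.0 - w1
    u2 = m * m + m * (m + 1) / 2.0 - w2
    return min(u1, u2)
```

  • A small $U$ means that one sequence ranks almost entirely above the other; $U$ reaches its maximum of $m^2/2$ when the two sequences are fully interleaved.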
  • Figure 7 shows the comparison between the target distribution and the standard normal distribution.
  • the difference between the target distribution and the standard normal distribution in the left graph is smaller than the difference between the target distribution and the standard normal distribution in the right graph.
  • Step 3 The electronic device determines the degree of influence of the process step on the sudden failure according to the difference value.
  • the electronic device determines, according to U, the degree p value of the influence of the process step on the sudden failure.
  • the electronic device converts U into a value between 0 and 1 as the p value. The larger the p value, the less the above hypothesis can be rejected and the less significant the difference, that is, the more closely the two groups of data follow the same distribution. This indicates that no samples with a low incidence of defects are interspersed in the burst time period, and the greater the influence of this process step on the burst failure.
  • the degree of influence of the process steps corresponding to the left figure on the burst failure is greater than that of the process steps in the right figure.
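  • The source does not state how U is converted into a value between 0 and 1. One common choice, shown here purely as an assumption, is the two-sided normal approximation of the Mann-Whitney U distribution (without tie or continuity correction) for two samples of size $m$; the function name is hypothetical.

```python
import math


def u_to_p_value(u, m):
    """Two-sided p value of U for two samples of size m (normal approximation)."""
    mean = m * m / 2.0
    std = math.sqrt(m * m * (2 * m + 1) / 12.0)
    z = (u - mean) / std
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
```

  • The approximation is usually considered reasonable only for moderately large $m$; for small samples an exact table of critical values would be used instead.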
  • the preset distribution is a summary of the distribution law of the index value of the sample in the case of sudden failure.
  • the preset distribution may be a standard normal distribution, or may be another distribution type such as an exponential distribution; this is not limited.
  • the method for determining the degree of influence of a process step on sudden failure in the embodiment of the present disclosure is based on the principle that the index values of the samples behave consistently within the high-occurrence time period corresponding to a given process step; that is, for a given process step, the index value variation of the samples passing through this process step is highly concentrated.
  • for example, if the samples that pass through the first process step during its high-occurrence time period have no samples with a low incidence of defects interspersed among them, while the samples that pass through the second process step during its high-occurrence time period are interspersed with samples with a low incidence of defects, then the degree of influence of the first process step on the defect burst is greater than that of the second process step.
  • the electronic device determines the degree of influence of the process parameter on the sudden failure through the following steps:
  • Step 1 The electronic device acquires the critical change point of the process parameter in the production record, and determines the time corresponding to the critical change point as the second production time.
  • the electronic device obtains the Gini coefficient of the process parameter in the production record, determines the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter, and determines the time corresponding to the critical change point as the second Production time.
  • the electronic device uses each value of the process parameter in the production record as a cutpoint, calculates the Gini coefficient corresponding to each cutpoint to obtain multiple Gini coefficients, determines the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter, and determines the time corresponding to the critical change point as the second production time.
  • the value of the process parameter with the smallest Gini coefficient is taken as the sudden change point, and the time corresponding to the sudden change point is taken as the second production time.
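  • The cutpoint search can be sketched as follows. Scoring each cutpoint by the size-weighted Gini impurity of the two resulting groups, as in a CART split, is an assumption about how the Gini coefficient of a cutpoint is computed; the function names and record layout are hypothetical.

```python
def binary_gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = bad sample)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def critical_change_point(records, first_threshold):
    """records: list of (production_time, parameter_value, index_value).

    Returns (parameter value, production time) of the cutpoint whose split
    into value <= cut and value > cut has the smallest weighted Gini.
    """
    best = None
    for time, cut, _ in records:
        left = [int(v > first_threshold) for _, p, v in records if p <= cut]
        right = [int(v > first_threshold) for _, p, v in records if p > cut]
        score = (len(left) * binary_gini(left) + len(right) * binary_gini(right)) / len(records)
        if best is None or score < best[0]:
            best = (score, cut, time)
    return best[1], best[2]
```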
  • Step 2 The electronic device determines the difference between the first production time and the second production time, and determines the degree of influence of the process parameters on the sudden failure according to the difference.
  • a time threshold can be preset in the electronic device. When the difference between the first production time and the second production time is greater than the time threshold, the degree of influence of the process parameter at the second production time on the sudden failure is determined to be 0, which means that the process parameter has no effect on the sudden failure.
  • when the difference between the first production time and the second production time is less than or equal to the time threshold, the electronic device determines that the degree of influence of the process parameter at the second production time on the sudden failure is 1, which means that the process parameter affects the sudden failure.
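  • The comparison against the time threshold reduces to a few lines. Taking the absolute difference between the two times is an assumption (the source only speaks of "the difference"); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def parameter_influence(first_time, second_time, time_threshold):
    """Return 1 if the two production times are within the threshold, else 0."""
    return 0 if abs(first_time - second_time) > time_threshold else 1
```

  • For instance, `parameter_influence(datetime(2020, 3, 21), datetime(2020, 3, 21, 6), timedelta(days=1))` evaluates to 1 because the two times are six hours apart.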
  • the electronic device may analyze every process parameter. Alternatively, the electronic device first determines the degree of influence of the process steps on the sudden failure as described above and determines the process steps to be analyzed, and then analyzes only the process parameters under those process steps to determine the degree of influence of these process parameters on the sudden failure, thereby improving the efficiency of locating the cause of the sudden failure.
  • S103 The electronic device obtains the Gini coefficient of the process information in the production record.
  • the Gini coefficient of the process information is used to characterize the degree of correlation between the process information and the index value of the sample; according to the Gini coefficient of the process information, the degree of influence of the process information on the sudden failure is determined.
  • the electronic device obtains the process step and index value corresponding to the sample identification from the production record, and obtains the Gini coefficient of the process step according to the process step and index value corresponding to the sample identification.
  • the electronic equipment determines the degree of influence of the process step on the sudden failure according to the Gini coefficient of the process step.
  • for example, the effect of the first process step on sample defects is classified as either influence or no influence.
  • the electronic device can use the Gini coefficient, the impurity measure used in CART decision trees, to calculate the degree of influence of each process step that the sample passes through on the index value of the sample.
  • the Gini coefficient is $$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$ where $C_k$ is the set of samples in $D$ that belong to class $k$.
  • for example, $|D|$ is the total number of samples that have undergone the first process step, and $|C_k|$ is the number of bad samples in $D$.
  • Gini(D) reflects the degree of influence of the first process step on sample defects to a certain extent.
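  • A minimal sketch of computing $\mathrm{Gini}(D)$ for each process step is given below, using the standard CART impurity $1 - \sum_k (|C_k|/|D|)^2$ over the class labels of the samples that passed through the step. The labels `"bad"`/`"good"` and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of the class labels in D."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_by_step(records):
    """records: iterable of (process_step, label); returns {step: Gini(D)}."""
    groups = defaultdict(list)
    for step, label in records:
        groups[step].append(label)
    return {step: gini(labels) for step, labels in groups.items()}
```

  • A value of 0 means that every sample passing through the step falls in the same class, while 0.5 is the maximum for two classes and indicates an even mix of good and bad samples.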
  • the electronic device performs a chi-square test on the process step in the production record to obtain a chi-square test value of the process step on the index value of the sample, and the chi-square test value is used to characterize the influence of the process step on the index value of the sample. Then, the electronic device determines the degree of influence of the process step on the sudden failure according to the first preset weight, the chi-square test value and the Gini coefficient of the process step.
  • the chi-square test in statistics measures the degree of deviation between the actual observed values of a statistical sample and the theoretically inferred values; this degree of deviation determines the size of the chi-square value.
  • the most basic idea of the chi-square test is to determine whether the theory is correct or not by observing the deviation between the actual value and the theoretical value.
  • the chi-square test value chicsquare satisfies the formula $$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - E)^2}{E}$$
  • this formula represents the degree of deviation between the theoretical value $E$ and the actual values $x_i$ over $n$ samples.
  • the device 1 represented by a certain device identifier, it is assumed that the device 1 has no influence on the failure of the sample, they are independent and irrelevant, and the index value of the sample through the device 1 is actually
  • the theoretical value can be calculated according to the defective rate of the whole sample, and the chi-square test value can be obtained from the above formula. Substitute the chi-square test value into the probability density function to calculate the chi-square distribution to obtain pValue.
  • subbad is the number of bad samples that have passed through device 1
  • subgood is the number of good samples that have passed through device 1
  • totalbad-subbad is the number of bad samples that have not passed through device 1
  • totalgood-subgood is the number of good samples that have not passed through device 1.
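The chi-square computation for one device can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the four observed counts form a 2x2 table, the theoretical counts come from the overall defective rate, all theoretical counts are positive, and the p-value is taken as the upper tail of a chi-square distribution with one degree of freedom (for which the tail equals `erfc(sqrt(x / 2))`). The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(subbad, subgood, totalbad, totalgood):
    """Chi-square value and p-value for 'passed through device 1' vs. 'bad'."""
    total = totalbad + totalgood
    bad_rate = totalbad / total            # defective rate of the whole sample set
    through = subbad + subgood             # samples that passed through device 1
    not_through = total - through          # samples that did not
    observed = [subbad, subgood, totalbad - subbad, totalgood - subgood]
    expected = [
        through * bad_rate,
        through * (1.0 - bad_rate),
        not_through * bad_rate,
        not_through * (1.0 - bad_rate),
    ]
    chi2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of the 40 bad samples passed through device 1.
chi2, p_value = chi_square_2x2(subbad=30, subgood=20, totalbad=40, totalgood=60)
```

A small p-value suggests that passing through device 1 and being bad are unlikely to be independent.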
  • the electronic device can use either the chi-square test value or the Gini coefficient corresponding to process step 1 as the degree of influence of process step 1 on the index value of the sample; alternatively, the electronic device can determine the degree of influence of the process step on the sudden failure according to the first preset weight, the chi-square test value and the Gini coefficient of the process step.
  • for example, the first preset weights are 0.5 and 0.5.
  • the electronic device obtains the first product of the chi-square test value and 0.5.
  • the electronic device obtains the second product of the Gini coefficient and 0.5.
  • the electronic device calculates the difference between the first product and the second product, and takes this difference as the degree of influence of the process step on the sudden failure.
  • the electronic device obtains the process parameter and the index value corresponding to the sample identification from the production record, obtains the Gini coefficient of the process parameter from them, and then determines the degree of influence of the process parameter on the sudden failure according to that Gini coefficient.
  • the electronic device performs a correlation test on the process parameters in the production records and the index values of the samples to obtain the influence parameter of each process parameter; the influence parameter is used to characterize the degree of influence of the process parameter on the index values of the samples. The electronic device then obtains the degree of influence of the process parameter on the sudden failure according to the second preset weight, the influence parameter and the Gini coefficient of the process parameter.
  • the association test may be at least one of a normal distribution test, a homogeneity of variance test, or a T test.
  • for example, the second preset weight of the influence parameter is 0.4 and the second preset weight of the Gini coefficient of the process parameter is 0.6.
  • the electronic device obtains the third product of the influence parameter and 0.4.
  • the electronic device obtains the fourth product of the Gini coefficient of the process parameter and 0.6, and uses the sum of the third product and the fourth product as the degree of influence of the process parameter on the sudden failure.
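The weighted combination for a process parameter can be written as a short sketch. The weights 0.4 and 0.6 are the example values from the text; the function name `parameter_influence` and the input values are hypothetical.

```python
def parameter_influence(influence_param, gini_coeff, w_influence=0.4, w_gini=0.6):
    """Sum of the third product (influence parameter x weight) and the
    fourth product (Gini coefficient x weight)."""
    third_product = influence_param * w_influence
    fourth_product = gini_coeff * w_gini
    return third_product + fourth_product


# Hypothetical inputs: influence parameter 0.5, Gini coefficient 0.25.
degree = parameter_influence(0.5, 0.25)
```

Keeping the weights as parameters makes it easy to try other preset weights without changing the calculation.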
  • the electronic device obtains the result variable (i.e., the above-mentioned index value) from the obtained production record, then obtains the cause variables and divides them into two groups:
  • continuous cause variables (e.g., the above process parameters);
  • discrete cause variables (e.g., the above process steps).
  • the electronic device uses mutation point detection to determine the time point with high incidence of defects in the result variable, as well as the critical change point in each continuous cause variable, and from these determines the degree of influence of the continuous cause variable on the sudden failure.
  • the electronic device locates the time period with high occurrence of defects, fits the result variable against the cause variable within that time period, and performs a significance test between the fitting result and the standard normal distribution to obtain the degree of influence of the discrete cause variable on the sudden failure. It then comprehensively displays the degree of influence of the discrete cause variables and of the continuous cause variables on the sudden failure.
  • the result finally determined in the embodiment of the present disclosure is the degree of influence of each of the multiple process parameters on the sudden failure, and/or the degree of influence of each of the multiple process steps on the sudden failure.
  • the electronic device displays the determined result, so that the user can identify from the displayed results the one or more pieces of process information that have the greatest impact on the sudden failure, and thereby locate its cause.
  • the electronic device may acquire the source production information of each of the multiple samples only once (for example, the sample identifiers and the index values of the samples they represent, stored in one device, and the sample identifiers and the corresponding process information, stored in several other devices) and store it on the electronic device or on another intermediate device.
  • thereafter, the electronic device obtains the production information from the electronic device itself or from the other device that stores the production information of the multiple samples.
  • because the data can be obtained from that device directly, data processing is sped up.
  • This embodiment of the present disclosure does not limit the format of the data storage.
  • the data storage may be stored in the parquet format.
  • the electronic device determines the time period with high occurrence of defects according to the index values in the obtained production records and the time corresponding to the process information, and then determines the degree of influence of the process information on the sudden failure according to that time period and the obtained production records.
  • in this way, the correlation between the process information and the time trend of the samples' sudden failure can be mined and quantified as a numerical value, which gives users more accurate and comprehensive data for locating the cause of the defect.
  • FIG. 9 is a flowchart of another data processing method provided by an embodiment of the present disclosure. The method can be applied to the electronic device shown in FIG. 3 , and the method shown in FIG. 9 can include the following steps:
  • S200 The electronic device receives the sample screening condition input by the user on the condition selection interface.
  • the sample screening conditions include at least one of product model, testing site, production time period, process identification, equipment identification, process parameters or defect type.
  • Exemplarily, the condition selection interface displayed by the electronic device is shown in FIG. 10.
  • part A of FIG. 10 includes a time period input box, a testing site input box, a product model input box, a process (that is, a process step) input box, etc.
  • part B of FIG. 10 is the defect type input interface.
  • the raw material can be a panel master, and the testing site input box can be used by the user to select the testing site.
  • the testing site includes at least six defect types: the number of type 1 defects lets the user select the number of defective samples of type 1 as the defect type;
  • the type 1 defect rate lets the user select the defect rate of the samples of type 1 as the defect type;
  • the type 1 raw material defect rate lets the user select the defect rate of the raw material of type 1 as the defect type;
  • the number of type 2 defects lets the user select the number of defective samples of type 2 as the defect type;
  • the type 2 defect rate lets the user select the defect rate of the samples of type 2 as the defect type;
  • the type 2 raw material defect rate lets the user select the defect rate of the raw material of type 2 as the defect type.
  • the filter condition further includes a cause variable.
  • the cause variable input interface displayed by the electronic device is shown in FIG. 11 .
  • the raw material in FIG. 11 may be a panel master.
  • the testing site in FIG. 11 can be used by the user to select the testing site, and the product can be used by the user to select the product model.
  • the process identification in FIG. 11 can be used by the user to select the corresponding process; one process corresponds to at least one process step, and process step identification 1 and process step identification 2 in FIG. 11 can be used by the user to select the process step.
  • in FIG. 11, the process step identified as process step identification 2 corresponds to at least three devices: device 1, device 2 and device 3, each of which corresponds to one device.
  • S201 The electronic device obtains the production record of each of the multiple samples corresponding to the sample screening conditions; the production record includes process information, the production time corresponding to the process information, and an index value; the process information is process parameters or process steps; the index value is used to indicate the degree of defect of the sample with respect to the preset defect type; the multiple samples include bad samples, and a bad sample is a sample whose index value is greater than the first threshold.
  • the index value may be the defect rate or the number of defects of the defect type.
  • for example, whether the Qtest measurement data (e.g., thickness, electrical parameters) meets the standard can be used to determine whether the sample is good or bad.
  • S202 The electronic device determines a time period with high occurrence of defects according to the obtained index value in the production record and the time corresponding to the process information.
  • S203 The electronic device determines the degree of influence of the process information on the sudden defect according to the high defect occurrence time period and the obtained production records.
  • S204 The electronic device displays the degree of influence of the process information on the sudden failure on the analysis result display interface.
  • the electronic device sorts the degree of influence of the acquired plurality of process information on the sudden failure, and then the electronic device displays the influence degree of the sorted process information on the sudden failure on the analysis result display interface.
  • the electronic device sorts the degrees of influence of the acquired pieces of process information on the sudden failure in descending order and displays the sorted result. In this way, the process information with the greatest impact on the sudden failure is ranked at the top, which is convenient for users to view.
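The descending sort described above can be sketched in a few lines. The value 0.9397 for device 1 is the example shown in FIG. 12; the other device names and values, and the function name `rank_by_influence`, are hypothetical.

```python
def rank_by_influence(influences):
    """Return (process information, degree of influence) pairs, largest first."""
    return sorted(influences.items(), key=lambda item: item[1], reverse=True)


# Only the device 1 value comes from the text; the others are made up.
ranked = rank_by_influence({"device 2": 0.12, "device 1": 0.9397, "device 3": 0.55})
for rank, (name, degree) in enumerate(ranked, start=1):
    print(rank, name, degree)
```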
  • the electronic device can determine and display one quantified value of the influence degree of the process step on the burst failure according to the preset weight and multiple quantification values of the influence degree of the process step on the burst failure. Alternatively, the electronic device separately displays a quantified value for each degree of influence of the process step on the burst failure. Similarly, the electronic device may determine and display a quantified value of the influence degree of the process parameter on the burst failure according to the preset weight and multiple quantification values of the influence degree of the process parameter on the burst failure. Alternatively, the electronic device displays a quantified value for each degree of influence of the process parameter on the burst failure, respectively.
  • FIG. 12 shows the quantified values of the degree of influence of each device ID on the sudden failure, as displayed by the electronic device in the analysis result display interface.
  • 16 under the serial number is the serial number of the row of data.
  • device 1 is the device ID.
  • the first quantified value of the degree of influence, 0.9397, is the quantified value of the degree of influence of the equipment identified as device 1 on the sudden failure.
  • the second quantified value of the degree of influence, 0.012293, is another quantified value of the degree of influence of the equipment identified as device 1 on the sudden failure, obtained from the analysis in the time dimension. The rest are similar and are not repeated here.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that, the division of modules in the embodiments of the present disclosure is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • the data processing device 70 includes: an acquisition module 701, a first determination module 702, and a second determination module 703.
  • the acquisition module 701 is used to acquire the production record corresponding to each of the multiple samples; the production record includes process information, the production time corresponding to the process information, and an index value; the process information is process parameters and/or process steps; the index value is used to represent the degree of defect of the sample with respect to the preset defect type; the multiple samples include bad samples, and a bad sample is a sample whose index value is greater than the first threshold;
  • the first determination module 702 is configured to determine the time period with high occurrence of defects according to the index values in the obtained production records and the time corresponding to the process information, where the time period with high occurrence of defects is the time period in which the distribution probability of the bad samples is greater than the second threshold; the second determination module 703 is configured to determine the degree of influence of the process information on the sudden failure according to the time period with high occurrence of defects and the obtained production records.
  • the second determination module 703 is specifically configured to: determine the target distribution of the index values of the samples in the high-incidence time period in the production time corresponding to the process step; determine the target distribution The difference value from the preset distribution; the difference value is used to represent the distribution probability of the defective samples that have passed through the process steps; the degree of influence of the process step on the sudden defect is determined according to the difference value.
  • the second determination module 703 is specifically configured to: convert the production time corresponding to the process step into a time value corresponding to the process step; and determine the target distribution of the index values of the samples in the time period with high occurrence of defects over the time values corresponding to the process step.
  • the second determination module 703 is specifically configured to: use a polynomial curve fitting method to fit the time values corresponding to the process step to fitted index values; and determine the distribution of the fitted index values over the time values as the target distribution of the index values of the samples in the time period with high occurrence of defects over the time values corresponding to the process step.
  • the preset distribution is a standard normal distribution
  • the target distribution is a polynomial distribution
  • the second determining module 703 is specifically configured to: obtain a difference value between the target distribution and the standard normal distribution by using a significance test.
  • the first determination module 702 is specifically configured to: divide the multiple samples into good samples and bad samples according to the first threshold and the index values in the obtained production records; and determine, as the time period with high occurrence of defects, the time period in which the ratio of the number of bad samples to the total number of samples in the production records is greater than the second threshold.
  • the time period is a time period of the time corresponding to the process information.
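The two-threshold rule for finding a time period with high occurrence of defects can be sketched as below. The disclosure does not fix how time is divided into periods; bucketing by hour is an assumption made here for illustration, and the function name `high_defect_periods` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def high_defect_periods(records, first_threshold, second_threshold):
    """records: iterable of (production_time, index_value) pairs.

    A sample is bad when its index value exceeds first_threshold; an hourly
    bucket is a high-defect period when bad / total exceeds second_threshold.
    """
    totals = defaultdict(int)
    bads = defaultdict(int)
    for production_time, index_value in records:
        # Hourly bucketing is an assumption of this sketch.
        bucket = production_time.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += 1
        if index_value > first_threshold:
            bads[bucket] += 1
    return sorted(b for b in totals if bads[b] / totals[b] > second_threshold)


# Hypothetical records: the 08:00 hour has 2 bad samples out of 3.
records = [
    (datetime(2021, 5, 1, 8, 5), 0.9),
    (datetime(2021, 5, 1, 8, 30), 0.8),
    (datetime(2021, 5, 1, 8, 45), 0.1),
    (datetime(2021, 5, 1, 9, 10), 0.1),
    (datetime(2021, 5, 1, 9, 20), 0.2),
]
periods = high_defect_periods(records, first_threshold=0.5, second_threshold=0.5)
```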
  • the acquiring module 701 is further configured to: acquire the first production time by detecting the mutation point; the first production time is the mutation time point of the index value; the first production time is the time point in the high-incidence time period; the critical change point of the process parameters in the production record is obtained, and the time corresponding to the critical change point is determined as the second production time, and the second determination module 703 is specifically used to determine the first production time and The difference of the second production time, and the influence degree of the process parameter on the sudden failure is determined according to the difference.
  • the obtaining module 701 is further configured to: obtain the Gini coefficient of the process parameter in the production record; and determine the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter.
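One way to read "the value of the process parameter with the smallest Gini coefficient" is as a decision-tree style split search: each candidate value divides the samples into two groups, and the candidate with the lowest weighted Gini impurity is the critical change point. The sketch below follows that reading; it is an assumption rather than the method confirmed by the disclosure, and the names `gini_binary` and `critical_change_point` are hypothetical.

```python
def gini_binary(bad, total):
    """Two-class Gini impurity for `bad` bad samples out of `total`."""
    if total == 0:
        return 0.0
    p = bad / total
    return 1.0 - p * p - (1.0 - p) ** 2


def critical_change_point(values, is_bad):
    """Return (parameter value, weighted Gini) of the best split 'value <= v'."""
    pairs = list(zip(values, is_bad))
    n = len(pairs)
    best_value, best_gini = None, float("inf")
    # Skip the largest value: it would leave the right-hand group empty.
    for threshold in sorted(set(values))[:-1]:
        left = [bad for v, bad in pairs if v <= threshold]
        right = [bad for v, bad in pairs if v > threshold]
        weighted = (
            len(left) * gini_binary(sum(left), len(left))
            + len(right) * gini_binary(sum(right), len(right))
        ) / n
        if weighted < best_gini:
            best_value, best_gini = threshold, weighted
    return best_value, best_gini


# Hypothetical data: samples turn bad once the parameter exceeds 3.
point, score = critical_change_point(
    [1, 2, 3, 4, 5, 6], [False, False, False, True, True, True]
)
```

With fewer than two distinct parameter values there is no candidate split, and the function returns `(None, inf)`.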
  • the acquiring module 701 is further configured to: acquire the Gini coefficient of the process information in the production record; the Gini coefficient of the process information is used to represent the degree of correlation between the process information and the index value of the sample; the second determination module 703 is specifically configured to: determine the degree of influence of the process information on the sudden failure according to the Gini coefficient of the process information.
  • the production record further includes a sample identifier.
  • where the process information is a process parameter, the acquiring module 701 is specifically configured to: acquire the process parameter and the index value corresponding to the sample identifier from the production record; and obtain the Gini coefficient of the process parameter according to the process parameter and the index value corresponding to the sample identifier.
  • where the process information is a process step, the acquiring module 701 is specifically configured to: acquire the process step and the index value corresponding to the sample identifier from the production record; and obtain the Gini coefficient of the process step according to the process step and the index value corresponding to the sample identifier.
  • the data processing apparatus further includes a checking module 704, configured to perform a chi-square check on the process steps in the production records, to obtain a chi-square check value of the index value of the sample by the process step, and the chi-square check value is used to represent The influence degree of the process step on the index value of the sample;
  • the second determination module 703 is specifically configured to: determine the influence degree of the process step on the sudden failure according to the first preset weight, the chi-square test value and the Gini coefficient of the process step.
  • the checking module 704 is further configured to: perform a correlation test on the process parameters in the production records and the index values of the samples to obtain the influence parameters of the process parameters; the influence parameters are used to characterize the degree of influence of the process parameters on the index values of the samples.
  • the second determining module 703 is specifically configured to: obtain the influence degree of the process parameter on the sudden failure according to the second preset weight, the influence parameter and the Gini coefficient of the process parameter.
  • the acquisition module 701 is specifically configured to: acquire the first correspondence between the sample identifier and the index value of each of the multiple samples, and acquire the second correspondence among the sample identifier, the process information, and the production time corresponding to the process information of each sample; and, according to the sample identifier, the first correspondence and the second correspondence of each sample, establish the third correspondence among the process information of each sample, the production time corresponding to the process information, and the index value.
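Building the third correspondence amounts to joining the first two on the sample identifier. The sketch below assumes the correspondences are held as plain dictionaries; the data layout, the field names, and the function name `build_third_correspondence` are hypothetical.

```python
def build_third_correspondence(first, second):
    """Join the two correspondences on the sample identifier.

    first:  {sample_id: index_value}
    second: {sample_id: [(process_info, production_time), ...]}
    """
    third = []
    for sample_id, entries in second.items():
        if sample_id not in first:
            continue  # no index value for this sample; skipping it is an assumption
        for process_info, production_time in entries:
            third.append({
                "sample_id": sample_id,
                "process_info": process_info,
                "production_time": production_time,
                "index_value": first[sample_id],
            })
    return third


# Hypothetical data: sample "s2" has no index value and is left out.
first = {"s1": 0.8}
second = {
    "s1": [("device 1", "2021-05-01 08:05"), ("device 2", "2021-05-01 09:10")],
    "s2": [("device 1", "2021-05-01 08:30")],
}
production_records = build_third_correspondence(first, second)
```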
  • the receiving function of the above-mentioned acquisition module 701 may be implemented by the interface unit 304 in FIG. 3 .
  • the processing functions of the obtaining module 701 , the first determining module 702 , the second determining module 703 and the checking module 704 can be implemented by the processor 301 in FIG. 3 calling a computer program stored in the memory 302 .
  • the data processing apparatus 80 includes a receiving module 801 , an obtaining module 802 , a determining module 803 and a display module 804 .
  • the receiving module 801 is used for receiving the sample screening conditions input by the user on the condition selection interface; the obtaining module 802 is used to obtain the production record of each of the multiple samples corresponding to the sample screening conditions; the production record includes process information, the production time corresponding to the process information, and an index value;
  • the process information is process parameters and/or process steps; the index value is used to represent the degree of defect of the sample with respect to the preset defect type; the multiple samples include bad samples, and a bad sample is a sample whose index value is greater than the first threshold; the determining module 803 is used to determine the time period with high occurrence of defects according to the index values in the obtained production records and the time corresponding to the process information, and to determine the degree of influence of the process information on the sudden failure according to that time period and the obtained production records;
  • the display module 804 is used for displaying the degree of influence of the process information on the sudden failure on the analysis result display interface.
  • the receiving module 801 can be used to execute S200
  • the obtaining module 802 can be used to execute S201
  • the determining module 803 can be used to execute S202-S203
  • the display module 804 can be used to execute S204.
  • the data processing apparatus further includes: a sorting module 805, configured to sort the degrees of influence of the acquired pieces of process information on the sudden failure; the display module 804 is specifically configured to: display, on the analysis result display interface, the sorted degrees of influence of the process information on the sudden failure.
  • the receiving functions of the above-mentioned receiving module 801 and obtaining module 802 may be implemented by the interface unit 304 in FIG. 3 .
  • the processing functions of the obtaining module 802, the determining module 803, and the sorting module 805 can be implemented by the processor 301 in FIG. 3 calling the computer program stored in the memory 302.
  • the display module 804 may be implemented by the display 306 in FIG. 3 .
  • Embodiments of the present disclosure further provide an electronic device, comprising: a processor and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the executable instructions to implement the data processing method according to any of the above embodiments.
  • Some embodiments of the present disclosure provide a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) having computer program instructions stored therein that, when executed on a processor, cause the processor to execute one or more steps in the data processing method described in any one of the foregoing embodiments.
  • the above-mentioned computer-readable storage media may include, but are not limited to: magnetic storage devices (for example, hard disks, floppy disks or magnetic tapes, etc.), optical disks (for example, CD (Compact Disk, compact disk), DVD (Digital Versatile Disk, Digital Universal Disk), etc.), smart cards and flash memory devices (eg, EPROM (Erasable Programmable Read-Only Memory, Erasable Programmable Read-Only Memory), card, stick or key drive, etc.).
  • the various computer-readable storage media described in this disclosure may represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • Some embodiments of the present disclosure also provide a computer program product. The computer program product includes computer program instructions that, when executed on a computer, cause the computer to perform one or more steps in the data processing methods described in the above embodiments.
  • Some embodiments of the present disclosure also provide a computer program.
  • When the computer program is executed on a computer, it causes the computer to perform one or more steps in the data processing method described in the above-mentioned embodiments.

Abstract

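A data processing method, comprising: acquiring a production record corresponding to each of a plurality of samples, the production record including process information, a production time corresponding to the process information, and an index value; determining a time period with high occurrence of defects according to the index values in the acquired production records and the time corresponding to the process information; and determining the degree of influence of the process information on the sudden failure according to the time period with high occurrence of defects and the acquired production records.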

Description

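Data processing method, apparatus, device and storage medium
Technical Field
The present disclosure relates to the technical field of data processing, and in particular to a data processing method, apparatus, device and storage medium.
Background
In the process of manufacturing a product, the process steps the product goes through and the process parameters corresponding to those process steps all affect the performance of the product, and may cause the performance to fall below standard (also called a defect). The process steps include the equipment the product passes through during production. Therefore, for products whose performance is below standard, the cause of the sudden failure needs to be determined from the process steps and the process parameters.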
发明内容
一方面,提供一种数据处理方法,包括:获取多个样本中每个样本对应的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;工艺信息为工艺参数和/或工艺步骤;指标值用于表征样本属于预设不良类型的不良程度;多个样本包括不良样本,不良样本为指标值大于第一阈值的样本;根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段;根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度。
在一些实施例中,在工艺信息为工艺步骤的情况下,上述根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度,包括:确定不良高发时间段内样本的指标值在工艺步骤对应的生产时间上的目标分布;确定目标分布与预设分布的差异值;差异值用于表征经过工艺步骤的不良样本的分布概率;根据差异值确定工艺步骤对突发不良的影响程度。
在另一些实施例中,上述确定不良高发时间段内样本的指标值在工艺步骤对应的生产时间上的目标分布,包括:将工艺步骤对应的生产时间数值化为工艺步骤对应的时间数值;确定不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布。
在另一些实施例中,上述确定不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布,包括:采用多项式曲线拟合方法,将工艺步骤对应的时间数值拟合为拟合指标值;将拟合指标值在时间数值上的分布确定为:不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布。
在另一些实施例中,预设分布为标准正态分布;目标分布为多项式分布;确定目标分布与预设分布的差异值,包括:采用显著性检验获取目标分布与标准正态分布的差异值。
在另一些实施例中,上述根据获取的生产记录中的指标值以及工艺信息 对应的时间,确定不良高发时间段,包括:根据第一阈值以及获取的生产记录中的指标值将多个样本划分为良样本和不良样本;将生产记录中不良样本的数量与样本总数量的比值大于第二阈值的时间段确定为不良高发时间段;时间段为工艺信息对应的时间的时间段。
在另一些实施例中,在工艺信息为工艺参数的情况下,根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度,包括:采用突变点检测获取第一生产时间;第一生产时间为指标值的突变时间点;第一生产时间为不良高发时间段中的时间点;获取生产记录中工艺参数的临界变化点,并将临界变化点对应的时间确定为第二生产时间。确定第一生产时间与第二生产时间的差值,并根据该差值确定工艺参数对突发不良的影响程度。
在另一些实施例中,上述获取生产记录中工艺参数的临界变化点,包括:获取生产记录中工艺参数的基尼系数;将基尼系数最小的工艺参数的值确定为工艺参数的临界变化点。
在另一些实施例中,获取生产记录中工艺信息的基尼系数;工艺信息的基尼系数用于表征工艺信息与样本的指标值的关联程度;根据工艺信息的基尼系数,确定工艺信息对突发不良的影响程度。
在另一些实施例中,生产记录还包括样本标识,在工艺信息为工艺参数的情况下,上述获取生产记录中工艺信息的基尼系数,包括:从生产记录中获取与样本标识对应的工艺参数以及指标值;根据与样本标识对应的工艺参数以及指标值获取工艺参数的基尼系数。
在另一些实施例中,生产记录还包括样本标识,在工艺信息为工艺步骤的情况下,获取生产记录中工艺信息的基尼系数,包括:从生产记录中获取与样本标识对应的工艺步骤以及指标值;根据与样本标识对应的工艺步骤以及指标值,获取工艺步骤的基尼系数。
在另一些实施例中,该方法还包括:对生产记录中工艺步骤进行卡方检验,得到工艺步骤对样本的指标值的卡方检验值,卡方检验值用于表征工艺步骤对样本的指标值的影响程度;根据第一预设权重、卡方检验值以及工艺步骤的基尼系数,确定工艺步骤对突发不良的影响程度。
在另一些实施例中,该方法还包括:对生产记录中工艺参数以及样本的指标值,进行关联检验得到工艺参数的影响参数;影响参数用于表征工艺参数对样本的指标值的影响程度;根据第二预设权重、影响参数与工艺参数的基尼系数得到工艺参数对突发不良的影响程度。
在另一些实施例中,上述获取多个样本中每个样本对应的生产记录,包括:获取多个样本中每个样本的样本标识与指标值的第一对应关系,并获取所述每个样本的样本标识、工艺信息以及工艺信息对应生产时间的第二对应关系;根据每个样本的样本标识、第一对应关系以及第二对应关系,建立每个样本的工艺信息、工艺信息对应生产时间以及指标值的第三对应关系。
另一方面,提供一种数据处理方法,包括:接收用户在条件选择界面输入的样本筛选条件;获取与样本筛选条件对应的多个样本中每个样本的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;工艺信息为工艺参数和/或工艺步骤;指标值用于表征样本属于预设不良类型的不良程度;多个样本包括不良样本,不良样本为指标值大于第一阈值的样本;根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段;根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度;在分析结果展示界面显示工艺信息对突发不良的影响程度。
在一些实施例中,上述在分析结果展示界面显示工艺信息对突发不良的影响程度,包括:对获取的多个工艺信息对突发不良的影响程度进行排序;在分析结果展示界面显示排序后的工艺信息对突发不良的影响程度。
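In yet another aspect, a data processing apparatus is provided, including an acquisition module, a first determination module and a second determination module. The acquisition module is used to acquire a production record corresponding to each of a plurality of samples; the production record includes process information, a production time corresponding to the process information, and an index value; the process information is a process parameter and/or a process step; the index value is used to represent the degree of defect of the sample with respect to a preset defect type; the plurality of samples include bad samples, a bad sample being a sample whose index value is greater than a first threshold. The first determination module is used to determine a time period with high occurrence of defects according to the index values in the acquired production records and the time corresponding to the process information, the time period with high occurrence of defects being a time period in which the distribution probability of the bad samples is greater than a second threshold. The second determination module is used to determine the degree of influence of the process information on the sudden failure according to the time period with high occurrence of defects and the acquired production records.
In some embodiments, where the process information is a process step, the second determination module is specifically configured to: determine a target distribution of the index values of the samples in the time period with high occurrence of defects over the production time corresponding to the process step; determine a difference value between the target distribution and a preset distribution, the difference value being used to represent the distribution probability of the bad samples that have passed through the process step; and determine the degree of influence of the process step on the sudden failure according to the difference value.
In other embodiments, the second determination module is specifically configured to: convert the production time corresponding to the process step into a time value corresponding to the process step; and determine the target distribution of the index values of the samples in the time period with high occurrence of defects over the time values corresponding to the process step.
In other embodiments, the second determination module is specifically configured to: use a polynomial curve fitting method to fit the time values corresponding to the process step to fitted index values; and determine the distribution of the fitted index values over the time values as the target distribution of the index values of the samples in the time period with high occurrence of defects over the time values corresponding to the process step.
In other embodiments, the preset distribution is a standard normal distribution; the target distribution is a polynomial distribution; and the second determination module is specifically configured to: obtain the difference value between the target distribution and the standard normal distribution by a significance test.
In other embodiments, the first determination module is specifically configured to: divide the plurality of samples into good samples and bad samples according to the first threshold and the index values in the acquired production records; and determine, as the time period with high occurrence of defects, a time period in which the ratio of the number of bad samples to the total number of samples in the production records is greater than the second threshold; the time period is a time period of the time corresponding to the process information.
In other embodiments, where the process information is a process parameter, the acquisition module is further configured to: obtain a first production time by mutation point detection; the first production time is the time point at which the index value changes abruptly; the first production time is a time point within the time period with high occurrence of defects; and obtain a critical change point of the process parameter in the production records, and determine the time corresponding to the critical change point as a second production time. The second determination module is specifically configured to determine the difference between the first production time and the second production time, and to determine the degree of influence of the process parameter on the sudden failure according to the difference.
In other embodiments, the acquisition module is further configured to: obtain the Gini coefficient of the process parameter in the production records; and determine the value of the process parameter with the smallest Gini coefficient as the critical change point of the process parameter.
In other embodiments, the acquisition module is further configured to: obtain the Gini coefficient of the process information in the production records; the Gini coefficient of the process information is used to represent the degree of correlation between the process information and the index values of the samples; the second determination module is specifically configured to: determine the degree of influence of the process information on the sudden failure according to the Gini coefficient of the process information.
In other embodiments, the production record further includes a sample identifier, and where the process information is a process parameter, the acquisition module is specifically configured to: obtain the process parameter and the index value corresponding to the sample identifier from the production records; and obtain the Gini coefficient of the process parameter according to the process parameter and the index value corresponding to the sample identifier.
In other embodiments, the production record further includes a sample identifier, and where the process information is a process step, the acquisition module is specifically configured to: obtain the process step and the index value corresponding to the sample identifier from the production records; and obtain the Gini coefficient of the process step according to the process step and the index value corresponding to the sample identifier.
In other embodiments, the data processing apparatus further includes a checking module, used to perform a chi-square test on the process step in the production records to obtain a chi-square test value of the process step with respect to the index values of the samples, the chi-square test value being used to characterize the degree of influence of the process step on the index values of the samples; the second determination module is specifically configured to: determine the degree of influence of the process step on the sudden failure according to a first preset weight, the chi-square test value and the Gini coefficient of the process step.
In other embodiments, the checking module is further configured to: perform a correlation test on the process parameter in the production records and the index values of the samples to obtain an influence parameter of the process parameter; the influence parameter is used to characterize the degree of influence of the process parameter on the index values of the samples; the second determination module is specifically configured to: obtain the degree of influence of the process parameter on the sudden failure according to a second preset weight, the influence parameter and the Gini coefficient of the process parameter.
In other embodiments, the acquisition module is specifically configured to acquire a first correspondence between the sample identifier and the index value of each of the plurality of samples, and acquire a second correspondence among the sample identifier, the process information, and the production time corresponding to the process information of each sample; and to establish, according to the sample identifier, the first correspondence and the second correspondence of each sample, a third correspondence among the process information of each sample, the production time corresponding to the process information, and the index value.
In still another aspect, a data processing apparatus is provided, including a receiving module, an obtaining module, a determining module and a display module. The receiving module is used to receive a sample screening condition input by a user on a condition selection interface. The obtaining module is used to obtain the production record of each of a plurality of samples corresponding to the sample screening condition; the production record includes process information, a production time corresponding to the process information, and an index value; the process information is a process parameter or a process step; the index value is used to represent the degree of defect of the sample with respect to a preset defect type; the plurality of samples include bad samples, a bad sample being a sample whose index value is greater than a first threshold. The determining module is used to determine a time period with high occurrence of defects according to the index values in the obtained production records and the time corresponding to the process information, the time period with high occurrence of defects being a time period in which the distribution probability of the bad samples is greater than a second threshold, and to determine the degree of influence of the process information on the sudden failure according to the time period with high occurrence of defects and the obtained production records. The display module is used to display the degree of influence of the process information on the sudden failure on an analysis result display interface.
In some embodiments, the data processing apparatus further includes a sorting module, used to sort the degrees of influence of the acquired pieces of process information on the sudden failure; the display module is specifically configured to: display the sorted degrees of influence of the process information on the sudden failure on the analysis result display interface.
In still another aspect, an electronic device is provided, including a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the executable instructions to implement the data processing method described in the embodiments of any of the above aspects.
In yet another aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program instructions that, when run on a processor, cause the processor to perform one or more steps of the data processing method described in any of the above embodiments.
In still another aspect, a computer program product is provided. The computer program product includes computer program instructions that, when executed on a computer, cause the computer to perform one or more steps of the data processing method described in any of the above embodiments.
In still another aspect, a computer program is provided. When the computer program is executed on a computer, it causes the computer to perform one or more steps of the data processing method described in any of the above embodiments.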
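Brief Description of the Drawings
In order to explain the technical solutions in the present disclosure more clearly, the drawings needed in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings in the following description are only drawings of some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them. In addition, the drawings in the following description may be regarded as schematic diagrams, and do not limit the actual size of the products, the actual flow of the methods, the actual timing of the signals, and the like involved in the embodiments of the present disclosure.
FIG. 1 is a structural diagram of a data processing system according to some embodiments;
FIG. 2 is a structural diagram of a data processing system combined with data processing according to some embodiments;
FIG. 3 is a structural diagram of an electronic device according to some embodiments;
FIG. 4 is a flowchart of a data processing method according to some embodiments;
FIG. 5 is a distribution diagram of positive and negative samples according to some embodiments;
FIG. 6 is a distribution diagram of the index values of samples over production time according to some embodiments;
FIG. 7 is a comparison diagram of the target distribution and the standard normal distribution according to some embodiments;
FIG. 8 is a flowchart of a data processing method according to some embodiments;
FIG. 9 is a flowchart of another data processing method according to some embodiments;
FIG. 10 is a structural diagram of a condition selection interface according to some embodiments;
FIG. 11 is a structural diagram of a cause variable input interface according to some embodiments;
FIG. 12 is a structural diagram of an analysis result display interface according to some embodiments;
FIG. 13 is a structural diagram of a data processing apparatus 70 according to some embodiments;
FIG. 14 is a structural diagram of another data processing apparatus 80 according to some embodiments.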
具体实施方式
除非上下文另有要求,否则,在整个说明书和权利要求书中,术语“包括(comprise)”及其其他形式例如第三人称单数形式“包括(comprises)”和现在分词形式“包括(comprising)”被解释为开放、包含的意思,即为“包含,但不限于”。在说明书的描述中,术语“一个实施例(one embodiment)”、“一些实施例(some embodiments)”、“示例性实施例(exemplary embodiments)”、“示例(example)”、“特定示例(specific example)”或“一些示例(some examples)”等旨在表明与该实施例或示例相关的特定特征、结构、材料或特性包括在本公开的至少一个实施例或示例中。上述术语的示意性表示不一定是指同一实施例或示例。此外,所述的特定特征、结构、材料或特点可以以任何适当方式包括在任何一个或多个实施例或示例中。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本公开实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
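In describing some embodiments, the expressions "coupled" and "connected" and their derivatives may be used. For example, the term "connected" may be used to indicate that two or more components are in direct physical or electrical contact with each other. As another example, the term "coupled" may be used to indicate that two or more components are in direct physical or electrical contact. However, the term "coupled" or "communicatively coupled" may also mean that two or more components are not in direct contact with each other but still cooperate or interact with each other. The embodiments disclosed herein are not necessarily limited to the content herein.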
如本文中所使用,根据上下文,术语“如果”任选地被解释为意思是“当……时”或“在……时”或“响应于确定”或“响应于检测到”。类似地,根据上下文,短语“如果确定……”或“如果检测到[所陈述的条件或事件]”任选地被解释为是指“在确定……时”或“响应于确定……”或“在检测到[所陈述的条件或事件]时”或“响应于检测到[所陈述的条件或事件]”。
本文中“适用于”或“被配置为”的使用意味着开放和包容性的语言,其不排除适用于或被配置为执行额外任务或步骤的设备。
另外,“基于”的使用意味着开放和包容性,因为“基于”一个或多个所述条件或值的过程、步骤、计算或其他动作在实践中可以基于额外条件或超出所述的值。
如本文所使用的那样,“约”或“近似”包括所阐述的值以及处于特定值的可接受偏差范围内的平均值,其中所述可接受偏差范围如由本领域普通技术人员考虑到正在讨论的测量以及与特定量的测量相关的误差(即,测量系统的局限性)所确定。
相关技术中,在制造产品的过程中,产品经过的任意一个生产工艺、生产工艺涉及的设备以及设备的配置参数(又称工艺参数)都会影响产品的性能,有可能导致产品的性能不达标(又称不良)。由于生产工艺较为繁杂,生产的产品数量庞大,人工查找造成产品性能不达标的原因较为困难,数据处理的时效和准确率都受到限制,难以满足日益增长的生产需求。基于此本公开实施例提供一种数据处理方法,通过数据挖掘的方法进行自动诊断分析,利用整个工厂的每个生产工艺中产生的数据,获取多个样本中每个样本对应的生产记录;根据获取的生产记录中的样本的指标值以及工艺信息对应的时间,确定不良高发时间段,根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度,并转换成量化的判断指标(如:相关性量化值),从而提高检测效率,以便用户全面快速做出决策,定位突发不良原因。
下面将结合附图,对本公开一些实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。基于本公开所提供的实施例,本领域普通技术人员所获得的所有其他实施例,都属于本公开保护的范围。
本公开的实施例所提供的数据处理方法适用于如图1所示的数据处理系统10,数据处理系统10包括数据处理装置100、显示装置200和分布式存储装置300。数据处理装置100分别与显示装置200和分布式存储装置300耦接。
分布式存储装置300被配置为存储多个设备(或称为工厂设备)产生的生产数据。例如,多个设备产生的生产数据包括多个设备的生产记录;例如,生产记录包括多个样本在生产过程中经过的设备的标识、设备所对应的环境参数、指标值以及生产时间,每个样本在生产过程中经历至少一个设备。
其中,分布式存储装置300中存储有相对完整的数据(如一个数据库)。分布式存储装置300可以包括多个硬件的存储器,且不同的硬件存储器分布在不同物理位置(如在不同工厂,或在不同生产线),并通过无线传输(例如网络等)实现相互之间信息的传递,从而使得数据是分布式关系的,但在逻辑上构成一个基于大数据技术的数据库。
大量不同设备的原始数据存储在相应的生产制造系统中,如良率管理系统(Yield Management System,YMS)、错误侦测及分类(Fault Detection&Classification,FDC)、制造执行系统(Manufacturing Execution System,MES)等系统的关系型数据库(如Oracle、Mysql等)中,而这些原始数据可通过数据抽取工具(如Sqoop、kettle等)进行原表抽取以传输给分布式存储装置300(如分布式文件系统(Hadoop Distributed File System,HDFS)),以降低对设备和生产制造系统的负载,便于后续数据处理装置100读取数据。
参考图2,分布式存储装置300中的数据可采用Hive工具或Hbase数据库格式存储。例如,根据Hive工具,以上原始数据先存储在数据库中;之后,可继续在Hive工具中进行数据清洗、数据转换等预处理,得到样本的生产记录数据仓库。数据仓库可再通过不同的API接口,与显示装置200、数据处理装置100等连接以实现与这些设备间的数据交互。显示装置200展示选择页面,选择页面用于用户选择筛选条件,筛选条件包括结果变量、原因变量以及过滤条件(例如:样本类别和预设时间段等),数据处理装置100进行突发性不良时间维度分析和/或智能挖掘以进行不良诊断分析,数据处理装置100经过不良诊断分析得到的分析结果,在显示装置200的分析结果展示页面展示给用户。
其中,由于涉及多个工厂的多个设备,故以上原始数据的数据量是很大的。例如,所有设备每天产生的原始数据可能有几百G,每小时产生的数据也可能有几十G。
示例性地,对海量结构化数据实现存储与计算主要有两种方案:一种为分布式文件管理系统(Distributed File System,DFS)的大数据方案。另一种为采用关系型数据库实现数据的存储,采用分布式计算实现数据的计算。
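DFS-based big data technology allows a large cluster to be built from many inexpensive hardware devices in order to process massive data. For example, the Hive tool is a Hadoop-based data warehouse tool that can be used for data extraction, transformation, and loading (ETL). The Hive tool defines a simple SQL-like query language and also allows custom MapReduce mappers and reducers to handle complex analysis work that the default tools cannot complete. The Hive tool has no dedicated data storage format and does not build indexes on the data; users can freely organize its tables and process the data in the database. The parallel processing of distributed file management can thus meet the storage and processing requirements of massive data: users can process simple data through SQL queries and use custom functions for complex processing. Therefore, when analyzing the massive data of a factory, the data in the factory databases needs to be extracted into the distributed file system, which on the one hand does not damage the original data and on the other hand improves the efficiency of data analysis.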
关系型数据库可以为Oracle、DB2、MySQL、Microsoft SQL Server、Microsoft Access中的任意一种,分布式计算将一个计算任务分解为多个子任务,将多个子任务分配给多个计算机设备同时进行处理,最终将每个计算机设备处理得到的处理结果汇总为最终的结果。
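Exemplarily, the distributed storage apparatus 300 may be one memory, multiple memories, or a collective term for multiple storage elements. For example, the memory may include random access memory (Random Access Memory, RAM) or double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM), and may also include non-volatile memory, such as disk storage or flash memory (Flash).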
数据处理装置100可以是任意一个终端设备、服务器、虚拟机或服务器集群。
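The display apparatus 200 may be a display, or a product containing a display, such as a television, a computer (all-in-one or desktop), a tablet computer, a mobile phone, or an electronic picture screen. Exemplarily, the display apparatus may be any apparatus that displays an image, whether moving (for example, video) or fixed (for example, a still image), and whether textual or pictorial. More specifically, the embodiments are expected to be implementable in, or associated with, a variety of electronic devices, such as (but not limited to) game consoles, television monitors, flat panel displays, computer monitors, automotive displays (for example, odometer displays), navigators, cockpit controllers and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, and packaging and aesthetic structures (for example, a display for an image of a piece of jewelry).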
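Exemplarily, the display apparatus 200 described herein may include one or more displays, including one or more terminals with a display function, so that the data processing apparatus 100 can send its processed data (for example, influence parameters) to the display apparatus 200, which then displays it. In other words, through the interface of the display apparatus 200 (that is, the user interaction interface), the user can fully interact with the data processing system 10 (controlling it and receiving results).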
可以理解的是上述数据处理装置100、显示装置200和分布式存储装置300的功能可以集成在一个电子装置或两个电子装置中,也可以是分开分别由不同的装置实现上述数据处理装置100、显示装置200和分布式存储装置300的功能,本公开实施例对此不进行限定。
上述数据处理装置100、显示装置200和分布式存储装置300的功能均可以由如图3所示的电子设备30实现。图3中电子设备30包括但不限于:处理器301、存储器302、输入单元303、接口单元304和电源305等。可选的,电子设备30包括显示器306。
处理器301是电子设备的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或执行存储在存储器302内的软件程序和/或模块,以及调用存储在存储器302内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。处理器301可包括一个或多个处理单元;可选的,处理器301可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器301中。
存储器302可用于存储软件程序以及各种数据。存储器302可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能单元所需的应用程序等。此外,存储器302可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。可选地,存储器302可以是非临时性计算机可读存储介质,例如,非临时性计算机可读存储介质可以是只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
输入单元303可以为键盘、触摸屏等器件。
接口单元304为外部装置与电子设备30连接的接口。例如,外部装置可以包括有线或无线头戴式耳机端口、外部电源(或电池充电器)端口、有线或无线数据端口、存储卡端口、用于连接具有识别模块的装置的端口、音频输入/输出(I/O)端口、视频I/O端口、耳机端口等等。接口单元304可以用于接收来自外部装置的输入(例如,数据信息等)并且将接收到的输入传输到电子设备30内的一个或多个元件或者可以用于在电子设备30和外部装置之间传输数据。
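The power supply 305 (for example, a battery) can be used to supply power to the components. Optionally, the power supply 305 can be logically connected to the processor 301 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.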
显示器306用于显示由用户输入的信息或提供给用户的信息(例如由处理器301处理后的数据)。显示器306可包括显示面板(panel),可以采用液晶显示器(liquid crystal display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板。在电子设备30为显示装置200的情况下,电子设备30包括显示器306。
可选的,本公开实施例中的计算机指令也可以称之为应用程序代码或系统,本公开实施例对此不作具体限定。
需要说明的是,图3所示的电子设备仅为示例,其不对本公开实施例可适用的电子设备构成限定。实际实现时,电子设备可以包括比图3中所示的更多或更少的设备或器件。
如图4所示为本公开实施例所提供的一种数据处理方法的流程图,该方法可以应用于图3所示的电子设备,图4所示的方法可以包括以下步骤:
S100:电子设备获取多个样本中每个样本对应的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值。其中,工艺信息为工艺参数和/或工艺步骤,指标值用于表征样本属于预设不良类型的不良程度。该多个样本包括不良样本,不良样本为指标值大于第一阈值的样本。
其中,第一阈值可以根据经验预先设定,或者,电子设备根据多个样本中每个样本的指标值的分布确定第一阈值。示例性的,假设,样本为生产面板的玻璃,样本的指标值为该玻璃属于预设不良类型的不良率,不良率为该玻璃生产得到的不合格面板的数量与该玻璃生产得到的面板的总数量的比值。该多个玻璃中90%的玻璃的不良率为10%,那么,电子设备确定第一阈值为10%。
工艺参数包括温度、压力或流量中的至少一种。工艺步骤可以为工艺标识和/或设备标识。
在一种可能的实现方式中,存储器或分布式存储系统中分别存储了样本标识与指标值的第一对应关系,样本标识、工艺信息以及工艺信息对应生产时间的第二对应关系。电子设备从存储器或分布式存储系统中获取第一对应关系以及第二对应关系,并通过样本标识将第一对应关系中的指标值、第二对应关系中的工艺信息以及工艺信息对应生产时间关联起来,得到样本的工艺信息、工艺信息对应生产时间以及指标值的第三对应关系,即得到该多个样本的生产记录。
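The association through the sample identifier can be sketched as follows. This is a minimal illustration, assuming the first correspondence is available as a mapping and the second as rows; the function name, field names, and every sample value other than the GlassID1 entry quoted from Table 2 are hypothetical.

```python
def build_production_records(indicator_by_sample, process_rows):
    """Join indicator values onto process rows through the sample ID.

    indicator_by_sample: first correspondence, {sample_id: indicator value}
    process_rows: second correspondence, (sample_id, process_info, time) rows
    Returns the third correspondence as a list of dicts.
    """
    records = []
    for sample_id, process_info, production_time in process_rows:
        if sample_id not in indicator_by_sample:
            continue  # no measured indicator for this sample, skip the row
        records.append({
            "sample_id": sample_id,
            "process_info": process_info,
            "production_time": production_time,
            "indicator": indicator_by_sample[sample_id],
        })
    return records


# Hypothetical inputs; only the GlassID1 values appear in the text.
first = {"GlassID1": 0.022, "GlassID2": 0.310}
second = [
    ("GlassID1", "step1", "2020-05-07 05:49:55"),
    ("GlassID2", "step1", "2020-05-07 06:10:02"),
    ("GlassID3", "step1", "2020-05-07 06:30:40"),
]
records = build_production_records(first, second)
```

Rows without a matching indicator value are dropped here; an implementation could equally keep them with a missing value, which the disclosure leaves open.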
示例性的,电子设备从Hbase数据库中获取生产特定型号的显示面板的标识,根据获取的显示面板的标识得到每个显示面板对应的生产记录。
需要说明的是,本公开实施例中的样本可以为显示面板生产线中的显示面板;当然,本公开实施例中的样本也可以为其它产品。样本对应的生产记录还可以包括显示面板母板(glass),显示面板母板可以被生产加工为多个显示面板。
预设不良类型指样本的质量缺陷的类型,样本的质量缺陷可能导致样本的性能低于性能阈值。本公开对样本的质量缺陷(又称不良)的划分方式不进行限定,示例性地,不良可根据需要分为不同类型。例如,可根据不良对样本性能的直接影响进行分类,如亮线不良、暗线不良、萤火虫不良(hot spot)等;或者,也可根据不良的具体成因进行分类,如信号线短路不良、对位不良等;或者,也可根据不良的大体成因进行分类,如阵列工艺不良、彩膜工艺不良等;或者,也可根据不良的严重程度进行分类,如导致报废的不良、导致降低品质的不良等;或者,也可不区分不良的种类,即只要样本存在任何不良,即认为其有不良,反之则认为其无不良。其中,本申请多个样本中每个样本的不良类型为同一种不良类型。
在另一种可能的实现方式中,电子设备接收多个样本中每个样本对应的生产记录。
在一个例子中,电子设备获取的生产记录中的部分数据如下表1所示,表1中以样本的指标值为厚度或电性参数等统计指标为例进行说明:
表1
Figure PCTCN2021091775-appb-000001
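In Table 1, GlassID1 is a sample identifier; step 1 is a process step that the sample identified by GlassID1 passes through during production; VTH is the defect type of that sample; -2.14833 is the indicator value; and 2020-03-25 12:18:13 is the production time at which the sample passed through step 1. Device 1 is the device that the sample passed through at step 1. Here n is a positive integer. The remaining entries are similar and are not described again.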
在另一个例子中,电子设备获取的生产记录中的部分数据如下表2所示:
表2
Figure PCTCN2021091775-appb-000002
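In Table 2, GlassID1 is a sample identifier; step 1 is a process step that the sample identified by GlassID1 passes through during production; parameter 1 is a configuration parameter of that sample when it passes through step 1, and 457 is the value of parameter 1; 0.022 is the indicator value of the sample; and 2020-05-07 05:49:55 is the production time at which the sample passed through step 1. The remaining entries are similar and are not described again. It should be noted that a process parameter value and its production time may be collected when triggered by some event. The process parameter value and production time used for a sample in the embodiments of the present disclosure may be one value-and-time pair chosen from the multiple process parameter values, with their production times, collected for that sample.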
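It can be understood that the production records acquired by the electronic device may already be integrated into the above form; alternatively, after receiving the raw production data of the samples, the electronic device may integrate that data into the above production-record form according to the sample identifiers. The embodiments of the present disclosure do not limit this. It should be noted that the indicator values and the process information usually originate from different sources. When the indicator values and the process information (for example, process steps or process parameters) come from different data sources, the electronic device can associate them through the sample identifier.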
S101:电子设备根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段。
具体的,在工艺信息为工艺步骤的情况下,电子设备根据第一阈值以及获取的生产记录中的指标值将获取的多个样本划分为良样本和不良样本,然后,电子设备将获取的生产记录中不良样本的数量与样本总数量的比值大于第二阈值的时间段确定为不良高发时间段,其中,时间段为工艺信息对应的时间的时间段。其中,第二阈值可以根据经验预先设定,或者,电子设备根据不良样本的分布确定第二阈值。
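Exemplarily, the good samples (positive samples) and defective samples (negative samples) obtained after the division are shown in FIG. 5, in which the horizontal axis is the time corresponding to the process step and the vertical axis is the indicator value of the sample. The first threshold is 0.1. It can be understood that the horizontal axis may also be the process step; the embodiments of the present disclosure do not limit this.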
基于图5的示例电子设备确定的不良高发时间段为2020年4月30日至2020年05月01日。
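The division into good and defective samples and the ratio test against the second threshold can be sketched as below. The disclosure does not fix the granularity of a "time period"; grouping by calendar day is an assumption of this sketch, as are the function name and the sample data.

```python
from collections import defaultdict


def high_incidence_days(records, first_threshold, second_threshold):
    """Return the days whose share of defective samples exceeds second_threshold.

    records: iterable of (production_time, indicator) pairs, with the time given
    as a 'YYYY-MM-DD HH:MM:SS' string. A sample counts as defective when its
    indicator value is greater than first_threshold.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for production_time, indicator in records:
        day = production_time[:10]  # assumed bucket: one calendar day
        totals[day] += 1
        if indicator > first_threshold:
            bad[day] += 1
    return sorted(d for d in totals if bad[d] / totals[d] > second_threshold)


# Hypothetical indicator values around the period named in the FIG. 5 example.
sample = [
    ("2020-04-29 08:00:00", 0.02), ("2020-04-29 12:00:00", 0.05),
    ("2020-04-29 18:00:00", 0.12),
    ("2020-04-30 08:00:00", 0.20), ("2020-04-30 12:00:00", 0.30),
    ("2020-04-30 18:00:00", 0.05),
    ("2020-05-01 08:00:00", 0.40), ("2020-05-01 12:00:00", 0.50),
]
```

With a first threshold of 0.1 and an assumed second threshold of 0.5, the call `high_incidence_days(sample, 0.1, 0.5)` keeps only the two days on which more than half of the samples are defective.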
在工艺信息为工艺参数的情况下,电子设备根据突变点检测获取第一生产时间;第一生产时间为获取的生产记录中指标值的突变时间点。第一生产时间为不良高发时间段中的时间点。
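Exemplarily, the electronic device obtains, from the acquired production records, the time series x_1, x_2, x_3, ..., x_n corresponding to the process parameter, and applies Pettitt change-point detection to obtain the first production time. Pettitt change-point detection is a non-parametric test that not only locates the change point but also quantifies its statistical significance. The method works directly on the signs and ranks of the series to detect the change point.
First, the electronic device calculates the statistic U_{t,n}, which satisfies the following formula (formula image PCTCN2021091775-appb-000003):
$$U_{t,n} = U_{t-1,n} + \sum_{i=1}^{n} \operatorname{sgn}(x_t - x_i), \qquad U_{1,n} = \sum_{i=1}^{n} \operatorname{sgn}(x_1 - x_i)$$
where U_{t,n} is the statistic, n is the number of times corresponding to the process parameter in the acquired production records, x_i is the i-th element of x_1, x_2, x_3, ..., x_n, t is an integer with 2 ≤ t ≤ n, and (formula image PCTCN2021091775-appb-000004)
$$\operatorname{sgn}(\theta) = \begin{cases} 1, & \theta > 0 \\ 0, & \theta = 0 \\ -1, & \theta < 0 \end{cases}$$
i is a positive integer with 1 ≤ i ≤ n. If a time t satisfies $k_t = \max_{1 \le t < n} |U_{t,n}|$, then point t is the change point; k_t is the largest absolute value among |U_{1,n}| to |U_{n,n}|, and (formula image PCTCN2021091775-appb-000005)
$$P = 2\exp\left(\frac{-6k_t^2}{n^3 + n^2}\right)$$
If P ≤ 0.05, the detected change point is considered statistically significant. The electronic device determines the production time corresponding to point t as the first production time. The first production time is the change time point of the indicator value (also called the defect high-incidence time point).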
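The Pettitt procedure can be sketched in a few lines. This is an illustration in the standard form of the test, assuming the series passed in holds indicator values ordered by production time; the function names are hypothetical.

```python
import math


def sgn(v):
    """Sign function: 1 for positive, 0 for zero, -1 for negative."""
    return (v > 0) - (v < 0)


def pettitt(x):
    """Return (t, k_t, p) for the series x.

    t is the 1-based index of the last point before the detected change,
    k_t = max |U_{t,n}| over 1 <= t < n, and p is the approximate
    significance 2 * exp(-6 * k_t**2 / (n**3 + n**2)), capped at 1.
    """
    n = len(x)
    u = 0
    best_t, k_t = 0, -1
    for t in range(1, n):  # t = 1 .. n-1
        # U_{t,n} = U_{t-1,n} + sum_i sgn(x_t - x_i)
        u += sum(sgn(x[t - 1] - xi) for xi in x)
        if abs(u) > k_t:
            best_t, k_t = t, abs(u)
    p = 2.0 * math.exp(-6.0 * k_t ** 2 / (n ** 3 + n ** 2))
    return best_t, k_t, min(p, 1.0)


# Hypothetical series: ten values near 1 followed by ten values below -2.5.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9,
          -2.6, -2.7, -2.5, -2.8, -2.6, -2.9, -2.7, -2.6, -2.8, -2.5]
change_index, k, p_value = pettitt(series)
```

The production time of the sample at `change_index` would then serve as the first production time when `p_value` is at most 0.05. The direct double loop costs O(n²); for long series a rank-based formulation would be preferable.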
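Exemplarily, as shown in FIG. 6, the indicator value suddenly drops below the third threshold of -2.5 on March 21, 2020; samples whose indicator value is below -2.5 are defective samples. The electronic device determines March 21, 2020 as the change time point of the indicator value.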
S102:电子设备根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度。
在工艺信息为工艺步骤的情况下,电子设备通过如下步骤确定工艺信息对突发不良的影响程度:
步骤一:电子设备确定不良高发时间段内样本的指标值在工艺步骤对应的生产时间上的目标分布。
具体的,电子设备将不良高发时间段内工艺步骤对应的生产时间数值化为工艺步骤对应的时间数值。然后,电子设备确定不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布。
可选的,电子设备采用多项式曲线拟合方法,将工艺步骤对应的时间数值拟合为拟合指标值,并将拟合指标值在时间数值上的分布确定为:不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布。
可选的,电子设备获取拟合指标值与样本的指标值的第一差异值,在第一差异值小于或者等于第四阈值的情况下,执行如下步骤二。第一差异值为拟合指标值与对应样本的指标值差异值中最大的差异值。其中,第四阈值可以根据经验设定。
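Exemplarily, the fourth threshold is 0.6. The electronic device converts the production times corresponding to the process step into the numerical values t_1, t_2, t_3, ..., t_m; the indicator values of the samples corresponding to t_1, t_2, t_3, ..., t_m are x_1, x_2, x_3, ..., x_m. Fitting a polynomial curve with four coefficients to these points yields the fitted indicator values x′_1, x′_2, x′_3, ..., x′_m, where each x′ and its t satisfy $x' = a_0 + a_1 t + a_2 t^2 + a_3 t^3$. Here x_i is any one of x_1, x_2, x_3, ..., x_m. If the maximum error between x′_i and the corresponding x_i is greater than 0.6, the data falls outside the scope of sudden-defect analysis. If the maximum error between x′_i and the corresponding x_i is less than or equal to 0.6, step two below is performed.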
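The fit and the check against the fourth threshold can be sketched with a plain least-squares solve. This is an illustration only, assuming an ordinary least-squares fit of the four coefficients a_0 to a_3; the disclosure does not name a fitting routine, and the function names are hypothetical.

```python
def fit_polynomial(t, x, degree=3):
    """Least-squares coefficients a_0..a_degree via the normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T x].
    rows = [[sum(ti ** (r + c) for ti in t) for c in range(m)]
            + [sum(xi * ti ** r for ti, xi in zip(t, x))]
            for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def evaluate(coeffs, ti):
    """Fitted indicator value x' = a_0 + a_1*t + a_2*t^2 + ..."""
    return sum(a * ti ** k for k, a in enumerate(coeffs))


def max_abs_error(t, x, coeffs):
    """Largest |x'_i - x_i|, the quantity compared with the fourth threshold."""
    return max(abs(evaluate(coeffs, ti) - xi) for ti, xi in zip(t, x))
```

Step two would only be entered when `max_abs_error(t, x, coeffs) <= 0.6`. Time values should be rescaled to small numbers (for example, hours since the start of the period) before fitting, since raw timestamps make the normal equations badly conditioned.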
步骤二:电子设备确定目标分布与预设分布的第二差异值;第二差异值用于表征经过该工艺步骤的不良样本的分布概率。
可以理解的是,预设分布为根据经验总结得出的分布,预设分布可以为标准正态分布。电子设备采用显著性检验获取目标分布与标准正态分布的差异值。
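In one example, the electronic device converts a standard normal distribution over the same time series into a probability density function to obtain s_1, s_2, s_3, ..., s_m, and performs a significance test between s_1, s_2, s_3, ..., s_m and x′_1, x′_2, x′_3, ..., x′_m. The electronic device uses the Mann-Whitney U test, a non-parametric test, to determine the difference value between the standard normal distribution (s_1, s_2, s_3, ..., s_m) and the target distribution (x′_1, x′_2, x′_3, ..., x′_m). Specifically, it is assumed that the two samples come from two populations that are identical except for the population mean, and the purpose is to test whether the means of the two populations differ significantly. First, the two groups of data (s_1, s_2, s_3, ..., s_m) and (x′_1, x′_2, x′_3, ..., x′_m) are pooled, all data are sorted, and each value is assigned a number according to its magnitude, namely its rank. Then the rank sum of each group is computed: the rank sum of (s_1, s_2, s_3, ..., s_m) is W_1 and the rank sum of (x′_1, x′_2, x′_3, ..., x′_m) is W_2. The statistics U_1 and U_2 of the two groups are then calculated; U_1 and U_2 and the corresponding W_1 and W_2 satisfy the following formulas (formula images PCTCN2021091775-appb-000006 and PCTCN2021091775-appb-000007):
$$U_1 = m \cdot m + \frac{m(m+1)}{2} - W_1$$
$$U_2 = m \cdot m + \frac{m(m+1)}{2} - W_2$$
where m is the number of time points corresponding to the process step in the production records, W_1 is the rank sum of s_1, s_2, s_3, ..., s_m, and W_2 is the rank sum of x′_1, x′_2, x′_3, ..., x′_m. The electronic device takes the smaller of U_1 and U_2 as U and compares it with a preset critical value U_a. When U < U_a, the above hypothesis is rejected, that is, the difference between the target distribution and the standard normal distribution is large. When U is greater than or equal to U_a, the above hypothesis is accepted and the two samples are considered to come from the same population, which indicates that the difference between the target distribution and the standard normal distribution is small.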
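The rank-sum computation can be sketched as follows. The sketch is written for general group sizes n1 and n2 (the text uses n1 = n2 = m) and assigns average ranks to tied values, which is the usual convention but is not stated in the disclosure; the function names are hypothetical.

```python
def rank_sums(a, b):
    """Rank sums (W1, W2) of the pooled data, using average ranks for ties."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            sums[pooled[k][1]] += avg_rank
        i = j
    return sums[0], sums[1]


def mann_whitney_u(a, b):
    """Return (U, U1, U2) with U = min(U1, U2)."""
    n1, n2 = len(a), len(b)
    w1, w2 = rank_sums(a, b)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2.0 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2.0 - w2
    return min(u1, u2), u1, u2
```

A useful sanity check is that U1 + U2 always equals n1 * n2; a value of U near 0 means the two groups barely overlap, while a value near n1 * n2 / 2 means they are thoroughly interleaved.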
如图7所示为目标分布与标准正态分布的对比图。图7中左图中目标分布与标准正态分布的差异值小于右图中目标分布与标准正态分布的差异值。
步骤三:电子设备根据该差异值确定工艺步骤对突发不良的影响程度。
具体的,基于步骤二电子设备确定的U,电子设备根据U确定工艺步骤对突发不良的影响程度p value。示例性的,电子设备将U转换为0到1之间的数值作为p value,p value越大,说明越不能拒绝上述假设,差别无显著性意义,即两组数据分布相同,相应地,突发性时间段内无不良发生率低的样本穿插,则该工艺步骤对突发不良的影响程度越大。
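The disclosure does not say how U is converted into a value between 0 and 1. One common choice, assumed here purely for illustration, is the two-sided p value from the large-sample normal approximation of the Mann-Whitney statistic, without tie or continuity correction; the function name is hypothetical.

```python
import math


def u_to_p_value(u, n1, n2):
    """Two-sided p value of U under the normal approximation (assumed mapping)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under this mapping a U close to n1 * n2 / 2 gives a p value close to 1 (the two groups look alike), and a U close to 0 gives a p value close to 0.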
图7中左图对应的工艺步骤对突发不良的影响程度大于右图中工艺步骤对突发不良的影响程度。
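It should be noted that the fitted indicator values obtained with the four-coefficient polynomial curve fitting give more accurate results when judging the degree of influence of a process step on the sudden defect. The preset distribution summarizes the distribution pattern of the sample indicator values in the case of a sudden defect; it may be a standard normal distribution or another type of distribution such as an exponential distribution, which the embodiments of the present disclosure do not limit.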
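It can be understood that the method of the embodiments of the present disclosure for determining the degree of influence of a process step on the sudden defect is based on the principle that sample indicator values behave consistently within the defect high-incidence time period corresponding to a process step; that is, within that period, the abnormal changes in the indicator values of the samples passing through the step are highly concentrated. Exemplarily, if no samples with a low defect rate are interspersed among the samples passing through a first process step within its defect high-incidence time period, while samples with a low defect rate are interspersed among the samples passing through a second process step within its defect high-incidence time period, then the first process step has a greater degree of influence on the sudden defect than the second process step.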
在工艺信息为工艺参数的情况下,电子设备通过如下步骤确定工艺参数对突发不良的影响程度:
步骤一:电子设备获取生产记录中工艺参数的临界变化点,并将临界变化点对应的时间确定为第二生产时间。
在一种可能的实现方式中,电子设备获取生产记录中工艺参数的基尼系数,将基尼系数最小的工艺参数的值确定为工艺参数的临界变化点,将临界变化点对应的时间确定为第二生产时间。
具体的,电子设备将生产记录中工艺参数的每个值作为切割点(cutpoint),求每个切割点对应的基尼系数,得到多个基尼系数,并将基尼系数最小的工艺参数的值确定为工艺参数的临界变化点,将临界变化点对应的时间确定为第二生产时间。
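In one example, the electronic device sorts the values of the target process parameter by magnitude to obtain the array effect_data = [x_1, x_2, x_3, ..., x_n], and obtains the indicator values of the samples corresponding to each value of the target process parameter as cause_data = [y_1, y_2, y_3, ..., y_n]. The electronic device computes a Gini coefficient for each of y_1, y_2, y_3, ..., y_n in cause_data, takes the process parameter value with the smallest Gini coefficient as the critical change point, and takes the time corresponding to that critical change point as the second production time.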
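The search over cut points can be sketched as below. The sketch assumes the Gini coefficient of a cut point is the size-weighted Gini impurity of the two groups it creates, as in a CART split, and that each sample carries a binary defective flag; both are assumptions, and the names and data are hypothetical.

```python
def gini(labels):
    """Binary Gini impurity of a list of 0/1 labels (1 = defective)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def best_cut_point(param_values, is_bad):
    """Return (cut value, weighted Gini) for the split with the lowest impurity.

    Samples with a parameter value below the cut go left, the rest go right.
    Returns None when all parameter values are equal.
    """
    pairs = sorted(zip(param_values, is_bad))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between equal parameter values
        left = [b for _, b in pairs[:i]]
        right = [b for _, b in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (pairs[i][0], score)
    return best


# Hypothetical parameter values; defects appear once the value reaches 457.
cut = best_cut_point([450, 452, 455, 457, 460, 462], [0, 0, 0, 1, 1, 1])
```

The production time recorded for the sample at the returned cut value would then be taken as the second production time.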
步骤二:电子设备确定第一生产时间与第二生产时间的差值,并根据该差值确定工艺参数对突发不良的影响程度。
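Exemplarily, suppose the absolute value of the difference between the first production time and the second production time of a first process parameter is 0.5 hours, and the absolute value of the difference between the first production time and the second production time of a second process parameter is 8 hours. Then the first process parameter has a greater degree of influence on the sudden defect than the second process parameter. In practice, a time threshold may be preset in the electronic device. When the absolute value of the difference between the first production time and the second production time is greater than the time threshold, the degree of influence on the sudden defect of the process parameter to which the second production time belongs is determined to be 0, meaning that the process parameter has no influence on the sudden defect. When the absolute value of the difference between the first production time and the second production time is less than or equal to the time threshold, the electronic device determines that degree of influence to be 1, meaning that the process parameter has an influence on the sudden defect.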
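The thresholded comparison of the two production times is small enough to show directly. In this sketch the two-hour threshold and the timestamps are hypothetical; only the 0.5-hour and 8-hour gaps come from the example above.

```python
from datetime import datetime, timedelta


def parameter_influence(first_time, second_time, time_threshold):
    """Return 1 if the two change times lie within the threshold, else 0."""
    gap = abs(first_time - second_time)
    return 1 if gap <= time_threshold else 0


detected = datetime(2020, 3, 21, 8, 0)   # first production time (hypothetical)
near = datetime(2020, 3, 21, 8, 30)      # 0.5 hours away
far = datetime(2020, 3, 21, 16, 0)       # 8 hours away
threshold = timedelta(hours=2)           # assumed time threshold
```

Because the gap is taken as an absolute value, a parameter whose critical change precedes the indicator change is treated the same as one that follows it.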
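It should be noted that, in determining the degree of influence of process parameters on the sudden defect, the electronic device may analyze every process parameter. Alternatively, the electronic device may first determine the process steps to be analyzed according to the degrees of influence of the process steps on the sudden defect determined above, and then analyze only the process parameters under those steps to determine their degrees of influence on the sudden defect. In this way, degrees of influence are determined for fewer process parameters, which makes locating the cause of the sudden defect more efficient.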
可选的,S103:电子设备获取生产记录中工艺信息的基尼系数。工艺信息的基尼系数用于表征工艺信息与样本的指标值的关联程度;根据工艺信息的基尼系数,确定工艺信息对突发不良的影响程度。
在工艺信息为工艺步骤的情况下,电子设备从生产记录中获取与样本标识对应的工艺步骤以及指标值,并根据与样本标识对应的工艺步骤以及指标值,获取该工艺步骤的基尼系数。电子设备根据工艺步骤的基尼系数确定工艺步骤对突发不良的影响程度。
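Suppose a first process step serves as a child node in a decision tree, that is, a binary feature attribute, and the effect of this first process step on sample defects is either "influences" or "does not influence". The electronic device can use the Gini coefficient, the impurity measure of the CART type of decision tree, to calculate the degree of influence of each process step that a sample passes through on the acquired indicator values of the samples: the smaller the Gini coefficient, the smaller the uncertainty and the greater the degree of influence. In a K-class classification problem, for a given set of samples D, the Gini coefficient is (formula image PCTCN2021091775-appb-000008)
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$
where C_k is the set of samples in D that belong to the k-th class. The embodiments of the present disclosure deal with a binary classification problem: |D| is the total number of samples passing through the first process step, and for the defective class |C_k| is the number of defective samples in D. To some extent, Gini(D) reflects the degree of influence of the first process step on sample defects.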
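The per-step Gini coefficient can be sketched as follows, assuming the records have already been reduced to (process step, class label) pairs; the function name and the counts are hypothetical.

```python
from collections import Counter, defaultdict


def gini_by_step(records):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2 for the samples passing each step.

    records: iterable of (step, label) pairs, for example label 1 for a
    defective sample and 0 for a good one.
    """
    labels_by_step = defaultdict(list)
    for step, label in records:
        labels_by_step[step].append(label)
    result = {}
    for step, labels in labels_by_step.items():
        total = len(labels)
        counts = Counter(labels)
        result[step] = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return result


# Hypothetical counts: step1 sees 9 defective of 10, step2 sees 5 of 10.
data = ([("step1", 1)] * 9 + [("step1", 0)]
        + [("step2", 1)] * 5 + [("step2", 0)] * 5)
ginis = gini_by_step(data)
```

Here step1 has the lower coefficient, so under the rule above it would be ranked as the more influential step. Note that a step whose samples are almost all good also has a low coefficient, so the defect rate itself still has to be inspected alongside it.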
可选的,电子设备对生产记录中该工艺步骤进行卡方检验,得到该工艺步骤对样本的指标值的卡方检验值,卡方检验值用于表征该工艺步骤对样本的指标值的影响程度,然后,电子设备根据第一预设权重、卡方检验值以及工艺步骤的基尼系数,确定工艺步骤对突发不良的影响程度。
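It can be understood that, in statistics, the chi-square test measures the degree of deviation between the actual observed values of the samples and the theoretically inferred values. This deviation determines the size of the chi-square value: the larger the chi-square value, the less the observed and theoretical values agree; the smaller the chi-square value, the smaller the deviation and the more the observed and theoretical values tend to agree. The basic idea of the chi-square test is to judge whether the theory is correct by observing the deviation between the actual values and the theoretical values. The chi-square test value satisfies the formula (formula image PCTCN2021091775-appb-000009)
$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - E)^2}{E}$$
This formula expresses the degree of deviation, over n samples, between the theoretical value E and the actual value x. In the embodiments of the present disclosure, for device 1 identified by a device identifier, suppose device 1 has no influence on sample defects, that is, the two are independent and unrelated. If the indicator values of the samples passing through device 1 are actually as shown in Table 3 below, the theoretical values can be calculated from the defect rate of all samples, and the chi-square test value follows from the formula above. Substituting the chi-square test value into the probability density function of the chi-square distribution gives pValue.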
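Table 3
                          Defective           Good
Passed device 1           subbad              subgood
Did not pass device 1     totalbad-subbad     totalgood-subgood
In Table 3, subbad is the number of defective samples that passed through device 1, subgood is the number of good samples that passed through device 1, totalbad-subbad is the number of defective samples that did not pass through device 1, and totalgood-subgood is the number of good samples that did not pass through device 1.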
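The chi-square value for the 2x2 layout of Table 3 can be sketched as below. Expected counts are derived from the row and column totals under the independence assumption. The p value is taken here as the upper-tail probability of a chi-square distribution with one degree of freedom, which is the conventional reading and an assumption of this sketch rather than the disclosure's wording; the function name and counts are hypothetical.

```python
import math


def chi_square_2x2(subbad, subgood, totalbad, totalgood):
    """Return (chi-square value, p value) for the Table 3 contingency table.

    Every row and column total must be non-zero, otherwise an expected
    count is zero and the statistic is undefined.
    """
    observed = [[subbad, subgood],
                [totalbad - subbad, totalgood - subgood]]
    total = totalbad + totalgood
    row_sums = [sum(row) for row in observed]
    col_sums = [totalbad, totalgood]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (observed[r][c] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, with 40 defective and 60 good samples overall, of which 30 defective and 10 good passed device 1, the statistic is about 34.03 and the p value is far below 0.05, so independence between device 1 and the defect would be rejected.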
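It can be understood that the electronic device may use either the chi-square test value or the Gini coefficient corresponding to process step 1 as the degree of influence of process step 1 on the indicator values of the samples. Alternatively, the electronic device determines the degree of influence of the process step on the sudden defect according to the first preset weights, the chi-square test value, and the Gini coefficient of the process step. Exemplarily, the first preset weights are 0.5 for the chi-square test value and 0.5 for the Gini coefficient: the electronic device obtains a first product of the chi-square test value and 0.5 and a second product of the Gini coefficient and 0.5, and takes the sum of the first product and the second product as the degree of influence of the process step on the sudden defect.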
在工艺信息为工艺参数的情况下,电子设备从生产记录中获取与样本标识对应的工艺参数以及指标值,并根据与样本标识对应的工艺参数以及指标值获取工艺参数的基尼系数,然后,电子设备根据工艺参数的基尼系数确定工艺参数对突发不良的影响程度。
具体的,获取工艺参数的基尼系数参考上述方法,不再赘述。
可选的,电子设备对生产记录中工艺参数以及样本的指标值,进行关联检验得到工艺参数的影响参数;影响参数用于表征工艺参数对样本的指标值的影响程度,然后,电子设备根据第二预设权重、该影响参数与工艺参数的基尼系数得到该工艺参数对突发不良的影响程度。其中,关联检验可以为正态分布检验、方差齐性检验或T检验中的至少一种。
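Exemplarily, the second preset weight of the influence parameter is 0.4 and the second preset weight of the Gini coefficient of the process parameter is 0.6. The electronic device obtains a third product of the influence parameter and 0.4 and a fourth product of the Gini coefficient of the process parameter and 0.6, and takes the sum of the third product and the fourth product as the degree of influence of the process parameter on the sudden defect.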
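Both weighted combinations reduce to the same weighted sum, sketched below. The function name and the input scores are hypothetical; only the weights 0.5/0.5 and 0.4/0.6 come from the examples above.

```python
def weighted_influence(scores, weights):
    """Weighted sum of several influence scores for one piece of process information."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical scores combined with the example weights from the text.
step_score = weighted_influence([0.8, 0.2], [0.5, 0.5])
param_score = weighted_influence([0.7, 0.3], [0.4, 0.6])
```

One point worth checking in any implementation: a larger chi-square value signals a stronger association while a smaller Gini coefficient does, so the scores presumably need to be brought to a common orientation and scale before they are summed. The disclosure does not describe such a normalization, so this remains an open assumption.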
如图8所示为本公开上述实施例中从S100~S102的流程图,电子设备根据获取的生产记录获取结果变量(即上述指标值),再然后电子设备获取原因变量并将原因变量分为连续型原因变量(例如:上述工艺参数)和离散型原因变量(例如:上述工艺步骤)。对于连续型原因变量电子设备采用突变点检测确定结果变量中不良的高发时间点,以及连续型原因变量中的临界变化点,根据不良高发时间点以及临界变化点确定连续型原因变量对突发不良的影响程度。对于离散型原因变量,电子设备定位不良高发的时间段,并对不良高发的时间段内的结果变量以及原因变量进行拟合,并将拟合结果与标准正态分布进行显著性检验从而得到离散型原因变量对突发不良的影响程度。综合显示离散型原因变量对突发不良的影响程度以及连续型原因变量对突发不良的影响程度。
可以理解的是,本公开实施例中最终确定出的结果为多个工艺参数中每个工艺参数对突发不良的影响程度,和/或,多个工艺步骤中每个工艺步骤对突发不良的影响程度,电子设备显示确定出的结果,这样,用户可以从显示的结果中判断对突发不良影响程度最大的一个或多个工艺信息,以定位突发不良的原因。
可以理解的是,本公开实施例中,电子设备可以仅获取一次多个样本中每个样本的源生产信息(例如,在一个设备中存储的样本标识以及该样本标识所表征样本的指标值,在其他几个设备中存储的样本标识以及该样本标识对应的工艺信息),并将其存储在该电子设备或其他中间设备上。在上述不同步骤中电子设备在获取数据(例如:上述获取生产记录、工艺步骤、工艺参数、生产时间等数据)的过程中,电子设备从该电子设备或其他存储该多个样本的生产信息的设备中获取数据即可,这样,加快了数据处理的速度。本公开实施例对数据存储的格式不进行限定,示例性的,数据存储可以为parquet格式存储。
本公开实施例中,电子设备根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,然后,电子设备根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度,从而可以挖掘样本突发不良的时间趋势的相关性影响,并量化为数值,从而可以为用户更准确全面地提供数据以定位不良原因。
如图9所示为本公开实施例所提供的另一种数据处理方法的流程图,该方法可以应用于图3所示的电子设备,图9所示的方法可以包括以下步骤:
S200:电子设备接收用户在条件选择界面输入的样本筛选条件。其中,样本筛选条件包括产品型号、检测站点、生产时间段、工艺标识、设备标识、工艺参数或不良类型中的至少一种。
示例性的,电子设备显示的条件选择界面如图10所示,图10的A中包括时间段输入框、检测站点输入框、产品型号输入框、工序(即工艺步骤)输入框等,图10的B为不良类型输入框界面。图10中原材料可以为面板母版,检测站点可以用于用户选择该检测站点,该检测站点下至少包括六种不良类型:类型1不良数可以用于用户选择类型1的样本的不良数作为不良类型,类型1不良率可以用于用户选择类型1的样本的不良率作为不良类型,类型1原材料的不良率可以用于用户选择类型1的该原材料的不良率作为不良类型,类型2不良数可以用于用户选择类型2的样本的不良数作为不良类型,类型2不良率可以用于用户选择类型2的样本的不良率作为不良类型,类型2原材料的不良率可以用于用户选择类型2的该原材料的不良率作为不良类型。
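Optionally, the screening condition further includes a cause variable. Exemplarily, the cause-variable input interface displayed by the electronic device is shown in FIG. 11. In FIG. 11, the raw material may be a panel mother board, the detection site field is used by the user to select a detection site, and the product field is used by the user to select a product model. The process identifier in FIG. 11 is used by the user to select the corresponding process, and one process corresponds to at least one process step. Process step identifier 1 and process step identifier 2 in FIG. 11 can both be used by the user to select a process step; the process step identified by process step identifier 2 corresponds to at least three devices, with device 1, device 2, and device 3 each corresponding to one device.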
S201:电子设备获取与样本筛选条件对应的多个样本中每个样本的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;工艺信息为工艺参数或工艺步骤;指标值用于表征样本属于预设不良类型的不良程度;多个样本包括不良样本,不良样本为指标值大于第一阈值的样本。
具体的,参考上述S100中电子设备获取生产记录的方式,不再赘述。
本公开实施例中指标值可以为不良类型的不良率、不良数。本公开实施例可以使用Qtest量测类数据(如:厚度、电性参数)是否达标来确定样本良或不良。
S202:电子设备根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段。
具体的,参考上述S101中的描述不再赘述。
S203:电子设备根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度。
具体的,参考上述S102中的描述,不再赘述。
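S204: The electronic device displays, on the analysis result display interface, the degree of influence of the process information on the sudden defect.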
可选的,首先,电子设备对获取的多个工艺信息对突发不良的影响程度进行排序,然后,电子设备在分析结果展示界面显示排序后的工艺信息对突发不良的影响程度。
示例性的,电子设备对获取的多个工艺信息对突发不良的影响程度进行降序排序,并显示排序后的工艺信息对突发不良的影响程度。这样,对突发不良的影响程度最大的则会排在最前面,方便用户查看。
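The descending sort before display can be sketched as below. The value 0.9397 for device 1 is the one quoted for FIG. 12; the other identifiers and values are hypothetical, as is the function name.

```python
def rank_by_influence(influences):
    """Sort (process information, influence) pairs from largest to smallest."""
    return sorted(influences.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_influence({"device2": 0.31, "device1": 0.9397, "device3": 0.72})
```

The first entry of `ranking` is then the process information with the greatest degree of influence, which is what the display interface would list at the top.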
可以理解的是,电子设备可以根据预设权重、工艺步骤对突发不良的多个影响程度量化值,确定该工艺步骤对突发不良的一个影响程度量化值并显示。或者,电子设备分别显示工艺步骤对突发不良的每个影响程度量化值。同样的,电子设备可以根据预设权重、工艺参数对突发不良的多个影响程度量化值,确定该工艺参数对突发不良的一个影响程度量化值并显示。或者,电子设备分别显示工艺参数对突发不良的每个影响程度量化值。
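FIG. 12 shows the quantified degrees of influence of device identifiers on the sudden defect, as displayed by the electronic device on the analysis result display interface. In FIG. 12, the 16 under the serial-number column is the serial number of that row of data, and device 1 is a device identifier. The first quantified degree of influence, 0.9397, is the quantified degree of influence on the sudden defect of the device identified as device 1. The second quantified degree of influence, 0.012293, is another quantified degree of influence of the same device on the sudden defect, obtained from the time-dimension analysis. The remaining entries are similar and are not described again.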
上述主要从方法的角度对本公开实施例提供的方案进行了介绍。为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本公开能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本公开实施例可以根据上述方法示例对上述实施例中的电子设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本公开实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
如图13所示,为本公开实施例提供的一种数据处理装置70的结构图。数据处理装置70包括:获取模块701、第一确定模块702以及第二确定模块703,获取模块701用于获取多个样本中每个样本对应的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;工艺信息为工艺参数和/或工艺步骤;指标值用于表征样本属于预设不良类型的不良程度;多个样本包括不良样本,不良样本为指标值大于第一阈值的样本;第一确定模块702,用于根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段;第二确定模块703,用于根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度。例如:结合图4,获取模块701可以用于执行S100,第一确定模块702可以用于执行S101,第二确定模块703可以用于执行S102。
在一些实施例中,在工艺信息为工艺步骤的情况下,第二确定模块703具体用于:确定不良高发时间段内样本的指标值在工艺步骤对应的生产时间上的目标分布;确定目标分布与预设分布的差异值;差异值用于表征经过工艺步骤的不良样本的分布概率;根据差异值确定工艺步骤对突发不良的影响程度。
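In other embodiments, the second determination module 703 is specifically configured to: convert the production time corresponding to the process step into a numerical time value corresponding to the process step; and determine the target distribution of the indicator values of the samples within the defect high-incidence time period over the numerical time values corresponding to the process step.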
在另一些实施例中,第二确定模块703具体用于:采用多项式曲线拟合方法,将工艺步骤对应的时间数值拟合为拟合指标值;将拟合指标值在时间数值上的分布确定为:不良高发时间段内样本的指标值在工艺步骤对应的时间数值上的目标分布。
在另一些实施例中,预设分布为标准正态分布;目标分布为多项式分布;第二确定模块703具体用于:采用显著性检验获取目标分布与标准正态分布的差异值。
在另一些实施例中,第一确定模块702具体用于:根据第一阈值以及获取的生产记录中的指标值将多个样本划分为良样本和不良样本;将生产记录中不良样本的数量与样本总数量的比值大于第二阈值的时间段确定为不良高发时间段;时间段为工艺信息对应的时间的时间段。
在另一些实施例中,在工艺信息为工艺参数的情况下,获取模块701还用于:采用突变点检测获取第一生产时间;第一生产时间为指标值的突变时间点;第一生产时间为不良高发时间段中的时间点;获取生产记录中工艺参数的临界变化点,并将临界变化点对应的时间确定为第二生产时间,第二确定模块703具体用于确定第一生产时间与第二生产时间的差值,并根据该差值确定工艺参数对突发不良的影响程度。
在另一些实施例中,获取模块701还用于:获取生产记录中工艺参数的基尼系数;将基尼系数最小的工艺参数的值确定为工艺参数的临界变化点。
在另一些实施例中,获取模块701还用于:获取生产记录中工艺信息的基尼系数;工艺信息的基尼系数用于表征工艺信息与样本的指标值的关联程度;第二确定模块703具体用于:根据工艺信息的基尼系数,确定工艺信息对突发不良的影响程度。
在另一些实施例中,生产记录还包括样本标识,在工艺信息为工艺参数的情况下,获取模块701具体用于:从生产记录中获取与样本标识对应的工艺参数以及指标值;根据与样本标识对应的工艺参数以及指标值获取工艺参数的基尼系数。
在另一些实施例中,生产记录还包括样本标识,在工艺信息为工艺步骤的情况下,获取模块701具体用于:从生产记录中获取与样本标识对应的工艺步骤以及指标值;根据与样本标识对应的工艺步骤以及指标值,获取工艺步骤的基尼系数。
在另一些实施例中,数据处理装置还包括检验模块704,用于对生产记录中工艺步骤进行卡方检验,得到工艺步骤对样本的指标值的卡方检验值,卡方检验值用于表征工艺步骤对样本的指标值的影响程度;第二确定模块703具体用于:根据第一预设权重、卡方检验值以及工艺步骤的基尼系数,确定工艺步骤对突发不良的影响程度。
在另一些实施例中,检验模块704还用于:对生产记录中工艺参数以及样本的指标值,进行关联检验得到工艺参数的影响参数;影响参数用于表征工艺参数对样本的指标值的影响程度;第二确定模块703具体用于:根据第二预设权重、影响参数与工艺参数的基尼系数得到工艺参数对突发不良的影响程度。
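In some embodiments, the acquisition module 701 is specifically configured to: acquire a first correspondence between the sample identifier and the indicator value of each of the multiple samples, and acquire a second correspondence among the sample identifier, the process information, and the production time corresponding to the process information of each sample; and, according to the sample identifier of each sample, the first correspondence, and the second correspondence, establish a third correspondence among the process information, the production time corresponding to the process information, and the indicator value of each sample.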
在一个示例中,参见图3,上述获取模块701的接收功能可以由图3中的接口单元304实现。上述获取模块701的处理功能、第一确定模块702、第二确定模块703以及检验模块704可以由图3中的处理器301调用存储器302中存储的计算机程序实现。
关于上述可选方式的具体描述参见前述的方法实施例,此处不再赘述。此外,上述提供的任一种应用实例的数据处理装置70的解释以及有益效果的描述均可参考上述对应的方法实施例,不再赘述。
需要说明的是,上述各个模块对应执行的动作仅是具体举例,各个单元实际执行的动作参照上述基于图4所述的实施例的描述中提及的动作或步骤。
如图14所示,为本公开实施例提供的另一种数据处理装置80的结构图,该数据处理装置80包括接收模块801、获取模块802、确定模块803以及显示模块804,接收模块801用于接收用户在条件选择界面输入的样本筛选条件;获取模块802,用于获取与样本筛选条件对应的多个样本中每个样本的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;工艺信息为工艺参数和/或工艺步骤;指标值用于表征样本属于预设不良类型的不良程度;多个样本包括不良样本,不良样本为指标值大于第一阈值的样本;确定模块803,用于根据获取的生产记录中的指标值以及工艺信息对应的时间,确定不良高发时间段,不良高发时间段为不良样本的分布概率大于第二阈值的时间段;根据不良高发时间段以及获取的生产记录,确定工艺信息对突发不良的影响程度;显示模块804,用于在分析结果展示界面显示工艺信息对突发不良的影响程度。例如,结合图9,接收模块801可以用于执行S200,获取模块802可以用于执行S201,确定模块803可以用于执行S202~S203,显示模块804可以用于执行S204。
在一些实施例中,数据处理装置还包括:排序模块805,用于对获取的多个工艺信息对突发不良的影响程度进行排序;显示模块804具体用于:在分析结果展示界面显示排序后的工艺信息对突发不良的影响程度。
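In one example, referring to FIG. 3, the receiving module 801 and the receiving function of the acquisition module 802 can be implemented by the interface unit 304 in FIG. 3. The processing function of the acquisition module 802, the determination module 803, and the sorting module 805 can be implemented by the processor 301 in FIG. 3 calling a computer program stored in the memory 302. The display module 804 can be implemented by the display 306 in FIG. 3.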
关于上述可选方式的具体描述参见前述的方法实施例,此处不再赘述。此外,上述提供的任一种应用实例的数据处理装置80的解释以及有益效果的描述均可参考上述对应的方法实施例,不再赘述。
需要说明的是,上述各个模块对应执行的动作仅是具体举例,各个单元实际执行的动作参照上述基于图9所述的实施例的描述中提及的动作或步骤。
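An embodiment of the present disclosure further provides an electronic device, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the executable instructions to implement the data processing method described in any of the above embodiments.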
本公开的一些实施例提供了一种计算机可读存储介质(例如,非暂态计算机可读存储介质),该计算机可读存储介质中存储有计算机程序指令,计算机程序指令在处理器上运行时,使得处理器执行如上述实施例中任一实施例所述的数据处理方法中的一个或多个步骤。
示例性的,上述计算机可读存储介质可以包括,但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,CD(Compact Disk,压缩盘)、DVD(Digital Versatile Disk,数字通用盘)等),智能卡和闪存器件(例如,EPROM(Erasable Programmable Read-Only Memory,可擦写可编程只读存储器)、卡、棒或钥匙驱动器等)。本公开描述的各种计算机可读存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读存储介质。术语“机器可读存储介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
本公开的一些实施例还提供了一种计算机程序产品。该计算机程序产品包括计算机程序指令,在计算机上执行该计算机程序指令时,该计算机程序指令使计算机执行如上述实施例所述的数据处理方法中的一个或多个步骤。
本公开的一些实施例还提供了一种计算机程序。当该计算机程序在计算机上执行时,该计算机程序使计算机执行如上述实施例所述的数据处理方法中的一个或多个步骤。
上述计算机可读存储介质、计算机程序产品及计算机程序的有益效果和上述一些实施例所述的数据处理方法的有益效果相同,此处不再赘述。
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。

Claims (19)

  1. 一种数据处理方法,包括:
    获取多个样本中每个样本对应的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;所述工艺信息为工艺参数和/或工艺步骤;所述指标值用于表征样本属于预设不良类型的不良程度;所述多个样本包括不良样本,所述不良样本为所述指标值大于第一阈值的样本;
    根据获取的生产记录中的所述指标值以及所述工艺信息对应的时间,确定不良高发时间段,所述不良高发时间段为不良样本的分布概率大于第二阈值的时间段;
    根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度。
  2. 根据权利要求1所述的数据处理方法,在所述工艺信息为所述工艺步骤的情况下,所述根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度,包括:
    确定所述不良高发时间段内样本的指标值在所述工艺步骤对应的生产时间上的目标分布;
    确定所述目标分布与预设分布的差异值;所述差异值用于表征经过所述工艺步骤的不良样本的分布概率;
    根据所述差异值确定所述工艺步骤对突发不良的影响程度。
  3. 根据权利要求2所述的数据处理方法,所述确定所述不良高发时间段内样本的指标值在所述工艺步骤对应的生产时间上的目标分布,包括:
    将所述工艺步骤对应的生产时间数值化为所述工艺步骤对应的时间数值;
    确定所述不良高发时间段内样本的指标值在所述工艺步骤对应的时间数值上的目标分布。
  4. 根据权利要求3所述的数据处理方法,所述确定所述不良高发时间段内样本的指标值在所述工艺步骤对应的时间数值上的目标分布,包括:
    采用多项式曲线拟合方法,将所述工艺步骤对应的时间数值拟合为拟合指标值;
    将所述拟合指标值在时间数值上的分布确定为:所述不良高发时间段内样本的指标值在所述工艺步骤对应的时间数值上的目标分布。
  5. 根据权利要求2-4任一项所述的数据处理方法,所述预设分布为标准正态分布;所述目标分布为多项式分布;所述确定所述目标分布与预设分布的差异值,包括:
    采用显著性检验获取所述目标分布与所述标准正态分布的差异值。
  6. 根据权利要求1-5任一项所述的数据处理方法,所述根据获取的生产记录中的所述指标值以及所述工艺信息对应的时间,确定不良高发时间段,包括:
    根据所述第一阈值以及获取的生产记录中的所述指标值将所述多个样本划分为良样本和不良样本;
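    determining a time period in which the ratio of the number of defective samples to the total number of samples in the production records is greater than the second threshold as the defect high-incidence time period.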
  7. 根据权利要求1-6任一项所述的数据处理方法,在所述工艺信息为所述工艺参数的情况下,所述根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度,包括:
    采用突变点检测获取第一生产时间;所述第一生产时间为所述指标值的突变时间点;所述第一生产时间为所述不良高发时间段中的时间点;
    获取所述生产记录中所述工艺参数的临界变化点,并将所述临界变化点对应的时间确定为第二生产时间;
    确定所述第一生产时间与所述第二生产时间的差值,并根据所述差值确定所述工艺参数对突发不良的影响程度。
  8. 根据权利要求7所述的数据处理方法,所述获取所述生产记录中所述工艺参数的临界变化点,包括:
    获取所述生产记录中所述工艺参数的基尼系数;
    将基尼系数最小的所述工艺参数的值确定为所述工艺参数的临界变化点。
  9. 根据权利要求1-8任一项所述的数据处理方法,所述方法还包括:
    获取所述生产记录中工艺信息的基尼系数;所述工艺信息的基尼系数用于表征所述工艺信息与样本的指标值的关联程度;
    根据所述工艺信息的基尼系数,确定所述工艺信息对所述突发不良的影响程度。
  10. 根据权利要求9所述的数据处理方法,所述方法还包括:
    对所述生产记录中所述工艺步骤进行卡方检验,得到所述工艺步骤对样本的指标值的卡方检验值,所述卡方检验值用于表征所述工艺步骤对样本的指标值的影响程度;
    根据第一预设权重、所述卡方检验值以及所述工艺步骤的基尼系数,确定所述工艺步骤对突发不良的影响程度。
  11. 根据权利要求9或10所述的数据处理方法,所述方法还包括:
    对所述生产记录中所述工艺参数以及样本的指标值,进行关联检验得到所述工艺参数的影响参数;所述影响参数用于表征所述工艺参数对样本的指标值的影响程度;
    根据第二预设权重、所述影响参数与所述工艺参数的基尼系数得到所述工艺参数对突发不良的影响程度。
  12. 根据权利要求1-11任一项所述的数据处理方法,所述获取多个样本中每个样本对应的生产记录,包括:
    获取多个样本中每个样本的样本标识与指标值的第一对应关系,并获取所述每个样本的样本标识、工艺信息以及工艺信息对应生产时间的第二对应关系;
    根据所述每个样本的样本标识、所述第一对应关系以及所述第二对应关系,建立所述每个样本的工艺信息、工艺信息对应生产时间以及指标值的第三对应关系。
  13. 一种数据处理方法,包括:
    接收用户在条件选择界面输入的样本筛选条件;
    获取与所述样本筛选条件对应的多个样本中每个样本的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;所述工艺信息为工艺参数和/或工艺步骤;所述指标值用于表征样本属于预设不良类型的不良程度;所述多个样本包括不良样本,所述不良样本为所述指标值大于第一阈值的样本;
    根据获取的生产记录中的所述指标值以及所述工艺信息对应的时间,确定不良高发时间段,所述不良高发时间段为不良样本的分布概率大于第二阈值的时间段;
    根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度;
    在分析结果展示界面显示所述工艺信息对突发不良的影响程度。
  14. 根据权利要求13所述的数据处理方法,在分析结果展示界面显示所述工艺信息对突发不良的影响程度,包括:
    对获取的多个所述工艺信息对突发不良的影响程度进行排序;
    在分析结果展示界面显示排序后的所述工艺信息对突发不良的影响程度。
  15. 一种数据处理装置,包括:
    获取模块,用于获取多个样本中每个样本对应的生产记录;生产记录包括工艺信息、工艺信息对应生产时间以及指标值;所述工艺信息为工艺参数和/或工艺步骤;所述指标值用于表征样本属于预设不良类型的不良程度;所述多个样本包括不良样本,所述不良样本为所述指标值大于第一阈值的样本;
    第一确定模块,用于根据获取的生产记录中的所述指标值以及所述工艺信息对应的时间,确定不良高发时间段,所述不良高发时间段为不良样本的分布概率大于第二阈值的时间段;
    第二确定模块,用于根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度。
  16. 一种数据处理装置,包括:
    接收模块,用于接收用户在条件选择界面输入的样本筛选条件;
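    an acquisition module, configured to acquire the production record of each of multiple samples corresponding to the sample screening condition; the production record includes process information, the production time corresponding to the process information, and an indicator value; the process information is a process parameter and/or a process step; the indicator value characterizes the degree to which a sample exhibits a preset defect type; the multiple samples include defective samples, a defective sample being a sample whose indicator value is greater than a first threshold;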
    确定模块,用于根据获取的生产记录中的所述指标值以及所述工艺信息对应的时间,确定不良高发时间段,所述不良高发时间段为不良样本的分布概率大于第二阈值的时间段;根据所述不良高发时间段以及获取的生产记录,确定所述工艺信息对突发不良的影响程度;
    显示模块,用于在分析结果展示界面显示所述工艺信息对突发不良的影响程度。
  17. 一种电子设备,其特征在于,包括:
    处理器和用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述可执行指令,以实现如权利要求1-12任一项所述的数据处理方法,或者,以实现如权利要求13或14所述的数据处理方法。
  18. 一种计算机可读存储介质,其特征在于,当所述计算机可读存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行如权利要求1-12任一项所述的数据处理方法,或者,执行如权利要求13或14所述的数据处理方法。
  19. 一种计算机程序产品,其特征在于,所述计算机程序产品包括计算机指令,当所述计算机指令在计算机设备上运行时,使得所述计算机设备执行如权利要求1-12任一项所述的数据处理方法,或者,以执行如权利要求13或14任一项所述的数据处理方法。
PCT/CN2021/091775 2021-04-30 2021-04-30 数据处理方法、装置、设备及存储介质 WO2022227094A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202180001029.XA CN115623872A (zh) 2021-04-30 2021-04-30 数据处理方法、装置、设备及存储介质
PCT/CN2021/091775 WO2022227094A1 (zh) 2021-04-30 2021-04-30 数据处理方法、装置、设备及存储介质
US18/253,961 US20240004375A1 (en) 2021-04-30 2021-04-30 Data processing method, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/091775 WO2022227094A1 (zh) 2021-04-30 2021-04-30 数据处理方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022227094A1 (zh) 2022-11-03

Family

ID=83847585

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091775 WO2022227094A1 (zh) 2021-04-30 2021-04-30 数据处理方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US20240004375A1 (zh)
CN (1) CN115623872A (zh)
WO (1) WO2022227094A1 (zh)


Also Published As

Publication number Publication date
US20240004375A1 (en) 2024-01-04
CN115623872A (zh) 2023-01-17
