WO2023130304A1

WO2023130304A1 - Data processing method and system, and computer-readable storage medium

Info

Publication number: WO2023130304A1
Application number: PCT/CN2022/070461
Authority: WO
Inventors: 代言玉; 吴建波; 王瑜; 王士侠; 吴建民; 王洪; 李园园; 王萍; 陈韵; 何德材
Original assignee: 京东方科技集团股份有限公司; 北京中祥英科技有限公司
Priority date: 2022-01-06
Filing date: 2022-01-06
Publication date: 2023-07-13
Also published as: CN116724321A

Abstract

The present disclosure relates to a data processing method and system, and a computer-readable storage medium. The method comprises: generating an HBase table according to a first Hive table and a second Hive table, and taking the HBase table as a target data table, wherein the first Hive table comprises production record data of each product, and the second Hive table comprises test parameter data of a product which has been tested; acquiring a target test parameter to be processed, wherein the target test parameter matches a row key of the target data table; according to the target test parameter matching the row key of the target data table, obtaining target test data corresponding to the target test parameter, wherein the target test data comprises test parameter data representing that a product is defective; and displaying the target test data, so as to perform data defect analysis according to the target test data. In the present embodiment, target parameter data matches a product test scenario, thereby facilitating the reduction of the difficulty of data analysis, and improving the efficiency and accuracy of analysis and testing.

Description

Data processing method and system, and computer-readable storage medium

Technical field

The present disclosure relates to the technical field of data processing, and in particular, to a data processing method, system, and computer-readable storage medium.

Background

At present, in the production of industrial products, different products adopt different processes and pass through different equipment and different operators, and a subtle problem in any link can cause problems in the final product. Therefore, it is necessary to record the detailed production process of each industrial product to provide a basis for subsequent product failure analysis.

Summary of the invention

The present disclosure provides a data processing method, system, and computer-readable storage medium to solve the deficiencies of related technologies.

According to a first aspect of an embodiment of the present disclosure, a data processing method is provided, including:

Generating an HBase table according to a first Hive table and a second Hive table, and using the HBase table as a target data table; the first Hive table includes the production history data of each product, and the second Hive table includes the detection parameter data of products that have been inspected;

Obtaining target detection parameters to be processed, where the target detection parameters match the row keys of the target data table;

Matching the row keys of the target data table according to the target detection parameters to obtain target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products;

Displaying the target detection data for data analysis based on the target detection data.
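The row-key matching step can be illustrated with a small sketch. It assumes, hypothetically, that the row key concatenates product ID, detection site, parameter name and Glass ID with a `|` separator; this section does not specify the key layout. A sorted in-memory list stands in for the HBase table, whose rows are ordered by row key.

```python
from bisect import bisect_left

# Hypothetical row-key layout: "<product_id>|<step_id>|<item>|<glass_id>".
# The real layout is not given here; this only illustrates prefix matching.
def make_prefix(product_id, step_id, item):
    return f"{product_id}|{step_id}|{item}|"

def prefix_scan(sorted_rows, prefix):
    """Return all (row_key, value) pairs whose key starts with prefix.

    sorted_rows must be sorted by row key, as rows are in an HBase table.
    """
    keys = [k for k, _ in sorted_rows]
    start = bisect_left(keys, prefix)
    result = []
    for key, value in sorted_rows[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((key, value))
    return result

rows = sorted([
    ("P1|S10|thickness|G001", 1.2),
    ("P1|S10|thickness|G002", 1.4),
    ("P1|S20|resistance|G001", 7.5),
    ("P2|S10|thickness|G003", 1.1),
])
matches = prefix_scan(rows, make_prefix("P1", "S10", "thickness"))
```

Because the selected parameters form the leading part of the key, all matching rows are contiguous and can be read with one range scan.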

Optionally, generate an HBase table according to the first Hive table and the second Hive table, including:

Extracting the production history information based on a preset first time interval and storing it in the first Hive table; and extracting the detection parameter data based on a preset second time interval and storing it in the second Hive table;

Converting the first Hive table into an HBase intermediate table with the product identification code as the row key;

Fusing the second Hive table and the Hbase intermediate table to obtain the HBase table.

Optionally, merging the second Hive table and the Hbase intermediate table to obtain the HBase table includes:

Obtaining the main process site based on the detection data of the second Hive table;

Filtering the data in the HBase intermediate table based on the main process site to obtain a filtered HBase intermediate table;

Fusing the second Hive table with the filtered HBase intermediate table to obtain the HBase table.

Optionally, obtaining target detection data corresponding to the target detection parameters includes:

Determining, according to the reference points of a reference product, the target detection data by averaging the detection data whose distance from a reference point is less than or equal to a set distance;

The reference product is the product with the most detection points.

Optionally, determining the target detection data according to the reference points of the reference product, by averaging the detection data whose distance from a reference point is less than or equal to the set distance, includes:

Matching each control detection point on a control product with a reference point on the reference product;

Obtaining the distance between each control detection point on the control product and the corresponding reference point on the reference product;

For each reference point on the reference product, when the distance to the matched control detection point is less than or equal to a preset distance threshold, obtaining candidate detection data of the reference point and the control detection point;

Obtaining the average value of the candidate detection data, generating a mean value distribution map from the average values of all detection points, and using the mean value distribution map as the target detection data.
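The averaging steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that each control detection point is matched to its nearest reference point (the matching rule is not spelled out here), and the function and variable names are hypothetical.

```python
from math import hypot

def mean_distribution(reference_points, control_products, max_distance):
    """Average detection values around each reference point.

    reference_points: list of (x, y, value) for the reference product.
    control_products: list of lists of (x, y, value), one per control product.
    Each control point is matched to its nearest reference point (assumed
    rule) and kept only if it lies within max_distance of that point.
    """
    candidates = {(x, y): [value] for x, y, value in reference_points}
    for points in control_products:
        for cx, cy, cvalue in points:
            # Nearest reference point for this control detection point.
            rx, ry, _ = min(reference_points,
                            key=lambda r: hypot(r[0] - cx, r[1] - cy))
            if hypot(rx - cx, ry - cy) <= max_distance:
                candidates[(rx, ry)].append(cvalue)
    return {pt: sum(vals) / len(vals) for pt, vals in candidates.items()}

reference = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
controls = [
    [(0.5, 0.0, 12.0), (9.0, 0.0, 22.0)],
    [(0.0, 3.0, 99.0)],  # farther than max_distance from any reference point
]
means = mean_distribution(reference, controls, max_distance=1.0)
```

The resulting dictionary maps each reference coordinate to a mean value, which is the information a mean value distribution map would plot.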

Optionally, the method further includes a step of obtaining the reference product, including:

Obtaining the number of detection points of each product in the target data table, and obtaining the product with the largest number of detection points; and using the product with the largest number of detection points as the reference product, and the products other than the reference product as control products.
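A minimal sketch of this selection step is shown below. The row layout `(glass_id, x, y, value)` and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter

def split_reference_and_controls(detection_rows):
    """Pick the product with the most detection points as the reference.

    detection_rows: list of (glass_id, x, y, value) tuples.
    Returns (reference_id, sorted list of control ids).
    Ties are broken by the smallest glass_id, an arbitrary choice.
    """
    counts = Counter(glass_id for glass_id, _, _, _ in detection_rows)
    reference_id = min(counts, key=lambda g: (-counts[g], g))
    controls = sorted(g for g in counts if g != reference_id)
    return reference_id, controls

rows = [
    ("G001", 0, 0, 1.0), ("G001", 1, 0, 1.1),
    ("G002", 0, 0, 0.9), ("G002", 1, 0, 1.0), ("G002", 2, 0, 1.2),
    ("G003", 0, 0, 1.3),
]
reference_id, control_ids = split_reference_and_controls(rows)
```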

Optionally, the method further includes the step of post-processing the target detection data, including:

Taking each coordinate point in the mean value distribution map as the center of a circle and a preset length as the radius to obtain a circle, and obtaining the integer closest to the center of the circle;

Merging the coordinate points within the circle into one coordinate point corresponding to the circle, where the nearest integer is used as the coordinate of the merged point.
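One possible reading of this post-processing step is sketched below. It assumes that a point is snapped to its nearest integer coordinate when that coordinate lies within the preset radius, and that values landing on the same coordinate are averaged; neither detail is stated in this section, so both are labeled assumptions.

```python
from math import hypot

def merge_points(mean_map, radius):
    """Merge nearby points of a mean distribution map onto integer coordinates.

    mean_map: dict mapping (x, y) -> mean value.
    A point is snapped to the nearest integer coordinate if that coordinate
    lies within radius of it; values that land on the same coordinate are
    averaged (assumed rule). Other points are kept unchanged.
    """
    buckets = {}
    for (x, y), value in mean_map.items():
        ix, iy = round(x), round(y)
        target = (ix, iy) if hypot(x - ix, y - iy) <= radius else (x, y)
        buckets.setdefault(target, []).append(value)
    return {pt: sum(vals) / len(vals) for pt, vals in buckets.items()}

merged = merge_points(
    {(9.8, 5.1): 2.0, (10.2, 4.9): 4.0, (3.5, 3.5): 1.0}, radius=0.5)
```

Merging in this way reduces the number of plotted points when several detection points differ only by small coordinate offsets.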

Optionally, the target detection data also includes matching deviation data, and the method further includes:

Obtain the maximum value and minimum value of the detection data of the product in the target detection data, and the average value corresponding to each detection point;

The deviation data of each detection point is calculated according to the maximum value, the minimum value and the average value corresponding to each detection point.
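The exact deviation formula is not given in this section. The sketch below uses one hypothetical definition, the offset of each point's mean from the midpoint of the overall range, normalized by half the range, purely to show how the maximum, minimum and per-point averages could be combined.

```python
def deviation_data(values_by_point):
    """Compute each point's mean and a normalized deviation.

    values_by_point: dict mapping (x, y) -> list of detection values.
    Assumed formula (not from the source):
        deviation = (mean - midpoint) / half_range
    where midpoint and half_range come from the overall max and min.
    """
    all_values = [v for vals in values_by_point.values() for v in vals]
    vmax, vmin = max(all_values), min(all_values)
    midpoint = (vmax + vmin) / 2
    half_range = (vmax - vmin) / 2
    result = {}
    for point, vals in values_by_point.items():
        mean = sum(vals) / len(vals)
        deviation = 0.0 if half_range == 0 else (mean - midpoint) / half_range
        result[point] = (mean, deviation)
    return result

stats = deviation_data({(0, 0): [1.0, 3.0], (1, 0): [4.0, 5.0]})
```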

Optionally, the method also includes:

Visualizing the deviation data in the mean distribution graph.

Optionally, the target detection parameters are obtained through a preset filter, and a download button is set on the filter, and the method further includes:

When an operation triggering the download button is detected, importing the target detection data into a preset table;

The preset table is downloaded to a designated location, so that the user can perform data analysis according to the target detection data in the preset table.
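As a rough illustration of the export step, the sketch below writes target detection data to a CSV table. The CSV format, the column names and the function name are assumptions; the source only says that the data is imported into a preset table and downloaded.

```python
import csv
import io

def export_to_table(target_data, stream):
    """Write target detection data to a CSV table and return the row count.

    target_data: list of dicts sharing the same keys.
    stream: any text file-like object (a real download would write a file).
    """
    if not target_data:
        return 0
    writer = csv.DictWriter(stream, fieldnames=list(target_data[0].keys()))
    writer.writeheader()
    writer.writerows(target_data)
    return len(target_data)

buffer = io.StringIO()
written = export_to_table(
    [{"glass_id": "G001", "x": 10, "y": 5, "mean": 3.0}], buffer)
csv_text = buffer.getvalue()
```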

According to a second aspect of an embodiment of the present disclosure, a data processing system is provided, including:

The target table acquisition module is used to generate the HBase table according to the first Hive table and the second Hive table, and use the HBase table as the target data table;

A target parameter acquisition module, configured to acquire preset target detection parameters;

A target data acquisition module, configured to match the row keys of the target data table according to the target detection parameters, and obtain target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products;

The target data display module is used to display the target detection data, so as to perform data analysis according to the target detection data.

According to a third aspect of an embodiment of the present disclosure, a data processing system is provided, including a data processing device, and the data processing device includes:

processor;

memory for storing a computer program executable by said processor;

Wherein, the processor is configured to execute the computer program in the memory, so as to realize the above-mentioned method.

Optionally, a display device and a distributed storage device are also included;

The distributed storage device is configured to acquire and store production history data and detection parameter data;

The display device is configured to display the target detection data.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, and when an executable computer program in the storage medium is executed by a data processing device, the above-mentioned method can be implemented.

The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:

It can be seen from the above embodiments that, in the solution provided by the embodiments of the present disclosure, an HBase table can be generated according to the first Hive table and the second Hive table and used as the target data table, where the first Hive table includes the production history data of each product and the second Hive table includes the detection parameter data of products that have been inspected; target detection parameters to be processed are obtained, where the target detection parameters match the row keys of the target data table; the row keys of the target data table are matched according to the target detection parameters to obtain target detection data corresponding to the target detection parameters, where the target detection data includes detection parameter data representing defective products; and the target detection data is displayed so that data analysis can be performed according to it. In this embodiment, the target parameter data can be selected according to the product process and historical experience, so that it matches the product detection scenario, which helps reduce the difficulty of analyzing defective processes. In addition, using the target detection data for defect analysis can reveal information such as the location and process at which product defects arise, which improves the efficiency and accuracy of analysis and detection.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.

Fig. 1 is a block diagram of a data processing system according to an exemplary embodiment.

Fig. 2 is a flow chart showing a data processing method according to an exemplary embodiment.

Fig. 3 is a flow chart of acquiring a target data table according to an exemplary embodiment.

Fig. 4 is a schematic diagram showing the effect of a filter according to an exemplary embodiment.

Fig. 5 is a flow chart of acquiring target detection data according to an exemplary embodiment.

Fig. 6 is a flow chart of acquiring target detection data according to an exemplary embodiment.

Fig. 7 is a schematic diagram showing the effect of a mean value distribution diagram according to an exemplary embodiment.

Fig. 8 is a schematic diagram showing the effect of another mean distribution graph according to an exemplary embodiment.

Fig. 9 is a flow chart of merging coordinate points according to an exemplary embodiment.

Fig. 10 is a flow chart of acquiring deviation data according to an exemplary embodiment.

Fig. 11 is a block diagram of a data processing device according to an exemplary embodiment.

Fig. 12 is a block diagram of a server according to an exemplary embodiment.

Detailed description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of devices consistent with aspects of the present disclosure as recited in the appended claims. It should be noted that, in the case of no conflict, the features in the following embodiments and implementation manners can be combined with each other.

At present, in the production of industrial products, different products adopt different processes and pass through different equipment and different operators, and a subtle problem in any link can cause problems in the final product. Therefore, it is necessary to record the detailed production process of each industrial product to provide a basis for subsequent product failure analysis. Taking the semiconductor display industry as an example, in the process of producing display panels, the glass substrate (Glass) goes through different production processes and different equipment. Affected by objective factors such as the various production processes or equipment, various defects may occur in the final display panel. In related technologies, technicians usually locate the cause of a defect manually, so the processing time and accuracy cannot meet production requirements.

In order to solve the above technical problem, an embodiment of the present disclosure provides a data processing method, which can be applied to a data processing system. FIG. 1 is a block diagram of a data processing system according to an exemplary embodiment. Referring to FIG. 1 , the data processing system 100 includes a data processing device 300 , a display device 200 and a distributed storage device 400 . The data processing device 300 is connected to the display device 200 and the distributed storage device 400 respectively. The distributed storage device 400 includes a data lake layer, a data warehouse layer (HIVE) and a data mart (HBASE). The user can input the parameters to be queried through the interactive interface on the display device 200, and the display device 200 can also access the data mart through the API interface. The data processing device 300 can access the data mart through the API interface, so as to process the data obtained from the data mart and send it to the display device 200 for display.

Continuing to refer to FIG. 1, the data processing system includes multiple sets of data with different contents and/or storage structures, which are stored in the distributed storage device 400. In some embodiments, the ETL module in the distributed storage device 400 can extract raw data from multiple data sources into the data processing system to form the first data layer (for example, the data lake layer DL), so as to reduce the load on production equipment and manufacturing systems and to make data reading convenient for subsequent analysis equipment. The data source can be the raw data of the production equipment, which is stored in relational databases (such as Oracle, MySQL, etc.) of the corresponding manufacturing systems, such as YMS (Yield Management System), FDC (Fault Detection & Classification) and MES (Manufacturing Execution System). The ETL module refers to computer program logic configured to provide functions such as extracting, transforming or loading data. In some embodiments, the ETL module is stored on one or more storage nodes in the distributed network, loaded into one or more memories in the distributed network, and executed by one or more processors in the distributed network.

The data lake layer in the distributed storage device 400 is a centralized HDFS (Hadoop Distributed File System) or Kudu database for storing any structured or unstructured data. Optionally, the data lake is configured to store the first set of data extracted by the ETL module from the multiple data sources DS. Optionally, the first set of data has the same content as the raw data, and the dimensions and attributes of the raw data are preserved in the first set of data. In some embodiments, the first set of data stored in the data lake includes dynamically updated data. Optionally, the dynamically updated data includes data updated in real time in a Kudu-based database, or data updated periodically in the Hadoop distributed file system. In one example, the periodically updated data in the Hadoop distributed file system is stored in Hive-based storage. In an example, a real-time update means an update at intervals shorter than one minute, whereas a periodic update means an update at intervals of one minute or longer.

In some embodiments, the distributed storage device 400 further includes a second data layer, such as a data warehouse. A data warehouse includes an internal storage system characterized by providing data in an abstracted manner, which may include a table format or a view format, without exposing the file system. A data warehouse can be implemented based on Hive. At this point, the ETL module can extract, clean, convert or load the first set of data to form the second set of data. Optionally, the first set of data can be cleaned and standardized to form the second set of data. In some embodiments, the second set of data further includes statistical data, such as detection point count, maximum value, minimum value and average value of detection point parameter values, proportion of defects, and the like.

In some embodiments, the distributed storage device 400 includes a third data layer, such as at least one data mart. Optionally, the data mart is a NoSQL-type database that can be used for computing processing. Optionally, the data mart is implemented based on HBase. The ETL module can also transform the second set of data to form a third set of data.

Those skilled in the art can understand that the first set of data, the second set of data, and the third set of data can be stored and queried based on one or more data tables.

In some embodiments, the process of converting the second set of data to form the third set of data may be importing data from the data warehouse (Hive table) into the data mart (HBase table). In one example, a first table is generated in the data mart and a second table (e.g., an external table) is generated in the data warehouse. The first table and the second table are configured to be synchronized such that when data is written to the second table, the first table is simultaneously updated to include the corresponding data. In another example, the MapReduce module in Hadoop can be used as a distributed computing processing module for reading data written to the data warehouse. The data written to the data warehouse can then be written to the data mart. In one example, data can be written to the data mart using the HBase-based API. In another example, once the MapReduce module has read the data written to the data warehouse, it can generate HFile files and bulk load them into the data mart.

In some embodiments, data flows, data transformations, and data structures between various components of a data processing system are described. In some embodiments, the raw data collected by the plurality of data sources DS includes at least one of production history data, parameter data or detection parameter data. Raw data can optionally contain dimensional information (time, factory, equipment, operator, map, chamber, card slot, etc.) and other information (duration, etc.).

Production history data contains information on the specific processing that a product (such as a panel or glass) undergoes during manufacture. Examples include the factories, processes, stations, equipment, chambers, slots, and operators involved.

Parametric data contains information on the specific environmental parameters and changes to which a product (such as a panel or glass) is subjected during manufacture. Examples of specific environmental parameters and changes to which a product is subjected during manufacturing include ambient particulate conditions, equipment temperature, and equipment pressure, among others.

The detection parameter data includes the resistance, film thickness, threshold voltage, reflection pattern deviation, reverse cut-off current, etc. of the product as measured at the detection station.

In one example, the present data processing system integrates various business data (e.g., data related to semiconductor electronic device manufacturing) into multiple data sources DS (e.g., Oracle databases). The ETL module extracts data from the multiple data sources into a data lake, for example using a data stack tool, SQOOP tool, kettle tool, Pentaho tool or DataX tool. Then, the data is cleaned, transformed and loaded into the data warehouse. The data warehouse DW and data mart DMT use tools such as Kudu, Hive, and HBase to store large amounts of data and analysis results.

The information generated in the various stages of the manufacturing process is obtained by various sensors and inspection equipment and then stored in the multiple data sources DS; alternatively, the data obtained by the sensors and inspection equipment is calculated or analyzed, and the calculation and analysis results are also stored in the multiple data sources DS. Data synchronization (the flow of data) among the components of the data processing system is realized through the ETL module. For example, the ETL module is configured to obtain a parameter configuration template of the synchronization process, including network permissions and database port configuration, inflow database name and table name, outflow database name and table name, field correspondence, task type, scheduling cycle, etc. The ETL module configures the parameters of the synchronization process based on the parameter configuration template, then synchronizes the data and cleans the synchronized data based on the process configuration template. The ETL module cleans the data through SQL statements to remove null values, remove outliers, and establish correlations between related tables. The data synchronization tasks include data synchronization between the multiple data sources and the distributed storage device 400, and data synchronization between the layers of the distributed storage device 400 (e.g., data lake, data warehouse, or data mart).

In another example, the distributed storage device 400 may complete data extraction to the data lake in real time or offline. In offline mode, data extraction tasks are scheduled periodically. Optionally, in offline mode, the extracted data may be stored in storage based on the Hadoop distributed file system (for example, a Hive-based database). In real-time mode, data extraction tasks can be performed by OGG (Oracle GoldenGate) combined with Apache Kafka. Optionally, in real-time mode, the extracted data can be stored in a Kudu-based database. OGG reads log files in the multiple data sources (e.g., Oracle databases) to capture added or deleted data. In one example, a front-end interface (e.g., an API interface) can perform display, query, and/or analysis based on data stored in a Kudu-based database. In another example, the front-end interface may perform display, query and/or analysis based on data stored in any one or any combination of a Kudu-based database, a Hadoop distributed file system (e.g., an Apache Hive-based database), and/or an HBase-based database. In another example, short-term data (e.g., generated over several months) is stored in a Kudu-based database, while long-term data (e.g., all data generated over all cycles) is stored in the Hadoop distributed file system (for example, a Hive-based database). In another example, the ETL module is configured to ingest data stored in a Kudu-based database into the Hadoop distributed file system (e.g., a Hive-based database).

A data warehouse is built on the data lake by combining data from various business systems (MDW, YMS, MES, FDC, etc.). The data ingested into the data lake is divided based on task execution times, which do not exactly match the timestamps in the raw data. In addition, data may be duplicated. Therefore, the data in the data lake needs to be cleaned and standardized when building the data warehouse, to meet the needs of upper-level applications for data accuracy and partitioning. The data tables stored in the data warehouse are obtained by cleaning and standardizing the data in the data lake. Based on user requirements, the field formats are standardized to ensure that the data tables in the data warehouse are completely consistent with the data tables in the multiple data sources. At the same time, the data is partitioned by fields such as date, month or time, which improves query efficiency and reduces running memory requirements. The data warehouse can be one or any combination of Kudu-based databases and Apache Hive-based databases.
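The deduplication and date-based division described above can be sketched in a few lines. This is an in-memory illustration only; the field name `timeday` is borrowed from the tables later in the description, and the function name is hypothetical.

```python
from collections import defaultdict

def clean_and_partition(rows, partition_field="timeday"):
    """Drop exact duplicate rows and group the rest by a date field.

    rows: list of dicts with hashable values.
    Returns a dict mapping each partition value to its list of rows.
    """
    seen = set()
    partitions = defaultdict(list)
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # duplicate ingested more than once
        seen.add(fingerprint)
        partitions[row[partition_field]].append(row)
    return dict(partitions)

raw = [
    {"timeday": "2022-01-05", "glass_id": "G001", "step_id": "S10"},
    {"timeday": "2022-01-05", "glass_id": "G001", "step_id": "S10"},
    {"timeday": "2022-01-06", "glass_id": "G002", "step_id": "S10"},
]
partitions = clean_and_partition(raw)
```

A query restricted to one day then only needs to read that day's partition, which is the efficiency gain the paragraph refers to.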

In an embodiment, the distributed storage device 400 may be one memory, multiple memories, or a general term for multiple storage elements. For example, the memory can include random access memory (RAM) or double data rate synchronous dynamic random access memory (DDR SDRAM), and can also include non-volatile memory, such as disk storage, flash memory (Flash), etc.

The display device 200 is used for displaying an interface, and can display processing results of the data processing device 300. In an embodiment, the display device may be a display, or a product including a display, such as a TV, a computer (all-in-one or desktop), a tablet computer, a mobile phone, an electronic picture screen, and the like. In an embodiment, the display device may be any device that displays images, whether in motion (e.g., video) or stationary (e.g., still images), and whether text or pictures. More specifically, it is contemplated that the described embodiments may be implemented in or associated with a variety of electronic devices such as, but not limited to, game consoles, television monitors, flat panel displays, computer monitors, automotive displays (e.g., odometer displays, etc.), navigators, cockpit controls and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, packaging and aesthetic structures (e.g., displays of images of pieces of jewelry), etc.

In an embodiment, the data processing device 300 can be implemented by at least one server and is used to implement the data processing method described in any of the following embodiments; that is, the data processing system implements the data processing method described in any subsequent embodiment. FIG. 2 is a flow chart showing a data processing method according to an exemplary embodiment. Referring to FIG. 2, a data processing method includes steps 21 to 24.

In step 21, an HBase table is generated according to the first Hive table and the second Hive table, and the HBase table is used as a target data table; the first Hive table includes production history data of each product, and the second Hive table contains detection parameter data of products that have already been inspected.

In this embodiment, the data processing system can acquire the target data table.

In an example, the target data table is pre-processed by other servers and stored in a preset location, such as local storage, a cache or the cloud. In this scenario, the data processing system can read the target data table from the preset location and use it directly.

In another example, the target data table may be generated by the data processing system in real time, and the generation process may correspond to the first data layer, the second data layer, and the third data layer of the distributed storage device 400 in FIG. 1. Referring to FIG. 3, the data processing system can obtain the main process site based on the detection data of the second Hive table; then filter the data in the HBase intermediate table based on the main process site to obtain a filtered HBase intermediate table; and afterwards fuse the second Hive table with the filtered HBase intermediate table to obtain the HBase table.
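The filter-then-fuse flow can be sketched with plain dictionaries standing in for the Hive and HBase tables. The join on `glass_id` and the shape of the fused record are assumptions made for illustration; the function name is hypothetical.

```python
def fuse_tables(intermediate, detection_rows, main_sites):
    """Filter the intermediate table by main process sites, then fuse.

    intermediate: dict glass_id -> list of history dicts (each has "step_id").
    detection_rows: list of detection dicts (each has "glass_id").
    main_sites: set of step_id values regarded as main process sites.
    Returns dict glass_id -> {"history": [...], "detections": [...]}.
    """
    fused = {}
    for row in detection_rows:
        glass_id = row["glass_id"]
        if glass_id not in fused:
            # Keep only history recorded at main process sites.
            history = [h for h in intermediate.get(glass_id, [])
                       if h["step_id"] in main_sites]
            fused[glass_id] = {"history": history, "detections": []}
        fused[glass_id]["detections"].append(row)
    return fused

intermediate = {"G001": [{"step_id": "S10", "eqp_id": "E1"},
                         {"step_id": "S15", "eqp_id": "E2"}]}
detections = [{"glass_id": "G001", "step_id": "T1",
               "item": "thickness", "value": 1.2}]
fused = fuse_tables(intermediate, detections, main_sites={"S10"})
```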

For example, since the target data table includes the production history data and detection parameter data of each product, the data processing system can communicate with the production system at the first time interval (such as every natural day) and extract the production history data (for example, the historical production history data and the production history data of the current day), that is, obtain the production history data of the product incrementally (in append mode). This loading process can correspond to the process in which the distributed storage device 400 reads source data from the Oracle database into the data lake in FIG. 1. Then, the data processing system can extract, clean, convert and load the production history data (including historical production history data and current-day production history data), that is, perform ETL (Extract-Transform-Load) processing, and then store it in the first Hive table. The data structure of the first Hive table is shown in Table 1.

Table 1 GLASS production history data sheet

Field name	Meaning
Timeday	Year, month and day of detection
Timekey	Write time
Factory	Factory
glass_id	Glass substrate identification code (Glass ID)
product_id	Product ID
step_id	Process site
eqp_id	Process equipment
unit_id	Equipment unit

From the current day's product production history data in the first Hive table, the data processing system can generate an HBase intermediate table of historical production history data with the product identification code (such as Glass ID) as the row key (RowKey). The data structure of the HBase intermediate table is shown in Table 2.

Table 2 Structure description of the intermediate table (MAP:Track)
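The conversion of production history rows into an intermediate table keyed by Glass ID can be sketched as follows. The column names follow Table 1; the in-memory dictionary is only a stand-in for an HBase table, and the function name is hypothetical.

```python
from collections import defaultdict

def build_intermediate_table(history_rows):
    """Group production history rows by glass_id, which acts as the row key.

    history_rows: list of dicts with glass_id, step_id, eqp_id and unit_id.
    Returns dict glass_id -> list of per-step history records.
    """
    table = defaultdict(list)
    for row in history_rows:
        table[row["glass_id"]].append(
            {k: row[k] for k in ("step_id", "eqp_id", "unit_id")})
    return dict(table)

history_rows = [
    {"glass_id": "G001", "step_id": "S10", "eqp_id": "E1", "unit_id": "U1"},
    {"glass_id": "G002", "step_id": "S10", "eqp_id": "E1", "unit_id": "U2"},
    {"glass_id": "G001", "step_id": "S20", "eqp_id": "E3", "unit_id": "U1"},
]
track = build_intermediate_table(history_rows)
```

Keying by Glass ID means the full processing path of one substrate can be fetched with a single row lookup.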

Continuing to refer to FIG. 3, the data processing system can communicate with the detection system at the second time interval (such as every natural day) and load the detection parameter data from the detection parameter data table. Taking the loading of the current day's detection data as an example, the data processing system can store the detection parameter data of the product in the second Hive table after ETL processing. The data structure of the second Hive table is shown in Table 3.

Table 3 (twyield.tw_glass_map) detection parameter data table

Field name	Meaning
timeday	Year, month and day of detection
end_time	Detection time
factory	Factory
lot_id	Batch ID
glass_id	Glass ID
step_id	Detection site
eqp_id	Equipment ID
product_id	Product ID
x	Detection point Glass coordinate (x)
y	Detection point Glass coordinate (y)
item	Detection parameter name
value	Detection value
type	Parameter type

Then, the data processing system can read the current day's detection data from the second Hive table. In some embodiments, an outlier filtering operation can be performed on the parameters, wherein each parameter corresponds to a value range, and the value range can be set according to the product or an empirical value, which is not limited here. In this example, filtering parameter outliers by this value range prevents outliers from distorting subsequent calculation results, so that those results reflect statistical trends rather than isolated special cases, which makes it easier to locate the causes of defects during data analysis.
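The value-range filtering described above can be illustrated with a minimal sketch. The ranges in `VALUE_RANGES`, the record layout and the function name `filter_outliers` are assumptions made for illustration only; the disclosure does not specify concrete limits.

```python
from typing import Dict, List, Tuple

# Hypothetical value ranges per detection parameter; real limits would be set
# per product or from experience, as noted in the text.
VALUE_RANGES: Dict[str, Tuple[float, float]] = {
    "RS_DATA": (0.0, 100.0),
    "THICKNESS": (0.0, 5000.0),
}

def filter_outliers(records: List[dict]) -> List[dict]:
    """Keep records whose value lies inside the range configured for its item."""
    kept = []
    for rec in records:
        bounds = VALUE_RANGES.get(rec["item"])
        if bounds is None:
            kept.append(rec)  # no range configured: keep the record unchanged
            continue
        low, high = bounds
        if low <= rec["value"] <= high:
            kept.append(rec)
    return kept

records = [
    {"glass_id": "G001", "item": "RS_DATA", "value": 12.5},
    {"glass_id": "G001", "item": "RS_DATA", "value": 9999.0},  # outside the range
    {"glass_id": "G002", "item": "THICKNESS", "value": 3200.0},
]
filtered = filter_outliers(records)
```

Records for parameters without a configured range are kept here; dropping them instead would be an equally reasonable design choice.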

In addition, the data processing system can obtain the main process site list according to the current day's detection parameter data in the second Hive table, or obtain a main process site list uploaded by the user for the product. In this example, obtaining the main process site list from the current day's detection parameter data can reduce the amount of data to be processed and improve the efficiency of obtaining the list. Alternatively, when the data processing system obtains the main process site list uploaded by the user, it can draw on the user's experience of the production process and of data analysis tasks such as locating defects to obtain a more reliable and accurate site list, which in turn helps to improve the accuracy of subsequent processing results and the efficiency of locating defects.

The data processing system can take the intersection of the main process sites recorded in the HBase intermediate table and the main process site list obtained based on the second Hive table, thereby obtaining the main process sites that produced the product and the detection data corresponding to each main process site.
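As a rough sketch of this intersection step, the following assumes that the HBase intermediate table can be represented in memory as a mapping from Glass ID to the process sites the glass passed through. The names `track`, `main_sites` and `detection_by_main_site`, and the site codes, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def detection_by_main_site(
    track: Dict[str, List[str]],   # glass_id -> process sites visited (stand-in for the intermediate table)
    main_sites: Iterable[str],     # main process site list
    detection_rows: List[dict],    # rows shaped like the second Hive table
) -> Dict[str, List[dict]]:
    """Attach each detection row to every main process site its glass passed through."""
    main = set(main_sites)
    result = defaultdict(list)
    for row in detection_rows:
        visited = set(track.get(row["glass_id"], []))
        for site in visited & main:  # the intersection described in the text
            result[site].append(row)
    return dict(result)

track = {"G001": ["S100", "S200", "S900"], "G002": ["S200"]}
rows = [
    {"glass_id": "G001", "item": "RS_DATA", "value": 12.5},
    {"glass_id": "G002", "item": "RS_DATA", "value": 13.0},
]
grouped = detection_by_main_site(track, ["S100", "S200"], rows)
```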

Continuing to refer to FIG. 3 , the data processing system can also process the filtered data, such as obtaining statistical information of detection points, such as the maximum value, minimum value or average value of the detection data. In this example, the relatively large data volume of production history data and detection parameter data can be used to obtain statistical information, and information that reflects the trend of parameters can be obtained, so as to use statistical information to assist in locating defects.
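The per-point statistics mentioned above might be computed as in the following sketch, which groups filtered detection rows by their Glass coordinates. The row layout follows Table 3; the function name `point_statistics` is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

def point_statistics(rows: List[dict]) -> Dict[Tuple[float, float], dict]:
    """Return the max, min and mean detection value for each (x, y) detection point."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["x"], row["y"])].append(row["value"])
    return {
        point: {"max": max(values), "min": min(values), "mean": mean(values)}
        for point, values in grouped.items()
    }

stats = point_statistics([
    {"x": 0.0, "y": 0.0, "value": 1.0},
    {"x": 0.0, "y": 0.0, "value": 3.0},
    {"x": 10.0, "y": 0.0, "value": 5.0},
])
```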

Continuing to refer to FIG. 3, the data processing system can aggregate the second Hive table, the HBase intermediate table, the above-mentioned main process sites, the detection data corresponding to each main process site, the statistical information of the detection points, and so on, to obtain the above-mentioned target data table. The data structure of the target data table is shown in Table 4.

Table 4 (MAP:Master) Hbase table structure description

In step 22, target detection parameters to be processed are acquired, and the target detection parameters match the row keys of the target data table.

In this embodiment, the data processing system may acquire target detection parameters. The target detection parameters include the time range, factory, product model, detection site and detection parameter, and may also include the main process site, equipment, and unit. It should be noted that the first parameter in this embodiment includes parameters such as the time range, factory, product model, detection site and detection parameter, and is stored in the first specified column family (the column family is Info, and the column name is MasterStep). The main process site, equipment, and unit can be set as target detection parameters according to requirements, which avoids the excessive data volume caused by too many target detection parameters and helps to improve query efficiency.

In an example, the filter shown in FIG. 4 can be displayed on the display device 200 shown in FIG. 1, and the user can select the target detection parameters in the filter, that is, the target detection parameters can be obtained through this filter. In this example, the filter can adopt a cascade structure in which the detection parameters of each level are related to the levels above and below it. After a detection parameter of one level is selected, all options for the detection parameter of the next level are filtered out. By default, the first item under each option is taken as the selected value, and the user can also choose another option value as the filter condition. For example, the first level is the time range: when the start time (such as May 1, 2021) and end time (such as May 20, 2021) are detected, the factories involved in production within this time range can be screened out (the second level); when it is detected that the user selects a factory (such as ARRAY), the models of the products (such as BNA320WH5V401) that have been produced by the factory can be filtered out (the third level); when it is detected that the user has entered a production batch/product ID (the fourth level) under the model of the produced product, the next level is filtered in the same way, and so on, until the detection parameter (such as RS_DATA) to be queried is finally obtained. Different detection parameters represent different meanings, such as RS_DATA (area resistance), THICKNESS (film thickness), TP (reflecting graphic offset), Vth (threshold voltage), IOFF (reverse cut-off current) and so on. It is understandable that technicians can select different parameters according to requirements and thereby combine them into different target detection parameters.

It should be noted that the time range in this example can be limited to one month (period). If it is detected that the interval between the start time and the end time exceeds one month, a reminder message is generated to tell the user that the time filled in exceeds the allowed range and needs to be filled in again. Limiting the time range in this way avoids retrieving too much data in each query, which helps to improve query efficiency and ensure data validity.
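The one-month limit could be checked as in the sketch below. Treating "one month" as 31 days, the extra check that the end time is not earlier than the start time, and the wording of the reminder messages are all assumptions, since the text does not define them.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "one month" is approximated as 31 days; the source does not define it.
MAX_SPAN = timedelta(days=31)

def check_time_range(start: date, end: date) -> Optional[str]:
    """Return a reminder message if the range is invalid, otherwise None."""
    if end < start:
        # Not stated in the source; added as a basic sanity check.
        return "The end time is earlier than the start time; please fill it in again."
    if end - start > MAX_SPAN:
        return "The time range exceeds one month; please fill it in again."
    return None

ok = check_time_range(date(2021, 5, 1), date(2021, 5, 20))
too_long = check_time_range(date(2021, 5, 1), date(2021, 7, 1))
```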

It should be noted that in this example, multiple Lot IDs or Glass IDs can be entered in the input box of the Lot/Glass ID parameter, with adjacent IDs separated by commas or spaces, so that multiple IDs are queried in a single pass, which improves detection efficiency.
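Splitting such an input box value into individual IDs can be done with a single regular expression, as in this small sketch. The function name `parse_ids` and the sample IDs are illustrative assumptions.

```python
import re
from typing import List

def parse_ids(raw: str) -> List[str]:
    """Split a comma- or space-separated ID string, ignoring empty fragments."""
    return [token for token in re.split(r"[,\s]+", raw.strip()) if token]

ids = parse_ids("L001, L002 L003,,L004")
```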

In step 23, according to the matching of the target detection parameters and the row key of the target data table, the target detection data corresponding to the target detection parameters is obtained; the target detection data includes detection parameter data representing defective products.

In this embodiment, the data processing system can obtain the target detection data according to the target detection parameters and the target data table. Based on the reference points of a reference product, it averages the detection data whose distance from each reference point is less than or equal to a set distance, thereby obtaining the target detection data. Specifically:

In an example, referring to FIG. 5, the acquisition of the target detection data by the data processing system includes steps 51 to 53.

In step 51, the data processing system can match the target detection parameters with the row keys in the target data table to obtain the original detection data corresponding to the target detection parameters.

Considering that the target detection parameters can include the time range, factory, product model, detection site and detection parameter, in this step each parameter can be matched in turn against the row keys in the target data table to obtain the corresponding data, and so on, until the original detection data that satisfies all of the above target detection parameters is obtained.
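The sequential matching in step 51 can be sketched against an in-memory stand-in for the target data table. The row key layout below (fields joined by `|` in the order `timeday|factory|product_id|step_id|item|glass_id`) is purely an assumption, because the structure in Table 4 is not reproduced in this text; the site code `S100` is likewise hypothetical.

```python
from datetime import date
from typing import Dict

# Hypothetical row key layout; the real layout of Table 4 is not known here.
FIELDS = ("timeday", "factory", "product_id", "step_id", "item", "glass_id")

def parse_row_key(row_key: str) -> Dict[str, str]:
    return dict(zip(FIELDS, row_key.split("|")))

def match_rows(table: Dict[str, dict], start: date, end: date,
               factory: str, product_id: str, step_id: str, item: str) -> Dict[str, dict]:
    """Return the rows whose row key satisfies every target detection parameter."""
    wanted = (factory, product_id, step_id, item)
    matched = {}
    for row_key, columns in table.items():
        key = parse_row_key(row_key)
        day = date.fromisoformat(key["timeday"])
        if not (start <= day <= end):
            continue  # outside the time range
        if (key["factory"], key["product_id"], key["step_id"], key["item"]) != wanted:
            continue  # one of the remaining parameters does not match
        matched[row_key] = columns
    return matched

table = {
    "2021-05-03|ARRAY|BNA320WH5V401|S100|RS_DATA|G001": {"Info:MasterStep": "..."},
    "2021-06-15|ARRAY|BNA320WH5V401|S100|RS_DATA|G002": {"Info:MasterStep": "..."},
    "2021-05-04|ARRAY|BNA320WH5V401|S100|THICKNESS|G003": {"Info:MasterStep": "..."},
}
original = match_rows(table, date(2021, 5, 1), date(2021, 5, 20),
                      "ARRAY", "BNA320WH5V401", "S100", "RS_DATA")
```

In a real HBase deployment this kind of matching would normally be expressed as a row key range or prefix scan rather than a full iteration; the loop is used here only to keep the sketch self-contained.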

In step 52, the data processing system may acquire the detection point data of the column family in the original detection data, and obtain the initial detection data corresponding to the target detection parameters. That is, the data processing system can obtain the detection point data of each product under the column family (for example, the column family is Info, and the column name is MasterStep) in the original detection data to obtain the initial detection data. Here, the initial detection data refers to the collection of detection data of the detection points on inspected products, where an inspected product is one that the detection system has already tested.

In step 53, the data processing system may process the initial detection data to obtain target detection data.

It should be noted that the target detection parameters can also include parameters such as the main process site, equipment, and unit of the column family in the original detection data, in which case the data volume of the initial detection data obtained is much smaller than that obtained by the scheme shown in Figure 5. That is, those skilled in the art can select the number of target detection parameters according to the specific scenario, so as to obtain initial detection data that meets the requirements. In other words, increasing the number of parameters included in the target detection parameters reduces the resulting initial detection data, which reduces the amount of data in subsequent processing and thereby improves the efficiency of obtaining processing results.

In an embodiment, the data processing system may acquire a reference product, and the reference product may be a product pre-specified by the user, or may be selected from past products. In actual production, the number of detection points may be reduced for some products and increased for others, that is, the number of detection points varies from product to product. Therefore, in this step, the data processing system can obtain the number of detection points of each product, and then sort all products in the initial detection data by their number of detection points, so as to find the product with the largest number of detection points. The data processing system can use the product with the largest number of detection points as the reference product, and the products other than the reference product as the control products.
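Selecting the reference product by detection point count might look like the following sketch. The mapping `points_by_product` (product ID to its list of detection points) and the function name are assumptions; when several products tie for the largest count, this sketch simply takes the first one encountered.

```python
from typing import Dict, List, Tuple

def split_reference(points_by_product: Dict[str, list]) -> Tuple[str, List[str]]:
    """Return (reference product ID, control product IDs)."""
    reference_id = max(points_by_product, key=lambda pid: len(points_by_product[pid]))
    controls = [pid for pid in points_by_product if pid != reference_id]
    return reference_id, controls

reference_id, control_ids = split_reference({
    "G001": [(0, 0), (0, 1)],
    "G002": [(0, 0), (0, 1), (1, 0)],
    "G003": [(0, 0)],
})
```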

In this embodiment, referring to FIG. 6 , the acquisition of target detection data by the data processing system in step 53 may include steps 61 to 64 .

In step 61, the data processing system can match the control detection points on each control product with the reference points on the reference product. Taking a glass substrate as the product as an example, the data processing system can align the top corners or positioning marks of each control product and the reference product, and then match the detection points on the control product and the reference product in turn: the control detection point in the first row and first column of the control product is matched with the reference point in the first row and first column of the reference product, the control detection point in the first row and second column of the control product is matched with the reference point in the first row and second column of the reference product, and so on, until the detection points in the last row and last column have been matched.

In step 62, the data processing system may acquire the distance between each control detection point of the control product and the corresponding reference point on the reference product. The data processing system can use the Euclidean distance to obtain the distance between two matched detection points: for example, it calculates the distance between the control detection point in the first row and first column of the control product and the reference point in the first row and first column of the reference product, then the distance between the control detection point in the first row and second column of the control product and the reference point in the first row and second column of the reference product, and so on, up to the distance between the control detection point in the last row and last column of the control product and the reference point in the last row and last column of the reference product.

In step 63, for each reference point on the reference product, the data processing system can obtain candidate detection data, namely the detection data of the reference point and of the control detection points whose distance from that reference point is less than or equal to the preset distance threshold. The data processing system can store a preset distance threshold, which ranges from 1 to 10 mm and can be set according to the specific scenario. In an example, the preset distance threshold is 3 mm. The data processing system can compare the distance obtained for each detection point with the preset distance threshold, so as to obtain the reference points and control detection points whose distance is less than or equal to the preset distance threshold, together with the detection data of each such detection point.

In step 64, the data processing system may obtain the average value of the candidate detection data, generate a mean value distribution map from the average values of all detection points, and use the mean value distribution map as the target detection data. In this step, for each detection point on the reference product, the data processing system can average the detection data obtained in step 63, that is, the detection data of the detection points (including the detection point on the reference product itself) whose distance from that detection point is less than or equal to the above-mentioned preset distance threshold.

In this step, the data processing system can generate a mean value distribution diagram based on the coordinate data of each detection point on the reference product and the average value of the detection data corresponding to that detection point, as shown in Figure 7 or Figure 8. It should be noted that the mean value distribution diagram shown in Figure 7 is generated from the target detection data obtained by the scheme shown in Figure 5, and the mean value distribution diagram shown in Figure 8 is generated from the target detection data obtained after parameters such as the main process site, equipment and unit are added to the target detection parameters of the scheme shown in Figure 5.
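Steps 61 to 64 can be summarized in a short sketch. It assumes each product is represented as a mapping from a (row, column) position to an `(x, y, value)` tuple, so that points are matched by position as in step 61; this representation and the function name `mean_distribution` are illustrative assumptions rather than the disclosed data structure.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]   # (x in mm, y in mm, detection value)
Product = Dict[Tuple[int, int], Point]

def mean_distribution(reference: Product, controls: List[Product],
                      threshold_mm: float = 3.0) -> Dict[Tuple[float, float], float]:
    """Average, per reference point, the values of matched points within the threshold."""
    result = {}
    for pos, (rx, ry, rvalue) in reference.items():
        candidates = [rvalue]                      # the reference point itself is included
        for control in controls:
            if pos not in control:                 # step 61: match by row/column position
                continue
            cx, cy, cvalue = control[pos]
            distance = math.hypot(cx - rx, cy - ry)  # step 62: Euclidean distance
            if distance <= threshold_mm:             # step 63: keep candidates within threshold
                candidates.append(cvalue)
        result[(rx, ry)] = sum(candidates) / len(candidates)  # step 64: average
    return result

reference = {(0, 0): (0.0, 0.0, 10.0), (0, 1): (100.0, 0.0, 20.0)}
control = {(0, 0): (1.0, 1.0, 14.0), (0, 1): (110.0, 0.0, 99.0)}
mean_map = mean_distribution(reference, [control])
```

In this usage, the control point at (110, 0) lies 10 mm from its reference point and is therefore excluded from the average at (100, 0).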

It should be noted that the first row of data in the mean value distribution diagrams shown in Figure 7 and Figure 8 is the abscissa of each detection point, the first column of data is the ordinate of each detection point, and the data in the remaining cells is the average value of the detection data at the corresponding point; a blank indicates that there is no detection data at that coordinate.

It should be noted that the mean value distribution diagrams shown in Fig. 7 and Fig. 8 are produced by establishing a table in advance, mapping the coordinate data of each detection point one-to-one to the coordinate points in the table (that is, the cells uniquely determined by an abscissa and an ordinate), and importing the average value of each detection point into its coordinate point, which finally yields the effects shown in Figure 7 and Figure 8.

It can be understood that the adjacent abscissas or adjacent ordinates in the pre-established table can be set at equal intervals. In that case, the number of coordinate points (or cells) in the table will far exceed the number of detection points (such as 70), so there will be many blank cells in the mean value distribution map, which affects its normal use. For this reason, the data processing system can also perform the following processing on the mean value distribution graph, referring to FIG. 9, specifically including steps 91 and 92.

In step 91, the data processing system can take each coordinate point in the mean value distribution diagram as the center and a preset length (such as 6 mm) as the radius to obtain a circle, and obtain the integer nearest to the center of the circle.

In step 92, the data processing system may combine the coordinate points inside the circle into a single coordinate point corresponding to the circle; the coordinate data of that coordinate point is the above integer.

Considering that the distance between detection points (such as tens of mm) is usually greater than the above-mentioned preset length, a circle centered on a coordinate point usually contains only one detection point, so the coordinate data of that detection point can be rounded to the integer closest to the center of the circle. The number of blank cells in the mean value distribution graph can therefore be greatly reduced after the merging shown in Figure 9. It can be understood that FIG. 7 and FIG. 8 show the effect after the coordinates are merged, and the intervals between adjacent abscissas or ordinates are no longer equal after the merge. For example, from abscissa -1670 to abscissa -839 the difference is 831 mm, while from abscissa -839 to abscissa -9 the difference is 830 mm. Combined with the corresponding reduction of the ordinate intervals, the number of blank cells can be greatly reduced, so that the number of coordinate points in the mean value distribution diagrams shown in Figure 7 and Figure 8 is close to the number of detection points, which makes the mean value distribution graph easier to analyze.
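A simplified sketch of the merging in steps 91 and 92 is given below. As an assumption made for brevity, it merges coordinates along each axis separately instead of constructing circles: axis values within the preset length of the first value of a group are replaced by that value rounded to the nearest integer, and averages that land on the same merged cell are averaged again. The function names are hypothetical, and Python's built-in `round` is used for the rounding.

```python
from typing import Dict, Iterable, Tuple

def merge_axis(values: Iterable[float], radius: float = 6.0) -> Dict[float, int]:
    """Map each raw coordinate on one axis to a merged integer coordinate."""
    mapping = {}
    anchor = None
    for v in sorted(set(values)):
        if anchor is None or v - anchor > radius:
            anchor = v                      # start a new group at this coordinate
        mapping[v] = round(anchor)          # label the group with the nearest integer
    return mapping

def merge_grid(mean_map: Dict[Tuple[float, float], float],
               radius: float = 6.0) -> Dict[Tuple[int, int], float]:
    """Collapse nearby coordinate points so the table has far fewer blank cells."""
    xmap = merge_axis((x for x, _ in mean_map), radius)
    ymap = merge_axis((y for _, y in mean_map), radius)
    buckets = {}
    for (x, y), value in mean_map.items():
        buckets.setdefault((xmap[x], ymap[y]), []).append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in buckets.items()}

merged = merge_grid({
    (-1670.2, 5.1): 10.0,
    (-1669.8, 4.9): 20.0,
    (-839.4, 5.0): 30.0,
})
```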

It should be noted that in the outlier filtering process, when the detection data of the same detection point on multiple products are all outliers, they all need to be filtered out, and blanks will then also be formed in the mean value distribution graph. Therefore, by checking where the blanks are located in the mean value distribution graph, it is possible to detect whether the corresponding detection point of the product is defective. Since the merging process illustrated in FIG. 9 can greatly reduce the number of blanks, the number of blanks to be checked is reduced, which helps to improve the efficiency of defect analysis.

In one embodiment, considering that it is difficult to find defective positions from directly displayed numbers, the target detection data in this example also includes deviation data. Referring to FIG. 10, obtaining the deviation data includes steps 101 and 102:

In step 101, the data processing system can acquire the maximum value and minimum value of the target detection data, and the average value corresponding to each detection point. For example, the above maximum and minimum values can be read from the structure shown in Table 4. In step 102, the data processing system may calculate the deviation data of each detection point according to the above-mentioned maximum value, minimum value and average value corresponding to each detection point, and the deviation data is used to assist data analysis. Among them, the calculation formula of the deviation data is shown in the following formula (1):

In formula (1), p_i represents the deviation at each detection point, x_min and x_max are respectively the minimum and maximum values of all detection data on the product obtained after matching the target detection parameters against the target data table, and x_i represents the detection data of a given detection point.

In practical applications, the above-mentioned deviation data can be written into the above-mentioned mean value distribution graph at the same time, so that the user sees the deviation data alongside the average value of each detection point, and can use the average value and deviation data to analyze whether a detection point is a defective point, as well as for other open-ended analysis.

In an embodiment, the data processing system can visualize the deviation data in the mean value distribution graph. For example, the data processing system can convert the above-mentioned deviation data into a background color, and use the depth of the background color to represent the trend and degree of deviation of the mean value of the product detection parameters. That is, the larger the deviation data, the darker the color, and the smaller the deviation data, the lighter the color. In an example, the color depth can be set according to the rgba(0,255,0,p) and rgba(255,0,0,p) functions: Figure 8 shows the effect of using rgba(0,255,0,p) to represent the deviation data, and FIG. 9 shows the effect of using rgba(255,0,0,p) to represent the deviation data. Technicians can choose an appropriate background color according to the specific scenario. Considering the requirements for patent application documents, gray scale is used in this disclosure to achieve the same effect.
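The mapping from deviation data to a background color can be sketched as follows. Because formula (1) is not reproduced in this text, the min-max normalization used in `deviation` is only an assumption, chosen because it yields a value between 0 and 1 that can serve as the alpha channel p; the function names are also hypothetical.

```python
from typing import Tuple

def deviation(value: float, x_min: float, x_max: float) -> float:
    """Assumed form of the deviation: min-max normalization into [0, 1]."""
    if x_max == x_min:
        return 0.0                      # avoid division by zero when all values are equal
    return (value - x_min) / (x_max - x_min)

def background_color(p: float, base: Tuple[int, int, int] = (255, 0, 0)) -> str:
    """Build an rgba() string whose alpha channel encodes the deviation p."""
    r, g, b = base
    return f"rgba({r},{g},{b},{p:.2f})"

p = deviation(15.0, x_min=10.0, x_max=20.0)
red = background_color(p)
green = background_color(p, base=(0, 255, 0))
```

With this assumed form, a larger deviation gives a more opaque and therefore darker cell, which is consistent with the behavior described in the text.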

In step 24, the target detection data is presented for data analysis based on the target detection data.

In this embodiment, the data processing system may send the above target detection data to the display device, and the display device will display the above target detection data, so that the user can perform defect analysis according to the target detection data. That is to say, in this embodiment, the parameter distribution trend of product detection parameters in different main production processes can be quantified so as to quickly locate the cause of the failure, thereby enriching the failure diagnosis and analysis method.

In an embodiment, referring to FIG. 4, the above-mentioned filter may also be provided with a download button (such as a similarity analysis button), and when the filter detects that the user triggers the download button, it may send a download request to the data processing system. When the data processing system receives the download request, it can import the target detection data into a preset table, which can be implemented as an Excel table. Then, the data processing system can download the preset table to a designated location, so that the user can perform data analysis according to the target detection data in the preset table.

So far, in this embodiment, the above-mentioned target parameter data can be selected according to the product process and historical experience, so that the target parameter data matches the product detection scenario, which helps to reduce the difficulty of analyzing the defective process. Moreover, performing defect analysis with the target detection data can reveal information such as the location and process of product defects, which improves the efficiency and accuracy of analysis and detection. That is to say, in this embodiment, through multi-faceted analysis of the product detection parameters, the analysis results are grounded in both theory and professional experience, which helps the user to locate the cause of a defect more accurately and quickly. At the same time, the operation in this embodiment is simple and convenient, which helps users to control the data processing flow and thereby improves production efficiency.

An embodiment of the present disclosure also provides a data processing system, see FIG. 11 , including:

The target table acquisition module 111 is used to generate the HBase table according to the first Hive table and the second Hive table, and use the HBase table as the target data table;

A target parameter acquisition module 112, configured to acquire preset target detection parameters;

Target data acquisition module 113, configured to match the row key of the target data table according to the target detection parameters, and obtain target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products;

The target data display module 114 is configured to display the target detection data, so as to perform data analysis according to the target detection data.

It should be noted that the method shown in this embodiment matches the content of the method embodiment shown in FIG. 1 , and reference may be made to the content of the above method embodiment, which will not be repeated here.

In an exemplary embodiment, a data processing system is also provided. The data processing system includes a data processing device. Referring to FIG. 12 , the data processing device includes:

Processor 121;

memory 122 for storing computer programs executable by said processor;

Wherein, the processor is configured to execute the computer program in the memory, so as to realize the methods described in FIGS. 2 to 10 .

In one embodiment, it also includes a display device and a distributed storage device;

The display device is configured to display the target detection data.

In an exemplary embodiment, a computer-readable storage medium is also provided, and when the executable computer program in the storage medium is executed by a processor, the methods described in FIGS. 2 to 10 can be implemented.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are generated in whole or in part. The computer can be a general purpose computer, special purpose computer, computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)), etc.

Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) in which computer program instructions are stored. When the computer program instructions are run on a processor, they cause the computer to execute the data processing method described in any one of the above embodiments, for example, one or more steps of the data processing method.

Exemplarily, the above-mentioned computer-readable storage medium may include, but is not limited to: a magnetic storage device (for example, a hard disk, a floppy disk, or a magnetic tape), an optical disk (for example, a CD (Compact Disk) or a DVD (Digital Versatile Disk)), smart cards and flash memory devices (for example, EPROM (Erasable Programmable Read-Only Memory), card, stick or key drive). Various computer-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.

The processor mentioned in the embodiments of the present disclosure may be a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component or any combination thereof. It may implement or execute the various illustrative logical blocks and modules described in connection with this disclosure. The processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.

In addition, the memory mentioned in the embodiments of the present disclosure may be Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM or any other form of storage medium well known in the art.

Other embodiments of the disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The present disclosure is intended to cover any modification, use or adaptation that follows the general principles of the present disclosure and includes common knowledge or conventional technical means in the technical field not disclosed in the present disclosure. The specification and examples are to be considered exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise constructions which have been described above and shown in the drawings, and various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

A data processing method, characterized in that, comprising:

Generating an HBase table according to a first Hive table and a second Hive table, and using the HBase table as a target data table; the first Hive table includes the production history data of each product, and the second Hive table includes the detection parameter data of products that have been inspected;

Obtaining target detection parameters to be processed, where the target detection parameters match the row keys of the target data table;

Matching the row keys of the target data table according to the target detection parameters to obtain target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products;

Displaying the target detection data for data analysis based on the target detection data.

The method according to claim 1, wherein generating the HBase table according to the first Hive table and the second Hive table includes:

Extracting the production history data at a preset first time interval and storing it in the first Hive table; and extracting the detection parameter data at a preset second time interval and storing it in the second Hive table;

Converting the first Hive table into an HBase intermediate table with the product identification code as the row key; and

Fusing the second Hive table and the HBase intermediate table to obtain the HBase table.
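
The conversion and fusion steps above can be sketched as follows. Plain Python dictionaries stand in for the Hive tables and the HBase tables, so no Hive or HBase API is involved. The field names (`product_id`, `site`, `time`, `parameter`, `value`) and the choice to drop detection rows that have no production history are assumptions, not details taken from the claims.

```python
def build_intermediate_table(first_hive_rows):
    """Key every production history record by its product identification code."""
    intermediate = {}
    for row in first_hive_rows:
        entry = intermediate.setdefault(row["product_id"], {"history": [], "detection": []})
        entry["history"].append({"site": row["site"], "time": row["time"]})
    return intermediate


def fuse(intermediate, second_hive_rows):
    """Attach detection parameter records to the row with the same row key."""
    fused = {
        key: {"history": list(entry["history"]), "detection": list(entry["detection"])}
        for key, entry in intermediate.items()
    }
    for row in second_hive_rows:
        entry = fused.get(row["product_id"])
        if entry is not None:  # assumption: detection rows without history are skipped
            entry["detection"].append({"parameter": row["parameter"], "value": row["value"]})
    return fused


# Hypothetical sample rows for the two Hive tables.
first_hive_rows = [
    {"product_id": "P001", "site": "S10", "time": "2022-01-06T08:00"},
    {"product_id": "P001", "site": "S20", "time": "2022-01-06T09:00"},
    {"product_id": "P002", "site": "S10", "time": "2022-01-06T08:05"},
]
second_hive_rows = [
    {"product_id": "P001", "parameter": "thickness", "value": 1.02},
    {"product_id": "P003", "parameter": "thickness", "value": 0.98},
]
hbase_table = fuse(build_intermediate_table(first_hive_rows), second_hive_rows)
```

Here `hbase_table` holds one row per product identification code, with the production history and the detection parameter data side by side.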

The method according to claim 2, wherein fusing the second Hive table and the HBase intermediate table to obtain the HBase table comprises:

Obtaining the main process site based on the detection data of the second Hive table;

Filtering the data in the HBase intermediate table based on the main process site to obtain a filtered HBase intermediate table; and

Fusing the second Hive table with the filtered HBase intermediate table to obtain the HBase table.

The method according to claim 1, wherein obtaining the target detection data corresponding to the target detection parameters comprises:

Determining, according to reference points of a reference product, the target detection data by averaging the detection data whose distance from a reference point is less than or equal to a set distance.

The method according to claim 4, wherein determining, according to the reference points of the reference product, the target detection data by averaging the detection data whose distance from a reference point is less than or equal to the set distance comprises:

Matching the control detection points on each control product with the reference points on the reference product, respectively;

Obtaining the distance between each control detection point on the control product and the corresponding reference point on the reference product;

For each reference point on the reference product, obtaining, as candidate detection data, the detection data of the reference point and of each matched control detection point whose distance from the reference point is less than or equal to a preset distance threshold; and

Obtaining the average value of the candidate detection data, generating a mean value distribution map from the average values of all detection points, and using the mean value distribution map as the target detection data.
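
The matching and averaging steps above can be sketched as follows. The claim does not state how a control detection point is matched to a reference point; this sketch assumes nearest-neighbor matching in Euclidean distance, and the point layout `(x, y, value)` and the function name are likewise hypothetical.

```python
import math


def mean_distribution_map(reference_points, control_products, max_distance):
    """Average each reference value with nearby matched control values.

    reference_points: list of (x, y, value) on the reference product.
    control_products: list of point lists, one list of (x, y, value) per control product.
    """
    if not reference_points:
        return []
    # Each reference point starts with its own value as candidate detection data.
    candidates = [[value] for _, _, value in reference_points]
    for points in control_products:
        for cx, cy, cvalue in points:
            # Assumed matching rule: pair the control point with its nearest reference point.
            distances = [math.hypot(cx - rx, cy - ry) for rx, ry, _ in reference_points]
            nearest = min(range(len(distances)), key=distances.__getitem__)
            if distances[nearest] <= max_distance:
                candidates[nearest].append(cvalue)
    return [
        (rx, ry, sum(values) / len(values))
        for (rx, ry, _), values in zip(reference_points, candidates)
    ]


# Hypothetical data: one reference product and two control products.
reference = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
controls = [[(0.5, 0.0, 12.0), (9.0, 0.0, 22.0)], [(0.0, 3.0, 100.0)]]
mean_map = mean_distribution_map(reference, controls, max_distance=1.0)
```

In this example the control point at `(0.0, 3.0)` lies farther than the threshold from its matched reference point, so it does not contribute to any average.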

The method according to claim 4, further comprising a step of obtaining the reference product, which specifically comprises:

Obtaining the number of detection points of each product in the target data table, using the product with the largest number of detection points as the reference product, and using the products other than the reference product as control products.
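
The selection of the reference product can be sketched in a few lines. The mapping from product identifier to its list of detection points and the function name are hypothetical; when several products share the largest count, this sketch keeps the first one encountered, which is an assumption the claim does not address.

```python
def split_reference_and_controls(detection_points_by_product):
    """Pick the product with the most detection points as reference; the rest are controls."""
    reference_id = max(
        detection_points_by_product,
        key=lambda product_id: len(detection_points_by_product[product_id]),
    )
    control_ids = [pid for pid in detection_points_by_product if pid != reference_id]
    return reference_id, control_ids


# Hypothetical detection points, stored as (x, y, value) tuples per product.
points_by_product = {
    "P001": [(0, 0, 1.0), (1, 0, 1.1)],
    "P002": [(0, 0, 1.2), (1, 0, 1.0), (2, 0, 0.9)],
    "P003": [(0, 0, 1.3)],
}
reference_id, control_ids = split_reference_and_controls(points_by_product)
```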

The method according to claim 5, further comprising a step of post-processing the target detection data, which specifically comprises:

Taking each coordinate point in the mean value distribution map as the center of a circle and a preset length as the radius to obtain a circle, and obtaining the integer closest to the center of the circle; and

Merging the coordinate points within the circle into one coordinate point corresponding to the circle, wherein the coordinate data of the merged coordinate point is the integer.
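
One possible reading of this post-processing step is sketched below. The claim does not say in which order circles are formed or how the values of merged points are combined; this sketch processes points in input order and averages the merged values, both of which are assumptions. Rounding uses `floor(x + 0.5)` rather than Python's built-in `round`, which rounds halves to the nearest even integer.

```python
import math


def nearest_int(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def merge_points(points, radius):
    """Merge points that fall inside a circle around each not-yet-merged point.

    points: list of (x, y, value). Returns a list of (int x, int y, mean value).
    """
    remaining = list(points)
    merged = []
    while remaining:
        cx, cy, _ = remaining[0]  # the next unmerged point is the circle center
        inside = [p for p in remaining if math.hypot(p[0] - cx, p[1] - cy) <= radius]
        remaining = [p for p in remaining if math.hypot(p[0] - cx, p[1] - cy) > radius]
        mean_value = sum(p[2] for p in inside) / len(inside)  # assumed merge rule
        merged.append((nearest_int(cx), nearest_int(cy), mean_value))
    return merged


# Hypothetical mean value distribution map entries.
merged_map = merge_points([(1.1, 2.0, 10.0), (0.9, 2.1, 20.0), (5.4, 5.6, 7.0)], radius=0.5)
```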

The method according to claim 5, wherein the target detection data further includes matching deviation data, and the method further comprises:

Obtaining the maximum value and minimum value of the detection data in the target detection data, and the average value corresponding to each detection point;

Calculating the deviation data of each detection point according to the maximum value, the minimum value, and the average value corresponding to each detection point.

The method according to claim 8, further comprising:

Visualizing the deviation data in the mean value distribution map.

The method according to claim 5, wherein the target detection parameters are acquired through a preset filter, the filter is provided with a download button, and the method further comprises:

When an operation triggering the download button is detected, importing the target detection data into a preset table;

Downloading the preset table to a designated location, so that a user can perform data analysis according to the target detection data in the preset table.
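
The export step above can be sketched with Python's standard `csv` module. Treating the preset table as a CSV file is an assumption, as are the column names and the function name. The sketch writes to any text stream; a real download handler would pass a file opened at the designated location instead of the in-memory buffer used here.

```python
import csv
import io


def write_preset_table(rows, handle):
    """Write target detection data (a list of dicts with identical keys) as CSV."""
    fieldnames = list(rows[0]) if rows else []
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical target detection data selected through the filter.
sample_rows = [{"row_key": "P001|resistance", "value": 9.7}]
buffer = io.StringIO()
write_preset_table(sample_rows, buffer)
csv_text = buffer.getvalue()
```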

A data processing system, characterized by comprising:

A target table acquisition module, configured to generate an HBase table according to a first Hive table and a second Hive table, and to use the HBase table as a target data table;

A target parameter acquisition module, configured to acquire preset target detection parameters;

A target data acquisition module, configured to match the row keys of the target data table according to the target detection parameters, and obtain target detection data corresponding to the target detection parameters, wherein the target detection data includes detection parameter data representing defective products; and

A target data display module, configured to display the target detection data, so that data analysis can be performed according to the target detection data.

A data processing system, comprising a data processing device, characterized in that the data processing device comprises:

A processor; and

A memory for storing a computer program executable by the processor;

Wherein the processor is configured to execute the computer program in the memory, so as to implement the method according to any one of claims 1-10.

The data processing system according to claim 12, further comprising a display device and a distributed storage device;

The distributed storage device is configured to acquire and store production history data and detection parameter data;

The display device is configured to display the target detection data.

A computer-readable storage medium, characterized in that, when an executable computer program in the storage medium is executed by a processor, the method according to any one of claims 1-10 is implemented.