WO2023130304A1 - Data processing method and system, and computer-readable storage medium - Google Patents

Data processing method and system, and computer-readable storage medium

Info

Publication number
WO2023130304A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
detection
product
target detection
Prior art date
Application number
PCT/CN2022/070461
Other languages
French (fr)
Chinese (zh)
Inventor
代言玉
吴建波
王瑜
王士侠
吴建民
王洪
李园园
王萍
陈韵
何德材
Original Assignee
京东方科技集团股份有限公司
北京中祥英科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司, 北京中祥英科技有限公司 filed Critical 京东方科技集团股份有限公司
Priority to PCT/CN2022/070461 priority Critical patent/WO2023130304A1/en
Priority to CN202280000008.0A priority patent/CN116724321A/en
Publication of WO2023130304A1 publication Critical patent/WO2023130304A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • the present disclosure relates to the technical field of data processing, and in particular, to a data processing method, system, and computer-readable storage medium.
  • the present disclosure provides a data processing method, system, and computer-readable storage medium to solve the deficiencies of related technologies.
  • a data processing method including:
  • generating an HBase table according to a first Hive table and a second Hive table, and using the HBase table as the target data table;
  • the first Hive table includes the production history data of each product, and the second Hive table includes the detection parameter data of products that have been inspected;
  • the target detection data includes detection parameter data representing defective products
  • the first Hive table is converted into an HBase intermediate table with the product identification code as the row key;
  • merging the second Hive table and the HBase intermediate table to obtain the HBase table includes:
  • the data in the HBase intermediate table is screened based on the main process site to obtain a screened HBase intermediate table;
  • the second Hive table is merged with the screened HBase intermediate table to obtain the HBase table.
  • obtaining target detection data corresponding to the target detection parameters includes:
  • according to the reference point of the reference product, the target detection data is obtained by averaging the detection data whose distance from the reference point is less than or equal to the set distance;
  • the reference product is the product with the most detection points.
  • the target detection data is obtained by averaging the detection data whose distance from the reference point is less than or equal to the set distance, including:
  • a step of obtaining a reference product, including:
  • the method further includes the step of post-processing the target detection data, including:
  • the target detection data also includes matching deviation data
  • the method further includes:
  • the deviation data of each detection point is calculated according to the maximum value, the minimum value and the average value corresponding to each detection point.
  • the method also includes:
  • the target detection parameters are obtained through a preset filter, and a download button is set on the filter, and the method further includes:
  • the preset table is downloaded to a designated location, so that the user can perform data analysis according to the target detection data in the preset table.
  • a data processing system including:
  • the target table acquisition module is used to generate the HBase table according to the first Hive table and the second Hive table, and use the HBase table as the target data table;
  • a target parameter acquisition module configured to acquire preset target detection parameters
  • a target data acquisition module configured to match the row keys of the target data table according to the target detection parameters, and obtain target detection data corresponding to the target detection parameters
  • the target detection data includes detection parameter data representing defective products
  • the target data display module is used to display the target detection data, so as to perform data analysis according to the target detection data.
  • a data processing system including a data processing device, and the data processing device includes:
  • memory for storing a computer program executable by said processor
  • the processor is configured to execute the computer program in the memory, so as to realize the above-mentioned method.
  • a display device and a distributed storage device are also included;
  • the distributed storage device is configured to acquire and store production history data and detection parameter data
  • the display device is configured to display the target detection data.
  • a computer-readable storage medium is provided, and when an executable computer program in the storage medium is executed by a data processing device, the above-mentioned method can be implemented.
  • an HBase table can be generated according to the first Hive table and the second Hive table, and the HBase table can be used as the target data table;
  • the first Hive table includes the production history data of each product, and the second Hive table includes the detection parameter data of products that have been inspected;
  • target detection parameters to be processed are obtained, and the target detection parameters match the row keys of the target data table; the row keys of the target data table are matched according to the target detection parameters to obtain the target detection data corresponding to the target detection parameters;
  • the target detection data includes detection parameter data representing defective products; the target detection data is displayed so that data analysis can be performed according to it.
  • the above-mentioned target parameter data can be selected according to the product process and historical experience, so that it matches the product detection scenario, which helps reduce the difficulty of analyzing the process that caused a defect; moreover, using the target detection data for defect analysis makes it possible to find information such as the location of a product defect and the process involved, which improves the efficiency and accuracy of analysis and detection.
  • Fig. 1 is a block diagram of a data processing system according to an exemplary embodiment.
  • Fig. 2 is a flow chart showing a data processing method according to an exemplary embodiment.
  • Fig. 3 is a flow chart of acquiring a target data table according to an exemplary embodiment.
  • Fig. 4 is a schematic diagram showing the effect of a filter according to an exemplary embodiment.
  • Fig. 5 is a flow chart of acquiring target detection data according to an exemplary embodiment.
  • Fig. 6 is a flow chart of acquiring target detection data according to an exemplary embodiment.
  • Fig. 7 is a schematic diagram showing the effect of a mean value distribution diagram according to an exemplary embodiment.
  • Fig. 8 is a schematic diagram showing the effect of another mean distribution graph according to an exemplary embodiment.
  • Fig. 9 is a flow chart of merging coordinate points according to an exemplary embodiment.
  • Fig. 10 is a flow chart of acquiring deviation data according to an exemplary embodiment.
  • Fig. 11 is a block diagram of a data processing device according to an exemplary embodiment.
  • Fig. 12 is a block diagram of a server according to an exemplary embodiment.
  • FIG. 1 is a block diagram of a data processing system according to an exemplary embodiment.
  • the data processing system 100 includes a data processing device 300 , a display device 200 and a distributed storage device 400 .
  • the data processing device 300 is connected to the display device 200 and the distributed storage device 400 respectively.
  • the distributed storage device 400 includes a data lake layer, a data warehouse layer (HIVE) and a data mart (HBASE).
  • the user can input the parameters to be queried through the interactive interface on the display device 200, and the display device 200 can also access the data mart through the API interface.
  • the data processing device 300 can access the data mart through the API interface, so as to process the data obtained from the data mart and send it to the display device 200 for display.
  • the data processing system includes multiple sets of data with different contents and/or storage structures, and stores them in the distributed storage device 400 .
  • the ETL module in the distributed storage device 400 can extract raw data from multiple data sources into the data processing system to form the first data layer (for example, the data lake layer DL), which reduces the load on the production equipment and manufacturing systems and makes the data easier for subsequent analysis equipment to read.
  • the data source can be the raw data of the production equipment, which is stored in the relational databases (such as Oracle, MySQL, etc.) of the corresponding manufacturing systems, such as YMS (Yield Management System), FDC (Fault Detection & Classification), MES (Manufacturing Execution System) and other systems.
  • the above-mentioned ETL module refers to computer program logic configured to provide functions such as extracting, transforming or loading data.
  • the ETL module is stored on one or more storage nodes in the distributed network, loaded into one or more memories in the distributed network, and executed by one or more processors in the distributed network.
  • the data lake layer in the distributed storage device 400 is a centralized HDFS (Hadoop Distributed File System) or Kudu database for storing any structured or unstructured data.
  • the data lake is configured to store the first set of data extracted by the ETL module from multiple data sources DS.
  • the first set of data has the same content as the original data.
  • the dimensions and attributes of the original data are saved in the first set of data.
  • the first set of data stored in the data lake includes dynamically updated data.
  • the dynamically updated data includes real-time updated data in a Kudu-based database, or periodically updated data in the Hadoop distributed file system.
  • periodically updated data stored in the Hadoop distributed file system is stored in Hive-based storage.
  • the dynamically updated data also includes real-time update data
  • real-time update means an update whose interval is shorter than one minute (an interval of exactly one minute is excluded), which distinguishes it from the above-mentioned periodic update, whose interval is one minute or longer.
  • the distributed storage device 400 further includes a second data layer, such as a data warehouse.
  • a data warehouse includes an internal storage system characterized by providing data in an abstracted manner, which may include a table format or a view format, without exposing the file system.
  • a data warehouse can be implemented based on Hive.
  • the ETL module can extract, clean, convert or load the first set of data to form the second set of data.
  • the first set of data can be cleaned and standardized to form the second set of data.
  • the second set of data further includes statistical data, such as detection point count, maximum value, minimum value and average value of detection point parameter values, proportion of defects, and the like.
  • the distributed storage device 400 includes a third data layer, such as at least one data mart.
  • the data mart is a database of NoSQL type that can be used for computing processing.
  • the data mart is implemented based on HBase.
  • the ETL module can also transform the second set of data to form a third set of data.
  • the first set of data, the second set of data, and the third set of data can be stored and queried based on one or more data tables.
  • the process of converting the second set of data to form the third set of data may be importing data from the data warehouse (Hive table) into the data mart (HBase table).
  • a first table is generated in a data mart and a second table (e.g., an external table) is generated in a data warehouse.
  • the first table and the second table are configured to be synchronized such that when data is written to the second table, the first table will be simultaneously updated to include the corresponding data.
  • the MapReduce module in Hadoop can be used as a distributed computing processing module for reading data written to a data warehouse. The data written to the data warehouse can then be written to the data mart.
  • data can be written to a data mart using the HBase-based API.
  • MapReduce module can generate HFile files and load them in batches (Bulkloaded) to the data mart.
  • the raw data collected by the plurality of data sources DS includes at least one of production history data, parameter data or detection parameter data.
  • Raw data can optionally contain dimensional information (time, factory, equipment, operator, map, chamber, card slot, etc.) duration, etc.).
  • Production history data contains information on the specific processing that a product such as a panel or glass undergoes during manufacture, for example the factory, process, station, equipment, chamber, slot, and operator involved.
  • Parametric data contains information on the specific environmental parameters and changes to which a product (such as a panel or glass) is subjected during manufacture.
  • specific environmental parameters and changes to which a product is subjected during manufacturing include ambient particulate conditions, equipment temperature, and equipment pressure, among others.
  • the detection parameter data includes the resistance, film thickness, threshold voltage, reflection pattern deviation, reverse cut-off current, etc. of the product detected based on the detection station.
  • the present data processing system integrates various business data (e.g., data related to semiconductor electronic device manufacturing) into multiple data sources DS (e.g., Oracle database).
  • the ETL module extracts data from multiple data sources into a data lake, for example using a data stack tool, SQOOP tool, kettle tool, Pentaho tool or DataX tool. Then, the data is cleaned, transformed and loaded into the data warehouse.
  • Data warehouse DW and data mart DMT utilize tools such as Kudu, Hive, and Hbase to store large amounts of data and analysis results.
  • the information generated in the various stages of the manufacturing process is obtained by various sensors and inspection equipment and then stored in the multiple data sources DS; alternatively, the data obtained by the sensors and inspection equipment is calculated or analyzed, in which case the calculation and analysis results are also stored in the multiple data sources DS.
  • the data synchronization (flow of data) among the various components of the data processing system is realized through the ETL module.
  • the ETL module is configured to obtain a parameter configuration template of the synchronization process, including network license and database port configuration, inflow database name and table name, outflow database name and table name, field correspondence, task type, scheduling cycle, etc.
  • the ETL module configures the parameters of the synchronization process based on the parameter configuration template.
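As an illustration of such a template, the following Python sketch models one synchronization task and applies its field correspondence to a row. The class name `SyncTemplate`, the function name, the field names, and all sample values are hypothetical; the source lists only the kinds of parameters a template contains.

```python
from dataclasses import dataclass, field


@dataclass
class SyncTemplate:
    """Hypothetical parameter configuration template for one synchronization task."""
    database_port: int
    outflow_database: str
    outflow_table: str
    inflow_database: str
    inflow_table: str
    task_type: str          # assumed values, e.g. "offline" or "realtime"
    scheduling_cycle: str   # assumed value, e.g. "daily"
    field_correspondence: dict = field(default_factory=dict)


def apply_field_correspondence(row, template):
    """Rename the fields of one row; fields without a correspondence are dropped."""
    return {
        target: row[source]
        for source, target in template.field_correspondence.items()
        if source in row
    }


# Sample values are invented for illustration only.
template = SyncTemplate(
    database_port=1521,
    outflow_database="mes",
    outflow_table="PRODUCT_HISTORY",
    inflow_database="data_lake",
    inflow_table="product_history",
    task_type="offline",
    scheduling_cycle="daily",
    field_correspondence={"GLASSID": "glass_id", "STEPID": "site"},
)
synced_row = apply_field_correspondence(
    {"GLASSID": "G1", "STEPID": "S1", "EXTRA": 1}, template
)
```

Keeping the field correspondence inside the template means a new table can be synchronized by adding a template rather than changing code.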
  • the ETL module synchronizes the data and cleans the synchronized data based on the process configuration template.
  • the ETL module cleans the data through SQL statements to remove null values, remove outliers, and establish correlations between related tables.
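The SQL cleaning described above can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in for the Hive SQL engine; the table names, column names, value range, and sample rows are all assumptions made for the example.

```python
import sqlite3


def clean_detection_rows(conn, low, high):
    """Drop null and out-of-range values, and join detection rows to their product."""
    query = """
        SELECT d.glass_id, p.product_model, d.param_value
        FROM detection AS d
        JOIN product AS p ON p.glass_id = d.glass_id
        WHERE d.param_value IS NOT NULL
          AND d.param_value BETWEEN ? AND ?
        ORDER BY d.glass_id, d.param_value
    """
    return conn.execute(query, (low, high)).fetchall()


# In-memory sample data (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (glass_id TEXT PRIMARY KEY, product_model TEXT);
    CREATE TABLE detection (glass_id TEXT, param_value REAL);
    INSERT INTO product VALUES ('G1', 'M1'), ('G2', 'M1');
    INSERT INTO detection VALUES
        ('G1', 10.0), ('G1', NULL), ('G2', 9999.0), ('G2', 12.5);
""")
cleaned = clean_detection_rows(conn, 0.0, 100.0)
```

The `JOIN` plays the role of establishing the correlation between related tables, while the `WHERE` clause removes the null value and the outlier.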
  • the data synchronization task includes data synchronization between the multiple data sources and the distributed storage device 400, and data synchronization between the various layers of the distributed storage device 400 (e.g., data lake, data warehouse, or data mart).
  • the distributed storage device 400 may complete data extraction to the data lake in real time or offline.
  • offline mode data extraction tasks are scheduled periodically.
  • the extracted data may be stored in a storage device based on the Hadoop distributed file system (for example, a Hive-based database).
  • data extraction tasks can be performed by OGG (Oracle GoldenGate) combined with Apache Kafka.
  • the extracted data can be stored in a Kudu-based database.
  • OGG reads the log files of the multiple data sources (e.g., Oracle database) to obtain the added or deleted data.
  • a front-end interface can perform display, query, and/or analysis based on data stored in a Kudu-based database.
  • the front-end interface may perform display, query and/or analysis based on data stored in any one or any combination of a Kudu-based database, a Hadoop distributed file system (e.g., an Apache Hive-based database), and/or an HBase-based database.
  • short-term data (e.g., generated over several months)
  • long-term data (e.g., all data generated over all cycles) is stored in the Hadoop distributed file system (for example, a Hive-based database).
  • the ETL module is configured to ingest data stored in a Kudu-based database into a Hadoop distributed file system (e.g., a Hive-based database).
  • the data warehouse can be one or any combination of Kudu-based databases and Apache Hive-based databases.
  • the distributed storage device 400 may be one storage, multiple storages, or a general term for multiple storage elements.
  • the memory can include random access memory (RAM) and double data rate synchronous dynamic random access memory (DDR SDRAM), and can also include non-volatile memory, such as disk storage, flash memory (Flash), etc.
  • the display device 200 is used for displaying an interface, and can display processing results of the data processing device 300 .
  • the display device may be a display, or a product including a display, such as a TV, a computer (all-in-one or desktop), a tablet computer, a mobile phone, an electronic picture screen, and the like.
  • the display device may be any device that displays images, whether in motion (e.g., video) or stationary (e.g., still images), and whether textual or pictorial.
  • the described embodiments may be implemented in or associated with a variety of electronic devices such as, but not limited to, game consoles, television monitors, flat panel displays, computer monitors, automotive displays (e.g., odometer displays, etc.), navigators, cockpit controls and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, packaging and aesthetic structures (e.g., displays of images of pieces of jewelry), etc.
  • the data processing device 300 can be implemented by at least one server, and is used to implement the data processing method described in any of the following embodiments, that is, the data processing system implements the data processing method described in any subsequent embodiment
  • FIG. 2 is a flow chart showing a data processing method according to an exemplary embodiment. Referring to FIG. 2 , a data processing method includes steps 21 to 24 .
  • an HBase table is generated according to the first Hive table and the second Hive table, and the HBase table is used as a target data table;
  • the first Hive table includes production history data of each product, and the second Hive table contains detection parameter data for products that have already been inspected.
  • the data processing system can acquire the target data table.
  • the target data table can be stored in a preset location, such as local storage, a cache or the cloud, after being pre-processed by other servers.
  • the data processing system can read the target data table from the preset location and use it directly.
  • the target data table may be generated by the data processing system in real time, and the generation process may match the first data layer, the second data layer, and the third data layer of the distributed storage device 400 in FIG. 1 .
  • the data processing system can obtain the main process site based on the detection data of the second Hive table; then screen the data in the HBase intermediate table based on the main process site to obtain a screened HBase intermediate table; and afterwards merge the second Hive table with the screened HBase intermediate table to obtain the HBase table.
  • the data processing system can communicate with the production system at a first time interval (such as every natural day) and extract the production history data (for example, the historical production history data and the production history data of the current day), that is, obtain the production history data of the product in append mode; this loading process can match the process in FIG. 1 in which the distributed storage device 400 reads source data from the Oracle database into the data lake. Then, the data processing system can extract, clean, convert and load the production history data (including historical production history data and current-day production history data), that is, perform ETL (Extract-Transform-Load) processing, and then store it in the first Hive table.
  • according to the current day's production history data and/or the product identification codes (such as Glass ID) in the first Hive table, the data processing system can generate the HBase intermediate table of historical production history data, which uses the product identification code as the row key (RowKey).
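To make the role of the row key concrete, the sketch below groups production history records by product identification code. A plain Python dictionary stands in for the HBase intermediate table; the field names (`glass_id`, `site`, `equipment`) and the sample rows are assumptions, since the table layout is not reproduced here.

```python
from collections import defaultdict


def build_intermediate_table(history_rows):
    """Group production history records under their product identification code.

    The dictionary key plays the role of the HBase row key.
    """
    table = defaultdict(list)
    for row in history_rows:
        record = {k: v for k, v in row.items() if k != "glass_id"}
        table[row["glass_id"]].append(record)
    return dict(table)


# Sample rows standing in for one day of the first Hive table (invented values).
history = [
    {"glass_id": "G1", "site": "S1", "equipment": "E1"},
    {"glass_id": "G1", "site": "S2", "equipment": "E3"},
    {"glass_id": "G2", "site": "S1", "equipment": "E2"},
]
intermediate = build_intermediate_table(history)
```

With this layout, all history records of one product are reachable through a single key lookup, which is the property the row key is meant to provide.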
  • the data processing system can communicate with the detection system according to the second time interval (such as every natural day) and load the detection parameter data in the detection parameter data table. Next, take the loading of the test data of the day as an example.
  • the data processing system can store the test parameter data of the product in the second Hive table after ETL processing.
  • the data structure of the second Hive table is shown in Table 3.
  • the data processing system can read the detected data of the day in the second Hive table.
  • the abnormal value filtering operation can be performed on the parameters, wherein each parameter value corresponds to a value range, and the value range can be set according to the product or experience value, which is not limited here.
  • filtering out parameter outliers by means of this value range prevents outliers from influencing subsequent calculation results, so that those results reflect statistical trends rather than the values of special cases, which is more conducive to data analysis and to locating the causes of defects.
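A minimal sketch of the value-range filtering follows. The record layout, the range values, and the choice to keep parameters that have no configured range are assumptions made for illustration.

```python
def filter_outliers(records, value_ranges):
    """Keep only records whose value lies inside the range set for its parameter."""
    kept = []
    for rec in records:
        bounds = value_ranges.get(rec["param"])
        if bounds is None:
            # Assumption: a parameter without a configured range is not filtered.
            kept.append(rec)
            continue
        low, high = bounds
        if low <= rec["value"] <= high:
            kept.append(rec)
    return kept


# Hypothetical ranges; in practice they are set per product or from experience.
ranges = {"THICKNESS": (0.0, 5000.0), "RS_DATA": (0.0, 100.0)}
sample = [
    {"param": "THICKNESS", "value": 3200.0},
    {"param": "THICKNESS", "value": -1.0},    # below the range, dropped
    {"param": "RS_DATA", "value": 250.0},     # above the range, dropped
    {"param": "Vth", "value": 0.7},           # no range configured, kept
]
filtered = filter_outliers(sample, ranges)
```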
  • the data processing system can obtain the main process site list according to the detection parameter data in the second Hive table of the day, or obtain the main process site list uploaded by the user according to the product.
  • obtaining the main process site list through the detection parameter data of the day can reduce the amount of data processing and improve the efficiency of obtaining the list.
  • by obtaining the main process site list uploaded by the user, the data processing system can draw on the user's experience of the product production process and of data analysis tasks such as locating defects, which yields a more reliable and relatively accurate site list and in turn helps improve the accuracy of subsequent processing results and the efficiency of locating defects.
  • the data processing system can obtain the intersection of the main process site in the Hbase intermediate table and the list of main process sites obtained based on the second Hive table, and obtain the main process site that produces the product and the corresponding detection data of each main process site.
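The intersection step can be sketched as below. A dictionary keyed by product identification code stands in for the HBase intermediate table, and the field name `site` and the sample values are assumptions.

```python
def intersect_main_sites(intermediate, main_site_list):
    """Intersect the sites found in the history table with the main process site list.

    Returns the common sites and, per product, the records at those sites.
    """
    sites_in_history = {
        rec["site"] for recs in intermediate.values() for rec in recs
    }
    common = sites_in_history & set(main_site_list)
    screened = {}
    for glass_id, recs in intermediate.items():
        kept = [rec for rec in recs if rec["site"] in common]
        if kept:
            screened[glass_id] = kept
    return common, screened


# Invented sample data for illustration.
sample_table = {
    "G1": [{"site": "S1"}, {"site": "S2"}],
    "G2": [{"site": "S3"}],
}
common_sites, screened_table = intersect_main_sites(sample_table, ["S1", "S3", "S9"])
```

Site `S9` appears in the list but not in the history, and `S2` appears in the history but not in the list, so neither survives the intersection.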
  • the data processing system can also process the filtered data, such as obtaining statistical information of detection points, such as the maximum value, minimum value or average value of the detection data.
  • the relatively large data volume of production history data and detection parameter data can be used to obtain statistical information, and information that reflects the trend of parameters can be obtained, so as to use statistical information to assist in locating defects.
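The per-point statistics mentioned above (maximum, minimum and average) can be computed as in this sketch. The input layout, a mapping from detection point to a list of measured values, is an assumption.

```python
from statistics import mean


def detection_point_stats(values_by_point):
    """Compute max, min and average of the detection values at each detection point."""
    return {
        point: {"max": max(values), "min": min(values), "avg": mean(values)}
        for point, values in values_by_point.items()
        if values  # skip points that have no measurements
    }


# Invented sample values for two detection points.
stats = detection_point_stats({"P1": [1.0, 2.0, 3.0], "P2": [5.0], "P3": []})
```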
  • the data processing system can aggregate the second Hive table, the HBase intermediate table, the above-mentioned main process sites, the detection data corresponding to the main process sites, the statistical information of the detection points, etc., to obtain the above-mentioned target data table.
  • the data structure of the above target data table is shown in Table 4.
  • in step 22, target detection parameters to be processed are acquired, and the target detection parameters match the row keys of the target data table.
  • the data processing system may acquire target detection parameters.
  • the above-mentioned target detection parameters include time range, factory, product model, detection site and detection parameters, and may also include main process site, equipment, and unit.
  • the first parameter in this embodiment includes parameters such as time range, factory, product model, detection site and detection parameters, and is stored in the first specified column family (the column family is Info, and the column name is MasterStep).
  • the main process site, equipment, and unit can be set as target detection parameters according to requirements, which avoids the excessive data volume caused by too many target detection parameters and helps improve query efficiency.
  • the filter shown in FIG. 4 can be displayed on the display device 200 shown in FIG. 1, and the user can select the target detection parameters in the filter; that is, the target detection parameters can be obtained through the filter shown in FIG. 4.
  • the filter can adopt a cascade structure.
  • the detection parameters of each level are related to those of the levels above and below it. After the detection parameter of a certain level is selected, all options of the detection parameter of the next level can be filtered out.
  • by default, the first item under each option is taken as the selected value, and the user can also choose another option value as the filter condition.
  • the first level is the time range, when the start time (such as May 1, 2021) and end time (such as May 20, 2021) are detected, the factories involved in production within this time range can be screened out (that is, the second level); when it is detected that the user selects a factory (such as ARRAY), the model of the product (such as BNA320WH5V401) that has been produced by the factory can be filtered out (that is, the third level).
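The cascade behaviour can be sketched as follows. The sketch assumes the time range has already been applied to `records`, so it starts at the factory level; the level names, the helper names, and the sample records (apart from the factory and model quoted above) are hypothetical.

```python
LEVELS = ("factory", "product_model", "detection_site", "detection_param")


def next_level_options(records, selected):
    """Return the first unselected level and its options, given earlier selections."""
    for level in LEVELS:
        if level not in selected:
            matching = (
                r for r in records
                if all(r[k] == v for k, v in selected.items())
            )
            return level, sorted({r[level] for r in matching})
    return None, []


def default_selection(options):
    """Default to the first option of a level, or None when there is none."""
    return options[0] if options else None


# Records assumed to be already restricted to the chosen time range.
records = [
    {"factory": "ARRAY", "product_model": "BNA320WH5V401",
     "detection_site": "S1", "detection_param": "RS_DATA"},
    {"factory": "ARRAY", "product_model": "BNA320WH5V401",
     "detection_site": "S1", "detection_param": "THICKNESS"},
    {"factory": "CELL", "product_model": "M2",
     "detection_site": "S9", "detection_param": "Vth"},
]
level, options = next_level_options(records, {"factory": "ARRAY"})
```

Each call narrows the next drop-down to values that actually co-occur with the earlier selections, which is what keeps the levels consistent with one another.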
  • different detection parameters can represent different meanings, such as RS_DATA (area resistance), THICKNESS (film thickness), TP (reflecting graphic offset), Vth (threshold voltage), IOFF (reverse cut-off current) and so on. It is understandable that technicians can select different parameters according to requirements and finally combine different target detection parameters.
  • the time range in this example can be within one month (period). If it is detected that the interval between the start time and the end time exceeds one month, a reminder message is generated to remind the user that the time filled in exceeds the time range and needs to be filled in again. Setting the time range in this way avoids retrieving too much data in each query, which helps improve query efficiency and ensure data validity.
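A small sketch of this check follows. Treating "one month" as 31 days, the function name, and the wording of the reminder messages are assumptions.

```python
from datetime import date, timedelta

MAX_SPAN = timedelta(days=31)  # assumption: "one month" is taken as 31 days


def validate_time_range(start, end):
    """Return a reminder message if the range is invalid, otherwise None."""
    if end < start:
        return "The end time is earlier than the start time; please fill it in again."
    if end - start > MAX_SPAN:
        return "The time filled in exceeds the time range; please fill it in again."
    return None


ok = validate_time_range(date(2021, 5, 1), date(2021, 5, 20))
too_long = validate_time_range(date(2021, 5, 1), date(2021, 7, 1))
```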
  • multiple Lot IDs or Glass IDs can be entered in the input box of the Lot/Glass ID parameter, with adjacent IDs separated by commas or spaces, so that multiple parameters are detected synchronously and detection efficiency is improved.
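Splitting such an input into individual IDs can be sketched as below. The function name and the sample IDs are hypothetical; only the comma and space separators come from the text.

```python
import re


def parse_ids(raw):
    """Split an ID input on commas and/or whitespace, ignoring empty pieces."""
    return [token for token in re.split(r"[,\s]+", raw) if token]


ids = parse_ids("G001, G002 G003,,G004")
```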
  • in step 23, the row keys of the target data table are matched according to the target detection parameters to obtain the target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products.
  • the data processing system can obtain the target detection data according to the target detection parameters and the target data table and, according to the reference point of the reference product, obtain the target detection data by averaging the detection data whose distance from the reference point is less than or equal to the set distance. Specifically:
  • the acquisition of the target detection data by the data processing system includes steps 51 to 53.
  • in step 51, the data processing system can match the target detection parameters with the row keys in the target data table to obtain the original detection data corresponding to the target detection parameters.
  • the target detection parameters can include time range, factory, product model, detection site and detection parameters; each parameter can be matched in turn with the row keys in the target data table to obtain the corresponding data, and so on, until the original detection data satisfying the above target detection parameters is obtained.
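The row-key matching can be illustrated with a prefix scan over sorted keys. The actual row-key layout of the target data table is not reproduced here, so the composite key below (factory, product model, detection site, detection parameter and day, joined by `|`) is purely an assumption, and a dictionary stands in for the HBase table.

```python
from bisect import bisect_left

SEP = "|"  # hypothetical separator between row-key parts


def make_row_key(factory, model, site, param, day):
    """Build the assumed composite row key; day is an ISO date string."""
    return SEP.join((factory, model, site, param, day))


def scan(table, factory, model, site, param, start_day, end_day):
    """Return the rows whose key matches the parameters within [start_day, end_day]."""
    keys = sorted(table)
    prefix = SEP.join((factory, model, site, param)) + SEP
    hits = {}
    # Keys sharing the prefix are contiguous once sorted, so scan from the start day.
    for key in keys[bisect_left(keys, prefix + start_day):]:
        if not key.startswith(prefix) or key[len(prefix):] > end_day:
            break
        hits[key] = table[key]
    return hits


# Invented sample rows for illustration.
table = {
    make_row_key("ARRAY", "BNA320WH5V401", "S1", "RS_DATA", "2021-05-01"): [1.0],
    make_row_key("ARRAY", "BNA320WH5V401", "S1", "RS_DATA", "2021-05-20"): [2.0],
    make_row_key("ARRAY", "BNA320WH5V401", "S1", "RS_DATA", "2021-06-02"): [3.0],
    make_row_key("ARRAY", "BNA320WH5V401", "S1", "THICKNESS", "2021-05-10"): [4.0],
}
hits = scan(table, "ARRAY", "BNA320WH5V401", "S1", "RS_DATA",
            "2021-05-01", "2021-05-20")
```

Because ISO dates sort lexicographically in chronological order, placing the day last in the key lets a time range be read as one contiguous slice of keys.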
  • the data processing system may acquire the detection point data of the column family in the original detection data, and obtain the initial detection data corresponding to the target detection parameters.
  • the data processing system can obtain the detection point data of each product under the column family (for example, the column family is Info, and the column name is MasterStep) in the original detection data to obtain the initial detection data.
  • the initial detection data refers to the collection of detection data of the detection points on the (passed inspection) product, and the passed inspection means that the detection system has already detected it.
  • the data processing system may process the initial detection data to obtain target detection data.
  • the target detection parameters can also include parameters stored under the column family of the original detection data, such as the main process site, equipment, and unit. The data volume of the initial detection data obtained in that case is much smaller than that of the scheme shown in Figure 5. In other words, those skilled in the art can select the number of target detection parameters according to the specific scenario, so as to obtain initial detection data that meets the requirements.
  • the final initial detection data can be reduced, which is beneficial to reduce the amount of data in the subsequent processing process, thereby improving the efficiency of obtaining processing results.
  • the data processing system may acquire a reference product, which may be a product pre-specified by the user or may be selected from past products. In actual production, the number of detection points may be reduced for some products and increased for others, so the number of detection points varies from product to product. Therefore, in this step, the data processing system can obtain the number of detection points of each product and sort all products in the initial detection data by that number, so as to find the product with the largest number of detection points. The data processing system can use the product with the largest number of detection points as the reference product, and the products other than the reference product as control products.
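A minimal sketch of this selection, assuming the initial detection data has already been grouped into a mapping from product identifier to its list of detection points (the mapping shape and the function name are hypothetical):

```python
def split_reference_and_controls(points_by_product):
    """Pick the product with the most detection points as the reference product.

    points_by_product maps a product id to the list of its detection points.
    Returns (reference_id, control_ids). On a tie, the first product in
    iteration order wins, which is an arbitrary choice made for this sketch.
    """
    if not points_by_product:
        raise ValueError("no products in the initial detection data")
    reference_id = max(points_by_product, key=lambda pid: len(points_by_product[pid]))
    control_ids = [pid for pid in points_by_product if pid != reference_id]
    return reference_id, control_ids
```

Taking the maximum directly avoids a full sort; the result is the same as sorting by point count and taking the first entry.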
  • the acquisition of target detection data by the data processing system in step 53 may include steps 61 to 64 .
  • the data processing system can respectively match the control detection point on the control product with the reference point on the reference product.
  • the data processing system can align the top corners or positioning marks of each control product and the reference product, and then match the detection points on the control product and the reference product in turn: the control detection point in the first row and first column of the control product is matched with the reference point in the first row and first column of the reference product, the control detection point in the first row and second column of the control product is matched with the reference point in the first row and second column of the reference product, and so on, until the detection points in the last row and last column are matched.
  • the data processing system may acquire the distance between each control detection point of the control product and the corresponding reference point on the reference product.
  • the data processing system can use the Euclidean distance to obtain the distance between two matched detection points. For example, it calculates the distance between the control detection point in the first row and first column of the control product and the reference point in the first row and first column of the reference product, then the distance between the control detection point in the first row and second column of the control product and the reference point in the first row and second column of the reference product, and so on, up to the distance between the control detection point in the last row and last column of the control product and the reference point in the last row and last column of the reference product.
  • for each reference point on the reference product, the data processing system can obtain candidate detection data, namely the detection data of the reference point and of each control detection point whose distance from that reference point is less than or equal to the preset distance threshold.
  • the data processing system can store a preset distance threshold, and the preset distance threshold ranges from 1 to 10 mm, which can be set according to specific scenarios. In an example, the preset distance threshold is 3mm.
  • the data processing system can compare the distance of each detection point with the preset distance threshold, so as to obtain the reference point and control detection point whose distance is less than or equal to the preset distance threshold, and the detection data of each detection point.
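The matching, distance calculation, and threshold comparison can be sketched as follows. The data layout (a dict keyed by `(row, column)` holding `(x, y, value)` tuples) and the function name are assumptions made for illustration; only the Euclidean distance and the "less than or equal to the threshold" rule come from the text above.

```python
import math


def candidate_data(reference_points, control_products, threshold_mm=3.0):
    """Collect candidate detection data for each reference point.

    reference_points: {(row, col): (x, y, value)} for the reference product.
    control_products: list of dicts with the same shape, one per control product.
    Returns {(row, col): [values...]} holding the reference value plus every
    matched control value whose Euclidean distance is within the threshold.
    """
    candidates = {}
    for index, (rx, ry, rvalue) in reference_points.items():
        values = [rvalue]  # the reference point itself is always included
        for control in control_products:
            if index not in control:
                continue  # this control product has no point at that grid position
            cx, cy, cvalue = control[index]
            if math.dist((rx, ry), (cx, cy)) <= threshold_mm:
                values.append(cvalue)
        candidates[index] = values
    return candidates
```

The default of 3 mm follows the example threshold given above; any value in the stated 1 to 10 mm range could be passed instead.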
  • the data processing system may obtain the average value of the detection data, and use the average value of all detection points to generate a mean value distribution map, and use the mean value distribution map as the target detection data.
  • the data processing system can calculate, for each detection point on the reference product, the average value of the detection data obtained in step 64, that is, of the detection data of the detection points whose distance from that point is less than or equal to the above preset distance threshold (including the point on the reference product itself).
  • the data processing system can generate an average value distribution diagram based on the coordinate data of each detection point on the reference product and the average value of the detection data corresponding to the detection point, as shown in Figure 7 or Figure 8 .
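A sketch of how the averages could be arranged into a mean value distribution map is given below. It assumes the candidate detection data has already been collected per reference point; the input shapes and function names are hypothetical.

```python
from statistics import fmean


def mean_distribution(reference_xy, candidates):
    """Average the candidate detection data of each reference point.

    reference_xy: {point_id: (x, y)} coordinates on the reference product.
    candidates:   {point_id: [values...]} detection data kept for each point.
    Returns {(x, y): mean_value}; points without data are left out and will
    show up as blanks in the grid.
    """
    return {
        reference_xy[pid]: fmean(values)
        for pid, values in candidates.items()
        if values
    }


def as_grid(mean_map):
    """Lay the means out as rows (ordinates) by columns (abscissas).

    Returns (xs, ys, grid) where grid[r][c] is the mean at (xs[c], ys[r]),
    or None when there is no detection data at that coordinate.
    """
    xs = sorted({x for x, _ in mean_map})
    ys = sorted({y for _, y in mean_map})
    return xs, ys, [[mean_map.get((x, y)) for x in xs] for y in ys]
```

In this layout the list `xs` plays the role of the first row of the diagram (abscissas) and `ys` the first column (ordinates).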
  • the mean value distribution diagram shown in Figure 7 is generated from the target detection data shown in Figure 5, while the mean value distribution diagram shown in Figure 8 is generated from the target detection data obtained after parameters such as the main process site, equipment, and unit are added to the target detection parameters shown in Figure 5.
  • in the mean value distribution diagrams shown in Figure 7 and Figure 8, the first row of data is the abscissa of each detection point, the first column of data is the ordinate of each detection point, and the remaining data are the average values of the detection data of the corresponding points; a blank indicates that there is no detection data at that coordinate.
  • the mean value distribution diagrams shown in Figure 7 and Figure 8 are produced by establishing a table in advance, mapping the coordinate data of each detection point one-to-one to the coordinate points in the table (that is, the cells uniquely determined by an abscissa and an ordinate), and importing the average value of each detection point into its coordinate point, which finally yields the effects shown in Figure 7 and Figure 8.
  • the adjacent abscissas or adjacent ordinates in the pre-established table can be set at equal intervals.
  • in that case, the number of coordinate points (or cells) in the table will far exceed the number of detection points (for example, 70), so the mean distribution map will contain many blank cells, which affects its normal use.
  • the data processing system can also perform the following processing on the mean value distribution graph, referring to FIG. 9 , specifically including steps 91 and 92.
  • the data processing system can take each coordinate point in the mean value distribution diagram as the center and a preset length as the radius (such as 6mm) to obtain a circle, and obtain the nearest integer to the center of the circle.
  • the data processing system may combine the coordinate points in the circle to obtain a coordinate point corresponding to the circle; the coordinate data of the coordinate point is the integer.
  • FIG. 7 and FIG. 8 show the effect after the coordinates are merged, and the interval between the abscissa and the ordinate is no longer equal after the merge.
  • for example, from abscissa -1670 to abscissa -839 the difference is 831 mm, while from abscissa -839 to abscissa -9 the difference is 830 mm.
  • in this way, the number of blank cells can be greatly reduced, so that the number of coordinate points in the mean value distribution diagrams shown in Figure 7 and Figure 8 is close to the number of detection points, which facilitates analysis of the mean value distribution diagrams.
  • in the outlier filtering process, when the detection data of the same detection point on multiple products are all outliers, they need to be filtered out, which also leaves blanks in the mean distribution map. Therefore, by locating the blanks in the mean distribution map, it is possible to determine whether the corresponding detection point of the product is defective. Since the merging process illustrated in FIG. 9 greatly reduces the number of blanks, fewer blanks need to be checked, which improves the efficiency of defect analysis.
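The coordinate merging of steps 91 and 92 can be approximated, for one axis at a time, by grouping nearby coordinates and replacing each group with the nearest integer. This is a simplified one-dimensional reading of the circle-based description, offered as an assumption; the greedy grouping rule and the function name are not taken from the disclosure.

```python
def merge_axis(coords, radius=6.0):
    """Merge nearby coordinates on one axis into integer coordinates.

    Sorted coordinates are grouped greedily: a coordinate joins the current
    group while it lies within `radius` of the group's first member. Every
    member is then mapped to the integer nearest to the group's mean.
    Note that Python's round() rounds exact halves to the even integer.
    Returns {original_coordinate: merged_integer_coordinate}.
    """
    mapping = {}
    group = []

    def flush():
        merged = round(sum(group) / len(group))
        mapping.update({g: merged for g in group})

    for c in sorted(set(coords)):
        if group and c - group[0] > radius:
            flush()
            group = []
        group.append(c)
    if group:
        flush()
    return mapping
```

Applied to abscissas such as -1670.2, -1669.8, -839.4, -838.9 and -9.0, this yields the merged coordinates -1670, -839 and -9, so the intervals between merged coordinates are no longer equal, as in the example above.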
  • the target detection data in this example also includes deviation data matched with it. Referring to FIG. 10, acquiring the deviation data includes steps 101 and 102:
  • the data processing system can acquire the maximum value and minimum value of the target detection data, and the average value corresponding to each detection point.
  • the above maximum and minimum values can be read from the structure shown in Table 4.
  • the data processing system may calculate the deviation data of each detection point according to the above-mentioned maximum value, minimum value and average value corresponding to each detection point, and the deviation data is used to assist data analysis. Among them, the calculation formula of the deviation data is shown in the following formula (1):
  • p_i represents the deviation at each detection point;
  • x_min and x_max are respectively the minimum and maximum values of all detection data on the product after the target detection parameters are matched against the target data table;
  • x_i represents the detection data of a given detection point.
  • the above deviation data can be written into the above mean value distribution diagram at the same time, so that the user sees the deviation data together with the average value of each detection point and can use both to analyze whether the detection point is a defective point and to perform other analyses.
  • the data processing system can visualize the deviation data in a mean distribution graph.
  • the data processing system can convert the above-mentioned deviation data into a background color, and use the depth of the background color to represent the change trend and degree of deviation of the mean value of the product detection parameters. That is, the larger the deviation data is, the darker the color is, and the smaller the deviation data is, the lighter the color is.
  • the color depth can be set with the rgba(0,255,0,p) and rgba(255,0,0,p) functions, where p is the deviation data. Figure 8 shows the effect of using rgba(0,255,0,p) to represent the deviation data, and FIG. 9 shows the effect of using rgba(255,0,0,p) to represent the deviation data.
  • Technicians can choose an appropriate background color according to the specific scene. Considering the requirements of the patent application documents, gray scale is used in this disclosure to achieve the same effect.
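The mapping from deviation data to a background color can be sketched as below. Formula (1) is not reproduced in this text, so the `deviation` function uses min-max normalization purely as an assumption: it is consistent with the symbols p_i, x_i, x_min and x_max and with p being used as the alpha channel of rgba(), but it should not be read as the formula of the disclosure. The function names are hypothetical.

```python
def deviation(x_i, x_min, x_max):
    """Assumed min-max form: 0.0 at the minimum, 1.0 at the maximum.

    This is an illustrative stand-in for formula (1), not the source formula.
    """
    if x_max == x_min:
        return 0.0  # no spread in the data, so no deviation to show
    return (x_i - x_min) / (x_max - x_min)


def background_color(p, base=(0, 255, 0)):
    """Format an rgba() string whose alpha channel carries the deviation p."""
    p = min(1.0, max(0.0, p))  # alpha must stay within [0, 1]
    r, g, b = base
    return f"rgba({r},{g},{b},{p:.2f})"
```

A larger deviation gives a larger alpha and therefore a darker cell; passing `base=(255, 0, 0)` switches from the green scheme to the red one.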
  • in step 24, the target detection data is displayed, so that data analysis can be performed according to the target detection data.
  • the data processing system may send the above target detection data to the display device, and the display device will display the above target detection data, so that the user can perform defect analysis according to the target detection data. That is to say, in this embodiment, the parameter distribution trend of product detection parameters in different main production processes can be quantified so as to quickly locate the cause of the failure, thereby enriching the failure diagnosis and analysis method.
  • the above-mentioned filter may also be provided with a download button (such as a similarity analysis button), and when the filter detects that the user triggers the operation of the download button, it may send a download request to the data processing system.
  • the data processing system can import the target detection data into a preset table, which can be implemented as an Excel table. The data processing system can then download the preset table to a designated location, so that the user can perform data analysis according to the target detection data in the preset table.
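As a rough sketch of the export step, the grid of averages can be serialized as CSV text, which spreadsheet tools such as Excel can open. CSV is used here only as a dependency-free stand-in for the preset table; the grid layout (abscissas, ordinates, and a row-major list of values with None for blanks) and the function name are assumptions.

```python
import csv
import io


def grid_to_csv(xs, ys, grid):
    """Serialize a mean distribution grid as CSV text.

    The first row holds the abscissas and the first column the ordinates;
    cells without detection data (None) are written as empty fields.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["", *xs])  # empty corner cell, then the abscissas
    for y, row in zip(ys, grid):
        writer.writerow([y, *("" if v is None else v for v in row)])
    return buffer.getvalue()
```

The returned text can then be written to whatever location the user designates, for example with `pathlib.Path(target).write_text(...)`.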
  • the above target parameter data can be selected according to the product process and historical experience, so that the target parameter data matches the product detection scenario, which helps reduce the difficulty of defect analysis. In addition, performing defect analysis with the target detection data can reveal information such as the location and process step at which a product defect occurs, which improves the efficiency and accuracy of analysis and detection. That is to say, in this embodiment, because the product detection parameters are analyzed from multiple aspects, the analysis results are grounded in both theory and professional experience, which helps the user locate the cause of a defect more accurately and quickly. At the same time, the operation in this embodiment is simple and convenient, which helps users control the data processing flow and thereby improves production efficiency.
  • An embodiment of the present disclosure also provides a data processing system, see FIG. 11 , including:
  • the target table acquisition module 111 is used to generate the HBase table according to the first Hive table and the second Hive table, and use the HBase table as the target data table;
  • a target parameter acquisition module 112 configured to acquire preset target detection parameters
  • Target data acquisition module 113 configured to match the row key of the target data table according to the target detection parameters, and obtain target detection data corresponding to the target detection parameters; the target detection data includes detection parameter data representing defective products;
  • the target data display module 114 is configured to display the target detection data, so as to perform data analysis according to the target detection data.
  • a data processing system includes a data processing device.
  • the data processing device includes:
  • memory 122 for storing computer programs executable by said processor
  • the processor is configured to execute the computer program in the memory, so as to realize the methods described in FIGS. 2 to 10 .
  • it also includes a display device and a distributed storage device
  • the distributed storage device is configured to acquire and store production history data and detection parameter data
  • the display device is configured to display the target detection data.
  • a computer-readable storage medium is also provided, and when the executable computer program in the storage medium is executed by a processor, the methods described in FIGS. 2 to 10 can be implemented.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using a software program, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present disclosure will be generated in whole or in part.
  • the computer can be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer readable storage medium.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)).
  • Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) in which computer program instructions are stored; when the computer program instructions are run on a processor, they cause the computer to execute the data processing method described in any one of the above embodiments, for example, one or more steps of the data processing method.
  • the above-mentioned computer-readable storage medium may include, but is not limited to: a magnetic storage device (for example, a hard disk, a floppy disk, or a magnetic tape), an optical disk (for example, a CD (Compact Disk) or a DVD (Digital Versatile Disk)), a smart card, and a flash memory device (for example, an EPROM (Erasable Programmable Read-Only Memory), a card, a stick, or a key drive).
  • Various computer-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • the processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks and modules described in connection with this disclosure.
  • the processor can also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and so on.
  • the memory mentioned in the embodiments of the present disclosure may be random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.

Abstract

The present disclosure relates to a data processing method and system, and a computer-readable storage medium. The method comprises: generating an HBase table according to a first Hive table and a second Hive table, and taking the HBase table as a target data table, wherein the first Hive table comprises production record data of each product, and the second Hive table comprises test parameter data of a product which has been tested; acquiring a target test parameter to be processed, wherein the target test parameter matches a row key of the target data table; according to the target test parameter matching the row key of the target data table, obtaining target test data corresponding to the target test parameter, wherein the target test data comprises test parameter data representing that a product is defective; and displaying the target test data, so as to perform data defect analysis according to the target test data. In the present embodiment, target parameter data matches a product test scenario, thereby facilitating the reduction of the difficulty of data analysis, and improving the efficiency and accuracy of analysis and testing.

Description

Data processing method, system, and computer-readable storage medium
Technical Field
The present disclosure relates to the technical field of data processing, and in particular, to a data processing method, system, and computer-readable storage medium.
Background
At present, in the production of industrial products, different products use different processes and are handled by different equipment and different personnel. A subtle problem at any step can cause defects in the final industrial product, so the production process of each industrial product needs to be recorded to provide a basis for subsequent product defect analysis.
Summary
The present disclosure provides a data processing method, system, and computer-readable storage medium to address the deficiencies of the related art.
According to a first aspect of the embodiments of the present disclosure, a data processing method is provided, including:
generating an HBase table according to a first Hive table and a second Hive table, and using the HBase table as a target data table, wherein the first Hive table includes production history data of each product, and the second Hive table includes detection parameter data of products that have been inspected;
acquiring target detection parameters to be processed, wherein the target detection parameters match the row keys of the target data table;
matching the row keys of the target data table according to the target detection parameters to obtain target detection data corresponding to the target detection parameters, wherein the target detection data includes detection parameter data representing product defects;
displaying the target detection data, so as to perform data analysis according to the target detection data.
Optionally, generating the HBase table according to the first Hive table and the second Hive table includes:
extracting the production history information at a preset first time interval and storing it in the first Hive table, and extracting the detection parameter data at a preset second time interval and storing it in the second Hive table;
converting the first Hive table into an HBase intermediate table that uses the product identification code as the row key;
merging the second Hive table and the HBase intermediate table to obtain the HBase table.
Optionally, merging the second Hive table and the HBase intermediate table to obtain the HBase table includes:
obtaining the main process site based on the detection data of the second Hive table;
filtering the data in the HBase intermediate table based on the main process site to obtain a filtered HBase intermediate table;
merging the second Hive table and the filtered HBase intermediate table to obtain the HBase table.
Optionally, obtaining the target detection data corresponding to the target detection parameters includes:
according to the reference points of a reference product, averaging the detection data whose distance from a reference point is less than or equal to a set distance to obtain the target detection data;
wherein the reference product is the product with the largest number of detection points.
Optionally, averaging, according to the reference points of the reference product, the detection data whose distance from a reference point is less than or equal to the set distance to obtain the target detection data includes:
matching the control detection points on the control product with the reference points on the reference product respectively;
obtaining the distance between each control detection point of the control product and the corresponding reference point on the reference product;
for each reference point on the reference product, obtaining candidate detection data of the reference point and of the control detection points whose distance from the reference point is less than or equal to a preset distance threshold;
obtaining the average value of the candidate detection data, generating a mean value distribution map from the average values of all detection points, and using the mean value distribution map as the target detection data.
Optionally, the method further includes a step of obtaining the reference product, specifically including:
obtaining the number of detection points of each product in the target data table to find the product with the largest number of detection points, and using the product with the largest number of detection points as the reference product and the products other than the reference product as control products.
Optionally, the method further includes a step of post-processing the target detection data, specifically including:
taking each coordinate point in the mean value distribution map as the center and a preset length as the radius to obtain a circle, and obtaining the integer nearest to the center of the circle;
merging the coordinate points within the circle to obtain one coordinate point corresponding to the circle, wherein the coordinate data of that coordinate point is the integer, that is, the nearest integer is used as the coordinate of the merged coordinate points within the circle.
Optionally, the target detection data further includes deviation data matched with it, and the method further includes:
obtaining the maximum value and the minimum value of the detection data of the product in the target detection data, and the average value corresponding to each detection point;
calculating the deviation data of each detection point according to the maximum value, the minimum value, and the average value corresponding to each detection point.
Optionally, the method further includes:
visualizing the deviation data in the mean value distribution map.
Optionally, the target detection parameters are acquired through a preset filter, a download button is provided on the filter, and the method further includes:
when an operation triggering the download button is detected, importing the target detection data into a preset table;
downloading the preset table to a designated location, so that the user can perform data analysis according to the target detection data in the preset table.
According to a second aspect of the embodiments of the present disclosure, a data processing system is provided, including:
a target table acquisition module, configured to generate an HBase table according to a first Hive table and a second Hive table, and to use the HBase table as a target data table;
a target parameter acquisition module, configured to acquire preset target detection parameters;
a target data acquisition module, configured to match the row keys of the target data table according to the target detection parameters and obtain target detection data corresponding to the target detection parameters, wherein the target detection data includes detection parameter data representing product defects;
a target data display module, configured to display the target detection data, so as to perform data analysis according to the target detection data.
According to a third aspect of the embodiments of the present disclosure, a data processing system is provided, including a data processing device, the data processing device including:
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory, so as to implement the method described above.
Optionally, the system further includes a display device and a distributed storage device;
the distributed storage device is configured to acquire and store production history data and detection parameter data;
the display device is configured to display the target detection data.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when an executable computer program in the storage medium is executed by a data processing device, the method described above can be implemented.
本公开的实施例提供的技术方案可以包括以下有益效果:The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
由上述实施例可知,本公开实施例提供的方案中可以根据第一Hive表和第二Hive表生成HBase表,并将所述HBase表作为目标数据表;所述第一Hive表包括各个产品的生产履历数据,所述第二Hive表包括已经检测过的产品的检测参数数据;获取待处理的目标检测参数,所述目标检测参数与所述目标数据表的行键匹配;根据所述目标检测参数匹配所述目标数据表的行键,获得所述目标检测参数对应的目标检测数据;所述目标检测数据包括表征产品不良的检测参数数据;展示所述目标检测数据,以根据所述目标检测数据进行数据分析。本实施例中,上述目标参数数据可以根据产品工艺过程和 历史经验来选择出来,使目标参数数据与产品检测场景相匹配,有利于降低分析不良过程的难度;并且,利用目标检测数据进行不良分析,可以发现产品不良所发生的位置和工序等信息,提高了分析检测的效率和准确性。It can be seen from the above embodiments that in the solution provided by the embodiments of the present disclosure, an HBase table can be generated according to the first Hive table and the second Hive table, and the HBase table can be used as the target data table; the first Hive table includes the Production history data, the second Hive table includes the detection parameter data of the product that has been detected; obtains the target detection parameter to be processed, and the target detection parameter matches the row key of the target data table; according to the target detection The parameter matches the row key of the target data table, and the target detection data corresponding to the target detection parameter is obtained; the target detection data includes detection parameter data representing defective products; the target detection data is displayed to detect Data for data analysis. In this embodiment, the above-mentioned target parameter data can be selected according to the product process and historical experience, so that the target parameter data can be matched with the product detection scene, which is conducive to reducing the difficulty of analyzing the defective process; and, using the target detection data for defective analysis , can find information such as the location and process of product defects, which improves the efficiency and accuracy of analysis and detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a block diagram of a data processing system according to an exemplary embodiment.
Fig. 2 is a flow chart of a data processing method according to an exemplary embodiment.
Fig. 3 is a flow chart of acquiring a target data table according to an exemplary embodiment.
Fig. 4 is a schematic diagram of a filter according to an exemplary embodiment.
Fig. 5 is a flow chart of acquiring target detection data according to an exemplary embodiment.
Fig. 6 is a flow chart of acquiring target detection data according to an exemplary embodiment.
Fig. 7 is a schematic diagram of a mean value distribution graph according to an exemplary embodiment.
Fig. 8 is a schematic diagram of another mean value distribution graph according to an exemplary embodiment.
Fig. 9 is a flow chart of merging coordinate points according to an exemplary embodiment.
Fig. 10 is a flow chart of acquiring deviation data according to an exemplary embodiment.
Fig. 11 is a block diagram of a data processing device according to an exemplary embodiment.
Fig. 12 is a block diagram of a server according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of devices consistent with some aspects of the present disclosure as recited in the appended claims. It should be noted that, where no conflict arises, the features of the following embodiments and implementations can be combined with each other.
At present, in industrial manufacturing, different products use different processes and pass through different equipment and operators. A minor problem at any step can cause a defect in the final product, so the production flow of each product needs to be recorded to provide a basis for subsequent defect analysis. Taking the semiconductor display industry as an example, during the production of a display panel the glass substrate (Glass) passes through different production processes and different equipment. Under the influence of objective factors such as the production processes and equipment, various defects may occur in the final display panel. In the related art, technicians usually locate the cause of a defect manually, so neither the turnaround time nor the accuracy meets production needs.
To solve the above technical problem, an embodiment of the present disclosure provides a data processing method that can be applied to a data processing system. Fig. 1 is a block diagram of a data processing system according to an exemplary embodiment. Referring to Fig. 1, the data processing system 100 includes a data processing device 300, a display device 200, and a distributed storage device 400. The data processing device 300 is connected to the display device 200 and to the distributed storage device 400. The distributed storage device 400 includes a data lake layer, a data warehouse layer (Hive), and a data mart (HBase). A user can enter the parameters to be queried through an interactive interface on the display device 200, and the display device 200 can also access the data mart through an API. The data processing device 300 can access the data mart through an API, process the data obtained from the data mart, and send it to the display device 200 for display.
Still referring to Fig. 1, the data processing system includes multiple sets of data with different contents and/or storage structures, which are stored in the distributed storage device 400. In some embodiments, an ETL module in the distributed storage device 400 can extract raw data from multiple data sources into the data processing system to form a first data layer (for example, the data lake layer DL). This reduces the load on the production equipment and manufacturing systems and makes the data easier for downstream analysis equipment to read. The data sources may hold raw data from the production equipment, stored in the relational databases (such as Oracle or MySQL) of the corresponding manufacturing systems, such as YMS (Yield Management System), FDC (Fault Detection & Classification), and MES (Manufacturing Execution System). The ETL module refers to computer program logic configured to provide functions such as extracting, transforming, or loading data. In some embodiments, the ETL module is stored on one or more storage nodes in a distributed network, loaded into one or more memories in the distributed network, and executed by one or more processors in the distributed network.
The data lake layer in the distributed storage device 400 is a centralized HDFS (Hadoop Distributed File System) or Kudu database used to store any structured or unstructured data. Optionally, the data lake is configured to store a first set of data extracted by the ETL module from the multiple data sources DS. Optionally, the first set of data has the same content as the raw data, and the dimensions and attributes of the raw data are preserved in the first set of data. In some embodiments, the first set of data stored in the data lake includes dynamically updated data. Optionally, the dynamically updated data includes data updated in real time in a Kudu-based database, or data updated periodically in the Hadoop Distributed File System. In one example, the periodically updated data in the Hadoop Distributed File System is stored in Hive-based storage. In one example, the dynamically updated data also includes real-time update data. Here, a real-time update means an update interval shorter than one minute (one minute excluded), as opposed to a periodic update, which means an update interval of one minute or longer.
In some embodiments, the distributed storage device 400 further includes a second data layer, such as a data warehouse. The data warehouse includes an internal storage system that provides data in an abstracted form, such as a table format or a view format, without exposing the file system. The data warehouse can be implemented on Hive. In this case, the ETL module can extract, clean, transform, or load the first set of data to form a second set of data. Optionally, the second set of data is formed by cleaning and standardizing the first set of data. In some embodiments, the second set of data further includes statistical data, such as the detection point count, the maximum, minimum, and average of the detection point parameter values, and the defect ratio.
In some embodiments, the distributed storage device 400 includes a third data layer, such as at least one data mart. Optionally, the data mart is a NoSQL database that stores data available for computation. Optionally, the data mart is implemented on HBase. The ETL module can also transform the second set of data to form a third set of data.
Those skilled in the art will understand that the first, second, and third sets of data can each be stored and queried in the form of one or more data tables.
In some embodiments, converting the second set of data into the third set of data may consist of importing data from the data warehouse (Hive tables) into the data mart (HBase tables). In one example, a first table is generated in the data mart and a second table (for example, an external table) is generated in the data warehouse. The first table and the second table are configured to be synchronized, so that when data is written to the second table, the first table is updated at the same time to include the corresponding data. In another example, the MapReduce module in Hadoop can be used as a distributed computing module to read the data written to the data warehouse. The data written to the data warehouse can then be written to the data mart. In one example, the data can be written to the data mart through the HBase API. In another example, once the MapReduce module has read the data written to the data mart, it can generate HFile files that are bulk loaded into the data mart.
Some embodiments describe the data flows, data transformations, and data structures between the components of the data processing system. In some embodiments, the raw data collected by the multiple data sources DS includes at least one of production history data, parameter data, or detection parameter data. The raw data may optionally contain dimension information (time, factory, equipment, operator, Map, chamber, slot, and so on) and attribute information (factory location, equipment age, number of dead pixels, abnormal parameters, energy consumption parameters, processing duration, and so on).
The production history data contains information about the specific processing that a product (such as a panel or glass) goes through during manufacture. Examples include the factory, process, site, equipment, chamber, slot, and operator.
The parameter data contains information about the specific environmental parameters, and their changes, that a product (such as a panel or glass) is subjected to during manufacture. Examples include ambient particle conditions, equipment temperature, and equipment pressure.
The detection parameter data includes values measured at a detection site, such as the product's resistance, film thickness, threshold voltage, pattern offset, and reverse cut-off current.
In one example, the data processing system integrates various business data (for example, data related to the manufacture of semiconductor electronic devices) into multiple data sources DS (for example, Oracle databases). The ETL module extracts data from the multiple data sources into the data lake, for example using a data stack (数栈) tool, Sqoop, Kettle, Pentaho, or DataX. The data is then cleaned, transformed, and loaded into the data warehouse. The data warehouse DW and the data mart DMT use tools such as Kudu, Hive, and HBase to store large volumes of data and analysis results.
Information generated at each stage of the manufacturing process is obtained by various sensors and inspection equipment and then stored in the multiple data sources DS. Alternatively, the data obtained by the sensors and inspection equipment is computed on or analyzed, in which case the computation and analysis results are also stored in the multiple data sources DS. Data synchronization (data flow) between the components of the data processing system is carried out by the ETL module. For example, the ETL module is configured to obtain a parameter configuration template for the synchronization process, including network permissions and database port configuration, the inflow database and table names, the outflow database and table names, field mappings, task type, scheduling period, and so on. The ETL module configures the parameters of the synchronization process based on the parameter configuration template. The ETL module synchronizes the data and cleans the synchronized data based on the process configuration template. The ETL module cleans the data with SQL statements to remove null values, remove outliers, and establish the relationships between related tables. Data synchronization tasks include synchronization between the multiple data sources and the distributed storage device 400, as well as synchronization between the layers of the distributed storage device 400 (for example, the data lake, the data warehouse, or the data mart).
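The cleaning step described above can be illustrated with a minimal sketch. The source states that the ETL module cleans data with SQL statements; the plain Python below is a stand-in for that, and the function name `clean_rows`, the field mapping, and the valid range are all hypothetical names chosen for illustration.

```python
def clean_rows(rows, field_map, value_field, valid_range):
    """Rename fields per a (hypothetical) field mapping, then drop rows
    that contain null values or whose measured value is an outlier."""
    low, high = valid_range
    cleaned = []
    for row in rows:
        # Keep only the mapped fields and rename them to the target names.
        mapped = {dst: row.get(src) for src, dst in field_map.items()}
        # Remove rows with null (or missing) values.
        if any(value is None for value in mapped.values()):
            continue
        # Remove rows whose value falls outside the accepted range.
        if not low <= mapped[value_field] <= high:
            continue
        cleaned.append(mapped)
    return cleaned
```

In this sketch the field mapping plays the role of the "field mappings" entry in the parameter configuration template; a real deployment would presumably read it from that template rather than pass it in directly.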
In another example, the distributed storage device 400 can extract data into the data lake either in real time or offline. In offline mode, data extraction tasks are scheduled periodically. Optionally, in offline mode, the extracted data can be stored in storage based on the Hadoop Distributed File System (for example, a Hive-based database). In real-time mode, data extraction tasks can be performed by OGG (Oracle GoldenGate) combined with Apache Kafka. Optionally, in real-time mode, the extracted data can be stored in a Kudu-based database. OGG reads the log files of the multiple data sources (for example, Oracle databases) to obtain the added and deleted data. In one example, a front-end interface (such as an API) can perform display, query, and/or analysis based on data stored in a Kudu-based database. In another example, the front-end interface can perform display, query, and/or analysis based on data stored in any one or any combination of a Kudu-based database, the Hadoop Distributed File System (for example, an Apache Hive-based database), and an HBase-based database. In another example, short-term data (for example, data generated within the last few months) is stored in a Kudu-based database, while long-term data (for example, all data generated over all periods) is stored in the Hadoop Distributed File System (for example, a Hive-based database). In another example, the ETL module is configured to extract data stored in a Kudu-based database into the Hadoop Distributed File System (for example, a Hive-based database).
The data warehouse is built on top of the data lake by combining data from the various business systems (MDW, YMS, MES, FDC, and so on). Data extracted into the data lake is partitioned by task execution time, which does not exactly match the timestamps in the raw data. In addition, the data may contain duplicates. It is therefore necessary to build the data warehouse by cleaning and standardizing the data in the data lake, so as to meet the needs of upper-layer applications for data accuracy and partitioning. The data tables stored in the data warehouse are obtained by cleaning and standardizing the data in the data lake. Field formats are standardized according to user requirements to ensure that the data tables in the data warehouse are fully consistent with the data tables in the multiple data sources. At the same time, the data is partitioned by time (day or month) and by other fields, which greatly improves query efficiency and reduces the runtime memory required. The data warehouse can be one or any combination of a Kudu-based database and an Apache Hive-based database.
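As a rough illustration of the cleaning and partitioning described above, the sketch below removes duplicate records and re-partitions them by the day taken from each record's own timestamp instead of the task execution time. The field names (`glass_id`, `step_id`, `timestamp`) and the choice of deduplication key are assumptions made for this example, not details given in the source.

```python
from collections import defaultdict
from datetime import datetime


def build_partitions(records, key_fields=("glass_id", "step_id", "timestamp")):
    """Drop duplicate records and group the rest by calendar day,
    using the timestamp stored in the record itself."""
    seen = set()
    partitions = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            continue  # duplicate already loaded by an earlier extraction
        seen.add(key)
        day = datetime.fromisoformat(record["timestamp"]).strftime("%Y-%m-%d")
        partitions[day].append(record)
    return dict(partitions)
```

A query restricted to a date range then only needs to read the matching day partitions, which is the source of the efficiency gain the paragraph mentions.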
In an embodiment, the distributed storage device 400 may be a single memory, multiple memories, or a collective term for multiple storage elements. For example, the memory may include random access memory (RAM) and double data rate synchronous dynamic random access memory (DDR SDRAM), and may also include non-volatile memory such as disk storage or flash memory.
The display device 200 is used to display an interface and can display the processing results of the data processing device 300. In an embodiment, the display device may be a display, or a product that includes a display, such as a television, a computer (all-in-one or desktop), a tablet computer, a mobile phone, or an electronic picture frame. In an embodiment, the display device may be any device that displays an image, whether moving (for example, video) or fixed (for example, a still image), and whether text or picture. More specifically, it is contemplated that the embodiments may be implemented in or associated with a variety of electronic devices, such as, but not limited to, game consoles, television monitors, flat panel displays, computer monitors, automotive displays (for example, odometer displays), navigators, cockpit controllers and/or displays, electronic photographs, electronic billboards or signs, projectors, architectural structures, and packaging and aesthetic structures (for example, a display showing an image of a piece of jewelry).
In an embodiment, the data processing device 300 can be implemented with at least one server and is used to implement the data processing method described in any of the following embodiments. In other words, the data processing system implements the data processing method described in any of the subsequent embodiments. Fig. 2 is a flow chart of a data processing method according to an exemplary embodiment. Referring to Fig. 2, the data processing method includes steps 21 to 24.
In step 21, an HBase table is generated from a first Hive table and a second Hive table, and the HBase table is used as the target data table. The first Hive table includes production history data of each product, and the second Hive table includes detection parameter data of products that have already been inspected.
In this embodiment, the data processing system can acquire the target data table.
In one example, the target data table can be stored in a preset location, such as local storage, a cache, or the cloud, having been prepared in advance by other servers. In this scenario, the data processing system can read the target data table from the preset location and use it directly.
In another example, the target data table can be generated by the data processing system in real time, and the generation process can correspond to the first, second, and third data layers of the distributed storage device 400 in Fig. 1. Referring to Fig. 3, the data processing system can obtain the main process sites based on the detection data in the second Hive table, then filter the data in the HBase intermediate table based on the main process sites to obtain a filtered HBase intermediate table, and finally merge the second Hive table with the filtered HBase intermediate table to obtain the HBase table.
For example, given that the target data table includes the production history data and detection parameter data of each product, the data processing system can communicate with the production system at a first time interval (such as every calendar day) and extract production history data (such as the historical production history data and the production history data of the extraction day), that is, obtain the production history data of the products (in append mode). This loading process can correspond to the process in Fig. 1 in which the distributed storage device 400 reads source data from the Oracle database into the data lake. The data processing system can then apply ETL (Extract-Transform-Load) processing, that is, extraction, cleaning, transformation, and loading, to the production history data (including the historical data and the current day's data) and store the result in the first Hive table. The data structure of the first Hive table is shown in Table 1.
Table 1: GLASS production history data table
Field name    Meaning
Timeday       Date of detection (year, month, day)
Timekey       Write time
Factory       Factory
glass_id      Glass substrate identification code (Glass ID)
product_id    Product ID
step_id       Process site
eqp_id        Process equipment
unit_id       Equipment unit
The data processing system can generate an HBase intermediate table from the current day's production history data in the first Hive table and/or the historical production history data whose row key (RowKey) is the product identification code (such as the Glass ID). The data structure of the HBase intermediate table is shown in Table 2.
Table 2: Structure of the intermediate table (MAP:Track)
Figure PCTCN2022070461-appb-000001
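The idea of keying the intermediate table by the product identification code can be sketched as follows. Table 2 is only available as an image, so the column layout used here (one entry per process site, holding the equipment and unit) is purely an assumption for illustration, and the in-memory dictionary merely stands in for an HBase table.

```python
from collections import defaultdict


def build_track_table(history_rows):
    """Group production history rows under a row key equal to the Glass ID.

    Hypothetical layout: row key -> {step_id: {"eqp_id": ..., "unit_id": ...}}.
    """
    table = defaultdict(dict)
    for row in history_rows:
        table[row["glass_id"]][row["step_id"]] = {
            "eqp_id": row["eqp_id"],
            "unit_id": row["unit_id"],
        }
    return dict(table)
```

With this layout, the complete production history of one glass substrate can be fetched with a single row-key lookup.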
Still referring to Fig. 3, the data processing system can communicate with the detection system at a second time interval (such as every calendar day) and load the detection parameter data from the detection parameter data table. Taking the loading of the current day's detection data as an example, the data processing system can apply ETL processing to the detection parameter data of the products and store the result in the second Hive table. The data structure of the second Hive table is shown in Table 3.
Table 3: Detection parameter data table (twyield.tw_glass_map)
Field name    Meaning
timeday       Date of detection (year, month, day)
end_time      Detection time
factory       Factory
lot_id        Lot ID
glass_id      Glass ID
step_id       Detection site
eqp_id        Equipment ID
product_id    Product ID
x             Detection point Glass coordinate x
y             Detection point Glass coordinate y
item          Detection parameter name
value         Detection value
type          Parameter type
The data processing system can then read the current day's detection data from the second Hive table. In some embodiments, outlier filtering can be applied to the parameters. Each parameter has a corresponding value range, which can be set per product or from empirical values and is not limited here. In this example, filtering out abnormal parameter values by value range prevents outliers from distorting subsequent calculations, so that the results reflect statistical trends instead of the values of isolated cases. This makes data analysis easier and helps locate the cause of a defect.
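The range-based outlier filtering described above might look like the following minimal sketch. The `item` and `value` field names follow Table 3; the function name and the example ranges in the usage are hypothetical, and keeping rows that have no configured range is a design assumption, since the source does not say how such rows are handled.

```python
def filter_outliers(rows, ranges):
    """Keep only detection rows whose value lies inside the range
    configured for that detection parameter."""
    kept = []
    for row in rows:
        bounds = ranges.get(row["item"])
        if bounds is None:
            kept.append(row)  # assumption: no configured range means no filtering
            continue
        low, high = bounds
        if low <= row["value"] <= high:
            kept.append(row)
    return kept
```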
In addition, the data processing system can obtain a list of main process sites from the current day's detection parameter data in the second Hive table, or obtain a list of main process sites uploaded by the user for the product. In this example, deriving the list from only the current day's detection parameter data reduces the amount of data to be processed and makes obtaining the list more efficient. Obtaining the list uploaded by the user draws on the user's experience with the production process and with data analysis tasks such as locating defects. This yields a more reliable and relatively accurate list of sites, which in turn improves the accuracy of the subsequent processing results and the efficiency of defect location.
The data processing system can take the intersection of the main process sites in the HBase intermediate table and the list of main process sites obtained from the second Hive table, which yields the main process sites that produced the product and the detection data corresponding to each main process site.
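The intersection step can be sketched as below. The function name and the decision to preserve the order in which sites appear in the production history are assumptions for this example; the source only specifies that an intersection is taken.

```python
def intersect_main_sites(track_steps, main_step_list):
    """Return the process sites that appear both in a product's history
    (from the intermediate table) and in the main process site list,
    keeping the order in which they occur in the history."""
    allowed = set(main_step_list)
    seen = set()
    result = []
    for step in track_steps:
        if step in allowed and step not in seen:
            seen.add(step)
            result.append(step)
    return result
```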
Still referring to Fig. 3, the data processing system can further process the filtered data, for example by computing statistics for the detection points, such as the maximum, minimum, or average of the detection data. Because the production history data and detection parameter data are large in volume, statistics computed over them reflect parameter trends, and these statistics can be used to help locate defects.
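A minimal sketch of the statistics step follows. Grouping by Glass ID and detection parameter name is an assumption made for illustration (the source does not state the grouping key); the `glass_id`, `item`, and `value` field names follow Table 3.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows):
    """Compute count, max, min, and mean of the detection values,
    grouped by (glass_id, item)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["glass_id"], row["item"])].append(row["value"])
    return {
        key: {
            "count": len(values),
            "max": max(values),
            "min": min(values),
            "mean": mean(values),
        }
        for key, values in groups.items()
    }
```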
Still referring to Fig. 3, the data processing system can aggregate the second Hive table, the HBase intermediate table, the main process sites, the detection data corresponding to the main process sites, the detection point statistics, and so on, to obtain the target data table. The data structure of the target data table is shown in Table 4.
Figure PCTCN2022070461-appb-000002
Table 4: Structure of the HBase table (MAP:Master)
In step 22, target detection parameters to be processed are acquired, and the target detection parameters match the row key of the target data table.
In this embodiment, the data processing system can acquire the target detection parameters. The target detection parameters include the time range, factory, product model, detection site, and detection parameter, and may also include the main process site, equipment, and unit. It should be noted that in this embodiment the first parameters include the time range, factory, product model, detection site, and detection parameter, while the main process site, equipment, and unit are stored in a first designated column family (column family Info, column name MasterStep). The target detection parameters can be set as needed, which avoids the excessive data volume caused by too many target detection parameters and helps improve query efficiency.
In one example, the display device 200 shown in FIG. 1 may display the filter shown in FIG. 4, in which the user selects the target detection parameters; that is, the target detection parameters may be obtained through the filter shown in FIG. 4. The filter in this example may use a cascade structure in which the options at each level depend on the adjacent levels: once a value is selected at one level, all options of the next level can be filtered out. In practice each level defaults to the first item among its options, and the user may choose any other option as the filter condition. For example, the first level is the time range: once a start time (such as May 1, 2021) and an end time (such as May 20, 2021) are detected, the factories that took part in production within that range can be filtered out (the second level). When it is detected that the user has selected a factory (such as ARRAY), the models of products that the factory has produced (such as BNA320WH5V401) can be filtered out (the third level). When it is detected that the user has entered a production batch/product ID under that product model (the fourth level), the next level is filtered in the same way, and so on, until the detection parameter to be queried (such as RS_DATA) is obtained. Different detection parameters have different meanings, such as RS_DATA (sheet resistance), THICKNESS (film thickness), TP (reflecting pattern offset), Vth (threshold voltage), and IOFF (reverse cut-off current). Understandably, technicians can select different parameters as required and combine them into different target detection parameters.
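The cascade behaviour of the filter can be illustrated with a small sketch. It assumes a flat list of records held in memory; the field names, the record values other than those quoted in the text, and the helper names `next_options` and `default_selection` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical records; field names are illustrative only.
RECORDS = [
    {"day": "2021-05-03", "factory": "ARRAY", "model": "BNA320WH5V401", "parameter": "RS_DATA"},
    {"day": "2021-05-10", "factory": "ARRAY", "model": "BNA320WH5V401", "parameter": "THICKNESS"},
    {"day": "2021-06-02", "factory": "CELL", "model": "X1", "parameter": "TP"},
]
LEVELS = ["factory", "model", "parameter"]  # levels that follow the time range

def next_options(records, start, end, chosen):
    """Return the first level not yet chosen and its options, given earlier choices."""
    rows = [r for r in records if start <= r["day"] <= end]  # level 1: time range
    for level in LEVELS:
        if level not in chosen:
            return level, sorted({r[level] for r in rows})
        rows = [r for r in rows if r[level] == chosen[level]]
    return None, []

def default_selection(records, start, end):
    """Walk the cascade, picking the first option at every level by default."""
    chosen = {}
    while True:
        level, options = next_options(records, start, end, chosen)
        if level is None or not options:
            return chosen
        chosen[level] = options[0]

selection = default_selection(RECORDS, "2021-05-01", "2021-05-20")
```

ISO-formatted date strings are compared lexicographically here, which keeps the sketch free of date parsing; a real filter would query the data store at each level rather than scan a list.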
Note that the time range in this example may be limited to one month (one period). If the interval between the start time and the end time is detected to exceed one month, a reminder is generated telling the user that the entered time exceeds the allowed range and must be re-entered. Limiting the time range in this way keeps each query from returning too much data, which helps improve query efficiency and ensure data validity.
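A minimal sketch of this check is given below. The text does not say how "one month" is measured, so the sketch assumes 31 days; the function name and the reminder wording are likewise illustrative.

```python
from datetime import date, timedelta

MAX_SPAN = timedelta(days=31)  # assumption: "one month" is treated as 31 days

def check_time_range(start, end):
    """Return a reminder message if the range is invalid, otherwise None."""
    if end < start:
        return "End time is earlier than start time; please re-enter."
    if end - start > MAX_SPAN:
        return "Selected time exceeds the allowed range; please re-enter."
    return None

reminder = check_time_range(date(2021, 5, 1), date(2021, 7, 1))
```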
Note that in this example multiple Iot IDs or Glass IDs may be entered in the input box of the Iot/Glass ID parameter, with adjacent IDs separated by a comma or a space, so that several entries are queried at once and detection efficiency improves.
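Splitting such an input into individual IDs might look like the sketch below. Accepting the full-width comma and dropping duplicates are assumptions added for robustness; the sample IDs are made up.

```python
import re

def parse_ids(text):
    """Split on commas and/or whitespace, dropping empty tokens and duplicates."""
    seen, ids = set(), []
    for token in re.split(r"[,，\s]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids

ids = parse_ids("G001, G002 G003,G001")
```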
In step 23, the target detection data corresponding to the target detection parameters is obtained by matching the target detection parameters against the row keys of the target data table; the target detection data includes detection parameter data that characterizes product defects.
In this embodiment, the data processing system may obtain the target detection data from the target detection parameters and the target data table: based on the reference points of a reference product, it averages the detection data whose distance from each reference point is less than or equal to a set distance to obtain the target detection data. Specifically:
In one example, referring to FIG. 5, the data processing system acquires the target detection data through steps 51 to 53.
In step 51, the data processing system may match the target detection parameters against the row keys in the target data table to obtain the raw detection data corresponding to the target detection parameters.
Since the target detection parameters may include a time range, factory, product model, detection site, and detection parameter, this step may match each parameter in turn against the row keys in the target data table to obtain the corresponding data, and so on, until the raw detection data satisfying all of the target detection parameters is obtained.
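The row-key matching of step 51 can be sketched against an in-memory stand-in for the HBase table. The row-key layout used here (factory|product|station|parameter|date|glass ID) is purely hypothetical, since Table 4 is not reproduced in this text, and the station code, MasterStep values, and glass IDs are invented for the example.

```python
from datetime import date

# Hypothetical row-key layout: factory|product|station|parameter|YYYYMMDD|glass_id
def make_prefix(factory, product, station, parameter):
    return "|".join([factory, product, station, parameter]) + "|"

def match_rows(table, factory, product, station, parameter, start, end):
    """Return rows whose key has the given prefix and a date within [start, end]."""
    prefix = make_prefix(factory, product, station, parameter)
    lo, hi = start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
    matched = {}
    for key in sorted(table):  # HBase keeps row keys sorted; mimic that order
        if not key.startswith(prefix):
            continue
        day = key[len(prefix):].split("|")[0]
        if lo <= day <= hi:
            matched[key] = table[key]
    return matched

table = {
    "ARRAY|BNA320WH5V401|ST01|RS_DATA|20210503|G001": {"Info:MasterStep": "M10"},
    "ARRAY|BNA320WH5V401|ST01|RS_DATA|20210601|G002": {"Info:MasterStep": "M10"},
    "ARRAY|BNA320WH5V401|ST01|THICKNESS|20210504|G001": {"Info:MasterStep": "M11"},
}
raw = match_rows(table, "ARRAY", "BNA320WH5V401", "ST01", "RS_DATA",
                 date(2021, 5, 1), date(2021, 5, 20))
```

Against a real HBase table the same idea would typically be expressed as a prefix or range scan rather than a loop over every key.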
In step 52, the data processing system may acquire the detection point data of a column family in the raw detection data to obtain the initial detection data corresponding to the target detection parameters. Specifically, the data processing system may read the detection point data of each product under the column family (for example, column family Info, column name MasterStep) in the raw detection data. Here, the initial detection data is the set of detection data for the detection points on inspected products, where "inspected" means that the product has already been measured by the detection system.
In step 53, the data processing system may process the initial detection data to obtain the target detection data.
Note that the target detection parameters may also include parameters such as the main process site, equipment, and unit stored in the column family of the raw detection data. In that case the volume of initial detection data obtained is far smaller than with the scheme shown in FIG. 5; that is, those skilled in the art can choose how many target detection parameters to use for a given scenario and so obtain initial detection data that meets their needs. In other words, increasing the number of parameters included in the target detection parameters reduces the resulting initial detection data, which reduces the amount of data in subsequent processing and improves the efficiency of obtaining results.
In one embodiment, the data processing system may acquire a reference product. The reference product may be a product specified in advance by the user, or it may be selected from the products that have already been inspected. In actual production, some products have fewer detection points and others have more; that is, the number of detection points per product varies. In this step, therefore, the data processing system may obtain the number of detection points of each product and sort all products in the initial detection data by that number to find the product with the most detection points. The data processing system may take that product as the reference product and treat all other products as control products.
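Choosing the reference product by detection-point count can be sketched as follows. The mapping from product ID to a list of points is an assumed data shape, and on a tie this sketch simply keeps the first product encountered, a detail the text does not specify.

```python
def split_reference(products):
    """products: dict mapping product ID -> list of detection points.

    Returns the ID of the product with the most detection points and the
    IDs of the remaining (control) products.
    """
    reference_id = max(products, key=lambda pid: len(products[pid]))
    controls = [pid for pid in products if pid != reference_id]
    return reference_id, controls

products = {
    "G001": [(0, 0), (10, 0)],
    "G002": [(0, 0), (10, 0), (20, 0)],
    "G003": [(0, 0)],
}
reference_id, controls = split_reference(products)
```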
In this embodiment, referring to FIG. 6, acquiring the target detection data in step 53 may include steps 61 to 64.
In step 61, the data processing system may match the control detection points on each control product with the reference points on the reference product. Taking a glass substrate as an example, after aligning the corners or the positioning marks of each control product with those of the reference product, the data processing system may match the detection points in order: the control detection point in the first row and first column of the control product is matched with the reference point in the first row and first column of the reference product, the control detection point in the first row and second column with the reference point in the first row and second column, and so on, up to the detection point in the last row and last column.
In step 62, the data processing system may obtain the distance between each control detection point of the control product and the corresponding reference point on the reference product. The distance between two matched points may be computed as the Euclidean distance: for example, the distance between the control detection point in the first row and first column of the control product and the reference point in the first row and first column of the reference product, then the distance for the first row and second column, and so on, up to the distance between the control detection point and the reference point in the last row and last column.
In step 63, for each reference point on the reference product, the data processing system may obtain the candidate detection data of that reference point and of the control detection points whose distance from it is less than or equal to a preset distance threshold. The data processing system may store the preset distance threshold, which ranges from 1 mm to 10 mm and can be set for the specific scenario; in one example it is 3 mm. The data processing system may compare the distance for each detection point with the preset distance threshold to obtain the reference point and the control detection points whose distance is within the threshold, together with the detection data of each of those points.
In step 64, the data processing system may compute the average of the candidate detection data, generate a mean distribution map from the averages of all detection points, and use the mean distribution map as the target detection data. In this step, for each detection point on the reference product, the data processing system averages the detection data obtained in step 63, that is, the detection data of the points (including the point on the reference product itself) whose distance from that detection point is less than or equal to the preset distance threshold.
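Steps 61 to 64 can be sketched together as shown below. The sketch assumes that each product is already aligned and is represented as a mapping from a (row, column) index to an (x, y, value) tuple in millimetres; that representation and the function name are illustrative assumptions.

```python
from math import dist

THRESHOLD_MM = 3.0  # example value from the text; configurable within 1 to 10 mm

def mean_at_reference_points(reference, controls, threshold=THRESHOLD_MM):
    """reference and each control: dict mapping (row, col) -> (x_mm, y_mm, value).

    Returns a dict mapping each reference coordinate to the average value.
    """
    means = {}
    for index, (rx, ry, rvalue) in reference.items():
        values = [rvalue]  # the reference point itself is included (step 64)
        for control in controls:
            if index not in control:
                continue  # step 61: no matching point on this control product
            cx, cy, cvalue = control[index]
            # Steps 62 and 63: Euclidean distance compared with the threshold.
            if dist((rx, ry), (cx, cy)) <= threshold:
                values.append(cvalue)
        means[(rx, ry)] = sum(values) / len(values)
    return means

reference = {(0, 0): (0.0, 0.0, 10.0), (0, 1): (50.0, 0.0, 12.0)}
controls = [
    {(0, 0): (1.0, 1.0, 12.0), (0, 1): (55.0, 0.0, 20.0)},
    {(0, 0): (0.0, 2.0, 14.0)},
]
means = mean_at_reference_points(reference, controls)
```

In this example the control reading at (55.0, 0.0) lies 5 mm from its reference point and is therefore excluded from the average.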
In this step, the data processing system may generate the mean distribution map from the coordinates of each detection point on the reference product together with the average of the detection data at that point; the result is shown in FIG. 7 or FIG. 8. Note that the mean distribution map in FIG. 7 is generated from the target detection data obtained with the scheme of FIG. 5, while the map in FIG. 8 is generated from the target detection data obtained after adding parameters such as the main process site, equipment, and unit to the target detection parameters of FIG. 5.
Note that in the mean distribution maps of FIG. 7 and FIG. 8, the first row gives the abscissa of each detection point, the first column gives the ordinate of each detection point, and the remaining cells give the average of the detection data at each detection point; a blank cell means there is no detection data at that coordinate.
Note that the mean distribution maps of FIG. 7 and FIG. 8 are produced by creating a table in advance, mapping the coordinates of each detection point one-to-one onto the coordinate points of the table (that is, the cells uniquely determined by an abscissa and an ordinate), and writing the average of each detection point into its coordinate point, which yields the result shown in FIG. 7 and FIG. 8.
Understandably, adjacent abscissas or adjacent ordinates in the pre-built table may be set at equal intervals. The number of coordinate points (cells) in the table is then far greater than the number of detection points (for example, 70), so the mean distribution map contains many blank cells, which hinders its normal use. For this reason, the data processing system may further process the mean distribution map as follows; referring to FIG. 9, this includes steps 91 and 92.
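Laying the averages out as a table with abscissas in the first row and ordinates in the first column can be sketched as follows. The function name, the top-to-bottom ordering of the ordinates, and the sample values other than the abscissas -839 and -9 are assumptions for illustration.

```python
def build_grid(means):
    """means: dict mapping (x, y) -> average value.

    Returns a list of rows: a header row of abscissas, then one row per
    ordinate. Cells with no detection data are left as empty strings.
    """
    xs = sorted({x for x, _ in means})
    ys = sorted({y for _, y in means}, reverse=True)  # assumed: largest y on top
    rows = [[""] + xs]
    for y in ys:
        rows.append([y] + [means.get((x, y), "") for x in xs])
    return rows

grid = build_grid({(-839, 100): 12.0, (-9, 100): 11.5, (-839, -50): 12.4})
```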
In step 91, the data processing system may draw a circle centered on each coordinate point in the mean distribution map with a preset length (such as 6 mm) as its radius, and obtain the integer nearest to the center. In step 92, the data processing system may merge the coordinate points inside the circle into a single coordinate point corresponding to the circle, whose coordinate data is that integer. Because the distance between detection points (for example, tens of millimeters) is usually greater than the preset length, a circle centered on a coordinate point usually contains only that one detection point, so the coordinates of the detection point can be replaced by the integer nearest to the center; the merging shown in FIG. 9 therefore greatly reduces the number of blank cells in the mean distribution map. FIG. 7 and FIG. 8 show the result after merging, in which the abscissas and the ordinates are no longer equally spaced. For example, abscissa -1670 and abscissa -839 differ by 831 mm, while abscissa -839 and abscissa -9 differ by 830 mm. Together with the reduced spacing of the ordinates, this greatly reduces the number of blank cells, so that the number of coordinate points in the mean distribution maps of FIG. 7 and FIG. 8 is close to the number of detection points, which makes the maps easier to analyze.
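One possible reading of steps 91 and 92 is sketched below: each point is snapped to integer coordinates, and any later point that falls within the preset radius of an existing merged point is folded into it. The greedy order and the averaging of values that land on the same merged point are assumptions; the text does not specify either.

```python
from math import dist

def merge_points(means, radius=6.0):
    """means: dict mapping (x, y) -> average value, with float coordinates.

    Returns a dict keyed by integer coordinates in which points lying within
    `radius` of an existing merged point share that point.
    """
    merged = {}  # (int x, int y) -> list of values
    for (x, y), value in sorted(means.items()):
        for center in merged:
            if dist(center, (x, y)) <= radius:
                merged[center].append(value)
                break
        else:
            # No merged point nearby: snap to the nearest integer coordinates.
            merged[(round(x), round(y))] = [value]
    return {c: sum(v) / len(v) for c, v in merged.items()}

merged = merge_points({
    (-839.4, 100.2): 12.0,
    (-838.9, 99.8): 14.0,
    (-9.0, 100.0): 11.0,
})
```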
Note that during outlier filtering, when the detection data at the same detection point are outliers on multiple products, they must be filtered out, which also leaves a blank cell in the mean distribution map. The positions of blank cells in the mean distribution map can therefore be used to check whether the corresponding detection point of the product is defective. Because the merging process illustrated in FIG. 9 greatly reduces the number of blank cells, fewer blank cells need to be checked, which improves the efficiency of defect analysis.
In one embodiment, since defect positions are hard to spot when only numbers are displayed, the target detection data in this example further includes matching deviation data. Referring to FIG. 10, obtaining the deviation data includes steps 101 and 102:
In step 101, the data processing system may obtain the maximum and minimum of the target detection data and the average corresponding to each detection point. For example, the maximum and minimum can be read from the structure shown in Table 4. In step 102, the data processing system may calculate the deviation data of each detection point from the maximum, the minimum, and the average corresponding to each detection point; the deviation data is used to assist data analysis. The deviation data is calculated by formula (1):
Figure PCTCN2022070461-appb-000003
In formula (1), p_i denotes the deviation at each detection point; x_min and x_max are, respectively, the minimum and maximum of all detection data on the products matched from the target data table by the target detection parameters; and x_i denotes the detection data of a given detection point.
In practice, the deviation data can be written into the mean distribution map as well, so that the user sees the deviation data alongside the average of each detection point and can use both to judge whether the detection point is a defective point and to carry out other defect analyses.
In one embodiment, the data processing system may visualize the deviation data in the mean distribution map. For example, it may convert the deviation data into a background color whose depth represents the trend and the degree of deviation of the mean of the product detection parameter: the larger the deviation data, the darker the color, and the smaller the deviation data, the lighter the color. In one example, the color depth may be set with rgba(0,255,0,p) or rgba(255,0,0,p); FIG. 8 shows deviation data rendered with rgba(0,255,0,p), and FIG. 9 shows deviation data rendered with rgba(255,0,0,p). Technicians can choose a suitable background color for the specific scenario. To meet the requirements for patent application documents, the figures in this disclosure use grayscale, which achieves the same effect.
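Mapping a deviation value to a background color can be sketched as follows. Formula (1) is an image that is not reproduced in this text, so the sketch assumes a min-max normalization, which is consistent with the symbols defined for the formula and with p being used as an rgba alpha value between 0 and 1, but it may not be the exact formula of the disclosure. The function names are illustrative.

```python
def deviation(value, x_min, x_max):
    """Assumed min-max normalization; formula (1) itself is not reproduced here."""
    if x_max == x_min:
        return 0.0  # all readings equal: no deviation to show
    return (value - x_min) / (x_max - x_min)

def background_color(p, base=(255, 0, 0)):
    """Render p as the alpha channel of a CSS rgba() color string."""
    r, g, b = base
    return f"rgba({r},{g},{b},{p:.2f})"

p = deviation(15, 10, 20)
color = background_color(p)
```

A larger p gives a more opaque, and therefore visually darker, cell background, matching the description above.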
In step 24, the target detection data is displayed so that data analysis can be performed on it.
In this embodiment, the data processing system may send the target detection data to the display device, which displays it so that the user can perform defect analysis based on the target detection data. In other words, this embodiment can quantify how product detection parameters are distributed across the different main production processes, so that the cause of a defect can be located quickly, which enriches the available methods of defect diagnosis and analysis.
In one embodiment, still referring to FIG. 4, the filter may also be provided with a download button (such as the similarity analysis button). When the filter detects that the user has triggered the download button, it may send a download request to the data processing system. On receiving the download request, the data processing system may import the target detection data into a preset table, which may be implemented as an Excel sheet. The data processing system may then download the preset table to a specified location so that the user can perform data analysis on the target detection data in the table.
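Exporting the table can be sketched with the Python standard library. The text names an Excel sheet; this sketch writes CSV instead, which Excel can open, to avoid depending on a third-party library, and that substitution is an assumption of the example.

```python
import csv
import io

def export_grid(rows, stream):
    """Write the mean distribution table, one list per row, as CSV."""
    writer = csv.writer(stream)
    writer.writerows(rows)

# An in-memory buffer stands in for the file at the "specified location".
buffer = io.StringIO()
export_grid([["", -839, -9], [100, 12.0, 11.5]], buffer)
```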
Thus, in this embodiment, the target parameter data can be selected according to the product process and historical experience, so that it matches the product detection scenario, which helps reduce the difficulty of defect analysis. Moreover, performing defect analysis on the target detection data reveals information such as the location and the process step at which a product defect occurred, which improves the efficiency and accuracy of analysis and detection. In other words, the multi-faceted analysis of product detection parameters in this embodiment grounds the results in both theory and domain experience, helping the user locate the cause of a defect more accurately and quickly. In addition, this embodiment is simple to operate, which helps users control the data processing flow and thereby improves production efficiency.
An embodiment of the present disclosure further provides a data processing system which, referring to FIG. 11, includes:
a target table acquisition module 111, configured to generate an HBase table from a first Hive table and a second Hive table and to use the HBase table as the target data table;
a target parameter acquisition module 112, configured to acquire preset target detection parameters;
a target data acquisition module 113, configured to match the target detection parameters against the row keys of the target data table to obtain the target detection data corresponding to the target detection parameters, the target detection data including detection parameter data that characterizes product defects;
a target data display module 114, configured to display the target detection data so that data analysis can be performed on it.
Note that this embodiment corresponds to the method embodiment shown in FIG. 1; refer to the method embodiment above for details, which are not repeated here.
In an exemplary embodiment, a data processing system is further provided. The data processing system includes a data processing device which, referring to FIG. 12, includes:
a processor 121;
a memory 122 for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory to implement the methods described in FIG. 2 to FIG. 10.
In one embodiment, the data processing system further includes a display device and a distributed storage device;
the distributed storage device is configured to acquire and store production history data and detection parameter data;
the display device is configured to display the target detection data.
In an exemplary embodiment, a computer-readable storage medium is further provided; when the executable computer program in the storage medium is executed by a processor, the methods described in FIG. 2 to FIG. 10 can be implemented.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented with a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)).
Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) storing computer program instructions which, when run on a processor, cause the computer to execute the data processing method described in any of the above embodiments, for example one or more steps of the data processing method.
By way of example, the computer-readable storage medium may include, but is not limited to: magnetic storage devices (for example, hard disks, floppy disks, or magnetic tapes), optical discs (for example, CDs (Compact Disks) and DVDs (Digital Versatile Disks)), smart cards, and flash memory devices (for example, EPROM (Erasable Programmable Read-Only Memory), cards, sticks, or key drives). The various computer-readable storage media described in this disclosure may represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
本公开实施例中所提到的处理器可以是中央处理器(Central Processing Unit,CPU),通用处理器,数字信号处理器(Digital Signal Processor,DSP),专用集成电路(Application-Specific Integrated Circuit,ASIC),现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本公开所描述的各种示例性的逻辑方框和模块。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。The processor mentioned in the embodiment of the present disclosure may be a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), Field Programmable Gate Array (Field Programmable Gate Array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It may implement or execute the various illustrative logical blocks and modules described in connection with this disclosure. The processor can also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and so on.
此外,本公开实施例所提到的存储器可以是随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read Only Memory,ROM)、可擦除可编程只读存储器(Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质。In addition, the memory mentioned in the embodiments of the present disclosure may be Random Access Memory (Random Access Memory, RAM), flash memory, Read Only Memory (Read Only Memory, ROM), Erasable Programmable ROM (Erasable Programmable ROM) , EPROM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable EPROM, EEPROM), register, hard disk, removable hard disk, CD-ROM (CD-ROM) or any other form of storage medium well known in the art.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. The present disclosure is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or conventional technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

  1. A data processing method, characterized by comprising:
    generating an HBase table according to a first Hive table and a second Hive table, and using the HBase table as a target data table, wherein the first Hive table includes production history data of each product, and the second Hive table includes detection parameter data of products that have already been inspected;
    acquiring target detection parameters to be processed, wherein the target detection parameters match row keys of the target data table;
    matching the row keys of the target data table according to the target detection parameters to obtain target detection data corresponding to the target detection parameters, wherein the target detection data includes detection parameter data characterizing product defects; and
    displaying the target detection data, so that data analysis is performed according to the target detection data.
  2. The method according to claim 1, characterized in that generating the HBase table according to the first Hive table and the second Hive table comprises:
    extracting the production history information at a preset first time interval and storing it in the first Hive table, and extracting the detection parameter data at a preset second time interval and storing it in the second Hive table;
    converting the first Hive table into an HBase intermediate table that uses a product identification code as the row key; and
    fusing the second Hive table and the HBase intermediate table to obtain the HBase table.
  3. The method according to claim 2, characterized in that fusing the second Hive table and the HBase intermediate table to obtain the HBase table comprises:
    obtaining a main process site based on the detection data of the second Hive table;
    filtering the data in the HBase intermediate table based on the main process site to obtain a filtered HBase intermediate table; and
    fusing the second Hive table and the filtered HBase intermediate table to obtain the HBase table.
  4. The method according to claim 1, characterized in that obtaining the target detection data corresponding to the target detection parameters comprises:
    determining, according to the reference points of a reference product, the detection data whose distance from a reference point is less than or equal to a set distance, and averaging that detection data to obtain the target detection data.
  5. The method according to claim 4, characterized in that determining, according to the reference points of the reference product, the detection data whose distance from a reference point is less than or equal to the set distance, and averaging that detection data to obtain the target detection data, comprises:
    matching the control detection points on each control product with the reference points on the reference product;
    obtaining the distance between each control detection point of the control product and the corresponding reference point on the reference product;
    for each reference point on the reference product, obtaining, when the distance to a control detection point is less than or equal to a preset distance threshold, the candidate detection data of the reference point and of the control detection point; and
    obtaining the average value of the candidate detection data, generating a mean distribution map from the average values of all detection points, and using the mean distribution map as the target detection data.
  6. The method according to claim 4, characterized by further comprising a step of obtaining the reference product, which specifically comprises:
    obtaining the number of detection points of each product in the target data table to find the product with the largest number of detection points, using the product with the largest number of detection points as the reference product, and using the products other than the reference product as control products.
  7. The method according to claim 5, characterized in that the method further comprises a step of post-processing the target detection data, which specifically comprises:
    taking each coordinate point in the mean distribution map as a center and a preset length as a radius to obtain a circle, and obtaining the integer closest to the center of the circle; and
    merging the coordinate points within the circle to obtain one coordinate point corresponding to the circle, wherein the coordinate data of that coordinate point is the integer.
  8. The method according to claim 5, characterized in that the target detection data further includes deviation data matched to it, and the method further comprises:
    obtaining the maximum value and the minimum value of the detection data in the target detection data, and the average value corresponding to each detection point; and
    calculating the deviation data of each detection point according to the maximum value, the minimum value, and the average value corresponding to each detection point.
  9. The method according to claim 8, characterized in that the method further comprises:
    visually displaying the deviation data in the mean distribution map.
  10. The method according to claim 5, characterized in that the target detection parameters are acquired through a preset filter, the filter is provided with a download button, and the method further comprises:
    when an operation triggering the download button is detected, importing the target detection data into a preset table; and
    downloading the preset table to a designated location, so that a user can perform data analysis according to the target detection data in the preset table.
  11. A data processing system, characterized by comprising:
    a target table acquisition module, configured to generate an HBase table according to a first Hive table and a second Hive table, and to use the HBase table as a target data table;
    a target parameter acquisition module, configured to acquire preset target detection parameters;
    a target data acquisition module, configured to match the row keys of the target data table according to the target detection parameters to obtain target detection data corresponding to the target detection parameters, wherein the target detection data includes detection parameter data characterizing product defects; and
    a target data display module, configured to display the target detection data, so that data analysis is performed according to the target detection data.
  12. A data processing system comprising a data processing device, characterized in that the data processing device comprises:
    a processor; and
    a memory for storing a computer program executable by the processor;
    wherein the processor is configured to execute the computer program in the memory to implement the method according to any one of claims 1 to 10.
  13. The data processing system according to claim 12, characterized by further comprising a display device and a distributed storage device, wherein:
    the distributed storage device is configured to acquire and store production history data and detection parameter data; and
    the display device is configured to display the target detection data.
  14. A computer-readable storage medium, characterized in that, when an executable computer program in the storage medium is executed by a processor, the method according to any one of claims 1 to 10 can be implemented.
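
The reference-product selection of claim 6 and the distance-thresholded averaging of claims 4 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the in-memory layout (product identifier mapped to a list of `(x, y, value)` tuples), the function names `pick_reference` and `mean_distribution`, the use of Euclidean distance, and nearest-neighbour matching between control and reference points are all hypothetical choices, since the claims do not specify them. In a real deployment the points would presumably be read from the HBase target data table by row key.

```python
from math import hypot
from statistics import mean
from typing import Dict, List, Tuple

# Assumed layout (hypothetical): product id -> list of (x, y, measured value).
Point = Tuple[float, float, float]


def pick_reference(products: Dict[str, List[Point]]) -> str:
    """Return the id of the product with the most detection points (claim 6)."""
    return max(products, key=lambda pid: len(products[pid]))


def mean_distribution(products: Dict[str, List[Point]],
                      max_dist: float) -> List[Point]:
    """Average values of control points lying near each reference point.

    For every reference point, the nearest detection point on each control
    product is matched (an assumption; the claims only say "match"). Its value
    becomes a candidate when the distance is <= max_dist. The mean of the
    candidates, including the reference value, is stored at the reference
    coordinates.
    """
    ref_id = pick_reference(products)
    result: List[Point] = []
    for rx, ry, rv in products[ref_id]:
        candidates = [rv]
        for pid, points in products.items():
            if pid == ref_id:
                continue  # only control products are compared to the reference
            nearest = min(points,
                          key=lambda p: hypot(p[0] - rx, p[1] - ry),
                          default=None)
            if nearest is None:
                continue  # control product has no detection points
            if hypot(nearest[0] - rx, nearest[1] - ry) <= max_dist:
                candidates.append(nearest[2])
        result.append((rx, ry, mean(candidates)))
    return result
```

Each returned tuple holds a reference coordinate and the averaged value, which is the kind of data a mean distribution map would plot. A reference point with no control point inside the threshold simply keeps its own value.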
PCT/CN2022/070461 2022-01-06 2022-01-06 Data processing method and system, and computer-readable storage medium WO2023130304A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/070461 WO2023130304A1 (en) 2022-01-06 2022-01-06 Data processing method and system, and computer-readable storage medium
CN202280000008.0A CN116724321A (en) 2022-01-06 2022-01-06 Data processing method, system and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023130304A1 true WO2023130304A1 (en) 2023-07-13

Family

ID=87072902

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070461 WO2023130304A1 (en) 2022-01-06 2022-01-06 Data processing method and system, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN116724321A (en)
WO (1) WO2023130304A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170364534A1 (en) * 2016-06-15 2017-12-21 Chen Zhang Platform, system, process for distributed graph databases and computing
CN110765331A (en) * 2019-07-08 2020-02-07 中国人民解放军战略支援部队信息工程大学 Retrieval method and system of spatio-temporal data
CN113454661A (en) * 2019-11-29 2021-09-28 京东方科技集团股份有限公司 System and method for product failure cause analysis, computer readable medium
CN113614758A (en) * 2020-01-22 2021-11-05 京东方科技集团股份有限公司 Equipment index goodness grade prediction model training method, monitoring system and method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894032A (en) * 2023-09-05 2023-10-17 江苏数兑科技有限公司 Method for automatically generating data cleaning rule based on data exploration analysis result
CN116894032B (en) * 2023-09-05 2023-11-21 江苏数兑科技有限公司 Method for automatically generating data cleaning rule based on data exploration analysis result

Also Published As

Publication number Publication date
CN116724321A (en) 2023-09-08

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 202280000008.0; Country of ref document: CN)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22917780; Country of ref document: EP; Kind code of ref document: A1)