CN117743314A

CN117743314A - Data cleaning system and data cleaning method

Info

Publication number: CN117743314A
Application number: CN202410101200.XA
Authority: CN
Inventors: 张友友
Original assignee: Digiwin Software Co Ltd
Current assignee: Digiwin Software Co Ltd
Priority date: 2024-01-24
Filing date: 2024-01-24
Publication date: 2024-03-22

Abstract

The invention provides a data cleaning system and a data cleaning method. The data cleaning system includes a memory and a processor. The processor executes a plurality of modules stored in the memory. Each acquisition module acquires production data from a workshop system. In a first mode, each filtering module determines, according to a preset number of production data, whether to generate filtered production data. In a second mode, each filtering module, according to the corresponding parametric model and the filtered production data, either switches to the first mode or outputs the filtered production data to an application system. Valid filtered production data is thereby obtained under limited resources.

Description

Data cleaning system and data cleaning method

Technical Field

The invention relates to a data cleaning system, and in particular to a data cleaning system and a data cleaning method applied to Industrial Internet of Things (IIoT) data.

Background

Enterprises can use systems to analyze and process production data from the Industrial Internet of Things (IIoT) and thereby manage and maintain their production lines. The IIoT may also be referred to as production-equipment networking or machine networking. Production data can be classified into normal data, redundant data, and abnormal data. Redundant data is repeated, invalid data that the system can remove according to a continuous period and a set value. Abnormal data is data that fails to correctly reflect the device state for various reasons. Some of these reasons are unknown, such as abnormal device settings made by the factory or an intermediate integrator, sudden disturbances of the device, or program-code errors; data caused by them is invalid and should be removed. Other reasons are known, such as custom settings made by manually adjusting the device; data caused by them is valid and should be retained.

Typically, a system filters abnormal data using a threshold setting, a standard-deviation setting, a low-pass filter, or a supervised learning model. However, with these current approaches the system cannot obtain normal data and valid data under limited resources, which reduces the efficiency and accuracy of downstream application services.

Specifically, with a threshold setting, the system cannot filter abnormal data of incremental parameters such as yield and power consumption. With a standard-deviation setting, the system cannot filter abnormal values that lie close to the mean, and frequency-domain analysis cannot be carried out on the filtered data. With a low-pass filter, the system may erroneously filter out (i.e., remove) valid data produced by manual adjustment. With a supervised learning model, the cost is too high because multiple learning models must be deployed for each device.
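The threshold limitation can be illustrated with a short Python sketch. The counter values and the helper names `threshold_filter` and `step_filter` are hypothetical and are not part of the invention; the step-based check merely shows why a fixed band cannot catch a glitch in an incremental parameter such as yield.

```python
from typing import List


def threshold_filter(values: List[int], low: int, high: int) -> List[int]:
    """Keep values inside a fixed [low, high] band (a classic threshold setting)."""
    return [v for v in values if low <= v <= high]


def step_filter(values: List[int], min_step: int, max_step: int) -> List[int]:
    """Keep a value only if its step from the last kept value lies in [min_step, max_step]."""
    kept = values[:1]
    for v in values[1:]:
        if min_step <= v - kept[-1] <= max_step:
            kept.append(v)
    return kept


# Hypothetical cumulative yield counter; 50 is a glitch, not real production.
yield_counter = [100, 101, 103, 50, 104, 105]
print(threshold_filter(yield_counter, 0, 10_000))  # [100, 101, 103, 50, 104, 105]
print(step_filter(yield_counter, 0, 3))            # [100, 101, 103, 104, 105]
```

A band wide enough to admit every legitimate counter value also admits the glitch, whereas a check on consecutive differences rejects it.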

Disclosure of Invention

The invention is directed to a data cleaning system that is suitable for machine-networking production data and can obtain valid filtered production data under limited resources.

According to an embodiment of the invention, a data cleaning system includes a memory and a processor. The memory stores a plurality of modules. The processor is coupled to the memory, a workshop system, and an application system, and executes the plurality of modules. The plurality of modules includes a plurality of acquisition modules and a plurality of filtering modules corresponding to a plurality of parametric models. Each acquisition module acquires production data from the workshop system. In a first mode, each filtering module determines, according to a preset number of production data, whether to generate filtered production data. In a second mode, each filtering module, according to the corresponding parametric model and the filtered production data, either switches to the first mode or outputs the filtered production data to the application system.

According to an embodiment of the invention, the data cleaning method executes, by a processor, a plurality of modules in a memory. The plurality of modules includes a plurality of acquisition modules and a plurality of filtering modules corresponding to a plurality of parametric models. Executing the modules includes the following steps: each acquisition module acquires production data from the workshop system; in a first mode, each filtering module determines, according to a preset number of production data, whether to generate filtered production data; and in a second mode, each filtering module, according to the corresponding parametric model and the filtered production data, either switches to the first mode or outputs the filtered production data to the application system.

Based on the above, by operating the filtering modules in different modes, the data cleaning system and the data cleaning method of the invention can adaptively obtain filtered production data consisting of normal data and valid data. The data cleaning system thus improves both the efficiency of the data cleaning process and the correctness of its result, obtaining valid filtered production data under limited resources.

In order to make the above features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

FIG. 1 is a block diagram of a data cleansing system according to an embodiment of the present invention;

FIG. 2 is a flow chart of a data cleansing method according to an embodiment of the present invention;

FIG. 3 is a block diagram of a data cleansing system according to another embodiment of the present invention;

FIGS. 4A-4B are flowcharts of a data cleansing method according to the embodiment of FIG. 3 of the present invention;

FIG. 5 is a schematic diagram illustrating the operation of the data cleansing system of the embodiment of FIG. 3 in accordance with the present invention.

Description of the reference numerals

100, 300: data cleaning system;

110, 310: processor;

120, 320: memory;

121 to 121N1, 321 to 321N: acquisition module;

122 to 122N2, 322 to 322N: filtering module;

210: workshop system;

211: device;

220: application system;

323 to 323N: parametric model;

324: driving module;

D1: production data;

D1': filtered production data;

S210 to S230, S410 to S493: steps.

Detailed Description

Reference will now be made in detail to the exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.

FIG. 1 is a block diagram of a data cleaning system according to an embodiment of the invention. Referring to FIG. 1, the data cleaning system 100 is applied to machine networking. The data cleaning system 100 can adaptively clean data from the machine network (e.g., production data D1) and output the cleaned data (e.g., filtered production data D1') to other systems (e.g., the application system 220). In this embodiment, the data cleaning system 100 may be, for example, a software-as-a-service (SaaS) system.

It should be noted that the data cleaning system 100 cleans abnormal data in the production data D1. Abnormal data is data that fails to correctly reflect the device state for various reasons. It may include invalid data arising from unknown causes (e.g., glitches in the device) and valid data arising from known causes (e.g., customized device settings that were manually adjusted). The data cleaning system 100 filters out the invalid data while retaining the valid data and the normal data in the production data D1.

In this embodiment, a user may operate an electronic device to call the data cleaning system 100 through an application programming interface (API). The electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, or a desktop computer.

In this embodiment, the user may also operate the electronic device to deploy the workshop system 210 on an operational technology (OT) network through API calls. The workshop system 210 may be, for example, a management system for managing the devices in a workshop. In addition, the user may operate the electronic device to invoke, through API calls, the application system 220 deployed on an information technology (IT) network. The application system 220 may be, for example, a workshop management system or an enterprise resource planning (ERP) system that performs various business services.

In this embodiment, the data cleaning system 100 may include a processor 110 and a memory 120. The processor 110 is coupled to the memory 120, the workshop system 210, and the application system 220. The memory 120 stores a plurality of modules, which may include a plurality of acquisition modules 121 to 12N1 and a plurality of filtering modules 122 to 122N2 corresponding to a plurality of parametric models (not shown in FIG. 1), where N1 and N2 are each positive integers greater than 1. N1 may be equal to N2. These modules may be implemented, for example, in firmware or software and may provide various functions.

In this embodiment, each acquisition module (e.g., the acquisition module 121) creates a corresponding parametric model through its corresponding filtering module (e.g., the filtering module 122). Each parametric model applies a parametric model method according to the parameter property of its acquisition module, so different parametric models apply different parametric model methods. A parametric model method classifies the parameter property of the production data D1, such as an "incremental (or decremental) parameter", a "periodic parameter", or a "setting parameter", and is implemented, for example, in a programming language. For example, a first parametric model uses a first parametric model method to evaluate whether the production data D1 (e.g., yield) is an incremental (or decremental) parameter. The parametric models may be stored in the memory 120 or in another storage device.
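The relationship between parameter properties, parametric models, and parametric model methods could be represented roughly as follows. This is a minimal Python sketch under assumptions: the names `ParamProperty`, `ParametricModel`, `non_decreasing`, and `MODELS` are hypothetical, and only an incremental check is shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Sequence


class ParamProperty(Enum):
    INCREMENTAL = "incremental (or decremental)"
    PERIODIC = "periodic"
    SETTING = "setting"


@dataclass(frozen=True)
class ParametricModel:
    parameter: str                             # e.g. "yield"
    prop: ParamProperty
    method: Callable[[Sequence[float]], bool]  # the parametric model method


def non_decreasing(values: Sequence[float]) -> bool:
    """Incremental check; a decremental parameter would use the mirrored test."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Hypothetical registry: one model per acquisition module, each with its own method.
MODELS: Dict[str, ParametricModel] = {
    "yield": ParametricModel("yield", ParamProperty.INCREMENTAL, non_decreasing),
}

print(MODELS["yield"].method([10, 11, 11, 14]))  # True
print(MODELS["yield"].method([10, 9, 12]))       # False
```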

In detail, the acquisition module 121 corresponds to the first parametric model. It acquires data from the workshop system 210 according to the parameter setting (i.e., the parametric model method) of the first parametric model to obtain the corresponding production data D1. The other acquisition modules, up to 12N1, operate analogously to the acquisition module 121.

In addition, the filtering module 122 corresponds to the acquisition module 121 and the first parametric model. According to the configuration of the first parametric model, the filtering module 122 switches between a first mode and a second mode and cleans (i.e., filters) the production data D1 to provide filtered production data D1'. The other filtering modules, up to 122N2, operate analogously to the filtering module 122.

In this embodiment, the memory 120 may also store related algorithms, programs, and data, such as operating software, for implementing functions of the invention such as data acquisition and various calculations. The memory 120 may be, for example, a dynamic random access memory (DRAM), a flash memory, a non-volatile random access memory (NVRAM), or a combination thereof.

In this embodiment, the processor 110 accesses the memory 120 and the parametric models, and can execute the data in the memory 120, the modules 121 to 12N1 and 122 to 122N2, and the parametric models. The processor 110 may also access data of the workshop system 210 and the application system 220. The processor 110 may be, for example, a server, a signal converter, a field programmable gate array (FPGA), a central processing unit (CPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or another similar device or combination thereof that can load and execute firmware or software related to a computer program to perform functions such as data acquisition and various calculations.

FIG. 2 is a flow chart of a data cleaning method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2, the data cleaning system 100 may perform steps S210 to S230. The order of steps S210 to S230 is merely illustrative and not limiting. In this embodiment, the processor 110 accesses the memory 120 and executes the modules 121 to 12N1 and 122 to 122N2 to implement the data cleaning method.

In step S210, each of the acquisition modules 121 to 12N1 acquires the production data D1 from the workshop system 210. Taking the acquisition module 121 as an example, it acquires data from the workshop system 210 according to the parameter setting (i.e., the parametric model method) of the first parametric model (e.g., an incremental parametric model) to obtain production data D1 related to yield.

In step S220, each of the filtering modules 122 to 122N2 operates in the first mode and determines, according to a preset number of production data D1, whether to generate filtered production data D1'. In this embodiment, the first mode is the operation mode in which each of the filtering modules 122 to 122N2 performs the filtering operation. The preset number may be, for example, a minimum detection count, i.e., the minimum number of production data D1 records required by the filtering operation, which ensures the validity of the data cleaning result.

That is, taking the filtering module 122 as an example, in the first mode, when the number of production data D1 records reaches the preset number, the filtering module 122 filters the production data D1 according to this determination result to obtain the filtered production data D1'. Conversely, when the number of records has not reached the preset number, the filtering module 122 does not perform the filtering operation until the production data D1 has accumulated to the preset number.

In step S230, each of the filtering modules 122 to 122N2 operates in the second mode and, according to the corresponding parametric model and the filtered production data D1', either switches to the first mode or outputs the filtered production data D1' to the application system 220. In this embodiment, the second mode is the operation mode in which each of the filtering modules 122 to 122N2 evaluates whether the data is normal and valid, which ensures the correctness of the data cleaning result.

That is, taking the filtering module 122 as an example, in the second mode, when the filtered production data D1' does not conform to the first parametric model, the filtering module 122 switches back to the first mode and re-executes step S220. Conversely, when the filtered production data D1' conforms to the first parametric model, the filtering module 122 outputs the filtered production data D1' to the application system 220.
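Steps S220 and S230 can be sketched as a small state machine in Python. This is an illustration only: `FilterModule`, `drop_bad_steps`, and `steps_ok` are hypothetical names, the filtering operation is a stand-in because the embodiment does not fix one, and the step bound [1, 3] borrows the example used later for the incremental parametric model 323.

```python
from enum import Enum
from typing import Callable, List, Optional


class Mode(Enum):
    FIRST = "first"    # accumulate and filter (step S220)
    SECOND = "second"  # check against the parametric model (step S230)


class FilterModule:
    """Minimal two-mode filtering module; filter_fn and conforms are hypothetical stand-ins."""

    def __init__(self, preset_count: int,
                 filter_fn: Callable[[List[float]], List[float]],
                 conforms: Callable[[List[float]], bool]) -> None:
        self.preset_count = preset_count
        self.filter_fn = filter_fn  # filtering operation used in the first mode
        self.conforms = conforms    # stands in for the parametric model check
        self.mode = Mode.FIRST
        self.buffer: List[float] = []
        self.filtered: List[float] = []

    def feed(self, record: float) -> Optional[List[float]]:
        """Process one record; return filtered data when it is output, else None."""
        if self.mode is Mode.SECOND:
            candidate = self.filtered + [record]
            if self.conforms(candidate):
                self.filtered = candidate
                return candidate             # conforms: output to the application system
            self.mode = Mode.FIRST           # does not conform: back to the first mode
            self.buffer = list(self.filtered)
        self.buffer.append(record)
        if len(self.buffer) < self.preset_count:
            return None                      # keep accumulating
        filtered = self.filter_fn(self.buffer)
        if not self.conforms(filtered):
            return None                      # stay in the first mode
        self.filtered, self.buffer = filtered, []
        self.mode = Mode.SECOND
        return filtered


def drop_bad_steps(values: List[float]) -> List[float]:
    """Hypothetical filtering operation: drop records whose step is outside [1, 3]."""
    kept = values[:1]
    for v in values[1:]:
        if 1 <= v - kept[-1] <= 3:
            kept.append(v)
    return kept


def steps_ok(values: List[float]) -> bool:
    """Hypothetical parametric model check: every step must lie in [1, 3]."""
    return all(1 <= b - a <= 3 for a, b in zip(values, values[1:]))


module = FilterModule(3, drop_bad_steps, steps_ok)
outputs = [module.feed(r) for r in [100, 101, 103, 50, 104]]
print(outputs)  # [None, None, [100, 101, 103], [100, 101, 103], [100, 101, 103, 104]]
```

In this run the glitch value 50 makes the check fail, so the module falls back to the first mode, filters it out, and returns to the second mode.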

In this embodiment, the application system 220 accesses the filtered production data D1' and performs various application services based on it. For example, the application system 220 calculates overall equipment effectiveness (OEE) from the filtered production data D1' related to yield.
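The patent does not give an OEE formula. Assuming the common decomposition OEE = availability × performance × quality, the following Python sketch shows how a yield count derived from filtered data might feed into it; the function name `oee` and all figures are hypothetical.

```python
def oee(planned_time_s: float, run_time_s: float, ideal_cycle_time_s: float,
        total_count: int, good_count: int) -> float:
    """Common OEE decomposition (assumed here, not specified by the patent)."""
    availability = run_time_s / planned_time_s
    performance = ideal_cycle_time_s * total_count / run_time_s
    quality = good_count / total_count
    return availability * performance * quality


raw_counter = [1000, 1150, 1300, 50]   # last reading is a glitch
filtered_counter = [1000, 1150, 1300]  # the same series after data cleaning
total = filtered_counter[-1] - filtered_counter[0]  # 300 parts
print(raw_counter[-1] - raw_counter[0])             # -950: unusable as a count
print(round(oee(1000, 800, 2, total, good_count=285), 2))  # 0.57
```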

It should be noted that, by operating each of the filtering modules 122 to 122N2 in the first mode or the second mode, the data cleaning system 100 can adaptively determine, based on the production data D1, whether to generate the filtered production data D1'. It thereby filters out invalid abnormal data while retaining normal data and valid data, improving the correctness of the data cleaning result. In addition, the data cleaning system 100 can obtain the filtered production data D1' from a small amount of production data D1, improving the efficiency of the data cleaning process. Furthermore, because each of the filtering modules 122 to 122N2 evaluates whether the filtered production data D1' is normal and valid based on the corresponding parameter property, the data cleaning system 100 can ensure the correctness of the data cleaning result. The application system 220 can then perform its application services on the data cleaning result (i.e., the filtered production data D1'), improving the efficiency and accuracy of those services.

FIG. 3 is a block diagram of a data cleaning system according to another embodiment of the invention. Referring to FIG. 3, a data cleaning system 300 may include a processor 310 and a memory 320. The memory 320 stores a plurality of acquisition modules 321 to 321N, a plurality of filtering modules 322 to 322N, and a plurality of parametric models 323 to 323N, where N is a positive integer greater than 1. The processor 310 accesses the memory 320 and can execute the data in the memory 320 and the modules 321 to 321N and 322 to 322N. These modules may be implemented, for example, using JavaScript Object Notation (JSON), Extensible Markup Language (XML), or YAML, but the invention is not limited thereto. For other details of the data cleaning system 300, refer to the description of the data cleaning system 100.

In the embodiment of FIG. 3, the memory 320 also stores a driving module 324, which may also be referred to as a data acquisition driving module. The processor 310 executes the driving module 324 so that it accesses the production data D1 of one or more devices 211 in the workshop system 210 through a communication interface (e.g., an OT integrator). The driving module 324 also activates one or more of the acquisition modules 321 to 321N together with the corresponding filtering modules 322 to 322N and provides the production data D1 to the activated modules.
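A minimal Python sketch of the routing role of the driving module 324, assuming each device sample arrives as a dictionary keyed by parameter name. `DrivingModule`, `activate`, and `dispatch` are hypothetical names, and the OT integrator interface is not modeled.

```python
from typing import Callable, Dict, List


class DrivingModule:
    """Hypothetical driving module: activates parameter pipelines and routes device samples."""

    def __init__(self) -> None:
        # parameter name -> callback of the activated acquisition/filtering pair
        self.active: Dict[str, Callable[[float], None]] = {}

    def activate(self, parameter: str, on_record: Callable[[float], None]) -> None:
        self.active[parameter] = on_record

    def dispatch(self, sample: Dict[str, float]) -> None:
        """Forward each value of a device sample to the pipeline activated for it, if any."""
        for parameter, on_record in self.active.items():
            if parameter in sample:
                on_record(sample[parameter])


received: Dict[str, List[float]] = {"yield": [], "feed_rate": []}
driver = DrivingModule()
driver.activate("yield", received["yield"].append)  # only the yield pipeline is active
driver.dispatch({"yield": 120.0, "feed_rate": 850.0, "spindle_speed": 3000.0})
print(received)  # {'yield': [120.0], 'feed_rate': []}
```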

In this embodiment, the acquisition modules 321 to 321N and the filtering modules 322 to 322N each correspond to the parametric models 323 to 323N, so that each acquisition module and its filtering module operate on production data D1 of the same parameter property. For example, the acquisition module 321 and the filtering module 322 correspond to the incremental parametric model 323 and operate on incremental production data D1 (e.g., yield).

In this embodiment, each filtering module (e.g., the filtering module 322) uses the corresponding parametric model method to automatically build the corresponding parametric model 323 based on the parameter property. Parameter properties may include incremental (or decremental), periodic, and setting properties. The parametric model method may include the steps, described in the embodiment of FIGS. 4A-4B, of calculating the mode of a plurality of discrete convolution values. The mode indicates how large a distance between two data points is acceptable.

That is, the parametric model method provides additional definitions (or specifications) of how values may vary, so as to further define (i.e., build) the corresponding parametric model (e.g., the parametric model 323). For example, the parametric model method specifies that production data D1 regarding yield may only increase, thereby building the incremental parametric model 323. As another example, the parametric model method specifies that production data D1 regarding feed rate has limits such as upper and lower value bounds and bounds on positive and negative changes, thereby building the periodic parametric model 323N.
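The following Python sketch shows one possible reading of the parametric model method. Assumptions: the discrete convolution uses the kernel [1, -1], so the convolution values are differences between consecutive records; the incremental tolerance of twice the mode is arbitrary; and the feed-rate limits are made-up numbers. None of these names or values come from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


def discrete_convolve_valid(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' discrete convolution: only positions where the kernel fully overlaps x."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + k - 1 - j] for j in range(k))
            for i in range(len(x) - k + 1)]


def typical_step(history: Sequence[float]) -> float:
    """Mode of the convolution values; with kernel [1, -1] these are consecutive differences."""
    steps = discrete_convolve_valid(history, [1, -1])
    return Counter(steps).most_common(1)[0][0]


@dataclass(frozen=True)
class IncrementalModel:
    max_step: float  # largest accepted increase between consecutive records

    def conforms(self, prev: float, curr: float) -> bool:
        return 0 <= curr - prev <= self.max_step  # yield may only increase


@dataclass(frozen=True)
class PeriodicModel:
    lower: float     # lower value bound
    upper: float     # upper value bound
    max_rise: float  # largest positive change between consecutive records
    max_fall: float  # largest negative change, as a positive magnitude

    def conforms(self, prev: float, curr: float) -> bool:
        delta = curr - prev
        return self.lower <= curr <= self.upper and -self.max_fall <= delta <= self.max_rise


history = [100, 102, 104, 105, 107, 109, 60, 111]
step = typical_step(history)                         # steps 2,2,1,2,2,-49,51 -> mode 2
yield_model = IncrementalModel(max_step=2 * step)    # factor 2 is an assumption
feed_rate_model = PeriodicModel(lower=0, upper=1200, max_rise=50, max_fall=50)
print(step, yield_model.conforms(109, 111), feed_rate_model.conforms(800, 900))  # 2 True False
```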

Specifically, incremental or decremental data may exhibit several behavior patterns: increasing, decreasing, and being reset to zero by manual adjustment. Examples include production data D1 and/or historical production data such as yield, power consumption, running time, and cooling water level. Periodic data may vary gently within one range compared with other ranges; examples include production data D1 and/or historical production data such as feed rate, spindle speed, and motor temperature. Setting data changes infrequently and exhibits behavior patterns in which manual adjustment holds it at fixed values; examples include production data D1 and/or historical production data such as spindle override, feed override, and tool compensation values.
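For incremental data, a manual reset to zero is valid while other drops are not. One hypothetical way to separate the two is sketched below in Python; the function name `split_incremental` is illustrative, and treating an exact zero as a manual reset is an assumption.

```python
from typing import List, Tuple


def split_incremental(values: List[float]) -> Tuple[List[float], List[float]]:
    """Separate an incremental series into kept values and suspected glitches.

    Assumption: a drop to exactly 0 is a valid manual reset; any other drop is abnormal.
    """
    valid: List[float] = values[:1]
    abnormal: List[float] = []
    for v in values[1:]:
        if v >= valid[-1] or v == 0:
            valid.append(v)     # normal increase, or a manual reset to zero
        else:
            abnormal.append(v)  # unexplained drop: treated as invalid
    return valid, abnormal


print(split_incremental([10, 12, 15, 0, 2, 1, 4]))  # ([10, 12, 15, 0, 2, 4], [1])
```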

In this embodiment, the acquisition modules 321 to 321N are independent of one another. During operation, each acquisition module (e.g., the acquisition module 321) considers only its own parameter settings and ignores correlations with other parameter settings. The acquisition module 321 provides the production data D1 it acquires to the associated filtering module 322. The acquisition module 321 may also be referred to as a data acquisition parameter module, and the filtering module 322 as a data filter.

In this embodiment, the application system 220 accesses the filtering modules 322 to 322N to obtain filtered production data D1' of various parameter properties and performs various application services based on it. For example, the application system 220 provides database services by accessing the filtered production data D1', performs OEE analysis by calculating on the filtered production data D1' related to yield, and performs job-ticket management by calculating on the filtered production data D1' related to job-ticket work reporting. The application services may further include extended services such as energy consumption calculation, equipment health management and maintenance, and production-line payroll calculation.

FIGS. 4A-4B are flowcharts of a data cleaning method according to the embodiment of FIG. 3 of the invention. Referring to FIG. 3 and FIGS. 4A-4B, the processor 310 accesses the memory 320 and executes the modules 321 to 321N, 322 to 322N, and 324 to perform steps S410 to S493, thereby implementing the data cleaning method. The driving module 324 activates at least one of the acquisition modules 321 to 321N and the corresponding filtering modules 322 to 322N. In this embodiment, steps S410 to S493 may be applied to the following exemplary scenarios.

In this embodiment, each filtering module (e.g., the filtering module 322) may operate in either the first mode or the second mode. The first mode may also be referred to as the Pending mode, and the second mode as the Direct mode. The filtering module 322 switches between these modes to provide the appropriate function in each scenario.

In detail, the first mode applies to a scenario in which the parametric models 323 to 323N have not yet been established or are not yet fully established. In the first mode, each filtering module (e.g., the filtering module 322) can observe whether the current parametric model 323 needs to be adjusted. The first mode also applies to another scenario, in which the production data D1 exhibits significantly varying behavior patterns. In the first mode, each filtering module (e.g., the filtering module 322) can observe whether each record in the production data D1 needs to be filtered (i.e., removed).

In addition, the second mode applies to a scenario in which the parametric models 323 to 323N have been established. In the second mode, each filtering module (e.g., the filtering module 322) can quickly determine whether the production data D1 matches the corresponding parametric model 323. Based on this determination, the filtering module 322 either treats the production data D1 as normal data or switches back to the first mode to continue the filtering operation.

It should be noted that, in the second mode, each filtering module (e.g., the filtering module 322) uses the established parametric model 323 to quickly determine whether the current production data D1 matches it. Once the determination indicates a mismatch, the filtering module 322 switches to the first mode and continues the corresponding calculation with the established parametric model 323. By switching each filtering module between the first mode and the second mode in this way, the data cleaning system 300 improves the efficiency of the data cleaning process, avoids a large amount of repeated computation, and can provide online data cleaning with limited resources.

In step S410, each activated acquisition module (e.g., the acquisition module 321) acquires data from the workshop system 210 according to the parameter settings of the corresponding parametric model (e.g., the incremental parametric model 323) to obtain the production data D1 of the device 211, and provides the acquired production data D1 to the corresponding filtering module 322.

In step S420, each activated filtering module (e.g., the filtering module 322) determines whether the production data D1 from step S410 is the first record received after activation, that is, whether the current scenario is the initial state.

When the determination result of step S420 is yes, the production data D1 is the first record after activation, meaning that the data cleaning system 300 is performing the data cleaning method on the production data D1 for the first time. The data cleaning system 300 proceeds to steps S431 and S432.

In step S431, in response to the yes result of step S420, each filtering module (e.g., the filtering module 322) switches its current operating mode to the first mode.

In step S432, each filtering module (e.g., the filtering module 322) temporarily stores the production data D1 (i.e., its first record) in the memory 320 and accumulates the production data D1. The data cleaning system 300 then ends the current pass of the method and re-executes step S410. That is, in the initial state, the data cleaning system 300 temporarily stores the first record of the production data D1 and accumulates subsequent records by executing step S410 again.

Conversely, when the determination result of step S420 is no, the production data D1 is not the first record after activation, meaning that the data cleaning system 300 is not performing the data cleaning method on the production data D1 for the first time. The data cleaning system 300 proceeds to step S440.

In step S440, in response to the no result of step S420, each filtering module (e.g., the filtering module 322) determines whether its current operating mode is the first mode. A yes result indicates that the filtering module 322 is operating in the first mode (i.e., the Pending mode), and the data cleaning system 300 proceeds to step S451. A no result indicates that the filtering module 322 is operating in the second mode (i.e., the Direct mode), and the data cleaning system 300 proceeds to steps S461 to S462.
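The branching of steps S420, S431, S432, and S440 could be sketched in Python as follows. `FilterState` and `route` are hypothetical names, and the function only reports which step a record proceeds to.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    PENDING = "pending"  # first mode
    DIRECT = "direct"    # second mode


@dataclass
class FilterState:
    started: bool = False
    mode: Mode = Mode.PENDING
    stored: List[float] = field(default_factory=list)


def route(state: FilterState, record: float) -> str:
    """Return the next step of FIG. 4A for one incoming record."""
    if not state.started:            # S420: first record after activation?
        state.started = True
        state.mode = Mode.PENDING    # S431: switch to the first (Pending) mode
        state.stored.append(record)  # S432: store it and wait for the next record
        return "S432"
    if state.mode is Mode.PENDING:   # S440: currently in the first mode?
        return "S451"
    return "S461"                    # second (Direct) mode


state = FilterState()
print(route(state, 5.0), route(state, 6.0))  # S432 S451
```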

In step S451, in response to the yes result of step S440, each filtering module (e.g., the filtering module 322) determines, in the first mode, whether the number of production data D1 records has reached the preset number.

In this embodiment, the preset number may be, for example, a minimum detection count, such as 3. That is, the data cleaning system 300 performs the filtering operation on the production data D1 only after at least 3 records have accumulated, which ensures the validity of the data cleaning result. In some embodiments, the preset number may also include a maximum detection count, which indicates the maximum number of production data D1 records that can be loaded in the filtering operation, thereby keeping the data cleaning process efficient and its load bounded.

It should be noted that, because parameter properties differ, the preset number may be set to different minimum detection counts according to the frequency at which the data cleaning system 300 performs steps S410 to S451. The larger the minimum detection count, the more records of the production data D1 are accumulated. As a result, the data cleaning result is more accurate and the production data D1 can tolerate a larger proportion of abnormal data, at the cost of a longer data cleaning flow.

When the determination result of step S451 is no, the number of accumulated records of the production data D1 has not reached the preset number, and the data cleaning system 300 returns to step S432. Conversely, when the determination result of step S451 is yes, the number of accumulated records has reached the preset number, and the data cleaning system 300 proceeds to steps S471 to S472.

That is, in the first mode, when the current number of production data D1 records (e.g., 2) is smaller than the preset number (e.g., a minimum detection count of 3), each filtering module (e.g., the filtering module 322) temporarily stores these 2 records and continues accumulating, re-executing step S410 to obtain the next record of the production data D1.
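Step S451, with the minimum detection count of 3 from the example above and an optional maximum detection count, might be sketched in Python as below. Using a sliding window that discards the oldest records is an assumption about how the maximum count is enforced; `PendingBuffer` and the maximum of 5 are hypothetical.

```python
from collections import deque
from typing import Deque


class PendingBuffer:
    """First-mode buffer: min_count gates filtering (S451), max_count bounds the load."""

    def __init__(self, min_count: int = 3, max_count: int = 100) -> None:
        self.min_count = min_count
        # deque(maxlen=...) silently discards the oldest record once max_count is exceeded
        self.records: Deque[float] = deque(maxlen=max_count)

    def add(self, record: float) -> bool:
        """Store a record (S432) and report whether the preset number is reached (S451)."""
        self.records.append(record)
        return len(self.records) >= self.min_count


buf = PendingBuffer(min_count=3, max_count=5)
ready = [buf.add(r) for r in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(ready)              # [False, False, True, True, True, True]
print(list(buf.records))  # [2.0, 3.0, 4.0, 5.0, 6.0]
```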

In step S461, in the second mode, each filter module (e.g., filter module 322) calculates the confidence of the production data D1 according to the corresponding parameter model (e.g., parameter model 323). That is, the filter module 322 inputs each record of the production data D1 into the incremental parameter model 323, which outputs a confidence. The confidence indicates whether the production data D1 matches the parameter model 323, i.e., whether the production data D1 has the incremental parameter property and is normal data as specified by the parameter model 323.

For example, the incremental parameter model 323 may define the incremental parameter property as follows: the current record minus the previous record must lie in the range [1, 3]. When the newly received record minus the previous record lies in [1, 3], the confidence output by the filter module 322 indicates that the record matches the incremental parameter property and is trusted. Otherwise, the confidence indicates that the record does not match the incremental parameter property and is not trusted.
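The example rule above can be written as a pairwise check. This sketch assumes the confidence is a simple trusted/untrusted flag; the function names `incremental_confidence` and `confidence_flags` are hypothetical, and the bounds default to the [1, 3] range from the example.

```python
from typing import List, Sequence


def incremental_confidence(prev: float, curr: float,
                           low: float = 1.0, high: float = 3.0) -> bool:
    """Trusted if the step from prev to curr lies in [low, high] (example: [1, 3])."""
    return low <= curr - prev <= high


def confidence_flags(values: Sequence[float],
                     low: float = 1.0, high: float = 3.0) -> List[bool]:
    """One flag per adjacent pair; the first record has no predecessor to compare with."""
    return [incremental_confidence(a, b, low, high) for a, b in zip(values, values[1:])]


print(confidence_flags([10, 11, 14, 20, 21]))  # [True, True, False, True]
```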

In step S462, in the second mode, each filter module (e.g., filter module 322) determines, according to the confidence obtained in step S461, whether the production data D1 is normal data. That is, the filter module 322 determines whether the confidence falls within the preset confidence range configured by the parameter model 323, and accordingly classifies the production data D1 as normal (and valid) data or as abnormal data.

When the determination result of step S462 is yes, the production data D1 is normal (and valid) data, and the data cleaning system 300 continues with step S463. In step S463, in the second mode, each filter module (e.g., filter module 322) treats the records of the production data D1 as normal data and outputs them (i.e., the production data D1) to the application system 220.

When the determination result of step S462 is no, the production data D1 is abnormal data. The data cleaning system 300 returns to steps S431 to S432, switching its current operating mode to the first mode, and accumulates further records of the production data D1 by executing step S410 again.
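Steps S461 to S463 and the fallback to the first mode can be sketched as a loop over incoming records. This is a hypothetical illustration: `run_second_mode` is not named in the text, the confidence check is passed in as a callable, and it is assumed that the first untrusted record and everything after it are handed back to the first mode for buffering.

```python
from typing import Callable, List, Sequence, Tuple


def run_second_mode(last_normal: float,
                    incoming: Sequence[float],
                    is_trusted: Callable[[float, float], bool]) -> Tuple[List[float], List[float]]:
    """Return (records forwarded to the application system, records handed to the first mode)."""
    output: List[float] = []
    prev = last_normal
    for idx, record in enumerate(incoming):
        if not is_trusted(prev, record):
            # S462 "no": switch to the first mode and buffer from this record on (S431-S432).
            return output, list(incoming[idx:])
        output.append(record)  # S463: forward normal data
        prev = record
    return output, []


trusted = lambda prev, curr: 1 <= curr - prev <= 3  # example incremental property
print(run_second_mode(10, [11, 13, 2, 3], trusted))  # ([11, 13], [2, 3])
```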

In step S471, in the first mode, in response to a yes result in step S451, each filter module (e.g., the filter module 322) executes a K-nearest neighbors (KNN) and mode rule to generate a plurality of discrete convolution values and their mode value.

In detail, in the first mode, when the number of records currently in the production data D1 (for example, 3) is not less than the preset number (assuming a minimum detection count of 3), each filter module (e.g., the filter module 322) calculates a plurality of neighbor distances of the production data D1 using a plurality of different distance parameters to generate a plurality of discrete convolution values. In this embodiment, a distance parameter may be, for example, a distance value or a ratio set by the user. The distance parameters indicate the forward and backward distances extending from each record of the production data D1 taken as the origin, or the radii extending from each record taken as the center of a circle.

That is, the filter module 322 treats each record of the production data D1 as a detection point and extends fixed range intervals (defined by the distance parameters) around each detection point. Such an interval may be, for example, a range of plus and minus a given distance around the detection point, one interval per distance parameter. The filter module 322 calculates the neighbor distance within each interval and takes the result as the characteristic value (i.e., the discrete convolution value) of that detection point.
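The text does not fix how the neighbor distances in each interval are reduced to one characteristic value, so the following is only one hypothetical reading: each record is a detection point, each distance parameter is a radius in value space, the number of other records inside each radius is counted, and the counts are summed. The function name `discrete_convolution_values` is illustrative.

```python
from typing import List, Sequence


def discrete_convolution_values(values: Sequence[float],
                                radii: Sequence[float]) -> List[int]:
    """Hypothetical characteristic value per detection point.

    For every distance parameter r, count the other records lying within +/- r
    of the detection point, then sum those counts over all distance parameters.
    """
    features = []
    for i, center in enumerate(values):
        total = 0
        for r in radii:
            total += sum(1 for j, v in enumerate(values)
                         if j != i and abs(v - center) <= r)
        features.append(total)
    return features


# An isolated value (50) gets no neighbors, so its characteristic value stands out.
print(discrete_convolution_values([1, 2, 3, 50], radii=[1, 2]))  # [3, 4, 3, 0]
```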

Furthermore, in the first mode, each filter module (e.g., filter module 322) calculates the mode value of the discrete convolution values. That is, the filter module 322 performs a mode calculation over the characteristic values of all detection points to obtain the mode value.

It should be noted that, unlike the conventional KNN algorithm, the KNN and mode rule in step S471 does not require training on a labeled training set. The filter module 322 can therefore analyze the production data D1 with lower complexity.

In addition, unlike an average or median calculation, the KNN and mode rule in step S471 exploits the fact that the mode is insensitive to extreme values, yielding effective feature results that are consistent with data sampling principles. The filter module 322 can thus account for valid data produced by manually adjusting the device and obtain the dominant (strongest) characteristic value, i.e., the mode value.
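The robustness argument can be checked with a few illustrative characteristic values (not taken from FIG. 5): a single extreme value pulls the average far from the dominant value, while the mode stays put. `mode_value` is a hypothetical helper built on `collections.Counter`, whose `most_common` orders equal counts by first occurrence.

```python
from collections import Counter
from statistics import mean


def mode_value(features):
    """Most frequent characteristic value; ties go to the value seen first."""
    return Counter(features).most_common(1)[0][0]


# Illustrative characteristic values: one extreme value (200) among mostly 6s.
features = [6, 6, 6, 6, 1, 1, 200]
print(mode_value(features))      # 6
print(round(mean(features), 2))  # 32.29
```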

In step S472, in the first mode, each filter module (e.g., filter module 322) determines, according to the discrete convolution values and the mode value from step S471, whether the Moore vote result is significant.

In detail, in the first mode, each filter module (e.g., filter module 322) calculates a Moore vote result for the discrete convolution values. That is, the filter module 322 runs the Boyer-Moore majority vote algorithm on the discrete convolution values to generate the Moore vote result, which indicates the majority candidate among all the discrete convolution values.

It should be noted that a mode calculation determines its result from occurrence counts, whereas the Boyer-Moore majority vote algorithm may return different results depending on the voting order when no value holds an absolute majority. Therefore, comparing the Moore vote result of step S472 with the mode value of step S471 verifies whether the mode value is sufficiently representative, which reduces the probability of misjudging the feature result of the discrete convolution values.
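The order sensitivity described above follows directly from how the Boyer-Moore majority vote works. The sketch below is the standard single-pass algorithm (without the second verification pass that is usually added to confirm a majority); the input lists are illustrative.

```python
from typing import Hashable, Optional, Sequence


def boyer_moore_vote(items: Sequence[Hashable]) -> Optional[Hashable]:
    """Boyer-Moore majority vote: one pass, O(1) extra memory.

    The candidate is guaranteed to be the majority value only if some value
    occurs in more than half of the items; otherwise it depends on the order.
    """
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


# Strict majority (5 appears 3 of 5 times): the order does not matter.
print(boyer_moore_vote([5, 1, 5, 2, 5]), boyer_moore_vote([1, 2, 5, 5, 5]))  # 5 5
# No strict majority: the same multiset {1, 1, 2, 2, 3} gives order-dependent results.
print(boyer_moore_vote([1, 1, 2, 2, 3]), boyer_moore_vote([1, 1, 2, 3, 2]))  # 3 2
```

Because the mode of both no-majority lists is 1 while the vote returns 3 or 2, a mismatch between the two signals that the mode is not clearly dominant.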

Furthermore, in the first mode, each filter module (e.g., filter module 322) either filters the production data D1 to generate filtered production data D1' according to the comparison between the mode value and the Moore vote result, or adjusts the preset number and then determines, based on the adjusted preset number, whether to generate the filtered production data D1' from the production data D1. In this embodiment, the comparison result indicates whether the mode value is the same as (i.e., matches) the Moore vote result.

Specifically, when the determination result of step S472 is no, the Moore vote result is not significant, i.e., the mode value does not match the Moore vote result, and the data cleaning system 300 continues with step S481. In step S481, in the first mode, each filter module (e.g., the filter module 322) increases the preset number according to the mismatch found in step S472, updating it to the preset number + 1. The data cleaning system 300 returns to step S432 to temporarily store and accumulate the production data D1, re-executes step S410 to obtain the next record of the production data D1, and continues with the remaining steps using the increased preset number.
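The branch of step S472 (continue to S491 on a match, otherwise S481 with the preset number + 1) can be sketched as follows. `step_s472` is a hypothetical name; it reuses a standard Boyer-Moore vote and a `Counter`-based mode whose ties go to the value seen first, which is an assumption since the text does not specify tie handling.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def boyer_moore_vote(items: Sequence[Hashable]) -> Optional[Hashable]:
    """Single-pass Boyer-Moore majority vote candidate."""
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def mode_value(items: Sequence[Hashable]) -> Hashable:
    """Most frequent value; ties go to the value encountered first."""
    return Counter(items).most_common(1)[0][0]


def step_s472(features: Sequence[Hashable], preset_number: int) -> Tuple[bool, int]:
    """Return (vote is significant, preset number to use next)."""
    if mode_value(features) == boyer_moore_vote(features):
        return True, preset_number       # match: continue with S491 to S493
    return False, preset_number + 1      # mismatch: S481, accumulate one more record


print(step_s472([2, 2, 5, 2], 3))     # (True, 3)
print(step_s472([1, 1, 2, 3, 2], 3))  # (False, 4)
```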

When the determination result of step S472 is yes, the Moore vote result is significant, i.e., the mode value matches the Moore vote result, and the data cleaning system 300 continues with steps S491 to S493. In step S491, in the first mode, each filter module (e.g., filter module 322) determines, according to the match found in step S472, whether the discrete convolution values conform to the mode value, and thereby whether to filter the production data D1.

In this embodiment, since the parameter model uses, for example, the incremental parameter property, the filter module 322 can use differential values of the discrete convolution values as the basis for determining whether the discrete convolution values conform to the mode value. Based on that determination, the filter module 322 then either filters the production data D1 or temporarily stores it.

In the present embodiment, the differential values may be, for example, forward differences between two adjacent records. They indicate whether the records of the production data D1 share the same behavior pattern (e.g., incrementing). When the differential values indicate that the records share the same behavior pattern, the records belong to the same group. In that case, the filter module 322 determines from the differential values that the production data D1 does not need to be filtered and temporarily stores (i.e., retains) each record of the production data D1 as the filtered production data D1'.

Conversely, when the differential values indicate that the records do not share the same behavior pattern, at least one record does not belong to the same group as the majority. In that case, the filter module 322 determines from the differential values that the production data D1 needs to be filtered, removes the one or more records (i.e., abnormal data) that do not belong to the majority group, and temporarily stores (i.e., retains) the result as the filtered production data D1'.
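The text does not give an exact rule for deciding group membership from forward differences, so the following is a hypothetical heuristic for an incrementing parameter, applied to raw record values as in the adjacent-record reading above. An interior record is treated as an isolated deviation (removed) when a forward difference around it breaks the increasing pattern but its two neighbors still increase relative to each other; a drop that the following records continue from (such as a manual reset) is kept. Endpoints and runs of consecutive deviations are not handled in this sketch.

```python
from typing import List, Sequence, Tuple


def forward_differences(values: Sequence[float]) -> List[float]:
    """Forward differences between adjacent records: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def split_by_group(values: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Hypothetical grouping rule; returns (kept indices, removed indices)."""
    diffs = forward_differences(values)
    removed: List[int] = []
    for i in range(1, len(values) - 1):
        # A drop into record i, or a drop right after it, breaks the increasing pattern.
        breaks_pattern = diffs[i - 1] < 0 or diffs[i] < 0
        # Neighbors still consistent with each other: record i is an isolated deviation,
        # not a new baseline that later records continue from.
        isolated = values[i - 1] <= values[i + 1]
        if breaks_pattern and isolated:
            removed.append(i)
    removed_set = set(removed)
    kept = [i for i in range(len(values)) if i not in removed_set]
    return kept, removed


print(split_by_group([10, 11, 2, 12, 13]))   # dip then recovery: ([0, 1, 3, 4], [2])
print(split_by_group([10, 11, 90, 12, 13]))  # spike: ([0, 1, 3, 4], [2])
print(split_by_group([10, 11, 2, 3, 4]))     # reset kept: ([0, 1, 2, 3, 4], [])
```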

In step S492, in the first mode, each filter module (e.g., filter module 322) switches its current operating mode to the second mode. Since the filter module 322 has filtered (or buffered) the production data D1 into the filtered production data D1' through steps S471 to S491, it operates on normal and valid data (i.e., the filtered production data D1') in the subsequent steps.

In step S493, in the second mode, each filter module (e.g., filter module 322) confirms, according to the corresponding parameter model (e.g., parameter model 323), that the filtered production data D1' is normal and valid data, which ensures the correctness of the data cleaning result. It then outputs each record of the filtered production data D1' as normal data to the application system 220.

FIG. 5 is a schematic diagram illustrating the operation of the data cleaning system of the embodiment of FIG. 3 according to the present invention. In FIG. 5, the horizontal axis represents the operating time of the device 211 (which can also be read as the order of the operating values), and the vertical axis represents the operating value of the device 211 (for example, in yield units). Referring to FIGS. 3 and 5, the details of the data cleaning method performed by the data cleaning system 300 are described below, taking the production data D1 (e.g., yield) shown in FIG. 5 as an example.

Assume that abnormal data occurs in intervals "B", "D", "F" and "H". The data in intervals "B" and "D" are invalid data to be removed, whereas the data in intervals "F" and "H" are valid data to be retained. The data in intervals "F" and "H" may result, for example, from manual adjustment of the device 211 (e.g., inserting a new work order).

In the FIG. 5 embodiment, the acquisition module 321 obtains the production data D1 of the device 211 according to the parameter settings (i.e., the parameter model method) of the incremental parameter model 323 and provides the production data D1 to the corresponding filter module 322.

First, in interval "A", the filter module 322 initially operates in the first mode and accumulates the production data D1 until the preset number is reached. The accumulated production data D1 may include the 1st to 9th records. Since the records in interval "A" change steadily and increase, comparing their mode value with the Moore vote result leads the filter module 322 to temporarily store (i.e., retain) the records in interval "A" as the filtered production data D1'. In the second mode, the filter module 322 determines that the records in interval "A" are normal data and outputs them.

In interval "B", the accumulated production data D1 may include the 10th record, which shows a sudden drop. Since the accumulated production data D1 has not reached the preset number, the filter module 322, in the first mode, temporarily stores the 10th record and accumulates further records of the production data D1.

In intervals "C" and "D", the accumulated production data D1 may include the 11th to 15th records. In the first portion of interval "E", it may include the 16th to 17th records.

At this point, the filter module 322 executes the KNN and mode rule on the accumulated production data D1 (i.e., interval "B" through the first portion of interval "E") to generate a plurality of discrete convolution values and their mode value. The discrete convolution values may be represented, for example, by the series "1,1,1,3,3,1,1" and may be regarded as the series "a, b, c, d, d, e, f". The mode value may be, for example, "d", and the filter module 322 calculates the Moore vote result for the discrete convolution values as "f".

Therefore, because the mode value ("d") of the records from interval "B" through the first portion of interval "E" does not match the Moore vote result ("f"), the filter module 322 updates the preset number to the preset number + 1, temporarily stores the records, and accumulates further records of the production data D1.

Then, in the second portion of interval "E", the accumulated production data D1 may include the 18th to 19th records. The filter module 322 executes the KNN and mode rule on the accumulated production data D1 (i.e., interval "B" through the second portion of interval "E") to generate the discrete convolution values and their mode value. The discrete convolution values may be represented, for example, by the series "a, b, c, d, d, e, f, d, d". The mode value may be, for example, "d", and the filter module 322 calculates the Moore vote result for the discrete convolution values as "d".

Thus, because the mode value ("d") of the records from interval "B" through the second portion of interval "E" matches the Moore vote result ("d"), the filter module 322 calculates forward differences of the discrete convolution values. These forward differences indicate that the 10th record (interval "B") and the 15th record (interval "D") do not belong to the same group as the other records.

The filter module 322 then removes the 10th record (interval "B") and the 15th record (interval "D") according to the forward differences, keeping the other records as the filtered production data D1'. In the second mode, the filter module 322 determines that the remaining records, with intervals "B" and "D" removed, are normal data and outputs them.

In the third portion of interval "E", the accumulated production data D1 may include the 19th to 27th records. Since the records in this portion change steadily and increase, comparing their mode value with the Moore vote result leads the filter module 322 to temporarily store (i.e., retain) them as the filtered production data D1'. In the second mode, the filter module 322 determines that the records in the third portion of interval "E" are normal data and outputs them.

In interval "F", the accumulated production data D1 may include the 28th record, which shows a sudden drop. Since the accumulated production data D1 has not reached the preset number, the filter module 322, in the first mode, temporarily stores the 28th record and accumulates further records of the production data D1.

In the first portion of interval "G", the accumulated production data D1 may include the 29th to 34th records. At this point, the filter module 322 executes the KNN and mode rule on the accumulated production data D1 (i.e., interval "F" through the first portion of interval "G") to generate the discrete convolution values and their mode value. The discrete convolution values may be represented, for example, by the series "1,6,6,6,6,6,6" and may be regarded as the series "a, b, b, b, b, b, b". The mode value may be, for example, "b", and the filter module 322 calculates the Moore vote result for the discrete convolution values as "b".

Thus, because the mode value ("b") of the records from interval "F" through the first portion of interval "G" matches the Moore vote result ("b"), the filter module 322 calculates forward differences of the discrete convolution values. These forward differences indicate that the 28th record (interval "F") belongs to the same group as the other records. That is, the filter module 322 determines that the 28th record (interval "F") results from a manual adjustment and is valid data that does not need to be filtered.

The filter module 322 therefore temporarily stores (i.e., retains) the 28th record (interval "F") and the 29th to 34th records (the first portion of interval "G") according to the forward differences. In the second mode, the filter module 322 determines that the records in interval "F" and the first portion of interval "G" are normal data and outputs them.
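The mode-versus-vote comparisons in the FIG. 5 walkthrough can be reproduced from the symbolic series given in the text. This sketch only re-checks those three comparisons; it assumes a standard single-pass Boyer-Moore vote and a `Counter`-based mode, and does not model how the series themselves were derived.

```python
from collections import Counter


def boyer_moore_vote(items):
    """Single-pass Boyer-Moore majority vote candidate."""
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def mode_value(items):
    """Most frequent value; ties go to the value encountered first."""
    return Counter(items).most_common(1)[0][0]


# Symbolic discrete convolution series copied from the FIG. 5 walkthrough.
series = {
    "B to E (first portion)": ["a", "b", "c", "d", "d", "e", "f"],
    "B to E (second portion)": ["a", "b", "c", "d", "d", "e", "f", "d", "d"],
    "F to G (first portion)": ["a", "b", "b", "b", "b", "b", "b"],
}
for name, feats in series.items():
    mode, vote = mode_value(feats), boyer_moore_vote(feats)
    print(f"{name}: mode={mode}, vote={vote}, match={mode == vote}")
# B to E (first portion): mode=d, vote=f, match=False
# B to E (second portion): mode=d, vote=d, match=True
# F to G (first portion): mode=b, vote=b, match=True
```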

In the second portion of interval "G", the accumulated production data D1 may include the 35th to 37th records. Since these records change steadily and increase, comparing their mode value with the Moore vote result leads the filter module 322 to temporarily store (i.e., retain) them as the filtered production data D1'. In the second mode, the filter module 322 determines that the records in the second portion of interval "G" are normal data and outputs them.

In interval "H", the accumulated production data D1 may include the 38th record; the filter module 322 handles interval "H" in the same way as interval "F". In interval "I", the accumulated production data D1 may include the 39th to 44th records; the filter module 322 handles interval "I" in the same way as the first portion of interval "G".

In summary, with the data cleaning system and data cleaning method of the present invention, the filter modules operate adaptively in different modes using only a small amount of production data, which improves the efficiency of the data cleaning process. Because the filter modules execute the KNN and mode rule, the data cleaning system avoids a model training process and the additional cost of manually labeling training sets and building models. In addition, the KNN and mode rule lets the data cleaning system filter abnormal data effectively, improving the accuracy of the data cleaning result. The application system can therefore provide various application services based on the data cleaning result with fewer errors. In some implementations, by having the filter modules perform the Boyer-Moore majority vote algorithm and the comparison calculation, the data cleaning system also reduces the complexity of analyzing production data, thereby lowering the specification requirements and cost of the hardware in the data cleaning system and the machine network.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or equivalent substitutions can be made to some or all of the technical features; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A data cleansing system, comprising:

a memory storing a plurality of modules; and

a processor coupled to the memory, a workshop system, and an application system and executing the plurality of modules, wherein the plurality of modules includes a plurality of acquisition modules and a plurality of filtering modules corresponding to a plurality of parametric models,

wherein each of the plurality of acquisition modules acquires production data in the workshop system,

wherein in a first mode, each of the plurality of filtering modules determines whether to generate filtered production data based on a preset number of the production data,

wherein in a second mode, each of the plurality of filtering modules switches to the first mode or outputs the filtered production data to the application system according to the corresponding parametric model and the filtered production data.

2. The data cleansing system of claim 1, wherein in the first mode, each of the plurality of filtering modules accumulates the production data when the number of production data is less than the preset number.

3. The data cleansing system of claim 1, wherein in the first mode, when the number of production data is not less than the preset number, each of the plurality of filtering modules calculates a plurality of neighbor distances of the production data according to a plurality of different distance parameters to generate a plurality of discrete convolution values, and calculates a mode value of the plurality of discrete convolution values.

4. The data cleansing system of claim 3, wherein in the first mode, each of the plurality of filtering modules calculates a Moore vote result for the plurality of discrete convolution values, and either filters the production data according to a comparison result between the mode value and the Moore vote result to generate the filtered production data, or adjusts the preset number and determines whether to generate the filtered production data according to the adjusted preset number.

5. The data cleansing system of claim 4, wherein in the first mode, when the mode value matches the Moore vote result, each of the plurality of filtering modules determines, according to the comparison result, whether the plurality of discrete convolution values conform to the mode value, so as to determine whether to filter the production data.

6. The data cleansing system of claim 4, wherein in the first mode, when the mode value does not match the Moore vote result, each of the plurality of filtering modules increases the preset number according to the comparison result.

7. A method of data cleansing, wherein a plurality of modules in a memory are executed by a processor, wherein the plurality of modules includes a plurality of acquisition modules and a plurality of filtering modules corresponding to a plurality of parametric models, comprising:

acquiring production data in a workshop system through each of the plurality of acquisition modules;

determining, by each of the plurality of filtering modules in a first mode, whether to generate filtered production data according to a preset number of the production data; and

switching, by each of the plurality of filtering modules in a second mode, to the first mode, or outputting the filtered production data to an application system, according to the corresponding parametric model and the filtered production data.

8. The data cleansing method of claim 7 wherein the step of executing, by the processor, the plurality of modules in the memory further comprises:

accumulating, by each of the plurality of filtering modules in the first mode, the production data when the number of production data is less than the preset number.

9. The data cleansing method of claim 7 wherein the step of executing, by the processor, the plurality of modules in the memory further comprises:

calculating, by each of the plurality of filtering modules in the first mode, when the number of production data is not less than the preset number, a plurality of neighbor distances of the production data according to a plurality of different distance parameters to generate a plurality of discrete convolution values, and calculating a mode value of the plurality of discrete convolution values.

10. The data cleansing method of claim 9 wherein the step of executing, by the processor, the plurality of modules in the memory further comprises:

calculating, by each of the plurality of filtering modules in the first mode, a Moore vote result for the plurality of discrete convolution values; and

in the first mode, filtering the production data according to a comparison result between the mode value and the Moore vote result to generate the filtered production data, or adjusting the preset number and determining whether to generate the filtered production data according to the adjusted preset number.

11. The data cleansing method of claim 10 wherein the step of executing, by the processor, the plurality of modules in the memory further comprises:

in the first mode, when the mode value matches the Moore vote result, determining, according to the comparison result, whether the plurality of discrete convolution values conform to the mode value, so as to determine whether to filter the production data.

12. The data cleansing method of claim 10 wherein the step of executing, by the processor, the plurality of modules in the memory further comprises:

in the first mode, when the mode value does not match the Moore vote result, increasing, by each of the plurality of filtering modules, the preset number according to the comparison result.