CN114036144A - Data cleaning method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114036144A (application CN202111325874.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- result
- processed
- cleaning
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
- G06F16/287—Visualization; Browsing
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Stored Programmes (AREA)
Abstract
The application discloses a method and a device for data cleaning processing, electronic equipment and a storage medium, wherein the method for data cleaning processing comprises the following steps: acquiring data of an original data source; converting the data of the original data source into data to be processed with a uniform caliber according to a preset conversion rule; cleaning the data to be processed according to the preset logic flow arrangement and data logic configuration through a flow calculation model to obtain result data; analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on preconfigured alarm information and verification information; if the result data are analyzed to be free of data which do not meet the corresponding expected results, the result data are output to a target table; and visually displaying the result data in the target table. Therefore, a simple and accurate data cleaning processing method is realized.
Description
Technical Field
The present disclosure relates to the field of data cleaning technologies, and in particular, to a method and an apparatus for cleaning data, an electronic device, and a storage medium.
Background
In actual industrial production, many industries collect production information from equipment, which generates a large amount of data in different structural forms. This data often needs further cleaning, such as eliminating redundant data and supplementing missing data, and the cleaned data can then be displayed visually to meet the requirements of enterprise data analysis and decision making.
The existing method for cleaning data mainly configures corresponding data cleaning rules for different data sources, and compiles corresponding scripts based on the corresponding data cleaning rules to realize cleaning processing of data.
However, the cleaning process in the existing approach is a single integral process: if a problem arises, the whole cleaning process has to be reconfigured and compiled again, which makes it too cumbersome. Moreover, configuring different data rules for different data sources not only costs a lot of time, but is also quite inconvenient when the data rules need to be adjusted.
Disclosure of Invention
Based on the defects of the prior art, the application provides a method and a device for data cleaning processing, electronic equipment and a storage medium, so as to solve the problem that the existing mode is too complicated.
In order to achieve the above object, the present application provides the following technical solutions:
a first aspect of the present application provides a method for data cleaning processing, including:
acquiring data of an original data source;
converting the data of the original data source into data to be processed with a uniform caliber according to a preset conversion rule;
cleaning the data to be processed according to the preset logic flow arrangement and data logic configuration through a flow calculation model to obtain result data;
analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on preconfigured alarm information and verification information;
if the result data are analyzed to be free of data which do not meet the corresponding expected results, the result data are output to a target table;
and visually displaying the result data in the target table.
Optionally, in the method of data cleansing processing described above, the acquiring data of the original data source includes:
collecting data of a plurality of data sources;
and storing the data of each data source according to a preset storage format to obtain a plurality of original data sources.
Optionally, in the method for data cleaning processing, the converting the data of the original data source into the to-be-processed data with a uniform caliber according to a preset conversion rule includes:
screening the original data source to be processed from a plurality of original data sources;
determining the data type of the data of the original data source to be processed according to the metadata of the data of the original data source to be processed;
and converting the data type of the data of each original data source to be processed according to the pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source to obtain the data to be processed with the uniform caliber.
Optionally, in the method for data cleaning processing, after the converting the data of the original data source into the to-be-processed data with a uniform caliber according to a preset conversion rule, the method further includes:
and carrying out local persistence processing on the data to be processed.
Optionally, in the method for data cleaning processing, before the cleaning the data to be processed according to the preset logic flow arrangement and data logic configuration through the flow calculation model, the method further includes:
configuring the logic flow arrangement, the data logic configuration, and the alarm information of the flow calculation model in response to a configuration operation by a user;
triggering any flow node of the flow calculation model, and cleaning the debugging data according to the preset logic flow arrangement and data logic configuration to obtain the cleaning result of any flow node;
displaying the cleaning result of any flow node, and analyzing whether the cleaning result of any flow node meets a corresponding expected result or not based on the alarm information and the verification information;
and if the cleaning result of any flow node does not meet the corresponding expected result, generating alarm prompt information based on the cleaning result of any flow node, and visualizing the alarm prompt information.
A second aspect of the present application provides an apparatus for data cleaning processing, including:
the acquisition unit is used for acquiring data of an original data source;
the conversion unit is used for converting the data of the original data source into the data to be processed with the uniform caliber according to a preset conversion rule;
the data processing model is used for cleaning the data to be processed through the flow calculation model according to the preset logic flow arrangement and data logic configuration to obtain result data;
the data monitoring unit is used for analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on preconfigured alarm information and verification information;
the output unit is used for outputting the result data to a target table if the result data is analyzed to be free of data which does not meet the corresponding expected result;
and the visualization unit is used for visually displaying the result data in the target table.
Optionally, in the above apparatus for data cleaning processing, the acquiring unit includes:
the collection unit is used for collecting data of a plurality of data sources;
and the storage unit is used for storing the data of each data source according to a preset storage format to obtain a plurality of original data sources.
Optionally, in the above apparatus for data cleansing processing, the converting unit includes:
the screening unit is used for screening the original data source to be processed from the plurality of original data sources;
the type determining unit is used for determining the data type of the data of the original data source to be processed according to the metadata of the data of the original data source to be processed;
and the conversion subunit is used for converting the data type of the data of each original data source to be processed according to the pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source to obtain the data to be processed with the uniform caliber.
Optionally, the apparatus for data cleansing processing described above further includes:
and the persistence unit is used for carrying out local persistence processing on the data to be processed.
Optionally, the apparatus for data cleansing processing described above further includes:
a configuration unit, configured to configure the logic flow arrangement, the data logic configuration, and the alarm information of the flow calculation model in response to a configuration operation of a user;
the debugging unit is used for triggering any flow node of the flow calculation model, cleaning the debugging data according to the preset logic flow arrangement and data logic configuration, and obtaining the cleaning result of any flow node;
the analysis unit is used for displaying the cleaning result of any flow node and analyzing whether the cleaning result of any flow node meets a corresponding expected result or not based on the alarm information and the verification information;
and the prompting unit is used for generating alarm prompting information based on the cleaning result of any flow node and visualizing the alarm prompting information if the cleaning result of any flow node does not meet the corresponding expected result.
A third aspect of the present application provides an electronic device comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the processor is specifically configured to implement the method of data cleansing processing according to any one of the above items.
A fourth aspect of the present application provides a computer storage medium storing a computer program which, when executed, implements the method of data cleansing processing according to any one of the above items.
According to the data cleaning processing method, the data of the original data source is obtained, and then the data of the original data source is converted into the data to be processed with the uniform caliber according to the preset conversion rule, so that the process of analyzing the data type in the program execution process of the data is simplified. And cleaning the data to be processed through a flow calculation model according to the preset logic flow arrangement and data logic configuration to obtain result data. Therefore, data cleaning is achieved through flow calculation, the flow calculation is divided into flow nodes to clean the data, and adjustment can be performed only on a certain flow node. And then, analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on the pre-configured alarm information and the check information. And if the data which do not meet the corresponding expected result do not exist in the analyzed result data, outputting the result data to a target table, thereby ensuring the accuracy of the result. And finally, visually displaying the result data in the target table. Therefore, a simple and accurate data cleaning processing method is realized.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method of data cleansing processing according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for acquiring data of an original data source according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for converting data according to an embodiment of the present application;
Fig. 4 is a flowchart of a configuration method of a flow calculation model according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an apparatus for data cleaning processing according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
An embodiment of the present application provides a method for data cleaning processing, as shown in fig. 1, including the following steps:
S101, acquiring data of an original data source.
Specifically, in the embodiment of the present application, the data source is accessed first, and the data of the data source is then processed.
Optionally, the data of one or more original data sources may be collected by a collector or entered manually. The format of the data of each original data source may be different.
As shown in fig. 2, one embodiment of step S101 includes the following steps:
S201, collecting data of a plurality of data sources.
In particular, the respective data may be collected from a plurality of data sources that generate the data.
S202, storing the data of each data source according to a preset storage format to obtain a plurality of original data sources.
The data of each data source is stored according to a preset storage format, for example, data of the same type is stored in the same field, which facilitates the subsequent data type mapping and other processing.
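Steps S201 and S202 can be illustrated with a minimal sketch. It assumes that the "preset storage format" is a fixed set of field names and that each data source has its own field-name mapping; the names `PRESET_FIELDS`, `SOURCE_FIELD_MAPS` and `store_in_preset_format` are hypothetical and do not come from the application.

```python
from typing import Any

# Hypothetical preset storage format: every stored record uses these fields.
PRESET_FIELDS = ("device_id", "timestamp", "value")

# Hypothetical per-source mapping from source field names to preset fields.
SOURCE_FIELD_MAPS = {
    "collector_a": {"dev": "device_id", "ts": "timestamp", "val": "value"},
    "manual_entry": {"device": "device_id", "time": "timestamp", "reading": "value"},
}


def store_in_preset_format(source: str, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rename each record's fields so data of the same kind lands in the same field."""
    field_map = SOURCE_FIELD_MAPS[source]
    stored = []
    for record in records:
        row = {field: None for field in PRESET_FIELDS}  # absent fields stay None
        for src_name, value in record.items():
            if src_name in field_map:
                row[field_map[src_name]] = value
        stored.append(row)
    return stored


# S201: data collected from several sources (sample values are made up).
raw_sources = {
    "collector_a": [{"dev": "pump-1", "ts": "2021-11-10 08:00:00", "val": "3.5"}],
    "manual_entry": [{"device": "pump-2", "time": "2021-11-10 08:05:00"}],
}

# S202: one original data source per input source, all in the preset format.
original_data_sources = {
    name: store_in_preset_format(name, rows) for name, rows in raw_sources.items()
}
```

Keeping absent fields as `None` leaves missing values visible for the later cleaning step instead of silently dropping them.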
S102, converting the data of the original data source into the data to be processed with the uniform caliber according to a preset conversion rule.
It should be noted that the formats of data from different original data sources, or even from the same data source, may not be uniform, which hinders uniform processing. Therefore, in the embodiment of the present application, a conversion rule is set in advance, and the data of the original data source is converted into to-be-processed data with a uniform caliber according to this preset conversion rule.
Specifically, the preset conversion rules may include conversion rules corresponding to different types of data, so as to convert the different types of data respectively.
Optionally, in another embodiment of the present application, after the step S102 is executed, the following steps may be further executed: and carrying out local persistence processing on the data to be processed.
In the embodiment of the present application, accessing the data source comprises four parts: acquisition, storage, access and local persistence of the data source.
Optionally, in another embodiment of the present application, there are a plurality of original data sources. Accordingly, an implementation of step S102 in this embodiment, as shown in Fig. 3, includes:
S301, screening out the original data sources to be processed from the plurality of original data sources.
It should be noted that, to avoid having to connect different data sources separately each time, all the original data sources are usually connected in advance. However, the data that actually needs to be processed differs according to business requirements and the like, so the original data sources to be processed need to be screened out from the plurality of original data sources according to those requirements.
S302, determining the data type of the data of the original data source to be processed according to the metadata of the data of the original data source to be processed.
Metadata is data that describes data, that is, descriptive information about data and information resources. In the embodiment of the application, data is converted according to its data type, so the data type of the data of the original data source to be processed is first determined from the metadata of that data.
S303, converting the data type of the data of each original data source to be processed according to the pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source to obtain the data to be processed with the uniform caliber.
In the embodiment of the application, the mapping relationship between each field of each original data source and the data type of each field of the target data source is established in advance. The target data source refers to a data source for storing the generated data to be processed.
Through the established mapping relationship, data in different recording formats or on different storage media is processed by the conversion rules to form a data source with a uniform caliber.
Optionally, for structured data, such as data from MySQL or Oracle, a data type conversion rule is employed, such as the conversion of MySQL data types in Table 1.
TABLE 1
Data type of original data source | Transformation rules | Data type of target data source |
char | … | String |
Date | … | Date |
BOOL | … | Boolean |
For semi-structured data, such as JSON, XML and the like, a visual operation interface is adopted to carry out serialized display on the semi-structured data, and the data type of the data of the target data source and the data type of the data of the original data source are mapped through the specification of a user on the data type of the data field.
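The type mapping of steps S302 and S303 can be sketched as follows. The conversion rules themselves are elided in Table 1, so the converter functions below are assumptions made only for illustration, as are the names `TYPE_RULES` and `convert_record` and the `%Y-%m-%d` date format.

```python
from datetime import date, datetime


def to_string(value):
    return str(value)


def to_date(value):
    # Assumed rule: accept date objects as-is, parse ISO-style strings otherwise.
    return value if isinstance(value, date) else datetime.strptime(value, "%Y-%m-%d").date()


def to_boolean(value):
    # Assumed rule: treat common truthy strings and non-zero numbers as True.
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes"}
    return bool(value)


# Source data type -> (target data type, hypothetical conversion rule).
TYPE_RULES = {
    "char": ("String", to_string),
    "Date": ("Date", to_date),
    "BOOL": ("Boolean", to_boolean),
}


def convert_record(record, source_field_types):
    """Convert one record using the per-field source types taken from metadata."""
    converted = {}
    for field, value in record.items():
        _target_type, rule = TYPE_RULES[source_field_types[field]]
        converted[field] = None if value is None else rule(value)
    return converted


# Field types as they might be read from metadata (S302); values are made up.
metadata = {"name": "char", "installed": "Date", "active": "BOOL"}
record = {"name": "pump-1", "installed": "2021-11-10", "active": 1}
result = convert_record(record, metadata)
```

For semi-structured data, the same function could be driven by field types that the user specifies in the interface rather than by types read from database metadata.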
S103, cleaning the data to be processed through the flow calculation model according to the preset logic flow arrangement and data logic configuration to obtain result data.
It should be noted that, in the embodiment of the present application, the flow calculation model is configured in advance, and the configured flow calculation model then performs flow calculation processing on the data to be processed. Specifically, the logic flow arrangement, the data logic configuration and the alarm information need to be configured.
It should be noted that, when the flow calculation model cleans the data to be processed, the data usually passes through multiple process steps, such as filtering, sorting and missing value processing. Therefore, to ensure the accuracy of the result, each flow node of the flow calculation model may also be debugged when the model is configured. That is, in the embodiment of the present application, the flow calculation model processing includes data logic flow arrangement, data logic configuration, alarm information configuration, and data flow debugging.
Optionally, a method for configuring a flow computation model provided in another embodiment of the present application is shown in fig. 4, and includes:
S401, in response to a configuration operation of a user, configuring the logic flow arrangement, the data logic configuration and the alarm information of the flow calculation model.
The data logic configuration defines each processing rule applied to the data to be processed. The data flow arrangement refers to the flow through which a target data source with a uniform caliber passes, following a series of logic settings made by the user, so as to achieve the purpose of data cleaning. It specifically comprises three parts: data input, data rule processing arrangement and data output.
Data input refers to selecting a target data source with a uniform caliber that has been accessed and processed according to its metadata, performing field name deduplication on the input target data source, and passing it on to the specified process step, where the data to be processed is handled with the corresponding processing rule. Optionally, the input target data source may be one or more target data sources of different types.
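The application does not say how field name deduplication is performed. One possible reading, sketched here purely as an assumption, is that a field name already used by an earlier input source is prefixed with the name of the later source; `dedupe_field_names` is a hypothetical helper.

```python
def dedupe_field_names(sources: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Return, per source, a mapping from original field name to a unique name."""
    seen: set[str] = set()
    renamed: dict[str, dict[str, str]] = {}
    for source, fields in sources.items():
        renamed[source] = {}
        for field in fields:
            # Keep the first occurrence; prefix later clashes with the source name.
            unique = field if field not in seen else f"{source}_{field}"
            seen.add(unique)
            renamed[source][field] = unique
    return renamed


# Two hypothetical input target data sources that both have an "id" field.
mapping = dedupe_field_names({"orders": ["id", "amount"], "devices": ["id", "status"]})
```

The sketch does not handle the case where a prefixed name collides with an existing field; a fuller version would need to check for that.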
It should be noted that data rule processing refers to cleaning the input data to be processed according to a given processing rule and forwarding the data to the next flow node for processing. A processing rule is a way of cleaning data, such as filtering, sorting or missing value processing.
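The node-by-node cleaning described above can be sketched as an ordered list of small functions, each of which receives the records and forwards its output to the next node. The node names, the sample rules and `run_flow` are illustrative assumptions, not the configuration format of the application.

```python
from typing import Callable

Record = dict
Node = Callable[[list[Record]], list[Record]]


def filter_node(records: list[Record]) -> list[Record]:
    # Filtering rule (assumed): drop records without a device identifier.
    return [r for r in records if r.get("device_id")]


def fill_missing_node(records: list[Record]) -> list[Record]:
    # Missing value rule (assumed): replace a missing value with 0.0.
    return [{**r, "value": 0.0 if r.get("value") is None else r["value"]} for r in records]


def sort_node(records: list[Record]) -> list[Record]:
    # Sorting rule (assumed): order records by timestamp.
    return sorted(records, key=lambda r: r["timestamp"])


# The logic flow arrangement: an ordered list of named flow nodes.
FLOW: list[tuple[str, Node]] = [
    ("filter", filter_node),
    ("fill_missing", fill_missing_node),
    ("sort", sort_node),
]


def run_flow(records: list[Record], flow=FLOW):
    """Run every node in order and keep each node's output for inspection."""
    outputs = {}
    for name, node in flow:
        records = node(records)
        outputs[name] = records
    return records, outputs


data = [
    {"device_id": "pump-2", "timestamp": "2021-11-10 08:05", "value": None},
    {"device_id": "", "timestamp": "2021-11-10 08:01", "value": 1.0},
    {"device_id": "pump-1", "timestamp": "2021-11-10 08:00", "value": 3.5},
]
result, per_node = run_flow(data)
```

Because each node is an independent entry in `FLOW`, a faulty rule can be replaced without touching the other nodes, which is the adjustment property the application attributes to splitting the cleaning into flow nodes.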
The alarm information configuration is mainly used for data quality and safety control: the data results produced from the input data to be processed by the data logic flow and the logic configuration are checked using the configured alarm information and the data verification mechanism in the system.
S402, triggering any flow node of the flow calculation model, cleaning debugging data according to the preset logic flow arrangement and data logic configuration, obtaining the cleaning result of the flow node and displaying the cleaning result of the flow node.
Specifically, during data flow debugging, triggering the debugging operation of any node processes the debugging data in real time. The cleaning result of the flow node can be returned in real time through WebSocket, and the returned result is then displayed visually to the debugging personnel.
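Debugging a single flow node can be sketched as running only that node on debugging data and packaging its output as a message. The WebSocket transport is left out here; the JSON message shape, the `NODES` table and `debug_node` are assumptions for illustration.

```python
import json

# Hypothetical flow nodes; each takes and returns a list of records.
NODES = {
    "filter": lambda rows: [r for r in rows if r["value"] is not None],
    "sort": lambda rows: sorted(rows, key=lambda r: r["value"]),
}


def debug_node(node_name: str, debug_rows: list[dict]) -> str:
    """Run one node on debugging data and build the message a server could push to the UI."""
    result = NODES[node_name](debug_rows)
    return json.dumps({"node": node_name, "rows": result})


# Trigger only the "filter" node with made-up debugging data.
message = debug_node("filter", [{"value": 2}, {"value": None}, {"value": 1}])
```

In a real system this string would be sent over the open WebSocket connection to the debugging page, which would render the rows for the debugging personnel.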
S403, analyzing whether the cleaning result of any flow node meets the corresponding expected result or not based on the alarm information and the verification information.
The alarm information comprises a user-defined threshold, an upper limit, a lower limit, precision and the like. The verification information comprises rules such as verification methods.
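A check against such alarm information might look like the sketch below. It assumes that "precision" means a maximum number of decimal places and that the limits apply to a single numeric field; `AlarmRule` and `check_rows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AlarmRule:
    field: str
    lower: float    # lower limit (inclusive)
    upper: float    # upper limit (inclusive)
    precision: int  # assumed meaning: maximum number of decimal places


def check_rows(rows: list[dict], rule: AlarmRule) -> list[tuple[int, str]]:
    """Return (row index, reason) for every row that fails the rule."""
    problems = []
    for index, row in enumerate(rows):
        value = row.get(rule.field)
        if value is None:
            problems.append((index, "missing value"))
        elif not rule.lower <= value <= rule.upper:
            problems.append((index, "out of range"))
        elif round(value, rule.precision) != value:
            problems.append((index, "too many decimal places"))
    return problems


rule = AlarmRule(field="value", lower=0.0, upper=100.0, precision=2)
problems = check_rows(
    [{"value": 3.5}, {"value": 120.0}, {"value": 3.14159}, {"value": None}],
    rule,
)
```

An empty list means the cleaning result meets the expected result; a non-empty list carries the specific error data that an alarm prompt could show to the user.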
Optionally, in order to make the user know the current alarm information and the verification information, the alarm information and the verification information may be displayed on a user interface.
If the cleaning result of any flow node does not meet the corresponding expected result, step S404 is executed.
S404, generating alarm prompt information based on the cleaning result of any flow node, and visualizing the alarm prompt information.
Optionally, the alarm prompt message may include the analysis result in step S403, specific error data, and the like, so that the user can accurately know the existing problem and adjust the process node. After the flow node is adjusted, subsequent debugging can be continued from the flow node until the final result achieves the expected effect.
Optionally, the result data obtained by executing step S103 may include only the final cleaning result, that is, only the output of the last flow node, or may also include the output of each flow node.
S104, analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on the pre-configured alarm information and the check information.
In order to ensure the accuracy of the finally displayed result data, whether data which do not meet the corresponding expected result exists in the result data or not needs to be analyzed based on the preset alarm information and the check information.
If it is determined that no data failing to meet the corresponding expected result exists in the result data, step S105 is executed. If such data is found in the result data, the flow nodes that do not meet the corresponding expected results are adjusted in the manner shown in Fig. 4.
S105, outputting the result data to a target table.
Specifically, the result data is output to the target table to be stored according to a specified format, so that charts with different formats can be generated and displayed according to requirements.
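Steps S104 and S105 together act as a gate: result data reaches the target table only when the check finds nothing unexpected. The sketch below uses an in-memory SQLite table as a stand-in for the target table; the table name, its columns and the range check are assumptions.

```python
import sqlite3


def has_unexpected(rows: list[dict], lower: float = 0.0, upper: float = 100.0) -> bool:
    """Assumed check: a row is unexpected if its value is missing or out of range."""
    return any(r["value"] is None or not lower <= r["value"] <= upper for r in rows)


def output_to_target_table(conn: sqlite3.Connection, rows: list[dict]) -> bool:
    """Write rows to the target table only if every row passes the check."""
    if has_unexpected(rows):
        return False  # caller would go back and adjust the failing flow node
    conn.execute("CREATE TABLE IF NOT EXISTS target (device_id TEXT, value REAL)")
    conn.executemany("INSERT INTO target VALUES (:device_id, :value)", rows)
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
ok = output_to_target_table(conn, [{"device_id": "pump-1", "value": 3.5}])
rejected = output_to_target_table(conn, [{"device_id": "pump-2", "value": 250.0}])
stored = conn.execute("SELECT device_id, value FROM target").fetchall()
```

Rejecting the whole batch rather than only the failing rows mirrors the text, which outputs the result data only when no unexpected data exists in it.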
S106, visually displaying the result data in the target table.
Optionally, the result data in the target table is visually displayed, specifically, the result data may be visualized in a form of a report, a table, a chart, or the like.
According to the data cleaning processing method provided by the embodiment of the application, the data of the original data source is obtained, and then the data of the original data source is converted into the data to be processed with the uniform caliber according to the preset conversion rule, so that corresponding data rules do not need to be configured for different data sources, and the configuration process is simplified. And cleaning the data to be processed through a flow calculation model according to the preset logic flow arrangement and data logic configuration to obtain result data. Therefore, data cleaning is achieved through flow calculation, the flow calculation is divided into flow nodes to clean the data, and adjustment can be performed only on a certain flow node. And then, analyzing whether data which do not meet the corresponding expected result exist in the result data or not based on the pre-configured alarm information and the check information. And if the data which do not meet the corresponding expected result do not exist in the analyzed result data, outputting the result data to a target table, thereby ensuring the accuracy of the result. And finally, visually displaying the result data in the target table. Therefore, a simple, convenient and accurate data cleaning processing method is realized.
Another embodiment of the present application provides an apparatus for data cleaning processing, as shown in fig. 5, including:
An obtaining unit 501, configured to obtain data of an original data source.
A conversion unit 502, configured to convert the data of the original data source into data to be processed with a uniform caliber according to a preset conversion rule.
A data processing model 503, configured to clean the data to be processed through a stream computation model according to the preset logic flow arrangement and data logic configuration, so as to obtain result data.
A data monitoring unit 504, configured to analyze, based on the preconfigured alarm information and verification information, whether data that does not meet a corresponding expected result exists in the result data.
An output unit 505, configured to output the result data to a target table if the analysis finds no data in the result data that fails to meet the corresponding expected result.
A visualization unit 506, configured to visually display the result data in the target table.
Optionally, in an apparatus for data cleaning processing provided in another embodiment of the present application, the obtaining unit includes:
An acquisition unit, configured to collect data of a plurality of data sources.
A storage unit, configured to store the data of each data source according to a preset storage format to obtain a plurality of original data sources.
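As a rough illustration of collecting from several data sources and storing each one in a preset storage format, the sketch below reads one CSV payload and one JSON payload and stores both as JSON Lines text. The source names, the payloads, the choice of JSON Lines as the "preset storage format", and all helper names are assumptions made for this example only.

```python
import csv
import io
import json


def collect_csv(text):
    """Collect records from a CSV payload (all values arrive as strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def collect_json(text):
    """Collect records from a JSON array payload."""
    return json.loads(text)


def store_as_json_lines(records):
    """Store records in the assumed preset format: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Hypothetical incoming data sources, each with its own collector.
incoming = {
    "crm_export": (collect_csv, "id,name\n1,Ann\n2,Bob\n"),
    "billing_api": (collect_json, '[{"id": 3, "name": "Cho"}]'),
}

# Each data source becomes one "original data source" in the common format.
original_sources = {
    name: store_as_json_lines(collect(payload))
    for name, (collect, payload) in incoming.items()
}
```

Note that the stored records still carry source-specific types (the CSV ids are strings, the JSON id is an integer); unifying them is the job of the conversion unit, not of this step.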
Optionally, in an apparatus for data cleaning processing provided in another embodiment of the present application, the conversion unit includes:
A screening unit, configured to screen the original data sources to be processed from the plurality of original data sources.
A type determining unit, configured to determine the data type of the data of each original data source to be processed according to the metadata of that data.
A conversion subunit, configured to convert the data type of the data of each original data source to be processed according to a pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source, so as to obtain the data to be processed with the uniform caliber.
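The type determination and field-level conversion described above might look like the following sketch. It is only an illustration under assumptions: the source names, field names, type labels, the `SOURCE_METADATA` and `TARGET_SCHEMA` dictionaries, and the `CASTS` table are all hypothetical stand-ins for the metadata and the pre-established mapping relationship.

```python
from datetime import date

# Hypothetical metadata: the data type of each field in each original source.
SOURCE_METADATA = {
    "crm": {"customer_id": "string", "signup": "string", "balance": "string"},
    "billing": {"customer_id": "int", "signup": "string", "balance": "int"},
}

# Hypothetical target data source: the data type each field must end up with.
TARGET_SCHEMA = {"customer_id": "int", "signup": "date", "balance": "float"}

# Mapping from (source type, target type) to the conversion to apply.
CASTS = {
    ("string", "int"): int,
    ("string", "float"): float,
    ("string", "date"): date.fromisoformat,
    ("int", "int"): lambda value: value,
    ("int", "float"): float,
}


def convert_record(source, record):
    """Convert every field of a record to the target type for that field."""
    metadata = SOURCE_METADATA[source]
    return {
        field: CASTS[(metadata[field], TARGET_SCHEMA[field])](value)
        for field, value in record.items()
    }


crm_out = convert_record(
    "crm", {"customer_id": "42", "signup": "2021-11-10", "balance": "12.0"}
)
billing_out = convert_record(
    "billing", {"customer_id": 42, "signup": "2021-11-10", "balance": 12}
)
```

After conversion, records from both sources share one set of field types, so a single set of cleaning rules can be applied to them downstream.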
Optionally, in an apparatus for data cleaning processing provided in another embodiment of the present application, the apparatus further includes:
A persistence unit, configured to perform local persistence processing on the data to be processed.
Optionally, in an apparatus for data cleaning processing provided in another embodiment of the present application, the apparatus further includes:
A configuration unit, configured to configure the logic flow arrangement, the data logic configuration and the alarm information of the stream computation model in response to a configuration operation of a user.
A debugging unit, configured to trigger any flow node of the stream computation model and clean debugging data according to the preset logic flow arrangement and data logic configuration, so as to obtain the cleaning result of that flow node.
An analysis unit, configured to display the cleaning result of the flow node and analyze, based on the alarm information and the verification information, whether the cleaning result of the flow node meets the corresponding expected result.
A prompting unit, configured to generate alarm prompt information based on the cleaning result of the flow node and visualize the alarm prompt information if the cleaning result of the flow node does not meet the corresponding expected result.
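Triggering a single flow node on debugging data and raising an alarm when its cleaning result misses the expected result could be sketched as below. The node implementations, the debugging records, the `expected_checks` structure, and the alert message format are hypothetical; the `drop_negative` node is deliberately too strict so that the sketch has something to flag.

```python
def fill_country(rows):
    """Flow node: replace a missing or empty country with a placeholder."""
    return [{**row, "country": row.get("country") or "UNKNOWN"} for row in rows]


def drop_negative(rows):
    """Flow node with a deliberate flaw: it also drops zero amounts."""
    return [row for row in rows if row["amount"] > 0]


nodes = {"fill_country": fill_country, "drop_negative": drop_negative}

# Hypothetical expected results, expressed as checks on a node's cleaning result.
expected_checks = {
    "fill_country": {"no_empty_country": lambda res: all(r["country"] for r in res)},
    "drop_negative": {"keeps_two_rows": lambda res: len(res) == 2},
}

debug_data = [
    {"amount": 5, "country": "CN"},
    {"amount": 0, "country": ""},
    {"amount": -1, "country": None},
]


def debug_node(node_name, data):
    """Run one node on debugging data and collect alerts for failed checks."""
    result = nodes[node_name](data)
    alerts = [
        f"[ALERT] node '{node_name}' failed check '{check_name}'"
        for check_name, check in expected_checks.get(node_name, {}).items()
        if not check(result)
    ]
    return result, alerts


fill_result, fill_alerts = debug_node("fill_country", debug_data)
drop_result, drop_alerts = debug_node("drop_negative", debug_data)
```

Because each node is debugged in isolation, the alert points directly at the node to adjust (here, changing `> 0` to `>= 0` in `drop_negative`) rather than at the flow as a whole.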
It should be noted that, for the specific working process of each unit provided in the foregoing embodiments of the present application, reference may be made to the corresponding steps in the foregoing method embodiments, and details are not described herein again.
Another embodiment of the present application provides an electronic device, as shown in fig. 6, including:
a memory 601 and a processor 602.
The memory 601 is used for storing programs.
The processor 602 is configured to execute the program stored in the memory 601; when executed, the program implements the method of data cleaning processing provided in any of the above embodiments.
It should be noted that, in the specific implementation process, reference may be made to the specific steps of the data cleaning processing method provided in each of the above embodiments, and details are not described here again.
Another embodiment of the present application provides a computer storage medium for storing a computer program which, when executed, implements the method of data cleaning processing provided in any one of the above embodiments.
Computer storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of data cleaning processing, comprising:
acquiring data of an original data source;
converting the data of the original data source into data to be processed with a uniform caliber according to a preset conversion rule;
cleaning the data to be processed through a stream computation model according to a preset logic flow arrangement and data logic configuration to obtain result data;
analyzing, based on preconfigured alarm information and verification information, whether data that does not meet a corresponding expected result exists in the result data;
if the analysis finds no data in the result data that does not meet the corresponding expected result, outputting the result data to a target table;
and visually displaying the result data in the target table.
2. The method of claim 1, wherein acquiring the data of the original data source comprises:
collecting data of a plurality of data sources;
and storing the data of each data source according to a preset storage format to obtain a plurality of original data sources.
3. The method according to claim 2, wherein converting the data of the original data source into the data to be processed with the uniform caliber according to the preset conversion rule comprises:
screening the original data sources to be processed from the plurality of original data sources;
determining the data type of the data of each original data source to be processed according to the metadata of that data;
and converting the data type of the data of each original data source to be processed according to a pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source, to obtain the data to be processed with the uniform caliber.
4. The method according to claim 1, wherein after converting the data of the original data source into the data to be processed with the uniform caliber according to the preset conversion rule, the method further comprises:
performing local persistence processing on the data to be processed.
5. The method according to claim 1, wherein before cleaning the data to be processed through the stream computation model according to the preset logic flow arrangement and data logic configuration to obtain the result data, the method further comprises:
configuring the logic flow arrangement, the data logic configuration, and the alarm information of the stream computation model in response to a configuration operation by a user;
triggering any flow node of the stream computation model, and cleaning debugging data according to the preset logic flow arrangement and data logic configuration to obtain a cleaning result of the flow node;
displaying the cleaning result of the flow node, and analyzing, based on the alarm information and the verification information, whether the cleaning result of the flow node meets a corresponding expected result;
and if the cleaning result of the flow node does not meet the corresponding expected result, generating alarm prompt information based on the cleaning result of the flow node, and visualizing the alarm prompt information.
6. An apparatus for data cleaning processing, comprising:
an obtaining unit, configured to obtain data of an original data source;
a conversion unit, configured to convert the data of the original data source into data to be processed with a uniform caliber according to a preset conversion rule;
a data processing model, configured to clean the data to be processed through a stream computation model according to a preset logic flow arrangement and data logic configuration to obtain result data;
a data monitoring unit, configured to analyze, based on preconfigured alarm information and verification information, whether data that does not meet a corresponding expected result exists in the result data;
an output unit, configured to output the result data to a target table if the analysis finds no data in the result data that does not meet the corresponding expected result;
and a visualization unit, configured to visually display the result data in the target table.
7. The apparatus of claim 6, wherein the conversion unit comprises:
a screening unit, configured to screen the original data sources to be processed from a plurality of original data sources;
a type determining unit, configured to determine the data type of the data of each original data source to be processed according to the metadata of that data;
and a conversion subunit, configured to convert the data type of the data of each original data source to be processed according to a pre-established mapping relationship between the data type of each field of each original data source to be processed and the data type of each field of the target data source, to obtain the data to be processed with the uniform caliber.
8. The apparatus of claim 6, further comprising:
a configuration unit, configured to configure the logic flow arrangement, the data logic configuration, and the alarm information of the stream computation model in response to a configuration operation of a user;
a debugging unit, configured to trigger any flow node of the stream computation model and clean debugging data according to the preset logic flow arrangement and data logic configuration, to obtain a cleaning result of the flow node;
an analysis unit, configured to display the cleaning result of the flow node and analyze, based on the alarm information and the verification information, whether the cleaning result of the flow node meets a corresponding expected result;
and a prompting unit, configured to generate alarm prompt information based on the cleaning result of the flow node and visualize the alarm prompt information if the cleaning result of the flow node does not meet the corresponding expected result.
9. An electronic device, comprising:
a memory and a processor;
wherein the memory is configured to store a program;
and the processor is configured to execute the program, which, when executed, implements the method of data cleaning processing according to any one of claims 1 to 5.
10. A computer storage medium storing a computer program which, when executed, implements the method of data cleaning processing according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111325874.0A CN114036144A (en) | 2021-11-10 | 2021-11-10 | Data cleaning method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114036144A (en) | 2022-02-11 |
Family
ID=80143822
Family Applications (1)
Application Number | Status | Publication | Title | Priority Date | Filing Date |
---|---|---|---|---|---|
CN202111325874.0A | Pending | CN114036144A (en) | Data cleaning method and device, electronic equipment and storage medium | 2021-11-10 | 2021-11-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114036144A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114780527A (en) * | 2022-04-21 | 2022-07-22 | 中国农业银行股份有限公司 | Data cleaning method and device |
CN115391315A (en) * | 2022-07-15 | 2022-11-25 | 生命奇点(北京)科技有限公司 | Data cleaning method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109003459A (en) * | 2018-07-17 | 2018-12-14 | 泉州装备制造研究所 | A kind of regional traffic signal control method and system based on layering stream calculation |
CN110245158A (en) * | 2019-06-10 | 2019-09-17 | 上海理想信息产业(集团)有限公司 | A kind of multi-source heterogeneous generating date system and method based on Flink stream calculation technology |
CN112685004A (en) * | 2020-12-21 | 2021-04-20 | 福建新大陆软件工程有限公司 | Online component arrangement calculation method and system based on real-time stream calculation |
US20210319043A1 (en) * | 2020-04-13 | 2021-10-14 | Singapore University Of Technology And Design | Multi-source data management mechanism and platform |
Non-Patent Citations (1)
Title |
---|
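Zhu Biqin; Wu Fei; Luo Fucai: "Research on the construction of the data analysis domain of a full-service unified data center based on big data" (基于大数据的全业务统一数据中心数据分析域建设研究), Electric Power Information and Communication Technology (电力信息与通信技术), no. 02, 15 February 2017 (2017-02-15) * |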
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||