CN115309735A - Big data cleaning method and device, computer equipment and storage medium - Google Patents

Big data cleaning method and device, computer equipment and storage medium

Info

Publication number
CN115309735A
Authority
CN
China
Prior art keywords
data
cleaning
cleaned
strategy
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211078831.1A
Other languages
Chinese (zh)
Inventor
宋平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202211078831.1A priority Critical patent/CN115309735A/en
Publication of CN115309735A publication Critical patent/CN115309735A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a big data cleaning method and device, computer equipment and a storage medium, which can be applied to fields such as artificial intelligence, cloud computing and big data, or to the financial field. While data to be cleaned from any data source is being cleaned, the data cleaning strategy can be adjusted dynamically without stopping the cleaning performed under the original target data cleaning strategy. Specifically, when a data cleaning strategy adjustment event is detected, the same data to be cleaned is cleaned simultaneously under the strategy before adjustment and the strategy after adjustment. By judging whether the pending (adjusted) data cleaning strategy should be applied to continue cleaning the received data to be cleaned, either the resulting cleaning data or the resulting pre-cleaning data is determined to be the target cleaning data of the data to be cleaned. This ensures the data cleaning effect, makes full use of computing resources, reduces maintenance cost and increases the data cleaning speed.

Description

Big data cleaning method and device, computer equipment and storage medium
Technical Field
The application relates to the field of big data application, in particular to a big data cleaning method and device, computer equipment and a storage medium.
Background
Data cleansing (data cleaning) is a process of reviewing and verifying data in order to remove duplicate information, correct existing errors, and check for various kinds of problem data such as incomplete data. Data cleaning can therefore find and correct recognizable errors in a data file, which generally includes checking data consistency and handling invalid values and missing values; an appropriate data cleaning method can be selected according to actual conditions.
Cleaning of abnormal data in service data, such as missing values and outliers, has been limited by the computing power of computer equipment. It is usually implemented with simple fixed operations such as direct discarding, smoothing with adjacent data, mean (or median) substitution, or substitution with values fitted by a fitting curve. Such processing is simple, but the data cleaning effect is poor and the cleaning speed is low.
Disclosure of Invention
In order to solve the above problem, the embodiments of the present application provide the following technical solutions:
in one aspect, the present application provides a big data cleaning method, including:
receiving data to be cleaned from a data source, and acquiring a corresponding target data cleaning strategy;
when a data cleaning strategy adjusting event is detected, cleaning the data to be cleaned according to the target data cleaning strategy to obtain cleaning data, and simultaneously pre-cleaning the data to be cleaned according to the adjusted pending data cleaning strategy to obtain pre-cleaning data;
obtaining an application judgment result for the pending data cleaning strategy; the application judgment result represents whether to apply the pending data cleaning strategy to continue cleaning the received data to be cleaned;
obtaining target cleaning data of the data to be cleaned according to the application judgment result; the target cleaning data is the cleaning data or the pre-cleaning data.
Optionally, the method further includes:
writing the cleaning data obtained by processing into a first database for storage; and/or,
writing the pre-cleaning data obtained by processing into a second database for storage;
and carrying out data synchronization on the first database and the second database according to the synchronization mode corresponding to the application judgment result.
Optionally, the performing data synchronization on the first database and the second database according to the synchronization mode corresponding to the application determination result includes:
if the application judgment result is yes, writing the target cleaning data into the first database for storage, and deleting the corresponding cleaning data previously stored in the first database;
and if the application judgment result is no, writing the target cleaning data into the second database for storage, and deleting the corresponding pre-cleaning data previously stored in the second database.
Optionally, the method further includes:
writing the data to be cleaned from the data source into a third database for storage; the third database is configured with a data storage period so as to delete the data to be cleaned, the storage time of which reaches the data storage period;
and prohibiting any response to cleaning processing instructions and data synchronization instructions directed at the data to be cleaned stored in the third database.
Optionally, the obtaining of the corresponding target data cleaning policy includes:
acquiring data characteristics of the data to be cleaned;
determining a target data cleaning strategy aiming at the data to be cleaned according to the data characteristics and a pre-configured cleaning depth strategy;
the target data cleaning strategy comprises at least one data cleaning model corresponding to a cleaning depth, and the data cleaning model is obtained by training based on a machine learning algorithm and/or a cleaning algorithm, so as to clean the data to be cleaned.
Optionally, the obtaining the application determination result for the pending data cleansing policy includes:
obtaining an application selection instruction input by monitoring personnel for the pending data cleaning strategy, to obtain an application judgment result indicating whether to apply the pending data cleaning strategy to execute the data cleaning operation;
or,
calling a pre-configured data cleaning evaluation strategy, and evaluating the pre-cleaning data to obtain a cleaning evaluation result;
and judging, according to the cleaning evaluation result, whether to apply the pending data cleaning strategy to execute the data cleaning operation, to obtain a corresponding application judgment result.
Optionally, the method further includes:
acquiring cleaning index information aiming at the pre-cleaning data and the cleaning data according to a pre-configured cleaning index; the cleaning index information can represent a comparison result of a pre-cleaning effect and an original cleaning effect aiming at the same data to be cleaned;
outputting the cleaning index information;
and/or dynamically adjusting model parameters of a data cleaning model contained in the target data cleaning strategy by using the cleaning index information and the target cleaning data in an asynchronous communication mode to generate a data cleaning strategy adjusting event.
In another aspect, the present application further provides a big data cleaning apparatus, including:
the data receiving module to be cleaned is used for receiving data to be cleaned from a data source;
the data cleaning strategy obtaining module is used for obtaining a corresponding target data cleaning strategy;
the data cleaning processing module is used for, when a data cleaning strategy adjusting event is detected, cleaning the data to be cleaned according to the target data cleaning strategy to obtain cleaning data, and simultaneously pre-cleaning the data to be cleaned according to the adjusted pending data cleaning strategy to obtain pre-cleaning data;
the application judgment result obtaining module is used for obtaining an application judgment result for the pending data cleaning strategy; the application judgment result represents whether to apply the pending data cleaning strategy to continue cleaning the received data to be cleaned;
a target cleaning data obtaining module, configured to obtain target cleaning data of the data to be cleaned according to the application determination result; the target cleaning data is the cleaning data or the pre-cleaning data.
In yet another aspect, the present application further proposes a computer device, comprising:
a communication interface;
a memory for storing a program for implementing the big data cleaning method as described above;
and the processor is used for loading and executing the program stored in the memory to realize the big data cleaning method.
In yet another aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program being loaded and executed by a processor to implement the big data cleansing method as described above.
The application therefore provides a big data cleaning method and device, computer equipment and a storage medium. While data to be cleaned from any data source is being cleaned, the data cleaning strategy can be adjusted dynamically, in order to ensure the data cleaning effect, without stopping the cleaning performed under the original target data cleaning strategy. That is, when a data cleaning strategy adjusting event is detected, the same data to be cleaned is cleaned simultaneously under the strategies before and after adjustment. By judging whether to apply the pending data cleaning strategy to continue cleaning the received data to be cleaned, either the resulting cleaning data or the resulting pre-cleaning data is determined to be the target cleaning data of the data to be cleaned, so that the data cleaning effect is ensured, computing resources are fully utilized, the maintenance cost is reduced, and the data cleaning speed is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow chart diagram of an alternative example of a big data cleansing method proposed in the present application;
FIG. 2 is a schematic flow chart diagram of yet another alternative example of the big data cleansing method proposed in the present application;
FIG. 3 is a schematic diagram of an alternative system architecture for an application environment suitable for the big data cleansing method proposed in the present application;
FIG. 4 is a schematic flow chart diagram of yet another alternative example of the big data cleansing method proposed in the present application;
FIG. 5 is a schematic flow chart diagram of yet another alternative example of the big data cleansing method proposed in the present application;
FIG. 6 is a schematic structural diagram of an alternative example of the big data cleaning apparatus proposed in the present application;
FIG. 7 is a schematic structural diagram of yet another alternative example of the big data cleaning apparatus proposed in the present application;
FIG. 8 is a schematic hardware configuration diagram of an alternative example of a computer device suitable for the big data cleansing method proposed in the present application.
Detailed Description
Regarding the background described above: with the development of computer and communication technology, the computing power of computer equipment has greatly improved, and big data technology can now be used for data cleaning. For example, a batch-stream integrated big data cleaning method based on Flink technology can greatly improve the cleaning speed while ensuring the cleaning effect. It can also dynamically adjust the data cleaning strategy without stopping the data cleaning system (that is, without interrupting the cleaning of data to be cleaned under the current data cleaning strategy), so as to improve the data cleaning effect, save machine resources and reduce the maintenance cost.
On this basis, the present application adopts multiple databases. The data to be cleaned is pre-cleaned under the adjusted pending data cleaning strategy while the same data is synchronously cleaned under the data cleaning strategy in effect before adjustment (which may be user-defined), and the resulting pre-cleaning data and cleaning data for the same data to be cleaned are stored in different databases. The currently effective data cleaning strategy is thus adjusted in an asynchronous communication manner without interfering with the original data cleaning process. This ensures the safety of data cleaning: even if the pending data cleaning strategy is abnormal, no data is lost from the original cleaning process.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to FIG. 1, a schematic flowchart of an optional example of the big data cleaning method proposed in the present application is shown. The method may be executed by a computer device, which may be a terminal device and/or a server with data processing capability; the server may be at least one physical server or a cloud server with cloud computing capability, as determined by the data cleaning scenario. As shown in FIG. 1, the big data cleaning method proposed in this embodiment may include:
Step S11, receiving data to be cleaned from a data source, and acquiring a corresponding target data cleaning strategy;
In the embodiment of the present application, referring to the flow diagram shown in FIG. 2, the data source that requires data cleaning this time (such as a service server or a service terminal) may be determined in advance, and the various policies used for data cleaning may be configured. These may include data cleaning policies for different cleaning depths; cleaning depth policies for determining the cleaning depth and its data cleaning policy; the data cleaning models that each data cleaning policy requires in order to clean data (which may be obtained by model training based on a machine learning algorithm, a cleaning algorithm and the like; the present application does not limit how the data cleaning models are trained); and cleaning evaluation policies for evaluating the cleaning effect achieved by a data cleaning policy. Configuration information such as the cleaning indexes needed to represent different cleaning effects may also be configured in advance as needed; the examples are not detailed here one by one.
In combination with the system architecture diagram of an optional application environment applicable to the big data cleansing method proposed in this application, shown in fig. 3, for each piece of preconfigured information listed above, the preconfigured information may be reported to the computer device by the same or different staff in an asynchronous communication manner after being completed on each terminal device, so that the computer device may determine or dynamically adjust a data cleansing policy used for data cleansing processing, a data cleansing model used for the data cleansing policy, and the like in the data cleansing process in combination with the preconfigured information.
Optionally, as shown in fig. 2, the present application may adopt an asynchronous communication mode, and send the pre-configured configuration information such as the above-mentioned policies, machine learning algorithms, cleaning algorithms, and the like, and the data to be cleaned from the selected data source to the corresponding function module of the computer device, such as the data access module, the cleaning policy module, the algorithm library module, and the like, and may also configure some parameters in the algorithm library as needed to optimize the corresponding algorithm, and the like.
Based on the above analysis, after the computer device receives the data to be cleaned from the at least one data source selected in advance, the cleaning policy module may determine, according to information such as the pre-configured cleaning depth policy and the data cleaning requirement of the data source, the data cleaning policy currently used to clean the data to be cleaned from that data source, and record it as the target data cleaning policy. The method of determining the data cleaning policy is not limited to the one described in this embodiment. For example, the data cleaning policy may be configured in advance and reported by the provider of the data to be cleaned or by the party that requires the cleaned data; such examples are not detailed here.
In practical application, the big data cleaning method provided by the present application can support multiple data access modes. The computer device can read data to be cleaned from multiple kinds of data sources, such as Pulsar (a cloud-native distributed message streaming platform, i.e., a distributed publish/subscribe messaging platform), Kafka (a high-throughput distributed publish/subscribe messaging system), MQTT (Message Queuing Telemetry Transport protocol), MySQL (a relational database management system), TiDB (an open-source distributed relational database), Hive (a data warehouse tool based on Hadoop) and APIs (application program interfaces). Batch import of the data to be cleaned can be supported, realizing batch-stream integrated data cleaning; real-time data access can also be supported, that is, the data to be cleaned generated by the data source is acquired in real time.
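One way to picture batch-stream integrated access is to treat a batch import and a real-time feed as the same kind of iterable and cut both into micro-batches. The following is a minimal sketch under that assumption; `micro_batches`, `live_feed` and the record layout are hypothetical names chosen for illustration, and no real Pulsar, Kafka or Flink API is used.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def micro_batches(source: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records from a bounded or unbounded source."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # the source is exhausted (a live source may never get here)
        yield batch


def live_feed() -> Iterator[Record]:
    """Stand-in for a real-time source; a real one would read from a message queue."""
    for i in range(3):
        yield {"id": i}


# A batch import is simply a list that is already complete.
batch_import: List[Record] = [{"id": 10}, {"id": 11}, {"id": 12}]

# The same function serves both access modes.
from_batch = list(micro_batches(batch_import, size=2))
from_stream = list(micro_batches(live_feed(), size=2))
```

Because the downstream cleaning code only ever sees lists of records, it does not need to know whether they came from a batch import or a live feed.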
Step S12, detecting a data cleaning strategy adjusting event, cleaning the data to be cleaned according to a target data cleaning strategy to obtain cleaning data, and pre-cleaning the data to be cleaned according to the adjusted data cleaning strategy to obtain pre-cleaning data;
In combination with the above description of the technical solution of the present application, in order to improve the data cleaning effect, the target data cleaning policy may be adjusted dynamically in an asynchronous communication manner, either by manual intervention or by system feedback based on the cleaning effect. This ensures that the policy adjustment process does not interfere with the data cleaning process currently executed by the computer device: the policy is updated dynamically while the data cleaning system runs normally, without suspending the data cleaning task, which improves the data cleaning efficiency.
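The non-blocking nature of such an adjustment can be sketched with a standard-library queue: the cleaning loop polls for an adjustment event before each batch and never waits for one. This is only an illustration under assumed names (`poll_adjustment`, `run`, and the string labels are hypothetical); the application does not specify the messaging mechanism.

```python
import queue
from typing import List, Optional, Tuple


def poll_adjustment(events: "queue.Queue[str]") -> Optional[str]:
    """Return a pending-strategy name if an adjustment event is waiting, else None."""
    try:
        return events.get_nowait()  # never blocks the cleaning loop
    except queue.Empty:
        return None


def run(batches: List[str], events: "queue.Queue[str]") -> List[Tuple[str, List[str]]]:
    """Record which processing paths would run for each batch."""
    log = []
    for batch in batches:
        paths = ["clean"]  # the cleaning path always runs
        adjusted = poll_adjustment(events)
        if adjusted is not None:
            # An adjustment event was detected: also start the pre-cleaning path.
            paths.append("pre-clean with " + adjusted)
        log.append((batch, paths))
    return log


events: "queue.Queue[str]" = queue.Queue()
events.put("strategy-v2")  # e.g. submitted by an operator or by system feedback
log = run(["batch-1", "batch-2"], events)
```

Here the first batch is processed on both paths and the second on the cleaning path only, because the single queued event has been consumed.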
Because the updated data cleaning strategy is not necessarily a better one, and the cleaning effect it produces may even be worse, the present application proposes pre-cleaning the data to be cleaned with the updated data cleaning strategy in order to avoid such invalid updates and ensure the data cleaning effect. By evaluating the cleaning effect, it is then determined whether to use the pre-cleaning data obtained with the updated strategy as the target cleaning data of the data to be cleaned, and whether to use that strategy to process subsequently received data to be cleaned.
Therefore, using the characteristics of the Flink architecture, the data cleaning process can be divided into two parts, namely pre-cleaning processing and cleaning processing. As shown in FIG. 2, when the data cleaning strategy is updated, that is, when a data cleaning strategy adjustment event is detected, both parts can be triggered at the same time. The currently received data to be cleaned is cleaned according to the original target data cleaning strategy to obtain cleaning data; meanwhile, the same data is pre-cleaned according to the adjusted pending data cleaning strategy to obtain pre-cleaning data, and subsequent cleaning effect evaluation determines which cleaning result is used as the target cleaning result. This processing mode does not interfere with cleaning under the original data cleaning strategy, yet responds promptly to the adjusted pending data cleaning strategy, so that cleaning data with a better cleaning effect can be obtained more quickly.
For the pre-cleaning processing and the cleaning processing, the computer device may create two independent threads that respectively execute the original target data cleaning policy and the adjusted pending data cleaning policy in the manner described above, so as to obtain two different cleaning results for the same data to be cleaned, namely the cleaning data and the pre-cleaning data.
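The two-path processing described above can be sketched with two worker threads. This is a minimal illustration, not the application's implementation: `clean_batch`, `drop_incomplete` and `fill_missing` are hypothetical names, and the two toy strategies stand in for trained data cleaning models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
Strategy = Callable[[List[Record]], List[Record]]


def clean_batch(
    batch: List[Record],
    target_strategy: Strategy,
    pending_strategy: Optional[Strategy] = None,
) -> Tuple[List[Record], Optional[List[Record]]]:
    """Return (cleaning data, pre-cleaning data); the latter is None without an adjustment."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The cleaning path always runs with the currently effective strategy.
        cleaned_future = pool.submit(target_strategy, list(batch))
        pre_future = None
        if pending_strategy is not None:
            # The pre-cleaning path runs only when an adjustment event supplied a strategy.
            pre_future = pool.submit(pending_strategy, list(batch))
        cleaned = cleaned_future.result()
        pre_cleaned = None
        if pre_future is not None:
            try:
                pre_cleaned = pre_future.result()
            except Exception:
                pre_cleaned = None  # a faulty pending strategy must not affect cleaning
    return cleaned, pre_cleaned


def drop_incomplete(rows: List[Record]) -> List[Record]:
    """Toy original strategy: discard rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_missing(rows: List[Record]) -> List[Record]:
    """Toy pending strategy: replace missing values with 0."""
    return [{k: (0 if v is None else v) for k, v in r.items()} for r in rows]


batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
cleaned, pre_cleaned = clean_batch(batch, drop_incomplete, fill_missing)
```

Catching the exception from the pre-cleaning future reflects the stated goal that an abnormal pending strategy should not disturb the original cleaning result.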
Step S13, obtaining an application judgment result for the pending data cleaning policy;
As described above, the pending data cleaning policy does not necessarily achieve a better cleaning effect than the target data cleaning policy in effect before adjustment. Rather than executing the dynamically adjusted data cleaning policy directly, the present application proposes evaluating the cleaning effect achieved by the adjusted pending data cleaning policy to decide whether to apply it, that is, whether the pending data cleaning policy takes effect and the original target data cleaning policy is invalidated. In other words, the computer device may first obtain an application judgment result representing whether to apply the pending data cleaning policy to continue cleaning the received data to be cleaned; the present application does not limit how the application judgment result is obtained.
Optionally, the application determination result may be determined manually and then fed back to the computer device, or the computer device may automatically determine whether to apply the adjusted pending data cleaning policy according to a cleaning index (i.e., an evaluation feature index) of the pre-cleaning data, to obtain a corresponding application determination result, and the like.
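Both ways of obtaining the application judgment result can be sketched in one function: a manual choice, if present, wins; otherwise a score computed from the pre-cleaning data is compared with the score of the cleaning data. The score used here (completeness multiplied by the share of rows retained) is an assumption made for illustration only; the application leaves the evaluation policy and cleaning indexes configurable.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def completeness(rows: List[Record]) -> float:
    """Fraction of field values that are not None (1.0 for an empty result)."""
    values = [v for row in rows for v in row.values()]
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def retention(rows: List[Record], source_rows: List[Record]) -> float:
    """Share of the source rows that survived cleaning."""
    return len(rows) / len(source_rows) if source_rows else 1.0


def score(rows: List[Record], source_rows: List[Record]) -> float:
    """Hypothetical evaluation index: complete results that keep more rows score higher."""
    return completeness(rows) * retention(rows, source_rows)


def application_judgment(
    source_rows: List[Record],
    cleaned: List[Record],
    pre_cleaned: List[Record],
    manual_choice: Optional[bool] = None,
) -> bool:
    """Return True if the pending strategy should be applied."""
    if manual_choice is not None:
        return manual_choice  # instruction entered by monitoring personnel
    return score(pre_cleaned, source_rows) > score(cleaned, source_rows)


source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
cleaned = [{"id": 1, "amount": 10.0}]
pre_cleaned = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 0.0}]

automatic = application_judgment(source, cleaned, pre_cleaned)
overridden = application_judgment(source, cleaned, pre_cleaned, manual_choice=False)
```

In this toy data the pending strategy keeps both rows fully populated, so the automatic judgment favors it, while the manual override rejects it regardless of the score.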
Step S14, obtaining target cleaning data of the data to be cleaned according to the application judgment result; the target cleaning data is cleaning data or pre-cleaning data.
When the application judgment result is yes, the data cleaning effect achieved by the adjusted pending data cleaning strategy is better than that achieved by the original target data cleaning strategy, and the pre-cleaning data obtained by the processing can be determined to be the target cleaning data of the data to be cleaned. Otherwise, the adjusted pending data cleaning strategy achieves a worse data cleaning effect; this update of the target data cleaning strategy is abandoned, and the cleaning data obtained by the processing is determined to be the target cleaning data of the data to be cleaned, so that the data cleaning effect is ensured.
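The selection in step S14 can be sketched as a small state update: on "yes" the pending strategy becomes the effective one and the pre-cleaning data is returned, on "no" the pending strategy is dropped and the cleaning data is returned. `StrategyState` and `resolve` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StrategyState:
    target: Callable          # currently effective data cleaning strategy
    pending: Optional[Callable] = None  # adjusted strategy awaiting a judgment


def resolve(state: StrategyState, cleaned: List, pre_cleaned: Optional[List],
            apply_pending: bool) -> List:
    """Pick the target cleaning data and update which strategy is in effect."""
    if apply_pending and state.pending is not None and pre_cleaned is not None:
        state.target = state.pending  # the pending strategy takes effect
        result = pre_cleaned
    else:
        result = cleaned              # the update is abandoned
    state.pending = None              # the adjustment event has been handled
    return result


def old_strategy(rows):
    return rows


def new_strategy(rows):
    return rows


accepted_state = StrategyState(target=old_strategy, pending=new_strategy)
accepted = resolve(accepted_state, ["cleaned"], ["pre-cleaned"], apply_pending=True)

rejected_state = StrategyState(target=old_strategy, pending=new_strategy)
rejected = resolve(rejected_state, ["cleaned"], ["pre-cleaned"], apply_pending=False)
```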
In still other embodiments, as shown in FIG. 2, the data storage module may further be invoked to store, by category, the different cleaning results (i.e., the cleaning data and the pre-cleaning data) obtained for the same data to be cleaned under the different data cleaning strategies, so that the data is effectively isolated and the quality and safety of data cleaning are ensured. Alternatively, only the cleaning data or only the pre-cleaning data obtained for the same data to be cleaned may be stored. After data cleaning and evaluation are completed, the data synchronization module may be called to synchronize the data, so that the cleaning results stored in the databases used for storing cleaning results are consistent; the implementation process is not detailed here.
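The synchronization rule stated in the disclosure (on "yes", the target cleaning data replaces the cleaning data in the first database; on "no", it replaces the pre-cleaning data in the second database) can be sketched with two dictionaries standing in for the databases. The function name, the batch identifier key and the in-memory stores are assumptions for illustration, and the sketch assumes both results were stored for the batch.

```python
from typing import Dict, List

Database = Dict[str, List[dict]]  # batch id -> stored cleaning result


def synchronize(first_db: Database, second_db: Database,
                batch_id: str, apply_pending: bool) -> List[dict]:
    """Make both databases hold the target cleaning data for one batch."""
    if apply_pending:
        # Target is the pre-cleaning data: overwrite the cleaning data in the first database.
        target = second_db[batch_id]
        first_db[batch_id] = list(target)
    else:
        # Target is the cleaning data: overwrite the pre-cleaning data in the second database.
        target = first_db[batch_id]
        second_db[batch_id] = list(target)
    return target


first_db: Database = {"batch-1": [{"id": 1}]}               # cleaning data
second_db: Database = {"batch-1": [{"id": 1}, {"id": 2}]}   # pre-cleaning data

target = synchronize(first_db, second_db, "batch-1", apply_pending=True)
```

Overwriting the entry models "delete the stored result and write the target cleaning data" as one step; a real database would likely do this inside a transaction.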
Optionally, the data cleaning process and the data cleaning effect can be monitored, so that after the data cleaning process is completed according to the method, cleaning index information corresponding to at least one preset cleaning index of the pre-cleaning data can be obtained, and the obtained cleaning index information is output according to a preset visualization mode, so that the cleaning effect and the comparison result of the cleaning effect achieved by the data cleaning strategy before and after adjustment can be displayed, the content of the data cleaning strategy executed by the system can be displayed, and the like.
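As an illustration of what cleaning index information might look like before it is handed to a dashboard, the sketch below evaluates each configured index on both results and reports the difference. The two indexes (`row_count`, `missing_ratio`) and the output layout are hypothetical; the application leaves the cleaning indexes user-configurable.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_ratio(rows: List[Record]) -> float:
    """Fraction of field values that are None (0.0 for an empty result)."""
    values = [v for row in rows for v in row.values()]
    return sum(v is None for v in values) / len(values) if values else 0.0


# Hypothetical pre-configured cleaning indexes: name -> function of a cleaning result.
CLEANING_INDEXES: Dict[str, Callable[[List[Record]], float]] = {
    "row_count": lambda rows: float(len(rows)),
    "missing_ratio": missing_ratio,
}


def cleaning_index_info(cleaned: List[Record],
                        pre_cleaned: List[Record]) -> Dict[str, Dict[str, float]]:
    """Compare the original cleaning effect with the pre-cleaning effect per index."""
    info = {}
    for name, index in CLEANING_INDEXES.items():
        before, after = index(cleaned), index(pre_cleaned)
        info[name] = {"cleaning": before, "pre_cleaning": after, "delta": after - before}
    return info


cleaned = [{"id": 1, "amount": 10.0}]
pre_cleaned = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 0.0}]
info = cleaning_index_info(cleaned, pre_cleaned)
```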
It should be understood that when the policy has not been updated, that is, when no data cleaning policy adjustment event is detected, only the cleaning processing part is triggered: the received data to be cleaned is cleaned only according to the target data cleaning policy, and the resulting cleaning data is determined to be the target cleaning data of the data to be cleaned. The cleaning processing part is therefore started whether or not the data cleaning policy is adjusted, so that the data cleaning operation is always executed according to the currently effective target data cleaning policy and data cleaning is never interrupted. Afterwards, the data storage module and the data synchronization module may be called to keep the data stored in the multiple databases consistent; the implementation process is not described in detail here.
In summary, in the embodiment of the present application, while data to be cleaned from any data source is being cleaned, the data cleaning policy may be adjusted dynamically without stopping the cleaning performed under the original target data cleaning policy. That is, when a data cleaning policy adjustment event is detected, the same data to be cleaned is cleaned simultaneously under the policies before and after adjustment. By judging whether to apply the pending data cleaning policy to continue cleaning the received data to be cleaned, either the resulting cleaning data or the resulting pre-cleaning data is determined to be the target cleaning data of the data to be cleaned, so as to ensure the data cleaning effect, fully utilize computing resources, reduce the maintenance cost, and improve the data cleaning speed.
Referring to fig. 4, which is a schematic flow chart of yet another optional example of the big data cleansing method proposed in the present application, this embodiment may be an optional detailed implementation manner of the big data cleansing method described above, and as shown in fig. 4, the method may include:
Step S41, receiving data to be cleaned from a data source, writing the data to be cleaned into a third database, and acquiring data characteristics of the data to be cleaned;
In some embodiments, as shown in FIG. 5, the original database configured to store the data to be cleaned may be denoted as the third database. Of the data to be cleaned received by the data access module, one path is input to the data processing module for cleaning and the other path is written into the third database for storage. Accordingly, the computer device may refuse to respond to cleaning processing instructions and data synchronization instructions directed at the data to be cleaned stored in the third database, ensuring that the third database stores only the original data to be cleaned.
Optionally, the present application may further configure a data storage period for the third database, so that data to be cleaned whose storage duration reaches the data storage period is deleted. In other words, data that has been held in the third database for a long time is purged promptly, which saves storage resources.
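The behavior of the third database can be sketched as an in-memory store that records when each raw record was written, purges records whose age reaches the storage period, and rejects cleaning and synchronization instructions. `RawStore`, its method names and the instruction strings are hypothetical; a production system would more likely rely on the database's own expiry and permission features.

```python
import time
from typing import Dict, Optional, Tuple


class RawStore:
    """Stand-in for the third database holding only the original data to be cleaned."""

    def __init__(self, retention_seconds: float) -> None:
        self._retention = retention_seconds
        self._rows: Dict[str, Tuple[float, dict]] = {}  # key -> (stored_at, record)

    def put(self, key: str, record: dict, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._rows[key] = (stored_at, dict(record))

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete records whose storage duration has reached the storage period."""
        current = time.time() if now is None else now
        expired = [k for k, (t, _) in self._rows.items() if current - t >= self._retention]
        for key in expired:
            del self._rows[key]
        return len(expired)

    def handle(self, instruction: str) -> None:
        """Refuse cleaning and synchronization instructions aimed at the raw data."""
        if instruction in ("clean", "synchronize"):
            raise PermissionError(instruction + " is not allowed on the raw data store")
        raise ValueError("unknown instruction: " + instruction)

    def __len__(self) -> int:
        return len(self._rows)


store = RawStore(retention_seconds=60)
store.put("a", {"id": 1}, now=0)
store.put("b", {"id": 2}, now=50)
removed = store.purge_expired(now=60)  # "a" is 60 s old, "b" only 10 s
```

Passing `now` explicitly keeps the example deterministic; without it the store falls back to the wall clock.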
Step S42, determining a target data cleaning strategy aiming at the data to be cleaned according to the data characteristics and a pre-configured cleaning depth strategy;
In conjunction with the description of the corresponding portions of the embodiments above, the target data cleaning strategy may include a data cleaning model corresponding to at least one cleaning depth (i.e., different levels), the model being trained based on a machine learning algorithm and/or a cleaning algorithm. In practical application of the present application, corresponding data cleaning strategies can be configured for different cleaning depths, and an algorithm library (such as the cleaning algorithm library and the machine learning algorithm library shown in fig. 2) is called to train a corresponding data cleaning model, so as to implement the data cleaning strategy for that cleaning depth.
Optionally, the data cleaning strategies of different cleaning depths (i.e., different levels) may include, but are not limited to, the following. The level-one (i.e., deep) cleaning strategy may be: removing all incomplete and abnormal data. The level-two cleaning strategy may be: removing incomplete data and retaining other abnormal data. The level-three cleaning strategy may be: removing data whose primary key is missing and retaining other abnormal data. The level-four cleaning strategy may be: removing data whose primary key is missing and repairing other abnormal data. The level-five cleaning strategy may be: repairing missing data, repairing abnormal data, and the like.
In the data cleaning process, the data cleaning strategy in use can be adjusted dynamically by hand, and it can also be adjusted dynamically according to the cleaning evaluation result. For example, the cleaning levels may be selected in a default order, selected randomly, selected manually, or only some of the levels may be selected for data processing.
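The five cleaning depths listed above can be sketched as a small rule-based dispatcher. This is only an illustration under stated assumptions: records are dictionaries with a primary key `id` and one value field `amount`, "abnormal" means a negative amount, and "repair" means substituting a default value or a generated key. The application does not define these rules, and the names `CleaningDepth`, `repair` and `clean` are hypothetical.

```python
from enum import IntEnum


class CleaningDepth(IntEnum):
    LEVEL_1 = 1  # remove all incomplete and abnormal records
    LEVEL_2 = 2  # remove incomplete records, keep other abnormal records
    LEVEL_3 = 3  # remove records missing the primary key, keep other abnormal records
    LEVEL_4 = 4  # remove records missing the primary key, repair other abnormal records
    LEVEL_5 = 5  # repair missing data and abnormal data


FIELDS = ("id", "amount")   # assumed schema; "id" is the primary key
DEFAULT_AMOUNT = 0.0        # assumed repair value


def is_incomplete(rec):
    return any(rec.get(f) is None for f in FIELDS)


def is_abnormal(rec):
    amount = rec.get("amount")
    return amount is not None and amount < 0


def repair(rec, index):
    """Assumed repair rule: generate a key if missing, reset bad amounts to the default."""
    fixed = dict(rec)
    if fixed.get("id") is None:
        fixed["id"] = f"generated-{index}"
    if fixed.get("amount") is None or fixed["amount"] < 0:
        fixed["amount"] = DEFAULT_AMOUNT
    return fixed


def clean(records, depth):
    out = []
    for i, rec in enumerate(records):
        missing_key = rec.get("id") is None
        if depth == CleaningDepth.LEVEL_1 and (is_incomplete(rec) or is_abnormal(rec)):
            continue
        if depth == CleaningDepth.LEVEL_2 and is_incomplete(rec):
            continue
        if depth in (CleaningDepth.LEVEL_3, CleaningDepth.LEVEL_4) and missing_key:
            continue
        # Levels 4 and 5 repair what they keep; levels 1 to 3 pass records through unchanged.
        out.append(repair(rec, i) if depth >= CleaningDepth.LEVEL_4 else dict(rec))
    return out


records = [
    {"id": "r1", "amount": 10.0},   # valid
    {"id": "r2", "amount": -5.0},   # abnormal value
    {"id": "r3", "amount": None},   # incomplete, key present
    {"id": None, "amount": 7.0},    # primary key missing
]
kept_per_level = [len(clean(records, d)) for d in CleaningDepth]
```

In the application each depth is realised by a trained data cleaning model; the fixed rules here merely stand in for such a model so that the difference between levels is visible.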
Based on the above analysis, referring to a further alternative flow diagram of the big data cleaning method shown in fig. 5, the preconfigured policies and parameters input into the computer device may be passed, in an asynchronous communication manner through the asynchronous communication module, to four functional modules: a "data source configuration" functional module, a "cleaning depth strategy" functional module, a "custom cleaning strategy" functional module, and a "cleaning process dashboard" functional module. The "data source configuration" functional module may be used to configure a data source, that is, to select the data source that needs to be cleaned from among multiple supported data sources (including but not limited to MySQL, Kafka, Pulsar, TiDB, MQTT, Hive, and the like). The "cleaning depth strategy" functional module may, according to the received configuration information, configure a plurality of cleaning depths (including but not limited to those listed above) and an adaptive adjustment strategy for the cleaning depths, so as to dynamically adjust the order of the cleaning depth levels, the cleaning strategies corresponding thereto, and so on. The "custom cleaning strategy" functional module may be used to define a custom cleaning index (namely, a characteristic index) and to dynamically adjust the cleaning depth and the custom strategy according to the cleaning index. The "cleaning process dashboard" functional module may be used to configure the cleaning indexes to be displayed on the dashboard; the display is not limited to a dashboard and may be determined as the case may be.
As shown in fig. 5, according to the above description, the computer device may determine, based on the configuration information uploaded by the asynchronous communication module, the plurality of cleaning depths contained in the basic cleaning strategy library and the cleaning strategy corresponding to each cleaning depth, as well as the built-in algorithm libraries that implement the cleaning algorithms, machine learning algorithms, and statistics required by each cleaning strategy, including but not limited to the cleaning algorithm library and the machine learning library shown in fig. 5. The present application does not describe in detail the types of algorithms contained in each algorithm library or their operating principles.
In practical application of the data cleaning method, in order to improve data cleaning efficiency, the data characteristics of the data to be cleaned may be combined with a preset machine learning model (namely, a data cleaning model) for use by the system, so as to clean the received data to be cleaned. The data cleaning model can be trained by calling a corresponding cleaning algorithm or machine learning algorithm on the data to be cleaned and its cleaning result, and, during training and subsequent use, can be dynamically optimized (i.e., its model parameters adjusted) according to the cleaning result (cleaning effect) obtained from each round of data cleaning, so that the accuracy of the model output and the subsequent data cleaning quality are improved. The present application does not detail the training of the data cleaning model for each cleaning depth or the method of optimizing it.
Step S43, detecting a data cleaning strategy adjusting event, cleaning the data to be cleaned according to the target data cleaning strategy, and writing the obtained cleaning data into a first database; meanwhile, pre-cleaning the data to be cleaned according to the adjusted pending data cleaning strategy, and writing the obtained pre-cleaning data into a second database;
As described above, in the data cleaning process, the received data to be cleaned may be input into the data cleaning model corresponding to the target data cleaning policy for cleaning, so as to obtain corresponding cleaning data, and the same data to be cleaned may be input into the data cleaning model corresponding to the adjusted pending data cleaning policy for pre-cleaning, so as to obtain corresponding pre-cleaning data. This implements effective data isolation and ensures data cleaning quality and safety.
In the embodiment of the present application, when a policy update exists and the new policy (i.e., the adjusted pending data cleaning policy) has not yet been determined to take effect, the pre-cleaning process and the cleaning process shown in fig. 2 are both started, and the cleaning results of the same data to be cleaned are stored in separate databases: the cleaning data is stored in a pre-configured first database, and the pre-cleaning data is stored in a pre-configured second database. When there is no pre-cleaning data, that is, when no data cleaning strategy adjustment event is detected, the cleaning data obtained according to the target data cleaning strategy can be written into both the first database and the second database, so that the cleaning data stored in the two databases is completely consistent and the two databases can serve as backups for each other.
In still other embodiments, when the pre-cleaning process and the cleaning process are performed synchronously according to the above method, only one of the cleaning results may be written into the corresponding database for storage. For example, when the obtained cleaning data is written into the first database, the corresponding pre-cleaning data may temporarily not be stored, and whether to store the pre-cleaning data is determined after the subsequent evaluation is completed. Alternatively, the pre-cleaning data may be written into the second database while the corresponding cleaning data is temporarily not stored, and after the subsequent evaluation is completed it is determined whether the first database is to synchronously store the pre-cleaning data or the corresponding cleaning data.
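The parallel cleaning and pre-cleaning paths of step S43 can be sketched as follows. This is a simplified illustration, assuming each policy is a plain callable and each database is an in-memory list of batches; the names `run_cleaning`, `drop_missing` and `fill_missing` are hypothetical stand-ins for the trained data cleaning models.

```python
def drop_missing(batch):
    """Stand-in for the current target data cleaning policy: discard missing values."""
    return [x for x in batch if x is not None]


def fill_missing(batch):
    """Stand-in for an adjusted pending data cleaning policy: repair missing values."""
    return [0 if x is None else x for x in batch]


def run_cleaning(batch, target_policy, pending_policy, first_db, second_db):
    """Clean one batch; pre-clean it as well when a pending policy exists."""
    cleaning_data = target_policy(batch)
    first_db.append(cleaning_data)
    if pending_policy is None:
        # No adjustment event: mirror the cleaning data so both databases stay identical.
        second_db.append(list(cleaning_data))
        return cleaning_data, None
    pre_cleaning_data = pending_policy(batch)
    second_db.append(pre_cleaning_data)
    return cleaning_data, pre_cleaning_data


first_db, second_db = [], []
# Batch 0 arrives before any adjustment event; batch 1 arrives after one.
run_cleaning([1, None, 3], drop_missing, None, first_db, second_db)
run_cleaning([4, None], drop_missing, fill_missing, first_db, second_db)
```

After the second call the two stores diverge for batch 1 only, which is the isolation the separate databases are meant to provide until the pending policy has been evaluated.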
Step S44, calling a pre-configured data cleaning evaluation strategy, and evaluating the pre-cleaning data to obtain a cleaning evaluation result;
step S45, judging whether to apply the undetermined data cleaning strategy to execute data cleaning operation or not according to the cleaning evaluation result to obtain a corresponding application judgment result;
according to the description of the corresponding part of the above embodiment, the data cleaning result (such as the obtained pre-cleaning data) can be evaluated according to the pre-constructed cleaning index, and the obtained cleaning evaluation result is fed back to the data cleaning system through the asynchronous communication module, so that the corresponding data cleaning model and the cleaning strategy can be adjusted, and the data cleaning effect is improved. Optionally, a corresponding response instruction may be fed back to the asynchronous communication module according to the cleaning evaluation result, the response instruction is executed, and the cleaning depth level is automatically adjusted according to the cleaning depth policy, so that the obtained data cleaning policy is more suitable for cleaning currently received data, and the data cleaning effect is improved.
In the embodiment of the application, corresponding cleaning characteristic indexes can be constructed according to data to be cleaned provided by a data source, grouping is performed according to different data dimensions and the like, statistical analysis is performed by applying four major distribution principles of statistics, data types are determined, and data characteristics of different data types are extracted to construct one or more cleaning indexes for evaluating the cleaning effect of the data; meanwhile, in the training process of the data cleaning model, cleaning strategies with different cleaning depths can be constructed in the mode, the data cleaning model corresponding to each cleaning strategy is trained and executed, and the implementation method is not detailed in the application and can be determined according to the situation.
Optionally, the cleaning evaluation policy may include one or more cleaning indexes configured in advance, so that in a cleaning evaluation process automatically executed by the computer device, corresponding cleaning index information may be extracted from the pre-cleaning data according to the one or more cleaning indexes configured in advance and an index threshold corresponding to the one or more cleaning indexes (i.e., an index parameter critical value representing a better cleaning effect, which is not limited by the present application), so as to determine a cleaning effect of the to-be-determined data cleaning policy, and further determine whether to subsequently apply the to-be-determined data cleaning policy.
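The threshold-based evaluation described above might look like the following sketch. The two cleaning indexes (`completeness` and `retention`) and their threshold values are assumptions chosen for illustration; the application leaves the concrete indexes and thresholds open.

```python
def completeness(records):
    """Share of records in which no field is missing."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if all(v is not None for v in r.values()))
    return complete / len(records)


def retention(records, raw_count):
    """Share of the original records that survived pre-cleaning."""
    return len(records) / raw_count if raw_count else 0.0


def evaluate(pre_cleaning_data, raw_count, thresholds):
    """Return the cleaning index information and whether every index meets its threshold."""
    info = {
        "completeness": completeness(pre_cleaning_data),
        "retention": retention(pre_cleaning_data, raw_count),
    }
    apply_pending = all(info[name] >= limit for name, limit in thresholds.items())
    return info, apply_pending


# Three of four raw records survived pre-cleaning; one still has a missing field.
pre_cleaning_data = [
    {"id": "r1", "amount": 10.0},
    {"id": "r2", "amount": 0.0},
    {"id": "r3", "amount": None},
]
info, apply_pending = evaluate(
    pre_cleaning_data, raw_count=4,
    thresholds={"completeness": 0.6, "retention": 0.7},
)
```

The boolean returned here plays the role of the application judgment result of step S45; a stricter threshold set would reject the same pending policy.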
In practical application, the data cleaning evaluation module may feed back the obtained cleaning evaluation result to the asynchronous communication module, so that the asynchronous communication module may send each piece of received information to the data processing module without interfering with the data cleaning work currently being executed. On this basis, the pre-configured cleaning depth level may be adjusted automatically, manual intervention in the cleaning process is allowed, the specified parameters in the cleaning policy and the model parameters of the data cleaning model may be adjusted (i.e., the cleaned data is used to further train the data cleaning model and adjust its model parameters), the data cleaning policy may be updated, and the like, so as to obtain the best cleaning effect. For the implementation process, reference may be made to the description of the corresponding parts of the context, which is not repeated here.
In still other embodiments, whether to apply the pending data cleaning policy may be determined by manual intervention. That is, the decision on whether to use the adjusted pending data cleaning policy to clean subsequently imported data to be cleaned is given to a service person. An application determination instruction for the pending data cleaning policy may be sent to a terminal device of the service person, so that the terminal device, in response to the application determination instruction, outputs application prompt information asking whether to apply the pending data cleaning policy. The terminal device then responds to an application selection operation on the application prompt information, obtains an application determination result for the pending data cleaning policy, and feeds the application determination result back to the computer device.
Therefore, in order to ensure the data cleaning effect, the data cleaning strategy is dynamically adjusted in time according to changes in the content of the accessed data to be cleaned, the cleaning requirement, and the like. The currently accessed data to be cleaned is synchronously pre-cleaned according to the adjusted pending data cleaning strategy, the cleaning effect of the obtained pre-cleaning result is evaluated, and it is then determined whether the pending data cleaning strategy (namely, the new strategy) is subsequently applied to clean the data. This avoids the degradation of data cleaning quality that would result from directly applying a new strategy with a poorer cleaning effect.
Moreover, the multi-database storage mode provided by the present application realizes effective data isolation during the pre-cleaning process and ensures the quality and safety of data cleaning. Each of the three types of databases, namely the first database, the second database, and the third database, may be an independent database or a database cluster, and the scale of each database may be expanded dynamically according to actual requirements to meet data storage requirements. The present application does not limit the data storage mode or the database structure, which may be determined as the case may be.
Step S46, if the application judgment result is yes, determining the pre-cleaning data as target cleaning data corresponding to the data to be cleaned, and synchronously updating the target cleaning data to the first database so as to enable the data stored in the first database and the data stored in the second database to be consistent;
step S47, if the application judgment result is negative, determining the cleaning data as target cleaning data corresponding to the data to be cleaned, and synchronously updating the target cleaning data to the second database so as to enable the data stored in the first database and the second database to be consistent;
as can be seen, in the embodiment of the present application, after data cleaning evaluation, if it is determined that an adjusted pending data cleaning policy is applied, the data synchronization module may use a pre-cleaning result as a target cleaning result of data to be cleaned, and synchronize the target cleaning result into all cleaning result databases (e.g., the first database and the second database), that is, write the target cleaning data into the first database for storage, and delete the cleaning data that has been stored in the first database and obtained by corresponding processing.
If it is determined that the pending data cleaning strategy is abandoned, the target data cleaning strategy in use before the adjustment continues to be applied, and the data synchronization module synchronizes the cleaning data to all cleaning result databases. That is, the cleaning data is determined to be the target cleaning data corresponding to the data to be cleaned, the target cleaning data is written into the second database for storage, and the pre-cleaning data obtained by the corresponding processing and stored in the second database is deleted; in other words, the pre-cleaning data is discarded and replaced with the corresponding cleaning data. In this way, the data stored in all the cleaning result databases is kept completely consistent whether or not the pending data cleaning strategy is applied. It should be noted that the implementation of data synchronization for each cleaning result database includes, but is not limited to, the manner described above and can be adjusted flexibly according to circumstances, which is not detailed in the present application.
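The synchronization rule of steps S46 and S47 can be condensed into a short sketch. It assumes each cleaning result database is a dictionary keyed by a batch identifier; the function name `synchronize` and the sample data are hypothetical.

```python
def synchronize(batch_id, apply_pending, first_db, second_db):
    """Make both result databases hold the target cleaning data for one batch."""
    if apply_pending:
        # Pending policy accepted: pre-cleaning data replaces the cleaning data in the first database.
        target = second_db[batch_id]
        first_db[batch_id] = list(target)
    else:
        # Pending policy rejected: cleaning data replaces the pre-cleaning data in the second database.
        target = first_db[batch_id]
        second_db[batch_id] = list(target)
    return target


first_db = {"batch-1": [4]}        # cleaning data (target policy)
second_db = {"batch-1": [4, 0]}    # pre-cleaning data (pending policy)
target = synchronize("batch-1", apply_pending=True, first_db=first_db, second_db=second_db)
```

Overwriting the entry for the batch performs the "write the target cleaning data and delete the superseded result" step in one operation, so the two stores end up identical on either branch.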
Step S48, outputting the obtained cleaning index information aiming at the pre-cleaning data and the cleaning data according to the pre-configured cleaning index;
In the embodiment of the application, the cleaning index information can represent a comparison result between a pre-cleaning effect (i.e., the cleaning effect achieved by pre-cleaning according to the pending data cleaning strategy) and an original cleaning effect (i.e., the cleaning effect achieved by cleaning according to the target data cleaning strategy) for the same data to be cleaned, and may further include the currently executed target data cleaning strategy and the like. The cleaning index information obtained may be determined according to the content of the monitored cleaning indexes.
Step S49, dynamically adjusting model parameters of a data cleaning model contained in the target data cleaning strategy by using the cleaning index information and the target cleaning data in an asynchronous communication mode, and generating a data cleaning strategy adjusting event.
As shown in fig. 2 and 5, in the data cleaning process, the cleaning index may be monitored, and the cleaning index information configured for all systems may be displayed through the dashboard processing module, but the method is not limited to this monitoring implementation method, and each monitored cleaning index information or other information may also be sent to a preset terminal device for displaying through a preset communication mode, so that a monitoring person may visually monitor the data cleaning process in real time or periodically, and the implementation process is not described in detail in this application.
Referring to fig. 6, which is a schematic structural diagram of an alternative example of the big data cleaning apparatus proposed in the present application, the apparatus may include:
a data receiving module 61 for receiving data to be cleaned from a data source;
a data cleaning strategy obtaining module 62, configured to obtain a corresponding target data cleaning strategy;
the data cleaning processing module 63 is configured to detect a data cleaning policy adjustment event, perform cleaning processing on the data to be cleaned according to the target data cleaning policy to obtain cleaning data, and perform pre-cleaning processing on the data to be cleaned according to the adjusted pending data cleaning policy to obtain pre-cleaning data;
an application determination result obtaining module 64, configured to obtain an application determination result for the pending data cleaning policy; the application determination result can represent whether the pending data cleaning policy is applied to continue cleaning the received data to be cleaned;
a target cleaning data obtaining module 65, configured to obtain target cleaning data of the data to be cleaned according to the application determination result; the target cleaning data is the cleaning data or the pre-cleaning data.
Optionally, the data cleansing policy obtaining module 62 may include:
the data characteristic acquisition unit is used for acquiring the data characteristics of the data to be cleaned;
the target data cleaning strategy determining unit is used for determining a target data cleaning strategy aiming at the data to be cleaned according to the data characteristics and a pre-configured cleaning depth strategy;
the target data cleaning strategy comprises a data cleaning model corresponding to at least one cleaning depth, and the data cleaning model is obtained by training based on a machine learning algorithm and/or a cleaning algorithm, so as to realize cleaning processing of the data to be cleaned.
Optionally, the module 64 for obtaining the application determination result may include:
the first obtaining module is used for obtaining an application selection instruction input by a monitoring person aiming at the to-be-determined data cleaning strategy and obtaining an application judgment result of whether to apply the to-be-determined data cleaning strategy to execute data cleaning operation;
alternatively, the application determination result obtaining module 64 may include:
the cleaning evaluation unit is used for calling a pre-configured data cleaning evaluation strategy and evaluating the pre-cleaning data to obtain a cleaning evaluation result;
and the application judgment unit is used for judging whether to apply the undetermined data cleaning strategy to execute data cleaning operation or not according to the cleaning evaluation result to obtain a corresponding application judgment result.
In still other embodiments, the apparatus may further include:
a cleaning index information obtaining module, configured to obtain cleaning index information for the pre-cleaning data and the cleaning data according to a pre-configured cleaning index; the cleaning index information can represent a comparison result of a pre-cleaning effect and an original cleaning effect aiming at the same data to be cleaned;
the cleaning index information output module is used for outputting the cleaning index information;
and/or the model parameter adjusting module is used for dynamically adjusting the model parameters of the data cleaning model contained in the target data cleaning strategy by utilizing the cleaning index information and the target cleaning data in an asynchronous communication mode to generate a data cleaning strategy adjusting event.
In still other embodiments, as shown in fig. 7, the apparatus may further include:
the first storage module 66 is configured to write the cleaning data obtained through processing into a first database for storage; and/or,
the second storage module 67 is configured to write the pre-cleaning data obtained through processing into a second database for storage;
a data synchronization module 68, configured to perform data synchronization on the first database and the second database according to a synchronization manner corresponding to the application determination result;
optionally, the data synchronization module 68 may include:
the first synchronization unit is used for writing the target cleaning data into the first database for storage and deleting the cleaning data stored in the first database and obtained by corresponding processing under the condition that the application judgment result is yes;
and the second synchronization unit is used for writing the target cleaning data into the second database for storage and deleting the pre-cleaning data which is stored in the second database and obtained by corresponding processing under the condition that the application judgment result is negative.
In still other embodiments, as shown in fig. 7, the apparatus may further include:
a third storage module 69, configured to write the data to be cleaned from the data source into a third database for storage; the third database is configured with a data storage period so as to delete the data to be cleaned, the storage time of which reaches the data storage period;
and a response forbidding module 610, configured to forbid a response to the cleaning processing instruction and the data synchronization instruction for the data to be cleaned stored in the third database.
It should be noted that, various modules, units, and the like in the embodiments of the foregoing apparatuses may be stored in a memory as program modules, and the processor may execute the program modules stored in the memory to implement corresponding functions, or may be implemented by combining the program modules and hardware, and for the functions implemented by the program modules and the combinations thereof and the achieved technical effects, reference may be made to the description of corresponding parts in the embodiments of the foregoing methods, and this embodiment is not described again.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is loaded and executed by a processor to implement each step of the above big data cleaning method, where a specific implementation process may refer to descriptions of corresponding parts in the above embodiments, and details are not described in this embodiment.
Referring to fig. 8, which is a schematic diagram of the hardware structure of an alternative example of a computer device suitable for the big data cleaning method proposed in the present application, the product type of the computer device is not limited in the present application. Taking a server as an example of the computer device, as shown in fig. 8, the computer device may include but is not limited to: a communication interface 81, a memory 82, and a processor 83, wherein:
the number of the communication interface 81, the memory 82, and the processor 83 may be at least one, and the communication interface 81, the memory 82, and the processor 83 may be connected to a communication bus, and data interaction between each other and other structural components of the computer device is realized through the communication bus, which may be determined according to actual requirements, and is not described in detail herein.
The communication interface 81 may include a data interface of a communication module of the computer device, and a communication interface such as a USB interface, a serial/parallel interface, an I/O interface, etc. for implementing data interaction between internal components of the computer device; the communication module may include, for example, a WIFI module, a 5G/6G (fifth generation mobile communication network/sixth generation mobile communication network) module, a GPRS module, a radio frequency communication module, and the like, so that the computer device can implement data interaction with other devices (such as various data sources, databases, terminal devices, and the like) through corresponding wireless communication networks.
In this embodiment of the application, the communication interface 81 may be configured to receive data to be cleaned of each data source, preconfigured configuration information, and the like, and may also write each cleaning result obtained by the cleaning process into a corresponding database, and send monitored cleaning index information to the terminal device for output, thereby implementing visual monitoring. The application does not limit the data transmission content of the communication interface in the big data cleaning method, and can be determined according to the situation.
The memory 82 may be used to store programs for implementing the big data cleaning method described in the above method embodiments; the processor 83 may load and execute the program stored in the memory to implement the steps of the big data cleaning method described in the above corresponding method embodiment, and the specific implementation process may refer to the description of the corresponding parts in the above embodiment, which is not described again.
In the embodiment of the present application, the memory 82 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device or other nonvolatile solid-state storage device. The processor 83 may be a Central Processing Unit (CPU), an application-specific integrated circuit (ASIC), a Digital Signal Processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device.
It should be understood that the structure of the computer device shown in fig. 8 does not constitute a limitation on the computer device in the embodiment of the present application. In practical applications, the computer device may include more components than those shown in fig. 8, or may combine some components, which are not listed here.
It should be noted that the big data cleaning method, the big data cleaning device, the computer equipment and the storage medium provided by the invention can be used in the fields of artificial intelligence, blockchain, distributed computing, cloud computing, big data, the Internet of Things and finance. The foregoing is merely an example, and does not limit the application fields of the big data cleaning method, apparatus, computer device, and storage medium provided by the present invention.
In connection with the above embodiments, the terms "a", "an" and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps or elements are included; these steps and elements do not constitute an exclusive list, and the method or apparatus may also comprise further steps or elements. An element defined by the phrase "comprising a … …" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
In the description of the embodiments herein, "/" means "or" unless otherwise specified, for example, a/B may mean a or B; "and/or" herein is merely an association describing an associated object, and means that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more than two.
Terms such as "first" and "second" in this application are used for descriptive purposes only, to distinguish one operation, element, or module from another, and do not necessarily require or imply any actual relationship or order between such elements, operations, or modules. Nor are they to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated; a feature defined as "first" or "second" may therefore explicitly or implicitly include one or more of such features.
In addition, in the present specification, the embodiments are described in a progressive or parallel manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments can be referred to each other. The device, the computer device, the system and the storage medium disclosed by the embodiment correspond to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether these functions are performed in hardware or software depends on the specific application of the solution and design pre-set conditions. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A big data cleaning method is characterized by comprising the following steps:
receiving data to be cleaned from a data source, and acquiring a corresponding target data cleaning strategy;
upon detecting a data cleaning strategy adjustment event, cleaning the data to be cleaned according to the target data cleaning strategy to obtain cleaning data, and meanwhile pre-cleaning the data to be cleaned according to the adjusted, pending data cleaning strategy to obtain pre-cleaning data;
obtaining an application judgment result for the pending data cleaning strategy; the application judgment result indicates whether to apply the pending data cleaning strategy to continue cleaning the received data to be cleaned;
obtaining target cleaning data of the data to be cleaned according to the application judgment result; the target cleaning data is the cleaning data or the pre-cleaning data.
2. The method of claim 1, further comprising:
writing the cleaning data obtained by processing into a first database for storage; and/or,
writing the pre-cleaning data obtained by processing into a second database for storage;
and carrying out data synchronization on the first database and the second database according to the synchronization mode corresponding to the application judgment result.
3. The method according to claim 2, wherein the performing data synchronization on the first database and the second database according to the synchronization mode corresponding to the application determination result comprises:
if the application judgment result is yes, writing the target cleaning data into the first database for storage, and deleting the correspondingly processed cleaning data stored in the first database;
and if the application judgment result is no, writing the target cleaning data into the second database for storage, and deleting the correspondingly processed pre-cleaning data stored in the second database.
4. The method of claim 1, further comprising:
writing the data to be cleaned from the data source into a third database for storage; the third database is configured with a data storage period, so that data to be cleaned whose storage time reaches the data storage period is deleted;
and prohibiting any response to cleaning processing instructions and data synchronization instructions directed at the data to be cleaned stored in the third database.
5. The method according to any one of claims 1 to 4, wherein the acquiring of the corresponding target data cleaning strategy comprises:
acquiring data characteristics of the data to be cleaned;
determining a target data cleaning strategy aiming at the data to be cleaned according to the data characteristics and a pre-configured cleaning depth strategy;
the target data cleaning strategy comprises at least one data cleaning model corresponding to a cleaning depth, and the data cleaning model is obtained by training based on a machine learning algorithm and/or a cleaning algorithm, so as to perform cleaning processing on the data to be cleaned.
6. The method according to any one of claims 1-4, wherein the obtaining of the application judgment result for the pending data cleaning strategy comprises:
obtaining an application selection instruction input by monitoring personnel for the pending data cleaning strategy, to obtain an application judgment result indicating whether to apply the pending data cleaning strategy to perform the data cleaning operation;
or,
calling a pre-configured data cleaning evaluation strategy to evaluate the pre-cleaning data and obtain a cleaning evaluation result;
and judging, according to the cleaning evaluation result, whether to apply the pending data cleaning strategy to perform the data cleaning operation, to obtain a corresponding application judgment result.
7. The method according to any one of claims 1-4, further comprising:
acquiring cleaning index information for the pre-cleaning data and the cleaning data according to a pre-configured cleaning index; the cleaning index information indicates a comparison between the pre-cleaning effect and the original cleaning effect for the same data to be cleaned;
outputting the cleaning index information;
and/or dynamically adjusting, in an asynchronous communication mode, model parameters of a data cleaning model contained in the target data cleaning strategy by using the cleaning index information and the target cleaning data, to generate a data cleaning strategy adjustment event.
8. A big data cleaning apparatus, the apparatus comprising:
a to-be-cleaned data receiving module, used for receiving data to be cleaned from a data source;
a data cleaning strategy obtaining module, used for obtaining a corresponding target data cleaning strategy;
a data cleaning processing module, used for, upon detecting a data cleaning strategy adjustment event, cleaning the data to be cleaned according to the target data cleaning strategy to obtain cleaning data, and meanwhile pre-cleaning the data to be cleaned according to the adjusted, pending data cleaning strategy to obtain pre-cleaning data;
an application judgment result obtaining module, used for obtaining an application judgment result for the pending data cleaning strategy; the application judgment result indicates whether to apply the pending data cleaning strategy to continue cleaning the received data to be cleaned;
a target cleaning data obtaining module, used for obtaining target cleaning data of the data to be cleaned according to the application judgment result; the target cleaning data is the cleaning data or the pre-cleaning data.
9. A computer device, characterized in that the computer device comprises:
a communication interface;
a memory for storing a program for implementing the big data cleaning method according to any one of claims 1 to 7;
a processor for loading and executing the program stored in the memory to implement the big data cleaning method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being loaded and executed by a processor to implement the big data cleaning method according to any one of claims 1 to 7.
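For illustration only, the flow recited in claims 1 and 6 (cleaning with the current target strategy, pre-cleaning the same data with the pending strategy, then choosing one result according to an application judgment) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `CleaningOutcome`, `run_strategy`, `clean_with_pending`, the two example strategies, and the `fewer_missing` evaluation rule are all hypothetical and do not appear in the application, and a plain Python function stands in for a trained data cleaning model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]
# Hypothetical strategy type: returns the cleaned record, or None to drop it.
Strategy = Callable[[Record], Optional[Record]]
# Hypothetical judge type: compares both outputs and decides whether to apply
# the pending strategy (the "application judgment result").
Judge = Callable[[List[Record], List[Record]], bool]


@dataclass
class CleaningOutcome:
    cleaned: List[Record]      # output of the current target strategy
    pre_cleaned: List[Record]  # output of the pending (adjusted) strategy
    apply_pending: bool        # application judgment result

    @property
    def target(self) -> List[Record]:
        """Target cleaning data: pre-cleaned if the pending strategy is applied."""
        return self.pre_cleaned if self.apply_pending else self.cleaned


def run_strategy(strategy: Strategy, records: Iterable[Record]) -> List[Record]:
    out = []
    for record in records:
        result = strategy(dict(record))  # copy so the raw input is never mutated
        if result is not None:
            out.append(result)
    return out


def clean_with_pending(records: Iterable[Record], target_strategy: Strategy,
                       pending_strategy: Strategy, judge: Judge) -> CleaningOutcome:
    """Run both strategies on the same input, then obtain the judgment."""
    records = list(records)
    cleaned = run_strategy(target_strategy, records)
    pre_cleaned = run_strategy(pending_strategy, records)
    return CleaningOutcome(cleaned, pre_cleaned, judge(cleaned, pre_cleaned))


# --- Hypothetical example strategies and evaluation rule ---

def strip_names(record: Record) -> Optional[Record]:
    """Current target strategy: trim whitespace from the 'name' field."""
    record["name"] = str(record.get("name", "")).strip()
    return record


def strip_and_drop_missing(record: Record) -> Optional[Record]:
    """Pending strategy: additionally drop records with no 'amount'."""
    if record.get("amount") is None:
        return None
    return strip_names(record)


def fewer_missing(cleaned: List[Record], pre_cleaned: List[Record]) -> bool:
    """Automatic evaluation: apply the pending strategy if it leaves a lower
    share of records with a missing 'amount'."""
    def missing_ratio(rows: List[Record]) -> float:
        if not rows:
            return 1.0
        return sum(1 for r in rows if r.get("amount") is None) / len(rows)
    return missing_ratio(pre_cleaned) < missing_ratio(cleaned)
```

Calling `clean_with_pending(records, strip_names, strip_and_drop_missing, fewer_missing)` returns both outputs together with the judgment, and `outcome.target` is the data that would be kept. In the manual branch of claim 6, the `judge` argument could instead wrap a selection made by monitoring personnel.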
CN202211078831.1A 2022-09-05 2022-09-05 Big data cleaning method and device, computer equipment and storage medium Pending CN115309735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211078831.1A CN115309735A (en) 2022-09-05 2022-09-05 Big data cleaning method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211078831.1A CN115309735A (en) 2022-09-05 2022-09-05 Big data cleaning method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115309735A true CN115309735A (en) 2022-11-08

Family

ID=83866445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211078831.1A Pending CN115309735A (en) 2022-09-05 2022-09-05 Big data cleaning method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115309735A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618283A (en) * 2022-12-02 2023-01-17 中国汽车技术研究中心有限公司 Cross-site script attack detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109345377B (en) Data real-time processing system and data real-time processing method
CN111309539A (en) Abnormity monitoring method and device and electronic equipment
CN109597800B (en) Log distribution method and device
CN115309735A (en) Big data cleaning method and device, computer equipment and storage medium
CN104683155A (en) Alarm shielding mechanism in network management system
CN111752706A (en) Resource allocation method, device and storage medium
CN105069029B (en) A kind of real-time ETL system and method
CN115509875A (en) Server health degree evaluation method and device
CN110009347B (en) Block chain transaction information auditing method and device
CN105357026B (en) A kind of resource information collection method and calculate node
CN114385378A (en) Active data processing method and device for Internet of things equipment and storage medium
CN104735063B (en) A kind of safe evaluating method for cloud infrastructure
CN108551444A (en) A kind of log processing method, device and equipment
CN111130882A (en) Monitoring system and method of network equipment
CN109462510B (en) CDN node quality evaluation method and device
CN108255710B (en) Script abnormity detection method and terminal thereof
CN114244681B (en) Equipment connection fault early warning method and device, storage medium and electronic equipment
CN116361631A (en) Method and equipment for detecting time sequence data period, detecting abnormality and scheduling resources
CN113254253B (en) Data processing method, system and equipment
CN112905119B (en) Data write-in control method, device and equipment of distributed storage system
CN113992378B (en) Security monitoring method and device, electronic equipment and storage medium
CN112800089B (en) Intermediate data storage level adjusting method, storage medium and computer equipment
CN112052147B (en) Monitoring method, electronic device and storage medium
CN112448855B (en) Method and system for updating block chain system parameters
CN117131117A (en) Data acquisition and warehousing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination