CN112632132B - Processing method, device and equipment for abnormal imported data - Google Patents

Processing method, device and equipment for abnormal imported data

Info

Publication number
CN112632132B
Authority
CN
China
Prior art keywords
data
abnormal
target
imported
file
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011637145.4A
Other languages
Chinese (zh)
Other versions
CN112632132A (en)
Inventor
谢南翔
张岩
黄宇昕
王元文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN202011637145.4A
Publication of CN112632132A
Application granted
Publication of CN112632132B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a method, an apparatus, and a device for processing abnormal imported data. Data to be imported is first acquired, and it is judged whether the data to be imported meets an abnormal condition. If the data to be imported meets the abnormal condition, the data to be imported is determined to be abnormal data to be checked, and the abnormal data to be checked is written into an abnormal data table. Target abnormal data is then selected from the abnormal data table and imported into a target database, and the abnormal problem is determined according to the import result. If the import succeeds, the abnormal problem is determined to be a data abnormal problem. If the import fails, the abnormal problem is determined to be a warehousing abnormal problem. Importing the target abnormal data into the target database automatically imports the portion of the target abnormal data that can be stored normally and determines the cause of the abnormality of the target abnormal data, which facilitates subsequent processing of the abnormal data according to that cause. The abnormal data is therefore processed faster, and data import efficiency is improved.

Description

Processing method, device and equipment for abnormal imported data
Technical Field
The present invention relates to the field of data processing, and in particular to a method, an apparatus, and a device for processing abnormal imported data.
Background
A database stores a large amount of data and uses the stored data to provide users with services such as data storage, data processing, and data analysis. When new data is generated, or when data already stored in the database needs to be modified, the data needs to be imported into the database.
Abnormal data may exist in the data imported into the database. At present, the abnormal data is extracted from the imported data, and then either the import of the abnormal data is controlled manually or the abnormal data is processed manually. Manual processing makes the handling of abnormal data inefficient and slows down data import.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method, an apparatus, and a device for processing abnormal imported data, which can automatically store part of the abnormal data and determine the abnormal problem of the abnormal imported data, so that the abnormal data can be processed according to the abnormal problem and the processing efficiency of the abnormal data is improved.
In order to solve the above problems, the technical solution provided in the embodiments of the present application is as follows:
In a first aspect, the present application provides a method for processing abnormal imported data, where the method includes:
acquiring data to be imported, and judging whether the data to be imported meets an abnormal condition; the abnormal condition includes that the number of data fields of the data to be imported does not match the target number of fields, and/or that the field format of the data to be imported does not match the preset field format;
if the data to be imported meets the abnormal condition, determining the data to be imported as abnormal data to be checked, and writing the abnormal data to be checked into an abnormal data table;
selecting target abnormal data from the abnormal data table, and importing the target abnormal data into a target database;
if the import is successful, determining the abnormal problem of the target abnormal data to be a data abnormal problem;
if the import is unsuccessful, determining the abnormal problem of the target abnormal data to be a warehousing abnormal problem.
In one possible implementation manner, the writing the abnormal data to be checked into an abnormal data table includes:
acquiring a target table name of the data table into which the abnormal data to be checked is to be imported, and a target import time, the target import time being the preset time at which the abnormal data to be checked is to be imported into the target database;
acquiring column information, data information, and abnormality information of the abnormal data to be checked, and forming a target value field from the column information, the data information, and the abnormality information;
and writing the target table name, the target import time, and the target value field into the abnormal data table.
In one possible implementation manner, the selecting the target abnormal data from the abnormal data table includes:
selecting a data table in the target database as a data table to be written;
acquiring the table name of the data table to be written as a table name to be written; acquiring date information of the data table to be written as a date to be written;
querying, according to the table name to be written and the date to be written, whether the abnormal data table contains corresponding abnormal data to be checked;
and if so, taking the corresponding abnormal data to be checked as the target abnormal data.
In one possible implementation manner, the importing the target abnormal data into a target database includes:
obtaining key value data of the target abnormal data, and splicing the key value data to obtain target key value data;
and writing the target key value data into a target database.
In one possible implementation manner, before the acquiring the data to be imported and judging whether the data to be imported meets the abnormal condition, the method further includes:
acquiring original data, and performing data aggregation and data conversion on the original data to obtain source data;
and selecting the data to be imported from the source data.
In one possible implementation manner, before the acquiring the original data and performing data aggregation and data conversion on the original data to obtain the source data, the method further includes:
acquiring first file information of a first data file from a configuration file, where the first file information includes first position information and first data information; the first data file is a file for storing the original data;
querying whether the first data file exists by using the first position information;
if the first data file exists, querying whether the original data in the first data file is complete by using the first data information;
the acquiring the original data includes:
if the data in the first data file is complete, taking the data in the first data file as the original data;
and acquiring the original data from the first data file.
In one possible implementation manner, before the acquiring the data to be imported and judging whether the data to be imported meets the abnormal condition, the method further includes:
acquiring second file information of a second data file from the configuration file, where the second file information includes second position information and second data information; the second data file is a file for storing the source data;
querying whether the second data file exists by using the second position information;
if the second data file exists, querying whether the data in the second data file is complete by using the second data information;
the acquiring the data to be imported includes:
if the data in the second data file is complete, taking the data in the second data file as the source data;
and acquiring the data to be imported from the second data file.
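The existence and completeness checks described for the first and second data files can be sketched as follows. This is only an illustration under stated assumptions: the source does not say what the position information or the data information contains, so the sketch assumes the position information is a file path and the data information is an expected line count. The function name `load_if_complete` and the dictionary keys are hypothetical.

```python
import os
import tempfile


def load_if_complete(file_info):
    """Return the file's lines if the file exists and is complete, else None.

    file_info is a hypothetical layout: {"path": str, "expected_lines": int}.
    """
    path = file_info["path"]
    # Position information: does the data file exist at all?
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    # Data information: assumed here to be an expected number of lines.
    if len(lines) != file_info["expected_lines"]:
        return None
    return lines


# Usage with a temporary file standing in for a data file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("1,Alice\n2,Bob\n")
    complete = load_if_complete({"path": path, "expected_lines": 2})
    incomplete = load_if_complete({"path": path, "expected_lines": 3})
    missing = load_if_complete({"path": path + ".absent", "expected_lines": 2})
```

Data is only taken from the file when both checks pass, which mirrors the order of the steps above.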
In a second aspect, the present application provides an apparatus for processing abnormal imported data, the apparatus including:
a first acquisition unit, configured to acquire data to be imported and judge whether the data to be imported meets an abnormal condition; the abnormal condition includes that the number of data fields of the data to be imported does not match the target number of fields, and/or that the field format of the data to be imported does not match the preset field format;
a first determining unit, configured to determine the data to be imported as abnormal data to be checked if the data to be imported meets the abnormal condition, and to write the abnormal data to be checked into an abnormal data table;
an importing unit, configured to select target abnormal data from the abnormal data table and import the target abnormal data into a target database;
a second determining unit, configured to determine the abnormal problem of the target abnormal data to be a data abnormal problem if the import is successful;
and a third determining unit, configured to determine the abnormal problem of the target abnormal data to be a warehousing abnormal problem if the import is unsuccessful.
In one possible implementation manner, the first determining unit is specifically configured to: acquire a target table name of the data table into which the abnormal data to be checked is to be imported, and a target import time, the target import time being the preset time at which the abnormal data to be checked is to be imported into the target database;
acquire column information, data information, and abnormality information of the abnormal data to be checked, and form a target value field from the column information, the data information, and the abnormality information;
and write the target table name, the target import time, and the target value field into the abnormal data table.
In one possible implementation manner, the importing unit is specifically configured to: select a data table in the target database as a data table to be written;
acquire the table name of the data table to be written as a table name to be written; acquire date information of the data table to be written as a date to be written;
query, according to the table name to be written and the date to be written, whether the abnormal data table contains corresponding abnormal data to be checked;
and if so, take the corresponding abnormal data to be checked as the target abnormal data.
In one possible implementation manner, the importing unit is specifically configured to obtain key value data of the target abnormal data, and splice the key value data to obtain target key value data;
and writing the target key value data into a target database.
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring original data, and carrying out data aggregation and data conversion on the original data to obtain source data;
and the first selecting unit is used for selecting data to be imported from the source data.
In one possible implementation, the apparatus further includes:
a third acquisition unit, configured to acquire first file information of a first data file from a configuration file, where the first file information includes first position information and first data information; the first data file is a file for storing the original data;
a first query unit, configured to query whether the first data file exists by using the first position information;
a second query unit, configured to query, if the first data file exists, whether the original data in the first data file is complete by using the first data information;
the second acquisition unit is specifically configured to take the data in the first data file as the original data if the data in the first data file is complete,
and to acquire the original data from the first data file.
In one possible implementation, the apparatus further includes:
a fourth acquisition unit, configured to acquire second file information of a second data file from the configuration file, where the second file information includes second position information and second data information; the second data file is a file for storing the source data;
a third query unit, configured to query whether the second data file exists by using the second position information;
a fourth query unit, configured to query, if the second data file exists, whether the data in the second data file is complete by using the second data information;
the first acquisition unit is specifically configured to take the data in the second data file as the source data if the data in the second data file is complete,
and to acquire the data to be imported from the second data file.
In a third aspect, the present application provides a device for processing abnormal imported data, including: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the methods of the embodiments described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method according to the above embodiments.
It follows that the embodiments of the present application have the following beneficial effects:
The embodiments of the present application provide a method, an apparatus, and a device for processing abnormal imported data. Data to be imported is first acquired, and it is judged whether the data to be imported meets an abnormal condition; the abnormal condition includes that the number of data fields of the data to be imported does not match the target number of fields, and/or that the field format of the data to be imported does not match the preset field format. If the data to be imported meets the abnormal condition, the data to be imported is determined to be abnormal data to be checked, and the abnormal data to be checked is written into an abnormal data table. Target abnormal data is then selected from the abnormal data table and imported into a target database, and the abnormal problem is determined according to the import result. If the import succeeds, the problem did not occur in the import operation, and the abnormal problem of the target abnormal data is determined to be a data abnormal problem. If the import fails, the abnormal problem of the target abnormal data is determined to be a warehousing abnormal problem. Importing the determined abnormal data into the target database, on the one hand, automatically imports the portion of the abnormal data that can be stored normally and, on the other hand, determines the cause of the abnormality, so that the abnormal data can be processed according to that cause. The abnormal data is therefore processed faster, and data import efficiency is improved.
Drawings
FIG. 1 is a flowchart of a method for processing abnormal imported data according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for processing abnormal imported data according to an embodiment of the present application.
Detailed Description
To facilitate understanding and explanation of the technical solutions provided by the embodiments of the present application, the background of the present application is described first.
After studying the conventional process of importing data into a database, the inventors found that some abnormal data may exist in the imported data, and that the operation of the target database may be affected if the abnormal data is not processed before the data is imported. In current data import methods, the abnormal data is extracted and then processed manually. Manual processing is inefficient, and when the type of abnormal problem of the abnormal data has not been determined, several solutions may have to be tried, which lengthens the processing time of the abnormal data.
Based on this, in the method, apparatus, and device for processing abnormal imported data provided by the embodiments of the present application, data to be imported is first acquired, and it is judged whether the data to be imported meets an abnormal condition; the abnormal condition includes that the number of data fields of the data to be imported does not match the target number of fields, and/or that the field format of the data to be imported does not match the preset field format. If the data to be imported meets the abnormal condition, the data to be imported is determined to be abnormal data to be checked, and the abnormal data to be checked is written into an abnormal data table. Target abnormal data is then selected from the abnormal data table and imported into a target database, and the abnormal problem is determined according to the import result. If the import succeeds, the problem did not occur in the import operation, and the abnormal problem of the target abnormal data is determined to be a data abnormal problem. If the import fails, the abnormal problem of the target abnormal data is determined to be a warehousing abnormal problem. Importing the determined abnormal data into the target database, on the one hand, automatically imports the portion of the abnormal data that can be stored normally and, on the other hand, determines the cause of the abnormality, so that the abnormal data can be processed according to that cause. The abnormal data is therefore processed faster, and data import efficiency is improved.
To facilitate understanding of the technical solutions provided by the embodiments of the present application, a method for processing abnormal imported data provided by the embodiments of the present application is described below with reference to the accompanying drawings.
Referring to FIG. 1, which is a flowchart of a method for processing abnormal imported data according to an embodiment of the present application, the method includes steps S101 to S105.
S101: acquiring data to be imported, and judging whether the data to be imported meets an abnormal condition or not; the exception condition includes that the number of data fields of the data to be imported does not conform to the target number of fields and/or that the field format of the data to be imported does not conform to a preset field format.
The data to be imported is data that needs to be written into the target database, and may originate from the service systems that generate the data.
The data to be imported may be data obtained after basic data processing. The data to be imported may contain abnormal data, and the abnormal data may not be written into the target database normally. So as not to affect the normal import process, the data to be imported is first analyzed, and the abnormal data in it is extracted and processed.
Whether data to be imported is abnormal can be judged according to the abnormal condition. Specifically, the abnormal condition may be that the number of data fields of the data to be imported does not match the target number of fields, and/or that the field format of the data to be imported does not match the preset field format. That is, whether the data to be imported is abnormal is judged from the number of data fields and the field formats in the data to be imported.
The abnormal condition may be set in a configuration file in advance. In one possible implementation, after the data to be imported is acquired, the abnormal condition may be read from the configuration file and used to judge the data to be imported.
The embodiments of the present application do not limit the specific target number of fields or the preset field format, which can be set according to the data form of the database and the data form of standard data to be imported.
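The judgment in S101 can be illustrated with a minimal sketch. It assumes that a record is a list of string fields and that the preset field formats are expressed as regular expressions; the configuration layout, the name `ANOMALY_CONFIG`, and the function name are hypothetical and do not come from the source.

```python
import re

# Hypothetical abnormal-condition configuration; in practice it could be
# read from the configuration file mentioned above.
ANOMALY_CONFIG = {
    "target_field_count": 3,
    "field_formats": [r"\d+", r"[A-Za-z ]+", r"\d{4}-\d{2}-\d{2}"],
}


def meets_anomaly_condition(fields, config=ANOMALY_CONFIG):
    """Return the reasons a record is abnormal (an empty list if it is normal)."""
    reasons = []
    # Condition 1: the number of data fields differs from the target number.
    if len(fields) != config["target_field_count"]:
        reasons.append(
            f"field count {len(fields)} != {config['target_field_count']}"
        )
    # Condition 2: a field does not match its preset format.
    # zip stops at the shorter sequence, so extra or missing fields are
    # reported only by the count check above.
    for index, (value, pattern) in enumerate(zip(fields, config["field_formats"])):
        if re.fullmatch(pattern, value) is None:
            reasons.append(f"field {index} does not match {pattern}")
    return reasons
```

A record is treated as abnormal data to be checked when the returned list is non-empty; the reasons could later serve as the abnormality information of the record.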
S102: if the data to be imported meets the abnormal condition, determining the data to be imported as abnormal data to be checked, and writing the abnormal data to be checked into an abnormal data table.
If the data to be imported meets the abnormal condition, the data to be imported is determined to be abnormal data to be checked. The abnormal data to be checked may include data that can be read and written normally; such data meets the abnormal condition but does not require special abnormal-data processing.
The abnormal data to be checked is written into the abnormal data table. Specifically, the abnormal data to be checked can be processed according to the format of the abnormal data table and then written into the abnormal data table. A specific implementation of writing the abnormal data to be checked into the abnormal data table is described below.
The embodiments of the present application do not limit how data to be imported that does not meet the abnormal condition is processed. In one possible implementation, such data may be written into the target database. Specifically, when the target database is an HBase database, bulk load may be used to store the data.
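The routing in S102 can be sketched as a simple partition. For brevity the sketch checks only the field-count part of the abnormal condition, and it returns two lists instead of writing to a real database or abnormal data table; the function name is hypothetical.

```python
def partition_records(records, target_field_count):
    """Split records into normal data and abnormal data to be checked."""
    normal, to_check = [], []
    for fields in records:
        if len(fields) == target_field_count:
            normal.append(fields)    # would be written to the target database
        else:
            to_check.append(fields)  # would be written to the abnormal data table
    return normal, to_check
```

Keeping the two groups apart lets the normal data follow the regular import path while the abnormal data to be checked waits in the abnormal data table.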
S103: and selecting target abnormal data from the abnormal data table, and importing the target abnormal data into a target database.
It can be understood that some abnormal data differs only in format, which affects neither the import of the abnormal data nor its reading in the target database, so such abnormal data can be imported into the target database normally for subsequent read and write operations.
The target abnormal data in the abnormal data table comes from different sources, and the data tables in the target database into which it is to be written also differ. Part of the abnormal data is selected from the abnormal data table as target abnormal data and imported into the target database, so that the abnormal data to be checked in the abnormal data table is processed in turn.
S104: if the importing is successful, determining the abnormal problem of the target abnormal data as the data abnormal problem.
If the target abnormal data can be imported into the target database, the target abnormal data has no warehousing problem, and the abnormal problem of the target abnormal data is determined to be a data abnormal problem.
A data abnormal problem is a format problem in the data to be imported itself. It may arise because the data format of the service system from which the data to be imported originates differs from the data format of the target database.
S105: if the import is unsuccessful, determining the abnormal problem of the target abnormal data as a warehouse-in abnormal problem.
A warehousing abnormal problem means that the target abnormal data cannot be written into the target database correctly. It may arise because the target abnormal data does not meet the warehousing conditions. If the target abnormal data is not imported successfully, the abnormal problem of the target abnormal data is determined to be a warehousing abnormal problem.
Based on S101 to S105 above, the abnormal data to be checked is determined from the data to be imported and written into the corresponding abnormal data table. Target abnormal data is then selected from the abnormal data table and imported into the target database. On the one hand, the portion of the abnormal data that can be stored normally is imported automatically; on the other hand, the cause of the abnormality of the target abnormal data is determined, which facilitates subsequent processing of the abnormal data according to that cause. The abnormal data is therefore processed faster, and data import efficiency is improved.
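The classification in S104 and S105 can be sketched as a trial import whose outcome decides the problem type. This is an illustration only: the in-memory dictionary and the rule that a record needs a row key are assumptions standing in for a real target database and its warehousing conditions, and all names are hypothetical.

```python
from enum import Enum


class AnomalyProblem(Enum):
    DATA = "data abnormal problem"
    WAREHOUSING = "warehousing abnormal problem"


target_table = {}  # in-memory stand-in for a table in the target database


def import_into_target(record):
    # Assumed warehousing condition: a record must carry a row key.
    if not record.get("row_key"):
        raise ValueError("missing row key")
    target_table[record["row_key"]] = record


def classify_by_import(record, import_fn):
    """Try to import one target abnormal record and classify the outcome."""
    try:
        import_fn(record)
    except Exception:
        # The import failed: the record cannot be stored as it is.
        return AnomalyProblem.WAREHOUSING
    # The import succeeded: only the data itself (e.g. its format) was abnormal.
    return AnomalyProblem.DATA
```

A successful trial import both stores the record and labels it, which is how part of the abnormal data ends up imported automatically.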
In one possible implementation, the primary key of the abnormal data table may consist of the import time and the table name, so that abnormal data to be checked can be recorded for different data tables to be written and different writing times.
Specifically, writing the abnormal data to be checked into the abnormal data table includes:
acquiring a target table name of the data table into which the abnormal data to be checked is to be imported, and a target import time, the target import time being the preset time at which the abnormal data to be checked is to be imported into the target database;
acquiring column information, data information and anomaly information of anomaly data to be checked, and forming the column information, the data information and the anomaly information into a target value field;
and writing the target table name, the target import time and the target value field into the abnormal data table.
The target table name of the data table in the target database into which the abnormal data to be checked is to be written is acquired, together with the preset import time at which the abnormal data to be checked is to be imported into the target database. The target table name and the preset import time determine the data table in the target database into which the abnormal data to be checked is to be imported.
Column information, data information, and abnormality information of the abnormal data to be checked are acquired. The column information is the corresponding column information in the original data table in which the abnormal data to be checked is stored. The data information is information related to the abnormal data to be checked. The abnormality information corresponds to the abnormal problem of the abnormal data to be checked.
The column information, the data information, and the abnormality information form a target value field.
The target table name, the target import time, and the target value field are written into the abnormal data table. In one possible implementation manner, the target table name and the target import time form the primary key value corresponding to the abnormal data to be checked, and the target value field is the corresponding key value data in the abnormal data table.
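Writing an entry into the abnormal data table can be sketched as follows. The sketch assumes an in-memory dictionary as the abnormal data table, a JSON string as the target value field, and a list of value fields per key so that several abnormal records can share one table name and import time; none of these choices is specified by the source, and all names are hypothetical.

```python
import json

anomaly_table = {}  # in-memory stand-in for the abnormal data table


def write_anomaly_record(table, target_table_name, target_import_time,
                         column_info, data_info, anomaly_info):
    """Store one abnormal record under the (table name, import time) key."""
    key = (target_table_name, target_import_time)
    # The target value field combines column, data and abnormality information.
    value_field = json.dumps(
        {"columns": column_info, "data": data_info, "anomaly": anomaly_info},
        ensure_ascii=False,
    )
    table.setdefault(key, []).append(value_field)
    return key
```

Keying the entry by table name and import time is what later allows the abnormal data to be looked up per data table and per date.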
Further, selecting the target abnormal data from the abnormal data table includes:
selecting a data table in a target database as a data table to be written;
acquiring a table name of a data table to be written as the table name to be written; acquiring date information of a data table to be written as a date to be written;
querying, according to the table name to be written and the date to be written, whether the abnormal data table contains corresponding abnormal data to be checked;
if so, taking the corresponding abnormal data to be checked as the target abnormal data.
In one possible implementation, after normal data to be written is written to the corresponding data table in the target database, whether each data table in the target database has corresponding unwritten abnormal data to be checked may be queried.
A data table in the target database is selected as the data table to be written; the data table to be written may be one of the data tables into which the data to be imported is written.
Because the target table name and the target import time of the abnormal data to be checked are stored in the abnormal data table, the table name of the data table to be written can be obtained as the table name to be written, and the date information of that data table can be obtained as the date to be written. The abnormal data table is then queried for abnormal data to be checked that corresponds to the table name to be written and the date to be written.
If corresponding abnormal data to be checked exists, the data intended for the data table to be written includes abnormal data to be checked. The corresponding abnormal data to be checked is taken as the target abnormal data, so that it can later be written into the data table to be written.
Based on the above, in the embodiment of the present application, the abnormal data table is queried for abnormal data to be checked that corresponds to the data table to be written in the target database, so that the abnormal data to be checked can be processed. When the data table to be written has corresponding abnormal data to be checked, that data is taken as the target abnormal data, so that it can later be written into the corresponding data table to be written, the missing data can be supplemented, and the abnormality type of the target abnormal data can be determined.
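The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the abnormal data table is modeled as a dictionary keyed by (table name, date), and the function names `select_target_abnormal` and `collect_targets` as well as the sample table names are hypothetical.

```python
from datetime import date

# Hypothetical in-memory abnormal data table:
# (target table name, target import date) -> list of stored value fields.
abnormal_table = {
    ("t_account", date(2020, 12, 31)): ['{"data": ["1001", "Alice"]}'],
    ("t_loan", date(2020, 12, 30)): ['{"data": ["77"]}'],
}


def select_target_abnormal(abnormal_table, table_to_write, date_to_write):
    """Return the pending abnormal records for one table and date, or []."""
    return list(abnormal_table.get((table_to_write, date_to_write), []))


def collect_targets(abnormal_table, tables_to_write):
    """Check every (table name, date) pair that normal data was written to."""
    targets = {}
    for name, day in tables_to_write:
        found = select_target_abnormal(abnormal_table, name, day)
        if found:
            targets[(name, day)] = found
    return targets


targets = collect_targets(
    abnormal_table,
    [("t_account", date(2020, 12, 31)), ("t_customer", date(2020, 12, 31))],
)
```

Only the table and date pairs that actually have stored abnormal records end up in `targets`; tables without pending abnormal data are skipped.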
In one possible implementation manner, the method for importing the target abnormal data into the target database specifically includes:
obtaining key value data of target abnormal data, and splicing the key value data to obtain target key value data;
and writing the target key value data into a target database.
After the target abnormal data is selected from the abnormal data table, the target abnormal data is extracted from the abnormal data table.
The key value data of the target abnormal data is acquired and spliced to obtain the target key value data, which is the data to be written into the target database. Writing the target key value data into the target database imports the target abnormal data into the target database, and the type of abnormality problem of the target abnormal data can be determined according to the import result.
In the embodiment of the application, the target key value data is obtained by utilizing the key value data of the target abnormal data, and the target key value data is written into the target database, so that the complement of the target abnormal data can be realized, the automatic processing of partial target abnormal data which can be imported into the target database can be realized, and the processing efficiency of the abnormal data is improved. And, by writing the target key value data into the target database, the type of the abnormality problem of the target abnormality data can be determined from the writing result.
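The import-and-classify step can be sketched as follows, using Python's standard `sqlite3` module as a stand-in target database. Several points are assumptions made for illustration: "splicing" is interpreted here as padding or truncating the values to the target column count, the table schema is invented, and the names `splice` and `import_and_classify` are hypothetical. The classification rule itself (successful import means a data abnormality problem, failed import means a warehousing abnormality problem) follows the description.

```python
import sqlite3

DATA_PROBLEM = "data abnormality problem"                # import succeeded
WAREHOUSING_PROBLEM = "warehousing abnormality problem"  # import failed


def splice(values, n_columns, default=None):
    """Pad or truncate the values to the target column count (assumed meaning of splicing)."""
    row = list(values)[:n_columns]
    return row + [default] * (n_columns - len(row))


def import_and_classify(conn, table, n_columns, values):
    """Try to insert the spliced row and classify the problem by the outcome."""
    row = splice(values, n_columns)
    placeholders = ", ".join("?" for _ in row)
    try:
        # The table name is assumed to come from trusted configuration.
        with conn:  # commits on success, rolls back on error
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    except sqlite3.Error:
        return WAREHOUSING_PROBLEM
    return DATA_PROBLEM


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_account (id TEXT PRIMARY KEY, name TEXT NOT NULL, balance REAL)"
)
# Missing balance is padded with NULL, which the schema allows.
first = import_and_classify(conn, "t_account", 3, ["1001", "Alice"])
# Missing name is padded with NULL, which violates NOT NULL.
second = import_and_classify(conn, "t_account", 3, ["1002"])
```

In this sketch the first record can be supplemented automatically, so its problem lay in the data itself, while the second is rejected by the database and is left for manual handling.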
It may be understood that the data to be imported originates from a service system, and the data generated by the service system may be service data. Data from different service systems differs in structure, and the correlation between data from the respective service systems is weak. If the data from the service systems is used directly as the data to be imported, multiple import tasks must be executed for the different types of data to be imported, which affects the determination of the abnormal data to be checked and reduces data import efficiency.
Based on the above problems, data from the service system can be processed to obtain data to be imported. In one possible implementation manner, before acquiring the data to be imported and determining whether the data to be imported is abnormal data to be checked, the method further includes:
acquiring original data, and performing data aggregation and data conversion on the original data to obtain source data;
and selecting data to be imported from the source data.
The original data is data from a service system, and the original data has characteristics of the service system from which the original data is derived. And carrying out data aggregation and data conversion on the original data to obtain the processed source data.
The data aggregation specifically refers to associating and integrating original data that belongs to different types or originates from different service systems. Specifically, original data with the same or similar characteristics is extracted from the original data and clustered. The characteristic may specifically be the type of business with which the original data is associated. For example, data for public (corporate) business may be extracted from the original data, and the extracted original data may be aggregated. The data aggregation further comprises data processing on the original data, in which key data can be selected from the original data or obtained by calculation from one or more pieces of original data. For example, the average of a plurality of pieces of original data is calculated. When the data is imported, the key data can be used as the data to be imported, which reduces the amount of data to be imported into the target database.
The data conversion refers to preprocessing part of abnormal characters in the original data, and preventing the abnormal characters from affecting data import. Specifically, the method can include setting blank characters as default characters, removing special characters related to individual data fields in data, and the like.
And obtaining source data after data aggregation and data conversion, and extracting the source data to obtain data to be imported into the target database.
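The aggregation and conversion steps can be sketched as follows. This is a minimal illustration, not the application's implementation: the default character, the set of special characters, the record fields `business` and `amount`, and the function names are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean

DEFAULT_CHAR = "0"          # hypothetical default for blank fields
SPECIAL_CHARS = "|\t\r\n"   # hypothetical characters that would break field splitting


def convert_field(text):
    """Data conversion: strip special characters and replace blanks with a default."""
    cleaned = "".join(ch for ch in text if ch not in SPECIAL_CHARS).strip()
    return cleaned if cleaned else DEFAULT_CHAR


def aggregate_by_business(records):
    """Data aggregation: group amounts by business type and keep the average as key data."""
    groups = defaultdict(list)
    for record in records:
        business = convert_field(record["business"])
        groups[business].append(float(convert_field(record["amount"])))
    return {business: mean(amounts) for business, amounts in groups.items()}


raw = [
    {"business": "corporate", "amount": "100"},
    {"business": "corporate|", "amount": " 300 "},
    {"business": "retail", "amount": "  "},
]
source = aggregate_by_business(raw)
```

Three raw records collapse into two averaged entries, which illustrates how aggregation reduces the volume of data to be imported while conversion keeps stray characters from causing import failures.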
The embodiment of the application does not limit the specific mode of selecting the data to be imported from the source data, and the corresponding selection conditions can be set according to the data importing requirement.
In addition, in order to facilitate the subsequent acquisition of the specific operations of data aggregation and data conversion, information about the specific operations of data aggregation and data conversion may be saved in a configuration file. Therefore, specific operations of data aggregation and data conversion can be determined through the configuration file, and modification of data and optimization of a later data processing mode are achieved.
Based on the above, the complexity of the field information of the data to be imported can be reduced through data aggregation and data conversion, and different interfaces do not need to be developed for different data types, so that the unification of the warehouse-in interfaces is realized. And high-quality data which can be imported through the interface is obtained, so that the success rate of batch warehousing is improved. In addition, the volume of data to be imported can be reduced, the data importing efficiency is improved, and the workload of processing abnormal data to be imported in the later period is reduced.
Further, before the data aggregation and data conversion are performed on the original data, verification may be performed on whether the original data exists and whether the original data is complete.
In one possible implementation manner, before acquiring the source data, performing data aggregation and data conversion on the original data to obtain the source data, the method further includes:
acquiring first file information of a first data file from the configuration file, wherein the first file information comprises first position information and first data information; the first data file is a file for storing original data;
querying whether the first data file exists or not by using the first position information;
if so, the first data information is used for inquiring whether the original data in the first data file is complete.
The configuration file is provided with basic information of data transmitted by other service systems, and the first file information of the first data file can be obtained according to the configuration information. The first data file is a file storing data sent by the service system. The first file information includes first location information and first data information. The first location information is used for determining a storage location where the first data file is located, and the storage location may be a designated directory. The first data information is data information stored in the first data file, and the first data information may specifically include one or more of file name information, file classification information, and file distribution information.
The presence of the first data file is determined first using the first location information. If the first data file does not exist, it may be that the file storing the data is lost in transmission or that the data file has a storage problem, and further acquisition or problem checking of the data is required.
If the first data file exists, the corresponding data transmitted by the service system exists. However, the data stored in the first data file is not necessarily complete. And checking whether the data in the first data file is complete or not by using the acquired first file information.
It should be noted that the configuration file storing the first file information of the first data file and the configuration file storing the information related to the specific operations of data aggregation and data conversion may be the same configuration file. Such a configuration file may be a first-level configuration file for storing information related to data and data files. In addition, there are second-level and third-level configuration files. The second-level configuration file may be used to store task information related to importing the data to be imported into the database. The third-level configuration file may be used to build specific subtasks with initialization information associated with the warehousing operation. By setting specific information in the three levels of configuration files, the data import system can be extended simply and conveniently, and the stability of data import is improved.
Correspondingly, obtaining the original data includes:
if the data in the first data file is complete, taking the data in the first data file as original data;
raw data is obtained from a first data file.
If it is determined that the data in the first data file is complete, the data in the first data file is determined to be the original data. And acquiring the original data from the first data file, and carrying out subsequent processing on the original data.
In the embodiment of the application, the integrity of the original data can be ensured by checking the existence and the data integrity of the data file, the subsequent processing of the original data is facilitated, the problem in data transmission of a service system is prevented, the efficiency of importing the data to be imported is improved, the abnormal data to be imported is reduced, and the processing efficiency of the abnormal data to be imported is improved.
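The existence and completeness checks can be sketched as follows. The description lists file name, classification, and distribution information as possible contents of the first data information but does not fix a completeness criterion, so the `expected_records` count used here is an assumption, as are the dictionary keys and the function names `check_data_file` and `load_raw_data`.

```python
import os
import tempfile


def _path(file_info):
    """Build the file path from the position information in the config entry."""
    return os.path.join(file_info["directory"], file_info["file_name"])


def check_data_file(file_info):
    """Return (exists, complete) for a data file described in the config."""
    path = _path(file_info)
    if not os.path.isfile(path):
        return False, False
    with open(path, encoding="utf-8") as f:
        n_records = sum(1 for line in f if line.strip())
    # Assumed completeness rule: record count matches the configured count.
    return True, n_records == file_info["expected_records"]


def load_raw_data(file_info):
    """Return the records only when the file exists and is complete, else None."""
    exists, complete = check_data_file(file_info)
    if not (exists and complete):
        return None
    with open(_path(file_info), encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "raw.txt"), "w", encoding="utf-8") as f:
        f.write("1001|Alice\n1002|Bob\n")
    ok = load_raw_data({"directory": d, "file_name": "raw.txt", "expected_records": 2})
    short = load_raw_data({"directory": d, "file_name": "raw.txt", "expected_records": 3})
    missing = check_data_file({"directory": d, "file_name": "absent.txt", "expected_records": 2})
```

The same two-stage check applies unchanged to a file holding source data, with a different configuration entry.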
In one possible implementation manner, before acquiring the data to be imported and determining whether the data to be imported is abnormal data to be checked, the method further includes:
acquiring second file information of a second data file from the configuration file, wherein the second file information comprises second position information and second data information; the second data file is a file for storing source data;
Querying whether a second data file exists or not by using the second position information;
if so, the second data information is used for inquiring whether the data in the second data file is complete.
The second data file is a data file for storing source data. The second data file may be generated after processing the original data to obtain the source data.
There may be a problem of data file loss during transmission of the second data file. Correspondingly, the presence of the second data file needs to be verified. Specifically, second file information of a second data file is obtained from the configuration file, and the second file information comprises second position information and second data information. The second location information is storage location information where the second data file is located, and may specifically be a storage location corresponding to the directory to be imported. The second data information is related information of data stored in the second data file, and the second data information may include one or more of file name information, file classification information, and file distribution information.
The presence of a second data file is first determined using the second location information. If the second data file does not exist, it may be that the file storing the source data is lost in transmission or that the second data file has a storage problem, and further acquisition or problem checking needs to be performed on the second data file.
If the second data file exists, the source data exists. However, the source data stored in the second data file is not necessarily complete. Whether the source data in the second data file is complete is therefore checked using the acquired second file information.
Correspondingly, acquiring the data to be imported includes:
if the data in the second data file is complete, taking the data in the second data file as source data;
and obtaining the data to be imported from the second data file.
If it is determined that the data in the second data file is complete, the data in the second data file is determined to be source data. Note that not all of the source data is data to be imported; part of the source data may be selected as the data to be imported. The source data is acquired from the second data file, the data to be imported is acquired from the source data, and the data to be imported is subsequently checked against the abnormal condition.
In the embodiment of the application, the integrity of the source data can be ensured by checking the existence and the data integrity of the second data file, so that the data to be imported can be conveniently selected from the source data, the problem that the data file is lost and the data is missed in the process of generating the source data or transmitting the source data is prevented, the abnormal data to be imported is reduced, and the processing efficiency of the abnormal data to be imported is improved.
Based on the method for processing the abnormal imported data provided by the above method embodiment, the embodiment of the present application further provides a device for processing the abnormal imported data, and the device for processing the abnormal imported data will be described below with reference to the accompanying drawings.
Referring to fig. 2, a schematic structural diagram of a processing device for abnormal imported data according to an embodiment of the present application is shown. As shown in fig. 2, the processing device for abnormal imported data includes:
a first obtaining unit 201, configured to obtain data to be imported, and determine whether the data to be imported meets an abnormal condition; the abnormal condition comprises that the number of the data fields of the data to be imported does not accord with the number of the target fields and/or the field format of the data to be imported does not accord with the preset field format;
a first determining unit 202, configured to determine the data to be imported as abnormal data to be checked if the data to be imported meets an abnormal condition, and write the abnormal data to be checked into an abnormal data table;
an importing unit 203, configured to select target abnormal data from the abnormal data table, and import the target abnormal data into a target database;
a second determining unit 204, configured to determine an anomaly problem of the target anomaly data as a data anomaly problem if the importing is successful;
and a third determining unit 205, configured to determine, if the import is unsuccessful, the abnormality problem of the target abnormal data as a warehousing abnormality problem.
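The abnormal condition evaluated by the first obtaining unit can be sketched as follows. The target field count, the per-field regular expressions, the `|` separator, and the function names are hypothetical; only the two checks themselves (field count against the target count, field format against a preset format) come from the description.

```python
import re

TARGET_FIELD_COUNT = 3  # hypothetical target field count
# Hypothetical preset formats: numeric id, non-empty name, decimal amount.
FIELD_FORMATS = [
    re.compile(r"\d+"),
    re.compile(r".+"),
    re.compile(r"-?\d+(\.\d+)?"),
]


def abnormal_reason(fields):
    """Return None for normal data, otherwise a description of the abnormality."""
    if len(fields) != TARGET_FIELD_COUNT:
        return f"field count {len(fields)} != {TARGET_FIELD_COUNT}"
    for index, (value, pattern) in enumerate(zip(fields, FIELD_FORMATS)):
        if not pattern.fullmatch(value):
            return f"field {index} has unexpected format: {value!r}"
    return None


def split_batch(lines, sep="|"):
    """Separate normal records from abnormal records to be checked."""
    normal, abnormal = [], []
    for line in lines:
        fields = line.split(sep)
        reason = abnormal_reason(fields)
        if reason is None:
            normal.append(fields)
        else:
            abnormal.append((fields, reason))
    return normal, abnormal


normal, abnormal = split_batch(["1001|Alice|12.50", "1002|Bob", "x3|Carol|7"])
```

Records in `normal` would proceed to the ordinary import, while each entry in `abnormal` carries the anomaly information that is stored with it in the abnormal data table.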
In a possible implementation manner, the first determining unit 202 is specifically configured to obtain a target table name of a data table into which the to-be-examined abnormal data is to be imported and a predetermined import time of the to-be-examined abnormal data into a target database;
acquiring column information, data information and anomaly information of the anomaly data to be checked, and forming a target value field by the column information, the data information and the anomaly information;
and writing the target table name, the target import time and the target value field into the abnormal data table.
In a possible implementation manner, the importing unit 203 is specifically configured to select a data table in the target database as the data table to be written;
acquiring the table name of the data table to be written as the table name to be written; acquiring date information of the data table to be written as date to be written;
inquiring whether the abnormal data table has corresponding abnormal data to be checked or not according to the table name to be written and the date to be written;
and if so, taking the corresponding abnormal data to be checked as the target abnormal data.
In a possible implementation manner, the importing unit 203 is specifically configured to obtain key value data of the target abnormal data, and splice the key value data to obtain target key value data;
and writing the target key value data into a target database.
In one possible implementation, the apparatus further includes:
the second acquisition unit is used for acquiring original data, and carrying out data aggregation and data conversion on the original data to obtain source data;
and the first selecting unit is used for selecting data to be imported from the source data.
In one possible implementation, the apparatus further includes:
a third obtaining unit, configured to obtain first file information of a first data file from a configuration file, where the first file information includes first location information and first data information; the first data file is a file for storing original data;
a first querying unit, configured to query whether the first data file exists using the first location information;
a second query unit, configured to query, if the first data file exists, whether the original data in the first data file is complete using the first data information;
The second obtaining unit is specifically configured to take the data in the first data file as original data if the data in the first data file is complete;
and acquiring original data from the first data file.
In one possible implementation, the apparatus further includes:
a fourth acquisition unit configured to acquire second file information of a second data file from the configuration file, the second file information including second location information and second data information; the second data file is a file for storing source data;
a third query unit configured to query whether the second data file exists using the second location information;
a fourth query unit, configured to query, if the second data file exists, whether the data in the second data file is complete using the second data information;
the first obtaining unit 201 is specifically configured to take, if the data in the second data file is complete, the data in the second data file as source data;
and acquiring data to be imported from the second data file.
Based on the method for processing abnormal imported data provided by the above method embodiment, the present application provides a processing apparatus for abnormal imported data, comprising: a processor, a memory, and a system bus;
The processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the methods of the embodiments described above.
Based on the method for processing the abnormal imported data provided by the above method embodiment, the present application provides a computer readable storage medium, where an instruction is stored in the computer readable storage medium, and when the instruction is executed on a terminal device, the terminal device is caused to execute the method described in the above embodiment.
It should be noted that the embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the system or device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the objects before and after it are in an "or" relationship. "At least one of" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method for processing exception-imported data, the method comprising:
acquiring data to be imported, and judging whether the data to be imported meets an abnormal condition or not; the abnormal condition comprises that the number of the data fields of the data to be imported does not accord with the number of the target fields and/or the field format of the data to be imported does not accord with the preset field format;
if the data to be imported meets the abnormal condition, determining the data to be imported as abnormal data to be checked, and writing the abnormal data to be checked into an abnormal data table;
selecting target abnormal data from the abnormal data table, and importing the target abnormal data into a target database;
If the importing is successful, determining the abnormal problem of the target abnormal data as a data abnormal problem;
if the importing is unsuccessful, determining the abnormal problem of the target abnormal data as a warehousing abnormal problem;
the writing the abnormal data to be checked into an abnormal data table comprises the following steps:
acquiring a target table name of a data table to be imported by the abnormal data to be checked and a preset importing time of the abnormal data to be checked into a target database;
acquiring column information, data information and anomaly information of the anomaly data to be checked, and forming a target value field by the column information, the data information and the anomaly information;
and writing the target table name, the preset import time and the target value field into the abnormal data table.
2. The method of claim 1, wherein the selecting the target exception data from the exception data table comprises:
selecting a data table in a target database as a data table to be written;
acquiring the table name of the data table to be written as the table name to be written; acquiring date information of the data table to be written as date to be written;
inquiring whether the abnormal data table has corresponding abnormal data to be checked or not according to the table name to be written and the date to be written;
and if so, taking the corresponding abnormal data to be checked as the target abnormal data.
3. The method of claim 1, wherein importing the target anomaly data into a target database comprises:
obtaining key value data of the target abnormal data, and splicing the key value data to obtain target key value data;
and writing the target key value data into a target database.
4. The method according to claim 1, wherein before the acquiring the data to be imported, determining whether the data to be imported is abnormal data to be troubleshooted, the method further comprises:
acquiring original data, and performing data aggregation and data conversion on the original data to obtain source data;
and selecting data to be imported from the source data.
5. The method of claim 4, wherein prior to said obtaining the raw data, performing data aggregation and data conversion on the raw data to obtain the source data, the method further comprises:
acquiring first file information of a first data file from a configuration file, wherein the first file information comprises first position information and first data information; the first data file is a file for storing original data;
Querying whether the first data file exists or not by utilizing the first position information;
if so, inquiring whether the original data in the first data file is complete or not by utilizing the first data information;
the acquiring the original data comprises the following steps:
if the data in the first data file is complete, taking the data in the first data file as original data;
and acquiring original data from the first data file.
6. The method according to claim 1, wherein before the acquiring the data to be imported, determining whether the data to be imported is abnormal data to be troubleshooted, the method further comprises:
acquiring second file information of a second data file from the configuration file, wherein the second file information comprises second position information and second data information; the second data file is a file for storing source data;
querying whether the second data file exists or not by using the second position information;
if so, inquiring whether the data in the second data file is complete or not by utilizing the second data information;
the obtaining the data to be imported comprises the following steps:
if the data in the second data file is complete, taking the data in the second data file as source data;
And acquiring data to be imported from the second data file.
7. An apparatus for processing exception-introduced data, the apparatus comprising:
the first acquisition unit is used for acquiring data to be imported and judging whether the data to be imported meets an abnormal condition or not; the abnormal condition comprises that the number of the data fields of the data to be imported does not accord with the number of the target fields and/or the field format of the data to be imported does not accord with the preset field format;
the first determining unit is used for determining the data to be imported as abnormal data to be checked if the data to be imported meets the abnormal condition, and writing the abnormal data to be checked into an abnormal data table;
the importing unit is used for selecting target abnormal data from the abnormal data table and importing the target abnormal data into a target database;
a second determining unit, configured to determine, if the importing is successful, an anomaly problem of the target anomaly data as a data anomaly problem;
a third determining unit, configured to determine, if the importing is unsuccessful, an anomaly problem of the target anomaly data as a warehouse-in anomaly problem;
the first determining unit is specifically configured to obtain a target table name of a data table to be imported by the abnormal data to be checked and a predetermined import time of the data table to be imported by the abnormal data to be checked; acquiring column information, data information and anomaly information of the anomaly data to be checked, and forming a target value field by the column information, the data information and the anomaly information; and writing the target table name, the preset import time and the target value field into the abnormal data table.
8. A processing apparatus for exception-imported data, comprising: a processor, memory, system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to perform the method of any of claims 1-6.
CN202011637145.4A 2020-12-31 2020-12-31 Processing method, device and equipment for abnormal imported data Active CN112632132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011637145.4A CN112632132B (en) 2020-12-31 2020-12-31 Processing method, device and equipment for abnormal imported data

Publications (2)

Publication Number Publication Date
CN112632132A CN112632132A (en) 2021-04-09
CN112632132B true CN112632132B (en) 2024-04-12

Family

ID=75290115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011637145.4A Active CN112632132B (en) 2020-12-31 2020-12-31 Processing method, device and equipment for abnormal imported data

Country Status (1)

Country Link
CN (1) CN112632132B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220695A (en) * 2021-06-03 2021-08-06 中国农业银行股份有限公司 Data storage method, device, equipment, medium and product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760373A (en) * 2014-12-15 2016-07-13 金蝶软件(中国)有限公司 Abnormal data processing method and abnormal data processing device
CN107506451A (en) * 2017-08-28 2017-12-22 泰康保险集团股份有限公司 Abnormal information monitoring method and device for data interaction
CN109656985A (en) * 2018-09-27 2019-04-19 深圳壹账通智能科技有限公司 Data import method, system, terminal and storage medium
CN110162563A (en) * 2019-05-28 2019-08-23 深圳市网心科技有限公司 Data storage method, system, electronic device and storage medium
CN110781231A (en) * 2019-09-19 2020-02-11 平安科技(深圳)有限公司 Batch import method, device, equipment and storage medium based on database
WO2020134213A1 (en) * 2018-12-25 2020-07-02 苏宁云计算有限公司 Method and system for querying abnormal financial data on basis of knowledge map

Also Published As

Publication number Publication date
CN112632132A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN110569298A (en) data docking and visualization method and system
CN110647447B (en) Abnormal instance detection method, device, equipment and medium for distributed system
CN113946294A (en) Distributed storage system and data processing method thereof
CN112632132B (en) Processing method, device and equipment for abnormal imported data
CN111460098A (en) Text matching method and device and terminal equipment
CN112699098A (en) Index data migration method, device and equipment
CN112965912B (en) Interface test case generation method and device and electronic equipment
CN112214473B (en) Data migration method and system between databases
CN111949663B (en) Big data main foreign key consistency evaluation method, device and equipment
CN112667631A (en) Method, device and equipment for automatically editing service field and storage medium
CN112579608A (en) Case data query method, system, device and computer readable storage medium
CN112214394A (en) Memory leak detection method, device and equipment
CN116166629A (en) File format conversion method, device, equipment and readable storage medium
CN116204428A (en) Test case generation method and device
CN116089527A (en) Data verification method, storage medium and device
CN113625967B (en) Data storage method, data query method and server
CN113094415B (en) Data extraction method, data extraction device, computer readable medium and electronic equipment
CN111061554B (en) Intelligent task scheduling method and device, computer equipment and storage medium
CN109829016B (en) Data synchronization method and device
CN114648410A (en) Stock staring method, apparatus, system, device and medium
CN113787977A (en) Vehicle maintenance method, communication device, and storage medium
CN112965992B (en) Multi-parameter constraint data retrieval man-machine interaction method and device
CN113760853B (en) Directory processing method, server and storage medium
CN112181539B (en) File processing method, device, equipment and medium
CN113051329B (en) Data acquisition method, device, equipment and storage medium based on interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant