Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the present application provides a data processing method and apparatus.
In a first aspect, the present application provides a data processing method, including:
acquiring a data source needing to read data;
determining a data type of the data source;
determining an adaptive reading mode according to the data type;
reading the data source according to the adaptive reading mode to obtain read data;
converting the read data into data to be stored for saving;
and writing the data to be stored into a database.
Optionally, in the foregoing method, determining an adaptive reading mode according to the data type includes:
when the data type is a file type, obtaining, according to a preset strategy, a reading mode corresponding to the separator used in the data source;
and when the data type is a database type, determining a reading mode that reads the database data in pages.
Optionally, in the foregoing method, when the data type is a file type, reading the data source according to the adaptive reading mode to obtain read data includes:
acquiring the data volume of the data source;
when the data volume of the data source is within a preset interval, reading the entire data source to obtain the read data;
when the data volume of the data source is not within the preset interval, reading the data source in blocks, multiple times and in a preset order, until the entire data source has been read, so that the number of pieces of read data equals the number of block reads; the data volume of the read data obtained by each block read is within the preset interval, and all the read data taken together are consistent with the data source.
Optionally, as in the foregoing method, the converting the read data into data to be stored for saving includes:
matching to obtain a preset data conversion method corresponding to the data type of the data source;
and converting the read data according to the data conversion method to obtain the data to be stored.
Optionally, as in the foregoing method, the converting the read data according to the data conversion method to obtain the data to be stored includes:
matching to obtain a preset filtering strategy corresponding to the data type of the data source;
filtering the data source according to the filtering strategy, and obtaining filtered data after eliminating invalid data;
and converting the filtered data according to the data conversion method to obtain the data to be stored.
Optionally, as in the foregoing method, the writing the data to be stored into the database includes:
and writing the data to be stored into the database in batches according to a preset single-time writing data volume.
Optionally, as in the foregoing method, before acquiring a data source that needs to be read, the method further includes:
acquiring original data for storage;
partitioning the original data to obtain a plurality of data sources;
and generating a corresponding number of job tasks according to the number of the data sources, wherein the job tasks write the data sources into the database in parallel by the method in any one of the preceding items.
Optionally, as in the foregoing method, after writing the data to be stored into the database, the method further includes:
generating an execution result corresponding to the job task;
and acquiring the execution result of each job task to obtain the processing result of the original data.
In a second aspect, the present application provides a data processing apparatus comprising:
the acquisition module is used for acquiring a data source needing data reading;
the type determining module is used for determining the data type of the data source;
the determining module is used for determining an adaptive reading mode according to the data type;
the reading module is used for reading the data source according to the adaptive reading mode to obtain read data;
the processing module is used for converting the read data into data to be stored for storage;
and the writing module is used for writing the data to be stored into a database.
In a third aspect, the present application provides an electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the processing method according to any one of the preceding claims when executing the computer program.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the processing method according to any one of the preceding claims.
The data processing method and apparatus provided by the embodiments of the present application include: acquiring a data source needing to read data; determining a data type of the data source; determining an adaptive reading mode according to the data type; reading the data source according to the adaptive reading mode to obtain read data; converting the read data into data to be stored for saving; and writing the data to be stored into a database. Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantages: each type of data is matched with its corresponding processing flow, reading mode and conversion mode, so data sources of different data types no longer need to be handled by separate methods, which simplifies reading and writing; as a result, data collection for different data sources is implemented in a unified way, the variety of implementation technologies is reduced, and the maintenance cost is lowered.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application, and as shown in fig. 1, the embodiment includes the following steps S1 to S6:
S1, acquiring a data source needing to read data;
specifically, the data source is the data to be written into the database, and the data type of the data source is its data format; the data source may be obtained actively, or received from another device;
S2, determining the data type of the data source;
specifically, the data type may be a database type or a file type, and the file type may include Word, txt, and the like; in addition, the data source may also be read from a message queue; since the message queue merely serves as a container that holds messages while they are in transit, it does not change the data type of the data source.
S3, determining an adaptive reading mode according to the data type.
Specifically, a corresponding reading mode is preset for each data type, so the reading mode adapted to the data type can be determined by matching; the reading mode may be implemented by calling other software, or by code written in the system that implements the method of the present application, which is not specifically limited herein.
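The matching in step S3 can be pictured as a lookup in a preset table. The following is only a minimal sketch under stated assumptions: the type labels "file" and "database", the two reader functions and the name `determine_reading_mode` are hypothetical and do not come from the application.

```python
from typing import Callable, Dict

# Hypothetical reading modes; the names are illustrative only.
def read_delimited_file(source: str) -> str:
    return f"delimited-file reader for {source}"

def read_database_paged(source: str) -> str:
    return f"paged database reader for {source}"

# Preset mapping from data type to reading mode (step S3).
READING_MODES: Dict[str, Callable[[str], str]] = {
    "file": read_delimited_file,
    "database": read_database_paged,
}

def determine_reading_mode(data_type: str) -> Callable[[str], str]:
    """Return the preset reading mode that matches the data type."""
    try:
        return READING_MODES[data_type]
    except KeyError:
        raise ValueError(f"no reading mode preset for data type {data_type!r}")

reader = determine_reading_mode("file")
print(reader("orders.txt"))
```

Supporting a further data type would then only require registering one more entry in the table, which is one way to read the claim that different data sources need not be handled by separate methods.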
S4, reading the data source according to the adaptive reading mode to obtain read data.
Specifically, once the adaptive reading mode has been determined, the data source can be read, and it is parsed by the method corresponding to its data format to obtain the read data.
S5, converting the read data into data to be stored for saving.
Specifically, data can only be stored after it has been converted into the format required by the storage method, so the read data needs to be converted according to the storage method.
S6, writing the data to be stored into a database.
Specifically, after the read data has been converted into the data to be stored, the data is saved by writing it into the database.
Optionally, the creation and recording of the execution process corresponding to each step in this embodiment may be implemented by the job management module of the Spring Batch framework; in addition, a job flow management node may be provided to record the process node of each job and to temporarily store the data obtained in steps S4 and S5, and, when an exception occurs, the job is retried, or the exception is ignored and execution continues, according to the configuration set in advance in the job management module. Spring Batch is built on POJOs and the Spring framework and is easy to get started with, so developers can readily access and use enterprise-level services; Spring Batch is also a highly scalable framework, and both simple batch jobs and complex big-data batch jobs can be implemented with it.
Therefore, data collection aiming at different data sources can be uniformly realized, the diversity of the realization technology is reduced, and the maintenance cost is reduced.
In some embodiments, in the foregoing method, determining the adaptive reading mode according to the data type in step S3 includes the following steps A31 and A32:
A31, when the data type is a file type, obtaining, according to a preset strategy, a reading mode corresponding to the separator used in the data source.
A32, when the data type is a database type, determining a reading mode that reads the database data in pages.
Specifically, when a read operation is performed on a data source, the corresponding reading mode is matched according to the data source, and the data is read in that mode.
For data of the file type, the method of this embodiment supports parsing the data according to a specific separator; when a table is converted into text, separators mark the positions at which the text is divided, and when text is converted into a table, they mark the starting position of a new row or column. The separator may be, for example, a comma, "-", "/", or the like. The method is typically applied to a reconciliation (account checking) system, so when the data type is a file type, the text in the file is also laid out according to preset rules; for example, the name, gender and age are written in that order and separated by the separator, so that splitting a record in the file on the separator yields the name Zhang San, the gender male and the age 23.
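The separator-based parsing of a file record can be sketched as follows. This is an illustrative assumption rather than the application's implementation: the function name `parse_delimited_line`, the choice of "-" as the separator and the field order name, gender, age are all made up for the example.

```python
from typing import Dict, List

def parse_delimited_line(line: str, separator: str, fields: List[str]) -> Dict[str, str]:
    """Split one line on the separator and pair each value with its field name."""
    values = line.rstrip("\n").split(separator)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))

# The "-" separator and the field order are assumptions for this example.
record = parse_delimited_line("Zhang San-male-23", "-", ["name", "gender", "age"])
print(record)
```

A line with the wrong number of values raises an error here; a real system might instead route such lines to the filtering step as invalid data.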
Data of the database type (DB) supports paged reading: after the data of each page has been read and parsed, the next page is read and parsed.
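Paged reading of database data might look like the sketch below, which uses Python's built-in `sqlite3` module purely as a stand-in database. The table name `records`, its columns and the function `read_pages` are hypothetical; paging on the primary key is one possible technique and is not stated in the application.

```python
import sqlite3
from typing import Iterator, List, Tuple

def read_pages(conn: sqlite3.Connection, page_size: int) -> Iterator[List[Tuple]]:
    """Yield the rows of the hypothetical `records` table one page at a time."""
    last_id = 0
    while True:
        # Fetch the next page of rows whose id follows the last one seen.
        page = conn.execute(
            "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page  # the caller parses this page before the next one is read
        last_id = page[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO records (payload) VALUES (?)",
                 [(f"row{i}",) for i in range(5)])
pages = list(read_pages(conn, page_size=2))
print([len(p) for p in pages])  # page sizes: [2, 2, 1]
```

Because the function is a generator, only one page is held in memory at a time when the caller iterates over it instead of collecting all pages into a list.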
As shown in fig. 2, in some embodiments, in the foregoing method, when the data type is a file type, reading the data source according to the adaptive reading mode to obtain the read data in step S4 includes the following steps S41 to S43:
S41, acquiring the data volume of the data source;
specifically, when the data source is acquired, its data volume can be obtained from the attributes of the data source; other ways of obtaining it are also possible and are not listed one by one;
S42, when the data volume of the data source is within a preset interval, reading the entire data source to obtain the read data;
S43, when the data volume of the data source is not within the preset interval, reading the data source in blocks, multiple times and in a preset order, until the entire data source has been read, so that the number of pieces of read data equals the number of block reads; the data volume of the read data obtained by each block read is within the preset interval, and all the read data taken together are consistent with the data source.
Specifically, the preset interval is a range of data volume that limits the amount of data read at one time, which prevents problems that harm system stability, such as memory overflow caused by reading too much data at once.
When the data volume of the data source is within the preset interval, reading the data source in full does not put pressure on memory, so the read data can be obtained in a single read;
when the data volume of the data source is not within the preset interval, one approach is: each time, read an amount of data equal to the size of the preset interval and parse it; if the unread data still exceeds the preset interval, continue reading it in amounts of that size until the data source has been read completely. Another approach is: divide the data source into a plurality of data blocks according to the size of the preset interval, so that the data volume of each data block is less than or equal to that size, and then read and parse each block separately.
With the method of this embodiment, when a file is too large, a fixed amount of data is read and parsed before the next amount is read, which prevents memory overflow and keeps the system running stably.
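Steps S41 to S43 can be sketched as a generator that either reads the source in one go or in fixed-size blocks. The function name `read_in_blocks`, the parameter `max_block` (standing for the upper bound of the preset interval) and the use of an in-memory byte stream are assumptions made for illustration.

```python
import io
from typing import BinaryIO, Iterator

def read_in_blocks(stream: BinaryIO, total_size: int, max_block: int) -> Iterator[bytes]:
    """Read the whole source at once if it fits, otherwise block by block."""
    if total_size <= max_block:
        # S42: the data volume is within the preset interval, read everything.
        yield stream.read()
        return
    # S43: read sequentially in blocks no larger than the preset interval.
    while True:
        block = stream.read(max_block)
        if not block:
            return
        yield block

data = b"abcdefghij"
blocks = list(read_in_blocks(io.BytesIO(data), len(data), max_block=4))
print(blocks)  # [b'abcd', b'efgh', b'ij']
```

Joining the blocks reproduces the original data, which corresponds to the requirement that all the read data taken together are consistent with the data source. A fixed byte count can split a record in the middle, so a practical reader would probably align block boundaries to line or record boundaries.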
As shown in fig. 3, in some embodiments, in the foregoing method, converting the read data into data to be stored for saving in step S5 includes the following steps S51 and S52:
Step S51: matching to obtain a preset data conversion method corresponding to the data type of the data source;
specifically, data conversion methods corresponding to different data types may be preset; each data conversion method converts data of its data type into the data format required by the target storage mode; the data conversion method may be implemented by calling other software, or by code written in the system that implements the method of the present application, which is not specifically limited herein.
Step S52: converting the read data according to the data conversion method to obtain the data to be stored.
Specifically, once the preset data conversion method has been matched in the previous step, it can be used to convert the read data into data in the format required by the target storage mode.
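Steps S51 and S52 can be illustrated with a table of converters keyed by data type. Everything named here is hypothetical: the converter functions, the target row layout with name, gender and age, and the assumed shapes of the read data (a list of split fields for files, a tuple with a leading id for database rows).

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

def convert_file_record(fields: List[str]) -> Row:
    """Convert the split fields of one text line (name, gender, age)."""
    name, gender, age = fields
    return {"name": name, "gender": gender, "age": int(age)}

def convert_db_row(row: Tuple[Any, ...]) -> Row:
    """Convert one database row of the assumed form (id, name, gender, age)."""
    _id, name, gender, age = row
    return {"name": name, "gender": gender, "age": age}

# Preset conversion method per data type.
DATA_CONVERTERS: Dict[str, Callable[[Any], Row]] = {
    "file": convert_file_record,
    "database": convert_db_row,
}

def convert_read_data(data_type: str, read_data: Iterable[Any]) -> List[Row]:
    converter = DATA_CONVERTERS[data_type]          # step S51: match the method
    return [converter(item) for item in read_data]  # step S52: convert the data

to_store = convert_read_data("file", [["Zhang San", "male", "23"]])
print(to_store)
```

Both converters produce the same row layout, so the writing step does not need to know which kind of data source the rows came from.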
As shown in fig. 4, in some embodiments, in the foregoing method, converting the read data according to the data conversion method to obtain the data to be stored in step S52 includes the following steps S521 to S523:
Step S521: matching to obtain a preset filtering strategy corresponding to the data type of the data source;
Step S522: filtering the data source according to the filtering strategy, and obtaining filtered data after eliminating invalid data;
Step S523: converting the filtered data according to the data conversion method to obtain the data to be stored.
Specifically, since not all of the data in the data source needs to be stored, the data is filtered to obtain the data required for storage. For example, data of data type T1 is filtered with filtering strategy P1, and data of data type T2 is filtered with filtering strategy P2; the invalid data removed by different filtering strategies generally differs. For example: (1) when the data source R1 is of data type T1, its data fields include a, b, c and d; filtering strategy P1 removes the data corresponding to data field c, so the data to be stored obtained by filtering data source R1 has the data fields a, b and d. (2) When the data source R2 is of data type T2, its data fields include a, b, e, f and g; filtering strategy P2 removes the data corresponding to data fields a and e, so the data to be stored obtained by filtering data source R2 has the data fields b, f and g.
Therefore, the method of this embodiment filters the read data, which avoids wasting processing capacity on converting unnecessary data and wasting storage space on storing it.
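The T1/P1 and T2/P2 example can be reproduced in a few lines. This sketch assumes that a filtering strategy is simply the set of field names to drop; the names `FILTER_POLICIES` and `apply_filter` and the sample values are hypothetical.

```python
from typing import Any, Dict, Set

# Filtering strategies P1 and P2 expressed as the fields each one removes.
FILTER_POLICIES: Dict[str, Set[str]] = {
    "T1": {"c"},       # P1 drops data field c
    "T2": {"a", "e"},  # P2 drops data fields a and e
}

def apply_filter(data_type: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Remove the fields that the strategy for this data type marks as invalid."""
    dropped = FILTER_POLICIES[data_type]
    return {key: value for key, value in record.items() if key not in dropped}

r1 = apply_filter("T1", {"a": 1, "b": 2, "c": 3, "d": 4})
r2 = apply_filter("T2", {"a": 1, "b": 2, "e": 5, "f": 6, "g": 7})
print(list(r1), list(r2))  # ['a', 'b', 'd'] ['b', 'f', 'g']
```

A strategy could equally remove whole records rather than fields, for example rows that fail validation; the field-based form is used here only because it matches the example in the text.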
In some embodiments, as in the foregoing method, the step S6 of writing the data to be stored into the database specifically includes:
and writing the data to be stored into the database in batches according to the preset single-time write data volume.
Specifically, the single-time write data volume may be the number of data records written into the database in each batch, for example 2000 records per batch; it can be adjusted to suit the processing capacity of the system and is not specifically limited herein. Once the single-time write data volume has been set, one writing approach is: each time, take a number of records equal to the single-time write data volume and write them; if the data not yet written into the database still exceeds the single-time write data volume, continue taking and writing the unwritten data in batches of that size until all the data to be stored has been written into the database. Another approach is: divide the data to be stored into a plurality of data blocks according to the single-time write data volume, so that the data volume of each data block is less than or equal to the single-time write data volume (typically only the last block, which holds the remainder, is smaller), and then write each block separately.
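Batch writing with a preset batch size can be sketched with Python's built-in `sqlite3` module as a stand-in database. The table `people`, its columns and the function `write_in_batches` are assumptions for illustration; the batch size of 2000 is the example figure from the text.

```python
import sqlite3
from typing import List, Tuple

def write_in_batches(conn: sqlite3.Connection,
                     rows: List[Tuple[str, int]],
                     batch_size: int) -> int:
    """Write rows to the hypothetical `people` table batch by batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # the last batch may be smaller
        conn.executemany("INSERT INTO people (name, age) VALUES (?, ?)", batch)
        conn.commit()  # one commit per batch
        batches += 1
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
rows = [(f"person{i}", i % 90) for i in range(5000)]
batch_count = write_in_batches(conn, rows, batch_size=2000)
stored = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(batch_count, stored)  # 3 5000
```

With 5000 records and a batch size of 2000, the data is written in three batches of 2000, 2000 and 1000 records.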
As shown in fig. 6, one application example of the method described in the foregoing embodiments is as follows:
I. Create a job task.
II. Read the data: match the read operation that corresponds to the specific data source, and read the data in pages or blocks according to the configuration, to prevent memory overflow caused by an excessive data volume.
III. Filter the read data, eliminate invalid data, and convert the data into the data objects to be stored.
IV. Repeat steps II and III until all the data has been read and processed.
V. Write the processed and converted data into the database in batches of two thousand records.
VI. Update the job status to successful.
In some embodiments, in the foregoing method, before the data source needing to read data is acquired in step S1, the method further includes the following steps B1 to B3:
B1, acquiring original data for storage;
specifically, the original data is data that needs to be stored and has not undergone any processing;
B2, partitioning the original data to obtain a plurality of data sources;
specifically, when the original data is too large, reading, filtering and writing it through a single job task takes too long; therefore, the original data is partitioned into a plurality of data sources, and the specific operations such as data reading, filtering and writing are performed on each data source separately, which effectively improves processing efficiency. The data may be partitioned, for example, as follows: database data is partitioned according to the maximum and minimum values of the primary key id, files are partitioned according to their file names, and so on.
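Partitioning database data by the minimum and maximum primary key id can be sketched as splitting the id range into contiguous sub-ranges. The function name `partition_id_range` and the equal-width split are assumptions; the application does not say how the range is divided.

```python
from typing import List, Tuple

def partition_id_range(min_id: int, max_id: int, partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive id range [min_id, max_id] into contiguous sub-ranges."""
    if partitions <= 0 or max_id < min_id:
        raise ValueError("invalid partition request")
    total = max_id - min_id + 1
    size = -(-total // partitions)  # ceiling division
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

ranges = partition_id_range(1, 10, 3)
print(ranges)  # [(1, 4), (5, 8), (9, 10)]
```

Each sub-range would then act as one data source, for example by reading only the rows whose id lies within it. If ids are sparse, equal-width ranges can hold very different numbers of rows, so a real system might balance partitions by row count instead.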
B3, generating a corresponding number of job tasks according to the number of the data sources, wherein the job tasks write the data sources into the database in parallel by the method in any one of the preceding items.
Specifically, after partitioning is completed, the master node executing the method of this embodiment may generate a number of job tasks equal to the number of partitions, and each job task executes the method in any one of the preceding items; alternatively, the partition data and the job context may be sent, in the form of a message through the MQ, to a slave node that executes the job task and processes them.
Optionally, data partitioning and multi-node parallel processing are implemented through the Spring Batch framework, which enhances the availability and mass-data processing capability of a data processing system such as an account checking system.
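The idea of one job task per data source running in parallel can be sketched with a thread pool. This is a single-process stand-in, not the Spring Batch or MQ based master/slave arrangement described in the text; `run_job_task` is a hypothetical placeholder whose body only counts records instead of reading, converting and writing them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def run_job_task(source: List[int]) -> Dict[str, object]:
    """Placeholder job task: a real task would read, convert and write the source."""
    written = len(source)
    return {"written": written, "status": "write success"}

# Three data sources obtained by partitioning (illustrative values).
sources = [[1, 2, 3], [4, 5], [6]]

# One job task per data source, executed in parallel.
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    results = list(pool.map(run_job_task, sources))

print([r["written"] for r in results])  # [3, 2, 1]
```

`pool.map` returns the results in the order of the inputs, so each result can be matched back to its data source.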
In some embodiments, in the foregoing method, after the data to be stored is written into the database in step S6, the method further includes the following steps C1 and C2:
C1, generating an execution result corresponding to the job task;
specifically, the execution result may be the outcome of executing the job task, such as "write success" or "write failure".
C2, acquiring the execution result of each job task to obtain the processing result of the original data.
Specifically, after the slave nodes in the foregoing embodiment finish processing, each returns its execution result to the master node, and the master node can update the job information of each job task and compile statistics on the execution.
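The statistics compiled by the master node in step C2 might be as simple as counting the results of the job tasks. In this sketch the function `summarize_results` is hypothetical, and the rule that the original data counts as successfully processed only when no job task failed is an assumption, not something the application states.

```python
from collections import Counter
from typing import Dict, List

def summarize_results(results: List[str]) -> Dict[str, object]:
    """Combine the execution results of all job tasks into one processing result."""
    counts = Counter(results)
    failed = counts["write failure"]
    return {
        "total": len(results),
        "succeeded": counts["write success"],
        "failed": failed,
        # Assumed rule: overall success requires every job task to succeed.
        "overall": "write success" if failed == 0 else "write failure",
    }

summary = summarize_results(["write success", "write failure", "write success"])
print(summary)
```

The failed count tells the master node how many partitions would need to be retried.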
As shown in fig. 5, according to another aspect of the present application, there is also provided a data processing apparatus including:
the acquisition module 1 is used for acquiring a data source needing data reading;
a type determining module 2, configured to determine a data type of the data source;
the determining module 3 is used for determining an adaptive reading mode according to the data type;
the reading module 4 is used for reading the data source according to the adaptive reading mode to obtain read data;
the processing module 5 is used for converting the read data into data to be stored for saving;
and the writing module 6 is used for writing the data to be stored into the database.
Specifically, the specific process of implementing the functions of each module in the apparatus according to the embodiment of the present invention may refer to the related description in the method embodiment, and is not described herein again.
According to another embodiment of the present application, there is also provided an electronic device; as shown in fig. 7, the electronic device may include: a processor 1501, a communication interface 1502, a memory 1503 and a communication bus 1504, wherein the processor 1501, the communication interface 1502 and the memory 1503 communicate with each other through the communication bus 1504.
A memory 1503 for storing a computer program;
the processor 1501 is configured to implement the steps of the above-described method embodiments when executing the program stored in the memory 1503.
The bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the bus is represented by only one thick line in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the above-described method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.