CN110309133B - Batch data processing method and device - Google Patents
- Publication number
- CN110309133B (application number CN201910440400.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- data information
- information
- batch
- files
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method and a device for processing batch data in the technical field of safety detection. The method comprises: receiving a plurality of batch data files and obtaining attribute information of the batch data files; storing the data information in the batch data files into the respective data tables according to the attribute information of the batch data files and the matching rule stored in the batch data files; and obtaining characteristic bytes from the data information and using them to overwrite the corresponding data information in the data tables one by one. The method helps ensure data processing efficiency and, ultimately, data processing quality.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for processing batch data.
Background
With the development of Internet big data technology, the volume of data that servers must process is growing explosively, and server systems that provide data processing, storage, and query functions have gradually evolved into processing systems with multi-level functional modules. How data is synchronized between the functional modules of a server system has therefore become particularly important.
To address this, the existing approach synchronizes data between the functional modules of the server system when data processing is idle, for example at 12 a.m. However, all data to be synchronized is generally processed together in a single run, which harms both synchronization efficiency and data quality between the functional modules of the server system.
Disclosure of Invention
To solve the above technical problems, in particular that synchronizing all data to be synchronized together, as in the prior art, harms synchronization efficiency and data quality between the functional modules of a server system, the application provides the following:
in a first aspect, the present application provides a method for processing batch data, including the steps of:
receiving a plurality of batch data files, and obtaining attribute information of the batch data files;
storing the data information in the batch data files into the respective data tables according to the attribute information of the batch data files and the matching rule stored in the batch data files;
and obtaining characteristic bytes from the data information, and using the characteristic bytes to overwrite the corresponding data information in the data table one by one.
In one embodiment, the step of obtaining characteristic bytes from the data information and using them to overwrite the corresponding data information in the data table one by one includes:
extracting the corresponding characteristic bytes according to a data information index;
comparing the characteristic bytes of each piece of data information with the original data information in the data table;
and overwriting the corresponding data information in the data table with the data information according to the comparison result.
In one embodiment, the attribute information divides the data information in the batch data files into newly added data information and updated data information.
In one embodiment, the method for processing batch data further includes:
when it is detected that data information stored in the data table is in an abnormal state, looking up the corresponding batch data file according to the data information, and stopping data processing of that batch data file;
and correcting the data according to the error log of the data information.
In one embodiment, the step of, when it is detected that data information stored in the data table is in an abnormal state, looking up the corresponding batch data file according to the data information and stopping data processing of that batch data file includes:
determining, according to the data information index, that the data information is in an abnormal state when parsing the data information stored in the data table fails to yield the content that corresponds to the data information index;
and looking up the batch data file corresponding to the data information according to the data information, and stopping data processing of that batch data file.
In one embodiment, the step of correcting the data according to the error log of the data information includes:
obtaining the abnormal bytes of the corresponding data information according to the error log;
and mapping the abnormal bytes to the corresponding bytes defined by the data information index, and correcting the abnormal bytes.
In one embodiment, the step of storing the data information in the batch data files into the respective data tables according to the attribute information of the batch data files and the matching rule stored in the batch data files further includes:
acquiring the actual formation time period of the data information;
if the actual formation time period falls within the set generation time period of the corresponding batch data file, storing the corresponding data information into a temporary data table;
and after the other batch data files have been stored, storing the data information in the temporary data table into the corresponding data table according to the matching rule stored in the batch data files.
In a second aspect, the present application also provides a device for processing batch data, including:
a receiving module, configured to receive a plurality of batch data files and obtain attribute information of the batch data files;
a storage module, configured to store the data information in the batch data files into the respective data tables according to the matching rule stored in the batch data files and the attribute information of the batch data files;
and an overlay module, configured to obtain characteristic bytes from the data information and use them to overwrite the corresponding data information in the data table one by one.
In a third aspect, the present application also provides a server, comprising:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors to perform the method for processing batch data according to the embodiments of the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for processing batch data according to the embodiments of the first aspect.
In the batch data processing method and device provided by the application, data information is distributed to different data tables according to the attribute information of the received batch data files, and characteristic bytes are obtained from the data information in each data table and used to overwrite the corresponding data information in that table.
On this basis, the application also provides a further batch data processing method and device: when data information stored in a data table is detected to be in an abnormal state, data processing of the corresponding batch data file is stopped, and the data information is repaired according to its error log.
In the technical solution provided by the application, the requirements of data synchronization between the functional modules of the server system are analyzed, attribute information is defined, and the data information is classified into separate batch data files. The batch data files are independent of one another, so the server can process them concurrently or individually. If one batch data file enters an abnormal state, or needs other handling such as data correction or a processing stop, normal processing of the other batch data files is not affected. This prevents an anomaly in a few pieces of data from halting all other processing during synchronization, which safeguards processing efficiency and, ultimately, processing quality.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a diagram of an application environment for an embodiment of the present application;
FIG. 2 is a flow chart of a method of processing batch data in accordance with one embodiment of the application;
FIG. 3 is a flow chart of a method of processing batch data in accordance with another embodiment of the present application;
FIG. 4 is a schematic diagram of a batch data processing apparatus according to one embodiment of the present application;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, unless expressly stated otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms as well, as will be understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Those skilled in the art will appreciate that a remote network device, as used herein, includes, but is not limited to, a computer, a network host, a single network server, a collection of network servers, or a cloud of multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers acts as a single super virtual computer. In the embodiments of the present application, communication between the remote network device, the terminal device, and the WNS server may be implemented by any communication method, including, but not limited to, mobile communication based on 3GPP, LTE, or WiMAX; computer network communication based on the TCP/IP and UDP protocols; and short-range wireless transmission based on the Bluetooth and infrared transmission standards.
Referring to FIG. 1, FIG. 1 shows the application environment of an embodiment of the present application. In this embodiment, the technical solution of the present application may be implemented on a server. As shown in FIG. 1, the host server 110 and the query system server 120 exchange data over the Internet. The host server 110 performs data processing services in response to user service requests and sends the corresponding user data to a database established by the query system server 120 for storage. When the host server 110 receives a user's query or information-update command, it issues the related operation command to the query system server 120, retrieves the related data, and forms query information that is sent to the user interface.
To solve the above problems, the application provides a batch data processing method. Referring to FIG. 2, FIG. 2 is a flow chart of a method of processing batch data according to one embodiment; the method comprises the following steps:
S210, receiving a plurality of batch data files and obtaining the attribute information of the batch data files.
Because the server handles a large volume of data, data is synchronized between the modules of the server system within a set time period (for example, when the system is idle, specifically at 12 a.m.), and the data involved is typically synchronized in the form of batch data.
In this embodiment, the synchronized data is data information from the relevant industry, for example customer information in a service industry (finance, Internet, etc.) or product information in manufacturing. Each batch data file may contain tens of thousands or even tens of millions of pieces of data information.
To make it easy to distinguish between batch data files during data synchronization, the data information is classified by attribute when the batch data files are generated: similar data information is gathered into one batch data file, and corresponding attribute information is set for that file.
Taking data synchronization in the financial services industry as an example, the data information is user information, and in each synchronization run each batch file contains user information sharing the same attribute information. In this embodiment, within the same period, the synchronization process divides the batch data files according to whether the data information is newly added or updated, so the attribute information is either "newly added data information" or "updated data information".
Under this attribute division, within the same time period, the batch data files synchronously transmitted from the host server to the query system server include at least a first batch data file containing only newly added user information and a second batch data file containing only updated user information.
Because batch data files contain large volumes of data, they can be compressed before being transmitted to the query system server. The query system server then receives and decompresses each batch data file and obtains its attribute information, preparing it for classification in the subsequent processing.
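The packaging and receiving described in S210 can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: the gzip-compressed JSON container, the `"new"`/`"updated"` attribute labels, and the function names are hypothetical choices made for illustration.

```python
import gzip
import json


def build_batch_file(attribute: str, records: list) -> bytes:
    """Host side: package records that share one attribute, then compress them.

    The JSON container layout is a hypothetical format, not taken from the patent.
    """
    payload = {"attribute": attribute, "records": records}
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def receive_batch_file(blob: bytes) -> tuple:
    """Query-system side: decompress a batch file and read its attribute information."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    return payload["attribute"], payload["records"]


def split_by_attribute(new_records: list, updated_records: list) -> list:
    """Build the first (new-only) and second (updated-only) batch data files."""
    return [
        build_batch_file("new", new_records),
        build_batch_file("updated", updated_records),
    ]
```

Keeping one attribute per file is what lets the receiver classify every record in a file from a single header value instead of inspecting each record.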
S220, storing the data information in the batch data files into the respective data tables according to the attribute information of the batch data files and the matching rule stored in the batch data files.
Since the batch data files have already been decompressed in step S210, the corresponding data information can be read directly from them.
The data information in each batch data file is stored into the corresponding data table according to the attribute information of that file and the matching rule stored in it.
Continuing the embodiment above, the newly added user information of the first batch data file is stored in table 0, and the updated user information of the second batch data file is stored in table 1. Tables 0 and 1 also carry the attribute information of their respective batch data files.
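The routing in S220 can be sketched as below, assuming a hypothetical matching rule that maps attribute information to a table name. In the patent the rule is stored with the batch data file; here it is passed in as a plain dictionary for simplicity, and tables are modeled as in-memory dictionaries that also record the attribute of their source file.

```python
# Hypothetical matching rule: attribute information -> target data table.
MATCHING_RULE = {"new": "table0", "updated": "table1"}


def store_batch(tables: dict, attribute: str, records: list,
                rule: dict = MATCHING_RULE) -> str:
    """Store one batch file's records in the table chosen by the matching rule."""
    table_name = rule[attribute]
    # Each table remembers the attribute information of the batch file it came from.
    table = tables.setdefault(table_name, {"attribute": attribute, "rows": []})
    table["rows"].extend(records)
    return table_name
```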
S230, obtaining characteristic bytes from the data information, and using them to overwrite the corresponding data information in the data table one by one.
In this embodiment, data information is stored as bytes. Bytes at the same fixed positions in every piece of data information store content of the same attribute; in this embodiment, such content may include, for example, the user account number, the user's name, and the account opening date. The characteristic bytes are the basic point of distinction between pieces of data information; in this embodiment, they are the bytes that store the user account information.
After the characteristic bytes are obtained from the data information of each data table, the data information in the data table is traversed by characteristic bytes, and the corresponding data information in the data table is overwritten with the latest data information.
The application thus provides a batch data processing method that obtains the attribute information of the batch data files, stores the data information in the corresponding data tables, obtains characteristic bytes from the data information of the batch data files, and overwrites the original data information in the data tables according to the characteristic bytes. Because data is stored in separate data tables according to attribute information and processed per table, the data tables remain independent of one another during synchronization and are not disturbed by errors in, or corrections to, other data tables, which helps improve synchronization efficiency and data quality.
Step S230 may further include the following steps:
A1, extracting the corresponding characteristic bytes according to the data information index.
For batch data files of the same type handled by the same server device, the storage layout of the data information is identical and follows agreed rules; in this embodiment, the layout is specified by a data information index. For example, the index specifies that bytes 1 to 10 of the data information store the account number of the corresponding user, bytes 11 to 18 store the user's name, bytes 19 to 25 store the user's account opening date, and so on. In this embodiment, the query system server extracts the feature information of the data information according to the data information index to facilitate subsequent differentiation.
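Step A1 amounts to slicing fixed-width records by the data information index. In the minimal sketch below, the index is a hypothetical dictionary of 1-based, inclusive byte ranges. The text assigns bytes 19 to 25 to the account opening date, which is one byte short of an 8-character yyyymmdd date, so the sketch assumes bytes 19 to 26 for that field.

```python
# Hypothetical data information index: field -> (first byte, last byte),
# 1-based and inclusive. The date range 19-26 is an assumption (see lead-in).
DATA_INDEX = {
    "account": (1, 10),     # characteristic bytes
    "name": (11, 18),
    "open_date": (19, 26),
}
CHARACTERISTIC_FIELD = "account"


def field_bytes(record: bytes, field: str, index: dict = DATA_INDEX) -> bytes:
    """Return the raw bytes of one field, converting the 1-based range to a slice."""
    first, last = index[field]
    return record[first - 1:last]


def characteristic_bytes(record: bytes) -> bytes:
    """Extract the characteristic bytes (the account number) of a record."""
    return field_bytes(record, CHARACTERISTIC_FIELD)
```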
A2, comparing the characteristic bytes of each piece of data information with the original data information in the data table.
The characteristic bytes are the most basic information for distinguishing between pieces of data information; in this embodiment, they do not change, no matter how the data information changes during processing.
In this embodiment, the attribute information of the data information in the batch data files is either newly added data information or updated data information.
The characteristic bytes of each piece of data information are compared against the original data information in the data table by traversal. If the data information is newly added, no data information with the same characteristic field will be found in the data table; if it is updated data information, data information with the same characteristic field will be found.
A3, overwriting the corresponding data information in the data table with the data information according to the comparison result.
For the comparison result of step A2: data information for which no entry with the same characteristic field exists in the data table is added to the data table in the corresponding byte layout; where data information with the same characteristic field exists in the data table, the entry with the earlier generation time is overwritten by the data information with the later generation time.
The generation time can be determined from the timestamp of the data information.
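Steps A2 and A3 behave like an upsert keyed by the characteristic bytes. The sketch below is an illustration under stated assumptions: the table is a dictionary from characteristic bytes (assumed to be the 10-byte account number) to a (record, generation time) pair, and a dictionary lookup stands in for the traversal described in the text, which yields the same comparison result.

```python
from datetime import datetime


def characteristic_bytes(record: bytes) -> bytes:
    # Bytes 1-10 hold the account number in the example data information index.
    return record[0:10]


def overwrite(table: dict, record: bytes, generated_at: datetime) -> str:
    """Compare by characteristic bytes (A2), then add or overwrite (A3)."""
    key = characteristic_bytes(record)
    existing = table.get(key)
    if existing is None:
        # No entry with the same characteristic field: newly added data information.
        table[key] = (record, generated_at)
        return "added"
    if generated_at > existing[1]:
        # Same characteristic field: the later version overwrites the earlier one.
        table[key] = (record, generated_at)
        return "overwritten"
    # The stored version is already as recent or newer, so it is kept.
    return "kept"
```

A record whose key is missing is added; a matching record is replaced only when the incoming generation time is later, mirroring the "later overwrites earlier" rule.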
Referring to FIG. 3, FIG. 3 is a flowchart of a method of processing batch data according to another embodiment of the present application. In addition to the above steps, the batch data processing method provided by the application may further include the following steps:
S240, when data information stored in the data table is detected to be in an abnormal state, looking up the corresponding batch data file according to the data information and stopping data processing of that batch data file.
In this step, when an abnormality is detected in the data information of a data table, the corresponding batch data file can be found from the data information, because each batch data file packages data information sharing the same attribute information for synchronization. To ensure the accuracy of the synchronization, synchronization of the data information in that batch data file is then stopped, including synchronization of both newly added and updated data information.
S250, correcting the data according to the error log of the data information.
The data information in an abnormal state is obtained from the error log, the erroneous attribute content within it is identified from the error code in the error log, and that attribute content is corrected.
Once error correction for the corresponding batch data file is complete, synchronization of the data information in that batch data file can resume.
In this embodiment, because the batch data files are independent of one another, an abnormality in processing one batch data file does not affect the progress of processing the other batch data files. This prevents abnormalities in part of the data from delaying the processing of other data and further improves batch processing efficiency.
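The isolation described above can be illustrated with Python's `concurrent.futures`: each batch data file is processed in its own task, and an exception raised for one batch is caught and recorded without cancelling the others. The validation rule in `process_batch` (reject any empty record) is a placeholder assumption, not a check from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(batch_id: str, records: list) -> int:
    """Placeholder processing: fail the whole batch if any record is empty."""
    for rec in records:
        if not rec:
            raise ValueError(f"empty record in {batch_id}")
    return len(records)


def process_all(batches: dict) -> dict:
    """Process each batch independently; one failure does not stop the others."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {bid: pool.submit(process_batch, bid, recs)
                   for bid, recs in batches.items()}
        for bid, fut in futures.items():
            try:
                results[bid] = ("ok", fut.result())
            except ValueError as exc:
                # Only this batch is halted; other futures keep their results.
                results[bid] = ("halted", str(exc))
    return results
```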
Step S240 may further include:
B11, according to the data information index, determining that the data information is in an abnormal state when parsing the data information stored in the data table fails to yield the content that corresponds to the data information index.
According to the data information index, bytes at different positions of the data information represent content of different attributes. When the attribute content parsed from the corresponding bytes of the data information does not match the attribute content expected by the data information index, the data information is determined to be abnormal.
For example, account information consists of multiple bytes, of which bytes 19 to 25 hold the account opening date. If the corresponding bytes in a file contain 888888888 instead of the preset date format yyyymmdd, the query system server cannot parse the field into an account opening date; the corresponding data information is then marked as abnormal in the state table, and an error log describing the error is generated.
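The abnormal-state check in B11 can be illustrated with the account-opening-date field. In this hypothetical sketch, the date is read from bytes 19 to 26 (widened from the text's 19 to 25 so that a yyyymmdd value fits), and the state table, the error log entry layout, and the error code `E_OPEN_DATE` are invented for illustration.

```python
from datetime import date, datetime

# Assumed date field position: bytes 19-26 (1-based), i.e. Python slice 18:26.
DATE_SLICE = slice(18, 26)


def parse_open_date(record: bytes) -> date:
    """Parse the account opening date; raise ValueError if it is not yyyymmdd."""
    text = record[DATE_SLICE].decode("ascii")  # UnicodeDecodeError is a ValueError
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not a yyyymmdd date: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def check_record(record: bytes, record_id, state_table: dict, error_log: list) -> bool:
    """Mark the record normal or abnormal and log a hypothetical error entry."""
    try:
        parse_open_date(record)
    except ValueError as exc:
        state_table[record_id] = "abnormal"
        error_log.append({"record_id": record_id,
                          "error_code": "E_OPEN_DATE",  # hypothetical code
                          "detail": str(exc)})
        return False
    state_table[record_id] = "normal"
    return True
```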
B12, looking up the batch data file corresponding to the data information according to the data information, and stopping data processing of that batch data file.
Since each piece of data information comes from its corresponding batch data file, that batch data file can be located through the code or position mark of the data information, or through the attribute information of the data information, so that the server stops processing that batch data file.
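Step B12, together with resuming processing once correction is complete, can be modeled as a small registry that remembers which batch data file each record came from. This is a hypothetical sketch: the record ids stand in for the "code or position mark" of the data information, and `BatchRegistry` is not a name from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class BatchRegistry:
    # record id -> id of the batch data file it came from (its position mark)
    source_of: dict = field(default_factory=dict)
    # batch files whose processing is currently stopped
    halted: set = field(default_factory=set)

    def register(self, batch_id: str, record_ids: list) -> None:
        for rid in record_ids:
            self.source_of[rid] = batch_id

    def halt_for_record(self, record_id) -> str:
        """Find the batch file of an abnormal record and stop processing it."""
        batch_id = self.source_of[record_id]
        self.halted.add(batch_id)
        return batch_id

    def may_process(self, batch_id: str) -> bool:
        return batch_id not in self.halted

    def resume(self, batch_id: str) -> None:
        """Resume a batch once its error correction is complete."""
        self.halted.discard(batch_id)
```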
With the scheme of this embodiment, data information in an abnormal state can be identified quickly from the data information index, and data processing of the corresponding batch data file is stopped at the same time, reducing the number of errors during synchronization and improving the data quality of the synchronized data.
Step S250 may further include:
B21, obtaining the abnormal bytes of the corresponding data information according to the error log;
B22, mapping the abnormal bytes to the corresponding bytes defined by the data information index, and correcting the abnormal bytes.
In steps B21 and B22, the abnormal attribute content of the corresponding data information is derived from the error code recorded in the error log, and the corresponding abnormal bytes are then obtained from that attribute content; alternatively, the erroneous bytes are located directly from the error code.
The byte positions of the abnormal bytes are matched against the byte positions defined in the data information index to determine which attribute content those positions should hold. Based on that attribute content, the corresponding content is fetched again from the latest version of the data information, and the abnormal bytes of the data information are corrected accordingly.
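Steps B21 and B22 can be sketched as a byte-range patch. Assumptions for illustration: the error log carries a hypothetical error code that maps to a field name, the field's byte range comes from a hypothetical index (with the date widened to bytes 19 to 26), and the latest version of the record is available to copy the correct bytes from.

```python
# Hypothetical mappings; neither the codes nor the layout come from the patent.
ERROR_CODE_FIELD = {"E_OPEN_DATE": "open_date", "E_NAME": "name"}
DATA_INDEX = {"account": (1, 10), "name": (11, 18), "open_date": (19, 26)}


def correct_record(record: bytes, error_code: str, latest: bytes) -> bytes:
    """Replace only the abnormal bytes with the same bytes from the latest version."""
    field_name = ERROR_CODE_FIELD[error_code]   # B21: which attribute is abnormal
    first, last = DATA_INDEX[field_name]        # its byte positions in the index
    start, end = first - 1, last                # 1-based inclusive -> Python slice
    # B22: splice the correct bytes in; all other bytes are left untouched.
    return record[:start] + latest[start:end] + record[end:]
```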
Comparing against the data information index and correcting according to the error log allows erroneous information to be located and corrected quickly, improving the data quality of the synchronized data and, in turn, synchronization efficiency.
Step S220 may further include:
C1, acquiring the actual formation time period of the batch data file;
C2, if the actual formation time period falls within the set generation time period of the corresponding batch data file, storing the corresponding data information into a temporary data table;
C3, after the other batch data files have been stored, storing the data information in the temporary data table into the corresponding data table according to the matching rule stored in the batch data files.
In steps C1 to C3, when the query system server receives a batch data file, it obtains the time at which the batch data file was formed on the host server from the file's timestamp. If data information happens to be formed during the set generation time period of a batch data file, that is, while the batch data file is being formed, it cannot be inserted into the batch data file with the same attribute information at that point. To keep the data complete and synchronize it promptly, such data information is first synchronized to the query system server in a data packet or another batch data file and stored in a temporary data table. After the batch data files generated within the set generation time period have been decompressed and their data information stored into the corresponding data tables one by one, the data information in the temporary data table is stored into the corresponding data tables according to its attribute information and the matching rule stored in the batch data files.
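Steps C1 to C3 can be sketched as routing on a time window. In this minimal sketch, the half-open window `[start, end)` standing for the set generation time period, the in-memory tables, and the matching rule dictionary are assumptions; the text does not specify how the window boundaries are treated.

```python
from datetime import datetime

# Hypothetical matching rule: attribute information -> target data table.
MATCHING_RULE = {"new": "table0", "updated": "table1"}


def route(record: bytes, attribute: str, formed_at: datetime,
          window: tuple, tables: dict, temp: list) -> None:
    """C1/C2: park records formed during the generation window in a temp table."""
    start, end = window
    if start <= formed_at < end:
        temp.append((record, attribute))
    else:
        tables.setdefault(MATCHING_RULE[attribute], []).append(record)


def flush_temp(tables: dict, temp: list) -> None:
    """C3: after the other batch files are stored, move temp rows by attribute."""
    for record, attribute in temp:
        tables.setdefault(MATCHING_RULE[attribute], []).append(record)
    temp.clear()
```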
The following description is made specifically by using the above examples:
the data information obtained in the process of generating the batch data files corresponding to the table 0 and the table 1 is synchronized to the query system server in the form of data packets, and is temporarily stored in the form of table 2. However, after the storage of the new user information and the updated user information is completed in the tables 0 and 1, the attribute information corresponding to the corresponding data information in the table 2 is merged into the table 1 according to the new data and the updated data.
To shorten the response time of subsequent external query services, the data tables are combined into a base table that serves external queries. In this embodiment, the data information of Table 0 and Table 1 is merged into the base table.
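The merges in this example can be sketched as below, assuming each table is a dictionary keyed by a user ID and that, on conflict, fields from the later source overwrite fields already present. The function names and this field-level merge policy are illustrative assumptions; the source does not say how conflicting fields are resolved.

```python
def merge_rows(target: dict, source: dict) -> None:
    """Merge rows from source into target; source fields win on conflict."""
    for key, row in source.items():
        target[key] = {**target.get(key, {}), **row}


def merge_temp_into_table1(table1: dict, table2: dict) -> None:
    """Fold the temporarily stored rows (Table 2) into Table 1, then empty Table 2."""
    merge_rows(table1, table2)
    table2.clear()


def build_base_table(base: dict, table0: dict, table1: dict) -> dict:
    """Merge newly added rows (Table 0) and then updated rows (Table 1)
    into the base table that serves external queries."""
    merge_rows(base, table0)
    merge_rows(base, table1)  # updates are applied after additions
    return base
```

Applying Table 1 after Table 0 is a design choice of this sketch: it lets an update to a user who was added in the same cycle take effect in the base table.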
In the batch data processing method described above, each batch data file carries the latest newly added and updated data according to the set generation time period and the synchronization time period. So that newly generated data information can be processed promptly, data information that cannot be included in the batch data file of the current set generation time period is first synchronized to the query system server as data packets and stored there. When the set generation time period of the batch data file is reached, the latest updates of that data information are synchronized again as a batch data file and stored in the query system server.
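The two synchronization paths in this paragraph can be sketched as follows: each change is shipped immediately as a data packet, and when the set generation time period ends, only the latest version of each record is re-sent as a batch data file. The lists standing in for the packet channel and for batch data files, the integer timestamps, and the class name `Synchronizer` are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Synchronizer:
    packets_sent: list = field(default_factory=list)  # stand-in for the packet channel
    batches_sent: list = field(default_factory=list)  # stand-in for batch data files
    pending: dict = field(default_factory=dict)       # key -> (timestamp, row)

    def on_change(self, key: str, ts: int, row: dict) -> None:
        # Synchronize the change right away as a data packet...
        self.packets_sent.append((key, row))
        # ...and keep only the newest version for the next batch data file.
        if key not in self.pending or ts >= self.pending[key][0]:
            self.pending[key] = (ts, row)

    def close_generation_period(self) -> dict:
        # When the set generation time period is reached, re-send the latest
        # state of every changed key as one batch data file.
        batch = {key: row for key, (_, row) in self.pending.items()}
        self.batches_sent.append(batch)
        self.pending.clear()
        return batch
```

Keeping only the newest version per key means a record changed several times within one period appears once in the batch data file.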
Based on the same inventive concept as the batch data processing method, an embodiment of the present application further provides a batch data processing device which, as shown in Fig. 4, includes:
a receiving module 410, configured to receive a plurality of batch data files and obtain attribute information of the batch data files;
a storage module 420, configured to store the data information in the batch data files according to the matching rule stored in the batch data files and the attribute information of the batch data files; and
an overlay module 430, configured to obtain characteristic bytes from the data information and overwrite, one by one, the corresponding data information in the data table that matches the characteristic bytes.
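The cooperation of the three modules can be sketched in Python under assumptions that the source does not state: each record is a byte string whose first eight bytes are its index and serve as its characteristic bytes, the matching rule simply maps the attribute information to a table name, and data tables are dictionaries. `BatchFile`, `BatchDevice`, `KEY_SLICE`, and `MATCHING_RULE` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical layout: bytes 0-7 of each record hold the data information
# index, which serves here as the record's characteristic bytes.
KEY_SLICE = slice(0, 8)

# Hypothetical matching rule: attribute information -> target data table.
MATCHING_RULE = {"new": "table0", "update": "table1"}


@dataclass
class BatchFile:
    attribute: str  # attribute information: "new" or "update"
    records: list   # raw data information as bytes


class BatchDevice:
    def __init__(self) -> None:
        # Data tables: table name -> {characteristic bytes: record}.
        self.tables: dict = {}

    def receive(self, files: list) -> list:
        """Receiving module 410: pair each batch file with its attribute info."""
        return [(f.attribute, f) for f in files]

    def store(self, attribute: str, f: BatchFile) -> None:
        """Storage module 420: route records to the table chosen by the rule."""
        table = self.tables.setdefault(MATCHING_RULE[attribute], {})
        for record in f.records:
            self.overlay(table, record)

    def overlay(self, table: dict, record: bytes) -> bool:
        """Overlay module 430: compare by characteristic bytes and overwrite
        the stored record when it differs. Returns True if the table changed."""
        key = record[KEY_SLICE]
        if table.get(key) == record:
            return False
        table[key] = record
        return True

    def process(self, files: list) -> None:
        for attribute, f in self.receive(files):
            self.store(attribute, f)
```

Comparing before writing lets the overlay step skip records that are already identical in the data table.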
Referring to Fig. 5, Fig. 5 is a schematic diagram of the internal structure of a server in one embodiment. As shown in Fig. 5, the server includes a processor 510, a storage medium 520, a memory 530, and a network interface 540 connected by a system bus. The storage medium 520 of the server stores an operating system, a database, and computer-readable instructions. The database may store a control information sequence, and the computer-readable instructions, when executed by the processor 510, cause the processor 510 to implement a batch data processing method; in this way the processor 510 can implement the functions of the receiving module 410, the storage module 420, and the overlay module 430 of the batch data processing device in the embodiment shown in Fig. 4. The processor 510 of the server provides computing and control capabilities and supports the operation of the entire server. The memory 530 of the server may store computer-readable instructions that, when executed by the processor 510, cause the processor 510 to perform the batch data processing method. The network interface 540 of the server is used to connect to and communicate with a terminal. Those skilled in the art will appreciate that the structure shown in Fig. 5 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the server to which the solution is applied; a particular server may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the present application further provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving a plurality of batch data files and obtaining attribute information of the batch data files; storing the data information in the batch data files into each data table according to the matching rule stored in the batch data files and the attribute information of the batch data files; and obtaining characteristic bytes from the data information and overwriting, one by one, the corresponding data information in the data table according to the characteristic bytes.
As can be seen from the above embodiments, the present application has the following advantages:
With the batch data processing method and device of the present application, data information is distributed to different data tables according to the attribute information of the received batch data files, and characteristic bytes obtained from the data information of each data table are used to overwrite the corresponding data information in that data table.
Building on this, the present application further provides another batch data processing method and device: when data information stored in a data table is detected to be in an abnormal state, data processing of the corresponding batch data file is stopped, and the data information is repaired according to its error log.
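A minimal sketch of this abnormal-state handling, combining the detection step of claim 1 with the byte-level correction of claim 2. The record and error-log formats are not defined in the source, so this sketch assumes records are `|`-separated UTF-8 text whose fields are named by a hypothetical index `INDEX_FIELDS`, treats a record as abnormal when those fields cannot be parsed out of it, and assumes each error-log entry is an `(offset, correct_byte)` pair.

```python
INDEX_FIELDS = ("user_id", "name", "city")  # hypothetical data information index


def parse(record: bytes):
    """Return the indexed fields of a record, or None if they cannot be obtained."""
    try:
        parts = record.decode("utf-8").split("|")
    except UnicodeDecodeError:
        return None
    if len(parts) != len(INDEX_FIELDS):
        return None
    return dict(zip(INDEX_FIELDS, parts))


def scan_table(table: dict, source_file_of: dict, stopped_files: set) -> list:
    """Flag abnormal rows and stop processing of the batch files they came from."""
    abnormal = []
    for key, record in table.items():
        if parse(record) is None:
            abnormal.append(key)
            stopped_files.add(source_file_of[key])  # halt only this batch file
    return abnormal


def correct(record: bytes, error_log: list) -> bytes:
    """Overwrite the abnormal bytes listed in the error log.

    Each log entry is assumed to be (offset, correct_byte_value)."""
    fixed = bytearray(record)
    for offset, good in error_log:
        fixed[offset] = good
    return bytes(fixed)
```

Only the batch data file that produced the abnormal row is added to `stopped_files`; rows from other batch data files are left untouched.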
Building on this, the present application further provides another batch data processing method and device: data information that cannot be included in a batch data file within the set generation time period, and therefore cannot yet overwrite the corresponding data information in the data table, is stored in a temporary data table; after the other batch data files have been stored, this data information is stored into the corresponding data table according to the matching rule stored in the batch data file. This ensures data integrity and allows timely synchronization.
In the technical solution provided by the present application, the data that the functional modules of the server system need to synchronize is classified according to defined attribute information, forming separate batch data files. These batch data files are independent of one another, and the server can process them concurrently or individually. If one batch data file is in an abnormal state, or requires other handling such as data correction or suspension of processing, normal processing of the other batch data files is unaffected. This prevents an anomaly in a few records from halting the rest of the data synchronization, which safeguards processing efficiency and, ultimately, data quality.
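The independence of batch data files can be illustrated with a sketch that runs a handler for each batch concurrently and records failures per batch instead of aborting the whole run. The handler, the batch names, and the worker count are assumptions for illustration; `concurrent.futures.ThreadPoolExecutor` is from the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batches(batches: dict, handler, max_workers: int = 4):
    """Process each batch data file independently.

    Returns (results, failures): a failing batch is recorded in `failures`
    and does not stop the processing of the other batches."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(handler, data) for name, data in batches.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # isolate the failure to this batch
                failures[name] = exc
    return results, failures
```

`future.result()` re-raises any exception raised inside the handler, so catching it per batch is what keeps one abnormal batch from affecting the others.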
In summary, the batch data processing method and device classify batch data by defined attribute information, form a batch data file for each class, and overwrite the data information in the data tables according to the characteristic bytes of the data information in the batch data files. This solves the prior-art problem that synchronizing all data together in a single pass impairs both synchronization efficiency and data quality between the functional modules of a server system.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the steps of the method embodiments above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments describe only a few implementations of the present application in some detail and should not be construed as limiting its scope. Those skilled in the art may make several variations and improvements without departing from the concept of the application, all of which fall within its scope. Accordingly, the scope of protection of the present application is determined by the appended claims.
Claims (5)
1. A method for processing batch data, comprising the steps of:
receiving a plurality of batch data files, and obtaining attribute information of the batch data files;
storing the data information in the batch data files into each data table according to the matching rules stored in the batch data files and the attribute information of the batch data files, which comprises: acquiring the actual formation time period of the data information; if the actual formation time period falls within the set generation time period of the corresponding batch data file, storing the corresponding data information into a temporary data table; and after the other batch data files have been stored, storing the data information in the temporary data table into the corresponding data table according to the matching rule stored in the batch data files;
acquiring characteristic bytes according to the data information and overwriting, one by one, the corresponding data information of the data table according to the characteristic bytes, which comprises: extracting the corresponding characteristic bytes according to the data information index; comparing the characteristic bytes of each piece of data information with the original data information in the data table; and overwriting the corresponding data information of the data table with the data information according to the comparison result; wherein the attribute information, which divides the data information in the batch data file, is newly added data information and updated data information;
determining, according to the data information index, that data information is in an abnormal state upon detecting that the content corresponding to the data information index cannot be obtained by parsing the data information stored in the data table; locating, according to the data information, the batch data file corresponding to that data information, and stopping data processing of that batch data file;
and carrying out data correction according to the error log of the data information.
2. The method of claim 1, wherein
the step of performing data correction according to the error log of the data information comprises:
obtaining abnormal bytes of the corresponding data information according to the error log;
and correcting the abnormal bytes according to the bytes corresponding to the data information index.
3. A batch data processing apparatus, comprising:
the receiving module, configured to receive a plurality of batch data files and acquire attribute information of the batch data files;
the storage module, configured to store the data information in the batch data files into each data table according to the matching rules stored in the batch data files and the attribute information of the batch data files, by: acquiring the actual formation time period of the data information; if the actual formation time period falls within the set generation time period of the corresponding batch data file, storing the corresponding data information into a temporary data table; and after the other batch data files have been stored, storing the data information in the temporary data table into the corresponding data table according to the matching rule stored in the batch data files;
the overlay module, configured to acquire characteristic bytes according to the data information and overwrite, one by one, the corresponding data information of the data table according to the characteristic bytes, by: extracting the corresponding characteristic bytes according to the data information index; comparing the characteristic bytes of each piece of data information with the original data information in the data table; and overwriting the corresponding data information of the data table with the data information according to the comparison result; wherein the attribute information, which divides the data information in the batch data file, is newly added data information and updated data information; determining, according to the data information index, that data information is in an abnormal state upon detecting that the content corresponding to the data information index cannot be obtained by parsing the data information stored in the data table; locating, according to the data information, the batch data file corresponding to that data information, and stopping data processing of that batch data file; and performing data correction according to the error log of the data information.
4. A server, comprising:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the method of processing batch data according to any one of claims 1 to 2.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of processing batch data according to any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910440400.7A CN110309133B (en) | 2019-05-24 | 2019-05-24 | Batch data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110309133A | 2019-10-08 |
CN110309133B | 2023-08-22 |
Family ID: 68074928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910440400.7A (Active, CN110309133B) | Batch data processing method and device | 2019-05-24 | 2019-05-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309133B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113485952A (en) * | 2021-07-23 | 2021-10-08 | 中国工商银行股份有限公司 | Data batch transmission method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106325933A (en) * | 2016-08-24 | 2017-01-11 | 明算科技(北京)股份有限公司 | Method and device for synchronizing batch data |
CN106776131A (en) * | 2016-11-30 | 2017-05-31 | 杭州华为数字技术有限公司 | Data backup method and server |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030172368A1 (en) * | 2001-12-26 | 2003-09-11 | Elizabeth Alumbaugh | System and method for autonomously generating heterogeneous data source interoperability bridges based on semantic modeling derived from self adapting ontology |
JP2010063087A (en) * | 2008-08-08 | 2010-03-18 | Ricoh Co Ltd | Image forming apparatus, log storing method, and log storing program |
US10853349B2 (en) * | 2017-08-09 | 2020-12-01 | Vmware, Inc. | Event based analytics database synchronization |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |