CN108984177A

CN108984177A - Data processing method and system

Info

Publication number: CN108984177A
Application number: CN201810643204.5A
Authority: CN
Inventors: 程赓; 刘建波; 汪文超
Original assignee: China Tower Co Ltd
Current assignee: China Tower Co Ltd
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2018-12-11

Abstract

The present invention provides a data processing method and system. The method comprises: receiving a target file; adding the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue; when the file obtained from the first pending queue is the target file, determining the target data table corresponding to the target file according to a preconfigured metadata mapping rule; parsing the data in the target file according to the metadata mapping rule and a preconfigured parsing rule; and writing the parsed data into the target data table. By adding each received target file to a FIFO queue to wait for parsing, the method ensures that the data processing system parses and processes the files it receives in first-in, first-out order, which can improve file parsing speed and processing efficiency.

Description

Data processing method and system

Technical field

The present invention relates to the field of communication technology, and in particular to a data processing method and system.

Background art

With the advance of information technology, many enterprises and departments have established information systems to manage business data, and in practice a single enterprise or department often needs several information systems to manage different business data separately. As enterprises grow, the volume of business data each system must handle keeps increasing. At present, the basic data processing flow is: parse the Excel file containing business data that the user uploads, write the successfully parsed data into the corresponding data table, and store it.

However, the prior art usually uses open-source parsing tools that parse the data directly after reading it, without regard to parsing speed. Parsing is therefore slow and memory usage is high; in particular, under batch file upload and high data concurrency, the processing efficiency of the system is very low.

Summary of the invention

Embodiments of the present invention provide a data processing method and system to solve the problem of low efficiency in existing data processing methods.

To solve the above technical problem, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a data processing method applied to a data processing system, comprising:

receiving a target file;

adding the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue;

when the file obtained from the first pending queue is the target file, determining the target data table corresponding to the target file according to a preconfigured metadata mapping rule;

parsing the data in the target file according to the metadata mapping rule and a preconfigured parsing rule;

writing the parsed data into the target data table.

In a second aspect, an embodiment of the present invention provides a data processing system, comprising:

a receiving module, configured to receive a target file;

a first processing module, configured to add the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue;

a determining module, configured to determine, when the file obtained from the first pending queue is the target file, the target data table corresponding to the target file according to a preconfigured metadata mapping rule;

a parsing module, configured to parse the data in the target file according to the metadata mapping rule and a preconfigured parsing rule;

a second processing module, configured to write the parsed data into the target data table.

In a third aspect, an embodiment of the present invention provides a data processing system comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above data processing method.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above data processing method.

In the embodiments of the present invention, each received target file is added to a FIFO queue to wait for parsing. This ensures that the data processing system parses and processes the files it receives in first-in, first-out order, which can improve file parsing speed and processing efficiency, and avoids the congestion, and the resulting very low processing efficiency, that can occur under batch file upload and high data concurrency.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the data processing method provided by one embodiment of the present invention;

Fig. 2 is a flowchart of the data processing method provided by another embodiment of the present invention;

Fig. 3 is a structural diagram of the data processing system provided by an embodiment of the present invention;

Fig. 4 is a structural diagram of the parsing module of the data processing system provided by an embodiment of the present invention;

Fig. 5 is a structural diagram of the parsing unit in the parsing module of the data processing system provided by an embodiment of the present invention;

Fig. 6 is a structural diagram of the second processing module of the data processing system provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention, applied to a data processing system. As shown in Fig. 1, the method comprises the following steps:

Step 101: receive a target file.

In this embodiment, the target file to be processed must be received before data processing begins. Specifically, the system may receive a target file submitted by a user at a client. The target file may be a file in Excel format. It may be created and edited manually by the user and filled with the relevant data, or it may be produced by inserting the relevant data into a template generated by the system.

In this step, the target file may be received in one pass or in chunks. For example, when the size of the target file does not exceed the maximum size the data processing system can receive at one time, the file can be received whole in a single pass; when the size exceeds that maximum, the client can split the target file into multiple chunks for upload, so that the data processing system receives the target file chunk by chunk.

Step 102: add the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue.

After the target file is received, it can be added to the first pending queue so that it is processed in turn. This ensures that all files uploaded to the data processing system are handled in a definite order, so that congestion, and the resulting loss of processing efficiency, is less likely.

The first pending queue is a FIFO queue: files received earlier are processed earlier, and files received later are processed later. For example, if two files are already waiting in the first pending queue before the target file is added, then after the target file is added it must wait until those two files have been taken from the queue and dispatched for processing before it is dispatched itself.
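The FIFO behaviour described above can be illustrated with a minimal Python sketch. The file names are hypothetical, and the standard-library `queue.Queue` merely stands in for whatever queue the data processing system actually uses; the patent does not name one.

```python
from queue import Queue

# Hypothetical file names; two files are already waiting when the
# target file arrives, as in the example in the text.
pending = Queue()  # queue.Queue hands items back in first-in, first-out order
for name in ["earlier_1.xlsx", "earlier_2.xlsx", "target.xlsx"]:
    pending.put(name)  # enqueue in order of receipt

processed = []
while not pending.empty():
    processed.append(pending.get())  # always dequeues the oldest file
```

In this sketch the target file is dequeued only after the two files that were received before it.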

Step 103: when the file obtained from the first pending queue is the target file, determine the target data table corresponding to the target file according to a preconfigured metadata mapping rule.

In this embodiment of the present invention, the data processing system continually obtains files to be processed from the first pending queue, and each time it obtains the file that was added to the queue earliest.

When the file obtained from the first pending queue is the target file, processing of the target file can begin. Specifically, the target data table corresponding to the target file is first determined according to the preconfigured metadata mapping rule. The metadata mapping rule may include a correspondence between file names and data tables, so the data table corresponding to the name of the target file can be looked up according to that correspondence, thereby determining the target data table.
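One way to picture the correspondence between file names and data tables is a simple lookup. The sketch below is an assumption for illustration only: the patent does not say how names are matched, so the prefix matching, the `FILE_TO_TABLE` mapping, and the table names are all hypothetical.

```python
from pathlib import PurePath

# Hypothetical correspondence rule: file-name prefix -> data table name.
FILE_TO_TABLE = {"site": "site_data", "supplier": "supplier_data"}

def resolve_target_table(file_name, rules=FILE_TO_TABLE):
    """Return the table configured for file_name, or raise LookupError."""
    stem = PurePath(file_name).stem.lower()  # "Site_2018.xlsx" -> "site_2018"
    for prefix, table in rules.items():
        if stem.startswith(prefix):
            return table
    raise LookupError(f"no data table configured for {file_name!r}")
```

Raising an error for an unmapped file is a design choice of this sketch; a real system might instead reject the upload earlier or route the file to manual handling.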

The target data table stores the parsed data from the target file and may reside in the target database of the data processing system. The target data table makes it convenient for the user to query the relevant data later, and for the data processing system to call and further process that data.

Step 104: parse the data in the target file according to the metadata mapping rule and a preconfigured parsing rule.

In this embodiment, the data in the target file can be parsed according to the metadata mapping rule and the preconfigured parsing rule. The metadata mapping rule may also include mapping rules for metadata format and metadata relationships. For example, a metadata format mapping rule may be configured for each column of the target file, covering its data field, data type, data length, value range, and whether it may be empty; a metadata relationship mapping rule may be configured for the relationships between columns of the target file or the formulas they must satisfy. The parsing rule is a preconfigured rule on how to parse the data in the target file. For example, it may specify the row and column of the target file from which reading starts, how many columns must be parsed in total, and the parsing rules that apply to the target file under incremental update and under full update, respectively.

It should be noted that the metadata mapping rule and the parsing rule may each correspond uniquely to the target file; that is, target files of different structures may have different metadata mapping rules and parsing rules, which can be configured according to user requirements.

In this step, parsing the data in the target file may consist of reading the data in the target file according to the parsing rule and verifying, according to the metadata mapping rule, whether the data read meets the preconfigured format and relationship requirements. For example, if the parsing rule specifies that reading starts from the second row of the second column, and the metadata mapping rule specifies that the data type of the second column is integer and the value length is 10, then when the target file is parsed, the data in the second row of the second column is read first, and it is verified whether its data type is integer and whether its value length is 10.
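The example above (start at the second row of the second column; the column must be an integer of length 10) can be sketched as follows. This is only an illustration under stated assumptions: the rows are taken to be already loaded from the spreadsheet as lists of cell values, "value length" is interpreted as the number of digits, and the names `ParseRule`, `ColumnRule`, `check_cell`, and `parse_rows` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ParseRule:
    start_row: int  # 1-based row at which reading starts
    start_col: int  # 1-based column at which reading starts
    num_cols: int   # how many columns to parse

@dataclass
class ColumnRule:
    name: str
    dtype: type          # expected data type, e.g. int
    length: int          # expected length of the value (assumed: character count)
    nullable: bool = False

def check_cell(value, rule):
    """Check one cell against its column rule."""
    if value is None or value == "":
        return rule.nullable
    text = str(value)
    if rule.dtype is int and not text.isdigit():
        return False
    return len(text) == rule.length

def parse_rows(rows, parse_rule, column_rules):
    """Read the configured region and split it into valid and invalid rows."""
    r0, c0 = parse_rule.start_row - 1, parse_rule.start_col - 1
    valid, invalid = [], []
    for row in rows[r0:]:
        cells = row[c0:c0 + parse_rule.num_cols]
        ok = len(cells) == len(column_rules) and all(
            check_cell(v, r) for v, r in zip(cells, column_rules)
        )
        (valid if ok else invalid).append(cells)
    return valid, invalid
```

Keeping the rules as data rather than code mirrors the point made in the text: a file with a different structure needs a different rule object, not a different parser.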

In this way, the preconfigured metadata mapping rule and parsing rule not only allow the data in the target file to be parsed quickly and accurately, but also meet the data processing needs of files with different structures.

Step 105: write the parsed data into the target data table.

After the data in the target file has been parsed, the parsed data can be written into the target data table. Specifically, the data in the target file that passed parsing is extracted and written into the target data table. When writing, the data that passed parsing can be inserted at the corresponding position in the target data table according to the configuration of the target data table.

It should be noted that once the parsed data has been written into the target data table, the data processing task for the target file is complete. After the write, the target data table is stored in the target database of the data processing system, so that the user can later query the data in the target data table through the data processing system, or the data processing system can later call or further process that data.

In addition, after the parsed data has been written into the target data table, the target file can also be saved. Specifically, the target file can be saved to a preset storage space, such as local storage or shared cloud storage, so that the user can view it later. If it is stored in shared cloud storage, it can also be shared across multiple sites, so users need not upload data files with identical content again.

In the embodiments of the present invention, the data processing system may be a computer service system comprising a processor, a hard disk, memory, a system bus, and so on, for example a server.

The data processing method in this embodiment adds each received target file to a FIFO queue to wait for parsing. This ensures that the data processing system parses and processes the files it receives in first-in, first-out order, which can improve file parsing speed and processing efficiency, and avoids the congestion, and the resulting very low processing efficiency, that can occur under batch file upload and high data concurrency.

Referring to Fig. 2, Fig. 2 is a flowchart of another data processing method provided by an embodiment of the present invention, applied to a data processing system. Building on the embodiment shown in Fig. 1, this embodiment refines how the data in the target file is parsed according to the metadata mapping rule and the preconfigured parsing rule, so that file parsing speed and processing efficiency are further improved. As shown in Fig. 2, the method comprises the following steps:

Step 201: receive a target file.

For the specific implementation of this step, see the implementation of step 101 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.

Optionally, step 201 comprises:

receiving the target file uploaded in chunks by the client.

Receiving the target file uploaded in chunks by the client may work as follows: when the target file is too large, to ensure that it can be uploaded successfully, the client splits the target file into chunks of a set size and uploads the chunks to the data processing system, which thus receives the target file chunk by chunk.

For example, if the size of the target file is 100 MB, the client splits it into 10 chunks of 10 MB each and uploads the 10 chunks to the data processing system one after another. The data processing system then receives one chunk at a time until the last chunk has been received.

In this implementation, chunked upload not only removes the limit on the size of files the data processing system can receive, so that it can handle large files, but also copes better with unreliable network connectivity. For example, if the network is disconnected while the chunks of the target file are being received, the whole target file need not be received again when the network recovers; reception can resume with the chunks not yet received.
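A minimal sketch of chunked upload with resumption is given below. The patent does not specify a transfer protocol, so `split_into_chunks` and `ChunkReceiver` are hypothetical names, and tracking received chunks by index is an assumption of this sketch.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Client side: split the file content into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

class ChunkReceiver:
    """Server side: collect chunks by index so an upload can resume."""

    def __init__(self, total_chunks: int):
        self.total = total_chunks
        self.chunks = {}

    def receive(self, index: int, chunk: bytes) -> None:
        self.chunks[index] = chunk

    def missing(self):
        """Indices still to be uploaded, e.g. after a network interruption."""
        return [i for i in range(self.total) if i not in self.chunks]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.chunks[i] for i in range(self.total))
```

After a disconnection, the client in this sketch would ask for `missing()` and send only those chunks, rather than restarting the whole upload.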

Of course, this implementation is equally applicable to the embodiment shown in Fig. 1 and achieves the same beneficial effect.

Step 202: add the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue.

For the specific implementation of this step, see the implementation of step 102 in the method embodiment shown in Fig. 1, which achieves the same beneficial effect; to avoid repetition, it is not described again here.

Step 203: when the file obtained from the first pending queue is the target file, determine the target data table corresponding to the target file according to a preconfigured metadata mapping rule.

For the specific implementation of this step, see the implementation of step 103 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.

Step 204: obtain the parsing task for the target file from the first pending queue through a target node, wherein the target node is any idle node of the processing cluster in the data processing system.

In this embodiment, the data processing system may include a processing cluster; that is, the data processing system may include multiple interconnected processors, each of which can obtain parsing tasks for files to be processed from the first pending queue.

Because the data processing system uses a processing cluster mechanism to execute the parsing tasks of files to be processed, in this step the parsing task for the target file is obtained from the first pending queue through a target node, which may be any idle node of the processing cluster in the data processing system. Thus, when several files are waiting in the first pending queue, they can be assigned in turn to different idle nodes of the processing cluster for parsing, without a long wait.

Compared with the prior art, in which a single processor executes parsing tasks, the processing cluster mechanism used in this embodiment not only further improves parsing efficiency and avoids prolonged memory occupation, but also helps ensure the stability of the data processing system. For example, when one node of the processing cluster fails, the parsing task can still be executed by other nodes, so that the data processing system continues to work normally.
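The idea of idle nodes pulling parsing tasks from a shared FIFO queue can be sketched on a single machine, with threads standing in for cluster nodes. This is a simplification and not the patent's implementation: real nodes would be separate processors, and `run_cluster` and the placeholder "parsing" step are hypothetical.

```python
import queue
import threading

def run_cluster(files, node_count=3):
    """Simulate idle nodes taking parsing tasks from one shared FIFO queue."""
    tasks = queue.Queue()
    for name in files:
        tasks.put(name)

    results = []
    lock = threading.Lock()

    def node(node_id):
        while True:
            try:
                name = tasks.get_nowait()  # an idle node takes the oldest task
            except queue.Empty:
                return                     # no work left for this node
            parsed = name.upper()          # placeholder for real parsing work
            with lock:
                results.append((node_id, name, parsed))

    workers = [threading.Thread(target=node, args=(i,)) for i in range(node_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Because every node reads from the same queue, a node that stops taking tasks does not block the others, which is the stability property the text describes. Which node handles which file is not deterministic in this sketch.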

It should be noted that in this step, when the parsing task for the target file is obtained from the first pending queue through the target node, the obtained target file can also be saved to shared cloud storage, so that other networked clients can also access the target file through the data processing system and users need not repeatedly submit files whose structure and content are identical to those of the target file.

Step 205: parse the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rule.

Because the parsing task for the target file was obtained through the target node, in this step the data in the target file is parsed by the target node. For explanations of the metadata mapping rule, the parsing rule, and the parsing of the data in the target file, see the corresponding parts of the embodiment corresponding to Fig. 1; to avoid repetition, they are not described again here.

Optionally, the preconfigured metadata mapping rule is a received metadata mapping rule set by the user, and/or the preconfigured parsing rule is a received parsing rule set by the user.

In this implementation, the preconfigured metadata mapping rule and/or the preconfigured parsing rule can be set by the user. Specifically, the user sets the desired metadata mapping rule and/or parsing rule in advance through the settings function of the client, and the client then sends them to the data processing system, which thereby receives the metadata mapping rule and/or parsing rule set by the user.

For explanations of the metadata mapping rule and the parsing rule, see the corresponding parts of the embodiment corresponding to Fig. 1; to avoid repetition, they are not described again here.

It is worth noting that, besides setting the metadata mapping rule and the parsing rule as described in this implementation, the user can also configure the usage permissions and access permissions of each client. For example, a first client may be allowed to use the data processing system for data processing and to access the site data and supplier data in the data processing system, while a second client may be barred from using the data processing system for data processing but allowed to access the site data in it. The user can also configure the database, the file storage space, and so on corresponding to each client. For example, the first client may be mapped to a first database, so that the data in files uploaded by the first client is stored in the first database, and to a first storage space, so that the files uploaded by the first client and the log files are stored in the first storage space.

In this way, by receiving the metadata mapping rule and/or parsing rule set by the user, the data processing system can parse received target files according to the user's needs, which gives it flexibility and generality. For files of different structures, the user only needs to change the metadata mapping rule and the parsing rule accordingly; developers need not write new parsing code. The data processing system can therefore meet the processing needs of data from different lines of business.

Step 206: write the parsed data into the target data table.

For the specific implementation of this step, see the implementation of step 105 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.

Optionally, step 206 comprises:

adding the parsed data to the group corresponding to the target data table, wherein each group corresponds to one thread;

splitting the data in that group that belongs to the target file into shards of a preset size, obtaining N data shards, wherein N is an integer greater than or equal to 2;

generating one batch Structured Query Language (SQL) statement for each of the N data shards, obtaining N batch SQL statements;

executing the N batch SQL statements one by one, so that the data in the N data shards is written into the target data table in sequence.

In this implementation, data can be written using a thread pool. Specifically, a preset number of threads can be created in advance, so that the data processing system can meet the batch data requirements of batch file upload and high data concurrency. When data is about to be written, all parsed data is first grouped by the data table it is to be written to: data written to the same data table forms one group, and each group corresponds to one thread. Data to be written to the same data table therefore waits in the same thread to be written.

For example, after the data in five files has been parsed in a batch, the files can be grouped by the data table each is to be written to. If the first and second files correspond to a first data table, the third and fourth files to a second data table, and the fifth file to a third data table, then when the parsed data of the five files is grouped, the parsed data of the first and second files is added to a first group, that of the third and fourth files to a second group, and that of the fifth file to a third group, wherein the first group corresponds to a first thread, the second group to a second thread, and the third group to a third thread.
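The five-file example can be sketched as a grouping step followed by one write task per group. The names `group_by_table` and `write_groups` are hypothetical, and the standard-library `ThreadPoolExecutor` is a stand-in for the thread pool the text mentions; it runs one task per group but does not strictly pin a group to a particular thread, so it only approximates the one-thread-per-group arrangement.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group_by_table(parsed_files):
    """parsed_files: iterable of (file_name, table_name, rows) tuples."""
    groups = defaultdict(list)
    for file_name, table, rows in parsed_files:
        groups[table].append((file_name, rows))  # same table -> same group
    return dict(groups)

def write_groups(groups, write_fn):
    """Submit one write task per group and collect each task's result."""
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            table: pool.submit(write_fn, table, items)
            for table, items in groups.items()
        }
        return {table: future.result() for table, future in futures.items()}
```

Here `write_fn(table, items)` is a caller-supplied function that performs the actual write for one group; within a group the files keep their arrival order.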

In this way, the parsed data of the target file is added to the group corresponding to the target data table and hence to the thread corresponding to that group, and the data processing system writes the data in that thread into the target data table in sequence. It should be noted here that when this implementation says several files correspond to the same data table, it means that the data in each of those files must be written to a data table of the same structure; a new data table is nevertheless created for each file, and the data of each file is written to and stored in its own data table.

After the parsed data has been added to the group corresponding to the target data table, the parsed data of the target file can first be split into shards. This prevents an overly long write to the target data table when the parsed data is very large, and avoids having to rewrite everything if the network is interrupted during the write. Specifically, the data in the group that belongs to the target file is split by a preset size into N data shards. For example, that data may be split into 100, 500, or 1000 data shards according to the preset size, which can be set according to the processing capacity of the data processing system.

After that split, one batch SQL statement is generated for each of the N data shards, giving N batch SQL statements. Specifically, each batch SQL statement is generated from the data in its data shard, and each instructs the system to write the raw data carried in that statement to the target position in the target data table in a single operation. The N batch SQL statements can therefore share the same statement structure and differ only in the data to be written and the target write position.

After the N batch SQL statements have been generated, they are executed one by one so that the data in the N data shards is written into the target data table in sequence. Each execution of a batch SQL statement writes all the records in one data shard to the target data table at once, which improves data write efficiency.

It should be noted that while the N batch SQL statements are being executed one by one, a preset interval, for example 1, 2, or 5 seconds, can be left after each statement. This helps avoid deadlocks in the target database and so makes the data processing system more reliable.
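The sharding and batch-write steps can be sketched with Python's built-in `sqlite3` module. The patent does not name a database, so SQLite is used here purely for illustration, and the function names, table name, and column names are hypothetical. Each shard becomes one multi-row `INSERT` statement, with an optional pause between statements.

```python
import sqlite3
import time

def shard(rows, size):
    """Split rows into shards of at most `size` rows each."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def build_batch_insert(table, columns, batch):
    """Build one multi-row INSERT for a shard, plus its flat parameter list.

    table and columns are interpolated into the SQL text, so they must come
    from trusted configuration, never from user input.
    """
    row_ph = "(" + ", ".join("?" for _ in columns) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(row_ph for _ in batch)
    )
    params = [value for row in batch for value in row]
    return sql, params

def write_in_batches(conn, table, columns, rows, batch_size, pause_s=0.0):
    """Execute one batch statement per shard, pausing between statements."""
    batches = shard(rows, batch_size)
    for i, batch in enumerate(batches):
        sql, params = build_batch_insert(table, columns, batch)
        conn.execute(sql, params)
        conn.commit()
        if pause_s and i < len(batches) - 1:
            time.sleep(pause_s)  # the preset interval described in the text
    return len(batches)
```

Committing after each statement means that, if a write is interrupted, only the shards not yet committed need to be written again. SQLite limits the number of bound parameters per statement, so the shard size has to stay within that limit.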

It is also pointed out that may be used also after the target matrix is written in the data after parsing in the file destination With the data query instruction inputted according to user, the target data inquired needed for output user, specifically, user can be described Data query interface input in data processing system includes that the data query of keyword instructs, so that the data processing system Keyword included in inquiry instruction based on the data, into being inquired in the target database for storing the target matrix With the target data of the Keywords matching, and the target data inquired is subjected to output and is shown.In this way, at the data Reason system is also equipped with simple and fast data query function.

In this way, the parsed data is added to the group corresponding to the target data table, the data belonging to the target file in that group is sharded by the preset size, one batch SQL statement is generated for each of the N data shards, and finally the N batch SQL statements are executed one by one so that the data in the N data shards is written to the target data table in sequence. This not only ensures that the data processing system can meet batch data write requirements but also improves its data write efficiency.

Optionally, step 205 includes:

reading the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules, and performing validity verification on the read data;

storing the data in the target file that is verified as invalid in an exception database, and adding the data in the target file that is verified as valid to a second pending queue, where the second pending queue is a first-in, first-out (FIFO) queue.

Step 206 includes:

when the data obtained from the second pending queue is the data in the target file, writing the data in the target file to the target data table.

In this embodiment, reading the data in the target file through the target node and performing validity verification on the read data may be as follows: the target node reads data from the target file according to the parsing rules and verifies, according to the metadata mapping rule, whether the read data is valid. For example, it verifies whether the format of the read data meets the format requirements specified in the metadata mapping rule, or whether the values of the read data in different columns satisfy the relationships specified in the metadata mapping rule.

If the verification finds invalid data in the target file, the data verified as invalid may be stored in a pre-established exception database, and the reason why the data is invalid may be recorded with it, so that the user can view the invalid data through the exception database and learn why it is invalid.

Data in the target file that is verified as valid may be added to the second pending queue to await processing. When the data obtained from the second pending queue is the data in the target file, the data in the target file may be written to the target data table. The second pending queue is a FIFO queue. For a specific implementation of writing the data in the target file to the target data table, refer to the description of the corresponding part of the embodiment corresponding to Fig. 1; to avoid repetition, details are not repeated here. In this way, all data verified as valid can be written to the corresponding data table in order, which keeps processing efficiency high and reduces the chance of errors.
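
The verification and routing described above can be sketched as follows. This is an illustration under stated assumptions rather than the method of the invention: the column names `site_id`, `amount`, and `paid`, the regular-expression format rules, the cross-column rule that `paid` must not exceed `amount`, and the in-memory list standing in for the exception database are all hypothetical.

```python
import queue
import re

# Hypothetical format requirements taken from a metadata mapping rule.
FORMAT_RULES = {
    "site_id": re.compile(r"S\d+"),
    "amount": re.compile(r"\d+(\.\d+)?"),
    "paid": re.compile(r"\d+(\.\d+)?"),
}

def validate(row):
    """Return None if the row is valid, otherwise the reason it is invalid."""
    for column, pattern in FORMAT_RULES.items():
        value = row.get(column, "")
        if not pattern.fullmatch(value):
            return f"column {column!r} has an invalid format: {value!r}"
    # Hypothetical relationship between columns.
    if float(row["paid"]) > float(row["amount"]):
        return "paid exceeds amount"
    return None

def route_rows(rows, second_queue, exception_store):
    """Send valid rows to the FIFO queue and invalid rows, with reasons, to the store."""
    for row in rows:
        reason = validate(row)
        if reason is None:
            second_queue.put(row)
        else:
            exception_store.append({"row": row, "reason": reason})

second_queue = queue.Queue()   # second pending queue (FIFO)
exception_store = []           # stand-in for the exception database
rows = [
    {"site_id": "S1", "amount": "100.0", "paid": "40"},
    {"site_id": "X9", "amount": "50", "paid": "10"},
    {"site_id": "S2", "amount": "20", "paid": "30"},
    {"site_id": "S3", "amount": "70", "paid": "70"},
]
route_rows(rows, second_queue, exception_store)
valid_in_order = [second_queue.get() for _ in range(second_queue.qsize())]
```

Because `queue.Queue` is first-in, first-out, the valid rows leave the queue in the order in which they were read from the file, which is the ordering property the text relies on.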

It is noted that after the legitimacy to the data in the file destination verified, it can also be in institute The analysis state of the file destination is marked in the tag field for stating the first queue to be processed, such as: it is default by adding Identifier marks the file destination to have parsed.In this way, identical file destination ought be received again, (such as user repeats to upload The file destination) when, it can determine that the file destination has parsed by the tag field in the described first queue to be processed, Without parsing again to the file destination, but the parsing link can be skipped, and enter next process flow.

In this way, by performing validity verification on the read data, storing the data verified as invalid in the exception database, and adding the data verified as valid to the second pending queue, the user can conveniently query the invalid data and learn why it is invalid, and high data write efficiency is also maintained.

Building on the embodiment shown in Fig. 1, this embodiment refines how the data in the target file is parsed according to the metadata mapping rule and the preconfigured parsing rules, so that file parsing speed and processing efficiency are further improved. In addition, this embodiment adds several optional implementations to the embodiment shown in Fig. 1. These optional implementations may be combined with one another or implemented separately, and each achieves the technical effect of improving file parsing speed and processing efficiency and of preventing the data processing system from becoming congested, with very low processing efficiency as a result, when files are uploaded in batches or data arrives with high concurrency.

Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a data processing system provided by an embodiment of the present invention. As shown in Fig. 3, the data processing system 300 includes:

a receiving module 301, configured to receive a target file;

a first processing module 302, configured to add the target file to a first pending queue, where the first pending queue is a first-in, first-out (FIFO) queue;

a determining module 303, configured to determine, according to a preconfigured metadata mapping rule, the target data table corresponding to the target file when the file obtained from the first pending queue is the target file;

a parsing module 304, configured to parse the data in the target file according to the metadata mapping rule and preconfigured parsing rules;

a second processing module 305, configured to write the parsed data to the target data table.

Optionally, as shown in Fig. 4, the parsing module 304 includes:

an acquiring unit 3041, configured to obtain, through a target node, the parsing task of the target file from the first pending queue, where the target node is any idle node of the processing cluster in the data processing system;

a parsing unit 3042, configured to parse the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules.
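
The way idle nodes take parsing tasks from the first pending queue can be sketched with worker threads standing in for cluster nodes. This is a simplification made for illustration: a real processing cluster would span machines, and the names `node_worker`, `first_queue`, and the placeholder parsing step are assumptions.

```python
import queue
import threading

def node_worker(node_name, first_queue, results, lock):
    """An idle node repeatedly takes the next parsing task from the FIFO queue."""
    while True:
        try:
            file_name = first_queue.get_nowait()
        except queue.Empty:
            return  # no pending files are left for this node
        parsed = f"parsed:{file_name}"  # placeholder for the real parsing step
        with lock:
            results.append((node_name, file_name, parsed))
        first_queue.task_done()

# First pending queue holding the received files in arrival order.
first_queue = queue.Queue()
files = ["a.xlsx", "b.xlsx", "c.xlsx", "d.xlsx", "e.xlsx"]
for name in files:
    first_queue.put(name)

results, lock = [], threading.Lock()
nodes = [
    threading.Thread(target=node_worker, args=(f"node-{i}", first_queue, results, lock))
    for i in range(3)
]
for node in nodes:
    node.start()
for node in nodes:
    node.join()
```

Tasks are handed out in first-in, first-out order, and whichever node is free takes the next one, so each file is parsed exactly once. Which node handles which file is not deterministic.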

Optionally, as shown in Fig. 5, the parsing unit 3042 includes:

a verification subunit 30421, configured to read the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules, and to perform validity verification on the read data;

a processing subunit 30422, configured to store the data in the target file that is verified as invalid in an exception database, and to add the data in the target file that is verified as valid to a second pending queue, where the second pending queue is a FIFO queue.

The second processing module 305 is configured to write the data in the target file to the target data table when the data obtained from the second pending queue is the data in the target file.

Optionally, the receiving module 301 is configured to receive the target file uploaded by a client in chunks.

Optionally, as shown in Fig. 6, the second processing module 305 includes:

an adding unit 3051, configured to add the parsed data to the group corresponding to the target data table, where each group corresponds to one thread;

a sharding unit 3052, configured to shard, by a preset size, the data belonging to the target file in the group corresponding to the target data table, to obtain N data shards, where N is an integer greater than or equal to 2;

a generating unit 3053, configured to generate one batch Structured Query Language (SQL) statement for each of the N data shards, to obtain N batch SQL statements;

an executing unit 3054, configured to execute the N batch SQL statements one by one, so as to write the data in the N data shards to the target data table in sequence.
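
The relationship between groups, target data tables, and threads handled by the adding unit 3051 might be organized as in the sketch below. It only illustrates "one group per target data table, one thread per group"; the table names, the function names `group_by_table` and `write_group`, and the dictionary standing in for the database are assumptions, and the per-group sharding and batch SQL steps are reduced to a placeholder.

```python
import threading
from collections import defaultdict

def group_by_table(parsed_rows):
    """Place each parsed row in the group of its target data table."""
    groups = defaultdict(list)
    for table, row in parsed_rows:
        groups[table].append(row)
    return groups

def write_group(table, rows, written, lock):
    """Placeholder for sharding the group and executing its batch SQL statements."""
    with lock:
        written[table] = list(rows)

# Hypothetical parsed rows tagged with their target data table.
parsed_rows = [
    ("sites", ("S1", "North Tower")),
    ("contracts", ("C1", 1200.0)),
    ("sites", ("S2", "South Mast")),
]
groups = group_by_table(parsed_rows)

written, lock = {}, threading.Lock()
threads = [
    threading.Thread(target=write_group, args=(table, rows, written, lock))
    for table, rows in groups.items()  # one thread per group
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```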

The data processing system 300 can implement each process implemented by the data processing system in the method embodiments of Fig. 1 and Fig. 2; to avoid repetition, details are not repeated here. The data processing system 300 of the embodiment of the present invention adds each received target file to a FIFO queue to wait for parsing, which ensures that received pending files are parsed and processed in first-in, first-out order. This can improve file parsing speed and processing efficiency and avoids the congestion, and the resulting very low processing efficiency, that can occur when files are uploaded in batches or data arrives with high concurrency.

An embodiment of the present invention further provides a data processing system including a processor, a memory, and a computer program stored on the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, details are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present invention or the scope protected by the claims, all of which fall within the protection of the present invention.

Claims

1. A data processing method, applied to a data processing system, characterized by comprising:

receiving a target file;

adding the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue;

when the file obtained from the first pending queue is the target file, determining, according to a preconfigured metadata mapping rule, the target data table corresponding to the target file;

parsing the data in the target file according to the metadata mapping rule and preconfigured parsing rules;

writing the parsed data to the target data table.

2. The method according to claim 1, characterized in that the parsing the data in the target file according to the metadata mapping rule and preconfigured parsing rules comprises:

obtaining, through a target node, the parsing task of the target file from the first pending queue, wherein the target node is any idle node of the processing cluster in the data processing system;

parsing the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules.

3. The method according to claim 2, characterized in that the parsing the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules comprises:

reading the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules, and performing validity verification on the read data;

storing the data in the target file that is verified as invalid in an exception database, and adding the data in the target file that is verified as valid to a second pending queue, wherein the second pending queue is a FIFO queue;

and the writing the parsed data to the target data table comprises:

when the data obtained from the second pending queue is the data in the target file, writing the data in the target file to the target data table.

4. The method according to any one of claims 1 to 3, characterized in that the receiving a target file comprises:

receiving the target file uploaded by a client in chunks.

5. The method according to claim 1 or 2, characterized in that the writing the parsed data to the target data table comprises:

adding the parsed data to the group corresponding to the target data table, wherein each group corresponds to one thread;

sharding, by a preset size, the data belonging to the target file in the group corresponding to the target data table, to obtain N data shards, wherein N is an integer greater than or equal to 2;

generating one batch Structured Query Language (SQL) statement for each of the N data shards, to obtain N batch SQL statements;

executing the N batch SQL statements one by one, so as to write the data in the N data shards to the target data table in sequence.

6. The method according to any one of claims 1 to 3, characterized in that the preconfigured metadata mapping rule is a received metadata mapping rule set by a user, and/or the preconfigured parsing rules are received parsing rules set by a user.

7. A data processing system, characterized by comprising:

a receiving module, configured to receive a target file;

a first processing module, configured to add the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue;

a determining module, configured to determine, according to a preconfigured metadata mapping rule, the target data table corresponding to the target file when the file obtained from the first pending queue is the target file;

a parsing module, configured to parse the data in the target file according to the metadata mapping rule and preconfigured parsing rules;

a second processing module, configured to write the parsed data to the target data table.

8. The data processing system according to claim 7, characterized in that the parsing module comprises:

an acquiring unit, configured to obtain, through a target node, the parsing task of the target file from the first pending queue, wherein the target node is any idle node of the processing cluster in the data processing system;

a parsing unit, configured to parse the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules.

9. The data processing system according to claim 8, characterized in that the parsing unit comprises:

a verification subunit, configured to read the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules, and to perform validity verification on the read data;

a processing subunit, configured to store the data in the target file that is verified as invalid in an exception database, and to add the data in the target file that is verified as valid to a second pending queue, wherein the second pending queue is a FIFO queue;

wherein the second processing module is configured to write the data in the target file to the target data table when the data obtained from the second pending queue is the data in the target file.

10. The data processing system according to any one of claims 7 to 9, characterized in that the receiving module is configured to receive the target file uploaded by a client in chunks.

11. The data processing system according to claim 7 or 8, characterized in that the second processing module comprises:

an adding unit, configured to add the parsed data to the group corresponding to the target data table, wherein each group corresponds to one thread;

a sharding unit, configured to shard, by a preset size, the data belonging to the target file in the group corresponding to the target data table, to obtain N data shards, wherein N is an integer greater than or equal to 2;

a generating unit, configured to generate one batch Structured Query Language (SQL) statement for each of the N data shards, to obtain N batch SQL statements;

an executing unit, configured to execute the N batch SQL statements one by one, so as to write the data in the N data shards to the target data table in sequence.

12. The data processing system according to any one of claims 7 to 9, characterized in that the preconfigured metadata mapping rule is a received metadata mapping rule set by a user, and/or the preconfigured parsing rules are received parsing rules set by a user.

13. A data processing system, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data processing method according to any one of claims 1 to 6.

14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 6.