CN108984177A - Data processing method and system - Google Patents
- Publication number: CN108984177A (application CN201810643204.5A)
- Authority: CN (China)
- Prior art keywords: data, target file, processed, queue, file
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Abstract
The present invention provides a data processing method and system. The method includes: receiving a target file; adding the target file to a first pending queue, where the first pending queue is a first-in, first-out (FIFO) queue; when the file taken from the first pending queue is the target file, determining the target data table corresponding to the target file according to preconfigured metadata mapping rules; parsing the data in the target file according to the metadata mapping rules and preconfigured parsing rules; and writing the parsed data into the target data table. Because received target files are added to a FIFO queue and wait their turn to be parsed, the data processing system parses and processes the files it receives in first-in, first-out order, which can improve file parsing speed and processing efficiency.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a data processing method and system.
Background art
With the progress of information technology, many enterprises and departments establish information systems to manage their business data, and in practice a single enterprise or department often needs several information systems to manage different kinds of business data. As enterprises grow, the volume of business data each system must handle keeps increasing. At present, the basic data processing flow is: parse the Excel files containing business data uploaded by users, write the successfully parsed data into the corresponding data tables, and store it.
However, the prior art usually relies on open-source parsing tools that parse the data directly after reading it, without regard to parsing speed. As a result, parsing is slow and memory usage is high, and in scenarios with batch file uploads and highly concurrent data, the system's processing efficiency is very low.
Summary of the invention
Embodiments of the present invention provide a data processing method and system to solve the problem that existing data processing methods are inefficient.
In order to solve the above technical problems, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a data processing method, applied to a data processing system, including:
receiving a target file;
adding the target file to a first pending queue, where the first pending queue is a first-in, first-out (FIFO) queue;
when the file taken from the first pending queue is the target file, determining the target data table corresponding to the target file according to preconfigured metadata mapping rules;
parsing the data in the target file according to the metadata mapping rules and preconfigured parsing rules; and
writing the parsed data into the target data table.
In a second aspect, an embodiment of the present invention provides a data processing system, including:
a receiving module, configured to receive a target file;
a first processing module, configured to add the target file to a first pending queue, where the first pending queue is a FIFO queue;
a determining module, configured to, when the file taken from the first pending queue is the target file, determine the target data table corresponding to the target file according to preconfigured metadata mapping rules;
a parsing module, configured to parse the data in the target file according to the metadata mapping rules and preconfigured parsing rules; and
a second processing module, configured to write the parsed data into the target data table.
In a third aspect, an embodiment of the present invention provides a data processing system, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above data processing method.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the above data processing method.
In the embodiments of the present invention, received target files are added to a FIFO queue to wait their turn for parsing. This ensures that the data processing system parses and processes received files in first-in, first-out order, which can improve file parsing speed and processing efficiency, and prevents the system from becoming congested, and its processing efficiency from dropping sharply, when files are uploaded in batches and data is highly concurrent.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to another embodiment of the present invention;
Fig. 3 is a structural diagram of a data processing system according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the parsing module of the data processing system according to an embodiment of the present invention;
Fig. 5 is a structural diagram of the parsing unit in the parsing module of the data processing system according to an embodiment of the present invention;
Fig. 6 is a structural diagram of the second processing module of the data processing system according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention. The method is applied to a data processing system and, as shown in Fig. 1, includes the following steps:
Step 101: receive a target file.
In this embodiment, before data processing starts, the target file to be processed must first be received; for example, it may be a target file submitted by a user on a client. The target file may be an Excel file, either created and edited manually by the user and filled in with the relevant data, or obtained by filling the relevant data into a template generated by the system.
In this step, the target file may be received either all at once or in fragments. For example, if the size of the target file does not exceed the maximum size the data processing system can receive in one request, the target file can be received completely in one pass; if it exceeds that maximum, the client can split the target file into several fragments and upload them, so that the data processing system receives the target file fragment by fragment.
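As an illustration of the choice just described, the following Python sketch plans how a client might upload a file: in one request when it fits, otherwise as consecutive fragments. The function name `plan_upload` and the `max_single` limit are hypothetical; the patent does not say how the maximum size is obtained or how fragments are addressed (byte offsets are assumed here).

```python
import math
from typing import List, Tuple


def plan_upload(file_size: int, max_single: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs describing the upload requests.

    One pair means the file is sent in a single request; several pairs mean
    the client splits it into consecutive fragments of at most max_single bytes.
    """
    if max_single <= 0:
        raise ValueError("max_single must be positive")
    if file_size <= max_single:
        return [(0, file_size)]          # fits: the whole file goes in one request
    count = math.ceil(file_size / max_single)
    return [
        (i * max_single, min(max_single, file_size - i * max_single))
        for i in range(count)
    ]


print(plan_upload(5, 10))    # [(0, 5)]
print(plan_upload(25, 10))   # [(0, 10), (10, 10), (20, 5)]
```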
Step 102: add the target file to a first pending queue, where the first pending queue is a first-in, first-out (FIFO) queue.
After the target file is received, it can be added to the first pending queue so that it is processed in turn. This ensures that all files uploaded to the data processing system are processed in a definite order, so congestion is less likely to occur and degrade the system's processing efficiency.
The first pending queue is a FIFO queue: files received earlier are processed first, and files received later are processed afterwards. For example, if two files are already waiting in the first pending queue when the target file is added, the target file is dispatched for processing only after both of those files have been taken from the queue and dispatched.
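A minimal Python sketch of the first pending queue and the example above: two files are already waiting, so the target file is handed out third. The class and method names are illustrative, and a single consumer is assumed; with several consumers, a thread-safe queue such as `queue.Queue` would be the natural choice.

```python
from collections import deque
from typing import Optional


class PendingQueue:
    """First-in, first-out queue of files awaiting parsing."""

    def __init__(self) -> None:
        self._items = deque()

    def add(self, file_name: str) -> None:
        # A newly received file joins the tail, behind every file already waiting.
        self._items.append(file_name)

    def take(self) -> Optional[str]:
        # The file that has waited longest is always handed out first.
        return self._items.popleft() if self._items else None


q = PendingQueue()
q.add("first.xlsx")       # already waiting
q.add("second.xlsx")      # already waiting
q.add("target.xlsx")      # the newly received target file
print([q.take(), q.take(), q.take()])  # ['first.xlsx', 'second.xlsx', 'target.xlsx']
```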
Step 103: when the file taken from the first pending queue is the target file, determine the target data table corresponding to the target file according to preconfigured metadata mapping rules.
In this embodiment of the present invention, the data processing system continuously takes pending files from the first pending queue, and each time it takes the file that was added to the queue earliest.
When the file taken from the first pending queue is the target file, processing of the target file can begin. Specifically, the target data table corresponding to the target file is first determined according to the preconfigured metadata mapping rules. The metadata mapping rules may include a correspondence between file names and data tables, so the data table matching the name of the target file can be looked up from this correspondence, thereby determining the target data table for the target file.
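The file-name-to-table lookup can be sketched as below. The patterns, table names and the use of shell-style wildcards are assumptions for illustration; the patent only says that the rules map file names to data tables, so exact names would work equally well.

```python
import fnmatch

# Hypothetical metadata mapping rules: file-name pattern -> target data table.
FILE_TO_TABLE = {
    "site_*.xlsx": "site_data",
    "supplier_*.xlsx": "supplier_data",
}


def resolve_target_table(file_name: str) -> str:
    """Return the target data table for a file according to the file-name rules."""
    for pattern, table in FILE_TO_TABLE.items():
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatch.fnmatchcase(file_name, pattern):
            return table
    raise LookupError(f"no metadata mapping rule for {file_name!r}")


print(resolve_target_table("site_2018.xlsx"))  # site_data
```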
The target data table stores the parsed data from the target file and may reside in a target database of the data processing system. It makes it easy for users to query the relevant data later and for the data processing system to retrieve and further process that data.
Step 104: parse the data in the target file according to the metadata mapping rules and preconfigured parsing rules.
In this embodiment, the data in the target file can be parsed according to the metadata mapping rules and the preconfigured parsing rules. Besides the file-name correspondence, the metadata mapping rules may also include metadata format and relationship mapping rules. Format mapping rules configure, for each column in the target file, its data field, data type, data length, value range, whether it may be empty, and so on; relationship mapping rules configure the relationships or formulas that must hold between columns of the target file. The parsing rules are preconfigured rules for how the data in the target file is parsed, for example which row and column of the target file reading starts from, how many columns are parsed in total, and which parsing rules apply to the target file for incremental updates and for full updates respectively.
Note that the metadata mapping rules and the parsing rules may each correspond uniquely to a target file: target files with different structures can have different metadata mapping rules and parsing rules, configured according to user requirements.
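One way to hold these per-structure rules is sketched below in Python. Every field name is a hypothetical choice made to mirror the items listed above (field, data type, length, range, nullability, starting row and column, column count, full or incremental mode); the patent does not define a storage format, and relationship rules between columns are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnRule:
    """Metadata format rule for one column."""
    name: str
    dtype: type                       # e.g. int or str
    max_length: Optional[int] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = True


@dataclass(frozen=True)
class ParseRule:
    """Where reading starts and how much is read."""
    start_row: int                    # 1-based, as shown in Excel
    start_col: int                    # 1-based
    column_count: int
    mode: str = "full"                # "full" or "incremental" update


@dataclass(frozen=True)
class FileRules:
    """All rules for one file structure."""
    target_table: str
    columns: Tuple[ColumnRule, ...]
    parse: ParseRule

    def __post_init__(self) -> None:
        # A column rule is expected for every column the parse rule reads.
        if len(self.columns) != self.parse.column_count:
            raise ValueError("column rules do not match column_count")


SITE_RULES = FileRules(
    target_table="site_data",
    columns=(
        ColumnRule("site_code", int, max_length=10, nullable=False),
        ColumnRule("site_name", str, max_length=64),
    ),
    parse=ParseRule(start_row=2, start_col=2, column_count=2),
)
```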
In this step, parsing the data in the target file may consist of reading the data in the target file according to the parsing rules and checking, according to the metadata mapping rules, whether the data read meets the preconfigured format, relationship, and other requirements. For example, if the parsing rules specify that reading starts at the second row of the second column, and the metadata mapping rules specify that the second column holds integer data with a numeric length of 10, then when the target file is parsed, the value in the second row of the second column is read first and checked for whether its type is integer and whether its numeric length is 10.
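The worked example above (start at row 2 of column 2, integer of length 10) can be sketched as follows. A worksheet is represented as a list of rows so the code runs without a spreadsheet library; a real system would read the Excel file with a library of its choice. "Numeric length of 10" is interpreted here as exactly ten digits, which is an assumption.

```python
from typing import Any, Iterator, List, Sequence


def read_region(sheet: Sequence[Sequence[Any]], start_row: int,
                start_col: int, ncols: int) -> Iterator[List[Any]]:
    """Yield each row of the configured region (1-based row/column numbers)."""
    for row in sheet[start_row - 1:]:
        yield list(row[start_col - 1:start_col - 1 + ncols])


def is_int_of_length(value: Any, length: int) -> bool:
    """True if value is an integer with exactly `length` digits."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False          # bool is a subclass of int, so reject it explicitly
    return len(str(abs(value))) == length


# A worksheet stood in for by a list of rows; row 1 holds headers.
sheet = [
    ["id", "account", "note"],
    ["r1", 1234567890, "ok"],
    ["r2", 12345, "too short"],
    ["r3", "abc", "not an integer"],
]
checks = [is_int_of_length(row[0], 10) for row in read_region(sheet, 2, 2, 1)]
print(checks)  # [True, False, False]
```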
In this way, the preconfigured metadata mapping rules and parsing rules allow the data in the target file to be parsed quickly and accurately, and also meet the processing needs of files with different structures.
Step 105: write the parsed data into the target data table.
After the data in the target file has been parsed, the parsed data can be written into the target data table. Specifically, the data in the target file that passed parsing is extracted and written into the target data table; when writing, that data can be inserted into the corresponding positions of the target data table according to the table's configuration.
Note that once the parsed data has been written into the target data table, the data processing task for the target file is complete. The written target data table is stored in the target database of the data processing system, so that users can later query its data through the data processing system, and the data processing system can later retrieve or further process that data.
In addition, after the parsed data has been written into the target data table, the target file itself can also be saved to a preset storage space, such as local storage or shared cloud storage, so that the user can view it later. If it is stored in shared cloud storage, it can also be shared across multiple locations, and users do not need to upload data files with the same content repeatedly.
In this embodiment of the present invention, the data processing system may be a computer service system including a processor, hard disk, memory, system bus, and so on, for example a server.
In the data processing method of this embodiment, received target files are added to a FIFO queue to wait their turn for parsing, which ensures that the data processing system parses and processes received files in first-in, first-out order. This can improve file parsing speed and processing efficiency, and prevents the system from becoming congested, and its processing efficiency from dropping sharply, when files are uploaded in batches and data is highly concurrent.
Referring to Fig. 2, Fig. 2 is a flowchart of another data processing method according to an embodiment of the present invention, applied to a data processing system. Building on the embodiment shown in Fig. 1, this embodiment refines how the data in the target file is parsed according to the metadata mapping rules and the preconfigured parsing rules, further improving file parsing speed and processing efficiency. As shown in Fig. 2, the method includes the following steps:
Step 201: receive a target file.
For the specific implementation of this step, refer to step 101 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.
Optionally, step 201 includes:
receiving the target file uploaded in fragments by a client.
Receiving a target file uploaded in fragments by a client may apply when the target file is too large: to ensure that the target file can be uploaded successfully, the client splits it into fragments of a set size and uploads the fragments to the data processing system, which thus receives the target file fragment by fragment.
For example, if the target file is 100 MB, the client splits it into 10 fragments of 10 MB each and uploads the 10 fragments to the data processing system one after another; the data processing system receives one fragment at a time until it has received the last one.
In this implementation, uploading files in fragments not only removes the limit on the size of files the data processing system can receive, so that it can handle large files, but also copes better with unreliable network connections: for example, if the network drops while fragments of the target file are being received, the whole target file does not need to be received again once the network recovers; only the fragments not yet received need to be received.
Of course, this implementation also applies to the embodiment shown in Fig. 1 and achieves the same beneficial effects.
Step 202: add the target file to the first pending queue, where the first pending queue is a FIFO queue.
For the specific implementation of this step, refer to step 102 in the method embodiment shown in Fig. 1; it achieves the same beneficial effects and, to avoid repetition, is not described again here.
Step 203: when the file taken from the first pending queue is the target file, determine the target data table corresponding to the target file according to the preconfigured metadata mapping rules.
For the specific implementation of this step, refer to step 103 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.
Step 204: obtain the parsing task for the target file from the first pending queue through a target node, where the target node is any idle node of a processing cluster in the data processing system.
In this embodiment, the data processing system may include a processing cluster; that is, it may include multiple interconnected processors, each of which can obtain parsing tasks for pending files from the first pending queue.
Because the data processing system uses a processing-cluster mechanism to execute the parsing tasks for pending files, in this step the parsing task for the target file is obtained from the first pending queue by a target node, which may be any idle node of the processing cluster. Thus, when there are several pending files in the first pending queue, they can be dispatched in turn to different idle nodes of the processing cluster for parsing, without long waits.
Compared with the prior art, which executes parsing tasks on a single processor, the processing-cluster mechanism of this embodiment not only further improves parsing efficiency and avoids prolonged memory occupation, but also keeps the data processing system stable: for example, when a node of the processing cluster fails, other nodes can still execute parsing tasks, so the data processing system continues to work normally.
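A rough single-machine analogue of this scheme is sketched below: threads stand in for cluster nodes, and a thread-safe FIFO `queue.Queue` stands in for the first pending queue, so whichever node is idle claims the oldest task. This is only an illustration; a real cluster would use a shared queue service, and the failover described above (another node taking over when one fails) is not modelled.

```python
import queue
import threading
from typing import Callable, List, Tuple


def run_cluster(files: List[str], node_count: int,
                parse: Callable[[str], str]) -> List[Tuple[int, str, str]]:
    """Idle nodes (threads here) repeatedly take the oldest pending task."""
    tasks = queue.Queue()                # FIFO and safe for many consumers
    for name in files:
        tasks.put(name)

    results: List[Tuple[int, str, str]] = []
    lock = threading.Lock()

    def node(node_id: int) -> None:
        while True:
            try:
                name = tasks.get_nowait()  # an idle node claims the next task
            except queue.Empty:
                return                     # nothing left to parse
            output = parse(name)
            with lock:
                results.append((node_id, name, output))

    workers = [threading.Thread(target=node, args=(i,)) for i in range(node_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


done = run_cluster([f"file{i}.xlsx" for i in range(6)], node_count=3, parse=str.upper)
print(sorted(name for _, name, _ in done))
```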
It should be noted that in the step, it is described being obtained from the described first queue to be processed by the destination node
When the parsing task of file destination, the file destination of acquisition can also be saved to the shared memory space in cloud, so as to
Also the file destination can be accessed by the data processing system in other networked clients, without user repeat to submit with
The structure and content of the file destination file all the same.
Step 205: parse the data in the target file through the target node according to the metadata mapping rules and the preconfigured parsing rules.
Because the parsing task for the target file was obtained by the target node, in this step the target node parses the data in the target file. For explanations of the metadata mapping rules, the parsing rules, and the parsing of data in the target file, refer to the corresponding parts of the embodiment of Fig. 1; to avoid repetition, they are not repeated here.
Optionally, the preconfigured metadata mapping rules are user-set metadata mapping rules received by the system, and/or the preconfigured parsing rules are user-set parsing rules received by the system.
In this implementation, the preconfigured metadata mapping rules and/or parsing rules can be set by the user. Specifically, the user first sets the desired metadata mapping rules and/or parsing rules through a settings function on the client, and the client then sends them to the data processing system, which thereby receives the user-set metadata mapping rules and/or parsing rules.
For explanations of the metadata mapping rules and parsing rules, refer to the corresponding parts of the embodiment of Fig. 1; to avoid repetition, they are not repeated here.
It is worth noting that, in addition to configuring metadata mapping rules and parsing rules as described in this implementation, the user can also configure the usage and access permissions of each client. For example, a first client may be allowed to use the data processing system for data processing and to access the site data and supplier data in the data processing system, while a second client may not use the data processing system for data processing but may access its site data. The user can also configure the database and file storage space for each client; for example, the first client may be assigned a first database, into which the data from files uploaded by the first client is stored, and a first storage space, in which the files and log files uploaded by the first client are stored.
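The permission and storage settings in this example can be expressed as a small configuration table with lookup helpers, as in the Python sketch below. The key names, dataset labels and deny-by-default behaviour for unknown clients are assumptions; the patent does not describe how these settings are stored. The second client's database and storage are left unset because the example does not assign them.

```python
from typing import Dict, Optional

# Hypothetical per-client settings mirroring the example above.
CLIENT_CONFIG: Dict[str, dict] = {
    "client_1": {"can_process": True, "readable": {"site", "supplier"},
                 "database": "database_1", "storage": "storage_1"},
    "client_2": {"can_process": False, "readable": {"site"},
                 "database": None, "storage": None},
}


def can_process(client: str) -> bool:
    # Unknown clients get no processing permission.
    return bool(CLIENT_CONFIG.get(client, {}).get("can_process", False))


def can_read(client: str, dataset: str) -> bool:
    return dataset in CLIENT_CONFIG.get(client, {}).get("readable", set())


def database_for(client: str) -> Optional[str]:
    # Where data parsed from this client's uploads would be stored.
    return CLIENT_CONFIG.get(client, {}).get("database")


print(can_process("client_2"), can_read("client_2", "site"))  # False True
```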
In this way, by receiving user-set metadata mapping rules and/or parsing rules, the data processing system parses received target files according to the user's requirements, which makes it flexible and general: for pending files with a different structure, the user only needs to change the corresponding metadata mapping rules and parsing rules, and developers do not need to write new parsing code, so the data processing system can meet the data processing needs of different businesses.
Of course, this implementation also applies to the embodiment shown in Fig. 1 and achieves the same beneficial effects.
Step 206: write the parsed data into the target data table.
For the specific implementation of this step, refer to step 105 in the method embodiment shown in Fig. 1; to avoid repetition, it is not described again here.
Optionally, step 206 includes:
adding the parsed data to the group corresponding to the target data table, where each group corresponds to one thread;
splitting the data belonging to the target file in the group corresponding to the target data table into N data fragments of a preset size, where N is an integer greater than or equal to 2;
generating one batch Structured Query Language (SQL) statement for each of the N data fragments, obtaining N batch SQL statements; and
executing the N batch SQL statements one by one, writing the data in the N data fragments into the target data table in sequence.
In this implementation, data can be written using a thread pool: a preset number of threads can be created in advance so that the data processing system meets the batch-processing requirements of batch file uploads and highly concurrent data. When data is about to be written, all parsed data can first be grouped by the data table it will be written into: data destined for the same data table forms one group, and each group corresponds to one thread, so data that must be written into the same data table waits in the same thread to be written.
For example, after the data in five files has been parsed in a batch, the files can be grouped by the data table each will be written into. Suppose the first and second of the five files correspond to a first data table, the third and fourth correspond to a second data table, and the fifth corresponds to a third data table. The parsed data from the first and second files is then added to a first group, the parsed data from the third and fourth files to a second group, and the parsed data from the fifth file to a third group, where the first, second, and third groups correspond to a first, second, and third thread respectively.
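The grouping in this example can be sketched in Python as follows: files are grouped by target table, and each group is submitted as a single task to a thread pool, so the files in one group are written in order on one thread while different groups proceed in parallel. The table write itself is a placeholder, and using `concurrent.futures.ThreadPoolExecutor` as the thread pool is an implementation choice, not something the patent specifies.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Parsed = Tuple[str, str, list]          # (file name, target table, parsed rows)


def group_by_table(parsed: List[Parsed]) -> Dict[str, List[Tuple[str, list]]]:
    """Put every file bound for the same table into one group, keeping order."""
    groups: Dict[str, List[Tuple[str, list]]] = defaultdict(list)
    for file_name, table, rows in parsed:
        groups[table].append((file_name, rows))
    return dict(groups)


def write_group(items: List[Tuple[str, list]]) -> List[Tuple[str, int]]:
    # One group is one task, so its files are written one after another on one thread.
    written = []
    for file_name, rows in items:
        written.append((file_name, len(rows)))   # placeholder for the real table write
    return written


parsed = [
    ("file1", "table_1", [1]), ("file2", "table_1", [2]),
    ("file3", "table_2", [3]), ("file4", "table_2", [4]),
    ("file5", "table_3", [5]),
]
groups = group_by_table(parsed)
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    futures = {t: pool.submit(write_group, items) for t, items in groups.items()}
written = {t: f.result() for t, f in futures.items()}
print(written)
```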
In this way, the parsed data in the target file can be added to the group corresponding to the target data table, and hence to the thread corresponding to that group, and the data processing system writes the data in that thread into the target data table in sequence. Note that when this implementation says several files correspond to the same data table, it means that the data in each of those files must be written into data tables with the same structure; a new data table is still created for each file, and each file's data is written into and stored in its own table.
After the parsed data has been added to the group corresponding to the target data table, the parsed data from the target file can first be split into fragments. This prevents writing into the target data table from taking too long when the target file contains a large amount of parsed data, and avoids having to rewrite everything if the connection drops during writing. Specifically, the data belonging to the target file in the group corresponding to the target data table can be split by a preset size into N data fragments; for example, it can be divided into 100, 500, or 1000 data fragments, where the preset size can be set according to the processing capacity of the data processing system.
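Splitting by a preset size can be sketched as below: with a preset size of S rows and R rows of parsed data, the number of fragments is N = ⌈R / S⌉, and only the last fragment can be shorter than S. The function name and the row-count interpretation of "size" are assumptions; when R ≤ S the sketch returns a single fragment, whereas the patent's wording (N ≥ 2) presumes the data exceeds one fragment.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_fragments(rows: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split rows into consecutive fragments of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [rows[i:i + size] for i in range(0, len(rows), size)]


rows = list(range(2500))
fragments = split_fragments(rows, 1000)
print(len(fragments), [len(f) for f in fragments])   # 3 [1000, 1000, 500]
```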
After this split, one batch SQL statement can be generated for each of the N data fragments, giving N batch SQL statements. Specifically, a batch SQL statement is generated from the data in each fragment, and each batch SQL statement writes the data it carries into the target positions in the target data table in one operation. The N batch SQL statements can therefore share the same statement structure and differ only in the data to be written and the target write positions.
After the N batch SQL statements are generated, they can be executed one by one to write the data in the N data fragments into the target data table in sequence. Each execution of a batch SQL statement writes all the records in one data fragment into the target data table at once, which improves data write efficiency.
It should be noted that executing the N batch processing SQL one by one, to be sequentially written in the N to the target matrix
During data in a data fragmentation, can with one batch processing SQL of every execution, interval preset duration, such as: interval 1 second,
2 seconds or 5 seconds etc., this way it is possible to avoid the case where deadlock occurs in the target database, and then ensure the data processing system
Has more reliable performance.
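The following sketches sequential execution with a pause between batches. Python's built-in `sqlite3` module stands in for the target database; this is an assumption, since the text does not name a database. Each batch is committed in its own transaction before the next one starts, so the writer holds locks only briefly. Whether that is enough to avoid deadlock depends on the locking behavior of the actual database.

```python
import sqlite3
import time
from typing import Iterable, Sequence, Tuple


def execute_batches(
    conn: sqlite3.Connection,
    batches: Iterable[Tuple[str, Sequence[object]]],
    pause_seconds: float,
) -> int:
    """Execute batch statements strictly one after another, pausing between them."""
    executed = 0
    for sql, params in batches:
        if executed > 0 and pause_seconds > 0:
            time.sleep(pause_seconds)  # preset interval between consecutive batches
        with conn:  # commits this batch on success, rolls it back on error
            conn.execute(sql, params)
        executed += 1
    return executed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_info (site_id INTEGER, city TEXT)")
batches = [
    ("INSERT INTO site_info (site_id, city) VALUES (?, ?), (?, ?)", [1, "Beijing", 2, "Tianjin"]),
    ("INSERT INTO site_info (site_id, city) VALUES (?, ?)", [3, "Shanghai"]),
]
# The text suggests 1 to 5 seconds; a short pause keeps the demo quick.
print(execute_batches(conn, batches, pause_seconds=0.01))  # 2
print(conn.execute("SELECT COUNT(*) FROM site_info").fetchone()[0])  # 3
```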
It should also be noted that, after the parsed data of the target file have been written into the target data table, the target data a user requests can be output according to a data query instruction entered by the user. Specifically, the user can enter a data query instruction containing a keyword through the data query interface of the data processing system. Based on the keyword in the query instruction, the data processing system searches the target database that stores the target data table for target data matching the keyword, and outputs the matching data for display. In this way, the data processing system also provides a simple and fast data query function.
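Below is a minimal sketch of the keyword query. It again uses an SQLite stand-in and assumes a plain substring match with `LIKE`; the text does not specify how keywords are matched. The table `site_info` and column `city` are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def query_by_keyword(conn: sqlite3.Connection, table: str, column: str, keyword: str) -> List[Tuple]:
    """Return rows whose column contains keyword as a substring."""
    # Escape LIKE wildcards so a keyword such as "50%" is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # table/column are assumed to be trusted identifiers from configuration.
    sql = f"SELECT * FROM {table} WHERE {column} LIKE ? ESCAPE '\\' ORDER BY rowid"
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_info (site_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO site_info VALUES (?, ?)",
    [(1, "Beijing"), (2, "Tianjin"), (3, "Shanghai"), (4, "Nanjing")],
)
conn.commit()
print(query_by_keyword(conn, "site_info", "city", "jing"))  # [(1, 'Beijing'), (4, 'Nanjing')]
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so "JING" returns the same rows.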
In this way, the parsed data are added to the group corresponding to the target data table, the data belonging to the target file in that group are sharded by the preset size, a batch SQL statement is generated for each of the N data shards, and finally the N batch SQL statements are executed one by one to write the data in the N data shards into the target data table in sequence. This ensures that the data processing system can meet batch data write requirements, and it also improves the system's data write efficiency.
Of course, this implementation is equally applicable to the embodiment shown in FIG. 1 and achieves the same beneficial effects.
Optionally, step 205 includes:
reading, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules, and performing a validity check on the read data; and
storing the data in the target file that fail the validity check in an exception database, and adding the data in the target file that pass the validity check to a second pending queue, where the second pending queue is a first-in, first-out (FIFO) queue.
Step 206 includes:
when the data obtained from the second pending queue are data in the target file, writing the data in the target file into the target data table.
In this embodiment, the target node reads data from the target file according to the parsing rules and then verifies, according to the metadata mapping rule, whether the read data are valid. For example, it can verify whether the format of the read data meets the format requirements specified in the metadata mapping rule, and whether the values in different columns of the read data satisfy the inter-column relationships specified in the metadata mapping rule.
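The two checks named above, column format and inter-column relationships, can be sketched as follows. The shape of `MappingRule`, the regular-expression formats, and the `used`/`total` relationship are all hypothetical illustrations. The text does not define how a metadata mapping rule is encoded.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A relation check returns None when the row satisfies it, else a reason string.
RelationCheck = Callable[[Dict[str, str]], Optional[str]]


@dataclass
class MappingRule:
    """Hypothetical shape of a metadata mapping rule for one target table."""
    columns: List[str]
    formats: Dict[str, str]  # column -> regular expression the raw value must fully match
    relations: List[RelationCheck] = field(default_factory=list)


def validate_row(values: List[str], rule: MappingRule) -> Optional[str]:
    """Return None if the row is valid, otherwise the reason it is invalid."""
    if len(values) != len(rule.columns):
        return f"expected {len(rule.columns)} columns, got {len(values)}"
    row = dict(zip(rule.columns, values))
    # 1. Format check for each column that has a declared format.
    for column, pattern in rule.formats.items():
        if re.fullmatch(pattern, row[column]) is None:
            return f"column {column} has invalid value {row[column]!r}"
    # 2. Cross-column checks run only after the formats are known to be valid.
    for check in rule.relations:
        reason = check(row)
        if reason is not None:
            return reason
    return None


def used_not_above_total(row: Dict[str, str]) -> Optional[str]:
    return "used exceeds total" if int(row["used"]) > int(row["total"]) else None


rule = MappingRule(
    columns=["site_id", "total", "used"],
    formats={"site_id": r"S\d{4}", "total": r"\d+", "used": r"\d+"},
    relations=[used_not_above_total],
)
print(validate_row(["S0001", "10", "4"], rule))   # None
print(validate_row(["S0002", "10", "12"], rule))  # used exceeds total
print(validate_row(["X1", "10", "4"], rule))      # column site_id has invalid value 'X1'
```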
If the verification finds invalid data in the target file, those data can be stored in a pre-established exception database together with the reason each item is invalid. The user can then review the invalid data, and learn why they are invalid, through the exception database.
Data in the target file that pass verification can be added to the second pending queue to await processing, where the second pending queue is a FIFO queue. When the data obtained from the second pending queue are data in the target file, they can be written into the target data table. For the specific implementation of writing the data in the target file into the target data table, refer to the corresponding description in the embodiment of FIG. 1; to avoid repetition, details are not repeated here. In this way, all data that pass verification are written into the corresponding data table in order, which keeps processing efficiency high and makes errors unlikely.
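The routing step can be sketched as follows. An SQLite table stands in for the exception database and a `collections.deque` for the second pending queue; both are assumptions. The table name `exception_data`, its columns, and the sample validator are hypothetical. Storing the line number and the reason alongside the raw row is what lets a user later see why each record was rejected.

```python
import sqlite3
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Validator = Callable[[List[str]], Optional[str]]


def route_rows(
    file_name: str,
    rows: List[List[str]],
    validate: Validator,
    second_queue: Deque[Tuple[str, List[str]]],
    exception_db: sqlite3.Connection,
) -> None:
    """Send valid rows to the second FIFO queue and invalid rows, with reasons, to the exception DB."""
    for line_no, row in enumerate(rows, start=1):
        reason = validate(row)
        if reason is None:
            second_queue.append((file_name, row))  # enqueue at the tail (FIFO)
        else:
            with exception_db:
                exception_db.execute(
                    "INSERT INTO exception_data (file_name, line_no, raw_row, reason) VALUES (?, ?, ?, ?)",
                    (file_name, line_no, ",".join(row), reason),
                )


exception_db = sqlite3.connect(":memory:")
exception_db.execute(
    "CREATE TABLE exception_data (file_name TEXT, line_no INTEGER, raw_row TEXT, reason TEXT)"
)
second_queue: Deque[Tuple[str, List[str]]] = deque()


def quantity_is_numeric(row: List[str]) -> Optional[str]:
    return None if row[1].isdigit() else "quantity is not numeric"


route_rows("sites.csv", [["S0001", "3"], ["S0002", "x"], ["S0003", "7"]],
           quantity_is_numeric, second_queue, exception_db)
# The writer drains the second queue from the head, so rows are written in arrival order.
while second_queue:
    file_name, row = second_queue.popleft()
    print(file_name, row)
print(exception_db.execute("SELECT * FROM exception_data").fetchall())
```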
It is worth noting that, after the validity of the data in the target file has been checked, the parsing status of the target file can also be recorded in a tag field of the first pending queue, for example by adding a preset identifier indicating that the target file has been parsed. Then, when the same target file is received again (for example, when a user uploads it a second time), the tag field in the first pending queue shows that the file has already been parsed. The file is not parsed again; the parsing step is skipped and processing proceeds to the next stage.
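A minimal sketch of the parsed-status tag follows. It assumes a file counts as "the same" when its bytes are identical, identified here by a SHA-256 digest, and that the tag is held in memory. The text does not say how identical files are recognized or where the tag field is persisted. `FirstPendingQueue`, `QueueEntry`, and their methods are hypothetical names.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class QueueEntry:
    file_key: str
    content: bytes
    parsed: bool = False  # the "tag field" recording the parsing status


class FirstPendingQueue:
    """FIFO queue of uploaded files that remembers which files were already parsed."""

    def __init__(self) -> None:
        self._fifo: Deque[QueueEntry] = deque()
        # Entries stay in this index after being dequeued so their tag survives.
        self._index: Dict[str, QueueEntry] = {}

    def submit(self, content: bytes) -> str:
        # Assumption: "the same file" means byte-identical content.
        key = hashlib.sha256(content).hexdigest()
        entry = self._index.get(key)
        if entry is None:
            entry = QueueEntry(key, content)
            self._index[key] = entry
            self._fifo.append(entry)
            return "enqueued"
        return "skipped: already parsed" if entry.parsed else "skipped: already queued"

    def next_entry(self) -> Optional[QueueEntry]:
        return self._fifo.popleft() if self._fifo else None

    @staticmethod
    def mark_parsed(entry: QueueEntry) -> None:
        entry.parsed = True  # set once the validity check has finished


q = FirstPendingQueue()
print(q.submit(b"site_id,city\nS0001,Beijing\n"))  # enqueued
entry = q.next_entry()
q.mark_parsed(entry)
print(q.submit(b"site_id,city\nS0001,Beijing\n"))  # skipped: already parsed
```

The in-memory index grows with every distinct file. A production system would persist the tags and decide when they can expire.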
In this way, checking the validity of the read data, storing the invalid data of the target file in the exception database, and adding the valid data of the target file to the second pending queue make it easy for users to look up invalid data and learn why they are invalid, while also maintaining high data write efficiency.
Building on the embodiment shown in FIG. 1, this embodiment refines how the data in the target file are parsed according to the metadata mapping rule and the preconfigured parsing rules, which further improves file parsing speed and processing efficiency. In addition, this embodiment adds several optional implementations to the embodiment shown in FIG. 1. These optional implementations can be combined with one another or implemented separately. Each of them improves file parsing speed and processing efficiency and prevents the data processing system from becoming congested, with a resulting sharp drop in processing efficiency, when files are uploaded in batches and data arrive with high concurrency.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a data processing system according to an embodiment of the present invention. As shown in FIG. 3, the data processing system 300 includes:
a receiving module 301, configured to receive a target file;
a first processing module 302, configured to add the target file to a first pending queue, where the first pending queue is a first-in, first-out (FIFO) queue;
a determining module 303, configured to determine, according to a preconfigured metadata mapping rule, the target data table corresponding to the target file when the file obtained from the first pending queue is the target file;
a parsing module 304, configured to parse the data in the target file according to the metadata mapping rule and preconfigured parsing rules; and
a second processing module 305, configured to write the parsed data into the target data table.
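The sketch below shows how modules 301 to 305 could fit together in a single process. It assumes the metadata mapping rule selects the target table by file-name prefix and the parsing rule is a simple delimiter split. Both encodings, along with the file names and table names, are hypothetical. Validation, sharding, and a real database are left out to keep the data flow visible.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical metadata mapping rule: file-name prefix -> (target table, column names).
MAPPING_RULES: Dict[str, Tuple[str, List[str]]] = {
    "site_": ("site_info", ["site_id", "city"]),
    "power_": ("power_usage", ["site_id", "kwh"]),
}


class DataProcessingSystem:
    def __init__(self, mapping_rules: Dict[str, Tuple[str, List[str]]], delimiter: str = ",") -> None:
        self.mapping_rules = mapping_rules
        self.delimiter = delimiter  # hypothetical parsing rule
        self.first_queue: Deque[Tuple[str, str]] = deque()  # first pending queue (FIFO)
        self.tables: Dict[str, List[Dict[str, str]]] = {}  # in-memory stand-in for the database

    def receive(self, file_name: str, text: str) -> None:
        """Receiving module 301 and first processing module 302."""
        self.first_queue.append((file_name, text))

    def determine_table(self, file_name: str) -> Tuple[str, List[str]]:
        """Determining module 303."""
        for prefix, (table, columns) in self.mapping_rules.items():
            if file_name.startswith(prefix):
                return table, columns
        raise LookupError(f"no metadata mapping rule matches {file_name!r}")

    def parse(self, text: str, columns: List[str]) -> List[Dict[str, str]]:
        """Parsing module 304: one record per non-blank line."""
        return [
            dict(zip(columns, line.split(self.delimiter)))
            for line in text.splitlines()
            if line.strip()
        ]

    def write(self, table: str, records: List[Dict[str, str]]) -> None:
        """Second processing module 305."""
        self.tables.setdefault(table, []).extend(records)

    def process_next(self) -> bool:
        """Handle the file at the head of the first pending queue, if any."""
        if not self.first_queue:
            return False
        file_name, text = self.first_queue.popleft()
        table, columns = self.determine_table(file_name)
        self.write(table, self.parse(text, columns))
        return True


system = DataProcessingSystem(MAPPING_RULES)
system.receive("site_0621.csv", "S0001,Beijing\nS0002,Tianjin\n")
system.receive("power_0621.csv", "S0001,120\n")
while system.process_next():
    pass
print(system.tables["site_info"])    # [{'site_id': 'S0001', 'city': 'Beijing'}, {'site_id': 'S0002', 'city': 'Tianjin'}]
print(system.tables["power_usage"])  # [{'site_id': 'S0001', 'kwh': '120'}]
```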
Optionally, as shown in FIG. 4, the parsing module 304 includes:
an obtaining unit 3041, configured to obtain a parsing task for the target file from the first pending queue through a target node, where the target node is any idle node of a processing cluster in the data processing system; and
a parsing unit 3042, configured to parse the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules.
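The following sketches idle nodes pulling parsing tasks. Threads stand in for the cluster nodes, and Python's thread-safe `queue.Queue` stands in for the first pending queue; both are assumptions, and a real cluster would use a distributed queue. Every node takes the next task from the head of the same FIFO queue, so tasks are dispatched in upload order even though the parsing itself may finish in a different order.

```python
import queue
import threading
from typing import List, Tuple


def run_cluster(file_names: List[str], node_count: int) -> List[Tuple[str, str]]:
    """Let node_count worker threads pull parsing tasks from one shared FIFO queue."""
    tasks: "queue.Queue[str]" = queue.Queue()  # queue.Queue is FIFO
    for name in file_names:
        tasks.put(name)
    dispatch_log: List[Tuple[str, str]] = []
    log_lock = threading.Lock()

    def node(node_id: str) -> None:
        while True:
            try:
                # Dequeue and log under one lock so the log reflects the dispatch order.
                with log_lock:
                    name = tasks.get_nowait()
                    dispatch_log.append((name, node_id))
            except queue.Empty:
                return  # no work left; this node becomes idle again
            # ... parse `name` here (omitted) ...

    threads = [threading.Thread(target=node, args=(f"node-{i}",)) for i in range(node_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dispatch_log


log = run_cluster([f"file_{i}.csv" for i in range(6)], node_count=3)
print([name for name, _ in log])  # ['file_0.csv', 'file_1.csv', 'file_2.csv', 'file_3.csv', 'file_4.csv', 'file_5.csv']
```

Which node receives which file is nondeterministic; only the dispatch order is fixed.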
Optionally, as shown in FIG. 5, the parsing unit 3042 includes:
a verification subunit 30421, configured to read the data in the target file through the target node according to the metadata mapping rule and the preconfigured parsing rules, and to perform a validity check on the read data; and
a processing subunit 30422, configured to store the data in the target file that fail the validity check in an exception database, and to add the data in the target file that pass the validity check to a second pending queue, where the second pending queue is a FIFO queue.
The second processing module 305 is configured to write the data in the target file into the target data table when the data obtained from the second pending queue are data in the target file.
Optionally, the receiving module 301 is configured to receive a target file uploaded by a client in fragments.
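The text does not describe the fragment protocol, so what follows is only a sketch under stated assumptions. Each fragment is assumed to carry an upload identifier, its index, and the total fragment count. The server is assumed to hand the file to the first pending queue only after all fragments have been reassembled. `split_into_chunks` and `ChunkAssembler` are hypothetical names.

```python
from typing import Dict, List, Optional


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Client side: cut the file into fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkAssembler:
    """Server side: collect chunks per upload and rebuild the file once all have arrived."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[int, bytes]] = {}

    def receive_chunk(self, upload_id: str, index: int, total: int, chunk: bytes) -> Optional[bytes]:
        if not 0 <= index < total:
            raise ValueError(f"chunk index {index} out of range for {total} chunks")
        parts = self._parts.setdefault(upload_id, {})
        parts[index] = chunk  # a retransmitted chunk simply overwrites itself
        if len(parts) < total:
            return None  # still waiting for other chunks
        del self._parts[upload_id]
        return b"".join(parts[i] for i in range(total))


data = b"site_id,city\nS0001,Beijing\nS0002,Tianjin\n"
chunks = split_into_chunks(data, 16)
assembler = ChunkAssembler()
result = None
# Chunks may arrive out of order; deliver them in reverse to show that.
for index in reversed(range(len(chunks))):
    result = assembler.receive_chunk("upload-1", index, len(chunks), chunks[index])
print(result == data)  # True
```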
Optionally, as shown in FIG. 6, the second processing module 305 includes:
an adding unit 3051, configured to add the parsed data to a group corresponding to the target data table, where each group corresponds to one thread;
a sharding unit 3052, configured to shard, by a preset size, the data belonging to the target file in the group corresponding to the target data table to obtain N data shards, where N is an integer greater than or equal to 2;
a generation unit 3053, configured to generate a batch Structured Query Language (SQL) statement for each of the N data shards, obtaining N batch SQL statements; and
an execution unit 3054, configured to execute the N batch SQL statements one by one to write the data in the N data shards into the target data table in sequence.
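Below is a sketch of the FIG. 6 units. It assumes one Python thread per target-table group and a caller-supplied `write_batch` callback that stands in for building and executing the batch SQL statement; all names are hypothetical. Each group's thread writes that group's shards strictly in order, while different tables are written in parallel.

```python
import threading
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Sequence, Tuple

Row = Tuple[object, ...]
BatchWriter = Callable[[str, Sequence[Row]], None]


def write_grouped(parsed: Sequence[Tuple[str, Row]], shard_size: int, write_batch: BatchWriter) -> None:
    """Group rows by target table, then let one thread per group write its shards in order."""
    groups: DefaultDict[str, List[Row]] = defaultdict(list)
    for table, row in parsed:
        groups[table].append(row)  # adding unit: one group per target table

    def group_worker(table: str, rows: List[Row]) -> None:
        # Sharding, generation, and execution units, run sequentially inside the group.
        for start in range(0, len(rows), shard_size):
            write_batch(table, rows[start:start + shard_size])

    threads = [threading.Thread(target=group_worker, args=(t, r)) for t, r in groups.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Stand-in batch writer that records which shard sizes reached each table.
written: Dict[str, List[int]] = defaultdict(list)
lock = threading.Lock()


def record_batch(table: str, shard: Sequence[Row]) -> None:
    with lock:
        written[table].append(len(shard))


parsed = [("site_info", (i, f"city{i}")) for i in range(5)] + [("power_usage", (i, i * 10)) for i in range(2)]
write_grouped(parsed, shard_size=2, write_batch=record_batch)
print(sorted(written.items()))  # [('power_usage', [2]), ('site_info', [2, 2, 1])]
```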
Optionally, the preconfigured metadata mapping rule is a user-set metadata mapping rule received by the system, and/or the preconfigured parsing rules are user-set parsing rules received by the system.
The data processing system 300 can implement each process implemented by the data processing system in the method embodiments of FIG. 1 and FIG. 2; to avoid repetition, details are not repeated here. The data processing system 300 of this embodiment adds received target files to a FIFO queue, where they wait to be parsed. This ensures that received files are parsed and processed in first-in, first-out order, which improves file parsing speed and processing efficiency. It also prevents congestion, and the resulting sharp drop in processing efficiency, when files are uploaded in batches and data arrive with high concurrency.
An embodiment of the present invention further provides a data processing system, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
From the description of the above embodiments, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings. However, the present invention is not limited to these specific embodiments, which are merely illustrative rather than restrictive. Inspired by the present invention, a person of ordinary skill in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, and all such forms fall within the protection of the present invention.
Claims (14)
1. A data processing method, applied to a data processing system, characterized by comprising:
receiving a target file;
adding the target file to a first pending queue, wherein the first pending queue is a first-in, first-out (FIFO) queue;
when the file obtained from the first pending queue is the target file, determining, according to a preconfigured metadata mapping rule, a target data table corresponding to the target file;
parsing the data in the target file according to the metadata mapping rule and preconfigured parsing rules; and
writing the parsed data into the target data table.
2. The method according to claim 1, wherein parsing the data in the target file according to the metadata mapping rule and the preconfigured parsing rules comprises:
obtaining, by a target node, a parsing task for the target file from the first pending queue, wherein the target node is any idle node of a processing cluster in the data processing system; and
parsing, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules.
3. The method according to claim 2, wherein parsing, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules comprises:
reading, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules, and performing a validity check on the read data; and
storing the data in the target file that fail the validity check in an exception database, and adding the data in the target file that pass the validity check to a second pending queue, wherein the second pending queue is a FIFO queue;
and wherein writing the parsed data into the target data table comprises:
when the data obtained from the second pending queue are data in the target file, writing the data in the target file into the target data table.
4. The method according to any one of claims 1 to 3, wherein receiving the target file comprises:
receiving the target file uploaded by a client in fragments.
5. The method according to claim 1 or 2, wherein writing the parsed data into the target data table comprises:
adding the parsed data to a group corresponding to the target data table, wherein each group corresponds to one thread;
sharding, by a preset size, the data belonging to the target file in the group corresponding to the target data table to obtain N data shards, wherein N is an integer greater than or equal to 2;
generating a batch Structured Query Language (SQL) statement for each of the N data shards to obtain N batch SQL statements; and
executing the N batch SQL statements one by one to write the data in the N data shards into the target data table in sequence.
6. The method according to any one of claims 1 to 3, wherein the preconfigured metadata mapping rule is a received user-set metadata mapping rule, and/or the preconfigured parsing rules are received user-set parsing rules.
7. A data processing system, characterized by comprising:
a receiving module, configured to receive a target file;
a first processing module, configured to add the target file to a first pending queue, wherein the first pending queue is a FIFO queue;
a determining module, configured to determine, according to a preconfigured metadata mapping rule, a target data table corresponding to the target file when the file obtained from the first pending queue is the target file;
a parsing module, configured to parse the data in the target file according to the metadata mapping rule and preconfigured parsing rules; and
a second processing module, configured to write the parsed data into the target data table.
8. The data processing system according to claim 7, wherein the parsing module comprises:
an obtaining unit, configured to obtain, by a target node, a parsing task for the target file from the first pending queue, wherein the target node is any idle node of a processing cluster in the data processing system; and
a parsing unit, configured to parse, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules.
9. The data processing system according to claim 8, wherein the parsing unit comprises:
a verification subunit, configured to read, by the target node, the data in the target file according to the metadata mapping rule and the preconfigured parsing rules, and to perform a validity check on the read data; and
a processing subunit, configured to store the data in the target file that fail the validity check in an exception database, and to add the data in the target file that pass the validity check to a second pending queue, wherein the second pending queue is a FIFO queue;
and wherein the second processing module is configured to write the data in the target file into the target data table when the data obtained from the second pending queue are data in the target file.
10. The data processing system according to any one of claims 7 to 9, wherein the receiving module is configured to receive the target file uploaded by a client in fragments.
11. The data processing system according to claim 7 or 8, wherein the second processing module comprises:
an adding unit, configured to add the parsed data to a group corresponding to the target data table, wherein each group corresponds to one thread;
a sharding unit, configured to shard, by a preset size, the data belonging to the target file in the group corresponding to the target data table to obtain N data shards, wherein N is an integer greater than or equal to 2;
a generation unit, configured to generate a batch Structured Query Language (SQL) statement for each of the N data shards to obtain N batch SQL statements; and
an execution unit, configured to execute the N batch SQL statements one by one to write the data in the N data shards into the target data table in sequence.
12. The data processing system according to any one of claims 7 to 9, wherein the preconfigured metadata mapping rule is a received user-set metadata mapping rule, and/or the preconfigured parsing rules are received user-set parsing rules.
13. A data processing system, characterized by comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data processing method according to any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810643204.5A CN108984177A (en) | 2018-06-21 | 2018-06-21 | Data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108984177A (en) | 2018-12-11 |
Family ID: 64541657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810643204.5A Pending CN108984177A (en) | 2018-06-21 | 2018-06-21 | Data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108984177A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005011029A (en) * | 2003-06-18 | 2005-01-13 | Matsushita Electric Ind Co Ltd | Memory access control device |
CN101866364A (en) * | 2010-06-22 | 2010-10-20 | 用友软件股份有限公司 | Data import method and device |
CN104268294A (en) * | 2014-10-24 | 2015-01-07 | 中国建设银行股份有限公司 | Method and device for importing files into database |
CN106919618A (en) * | 2015-12-28 | 2017-07-04 | 航天信息股份有限公司 | Excel data import method and system |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110502562A (en) * | 2019-08-16 | 2019-11-26 | 深圳证券交易所 | Data import method and device, and readable storage medium |
CN110825920A (en) * | 2019-10-22 | 2020-02-21 | 厦门市美亚柏科信息股份有限公司 | Data processing method and device |
CN110825920B (en) * | 2019-10-22 | 2022-06-10 | 厦门市美亚柏科信息股份有限公司 | Data processing method and device |
CN111209736A (en) * | 2020-01-03 | 2020-05-29 | 恩亿科(北京)数据科技有限公司 | Text file analysis method and device, computer equipment and storage medium |
CN113254437A (en) * | 2020-02-11 | 2021-08-13 | 北京京东振世信息技术有限公司 | Batch processing job processing method and device |
CN113254437B (en) * | 2020-02-11 | 2023-09-01 | 北京京东振世信息技术有限公司 | Batch processing job processing method and device |
CN111506569A (en) * | 2020-03-02 | 2020-08-07 | 平安科技(深圳)有限公司 | Data storage method and device and electronic device |
CN111506569B (en) * | 2020-03-02 | 2024-03-01 | 平安科技(深圳)有限公司 | Data storage method and device and electronic device |
CN111339103B (en) * | 2020-03-13 | 2023-06-20 | 河南安冉云网络科技有限公司 | Data exchange method and system based on full-quantity fragmentation and incremental log analysis |
CN111339103A (en) * | 2020-03-13 | 2020-06-26 | 河南安冉云网络科技有限公司 | Data exchange method and system based on full fragmentation and incremental log analysis |
CN111651514A (en) * | 2020-07-09 | 2020-09-11 | 中国银行股份有限公司 | Data import method and device |
CN112000646B (en) * | 2020-08-25 | 2022-08-02 | 北京浪潮数据技术有限公司 | Database initialization method and device, electronic equipment and storage medium |
CN112000646A (en) * | 2020-08-25 | 2020-11-27 | 北京浪潮数据技术有限公司 | Database initialization method and device, electronic equipment and storage medium |
CN113448585A (en) * | 2020-12-11 | 2021-09-28 | 北京新氧科技有限公司 | Optimization method and device for thread pool, electronic equipment and storage medium |
CN113448585B (en) * | 2020-12-11 | 2024-01-16 | 北京新氧科技有限公司 | Compiling method and device of thread pool, electronic equipment and storage medium |
CN112860631A (en) * | 2021-04-25 | 2021-05-28 | 成都淞幸科技有限责任公司 | Efficient metadata batch configuration method |
CN113824651A (en) * | 2021-11-25 | 2021-12-21 | 上海金仕达软件科技有限公司 | Market data caching method and device, storage medium and electronic equipment |
CN113824651B (en) * | 2021-11-25 | 2022-02-22 | 上海金仕达软件科技有限公司 | Market data caching method and device, storage medium and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: Room 101, floors 1-3, building 14, North District, yard 9, Dongran North Street, Haidian District, Beijing 100029; Applicant after: CHINA TOWER Co.,Ltd. Address before: 100142 19th floor, 73 Fucheng Road, Haidian District, Beijing; Applicant before: CHINA TOWER Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |