Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to illustrate the present invention and are not intended to limit it.
As shown in Fig. 1, in one embodiment, the internal structure of the server 102 includes a processor, a non-volatile storage medium, an internal memory, and a network interface, all connected through a system bus. The non-volatile storage medium stores an operating system, a database, and a multi-format document parsing apparatus. The database is used to store data. The multi-format document parsing apparatus is used to implement a multi-format document parsing method. The processor of the server provides computing and control capabilities and supports the operation of the entire server. The network interface of the server communicates with external servers and terminals over a network connection, for example to receive interaction files sent by product sides. Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
As shown in Fig. 2, in one embodiment, a multi-format document parsing method is provided. The method includes:
Step 202, obtain the storage address of a file to be parsed.
In this embodiment, a correspondence between product sides and storage addresses is pre-established in the server. After the server receives an interaction file sent by a product side, it stores the file at the storage address corresponding to that product side. The server then periodically obtains, according to a preset program, the storage address of the files to be parsed, and thereby retrieves the corresponding files. For example, the server stores all interaction files received from product side A at address 1, all interaction files received from product side B at address 2, all interaction files received from product side C at address 3, and so on. The interaction files of different product sides are thus stored at different locations, that is, they correspond to different storage addresses. The server periodically obtains the locations where files to be parsed are stored and then retrieves the corresponding files.
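The storage scheme of Step 202 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a storage address is a local directory and that the product-side-to-address table is an in-memory dict; the class and method names (`FileRouter`, `store`, `poll`) are hypothetical, and `poll` merely stands in for the server's periodic scan.

```python
import tempfile
from pathlib import Path


class FileRouter:
    """Stores incoming interaction files at the storage address of their product side.

    Assumption: a storage address is modelled as a local directory.
    """

    def __init__(self, address_by_side: dict):
        # Pre-established product side -> storage address table.
        self.address_by_side = address_by_side

    def store(self, product_side: str, filename: str, content: bytes) -> Path:
        """Saves a received file under the address registered for its product side."""
        address = self.address_by_side[product_side]
        address.mkdir(parents=True, exist_ok=True)
        target = address / filename
        target.write_bytes(content)
        return target

    def poll(self) -> list:
        """Run periodically: lists (storage address, file) pairs awaiting parsing."""
        pending = []
        for address in self.address_by_side.values():
            if address.is_dir():
                pending.extend((address, f) for f in sorted(address.iterdir()) if f.is_file())
        return pending


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    router = FileRouter({"A": root / "address1", "B": root / "address2", "C": root / "address3"})
    router.store("A", "a_20240101.csv", b"id,amount\n1,10\n")
    router.store("B", "b_20240101.xml", b"<rows/>")
    for address, path in router.poll():
        print(address.name, path.name)
```

This sketch never removes or marks files after they are listed; a real deployment would presumably need some such bookkeeping so that a file is not parsed twice.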
Step 204, determine, according to the storage address, the configuration identifier corresponding to the file to be parsed.
In this embodiment, after obtaining the storage address of the file to be parsed, the server not only retrieves the corresponding file but also determines, according to the storage address, the configuration identifier corresponding to that file. The configuration identifier uniquely identifies the configuration information of a product side. It may be a configuration number assigned by the system to uniquely identify the configuration information, the name of the file in which the configuration information is stored, or any other identifier that uniquely identifies the configuration information. Specifically, the relationship between file storage addresses and configuration identifiers is stored in the server in advance, so the corresponding configuration identifier can be found from the obtained storage address. In another embodiment, the server stores in advance the relationship between storage addresses and product identifiers, and additionally stores the correspondence between product identifiers and configuration identifiers. If all businesses of a product side use the same file format, the product identifier uniquely identifies that product side; if different businesses of a product side use different file formats, the product identifier uniquely identifies one business of that product side. Specifically, the product identifier corresponding to the storage address of the file to be parsed is looked up first, and the configuration identifier corresponding to that product identifier is then obtained.
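Both lookup variants of Step 204 can be sketched together, assuming plain in-memory dicts in place of the server's stored tables. `ConfigIdResolver`, the example addresses, and the `CFG-001`-style identifiers are hypothetical.

```python
class ConfigIdResolver:
    """Resolves the configuration identifier for a storage address.

    Covers both variants described above: a direct address -> config ID
    table, and a two-level address -> product ID -> config ID lookup.
    """

    def __init__(self, address_to_config=None, address_to_product=None, product_to_config=None):
        self.address_to_config = address_to_config or {}
        self.address_to_product = address_to_product or {}
        self.product_to_config = product_to_config or {}

    def resolve(self, address: str) -> str:
        # Variant 1: a configuration identifier stored directly for the address.
        if address in self.address_to_config:
            return self.address_to_config[address]
        # Variant 2: find the product identifier, then its configuration identifier.
        product_id = self.address_to_product.get(address)
        if product_id is None or product_id not in self.product_to_config:
            raise KeyError(f"no configuration identifier for address {address!r}")
        return self.product_to_config[product_id]


if __name__ == "__main__":
    direct = ConfigIdResolver(address_to_config={"/data/address1": "CFG-001"})
    print(direct.resolve("/data/address1"))
    two_level = ConfigIdResolver(
        address_to_product={"/data/address2": "b-01"},
        product_to_config={"b-01": "CFG-002"},
    )
    print(two_level.resolve("/data/address2"))
```

The two-level form costs one extra lookup, but it lets several storage addresses share a product identifier without duplicating configuration identifiers.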
Step 206, load, according to the configuration identifier, the configuration information corresponding to the configuration identifier.
In this embodiment, the correspondence between configuration identifiers and configuration information is stored in the server in advance, and the configuration information corresponding to a configuration identifier is loaded according to that identifier. The configuration information records the format definition of the file, including the file's format type, encoding, line separator, quotation mark, quotation-mark escape character, and so on. Table 1 lists, for one embodiment, the 18 attributes included in the configuration information together with their meanings, possible values, and examples. After determining the configuration identifier corresponding to the file to be parsed according to the storage address, the server loads the configuration information corresponding to that identifier for use in the subsequent parsing.
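The configuration information of Step 206 might be modelled as below. Table 1 is not reproduced here, so apart from `fileFormat` the attribute names (`encoding`, `delimiter`, `quoteChar`, `escapeChar`) are assumptions; keeping all configurations in one JSON document keyed by configuration identifier is likewise only one possible choice.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatConfig:
    """Part of a file's format definition.

    Only fileFormat is named in the text; the other fields are
    illustrative stand-ins for the remaining Table 1 attributes.
    """
    file_format: str                   # fileFormat, e.g. "CSV" or "XML"
    encoding: str = "utf-8"            # file encoding
    delimiter: str = ","               # field separator (hypothetical)
    quote_char: str = '"'              # quotation mark
    escape_char: Optional[str] = None  # quote escape character


class ConfigStore:
    """Configuration identifier -> configuration information, read from JSON."""

    def __init__(self, raw_json: str):
        self._configs = {
            config_id: FormatConfig(
                file_format=attrs["fileFormat"],
                encoding=attrs.get("encoding", "utf-8"),
                delimiter=attrs.get("delimiter", ","),
                quote_char=attrs.get("quoteChar", '"'),
                escape_char=attrs.get("escapeChar"),
            )
            for config_id, attrs in json.loads(raw_json).items()
        }

    def load(self, config_id: str) -> FormatConfig:
        """Loads the configuration information for a configuration identifier."""
        return self._configs[config_id]


if __name__ == "__main__":
    store = ConfigStore('{"CFG-001": {"fileFormat": "CSV", "encoding": "gbk", "delimiter": "|"}}')
    print(store.load("CFG-001"))
```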
Table 1
Step 208, obtain, according to the file format type in the configuration information, the parsing class corresponding to the file format type.
In this embodiment, after finding the configuration information corresponding to the file to be parsed, the server extracts the file format type from it (fileFormat in Table 1) and then, through a unified file-parsing abstract interface, obtains the parsing class corresponding to that file format type. Specifically, parsing classes corresponding to the different format types are predefined in the parsing code to handle files of different file format types, for example CSV files, CSV f format files, and XML files. Different format types correspond to different parsing classes.
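Step 208's unified abstract interface and the format-to-class lookup can be sketched as a registry. This is an illustrative Python analogue, not the patent's code: `FileParser`, `register`, and `parser_for` are hypothetical names, and the XML parser assumes that each child element of the root is one record with its fields stored as attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class FileParser(ABC):
    """Unified parsing interface; one subclass per file format type."""

    @abstractmethod
    def parse(self, data: bytes, config: dict) -> list:
        """Parses raw file bytes into a list of records."""


_PARSERS = {}  # fileFormat (upper case) -> parsing class


def register(file_format: str):
    """Class decorator binding a parsing class to a fileFormat value."""
    def decorator(cls):
        _PARSERS[file_format.upper()] = cls
        return cls
    return decorator


def parser_for(config: dict) -> FileParser:
    """Returns an instance of the parsing class for config["fileFormat"]."""
    file_format = config["fileFormat"].upper()
    if file_format not in _PARSERS:
        raise ValueError(f"no parsing class registered for {file_format!r}")
    return _PARSERS[file_format]()


@register("CSV")
class CsvParser(FileParser):
    def parse(self, data, config):
        text = data.decode(config.get("encoding", "utf-8"))
        return list(csv.DictReader(io.StringIO(text), delimiter=config.get("delimiter", ",")))


@register("XML")
class XmlParser(FileParser):
    def parse(self, data, config):
        # Assumption: every child of the root element is one record whose
        # fields are stored as attributes.
        root = ET.fromstring(data)
        return [dict(child.attrib) for child in root]
```

With this layout, supporting another format means adding one registered subclass; existing parsing classes and configurations stay untouched.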
Step 210, parse the file to be parsed according to the parsing class and the configuration information.
In this embodiment, after obtaining the configuration information and the parsing class corresponding to the file to be parsed, the server parses the file according to the obtained parsing class and the specific format definitions in the configuration information (such as the 18 format definitions in Table 1), and stores the parsed file content uniformly in a list of Value Objects. The content can then be persisted to a database or passed on for further business-logic processing. In this embodiment, files of the same file format type are parsed with the same parsing class. The unified file parsing module of the server stores in advance multiple parsing implementation classes corresponding to the different file format types, and the appropriate implementation class is obtained through the abstract file-parsing interface; Fig. 3 is a schematic diagram of the unified file parsing module. Examples include a preset parsing class for the CSV format, a parsing class for the CSV f format, a parsing class for the XML format, and parsing classes for other formats. It should be noted that the file format type is only one of the attributes in the configuration information, so even when different product sides use the same file format type, their configuration information may differ. For example, suppose product sides A and B both use the CSV format: their specific configuration, such as the file encoding and the line separator, may still differ, so the configuration information for each product side and each business must be agreed upon in advance. In this embodiment, as shown in Fig. 4B, the file format configuration information is separated from the predefined parsing classes that perform the parsing. As a result, only one set of code needs to be developed and compiled on the server, with parsing classes for the different formats predefined in it. Specifically, after an interaction file from a product side is obtained, the format configuration information corresponding to that product side is loaded, the predefined parsing class is obtained according to the format type in the loaded configuration information, and the interaction file is then parsed with that class. By contrast, the traditional document parsing method requires a separate set of parsing code to be developed for the format of each product side; Fig. 4A is a schematic diagram of traditional document parsing.
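To show how a single parsing class can serve product sides that share a format but differ in configuration (Step 210), here is a sketch of a CSV parser that takes every format detail from the configuration and returns a list of value objects. The configuration keys and the two-field `ValueObject` are assumptions made for the example. Python's `csv.reader` recognizes `\r\n` and `\n` line endings by itself and ignores a configured line terminator, so no line separator is passed to it here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ValueObject:
    """One parsed record (illustrative fields)."""
    name: str
    amount: str


class CsvParser:
    """One parsing class shared by every product side that uses CSV.

    All format details come from the configuration dict; the keys used
    here (encoding, delimiter, quoteChar, escapeChar, hasHeader) are
    hypothetical stand-ins for the attributes of Table 1.
    """

    def parse(self, data: bytes, config: dict) -> list:
        text = data.decode(config.get("encoding", "utf-8"))
        reader = csv.reader(
            io.StringIO(text),
            delimiter=config.get("delimiter", ","),
            quotechar=config.get("quoteChar", '"'),
            escapechar=config.get("escapeChar"),
        )
        rows = list(reader)
        if config.get("hasHeader", False):
            rows = rows[1:]  # skip the header line
        # Skip blank lines and map each remaining row onto a value object.
        return [ValueObject(name=r[0], amount=r[1]) for r in rows if r]


if __name__ == "__main__":
    parser = CsvParser()
    side_a = {"encoding": "utf-8", "delimiter": ",", "hasHeader": True}
    side_b = {"encoding": "gbk", "delimiter": "|"}
    print(parser.parse(b"name,amount\nAlice,100.00\n", side_a))
    print(parser.parse("张三|200.00\n".encode("gbk"), side_b))
```

Product side B's GBK-encoded, pipe-delimited file goes through the same `CsvParser` as product side A's UTF-8 file; only the configuration dict differs.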
In this embodiment, the storage address of the file to be parsed is obtained; the configuration identifier corresponding to the file is determined according to the storage address; the configuration information corresponding to the configuration identifier is loaded; the parsing class corresponding to the file format type in the configuration information is obtained; and the file is parsed according to the parsing class and the configuration information. By separating the file's format configuration information from the parsing classes that implement the parsing, the method lets all document parsing share one set of code, with different parsing classes handling files of different formats. Subsequent changes to a file format only require modifying the corresponding configuration file, without modifying and recompiling the parsing code, which not only reduces development workload but also effectively lowers the cost of later maintenance and upgrades.
As shown in Fig. 5, in one embodiment, the step of determining, according to the storage address, the configuration identifier corresponding to the file to be parsed includes:
Step 204A, look up, according to the storage address, the product identifier corresponding to the storage address.
In this embodiment, the product identifier uniquely identifies a product side and its corresponding business. If all businesses of a product side use the same file format, the product identifier uniquely identifies that product side; if different businesses of a product side use different file formats, the product identifier uniquely identifies one business of that product side. After obtaining the storage address of the file to be parsed, the server first looks up the product identifier corresponding to that storage address. For example, if all businesses of product side A use the same file format, the product identifier corresponding to the storage address of all of product side A's interaction files is unique; say product side A's product identifier is a. If different businesses of product side B correspond to different file formats, the product identifier for business No. 1 of product side B is b-01, the product identifier for business No. 2 is b-02, and so on.
Step 204B, determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
In this embodiment, the correspondence between product identifiers and configuration identifiers is stored in the server in advance. After the product identifier corresponding to the storage address has been found, the configuration identifier corresponding to the file to be parsed is determined from this correspondence; the configuration identifier uniquely points to one piece of configuration information. Specifically, if a product side corresponds to a single product identifier, all interaction files of that product side correspond to the same configuration identifier. If different businesses of a product side correspond to different product identifiers, or one business corresponds to one product identifier and the other businesses correspond to another, the interaction files of those businesses correspond to different configuration identifiers accordingly.
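The product identifier scheme described above (a single identifier a for product side A, per-business identifiers b-01 and b-02 for product side B) can be illustrated as follows. Building identifiers from a business number, the set `PER_BUSINESS_SIDES`, and the `CFG-…`-style table are all hypothetical; the text itself only requires that the server look the identifier up from the storage address.

```python
from typing import Optional

# Hypothetical: product sides whose businesses use different file formats.
PER_BUSINESS_SIDES = {"b"}

# Hypothetical product identifier -> configuration identifier table.
PRODUCT_TO_CONFIG = {"a": "CFG-A", "b-01": "CFG-B01", "b-02": "CFG-B02"}


def product_id(side: str, business_no: Optional[int] = None) -> str:
    """Builds a product identifier following the a / b-01 / b-02 scheme."""
    if side in PER_BUSINESS_SIDES:
        if business_no is None:
            raise ValueError(f"product side {side!r} needs a business number")
        return f"{side}-{business_no:02d}"
    # One identifier for the whole product side; the business number is ignored.
    return side


def config_id_for(side: str, business_no: Optional[int] = None) -> str:
    """Maps a product side (and business) to its configuration identifier."""
    return PRODUCT_TO_CONFIG[product_id(side, business_no)]


if __name__ == "__main__":
    print(config_id_for("a", 1), config_id_for("a", 2))  # same config for all of A
    print(config_id_for("b", 1), config_id_for("b", 2))  # one config per business of B
```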
As shown in Fig. 6, in one embodiment, the above multi-format document parsing method further includes:
Step 212, receive a file format change request sent by a product side, and extract the product identifier and the corresponding new format configuration information from the request.
In this embodiment, when a product side needs to change the format of the interaction files it exchanges with the financial store, it can send a file format change request to the server. After receiving the file format change request sent by the product side, the server extracts the product identifier and the new format configuration information from the request. If all of the product side's file formats are uniformly changed to a new format, the product identifier is the identifier of that product side; if only the format of one business is changed, the product identifier is the identifier of that business of the product side. The new format configuration information is the configuration information to be used after the change.
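Extracting the two pieces of information in Step 212 can be sketched as follows, assuming, purely for illustration, that the change request arrives as a JSON body with hypothetical keys `productId` and `formatConfig`; the actual request format is not specified in the text.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatChangeRequest:
    product_id: str   # a product side (e.g. "a") or one of its businesses (e.g. "b-01")
    new_config: dict  # configuration information to use after the change


def parse_change_request(body: str) -> FormatChangeRequest:
    """Extracts the product identifier and the new format configuration.

    Assumes a JSON body with hypothetical keys "productId" and "formatConfig".
    """
    payload = json.loads(body)
    product_id = payload.get("productId")
    new_config = payload.get("formatConfig")
    if not product_id or not isinstance(new_config, dict):
        raise ValueError("request must carry productId and a formatConfig object")
    if "fileFormat" not in new_config:
        raise ValueError("formatConfig must specify fileFormat")
    return FormatChangeRequest(product_id, new_config)


if __name__ == "__main__":
    req = parse_change_request('{"productId": "b-01", "formatConfig": {"fileFormat": "CSV", "delimiter": "|"}}')
    print(req.product_id, req.new_config)
```

Rejecting a configuration without `fileFormat` up front is a design choice of this sketch: without it, Step 208 could not select a parsing class for the changed files.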
Step 214, obtain the configuration identifier corresponding to the product identifier.
In this embodiment, after extracting the product identifier from the change request, the server first obtains the configuration identifier corresponding to that product identifier; the configuration identifier uniquely points to one piece of configuration information.
Step 216, obtain the old format configuration information corresponding to the configuration identifier, and replace the old format configuration information with the new format configuration information.
In this embodiment, after obtaining the configuration identifier corresponding to the product identifier, the server obtains, according to that configuration identifier, the corresponding old format configuration information, that is, the configuration information originally associated with the configuration identifier. It then deletes the old format configuration information from the configuration file and replaces it with the new format configuration information, thereby directly establishing the correspondence between the configuration identifier and the new format configuration information. Therefore, when the format of a product side changes, the parsing code does not need to be modified and recompiled; only the configuration information in the configuration file needs to be changed, which greatly reduces the cost and workload of later maintenance.
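Steps 214 and 216 can be sketched as a rewrite of the configuration file only. The sketch assumes a JSON configuration file laid out as `{configId: {...}}` and an in-memory product-to-configuration table; `apply_format_change` is a hypothetical name. It writes to a temporary file and moves it into place with `os.replace`, so readers see either the old or the new file, never a partial one.

```python
import json
import os
import tempfile
from pathlib import Path


def apply_format_change(config_file: Path, product_to_config: dict,
                        product_id: str, new_config: dict) -> dict:
    """Swaps the configuration behind a product's configuration identifier.

    Returns the old configuration. Parsing code is not touched at all.
    """
    config_id = product_to_config[product_id]                  # Step 214
    configs = json.loads(config_file.read_text(encoding="utf-8"))
    old_config = configs[config_id]                            # Step 216: old config...
    configs[config_id] = new_config                            # ...replaced by the new one
    # Write to a temporary file in the same directory, then atomically
    # move it over the original configuration file.
    fd, tmp = tempfile.mkstemp(dir=config_file.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(configs, f, ensure_ascii=False, indent=2)
    os.replace(tmp, config_file)
    return old_config


if __name__ == "__main__":
    cf = Path(tempfile.mkdtemp()) / "formats.json"
    cf.write_text(json.dumps({"CFG-B01": {"fileFormat": "CSV", "delimiter": ","}}), encoding="utf-8")
    old = apply_format_change(cf, {"b-01": "CFG-B01"}, "b-01", {"fileFormat": "CSV", "delimiter": "|"})
    print("old:", old)
    print("new:", json.loads(cf.read_text(encoding="utf-8"))["CFG-B01"])
```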
In this embodiment, the file's format definition (i.e., the configuration information) is separated from the document parsing logic (i.e., the parsing classes), so that the two are independent and decoupled. Thus, during development only one set of code needs to be written, with parsing classes predefined for the different format types, which greatly reduces development workload; later changes to a file format likewise only require modifying the corresponding configuration file, effectively lowering the cost of later maintenance and upgrades.
As shown in Fig. 7, in one embodiment, a multi-format document parsing method is provided, including:
Step 702, obtain the storage address and file name of the file to be parsed.
In this embodiment, because the interaction files (i.e., files to be parsed) of different product sides may be stored in the same location, that is, the storage addresses of files to be parsed from different product sides may be identical, the specific file name must be obtained in addition to the storage address, and the corresponding file to be parsed is retrieved using both. Specifically, a correspondence between product sides on the one hand and storage addresses and file names on the other is pre-established in the server. After receiving an interaction file sent by a product side, the server finds the corresponding storage address and file name according to this correspondence and stores the file accordingly; when files to be parsed are subsequently obtained periodically, they are looked up by storage address and file name.
Step 704, determine, according to the storage address and file name, the configuration identifier corresponding to the file to be parsed.
In this embodiment, after obtaining the storage address and file name of the file to be parsed, the server determines the configuration identifier corresponding to the file according to both. The configuration identifier uniquely identifies the configuration information of a product side. It may be a configuration number assigned by the system to uniquely identify the configuration information, the name of the file in which the configuration information is stored, or any other identifier that uniquely identifies the configuration information. Specifically, the relationship between file storage addresses and file names on the one hand and configuration identifiers on the other is stored in the server in advance, so the corresponding configuration identifier can be found from the obtained storage address and file name.
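When several product sides share one storage address, the file name has to disambiguate them. The sketch below assumes, as an illustration only, that the stored relationship is a list of (address, file-name pattern, configuration identifier) rules matched with shell-style wildcards via `fnmatch.fnmatchcase`; the rule table and identifiers are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rules: three product sides or businesses share one address,
# so the file-name pattern decides which configuration applies.
RULES = [
    ("/data/shared", "A_*.csv", "CFG-A"),
    ("/data/shared", "B01_*.csv", "CFG-B01"),
    ("/data/shared", "B02_*.xml", "CFG-B02"),
]


def config_id_for(address: str, filename: str) -> Optional[str]:
    """Returns the config ID of the first rule matching address and file name."""
    for rule_address, pattern, config_id in RULES:
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if rule_address == address and fnmatchcase(filename, pattern):
            return config_id
    return None


if __name__ == "__main__":
    print(config_id_for("/data/shared", "A_20240101.csv"))
    print(config_id_for("/data/shared", "B02_20240101.xml"))
    print(config_id_for("/data/shared", "C_20240101.csv"))
```

Returning `None` for an unmatched file, rather than guessing, leaves the caller free to set the file aside for manual review.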
Step 706, load, according to the configuration identifier, the configuration information corresponding to the configuration identifier.
In this embodiment, the correspondence between configuration identifiers and configuration information is stored in the server in advance, and the configuration information corresponding to a configuration identifier is loaded according to that identifier. The configuration information records the format definition of the file, including the file's format type, encoding, line separator, quotation mark, quotation-mark escape character, and so on. Table 1 lists, for one embodiment, the 18 attributes included in the configuration information together with their meanings, possible values, and examples. After determining the configuration identifier corresponding to the file to be parsed according to the storage address and file name, the server loads the configuration information corresponding to that identifier for use in the subsequent parsing.
Step 708, obtain, according to the file format type in the configuration information, the parsing class corresponding to the file format type.
In this embodiment, after finding the configuration information corresponding to the file to be parsed, the server extracts the file format type from it (fileFormat in Table 1) and then, through a unified file-parsing abstract interface, obtains the parsing class corresponding to that file format type. Specifically, parsing classes corresponding to the different format types are predefined in the parsing code to handle files of different file format types, for example CSV files, CSV f format files, and XML files. Different format types correspond to different parsing classes.
Step 710, parse the file to be parsed according to the parsing class and the configuration information.
In this embodiment, after obtaining the configuration information and the parsing class corresponding to the file to be parsed, the server parses the file according to the obtained parsing class and the specific format definitions in the configuration information (such as the 18 format definitions in Table 1), and stores the parsed file content uniformly in a list of Value Objects. The content can then be persisted to a database or passed on for further business-logic processing. In this embodiment, files of the same file format type are parsed with the same parsing class, and the unified parsing module of the server stores in advance multiple parsing classes corresponding to the different file format types, for example a preset parsing class for the CSV format, a parsing class for the CSV f format, a parsing class for the XML format, and parsing classes for other formats. It should be noted that the file format type is only one of the attributes in the configuration information, so even when different product sides use the same file format type, their configuration information may differ. For example, suppose product sides A and B both use the CSV format: their specific configuration, such as the file encoding and the line separator, may still differ, so the configuration information for each product side and each business must be agreed upon in advance.
As shown in Fig. 8, in one embodiment, step 704 of determining, according to the storage address and file name, the configuration identifier corresponding to them includes:
Step 704A, look up, according to the storage address and file name, the product identifier corresponding to the storage address and file name.
In this embodiment, the product identifier uniquely identifies a product side and its corresponding business. If all businesses of a product side use the same file format, the product identifier uniquely identifies that product side; if different businesses of a product side use different file formats, the product identifier uniquely identifies one business of that product side. After obtaining the storage address and file name of the file to be parsed, the server first looks up the product identifier corresponding to that storage address and file name. For example, if all businesses of product side A use the same file format, the product identifier corresponding to the storage addresses and file names of all of product side A's interaction files is unique; say product side A's product identifier is a. If different businesses of product side B correspond to different file formats, the product identifier for business No. 1 of product side B is b-01, the product identifier for business No. 2 is b-02, and so on.
Step 704B, determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
In this embodiment, the correspondence between product identifiers and configuration identifiers is stored in the server in advance. After the product identifier corresponding to the storage address and file name has been found, the configuration identifier corresponding to the file to be parsed is determined from this correspondence; the configuration identifier uniquely points to one piece of configuration information. Specifically, if a product side corresponds to a single product identifier, all interaction files of that product side correspond to the same configuration identifier. If different businesses of a product side correspond to different product identifiers, or one business corresponds to one product identifier and the other businesses correspond to another, the interaction files of those businesses correspond to different configuration identifiers accordingly.
As shown in Fig. 9, in one embodiment, a multi-format document parsing apparatus is provided. The apparatus includes:
An acquisition module 902, configured to obtain the storage address of the file to be parsed.
A determining module 904, configured to determine, according to the storage address, the configuration identifier corresponding to the file to be parsed.
A loading module 906, configured to load, according to the configuration identifier, the configuration information corresponding to the configuration identifier.
A parsing-class acquisition module 908, configured to obtain, according to the file format type in the configuration information, the parsing class corresponding to the file format type.
A parsing module 910, configured to parse the file to be parsed according to the parsing class and the configuration information.
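The apparatus of Fig. 9 can be pictured as five collaborating modules. The Python sketch below wires hypothetical callables for modules 902 to 908 together, with the final `parse` call playing the role of parsing module 910; the callable signatures and the extra `read` helper are assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MultiFormatParsingApparatus:
    """Wires modules 902-910 together; each module is an injected callable."""
    acquire: Callable[[], list]           # 902: storage addresses of files to parse
    determine: Callable[[str], str]       # 904: storage address -> configuration identifier
    load: Callable[[str], dict]           # 906: configuration identifier -> configuration
    parser_class_for: Callable[[str], type]  # 908: fileFormat -> parsing class
    read: Callable[[str], bytes]          # hypothetical helper: fetch the file's bytes

    def run(self) -> dict:
        results = {}
        for address in self.acquire():
            config = self.load(self.determine(address))
            parser = self.parser_class_for(config["fileFormat"])()
            # 910: parse with the selected class and the loaded configuration.
            results[address] = parser.parse(self.read(address), config)
        return results


if __name__ == "__main__":
    class UpperParser:
        """Toy parsing class used only for this demonstration."""
        def parse(self, data, config):
            return [data.decode(config["encoding"]).upper()]

    app = MultiFormatParsingApparatus(
        acquire=lambda: ["addr1"],
        determine=lambda a: {"addr1": "CFG-1"}[a],
        load=lambda c: {"CFG-1": {"fileFormat": "TXT", "encoding": "utf-8"}}[c],
        parser_class_for=lambda f: {"TXT": UpperParser}[f],
        read=lambda a: b"hello",
    )
    print(app.run())
```

Injecting each module keeps them independently replaceable, which mirrors the separation between configuration lookup and parsing classes described above.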
As shown in Fig. 10, in one embodiment, the determining module 904 includes:
A lookup module 904A, configured to look up, according to the storage address, the product identifier corresponding to the storage address.
A configuration identifier determining module 904B, configured to determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
As shown in Fig. 11, in one embodiment, a multi-format document parsing apparatus 1100 is provided which, in addition to the above modules 902-910, further includes:
A receiving module 912, configured to receive a file format change request sent by a product side and extract the product identifier and the corresponding new format configuration information from the request;
A configuration identifier acquisition module 914, configured to obtain the configuration identifier corresponding to the product identifier;
A replacement module 916, configured to obtain the old format configuration information corresponding to the configuration identifier and replace the old format configuration information with the new format configuration information.
In one embodiment, the acquisition module 902 is further configured to obtain the file name of the file to be parsed, and the determining module 904 is further configured to determine, according to the storage address and file name, the configuration identifier corresponding to the storage address and file name.
In one embodiment, the determining module is further configured to look up, according to the storage address and file name, the product identifier corresponding to the storage address and file name, and to determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM), and so on.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and although their descriptions are relatively specific and detailed, they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present invention, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.