CN107784049A - Method and apparatus for multi-format file parsing - Google Patents

Publication number
CN107784049A
Authority
CN
China
Prior art keywords
file
storage address
resolved
configuration identifier
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611104057.1A
Other languages
Chinese (zh)
Inventor
洪光宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201611104057.1A priority Critical patent/CN107784049A/en
Publication of CN107784049A publication Critical patent/CN107784049A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a method for multi-format file parsing, the method comprising: obtaining the storage address of a file to be parsed; determining, according to the storage address, the configuration identifier corresponding to the file to be parsed; loading, according to the configuration identifier, the configuration information corresponding to the configuration identifier; obtaining, according to the file format type in the configuration information, the parsing class corresponding to that file format type; and parsing the file to be parsed according to the parsing class and the configuration information. By isolating the format configuration information of a file from the parsing class that implements file parsing, the method not only reduces the development workload but also effectively reduces the cost of later maintenance and upgrades. An apparatus for multi-format file parsing is also proposed.

Description

Method and apparatus for multi-format file parsing
Technical field
The present invention relates to the field of computer processing, and in particular to a method and apparatus for multi-format file parsing.
Background
With the development of online malls, more and more insurance and financial products are sold through financial malls. A financial mall often interacts with many product sides (for example, for account opening, subscription and redemption of products), and the product sides interact with the financial mall by exchanging files. The financial mall system needs to parse these interaction files to obtain the corresponding information, yet the interaction file formats of different product sides often differ, and even different businesses of the same product side may use different formats. A traditional financial mall system needs a separate set of file parsing code for each file format (as shown in Fig. 4A), and if a product side changes its file format, the financial mall system must modify the corresponding code and recompile it. Traditional parsing of different formats therefore involves a large development workload and high maintenance and upgrade costs.
Summary of the invention
In view of the above problem, it is necessary to propose a method and apparatus for multi-format file parsing that can reduce both the development workload and the maintenance cost.
A method for multi-format file parsing, the method comprising: obtaining the storage address of a file to be parsed; determining, according to the storage address, the configuration identifier corresponding to the file to be parsed; loading, according to the configuration identifier, the configuration information corresponding to the configuration identifier; obtaining, according to the file format type in the configuration information, the parsing class corresponding to the file format type; and parsing the file to be parsed according to the parsing class and the configuration information.
In one embodiment, the step of determining, according to the storage address, the configuration identifier corresponding to the file to be parsed includes: looking up, according to the storage address, the product identifier corresponding to the storage address; and determining, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
In one embodiment, the method further includes: receiving a file format change request sent by a product side, and extracting the product identifier and the corresponding new format configuration information from the request; obtaining the configuration identifier corresponding to the product identifier; and obtaining the old format configuration information corresponding to the configuration identifier and replacing it with the new format configuration information.
In one embodiment, before the step of determining, according to the storage address, the configuration identifier corresponding to the file to be parsed, the method further includes: obtaining the file name of the file to be parsed. The step of determining the configuration identifier then includes: determining, according to the storage address and the file name, the configuration identifier corresponding to the file to be parsed.
In one embodiment, the step of determining, according to the storage address and the file name, the configuration identifier corresponding to the file to be parsed includes: looking up, according to the storage address and the file name, the product identifier corresponding to the storage address and the file name; and determining, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
An apparatus for multi-format file parsing, the apparatus comprising: an acquisition module, configured to obtain the storage address of a file to be parsed; a determining module, configured to determine, according to the storage address, the configuration identifier corresponding to the file to be parsed; a loading module, configured to load, according to the configuration identifier, the configuration information corresponding to the configuration identifier; a parsing class acquisition module, configured to obtain, according to the file format type in the configuration information, the parsing class corresponding to the file format type; and a parsing module, configured to parse the file to be parsed according to the parsing class and the configuration information.
In one embodiment, the determining module includes: a lookup module, configured to look up, according to the storage address, the product identifier corresponding to the storage address; and a configuration identifier determining module, configured to determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
In one embodiment, the apparatus further includes: a receiving module, configured to receive a file format change request sent by a product side and to extract the product identifier and the corresponding new format configuration information from the request; a configuration identifier acquisition module, configured to obtain the configuration identifier corresponding to the product identifier; and a replacement module, configured to obtain the old format configuration information corresponding to the configuration identifier and replace it with the new format configuration information.
In one embodiment, the acquisition module is further configured to obtain the file name of the file to be parsed, and the determining module is further configured to determine, according to the storage address and the file name, the configuration identifier corresponding to the file to be parsed.
In one embodiment, the determining module is further configured to look up, according to the storage address and the file name, the product identifier corresponding to the storage address and the file name, and to determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
The above method and apparatus for multi-format file parsing obtain the storage address of a file to be parsed, determine the configuration identifier corresponding to the file according to that storage address, load the configuration information corresponding to the configuration identifier, obtain the parsing class corresponding to the file format type in the configuration information, and parse the file according to the parsing class and the configuration information. Because the format configuration information of a file is isolated from the parsing class that implements file parsing, file parsing uses a single code base, and files in different formats are parsed simply by selecting different parsing classes. A later change of file format only requires modifying the corresponding configuration file, without modifying and recompiling the parsing code. This reduces the development workload and effectively reduces the cost of later maintenance and upgrades.
Brief description of the drawings
Fig. 1 is a schematic diagram of the internal structure of a server in one embodiment;
Fig. 2 is a flowchart of a method for multi-format file parsing in one embodiment;
Fig. 3 is a schematic diagram of the file parsing module in the server in one embodiment;
Fig. 4A is a schematic diagram of the traditional method of multi-format file parsing;
Fig. 4B is a schematic diagram of the method of multi-format file parsing in one embodiment;
Fig. 5 is a flowchart of a method for determining, according to the storage address, the configuration identifier corresponding to the file to be parsed in one embodiment;
Fig. 6 is a flowchart of a method for multi-format file parsing in another embodiment;
Fig. 7 is a flowchart of a method for multi-format file parsing in yet another embodiment;
Fig. 8 is a flowchart of a method for determining, according to the storage address and the file name, the configuration identifier corresponding to the file to be parsed in one embodiment;
Fig. 9 is a structural block diagram of an apparatus for multi-format file parsing in one embodiment;
Fig. 10 is a structural block diagram of the determining module in one embodiment;
Fig. 11 is a structural block diagram of an apparatus for multi-format file parsing in another embodiment.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it.
As shown in Fig. 1, in one embodiment the internal structure of the server 102 includes a processor, a non-volatile storage medium, memory and a network interface connected by a system bus. The non-volatile storage medium stores an operating system, a database and an apparatus for multi-format file parsing. The database is used to store data. The apparatus for multi-format file parsing is used to implement a method for multi-format file parsing. The processor of the server provides computing and control capabilities and supports the operation of the whole server. The network interface of the server is used to communicate with external servers and terminals over a network connection, for example to receive interaction files sent by product sides. Those skilled in the art will understand that the structure shown in Fig. 1 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
As shown in Fig. 2, in one embodiment a method for multi-format file parsing is proposed, the method including:
Step 202: obtain the storage address of the file to be parsed.
In this embodiment, a correspondence between product sides and storage addresses is established in the server in advance. After the server receives an interaction file sent by a product side, it stores the file at the corresponding storage address according to this correspondence. The server then periodically obtains the storage address of the file to be parsed according to a preset program, and from it obtains the corresponding file to be parsed. For example, the server stores all interaction files received from product side A at address 1, all interaction files received from product side B at address 2, all interaction files received from product side C at address 3, and so on. The interaction files of different product sides are thus stored in different locations, that is, they correspond to different storage addresses. The server periodically obtains the location where files to be parsed are stored and then obtains the corresponding files.
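The storage scheme of step 202 can be sketched roughly as follows. This is only an illustration under stated assumptions: a storage address is modeled as a directory, and the function names `build_inbox`, `store_incoming` and `poll_pending` are hypothetical and do not come from the patent.

```python
import tempfile
from pathlib import Path


def build_inbox(root: Path, product_sides: list[str]) -> dict[str, Path]:
    """Create one storage address (assumed here to be a directory) per product side."""
    inbox = {}
    for side in product_sides:
        address = root / side
        address.mkdir(parents=True, exist_ok=True)
        inbox[side] = address
    return inbox


def store_incoming(inbox: dict[str, Path], product_side: str,
                   file_name: str, payload: bytes) -> Path:
    """Store a received interaction file at the address of its product side."""
    target = inbox[product_side] / file_name
    target.write_bytes(payload)
    return target


def poll_pending(inbox: dict[str, Path]) -> list[tuple[Path, Path]]:
    """Return (storage_address, file_path) pairs for all files awaiting parsing.

    A real server would call this from a timer or scheduler.
    """
    pending = []
    for address in inbox.values():
        for path in sorted(address.iterdir()):
            if path.is_file():
                pending.append((address, path))
    return pending


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    inbox = build_inbox(Path(tmp), ["A", "B", "C"])
    store_incoming(inbox, "A", "orders.csv", b"id,amount\n1,100\n")
    store_incoming(inbox, "B", "orders.xml", b"<rows/>")
    found = [(address.name, path.name) for address, path in poll_pending(inbox)]
```

Because each product side writes to its own address, the address alone is enough to tell the later steps whose file is being parsed.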
Step 204: determine, according to the storage address, the configuration identifier corresponding to the file to be parsed.
In this embodiment, after the server obtains the storage address of the file to be parsed, it not only obtains the corresponding file but also determines the configuration identifier corresponding to the file according to the storage address. A configuration identifier uniquely identifies the configuration information of a product side. It may be a configuration number assigned by the system to uniquely identify the configuration information, the name of the file in which the configuration information is stored, or any other identifier that uniquely identifies the configuration information. Specifically, the relation between file storage addresses and configuration identifiers is stored in the server in advance, so the corresponding configuration identifier can be found from the obtained storage address. In another embodiment, the server stores in advance the relation between storage addresses and product identifiers, and separately stores the correspondence between product identifiers and configuration identifiers. If all businesses of a product side use the same file format, the product identifier uniquely identifies that product side. If different businesses of a product side use different file formats, the product identifier uniquely identifies one particular business of that product side. Specifically, the corresponding product identifier is first found according to the storage address of the file to be parsed, and the configuration identifier corresponding to that product identifier is then obtained.
Step 206: load, according to the configuration identifier, the configuration information corresponding to the configuration identifier.
In this embodiment, the correspondence between configuration identifiers and configuration information is stored in the server in advance, and the configuration information corresponding to a configuration identifier is loaded according to that identifier. The configuration information records the format definition of the file, including the format type of the file, the encoding of the file, the line separator of the file, the quote character, the quote escape character and so on. Table 1 shows, for one embodiment, the 18 attributes contained in the configuration information together with the meaning, possible values and examples of each attribute. After the server determines the configuration identifier corresponding to the file to be parsed according to the storage address, it loads the configuration information corresponding to that identifier so that the file can subsequently be parsed according to it.
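As a rough sketch of step 206, the configuration information can be modeled as a small record loaded by configuration identifier. Only a handful of attributes are shown, and apart from the file format type (`fileFormat` in the text) their names are assumptions, since the attribute names of Table 1 are not reproduced here; `field_separator`, the JSON storage format and the `CONFIG_STORE` mapping are likewise hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatConfig:
    file_format: str               # the file format type (fileFormat in the text)
    encoding: str = "utf-8"        # encoding of the file
    line_separator: str = "\n"     # line separator of the file
    field_separator: str = ","     # assumed attribute, not named in the text
    quote_char: str = '"'          # quote character
    quote_escape_char: str = '"'   # quote escape character


# Hypothetical store mapping configuration identifiers to serialized configuration.
CONFIG_STORE = {
    "config-a": '{"file_format": "CSV", "field_separator": "|"}',
    "config-b-01": '{"file_format": "XML", "encoding": "gbk"}',
}


def load_config(config_id: str) -> FormatConfig:
    """Load the configuration information that a configuration identifier points to."""
    try:
        raw = CONFIG_STORE[config_id]
    except KeyError:
        raise LookupError(f"unknown configuration identifier: {config_id!r}") from None
    return FormatConfig(**json.loads(raw))


config = load_config("config-a")
```

Keeping the record as plain data is what allows a format change to be handled by editing configuration rather than code.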
Table 1
Step 208: obtain, according to the file format type in the configuration information, the parsing class corresponding to that file format type.
In this embodiment, after the server finds the configuration information corresponding to the file to be parsed, it extracts the file format type from the configuration information (fileFormat in Table 1) and then, according to this file format type, obtains the corresponding parsing class through the abstract interface of unified file parsing. Specifically, parsing classes corresponding to different format types are predefined in the parsing code to handle files of different file format types, for example CSV format files, CSV f format files and XML format files. Different format types correspond to different parsing classes.
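One way to picture step 208 is an abstract parser interface plus a registry keyed by file format type. The sketch below is an assumption about how such an interface might look, not the patent's implementation; `FileParser`, `register`, `get_parser` and the `fieldSeparator` key are hypothetical names.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class FileParser(ABC):
    """Abstract interface shared by all predefined parsing classes."""

    @abstractmethod
    def parse(self, raw: bytes, config: dict) -> list[list[str]]:
        ...


_PARSERS: dict[str, type[FileParser]] = {}


def register(file_format: str):
    """Class decorator that registers a parsing class under a file format type."""
    def decorator(cls: type[FileParser]) -> type[FileParser]:
        _PARSERS[file_format.upper()] = cls
        return cls
    return decorator


@register("CSV")
class CsvParser(FileParser):
    def parse(self, raw, config):
        text = raw.decode(config.get("encoding", "utf-8"))
        separator = config.get("fieldSeparator", ",")
        return [line.split(separator) for line in text.splitlines() if line]


@register("XML")
class XmlParser(FileParser):
    def parse(self, raw, config):
        # Assumes one child element per record and one grandchild per field.
        root = ET.fromstring(raw)
        return [[field.text or "" for field in record] for record in root]


def get_parser(file_format: str) -> FileParser:
    """Return an instance of the parsing class registered for a format type."""
    try:
        return _PARSERS[file_format.upper()]()
    except KeyError:
        raise ValueError(f"no parsing class for format {file_format!r}") from None
```

Supporting a further format then means adding one registered class, while callers keep using `get_parser` unchanged.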
Step 210: parse the file to be parsed according to the parsing class and the configuration information.
In this embodiment, after the server obtains the configuration information and the parsing class corresponding to the file to be parsed, it parses the file according to the parsing class and the specific format definition information in the configuration information (such as the 18 items of format definition information in Table 1). The parsed file content is stored uniformly in a list of Value Objects, and the content can then be persisted to a database or passed on for further business logic processing. In this embodiment, files of the same file format type are parsed with the same parsing class. The unified file parsing module of the server stores in advance multiple parsing implementation classes corresponding to different file format types, and the corresponding implementation class is obtained through the abstract file parsing interface; Fig. 3 is a schematic diagram of the unified file parsing module. For example, there are preset parsing classes for the CSV format, for the CSV f format, for the XML format and for other formats. It should be noted that the file format type is only one attribute of the configuration information, so even if different product sides use the same file format type, their configuration information may still differ. For example, product sides A and B may both use the CSV format while the specific configuration information, such as the encoding of the file or the line separator of the file, differs between them, so the configuration information for each product side and each business must be agreed on in advance. In this embodiment, as shown in Fig. 4B, the file format configuration information is isolated from the predefined parsing classes that implement file parsing. Only one set of compiled code therefore needs to be developed in the server, and parsing classes for the different formats are predefined in that code. Specifically, after an interaction file of a product side is obtained, the format configuration information corresponding to that product side is loaded, the predefined parsing class is obtained according to the format type in the loaded format configuration information, and the interaction file is then parsed with that parsing class. The traditional file parsing method, by contrast, requires a separate set of file parsing code for the format of each product side; Fig. 4A is a schematic diagram of traditional file parsing.
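The point that two product sides can share a format type yet differ in configuration can be illustrated with a minimal sketch. It assumes a delimited text format whose first line is a header, and the configuration keys (`encoding`, `lineSeparator`, `fieldSeparator`, `quoteChar`) as well as the `ValueObject` shape are hypothetical stand-ins for the attributes of Table 1.

```python
import csv
from dataclasses import dataclass


@dataclass
class ValueObject:
    """Uniform container for one parsed record."""
    fields: dict[str, str]


def parse_delimited(raw: bytes, config: dict) -> list[ValueObject]:
    """Parse a delimited file using only the values found in the configuration.

    Splitting on the line separator first is a simplification: it does not
    support line breaks inside quoted fields.
    """
    text = raw.decode(config["encoding"])
    lines = [line for line in text.split(config["lineSeparator"]) if line]
    rows = list(csv.reader(lines,
                           delimiter=config["fieldSeparator"],
                           quotechar=config["quoteChar"]))
    header, records = rows[0], rows[1:]
    return [ValueObject(dict(zip(header, record))) for record in records]


# Both product sides use the same format type but different format details.
config_a = {"encoding": "utf-8", "lineSeparator": "\n",
            "fieldSeparator": ",", "quoteChar": '"'}
config_b = {"encoding": "gbk", "lineSeparator": "\r\n",
            "fieldSeparator": "|", "quoteChar": "'"}

parsed_a = parse_delimited(b'id,name\n1,"Fund, A"\n', config_a)
parsed_b = parse_delimited("id|name\r\n2|基金\r\n".encode("gbk"), config_b)
```

The same function handles both files; only the configuration passed in differs.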
In this embodiment, the storage address of the file to be parsed is obtained, the configuration identifier corresponding to the file is determined according to the storage address, the configuration information corresponding to the configuration identifier is loaded, the parsing class corresponding to the file format type in the configuration information is obtained, and the file is parsed according to the parsing class and the configuration information. Because the format configuration information of a file is isolated from the parsing class that implements file parsing, file parsing uses a single code base, and files in different formats are parsed simply by selecting different parsing classes. A later change of file format only requires modifying the corresponding configuration file, without modifying and recompiling the parsing code, which reduces the development workload and effectively reduces the cost of later maintenance and upgrades.
As shown in Fig. 5, in one embodiment the step of determining, according to the storage address, the configuration identifier corresponding to the file to be parsed includes:
Step 204A: look up, according to the storage address, the product identifier corresponding to the storage address.
In this embodiment, the product identifier uniquely identifies a product side and the corresponding business. If all businesses of a product side use the same file format, the product identifier uniquely identifies that product side. If different businesses of a product side use different file formats, the product identifier uniquely identifies one particular business of that product side. After the server obtains the storage address of the file to be parsed, it first looks up the product identifier corresponding to that storage address. For example, if all businesses of product side A use the same file format, the storage addresses of all interaction files of product side A correspond to a single product identifier, say a. If different businesses of product side B use different file formats, business No. 1 of product side B corresponds to product identifier b-01, business No. 2 corresponds to product identifier b-02, and so on.
Step 204B: determine, according to the product identifier, the configuration identifier corresponding to the file to be parsed.
In this embodiment, the correspondence between product identifiers and configuration identifiers is stored in the server in advance. After the product identifier corresponding to the storage address has been found, the configuration identifier corresponding to the file to be parsed is determined according to this correspondence. A configuration identifier points to exactly one item of configuration information. Specifically, if a product side corresponds to a single product identifier, all interaction files of that product side correspond to the same configuration identifier. If different businesses of a product side correspond to different product identifiers (for example, one business corresponds to one product identifier and the other businesses to another), the configuration identifier is determined by the product identifier of the business concerned.
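Steps 204A and 204B amount to two chained lookups, which can be sketched as below. The product identifiers `a`, `b-01` and `b-02` follow the example in the text; the address strings, the configuration identifiers and the function name `resolve_config_id` are hypothetical.

```python
# Hypothetical pre-stored relations kept by the server.
ADDRESS_TO_PRODUCT = {
    "address-1": "a",      # product side A, all businesses share one format
    "address-2": "b-01",   # product side B, business No. 1
    "address-3": "b-02",   # product side B, business No. 2
}
PRODUCT_TO_CONFIG = {
    "a": "config-a",
    "b-01": "config-b-01",
    "b-02": "config-b-02",
}


def resolve_config_id(storage_address: str) -> str:
    """Step 204A (address -> product identifier), then 204B (-> configuration identifier)."""
    product_id = ADDRESS_TO_PRODUCT.get(storage_address)
    if product_id is None:
        raise LookupError(f"no product identifier for address {storage_address!r}")
    config_id = PRODUCT_TO_CONFIG.get(product_id)
    if config_id is None:
        raise LookupError(f"no configuration identifier for product {product_id!r}")
    return config_id
```

The intermediate product identifier lets a format change request name a product or business without knowing where its files are stored.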
As shown in Fig. 6, in one embodiment the above method for multi-format file parsing further includes:
Step 212: receive a file format change request sent by a product side, and extract the product identifier and the corresponding new format configuration information from the request.
In this embodiment, when a product side needs to change the format of the files it exchanges with the financial mall, it can send a file format change request to the server. After receiving the request, the server extracts the product identifier and the new format configuration information from it. If all file formats of the product side are changed uniformly to the new format, the product identifier is the identifier of the product side; if only the format of one business is changed, the product identifier is the identifier of that business of the product side. The new format configuration information is the configuration information to be used after the change.
Step 214: obtain the configuration identifier corresponding to the product identifier.
In this embodiment, after the server extracts the product identifier from the change request, it first obtains the configuration identifier corresponding to that product identifier. A configuration identifier points to exactly one item of configuration information.
Step 216: obtain the old format configuration information corresponding to the configuration identifier, and replace it with the new format configuration information.
In this embodiment, after the server obtains the configuration identifier corresponding to the product identifier, it obtains the old format configuration information corresponding to that configuration identifier, that is, the configuration information the identifier originally pointed to. It then deletes the old format configuration information from the configuration file and replaces it with the new format configuration information, which directly establishes the correspondence between the configuration identifier and the new format configuration information. Therefore, when a product side changes its format, the parsing code does not need to be modified and recompiled; changing the configuration information in the configuration file is sufficient, which greatly reduces the cost and workload of later maintenance.
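Steps 212 to 216 can be sketched as a single function that swaps the configuration behind a configuration identifier. The request layout (`productId`, `newFormatConfig`), the in-memory stores and the function name are assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical in-memory stand-ins for the server's stored relations.
PRODUCT_TO_CONFIG = {"b-01": "config-b-01"}
CONFIG_STORE = {"config-b-01": {"fileFormat": "CSV", "encoding": "gbk"}}


def apply_format_change(request: dict) -> dict:
    """Replace the old format configuration and return it for auditing.

    Step 212: extract the product identifier and new configuration.
    Step 214: find the configuration identifier for the product identifier.
    Step 216: replace the old configuration with the new one.
    """
    product_id = request["productId"]
    new_config = dict(request["newFormatConfig"])  # copy to avoid aliasing the request
    config_id = PRODUCT_TO_CONFIG[product_id]
    old_config = CONFIG_STORE[config_id]
    CONFIG_STORE[config_id] = new_config
    return old_config


old = apply_format_change({
    "productId": "b-01",
    "newFormatConfig": {"fileFormat": "XML", "encoding": "utf-8"},
})
```

Note that no parsing code is touched: the next file for `b-01` is simply parsed with whatever configuration the identifier now points to.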
In this embodiment, the format definition of a file (the configuration information) is isolated from the file parsing logic (the parsing class), so the two are independent and decoupled. During development, only one set of code needs to be written, in which parsing classes corresponding to the different format types are predefined, which greatly reduces the development workload. A later change of file format only requires modifying the corresponding configuration file, which effectively reduces the cost of later maintenance and upgrades.
As shown in Fig. 7, in one embodiment a method for multi-format file parsing is proposed, including:
Step 702: obtain the storage address and the file name of the file to be parsed.
In this embodiment, the interaction files (that is, the files to be parsed) of different product sides may be stored in the same place, so the storage addresses of the files to be parsed of different product sides may be identical. In addition to the storage address of the file to be parsed, the specific file name therefore also has to be obtained, and the corresponding file to be parsed is obtained from the storage address and file name together. Specifically, the correspondence between product sides and storage addresses and file names is established in the server in advance. After receiving an interaction file sent by a product side, the server finds the corresponding storage address and file name according to this correspondence and stores the file accordingly. When files to be parsed are later obtained periodically, they are likewise looked up by storage address and file name.
Step 704: determine, according to the storage address and the file name, the configuration identifier corresponding to the file to be parsed.
In this embodiment, after the server obtains the storage address and file name of the file to be parsed, it determines the configuration identifier corresponding to the file according to the storage address and file name. A configuration identifier uniquely identifies the configuration information of a product side. It may be a configuration number assigned by the system to uniquely identify the configuration information, the name of the file in which the configuration information is stored, or any other identifier that uniquely identifies the configuration information. Specifically, the relation between file storage addresses and file names on the one hand and configuration identifiers on the other is stored in the server in advance, and the corresponding configuration identifier is found from the obtained storage address and file name.
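When several product sides share one storage address, the file name has to disambiguate them. The sketch below assumes, purely for illustration, that file names follow per-product naming patterns that can be matched with shell-style wildcards; the patent only states that a relation between address, file name and configuration identifier is stored, so the patterns, paths and names here are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (storage address, file name pattern, configuration identifier).
RULES = [
    ("/inbox/shared", "A_*.csv", "config-a"),
    ("/inbox/shared", "B_subscribe_*.csv", "config-b-01"),
    ("/inbox/shared", "B_redeem_*.xml", "config-b-02"),
]


def resolve_config_id(storage_address: str, file_name: str) -> str:
    """Find the configuration identifier for a storage address and file name."""
    for address, pattern, config_id in RULES:
        if address == storage_address and fnmatchcase(file_name, pattern):
            return config_id
    raise LookupError(
        f"no configuration identifier for {file_name!r} at {storage_address!r}")
```

For example, `resolve_config_id("/inbox/shared", "B_redeem_20161205.xml")` selects the configuration of product side B's redemption business even though product side A's files sit in the same directory.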
Step 706: load, according to the configuration identifier, the configuration information corresponding to the configuration identifier.
In this embodiment, the correspondence between configuration identifiers and configuration information is stored in the server in advance, and the configuration information corresponding to a configuration identifier is loaded according to that identifier. The configuration information records the format definition of the file, including the format type of the file, the encoding of the file, the line separator of the file, the quote character, the quote escape character and so on. Table 1 shows, for one embodiment, the 18 attributes contained in the configuration information together with the meaning, possible values and examples of each attribute. After the server determines the configuration identifier corresponding to the file to be parsed according to the storage address and file name, it loads the configuration information corresponding to that identifier so that the file can subsequently be parsed according to it.
Step 708: obtain, according to the file format type in the configuration information, the parsing class corresponding to that file format type.
In this embodiment, after the server finds the configuration information corresponding to the file to be parsed, it extracts the file format type from the configuration information (fileFormat in Table 1) and then, according to this file format type, obtains the corresponding parsing class through the abstract interface of unified file parsing. Specifically, parsing classes corresponding to different format types are predefined in the parsing code to handle files of different file format types, for example CSV format files, CSV f format files and XML format files. Different format types correspond to different parsing classes.
Step 710, parse the file to be resolved according to the parsing class and the configuration information.
In this embodiment, after obtaining the configuration information and the parsing class corresponding to the file to be resolved, the server parses the file according to the parsing class and the specific format definition information in the configuration information (such as the 18 items of format definition information in Table 1). The parsed file content is stored uniformly in a list of Value Objects, and can then be persisted to a database or passed on for further business logic processing. In this embodiment, files of the same file format type are parsed with the same parsing class, and the unified parsing module of the server stores in advance the parsing classes corresponding to the different file format types, for example a parsing class for the CSV format, a parsing class for the CSV f format, a parsing class for the XML format, and parsing classes for other formats. It should be noted that the file format type is only one of the attributes in the configuration information, so even if different product sides use the same file format type, their configuration information may still differ. For example, product sides A and B may both use the CSV format while their specific configuration information, such as the file encoding and the line separator, differs. The configuration information for each product side and each business therefore has to be agreed in advance.
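The point that two product sides can share a format type but not a configuration can be illustrated with a single parsing routine driven by two configurations. Everything named here is an assumption for illustration: the `RecordVO` value object, the function `parse_delimited`, and the attribute keys other than `fileFormat` (including `fieldSeparator`, which the description above does not list). Lines are split on the configured line separator, so quoted fields containing line breaks are not handled.

```python
import csv
from dataclasses import dataclass


@dataclass
class RecordVO:
    """Value object holding the fields of one parsed row."""
    fields: list


def parse_delimited(raw: bytes, config: dict) -> list:
    """Parse delimited text according to the configuration; return a list of value objects."""
    text = raw.decode(config["encoding"])
    lines = text.split(config["lineSeparator"])
    reader = csv.reader(
        lines,
        delimiter=config["fieldSeparator"],
        quotechar=config["quoteChar"],
    )
    # Blank lines (such as the one after the final separator) yield empty rows; skip them.
    return [RecordVO(fields=row) for row in reader if row]


# Same file format type, different configuration information (hypothetical values).
CONFIG_A = {"fileFormat": "CSV", "encoding": "gbk", "lineSeparator": "\r\n",
            "fieldSeparator": ",", "quoteChar": '"'}
CONFIG_B = {"fileFormat": "CSV", "encoding": "utf-8", "lineSeparator": "\n",
            "fieldSeparator": "|", "quoteChar": "'"}

rows_a = parse_delimited(b'1001,"Smith, J"\r\n1002,Lee\r\n', CONFIG_A)
rows_b = parse_delimited(b"1001|'Smith|J'\n", CONFIG_B)
```

Both calls go through the same code path; only the configuration passed in differs, which is why the configuration for each product side has to be agreed beforehand.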
As shown in Figure 8, in one embodiment, step 704 of determining the configuration identifier corresponding to the storage address and file name according to the storage address and file name includes:
Step 704A, look up the product identification corresponding to the storage address and file name according to the storage address and file name.
In this embodiment, a product identification uniquely identifies a product side and its corresponding business. If all businesses of a product side use the same file format, the product identification uniquely identifies that product side. If different businesses of a product side use different file formats, the product identification uniquely identifies one particular business of that product side. After obtaining the storage address and file name of the file to be resolved, the server first looks up the product identification corresponding to that storage address and file name. For example, if all businesses of product side A use the same file format, the storage addresses and file names of all files exchanged with product side A correspond to a single product identification, say a. If the different businesses of product side B use different file formats, the product identification corresponding to business No. 1 of product side B is b-01, the product identification corresponding to business No. 2 is b-02, and so on.
Step 704B, determine the configuration identifier corresponding to the file to be resolved according to the product identification.
In this embodiment, the correspondence between product identifications and configuration identifiers is stored on the server in advance. After the product identification corresponding to the storage address has been found, the configuration identifier corresponding to the file to be resolved is determined according to the correspondence between product identifications and configuration identifiers; each configuration identifier points to exactly one piece of configuration information. Specifically, if a product side corresponds to a single product identification, all files exchanged with that product side correspond to the same configuration identifier. If different businesses of a product side correspond to different product identifications (or one business corresponds to one product identification and the remaining businesses to another), the configuration identifier is determined from the product identification of the business to which the file belongs.
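Steps 704A and 704B amount to two chained lookups, which can be sketched as follows. The product identifications `a`, `b-01` and `b-02` follow the example in the text; the paths, file names, configuration identifiers and function names are hypothetical, and using `None` as a "any file name" entry for a product side with a single format is an assumption of this sketch.

```python
from typing import Optional

# Step 704A table (hypothetical entries):
# (storage address, file name) -> product identification.
# A file name of None means every file under that address shares one product identification.
PRODUCT_ID_BY_LOCATION = {
    ("/inbound/a", None): "a",
    ("/inbound/b", "business01.csv"): "b-01",
    ("/inbound/b", "business02.csv"): "b-02",
}

# Step 704B table (hypothetical entries): product identification -> configuration identifier.
CONFIG_ID_BY_PRODUCT = {"a": "cfg-a", "b-01": "cfg-b-01", "b-02": "cfg-b-02"}


def find_product_id(storage_address: str, file_name: str) -> Optional[str]:
    """Step 704A: prefer an exact (address, file name) entry, then an address-wide entry."""
    exact = PRODUCT_ID_BY_LOCATION.get((storage_address, file_name))
    if exact is not None:
        return exact
    return PRODUCT_ID_BY_LOCATION.get((storage_address, None))


def resolve_config_id(storage_address: str, file_name: str) -> Optional[str]:
    """Step 704B: map the product identification to its configuration identifier."""
    product_id = find_product_id(storage_address, file_name)
    if product_id is None:
        return None
    return CONFIG_ID_BY_PRODUCT.get(product_id)
```

Keeping the product identification as an intermediate key means a format change only has to touch the configuration reached through that key, not every address and file-name entry.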
As shown in Figure 9, in one embodiment, a device for multi-format document parsing is provided, the device including:
Acquisition module 902, configured to obtain the storage address of the file to be resolved.
Determining module 904, configured to determine the configuration identifier corresponding to the file to be resolved according to the storage address.
Loading module 906, configured to load the configuration information corresponding to the configuration identifier according to the configuration identifier.
Parsing class acquisition module 908, configured to obtain the parsing class corresponding to the file format type according to the file format type in the configuration information.
Parsing module 910, configured to parse the file to be resolved according to the parsing class and the configuration information.
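The way modules 902 to 910 hand results to one another can be sketched as a single function in which each line stands for one module. The function name `parse_file`, its parameters, and the use of plain dictionaries and callables in place of real modules are assumptions made to keep the sketch short.

```python
from typing import Callable, Dict, List


def parse_file(
    storage_address: str,                       # supplied by acquisition module 902
    read_file: Callable[[str], str],            # hypothetical helper that reads the file content
    config_id_by_address: Dict[str, str],
    config_by_id: Dict[str, dict],
    parser_by_format: Dict[str, Callable[[str, dict], List[list]]],
) -> List[list]:
    config_id = config_id_by_address[storage_address]   # determining module 904
    config = config_by_id[config_id]                     # loading module 906
    parser = parser_by_format[config["fileFormat"]]      # parsing class acquisition module 908
    return parser(read_file(storage_address), config)    # parsing module 910


# Usage with hypothetical data: one address, one configuration, one CSV-style parser.
result = parse_file(
    "/inbound/a/orders.csv",
    read_file=lambda address: "1;2\n3;4",
    config_id_by_address={"/inbound/a/orders.csv": "cfg-a"},
    config_by_id={"cfg-a": {"fileFormat": "CSV", "fieldSeparator": ";"}},
    parser_by_format={
        "CSV": lambda text, cfg: [line.split(cfg["fieldSeparator"]) for line in text.splitlines()],
    },
)
```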
As shown in Figure 10, in one embodiment, determining module 904 includes:
Lookup module 904A, configured to look up the product identification corresponding to the storage address according to the storage address.
Configuration identifier determining module 904B, configured to determine the configuration identifier corresponding to the file to be resolved according to the product identification.
As shown in Figure 11, in one embodiment, a device 1100 for multi-format document parsing is provided which, in addition to the above modules 902-910, further includes:
Receiving module 912, configured to receive a file format change request sent by a product side and to extract the product identification and the corresponding format configuration information from the request;
Configuration identifier acquisition module 914, configured to obtain the configuration identifier corresponding to the product identification;
Replacement module 916, configured to obtain the old format configuration information corresponding to the configuration identifier and to replace the old format configuration information with the format configuration information extracted from the request.
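The format change flow handled by modules 912 to 916 can be sketched as an in-place replacement of the stored configuration. The request layout (keys `productId` and `formatConfig`), the function name `handle_format_change` and the dictionary-based stores are hypothetical.

```python
from typing import Dict, Optional


def handle_format_change(
    request: dict,
    config_id_by_product: Dict[str, str],
    config_store: Dict[str, dict],
) -> Optional[dict]:
    """Replace the stored format configuration for a product; return the old configuration."""
    # Receiving module 912: extract the product identification and format configuration.
    product_id = request["productId"]
    new_config = dict(request["formatConfig"])
    # Configuration identifier acquisition module 914.
    config_id = config_id_by_product[product_id]
    # Replacement module 916: fetch the old configuration and overwrite it.
    old_config = config_store.get(config_id)
    config_store[config_id] = new_config
    return old_config


# Usage with hypothetical data.
store = {"cfg-b-01": {"fileFormat": "CSV", "encoding": "gbk"}}
previous = handle_format_change(
    {"productId": "b-01", "formatConfig": {"fileFormat": "XML", "encoding": "utf-8"}},
    {"b-01": "cfg-b-01"},
    store,
)
```

Because the configuration identifier itself does not change, the mappings from storage address and product identification stay valid after the replacement.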
In one embodiment, acquisition module 902 is further configured to obtain the file name of the file to be resolved, and determining module 904 is further configured to determine the configuration identifier corresponding to the storage address and file name according to the storage address and file name.
In one embodiment, the determining module is further configured to look up the product identification corresponding to the storage address and file name according to the storage address and file name, and to determine the configuration identifier corresponding to the file to be resolved according to the product identification.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A method of multi-format document parsing, the method comprising:
obtaining the storage address of a file to be resolved;
determining the configuration identifier corresponding to the file to be resolved according to the storage address;
loading the configuration information corresponding to the configuration identifier according to the configuration identifier;
obtaining the parsing class corresponding to the file format type according to the file format type in the configuration information;
parsing the file to be resolved according to the parsing class and the configuration information.
2. The method according to claim 1, characterized in that the step of determining the configuration identifier corresponding to the file to be resolved according to the storage address comprises:
looking up the product identification corresponding to the storage address according to the storage address;
determining the configuration identifier corresponding to the file to be resolved according to the product identification.
3. The method according to claim 2, characterized in that the method further comprises:
receiving a file format change request sent by a product side, and extracting the product identification and the corresponding format configuration information from the request;
obtaining the configuration identifier corresponding to the product identification;
obtaining the old format configuration information corresponding to the configuration identifier, and replacing the old format configuration information with the extracted format configuration information.
4. The method according to claim 1, characterized in that, before the step of determining the configuration identifier corresponding to the file to be resolved according to the storage address, the method further comprises:
obtaining the file name of the file to be resolved;
the step of determining the configuration identifier corresponding to the file to be resolved according to the storage address comprises: determining the configuration identifier corresponding to the file to be resolved according to the storage address and the file name.
5. The method according to claim 3, characterized in that the step of determining the configuration identifier corresponding to the file to be resolved according to the storage address and file name comprises:
looking up the product identification corresponding to the storage address and file name according to the storage address and file name;
determining the configuration identifier corresponding to the file to be resolved according to the product identification.
6. A device for multi-format document parsing, characterized in that the device comprises:
an acquisition module, configured to obtain the storage address of a file to be resolved;
a determining module, configured to determine the configuration identifier corresponding to the file to be resolved according to the storage address;
a loading module, configured to load the configuration information corresponding to the configuration identifier according to the configuration identifier;
a parsing class acquisition module, configured to obtain the parsing class corresponding to the file format type according to the file format type in the configuration information;
a parsing module, configured to parse the file to be resolved according to the parsing class and the configuration information.
7. The device according to claim 6, characterized in that the determining module comprises:
a lookup module, configured to look up the product identification corresponding to the storage address according to the storage address;
a configuration identifier determining module, configured to determine the configuration identifier corresponding to the file to be resolved according to the product identification.
8. The device according to claim 7, characterized in that the device further comprises:
a receiving module, configured to receive a file format change request sent by a product side and to extract the product identification and the corresponding format configuration information from the request;
a configuration identifier acquisition module, configured to obtain the configuration identifier corresponding to the product identification;
a replacement module, configured to obtain the old format configuration information corresponding to the configuration identifier and to replace the old format configuration information with the extracted format configuration information.
9. The device according to claim 6, characterized in that the acquisition module is further configured to obtain the file name of the file to be resolved;
the determining module is further configured to determine the configuration identifier corresponding to the file to be resolved according to the storage address and file name.
10. The device according to claim 9, characterized in that the determining module is further configured to look up the product identification corresponding to the storage address and file name according to the storage address and file name, and to determine the configuration identifier corresponding to the file to be resolved according to the product identification.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611104057.1A CN107784049A (en) 2016-12-05 2016-12-05 The method and apparatus of multi-format document parsing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611104057.1A CN107784049A (en) 2016-12-05 2016-12-05 The method and apparatus of multi-format document parsing

Publications (1)

Publication Number Publication Date
CN107784049A true CN107784049A (en) 2018-03-09

Family

ID=61437454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611104057.1A Pending CN107784049A (en) 2016-12-05 2016-12-05 The method and apparatus of multi-format document parsing

Country Status (1)

Country Link
CN (1) CN107784049A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101552976A (en) * 2009-04-29 2009-10-07 中兴通讯股份有限公司 Multi-service form file processing system and method
CN102053952A (en) * 2009-11-10 2011-05-11 英华达(上海)电子有限公司 Method and device for converting data format of electronic book and portable electronic book reader
CN103177045A (en) * 2011-12-26 2013-06-26 中国移动通信集团广东有限公司 Text analysis method and text analysis device
CN103873134A (en) * 2014-03-20 2014-06-18 中国空间技术研究院 Subscription method of satellite data compatible with multiple data formats
CN103984773A (en) * 2014-06-05 2014-08-13 南京信息工程大学 Method for converting multi-format weather radar base data file into NetCDF file
US20160139892A1 (en) * 2014-11-14 2016-05-19 Xpliant, Inc. Parser engine programming tool for programmable network devices

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804140A (en) * 2018-06-15 2018-11-13 中国建设银行股份有限公司 A kind of batch instruction analytic method, device and equipment
CN108804140B (en) * 2018-06-15 2021-08-13 中国建设银行股份有限公司 Batch instruction analysis method, device and equipment
CN109862021A (en) * 2019-02-26 2019-06-07 武汉思普崚技术有限公司 Threaten the acquisition methods and device of information
CN109862021B (en) * 2019-02-26 2021-08-17 武汉思普崚技术有限公司 Method and device for acquiring threat information
WO2021027592A1 (en) * 2019-08-14 2021-02-18 深圳前海微众银行股份有限公司 File processing method, apparatus, device and computer readable storage medium
CN110688828A (en) * 2019-09-20 2020-01-14 京东数字科技控股有限公司 File processing method and device, file processing system and computer equipment
CN113010588A (en) * 2019-12-20 2021-06-22 北京国基科技股份有限公司 Data table processing method
CN113010588B (en) * 2019-12-20 2023-07-04 北京国基科技股份有限公司 Data form processing method
CN111427899A (en) * 2020-03-17 2020-07-17 中国建设银行股份有限公司 Method, device, equipment and computer readable medium for storing file
CN112051999A (en) * 2020-09-03 2020-12-08 中国银行股份有限公司 Method and device for generating configured download file
CN112051999B (en) * 2020-09-03 2024-04-19 中国银行股份有限公司 Configurable download file generation method and device
CN112364206A (en) * 2020-11-12 2021-02-12 广东海启星海洋科技有限公司 Method and device for analyzing and translating multi-format data file
CN114640721A (en) * 2022-04-25 2022-06-17 淮南万泰电子股份有限公司 Power communication protocol conversion system based on remote configuration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180524

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: OneConnect Smart Technology Co., Ltd. (Shenzhen)

Address before: 200000 Xuhui District, Shanghai Kai Bin Road 166, 9, 10 level.

Applicant before: OneConnect Financial Technology Co., Ltd. (Shanghai)

CB02 Change of applicant information

Address after: 518000 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong

Applicant after: OneConnect Smart Technology Co., Ltd. (Shenzhen)

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: OneConnect Smart Technology Co., Ltd. (Shenzhen)

RJ01 Rejection of invention patent application after publication

Application publication date: 20180309
