CN104102652B - Unstructured data storage system and method - Google Patents
Unstructured data storage system and method
- Publication number: CN104102652B
- Application number: CN201310118763.1A
- Authority: CN (China)
- Prior art keywords: unstructured data, xml, server, file, data
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an unstructured data storage system and method. In the method, a source system data server stores enterprise business system feature data; an XML (Extensible Markup Language) generator server generates XML files from the record features of the source system data server, extracts the unstructured data file bodies held in the source system data server, pairs them with the XML files, and generates the correspondence between each XML file and its unstructured data file body; an XML parser server parses the XML files according to a field matching rule to obtain the attributes and classification information corresponding to each XML file and, according to that information, stores the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assigns the corresponding attributes; and an unstructured data server stores the correspondence between the paired XML files and unstructured data file bodies. With the system and method, data in source systems of all types can be imported into the unstructured data storage system according to defined business rules.
Description
Technical field
The present invention relates to enterprise informatization technology, and in particular to an unstructured data storage system and method.
Background
BPM (Business Process Management, a comprehensive management approach that integrates an enterprise's various business links and typically uses networks to achieve information transmission, data synchronization, business monitoring, and the continuous upgrading and optimization of enterprise business processes) is an important technology for raising the level of informatization in modern enterprises. Formally defining the business with a unified process description specification makes it convenient to integrate and rebuild enterprise information systems and to achieve a clear separation of business concerns within them. When a BPM process is implemented as a system, data interaction between multiple business subsystems is often involved: systems that depend on one another for business data may use different data storage and transmission specifications, which creates considerable obstacles to data interaction between them. This is most common in interactions between legacy systems and between legacy systems and newly developed systems. To solve such problems, a dedicated data read/write system usually has to be developed for the data interface between the systems so that data can be exchanged normally.
ERP (Enterprise Resource Planning, an enterprise management software suite that integrates material resource management, human resource management, financial resource management, and information resource management, and an important component of mainstream enterprise informatization solutions), electric power MIS (Management Information System, a human-led, integrated human-machine system that uses computer hardware and software, network communication equipment, and other office equipment to collect, transmit, process, store, update, and maintain information, with the aims of strategic competitive advantage and higher benefit and efficiency, and that supports top-level decision-making, middle-level control, and grass-roots operation of the enterprise), and similar systems are widely deployed in power industry enterprises. ERP is usually used to manage business finance, assets, and operations, while the electric power MIS is used to manage production tasks such as the "two tickets", equipment, and maintenance. Fairly mature product lines for these systems exist on the domestic market, and most solutions store business data in a structured manner, that is, in multiple two-dimensional tables of a database.
Unstructured data in the business data is handled differently. (Structured data is row data that is stored in a database and can be expressed logically with a two-dimensional table structure; data that cannot be represented by two-dimensional logical tables in a database is called unstructured data, and consists mainly of computer files in various formats, including large text, pictures, audio, and video.) It is mainly stored in one of two ways: either the unstructured data itself is stored as a binary string directly in a field of a database table record, or the database table stores a URL (Uniform Resource Locator) pointing to the storage path of the unstructured data, while the unstructured data itself is stored in a separate file system.
In electric power enterprises, the unstructured documents in these systems mainly include design documents for various kinds of equipment, contracts and supporting documents, technical reports and inspection reports, on-site audio and video recordings, and so on. They are usually organized within system workflows as attachments. In general, these attachments cannot be searched directly, nor can they be indexed by category or attribute; the relevant information can only be obtained indirectly by looking up the associated business process. To take control of this production-related unstructured data, an electric power enterprise needs to set up a data storage system dedicated to storing and managing unstructured data, which classifies and indexes the data along different attribute dimensions (for example time, equipment type, manufacturer, and importance) so that it can be searched and managed from different angles.
Against this background, the problem for those skilled in the art is how to automatically extract the unstructured data in existing business process and production management information systems, together with the structured attributes associated with it, and how to establish the correspondence between the processes and data records in the original systems and the unstructured documents in the unstructured data storage system.
In the prior art, no general technical specification has yet emerged for extracting structured data from business process systems. The prevailing approach is to develop a standalone data read/write module that builds a data channel between a single source system and the target system. This solution generally involves the following steps: first, determine the classification and attribute information the target system needs in order to store unstructured data, and work out the list of fields the corresponding source system should provide; inspect the database to determine where the unstructured data body is stored, and if it is stored directly in a large field, deserialize that field, otherwise read the data body via its storage path; develop an adapter tool for the specific source system, configure the source system database parameters in it, and read the unstructured data and the corresponding feature fields to be extracted from the source system database; finally, the adapter tool calls the target system interface, writes the feature data extracted from the source system into the target system database as the attribute/classification information of the corresponding unstructured document according to the matching rules, and writes the unstructured data into the target system according to that attribute/classification information.
The main drawbacks of this solution are as follows. High development cost: an independent adapter tool must be developed for each source system so that the source system's feature data matches the attribute/classification fields of the target system (the unstructured data storage system). High coupling: source data extraction and target data writing are performed by the same adapter, with no reasonable functional separation. Whether the source system's data storage structure changes or the attributes and classifications used by the target system are adjusted, the adapter tool has to be redeveloped. In particular, when there are multiple source systems, an adjustment to the target system forces the adapter tools of all source systems to be redeveloped to accommodate the adjusted attributes associated with the unstructured data. High cost of error correction: because each adapter reads the source system's tables directly and generates no intermediate files during extraction, tracing an error still requires reading the source system database, and the operation has to be repeated from the data extraction step, so correction is expensive.
In summary, how to design a method that automatically extracts the production business feature data of electric power enterprises and imports the data in source systems of all types into an unstructured data storage system according to defined business rules is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
Embodiments of the present invention provide an unstructured data storage system and method for importing the data in source systems of all types into an unstructured data storage system according to defined business rules.
In one aspect, an embodiment of the present invention provides an unstructured data storage system. The enterprise business feature data storage system includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein:
The source system data server is configured to store enterprise business system feature data;
The XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, extract the unstructured data file body from the source system data server, pair it with the XML file, and generate the correspondence between the XML file and the unstructured data file body;
The XML parser server is coupled to the XML generator server and is configured to parse the XML file according to a field matching rule to obtain the attributes and classification information corresponding to the XML file, and, according to those attributes and classification information, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes;
The unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body.
Optionally, in an embodiment of the present invention, the XML generator server extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine where the unstructured data file body is stored; and extracting the unstructured data file body according to that storage location.
Optionally, in an embodiment of the present invention, the XML generator server extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body from its storage path.
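The branch between large-field storage and path storage can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `attachment` table and its `body_blob` and `body_path` columns are hypothetical, SQLite stands in for the source system database, and the large field is assumed to hold raw bytes, so that deserialization reduces to reading the column.

```python
import sqlite3
import tempfile
from pathlib import Path


def extract_body(row: sqlite3.Row) -> bytes:
    """Return the file body for one attachment record.

    Hypothetical schema: `body_blob` holds the bytes when the file is stored
    in a large field; otherwise `body_path` points to the file on disk.
    """
    if row["body_blob"] is not None:
        # Large-field storage: the column value is the serialized body.
        return bytes(row["body_blob"])
    # Path storage: follow the recorded path into the file system.
    return Path(row["body_path"]).read_bytes()


def demo_extract() -> list[bytes]:
    """Create one record of each storage kind and extract both bodies."""
    with tempfile.TemporaryDirectory() as tmp:
        external = Path(tmp) / "report.pdf"
        external.write_bytes(b"stored on disk")

        conn = sqlite3.connect(":memory:")
        conn.row_factory = sqlite3.Row
        conn.execute(
            "CREATE TABLE attachment ("
            "id INTEGER PRIMARY KEY, body_blob BLOB, body_path TEXT)"
        )
        conn.execute(
            "INSERT INTO attachment VALUES (1, ?, NULL)", (b"stored in table",)
        )
        conn.execute(
            "INSERT INTO attachment VALUES (2, NULL, ?)", (str(external),)
        )
        rows = conn.execute("SELECT * FROM attachment ORDER BY id").fetchall()
        bodies = [extract_body(row) for row in rows]
        conn.close()
        return bodies
```

A real source system might serialize the large field in a vendor-specific format, in which case the first branch would need a matching decoder.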
Optionally, in an embodiment of the present invention, the unstructured data server stores the correspondence between the paired XML file and unstructured data file body in the form of a file pair.
Optionally, in an embodiment of the present invention, in the XML file that the XML generator server generates according to the record features of the source system data server, each data field of a single record serves as one node of the XML file; if a field of the record references a record in another table, the referenced record is added as a child node of the current field node.
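The field-to-node mapping can be illustrated with a short sketch. It assumes, hypothetically, that a record has already been read into a Python dict and that a reference to another table has been resolved into a nested dict; the field names are invented for the example and are not taken from the patent.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict, tag: str = "record") -> ET.Element:
    """Build one XML element from one record.

    Each field becomes a node. A field whose value is a dict (a record
    referenced from another table) becomes a node whose children are the
    fields of the referenced record.
    """
    node = ET.Element(tag)
    for field, value in record.items():
        if isinstance(value, dict):
            # Referenced record: nest it under the current field node.
            node.append(record_to_xml(value, tag=field))
        else:
            child = ET.SubElement(node, field)
            child.text = "" if value is None else str(value)
    return node


# Hypothetical record whose `equipment` field references another table.
equipment = {"equip_id": 7, "equip_type": "transformer"}
record = {"doc_id": 42, "title": "Inspection report", "equipment": equipment}
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
```

Because one XML document is produced per record, the document can be kept next to the extracted file body as an intermediate result.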
In another aspect, an embodiment of the present invention provides an unstructured data storage method. The method is applied to an enterprise business feature data storage system that includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system feature data; wherein the method includes:
Generating, by the XML generator server, an XML file according to the record features of the source system data server, extracting the unstructured data file body from the source system data server, pairing it with the XML file, and generating the correspondence between the XML file and the unstructured data file body;
Parsing, by the XML parser server, the XML file according to a field matching rule to obtain the attributes and classification information corresponding to the XML file;
According to the attributes and classification information corresponding to the XML file, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
Optionally, in an embodiment of the present invention, extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine where the unstructured data file body is stored; and extracting the unstructured data file body according to that storage location.
Optionally, in an embodiment of the present invention, extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body from its storage path.
Optionally, in an embodiment of the present invention, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes includes: storing that correspondence, in the form of a file pair, under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
Optionally, in an embodiment of the present invention, generating, by the XML generator server, an XML file according to the record features of the source system data server includes: generating the XML file such that each data field of a single record serves as one node of the XML file, and, if a field of the record references a record in another table, the referenced record is added as a child node of the current field node.
The above technical solution has the following beneficial effects. The enterprise business feature data storage system includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server is configured to store enterprise business system feature data; the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, extract the unstructured data file body from the source system data server, pair it with the XML file, and generate the correspondence between the XML file and the unstructured data file body; the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to a field matching rule to obtain the corresponding attributes and classification information, and, according to them, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes; and the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body. These technical means achieve the following technical effects: only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system; when the data table structure of either the source system or the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload; source data extraction and target data import are split into two independent steps that exchange data through standardized XML files, achieving a higher degree of system decoupling; and the results of data extraction are stored as XML files paired with the unstructured data, so that if a data import error occurs it can easily be investigated and traced using the retained intermediate results.
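The claim that only the field matching rule configuration changes can be made concrete with a small sketch. The rule format below (a JSON object mapping a target attribute name to an XML path) and all field names are assumptions made for illustration; the source text does not specify the format of the configuration file.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical rule files, one per source system. Both map onto the same
# target attributes, so the parser code itself is shared between systems.
ERP_RULE = json.loads('{"name": "title", "manufacturer": "vendor/name"}')
MIS_RULE = json.loads('{"name": "doc_name", "manufacturer": "maker"}')


def apply_rule(xml_text: str, rule: dict) -> dict:
    """Resolve each target attribute by looking up its XML path."""
    root = ET.fromstring(xml_text)
    return {
        target: root.findtext(path, default="")
        for target, path in rule.items()
    }


# The same document as exported by two differently structured source systems.
erp_xml = (
    "<record><title>Contract</title>"
    "<vendor><name>ACME</name></vendor></record>"
)
mis_xml = "<record><doc_name>Contract</doc_name><maker>ACME</maker></record>"

erp_attributes = apply_rule(erp_xml, ERP_RULE)
mis_attributes = apply_rule(mis_xml, MIS_RULE)
```

In this sketch, renaming a column in one source system would only require editing that system's rule entry, while `apply_rule` stays unchanged.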
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an unstructured data storage system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an unstructured data storage method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system structure of an application example of the present invention;
Fig. 4 is a schematic flowchart of the operating mechanism of the system in the application example of Fig. 3.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic structural diagram of an unstructured data storage system according to an embodiment of the present invention. The enterprise business feature data storage system includes a source system data server 11, an XML generator server 12, an XML parser server 13, and an unstructured data server 14, wherein:
The source system data server 11 is configured to store enterprise business system feature data;
The XML generator server 12 is coupled to the source system data server 11 and is configured to generate an XML file according to the record features of the source system data server, extract the unstructured data file body from the source system data server, pair it with the XML file, and generate the correspondence between the XML file and the unstructured data file body;
The XML parser server 13 is coupled to the XML generator server 12 and is configured to parse the XML file according to a field matching rule to obtain the attributes and classification information corresponding to the XML file, and, according to those attributes and classification information, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes;
The unstructured data server 14 is coupled to the XML parser server 13 and is configured to store the correspondence between the paired XML file and unstructured data file body.
Optionally, the XML generator server 12 extracting the unstructured data file body from the source system data server 11 includes: searching the source system data server to determine where the unstructured data file body is stored; and extracting the unstructured data file body according to that storage location.
Optionally, the XML generator server 12 extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body from its storage path.
Optionally, the unstructured data server 14 stores the correspondence between the paired XML file and unstructured data file body in the form of a file pair.
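One plausible reading of the file pair is that the XML file and the file body are written side by side under a shared base name, so that the file system itself records the correspondence. The sketch below follows that reading; the directory layout, the naming convention, and the function names are assumptions, not details given in the source.

```python
import tempfile
from pathlib import Path


def store_pair(root: Path, classification: str, stem: str,
               xml_text: str, body: bytes, suffix: str) -> tuple[Path, Path]:
    """Write the XML file and the file body under one classification folder.

    Both files share `stem`, which is what ties them together as a pair.
    """
    folder = root / classification
    folder.mkdir(parents=True, exist_ok=True)
    xml_path = folder / f"{stem}.xml"
    body_path = folder / f"{stem}{suffix}"
    xml_path.write_text(xml_text, encoding="utf-8")
    body_path.write_bytes(body)
    return xml_path, body_path


def find_pairs(folder: Path) -> dict[str, list[str]]:
    """Group the file names in a folder by their shared base name."""
    pairs: dict[str, list[str]] = {}
    for path in sorted(folder.iterdir()):
        pairs.setdefault(path.stem, []).append(path.name)
    return pairs


def demo_pairs() -> dict[str, list[str]]:
    """Store one pair in a temporary repository and list it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_pair(root, "transformer", "doc-42", "<record/>", b"%PDF", ".pdf")
        return find_pairs(root / "transformer")
```

Keeping the pair on disk also preserves the intermediate XML for later inspection if an import goes wrong.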
Optionally, in the XML file that the XML generator server 12 generates according to the record features of the source system data server 11, each data field of a single record serves as one node of the XML file; if a field of the record references a record in another table, the referenced record is added as a child node of the current field node.
Corresponding to the above embodiment, Fig. 2 is a flowchart of an unstructured data storage method according to an embodiment of the present invention. The method is applied to an enterprise business feature data storage system that includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system feature data; wherein the method includes:
201. Generating, by the XML generator server, an XML file according to the record features of the source system data server, extracting the unstructured data file body from the source system data server, pairing it with the XML file, and generating the correspondence between the XML file and the unstructured data file body;
202. Parsing, by the XML parser server, the XML file according to a field matching rule to obtain the attributes and classification information corresponding to the XML file;
203. According to the attributes and classification information corresponding to the XML file, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
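Steps 202 and 203 can be sketched together as follows. The rule structure, the in-memory `repository` dict standing in for the unstructured data server, and every field name are hypothetical and serve only to show how a parsed XML file yields attributes and a classification that then determine where the pair is stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical field matching rule: XML path -> target attribute name, plus
# the XML path whose value selects the classification.
RULE = {
    "attributes": {"title": "name", "equipment/manufacturer": "manufacturer"},
    "classification": "equipment/equip_type",
}


def parse_record(xml_text: str, rule: dict) -> tuple[str, dict]:
    """Step 202: derive the classification and attributes from the XML file."""
    root = ET.fromstring(xml_text)
    attributes = {
        target: root.findtext(path, default="")
        for path, target in rule["attributes"].items()
    }
    classification = root.findtext(
        rule["classification"], default="unclassified"
    )
    return classification, attributes


def store(repository: dict, xml_text: str, body_name: str, rule: dict) -> None:
    """Step 203: file the pair under its classification with its attributes."""
    classification, attributes = parse_record(xml_text, rule)
    repository.setdefault(classification, []).append(
        {"xml": xml_text, "body": body_name, "attributes": attributes}
    )


xml_text = (
    "<record><title>Inspection report</title>"
    "<equipment><equip_type>transformer</equip_type>"
    "<manufacturer>ACME</manufacturer></equipment></record>"
)
repository: dict = {}
store(repository, xml_text, "report.pdf", RULE)
```

Records whose classification path is missing fall into an `unclassified` bucket here; that fallback is a choice of this sketch, not something the source prescribes.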
Optionally, extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine where the unstructured data file body is stored; and extracting the unstructured data file body according to that storage location.
Optionally, extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body from its storage path.
Optionally, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes includes: storing that correspondence, in the form of a file pair, under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
Optionally, generating, by the XML generator server, an XML file according to the record features of the source system data server includes: generating the XML file such that each data field of a single record serves as one node of the XML file, and, if a field of the record references a record in another table, the referenced record is added as a child node of the current field node.
The above technical solution of the embodiments of the present invention has the following beneficial effects. The enterprise business characteristic data storage system comprises a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server stores enterprise business system characteristic data; the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record characteristics of the source system data server, extract the unstructured data file body from the source system data server, and pair it with the XML file to generate a correspondence between the XML file and the unstructured data file body; the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field-matching rules to obtain the attribute and category information corresponding to the XML file, and, according to that information, to store the correspondence between the paired XML file and unstructured data file body under the corresponding category and assign the corresponding attributes; and the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body. These technical means achieve the following technical effects: only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system; when the data table structure of either the source system or the target system changes, only the field-matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload; source system data extraction and target system data import are split into two independent steps that exchange data through standardized XML files, which achieves a higher degree of system decoupling; and the extraction results are stored as XML files paired with the corresponding unstructured data, so that if a data import error occurs, it can be conveniently investigated and traced back using the retained intermediate results.
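As a rough illustration of how the paired results could be retained for later troubleshooting, the Python sketch below writes an XML file and its file body side by side under a category folder. The directory layout, the file extensions, and the JSON sidecar for attributes are assumptions of this sketch; the patent does not prescribe a concrete on-disk format.

```python
import json
import tempfile
from pathlib import Path


def store_file_pair(root_dir, category, name, xml_text, body, attributes):
    """Store the XML file and the file body side by side under a category."""
    target = Path(root_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    # The XML file and the file body share a base name, forming the pair.
    (target / f"{name}.xml").write_text(xml_text, encoding="utf-8")
    (target / f"{name}.bin").write_bytes(body)
    # Attributes are kept in a sidecar file (an assumption of this sketch).
    (target / f"{name}.attrs.json").write_text(
        json.dumps(attributes, ensure_ascii=False), encoding="utf-8"
    )
    return target


with tempfile.TemporaryDirectory() as root:
    folder = store_file_pair(
        root, "contract", "rec-17", "<record/>", b"%PDF-", {"name": "Lease agreement"}
    )
    print(sorted(p.name for p in folder.iterdir()))
```

Because the XML file stays next to the file body it describes, a failed import can be inspected by opening the pair directly, without querying the source system again.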
An application example is described in detail below:
To address the deficiencies of the prior art, the application example of the present invention performs data extraction from each source system (source system data server) and data writing to the target system (unstructured data server) as two independent steps. In the application example, a single data extraction module is provided for all source systems (it resides in the XML generator server and is hereinafter called the XML generator). This module reads all characteristic data of a single record in the source database in one pass and generates an XML document according to established rules (one unique document per record). XML (Extensible Markup Language) is a markup language that gives electronic documents structure; it can be used to mark up data and define data types, and it is a source language that allows users to define their own markup languages. A single XML parser is also provided (it resides in the XML parser server and is hereinafter called the XML parser); it parses the XML documents generated for each source system and writes the parsing results into the target system database. Fig. 3 is a schematic diagram of the system structure of the application example of the present invention.
As shown in Fig. 4, the operating flow of the system of Fig. 3 in the application example of the present invention includes:
401. Start;
402. Read a record from the source database;
403. Identify all structured field information related to the single record in the target system database, and generate an XML file of the characteristic fields related to the source record, wherein each data field of the single record becomes one node of the XML file, and, if a field of the record refers to a record in another table, the referenced record is added as a child node of the node for that field;
404. Determine whether the unstructured data file body is stored in a table; if yes, go to 405; if no, go to 406;
405. If the unstructured data file body of the source system is stored directly in a data table as a large field, deserialize the file body field;
406. If the unstructured data file body is not stored in a table, read the file storage path;
407. Read the unstructured data file body according to the path;
408. Pair the XML file with the extracted unstructured data file body, and use the pair as the input of the target system data import module (that is, the XML parser in Fig. 3);
409. The data import module of the target system analyzes the input XML file, extracts the required characteristic fields according to the field-matching rule configuration file as the attribute and category information of the unstructured document, and accordingly stores the corresponding unstructured data under the corresponding category and writes the specific attributes;
410. Write the unstructured data to the target system: the XML file and the unstructured data file body are stored in the form of a file pair, so that if a data import error occurs, it can be conveniently investigated and traced back using the retained intermediate results;
411. End.
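Steps 404 to 408 of the flow above can be sketched in Python as follows. The column names `body_blob` and `body_path`, the `FilePair` type, and the default deserializer are hypothetical stand-ins chosen for illustration; a real source system would supply its own schema and its own way of decoding a large field.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FilePair:
    xml_text: str  # XML generated for the record
    body: bytes    # unstructured data file body


def extract_body(record, deserialize=bytes):
    """Steps 404-407: fetch the file body from a large field or from a path."""
    blob = record.get("body_blob")      # hypothetical column name
    if blob is not None:
        return deserialize(blob)        # step 405: body stored in the table
    path = Path(record["body_path"])    # step 406: hypothetical column name
    return path.read_bytes()            # step 407: read the body from the path


def pair(xml_text, record):
    """Step 408: pair the XML file with the extracted file body."""
    return FilePair(xml_text=xml_text, body=extract_body(record))


with tempfile.TemporaryDirectory() as tmp:
    stored = Path(tmp) / "scan.bin"
    stored.write_bytes(b"file body on disk")
    in_table = pair("<record/>", {"body_blob": b"file body in a large field"})
    on_disk = pair("<record/>", {"body_path": str(stored)})
    print(len(in_table.body), len(on_disk.body))
```

Both branches end in the same `FilePair`, so the import side does not need to know where the file body originally lived.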
Compared with existing mainstream technical solutions, the application example of the present invention improves on them in the following respects. Only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system. (Note that the XML generator and the XML parser may be physically deployed on two separate servers or on the same server; in addition, an independent XML generator may be designed and developed for each source system to extract its data separately, which likewise achieves the file extraction purpose of the application example.) When the data table structure of either the source system or the target system changes, only the field-matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload. Source system data extraction and target system data import are split into two independent steps that exchange data through standardized XML files, which achieves a higher degree of system decoupling. The extraction results are stored as file pairs of an XML file and an unstructured data file body, so that if a data import error occurs, it can be conveniently investigated and traced back using the retained intermediate results.
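The point that a table structure change only touches the field-matching rules can be illustrated with a small Python sketch. The rule format, the field names, and the `record`/`field` XML layout are assumptions made for this example; the patent does not define the syntax of the configuration file.

```python
import xml.etree.ElementTree as ET

# Hypothetical field-matching rules. In practice these would live in a
# configuration file so they can change without touching the parser code.
RULES = {
    "category_field": "doc_type",
    "attributes": {"title": "name", "created": "created_at"},  # source -> target
}


def parse_record(xml_text, rules):
    """Return (category, attributes) for one XML record using the rules."""
    root = ET.fromstring(xml_text)
    values = {f.get("name"): (f.text or "") for f in root.findall("field")}
    category = values.get(rules["category_field"], "uncategorized")
    attributes = {
        target: values[source]
        for source, target in rules["attributes"].items()
        if source in values
    }
    return category, attributes


sample = (
    '<record><field name="doc_type">contract</field>'
    '<field name="title">Lease agreement</field></record>'
)
print(parse_record(sample, RULES))
```

If a source column is renamed, only the corresponding key in `RULES` changes; `parse_record` itself stays the same, which is the kind of decoupling the text describes.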
Those skilled in the art will also appreciate that the various illustrative logical blocks, units, and steps listed in the embodiments of the present invention may be implemented by electronic hardware, computer software, or a combination of both. To clearly show the interchangeability of hardware and software, the various illustrative components, units, and steps above have been described generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the specific application and the design requirements of the overall system. Those skilled in the art may use various methods to implement the described functions for each specific application, but such implementations should not be understood as going beyond the scope of protection of the embodiments of the present invention.
The various illustrative logical blocks, units, or servers described in the embodiments of the present invention may implement or perform the described functions by means of a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of the method or algorithm described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be stored in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be connected to the processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a user terminal. Alternatively, the processor and the storage medium may be located in different components of the user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium, or transmitted over it, as one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another. A storage medium may be any available medium that a general-purpose or special-purpose computer can access. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures, or in another form readable by a general-purpose or special-purpose computer or processor. In addition, any connection may properly be termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then these are also included in the definition of computer-readable medium. Disk and disc, as used here, include compact disc, laser disc, optical disc, DVD, floppy disk, and Blu-ray disc; disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included within computer-readable media.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. An unstructured data storage system, characterized in that the unstructured data storage system comprises a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein:
the source system data server is configured to store enterprise business system characteristic data;
the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record characteristics of the source system data server, extract the unstructured data file body from the source system data server, and pair it with the XML file to generate a correspondence between the XML file and the unstructured data file body;
the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field-matching rules to obtain the attribute and category information corresponding to the XML file, and, according to the attribute and category information corresponding to the XML file, to store the correspondence between the paired XML file and unstructured data file body under the corresponding category and assign the corresponding attributes;
the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body.
2. The unstructured data storage system according to claim 1, characterized in that
the XML generator server extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the unstructured data file body according to its storage location.
3. The unstructured data storage system according to claim 2, characterized in that
the XML generator server extracting the unstructured data file body according to its storage location further includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
4. The unstructured data storage system according to claim 1, characterized in that
the unstructured data server stores the correspondence between the paired XML file and unstructured data file body in the form of a file pair.
5. The unstructured data storage system according to claim 1, characterized in that,
in the XML file generated by the XML generator server according to the record characteristics of the source system data server, each data field of a single record is one node of the XML file, and, if a data field of the record refers to a record in another table, the referenced record of that data field is a child node of the node for that field.
6. An unstructured data storage method, characterized in that the method is applied to an enterprise business characteristic data storage system, the system comprising a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system characteristic data; wherein the method comprises:
generating, by the XML generator server, an XML file according to the record characteristics of the source system data server, extracting the unstructured data file body from the source system data server, and pairing it with the XML file to generate a correspondence between the XML file and the unstructured data file body;
parsing, by the XML parser server, the XML file according to field-matching rules to obtain the attribute and category information corresponding to the XML file;
storing, according to the attribute and category information corresponding to the XML file, the correspondence between the paired XML file and unstructured data file body under the corresponding category in the unstructured data server, and assigning the corresponding attributes.
7. The unstructured data storage method according to claim 6, characterized in that extracting the unstructured data file body from the source system data server includes:
searching the source system data server to determine the storage location of the unstructured data file body;
extracting the unstructured data file body according to its storage location.
8. The unstructured data storage method according to claim 7, characterized in that extracting the unstructured data file body according to its storage location includes:
if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
9. The unstructured data storage method according to claim 6, characterized in that storing the correspondence between the paired XML file and unstructured data file body under the corresponding category in the unstructured data server and assigning the corresponding attributes includes:
storing, in the form of a file pair, the correspondence between the paired XML file and unstructured data file body under the corresponding category in the unstructured data server, and assigning the corresponding attributes.
10. The unstructured data storage method according to claim 6, characterized in that generating, by the XML generator server, the XML file according to the record characteristics of the source system data server includes:
generating, by the XML generator server, the XML file according to the record characteristics of the source system data server, wherein each data field of a single record in the XML file is one node of the XML file, and, if a data field of the record refers to a record in another table, the referenced record of that data field is a child node of the node for that field.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310118763.1A CN104102652B (en) | 2013-04-08 | 2013-04-08 | Unstructured data storage system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310118763.1A CN104102652B (en) | 2013-04-08 | 2013-04-08 | Unstructured data storage system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104102652A CN104102652A (en) | 2014-10-15 |
CN104102652B true CN104102652B (en) | 2017-05-24 |
Family
ID=51670811
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310118763.1A Active CN104102652B (en) | 2013-04-08 | 2013-04-08 | Unstructured data storage system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104102652B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105630903B (en) * | 2015-12-21 | 2020-02-21 | 中国电子科技集团公司第十五研究所 | Method and device for rapidly storing mass data |
CN106055702B (en) * | 2016-06-22 | 2019-12-20 | 西安邮电大学 | Internet-oriented data service unified description method |
CN106547915B (en) * | 2016-11-29 | 2019-10-29 | 上海轻维软件有限公司 | Intelligent data extracting method based on model library |
CN106649863A (en) * | 2016-12-30 | 2017-05-10 | 天津市测绘院 | Non-structured data management method and apparatus |
CN109947705B (en) * | 2017-11-28 | 2021-03-16 | 中国石油化工股份有限公司 | System and method for accessing petroleum engineering data |
CN108470040B (en) * | 2018-02-11 | 2021-03-09 | 中国石油天然气股份有限公司 | Method and device for warehousing unstructured data |
CN108829767A (en) * | 2018-05-29 | 2018-11-16 | 吉贝克信息技术(北京)有限公司 | Data exchange system and its method, apparatus and computer storage medium |
CN109144950B (en) * | 2018-07-20 | 2022-02-15 | 中国邮政储蓄银行股份有限公司 | Service data storage method and device |
CN109805921B (en) * | 2018-12-18 | 2022-03-25 | 深圳小辣椒科技有限责任公司 | Electrocardio data cross-platform sampling method and electrocardio monitoring system |
CN109657184B (en) * | 2018-12-19 | 2020-05-05 | 北京创鑫旅程网络技术有限公司 | Rich text processing method, rich text processing device, server and computer readable medium |
CN111723245B (en) * | 2019-03-18 | 2024-04-26 | 阿里巴巴集团控股有限公司 | Method for establishing association relation of different types of storage objects in data storage system |
CN110765111B (en) * | 2019-10-28 | 2023-03-31 | 深圳市商汤科技有限公司 | Storage and reading method and device, electronic equipment and storage medium |
CN111563065B (en) * | 2020-07-09 | 2020-12-11 | 北京联想协同科技有限公司 | Document storage method and device and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477532A (en) * | 2008-12-23 | 2009-07-08 | 北京畅游天下网络技术有限公司 | Method, apparatus and system for implementing data storage and access |
CN102156699A (en) * | 2010-02-11 | 2011-08-17 | 陈巍 | Data migration method based on JDOM revolving technology |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7650512B2 (en) * | 2003-11-18 | 2010-01-19 | Oracle International Corporation | Method of and system for searching unstructured data stored in a database |
US20090187581A1 (en) * | 2008-01-22 | 2009-07-23 | Vincent Delisle | Consolidation and association of structured and unstructured data on a computer file system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104102652B (en) | Unstructured data storage system and method | |
CN104516633B (en) | A kind of user interface element management method and device | |
CN104598376B (en) | The layering automatization test system and method for a kind of data-driven | |
CN102364894B (en) | Issuing method for configuration data file and network management equipment | |
Groth et al. | PROV-overview | |
US8412735B2 (en) | Data quality enhancement for smart grid applications | |
CN112396404A (en) | Data center system | |
US9262248B2 (en) | Log configuration of distributed applications | |
CN101320373B (en) | Safety search engine system of website database | |
CN104933101B (en) | A kind of configuration audit information method for automatically counting based on SVN | |
CN104123227A (en) | Method for automatically generating testing cases | |
An et al. | Methodology for automatic ontology generation using database schema information | |
CN102609520B (en) | Method for exporting model data of substation by filtering | |
CN109508355A (en) | A kind of data pick-up method, system and terminal device | |
CN103473672A (en) | System, method and platform for auditing metadata quality of enterprise-level data center | |
CN101957816A (en) | Webpage metadata automatic extraction method and system based on multi-page comparison | |
CN102542513A (en) | Body-based verification tool of power grid public information model and method thereof | |
US8250041B2 (en) | Method and apparatus for propagation of file plans from enterprise retention management applications to records management systems | |
CN109683911A (en) | A kind of system for realizing automation application deployment and impact analysis | |
CN102932195A (en) | Networking protocol analysis-based business analysis monitoring method and system | |
CN109857875A (en) | A kind of electronic record group volume method and system | |
CN106802905A (en) | A kind of synergistic data exchange method of isomorphism PLM system | |
US20130204875A1 (en) | Automatic Configuration Of A Product Data Management System | |
CN106156060B (en) | Tag control system and terminal, label application method and label method for sorting | |
CN115617776A (en) | Data management system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||