CN104102652B - Unstructured data storage system and method - Google Patents

Unstructured data storage system and method

Info

Publication number
CN104102652B
Authority
CN
China
Prior art keywords
unstructured data
xml
server
file
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310118763.1A
Other languages
Chinese (zh)
Other versions
CN104102652A (en)
Inventor
徐小天
王刚
陈威
石磊
陈乐然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
North China Electric Power Research Institute Co Ltd
Original Assignee
State Grid Corp of China SGCC
North China Electric Power Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, North China Electric Power Research Institute Co Ltd
Priority to CN201310118763.1A
Publication of CN104102652A
Application granted
Publication of CN104102652B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an unstructured data storage system and method. In the method, a source system data server stores enterprise business system feature data; an XML (Extensible Markup Language) generator server generates XML files from the record features of the source system data server, extracts the unstructured data file bodies from the source system data server, and pairs them with the XML files, generating a correspondence between each XML file and its unstructured data file body; an XML parser server parses the XML files according to field matching rules to obtain the attributes and classification information corresponding to each XML file and, according to that attribute and classification information, stores the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assigns the corresponding attributes; and an unstructured data server stores the correspondence between the paired XML files and unstructured data file bodies. With the system and method, data in source systems of all types can be imported into the unstructured data storage system according to defined business rules.

Description

Unstructured data storage system and method
Technical field
The present invention relates to enterprise informatization technology, and in particular to an unstructured data storage system and method.
Background art
BPM (Business Process Management, a comprehensive management approach aimed at integrating the various business links of an enterprise, usually implemented over a network to achieve information transfer, data synchronization, business monitoring, and the continuous upgrading and optimization of enterprise business processes) is an important technology for raising the informatization level of modern enterprises. Defining business formally with a unified process description specification makes it convenient to integrate and rebuild enterprise information systems and to achieve a clear separation of business concerns in the informatization system. Implementing BPM processes often involves data interaction between multiple business subsystems: interdependent systems may use different data storage and transmission specifications for their business data, which creates a considerable obstacle to data interaction between systems. This is most common in interaction between legacy systems, and between legacy systems and newly developed systems. To solve this kind of problem, a dedicated data read/write system usually has to be developed for the data interfaces between systems in order to achieve normal data interaction.
Enterprises in the power industry commonly deploy systems such as ERP (Enterprise Resource Planning, an enterprise management software suite that integrates material resource management, human resource management, financial resource management, and information resource management, and an important component of mainstream modern enterprise informatization solutions) and power MIS (Management Information System, a human-led, integrated human-machine system that uses computer hardware and software, network communication equipment, and other office equipment to collect, transmit, process, store, update, and maintain information, with the aim of strengthening the enterprise's strategic competitiveness and improving benefit and efficiency, and that supports high-level decision-making, middle-level control, and grass-roots operation). ERP is usually used to manage enterprise finance, assets, and operations, while power MIS is used to manage production tasks such as the "two tickets" (work tickets and operation tickets), equipment, and maintenance. A fairly mature product line for these systems has formed in the domestic market, and most solutions store business data in a structured way, that is, in multiple two-dimensional tables of a database. Unstructured data in the business data is defined relative to structured data: structured data is row data that is stored in a database and can be expressed logically with a two-dimensional table structure, while data that cannot be represented by a two-dimensional logical table of a database is called unstructured data. It mainly comprises computer files in various formats, including large text, pictures, audio, and video. Such data is mainly stored in one of two ways. In the first, the unstructured data itself is stored as a binary string directly in a field of a database table record. In the second, the database table stores a URL (Uniform Resource Locator) pointing to the storage path of the unstructured data, and the unstructured data itself is stored in a separate file system.
In power enterprises, the unstructured documents in the above systems mainly include design documents for all kinds of equipment, contracts and supporting documents, technical reports and test reports, on-site audio and video recordings, and the like. They are usually organized in the system workflow as attachments. In general, these attachments cannot be searched directly, nor can they be indexed by category or attribute; the relevant information can only be obtained indirectly by searching the associated business process. To keep track of this production-related unstructured data, a power enterprise needs to build a data storage system dedicated to storing and managing unstructured data, which classifies and indexes the unstructured data along different attribute dimensions (for example time, device type, manufacturer, and importance level) so that it can be searched and managed from different angles.
Against this background, the problem to be solved by those skilled in the art is how to automatically extract the unstructured data in the original business process and information management systems together with its associated structured attributes, and how to establish the correspondence between the processes and data records in the original system and the unstructured documents in the unstructured data storage system.
In the prior art, no general technical specification has yet been established for extracting structured data from business process systems. The more mainstream approach at present is to develop a standalone data read/write module that builds a data read/write channel between a single source system and the target system. This kind of solution generally requires the following steps. First, determine the classification and attribute information the target system needs for unstructured data storage, and sort out the list of fields the corresponding source system should provide. Next, inspect the database to determine where the unstructured data body is stored: if it is stored directly in a large field, deserialize that field; otherwise read the data body according to its storage path. Then develop an adaptation tool for the specific source system, configure the source system database parameters in the tool, and read from the source system database both the unstructured data and the corresponding feature fields to be extracted. Finally, the adaptation tool calls the target system interface, writes the feature data extracted from the source system into the target system database as the attribute/classification information of the corresponding unstructured document according to the matching rules, and writes the unstructured data into the target system according to that attribute/classification information.
The main drawbacks of the above solution are as follows. High development cost: a separate system adaptation tool must be developed for each source system so that the source system feature data matches the attribute/classification fields of the target system (the unstructured data storage system). High coupling: in this scheme, extracting source system data and writing data to the target system are done by the same adapter, with no reasonable functional separation. Whether the source system's data storage structure changes or the attributes and classifications used by the target system are adjusted, the adaptation tool has to be redeveloped. In particular, when there are multiple source systems, an adjustment to the target system forces the adaptation tools of all source systems to be redeveloped to accommodate the adjusted attributes associated with the unstructured data. High error-correction difficulty: because each adapter reads the source system data format directly and the extraction process generates no intermediate files, tracing an error still requires reading and writing the source system database and redoing the work from the data extraction step, so correction is costly.
In summary, how to design a method that automatically extracts the production business feature data of power enterprises and imports the data in source systems of all types into the unstructured data storage system according to defined business rules is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
Embodiments of the present invention provide an unstructured data storage system and method that import the data in source systems of all types into the unstructured data storage system according to defined business rules.
In one aspect, an embodiment of the present invention provides an unstructured data storage system. The enterprise business feature data storage system includes: a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein:
The source system data server is configured to store enterprise business system feature data;
The XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, to extract the unstructured data file body from the source system data server and pair it with the XML file, and to generate the correspondence between the XML file and the unstructured data file body;
The XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file, and, according to that attribute and classification information, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes;
The unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body.
Optionally, in an embodiment of the present invention, the XML generator server extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.
Optionally, in an embodiment of the present invention, the XML generator server extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
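The two extraction branches just described can be illustrated with a minimal Python sketch. It is not the patented implementation: the column names `attachment_blob` and `attachment_path` are hypothetical, and "deserializing" the large field is assumed to be a plain conversion to bytes, whereas a real source system may use its own encoding.

```python
import tempfile
from pathlib import Path

# Hypothetical column names; a real source system would define its own.
BLOB_FIELD = "attachment_blob"
PATH_FIELD = "attachment_path"


def extract_file_body(record: dict) -> bytes:
    """Return the bytes of the unstructured file attached to one source record."""
    blob = record.get(BLOB_FIELD)
    if blob is not None:
        # Case 1: the body is stored in the table itself as a large field.
        # Deserialization is assumed here to be a plain conversion to bytes.
        return bytes(blob)
    # Case 2: the table only stores a path; read the body from the file system.
    return Path(record[PATH_FIELD]).read_bytes()


# Demo with one record of each kind.
from_blob = extract_file_body({BLOB_FIELD: bytearray(b"%PDF-1.4 demo")})

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"
    report.write_bytes(b"inspection report")
    from_path = extract_file_body({BLOB_FIELD: None, PATH_FIELD: str(report)})
```

Either branch yields the same kind of result, the raw file body, so the later pairing step does not need to know how the source system stored it.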
Optionally, in an embodiment of the present invention, the unstructured data server stores the correspondence between the paired XML file and unstructured data file body in the form of a file pair.
Optionally, in an embodiment of the present invention, in the XML file that the XML generator server generates according to the record features of the source system data server, each data field of a single record serves as one node of the XML file; if a field of the record references a record in another table, the record referenced by that field serves as a child node of the current field node.
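As a hedged illustration of this node layout, the sketch below builds an XML document from one record using Python's standard `xml.etree.ElementTree`. The record contents and tag names are invented for the example, and a referenced record is modeled as a nested dictionary, which is an assumption of this sketch rather than something the text specifies.

```python
import xml.etree.ElementTree as ET


def add_fields(parent: ET.Element, record: dict) -> None:
    """Add one node per field; a referenced record becomes child nodes."""
    for field, value in record.items():
        node = ET.SubElement(parent, field)
        if isinstance(value, dict):
            # The field references a record in another table:
            # nest that record under the current field node.
            add_fields(node, value)
        else:
            node.text = str(value)


def record_to_xml(record: dict, root_tag: str = "record") -> str:
    root = ET.Element(root_tag)
    add_fields(root, record)
    return ET.tostring(root, encoding="unicode")


# Hypothetical source record; "device" points at a row in another table.
record = {
    "doc_id": 1001,
    "title": "Transformer inspection report",
    "device": {"device_type": "transformer", "manufacturer": "ACME"},
}
xml_text = record_to_xml(record)
```

Nesting the referenced record keeps everything needed to classify the document inside a single XML file, so the parser does not have to query the source database again.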
In another aspect, an embodiment of the present invention provides an unstructured data storage method. The method is applied to an enterprise business feature data storage system that includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system feature data. The method includes:
Generating, by the XML generator server, an XML file according to the record features of the source system data server, extracting the unstructured data file body from the source system data server, pairing it with the XML file, and generating the correspondence between the XML file and the unstructured data file body;
Parsing, by the XML parser server, the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file;
According to the attributes and classification information corresponding to the XML file, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
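The parsing step of the method can be sketched as follows. The format of the field matching rules is not given in the text, so this example assumes a simple mapping from target attribute or classification names to element paths inside the XML file; the rule contents and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical field matching rules: target name -> path inside the XML file.
FIELD_MATCHING_RULES = {
    "classification": {"device_type": "device/device_type"},
    "attributes": {"title": "title", "manufacturer": "device/manufacturer"},
}


def parse_record_xml(xml_text: str, rules: dict):
    """Return (attributes, classification) extracted according to the rules."""
    root = ET.fromstring(xml_text)

    def pick(mapping: dict) -> dict:
        # findtext returns None when the path matches nothing.
        return {name: root.findtext(path) for name, path in mapping.items()}

    return pick(rules["attributes"]), pick(rules["classification"])


sample_xml = (
    "<record><title>Inspection report</title>"
    "<device><device_type>transformer</device_type>"
    "<manufacturer>ACME</manufacturer></device></record>"
)
attributes, classification = parse_record_xml(sample_xml, FIELD_MATCHING_RULES)
```

Because the rules are plain data, supporting a changed table structure in this sketch means editing `FIELD_MATCHING_RULES` and leaving `parse_record_xml` untouched.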
Optionally, in an embodiment of the present invention, extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.
Optionally, in an embodiment of the present invention, extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
Optionally, in an embodiment of the present invention, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes includes: storing that correspondence, in the form of a file pair, under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
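One possible reading of file-pair storage is sketched below, under stated assumptions: the classification is modeled as a directory, the two files of a pair share a base name, and the assigned attributes are kept in an in-memory dictionary. None of these choices, nor the `.bin` suffix, comes from the text.

```python
import tempfile
from pathlib import Path


def store_file_pair(store_root, category, name, xml_text, body):
    """Write <name>.xml and <name>.bin side by side under the category folder."""
    target = Path(store_root) / category
    target.mkdir(parents=True, exist_ok=True)
    xml_path = target / f"{name}.xml"
    body_path = target / f"{name}.bin"
    xml_path.write_text(xml_text, encoding="utf-8")
    body_path.write_bytes(body)
    return xml_path, body_path


# Hypothetical attribute index: pair name -> assigned attributes.
attribute_index = {}

with tempfile.TemporaryDirectory() as root:
    xml_path, body_path = store_file_pair(
        root,
        "transformer",
        "doc-1001",
        "<record><title>Inspection report</title></record>",
        b"file body bytes",
    )
    attribute_index[xml_path.stem] = {"manufacturer": "ACME"}
    stored_names = sorted(p.name for p in xml_path.parent.iterdir())
    stored_category = xml_path.parent.name
    stored_body = body_path.read_bytes()
```

Keeping the XML file next to the file body is what later allows an import error to be traced from the stored pair alone.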
Optionally, in an embodiment of the present invention, generating, by the XML generator server, an XML file according to the record features of the source system data server includes: generating the XML file such that each data field of a single record serves as one node of the XML file and, if a field of the record references a record in another table, the record referenced by that field serves as a child node of the current field node.
The above technical solution has the following beneficial effects. The enterprise business feature data storage system includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server is configured to store enterprise business system feature data; the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, to extract the unstructured data file body from the source system data server and pair it with the XML file, and to generate the correspondence between the XML file and the unstructured data file body; the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the corresponding attributes and classification information and, according to them, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes; and the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body. With these technical means, the following technical effects are achieved: only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system; whenever the data table structure of a source system or of the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload; source system data extraction and target system data import are split into two independent steps that exchange data through standardized XML files, achieving a higher degree of system decoupling; and the result of data extraction is stored as an XML file paired with the corresponding unstructured data, so that if a data import error occurs it can be conveniently investigated and traced back using the retained intermediate results.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an unstructured data storage system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an unstructured data storage method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the system in an application example of the present invention;
Fig. 4 is a schematic flowchart of the operating mechanism of the system in Fig. 3 of the application example of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic structural diagram of an unstructured data storage system according to an embodiment of the present invention. As shown in Fig. 1, the enterprise business feature data storage system includes: a source system data server 11, an XML generator server 12, an XML parser server 13, and an unstructured data server 14, wherein:
The source system data server 11 is configured to store enterprise business system feature data;
The XML generator server 12 is coupled to the source system data server 11 and is configured to generate an XML file according to the record features of the source system data server, to extract the unstructured data file body from the source system data server and pair it with the XML file, and to generate the correspondence between the XML file and the unstructured data file body;
The XML parser server 13 is coupled to the XML generator server 12 and is configured to parse the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file, and, according to that attribute and classification information, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes;
The unstructured data server 14 is coupled to the XML parser server 13 and is configured to store the correspondence between the paired XML file and unstructured data file body.
Optionally, the XML generator server 12 extracting the unstructured data file body from the source system data server 11 includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.
Optionally, the XML generator server 12 extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
Optionally, the unstructured data server 14 stores the correspondence between the paired XML file and unstructured data file body in the form of a file pair.
Optionally, in the XML file that the XML generator server 12 generates according to the record features of the source system data server 11, each data field of a single record serves as one node of the XML file; if a field of the record references a record in another table, the record referenced by that field serves as a child node of the current field node.
Corresponding to the embodiment above, Fig. 2 shows a flowchart of an unstructured data storage method according to an embodiment of the present invention. The method is applied to an enterprise business feature data storage system that includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system feature data. The method includes:
201. Generating, by the XML generator server, an XML file according to the record features of the source system data server, extracting the unstructured data file body from the source system data server, pairing it with the XML file, and generating the correspondence between the XML file and the unstructured data file body;
202. Parsing, by the XML parser server, the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file;
203. According to the attributes and classification information corresponding to the XML file, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
Optionally, extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.
Optionally, extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
Optionally, storing the correspondence between the paired XML file and unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes includes: storing that correspondence, in the form of a file pair, under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
Optionally, generating, by the XML generator server, an XML file according to the record features of the source system data server includes: generating the XML file such that each data field of a single record serves as one node of the XML file and, if a field of the record references a record in another table, the record referenced by that field serves as a child node of the current field node.
The above technical solution of the embodiments of the present invention has the following beneficial effects. The enterprise business feature data storage system includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server is configured to store enterprise business system feature data; the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, to extract the unstructured data file body from the source system data server and pair it with the XML file, and to generate the correspondence between the XML file and the unstructured data file body; the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the corresponding attributes and classification information and, according to them, to store the correspondence between the paired XML file and unstructured data file body under the corresponding classification and assign the corresponding attributes; and the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and unstructured data file body. With these technical means, the following technical effects are achieved: only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system; whenever the data table structure of a source system or of the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload; source system data extraction and target system data import are split into two independent steps that exchange data through standardized XML files, achieving a higher degree of system decoupling; and the result of data extraction is stored as an XML file paired with the corresponding unstructured data, so that if a data import error occurs it can be conveniently investigated and traced back using the retained intermediate results.
Application example is below lifted to be described in detail:
For the deficiency of prior art, application example scheme of the present invention is by each origin system(Source system data server) Data pick-up and goal systems(Unstructured data server)Data write-in completed as two independent steps.This hair It is a data extraction module of all of origin system setting in bright application example(It is arranged in XML generator server, with Call XML generator in the following text), all characteristics disposably read during the module records source database wall scroll, according to established rule Generation(It is that every record generation is unique)XML(Extensible Markup Language, you can extending mark language, it It is a kind of for marking e-file to make it have structural markup language, can be used to flag data, defines data type, It is a kind of original language for allowing user to be defined the markup language of oneself)Document;Single XML parser is set(It is arranged at In XML parser server, hereinafter referred to as XML parser), the XML document to the generation of each origin system is parsed, and parsing is tied In fruit write-in goal systems database, as shown in figure 3, being application example system structure diagram of the present invention.
As shown in Figure 4, the operating flow of the system of Figure 3 in the application example of the present invention includes:
401. Start;
402. Read a record from the source database;
403. Identify all structured field information in the target system database that is related to the single record, and generate an XML file of the feature fields related to the source record; each data field of the single record becomes one node of the XML file, and if a field of the record references a record in another table, the referenced record becomes a child node of the current field's node;
404. Determine whether the unstructured data file body is stored in the table. If so, go to 405; otherwise, go to 406;
405. If the unstructured data file body of the source system is stored directly in the data table as a large field, deserialize the file body field;
406. If the unstructured data file body is not stored in the table, read the file storage path;
407. Read the unstructured data file body according to the path;
408. Pair the XML file with the extracted unstructured data file body, and use the pair as the input of the target system's data import module (that is, the XML parser in Figure 3);
409. The data import module of the target system analyzes the input XML file, extracts the feature fields to be used according to the field matching rule configuration file, takes them as the attributes and classification information of the unstructured document, and accordingly stores the corresponding unstructured data under the corresponding classification and writes the specified attributes;
410. Write the unstructured data to the target system. The XML file and the unstructured data file body are stored as a file pair, so that if a data import error occurs, it can be conveniently investigated and traced using the retained intermediate results;
411. End.
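The branch in steps 404 to 407 and the paired storage in steps 408 and 410 can be sketched as follows. This is a minimal illustration under stated assumptions: the record keys `body_blob` and `body_path`, the `.xml`/`.bin` naming of the file pair, and the treatment of "deserialization" as a plain conversion to bytes are all hypothetical, since the actual large-field encoding depends on the source database.

```python
from pathlib import Path


def extract_body(record):
    """Return the raw bytes of the unstructured file body for one record.

    Steps 404-405: if the body is stored in the table as a large field
    (hypothetical key "body_blob"), convert that field to bytes.
    Steps 406-407: otherwise read the file found at the stored path
    (hypothetical key "body_path").
    """
    blob = record.get("body_blob")
    if blob is not None:
        return bytes(blob)
    return Path(record["body_path"]).read_bytes()


def write_file_pair(out_dir, key, xml_text, body):
    """Store the XML file and the file body side by side under one key.

    Sharing the file stem keeps the correspondence between the two files,
    so the intermediate result can be inspected after an import error.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    xml_path = out / f"{key}.xml"
    body_path = out / f"{key}.bin"
    xml_path.write_text(xml_text, encoding="utf-8")
    body_path.write_bytes(body)
    return xml_path, body_path
```

Keeping the pair on disk between extraction and import is what allows the two steps to run independently: the import module only needs the directory of pairs, not access to the source database.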
Compared with existing mainstream technical solutions, the solution of the application example of the present invention improves on the following aspects. Only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system. (It should be noted that the XML generator and the XML parser may be physically implemented in two separate servers or in the same server; in addition, an independent XML generator may be designed and developed for each source system to perform data extraction separately, which equally achieves the file extraction purpose of this application example.) Whenever the data table structure of a source system or of the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload. Source system data extraction and target system data import are separated into two independent steps that exchange data through standardized XML files, achieving a higher degree of system decoupling. The extraction results are stored as file pairs, each consisting of an XML file and an unstructured data file body, so that if a data import error occurs, it can be conveniently investigated and traced using the retained intermediate results.
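The claim that a schema change only requires editing the field matching rule configuration can be made concrete with a small sketch of the parser side. The rule format (a mapping from target attribute name to the path of a source node), the attribute names, and the XML layout of the sample input are assumptions for illustration; the patent does not define the configuration file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: target attribute name -> path of the source node.
# In practice this mapping would be loaded from a configuration file.
FIELD_RULES = {
    "category": "doc_type",
    "title": "title",
    "owner": "author/record/name",
}


def apply_rules(xml_text, rules):
    """Extract the attributes and classification information named by the rules.

    A rule whose source node is missing yields None, so a mismatch between
    the rules and the XML file is visible in the result.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for target, path in rules.items():
        node = root.find(path)
        result[target] = node.text if node is not None else None
    return result


sample = (
    '<record table="documents">'
    "<doc_type>report</doc_type><title>Test report</title>"
    '<author><record table="staff"><name>Li</name></record></author>'
    "</record>"
)
attributes = apply_rules(sample, FIELD_RULES)
```

If a source table renames `doc_type`, only the corresponding entry in the rule mapping changes; `apply_rules` itself stays untouched, which is the reduction in development effort the paragraph above describes.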
Those skilled in the art will also appreciate that the various illustrative logical blocks, units, and steps listed in the embodiments of the present invention may be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the various illustrative components, units, and steps above have been described generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the specific application and the design requirements of the overall system. Those skilled in the art may use various methods to implement the described functions for each specific application, but such implementations should not be understood as going beyond the scope of protection of the embodiments of the present invention.
The various illustrative logical blocks, units, or servers described in the embodiments of the present invention may be implemented or operated by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of the above designed to perform the described functions. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of the method or algorithm described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be stored in RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be connected to the processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may be arranged in an ASIC, and the ASIC may be arranged in a user terminal. Alternatively, the processor and the storage medium may be arranged in different components of the user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium, or transmitted over a computer-readable medium, as one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures readable by a general-purpose or special-purpose computer, or by a general-purpose or special-purpose processor. In addition, any connection may properly be termed a computer-readable medium. For example, if software is transmitted from a website, a server, or another remote source over a coaxial cable, a fiber-optic cable, a twisted pair, a digital subscriber line (DSL), or wirelessly by means such as infrared, radio, or microwave, these are also included in the definition of computer-readable medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically and discs usually reproduce data optically with lasers. Combinations of the above may also be included in computer-readable media.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1. An unstructured data storage system, characterized in that the unstructured data storage system comprises a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein:
the source system data server is configured to store enterprise business system characteristic data;
the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record features of the source system data server, to extract the unstructured data file body from the source system data server, and to pair it with the XML file, generating a correspondence between the XML file and the unstructured data file body;
the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file, and, according to the attributes and classification information corresponding to the XML file, to store the paired correspondence between the XML file and the unstructured data file body under the corresponding classification and assign the corresponding attributes;
the unstructured data server is coupled to the XML parser server and is configured to store the paired correspondence between the XML file and the unstructured data file body.
2. The unstructured data storage system according to claim 1, characterized in that
the extraction, by the XML generator server, of the unstructured data file body from the source system data server comprises: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the unstructured data file body according to its storage location.
3. The unstructured data storage system according to claim 2, characterized in that
the extraction by the XML generator server according to the storage location of the unstructured data file body further comprises: if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to the storage path of the unstructured data file body.
4. The unstructured data storage system according to claim 1, characterized in that
the unstructured data server stores the paired correspondence between the XML file and the unstructured data file body in the form of a file pair.
5. The unstructured data storage system according to claim 1, characterized in that
in the XML file generated by the XML generator server according to the record features of the source system data server, each data field of a single record is one node of the XML file, and if a data field of the record references a record in another table, the record referenced by that data field is a child node of the current field's node.
6. An unstructured data storage method, characterized in that the method is applied to an enterprise business characteristic data storage system, the system comprising a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store enterprise business system characteristic data; wherein the method comprises:
generating, by the XML generator server, an XML file according to the record features of the source system data server, extracting the unstructured data file body from the source system data server, and pairing it with the XML file to generate a correspondence between the XML file and the unstructured data file body;
parsing, by the XML parser server, the XML file according to field matching rules to obtain the attributes and classification information corresponding to the XML file;
according to the attributes and classification information corresponding to the XML file, storing the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
7. The unstructured data storage method according to claim 6, characterized in that the extracting of the unstructured data file body from the source system data server comprises:
searching the source system data server to determine the storage location of the unstructured data file body;
extracting the unstructured data file body according to its storage location.
8. The unstructured data storage method according to claim 7, characterized in that the extracting according to the storage location of the unstructured data file body comprises:
if the unstructured data file body of the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to the storage path of the unstructured data file body.
9. The unstructured data storage method according to claim 6, characterized in that the storing of the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning of the corresponding attributes comprises:
storing, in the form of a file pair, the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
10. The unstructured data storage method according to claim 6, characterized in that the generating, by the XML generator server, of an XML file according to the record features of the source system data server comprises:
generating, by the XML generator server, the XML file according to the record features of the source system data server, wherein each data field of a single record in the XML file is one node of the XML file, and if a data field of the record references a record in another table, the record referenced by that data field is a child node of the current field's node.
CN201310118763.1A 2013-04-08 2013-04-08 Unstructured data storage system and method Active CN104102652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310118763.1A CN104102652B (en) 2013-04-08 2013-04-08 Unstructured data storage system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310118763.1A CN104102652B (en) 2013-04-08 2013-04-08 Unstructured data storage system and method

Publications (2)

Publication Number Publication Date
CN104102652A CN104102652A (en) 2014-10-15
CN104102652B true CN104102652B (en) 2017-05-24

Family

ID=51670811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310118763.1A Active CN104102652B (en) 2013-04-08 2013-04-08 Unstructured data storage system and method

Country Status (1)

Country Link
CN (1) CN104102652B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477532A (en) * 2008-12-23 2009-07-08 北京畅游天下网络技术有限公司 Method, apparatus and system for implementing data storage and access
CN102156699A (en) * 2010-02-11 2011-08-17 陈巍 Data migration method based on JDOM revolving technology

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7650512B2 (en) * 2003-11-18 2010-01-19 Oracle International Corporation Method of and system for searching unstructured data stored in a database
US20090187581A1 (en) * 2008-01-22 2009-07-23 Vincent Delisle Consolidation and association of structured and unstructured data on a computer file system

Also Published As

Publication number Publication date
CN104102652A (en) 2014-10-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant