CN103412917B

CN103412917B - Database system and management method for extensible collaborative management of multi-type domain data

Info

Publication number: CN103412917B
Application number: CN201310343157.XA
Authority: CN
Inventors: 陈宁江; 肖中正; 董世龙; 胡丹丹
Original assignee: Guangxi University
Current assignee: Nanning super cube science and Technology Co Ltd
Priority date: 2013-08-08
Filing date: 2013-08-08
Publication date: 2016-08-10
Anticipated expiration: 2033-08-08
Also published as: CN103412917A

Abstract

A database system and management method for extensible collaborative management of multi-type domain data, comprising a data resource ontology base module, a hierarchical domain database module, a network-type domain database module and a domain data evolution module, wherein the data resource ontology base module and the multi-type databases, namely the hierarchical domain database module and the network-type domain database module, together constitute the database set. The invention can build data repositories for a large number of service-oriented domains, build an extensible data resource ontology base system on that basis, rapidly extend hierarchical and network-type domain databases of different kinds, and extract new data objects from unstructured raw text data to build new domain data.

Description

Database system and management method for extensible collaborative management of multi-type domain data

Technical field

The present invention relates to a database system and management method for extensible collaborative management of multi-type domain data, and belongs to the fields of databases and artificial intelligence.

Background technology

A database is a warehouse that organizes, stores and manages data according to a data structure, and is the data processing system commonly used by an organization or an application domain. With the development of information technology and the market, data management is no longer merely the storage and management of data; it has turned into the various forms of data management that users require. Databases come in many types, from the simplest tables storing assorted data to large database systems capable of mass data storage, and are widely applied in every field. With the acceleration of informatization and the arrival of the "big data" era, enterprise data is increasingly massive, unstructured and complex. The combination of two computer technologies, artificial intelligence and databases, has driven the development of intelligent databases. An ordinary application program encodes problem-solving knowledge implicitly in the program, whereas a system based on an intelligent database expresses the problem-solving elements of the application explicitly and organizes them into a relatively independent program entity.

As informatization accelerates, enterprises pay increasing attention to managing massive, complex data, but in resource management they often encounter the following problems: massive enterprise data is hard to store and manage; search is slow and inefficient; domain data version management is chaotic; data security is not guaranteed; and the domain databases cannot be shared through effective collaboration. Managing massive, complex, unstructured data therefore calls for an intelligent database that is extensible, can evolve, and manages multiple types of domain data collaboratively, in order to store, process and analyze the data.

Summary of the invention

Technical problem solved by the invention: to solve the problems of efficiently storing, organizing, indexing and querying unstructured data that has multiple complex relations, by providing a database system and management method for extensible collaborative management of multi-type domain data.

Technical solution of the invention: a database system for extensible collaborative management of multi-type domain data, comprising a data resource ontology base module, a network-type domain database module, a hierarchical domain database module and a domain data evolution module, wherein:

The data resource ontology base module defines the context resource model, implements the logical view design and node storage structure design of primitives, provides the basic supporting capability for data storage and access, and establishes a database containing a large number of service data objects, relations and concepts. The data resource ontology base module provides context abstraction rules and data access rules for the network-type domain database module and the hierarchical domain database module;

The network-type domain database module builds, on the basis of the data resource ontology base and according to the attributes, relation network and other specific properties of data objects, a database based on network-type data attributes. It implements the data structure design, storage design and index design of network-type data objects, defines a relation network containing a large number of network-type data objects, and provides a network database access interface to the outside. The network-type domain database module inherits the properties of the data resource ontology base and instantiates them for network-type domain data; it provides a query interface based on network-type domain data to users, other modules and external systems;

The hierarchical domain database module builds, according to relation features between hierarchical data objects such as subordination, adjacency, intersection and same level, a dedicated database that represents data objects and their hierarchical membership information, and provides an access interface to this database to the outside. The hierarchical domain database module is a further evolution of the network-type domain database module: it stores and organizes domain data that has only a hierarchical structure in the form of a tree, implements hierarchical semantics, and provides a query interface based on hierarchical domain data to users, other modules and external systems;

The domain data evolution module tracks and controls changes, during use, to the domain data in the data resource ontology base, the network-type domain database and the hierarchical domain database, and establishes version histories of data objects. It analyzes raw data sets provided by users in combination with existing data to obtain new domain data, which is screened and then entered into the domain databases. The domain data evolution module provides record-based data version control for the three bases above, automatically discovers new domain data in the raw data entered by users, and uses the interfaces of those bases to manage the corresponding evolution.

The data resource ontology base module comprises a data persistence module, a bottom-level dictionary building module, a context definition module, a data index module and an interface module;

The data persistence module defines an interface-oriented implementation approach, so that the data persistence implementation can be configured flexibly according to the hardware environment, the context environment and other requirements. Based on object serialization, it defines the serialization and deserialization protocols of domain-data-related objects; during persistence, the file organization protocol outputs the binary stream obtained from object serialization to a file, a database or a network location. When an object that is not yet in the object buffer pool must be loaded, the corresponding data stream is read according to the logical address sent by the upper-layer request, and the object is then reconstructed through its deserialization protocol. Data is logically organized in the file as blocks, and the blocks are managed with a heap structure. The data persistence module of the data resource ontology base is also the persistence abstraction for the network-type and hierarchical domain databases: the data storage functions of the latter two modules are customized and extended from this persistence module according to different persistence protocols, forming persistence stores for specific data types;

The bottom-level dictionary building module stores data objects without extended attributes or relations, and establishes the basic domain data objects, the serialization protocol, the deserialization protocol and the storage manager. The single data form of the bottom-level dictionary provides the data basis for defining network-type and hierarchical domain data. The network-type and hierarchical domain database modules implement the serialization and deserialization interfaces defined here;

The context definition module, built on the bottom-level dictionary, creates new files that establish synonym, antonym and subordination relations for the entries of the bottom-level data object database. Relation definitions are highly abstracted and are organized, stored and managed in a general way, so that the network-type domain database can be extended flexibly on this basis;

The data index module first defines a digest for each domain data object and then, through a fast dual-encoding algorithm, maps the domain data digest to the logical storage information of the domain data object, achieving fast retrieval and access control. The network-type and hierarchical domain database modules both contain an index part, in which each keyword is a pair of long integers obtained by the dual-encoding calculation;

The interface module is implemented on the EJB 3.0 specification and published as EJB interfaces and Web Service interfaces, providing cross-platform services. The network-type and hierarchical domain database modules inherit the interface of the data resource ontology base module to implement customized interface publishing.

The network-type domain database module is implemented as follows:

(1) In storage design, based on the storage management layer of the data resource ontology base, define the domain data objects and persistence protocol related to network-type domain data;

(2) On the basis of the defined network-type domain data objects, build the storage layer and define the basic structure and processing of the attribute part. The attribute part is divided into two parts: attributes that exist when the database is designed, called base attributes, and user-defined attributes, called extended attributes;

(3) On top of the storage structure of network-type domain data objects, build the data index, implemented with a B-tree with a fast cache area and a Bloom filter. The B-tree is generated dynamically as network-type data objects are inserted, and its maximum number of levels is not limited. Network-type data objects with the same name are linked through attribute block pointers into an attribute block linked list, so that a query using the name as keyword quickly obtains the list of same-named network-type data objects;

(4) Once the data index is in place, updates to data records go through checkpoints and log files, which ensures efficient access and high fault tolerance of the system.
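Step (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: a plain dictionary stands in for the B-tree, the Bloom filter parameters are arbitrary, and the names `BloomFilter`, `NameIndex` and `block_id` are assumptions introduced here. It only shows how a Bloom filter can reject absent names before the index is searched, and how same-named objects share one chain of attribute blocks.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class NameIndex:
    """Name -> chain of attribute block ids (a dict stands in for the B-tree)."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.tree = {}  # assumption: replaces the B-tree of the source text

    def insert(self, name, block_id):
        self.bloom.add(name)
        # Objects with the same name are appended to one chain.
        self.tree.setdefault(name, []).append(block_id)

    def lookup(self, name):
        if not self.bloom.might_contain(name):
            return []  # definitely absent, the index is not searched
        return list(self.tree.get(name, []))
```

A lookup by name returns every attribute block registered under that name, which corresponds to the list of same-named network-type data objects described in step (3).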

The hierarchical domain database module is implemented as follows:

(1) In storage design, based on the storage management layer of the data resource ontology base, define the domain data objects and persistence protocol related to the hierarchical domain database;

(2) On the basis of the hierarchical domain data objects and the persistence protocol extended for the hierarchical domain data structure, carry out the storage design of the hierarchical structure;

(3) After the storage layer is complete, the positional relations between hierarchical data objects are organized by a restricted binary tree. A keyword in the index file is the number pair produced by dual-encoding a data object; because this number pair is unique, conflicts need not be considered. In the attribute file, parent and child data objects are likewise stored by their number pairs. During retrieval, the number pair of the data object is calculated and matched against the index file to obtain the pointer to the corresponding attribute, and the attribute is read; if there are several child data objects, all subordinate data objects can be found by following each attribute's pointer to the next attribute;

(4) After indexing is complete, build a simplified log file suited to the hierarchical structure, based on the checkpoint and log file functions of the network-type domain database module.
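The retrieval path in step (3) can be sketched as follows. The source does not specify the dual-encoding algorithm, so this sketch assumes a pair of standard checksums (`zlib.crc32` and `zlib.adler32`) purely as a placeholder; unlike the number pair in the text, such a pair is not guaranteed to be unique. `HierarchyStore`, `number_pair` and the record layout are hypothetical names.

```python
import zlib


def number_pair(name):
    """Placeholder dual encoding: two independent checksums form the key."""
    data = name.encode("utf-8")
    return (zlib.crc32(data), zlib.adler32(data))


class HierarchyStore:
    def __init__(self):
        self.attributes = []  # attribute file: one record per parent/child link
        self.index = {}       # index file: number pair -> first attribute record

    def add_child(self, parent, child):
        p, c = number_pair(parent), number_pair(child)
        record = {
            "parent": p,
            "child": c,
            "child_name": child,        # kept only to make the output readable
            "next": self.index.get(p),  # pointer to the next attribute record
        }
        self.attributes.append(record)
        self.index[p] = len(self.attributes) - 1  # new record heads the chain

    def children(self, parent):
        names = []
        pos = self.index.get(number_pair(parent))
        while pos is not None:  # follow the next-attribute pointers
            record = self.attributes[pos]
            names.append(record["child_name"])
            pos = record["next"]
        return names
```

Because each new record is placed at the head of the chain, `children` returns the most recently added child first; a real store would choose whatever order its tree layout requires.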

The domain data evolution module is implemented as follows:

(1) First collect the user activity records of each database and monitor changes in the activity level of domain data objects;

(2) Analyze the collected activity change data; domain data objects whose activity is below the system threshold are placed in the warning standby base;

(3) Further analyze the users' activity records; for each domain data object whose core attributes have changed, create a version change record of that data object;

(4) The system also analyzes text data provided by users or found on the Internet, building a large data object analysis base. When new data enters the data object analysis base, the version information of the associated data objects is read; then, by analyzing the relation between the data object and the versions of its associated data objects, the probability that the current data object is new domain data is calculated, and the new domain data is revised automatically or manually by the user and added to the corresponding domain database.
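Steps (2) and (3) can be sketched in a few lines of Python. The threshold value, the set of core attributes and all names below (`DomainObject`, `update_attribute`, `warning_candidates`) are assumptions made for illustration; the source does not give concrete values or interfaces.

```python
from dataclasses import dataclass, field

CORE_ATTRIBUTES = {"definition"}  # assumed set of core attribute names
ACTIVITY_THRESHOLD = 2            # assumed system threshold


@dataclass
class DomainObject:
    name: str
    attributes: dict
    activity: int = 0                             # accesses in the observed window
    versions: list = field(default_factory=list)  # snapshots of earlier states


def update_attribute(obj, key, value):
    """Apply a change; snapshot the old state when a core attribute changes."""
    if key in CORE_ATTRIBUTES and obj.attributes.get(key) != value:
        obj.versions.append(dict(obj.attributes))
    obj.attributes[key] = value
    obj.activity += 1


def warning_candidates(objects):
    """Names of objects whose activity is below the threshold."""
    return [o.name for o in objects if o.activity < ACTIVITY_THRESHOLD]
```

Only changes to core attributes create a version record, so routine edits to extended attributes do not inflate the version history.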

A database management method for extensible collaborative management of multi-type domain data is implemented in the following steps:

(1) Preprocess the text data file provided by the user, removing non-core domain data including stop words, modal particles and punctuation marks, to obtain preprocessed text data;

(2) Input the preprocessed data output by step (1) into the LDA probabilistic model and match it against the established data model to obtain the domain-related data objects it contains;

(3) Build a suffix tree from the domain-related data objects output by step (2) and merge it with the existing suffix tree; traverse the merged suffix tree step by step to obtain the frequently occurring character strings, and then initialize a domain-related data object;

(4) Input the domain-related data object obtained in step (3) into the data resource ontology base for type and relation judgment and matching, obtaining the type of this domain-related data object, i.e. hierarchical, network-type or a user-defined type, and the other domain data objects associated with it;

(5) Input the domain-related data object output by step (4) and its associated data into the domain database of the corresponding type, create a data change log record, and input the domain-related data object into the dual-encoding algorithm to obtain the corresponding index number pair;

(6) Combine the number pair obtained in step (5) with the data object and its associated domain data objects, and finally output a domain data object with domain relevance, multiple relations and multiple attributes.
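Steps (1) and (3) can be approximated with a short Python sketch. It deliberately skips the LDA matching of step (2), and instead of building and merging suffix trees it counts every short prefix of every suffix of the token list, which yields the same frequent strings for small inputs at a higher cost. The stop word list, `min_count` and `max_len` are assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "is", "and", "to", "in"}  # assumed list


def preprocess(text):
    """Lowercase, strip punctuation and drop stop words; returns a token list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def frequent_phrases(tokens, min_count=2, max_len=3):
    """Count each prefix (up to max_len tokens) of every suffix of the list."""
    counts = Counter()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length <= len(tokens):
                counts[" ".join(tokens[start:start + length])] += 1
    return {phrase: n for phrase, n in counts.items() if n >= min_count}
```

The phrases that survive the frequency cut are the candidates from which a domain-related data object would be initialized before the type and relation matching of step (4).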

Compared with the prior art, the advantages of the present invention are:

(1) The invention is based on block storage, heap management and multi-log-group storage technology, which ensures that the underlying storage is efficient and safe;

(2) Based on an extensible design method, domain databases for many sub-domains can be extended on top of the data resource ontology base through user-defined data relations;

(3) The invention automatically detects, analyzes and extracts from large amounts of text data, and evolves the obtained data objects into new domain data on the basis of the latest version of the data resource ontology base.

Brief description of the drawings

Fig. 1 is a block diagram of the composition of the system of the present invention;

Fig. 2 is a schematic diagram of the concept relations of the data resource ontology base module;

Fig. 3 is a schematic diagram of the basic triple data model of the data resource ontology base module;

Fig. 4 is a schematic diagram of the index of the data resource ontology base;

Fig. 5 is a schematic diagram of the access process of the data resource ontology base module;

Fig. 6 is a flow diagram of defining a new field name in the data resource ontology base;

Fig. 7 is a flow diagram of inserting a new relation into the data resource ontology base;

Fig. 8 is a schematic diagram of the process of retrieving a data object from the data resource ontology base;

Fig. 9 is a schematic diagram of the basic structure and relations of the network-type database;

Fig. 10 is a schematic diagram of the basic structure and relations of the hierarchical domain database;

Fig. 11 is a flow diagram of a query on the hierarchical domain database;

Fig. 12 is a flow diagram of creating a new domain data object in the hierarchical domain database;

Fig. 13 is a representation of the LDA model of the domain data evolution module;

Fig. 14 is a schematic diagram of the version control structure of domain data evolution.

Detailed description of the invention

Through an efficient data storage and organization method, the present invention achieves effective storage and querying of data objects and relations. It can also discover new data objects and domain data in the large volume of data provided by users. It implements the data resource ontology base, the network-type domain database and the hierarchical domain database, and allows domain databases to be extended.

The invention comprises a data resource ontology base module, a hierarchical domain database module, a network-type domain database module and a domain data evolution module. The first three together constitute the database set, which supports information extraction and data object retrieval, establishes the various associated data object bases required by a domain database system, and provides the basic supporting capability for data storage and access in collaborative management of multi-type domain data. The domain data evolution module tracks and controls changes to domain data, implementing the evolution of the domain databases themselves and domain data version management.

1. Data resource ontology base module

Definition of attribute: the properties and relations of a thing are called its attributes. The peculiar attributes of a class of things are abstracted from concrete things; for example, human speech and thought are peculiar attributes. The accidental attributes of a class of things are those that some members of the class have but not all; for example, a person's skin color and nationality are accidental attributes.

Definition of concept: a concept is the form of thought that reflects the peculiar attributes (intrinsic or essential attributes) of things. Concepts are abstract and general. Concepts may be true or false; a true concept correctly reflects the peculiar attributes of things. The intension of a concept is the peculiar attributes of the things the concept reflects. The extension of a concept is the things that have the peculiar attributes the concept reflects. The "is-a" relation is generally used to express the extension of a conceptual model. A relation is an extension of a concept, and a role is also an extension of a concept. A concept must be made clear, that is, clarified in terms of both intension and extension. Depending on whether its extension is one thing or many, a concept is either a singular concept or a general concept. The extension of a singular concept is a single unique thing, such as a specific time or a specific place. The extension of a general concept may contain many things, as with "city" and "commodity": "city" covers many specific cities, and "commodity" is the set of a large number of physical commodities. "is" and "is-a" are each a singular concept. Concepts are further divided into collective and non-collective concepts: a collective concept reflects an aggregate, and a non-collective concept does not. Concepts are also divided into positive and negative concepts: a positive concept reflects things that have a certain attribute, and a negative concept reflects things that lack it. Finally, concepts are divided into relative and absolute concepts: a relative concept reflects things that stand in a certain relation, and an absolute concept reflects things that have a certain property.

There are four basic relations between concepts, as shown in Fig. 2:

(1) Identity relation: if all a are b and all b are a, then a and b are in an identity relation. Two concepts in an identity relation have the same extension.

(2) Subordination relation: if all b are a but some a are not b, then a is superior to b and b is subordinate to a.

(3) Intersection relation: if some a are b, some a are not b, and some b are not a, then a and b are in an intersection relation.

(4) Disjoint relation: if no a is b, then a and b are in a disjoint relation.
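If the extension of each concept is modeled as a set of things, the four relations above reduce to ordinary set comparisons. The following Python sketch makes that reading explicit; the function name `concept_relation` and the returned labels are illustrative choices, not terms from the source.

```python
def concept_relation(a, b):
    """Classify the relation between two concept extensions given as sets."""
    a, b = set(a), set(b)
    if a == b:
        return "identity"
    if b < a:   # all b are a, but some a are not b
        return "a is superior to b"
    if a < b:   # the mirror case of the subordination relation
        return "a is subordinate to b"
    if a & b:   # some shared members, and each side has members of its own
        return "intersection"
    return "disjoint"
```

For example, with `a = {"Nanning", "Guilin", "Beijing"}` (cities) and `b = {"Nanning", "Guilin"}` (cities in Guangxi), the function reports that `a` is superior to `b`.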

The basic data model, the triple "Concept A - Relation - Concept B", is the foundation of the domain database: it is both the basic logical structure and the basic physical storage structure (Fig. 3). The "concept" is the basic element of the domain database; it expresses people's cognition of things and is the form of thought that reflects their peculiar attributes. A concept may be a really existing object or something imagined or designed by humans; it may be an attribute or an activity. The logical meaning of the triple is that "Concept A" and "Concept B" stand in this "Relation"; within the triple, "Concept A" has the identity of owner and "Concept B" the identity of participant. "Concept A" and "Concept B" may also own or participate in other relations. Relations thus build an extremely complex network structure over n concepts, whose complexity is determined by the user or the actual operating environment. By level of abstraction, a "concept" is either a "class" (Class) or an "individual" (Individual). A "relation" establishes a connection between concepts and is one of "relation between classes", "relation between class and individual" and "relation between individuals". In the data model design, a "relation" is itself uniformly regarded as a "concept"; therefore everything in the domain database is a "concept", and each "concept" has a unique identifier. All domain data can be expressed in this concept-relation-concept form. If the data of a domain is complex, the structure that represents it is likely to be an extremely complex network, but its threads remain clear. Much of the world's domain data is connected directly or indirectly, and with this representation of domain data and relations, associated data can be found easily. Unlike the relational data structure, the concept-relation structure can accommodate almost any relation and data, and relations and data can be modified and created at will, whereas in a relational database the relations between data are hard to change once the database has been designed. In this respect the concept-relation form of data organization has a clear advantage for pattern search and data mining.
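The triple model can be sketched in Python as follows. This is an in-memory illustration under assumed names (`ConceptBase`, `relate`, `related`), not the storage structure of the patent; it shows only that relations are stored as concepts with their own unique identifiers and that a triple links an owner to a participant.

```python
class ConceptBase:
    def __init__(self):
        self.ids = {}         # concept name -> unique identifier
        self.triples = set()  # (owner id, relation id, participant id)

    def concept(self, name):
        """Return the identifier of a concept, creating it on first use."""
        return self.ids.setdefault(name, len(self.ids) + 1)

    def relate(self, owner, relation, participant):
        # The relation is registered as a concept like any other.
        self.triples.add(
            (self.concept(owner), self.concept(relation), self.concept(participant))
        )

    def related(self, owner, relation):
        """Names of all participants linked to owner through relation."""
        names = {ident: name for name, ident in self.ids.items()}
        o, r = self.ids.get(owner), self.ids.get(relation)
        return sorted(names[p] for (a, rel, p) in self.triples if a == o and rel == r)
```

New relation kinds need no schema change: calling `relate` with an unseen relation name simply creates one more concept, which is the flexibility the paragraph above contrasts with a fixed relational design.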

1.1 Structure of the data resource ontology base

In the data resource ontology base, the stored ontology data has a definite concept, a concise form and a single meaning. One ontology entry states exactly one concept and is based on nouns and noun phrases. Some data objects have specific relations, such as synonym, antonym and subordination relations. First a bottom-level database is designed, containing data objects without extended attributes or relations. The bottom-level database serves the implementation of the synonym, antonym and subordination relations. Another important part is the in-memory index, whose entries hold an object hash value and a disk block address. The object hash value is a single number; if two data objects collide, i.e. their hash values are equal, they are stored in the same disk block, so one block can hold several records.

Disk file: disk space of a specified size is divided into blocks. A block holds several records; if the records exceed the capacity of the block, a pointer links to the next block. The program maps the source data object file and writes it to a binary data file. The data file is organized in units of data blocks, and a data block stores data objects with the same hash value, i.e. objects whose hashes collide. The data block stores the attribute fields of each data object. Other fields can be added if needed, for example an address field for an object's synonym data object in the synonym database. To look up a data object, the hash function computes its hash value, the mapping gives the relative physical block number directly, and the database manager and object manager then read and unpack the data. The method provides a database fast cache area, which avoids loading large numbers of data objects into memory, saves memory and gives high access efficiency.
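The block layout described above can be sketched in memory as follows. The block capacity, the number of hash values and the use of `zlib.crc32` as hash function are assumptions chosen to keep the example small; a list of dictionaries stands in for the binary data file, and `BlockFile` is a hypothetical name.

```python
import zlib

BLOCK_CAPACITY = 2  # records per block (assumed, tiny for illustration)
NUM_BUCKETS = 8     # number of distinct hash values (assumed)


class BlockFile:
    def __init__(self):
        self.blocks = []  # each block: {"records": [...], "next": block number}
        self.index = {}   # in-memory index: hash value -> first block number

    def _hash(self, name):
        return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS

    def _new_block(self):
        self.blocks.append({"records": [], "next": None})
        return len(self.blocks) - 1

    def put(self, name, record):
        h = self._hash(name)
        if h not in self.index:
            self.index[h] = self._new_block()
        block_no = self.index[h]
        # Walk the chain until a block with free space is found.
        while len(self.blocks[block_no]["records"]) >= BLOCK_CAPACITY:
            if self.blocks[block_no]["next"] is None:
                self.blocks[block_no]["next"] = self._new_block()
            block_no = self.blocks[block_no]["next"]
        self.blocks[block_no]["records"].append((name, record))

    def get(self, name):
        block_no = self.index.get(self._hash(name))
        while block_no is not None:
            for key, record in self.blocks[block_no]["records"]:
                if key == name:
                    return record
            block_no = self.blocks[block_no]["next"]
        return None
```

A lookup touches only the blocks chained under one hash value, which is why the text can read a data object without loading the whole object base into memory.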

Independent data objects: the data object has no relation with other data objects; a query returns only the data object itself and shows no relations. This part uses the bottom-level database directly, which does not implement relations between data objects. The logical storage structure of domain data (shown in Fig. 4) includes:

(1) Index: the value computed from the data object by the hash function, used to locate the address of the data object.

(2) Block address: the relative physical address of the block that holds the data object.

(3) Data object record: the data object itself.

Data objects with relations: the relations include synonym, antonym and subordination relations. They are implemented on top of the bottom-level database, using the data object addresses it provides.

Data object relations are implemented by creating, on top of the bottom-level database, an index table file and a record set file. In physical storage, each file is divided into consecutive blocks of a fixed size, each of which is called a record here; every record has a unique number (the first record is number 0, the second number 1, and so on). In the index table file, an index record contains the positive association (synonym) group number, the inverse association (antonym) group number and the subordinate data object group number, all of which are record numbers in the record set file, together with the relative physical address of the superior data object. In the record set file, each record has two marks holding the numbers of its previous and next records, and can store a certain number of relative physical addresses of data objects; when a record's space is used up, a successor record can be allocated as needed. To add a positive-association (or inverse-association or subordinate-association) data object to a data object, the data object is first processed with the hash function to produce its unique relative physical address; an index record is then allocated to it in the index table file, and the index record number is recorded at a specific position in the block management data layer according to the data object's address. Next, a record is allocated in the record set file, dedicated to holding the relative physical addresses of this data object's positive-association (or inverse-association or subordinate-association) data objects, and the newly allocated record number is written into the data object's index record. The relative physical address of the associated data object is then added to that record in the record set file. When a subordinate association is added, the relative physical address of the superior data object must also be set in the index record of the subordinate data object. After a positive, inverse or subordinate association is deleted, the corresponding record set record and index record are checked; if either is empty, its number is recorded and its link to the data object is released, so that the idle disk space can be reclaimed, which greatly improves disk space utilization.
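The interplay of index table and record set can be sketched as follows. Lists and dictionaries stand in for the two files, integer addresses stand in for relative physical addresses, and the record capacity is an assumption; deletion and space reclamation are omitted. `RelationStore` and the association kinds used as keys are hypothetical names.

```python
RECORD_CAPACITY = 2  # addresses per record-set record (assumed)


class RelationStore:
    def __init__(self):
        self.records = []  # record set: {"addrs": [...], "prev": n, "next": n}
        self.index = {}    # index table: object address -> {kind: first record}

    def _alloc(self, prev=None):
        self.records.append({"addrs": [], "prev": prev, "next": None})
        return len(self.records) - 1

    def add(self, addr, kind, target_addr):
        """Link target_addr to addr under kind, e.g. 'synonym' or 'antonym'."""
        entry = self.index.setdefault(addr, {})
        if kind not in entry:
            entry[kind] = self._alloc()
        rec_no = entry[kind]
        # When a record is full, continue in (or allocate) its successor.
        while len(self.records[rec_no]["addrs"]) >= RECORD_CAPACITY:
            if self.records[rec_no]["next"] is None:
                self.records[rec_no]["next"] = self._alloc(prev=rec_no)
            rec_no = self.records[rec_no]["next"]
        self.records[rec_no]["addrs"].append(target_addr)

    def targets(self, addr, kind):
        out = []
        rec_no = self.index.get(addr, {}).get(kind)
        while rec_no is not None:
            out.extend(self.records[rec_no]["addrs"])
            rec_no = self.records[rec_no]["next"]
        return out
```

The previous and next marks make each association group a doubly linked chain of fixed-size records, so a group can grow without moving existing records.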

1.2 Defining the database

The main purpose of this module is to make the data object base easy to manage and extend. Only after a field name has been defined can data objects in the data object base add that field and its value. If a field name is modified, every data object in the base that has the field automatically renames it, with the field value unchanged. If a defined field name is deleted, every data object that has the field automatically deletes the field and its value. Likewise, once a relation name has been defined, data objects in the database can establish that relation; if the relation name is modified, the relation between data objects that have it is automatically changed to the new relation; if a defined relation name is deleted, the relation is automatically removed between all data objects that have it. The default field names and relation names defined by the database cannot be modified or deleted, whereas user-defined field names and relation names can be added, modified and deleted by the user. Fig. 5 shows the abstract architecture and access process of the data resource ontology base.
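The propagation rules for field names can be sketched in Python. The class `FieldRegistry` and its methods are hypothetical and keep everything in memory; relation names would follow the same pattern.

```python
class FieldRegistry:
    def __init__(self, default_fields=()):
        self.defaults = set(default_fields)  # built-in names: read-only
        self.fields = set(default_fields)
        self.objects = {}                    # object name -> {field: value}

    def define(self, field):
        self.fields.add(field)

    def set_value(self, obj, field, value):
        if field not in self.fields:
            raise KeyError(f"field {field!r} is not defined")
        self.objects.setdefault(obj, {})[field] = value

    def _check_user_field(self, field):
        if field in self.defaults:
            raise PermissionError(f"default field {field!r} cannot be changed")
        if field not in self.fields:
            raise KeyError(f"field {field!r} is not defined")

    def rename(self, old, new):
        self._check_user_field(old)
        self.fields.discard(old)
        self.fields.add(new)
        for values in self.objects.values():  # rename everywhere, keep values
            if old in values:
                values[new] = values.pop(old)

    def delete(self, field):
        self._check_user_field(field)
        self.fields.discard(field)
        for values in self.objects.values():  # drop the field and its value
            values.pop(field, None)
```

Keeping the set of default names separate from the user-defined ones is one simple way to enforce the rule that built-in field names can be neither modified nor deleted.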

Fig. 6 is the execution flow process defining field name.Step is as follows:

(1) The client interface service component sends a request to define a field name to the server-side service component.

(2) The server calls the field-name definition method of the domain database manager. The database manager calls the database access object to check whether the field name is already defined, and the database access object returns the result. If the database has already defined the field name, integer 0 is returned to the server; the server returns operation result 0 to the client, and the process ends.

(3) The database manager sends the database access object a request to read the old checkpoint in the log (the end position of the valid section of the log).

(4) The database access object returns the old checkpoint, which is assigned to startPos. The database manager sends the abstract data access object a request to read the current valid backup version number of the database. The database access object returns the result, which the database manager assigns to the variable version. The database registry is then copied; the copy is called the new registry.

(5) If the field-name record in the new registry is empty, the field name is added directly to the new registry with field-name code 1. Otherwise, the current maximum field-name code in the new registry is read. If the new registry contains a free field-name code i smaller than the current maximum field-name code, this code is assigned to the new field name; otherwise the current maximum field-name code is incremented by 1 and assigned to the new field name. The current maximum field-name code is then reset.

(6) The current maximum field-name code in the registry is set to 1. A log record object is created and each of its member variables is assigned, including the data it carries and the action to be performed; the log record is then written to the log file.

(7) The new valid checkpoint in the log file (the new checkpoint) is returned and written to the log file header. The content of the log record newly written to the log file is then written to the data file (the commit).

(8) The commit result is returned. If the commit errs or fails, the database manager returns operation result -1 to the server and the process ends. If the commit succeeds, the size of the log file is checked, and the log file is reset if it exceeds a certain size. The database manager returns operation result 1 to the server, and the server returns operation result 1 to the client: the field name has been defined successfully.
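The code-allocation rule of step (5) and the result codes 0 and 1 can be illustrated with a minimal Python sketch. It assumes the registry is a plain in-memory mapping from field name to integer code; the names `allocate_field_code` and `define_field` are hypothetical, and checkpointing, logging, and the commit (result -1) are left out.

```python
def allocate_field_code(registry):
    """Pick the code for a new field name, following step (5).

    `registry` is a hypothetical in-memory stand-in for the new registry:
    a dict mapping field name -> integer code.
    """
    if not registry:
        return 1  # empty registry: the first field name gets code 1
    used = set(registry.values())
    current_max = max(used)
    # Reuse a free code below the current maximum if one exists.
    for code in range(1, current_max):
        if code not in used:
            return code
    return current_max + 1  # otherwise extend the range by one


def define_field(registry, name):
    """Return (result, new_registry): 0 if already defined, 1 on success."""
    if name in registry:
        return 0, registry
    new_registry = dict(registry)  # work on a copy, as in step (4)
    new_registry[name] = allocate_field_code(new_registry)
    return 1, new_registry
```

For example, defining a third field in a registry that holds codes 1 and 3 reuses the free code 2 rather than allocating 4.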

1.3 Managing data object information

This module is mainly responsible for managing data object information, providing functions to add, delete, and modify it. When a data object is destroyed, all of its information (including the fields it has and its relations with other words) is completely deleted from the database. There are three ways to add a data object to the database. The first is to add the data object directly, without any attached information. The second arises when a field is added for a data object: if the data object does not yet exist in the database, the database adds it automatically and then adds the field information to it. The third arises when a relation is established between two data objects: if one or both of them do not exist in the database, the system first adds the missing data objects automatically and then establishes the relation between them. When a field is added to or modified for a data object, the field name must already be defined in the database; likewise, when a relation is established between two data objects, the relation name must already be defined in the database. This module can also set the word frequency of a data object and import and export database files. The relevant operation flow is outlined below using the insertion of a data object field as an example; Fig. 7 is the sequence diagram for inserting a data object field. The steps of the sequence diagram are as follows:

(1) The user service interface sends the server a request to add a field to a keyword, with content as the field value.

(2) The server calls the dictionary manager's method for adding a field and field content to a data object.

(3) The database manager calls the dual encoder to compute the dual code of the keyword.

(4) The dual encoder returns the keyword's dual-code object key (the index key). If the result is empty, go to step 5; otherwise go to step 7.

(5) Computing the dual code of the keyword has failed; the data object manager returns operation result -1 to the server.

(6) The server returns operation result -1 to the client, declaring that the requested operation failed, and goes to step 40.

(7) The data object manager sends the abstract data access object a request for the code of the field name in the registry.

(8) The database access object returns the code of the field name. If the return value is not null (the field name is defined), go to step 11.

(9) The database manager returns operation result 0 to the server, declaring that field name fieldName is not defined in the database and that the field and its content (field value) cannot be inserted for the keyword.

(10) The server returns operation result 0 to the client, declaring that field name fieldName is not defined in the dictionary and that the requested operation failed.

(11) The database manager sends the database access object a request for the index value of index key key in the index table.

(12) The database access object returns the index value value mapped by index key key. If value is empty, the keyword does not exist in the database: go to step 13; otherwise go to step 16.

(13) The keyword is added to the database and the result (an integer) is returned. If adding fails, go to step 6; otherwise go to step 14.

(14) A request for the index value of index key key in the index table is sent to the database access object again.

(15) The database access object returns the index value value of index key key.

(16) The database manager sends the database access object a request to read the old checkpoint in the log (the end position of the valid section of the log).

(17) The database access object returns the old checkpoint, and the database manager assigns the result to the variable startPos.

(18) The database manager sends the abstract data access object a request to read the current valid backup version number of the database.

(19) The database access object returns the result, and the database manager assigns it to the variable version.

(20) The database registry is copied; the copy is called the new registry.

(21) The database manager sends the database access object a request to read the byte data of the keyword.

(22) The database access object returns a carrier holding the keyword's byte data and its set of disk addresses (called the data car).

(23) The database manager sends the data processing factory a request to process the keyword's byte data into a viewable information object.

(24) The data processing factory returns the keyword content carrier to the database manager, which inspects it. If the field name to be added and its corresponding field value already exist, go to step 25; otherwise go to step 27.

(25) The database manager returns operation result 2 to the server, indicating that the content to be added already exists.

(26) The server returns operation result 2 to the client, indicating that the content to be added already exists.

(27) fieldName and content are added to the keyword content carrier.

(28) The database manager sends the data processing factory a request to process the keyword's data into byte data.

(29) The data processing factory returns the byte data of the keyword content to the database manager, which puts it into the data car.

(30) Disk-block addresses are reallocated according to the data in the data car and the information in the new registry, and the new registry is updated.

(31) A new log record object is created, and the data car, the new registry, and the action to be performed are loaded into it.

(32) The database manager sends the database access object a request to write the newly created log record object to the log file.

(33) The database access object returns to the database manager the new valid checkpoint of the log file (the new checkpoint).

(34) The database manager sends the database access object a request to write the new checkpoint to the log file header, and the database access object responds to the request.

(35) The database manager sends the database access object a request to write the log record content newly written to the log file into the data file (the commit).

(36) The database access object returns the commit result. If the commit succeeds, go to step 38; otherwise go to step 37.

(37) If the commit fails or any of the steps above throws an exception, go to step 6.

(38) The database manager returns operation result 1 to the server.

(39) The server returns operation result 1 to the client: the field has been inserted successfully.

(40) End.
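The four operation results used in this flow (-1, 0, 1, 2) can be summarised in a short Python sketch. It is only an illustration under simplifying assumptions: a plain dictionary stands in for the index table and data file, an empty keyword stands in for a dual-encoding failure, and the checkpoint, log, and commit steps (16) to (36) are omitted. The function name `insert_field` is hypothetical.

```python
def insert_field(db, defined_fields, keyword, field_name, content):
    """Insert a field for a keyword and return the operation result.

    -1: failure (here: the keyword cannot be encoded), steps (4)-(6)
     0: the field name is not defined in the database, steps (8)-(10)
     2: the field and value already exist, steps (24)-(26)
     1: success, steps (27)-(39)
    `db` maps keyword -> {field name: field value} (assumed layout).
    """
    if not keyword:  # stand-in for a failed dual encoding
        return -1
    if field_name not in defined_fields:
        return 0
    # Steps (12)-(15): add the keyword automatically if it is missing.
    record = db.setdefault(keyword, {})
    if record.get(field_name) == content:
        return 2
    record[field_name] = content
    return 1
```

Note that the field-name check comes before the keyword is added, which mirrors the order of steps (8) and (12).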

1.4 Retrieving data object information

The retrieval functions are as follows:

(1) Check data object existence: check whether a given data object exists in the database;

(2) Retrieve data object data packet: retrieve all viewable information of a data object (fields, field values, relations, relation words) and package it into a data packet for network transmission or other uses;

(3) Retrieve data object field value: retrieve the field value (field content) of a field of a data object;

(4) Retrieve data relation words: retrieve all relation words of a relation of a data object;

(5) Retrieve by field name: divided into single-field retrieval and two-field combined retrieval. Single-field retrieval returns the data packets of all data objects that have a given field; two-field combined retrieval returns the data packets of all data objects that have two given fields at the same time;

(6) Retrieve by relation name: retrieve the data packets of all data objects that have a given relation;

(7) Backward-matching retrieval: retrieve the data packets of all data objects that begin with a given keyword;

(8) Fuzzy-matching retrieval: retrieve the data packets of all data objects that contain a given keyword;

(9) Retrieve high-frequency words: retrieve a specified number of data objects or data object data packets in descending order of frequency;

(10) Retrieve low-frequency words: retrieve all data objects whose word frequency is below a given frequency;

(11) Retrieve word frequency: retrieve the frequency (number of times retrieved) of a given data object;

(12) Retrieve all field names defined in the database;

(13) Retrieve all relation names defined in the database.

Fig. 8 shows the sequence diagram for retrieving a data object data packet. The sequence is outlined as follows:

(1) The client sends the server a request to retrieve the data packet of the data object for a keyword.

(2) The server calls the database searcher's method for retrieving a data object data packet.

(4) The dual encoder returns the keyword's dual-code object key (the index key).

(5) The database searcher sends the database access object a request for the index value corresponding to key in the index table.

(6) The database access object returns to the database searcher the index value value mapped by key. If value is null, the keyword does not exist in the database: go to step 7; otherwise go to step 9.

(7) The database searcher returns retrieval result null to the server.

(8) The server returns retrieval result null to the client and goes to step 14.

(9) The frequency of the keyword is incremented by 1; the database searcher then sends the database access object a request to update the keyword's word frequency in the database index table, and the database access object responds to the request.

(10) The database searcher sends the database access object a request to update the keyword's frequency in the data file, and the database access object responds to the request.

(11) The database searcher calls its own method to retrieve the keyword's data packet according to the keyword's first address on disk.

(12) The database searcher returns the keyword's data packet to the server.

(13) The server returns the keyword's data packet to the client.

(14) End.
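A minimal Python sketch of this retrieval flow, together with the high-frequency retrieval of function (9), is given below. The in-memory layout is an assumption made for illustration: `index` maps a keyword to a small record with a frequency counter and an address, and `data` is a list addressed by that number. Dual encoding and disk access are not modelled.

```python
def retrieve_packet(index, data, keyword):
    """Return the data packet for `keyword`, or None if it is absent.

    Each successful retrieval increments the keyword's frequency,
    as in steps (9) and (10).
    """
    entry = index.get(keyword)
    if entry is None:
        return None  # steps (7)-(8): the keyword is not in the database
    entry["freq"] += 1
    return data[entry["address"]]  # step (11): read via the stored address


def most_frequent(index, count):
    """Return up to `count` keywords in descending order of frequency."""
    ranked = sorted(index, key=lambda k: index[k]["freq"], reverse=True)
    return ranked[:count]
```

Because the frequency is defined as the number of times an object has been retrieved, the counter is updated inside the retrieval call itself rather than by a separate operation.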

2. Network-type domain database module:

2.1 Basic principle

Each network-type domain data entry is stored in two parts, an index part and an attribute part. The index part is stored in the name.dct file and the attribute part in the attr.dct file. The name index of network-type data is an unrestricted B-tree generated dynamically as data objects are inserted.

Pointer operations are used in the index file, so N pointers are defined, corresponding to N commonly used Chinese characters in the GB2312-80 encoding. Each pointer points to the root of the B-tree that holds the names beginning with that character. All names with the same first character are stored in the same B-tree.

During retrieval, network-type data can be looked up using the name as the keyword; the key is computed by a Hash function from the GB2312-80 encoding of the name. Retrieval first locates the B-tree and then finds the name with the B-tree search algorithm. Names in the index file correspond one to one with attributes in the attribute file: if the summary information of a data object is found in the index file, the attributes of that network-type data object necessarily exist in the attribute file. The lookup in the index file uses the summary information as the key, and the result is the position in the attribute file of the attributes corresponding to that summary information. Once the summary information has been found in the index file, the associated attributes can be read directly from the attribute file using the attribute index stored with the summary information. Operations on the attribute file are therefore very fast, and the time is spent mainly on the lookup in the index file. Storage and lookup of summary information in the index file use a hash algorithm and a B-tree algorithm; the algorithm retrieves from disk and addresses data directly through pointers, so it is efficient.

2.2 Index storage structure

Storage and lookup of the summary information of network-type data in the index file use a hash algorithm and a B-tree algorithm. Here the first character of the summary information is called the "lead character" and the remainder after the lead character is called the "suffix". A position table with N entries is first set up in the index file; each entry consists of a single character and its GB2312-80 value. The address of an entry in the position table is obtained by computing the key value of its character with the Hash function. Each entry also stores a pointer to the root of a B-tree, which is used to store the suffixes of summary information.

When the name of a network-type data object is stored, the B-tree for the lead character of the summary information is first located in the position table, and the suffix is then inserted into that B-tree. Lookup of summary information is similar to storage: the B-tree is located in the position table from the lead character, and the suffix is then searched in the B-tree.

Structure of the B-tree: the B-tree stores the suffixes of summary information, which serve as its keywords. To reduce the number of disk reads, a B-tree of order n is used according to actual needs. Each node of the B-tree contains the following information:

(n, C₀,A₁,K₁,C₁,A₂,K₂,C₂,…,A_n,K_n,C_n, Father)

Here n is the number of keywords in the node; K_i (i = 1, …, n) are the keywords (suffixes of summary information), with K_i < K_{i+1} (i = 1, …, n-1); C_i (i = 0, …, n) are pointers to subtree root nodes, where every keyword in the subtree pointed to by C_{i-1} is less than K_i (i = 1, …, n) and every keyword in the subtree pointed to by C_n is greater than K_n; A_i (i = 1, …, n) is a pointer into the attribute file, giving the position in the attribute file of the attributes of the summary information whose lead character is the character of the B-tree containing the node and whose suffix is K_i; Father is a pointer to the parent node.

To look up a data object, the entry address in the position table is first computed by Hash from the lead character of the given summary information; the entry is then read to obtain the root address of the B-tree for that character, and the suffix of the summary information is searched in the B-tree. One lookup therefore costs one Hash computation plus one B-tree search, so the search algorithm is efficient. With memory mapping, the index file need not be read into memory as a whole; only the nodes actually needed are read, which greatly reduces disk read time and improves memory utilization.
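The two-stage lookup described above (lead character to position-table entry, then suffix search) can be sketched in Python as follows. This is a simplified illustration, not the on-disk structure: a dictionary stands in for the hashed position table, and a sorted list searched with `bisect` stands in for the B-tree of suffixes. The class name `NameIndex` and the integer attribute offsets are assumptions; names are assumed to be non-empty.

```python
import bisect


class NameIndex:
    """Lead-character table whose entries hold sorted suffix lists."""

    def __init__(self):
        # lead character -> (sorted suffixes, attribute-file offsets)
        self.position_table = {}

    def insert(self, name, attr_offset):
        lead, suffix = name[0], name[1:]
        suffixes, offsets = self.position_table.setdefault(lead, ([], []))
        i = bisect.bisect_left(suffixes, suffix)
        if i < len(suffixes) and suffixes[i] == suffix:
            offsets[i] = attr_offset  # name already stored: update pointer
        else:
            suffixes.insert(i, suffix)
            offsets.insert(i, attr_offset)

    def lookup(self, name):
        """Return the attribute offset for `name`, or None if not stored."""
        lead, suffix = name[0], name[1:]
        entry = self.position_table.get(lead)
        if entry is None:
            return None  # no tree exists for this lead character
        suffixes, offsets = entry
        i = bisect.bisect_left(suffixes, suffix)
        if i < len(suffixes) and suffixes[i] == suffix:
            return offsets[i]
        return None
```

The returned offset plays the role of the pointer A_i: it tells the caller where to read the attributes without any further search.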

2.3 Attribute storage structure of network-type data

Of the attributes of a network-type data object, all except the name are stored in attribute files. The attributes fall into two parts: those that exist when the database is designed, called basic attributes, are stored in the basic attribute file; those defined by the user, called extended attributes, are stored in the extended attribute file. Basic attributes are stored as attribute blocks, one block per data object, and the basic attribute blocks are placed in the basic attribute file in the order in which the data objects were inserted. Extended attributes are stored as linked lists. The storage structure of the attribute files is shown in Fig. 9. A basic attribute block consists of the data object's basic attributes, pointer a, pointer b, and so on. In a basic attribute block, pointer a points to the attribute block with the same name, and pointer b points to the first address of the data object's extended attributes in the extended attribute file.

To look up the attributes of a network-type data object, the name of the data object is first searched in the index file. When the name is found, the node that stores it also yields the pointer to the position of the data object's attribute block in the attribute file. The basic attributes are read directly at that address, and the extended attributes are then read from the extended attribute file through the corresponding pointer in the attribute block. Retrieval in the attribute files is therefore very efficient.

3. Hierarchical domain database module:

3.1 Basic principle

From the characteristics of the relations among hierarchical data, there are mainly four relations: subordination, adjacency, crossing, and co-reference, of which subordination is the principal one. A hierarchical data object may have a superior and may also have subordinates. Here each hierarchical data object is specified to be directly responsible to the level above it, or to lead the level below it; this design takes into account that many hierarchical data objects have identical basic attributes.

Each hierarchical data object has a keyword in the database that uniquely identifies it. The keyword is a number computed by a Hash function based on dual coding and serves as the logical address of the data object. The keyword of each data object thus corresponds one to one with its storage address on disk: to look up a data object, computing its key value with the Hash function is equivalent to obtaining its logical address, after which the data access task is handed to the object manager and the log manager. This method avoids search matching; the time is spent mainly on computing the hash value, and not all data blocks have to be loaded into memory, only the required data object. It is feasible both in execution efficiency and in memory utilization. Because addressing follows pointers directly, retrieval of data objects is efficient.

3.2 Structure of the index file

The index file structure mainly contains the following fields, whose roles are as follows:

Key is the key value computed by Hash; it is unique for each data object;

The Father/Son fields represent the subordination relation: Father holds the Key of the upper-level hierarchical data object and Son holds the Key of the next-level hierarchical data object;

The Neighbour field represents the adjacency relation: neighbour holds the Key of hierarchical data objects that are adjacent in position; this field may hold more than one Key;

The Cross field represents the crossing relation: cross holds the Key of hierarchical data objects that cross in position; this field may likewise hold more than one Key;

The Co-ref field represents the co-reference relation: co-ref holds the hierarchical data objects that are semantically understood to refer to the same region; this field may also hold more than one Key;

The 0/1 field indicates whether this hierarchical data object has a duplicate name, i.e. the same name with two entirely different meanings. A value of 0 means there is no duplicate name and 1 means there is one, in which case the Fathers field that follows lists all upper-level data objects that contain this data object;

The Fathers field lists all upper-level data objects that contain this hierarchical data object when a duplicate name exists, so this field may also hold more than one Key. If there is no duplicate name, it is NULL.

Each of the above fields occupies four bytes.
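The field list above can be pictured as a record type. The Python sketch below is an in-memory illustration only: the class name `IndexRecord`, the attribute names, and the use of lists for multi-valued fields are assumptions, and the four-byte on-disk layout is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexRecord:
    key: int                                             # Key: unique hash value
    father: Optional[int] = None                         # Key of the upper-level object
    son: Optional[int] = None                            # Key of the next-level object
    neighbour: List[int] = field(default_factory=list)   # adjacent objects
    cross: List[int] = field(default_factory=list)       # crossing objects
    co_ref: List[int] = field(default_factory=list)      # co-referring objects
    duplicate: int = 0                                    # 0/1 flag: duplicate name?
    fathers: List[int] = field(default_factory=list)     # used only when duplicate == 1

    def parents(self) -> List[int]:
        """Upper-level keys used to tell same-named objects apart."""
        if self.duplicate == 1:
            return list(self.fathers)
        return [] if self.father is None else [self.father]
```

A record with `duplicate == 1` exposes every upper-level Key through `parents()`, which is how two objects with the same name but different meanings can be distinguished.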

3.3 Structure of the data file

The data file stores the hierarchical data objects themselves; data objects can be accessed only by combining it with the index file. Logically, the data file is a linear list of data objects, in which the list entries are linked through pointers to the fields of the structure, as shown in Fig. 10. W_i and W_j are each an entry of a hierarchical data object. The pointers in the Father, Son, Neighbour, Cross, Co-ref, and Fathers fields point respectively to the upper-level data object of the entry, its next-level data object, its adjacent data objects, its crossing data objects, its co-referring data objects, and, when the basic attributes are identical, all upper-level data objects of this data object. Two such data objects can then be distinguished by their upper-level data objects.

To look up a data object, its address is computed by Hash and the object is loaded into memory, after which the organizational structure of this data object and its associated data objects can be built easily. No retrieval matching is needed; the retrieval time is spent mainly on the Hash computation, so the algorithm is time-efficient, and the Hash load factor exceeds 0.8.

Fig. 11 and Fig. 12 show the related service algorithm flows of the hierarchical domain database.

4. Domain data evolution module:

4.1 Information extraction based on LDA

The first step of domain data evolution is to analyze and process a large amount of valuable text information. The system uses a text-clustering mining technique based on the LDA (Latent Dirichlet Allocation) generative probabilistic model, which automatically groups similar texts in a text collection into different categories and helps to discover related domain data. When texts are represented with the vector space model, the text representation matrix usually has very high dimensionality, and the "curse of dimensionality" often makes similarity measures meaningless during clustering. The LDA topic model has good text-representation ability: it can mine the latent semantic information of texts, obtain the representation of documents in topic space, and reduce the dimensionality of the document representation. By modeling texts, one can perform feature selection, topic classification, similarity judgment, and so on. The LDA model uses the bag-of-words method, which treats each text data resource as a word-frequency vector and thereby converts text information into numeric information that is easy to model.

Fig. 13 shows the three-layer Bayesian representation of the LDA model. Φ_k denotes the term probability distribution of topic k, and θ_m denotes the topic probability distribution of the m-th document; θ_m and Φ_k in turn serve as the parameters of the multinomial distributions that generate topics and words, respectively. K is the number of topics, M the number of documents, N_m the length of the m-th document, and ω_{m,n} and z_{m,n} the n-th word of the m-th document and its topic. α and β are the parameters of the Dirichlet distributions; they are usually fixed and symmetric and are therefore written as scalars. Φ_k and θ_m both follow Dirichlet distributions, whose density function is given by the following formula:

Dir (μ | α) = \frac{Γ (α_{0})}{Γ (α_{1}) \cdots Γ (α_{K})} Π_{k = 1}^{K} μ_{k}^{α_{k} - 1}

(formula one)

Here 0 ≤ μ_k ≤ 1, Σ_{k = 1}^{K} μ_k = 1, α_0 = Σ_{k = 1}^{K} α_k, and Γ is the gamma function. The generation process of LDA is as follows.

(a) for each topic k ∈ [1, K], sample the term distribution Φ_k ~ Dir(β);

(b) for the m-th document in the corpus, m ∈ [1, M];

(c) sample the topic probability distribution θ_m ~ Dir(α);

(d) sample the document length N_m ~ Poiss(ξ);

(e) for the n-th word in document m, n ∈ [1, N_m];

(f) select the implicit topic z_{m,n} ~ Mult(θ_m);

(g) generate the word ω_{m,n} ~ Mult(Φ_{z_{m,n}}).
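The generation process can be made concrete with a small Python sketch that uses only the standard library. It assumes symmetric scalar α and β, draws Dirichlet samples by normalising gamma variates, and draws the Poisson document length with Knuth's multiplication method; the function names and the integer word and topic ids are illustrative assumptions.

```python
import math
import random


def sample_dirichlet(concentration, size, rng):
    """Symmetric Dirichlet sample obtained by normalising gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(lam, rng):
    """Poisson sample using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_corpus(K, M, V, alpha, beta, xi, seed=0):
    """Generate M documents over V term ids from K topics."""
    rng = random.Random(seed)
    # (a) one term distribution per topic
    phi = [sample_dirichlet(beta, V, rng) for _ in range(K)]
    docs = []
    for _ in range(M):                          # (b) each document
        theta = sample_dirichlet(alpha, K, rng)  # (c) topic distribution
        length = sample_poisson(xi, rng)         # (d) document length
        words = []
        for _ in range(length):                  # (e) each word position
            z = rng.choices(range(K), weights=theta)[0]   # (f) topic
            w = rng.choices(range(V), weights=phi[z])[0]  # (g) word
            words.append((w, z))
        docs.append(words)
    return phi, docs
```

Each document is returned as a list of (word id, topic id) pairs, so the hidden topic assignment that parameter estimation later has to recover is visible in the generated data.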

For parameter estimation of LDA, the conditional probability of the topic sequence given the word sequence is computed first, with the following formula:

p (z | w) = \frac{p (w, z)}{Σ_{z} p (w, z)}

(formula two)

Gibbs sampling is then applied to the topic sequence, with the following sampling formula:

p (z_{i} = k | z_{¬ i}, w) ∝ \frac{n_{k, ¬ i}^{(t)} + β_{t}}{[Σ_{υ = 1}^{V} (n_{k}^{(υ)} + β_{υ})] - 1} \cdot \frac{n_{m, ¬ i}^{(k)} + α_{k}}{[Σ_{z = 1}^{K} (n_{m}^{(z)} + α_{z})] - 1}

(formula three)

After the topic label z of each word ω has been obtained, the final parameters are computed as follows:

θ_{m, k} = \frac{n_{m}^{(k)} + α_{k}}{Σ_{z = 1}^{K} (n_{m}^{(z)} + α_{z})}

(formula four)
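The sampling step and the estimate of θ can be sketched together as a small collapsed Gibbs sampler in Python. The sketch assumes symmetric scalar α and β and documents given as lists of integer term ids; the function name `gibbs_lda` is hypothetical. Excluding the current word (the subscript ¬ i and the "- 1" terms of the sampling formula) is realised by decrementing the counts before sampling, and the document-side denominator is dropped because it is the same for every candidate topic.

```python
import random


def gibbs_lda(docs, K, V, alpha, beta, iters, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (assignments, theta)."""
    rng = random.Random(seed)
    n_kt = [[0] * V for _ in range(K)]   # count of term t in topic k
    n_k = [0] * K                        # total words assigned to topic k
    n_mk = [[0] * K for _ in docs]       # count of topic k in document m
    z = []
    for m, doc in enumerate(docs):       # random initial assignment
        labels = []
        for t in doc:
            k = rng.randrange(K)
            labels.append(k)
            n_kt[k][t] += 1
            n_k[k] += 1
            n_mk[m][k] += 1
        z.append(labels)
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                k = z[m][i]
                # Remove word i from the counts (the "not i" statistics).
                n_kt[k][t] -= 1
                n_k[k] -= 1
                n_mk[m][k] -= 1
                weights = [
                    (n_kt[j][t] + beta) / (n_k[j] + V * beta)
                    * (n_mk[m][j] + alpha)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][i] = k
                n_kt[k][t] += 1
                n_k[k] += 1
                n_mk[m][k] += 1
    # Document-topic estimate: (n_m^(k) + alpha) / sum_z (n_m^(z) + alpha).
    theta = [
        [(n_mk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for m, doc in enumerate(docs)
    ]
    return z, theta
```

Each row of `theta` sums to one, because the denominator adds α once for every topic.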

Given the trained model M and a new document \tilde{m}, the implicit topic of each word is sampled with the following formula:

p (\tilde{z}_{i} = k | \tilde{ω}_{i} = t, \tilde{z}_{¬ i}, \tilde{ω}_{¬ i}; M) = \frac{n_{k}^{(t)} + \tilde{n}_{k, ¬ i}^{(t)} + β_{t}}{Σ_{υ = 1}^{V} (n_{k}^{(υ)} + \tilde{n}_{k, ¬ i}^{(υ)} + β_{υ})} \cdot \frac{n_{\tilde{m}, ¬ i}^{(k)} + α_{k}}{[Σ_{z = 1}^{K} (n_{\tilde{m}}^{(z)} + α_{z})] - 1}

(formula five)

Here \tilde{z} denotes the topic vector corresponding to the new document \tilde{m}.

With the Gibbs sampling method above, the topic label of each word is obtained. Formula six then gives the document's value on each topic component, which is the document's representation in topic space.

θ_{\tilde{m}, k} = \frac{n_{\tilde{m}}^{(k)} + α_{k}}{Σ_{z = 1}^{K} n_{\tilde{m}}^{(z)} + α_{z}}

(formula six)

After the above steps, clustering can be performed. After LDA has been used to select a certain proportion of features, the K-means algorithm is chosen to cluster the texts. The text clustering flow is as follows:

(1) preprocess the original texts, including word segmentation and stop-word removal;

(2) perform feature selection with the LDA model;

(3) for the selected features, compute the weight of each feature in each text; the weight W(d, w) of feature w in text d is computed as follows:

W (d, w) = \frac{\log (tf (d, w) + 1) \times \log ((M + 1) / (df (w) + 0.5))}{Σ \log (tf (d, w^{'}) + 1) \times \log ((M + 1) / (df (w^{'}) + 0.5))}

(formula seven)

Here M is the total number of texts, tf(d, w) is the number of times term w occurs in text d, and df(w) is the document frequency of term w. Once the representation of the texts has been obtained, a vector space model can be generated.

(4) randomly choose initial points and use the K-means algorithm to obtain the final clustering result. K-means needs to measure the distance between texts, which is computed with cosine similarity. For two texts d and d', the similarity is computed as follows:

sim (d, d^{'}) = \frac{Σ_{w ∈ d, d^{'}} W (d, w) \times W (d^{'}, w)}{\| d \| \times \| d^{'} \|}

(formula eight)
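The feature weight of step (3) and the cosine similarity of step (4) can be sketched in Python as below. The sketch assumes a text is a list of tokens and that document frequencies are supplied as a dictionary; the function names are illustrative, and the K-means loop itself is not shown.

```python
import math


def feature_weights(tokens, doc_freq, total_docs):
    """Normalised weights W(d, w) for one text.

    tokens:     list of terms in text d
    doc_freq:   dict term -> df(w), the number of texts containing the term
    total_docs: M, the total number of texts
    """
    tf = {}
    for tok in tokens:
        tf[tok] = tf.get(tok, 0) + 1
    raw = {
        w: math.log(count + 1)
        * math.log((total_docs + 1) / (doc_freq[w] + 0.5))
        for w, count in tf.items()
    }
    norm = sum(raw.values())
    if norm == 0:
        return {}  # empty text: no features to weight
    return {w: value / norm for w, value in raw.items()}


def cosine_similarity(wa, wb):
    """Cosine similarity between two weight dictionaries."""
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Only terms shared by both texts contribute to the numerator, while each norm is taken over all terms of its own text, so texts with no common term have similarity 0.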

4.2 Domain database version control

The domain database version control module applies the ideas and methods of version control to manage and control the evolution of the database. Each evolution state of the data can be regarded as a version, and the module provides version generation, version recovery, version deletion, and similar functions. Specifically, because of modification, evolution, and other factors, a domain database keeps evolving over time, and this module records that series of evolution steps. On the one hand, it records the evolution history of specific domain data, which the user can inspect at any time and use to restore a given piece of domain data to a past version. On the other hand, the user can mark the database in a given state as a version, so that the whole database can be restored to that version at some point in the future. When necessary, the user can delete a non-critical version. Its structure is shown in Fig. 14.

4.3 new data-objects find

The discovery of new data-objects needs user to provide substantial amounts of base text data, and system is by above-mentioned based on LDA point These data analyzed by analysis model, build a googol according to object analysis storehouse；When new data are input to data object After analyzing storehouse, the version information reading the FIELD Data that is associated will be triggered, then pass through analytical data object and associate field Relation between versions of data, calculates the probability that current data object is new FIELD Data, and automatic or use Manual new FIELD Data is modified in family, and add in field database.Data object analyzes the core texture in storehouse It it is a suffix tree.This part also has another one pith to be catalogue monitoring module, newly counts for system automatic sensing According to arrival, and then automatically carry out evolution process, its processing method is as follows:

(1) system start-up, checks that configuration file is to obtain FIELD Data Evolution Data source directory.

(2) the state change of directory watcher (AutoDectector) monitored data source directory is started.When there being new literary composition Part increase when, directory watcher can detect this change, then check its file format, if text, One in pdf document, html file and Word document, then be read out analyzing to it.

(3) Different file analyzers are implemented for the different input file types: TxtAnalyzer, PdfAnalyzer, HtmlAnalyzer and WordAnalyzer. PdfAnalyzer and WordAnalyzer are implemented with the open-source tool Apache POI. The file analyzer outputs text, or a text stream when the data volume is very large.

(4) The resulting text or text stream is input into the analysis library, that is, inserted into the most recent suffix tree. A change to the suffix tree triggers a check of the affected entries: once the frequency of an entry reaches the threshold t, the field data related to that entry are queried from the data resource ontology library, and the query result determines whether new basic field data should be built.

(5) After a file has been analyzed, it is renamed so that its name ends with ".analyzed", which distinguishes it from files that have not been analyzed. The metadata file in the data source directory is then checked; if the number or total size of the analyzed files has reached the upper limit, some of the earliest analyzed files are deleted.
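Steps (2) to (5) above can be sketched as a single polling pass over the data source directory. This is a simplified illustration under stated assumptions: `scan_once`, `analyze_txt` and `ANALYZERS` are hypothetical names, only plain text files are handled, a word counter stands in for the suffix tree, and the clean-up of old analyzed files in step (5) is omitted.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, Set


def analyze_txt(path: Path) -> str:
    """Hypothetical text analyzer: returns the file content as text."""
    return path.read_text(encoding="utf-8")


# Analyzers are selected by file extension; PDF, HTML and Word analyzers
# would be registered in the same way (not implemented in this sketch).
ANALYZERS: Dict[str, Callable[[Path], str]] = {".txt": analyze_txt}


def scan_once(source_dir: Path, counts: Counter, threshold: int) -> Set[str]:
    """Analyze each new file once; return entries whose frequency reached the threshold."""
    triggered: Set[str] = set()
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(source_dir.iterdir()):
        analyzer = ANALYZERS.get(path.suffix.lower())
        if analyzer is None:  # unsupported format, or already ".analyzed"
            continue
        for entry in analyzer(path).split():
            counts[entry] += 1  # a Counter stands in for the suffix tree
            if counts[entry] >= threshold:
                triggered.add(entry)
        # Mark the file as processed so that the next scan skips it.
        path.rename(path.with_name(path.name + ".analyzed"))
    return triggered


# Usage: one supported file and one unsupported file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp)
    (source / "a.txt").write_text("ontology index ontology", encoding="utf-8")
    (source / "b.log").write_text("ignored", encoding="utf-8")
    counts = Counter()
    hot_entries = scan_once(source, counts, threshold=2)
    remaining = sorted(p.name for p in source.iterdir())
```

Entries returned by `scan_once` correspond to the entries that would then be looked up in the data resource ontology library in step (4).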

Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A database system for extensible multi-type field data collaborative management, characterized in that it comprises: a data resource ontology library module, a network-type field database module, a hierarchical field database module and a field data genetic module, wherein:

The network-type field database module builds, on the basis of the data resource ontology library, a database based on network-type data attributes according to the attributes, relational network and other specific properties of the data objects; it implements the data structure design, storage design and index design of network-type data objects, forms a relationship network containing a large number of network-type data objects, and provides a network database access interface to the outside; the network-type field database module inherits from the data resource ontology library and instantiates it for network-type field data; it provides a query interface based on network-type field data to users, other modules and external systems;

The hierarchical field database module builds, according to the subordinate, adjacent, intersecting and same-level relationships among hierarchical data objects, a dedicated database that represents the data objects and their hierarchical membership information, and provides to the outside an access interface to that database; the hierarchical field database module is a further evolution of the network-type field database module: it stores and organizes, in the form of a tree, field data that have only a hierarchical structure, implements hierarchical semantics, and provides a query interface based on hierarchical field data to users, other modules and external systems;

The field data genetic module tracks and controls changes to the field data in the data resource ontology library, the network-type field database and the hierarchical field database during use, and establishes a data version history; it analyzes the raw data sets provided by the user in combination with the existing data, obtains new field data and, after screening, inputs them into the field databases; the field data genetic module provides record-based data version control for the data resource ontology library module, the network-type field database module and the hierarchical field database module, automatically finds new field data in the raw data input by the user, and uses the interfaces of those modules to manage the corresponding evolution.

A database system for extensible multi-type field data collaborative management, characterized in that: the data resource ontology library module comprises a data persistence module, a basic database establishment module, a relation definition module, a data index module and an interface module;

The data persistence module defines an interface-oriented implementation method, so that the data persistence implementation can be configured flexibly for different hardware environments, context environments and other requirements; based on object serialization technology, it defines the serialization and deserialization protocols of the field-data-related objects; during data persistence, the binary stream obtained by serializing an object is output to a file, a database or a network location according to the file organization protocol; when an object that is not yet loaded in the object buffer pool must be loaded, the corresponding data stream is read according to the logical address sent by the upper-layer request, and the object is then reconstructed using its deserialization protocol; data are logically organized in the file as blocks, and the blocks are managed with a heap structure; the data persistence module of the data resource ontology library also serves as the data persistence abstraction for the network-type field database and the hierarchical field database, whose data storage functions are customized and extended from this persistence module according to different persistence protocols, forming persistence libraries for specific data types;

The basic database establishment module stores data objects without extended attributes or relations, and establishes the basic field data objects, the serialization protocol, the deserialization protocol and the storage manager; the basic database has a single data form and provides the data basis for defining network-type field data and hierarchical field data; the network-type field database module and the hierarchical field database module implement the serialization and deserialization interfaces it defines;

The relation definition module, on the basis of the basic database, establishes in new files the synonymy, antonymy and membership relations among the entries of the basic database; the relation definitions are highly abstract and general, and are organized, stored and managed so that the network-type field database can be extended flexibly on this basis;

The data index module defines a summary for each field data object and, by means of a fast double-encoding algorithm, maps the field data summary to the logical storage information of the field data object, so as to achieve fast retrieval and access control; the network-type field database module and the hierarchical field database module both contain an index part, in which each keyword is represented by a pair of long integers obtained by double encoding;

The interface module is implemented on the basis of the EJB 3.0 specification and is published in the form of EJB interfaces and Web Service interfaces to provide cross-platform services; the network-type field database and the hierarchical field database implement customized interface publishing functions by inheriting from the data resource ontology library interface module.
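The index mapping described for the data index module, in which a keyword is double-encoded into a pair of long integers that keys the logical storage information, can be sketched as follows. The claim does not specify the encoding functions, so the two hashes used here (SHA-256 and BLAKE2b, each reduced to 64 bits) are assumptions, as are the names `double_encode` and `SummaryIndex`.

```python
import hashlib
from typing import Dict, Optional, Tuple

NumberPair = Tuple[int, int]


def double_encode(keyword: str) -> NumberPair:
    """Map a keyword to a pair of 64-bit integers using two different hashes.

    Assumption: any two independent encodings would serve; these are examples.
    """
    data = keyword.encode("utf-8")
    first = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    second = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return first, second


class SummaryIndex:
    """Maps the encoded keyword to a logical storage address (hypothetical layout)."""

    def __init__(self) -> None:
        self._entries: Dict[NumberPair, int] = {}

    def put(self, keyword: str, logical_address: int) -> None:
        self._entries[double_encode(keyword)] = logical_address

    def get(self, keyword: str) -> Optional[int]:
        return self._entries.get(double_encode(keyword))


# Usage: register one keyword and look up a present and an absent keyword.
index = SummaryIndex()
index.put("database", 4096)
address = index.get("database")
missing = index.get("network")
```

Using two independent encodings makes an accidental collision on both halves of the pair far less likely than with a single 64-bit value, which is one plausible reason for the double encoding.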

A database system for extensible multi-type field data collaborative management, characterized in that: the network-type field database module is implemented as follows:

(1) in the storage design, define the field data objects and persistence protocols related to network-type field data on the basis of the storage management layer of the data resource ontology library;

(2) carry out the storage design on the basis of the defined network-type field data objects, first defining the basic structure and processing of the attribute part; the attribute part is divided into two parts: attributes that exist when the database is designed, called basic attributes, and user-defined attributes, called extended attributes;

(3) build the data index on the storage structure of the network-type field data objects, based on a B-tree with a fast cache area and a Bloom filter; the B-tree is generated dynamically as network-type data objects are inserted, and its maximum number of levels is not limited; network-type data objects with the same keyword are linked through attribute block pointers into an attribute block linked list, so that a query by keyword quickly returns the list of network-type data objects sharing that keyword;

(4) once the data index has been implemented, record updates to data records at fine granularity by means of checkpoints and log files, thereby ensuring efficient access and high fault tolerance of the system.
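The index of step (3) can be illustrated with a simplified sketch in which a Bloom filter screens out absent keywords before the main index is consulted, and objects with the same keyword are chained together. A plain dictionary is substituted for the B-tree and a Python list for the attribute block linked list; the class names, filter size and hash construction are assumptions, and the checkpoint and log handling of step (4) is not shown.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Small Bloom filter; the bit array is held in one Python integer."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class KeywordIndex:
    """Keyword index with a Bloom filter in front (dict used in place of a B-tree)."""

    def __init__(self) -> None:
        self._filter = BloomFilter()
        self._chains: Dict[str, List[dict]] = {}

    def insert(self, keyword: str, attributes: dict) -> None:
        self._filter.add(keyword)
        # Objects sharing a keyword are appended to the same chain.
        self._chains.setdefault(keyword, []).append(attributes)

    def query(self, keyword: str) -> List[dict]:
        if not self._filter.might_contain(keyword):
            return []  # skip the main index entirely
        return list(self._chains.get(keyword, []))


# Usage: two objects share the keyword "apple".
index = KeywordIndex()
index.insert("apple", {"type": "fruit"})
index.insert("apple", {"type": "company"})
index.insert("pear", {"type": "fruit"})
matches = index.query("apple")
absent = index.query("zebra")
```

The filter can report false positives but never false negatives, so a negative answer safely avoids a lookup in the main index.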

A database system for extensible multi-type field data collaborative management, characterized in that: the hierarchical field database module is implemented as follows:

(2) build the storage of the hierarchical structure on the basis of the hierarchical field data objects and the persistence protocol extended for the hierarchical field data structure;

(3) after the storage of the hierarchical structure has been built, organize the positional relationships among hierarchical data objects by means of a constrained binary tree; each keyword in the index file consists of the data object's unique double-encoded number pair, so conflicts need not be considered; in the attribute file, the hierarchy data of parent data objects and child data objects are also stored using this number pair; during retrieval, the number pair of the data object is computed and matched against the index file to obtain the pointer to the corresponding attribute, and the attribute is read; if there are several child data objects, all subordinate data objects can be found by following each attribute's pointer to the next attribute;
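The retrieval path in step (3), from number pair to index entry to a chain of attribute records, can be sketched as follows. This is an in-memory illustration under assumptions: the index file and attribute file are replaced by a dictionary and a list, the number pair is produced by two example hashes, and `HierarchyStore`, `AttributeRecord` and `number_pair` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NumberPair = Tuple[int, int]


def number_pair(name: str) -> NumberPair:
    """Example double encoding of an object name into two 64-bit integers."""
    data = name.encode("utf-8")
    return (
        int.from_bytes(hashlib.sha256(data).digest()[:8], "big"),
        int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big"),
    )


@dataclass
class AttributeRecord:
    parent: NumberPair
    child: NumberPair
    child_name: str                      # kept only to make the demo readable
    next_pointer: Optional[int] = None   # position of the next attribute record


class HierarchyStore:
    def __init__(self) -> None:
        self.index: Dict[NumberPair, int] = {}       # stand-in for the index file
        self.attributes: List[AttributeRecord] = []  # stand-in for the attribute file

    def add_child(self, parent: str, child: str) -> None:
        key = number_pair(parent)
        # The new record points at the previous chain head, then becomes the head.
        record = AttributeRecord(key, number_pair(child), child, self.index.get(key))
        self.attributes.append(record)
        self.index[key] = len(self.attributes) - 1

    def children(self, parent: str) -> List[str]:
        names: List[str] = []
        pointer = self.index.get(number_pair(parent))
        while pointer is not None:  # follow the next-attribute pointers
            record = self.attributes[pointer]
            names.append(record.child_name)
            pointer = record.next_pointer
        return names


# Usage: a small hierarchy with two levels.
store = HierarchyStore()
store.add_child("animal", "mammal")
store.add_child("animal", "bird")
store.add_child("mammal", "dog")
animal_children = store.children("animal")
```

Because each new record is linked in at the head of the chain, children are returned most recent first; the claim itself does not fix an order.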

A database system for extensible multi-type field data collaborative management, characterized in that: the field data genetic module is implemented as follows:

(1) first collect the user activity records of each field database and monitor changes in the activity level of the field data objects;

(4) analyze the text data provided by system users or obtained from the Internet, and build a huge data analysis library; when new data are input into the data analysis library, a read of the version information of the associated data objects is triggered; the relationship between the data object and the versions of the associated data objects is then analyzed to calculate the probability that the current data object is new field data; the new field data are revised automatically or manually by the user and added to the corresponding field database.

6. A database management method for extensible multi-type field data collaborative management, characterized in that it is implemented in the following steps:

(1) preprocess the text data provided by the user, removing non-core field data including stop words, modal particles and punctuation marks, to obtain preprocessed text data;

(2) input the preprocessed data output in step (1) into the LDA (Latent Dirichlet Allocation) probabilistic model and match them against the established data model to obtain the field-related data objects they contain;

(3) build a suffix tree (Suffix Tree) from the field-related data objects output in step (2) and merge it with the existing suffix tree; traverse the merged suffix tree step by step to obtain the frequently occurring character strings, which serve as the initial field-related data objects;