Summary of the invention
The problem to be solved by the present invention is to provide a relational database for managing external heterogeneous data, particularly suited to cases where many kinds of external data are stored in multiple systems.
To solve the above technical problem, the present invention adopts the following technical solution:
A relational database for managing external heterogeneous data, wherein the relational database comprises formatted text used to describe heterogeneous data stored outside the database.
Further, the formatted text comprises a URI string that provides the access protocol and the storage location of the data.
Further, the formatted text comprises a data check attribute field.
Further, the data check attribute field comprises the data length, the last modification time of the data, the MD5 digest of the data, or any combination of the three.
Further, the formatted text comprises a data format field.
Further, the data format field comprises a media type, an encoding algorithm, or a combination of the two.
The present invention also provides a method for establishing the above relational database for managing external heterogeneous data, comprising:
Saving the heterogeneous data in a storage layer;
Creating formatted text that describes the heterogeneous data;
Storing the descriptive formatted text in the database.
Further, the second step of the method also comprises: storing the computed data check attributes of the heterogeneous data in the formatted text.
Further, the second step of the method also comprises: storing the computed data attributes of the heterogeneous data in the formatted text.
According to another aspect of the present invention, a method is also provided by which a database supporting big data queries external data, comprising:
The database receives a query request;
The database returns, in response to the query request, the formatted text describing the external data;
The formatted text is parsed;
The data is read through the storage layer according to the parsed formatted text.
The advantages and positive effects of the present invention are as follows: with the above technical solution, the database is highly extensible and can adapt to multiple access protocols for external data; at the same time, the integrity of the database's management of external data and the independence of the data that the external reference points to are strengthened.
Embodiment
The invention is further described below with reference to one embodiment. GBase8a is a database that supports big data. The data is kept outside GBase8a, and its access protocol may be a local file, or the data may reside on an HTTP server, an FTP server, or storage reached through other dedicated protocols.
A Uniform Resource Identifier (URI) can locate many types of data. The descriptive formatted text in GBase8a may comprise a URI string; the URI string is simple formatted text that stores the URI of the external data.
The URI string is implemented by adding a URI marker to the varchar type. Its data is multi-line text in which lines are separated by a carriage return and line feed pair (CRLF), and it comprises:
The URI on the first line:
URI = protocol-name ":" authentication-information directory file-name ["?" query-parameters] ["#" bookmark]
Only absolute URIs are supported; relative addresses are not.
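The first-line rule above can be illustrated with a minimal sketch. It assumes Python's standard `urllib.parse` as a stand-in parser; the function name `parse_absolute_uri`, the dictionary keys, and the example address are hypothetical and are not part of GBase8a.

```python
from urllib.parse import urlsplit

def parse_absolute_uri(uri: str) -> dict:
    """Split an absolute URI into its parts and reject relative addresses."""
    parts = urlsplit(uri)
    if not parts.scheme:
        # No protocol name means the address is relative, which is unsupported.
        raise ValueError(f"relative address not supported: {uri!r}")
    return {
        "protocol": parts.scheme,
        "authority": parts.netloc,   # authentication information and host
        "path": parts.path,          # directory and file name
        "query": parts.query,
        "bookmark": parts.fragment,
    }

# Hypothetical example address, used only for illustration.
info = parse_absolute_uri("ftp://user:pw@files.example.com/docs/report.pdf?v=2#page3")
print(info)
```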
The descriptive formatted text in the GBase8a database may also comprise data check attribute fields. A data check attribute field may be the length (Content-Length), the last modification time (Last-Modified), the MD5 digest of the data (Content-MD5), or a combination of these; the data check attribute fields of the GBase8a URI data type include all three.
The basic form is:
Field name: field value
Fields are divided into two parts: data check and format description. The data check part is mainly used for data constraints, allowing the database to judge whether the data that the URI points to has changed since it was loaded.
Data check
1. Content-Length indicates the size of the data; its format is a string of decimal digits.
Content-Length = "Content-Length" ":" 1*DIGIT
For example, Content-Length: 3495
2. Last-Modified indicates the date and time at which the data was last modified.
Last-Modified = "Last-Modified" ":" HTTP-date ; RFC 1123 date format
For example, Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
3. Content-MD5 carries the MD5 check value.
Content-MD5 = "Content-MD5" ":" md5-digest
md5-digest = <base64 encoding of the 128-bit MD5 digest, per RFC 1864>
Content-Length, Last-Modified, and Content-MD5 are optional. If they are present, the application and GBase8a should, when reading the data, check whether the actual data matches the description (for example, whether its actual size equals Content-Length); a mismatch indicates that the consistency of the data has been broken. If they are absent, no consistency check is performed.
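The consistency check described above might be sketched as follows. This is an illustration only: the function name `check_consistency` and the way fields are passed as a dictionary are assumptions, and comparing timestamps at one-second resolution is a choice made here because HTTP dates carry no fractional seconds.

```python
import base64
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def check_consistency(data: bytes, mtime: float, fields: dict) -> list:
    """Return the names of the check fields that do not match the actual data.

    Fields that are absent from `fields` are skipped, since all three are optional.
    """
    mismatches = []
    if "Content-Length" in fields and int(fields["Content-Length"]) != len(data):
        mismatches.append("Content-Length")
    if "Last-Modified" in fields:
        recorded = parsedate_to_datetime(fields["Last-Modified"]).timestamp()
        if int(recorded) != int(mtime):
            mismatches.append("Last-Modified")
    if "Content-MD5" in fields:
        actual = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
        if actual != fields["Content-MD5"]:
            mismatches.append("Content-MD5")
    return mismatches

# Describe some data at load time, then check it again when it is read.
data = b"hello"
mtime = 1000000000.0  # arbitrary example timestamp
fields = {
    "Content-Length": str(len(data)),
    "Last-Modified": formatdate(mtime, usegmt=True),
    "Content-MD5": base64.b64encode(hashlib.md5(data).digest()).decode("ascii"),
}
unchanged = check_consistency(data, mtime, fields)
changed = check_consistency(b"hello!", mtime, fields)
```

An empty result means every field that is present agrees with the data; a non-empty result names the fields that signal broken consistency.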
Adding the data check attribute fields also strengthens the independence of the data that the external reference points to, since the fields are able to describe the attributes of the data completely.
The descriptive formatted text in the GBase8a database may also comprise format fields. The format description fields mainly make it easier for GBase8a to read unstructured data and parse it correctly. In addition, they can be flexibly extended according to the protocol type of the URI to describe the data in more detail, so that data-extraction extension modules developed by third parties can recognize the information.
Description of the format description fields
1. Content-Type identifies the media type. Syntax:
Content-Type = "Content-Type" ":" media-type
media-type = type "/" subtype [";" "charset" "=" charset]
type = token
subtype = token
charset = token
For example, Content-Type: text/html; charset=ISO-8859-4
2. Content-Encoding indicates the encoding algorithm of the data. GBase8a also allows users to add description attributes that extend the description of external data, which keeps the table design flexible, especially when such an attribute is sparse.
When Content-Encoding is present, its value indicates whether the data is compressed; it generally appears only when the data is plain text.
Content-Encoding = "Content-Encoding" ":" "gzip"
For example, Content-Encoding: gzip
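A reader of the two format description fields could use them as in the sketch below. The function names are hypothetical, and falling back to UTF-8 when no charset is given is an assumption made here, not something the text specifies.

```python
import gzip

def parse_content_type(value: str):
    """Split 'type/subtype; charset=name' into (type, subtype, charset or None)."""
    media, _, params = value.partition(";")
    type_, _, subtype = media.strip().partition("/")
    charset = None
    for param in params.split(";"):
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset":
            charset = val.strip()
    return type_, subtype, charset

def decode_text(data: bytes, fields: dict) -> str:
    """Undo gzip compression if declared, then decode with the declared charset."""
    if fields.get("Content-Encoding") == "gzip":
        data = gzip.decompress(data)
    _, _, charset = parse_content_type(fields.get("Content-Type", "text/plain"))
    return data.decode(charset or "utf-8")  # UTF-8 default is an assumption

fields = {
    "Content-Type": "text/html; charset=ISO-8859-4",
    "Content-Encoding": "gzip",
}
stored = gzip.compress("<p>tere</p>".encode("iso-8859-4"))
text = decode_text(stored, fields)
```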
The corresponding conversion plug-in is started according to Content-Type. If the conversion result itself carries a Content-Type, GBase8a goes on to start the corresponding converter, until the output is plain text ("text/plain"). If the type, subtype, or charset of Content-Type is not set in the URI field, or the value that is set is outside the supported range, the general conversion plug-in is called to perform the conversion; in that case, the conversion is not guaranteed to succeed as the application expects.
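The chained conversion can be pictured with a small registry of converters. Everything named here is hypothetical: the registry `CONVERTERS`, the media type `application/x-demo`, the crude tag-stripping converter, and the `max_steps` guard (added as a safety assumption so that a faulty chain cannot loop forever).

```python
import html
import re

def demo_to_html(data):
    # Hypothetical first-stage plug-in: wraps the payload as HTML.
    return "text/html", f"<p>{data}</p>"

def html_to_text(data):
    # Crude tag stripping, enough for illustration only.
    return "text/plain", html.unescape(re.sub(r"<[^>]+>", "", data))

def generic_convert(data):
    # General plug-in used when the type is unset or unsupported;
    # the result may not be what the application expects.
    return "text/plain", str(data)

CONVERTERS = {
    "application/x-demo": demo_to_html,
    "text/html": html_to_text,
}

def to_plain_text(content_type, data, max_steps=10):
    """Apply converters one after another until the output is text/plain."""
    steps = 0
    while content_type != "text/plain":
        if steps == max_steps:
            raise RuntimeError("conversion did not reach text/plain")
        converter = CONVERTERS.get(content_type, generic_convert)
        content_type, data = converter(data)
        steps += 1
    return data

chained = to_plain_text("application/x-demo", "Hello &amp; bye")
fallback = to_plain_text(None, 42)
```

Each converter returns the media type of its own output, which is what lets the loop pick the next plug-in without knowing the chain in advance.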
In a GBase8a URI field, an empty line marks the end of the URI field data.
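Putting the layout together (URI on the first line, "Field name: field value" lines, an empty line as terminator, CRLF between lines), a parser for one URI field value could look like the sketch below. The function name and the sample record are illustrative assumptions.

```python
def parse_uri_record(text: str):
    """Parse a URI field value into (uri, fields).

    The first line is the URI; later lines are 'Field name: field value';
    an empty line marks the end of the field data.
    """
    lines = text.split("\r\n")
    uri = lines[0]
    fields = {}
    for line in lines[1:]:
        if line == "":
            break  # empty line: end of the URI field data
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed field line: {line!r}")
        fields[name.strip()] = value.strip()
    return uri, fields

# Hypothetical record built from the field examples in the text.
record = (
    "http://www.example.com/docs/page.html\r\n"
    "Content-Length: 3495\r\n"
    "Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT\r\n"
    "Content-Type: text/html; charset=ISO-8859-4\r\n"
    "\r\n"
)
uri, fields = parse_uri_record(record)
```

Splitting each field line at the first colon only keeps values such as the Last-Modified time, which contain colons themselves, intact.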
The URI type lets the GBase8a database system adapt to multiple access protocols for external data, so that the database is highly extensible with respect to external data. As shown in Figure 1, DAP supports protocol redirection: for example, the HTTP protocol may redirect to the FTP protocol, and a network file system may redirect to the local file system. Redirection stops when the maximum number of redirects is exceeded or an unsupported protocol is encountered.
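The stopping rule for redirection can be sketched as a loop with a hop counter. The limit `MAX_REDIRECTS`, the set `SUPPORTED`, and the `lookup_redirect` callback are all assumptions for illustration; the text does not give the actual limit or the list of supported protocols.

```python
from urllib.parse import urlsplit

MAX_REDIRECTS = 5                      # hypothetical limit
SUPPORTED = {"file", "http", "ftp"}    # hypothetical set of supported protocols

def resolve(uri: str, lookup_redirect) -> str:
    """Follow redirects until a final URI is reached.

    `lookup_redirect(uri)` returns the URI to redirect to, or None if `uri`
    is final. Stops with an error on an unsupported protocol or when the
    maximum number of redirects is exceeded.
    """
    hops = 0
    while True:
        scheme = urlsplit(uri).scheme
        if scheme not in SUPPORTED:
            raise ValueError(f"unsupported protocol: {scheme!r}")
        target = lookup_redirect(uri)
        if target is None:
            return uri
        if hops == MAX_REDIRECTS:
            raise RuntimeError("maximum number of redirects exceeded")
        uri = target
        hops += 1

# An HTTP address redirects to FTP, which redirects to a local file.
table = {
    "http://a.example.com/x": "ftp://b.example.com/x",
    "ftp://b.example.com/x": "file:///data/x",
}
final_uri = resolve("http://a.example.com/x", table.get)
```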
As shown in Figure 2, when data is loaded, the application system first saves the unstructured data in the storage layer, which may be a disk, an array, or another local storage device, or a remote storage service such as an FTP server or a distributed file system service. The application then generates the URI data according to the agreed URI field format and stores it in GBase8a. GBase8a reads the data through the corresponding URI access program and performs processing such as content analysis and full-text indexing.
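For the loading step, an application might generate the URI data for a locally stored file roughly as follows. This is a sketch under assumptions: the function name `build_uri_record` is hypothetical, the storage layer is taken to be the local file system, and the resulting text would then be stored in a URI column by whatever means the application uses.

```python
import base64
import hashlib
import tempfile
from email.utils import formatdate
from pathlib import Path

def build_uri_record(path, content_type=None) -> str:
    """Describe a stored file as URI field text: URI, check fields, empty line."""
    p = Path(path).resolve()
    data = p.read_bytes()
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    lines = [
        p.as_uri(),  # absolute file:// URI on the first line
        f"Content-Length: {len(data)}",
        f"Last-Modified: {formatdate(p.stat().st_mtime, usegmt=True)}",
        f"Content-MD5: {md5_b64}",
    ]
    if content_type:
        lines.append(f"Content-Type: {content_type}")
    # Lines are separated by CRLF; the trailing empty line ends the field data.
    return "\r\n".join(lines) + "\r\n\r\n"

# Save a small file in the "storage layer", then generate its description.
with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "note.txt"
    sample.write_bytes(b"hello")
    record = build_uri_record(sample, "text/plain")
```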
When GBase8a is queried, there are generally two modes. Mode one, as shown in Figure 3: the application sends a query request to GBase8a; after the URI information is returned, the application parses the URI, reads the data through the storage layer, processes it, and returns the result.
Mode two, as shown in Figure 4: the application sends a query request to GBase8a, and GBase8a obtains the data that the URI points to directly through a built-in function or a user-defined function (UDF) and returns it to the client through a TEXT or BLOB interface.
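The difference between the two modes can be demonstrated with a stand-in database. The sketch below uses SQLite from Python's standard library, not GBase8a, purely because it allows a user-defined function to be registered in a few lines; the table `docs`, the function `fetch_uri`, and the helper `fetch` are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path
from urllib.request import urlopen

def fetch(record: str) -> bytes:
    """Read the data that the URI on the first line of `record` points to."""
    uri = record.split("\r\n", 1)[0]
    with urlopen(uri) as resp:
        return resp.read()

with tempfile.TemporaryDirectory() as d:
    # External data lives outside the database, here as a local file.
    sample = Path(d) / "note.txt"
    sample.write_bytes(b"external data")
    record = sample.as_uri() + "\r\n\r\n"

    db = sqlite3.connect(":memory:")
    db.create_function("fetch_uri", 1, fetch)  # UDF used in mode two
    db.execute("CREATE TABLE docs (id INTEGER, uri TEXT)")
    db.execute("INSERT INTO docs VALUES (1, ?)", (record,))

    # Mode one: the database returns the URI text; the application reads the data.
    (stored,) = db.execute("SELECT uri FROM docs WHERE id = 1").fetchone()
    mode_one = fetch(stored)

    # Mode two: the database resolves the URI itself and returns the data as a BLOB.
    (mode_two,) = db.execute("SELECT fetch_uri(uri) FROM docs WHERE id = 1").fetchone()
    db.close()
```

In mode one the application controls how the storage layer is accessed; in mode two the client needs no access to the storage layer at all, at the cost of routing the data through the database.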
One embodiment of the present invention has been described in detail above, but the described content is only a preferred embodiment and shall not be regarded as limiting the scope of implementation of the invention. All equivalent changes and improvements made within the scope of the present application shall remain within the scope of this patent.