CN106126547A - Structured big data communication protocol - Google Patents

Structured big data communication protocol

Info

Publication number
CN106126547A
CN106126547A (application CN201610427075.7A)
Authority
CN
China
Prior art keywords
data
big
structuring
big data
communication protocol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610427075.7A
Other languages
Chinese (zh)
Inventor
樊永正 (Fan Yongzheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610427075.7A
Publication of CN106126547A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The structured big data communication protocol prevents information islands from arising, through the optimization of data and a change in the software development model, so that data can be exchanged between information systems and mined easily. The protocol gives structured data twelve technical characteristics: uniqueness, belongingness, identifiability, independence, integrity, standardization, zero coupling with the system, uniformity of structure, summability, portability, timeliness, and authenticity. Only data possessing all twelve characteristics qualify as structured big data, and the twelve characteristics provide a technical guarantee of the "authenticity" of big data.

Description

Structured big data communication protocol
Technical field
The structured big data communication protocol is a communication protocol, and it is also a technique for turning data into qualified structured big data. It is similar in aim to ETL, but ETL deals with problems in data already produced by existing information systems, whereas the structured big data communication protocol prevents those problems from the very beginning, at the design stage of the information system. ETL cures the disease of the data; the structured big data communication protocol prevents the disease from occurring. ETL patches the problems produced by the prior art; the structured big data communication protocol proposes a new data processing scheme. The protocol is also a software development model: every information system built with it is a big data information system, and the data of all such systems need only be uploaded, in mirror fashion, to a big data center to be aggregated into qualified structured big data. Qualified structured big data are structured data that can be mined efficiently without ETL conversion.
Background art
With the arrival of the big data age, we find that every trade and profession already has many information systems, yet these systems cannot meet the demands of the age: information islands are severe, interconnection is difficult, and data sharing is hard. Every industry already holds a great deal of data, yet however plentiful, the data are difficult to mine efficiently. At present relational databases are used to attack these problems, but they can solve them only locally; they cannot cure them at the root. The structured big data communication protocol was created precisely for these problems. It derives from imitating the memory, association, and thinking of the brain, in work begun in 1982 with the aim of letting a computer imitate the associative function of the brain.
Summary of the invention
The structured big data communication protocol avoids the creation of information islands, interconnection problems, and data sharing problems, and makes data easy to mine, through the optimization of data and a change in the software development model. The protocol can give data twelve technical characteristics: uniqueness, belongingness, identifiability, independence, integrity, standardization, zero coupling with the system (a coupling degree of zero), uniformity of structure, summability, portability, timeliness, and authenticity. Only data that satisfy all twelve technical characteristics simultaneously are qualified structured big data.
Technical problem to be solved by the invention
The invention addresses the "variety" (many data types) and "velocity" (high data speed) problems among the 4 Vs of big data. The concrete technical problems targeted: every trade and profession already has many information systems, yet these systems cannot meet the demands of the big data age; information islands are severe, interconnection is difficult, and data sharing is hard. Every industry already holds a great deal of data, yet the data are difficult to mine efficiently.
Beneficial effect
Interconnection is achieved, data sharing becomes easy, query speed is high, and data mining is easy.
Detailed description of the invention
The innovation of the structured big data communication protocol shows in the following five aspects:
1. It proposes for the first time the twelve technical characteristics of structured big data; only data that satisfy all twelve simultaneously can become qualified structured big data. To make data satisfy the twelve characteristics, twelve corresponding data optimization methods were created.
2. The basis of communication is that both sides use the same protocol. The "twelve technical characteristics of structured big data" proposed by the protocol are precisely the "communication protocol" by which structured data interconnect.
3. Every datum of structured big data gains data items that embody the "uniqueness of the data" and the "belongingness of the data". Existing database technology, built to handle small data, considers neither of these data items, and existing data generally lack them. These two data items are the critical data items that mark data as qualified structured big data.
4. Standardization and normalization of the data are especially emphasized, because in a big data environment standardized, normalized data can automatically imitate the associative function of the brain, which greatly raises the speed and flexibility of querying. A relational database places no restriction on the data; everything is defined by the database designer alone. The structured big data communication protocol restricts the data very strictly: designers are absolutely not permitted to define data arbitrarily, and all data must be normalized. This is also an important measure for making big data easy to mine.
5. The twelve technical characteristics of structured big data safeguard the authenticity of big data. Small data are used inside a single organization, but big data are used across many organizations, so the authenticity, notarizability, authority, and non-repudiation of big data become extremely important.
When optimizing data, the structured big data communication protocol stores the data in a "universal data structure table" (shown in Table 1), which can hold all kinds of structured data in a single table.
Table 1: Example of data stored in the universal data structure table

| ID | Thing code | Attribute | Attribute value | Extra-long value | Unit | Attachment | Time |
|----|------------|-----------|-----------------|------------------|------|------------|------|
| 1099 | 1280 | Data source | Guangzhou First Hospital | | | | 2014.5.3 |
| 1100 | 1280 | Thing category | Medical record | | | | 2014.5.3 |
| 1101 | 1280 | Thing category | Inpatient record | | | | 2014.5.3 |
| 1102 | 1280 | Thing category | Medical expenses | | | | 2014.5.3 |
| 1103 | 1280 | ID card number | XXXXXXXXXX | | | | 2014.5.3 |
| 1104 | 1280 | Admission number | XXXXXXXXXX | | | | 2014.5.3 |
| 1105 | 1280 | Name | Zhang San | | | | 2014.5.3 |
| 1106 | 1280 | Sex | Male | | | | 2014.5.3 |
| 1107 | 1280 | Chinese medicine fee | 56 | | yuan | | 2014.5.3 |
| 1108 | 1280 | Western medicine fee | 72 | | yuan | | 2014.5.3 |
| 1109 | 1280 | Other expenses | 180 | | yuan | | 2014.5.3 |
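The "universal data structure table" shown above is, in conventional database terms, an entity-attribute-value layout. As a hedged sketch (the patent prescribes no implementation; the table name, column names, and sample rows here are illustrative), it can be modeled with Python's sqlite3 module, together with the single generic query program that the protocol argues is sufficient for every kind of "thing":

```python
import sqlite3

# Illustrative model of the "universal data structure table":
# one table holds the attributes of every kind of thing.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE universal (
        id         INTEGER PRIMARY KEY,
        thing_code TEXT,   -- big data identification code of the thing
        attribute  TEXT,   -- attribute name in plain natural language
        value      TEXT,   -- attribute value in plain natural language
        unit       TEXT,   -- e.g. 'yuan'
        time       TEXT
    )""")

rows = [
    (1099, "1280", "Data source", "Guangzhou First Hospital", None, "2014.5.3"),
    (1105, "1280", "Name", "Zhang San", None, "2014.5.3"),
    (1107, "1280", "Chinese medicine fee", "56", "yuan", "2014.5.3"),
]
conn.executemany(
    "INSERT INTO universal (id, thing_code, attribute, value, unit, time) "
    "VALUES (?, ?, ?, ?, ?, ?)", rows)

def attributes_of(thing_code):
    """One generic query serves every kind of thing --
    no per-schema tables and no per-schema programs are needed."""
    cur = conn.execute(
        "SELECT attribute, value, unit FROM universal "
        "WHERE thing_code = ? ORDER BY id", (thing_code,))
    return cur.fetchall()

print(attributes_of("1280"))
```

Because every kind of thing lives in the same table, the same `attributes_of` query serves medical records, commodities, or anything else, which is the sense in which one general program replaces a mountain of per-database software.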
Explanation 1: the twelve technical characteristics of qualified structured big data and the twelve data optimization methods
Qualified structured big data possess twelve technical characteristics; in other words, only structured data that satisfy all twelve simultaneously are qualified structured big data. The structured big data communication protocol is precisely the method of making structured data satisfy the twelve characteristics, and for that purpose it proposes twelve corresponding data optimization methods.
1. Uniqueness of data
Uniqueness of data: the various data of the same thing, throughout their life cycle and across different information systems, should all be unique and recognizable; they must not become unrecognizable because of a change of time or place.
The problem uniqueness addresses: at present the data of the same thing take different forms of expression in different information systems and are hard to identify accurately during big data mining. For example, the same commodity carries different codes in the information systems of different distributors; the same patient receives a different admission number at each hospital, so that in a big data environment the patient's medical history cannot be queried, because the data relevant to the patient share no unified identification code.
Data optimization method 1: all data of the same thing, in whatever time, place, or environment, must contain one (or several) unique, unified big data identification code(s). The big data identification code is the identity card, the license plate, of the data. It differs essentially from the ID in a relational database: an ID identifies data within the scope of one table, while a big data identification code identifies data within the scope of the big data.
Scope of the big data: different big data involve different scopes. In international trade the scope is the globe; for national medical big data the scope is the medical industry; for the big data of Guangzhou the scope is Guangzhou.
Big data identification codes come in two kinds. One identifies a concrete thing, like a device serial number, yet differs essentially from a serial number: a serial number is written by the enterprise itself, while a big data identification code must be coded according to an internationally unified standard. The other identifies a class of things. For example, to learn the sales of a given phone model at every distributor, a big data identification code for that model is necessary, because the phone is sold by hundreds of thousands of distributors worldwide and the manufacturer must interconnect with hundreds of thousands of information systems. Data about a person should contain the person's ID card number, to guarantee that at any time and place all data about that person are unique and recognizable as the same individual's. Big data span many different information systems, whereas small data live inside a single one; in a big data environment, therefore, the uniqueness of data is all the more important, and the lack of a unified, standard, normalized identification code makes data mining extremely difficult. Uniqueness is the basis of big data mining and analysis, and the big data identification code must make classified statistics on the data convenient.
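The patent requires a unified "big data identification code" but does not define its format. A minimal sketch, assuming a name-based UUID over a shared standardized namespace (the namespace string and the sample national ID below are invented for illustration), shows how every information system can derive the same code for the same thing:

```python
import uuid

# Hedged sketch: one conventional way to obtain a code that is
# identical in every information system is a name-based UUID over a
# shared, standardized namespace. The namespace string is invented.
MEDICAL_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "medical.example-standard.org")

def big_data_id(standard_identifier: str) -> str:
    """Same input -> same code, in any system, at any time."""
    return str(uuid.uuid5(MEDICAL_NS, standard_identifier))

# Two hospitals computing the code from the same patient's national ID
# obtain the same identifier, so the records can be linked.
a = big_data_id("110101199001011234")  # hospital A
b = big_data_id("110101199001011234")  # hospital B
print(a == b)  # -> True: uuid5 is deterministic
```

The design choice here is determinism: unlike a per-table auto-increment ID, the code depends only on the standardized identifier of the thing, so no central counter and no coordination between systems is needed.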
2. Belongingness of data
Belongingness of data: the data must reflect which thing each attribute belongs to, and whom the data belong to (in other words, who collected them, where they come from).
Data optimization method 2: the data of every thing must contain a "Data source" data item. "Data source" equips structured data with "belongingness"; ordinarily the name of the originating organization serves as its value.
Big data come from thousands of organizations; without a "Data source" marker, identification becomes chaotic during big data mining.
3. Identifiability of data
Identifiability of data: the data must be recognizable to information systems and to people alike. Moreover, not only one's own information system but other people's systems must be able to identify them, and not only oneself but other people must be able to understand them.
The problem identifiability addresses: the data in a relational database can be recognized only by the database designer and the designer's own information system. Other people and other information systems can identify them only after software has explained, annotated, or translated them.
Data optimization method 3: make the data recognizable through appropriate redundancy; express the data in standard, normalized natural language wherever possible, and avoid expressing data with codes wherever possible. The guiding principle of the optimization: "technical personnel of the relevant field must be able to understand the data, and other people's information systems must be able to identify them; it must not be the case that only the database designer can understand them and only one's own system can identify them."
In a big data environment, the most important, most critical characteristic of data is their identifiability. One strategy of the relational database is to reduce data redundancy as far as possible, but in reducing redundancy it increases the difficulty of recognizing the data. The strategy of the structured big data communication protocol is the opposite: use appropriate redundancy to make the data recognizable, so that other people can understand them and other people's information systems can identify them.
A relational database is a database in which data, data structures, programs, and the database system are inseparable. Once separated from their concrete table structures and programs, the data of a relational database become meaningless; they have meaning only inside a specific table.
The "universal data structure table" is a data structure in which the data are independent of any program; one might say, "what is, is, regardless of program". Data in a universal data structure table keep their true meaning even when separated from their data structure, because they are expressed in standard, normalized natural language: anyone who understands natural language can understand the real meaning of the data in a universal data structure table.
On the surface the relational database's reduction of redundancy is a great advantage, but it is also one of its great defects: in reducing data redundancy, the relational database distorts the data, and the distortion leads to the problems of "information exchange, information islands, and difficult data mining". In a relational database these distortion problems can be solved only by writing large amounts of code. Experience shows countless times that relational databases pay a very high price for the redundancy problem. When data and program are inseparable, storing, reading, and querying the data all require much programming; when data are independent of program, a single general program, written once, lets anyone store, read, and query the data easily, and a mountain of software no longer has to be developed for every new database.
One principle of the structured big data communication protocol: do not consider data redundancy at all; trade space for intelligence and ease of use; let the data speak for themselves instead of letting programs speak for the data. The relational approach speaks for the data through application programs. To replace programs with data, one would rather add large amounts of "redundancy" so that the data gain independence, integrity, and identifiability; for the sake of those properties, redundancy is disregarded no matter how much is added. When information systems are designed on relational databases, programs always interpret the data in the database, and the grave consequence of that strategy is precisely that processing the data demands a great deal of programming; without coding, the data cannot be processed at all.
● The strategy of the structured big data communication protocol: at all costs, let the data speak for themselves; stop using programs as interpreters!
The purpose of "letting the data speak for themselves": wherever the data are placed, in whatever environment, they express the same complete meaning independently and intactly. In the big data age, data appear in many different information systems, so it must be guaranteed that the data carry identical meaning in different systems and environments. The protocol gives data "independence, integrity, identifiability, uniqueness, and belongingness" precisely so that the data speak for themselves; in a big data environment this greatly reduces the amount of coding. Data in a relational database have neither independence nor integrity, and the relational database cannot let data speak for themselves: its data express a complete meaning only through various "relations", indeed only when equipped with the "relations" of "distant relatives".
The "distant relatives" of the relational database: the data are inseparably related to the database system, to the table structures, and to the application programs, and the many tables of the database are interwoven with one another. Data in a relational database have meaning only together with the database system, the data structures, the data types, and the application programs; separated from them, the data become meaningless. The problems of today's information systems, the isolated-island problem, the information exchange problem, the data interface problem, the interconnection problem, the system upgrade problem, and so on, all arise because the data inside relational database systems cannot speak for themselves.
When an electronic medical record system is designed on a relational database system, "patient basic information" may take the following form:
Table 2: patient basic information (a table in a relational database)

| ID | HZXM | GZDW | ZB | XB | ZZ | NL | RQ | HF | BXRQ | MZ | CSZ |
|----|------|------|----|----|----|----|----|----|------|----|-----|
| 26 | Hu Feng | Rubber plant | Worker | 0 | Mongolia Road 2 | 32 | 1991-4-3 | ? | 1991-4-3 | Han | Self |
The form above is the classic structural form of the small data age. The "field names" are in fact very important information and must be described in standard, normalized natural language. After "patient basic information" is optimized through the structured big data communication protocol, its form of expression in the "universal data structure table" is:
Table 3: patient basic information (universal data structure table)
| ID | Thing code | Attribute | Attribute value | Extra-long value | Unit | Attachment | Time |
|----|------------|-----------|-----------------|------------------|------|------------|------|
| 100 | 1001 | Data source | Shanghai First Hospital | | | | |
| 101 | 1001 | Thing category | Medical record | | | | |
| 102 | 1001 | Thing category | Inpatient record | | | | |
| 103 | 1001 | Thing category | Admission record | | | | |
| 104 | 1001 | Thing category | Patient basic information | | | | |
| 105 | 1001 | Patient code | SH10-199103Z21 | | | | |
| 106 | 1001 | Health card number | XXXXXXXXXXXX09 | | | | |
| 107 | 1001 | ID card number | XXXXXXXXXXXXXX | | | | |
| 108 | 1001 | Name | Hu Feng | | | | |
| 109 | 1001 | Work unit | Shanghai rubber plant | | | | |
| 110 | 1001 | Job title | Worker | | | | |
| 111 | 1001 | Sex | Female | | | | |
| 112 | 1001 | Address | Mongolia Road 20 | | | | |
| 113 | 1001 | Age | 32 | | | | |
| 114 | 1001 | Admission date | 1991-4-30 | | | | |
| 115 | 1001 | Marital status | Married | | | | |
| 116 | 1001 | History-taking date | 1991-4-30 | | | | |
| 117 | 1001 | Ethnicity | Han | | | | |
| 118 | 1001 | History narrator | Self | | | | |
Comparing the two tables shows that the information expressed by the "universal data structure table" is complete, undistorted information expressed in natural language: wherever the information is placed, its meaning stays the same.
On the surface, information stored in the "universal data structure table" occupies roughly twice the storage space, but storing data this way removes a great deal of complicated data extraction and conversion work. The "data redundancy" in the universal data structure table exists precisely to let the data speak for themselves, independent of the database system, the data structure, the data type, and the application program. The strategy of the structured big data communication protocol is to "trade space for intelligence and ease of use". Compared with thirty years ago, hard disk capacity has grown more than 100,000-fold, and the cost of occupying roughly twice the space keeps falling; it is negligible. "Letting the data speak for themselves" means the data, like natural language, express their full meaning accurately and without error, needing neither annotation nor interpretation by an application program.
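The optimization contrasted in Tables 2 and 3 can be sketched as a small conversion routine. The abbreviation dictionary and code table below are assumed for illustration (the patent leaves such mappings to international, national, and industry standards):

```python
# Hedged sketch: expand one coded relational row (Table 2 style) into
# self-describing attribute-value records (Table 3 style). The
# abbreviation and code dictionaries are illustrative assumptions.
CODED_ROW = {"HZXM": "Hu Feng", "GZDW": "Rubber plant", "XB": "0", "NL": "32"}

FIELD_NAMES = {  # pinyin abbreviation -> standard natural-language name
    "HZXM": "Name", "GZDW": "Work unit", "XB": "Sex", "NL": "Age",
}
DECODERS = {  # designer's private codes -> natural language
    "Sex": {"0": "Female", "1": "Male"},
}

def to_self_describing(row, thing_code, source):
    """Return (thing_code, attribute, value) records that need no
    table structure or program to be understood."""
    records = [(thing_code, "Data source", source)]
    for field, value in row.items():
        attr = FIELD_NAMES[field]
        value = DECODERS.get(attr, {}).get(value, value)
        records.append((thing_code, attr, value))
    return records

for rec in to_self_describing(CODED_ROW, "1001", "Shanghai First Hospital"):
    print(rec)
```

Note that the conversion is exactly the work the protocol wants to avoid doing after the fact: if the data had been collected in self-describing form to begin with, no such routine (and no per-system dictionaries) would be needed.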
4. Independence of data
Independence of data: the data express a given meaning independently, relying on no database system, no data structure, no annotation, and no application program.
The problem addressed: data in a relational database have no independence; their meaning can be read only with the help of annotations, data structures, and application programs. The field names of many tables in relational databases are nonstandard abbreviations, and when the data are presented to the user, the information system must add headers to the tables before the real meaning of the data can be expressed.
Data optimization method 4: through a certain amount of data redundancy, let the data speak for themselves, so that "the data express a given meaning independently, relying on no database system, no data structure, no annotation, and no application program". The universal data structure table shown in Table 3 above realizes the independence of the data.
5. Integrity of data
Integrity of data: the data express a given meaning completely, relying on no database system, no data structure, no annotation, and no application program.
The problem addressed: data in a relational database have no integrity; their complete meaning can be read only with the help of annotations, data structures, and application programs.
Data optimization method 5: through a certain amount of data redundancy, let the data speak for themselves, so that "the data express a given meaning completely, relying on no database system, no data structure, no annotation, and no application program". The universal data structure table shown in Table 3 above realizes the integrity of the data.
6. Standardization of data
Standardization of data: the data should be standard, normalized, unified, and unambiguous.
The problem addressed: the current lack of standardization of the data in the various information systems makes data mining extremely difficult.
Data optimization method 6: guarantee, at the information system design stage and the data collection stage, that the data are normalized.
Data standardization must be built on the foundation of "international big data standards, national big data standards, and industry big data standards", not on the internal data standards of a single organization. Only data that conform to such standards can qualify as structured big data. The present problem is that each organization works out its own data standard, all different from one another, and no international, national, or industry big data standards exist; this is a great obstacle to the development of big data. Once standards and norms exist and are enforced, ETL is no longer needed when big data are mined.
How to embody the standardization of structured big data: standardization must be considered when the information system is designed; when data are collected or generated, they must be entered and generated in strict accordance with international, national, and industry big data standards. Only then are the data the information system generates normalized data.
Standardizing and normalizing the data of every trade and profession is an enormous engineering task, and only by carrying it out can the "standardization of structured big data" be guaranteed. Data standardization is the foundation of big data; one may say that without it there are no qualified big data. In big data engineering, standards lead. In this respect, since at present neither the world nor any domestic industry has completed data standardization work, at present there are no qualified big data at all!
"Information system names, database names, table names, field names, and the data in the database" must all use standard, normalized, unified natural language, avoiding nonstandard codes as far as possible. This is the key to letting the data form "associative relationships" by themselves, and the key to realizing universal data retrieval. It is also one very important reason why the structured big data communication protocol advocates data standardization: in a big data environment these "associative relationships" bring great convenience to data mining and markedly increase the speed of querying the data.
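A minimal sketch of how standardized attribute names let data form "associative relationships" by themselves: records from two independent systems (the sample data are invented for illustration) link automatically on a shared, standardized attribute, with no interface code written between the two systems:

```python
# Hedged sketch: when attribute names follow one standard, records from
# different information systems associate automatically on shared
# attributes. Sample (thing_code, attribute, value) records are invented.
hospital_a = [("1001", "ID card number", "XXXX01"),
              ("1001", "Name", "Hu Feng")]
hospital_b = [("9042", "ID card number", "XXXX01"),
              ("9042", "Diagnosis", "Hypertension")]

def associate(*sources, attribute="ID card number"):
    """Group thing codes from all sources by a standardized attribute."""
    index = {}
    for source in sources:
        for thing, attr, value in source:
            if attr == attribute:
                index.setdefault(value, set()).add(thing)
    return index

print(associate(hospital_a, hospital_b))
# the two records sharing "XXXX01" are linked across both systems
```

The association works only because both systems spell "ID card number" identically; with per-designer field names such as HZXM or SFZH, the same join would require a translation layer per pair of systems.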
Relational database theory places no restriction whatever on the data; everything is defined arbitrarily by the designer. This is one root cause of the great difficulty of mining the data in relational databases. The structured big data communication protocol demands and restricts the data very strictly: the data must be standard, normalized, and unified, must satisfy the twelve technical characteristics, and every datum must conform strictly to international standards, national standards, and industry standards. Designers are forbidden to define data privately and arbitrarily. Data, like the parts of universal machines, must be standardized before anything works.
Big data standards touch every industry and every kind of business: standards for the data, standards for data structures, standards for business, standards for business processes, standards for information systems, and so on.
In the big data age, information systems must use unified, standard, normalized natural language and avoid codes as far as possible. This is an indispensable measure for guaranteeing the independence, integrity, and identifiability of the data and for reducing the coupling between data and system.
7. Coupling between data and the system

Coupling between data and the system: the higher the coupling between data and a system, the more the data depend on that system. When data depend heavily on a system, they become meaningless once they leave their original system. If data can be understood by users without the interpretation of any information system, then the coupling between those data and the information system is zero.

Problem addressed: the coupling between the data in a relational database and the information system is extremely high. Data in a relational database are inseparable from the database system, the data structures, and the application programs; once they leave their original information system and enter a big data environment, they become meaningless.

Data optimization method 7: ensure that every data item has zero coupling with the information system. With an appropriate amount of data redundancy, give data independence, integrity, identifiability, standardization, uniqueness, and belongingness; these properties together guarantee that every data item has zero coupling with the information system.
The data in big data come from the systems of many thousands of units, so big data should consist of data whose coupling with any system is zero; otherwise many application layers must be written to interpret the data, increasing the difficulty and cost of processing them. Articles written in natural language can be understood directly by professionals in the corresponding field without the interpretation of any information system; such data have zero coupling with information systems. The volume of big data is measured in hundreds of billions of items. If every item had some degree of coupling with its system, a massive amount of code would have to be written before the big data could be understood; if every item has zero coupling with its information system, no program at all needs to be written to interpret the data when processing big data.
Relational database designers habitually use codes to represent data. For example, some designers use "0" for female and "1" for male, while others use "W" for female and "M" for male. Faced with the hundreds of billions of data items produced by many thousands of information systems, such non-standard, non-normalized codes bring enormous disaster to big data mining.
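The contrast above can be sketched in a few lines. The following is an illustrative example, not from the patent itself: the system names, records, and codebooks are invented, and the point is only that coded values require a per-system interpretation layer while standardized natural-language values do not.

```python
# Two hypothetical hospital systems that each invented their own gender codes:
system_a = [{"patient": "Li Ming", "gender": "1"}]    # "1" means male here
system_b = [{"patient": "Wang Fang", "gender": "W"}]  # "W" means female here

# Merging them requires a per-system decoding table -- the "coupling":
codebooks = {"A": {"0": "female", "1": "male"},
             "B": {"W": "female", "M": "male"}}

def decode(system_id, record):
    """Interpretation layer that exists only because of the coupling."""
    return {**record, "gender": codebooks[system_id][record["gender"]]}

# Zero-coupling records carry standardized natural-language values directly,
# so they could be merged with no interpretation layer at all:
zero_coupling = [{"patient": "Li Ming", "gender": "male"},
                 {"patient": "Wang Fang", "gender": "female"}]

merged = [decode("A", r) for r in system_a] + [decode("B", r) for r in system_b]
assert merged == zero_coupling
```

With thousands of systems, one such codebook per system must be maintained forever; with standardized values, the decoding layer disappears entirely.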
An important reason why information systems built on relational databases produce serious information islands is that the data in relational databases are incomplete, non-independent, and hard to understand. A relational database expresses the relationships among things with various "relations". The data in a relational database are inseparable from the database system, the table structures, and the corresponding application programs; once separated, they become meaningless. It is precisely these "relations" that make relational databases inevitably produce "information islands".

The data in the "universal data structure table" are independent of the database system, the table structures, and the application programs, and can exist entirely on their own apart from them. The data in "Table 1" have been optimized according to the structured big data communication protocol; such data express their original meaning even when removed from the table structure.
Principle of big data: avoid codes as far as possible; use standard natural language as far as possible.

Method for judging whether data are qualified big data: only data with zero coupling to the information system can qualify as big data.

Corollary: since the data in today's relational databases are all tightly coupled to their information systems, none of them are qualified big data.
8. Uniformity of data structure

Uniformity of data structure: the data structure of qualified structured big data must be unified. At present only the "universal data structure table" enables data to achieve "uniformity of data structure".

Problem addressed: the data structures in every relational database are different.

Data optimization method 8: the structured big data communication protocol uses the "universal data structure table" (see Table 4 below) to achieve "uniformity of data structure". The protocol does not allow designers to design arbitrary data structures; all structured data must be stored in tables of one, or a few, identical, standard, unified structures. Relational database theory cannot achieve this standardization of data structure.

Table 4: the universal data structure table achieves uniformity of data structure

The greatest problem of relational databases is precisely that their data structures are non-standard. Relational database theory places no restrictions on data structure; it is defined freely by the designer. Standardized data structure is the foundation of big data processing, and non-standard data structures make data processing extremely difficult.
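One way the "universal data structure table" might be realized is sketched below. The column names are assumptions inferred from the surrounding text ("thing", "attribute", "attribute value", "data source", "timestamp"); the patent does not fix an exact schema here. The point is that every fact from every system becomes one row of a single, fixed structure.

```python
# Assumed columns of the universal data structure table (illustrative only):
UNIVERSAL_COLUMNS = ("information_system", "database", "table",
                     "thing", "attribute", "attribute_value",
                     "data_source", "timestamp")

def make_row(system, database, table, thing, attribute, value, source, ts):
    """Build one self-describing row of the universal data structure table."""
    return dict(zip(UNIVERSAL_COLUMNS,
                    (system, database, table, thing, attribute, value, source, ts)))

# Data from two entirely different domains share one structure:
rows = [
    make_row("hospital HIS", "outpatient db", "visit record",
             "patient 110101199001011234", "diagnosis", "influenza",
             "First People's Hospital", "2016-06-15 09:30:00"),
    make_row("factory ERP", "inventory db", "stock record",
             "product 6901234567892", "quantity", "500",
             "Plant No. 2", "2016-06-15 10:00:00"),
]

# Because every row has the same columns, one generic tool can process
# data from any industry:
assert all(tuple(r.keys()) == UNIVERSAL_COLUMNS for r in rows)
```

Readers familiar with database design will recognize this as an entity–attribute–value layout extended with provenance columns; the trade-off, as the text says, is heavy redundancy in exchange for self-describing rows.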
9. Additivity of data

Additivity of data: making it possible for data, like books, to be accumulated together without any processing.

Problem addressed: today's relational database systems have produced a great deal of data, but none of it can be accumulated into big data.

Data optimization method 9: additivity of data is achieved through the uniqueness, belongingness, identifiability, independence, integrity, and standardization of data, the coupling between data and the system, and the uniformity of data structure. In other words, only data that possess all of these attributes at once are additive.

Traditional information written on paper is additive: a library is simply the sum of many books, and an archive is the sum of many files. If data were additive, then concentrating mirror copies of the data of Guangzhou's government departments on a cloud platform would amount to establishing Guangzhou's big data, and uploading mirror copies of the data of all 978,000 medical institutions in the country to a national medical big data center would amount to building the nation's medical big data. Unfortunately, the data in today's information systems are not additive.
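Additivity can be illustrated with a small sketch. The record contents below are invented; the only property being demonstrated is the one the text claims: when every record is self-describing, accumulating datasets from different units is pure concatenation, with no conversion step.

```python
# Self-describing records from two different units (illustrative data):
city_a = [{"information_system": "Hospital A HIS", "thing": "patient X",
           "attribute": "diagnosis", "attribute_value": "influenza"}]
city_b = [{"information_system": "Hospital B HIS", "thing": "patient Y",
           "attribute": "diagnosis", "attribute_value": "hypertension"}]

# "Accumulation without any processing" is just concatenation:
national = city_a + city_b

assert len(national) == 2
# No record lost its meaning in the merge -- each still names its own system:
assert {r["information_system"] for r in national} == {"Hospital A HIS",
                                                       "Hospital B HIS"}
```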
10. Portability of data

Portability of data: data are portable if, no matter what environment the data are migrated to, they keep their original meaning unchanged and can be recognized both by various information systems and by users.

Problem addressed: information systems built on relational databases are hard to interconnect; that is, the data in one system cannot be transplanted into another.

Data optimization method 10: portability of data is achieved through the uniqueness, belongingness, identifiability, independence, integrity, and standardization of data, the coupling between data and the system, and the uniformity of data structure. In other words, only data that possess all of these attributes at once are portable.

Portability of data is what makes the interconnection of information systems possible: only portable data can move freely between systems. Portability and additivity go together; portable data are also additive. The difference is that portability expresses whether data can be exchanged between systems, while additivity refers to summing many small data sets into big data.
11. Timeliness of data

Timeliness of data: every item in big data should carry a corresponding time.

Data optimization method 11: add a timestamp to every data item.
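A minimal sketch of method 11 is shown below. The field name "timestamp" and the ISO-8601 format are our assumptions; the text only requires that each item carry a time.

```python
from datetime import datetime, timezone

def stamp(record):
    """Return a copy of the record with a UTC timestamp added at creation time."""
    return {**record, "timestamp": datetime.now(timezone.utc).isoformat()}

r = stamp({"thing": "patient X", "attribute": "body temperature",
           "attribute_value": "36.8"})
assert "timestamp" in r and r["attribute_value"] == "36.8"
```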
12. Authenticity of data

Authenticity of data: small data are like the records one keeps of one's own accounts, while big data are like the records produced by financial dealings between different units; the bigger the data, the more important their authenticity.

Data optimization method 12: treat anti-counterfeiting and tamper-proofing of data as essential tasks, and guarantee the authenticity of data through methods such as third-party certification, third-party notarization, and third-party data filing.
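The patent names third-party certification, notarization, and filing as its methods; one common technical sketch of how "tamper-proofing" via third-party filing could work (our assumption, not specified in the text) is to file a cryptographic digest of each record with the third party, so any later alteration is detectable.

```python
import hashlib
import json

def digest(record):
    """Canonical SHA-256 digest of a record, suitable for third-party filing."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

record = {"thing": "patient X", "attribute": "diagnosis",
          "attribute_value": "influenza"}
filed = digest(record)                 # filed with the notary at creation time

record["attribute_value"] = "healthy"  # a later tampering attempt
assert digest(record) != filed         # the alteration is detected
```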
Illustration 2: the uniqueness of data is the basis for "global data interchange"

Within a small environment such as a class or a work group, everyone can be distinguished by name; but across the whole country, because the population is so large and duplicate names so common, names alone cannot identify people accurately. Before the big data age, the data in a relational database were used only within one organization, so each item was easy to identify; but put the same data into a big data environment and they become unrecognizable. In a big data environment, all data about a person must contain the "identity card number"; this is what expresses the uniqueness of the data.
A relational database expresses the uniqueness of the data in each table with an "ID". What relational databases consider is uniqueness within one table, not uniqueness in a big data environment. For example, many medical information systems identify patients only by "outpatient number" or "admission number" and do not record the patient's identity card number. To query a patient's medical history in a national medical big data environment, the absence of the identity card number makes the query extremely difficult, because the patient's history may be scattered across the millions of tables produced by the country's 978,000 medical institutions.
In big data environment, " uniqueness of data " of the data of each things are exactly a very important problem. " uniqueness of data " are to ensure that data have a key of " identities of data " in big data environment.Such as, giving birth to Producing in the information system of producer, distributor, the code name of same part commodity all must be, globally unique, unified, standard, this Sample just can ensure that data are discernible in big data environment.But, the most also do not accomplish this point, Ge Jia enterprise Information system have oneself coded system, different, for same commodity, the coding of different enterprises is different , this is the Global Link of data and big data analysis causes the biggest difficulty.
Qualified big data should be: buys a box medicine in pharmacy, can look into according to the unique coding above this box medicine Ask the whole production of this box medicine, the various correlation circumstance of intermediate links, be which manufacturer production, when produce, when dispatch from the factory, Which agent centre have passed through.
What the world economy most needs is "global data interchange": that the data in the information systems of all enterprises worldwide can "interconnect"; in other words, "the information systems of any two enterprises in the world can send and receive the data of any goods in a timely way." The current reality is that every enterprise has its own product coding rules. When an enterprise receives an order, the order data must be converted manually into data its own information system can recognize before the system can process the customer's order, and only a very few enterprises' systems can directly process the data sent by upstream enterprises. The root cause of this "global data blockage" is that current data lack "uniqueness of data": there is no internationally unified, standard goods coding standard to support it.

"Uniqueness of data" is the basis for tracking a kind of goods anywhere in the world. The data for one kind of goods may appear in the information systems of millions of enterprises worldwide; only a big data identification code embodying the "uniqueness of data" can identify that goods' data accurately among millions of information systems. Globally unified encoding and decoding of big data (which may be called the big data identification code) is both a very important and an extremely complex task in big data. In international trade, globally unified encoding and decoding of orders and goods are equally important; they are the basis of "global data interchange" for goods.

For enterprises, in the big data age the international, national, and industry standards for order and goods data are the basis on which the world's enterprises achieve "global data interchange". Without standards for orders and goods, an enterprise cannot enter the big data age.
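The pharmacy traceability scenario can be sketched as follows. The code value, systems, and events are invented for illustration; the only property demonstrated is the one the text requires, namely that every record of the goods carries the same globally unique code, so one lookup finds every link in the chain.

```python
# Records produced by unrelated systems along a supply chain (invented data):
records = [
    {"code": "6901234567892", "system": "manufacturer ERP",
     "event": "produced", "time": "2016-05-01"},
    {"code": "6901234567892", "system": "agent WMS",
     "event": "shipped", "time": "2016-05-20"},
    {"code": "6901234567892", "system": "pharmacy POS",
     "event": "sold", "time": "2016-06-10"},
    {"code": "6909999999999", "system": "manufacturer ERP",
     "event": "produced", "time": "2016-05-02"},
]

def trace(code, rows):
    """Every link in the chain, found by nothing but the unique code."""
    return sorted((r for r in rows if r["code"] == code),
                  key=lambda r: r["time"])

history = trace("6901234567892", records)
assert [r["event"] for r in history] == ["produced", "shipped", "sold"]
```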
Illustration 3: the belongingness of data is a key distinction between big data and small data

From the standpoint of relational database theory, adding a "data source" field makes the system produce a large amount of redundant data. But in the big data age, the data to be handled come from millions of information systems, so it is absolutely necessary to state where each item of data comes from; otherwise the countless data items cannot be told apart. In a big data environment, "data source" is among the most crucial data and is indispensable. The purpose of adding a "data source" item to every record is to let data express their complete meaning independently and in full, wherever they are. Just as the things of human society each have an owner, data should have an owner too.

The key indicator distinguishing big data from small data is whether the data contain a "data source". Any data without a "data source" are small data and unqualified structured big data. This is hard for relational database experts to understand, but understanding it is a mark that a database practitioner's thinking has moved into the big data age. Big data face hundreds of thousands of units, millions of information systems, tens of millions of tables, and trillions of data items; in such an environment, data without a "data source" would cause chaos. In the big data age, a "data source" can greatly reduce the number of lines of code; whenever data are exchanged or shared, a "data source" is required.
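A small sketch of why the "data source" field matters in practice: two systems may report the same attribute for the same thing, and only the source keeps the records apart. Field names and values are assumptions for illustration.

```python
# Two measurements of the same attribute from different units (invented data):
readings = [
    {"thing": "patient X", "attribute": "blood pressure",
     "attribute_value": "120/80", "data_source": "Hospital A, ward 3"},
    {"thing": "patient X", "attribute": "blood pressure",
     "attribute_value": "135/90", "data_source": "Hospital B, clinic 1"},
]

# Without "data_source" the two records would collapse into an unresolvable
# conflict; with it, each measurement keeps its complete meaning:
by_source = {r["data_source"]: r["attribute_value"] for r in readings}
assert len(by_source) == 2
assert by_source["Hospital B, clinic 1"] == "135/90"
```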
Illustration 4: the standardization and normalization of data is the key to universal data retrieval

The structured big data communication protocol was founded on imitating the memory, association, and thinking of the human brain, beginning in 1982 with the hope of making the computer's functions imitate the brain's associative function (that is, query). The technique the human brain uses when processing data is "ultra-high-fidelity data processing". The claim that standardization and normalization of data are the key to universal data retrieval must be understood from the standpoint of the brain's ultra-high-fidelity data processing. What people currently understand as "data" is understood from the standpoint of computer technology; in fact it is most appropriate to understand "data" from the standpoint of the brain's memory, association, and thinking.
The human brain is nature's classic "computer". Only what is stored in the human brain is truly qualified "data". The "data" in the brain are "ultra-high-fidelity": they are analog, almost distortion-free, real data that faithfully reflect the things of nature; they are miniatures, within the brain, of the things of nature. The relationships between data in the brain are established naturally, according to the natural attributes of things, and can truly reflect the subtle relationships among the things of nature. This is the foundation of the brain's extraordinary capability.

The data in a computer are dead; the information in the human brain is alive. The brain can activate the "things" in it across time and space, replaying past scenes at will. A computer can also play a film, but it cannot establish associative relationships for each thing in the film. The brain can associate one scene with another; a computer cannot. While recalling the Forbidden City and the Great Wall in Beijing, the brain can in the next blink recall the Huangpu River in Shanghai, and in another blink travel to Huangguoshu in Guizhou. The brain can span a thousand years in a moment and ninety thousand li in a blink. The data in a computer have no relationships among themselves, but whatever information about a thing enters the human brain, the brain automatically forms associative relationships with the related information already there, and these relationships are established according to the natural attributes of things.
The brain's ultra-high-fidelity data processing comprises four main techniques: 1. ultra-high-fidelity data acquisition; 2. ultra-high-fidelity data storage and reproduction; 3. ultra-high-fidelity formation of relationships between data (forming associative relationships); 4. ultra-high-fidelity use of the relationships between data (that is, processing data by association).

Current technology can imitate the brain's "ultra-high-fidelity data acquisition" and its "ultra-high-fidelity data storage and reproduction" fairly well. But current technology cannot fully realize (one might even say cannot imitate) the brain's "ultra-high-fidelity formation of relationships between data" or its "ultra-high-fidelity data processing"; these two techniques are the foundation of the brain's extraordinary capability.
Ultra-high-fidelity data acquisition: the brain gathers data through the sense organs of sight, hearing, touch, smell, taste, pain, and so on.

Ultra-high-fidelity data storage and reproduction: the brain can not only store data with ultra-high fidelity, as if the things of nature had been "moved" into it, but can also reproduce (recall) familiar things at will, across time and space. The data in the brain are miniatures of the concrete things of nature.

Ultra-high-fidelity establishment of relationships between data: the brain can not only gather and store data; more importantly, it automatically lets data form relationships of similarity association, proximity association, and simultaneity association. The associative relationships among the data in the brain are established naturally, according to the natural attributes of things. The brain stores with ultra-high fidelity not only the data themselves but also the natural relationships between data. This is what current technology finds hard to imitate.

Ultra-high-fidelity use of the relationships between data (data processing): what a computer handles is digital signals, while what the human brain handles is entirely analog. The brain processes ultra-high-fidelity analog data (that is, human thinking) by similarity association, simultaneity association, proximity association, and the like. Current technology cannot fully imitate this; it can only imitate it partially.
The following examples compare and explain the brain's ultra-high-fidelity data processing. They mainly illustrate that the brain associates and reasons according to the attributes of things, and that the relationships between data are established according to the natural attributes of things.
1, " people can judge by listening that you are to strike iron block, or is striking wood." this is because, people's In the memory of brain, strike the sound that iron block sends and the most naturally link together with iron block, strike the sound of wood the most very Naturally naturally linking together with wood, these information are all that people are received in daily life.Therefore, people Corresponding things can be associated by sound.Computer can also store phonotape and videotape file, but computer can not realize sound Associate naturally between sound and image, can not identify sound and image neatly.
2, " I throws limed egg several times in hands lightly, it is possible to judge that this limed egg is the best.” During this is because good limed egg is gently thrown in hands, palm will feel a kind of quiver, and raw egg, ripe egg are the most not Can produce vibration, bad limed egg also will not produce vibration.In the memory of my brain, vibration is built naturally with limed egg Stand contact.
3, " when buying egg, egg is shaken in hand held lightly it may determine that go out the quality of egg." bad egg, Putting the egg of time length in other words, shake lightly with hands, egg yolk, Ovum Gallus domesticus album inside egg will move, and in good egg Yolk-egg white would not move.In my brain memory, these are about the information of egg, and the quality with egg is set up the most naturally Play contact.
4, " see and set outside window dynamic, be known that and blow." people brain in stored the information that wind tree is dynamic.
5, " see that tree outside window, dynamic, is known that that is to have people shaking tree." to shake tree because behaving be different with wind tree 's.Wind tree, a lot of trees are the most dynamic.People shakes tree, and only one tree is dynamic, and other tree is motionless.And people shakes the tree that causes of tree moves, with The tree that wind tree causes is dynamic is differentiated.
Compared with the human brain, the data in a relational database are data of almost total distortion. A relational database establishes relationships for data artificially; relational database theory regards this as the most outstanding advantage of relational databases, but it is in fact their most fatal defect! Establishing relationships artificially destroys the natural associations that exist among the things of nature themselves. A relational database cannot, like the brain, establish relationships according to the natural attributes of things. Another claimed advantage of relational databases is minimal data redundancy, but this too is a critical defect: in reducing redundancy, the relational database severely distorts the data, and severely distorted data cannot form natural relationships according to the natural attributes of things.

A relational database stores data in different tables, thereby cutting off the natural relationships between things. It stores the data of one class of things in one table and the data of different classes in different tables. The brain classifies things by their natural attributes: things with the same attributes belong to the same class. A plastic basin, a plastic cup, a plastic bag, and a plastic bucket differ in form, yet the brain groups them as one class by the natural attribute of plastic; a plastic cup, a glass cup, and a steel bowl it groups as one class by the natural attribute of "cup". The data in the brain are all, as it were, in one table, so the brain can classify data with extreme flexibility according to the natural attributes of things.

"Data" are not merely codes or symbols; real "data" should be miniatures of the concrete things of nature. The human brain naturally links the sound of striking iron with iron; a relational database cannot let "data" form such natural associations.
The structured big data communication protocol imitates the brain's ultra-high-fidelity data processing. It seeks to root out the "artificial relations" in relational databases and let data establish "natural relations" automatically and naturally, according to the natural attributes of things. The relations in a relational database are artificially established and destroy the natural relations among things. To bring the computer close to the brain's extraordinary capability and thinking, we must, like the brain, minimize the distortion of data so that data can establish natural relations according to the natural attributes of things, and we must resolutely root out artificially established relations, because artificial relations are certain to destroy the natural relations between data.

The concept of "data" in computing is too narrow. "Data" should not be mere "digits" or "codes"; they should be true reflections of the things of nature and, more importantly, should reflect the natural relationships between "data" and "data". The "mobile phone" in a computer is merely digits, while the "mobile phone" in the human brain is a real reflection of a real mobile phone, built from the massive signals about it received through sight, hearing, and touch. Qualified "data" should have minimal distortion, reflect concrete things relatively fully, and truly reflect the natural relations among things. The data in a relational database cannot truly reflect the natural relations between data. Relations between data must never be established artificially; they should be established naturally from the natural attributes of things themselves. The structured big data communication protocol uses a certain amount of "data redundancy" to minimize the distortion of data, so that "natural relations" are established naturally between "data" and "data" according to the natural attributes of things.
" information system name, database name, table name, field name " with standardized, unified, the natural language of specification, As far as possible without code, in order to realize " association ".The title of information system, the title of data base, table name, field name are all to weigh very much The transaction attribute wanted, all has important implication.The designer of relational database system gets used to code, english abbreviation, the Chinese Language Pinyin abbreviation is as database name, table name, field name.This results in the data that domestic consumer fails to understand in relational database. Relational database ignores this information because it handled be small data.In big data environment, these information are with regard to right and wrong The most important, it is impossible to default.
In the big data communication protocol of structuring, in order to make data have independence, integrity, recognizability, each Data both increase " title of information system, the title of data base, table name ", " title of information system, the name of data base Claim, table name " " classification " of actually things, or perhaps the attribute of things.This way is that relation data master-hand is difficult to That understand, mysterious, because this way adds substantial amounts of data redundancy.The big data communication protocol of structuring is at " number According to redundancy " and " independence of data, the integrity of data, the identity of data, data and the degree of coupling of system " between select The latter.Its objective is the real meaning allowing the ordinary people of the technology of being ignorant of also can understand data.
The data redundancy of relational database is considerably less, but its cost is, is ignorant of the ordinary people of technology and fails to understand relation data Data in storehouse, the data in relational database can only be stored in corresponding data base, once departing from corresponding data base Insignificant data are reformed into.Data in relational database need the translation by substantial amounts of application program could allow commonly User understands.
If the data in the database are all standardized and normalized, then they can automatically and naturally form "associative" relations (established through indexes) according to the "thing attribute" and "thing attribute value" in the "universal data structure table". Since all data produced by information systems built with the structured big data communication protocol are stored in tables of one, or a few, identical structures, a general-purpose "universal data retrieval" tool can easily be written. For example, if all of the country's medical information systems were built with the structured big data communication protocol, a patient's medical history could easily be "associated" (queried) from the national medical big data center by the patient's identity card number: because every item in the patient's medical record contains the identity card number (the big data identification code), all data related to the patient can be "associated" through it. Today's medical data do not necessarily contain the patient's identity card number, so querying a patient's history across the information systems of all the country's hospitals is extremely difficult.

The fundamental purpose of the structured big data communication protocol's use of large amounts of "data redundancy" to make data satisfy the twelve technical characteristics is to turn data into "high-fidelity data"; the "data redundancy" compensates for the distortion of the data, and only "high-fidelity data" can let an information system perform "ultra-high-fidelity data processing" as the human brain does.
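The medical query scenario can be sketched as "universal data retrieval" over the universal data structure table. Record contents and field names are invented for illustration; the demonstrated property is the one the text claims: because every record carries the big data identification code (here, the identity card number), one generic query serves every hospital's data.

```python
# A miniature national medical big data center (invented records):
national_center = [
    {"information_system": "Hospital A HIS", "thing": "110101199001011234",
     "attribute": "diagnosis", "attribute_value": "influenza",
     "timestamp": "2014-03-02"},
    {"information_system": "Hospital B HIS", "thing": "110101199001011234",
     "attribute": "diagnosis", "attribute_value": "fracture",
     "timestamp": "2015-11-20"},
    {"information_system": "Hospital C HIS", "thing": "220202198511112222",
     "attribute": "diagnosis", "attribute_value": "hypertension",
     "timestamp": "2016-01-05"},
]

def universal_retrieve(identification_code, rows):
    """One query works across every hospital: match the unique code."""
    return sorted((r for r in rows if r["thing"] == identification_code),
                  key=lambda r: r["timestamp"])

history = universal_retrieve("110101199001011234", national_center)
assert [r["attribute_value"] for r in history] == ["influenza", "fracture"]
```

In a real deployment the linear scan would be replaced by an index on the identification code, which is what the text means by associations "established through indexes".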
Illustration 5: efficient mining and universal data retrieval without ETL conversion

Mining the nation's current medical data is extremely difficult, because the data in today's information systems are non-standard and non-normalized. For example, the medical industry has millions of tables and hundreds of billions of records, and every table's structure is different; mining and querying data across so many differently structured tables requires writing an enormous amount of code. If all the information systems of the country's medical institutions were designed according to the structured big data communication protocol, mining and querying the data they produce would be easy, because those systems would all use the "universal data structure table" and their data would be entirely standard, normalized, and unified.
Five: a comparison table of the data mining and query effects of the two methods
" the most critical technology of big data is inquiring technology ": the feature of big data is big, just because of greatly, needed for wanting to obtain Data are especially difficult, therefore, inquiring required data from big data is exactly most critical, followed by inquiring The analysis of data, statistics.Therefore, it can be said that " big data are inquired about exactly ", the previous work of big data is to prepare, greatly for inquiry The later stage work of data is to add up, analyze inquiring data, and the various work of big data are all centered by inquiry Launch.
Illustration 6: the 12 technical characteristics of structured big data provide a technical guarantee of the authenticity of big data
Big data is a resource as important as oil. Authenticity is the foundation of big data; big data that have lost their authenticity are data garbage. Ensuring the authenticity of big data is therefore a vital task of the big data era.
In the small data era, the data handled by information systems were mainly internal to each organization, and their authenticity was mainly controlled by the organization itself. In the big data era, data circulate not only inside an organization but, far more, between organizations at home and abroad, so the authenticity, notarized status and authority of big data need to be guaranteed, and big data must carry legal force like official documents. The structured big data communication protocol guarantees the authenticity of big data from the technical angle. "The uniqueness of data" is the key to controlling "the authenticity of data": uniqueness is embodied in the big data identification code, so authenticity can be controlled by controlling that code. The big data identification code is the "identity card" of a thing's data; whatever environment the data of a thing are in, their big data identification code is unique. Big data are not only data, codes and symbols; they are also a resource, a commodity, goods and property, and must be managed as resources, commodities, goods and property are managed. Just as the flow of goods and people requires large numbers of traffic police, the flow of data must also be controlled. A country manages and controls commodities through agencies such as the administration for industry and commerce and customs; the authenticity of data likewise needs to be managed and controlled by similar methods, and it would be appropriate for a national big data center run by each country's administration for industry and commerce (or courts, ministry of public security, ministry of industry and information technology, etc.) to be responsible for controlling the authenticity of big data.
The national big data center of each country is responsible for coding and issuing the big data identification codes of commodities, orders and the like, and for keeping those codes on record. The national big data center is responsible for examining the various qualifications of each organization; only organizations that pass its review are eligible to obtain big data identification codes for commodities, orders, etc. The national-level big data center is responsible only for issuing big data identification codes, not for auditing the authenticity of the data of commodities or orders. When the authenticity of data comes into question and a dispute arises, the "data police" of the national-level big data center audit the authenticity of the data, impose penalties according to the audit results, and place the results on record. Just as in traffic, drivers are responsible for their own conduct, and the traffic police appear only when an accident occurs.
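The issue-and-file workflow above can be sketched as a tiny registry. This is a hedged illustration only: the class name, code format and methods are my assumptions, standing in for the proposed national big data center that issues unique codes, keeps them on record, and checks them when a dispute arises.

```python
import itertools

# Hypothetical sketch of a national big data center as a code registry:
# it hands out unique identification codes and keeps them on record, so a
# later "data police" check can be made against the filed record.
class DataCenter:
    def __init__(self, country_prefix):
        self.prefix = country_prefix
        self._counter = itertools.count(1)   # guarantees uniqueness of codes
        self.records = {}                    # code -> filed metadata

    def issue_code(self, kind, owner):
        """Issue a unique big data identification code and file it."""
        code = f"{self.prefix}-{kind}-{next(self._counter):08d}"
        self.records[code] = {"kind": kind, "owner": owner}
        return code

    def verify(self, code):
        """The 'data police' step: check a code against the filed record."""
        return code in self.records

center = DataCenter("CN")
order_code = center.issue_code("ORDER", "Acme Ltd")
print(order_code, center.verify(order_code), center.verify("CN-ORDER-99999999"))
```

The registry only vouches for the code itself, mirroring the text's division of labor: issuance is centralized, while authenticity of the underlying commodity data is audited only on dispute.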
Orders, official documents and the like that have obtained a big data identification code are filed with the national big data center or a third-party notary organization; an order or document filed with a third-party notary carries legal force, just as if it bore an official seal. Doing so can save enormous amounts of paper documents and also saves the time spent transmitting orders and documents.
After obtaining the big data identification code of a commodity, an enterprise uploads the commodity's various data to the national big data center for the record. The enterprise's customers can then obtain the various data of the commodity from the national big data center by its big data code.
Because the coding is unified worldwide, the information systems of different enterprises can transmit and receive orders directly and understand their content. The data in an order are stored in the "universal data structure table" and given the 12 technical characteristics of structured big data. Every "transaction attribute" in an order (analogous to a field name) must be globally unified. Since each "transaction attribute" can be expressed differently in different languages, a global standard must also be drawn up so that every "transaction attribute" corresponds one-to-one across languages by international standard. On this basis, general data understanding and translation software tools can be designed, and the translation of orders between languages completed automatically by software.
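The one-to-one correspondence of "transaction attributes" across languages can be sketched as a lookup table. The mapping entries below are invented examples, not an actual standard; the sketch only shows how a standardized attribute key lets a tool re-label an order for any language automatically.

```python
# Hypothetical fragment of a global "transaction attribute" standard:
# each standardized key maps one-to-one to labels in each language.
ATTRIBUTE_STANDARD = {
    "qty":   {"en": "quantity",   "zh": "数量"},
    "price": {"en": "unit price", "zh": "单价"},
}

def translate_order(order, lang):
    """Re-label an order's standardized attributes for one target language."""
    return {ATTRIBUTE_STANDARD[k][lang]: v for k, v in order.items()}

order = {"qty": 10, "price": 3.5}
print(translate_order(order, "en"))
```

Because translation is a pure key lookup, no natural-language understanding is needed, which is what makes the proposed automatic order translation plausible.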
The current problem: the information systems of the world's enterprises cannot interoperate. The reason is that the data coding each system uses is disunified and nonstandard; order data cannot be transmitted and received directly between enterprise information systems, and order data must be re-entered into one's own system by hand.
The benefit of the big data identification code: global data interchange. It guarantees the timely, accurate, comprehensive and smooth circulation of commodity flows and data flows. By means of the big data identification code, an enterprise can use the hundreds of thousands or millions of information systems of global data to track the sales and inventories of its commodities all over the world. Interconnected global business information systems greatly benefit the enterprises up and down a supply chain and safeguard the production and circulation of commodities.
Certification by the national big data center of the qualification of organizations and individuals to use big data identification codes: any organization or individual may obtain the qualification to use big data identification codes, but must first pass the center's review; those that pass are issued a legally valid "big data electronic seal". After review and certification by the national big data center, they obtain the qualification to use the various related functions of big data identification codes and may publish related information. The notarized status and authority of the national big data center guarantee the "data authenticity" of big data. Once big data possess "data authenticity", they can be widely applied in every field.
The big data identification code has wide application in product anti-counterfeiting and drug administration. An enterprise can apply for one big data identification code and one verification code per item. After buying the item, a user can obtain the verification code via mobile phone from the item's big data identification code; if it matches the code on the item, the item is genuine, otherwise it is counterfeit. Alternatively, scanning the item's QR code with a phone reveals whether it is counterfeit.
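The match-the-verification-code check can be sketched as follows. The patent only says a verification code is issued and compared; deriving it with an HMAC over the identification code, and the secret held by the issuing center, are my assumptions for the sketch.

```python
import hmac
import hashlib

# Hedged sketch of the anti-counterfeiting check: the (hypothetical) center
# derives a short verification code from each item's identification code;
# a buyer's lookup must return the same code that is printed on the goods.
SECRET = b"center-secret"  # assumed to be held only by the issuing center

def verification_code(item_code):
    """Derive the short verification code for one identification code."""
    return hmac.new(SECRET, item_code.encode(), hashlib.sha256).hexdigest()[:8]

def is_genuine(item_code, printed_code):
    """Compare the looked-up code with the code printed on the item."""
    return hmac.compare_digest(verification_code(item_code), printed_code)

code = "CN-GOODS-00000042"
printed = verification_code(code)      # what the maker prints on the item
print(is_genuine(code, printed), is_genuine(code, "00000000"))
```

A counterfeiter who copies only the visible identification code cannot produce a matching verification code without the center's secret, which is the substance of the check.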
Certificates of every kind can be managed conveniently with big data identification codes, and verification becomes very convenient: the certificate's information can be found at the national big data center from its big data identification code alone. Examples of such certificate management include the various qualifications of enterprises, the various certificates of individuals, the various certifications of enterprises, notarizations, property ownership certificates, commodity inspection and quality certificates, marriage certificates, diplomas and driving licences (no need to show the licence itself; just state the number or show a QR code). Physical certificates might not even need to be issued at all: a big data certificate would suffice.
"Contracts, files, agreements, receipts, statements, undertakings of every kind, bills, orders, bidding documents and tender documents" can all be managed conveniently with big data identification codes, and the big data center can also serve as a huge archive management system. An international big data center, composed of every country, would be the highest administrative body of global big data, responsible for formulating the standards and norms of global big data and setting its rules.
Illustration 7: data produced by the various information systems built on the structured big data communication protocol are additive
The initial idea behind founding the structured big data communication protocol: big data is simply data of very large volume, and every industry already holds a great deal of small data. Can these small data, added together, be called big data? They can be called big data, but not qualified big data, because mining these data is extremely difficult! So how can small data be turned into qualified big data by accumulation, and why can today's data not be summed into qualified big data? Because the data produced by relational databases are not real data; they can only be called code. To understand what big data really is, one must first be clear about what "data" is and what "code" is.
The definition of data: "information that personnel of the corresponding specialty can understand is real data." For example, medical data should be data that medical professionals can understand directly, without further annotation or explanation; chemical data should be data that chemistry professionals can understand without further annotation or explanation.
The definition of code: "information that personnel of the corresponding specialty cannot understand is code; the professional can grasp the real meaning of the code only after it has been translated, interpreted and annotated by the corresponding application program or software tool."
In the case of a relational database, the data an ordinary user sees have already been interpreted, translated and annotated by the information system; they are not the raw data in the relational database. The raw data in a relational database do not possess "recognizability, independence, integrity": if they were presented to an ordinary user directly, the user could not "recognize" these "data", because the relational database cannot "independently" and "completely" express their intended meaning.
The definition of qualified data: only data that can express their intended meaning "independently (the independence of data)" (without relying on software to decipher them or on explanation from others) and "completely (the integrity of data)", and that can be "recognized (the recognizability of data)" by people and by other information systems, are qualified data. The data in a relational database do not possess these characteristics, because they are "data highly coupled with the system": they are inseparable from the relational database system and its application system, and once detached from them they become unrecognizable, meaningless data.
From the angle of the 12 technical characteristics of structured big data, the data in a relational database can be described as follows: because the "data" are inseparable from the relational database system and the application system (they do not possess "zero coupling with the system"), they cannot independently (lacking "independence") and completely (lacking "integrity") be recognized by people (lacking "recognizability"), nor can they be recognized by other information systems.
From the above analysis we can conclude that, because the data in a relational database are "highly coupled with the system" and become unrecognizable and meaningless once detached from the relational database system and the application system, they do not possess additivity. Since today's information systems are almost all developed on relational databases, the data they produce cannot become qualified big data by accumulation.
The reason information systems built on relational databases are hard to interconnect is that the data they generate lack "portability": data cannot be transplanted directly from one system to another. This is caused by the "Variety" problem among the 4V characteristics of big data. If every information system stored its data in the "universal data structure table", the "Variety" problem would be readily solved: only the "universal data structure table" gives data "structural uniformity" and "portability", and decouples data from the information system.
The structured big data communication protocol was founded precisely to address the problems of relational databases; its purpose is to convert the data in relational databases into qualified big data. The solution: first use the "universal data structure table" to "decouple" the data and give them "structural uniformity", then use "independence, integrity, standardization, uniqueness, belongingness" to give the data "recognizability".
Existing techniques can give data "recognizability, independence, integrity, zero coupling with the system, structural uniformity", but existing techniques alone cannot make data truly "additive" and "portable". The structured big data communication protocol uses "uniqueness, belongingness, standardization" to make data truly "additive" and "portable", and thereby effectively solves the "Velocity" problem of the big data 4Vs. The method of giving data "uniqueness, belongingness, standardization" is the core technique of the structured big data communication protocol, founded precisely to convert small data into big data; it seems to have no technical content, yet it is the most crucial.
The importance of data standardization to big data: in the small data era, each information system was essentially used inside one organization. In the big data era, information systems must interconnect, and mining data that come from different information systems becomes a very prominent problem, so giving data standardization is vital. Without "international big data standards, national big data standards and industry big data standards", the big data era cannot arrive. The reason the importance of data standards is stressed so strongly is that the structured big data communication protocol derives from imitating the association mechanism and ultra-high-fidelity processing of the brain: only when data are fully standardized can associative relations be established automatically and naturally between data according to the natural properties of things, and once those associative relations exist, the "Velocity" problem of the big data 4Vs is readily solved! Countless people have tried every means and still cannot fundamentally solve the difficulty of data mining, and one basic reason is that the data in today's information systems are nonstandard and unnormalized. If the data in every information system were normalized and unified, data mining would be easy. Data standardization is a familiar, even commonplace concept, but behind its ordinary surface the effect is enormous: giving data standardization makes mining easy. Only when the standardization of data is carried to the extreme, so that all data are standard, normalized and unified, does its super power show itself. Data standards are easy to talk about and very hard to carry out; they demand huge manpower and material resources, and have become a key factor affecting big data.
" uniqueness of data " and " belongingnesses of data " do not have any technology content on the surface, are only data Add two data item, two attributes.If the way it goes from the perspective of small data, due to the information in small data epoch System is primarily used to process certain intramural data, and " uniqueness of data " are nothing technology at all, and " data Belongingness " only can bring bulk redundancy for system.But at big data age, " uniqueness of data " and " belongingnesses of data " Just there is epoch-making meaning, be the small data key that becomes big data, only add the two data item, small data ability Become big data, every big data of the structuring not being qualified without the two data item, small data only stick this two Individual label mays be eligible to enter big data age.
The importance of the belongingness of data to big data: the scope of small data is one organization, surviving in a single information system, while the scope of big data is the whole world, facing millions of information systems. The purpose of adding belongingness to data is to ensure that the data remain unchanged, without distortion, into whatever corner they are put. Without belongingness, data would be distorted once migrated into other information systems; in other words, once data were found in big data, there would be no way to know where they were found from. The belongingness of data is extremely important to big data; it is the basis of the recognizability, additivity and portability of data.
The importance of the uniqueness of data to big data: the uniqueness of data exists so that data can be caught quickly and accurately in a big data environment, and so that computers can imitate the association function of the brain. A big data environment is vast, possibly a nation, possibly the world, and uniqueness guarantees that a computer can quickly and accurately catch data from the ends of the earth. Without uniqueness, grabbing data worldwide is extremely difficult. For example, commodity A of an enterprise appears in hundreds of thousands of retail shops worldwide; without a big data identification code for commodity A, it is extremely difficult for the enterprise to catch the inventory and sales data of commodity A from 100,000 information systems around the world. With uniqueness, data can hide nowhere and run nowhere. Without uniqueness, data change appearance in different information systems like the White Bone Demon. Adding "data uniqueness" to data is equivalent to fitting the data with a tracker.
The relations between the 12 technical characteristics of data: "additivity, portability" are realized through "1. recognizability; 2. independence; 3. integrity; 4. standardization; 5. zero coupling with the system; 6. structural uniformity; 7. uniqueness; 8. belongingness". Zero coupling between data and system is realized through "1. recognizability; 2. independence; 3. integrity; 4. standardization; 5. structural uniformity". The recognizability of data is realized through "independence, integrity, standardization, uniqueness, belongingness".
Why can the data produced by systems designed on the structured big data communication protocol be summed into qualified big data? Because the data structure of all the data is identical and the data are fully normalized, so mining needs no ETL whatsoever. Additivity is guaranteed by the data's "uniqueness, belongingness, recognizability, independence, integrity, standardization, zero coupling with the system, structural uniformity": data that possess these characteristics possess additivity.
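The "additivity" claim reduces to this: when every record already carries its identification code and belongingness and shares one structure, merging data sets is plain concatenation, with no mapping step per source. The record fields and system names below are assumptions for illustration.

```python
# Sketch of additivity: two (hypothetical) systems emit records of identical
# structure, each carrying its own identification code and belongingness.
system_a = [
    {"id_code": "C1", "source": "ERP-A", "attribute": "stock", "value": "15"},
]
system_b = [
    {"id_code": "C1", "source": "ERP-B", "attribute": "stock", "value": "7"},
]

# Mirror-style upload: the "big data" set is simply the sum of the parts.
big_data = system_a + system_b

# The merged set remains fully queryable because the structure is uniform,
# and belongingness ("source") records where each datum was found from.
total = sum(int(r["value"]) for r in big_data if r["attribute"] == "stock")
print(len(big_data), total)
```

Contrast this with merging two relational schemas, where each new source would normally require its own ETL mapping before any cross-source query is possible.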
Illustration 8: the portability of data makes it convenient for information systems to interconnect
The reason today's information systems are hard to interconnect is that the data in them are too highly coupled with the system: once detached from the relational database system and the application system they become meaningless data. Through its optimization of data, the structured big data communication protocol gives data "1. recognizability; 2. independence; 3. integrity; 4. standardization; 5. zero coupling with the system; 6. structural uniformity; 7. uniqueness; 8. belongingness; 9. timeliness; 10. authenticity", and data possessing these technical attributes also possess "portability". Data with "portability" mean the same thing in any information system and remain unchanged, i.e. the data can be sent directly into any data system and interconnection is achieved.
Illustration 9: the structured big data communication protocol can provide a communication protocol for data interconnection between database systems
The communication protocol for data interconnection between database systems:
1. A universal data structure table must be set up in each database, and the structure of the universal data structure table in every database system must be completely unified.
2. The structured data to be sent must satisfy the 12 technical characteristics: "1. uniqueness; 2. belongingness; 3. recognizability; 4. independence; 5. integrity; 6. standardization; 7. zero coupling with the system; 8. structural uniformity; 9. additivity; 10. portability; 11. timeliness; 12. authenticity."
As long as these two conditions are met, any data can interconnect between any databases: because the sender of the data and the receiver of the data both store data in the universal data structure table, the receiver can write received data directly into the universal data structure table in its own database.
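The two-condition exchange above can be sketched end to end. The wire format (JSON) and field names are my assumptions; the patent only requires that sender and receiver share the identical universal table structure, which is what lets the receiver append rows directly with no translation layer.

```python
import json

# Minimal sketch of the claimed exchange: the sender serializes rows of its
# universal table; the receiver, holding a table of identical structure,
# writes them straight into its own table.
FIELDS = ("id_code", "source", "attribute", "value")  # assumed shared schema

def send(rows):
    """Sender: emit rows of the universal table as a wire message."""
    return json.dumps([dict(zip(FIELDS, r)) for r in rows])

def receive(message, own_table):
    """Receiver: append decoded rows directly; the structures are identical."""
    for rec in json.loads(message):
        own_table.append(tuple(rec[f] for f in FIELDS))

receiver_table = []
msg = send([("C1", "ERP-A", "order_qty", "10")])
receive(msg, receiver_table)
print(receiver_table)
```

Note that `receive` contains no schema mapping at all: condition 1 (identical table structure) is what removes it, and condition 2 (the 12 characteristics) is what makes the appended rows meaningful outside their source system.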

Claims (9)

1. A structured big data communication protocol, characterized in that: the structured big data communication protocol is a communication protocol by which structured data interconnect between various information systems, and is also a method for converting structured data into qualified structured big data; the structured big data communication protocol is composed of 12 technical characteristics, "the uniqueness of data, the belongingness of data, the recognizability of data, the independence of data, the integrity of data, the standardization of data, the coupling of data with the system, the structural uniformity of data, the additivity of data, the portability of data, the timeliness of data, the authenticity of data"; the various information systems built on the structured big data communication protocol are all qualified big data information systems, and qualified structured big data can be formed simply by uploading the data in each big data information system to a big data center in mirror fashion.
2. The structured big data communication protocol according to claim 1, characterized in that: the structured big data communication protocol realizes "the uniqueness of data", i.e. each datum contains a unique, unified, standard big data identification code within the corresponding big data environment; this is the key to realizing data interconnection and is also a key technique for turning small data into big data.
3. The structured big data communication protocol according to claim 1, characterized in that: "the belongingness of data" guarantees the recognizability of data in a big data environment, i.e. each datum contains its "data source".
4. The structured big data communication protocol according to claim 1, characterized in that: "information system names, database names, table names, field names and the data in the database" must use standard, normalized, unified natural language and avoid nonstandard codes as far as possible; this is the key to letting data form "associative relations" automatically, and is also the key to raising query speed and realizing universal queries.
5. The structured big data communication protocol according to claim 1, characterized in that: the various information systems built on the structured big data communication protocol are all big data information systems; qualified structured big data are formed simply by uploading the data in each big data information system to a big data center in mirror fashion, and these data can be mined efficiently and queried universally without any ETL conversion.
6. The structured big data communication protocol according to claim 1, characterized in that: the authenticity of big data is the foundation of big data; the structured big data communication protocol can use the 12 technical characteristics of structured big data to provide a technical guarantee of the authenticity of big data, and can use the big data identification code together with third-party certification, third-party notarization and third-party filing of data to guarantee the authenticity of data and give big data notarized status, authority and non-repudiation.
7. The structured big data communication protocol according to claim 1, characterized in that: the data produced by the various information systems built on the structured big data communication protocol are additive; these data form qualified structured big data without any ETL conversion, simply by being uploaded to a big data center in mirror fashion.
8. The structured big data communication protocol according to claim 1, characterized in that: the data produced by the various information systems built on the structured big data communication protocol are portable, i.e. wherever the data are migrated their meaning remains unchanged, which makes it convenient for information systems to interconnect.
9. The structured big data communication protocol according to claim 1, characterized in that: the structured big data communication protocol can provide a communication protocol for structured data to interconnect between various databases.
CN201610427075.7A 2016-06-08 2016-06-08 The big data communication protocol of structuring Pending CN106126547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610427075.7A CN106126547A (en) 2016-06-08 2016-06-08 The big data communication protocol of structuring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610427075.7A CN106126547A (en) 2016-06-08 2016-06-08 The big data communication protocol of structuring

Publications (1)

Publication Number Publication Date
CN106126547A true CN106126547A (en) 2016-11-16

Family

ID=57469554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610427075.7A Pending CN106126547A (en) 2016-06-08 2016-06-08 The big data communication protocol of structuring

Country Status (1)

Country Link
CN (1) CN106126547A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845064A (en) * 2016-11-25 2017-06-13 张金柱 Big data and the transmission for medical treatment & health big data, extracting method and system
CN109063507A (en) * 2018-07-13 2018-12-21 上海派兰数据科技有限公司 A kind of general design model for hospital information system analysis
CN110692103A (en) * 2017-06-08 2020-01-14 沟口智 System login method
CN112183771A (en) * 2020-08-18 2021-01-05 北京城建信捷轨道交通工程咨询有限公司 Intelligent operation and maintenance ecosystem for rail transit and operation method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030115084A1 (en) * 2001-12-19 2003-06-19 Research Foundation Of State University Of New York System and method for electronic medical record keeping
CN101599088A (en) * 2008-11-18 2009-12-09 北京美智医疗科技有限公司 The mining multi-dimensional data system and method for medical information system
CN103500225A (en) * 2013-10-21 2014-01-08 樊梦真 Method for structural storage of medical information

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030115084A1 (en) * 2001-12-19 2003-06-19 Research Foundation Of State University Of New York System and method for electronic medical record keeping
CN101599088A (en) * 2008-11-18 2009-12-09 北京美智医疗科技有限公司 The mining multi-dimensional data system and method for medical information system
CN103500225A (en) * 2013-10-21 2014-01-08 樊梦真 Method for structural storage of medical information

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845064A (en) * 2016-11-25 2017-06-13 张金柱 Big data and the transmission for medical treatment & health big data, extracting method and system
CN110692103A (en) * 2017-06-08 2020-01-14 沟口智 System login method
CN109063507A (en) * 2018-07-13 2018-12-21 上海派兰数据科技有限公司 A kind of general design model for hospital information system analysis
CN112183771A (en) * 2020-08-18 2021-01-05 北京城建信捷轨道交通工程咨询有限公司 Intelligent operation and maintenance ecosystem for rail transit and operation method thereof

Similar Documents

Publication Publication Date Title
Kitchin Data lives: How data are made and shape our world
Turton International relations and American dominance: A diverse discipline
Callon et al. Peripheral vision: Economic markets as calculative collective devices
Turnhout et al. ‘Measurementality’in biodiversity governance: knowledge, transparency, and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES)
Smiraglia Cultural synergy in information institutions
CN106126547A (en) The big data communication protocol of structuring
Veland et al. All strings attached: Negotiating relationships of geographic information science
Marstine et al. New directions in museum ethics
Gibbons et al. Knowledge, stories, and culture in organizations
Gould Considerations on governing heritage as a commons resource
MacLaughlin Data driven nonprofits
CN107341461A (en) Intelligent Recognition differentiates the method and its system of the art work true and false with analytical technology
Kislov et al. Forthcoming plans for institutional transformation of Russian higher education
McLeod et al. Record DNA: reconceptualising digital records as the future evidence base
Gibson Aboriginal secret-sacred objects, their values and future prospects
Hayajneh The legal protection of the intangible cultural heritage in the Hashemite Kingdom of Jordan
McMillan Relations and Relationships: 40 years of people movements from ASEAN countries to New Zealand
Strazzullo et al. An investigation of the translational asset: a proposed classification
Saad et al. Blockchain technology–understanding its application in humanitarian supply chains
Esposito et al. What’s Observed in a Rating
McIntyre et al. Systemic praxis and education to protect the commons
Akakpo et al. Chiefs in development in Ghana: a study of two contemporary Ghanaian chiefs
Illsley Assembling the Historic Environment: Heritage in the Digital Making
Theodoropoulou The socioethical concerns associated with Indigenous Oceanic cultural heritage materials
Menon Technology immorality and its legal issues

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161116