CN106126547A - The big data communication protocol of structuring - Google Patents
- Publication number
- CN106126547A (publication number); CN201610427075.7A / CN201610427075A (application number)
- Authority
- CN
- China
- Prior art keywords
- data
- big
- structuring
- big data
- communication protocol
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The structured big data communication protocol avoids the creation of information islands by optimizing data and changing the software development model, so that data can be exchanged between information systems and mined easily. The protocol gives structured data twelve technical characteristics: uniqueness, belongingness, identifiability, independence, integrity, standardization, zero coupling with the system, structural uniformity, additivity, portability, timeliness, and veracity. Only data possessing all twelve characteristics qualifies as structured big data, and the twelve characteristics in turn provide a technical guarantee of the data's veracity.
Description
Technical field
The structured big data communication protocol is both a communication protocol and a technique for turning data into qualified structured big data. It is comparable to ETL, but where ETL cleans up problems in data already produced by existing information systems, the structured big data communication protocol prevents those problems from arising at the moment an information system is designed. ETL cures the disease after data falls ill; this protocol keeps data from falling ill in the first place. ETL patches problems created by the prior art, whereas this protocol proposes a new data processing scheme. The protocol is also a software development model: every information system built with it is a big data information system, and the data of all such systems can simply be mirrored up to a big data center and merged into qualified structured big data, that is, structured data that can be mined efficiently without any ETL conversion.
Background art
With the arrival of the big data era, it has become clear that every industry runs many information systems, yet these systems cannot meet the era's demands: information islands are widespread, interconnection is hard, and data sharing is difficult. Every industry already holds large amounts of data, but that data is hard to mine efficiently. Relational databases are currently used to attack these problems, but they can only solve them locally, not at the root. The structured big data communication protocol was created precisely for these problems. It grew out of an effort, begun in 1982, to imitate the memory, association, and reasoning functions of the human brain in a computer.
Summary of the invention
The structured big data communication protocol avoids information islands, interconnection problems, and data sharing problems by optimizing data and changing the software development model, and makes data easy to mine. The protocol gives data twelve technical characteristics: uniqueness, belongingness, identifiability, independence, integrity, standardization, zero coupling with the system, structural uniformity, additivity, portability, timeliness, and veracity. Only data meeting all twelve characteristics at once qualifies as structured big data.
Technical problem to be solved
The invention addresses the "Variety" (many data types) and "Velocity" (high data speed) problems among the four Vs of big data. The concrete problems targeted: every industry runs many information systems, yet these systems cannot meet the demands of the big data era; information islands are widespread, interconnection is hard, and data sharing is difficult. Every industry already holds large amounts of data, but that data is hard to mine efficiently.
Beneficial effect
Interconnection and data sharing become easy, queries are fast, and data mining is straightforward.
Detailed description of the invention
The innovation of the structured big data communication protocol shows in the following five aspects:
1. It is the first to propose the twelve technical characteristics of structured big data; only data meeting all twelve at once can become qualified structured big data. To make data meet them, twelve corresponding data optimization methods were created.
2. The basis of communication is that both sides use the same protocol. The twelve technical characteristics proposed by the protocol are precisely the "communication protocol" by which structured data interconnects.
3. Every piece of structured big data carries two added data items embodying the "uniqueness" and "belongingness" of the data. Existing database technology, built for processing small data, ignores these two items, and existing data generally lacks them; yet they are the key items that mark data as qualified structured big data.
4. Standardization and normalization of data are stressed in particular. In a big data environment, standardized, normalized data can automatically imitate the association function of the brain, greatly improving the speed and flexibility of queries. A relational database places no restriction on data, leaving everything to the database designer; the structured big data communication protocol restricts data very strictly and never allows designers to define data arbitrarily. All data must conform to the standard, which is an important measure for making big data easy to mine.
5. The twelve technical characteristics safeguard the veracity of big data. Small data is used inside one organization; big data is used across many organizations, so the veracity, impartiality, authority, and non-repudiation of big data become extremely important.
When the structured big data communication protocol optimizes data, it stores the data in a "universal data structure table" (see Table One): a single table that can store structured data of every kind.
Table One: example of data stored in the universal data structure table

| ID | Thing code | Attribute | Attribute value | Long value | Unit | Attachment | Date |
|------|------|----------------------|--------------------------|---|------|---|----------|
| 1099 | 1280 | Data source | Guangzhou First Hospital | | | | 2014-5-3 |
| 1100 | 1280 | Thing category | Medical record | | | | 2014-5-3 |
| 1101 | 1280 | Thing category | Inpatient record | | | | 2014-5-3 |
| 1102 | 1280 | Thing category | Medical expenses | | | | 2014-5-3 |
| 1103 | 1280 | ID card number | XXXXXXXXXX | | | | 2014-5-3 |
| 1104 | 1280 | Admission number | XXXXXXXXXX | | | | 2014-5-3 |
| 1105 | 1280 | Name | Zhang San | | | | 2014-5-3 |
| 1106 | 1280 | Sex | Male | | | | 2014-5-3 |
| 1107 | 1280 | Chinese medicine fee | 56 | | yuan | | 2014-5-3 |
| 1108 | 1280 | Western medicine fee | 72 | | yuan | | 2014-5-3 |
| 1109 | 1280 | Other expenses | 180 | | yuan | | 2014-5-3 |
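The universal data structure table above is essentially what database literature calls an entity-attribute-value (EAV) layout. A minimal sketch of such a table using Python's built-in sqlite3, where the column names and sample rows are illustrative paraphrases of Table One, not definitions from the patent:

```python
import sqlite3

# One table holds every kind of structured record; each row is a single
# (thing, attribute, value) fact, mirroring Table One above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE universal (
        id         INTEGER PRIMARY KEY,
        thing_code TEXT NOT NULL,   -- identifies which real-world thing
        attribute  TEXT NOT NULL,   -- attribute name in natural language
        value      TEXT,            -- attribute value in natural language
        unit       TEXT,            -- measurement unit, if any
        recorded   TEXT             -- date the fact was recorded
    )
""")
rows = [
    (1099, "1280", "Data source",          "Guangzhou First Hospital", None,   "2014-05-03"),
    (1105, "1280", "Name",                 "Zhang San",                None,   "2014-05-03"),
    (1107, "1280", "Chinese medicine fee", "56",                       "yuan", "2014-05-03"),
]
conn.executemany("INSERT INTO universal VALUES (?,?,?,?,?,?)", rows)

# Any kind of thing can be queried with the same generic statement.
fee = conn.execute(
    "SELECT value, unit FROM universal WHERE thing_code=? AND attribute=?",
    ("1280", "Chinese medicine fee")).fetchone()
print(fee)  # ('56', 'yuan')
```

Because every fact names its own attribute, adding a new kind of record requires no schema change, which is the property the patent relies on when merging data from many systems.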
Explanation 1: the twelve technical characteristics of qualified structured big data and the twelve data optimization methods
Qualified structured big data has twelve technical characteristics; in other words, only structured data meeting all twelve at once qualifies as structured big data. The structured big data communication protocol is precisely the method for making structured data meet the twelve characteristics, and for that purpose it proposes twelve corresponding data optimization methods.
1. Uniqueness of data
Uniqueness of data: all data about the same thing should remain unique and identifiable throughout its life cycle and across different information systems; it must not become unidentifiable because of changes in time or space.
The problem targeted: today the data about one thing takes different forms in different information systems and is hard to identify accurately during big data mining. For example, the same commodity is coded differently in different distributors' information systems; the same patient receives a different admission number at each hospital, so when a patient's history is queried in a big data environment, the lack of a single unified identification code for the patient's data makes the query difficult.
Data optimization method one: all data about the same thing, at any time, in any place or environment, must carry one (or several) unique, unified big data identification codes. The big data identification code is the data's identity card or license plate. It differs essentially from the ID in a relational database: an ID identifies data within one table, whereas a big data identification code identifies data within the whole scope of the big data.
Scope of big data: different big data has a different scope. In international trade the scope is global; for national health care big data the scope is the medical industry; for Guangzhou's big data the scope is Guangzhou.
Big data identification codes come in two kinds. One identifies a concrete thing, like a device serial number, but with an essential difference: a serial number is assigned by the manufacturer itself, while a big data identification code must follow a unified international standard. The other kind identifies a class of things. For example, to learn how a given phone model is selling at each distributor, one needs the class-level identification code of that model, because the phone is sold by hundreds of thousands of distributors worldwide and the manufacturer's information system must interconnect with all of theirs. Data about a person should always carry the person's ID card number, guaranteeing that at any time and place the data about that person is unique and recognizable as belonging to the same individual. Big data spans many different information systems, while small data lives inside one system; in a big data environment the uniqueness of data is therefore critically important, and the absence of a unified, standard identification code makes data mining extremely difficult. Uniqueness is the foundation of big data mining and analysis, and the big data identification code must make classified statistics convenient.
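Optimization method one can be illustrated with a toy identification-code scheme. The field layout below (registry prefix, thing-class code, instance serial) is purely an assumption for illustration; the patent only requires that the code be unique and follow a unified standard across the data's whole scope:

```python
# A big data identification code must be unique across every system in
# the data's scope, not merely within one table. One illustrative scheme
# (the field layout is an assumption, not defined by the patent):
#   <standard-body prefix>-<thing-class code>-<instance serial>
def make_data_id(registry: str, thing_class: str, serial: int) -> str:
    return f"{registry}-{thing_class}-{serial:010d}"

# The same phone model gets one class-level code (serial 0) shared by
# every seller, so sales records from hundreds of thousands of systems
# can be joined; each physical unit gets its own instance code.
model_code = make_data_id("ISO", "PHONE-X200", 0)
unit_code  = make_data_id("ISO", "PHONE-X200", 4217)

print(model_code)  # ISO-PHONE-X200-0000000000
print(unit_code)   # ISO-PHONE-X200-0000004217
```

The fixed-width serial keeps codes sortable, which serves the text's requirement that the identification code make classified statistics convenient.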
2. Belongingness of data
Belongingness of data: data must reflect both the attributes of the thing and whom the data belongs to; in other words, who said it, who collected it, where it came from.
Data optimization method two: the data of every thing must contain a "data source" item. The data source gives structured data its belongingness; normally a unit's name serves as the data source. Big data comes from thousands of units, and if the source is not marked, identification becomes chaotic during big data mining.
3. Identifiability of data
Identifiability of data: data must be recognizable both to information systems and to people. Furthermore, it must be recognizable not only to one's own information system but also to other people's systems, and not only to oneself but also to others.
The problem targeted: data in a relational database can be recognized only by the database designer and the designer's own information system. Others, and other information systems, can identify the data only after software explains, annotates, or translates it.
Data optimization method three: use a suitable amount of redundancy to make the data recognizable; express data in standard, normalized natural language and avoid expressing data with codes as far as possible. The guiding principle when optimizing data: "Technical personnel in the relevant field must be able to understand it, and other people's information systems must be able to identify it; it must not be understandable only to the database designer, or identifiable only by one's own system."
In a big data environment, the most important, most critical characteristic of data is its identifiability. One strategy of relational databases is to reduce data redundancy as much as possible, but in reducing redundancy they make the data harder to identify. The structured big data communication protocol takes the opposite strategy: use a suitable amount of redundancy to make data recognizable, so that other people can understand it and other people's information systems can identify it.
A relational database is a kind of database in which data, data structure, programs, and the database system are inseparable. Once data leaves its concrete table structure and programs, it turns into meaningless data; data in a relational database has meaning only inside its specific table.
The universal data structure table, by contrast, is a data structure in which data is independent of programs; one might say "what it is is what it says, regardless of any program." Data in the universal data structure table keeps its true meaning even after leaving its data structure, because it is expressed in standard, normalized natural language: anyone who understands the language can understand the real meaning of the data.
On the surface, reducing data redundancy is a great advantage of relational databases, but it is also one of their weaknesses. While reducing redundancy, a relational database also distorts the data, and that distortion leads to the problems of information exchange, information islands, and difficult data mining; in a relational database the distortion can be remedied only by writing large amounts of program code. Countless cases show that relational databases pay a very high price for the redundancy problem. When data and programs are inseparable, storing, reading, and querying data all require heavy programming; when data is independent of programs, a single general program is enough for anyone to store, read, and query the data easily, with no need to develop a mass of software for every new database.
One principle of the structured big data communication protocol: do not worry about data redundancy at all; trade space for intelligence and ease of use; let the data speak for itself rather than letting programs speak for it. Relational data speaks only through application programs. Replacing programs with data means willingly adding a large amount of "redundancy" so that data gains independence, integrity, and identifiability; in other words, to give data those three properties, redundancy is never a concern, however much is added. When an information system is designed on a relational database, programs must always interpret the data in the database, and the disastrous consequence of that strategy is that processing data always requires writing large amounts of code; without coding, the data cannot be processed.
● The strategy of the structured big data communication protocol: at all costs, let the data speak for itself; stop using programs as translators!
The purpose of letting data speak for itself: wherever the data is placed, it can independently, identically, and completely express the same meaning in any environment. In the big data era, data appears in many different information systems, so it must be guaranteed that data carries the same meaning in different systems and different environments. The structured big data communication protocol gives data independence, integrity, identifiability, uniqueness, and belongingness precisely so that the data can speak for itself, which in a big data environment also greatly reduces the amount of coding. Data in a relational database has neither independence nor integrity; a relational database cannot let data speak for itself, since its data can express a complete meaning only through various "relations." The structured big data communication protocol lets data speak for itself, whereas data in a relational database needs a web of "distant relatives" to express its exact meaning.
The "distant relatives" of a relational database: data is inseparable from the database system, from the table structure, and from the application programs, and the many tables of a database are interwoven with one another. Data in a relational database has meaning only together with the relational database system, data structures, data types, and application programs; once separated from them, it becomes meaningless. The current problems of information systems, such as information islands, information exchange, data interfaces, interconnection, and system upgrades, are all caused by the fact that data in relational database systems cannot speak for itself.
When an electronic medical record system is designed on a relational database system, "patient basic information" may take the following form:

Table Two: patient basic information table (a table in a relational database)

| ID | HZXM | GZDW | ZB | XB | ZZ | NL | RQ | HF | BXRQ | MZ | CSZ |
|----|------|------|----|----|----|----|----|----|------|----|-----|
| 26 | Hu Feng | Rubber plant | Worker | 0 | Mongolia Road 2 | 32 | 1991-4-3 | ? | 1991-4-3 | Han | Self |

The form above is the classic structure of the small data era. In fact the field names are themselves very important information and must be described in standard, normalized natural language. After "patient basic information" is optimized by the structured big data communication protocol, it takes the following form in the universal data structure table:
Table Three: patient basic information table (universal data structure table)

| ID | Thing code | Attribute | Attribute value | Long value | Unit | Attachment | Date |
|-----|------|---------------------|------------------------------|---|---|---|---|
| 100 | 1001 | Data source | Shanghai City First Hospital | | | | |
| 101 | 1001 | Thing category | Medical record | | | | |
| 102 | 1001 | Thing category | Inpatient record | | | | |
| 103 | 1001 | Thing category | Admission record | | | | |
| 104 | 1001 | Thing category | Patient basic information | | | | |
| 105 | 1001 | Patient code | SH10-199103Z21 | | | | |
| 106 | 1001 | Health card number | XXXXXXXXXXXX09 | | | | |
| 107 | 1001 | ID card number | XXXXXXXXXXXXXX | | | | |
| 108 | 1001 | Name | Hu Feng | | | | |
| 109 | 1001 | Work unit | Shanghai rubber plant | | | | |
| 110 | 1001 | Job title | Worker | | | | |
| 111 | 1001 | Sex | Female | | | | |
| 112 | 1001 | Address | Mongolia Road 20 | | | | |
| 113 | 1001 | Age | 32 | | | | |
| 114 | 1001 | Admission date | 1991-4-30 | | | | |
| 115 | 1001 | Marital status | Married | | | | |
| 116 | 1001 | History-taking date | 1991-4-30 | | | | |
| 117 | 1001 | Nationality | Han | | | | |
| 118 | 1001 | Illness narrator | Self | | | | |
Comparing the two tables shows that the information expressed by the universal data structure table is expressed entirely in undistorted natural language; wherever the information is placed, its meaning stays the same.
On the surface, storing information in the universal data structure table occupies roughly twice the storage space, but storing data this way removes a great deal of complicated data extraction and conversion work. The "data redundancy" in the universal data structure table is exactly what lets the data speak for itself: it makes data independent of the database system, the data structure, the data type, and the application program. The strategy of the structured big data communication protocol is to trade space for intelligence and ease of use. Compared with thirty years ago, hard disk capacity has grown more than 100,000-fold, and the cost of doubling storage space keeps falling; it is negligible. "Letting data speak for itself" means the data, like natural language, accurately and unerringly expresses its full meaning, needing no annotation and no interpretation by application programs.
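The claim that program-independent data needs only one general program can be sketched as follows. The triple layout mirrors Table Three; the `describe` helper is a hypothetical illustration, not part of the protocol:

```python
# Because every row names its attribute in plain language, one generic
# routine can render any record as readable text, with no per-database
# program, table schema, or header lookup.
def describe(rows):
    """rows: iterable of (thing_code, attribute, value) triples."""
    return "\n".join(f"Thing {thing}: {attr} = {value}"
                     for thing, attr, value in rows)

# A medical record and a sales record pass through the same routine.
medical = [
    ("1001", "Data source", "Shanghai City First Hospital"),
    ("1001", "Name", "Hu Feng"),
    ("1001", "Age", "32"),
]
sales = [
    ("1280", "Thing category", "Medical expenses"),
    ("1280", "Chinese medicine fee", "56"),
]
print(describe(medical))
print(describe(sales))
```

The same function serves both record types because the meaning travels inside the data rather than inside application code, which is the sense in which the text says a single general program suffices.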
4. Independence of data
Independence of data: data expresses a given meaning independently, relying on no database system, no data structure, no annotation, and no application program.
The problem targeted: data in a relational database has no independence; its meaning can be read only with the help of annotations, data structures, and application programs. The field names of many relational tables are nonstandard letter abbreviations, and when the data is presented to users the information system must add headers to the table before the data's real meaning appears.
Data optimization method four: accept a certain amount of data redundancy so that the data can speak for itself, relying on no database system, data structure, annotation, or application program to express its meaning independently. The universal data structure table shown in Table Three achieves the independence of data.
5. Integrity of data
Integrity of data: data expresses a given meaning completely, relying on no database system, no data structure, no annotation, and no application program.
The problem targeted: data in a relational database has no integrity; its complete meaning can be read only with the help of annotations, data structures, and application programs.
Data optimization method five: accept a certain amount of data redundancy so that the data can speak for itself, relying on no database system, data structure, annotation, or application program to express its meaning completely. The universal data structure table shown in Table Three achieves the integrity of data.
6. Standardization of data
Standardization of data: data should be standard, normalized, unified, and unambiguous.
The problem targeted: the nonstandard data currently held in various information systems makes data mining extremely difficult.
Data optimization method six: ensure that data is standard at the information system design and data collection stages.
Data standardization must be built on national big data standards and industry big data standards, not on the internal standards of a single organization. Only data conforming to national and industry big data standards is eligible to become qualified structured big data. The current problem is that each unit works out its own data standards, all different from one another, and no national or industry big data standards exist; this is a great obstacle to the development of big data. With standards in place, and the standards enforced, big data mining no longer needs ETL.
How the standardization of structured big data is embodied: standardization must be considered when the information system is designed, and when data is collected or generated it must be entered and generated in strict accordance with national and industry big data standards; only then is the data the information system generates standard data.
Standardizing and normalizing the data of all industries is an enormous engineering effort, but only by carrying it out can the standardization of structured big data be guaranteed. Data standardization is the foundation of big data; without it there is no qualified big data. In big data engineering, standards lead. In a certain sense, since no industry anywhere in the world has yet completed data standardization, no qualified big data exists today!
Information system names, database names, table names, field names, and the data inside databases must all use standard, normalized, unified natural language, avoiding nonstandard codes as far as possible. This is the key to letting data form "association relationships" by itself, and the key to universal data retrieval; it is also one very important reason the structured big data communication protocol advocates data standardization. In a big data environment these association relationships bring great convenience to data mining and greatly increase the speed of queries.
Relational database theory places no restriction on data; everything is defined arbitrarily by the designer. This is a root cause of why data in relational databases is so hard to mine. The structured big data communication protocol restricts data very strictly: data must be standard, normalized, and unified; it must meet the twelve technical characteristics; and every data item must strictly conform to international, national, and industry standards. Designers are forbidden to define data privately and arbitrarily. Data, like the parts of interchangeable machinery, works only when standardized.
Big data standards involve every industry and every kind of business: standards for data, for data structures, for business, for business processes, for information systems, and so on.
In the big data era, information systems must use unified, standard, normalized natural language and avoid codes as far as possible. This is an indispensable measure for guaranteeing the independence, integrity, and identifiability of data and for reducing the coupling between data and systems.
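A sketch of what enforcing a standard at data-entry time might look like. The vocabulary below is a made-up stand-in for the national and industry big data standards the text calls for:

```python
# Enforcing a shared standard at data-entry time: attributes and values
# outside the agreed vocabulary are rejected instead of being stored.
# This vocabulary is illustrative, not an actual published standard.
STANDARD_ATTRIBUTES = {
    "Sex": {"Male", "Female"},
    "Marital status": {"Married", "Unmarried", "Divorced", "Widowed"},
}

def conforms(attribute: str, value: str) -> bool:
    allowed = STANDARD_ATTRIBUTES.get(attribute)
    if allowed is None:
        return False       # the attribute itself is not in the standard
    return value in allowed

print(conforms("Sex", "Female"))  # True
print(conforms("Sex", "0"))       # False: coded values are nonstandard
print(conforms("XB", "0"))        # False: abbreviated field names too
```

Rejecting the coded row at entry, rather than decoding it later, is the "prevention rather than cure" stance the text contrasts with ETL.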
7, data and the coupling of system
Data and the coupling of system: data are the highest with the degree of coupling of system, and data are the highest to the degree of dependence of system.
When data to the degree of dependence of system higher time, data have once reformed into insignificant data departing from original system.
If data need not the deciphering of any information system, user just can understand, then the coupling of these data and information system
Right is zero.
For problem: data in relational databases is highly coupled to its information system. It is inseparable from the database system, the data structures, and the application programs; once it leaves its original information system and enters a big data environment, it becomes meaningless.
Data optimization method seven: ensure that every data item has zero coupling with its information system. Through appropriate data redundancy, give data independence, integrity, identifiability, standardization, uniqueness, and belongingness; these properties together guarantee that every data item has zero coupling with the information system.
Big data is sourced from the systems of thousands of units, so the data in big data should have zero coupling with those systems; otherwise many layers of application code must be written to interpret it, increasing the difficulty and cost of processing. Articles written in natural language can be understood directly by professionals in the corresponding field, without interpretation by any information system; such data therefore has zero coupling with information systems. The volume of big data is counted in the hundreds of billions. If every data item had some degree of coupling with its system, a vast number of programs would be needed to understand the data; if instead every data item has zero coupling with its information system, then no programs at all need be written to interpret the data when processing big data.
Relational database designers habitually represent data with custom codes. For example, some designers use "0" for female and "1" for male, while others use "W" for female and "M" for male. Faced with the hundreds of billions of data items produced by thousands of information systems, such nonstandard, non-normalized codes would bring enormous disaster to big data mining.
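The code problem above can be sketched in a few lines. This is an illustrative example, not part of the patent: the two system names and their code tables are invented, and under the protocol the standard natural-language value would simply be stored from the start.

```python
# Illustrative sketch: translating the ad-hoc gender codes of two
# hypothetical systems into one standard natural-language value, as the
# protocol's "avoid codes, use standard natural language" rule demands.
SYSTEM_CODE_TABLES = {
    "system_a": {"0": "female", "1": "male"},
    "system_b": {"W": "female", "M": "male"},
}

def normalize_gender(system: str, code: str) -> str:
    """Map a system-private code to the standard natural-language term."""
    try:
        return SYSTEM_CODE_TABLES[system][code]
    except KeyError:
        raise ValueError(f"unknown code {code!r} from system {system!r}")

print(normalize_gender("system_a", "0"))  # female
print(normalize_gender("system_b", "M"))  # male
```

Note that such a translation table must be written once per source system; storing the standard value directly, as the protocol advocates, removes the need for any translation layer.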
An important reason why information systems built on relational databases produce serious information islands is that the data in relational databases is incomplete, non-independent, and hard to understand. A relational database expresses the relations between things through various "relations". Its data is inseparable from the database system, the table structures, and the corresponding application programs; once separated, the data becomes meaningless. It is precisely these "relations" that make relational databases inevitably produce "information islands".
Data in the "universal data structure table" is independent of the database system, the table structures, and the application programs, and can exist entirely apart from them. The data in "Table 1" has been optimized through the structured big data communication protocol; even separated from its table structure, it still expresses its original meaning.
A principle of big data: avoid codes as far as possible, and use standard natural language as far as possible.
The test for qualified big data: only data with zero coupling to its information system can become qualified big data.
Corollary: since the data in today's relational databases is entirely data tightly coupled to its information systems, none of the data in today's relational databases is qualified big data.
8. Uniformity of data structure
Uniformity of data structure: the data structure of qualified structured big data must be unified. At present, only the "universal data structure table" enables data to achieve "uniformity of data structure".
For problem: the data structure of the data in each relational database is different.
Data optimization method eight: the structured big data communication protocol uses the "universal data structure table" (shown in Table 4 below) to achieve "uniformity of data structure". The protocol does not allow designers to design arbitrary data structures; all structured data must be stored in one table, or in several tables of identical, standard, unified structure. Standardization of data structure cannot be achieved under relational database theory.
Table 4: the universal data structure table achieves uniformity of data structure
The greatest problem of relational databases is precisely that data structures are nonstandard. Relational database theory places no restrictions on data structure, leaving it entirely to the designer's free definition. A standardized data structure is the foundation of big data processing; nonstandard data structures make data processing extremely difficult.
9. Additivity of data
Additivity of data: the property that data (like books) can be accumulated together without any processing.
For problem: today's relational database systems have produced a great deal of data, but none of it can be accumulated into big data.
Data optimization method nine: additivity is achieved through "the uniqueness, belongingness, identifiability, independence, integrity, and standardization of data, the coupling between data and system, and the uniformity of data structure". Put another way, only data that has all of these attributes simultaneously is additive.
Traditional information written on paper is additive: a library is simply the sum of many books, and an archive is the sum of many files. If data were additive, then concentrating the data of Guangzhou's government departments in mirror fashion onto a cloud platform would amount to establishing Guangzhou's big data, and uploading mirror copies of all the data of the nation's 978,000 medical institutions to a national medical big data center would amount to building the nation's medical big data. Unfortunately, the data in today's various information systems is not additive.
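Under these assumptions, additivity can be sketched as plain concatenation of uniformly structured, self-describing rows; the two datasets below are invented for illustration.

```python
# Sketch: once all records share one unified, self-describing structure,
# "accumulation" is mere concatenation, with no conversion step at all.
unit_a = [{"source": "Unit A", "attribute": "reading", "value": "10"}]
unit_b = [{"source": "Unit B", "attribute": "reading", "value": "12"}]

# Mirroring both units into one center is just appending their rows.
big_data = unit_a + unit_b
print(len(big_data))  # 2
```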
10. Portability of data
Portability of data: data is portable if, "no matter what environment it is moved into, it retains its original meaning and can be recognized by various information systems and by users".
For problem: information systems built on relational databases are difficult to interconnect; that is, data in one system cannot be transplanted into another.
Data optimization method ten: portability is achieved through "the uniqueness, belongingness, identifiability, independence, integrity, and standardization of data, the coupling between data and system, and the uniformity of data structure". Put another way, only data that has all of these attributes simultaneously is portable.
Portability of data bears on the interconnection of information systems: only portable data can interconnect freely between systems. Portability and additivity go together, and portable data is also additive; the difference is that portability expresses whether data can interconnect between systems, while additivity refers to summing many small datasets into big data.
11. Timeliness of data
Timeliness of data: every data item in big data should carry a corresponding time.
Data optimization method eleven: add a timestamp to every data item.
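A minimal sketch of this method, assuming an ISO-8601 UTC timestamp and a field named "timestamp" (both assumptions, not the patent's specification):

```python
# Sketch of optimization method eleven: stamp every data item with a
# timezone-aware ISO-8601 time at the moment it is recorded.
from datetime import datetime, timezone

def with_timestamp(record: dict) -> dict:
    """Return a copy of the record carrying its recording time."""
    stamped = dict(record)
    stamped["timestamp"] = datetime.now(timezone.utc).isoformat()
    return stamped

item = with_timestamp({"attribute": "body temperature", "value": "36.8 C"})
print("timestamp" in item)  # True
```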
12. Authenticity of data
Authenticity of data: small data is like the records one keeps of one's own accounts, while big data is like the records produced by financial dealings between different units; for big data, authenticity is therefore extremely important.
Data optimization method twelve: treat anti-counterfeiting and tamper-proofing of data as essential, and guarantee authenticity through methods such as third-party certification, third-party notarization, and third-party data filing.
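One common supporting technique for tamper evidence, shown here only as an illustration, is a cryptographic digest over the record's canonical form. The patent itself prescribes third-party certification, notarization, and filing rather than any particular algorithm; the digest below is an assumed building block such a third party might use.

```python
# Illustrative tamper-evidence sketch: a SHA-256 digest over a record's
# canonical JSON form. Any later change to the record changes the digest,
# so a filed copy of the digest lets a third party detect tampering.
import hashlib
import json

def fingerprint(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

rec = {"thing": "invoice 42", "attribute": "amount", "value": "1000.00"}
digest = fingerprint(rec)

assert fingerprint(rec) == digest                          # unchanged
assert fingerprint({**rec, "value": "9999.99"}) != digest  # tampered
```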
Illustration 2: the uniqueness of data is the foundation of "global data connectivity"
Within a small environment such as a class or a work unit, a name suffices to distinguish each person; nationwide, however, with so many people and so many duplicate names, names alone cannot identify individuals accurately. Before the big data era, the data in a relational database was used only within a single organization, so each data item was easy to identify; but put that same data into a big data environment and it becomes unidentifiable. In a big data environment, all data about a person must contain the person's "ID-card number"; this is what expresses the uniqueness of the data.
A relational database expresses the uniqueness of the data within each table with an "ID" column. It considers only uniqueness within one table, not uniqueness in a big data environment. For example, many medical information systems identify patients only by "outpatient number" or "admission number" and do not record the patient's ID-card number. To query a patient's historical records in a national medical big data environment, the absence of the ID-card number from the patient's data makes the query enormously difficult, because the patient's history may be scattered across the millions of tables produced by the nation's 978,000 medical institutions.
In a big data environment, the "uniqueness of data" for each kind of thing is a vital issue; it is the key to ensuring that data has "identifiability". For example, in the information systems of a manufacturer and its distributors, the code for the same product must be globally unique, unified, and standard; only then is the data identifiable in a big data environment. At present this is far from achieved: every enterprise's information system has its own coding scheme, so the same product is coded differently by different enterprises, which creates enormous difficulty for the global connectivity of data and for big data analysis.
Qualified big data should work like this: buy a box of medicine at a pharmacy, and from the unique code on the box you can query the entire production process and every intermediate link of that box of medicine: which manufacturer produced it, when it was produced, when it left the factory, and which agents it passed through.
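The traceability scenario can be sketched as a filter over self-describing supply-chain events that all carry the same globally unique code; every code, party, and stage below is invented for illustration.

```python
# Sketch: if every supply-chain event carries the commodity's globally
# unique code, tracing one box of medicine is a single filter over the
# accumulated events, with no per-enterprise decoding.
events = [
    {"code": "MED-0001", "stage": "produced",    "party": "Factory X"},
    {"code": "MED-0001", "stage": "shipped",     "party": "Factory X"},
    {"code": "MED-0001", "stage": "distributed", "party": "Agent Y"},
    {"code": "MED-0002", "stage": "produced",    "party": "Factory Z"},
]

def trace(code: str) -> list:
    """Return every recorded link in the chain for one unique code."""
    return [e for e in events if e["code"] == code]

print([e["stage"] for e in trace("MED-0001")])
# ['produced', 'shipped', 'distributed']
```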
What the world economy most needs is "global data connectivity": the various data in the information systems of all the world's enterprises can "interconnect", or in other words, "the information systems of any two enterprises in the world can promptly send and receive data about any commodity." The current reality is that every enterprise has its own product coding rules. When an enterprise receives an order, the order data must first be manually converted into data its own information system can recognize before its system can process the customer's order; only a very few enterprises' information systems can directly process the data sent by upstream enterprises. The root cause of this "global data blockage" is that today's data lacks "uniqueness of data": there is no internationally unified, standard commodity coding standard to support the "uniqueness of data".
To track a commodity intelligently around the world, "uniqueness of data" is the foundation. Data about one commodity may appear in the information systems of millions of enterprises worldwide; only a big data identification code embodying the "uniqueness of data" can accurately pick out that commodity's data from among millions of information systems. Globally unified encoding and decoding of big data (which may be called the big data identification code) is a very important and also extremely complex task in big data. In international trade, globally unified encoding and decoding of orders and commodities is likewise extremely important; it is the foundation of "global data connectivity" for commodities.
For enterprises in the big data era, international, national, and industry standards for order and commodity data are the foundation on which global enterprises achieve "global data connectivity". Without standards for orders and commodities, an enterprise cannot enter the big data era.
Illustration 3: the belongingness of data is a key distinction between big data and small data
From the viewpoint of relational database theory, adding "Data Source" introduces a great deal of redundant data into the system. But in the big data era, the data to be handled comes from millions of information systems, so it is very necessary to indicate where each data item came from; otherwise the multitude of data cannot be told apart. In a big data environment, "Data Source" is among the most crucial data and is indispensable. The purpose of adding a "Data Source" item to every data item in big data is to let data express its complete meaning independently and fully, wherever it goes. Data is like a thing: the things of human society all have owners, and data should have an owner too.
A key indicator distinguishing big data from small data is whether the data contains "Data Source". Any data without "Data Source" is small data and is unqualified structured big data. This is hard for relational database experts to understand, but accepting it is a mark that a database technologist's thinking has moved into the big data era. Big data faces hundreds of thousands of units, millions of information systems, tens of millions of tables, and trillions of data items; in such an environment, the absence of "Data Source" causes chaos. In the big data era, having "Data Source" greatly reduces the amount of code that must be written, and whenever data is exchanged or shared, "Data Source" is required.
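A minimal sketch of the "Data Source" item follows, with assumed field names; the idea is only that every record names the unit and system it came from, so it stays identifiable after being merged with data from millions of other systems.

```python
# Sketch: tagging each record with its "Data Source" (owning unit and
# originating information system) before it enters the big data store.
def tag_source(record: dict, unit: str, system: str) -> dict:
    """Return a copy of the record carrying its provenance."""
    tagged = dict(record)
    tagged["data_source"] = {"unit": unit, "system": system}
    return tagged

rec = tag_source({"attribute": "blood pressure", "value": "120/80"},
                 unit="Example People's Hospital",
                 system="Outpatient EMR")
print(rec["data_source"]["unit"])  # Example People's Hospital
```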
Illustration 4: the standardization and normalization of data are the key to realizing universal data retrieval
The structured big data communication protocol was founded on imitating the memory, association, and thinking of the human brain, beginning in 1982 with the hope that the computer's query function could imitate the brain's association function. The technique the human brain uses when processing data is "ultra-high-fidelity data processing". The claim that "the standardization and normalization of data are the key to realizing universal data retrieval" must be understood from the perspective of the brain's ultra-high-fidelity data processing. What people currently understand as "data" is understood from the perspective of computer technology; in fact, it is most fitting to understand "data" from the perspective of the brain's memory, association, and thinking.
The human brain is nature's classic "computer". Only what is stored in the human brain is truly qualified "data". The "data" in the brain is "ultra-high-fidelity": it is all analog, almost without distortion, truly reflecting the various things of nature; it is a miniature, within the brain, of nature's things. The relations between data in the brain are established naturally, according to the natural attributes of things, and truly reflect the subtle relations among the things of nature. This is the very foundation of the brain's extraordinary capability.
The data in a computer is dead; the information in the human brain is alive. The brain can activate the "things" in it across time and space at will, replaying past scenes. A computer can also play a film, but it cannot establish associative relations for each thing in the film. The brain can associate from one scene to another; a computer cannot. Recalling the Forbidden City and the Great Wall in Beijing, the brain can in the same instant recall the Huangpu River in Shanghai, and in another blink be at the Huangguoshu waterfall in Guizhou. The brain can achieve "a thousand years in an instant, ten thousand li in a blink". The data in a computer bears no relations among itself, but information about any thing entered into the human brain automatically forms associative relations with the related information already in the brain, relations established according to the natural attributes of things.
The brain's ultra-high-fidelity data processing comprises four main techniques: 1. ultra-high-fidelity data acquisition; 2. ultra-high-fidelity data storage and reproduction; 3. ultra-high-fidelity formation of relations between data (forming associative relations); 4. ultra-high-fidelity use of the relations between data (processing data through association).
" super high data fidelity acquisition technique " and " the super high guarantor of brain can be imitated better by current technology
True data storage and reproducing technology ".But prior art cannot realize the " super of (even say and cannot imitate) brain comprehensively at all
High-fidelity forms the relationship technology between data and data " and " the super high data fidelity treatment technology " of brain, both skills
Art is only brain and has the basic of super function.
Ultra-high-fidelity data acquisition: the brain gathers data through the sense organs of sight, hearing, touch, smell, taste, pain, and so on.
Ultra-high-fidelity storage and faithful reproduction of data: the brain not only stores data with ultra-high fidelity, as if the things of nature had been "moved" into the brain, but can also reproduce (associate to) familiar things at will, across time and space. The data in the brain is a miniature of the vivid, concrete things of nature.
Ultra-high-fidelity establishment of relations between data: the brain not only acquires and stores data; more importantly, it automatically lets data form relations of similarity, proximity, and simultaneity in the brain. The associative relations among data in the brain are established naturally, according to the natural attributes of things. The brain stores with ultra-high fidelity not only the data itself, but also the natural relations between data. This is what existing technology finds hard to imitate.
Ultra-high-fidelity use of the relations between data (data processing): what a computer handles are digital signals, while what the human brain handles are entirely analog signals. The brain processes ultra-high-fidelity analog data through association by similarity, simultaneity, proximity, and so on (that is, human thinking). Existing technology cannot fully imitate this technique; it can imitate it only partially.
The examples below compare and explain "the brain's ultra-high-fidelity data processing". They mainly illustrate that the brain associates and reasons according to the attributes of things, and that the relations between data items are established according to the natural attributes of things.
1. "By listening, a person can judge whether you are striking an iron block or striking wood." This is because, in the brain's memory, the sound of striking iron is naturally linked with iron, and the sound of striking wood is naturally linked with wood; all this information was received in daily life. People can therefore associate sounds with the corresponding things. A computer can also store audio and video files, but it cannot realize the natural association between sound and image, nor flexibly recognize them.
2. "Tossing a preserved egg lightly in my hand a few times, I can judge whether it is good." A good preserved egg, tossed lightly, produces a slight tremor in the palm; a raw egg, a cooked egg, or a bad preserved egg produces no such tremor. In my brain's memory, that tremor is naturally linked with the preserved egg.
3. "When buying eggs, shaking an egg lightly in the hand reveals its quality." In a bad egg, or one stored too long, the yolk and white move when it is shaken lightly; in a good egg they do not. In my brain's memory, this information about eggs is naturally linked with egg quality.
4. "Seeing the trees outside the window move, one knows the wind is blowing." The brain has stored the information that wind moves trees.
5. "Seeing that one tree outside the window move, one knows someone is shaking it." A person shaking a tree is different from wind moving trees: when the wind blows, many trees move together; when a person shakes a tree, only that tree moves and the others are still. The movement caused by a person is thus distinguishable from the movement caused by wind.
Compared with the human brain, the data in relational databases is utterly distorted. Relational databases establish relations for data artificially. Relational database theory regards this as the theory's most outstanding advantage, but it is in fact the relational database's most fatal defect! Establishing relations artificially destroys the natural associations inherent among the things of nature; a relational database cannot, like the brain, establish connections according to the natural attributes of things. One advantage of relational databases is minimal data redundancy, but this too is a critical defect! In reducing redundancy, the relational database severely distorts the data, and severely distorted data cannot form relations naturally according to the natural attributes of things.
A relational database stores data in different tables, thereby severing the natural relations between things. It stores the data of one class of things in one table and the data of different classes in different tables. The brain classifies things by their natural attributes: whether things belong to the same class is determined by those attributes, and things with the same attributes belong to the same class. A plastic tub, a plastic cup, a plastic bag, and a plastic bucket differ in form, yet the brain groups them as one class by the natural attributes of plastic; for a plastic cup, a glass, and a steel bowl, the brain groups them as one class by the natural attributes of "cup". All the data in the brain is, as it were, in one table, so the brain can classify data with extreme flexibility according to the natural attributes of things.
"Data" is not merely a code or a symbol; real "data" should be a miniature of the concrete things of nature. The human brain naturally links the sound of striking iron with iron; a relational database cannot let "data" achieve such natural association.
The structured big data communication protocol imitates the brain's ultra-high-fidelity data processing. It seeks to root out the "artificial relations" of the relational database, letting data independently and naturally establish "natural relations" according to the natural attributes of things. The relations in a relational database are established artificially and destroy the natural relations between things. To bring the computer close to the extraordinary capability and thinking of the human brain, data must, as in the brain, be kept as undistorted as possible, so that it can establish natural relations according to the natural attributes of things; and artificially established relations must be resolutely rooted out, because artificial relations inevitably destroy the natural relations between data items.
The concept of "data" in computing is far too narrow. "Data" should not be mere "digits" or "codes"; it should be a true reflection of the things of nature and, more importantly, should reflect the natural relations between one "data" item and another. A "mobile phone" in a computer is just digits, while the "mobile phone" in the human brain is a true reflection of a real phone: the brain has received a huge volume of signals about it through sight, hearing, and touch. Qualified "data" should have minimal distortion, reflect concrete things fairly completely, and truly reflect the natural relations between things. The data in relational databases cannot truly reflect the natural relations between data items. Relations between data must never be established artificially; they should be established naturally by the natural attributes of the things themselves. The structured big data communication protocol uses a certain amount of "data redundancy" to keep data as undistorted as possible, so that "natural relations" between data items are established naturally according to the natural attributes of things.
" information system name, database name, table name, field name " with standardized, unified, the natural language of specification,
As far as possible without code, in order to realize " association ".The title of information system, the title of data base, table name, field name are all to weigh very much
The transaction attribute wanted, all has important implication.The designer of relational database system gets used to code, english abbreviation, the Chinese
Language Pinyin abbreviation is as database name, table name, field name.This results in the data that domestic consumer fails to understand in relational database.
Relational database ignores this information because it handled be small data.In big data environment, these information are with regard to right and wrong
The most important, it is impossible to default.
In the structured big data communication protocol, to give data independence, integrity, and recognizability, every data item is augmented with "the name of the information system, the name of the database, and the table name". These are in fact the "classification" of the thing, or one might say its attributes. This practice is hard for relational database experts to understand, even baffling, because it adds a great deal of data redundancy. Between "data redundancy" and "the independence, integrity, and identifiability of data and the coupling of data and system", the structured big data communication protocol chooses the latter. The aim is that even an ordinary person with no technical knowledge can understand the true meaning of the data.
A relational database has very little data redundancy, but the cost is that ordinary non-technical people cannot understand its data: the data can live only inside its own database and becomes meaningless once it leaves. Data in a relational database must be translated by a great deal of application code before ordinary users can understand it.
If the data in a database is fully standardized and normalized, then "associative" relations form naturally and automatically from the "thing attribute" and "thing attribute value" columns of the "universal data structure table" (established through indexes). Because all the data produced by information systems built with the structured big data communication protocol is stored in one "universal data structure table", or in several with identical structure, a general "universal data retrieval" tool can easily be written. For example, if all the nation's medical information systems were built with the structured big data communication protocol, a patient's historical records could easily be "associated" (queried) from the national medical big data center by the patient's ID-card number: because every item in a patient's medical history contains the ID-card number (the big data identification code), all data related to the patient can be "associated" through it. Today's various medical data does not necessarily contain the patient's ID-card number, so querying a patient's history across the information systems of all the nation's hospitals is extremely difficult.
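The association-by-ID-card-number query can be sketched as a single lookup over the unified store; all ID numbers and records below are invented for illustration.

```python
# Sketch of "association" by a globally unique identifier: when every
# record carries the patient's ID-card number, retrieving a full history
# from the unified store is one filter, with no per-system decoding.
records = [
    {"id_card": "110101199001010011", "source": "Hospital A",
     "attribute": "diagnosis", "value": "influenza"},
    {"id_card": "220202198502020022", "source": "Hospital B",
     "attribute": "diagnosis", "value": "fracture"},
    {"id_card": "110101199001010011", "source": "Hospital C",
     "attribute": "prescription", "value": "oseltamivir"},
]

def history(id_card: str) -> list:
    """'Associate' (query) all records for one person by ID-card number."""
    return [r for r in records if r["id_card"] == id_card]

print(len(history("110101199001010011")))  # 2
```

In a real store the filter would be an index lookup, which is what the parenthetical "established through indexes" above refers to.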
The fundamental purpose for which the structured big data communication protocol uses a large amount of "data redundancy" to make data satisfy the twelve technical characteristics is to turn data into "high-fidelity data": "data redundancy" compensates for the distortion of data, and only "high-fidelity data" enables an information system to achieve "ultra-high-fidelity data processing" like the human brain.
Illustration 5: efficient mining and universal data retrieval can be realized without ETL conversion
Mining the nation's current medical data is extremely difficult, because the data in today's various information systems is nonstandard and non-normalized. For example, the medical industry has millions of tables and hundreds of billions of records, each table with a different structure; mining and querying data across so many differently structured tables requires writing an enormous number of programs. If all the information systems of the nation's medical institutions were designed entirely according to the structured big data communication protocol, mining and querying the data they produce would be easy, because those systems would all use the "universal data structure table" and the data in them would be fully standard, normalized, and unified.
Table Five: comparison of the data mining and query effects of the two table methods

"The most critical technology of big data is query technology": the defining feature of big data is its size, and precisely because of that size, obtaining the data you need is especially difficult. Finding the required data within big data is therefore the most critical task, followed by the analysis and statistics performed on the data found. In this sense, "big data is all about query": the early work of big data prepares for query, the later work computes statistics and analyses on the query results, and all the work of big data revolves around query.
Illustration 6: Using the 12 technical characteristics of structured big data to provide technical guarantees for the authenticity of big data

Big data is a resource as important as oil. The authenticity of big data is the foundation of big data; big data that has lost its authenticity is nothing but data garbage. Guaranteeing the authenticity of big data is therefore a very important task of the big data age.
In the small data era, the data handled by information systems were mainly internal to each organization, and data authenticity was controlled by each organization itself. In the big data era, data circulate not only within an organization but, even more, among organizations at home and abroad. The authenticity, notarized status, and authority of big data must therefore be guaranteed, and big data must carry legal effect like an official document. The structured big data communication protocol guarantees the authenticity of big data from a technical angle. "Data uniqueness" is the key to controlling "data authenticity": uniqueness is embodied in the big data identification code, and authenticity can be controlled by controlling that code. The big data identification code is the "ID card" of a thing's data; whatever environment the data is in, its big data identification code remains unique. Big data is not only data, codes, and symbols; it is also a resource, and, like commodities, articles, and property, it must be managed as such. Just as logistics and the movement of people require large numbers of traffic police, data flows must also be controlled. A country manages commodities through agencies such as the administration for industry and commerce and customs; the authenticity of data likewise needs to be managed through a similar mechanism. It would be appropriate for a national big data center, run by each country's administration for industry and commerce (or the courts, the Ministry of Public Security, the Ministry of Industry and Information Technology, etc.), to be responsible for controlling the authenticity of big data.
The national big data center of each country is responsible for encoding and issuing the big data identification codes of commodities, orders, and the like, and for keeping those codes on record. The national big data center is also responsible for examining the qualifications of each organization; only organizations that pass the center's review are eligible to obtain big data identification codes for commodities, orders, and so on. The national big data center is responsible only for issuing big data identification codes, not for auditing the authenticity of the data of commodities or orders. Only when the authenticity of data is questioned and a dispute arises do the "data police" of the national big data center audit the data's authenticity, impose penalties according to the audit result, and place the result on record, just as in traffic, where drivers are responsible for their own behavior and the traffic police appear only when an accident occurs.

Orders, official documents, and the like that have obtained big data identification codes are filed with the national big data center or a third-party notary organization. An order or document filed with a third-party notary carries legal effect, as if it bore an official seal. This saves a great deal of paper and also saves the time spent transmitting orders and documents.
After obtaining the big data identification code for a commodity, an enterprise uploads the commodity's various data to the national big data center for the record. The enterprise's customers can then obtain the commodity's data from the national big data center using the commodity's big data code.
Because the coding is unified worldwide, orders can be transmitted and received directly between enterprise information systems, and their contents understood. The data in an order are stored in the "universal data structure table" and given the 12 technical characteristics of structured big data. Every "transaction attribute" in an order (analogous to a field name) must be globally unified. Since each transaction attribute may be expressed differently in different languages, a global standard must also be formulated so that every transaction attribute corresponds one-to-one across languages through the international standard. General-purpose data comprehension and translation tools can then be designed, and the translation of orders between languages completed automatically by software.
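The patent does not define the attribute standard itself. The sketch below assumes a hypothetical registry in which each globally unified transaction attribute has a standard code mapped one-to-one to labels in several languages, so that an order exchanged using codes can be rendered in any language automatically; the codes and labels are invented for illustration.

```python
# Hypothetical international registry: each standard transaction-attribute
# code maps one-to-one to a label in every supported language.
ATTRIBUTE_REGISTRY = {
    "TA-0001": {"en": "commodity name", "zh": "商品名称", "de": "Warenname"},
    "TA-0002": {"en": "quantity", "zh": "数量", "de": "Menge"},
    "TA-0003": {"en": "unit price", "zh": "单价", "de": "Einzelpreis"},
}

def translate_order(order, language):
    """Render an order keyed by standard attribute codes in one language."""
    return {ATTRIBUTE_REGISTRY[code][language]: value
            for code, value in order.items()}

# An order travels between systems carrying only the standard codes ...
order = {"TA-0001": "amlodipine 5mg", "TA-0002": 200, "TA-0003": "1.20"}

# ... and each receiving system renders it in its own language.
order_en = translate_order(order, "en")
order_zh = translate_order(order, "zh")
```

Because translation is a pure table lookup against the one-to-one standard, no natural-language processing is needed, which is what makes the "automatic translation of orders" plausible.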
The current problem: the information systems of enterprises worldwide cannot interconnect. The reason is that the data coding used by each system is disunified and non-standard; order data cannot be transmitted and received directly between enterprise information systems, and order data must instead be re-entered manually into each enterprise's own system.
The benefit of the big data identification code: realizing worldwide data interchange. It guarantees the timely, accurate, comprehensive, and smooth circulation of commodity flows and data flows. With big data identification codes, an enterprise can track the sales and inventories of its commodities around the world through the hundreds of thousands or millions of information systems holding global data. The interconnection of global business information systems benefits enterprises up and down the supply chain and safeguards the production and circulation of commodities.
Certification by the national big data center of the qualifications of organizations and individuals to use big data identification codes: organizations and individuals can obtain the qualification to use big data identification codes, but must first be audited by the national big data center, which issues a valid "big data electronic seal" to those that pass. After the center's audit and certification, they may obtain the qualifications for the various related functions of big data identification codes and may publish related information. The notarized status and authority of the national big data center guarantee the "data authenticity" of big data. Once big data possesses "data authenticity", it can be widely applied in every field.
Big data identification codes are already widely applicable in product anti-counterfeiting and drug administration. An enterprise can apply for one big data identification code and one verification code for each commodity. After buying a commodity, a user can obtain the verification code via mobile phone using the commodity's big data identification code; if it matches the code on the commodity, the product is genuine, otherwise it is counterfeit. Alternatively, scanning the QR code with a mobile phone reveals whether the product is counterfeit.
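A minimal sketch of the check just described, assuming a hypothetical registry held at the national big data center that maps each commodity's big data identification code to its registered verification code; the registry contents and code formats are invented for illustration.

```python
# Hypothetical registry held by the national big data center.
VERIFICATION_REGISTRY = {
    "CN-GOODS-000123": "VRF-88421",
    "CN-GOODS-000124": "VRF-10577",
}

def is_genuine(data_id, code_on_commodity):
    """A commodity is genuine only if the code printed on it matches the
    verification code registered under its big data identification code."""
    return VERIFICATION_REGISTRY.get(data_id) == code_on_commodity

genuine = is_genuine("CN-GOODS-000123", "VRF-88421")
fake = is_genuine("CN-GOODS-000123", "VRF-99999")
unknown = is_genuine("CN-GOODS-999999", "VRF-88421")  # unregistered code
```

An unregistered identification code fails the check as well, which matches the patent's premise that only codes issued and recorded by the national big data center are valid.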
Big data identification codes also make it easy to manage certificates of all kinds, and verification is very convenient: the certificate's information can be found at the national big data center simply from its big data identification code. Examples of certificate management include: an enterprise's various qualifications, an individual's various certificates, enterprise certifications, notarizations, property ownership certificates, commodity inspection and quality certificates, marriage certificates, diplomas, and driving licenses (there is no need to present the license itself; stating the number or showing a QR code suffices). Paper certificates might not even need to be issued at all, only a big data certificate.
Big data identification codes likewise make it easy to manage "contracts, files, agreements, receipts, statements, various undertakings, bills, orders, bidding documents, tender documents", and so on. The big data center can also become an enormous archive management system. An international big data center would be the highest administrative body for global big data, composed of every country, responsible for formulating global big data standards and specifications and establishing the rules for global big data.
Illustration 7: Data produced by information systems built with the structured big data communication protocol are additive

The initial idea that founded the structured big data communication protocol: big data is simply data of very large volume, and every industry already holds a great deal of small data. Can the sum of these small data be called big data? It can be called big data, but not qualified big data, because such data is extremely difficult to mine. How, then, can these small data be accumulated into qualified big data? Why can today's data not be summed into qualified big data? Because the data produced by relational databases are not real data; they can only be called codes. To truly understand big data, one must first clarify what "data" is and what "code" is.
Definition of data: "Information that personnel of the corresponding specialty can understand directly is real data." For example, medical data should be data that medical professionals can understand directly, without further annotation or explanation; chemical data should be data that chemistry professionals can understand without further annotation or explanation.

Definition of code: "Information that personnel of the corresponding specialty cannot understand is code; the professionals must use corresponding application programs and software tools to translate, interpret, and annotate a code before its real meaning can be understood."
In the case of a relational database, the data that ordinary users see are data that the information system has interpreted, translated, and annotated from the data in the relational database, not the raw data in the relational database itself. The data in a relational database do not possess "identifiability, independence, or integrity": if the data in a relational database were presented to ordinary users directly, the users could not "identify" these "data", because a relational database cannot "independently" and "completely" express the intended meaning.
Definition of qualified data: only data that can "independently (the independence of data)" (without relying on software decoding or on the explanations of others) and "completely (the integrity of data)" express its intended meaning, and that can be "identified (the identifiability of data)" by people and by other information systems, is qualified data. The data in a relational database lack these characteristics, because they are "data highly coupled to the system": they are inseparable from the relational database system and its application system, and once detached from them they become unidentifiable, meaningless data.
From the angle of the 12 technical characteristics of structured big data, the data in a relational database can be described as follows: because the "data" in a relational database are inseparable from the relational database system and its application system (they do not possess "zero coupling with the system"), the "data" cannot independently (no "independence") and completely (no "integrity") be identified by people (no "identifiability"), nor can they be identified by other information systems.
From the above analysis it can be concluded that because the data in a relational database are "highly coupled to the system", they become unidentifiable and meaningless once detached from the relational database system and its application system, so the data in a relational database are not additive. Since virtually all current information systems are developed on relational databases, the data produced by current information systems cannot be accumulated into qualified big data.
Information systems built on relational databases are difficult to interconnect because the data they generate lack "portability": data cannot be transplanted directly from one system to another. This is caused by the "Variety" problem among the 4V characteristics of big data. If every information system stored its data in the "universal data structure table", the "Variety" problem would be easily solved: only the "universal data structure table" gives data "structural uniformity" and "portability", and decouples the data from the information system.
The structured big data communication protocol was founded precisely to address the problems of relational databases, with the aim of converting the data in relational databases into qualified big data. The solution: first use the "universal data structure table" to "decouple" the data and give it "structural uniformity"; then use "independence, integrity, standardization, uniqueness, and belongingness" to give the data "identifiability".
Prior art can give data "identifiability, independence, integrity, zero coupling with the system, and structural uniformity". But prior art alone cannot make data truly "additive" and "portable". The structured big data communication protocol uses "uniqueness, belongingness, and standardization" to make data truly additive and portable, and effectively solves the "Velocity" problem of the big data 4Vs. The method of giving data "uniqueness, belongingness, and standardization" is the core technology of the structured big data communication protocol, created specifically to convert small data into big data; it seems to have no technical content, yet it is the most crucial part.
The importance of data standardization to big data: in the small data era, information systems were essentially used within a single organization. In the big data era, information systems must interconnect, and mining data that derives from different information systems becomes a very prominent problem, so giving data standardization is very important. Without "international big data standards, national big data standards, and industry big data standards", the big data era cannot arrive. The reason the importance of data standards is stressed so strongly is that the structured big data communication protocol derives from imitating the associative, ultra-high-fidelity data processing of the brain. Only when data are fully standardized can associative relationships form automatically between data according to the natural properties of things; and once associative relationships exist, the "Velocity" problem of the big data 4Vs can be readily solved. Countless people have tried every means yet cannot fundamentally solve the difficulty of data mining; one basic reason is that the data in today's information systems are non-standard and unregulated. If the data in every information system were standardized and unified, data mining would be easy. Data standardization is a concept everyone knows and finds ordinary, but behind its ordinary surface its effect is enormous: giving data standardization makes the mining of data easy. Only when standardization is carried to the extreme, so that all data are standard, regulated, and unified, does its super power show itself. Data standards are easy to talk about but very hard to implement, requiring enormous human and material resources, and have become a key factor affecting big data.
" uniqueness of data " and " belongingnesses of data " do not have any technology content on the surface, are only data
Add two data item, two attributes.If the way it goes from the perspective of small data, due to the information in small data epoch
System is primarily used to process certain intramural data, and " uniqueness of data " are nothing technology at all, and " data
Belongingness " only can bring bulk redundancy for system.But at big data age, " uniqueness of data " and " belongingnesses of data "
Just there is epoch-making meaning, be the small data key that becomes big data, only add the two data item, small data ability
Become big data, every big data of the structuring not being qualified without the two data item, small data only stick this two
Individual label mays be eligible to enter big data age.
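The "two labels" can be pictured as two extra fields attached to every record. The sketch below is a hypothetical illustration: a plain small-data record gains a globally unique big data identification code (uniqueness) and a data source (belongingness); the code format and the issuing function are assumptions, since the patent does not specify them.

```python
import itertools

# Hypothetical issuer of globally unique big data identification codes.
_counter = itertools.count(1)

def issue_data_id(country="CN"):
    return f"{country}-BD-{next(_counter):08d}"

def to_big_data(record, data_source):
    """Attach the two labels that, per the patent, turn small data into
    big data: a unique identification code and a data source."""
    labeled = dict(record)        # leave the original small data untouched
    labeled["data_id"] = issue_data_id()   # uniqueness
    labeled["data_source"] = data_source   # belongingness
    return labeled

small = {"diagnosis": "hypertension", "patient_id_card": "110101199001011234"}
big = to_big_data(small, "hospital_a.cn")
```

The substantive content of the record is unchanged; only the two labels are added, which is exactly the patent's point that the technique "seems to have no technical content".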
The importance of data belongingness to big data: the scope of small data is a single organization, surviving in one information system, while the scope of big data is the whole world, facing millions of information systems. The purpose of adding belongingness to data is to guarantee that wherever the data ends up, it remains unchanged and undistorted. Without belongingness, data would be distorted once migrated into other information systems; in other words, once a piece of data is found in big data, there would be no way to know where it was found from. The belongingness of data is extremely important to big data: it is the basis of data's identifiability, additivity, and portability.
The importance of data uniqueness to big data: uniqueness exists so that data can be caught quickly and accurately in a big data environment, and so that computers can imitate the associative function of the brain. The big data environment is vast: it may be a whole country or the whole world, and uniqueness guarantees that a computer can quickly and accurately catch data from the ends of the earth. Without uniqueness, grabbing data worldwide is extremely difficult. For example, suppose an enterprise's commodity A appears in hundreds of thousands of retail shops across the world: without a big data identification code for commodity A, it is extremely difficult for the enterprise to catch A's inventory and sales data from hundreds of thousands of information systems worldwide. With uniqueness, data has nowhere to hide and nowhere to run; without it, data takes on a different appearance in every information system, like the White Bone Demon. Adding "data uniqueness" to data is equivalent to fitting the data with a tracker.
Relations among the 12 technical characteristics of data: "additivity and portability" are realized by "1. identifiability; 2. independence; 3. integrity; 4. standardization; 5. zero coupling with the system; 6. structural uniformity; 7. uniqueness; 8. belongingness". The zero coupling of data with the system is realized by "1. identifiability; 2. independence; 3. integrity; 4. standardization; 5. structural uniformity". The identifiability of data is realized by "independence, integrity, standardization, uniqueness, and belongingness".
Why can the data produced by systems designed with the structured big data communication protocol be summed into qualified big data? Because the data structure of all the data is identical and the data are fully standardized, they can be mined efficiently without ETL. Additivity is guaranteed by the data's "uniqueness, belongingness, identifiability, independence, integrity, standardization, zero coupling with the system, and structural uniformity"; data possessing these characteristics possesses additivity.
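Under this reading, additivity means that datasets from different systems can be combined by simple concatenation, with no per-system transformation, because every record already carries the same structure together with its own identification code and source. A hypothetical sketch, with invented records:

```python
# Records from three independent systems, all already in the same
# structure and already carrying the "two labels".
hospital = [
    {"data_id": "CN-BD-001", "data_source": "hospital_a", "diagnosis": "flu"},
]
pharmacy = [
    {"data_id": "CN-BD-002", "data_source": "pharmacy_b", "drug": "oseltamivir"},
]
insurer = [
    {"data_id": "CN-BD-003", "data_source": "insurer_c", "claim": "approved"},
]

# "Additivity": qualified big data forms by plain accumulation --
# no ETL, no schema mapping, just concatenation.
big_data = hospital + pharmacy + insurer

# Every record remains identifiable and traceable after accumulation.
sources = {r["data_source"] for r in big_data}
```

Contrast this with relational-database exports, where each system's rows would first need a bespoke transformation before they could be merged at all.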
Illustration 8: The portability of data makes it convenient for information systems to interconnect

Current information systems are difficult to interconnect because the data in them are highly coupled to the system, becoming meaningless once detached from the relational database system and its application system. By optimizing the data, the structured big data communication protocol gives data "1. identifiability; 2. independence; 3. integrity; 4. standardization; 5. zero coupling with the system; 6. structural uniformity; 7. uniqueness; 8. belongingness; 9. timeliness; 10. authenticity", and data simultaneously possessing these ten technical attributes possesses "portability". Data with "portability" means the same thing in any information system, remaining unchanged everywhere; that is, the data can be sent directly into any data system and interconnection is achieved.
Illustration 9: The structured big data communication protocol can provide a communication protocol for data interconnection between database systems

The communication protocol for data interconnection between database systems:

1. A universal data structure table must be created in each database, and the structure of the universal data structure table must be completely unified across all database systems.

2. The structured data to be sent must satisfy the 12 technical characteristics: "1. uniqueness; 2. belongingness; 3. identifiability; 4. independence; 5. integrity; 6. standardization; 7. zero coupling with the system; 8. structural uniformity; 9. additivity; 10. portability; 11. timeliness; 12. authenticity."

As long as these two conditions are met, any data can interconnect between any databases, because both the sender and the receiver of data store data in the universal data structure table: upon receiving data, the receiver can write it directly into the universal data structure table in its own database.
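A hypothetical end-to-end sketch of the two-condition protocol above, with two SQLite databases standing in for the sender's and receiver's systems. Both hold a universal table of the same assumed structure (the schema is an illustrative invention, as before), so received rows are written directly with no transformation:

```python
import sqlite3

DDL = """CREATE TABLE universal_data (
    data_id TEXT, data_source TEXT, attribute TEXT, value TEXT)"""

sender = sqlite3.connect(":memory:")
receiver = sqlite3.connect(":memory:")
for db in (sender, receiver):
    db.execute(DDL)  # condition 1: identical universal table everywhere

# Condition 2 (in miniature): each row carries its identification code
# and source, so it stays meaningful outside the sender's system.
sender.executemany(
    "INSERT INTO universal_data VALUES (?, ?, ?, ?)",
    [("CN-BD-100", "factory_x", "order_no", "PO-2016-001"),
     ("CN-BD-100", "factory_x", "quantity", "500")],
)

# "Transmission": rows are read from the sender and written directly
# into the receiver's table -- no ETL, no schema mapping.
rows = sender.execute("SELECT * FROM universal_data").fetchall()
receiver.executemany("INSERT INTO universal_data VALUES (?, ?, ?, ?)", rows)

received = receiver.execute(
    "SELECT value FROM universal_data WHERE attribute = 'order_no'"
).fetchall()
```

The receiver never needs to know anything about the sender's internal design; the shared table structure is the entire interface, which is the claim of this illustration.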
Claims (9)
1. A structured big data communication protocol, characterized in that: the structured big data communication protocol is a communication protocol for interconnecting structured data between various information systems, and is also a method for converting structured data into qualified structured big data; the structured big data communication protocol is composed of 12 technical characteristics: "the uniqueness of data, the belongingness of data, the identifiability of data, the independence of data, the integrity of data, the standardization of data, the coupling of data with the system, the structural uniformity of data, the additivity of data, the portability of data, the timeliness of data, the authenticity of data"; the various information systems built with the structured big data communication protocol are all qualified big data information systems, and the data in each big data information system need only be uploaded to a big data center in mirror fashion to form qualified structured big data.
2. The structured big data communication protocol according to claim 1, characterized in that: the structured big data communication protocol realizes "the uniqueness of data", i.e. each datum contains a big data identification code that is unique, systematic, and standard within the corresponding big data environment; this is the key to realizing data interconnection, and a key technology for turning small data into big data.
3. The structured big data communication protocol according to claim 1, characterized in that: "the belongingness of data" guarantees the identifiability of data in a big data environment, i.e. each datum contains its "data source".
4. The structured big data communication protocol according to claim 1, characterized in that: standard, regulated, unified natural language is used for "information system names, database names, table names, field names, and the data in the database", avoiding non-standard codes as far as possible; this is the key to letting data form "associative relationships" spontaneously, and also the key to improving query speed and realizing universal data retrieval.
5. The structured big data communication protocol according to claim 1, characterized in that: the various information systems built with the structured big data communication protocol are all big data information systems; the data in each big data information system need only be uploaded to a big data center in mirror fashion to form qualified structured big data, and these data can be mined efficiently without ETL conversion and support universal data retrieval.
6. The structured big data communication protocol according to claim 1, characterized in that: the authenticity of big data is the foundation of big data; the structured big data communication protocol can use the 12 technical characteristics of structured big data to provide technical guarantees for the authenticity of big data, and can use big data identification codes together with third-party certification, third-party notarization, and third-party filing of data to guarantee the authenticity of data and give big data notarized status, authority, and non-repudiation.
7. The structured big data communication protocol according to claim 1, characterized in that: the data produced by the various information systems built with the structured big data communication protocol are additive; these data form qualified structured big data without ETL conversion, needing only to be uploaded to a big data center in mirror fashion.
8. The structured big data communication protocol according to claim 1, characterized in that: the data produced by the various information systems built with the structured big data communication protocol are portable, i.e. wherever the data is migrated, its meaning remains unchanged, which makes it convenient for information systems to interconnect.
9. The structured big data communication protocol according to claim 1, characterized in that: the structured big data communication protocol can provide a communication protocol for interconnecting structured data between various databases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610427075.7A CN106126547A (en) | 2016-06-08 | 2016-06-08 | The big data communication protocol of structuring |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106126547A true CN106126547A (en) | 2016-11-16 |
Family
ID=57469554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610427075.7A Pending CN106126547A (en) | 2016-06-08 | 2016-06-08 | The big data communication protocol of structuring |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106126547A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845064A (en) * | 2016-11-25 | 2017-06-13 | 张金柱 | Big data and the transmission for medical treatment & health big data, extracting method and system |
CN109063507A (en) * | 2018-07-13 | 2018-12-21 | 上海派兰数据科技有限公司 | A kind of general design model for hospital information system analysis |
CN110692103A (en) * | 2017-06-08 | 2020-01-14 | 沟口智 | System login method |
CN112183771A (en) * | 2020-08-18 | 2021-01-05 | 北京城建信捷轨道交通工程咨询有限公司 | Intelligent operation and maintenance ecosystem for rail transit and operation method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030115084A1 (en) * | 2001-12-19 | 2003-06-19 | Research Foundation Of State University Of New York | System and method for electronic medical record keeping |
CN101599088A (en) * | 2008-11-18 | 2009-12-09 | 北京美智医疗科技有限公司 | The mining multi-dimensional data system and method for medical information system |
CN103500225A (en) * | 2013-10-21 | 2014-01-08 | 樊梦真 | Method for structural storage of medical information |
- 2016-06-08: application CN201610427075.7A filed in China (CN); legal status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kitchin | Data lives: How data are made and shape our world | |
Turton | International relations and American dominance: A diverse discipline | |
Callon et al. | Peripheral vision: Economic markets as calculative collective devices | |
Turnhout et al. | 'Measurementality' in biodiversity governance: knowledge, transparency, and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) | |
Smiraglia | Cultural synergy in information institutions | |
CN106126547A (en) | Structured big data communication protocol | |
Veland et al. | All strings attached: Negotiating relationships of geographic information science | |
Marstine et al. | New directions in museum ethics | |
Gibbons et al. | Knowledge, stories, and culture in organizations | |
Gould | Considerations on governing heritage as a commons resource | |
MacLaughlin | Data driven nonprofits | |
CN107341461A (en) | Method and system for authenticating artworks using intelligent recognition and analysis technology | |
Kislov et al. | Forthcoming plans for institutional transformation of Russian higher education | |
McLeod et al. | Record DNA: reconceptualising digital records as the future evidence base | |
Gibson | Aboriginal secret-sacred objects, their values and future prospects | |
Hayajneh | The legal protection of the intangible cultural heritage in the Hashemite Kingdom of Jordan | |
McMillan | Relations and Relationships: 40 years of people movements from ASEAN countries to New Zealand | |
Strazzullo et al. | An investigation of the translational asset: a proposed classification | |
Saad et al. | Blockchain technology–understanding its application in humanitarian supply chains | |
Esposito et al. | What’s Observed in a Rating | |
McIntyre et al. | Systemic praxis and education to protect the commons | |
Akakpo et al. | Chiefs in development in Ghana: a study of two contemporary Ghanaian chiefs | |
Illsley | Assembling the Historic Environment: Heritage in the Digital Making | |
Theodoropoulou | The socioethical concerns associated with Indigenous Oceanic cultural heritage materials | |
Menon | Technology immorality and its legal issues |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20161116 |