CN102663097A

CN102663097A - Agricultural timing sequence data organization method based on Hadoop+Hbase

Info

Publication number: CN102663097A
Application number: CN2012101079153A
Authority: CN
Inventors: 崔文顺; 郭作玉; 崔硕; 王昕�; 曹亚男
Original assignee: BEIJING HUAXIA SHENNONG INFORMATION TECHNOLOGY CO LTD; DAHUAXIA SHENNONG INFORMATION TECHNOLOGY Co Ltd LANGFANG CITY; INFORMATION CENTER MINISTRY OF AGRICULTURE OF PEOPLE'S REPUBLIC OF CHINA
Current assignee: BEIJING HUAXIA SHENNONG INFORMATION TECHNOLOGY CO LTD; DAHUAXIA SHENNONG INFORMATION TECHNOLOGY Co Ltd LANGFANG CITY; INFORMATION CENTER MINISTRY OF AGRICULTURE OF PEOPLE'S REPUBLIC OF CHINA
Priority date: 2012-04-10
Filing date: 2012-04-10
Publication date: 2012-09-12

Abstract

The invention discloses a Hadoop+HBase-based method for organizing agricultural time-series data, in the field of analysis of agricultural economic and technical information. The method addresses how to organize massive agricultural economic and technical data that have a time attribute on the Hadoop+HBase cloud-computing platform, and is used to store such data. The key point of the technical scheme is as follows. In the data organization and storage stage, because much agricultural economic and technical data has a time attribute and later data are more likely to be queried, auxiliary reverse time-sequence data derived from the actual time are added to the original data. The reverse sequence value is negatively correlated with the actual time value: the later the actual time, the smaller the reverse sequence value, the nearer the front of the ascending sort order the record lies, and the sooner a sequential search finds it. In the data query stage, the actual time value supplied by the user in the query condition is converted into a reverse sequence value to form the primary key value, which enables fast queries.

Description

A Hadoop+HBase-based method for organizing agricultural time-series data

I. Technical Field

The invention belongs to the field of analysis of agricultural economic and technical information.

II. Background Art

Agricultural informatization is developing rapidly. Agricultural websites, agricultural e-commerce, agricultural commodity-market information and agricultural economic information are all being enriched quickly through the Internet, and as the mobile Internet develops, agricultural economic and technical information is likely to grow explosively. On the one hand, this is the inevitable outcome of applying information technology to agriculture, of agricultural industrialization and of agricultural modernization; on the other hand, it creates new demands for collecting, storing and using this massive information to serve agricultural production.

Key-value NoSQL cloud-computing technology, represented by Hadoop, is gradually becoming the main platform on which many industries collect, store and analyze massive data, owing to its low cost, stability and generality, and the technology keeps improving through use. In the processing of massive agricultural information, however, its application is still at an early stage. For the massive data produced by agricultural production and operation, and for the processing and use of those data, there are still few known technical means for efficient handling.

The problem addressed by the invention is as follows. When data are stored in the large-scale database HBase deployed on the Hadoop cloud-computing platform, queries are found in practice to return results very slowly, giving a poor user experience. Investigation showed that this depends heavily on how the data are organized: HBase keeps rows sorted by primary key (row key) and retrieves them in that order, so an unreasonable key order directly slows the return of query results. Much data has a time-order attribute. For example, market price information for agricultural products is collected and stored in order of date; agricultural futures price information is collected and stored in order of date and hour, minute and second; and agrometeorological data are likewise collected and stored in time order by date and hour, minute and second. The earlier the data, the smaller the time value and the nearer the front its primary key falls in lexicographic order, so it is found quickly; the newer the data, the larger the time value and the nearer the back its key falls, so it is found slowly. Because users mostly need the latest data, slow queries occur frequently.
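A minimal illustration of the problem, assuming row keys formed directly from the record date as `YYYYMMDD` strings (a hypothetical layout chosen only for this example): HBase keeps rows in lexicographic order of their row-key bytes, so a scan from the start of the table reaches the newest date last.

```python
# Hypothetical row keys built directly from record dates (YYYYMMDD).
keys = ["20100315", "19500315", "20120401", "20000101"]

# HBase orders rows by the bytes of their keys; for these ASCII strings,
# Python's string sort gives the same lexicographic order.
scan_order = sorted(keys)
print(scan_order)  # ['19500315', '20000101', '20100315', '20120401']

# Position of the most recent record in a start-to-end scan.
position = scan_order.index(max(keys)) + 1
print(position, "of", len(scan_order))  # 4 of 4
```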

III. Summary of the Invention

The object of the invention is to provide a method for organizing agricultural economic and technical data that have a time attribute, so as to solve the problem of slow queries on such data stored on the Hadoop+HBase cloud-computing platform.

To achieve this object, the invention provides a method for organizing time-ordered agricultural economic and technical data, comprising the following steps:

Step 100. In the data organization stage, add reverse time-sequence data to the agricultural economic and technical data.

Step 200. In the data query stage, convert the actual time value supplied by the user in the query condition into a reverse sequence value, form the primary key from it, and run the query.

Step 100 specifically comprises the following steps:

Step 110. Select the time granularity of the actual time: from coarse to fine, the time can be divided into year; year+month; year+month+day; year+month+day+hour; year+month+day+hour+minute; year+month+day+hour+minute+second; year+month+day+hour+minute+second+millisecond; and so on. One of these is selected as required.
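As a sketch of how a chosen granularity turns a time value into a count of time units, assuming Python `datetime` values on the proleptic Gregorian calendar and a hypothetical `granularity` string: calendar units (year, month) are counted by field arithmetic, and fixed-length units by dividing the elapsed `timedelta`. The `+ 1` reflects step 120, where the historical reference point itself has the value 1. The function names are illustrative, not taken from the source.

```python
from datetime import datetime, timedelta

# Fixed-length units; year and month are handled by calendar arithmetic below.
FIXED_UNITS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "minute": timedelta(minutes=1),
    "second": timedelta(seconds=1),
    "millisecond": timedelta(milliseconds=1),
}

def elapsed_units(t: datetime, origin: datetime, granularity: str) -> int:
    """Whole units of the chosen granularity from origin to t (t must not precede origin)."""
    if granularity == "year":
        return t.year - origin.year
    if granularity == "month":
        return (t.year - origin.year) * 12 + (t.month - origin.month)
    # timedelta // timedelta yields an int (floor division).
    return (t - origin) // FIXED_UNITS[granularity]

def sequence_value(t: datetime, origin: datetime, granularity: str) -> int:
    """Sequence value with the historical reference point counted as 1 (step 120)."""
    return elapsed_units(t, origin, granularity) + 1

ORIGIN = datetime(1900, 1, 1)
print(sequence_value(datetime(1900, 1, 2), ORIGIN, "day"))                   # 2
print(sequence_value(datetime(1900, 3, 1), ORIGIN, "month"))                 # 3
print(sequence_value(datetime(1900, 1, 1, 0, 0, 1), ORIGIN, "millisecond"))  # 1001
```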

Step 120. Set the historical reference sequence value: choose a moment in history as the historical reference time point. It is a time value at the same granularity as the actual time and must be earlier than the time value of every record to be stored; normally it is a remote historical moment that never occurs among the actual times. This time value is then converted into a long positive integer, the historical reference sequence value, which equals 1.

Step 130. Set the future reference sequence value: choose a future time point so remote that the data stored by this system will never reach it. It is a time value at the same granularity as the actual time and must be later than the time value of every record to be stored. Using the historical reference sequence value as the origin, convert this time value into a long positive integer, the future reference sequence value. The number of characters (digits) of the future reference sequence value is defined as the standard character width that the time component occupies in the primary key.

Step 140. Set up a time field and a reverse sequence field: the time field holds the actual time value of each record, and the reverse sequence field holds its reverse sequence value, so that each reverse sequence value is stored in one-to-one correspondence with its actual time value.

Step 150. Calculate the reverse sequence values: for each actual time value, compute the corresponding reverse sequence value as reverse sequence value = future reference sequence value - actual sequence value. Here the actual sequence value is the actual time value converted into a long positive integer with the historical reference time value as origin, i.e. the number of time units between the actual time value and the historical reference time value. The larger the actual sequence value, the smaller the reverse sequence value.
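The relationships in steps 120 to 150 can be summarized as follows. The symbols are introduced here only for convenience: $H$ is the historical reference time, $T_F$ the future reference time, $n(H,t)$ the number of whole time units from $H$ to $t$ at the chosen granularity, $a$ the actual sequence value (with $a(H) = 1$ as in step 120), $F$ the future reference sequence value, $r$ the reverse sequence value and $w$ the standard character width.

```latex
\begin{aligned}
a(t) &= 1 + n(H, t), \qquad F = a(T_F), \qquad w = \lfloor \log_{10} F \rfloor + 1,\\
r(t) &= F - a(t),\\
t_1 < t_2 &\;\Longrightarrow\; a(t_1) < a(t_2) \;\Longrightarrow\; r(t_1) > r(t_2),\\
H \le t < T_F &\;\Longrightarrow\; 1 \le r(t) \le F - 1 < 10^{w}.
\end{aligned}
```

The last line is why left-padding to $w$ characters always suffices: no reverse sequence value has more digits than $F$.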

Step 160. Build the primary key from the reverse sequence value: use the reverse sequence value as an important part of the primary key, build the record's primary key, and store it in the database together with the other data. Note that if the reverse sequence value has fewer characters than the standard character width, it is left-padded with 0 before the primary key value is assembled.
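Why the left-padding matters can be seen with the two reverse sequence values from the embodiment (1114281 for 15 March 1950 and 726832 for 1 January 3011). In this minimal Python sketch, the helper name `pad_reverse_value` is chosen for illustration: without padding, string order disagrees with numeric order and the newer record sorts last again.

```python
def pad_reverse_value(reverse_value: int, width: int) -> str:
    """Left-pad a reverse sequence value with zeros to the standard width."""
    if not 0 <= reverse_value < 10 ** width:
        raise ValueError("reverse value does not fit the standard width")
    return str(reverse_value).zfill(width)

values = [1114281, 726832]  # 1950-03-15 and 3011-01-01 in the embodiment

unpadded = sorted(str(v) for v in values)
padded = sorted(pad_reverse_value(v, 7) for v in values)

print(unpadded)  # ['1114281', '726832']: 3011 sorts after 1950
print(padded)    # ['0726832', '1114281']: 3011 sorts first, as intended
```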

Step 200 comprises the following steps:

Step 210. is converted into actual sequential value with the real time value that the user selects.Actual sequential value equals the chronomere number of the real time value distance of user's selection with historical reference time value, is a long positive number.

Step 220. Calculate the corresponding reverse sequence value: reverse sequence value = future reference sequence value - actual sequence value.

Step 230. utilizes reverse sequential value to be combined into the key assignments of data major key.If the character number of reverse sequential value does not reach the standard character width, be at combination major key key assignments after the left side is with 0 polishing.

Step 240. Query the HBase database by the primary key value; the query result contains the data for the actual time corresponding to the reverse sequence value.

Advantages and beneficial effects of the invention:

The invention fits the Hadoop+HBase platform directly, without modifying it. The method is simple and easy to implement, suits most data that have a time-order attribute, and can significantly improve query speed and hence the user experience.

IV. Description of Drawings

Fig. 1 is a flowchart of the steps of the proposed Hadoop+HBase-based method for organizing, storing and querying massive agricultural economic and technical data;

Fig. 2 is a flowchart of the detailed steps for calculating the reverse sequence field data;

Fig. 3 is a flowchart of the detailed steps for converting the actual time value in a user query into a primary key value.

V. Detailed Description of Embodiments

Embodiments of the invention are described further below with reference to the flowcharts and an example. It should be understood that the specific embodiment described here only explains the invention and does not limit it.

As shown in Fig. 1, the invention is divided into a data organization stage and a data query stage, comprising the following steps:

Step 100. In the data organization stage, add reverse time-sequence data to the agricultural economic and technical data. For agricultural economic and technical data with a time attribute, the stored volume keeps growing over time, so the actual times in the raw data generally get later and their values larger. If the primary key is built from the actual time values in the raw data, records with early actual times (small values) are found first, while records with late actual times (large values) are found only later, so query results return slowly. The invention therefore adds an auxiliary reverse sequence field to the raw data, whose reverse sequence value is inversely related to the actual time value: the larger the actual time value, the smaller the reverse sequence value, so a primary key built from it makes recent data easy to find. This step calculates the reverse sequence value corresponding to each actual time value and adds it to the raw data. The detailed steps are described below with reference to Fig. 2:

Step 110. Set the time type: from coarse to fine, the time type used in the primary key can be year; year+month; year+month+day; year+month+day+hour; year+month+day+hour+minute; year+month+day+hour+minute+second; year+month+day+hour+minute+second+millisecond; and so on. One of these is selected as required; year+month+day is used as the example below.

Step 120. Set the historical time point and the historical reference sequence value: first set a historical time point at the same granularity as the actual time and earlier than the time value of every record to be stored, and set the historical reference sequence value corresponding to it to 1. The time type in this embodiment is year+month+day, so the historical time point is set to 1 January 1900 AD, with a corresponding historical reference sequence value of 1.

Step 130. Set the future reference sequence value: first set the future reference time value; for agricultural economic and technical data it can be set to 31 December 5000 AD. Converting this future reference time value into a count of time units (here, days) relative to the historical reference time value gives the future reference sequence value 1132618. This value has 7 characters, so the standard character width in this embodiment is 7.

Step 140. Set the time field and the reverse sequence field: first set a time field of type year+month+day to hold the actual time value of each record, then set a reverse sequence field with the standard width of 7 characters to hold its reverse sequence value. Each reverse sequence value is stored in the same row as its corresponding actual time value.

Step 150. Calculate the reverse sequence values: for each actual time value, compute reverse sequence value = future reference sequence value - actual sequence value, where the actual sequence value is the number of time units (days in this embodiment) between the actual time value and the historical reference time value, expressed as a long positive integer. First compute the actual sequence value for each record's actual time value. For example, the actual time value 15 March 1950 converts to the actual sequence value 18337, so its reverse sequence value is 1132618 - 18337 = 1114281. Likewise, the actual time value 1 January 3011 converts to the actual sequence value 405786, giving 1132618 - 405786 = 726832; because this is shorter than the standard width, it is left-padded with 0 to 0726832. In the same way, each reverse sequence value is computed and written into the corresponding reverse sequence field, forming the auxiliary data.
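The embodiment's figures can be reproduced with the sketch below, with one caveat. A strict Gregorian day count with 1 January 1900 as day 1 gives 18336 for 15 March 1950 and 1132617 for 31 December 5000, whereas the text uses 18337 and 1132618. Those figures match the spreadsheet 1900 date system (as used by Excel), which numbers 1 January 1900 as 1 and also counts a non-existent 29 February 1900. The sketch follows that convention so its output agrees with the text; because the actual sequence values and the future reference value shift together, the reverse sequence values come out the same under either count for dates from March 1900 on. The function and constant names are illustrative.

```python
from datetime import date

def actual_sequence_value(d: date) -> int:
    """Day number in the spreadsheet 1900 date system (1900-01-01 is 1).

    That system counts a fictitious 1900-02-29, so dates from 1900-03-01
    onward are one day further from the origin than the calendar suggests.
    """
    if d < date(1900, 3, 1):
        return (d - date(1899, 12, 31)).days
    return (d - date(1899, 12, 30)).days

HISTORY = date(1900, 1, 1)           # step 120: historical reference point
FUTURE = date(5000, 12, 31)          # step 130: future reference point
F = actual_sequence_value(FUTURE)    # future reference sequence value
WIDTH = len(str(F))                  # standard character width

def reverse_sequence_value(d: date) -> str:
    """Step 150: F minus the actual sequence value, left-padded to WIDTH."""
    return str(F - actual_sequence_value(d)).zfill(WIDTH)

print(actual_sequence_value(HISTORY), F, WIDTH)   # 1 1132618 7
print(reverse_sequence_value(date(1950, 3, 15)))  # 1114281
print(reverse_sequence_value(date(3011, 1, 1)))   # 0726832
```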

Step 160. Build the primary key: use the reverse sequence value as part of the primary key, build the primary key value of each record, and store it in the database together with the other data. This completes the organization of the data.

Step 200. In the data query stage, convert the actual time value supplied by the user in the query condition into a reverse sequence value, form the primary key value, and run the query. Querying stored data by specifying a value or a range of actual time as the condition matches users' habits better. To keep the use of reverse sequence values transparent to users, the actual time value is converted into a reverse sequence value and then into the corresponding primary key value, which achieves fast querying. This step therefore uses the conversion formula between actual time values and reverse sequence values to turn a query on actual time into the corresponding primary key query, so that data with later actual times are found quickly. The detailed steps are described below with reference to Fig. 3:
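For the range form of a query, here is a sketch under two assumptions: the row key begins with the 7-character reverse sequence value, and the scan takes an inclusive start row and an exclusive stop row (the default for HBase scans). Because a later date has a smaller key, the later end of the date range supplies the start row. The constants are the embodiment's values, and the day count follows the convention its figures use (see step 150), valid for dates from 1 March 1900 on; `scan_bounds` and the other names are hypothetical.

```python
from datetime import date

F, WIDTH = 1132618, 7  # future reference sequence value and width from the embodiment

def actual_sequence_value(d: date) -> int:
    # Day count matching the embodiment's figures (valid from 1900-03-01 onward).
    return (d - date(1899, 12, 30)).days

def reverse_key(d: date) -> str:
    return str(F - actual_sequence_value(d)).zfill(WIDTH)

def scan_bounds(first: date, last: date) -> tuple[str, str]:
    """Return (start_row, stop_row) covering every date from first to last inclusive.

    The later date has the smaller key, so it starts the scan; the stop row is
    one past the key of `first`, so that `first` itself is still included.
    """
    if first > last:
        raise ValueError("first must not be later than last")
    start_row = reverse_key(last)
    stop_row = str(F - actual_sequence_value(first) + 1).zfill(WIDTH)
    return start_row, stop_row

start_row, stop_row = scan_bounds(date(2010, 3, 1), date(2010, 3, 15))
print(start_row, stop_row)  # 1092366 1092381
```

Because every key has the same width, string comparison agrees with numeric comparison, so the half-open interval also covers composite keys that merely start with the reverse sequence value.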

Step 210. Convert the actual time value selected by the user into an actual sequence value: the number of time units (days in this embodiment) between the user-selected actual time value and the historical reference time value. For example, if the user enters 15 March 2010, the actual sequence value is 40252.

Step 220. is calculated corresponding reverse sequential value: reverse sequential value=future is with reference to sequential value-actual sequential value.For example the real time value of user's input is " on March 15th, 2010 ", and actual sequential value is 40252.Corresponding reverse sequential value=future is with reference to the actual sequential value of sequential value 1132618-40252=1092366.

Step 230. Assemble the primary key value from the reverse sequence value: for example, for the user input 15 March 2010, the reverse sequence part of the primary key value is "1092366".

Step 240. is pressed major key key assignments inquiry Hbase database, from Query Result, can obtain the data of the real time corresponding with reverse sequential value.Because the real time of user input is later as a rule, corresponding actual sequential value is also bigger, can be by first search, so can find quickly than by actual sequential value by reverse sequential value.

The above uses the time type year+month+day only as an example. It should be understood that the specific embodiment described here only explains the invention and does not limit it.

Claims

1. A Hadoop+HBase-based method for organizing agricultural time-series data, characterized in that:

reverse time-sequence data are added to the raw data and used as an important component of the primary key value; the reverse time-sequence data consist of reverse sequence values in one-to-one correspondence with the actual time values in the raw data, each reverse sequence value being the number of minimum time units between the actual time value and the future time point and being numerically negatively correlated with the actual time value; and in the data query stage, the actual time value supplied by the user in the query condition is converted into a reverse sequence value to form the primary key value used for retrieval.