CN102663097A - Agricultural timing sequence data organization method based on Hadoop+Hbase - Google Patents

Publication number
CN102663097A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101079153A
Other languages
Chinese (zh)
Inventor
崔文顺
郭作玉
崔硕
王昕�
曹亚男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING HUAXIA SHENNONG INFORMATION TECHNOLOGY CO LTD
DAHUAXIA SHENNONG INFORMATION TECHNOLOGY Co Ltd LANGFANG CITY
INFORMATION CENTER MINISTRY OF AGRICULTURE OF PEOPLE'S REPUBLIC OF CHINA
Original Assignee
BEIJING HUAXIA SHENNONG INFORMATION TECHNOLOGY CO LTD
DAHUAXIA SHENNONG INFORMATION TECHNOLOGY Co Ltd LANGFANG CITY
INFORMATION CENTER MINISTRY OF AGRICULTURE OF PEOPLE'S REPUBLIC OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING HUAXIA SHENNONG INFORMATION TECHNOLOGY CO LTD, DAHUAXIA SHENNONG INFORMATION TECHNOLOGY Co Ltd LANGFANG CITY, INFORMATION CENTER MINISTRY OF AGRICULTURE OF PEOPLE'S REPUBLIC OF CHINA
Priority to CN2012101079153A
Publication of CN102663097A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for organizing agricultural time-series data on Hadoop+HBase, belonging to the field of agricultural economic and technical information analysis. It addresses how to organize massive volumes of time-stamped agricultural economic and technical data stored on a Hadoop+HBase cloud computing platform. The key point of the technical scheme is as follows. In the data organization and storage stage, given that much agricultural economic and technical data carries a time attribute and that later data is more likely to be queried, reverse time-sequence data is added to the original data as an auxiliary to the actual time data. The reverse time-sequence value is negatively correlated with the actual time value: the later the actual time, the smaller the reverse time-sequence value, the earlier the record ranks in ascending order, and the sooner it is reached in a sequential search. In the data query stage, the actual time value supplied by the user in the query condition is converted into the reverse time-sequence value to form the primary key value, which enables fast lookup.

Description

An agricultural time-series data organization method based on Hadoop+HBase
1. Technical field
The field of agricultural economic and technical information analysis.
2. Background
Agricultural informatization is developing rapidly. Agricultural websites, agricultural e-commerce, agricultural commodity market information, and agricultural economic information are proliferating on the Internet, and with the rapid growth of the mobile Internet, agricultural economic and technical information is also set to grow explosively. On the one hand this is the inevitable result of agricultural informatization, industrialization, and modernization; on the other hand it creates new demands on how we collect, store, and use this massive information in the service of agricultural production.
Key-value NoSQL cloud computing technology, represented by Hadoop, is inexpensive, stable, and general-purpose, and it is gradually becoming the main platform for large-scale data collection, storage, and analysis across industries. The technology continues to improve as it is applied. In the processing of massive agricultural information, however, it is still at an early stage. For the massive data shaped by the characteristics of agricultural production and operation, and for the processing and use of that data, many efficient and well-established techniques are still lacking.
The problem addressed by the invention is as follows. Data was stored in the large-scale database HBase deployed on the Hadoop cloud computing platform. In use, queries were found to return very slowly and the user experience was poor. Investigation showed that this is closely related to how the data is organized: because Hadoop looks data up sequentially in primary-key order, an unsuitable primary-key ordering directly affects how fast query results return. Much of the data has a time-order attribute. For example, farm-produce market prices are collected and stored in order of date; agricultural futures prices are collected and stored in order of date and hour, minute, and second; and agrometeorological data is likewise collected and stored in order of date and hour, minute, and second. The earlier the data, the smaller its time value, the nearer the front its primary key sorts lexicographically, and the faster it is queried; the newer the data, the larger its time value, the nearer the back its primary key sorts, and the slower it is queried. Because users in most cases want the latest data, slow queries occur frequently.
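The ordering problem described above can be illustrated with a minimal Python sketch. It assumes row keys that are plain `YYYYMMDD` date strings (a hypothetical layout, not taken from the original) and uses a sorted list as a stand-in for a key-ordered store.

```python
# Hypothetical row keys built directly from the actual date (YYYYMMDD).
keys = ["20120410", "19500315", "20100315", "20111231"]

# A key-ordered store keeps rows in ascending lexicographic order of the key.
stored = sorted(keys)

# Position of each key in a front-to-back sequential scan (0 = reached first).
scan_position = {key: index for index, key in enumerate(stored)}

oldest, newest = min(keys), max(keys)
print(stored)
print("oldest reached at position", scan_position[oldest])
print("newest reached at position", scan_position[newest])
```

With keys of this kind the newest record always sits at the far end of the scan, which is the situation the method sets out to reverse.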
3. Summary of the invention
The object of the invention is to devise an organization method for agricultural economic and technical data with a time attribute, so as to solve the slow query speed of such data when it is stored on the Hadoop+HBase cloud computing platform.
To achieve this object, the invention provides a method for organizing agricultural economic and technical data with a time order, comprising the following steps:
Step 100. In the data organization stage, add reverse time-sequence data to the agricultural economic and technical data.
Step 200. In the data query stage, convert the actual time value supplied by the user in the query condition into a reverse time-sequence value, form the primary key, and run the query.
Step 100 specifically comprises the following steps:
Step 110. Select the time granularity of the actual time. By granularity, time can be expressed as year; year+month; year+month+day; year+month+day+hour; year+month+day+hour+minute; year+month+day+hour+minute+second; year+month+day+hour+minute+second+millisecond; and so on. One of these is selected as required.
Step 120. Set the historical reference sequence value. Choose a moment in the past as the historical reference time point. It is a time value with the same granularity as the actual time, and it must be smaller than every time value of the data to be stored; it is normally a remote moment that cannot occur among the actual times. This time value is then converted into a long positive integer, the historical reference sequence value, which equals 1.
Step 130. Set the future reference sequence value. Choose a time point so far in the future that the data stored by the system can never reach it. It is a time value with the same granularity as the actual time, and it must be larger than every time value of the data to be stored. Relative to the historical reference sequence value, this time value is converted into a long positive integer, the future reference sequence value. The number of characters in the future reference sequence value is defined as the standard character width that the time component occupies in the primary key.
Step 140. Set up a time field and a reverse time-sequence field. The time field holds the actual time values of the data set. The reverse time-sequence field holds the reverse time-sequence values of the data set; each reverse time-sequence value is stored in one-to-one correspondence with an actual time value.
Step 150. Calculate the reverse time-sequence values. For each actual time value, calculate the corresponding reverse time-sequence value: reverse time-sequence value = future reference sequence value - actual sequence value. Here the actual sequence value is the actual time value converted into a long positive integer relative to the historical reference time value, that is, the number of time units between the actual time value and the historical reference time value. The larger the actual sequence value, the smaller the reverse time-sequence value.
Step 160. Build the primary key from the reverse time-sequence value. Use the reverse time-sequence value as a significant part of the primary key, build the primary key of the data, and store it in the database together with the other data. Note that if the reverse time-sequence value has fewer characters than the standard character width, it is left-padded with 0 before being combined into the primary key value.
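Steps 110 to 160 can be sketched in Python for a granularity of one second. The two reference points, the function names, and the sample timestamps are assumptions chosen for illustration; the original does not fix them for this granularity.

```python
from datetime import datetime

# Assumed reference points for second-level granularity (illustrative only).
HIST_REF = datetime(1900, 1, 1, 0, 0, 0)          # historical reference, value 1
FUTURE_REF = datetime(5000, 12, 31, 23, 59, 59)   # unreachable future point

def actual_value(t: datetime) -> int:
    """Seconds elapsed since the historical reference, which itself counts as 1."""
    delta = t - HIST_REF
    return delta.days * 86400 + delta.seconds + 1

FUTURE_VALUE = actual_value(FUTURE_REF)   # future reference sequence value
WIDTH = len(str(FUTURE_VALUE))            # standard character width

def reverse_key_part(t: datetime) -> str:
    """Reverse value, left-padded with 0 to the standard character width."""
    reverse = FUTURE_VALUE - actual_value(t)
    return str(reverse).zfill(WIDTH)

earlier = reverse_key_part(datetime(2012, 4, 10, 9, 30, 0))
later = reverse_key_part(datetime(2012, 4, 10, 9, 30, 1))

# Why padding matters: as text, an unpadded shorter number can sort after a longer one.
unpadded = sorted(["726832", "1114281"])
padded = sorted(["0726832", "1114281"])
print(earlier, later, unpadded, padded)
```

The fixed width is what lets lexicographic key order agree with numeric order; without it, a six-digit value would sort after a seven-digit value that starts with a smaller digit.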
Step 200 comprises the following steps:
Step 210. Convert the actual time value selected by the user into an actual sequence value. The actual sequence value is the number of time units between the user-selected actual time value and the historical reference time value, and is a long positive integer.
Step 220. Calculate the corresponding reverse time-sequence value: reverse time-sequence value = future reference sequence value - actual sequence value.
Step 230. Combine the reverse time-sequence value into the primary key value of the data. If the reverse time-sequence value has fewer characters than the standard character width, it is left-padded with 0 before being combined into the primary key value.
Step 240. Query the HBase database by the primary key value; the query result yields the data for the actual time corresponding to the reverse time-sequence value.
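The conversion used in both stages can be written compactly. The symbols are introduced here for illustration only: $t$ is an actual time value, $t_h$ and $t_f$ are the historical and future reference time points, $u$ is the chosen time unit, $c$ is a constant offset (for example $c = 1$ when the historical point itself is counted as 1), $A(t)$ is the actual sequence value, $F$ is the future reference sequence value, and $R(t)$ is the reverse time-sequence value.

```latex
\begin{align*}
A(t) &= \frac{t - t_h}{u} + c, \\
F &= A(t_f) = \frac{t_f - t_h}{u} + c, \\
R(t) &= F - A(t) = \frac{t_f - t}{u}, \\
t_1 < t_2 \;&\Longrightarrow\; A(t_1) < A(t_2) \;\Longrightarrow\; R(t_1) > R(t_2).
\end{align*}
```

Under these definitions the offset $c$ cancels, so $R(t)$ is simply the number of time units remaining until the future reference point, and the later of two times always receives the smaller reverse value.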
The advantages of the invention are as follows:
The invention fits the Hadoop+HBase base platform directly, without modifying it. The method is simple and easy to implement, applies to most data with a time-sequence attribute, and can significantly improve query speed and the user experience.
4. Brief description of the drawings
Fig. 1 is the flow chart of the method proposed by the invention for organizing, storing, and querying massive agricultural economic and technical data based on Hadoop+HBase;
Fig. 2 is the detailed flow chart, proposed by the invention, for calculating the reverse time-sequence field data;
Fig. 3 is the detailed flow chart, proposed by the invention, for converting the actual time value of a user query into the primary key value.
5. Embodiment
The embodiment of the invention is further described below with reference to the flow charts and an example. It should be understood that the specific embodiment described here only serves to explain the invention and does not limit it.
As shown in Fig. 1, the invention is divided into a data organization stage and a data query stage, and comprises the following steps:
Step 100. In the data organization stage, add reverse time-sequence data to the agricultural economic and technical data. For agricultural economic and technical data with a time attribute, the stored volume keeps growing over time, so the actual times in the original data become later and later and their numeric values larger and larger. If the primary key were built from the actual time values in the original data, data with an early actual time and a small actual time value would be reached first, while data with a late actual time and a large actual time value would be reached only later, so query results would return slowly. The invention therefore adds an auxiliary reverse time-sequence field to the original data. Its reverse time-sequence value is inversely related to the actual time value: the larger the actual time value, the smaller the reverse time-sequence value, so a primary key built from it makes recent data easy to reach. This step thus calculates the reverse time-sequence value corresponding to each actual time value and adds it to the original data. The detailed steps are described below with reference to Fig. 2:
Step 110. Set the time type. By granularity, the time type used for the primary key can be year; year+month; year+month+day; year+month+day+hour; year+month+day+hour+minute; year+month+day+hour+minute+second; year+month+day+hour+minute+second+millisecond; and so on. One of these is selected as required; the year+month+day type is used as the example below.
Step 120. Set the historical time point and the historical reference sequence value. First set a historical time point that has the same granularity as the actual time and is smaller than every time value of the data to be stored. The historical reference sequence value corresponding to this historical time value is set to 1. In this embodiment the time type is year+month+day, so the historical time point is set to January 1, 1900 AD, and the corresponding historical reference sequence value is 1.
Step 130. Set the future reference sequence value. First set the future reference time value; for agricultural economic and technical data it can be set to December 31, 5000 AD. The number of time units between this future reference time value and the historical reference time value is 1132618 days, so the corresponding future reference sequence value is 1132618. This value has 7 characters, so the standard character width in this embodiment is 7.
Step 140. Set the time field and the reverse time-sequence field. First set a time field of type year+month+day to hold the actual time values of the data set; then set a reverse time-sequence field with the standard character width of 7 to hold the reverse time-sequence values of the data set. Each reverse time-sequence value corresponds one-to-one with the actual time value in the same data row.
Step 150. Calculate the reverse time-sequence values. For each actual time value, calculate the corresponding reverse time-sequence value: reverse time-sequence value = future reference sequence value - actual sequence value. Here the actual sequence value is the number of time units between the actual time value and the historical reference time value; it is a long positive integer, and in this embodiment it is a number of days. First calculate the actual sequence value of each actual time value in the data set. For example, the actual time value "March 15, 1950" is 18337 days from "January 1, 1900", so the actual sequence value is 18337. The corresponding reverse time-sequence value is 1114281, because future reference sequence value 1132618 - actual sequence value 18337 = reverse time-sequence value 1114281. As another example, the actual time value "January 1, 3011" is 405786 days from "January 1, 1900", so the actual sequence value is 405786. The corresponding reverse time-sequence value is 0726832, because future reference sequence value 1132618 - actual sequence value 405786 = 726832, which is shorter than the standard character width and becomes 0726832 after left-padding with 0. In the same way, each reverse time-sequence value is calculated in turn and written into the corresponding reverse time-sequence field, forming the auxiliary data.
Step 160. Build the primary key. Use the reverse time-sequence value as part of the primary key, build the primary key value of the data, and store it in the database together with the other data. This completes the organization of the data.
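The figures in this embodiment can be reproduced with a short Python sketch. The day counts quoted above coincide with spreadsheet-style date serial numbers, which count January 1, 1900 as 1 and also include a nonexistent February 29, 1900; the sketch assumes that convention, which the original does not state. Function names are illustrative.

```python
from datetime import date

HIST_REF = date(1900, 1, 1)      # historical time point, reference value 1
FUTURE_REF = date(5000, 12, 31)  # future reference time value

def actual_value(d: date) -> int:
    """Day count with January 1, 1900 as 1.

    Assumption: spreadsheet-style serials, which also count a nonexistent
    February 29, 1900. This reproduces the day counts quoted in the text.
    """
    serial = (d - HIST_REF).days + 1
    if d >= date(1900, 3, 1):
        serial += 1  # the extra, nonexistent leap day
    return serial

FUTURE_VALUE = actual_value(FUTURE_REF)  # future reference sequence value
WIDTH = len(str(FUTURE_VALUE))           # standard character width

def reverse_key_part(d: date) -> str:
    """Reverse value, left-padded with 0 to the standard character width."""
    return str(FUTURE_VALUE - actual_value(d)).zfill(WIDTH)

for d in (date(1950, 3, 15), date(3011, 1, 1)):
    print(d.isoformat(), actual_value(d), reverse_key_part(d))
```

Because the reverse value is a difference of two counts made under the same convention, the choice of convention does not affect it for dates after February 1900; only the intermediate actual sequence values depend on it.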
Step 200. In the data query stage, convert the actual time value supplied by the user in the query condition into a reverse time-sequence value, form the primary key value, and run the query. Querying stored data by specifying a single actual time value or a range of actual times as the condition matches users' habits better. To make the use of reverse time-sequence values transparent to the user, the actual time value must be converted into a reverse time-sequence value and the corresponding primary key value located; fast querying is then achieved. This step therefore uses the conversion formula between the actual time value and the reverse time-sequence value to turn a query on actual time into a query on the corresponding primary key value, thereby achieving fast queries for data with later actual times. The detailed steps are described below with reference to Fig. 3:
Step 210. Convert the actual time value selected by the user into an actual sequence value. The actual sequence value is the number of time units between the user-selected actual time value and the historical reference time value; in this embodiment it is a number of days. For example, if the user enters the actual time value "March 15, 2010", it is 40252 days from "January 1, 1900", so the actual sequence value is 40252.
Step 220. Calculate the corresponding reverse time-sequence value: reverse time-sequence value = future reference sequence value - actual sequence value. For example, if the user enters the actual time value "March 15, 2010", the actual sequence value is 40252, and the corresponding reverse time-sequence value = future reference sequence value 1132618 - actual sequence value 40252 = 1092366.
Step 230. Combine the reverse time-sequence value into the primary key value of the data. For example, if the user enters the actual time value "March 15, 2010", the reverse time-sequence part of the primary key value is "1092366".
Step 240. Query the HBase database by the primary key value; the query result yields the data for the actual time corresponding to the reverse time-sequence value. Because the actual time entered by the user is usually recent, the corresponding actual sequence value is large and the reverse time-sequence value is small, so the record is reached earlier in the search. A lookup by reverse time-sequence value therefore finds it faster than a lookup by actual sequence value.
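The query stage can be sketched in Python with a sorted list standing in for the key-ordered table. The sketch computes the reverse value directly as the number of days remaining until the future reference date, which equals the future reference sequence value minus the actual sequence value. The sample dates, the `scan` helper, and the range conversion are illustrative assumptions, not part of the original.

```python
from datetime import date

FUTURE_REF = date(5000, 12, 31)  # future reference time value of the embodiment
WIDTH = 7                        # standard character width of the embodiment

def key_part(d: date) -> str:
    """Days remaining until the future reference date, zero-padded."""
    return str((FUTURE_REF - d).days).zfill(WIDTH)

def scan(sorted_keys, target):
    """Front-to-back scan; returns how many keys were examined until a match."""
    for examined, key in enumerate(sorted_keys, start=1):
        if key == target:
            return examined
    return None

# Hypothetical stored dates; the table keeps keys in ascending order.
dates = [date(1950, 3, 15), date(2010, 3, 14), date(2010, 3, 15), date(2010, 3, 16)]
table = sorted(key_part(d) for d in dates)

def key_range(start: date, end: date):
    """Key bounds for start <= actual date <= end; the bounds swap places."""
    return key_part(end), key_part(start)

query_key = key_part(date(2010, 3, 15))
low, high = key_range(date(2010, 3, 14), date(2010, 3, 15))
matches = [k for k in table if low <= k <= high]
print(query_key, scan(table, query_key), matches)
```

For a range condition the later bound of the actual time becomes the lower key bound, so a scan over the converted range returns the most recent rows first.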
The above description uses only the year+month+day time type as an example. It should be understood that the specific embodiment described here only serves to explain the invention and does not limit it.

Claims (1)

1. An agricultural time-series data organization method based on Hadoop+HBase, characterized in that:
Reverse time-sequence data is added to the original data and used as a significant component of the primary key value. The reverse time-sequence data consists of reverse time-sequence values in one-to-one correspondence with the actual time values in the original data. Each reverse time-sequence value is the number of minimum time units between the actual time value and the future time point, and is numerically in negative correlation with the actual time value. In the data query stage, the actual time value supplied by the user in the query condition is converted into a reverse time-sequence value, which forms the primary key value used for retrieval.
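The one-to-one correspondence named in the claim means the actual time can be recovered from the reverse value. A minimal Python sketch of the round trip follows, assuming day granularity and the future time point December 31, 5000 used in the embodiment; the function names are hypothetical.

```python
from datetime import date, timedelta

FUTURE_REF = date(5000, 12, 31)  # future time point (from the embodiment)
WIDTH = 7                        # standard character width (from the embodiment)

def to_reverse_key(d: date) -> str:
    """Number of days from the actual date to the future time point, zero-padded."""
    return str((FUTURE_REF - d).days).zfill(WIDTH)

def from_reverse_key(key: str) -> date:
    """Inverse conversion: recover the actual date from the reverse value."""
    return FUTURE_REF - timedelta(days=int(key))

samples = [date(1950, 3, 15), date(2010, 3, 15), date(3011, 1, 1)]
round_trip = [from_reverse_key(to_reverse_key(d)) for d in samples]
print(round_trip == samples)
```

Storing the actual time in its own field, as the description does, avoids needing this inverse at read time; the sketch only shows that no information is lost in the key.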
CN2012101079153A 2012-04-10 2012-04-10 Agricultural timing sequence data organization method based on Hadoop+Hbase Pending CN102663097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101079153A CN102663097A (en) 2012-04-10 2012-04-10 Agricultural timing sequence data organization method based on Hadoop+Hbase

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101079153A CN102663097A (en) 2012-04-10 2012-04-10 Agricultural timing sequence data organization method based on Hadoop+Hbase

Publications (1)

Publication Number Publication Date
CN102663097A true CN102663097A (en) 2012-09-12

Family

ID=46772588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101079153A Pending CN102663097A (en) 2012-04-10 2012-04-10 Agricultural timing sequence data organization method based on Hadoop+Hbase

Country Status (1)

Country Link
CN (1) CN102663097A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605805A (en) * 2013-12-09 2014-02-26 冶金自动化研究设计院 Storage method of massive time series data
CN104750809A (en) * 2015-03-26 2015-07-01 中国科学院软件研究所 Storage method for supporting relation model and blended data of key-value structure
CN106682077A (en) * 2016-11-18 2017-05-17 山东鲁能软件技术有限公司 Method for storing massive time series data on basis of Hadoop technologies
CN107180072A (en) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 A kind of processing method and processing device of time series data
CN107239517A (en) * 2017-05-23 2017-10-10 中国联合网络通信集团有限公司 Many condition searching method and device based on Hbase databases

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158976A (en) * 2007-11-21 2008-04-09 金蝶软件(中国)有限公司 Method and system for data-base recording enquire preprocess
CN101256561A (en) * 2007-03-02 2008-09-03 阿里巴巴集团控股有限公司 Method, apparatus and system for storing and accessing database data
CN101477532A (en) * 2008-12-23 2009-07-08 北京畅游天下网络技术有限公司 Method, apparatus and system for implementing data storage and access
US20110258199A1 (en) * 2010-04-16 2011-10-20 Salesforce.Com, Inc. Methods and systems for performing high volume searches in a multi-tenant store

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605805A (en) * 2013-12-09 2014-02-26 冶金自动化研究设计院 Storage method of massive time series data
CN103605805B (en) * 2013-12-09 2016-10-26 冶金自动化研究设计院 A kind of storage method of magnanimity time series data
CN104750809A (en) * 2015-03-26 2015-07-01 中国科学院软件研究所 Storage method for supporting relation model and blended data of key-value structure
CN104750809B (en) * 2015-03-26 2018-05-18 中国科学院软件研究所 A kind of blended data storage method for supporting relational model and key-value structure
CN106682077A (en) * 2016-11-18 2017-05-17 山东鲁能软件技术有限公司 Method for storing massive time series data on basis of Hadoop technologies
CN106682077B (en) * 2016-11-18 2020-06-09 山东鲁能软件技术有限公司 Mass time sequence data storage implementation method based on Hadoop technology
CN107180072A (en) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 A kind of processing method and processing device of time series data
CN107239517A (en) * 2017-05-23 2017-10-10 中国联合网络通信集团有限公司 Many condition searching method and device based on Hbase databases
CN107239517B (en) * 2017-05-23 2020-09-29 中国联合网络通信集团有限公司 Multi-condition searching method and device based on Hbase database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120912