CN104933176A - Big data address hierarchical scheduling method based on MapReduce technology - Google Patents
- Publication number
- CN104933176A (application number CN201510374579.2A)
- Authority
- CN
- China
- Prior art keywords
- address
- large data
- rough
- contact
- scheduling method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Remote Sensing (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a big data address hierarchical scheduling method based on MapReduce technology. The method comprises the following steps: building a contact-address-oriented scheduling table; determining the geographic scope of the business; generating the Key and Value in the Map stage; performing scheduling analysis in the Reduce stage; and dispatching downward layer by layer. The method realizes contact-address-oriented scheduling: the contact address can be extended upward to national or even continental-level addresses and downward to more precise locations, and hierarchical scheduling by addresses of different granularity is supported.
Description
Technical field
The present invention relates to a data processing method, and in particular to a big data address hierarchical scheduling method based on MapReduce technology.
Background art
An address consists of country, province (autonomous region, municipality directly under the Central Government, or special administrative region), city, district (county), town, and street number (village group). Address structure is hierarchical, and an address can be represented as a character string, for example a mailing address, home address, company address, or organization address. Existing address-based algorithms include disk scheduling, IP scheduling, and GPS scheduling.
For disk scheduling (first-come-first-served, shortest-seek-time-first, SCAN, and circular SCAN algorithms), the physical block address of a disk consists of a cylinder number, a head number, and a sector number. The time to access a physical block of the disk comprises seek time, rotational latency, and transfer time, and the goal of disk scheduling is to make seek time as short as possible and throughput as high as possible. Because the physical block address of a disk (cylinder, head, sector) differs from the address described in this patent, disk scheduling is not suitable for the big data address hierarchical scheduling described in this patent.
For IP address scheduling (IP datagram routing), IP address segments are allocated according to different addresses. Using the routing table stored in a router, an IP datagram is forwarded along the path (port) toward a particular network address. An IP address only identifies a computer connected to the Internet and differs from the contact address (country, province, city, district, town, street number) described in this patent, so IP address scheduling algorithms are not suitable for the big data address hierarchical scheduling described in this patent.
For GPS scheduling, the terminal receives satellite signals through a satellite antenna and locates itself automatically; the terminal sends the address information to a control center through a GPRS module; the control center uses the Internet or a private network to extract the positioning address and displays it on an electronic map. A GPS positioning address is essentially the same as the address described in this patent, but it must be transmitted to the control center in real time. Because this real-time requirement is difficult to relax (let alone ignore), the positioning data cannot be accumulated into big data.
Summary of the invention
The present invention aims to provide a big data address hierarchical scheduling method based on MapReduce technology that realizes contact-address-oriented scheduling: the contact address can be extended upward to country or even continental-level addresses and downward to more precise locations, and hierarchical scheduling by addresses of different granularity is supported.
To achieve the above object, the present invention adopts the following technical solution:
The big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention comprises the following steps:
Step 1, build a contact-address-oriented scheduling table: the column families of the scheduling table comprise a basic information column family for the problem domain and a scheduling column family. The basic information column family contains the content to be processed in the Reduce stage and the related contact address column of the big data. The scheduling column family contains the rough address and fine address columns into which the contact address is divided. A field that can distinguish big data records is chosen as the row key of the scheduling table, and the row key is placed in the basic information column family.
Step 2, determine the geographic scope of the business and initialize the rough address and fine address: determine the geographic scope of the business according to the problem domain, and write the rough address and fine address of the contact address into the rough address and fine address columns of the scheduling column family of the scheduling table.
Step 3, generate the Key and Value in the Map stage: assign the rough address of the big data contact address to the Key, and assign row key + contact address + content to be processed to the Value.
Step 4, perform scheduling analysis in the Reduce stage: according to the Key and the contact address in the Value, output the rough address and fine address of the next-level address division.
Step 5, dispatch downward layer by layer: initialize the Job, establish the connection to the scheduling table database, initialize both the source table and the target table as the scheduling table, and dispatch the related contact addresses of the big data downward layer by layer until the bottom-level contact address is reached; otherwise, repeat steps 3 to 5.
Preferably, the rough address comprises the country; the province, autonomous region, municipality directly under the Central Government, or special administrative region; and the city or county. The fine address comprises the district or town, the street, and the community or house number.
Preferably, in step 3, the row key is an order number or a customer ID number.
Preferably, the scheduling table is an HBase scheduling table.
Further, in step 1, existing big data is imported into the scheduling table using an ETL tool.
Preferably, when the business is global, the country of the contact address is chosen as the rough address, and the remainder of the contact address is the fine address.
Preferably, when the business is domestic, the province, autonomous region, municipality directly under the Central Government, or special administrative region of the contact address is chosen as the rough address, and the remainder of the contact address is the fine address.
Further, the fine address includes a storage location number, and the content to be processed includes the inventory quantity.
The big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention has the following features:
First, it supports extending the contact address upward and downward. Upward extension supports scheduling over a wider scope; downward extension supports more precise address arrangement, making the method suitable for address-oriented scheduling in different fields (settings).
Second, as the loop over steps 3 to 5 of the algorithm advances, scheduling proceeds downward one address level at a time, realizing scheduling at different address granularities.
Third, the content to be scheduled (such as the storage location number of the contact address and its inventory quantity) is placed in the content to be processed of the basic information column family of the scheduling table, which reduces some manual work (such as counting inventory to generate the current order quantity, or assigning goods to warehouses and their storage locations).
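As a rough illustration of the third feature, the sketch below totals inventory quantities per address prefix, the kind of counting the text says would otherwise be manual. It is only a sketch under stated assumptions: the function name `stock_by_address`, the tuple layout, and the sample warehouse and storage location identifiers are all hypothetical and do not come from the patent.

```python
from collections import Counter

def stock_by_address(values, depth):
    """Sum inventory quantities per address prefix of the given depth.

    values: iterable of (address_levels, quantity), where address_levels is a
    list ordered from the broadest level down to the storage location number.
    """
    totals = Counter()
    for address_levels, quantity in values:
        totals["/".join(address_levels[:depth])] += quantity
    return dict(totals)

# Hypothetical data: warehouse name followed by storage location number.
values = [
    (["W1", "A-01"], 5),
    (["W1", "A-02"], 7),
    (["W2", "B-01"], 2),
]
per_warehouse = stock_by_address(values, depth=1)
per_location = stock_by_address(values, depth=2)
```

With `depth=1` the totals are per warehouse; with `depth=2` they are per storage location, which mirrors moving one level down the address hierarchy.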
The beneficial effects of the present invention are as follows:
(1) Dividing the contact address into two levels realizes contact-address-oriented scheduling.
(2) Running the method realizes scheduling of contact addresses at different granularities.
(3) The contact address can be extended upward to country or even continental-level addresses.
(4) The contact address can be extended downward to more precise locations, such as a storage location number in a warehouse.
(5) The scheduled content (such as storage location number and inventory quantity) is placed in the content to be processed.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention comprises the following steps:
Step 1, build the contact-address-oriented scheduling HBase table (hereinafter, the scheduling table)
Build the address-oriented scheduling table, whose column families comprise a basic information column family for the problem domain and a scheduling column family. The basic information column family contains the content to be processed in the Reduce stage and the related contact address column of the big data. The scheduling column family contains the rough address and fine address columns into which the contact address is divided. Choose a field that can distinguish big data records as the row key of the scheduling table, such as an order number or a customer ID, and place the row key in the basic information column family. Existing big data can be imported into the scheduling table with a tool (ETL, etc.). Where no data exists yet, the tables in the back-end database can be designed to meet the above requirements for the scheduling table, so that the method disclosed in this patent can be used.
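The table layout of Step 1 can be pictured with a small in-memory model. This is a minimal sketch, not HBase client code: the family names `info` and `sched`, the column names, and the sample order are assumptions made for illustration only.

```python
# Hypothetical in-memory model of the scheduling table; not HBase client code.
# Family names "info" and "sched" and all column names are assumptions.

def make_row(row_key, contact_address, content):
    """Return one row: two column families, each a dict of columns."""
    return {
        "info": {                                     # basic information family
            "row_key": row_key,                       # e.g. order number or customer ID
            "contact_address": list(contact_address), # related contact address
            "content": content,                       # handled in the Reduce stage
        },
        "sched": {                                    # scheduling family
            "rough": [],                              # filled in during Step 2
            "fine": [],
        },
    }

scheduling_table = {}
row = make_row(
    "order-001",
    ["China", "Sichuan", "Chengdu", "Wuhou", "Example Road 1"],
    {"quantity": 3},
)
scheduling_table[row["info"]["row_key"]] = row
```

Storing the address as a list ordered from country downward is a design choice of this sketch; the patent only states that the address is hierarchical and can be represented as a string.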
Step 2, determine the geographic scope of the business and initialize the rough address and fine address
Determine the geographic scope of the business according to the problem domain. For example, if the business is global, the country of the contact address is chosen as the rough address and the remainder of the address is the fine address; if the business is domestic, the country and province of the contact address are chosen as the rough address and the remainder is the fine address. The rough address and fine address of the contact address are written into the rough address and fine address columns of the scheduling column family of the scheduling table.
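The initial split of Step 2 can be sketched as a prefix cut on the address. The scope names `global` and `domestic`, the `ROUGH_DEPTH` mapping, and the sample address are assumptions for illustration; the sketch also assumes the address is held as a list ordered from country downward.

```python
# Assumed mapping from business scope to the number of leading levels
# that form the rough address; the scope names are illustrative.
ROUGH_DEPTH = {"global": 1, "domestic": 2}

def split_address(levels, scope):
    """Split an address (list ordered from country downward) into (rough, fine)."""
    depth = ROUGH_DEPTH[scope]
    return levels[:depth], levels[depth:]

address = ["China", "Sichuan", "Chengdu", "Wuhou", "Example Road 1"]
rough_global, fine_global = split_address(address, "global")
rough_domestic, fine_domestic = split_address(address, "domestic")
```

For a global business the rough address is just the country; for a domestic business it runs from the country through the province, and everything after it becomes the fine address.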
Step 3, generate the Key and Value in the Map stage
First determine the Key and Value for the MapReduce program: the rough address of the big data contact address is the Key, and row key (the field that distinguishes big data records) + contact address (the scheduling address) + content to be processed is the Value.
Read the related contact address, rough address, row key, and content to be processed of the big data from the scheduling table. Key ← rough address. Value ← row key + contact address + content to be processed.
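The Key and Value assignment of Step 3 can be imitated in plain Python without a Hadoop cluster. In this sketch `map_record` and `group_by_key` are hypothetical names, `group_by_key` merely stands in for the framework's shuffle step, and joining the rough address levels with `/` is an assumed encoding of the Key.

```python
from collections import defaultdict

def map_record(record):
    """Emit (Key, Value): Key is the rough address, Value bundles the rest."""
    key = "/".join(record["rough"])
    value = (record["row_key"], record["contact_address"], record["content"])
    return key, value

def group_by_key(pairs):
    """Stand-in for the shuffle step: collect Values under their Key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

# Hypothetical records as they might be read from the scheduling table.
records = [
    {"row_key": "order-001", "rough": ["China", "Sichuan"],
     "contact_address": ["China", "Sichuan", "Chengdu"], "content": 3},
    {"row_key": "order-002", "rough": ["China", "Sichuan"],
     "contact_address": ["China", "Sichuan", "Mianyang"], "content": 5},
    {"row_key": "order-003", "rough": ["China", "Yunnan"],
     "contact_address": ["China", "Yunnan", "Kunming"], "content": 1},
]
groups = group_by_key(map_record(r) for r in records)
```

Records that share a rough address end up under the same Key, so one Reduce call sees every record that belongs to the same region.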
Step 4, perform scheduling analysis in the Reduce stage
According to the Key and the contact address in the Value, output the rough address and fine address of the next-level address division. For example, if the rough address in the current pass extends to the province, the rough address at the next level extends to the city, and the remainder of the contact address is the fine address. According to the Key and the row key + content to be processed in the Value, analyze the big data further and output the content to be processed for the next round of scheduling.
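The next-level division of Step 4 can be sketched as moving the split point one level down for every record under a Key. The function name `reduce_group`, the `/`-joined Key, and the sample orders are assumptions; the sketch also assumes no level name contains a `/`, and it leaves out the domain-specific analysis of the content to be processed.

```python
def reduce_group(key, values):
    """For one Key, output each record's next-level rough and fine address."""
    current_depth = len(key.split("/"))   # levels already covered by the Key
    results = []
    for row_key, contact_address, content in values:
        # Move one level down, but never past the end of the address.
        next_depth = min(current_depth + 1, len(contact_address))
        results.append({
            "row_key": row_key,
            "rough": contact_address[:next_depth],
            "fine": contact_address[next_depth:],
            "content": content,
        })
    return results

# Hypothetical Values grouped under a province-level Key.
values = [
    ("order-001", ["China", "Sichuan", "Chengdu", "Wuhou"], 3),
    ("order-002", ["China", "Sichuan", "Mianyang", "Fucheng"], 5),
]
next_level = reduce_group("China/Sichuan", values)
```

Here the Key reaches the province, so each output rough address reaches the city and the remainder stays in the fine address.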
Step 5, initialize the Job and establish the connection to the HBase database; initialize both the source table and the target table as the scheduling table.
Dispatch downward layer by layer according to the related contact address of the big data. When scheduling at the current level of the contact address is complete, the rough address is the address from the country down to that level and the remainder of the address is the fine address. The next-level address is then dispatched in turn until arrangement (or delivery) is complete; otherwise, repeat steps 3 to 5.
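The layer-by-layer loop of Step 5 can be sketched as repeated passes over one table that serves as both source and target. This is a simplified model under stated assumptions: a plain dict stands in for the HBase table, `run_pass` and `schedule` are hypothetical names, and the single-letter address levels are placeholders.

```python
from collections import defaultdict

def run_pass(table):
    """One Map/Reduce pass: group rows by rough address, then move each
    row's split point one level down. The table is updated in place,
    mirroring the source table and target table being the same table."""
    groups = defaultdict(list)
    for row_key, row in table.items():            # Map: Key is the rough address
        groups[tuple(row["rough"])].append(row_key)
    for row_keys in groups.values():              # Reduce: next-level division
        for row_key in row_keys:
            row = table[row_key]
            if row["fine"]:
                row["rough"] = row["rough"] + [row["fine"][0]]
                row["fine"] = row["fine"][1:]

def schedule(table):
    """Repeat passes until every row has reached its bottom-level address."""
    passes = 0
    while any(row["fine"] for row in table.values()):
        run_pass(table)
        passes += 1
    return passes

# Hypothetical table with placeholder address levels.
table = {
    "order-001": {"rough": ["A"], "fine": ["B", "C", "D"]},
    "order-002": {"rough": ["A"], "fine": ["B", "E"]},
}
passes = schedule(table)
```

The loop stops once no row has a fine address left, which corresponds to reaching the bottom-level contact address for every record.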
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.
Claims (8)
1. A big data address hierarchical scheduling method based on MapReduce technology, characterized in that it comprises the following steps:
Step 1, build a contact-address-oriented scheduling table: the column families of the scheduling table comprise a basic information column family for the problem domain and a scheduling column family; the basic information column family contains the content to be processed in the Reduce stage and the related contact address column of the big data; the scheduling column family contains the rough address and fine address into which the contact address is divided; a field that can distinguish big data records is chosen as the row key of the scheduling table, and the row key is placed in the basic information column family;
Step 2, determine the geographic scope of the business and initialize the rough address and fine address: determine the geographic scope of the business according to the problem domain, and write the rough address and fine address of the contact address into the rough address and fine address columns of the scheduling column family of the scheduling table;
Step 3, generate the Key and Value in the Map stage: assign the rough address of the big data contact address to the Key, and assign row key + contact address + content to be processed to the Value;
Step 4, perform scheduling analysis in the Reduce stage: according to the Key and the contact address in the Value, output the rough address and fine address of the next-level address division;
Step 5, dispatch downward layer by layer: initialize the Job, establish the connection to the scheduling table database, initialize both the source table and the target table as the scheduling table, and dispatch the related contact addresses of the big data downward layer by layer until the bottom-level contact address is reached; otherwise, repeat steps 3 to 5.
2. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the rough address comprises the country; the province, autonomous region, municipality directly under the Central Government, or special administrative region; and the city or county; and the fine address comprises the district or town, the street, and the community or house number.
3. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: in step 3, the row key is an order number or a customer ID number.
4. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the scheduling table is an HBase scheduling table.
5. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: in step 1, existing big data is imported into the scheduling table using an ETL tool.
6. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: when the business is global, the country of the contact address is chosen as the rough address, and the remainder of the contact address is the fine address.
7. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: when the business is domestic, the province, autonomous region, municipality directly under the Central Government, or special administrative region of the contact address is chosen as the rough address, and the remainder of the contact address is the fine address.
8. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the fine address includes a storage location number, and the content to be processed includes the inventory quantity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510374579.2A CN104933176B (en) | 2015-06-30 | 2015-06-30 | Big data address hierarchical scheduling method based on MapReduce technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510374579.2A CN104933176B (en) | 2015-06-30 | 2015-06-30 | Big data address hierarchical scheduling method based on MapReduce technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104933176A | 2015-09-23 |
CN104933176B | 2018-10-12 |
Family
ID=54120343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510374579.2A (Expired - Fee Related; CN104933176B) | Big data address hierarchical scheduling method based on MapReduce technology | 2015-06-30 | 2015-06-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104933176B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8996523B1 (en) * | 2011-05-24 | 2015-03-31 | Google Inc. | Forming quality street addresses from multiple providers |
CN103297499A (en) * | 2013-04-19 | 2013-09-11 | 无锡成电科大科技发展有限公司 | Scheduling method and system based on cloud platform |
US20150019531A1 (en) * | 2013-06-24 | 2015-01-15 | Great-Circle Technologies, Inc. | Method and apparatus for situational context for big data |
CN103414761A (en) * | 2013-07-23 | 2013-11-27 | 北京工业大学 | Mobile terminal cloud resource scheduling method based on Hadoop framework |
CN103605576A (en) * | 2013-11-25 | 2014-02-26 | 华中科技大学 | Multithreading-based MapReduce execution system |
Non-Patent Citations (2)
Title |
---|
朱斌: ""基于Hadoop的日志统计分析系统的设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
马盈: ""基于MapReduce构造多维数据及关联规则挖掘算法的研究与应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN104933176B (en) | 2018-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109255565B (en) | Address attribution identification and logistics task distribution method and device | |
CA2653268C (en) | Sort plan optimization | |
CN109101474B (en) | Address aggregation method, package aggregation method and equipment | |
CN103164475B (en) | The merging method and system in multiple IP regional information storehouses | |
CN106210163B (en) | IP address-based localization method and device | |
CN104199860A (en) | Dataset fragmentation method based on two-dimensional geographic position information | |
CN103699615B (en) | A kind of quick cartographic representation method and system based on point vector data multilayered memory | |
CN110365747A (en) | Processing method, device, server and the computer readable storage medium of network request | |
CN106156332A (en) | The method screening vehicles passing in and out based on section seclected time and selection area | |
CN103853500A (en) | Method, device and system for distributing mass data | |
CN105227618A (en) | A kind of communication site's position information processing method and system | |
CN103455335A (en) | Multilevel classification Web implementation method | |
McLeod et al. | The use of a geographical information system for land‐based aquaculture planning | |
Van V. Coetzee et al. | Spatial relationships and movement patterns of the air cargo industry in airport regions | |
CN102591984A (en) | Optimizing method of query speed of point of interest data in navigation data | |
US9838283B2 (en) | Techniques for synchronized address coding and print sequencing | |
CN101963993B (en) | Method for fast searching database sheet table record | |
CN103476003B (en) | Geographic information storage method for mobile equipment and mobile equipment | |
Holl et al. | Spatial patterns and drivers of SME digitalisation | |
Costantino et al. | A new spatial shift‐share decomposition: An application to tourism competitiveness in Italian regions | |
US20130159207A1 (en) | Identifying location in package and mail delivery systems | |
CN104933176A (en) | Big data address hierarchical scheduling method based on MapReduce technology | |
CN111190976B (en) | Express mail signing method, express mail signing method of handheld terminal and storage medium | |
CN106407221A (en) | Address data retrieval method and apparatus | |
US20230267134A1 (en) | System and Method for Location Domain Name Service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20181012 |