CN104933176A - Big data address hierarchical scheduling method based on MapReduce technology - Google Patents

Big data address hierarchical scheduling method based on MapReduce technology

Info

Publication number
CN104933176A
Authority
CN
China
Prior art keywords
address
large data
rough
contact
scheduling method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510374579.2A
Other languages
Chinese (zh)
Other versions
CN104933176B (en)
Inventor
胡自权
徐勇
尹德辉
龙汉安
夏纪毅
王柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Medical University
Original Assignee
Sichuan Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Medical University
Priority to CN201510374579.2A
Publication of CN104933176A
Application granted
Publication of CN104933176B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data address hierarchical scheduling method based on MapReduce technology. The method comprises the following steps: building a contact-address-oriented scheduling table; determining the geographic scope of the business; generating Key and Value in the Map stage; performing dispatching analysis in the Reduce stage; and dispatching downward layer by layer. The method realizes contact-address-oriented scheduling; the contact address can be extended upward to national or even intercontinental-level addresses and downward to more precise positions; and hierarchical scheduling by addresses of different granularities is supported.

Description

Big data address hierarchical scheduling method based on MapReduce technology
Technical field
The present invention relates to a data processing method, and in particular to a big data address hierarchical scheduling method based on MapReduce technology.
Background art
An address consists of a country, a province (or autonomous region, municipality directly under the central government, or special administrative region), a city, a district (or county), a town, and a street number (or village group). Addresses are hierarchical and can be represented as character strings, for example mailing addresses, home addresses, company addresses, and organization addresses. Existing address-based algorithms include disk scheduling, IP scheduling, and GPS scheduling.
For disk scheduling (the first-come-first-served, shortest-seek-time-first, SCAN, and circular SCAN algorithms), the physical block address of a disk consists of a cylinder number, a head number, and a sector number. The time to access a physical block comprises seek time, rotational delay, and transfer time; the goal of disk scheduling is to keep seek time as short as possible and throughput as high as possible. Because a contact address differs from a disk's physical block address (cylinder, head, sector), disk scheduling is not suitable for the big data address hierarchical scheduling described in this patent.
For IP address scheduling (IP datagram routing), IP address ranges are allocated according to location. Using the routing table stored in a router, an IP datagram is forwarded along the path (port) toward a particular network address. An IP address only identifies a computer connected to the Internet, which differs from the contact address (country, province, city, district, town, street number) described in this patent, so IP address scheduling algorithms are not suitable for the big data address hierarchical scheduling described in this patent.
For GPS scheduling, the terminal receives satellite signals through a satellite antenna and locates itself automatically; the terminal sends the address information to a control center through a GPRS module; the control center uses the Internet or a private network to extract the positioning address and displays it on an electronic map. The GPS positioning address is essentially consistent with the address described in this patent, but it must be passed to the control center in real time. Because this real-time requirement is difficult to relax (let alone ignore), the positioning data cannot be accumulated to form big data.
Summary of the invention
The present invention aims to provide a big data address hierarchical scheduling method based on MapReduce technology that realizes contact-address-oriented scheduling. The contact address can be extended upward to national or even intercontinental-level addresses and downward to more precise positions, and hierarchical scheduling by addresses of different granularities is supported.
To achieve the above object, the present invention adopts the following technical solution:
The big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention comprises the following steps:
Step 1, build a contact-address-oriented scheduling table: the column families of the scheduling table comprise a basic information column family of the problem domain and a scheduling column family; the basic information column family contains the content to be processed in the Reduce stage and the relevant contact address column of the big data; the scheduling column family contains the coarse address and fine address columns into which the contact address is divided; a field that can distinguish big data records is chosen as the row key of the scheduling table, and the row key is placed in the basic information column family;
Step 2, determine the geographic scope of the business and initialize the coarse address and fine address: determine the geographic scope of the business according to the problem domain, and write the coarse address and fine address of the contact address into the coarse address and fine address columns of the scheduling column family of the scheduling table;
Step 3, generate Key and Value in the Map stage: assign the coarse address of the big data contact address to Key, and assign row key + contact address + content to be processed to Value;
Step 4, perform scheduling analysis in the Reduce stage: according to Key and the contact address in Value, output the coarse address and fine address of the next-level address division;
Step 5, dispatch downward layer by layer: initialize the Job, establish the connection to the scheduling table database, and initialize both the source table and the target table to the scheduling table; dispatch the relevant contact addresses of the big data downward layer by layer until the lowest-level contact address is reached; otherwise, repeat Step 3 to Step 5.
Preferably, the coarse address comprises the country; the province, autonomous region, municipality directly under the central government, or special administrative region; and the city or county. The fine address comprises the district or town, the street, and the community or house number.
Preferably, in Step 3, the row key is an order number or a customer ID.
Preferably, the scheduling table is an HBase scheduling table.
Further, in Step 1, existing big data is imported into the scheduling table using an ETL tool.
Preferably, when the business is global, the country of the contact address is chosen as the coarse address, and the remainder of the contact address is the fine address.
Preferably, when the business is domestic, the province, autonomous region, municipality directly under the central government, or special administrative region of the contact address is chosen as the coarse address, and the remainder of the contact address is the fine address.
Further, the fine address comprises a storage location number, and the content to be processed comprises a stock quantity.
The big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention has the following features:
First, contact addresses can be extended upward and downward. Upward extension supports scheduling over a wider scope; downward extension supports more precise address arrangement, which makes the method suitable for address-oriented scheduling in different fields (or settings).
Second, as the loop over Steps 3 to 5 of the method advances, the contact address is dispatched downward one address level at a time, realizing scheduling based on different address division granularities.
Third, the content to be scheduled (such as the storage location number of a contact address and its stock quantity) is placed in the content to be processed of the basic information column family of the scheduling table, which reduces certain manual work (such as counting stock to generate the current order quantity, or assigning goods to a warehouse and its storage location).
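The third feature can be illustrated with a small sketch that totals stock quantities per coarse address at a chosen level. This is only an assumed illustration in plain Python: the `/` level separator, the warehouse and bin names, and the function `stock_by_coarse_address` are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def stock_by_coarse_address(records, depth, sep="/"):
    """Sum stock quantities grouped by the first `depth` address levels."""
    totals = defaultdict(int)
    for address, quantity in records:
        # The coarse address is the leading `depth` levels of the full address.
        coarse = sep.join(address.split(sep)[:depth])
        totals[coarse] += quantity
    return dict(totals)

# Hypothetical records: (address down to the storage location, stock quantity).
records = [
    ("China/Sichuan/Warehouse A/Bin 01", 5),
    ("China/Sichuan/Warehouse A/Bin 02", 7),
    ("China/Yunnan/Warehouse B/Bin 01", 4),
]

print(stock_by_coarse_address(records, 2))  # totals per province
print(stock_by_coarse_address(records, 3))  # totals per warehouse
```

Changing `depth` moves the aggregation up or down the address hierarchy without changing the records themselves.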
The beneficial effects of the present invention are as follows:
(1) By dividing the contact address into two levels, contact-address-oriented scheduling is realized.
(2) By running the method, scheduling over different division granularities of the contact address is realized.
(3) The contact address can be extended upward to national or even intercontinental-level addresses.
(4) The contact address can be extended downward to more precise positions, such as a storage location number in a warehouse.
(5) The scheduled content (such as storage location number and stock quantity) is placed in the content to be processed.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the big data address hierarchical scheduling method based on MapReduce technology disclosed by the invention comprises the following steps:
Step 1, build a contact-address-oriented HBase scheduling table (scheduling table for short)
Build an address-oriented scheduling table whose column families comprise the basic information column family of the problem domain and the scheduling column family. The basic information column family contains the content to be processed in the Reduce stage and the relevant contact address column of the big data. The scheduling column family contains the coarse address and fine address columns into which the contact address is divided. A field that can distinguish big data records, such as the customer's order number or customer ID, is chosen as the row key of the scheduling table, and the row key is placed in the basic information column family. Existing big data can be imported into the scheduling table with a tool (ETL or similar); where no data exists yet, the tables in the back-end database can be designed as a scheduling table according to the above requirements, after which the method disclosed in this patent can be used.
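As a rough illustration of the table layout in Step 1, the sketch below models one row of the scheduling table as nested Python dictionaries rather than as a real HBase table. The column family names `info` and `sched`, the column names, the `/` separator, and the sample order are all assumptions made for illustration; the patent does not name them.

```python
# Hypothetical in-memory model of one scheduling-table row.
# Family names ("info", "sched") and column names are illustrative assumptions.

def make_row(row_key, contact_address, content):
    """Build a row with a basic information family and a scheduling family."""
    return {
        "info": {                        # basic information column family
            "row_key": row_key,          # e.g. an order number or customer ID
            "contact_address": contact_address,
            "content": content,          # content to be processed in the Reduce stage
        },
        "sched": {                       # scheduling column family
            "coarse_address": "",        # left empty here; written in Step 2
            "fine_address": "",
        },
    }

row = make_row(
    "order-0001",
    "China/Sichuan/Luzhou/Jiangyang/Example Street 1",  # made-up sample address
    "qty=3",
)
print(sorted(row))           # the two column families
print(sorted(row["sched"]))  # the two scheduling columns
```

In a real deployment the same two-family layout would be declared when the HBase table is created; the dictionary form is only meant to make the structure visible.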
Step 2, determine the geographic scope of the business and initialize the coarse address and fine address
Determine the geographic scope of the business according to the problem domain. For example, if the business is global, the country of the contact address is chosen as the coarse address and the remainder of the address is the fine address; if the business is domestic, the country and province of the contact address are chosen as the coarse address and the remainder is the fine address. The coarse address and fine address of the contact address are written into the coarse address and fine address columns of the scheduling column family of the scheduling table.
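The split described in Step 2 can be sketched as a small helper. The `/` level separator, the function name `split_address`, and the reading that a domestic business uses the country plus province as the coarse part (depth 2) are assumptions for illustration only.

```python
# Sketch of Step 2: split a contact address into coarse and fine parts.
# The "/" separator and the depth values are assumptions, not from the patent.

def split_address(contact_address, depth, sep="/"):
    """Return (coarse, fine): the first `depth` levels and the remainder."""
    parts = contact_address.split(sep)
    coarse = sep.join(parts[:depth])
    fine = sep.join(parts[depth:])
    return coarse, fine

address = "China/Sichuan/Luzhou/Jiangyang/Example Street 1"  # made-up sample

# Global business: the coarse address is the country (depth 1).
print(split_address(address, 1))
# Domestic business: the coarse address runs from country to province (depth 2).
print(split_address(address, 2))
```

The two returned strings correspond to the values written into the coarse address and fine address columns of the scheduling column family.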
Step 3, generate Key and Value in the Map stage
First determine the Key and Value for the MapReduce program: the coarse address of the big data contact address is the Key, and row key (the field that distinguishes big data records) + contact address (dispatch address) + content to be processed is the Value.
Read the relevant contact address, coarse address, row key, and content to be processed of the big data from the scheduling table. Key ← coarse address. Value ← row key + contact address + content to be processed.
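A minimal sketch of the Map-stage assignment, written as a plain Python function rather than a Hadoop Mapper. The record field names and the choice of a tab character to join the three parts of the Value are assumptions; the patent only states that the Value combines row key, contact address, and content to be processed.

```python
# Sketch of the Map stage in plain Python (no Hadoop involved).
# Field names and the tab delimiter are illustrative assumptions.

def map_record(record):
    """Emit (Key, Value): Key is the coarse address, Value joins the other fields."""
    key = record["coarse_address"]
    value = "\t".join(
        [record["row_key"], record["contact_address"], record["content"]]
    )
    return key, value

record = {
    "row_key": "order-0001",
    "contact_address": "China/Sichuan/Luzhou/Jiangyang/Example Street 1",
    "coarse_address": "China/Sichuan",
    "content": "qty=3",
}
print(map_record(record))
```

Because the Key is the coarse address, all records that share a coarse address arrive at the same Reduce call.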
Step 4, perform scheduling analysis in the Reduce stage
According to Key and the contact address in Value, output the coarse address and fine address of the next-level address division. For example, if the coarse address in this pass extends to the province, the coarse address at the next level extends to the city, and the remainder of the contact address is the fine address. According to Key and the row key + content to be processed in Value, analyze the big data further and output the content to be processed for the next round of dispatching.
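The Reduce-stage output can be sketched as follows, again in plain Python rather than as a Hadoop Reducer. The tab-delimited Value layout, the `/` level separator, and the function name `reduce_group` are assumptions for illustration; the further analysis of the content to be processed is domain specific and is simply passed through here.

```python
# Sketch of the Reduce stage: compute the next-level coarse/fine split.
# Value layout (tab-delimited) and "/" separator are illustrative assumptions.

def reduce_group(key, values, sep="/"):
    """For every Value under one coarse address, emit the next-level split."""
    depth = len(key.split(sep)) + 1              # one level deeper than the Key
    results = []
    for value in values:
        row_key, contact_address, content = value.split("\t")
        parts = contact_address.split(sep)
        next_coarse = sep.join(parts[:depth])    # e.g. down to the city
        next_fine = sep.join(parts[depth:])      # remainder of the address
        results.append((row_key, next_coarse, next_fine, content))
    return results

values = ["order-0001\tChina/Sichuan/Luzhou/Jiangyang/Example Street 1\tqty=3"]
for item in reduce_group("China/Sichuan", values):
    print(item)
```

Here a Key that extends to the province yields a next-level coarse address that extends to the city, matching the example in the step above.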
Step 5, initialize the Job and establish the connection to the HBase database; initialize both the source table and the target table to the scheduling table.
Dispatch downward layer by layer according to the relevant contact addresses of the big data. When the current level of the contact address has been dispatched, the coarse address becomes the dispatch address from the country down to that level and the remaining part of the address becomes the fine address; the next address level is then dispatched in turn, until arrangement (or delivery) is complete. Otherwise, repeat Steps 3 to 5.
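The layer-by-layer loop can be sketched with an in-memory stand-in for the MapReduce Job. Nothing here talks to Hadoop or HBase: the grouping dictionary plays the role of the shuffle, and the `/` separator, the function names, and the sample orders are assumptions for illustration.

```python
from collections import defaultdict

SEP = "/"  # assumed address level separator

def run_round(rows, depth):
    """One Map + Reduce pass: group row keys by the coarse address at `depth`."""
    groups = defaultdict(list)
    for row_key, address in rows:
        coarse = SEP.join(address.split(SEP)[:depth])   # Map: Key = coarse address
        groups[coarse].append(row_key)
    return dict(groups)                                 # Reduce: one group per Key

def schedule(rows):
    """Repeat rounds, one level deeper each time, down to the deepest address."""
    max_depth = max(len(address.split(SEP)) for _, address in rows)
    return [run_round(rows, depth) for depth in range(1, max_depth + 1)]

rows = [
    ("order-1", "China/Sichuan/Luzhou"),
    ("order-2", "China/Sichuan/Chengdu"),
    ("order-3", "China/Yunnan/Kunming"),
]
for depth, groups in enumerate(schedule(rows), start=1):
    print(depth, groups)
```

Each round corresponds to one pass through Steps 3 to 5; the loop stops when the lowest address level present in the data has been dispatched.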
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art may make corresponding changes and variations according to the invention, and all such changes and variations fall within the scope of protection of the appended claims.

Claims (8)

1. A big data address hierarchical scheduling method based on MapReduce technology, characterized in that it comprises the following steps:
Step 1, build a contact-address-oriented scheduling table: the column families of the scheduling table comprise a basic information column family of the problem domain and a scheduling column family; the basic information column family contains the content to be processed in the Reduce stage and the relevant contact address column of the big data; the scheduling column family contains the contact address divided into a coarse address and a fine address; a field that can distinguish big data records is chosen as the row key of the scheduling table, and the row key is placed in the basic information column family;
Step 2, determine the geographic scope of the business and initialize the coarse address and fine address: determine the geographic scope of the business according to the problem domain, and write the coarse address and fine address of the contact address into the coarse address and fine address columns of the scheduling column family of the scheduling table;
Step 3, generate Key and Value in the Map stage: assign the coarse address of the big data contact address to Key, and assign row key + contact address + content to be processed to Value;
Step 4, perform scheduling analysis in the Reduce stage: according to Key and the contact address in Value, output the coarse address and fine address of the next-level address division;
Step 5, dispatch downward layer by layer: initialize the Job, establish the connection to the scheduling table database, and initialize both the source table and the target table to the scheduling table; dispatch the relevant contact addresses of the big data downward layer by layer until the lowest-level contact address is reached; otherwise, repeat Step 3 to Step 5.
2. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the coarse address comprises the country; the province, autonomous region, municipality directly under the central government, or special administrative region; and the city or county; and the fine address comprises the district or town, the street, and the community or house number.
3. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: in Step 3, the row key is an order number or a customer ID.
4. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the scheduling table is an HBase scheduling table.
5. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: in Step 1, existing big data is imported into the scheduling table using an ETL tool.
6. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: when the business is global, the country of the contact address is chosen as the coarse address, and the remainder of the contact address is the fine address.
7. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: when the business is domestic, the province, autonomous region, municipality directly under the central government, or special administrative region of the contact address is chosen as the coarse address, and the remainder of the contact address is the fine address.
8. The big data address hierarchical scheduling method based on MapReduce technology according to claim 1, characterized in that: the fine address comprises a storage location number, and the content to be processed comprises a stock quantity.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510374579.2A CN104933176B (en) 2015-06-30 2015-06-30 Big data address based on MapReduce technologies is layered dispatching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510374579.2A CN104933176B (en) 2015-06-30 2015-06-30 Big data address based on MapReduce technologies is layered dispatching method

Publications (2)

Publication Number Publication Date
CN104933176A true CN104933176A (en) 2015-09-23
CN104933176B CN104933176B (en) 2018-10-12

Family

ID=54120343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510374579.2A Expired - Fee Related CN104933176B (en) 2015-06-30 2015-06-30 Big data address based on MapReduce technologies is layered dispatching method

Country Status (1)

Country Link
CN (1) CN104933176B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996523B1 (en) * 2011-05-24 2015-03-31 Google Inc. Forming quality street addresses from multiple providers
CN103297499A (en) * 2013-04-19 2013-09-11 无锡成电科大科技发展有限公司 Scheduling method and system based on cloud platform
US20150019531A1 (en) * 2013-06-24 2015-01-15 Great-Circle Technologies, Inc. Method and apparatus for situational context for big data
CN103414761A (en) * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on Hadoop framework
CN103605576A (en) * 2013-11-25 2014-02-26 华中科技大学 Multithreading-based MapReduce execution system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱斌: ""基于Hadoop的日志统计分析系统的设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
马盈: ""基于MapReduce构造多维数据及关联规则挖掘算法的研究与应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN104933176B (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN109255565B (en) Address attribution identification and logistics task distribution method and device
CA2653268C (en) Sort plan optimization
CN109101474B (en) Address aggregation method, package aggregation method and equipment
CN103164475B (en) The merging method and system in multiple IP regional information storehouses
CN106210163B (en) IP address-based localization method and device
CN104199860A (en) Dataset fragmentation method based on two-dimensional geographic position information
CN103699615B (en) A kind of quick cartographic representation method and system based on point vector data multilayered memory
CN110365747A (en) Processing method, device, server and the computer readable storage medium of network request
CN106156332A (en) The method screening vehicles passing in and out based on section seclected time and selection area
CN103853500A (en) Method, device and system for distributing mass data
CN105227618A (en) A kind of communication site's position information processing method and system
CN103455335A (en) Multilevel classification Web implementation method
McLeod et al. The use of a geographical information system for land‐based aquaculture planning
Van V. Coetzee et al. Spatial relationships and movement patterns of the air cargo industry in airport regions
CN102591984A (en) Optimizing method of query speed of point of interest data in navigation data
US9838283B2 (en) Techniques for synchronized address coding and print sequencing
CN101963993B (en) Method for fast searching database sheet table record
CN103476003B (en) Geographic information storage method for mobile equipment and mobile equipment
Holl et al. Spatial patterns and drivers of SME digitalisation
Costantino et al. A new spatial shift‐share decomposition: An application to tourism competitiveness in Italian regions
US20130159207A1 (en) Identifying location in package and mail delivery systems
CN104933176A (en) Big data address hierarchical scheduling method based on MapReduce technology
CN111190976B (en) Express mail signing method, express mail signing method of handheld terminal and storage medium
CN106407221A (en) Address data retrieval method and apparatus
US20230267134A1 (en) System and Method for Location Domain Name Service

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181012
