CN107766530A - A method and device for distributing collected data - Google Patents


Info

Publication number: CN107766530A
Authority: CN (China)
Prior art keywords: data, information, gathered, data distribution, stock
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201711018854.2A
Other languages: Chinese (zh)
Inventor: 王清霞
Current Assignee: Beijing Plastic Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Plastic Technology Co Ltd
Priority date: 2017-10-27 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2017-10-27
Publication date: 2018-03-06
Application filed by Beijing Plastic Technology Co Ltd
Priority to CN201711018854.2A
Publication of CN107766530A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and device for distributing collected data. The method comprises several flows: A, data set collection; B, data acquisition; C, data distribution. The data set mainly covers four categories of information: purchase requests, supply information, recycling customers, and headline information. Data acquisition builds collection tasks using crawler technology, analyzes and manages the data, crawls waste-materials industry data accurately, and classifies the data according to certain rules and screening criteria to form database files. With this method, the collected data can be analyzed and mined to obtain a large number of target customers and professional data and, at the same time, to produce a list of potential customers, so that buyers are provided with product information of interest. According to embodiments of the invention, customer order volume increases, business results are good, information ranking is improved, and customers are helped to complete product transactions.

Description

A method and device for distributing collected data
Technical field
The present invention relates to the field of computer application technology, and in particular to a method and device for distributing collected data.
Background technology
At present, with the development of Internet technology and the growth of massive network information, sorting the acquired information has become an increasingly pressing need. After massive information and data have been collected by crawler technology, they are sorted and reprocessed so that the value and benefit of the collected data are maximized and the data serve more specialized purposes.
A crawler completes the collection process jointly using anywhere from one to several hundred threads, each of which repeatedly executes the collection flow. In a distributed system environment, the threads of the crawler module may run in different processes on different nodes. In distributed crawling, a host splitter assigns the URLs that have passed filtering to different collection nodes; that is, the host objects to be collected are assigned to different nodes for collection. The output of the host splitter is fed into the duplicate-URL detection module of each collection node of the distributed system.
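The host splitter described above can be illustrated with a minimal sketch. The patent does not say how URLs are mapped to nodes; hashing the host name is an assumption made here so that all URLs of one host land on the same collection node, and the names `assign_node` and `split_by_host` are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a collection node index based on its host name (assumed scheme)."""
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable integer.
    return int.from_bytes(digest[:8], "big") % num_nodes


def split_by_host(urls, num_nodes):
    """Group already-filtered URLs into one work list per collection node."""
    buckets = {node: [] for node in range(num_nodes)}
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets
```

Because the node index depends only on the host name, each node's duplicate-URL detection only needs to know about the hosts assigned to it.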
The present invention proposes a method and device for distributing collected data that captures high-quality data on the Internet as far as possible while addressing scheduling, timeliness, storage, and related problems. The collected data are analyzed and mined to quickly obtain a large number of target customers and professional data and to produce a list of potential customers, so that buyers are provided with product information of interest. Collecting, analyzing, and deeply mining large amounts of data opens up substantial business opportunities for buyers.
Summary of the invention
In view of this, the main object of the present invention is to provide a method and device for distributing collected data, aimed at guiding enterprises to bring the collected information into e-commerce, offering high-quality, low-priced products, saving production costs, and making product transactions more efficient and convenient. At the same time, high-quality products are displayed in one place, enterprise visibility is increased, and suppliers obtain more order opportunities.
To achieve the above object, the technical solution of the present invention is implemented as follows:
A device for distributing collected data, comprising:
a data set collection module, which mainly covers four categories of information: purchase requests, supply information, recycling customers, and headline information;
a data acquisition module, which builds collection tasks using crawler technology, analyzes and manages the data, crawls waste-materials industry data accurately, and classifies the data according to certain rules and screening criteria to form database files;
a data distribution module, which determines the category of each data item: if it is text information, waste-materials information processing is performed; if it is user data, the CRM system further determines whether the user is a member.
The present invention also provides a method for distributing collected data, the method comprising:
A, data set collection;
B, collection pool data;
C, data pool;
D, determine the category of the data: if it is text information, perform step E; if it is user data, perform step F;
E, waste-materials information processing;
F, CRM system;
G, determine whether the user is valid: if invalid, perform step H; if valid, perform step I;
H, end;
I, member.
Further, the data categories in step D include waste plastics, crushed material, and recycled pellets;
Further, step E mainly covers the spot mall and general information. The spot mall holds goods in stock, and its transaction mode is consignment trading. General information refers to users trading on their own, that is, self-service trading that does not go through the platform. Self-service trading does not involve payment amounts, whereas consignment trading does;
Further, the system in step F is an internal system used by the company's customer service staff to maintain customer data and, later, to make follow-up calls to customers about products.
Further, the member in step I is the mark of a paid premium member. Membership is a paid product that the website (such as the "then Mould precious" APP) offers to customers, and it serves as a status symbol of the customer's cooperation with the website. Through functions such as on-site certification, credit accumulation, and individual value-added services, it makes transactions easier for customers and opens the door to e-commerce.
The method and device for distributing collected data provided by the present invention have the following advantages:
1) customer order volume is increased and business results are good;
2) information ranking is improved, helping customers complete product transactions;
3) the cost of inspecting goods is saved, purchasing costs are reduced, and the risk of receiving poor-quality goods is removed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the functional structure of the method for distributing collected data according to the present invention;
Fig. 2 is a schematic flow diagram of the method for distributing collected data according to the present invention;
Fig. 3 is a schematic diagram of an application scenario of the device for distributing collected data according to the present invention.
Embodiment
To make the objects, features, and advantages of the present invention easier to understand, the method and device for distributing collected data are described in further detail below with reference to the accompanying drawings and embodiments of the invention.
With reference to Fig. 2, the method for distributing collected data comprises the following flow:
Step 201: Data set collection. Collection tasks are built using crawler technology, the data are analyzed and managed, waste-materials industry data are crawled accurately, and the data are classified according to certain rules and screening criteria to form database files;
Step 202: Collection pool data;
Step 203: Data pool, that is, the data captured by the crawler after sorting and reprocessing, deduplication, and filtering;
Step 204: Determine the category of the data: if it is text information, perform step 205; if it is user data, perform step 206;
Step 205: Waste-materials information processing;
Step 206: CRM system;
Step 207: Determine whether the user is valid: if invalid, perform step 208; if valid, perform step 209;
Step 208: End;
Step 209: Member.
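The branching in steps 204 to 209 can be sketched as a small routing function. This is only an illustration under stated assumptions: the `Record` type, the `"text"`/`"user"` labels, the returned outcome strings, and the `is_valid_user` callback standing in for the CRM check are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str                       # "text" or "user" (assumed labels)
    payload: dict = field(default_factory=dict)


def distribute(record, is_valid_user):
    """Route one pooled record through steps 204-209 and return the outcome."""
    if record.kind == "text":
        return "waste_info_processing"      # step 205
    if record.kind == "user":
        # Step 206 consults the CRM system; step 207 checks validity.
        if is_valid_user(record.payload):
            return "member"                 # step 209
        return "end"                        # step 208
    raise ValueError(f"unknown record kind: {record.kind!r}")
```

Passing the validity check in as a function keeps the routing logic separate from whatever rules the CRM system actually applies.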
Further, step 209 specifically covers the following membership types:
Free member: a regular member who registers free of charge; may buy goods in the spot mall, take part in well-known-enterprise purchasing and supply of goods, publish self-service trading information, and view the self-service trading information published by other members;
Spot-goods member: may buy goods in the spot mall, publish self-service trading information, publish spot-mall product information, and supply goods to well-known enterprises;
Well-known-enterprise member: may buy goods in the spot mall, publish self-service trading information, publish well-known-enterprise purchasing intentions, place advance orders, supply goods to well-known enterprises, and view the self-service trading information published by other members.
Further, the membership package offers the following three service advantages:
1) Viewing contact details: members can view the contact details of ordinary users and members without restriction;
2) Priority ranking in site search: in the website's search results, member information is ranked first so that customers find the information they need right away; the period is one year;
3) Priority ranking by membership level: in the website's supply and demand listings, members enjoy priority ranking and obtain orders first.
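The priority ranking described above can be sketched as a sort with a two-part key. The patent does not give the ranking formula; placing members first and then ordering by a relevance score is an assumption, and the field names `is_member` and `relevance` are hypothetical.

```python
def rank_listings(listings):
    """Return listings with members first, then by descending relevance."""
    return sorted(
        listings,
        # False sorts before True, so members (not is_member == False) come first.
        key=lambda item: (not item["is_member"], -item["relevance"]),
    )
```

For example, a member listing with relevance 0.4 would be placed ahead of a non-member listing with relevance 0.9 under this scheme.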
The method and device for distributing collected data of the present invention mainly employ the following technologies, which are briefly introduced below.
1) Crawler technology. The crawler program reads the list of websites to crawl, takes out one website URL, and puts it into the list of unvisited URLs (the UVURL list). While the UVURL list is not empty, the program takes out a URL and checks whether it has already been visited. If it has not, the program reads the page, analyzes its hyperlinks and content, stores the page in the document database, and puts the URL into the list of visited URLs (the VURL list). This continues until the UVURL list is empty, at which point the program moves on to the next website, looping until every URL in the website list has been crawled. To ensure that the crawler obtains the required information quickly, it traverses websites and downloads documents according to a search strategy; common strategies are breadth-first search, depth-first search, and focused search.
2) MapReduce distributed processing technology. MapReduce achieves reliability by distributing large-scale operations on the data set to the nodes of the network; each node periodically reports its completed work and status updates. If a node stays silent for longer than a preset interval, the master node records the node as dead and sends the data assigned to it to other nodes. Each operation uses atomic operations on named files to ensure that parallel threads do not conflict; when files are renamed, the system may also copy them to a name other than the task name.
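The failure handling described for MapReduce can be illustrated with a small single-process sketch. It is not the MapReduce implementation itself: the `Master` class, its method names, and the round-robin reassignment are assumptions chosen only to show heartbeat timeouts and the reassignment of a silent node's data.

```python
class Master:
    """Tracks worker heartbeats and reassigns work from silent workers."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}      # worker id -> time of last heartbeat
        self.assigned = {}       # worker id -> list of data splits
        self.dead = set()

    def heartbeat(self, worker, now):
        """Record a periodic status report from a worker."""
        if worker not in self.dead:
            self.last_seen[worker] = now
            self.assigned.setdefault(worker, [])

    def assign(self, worker, split):
        self.assigned.setdefault(worker, []).append(split)

    def check(self, now):
        """Mark silent workers dead and hand their splits to live workers."""
        newly_dead = [
            w for w, seen in self.last_seen.items()
            if w not in self.dead and now - seen > self.timeout
        ]
        self.dead.update(newly_dead)
        live = sorted(w for w in self.last_seen if w not in self.dead)
        for worker in newly_dead:
            orphaned = self.assigned.pop(worker, [])
            if not live:
                # No live worker is available: leave the splits recorded
                # under the dead worker.
                self.assigned[worker] = orphaned
                continue
            for i, split in enumerate(orphaned):
                self.assigned[live[i % len(live)]].append(split)
        return newly_dead
```

Time is passed in explicitly as `now` so the timeout logic can be exercised without waiting on a real clock.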
3) Result ranking technology. In the website's search results, member information is ranked first so that customers find suppliers right away.
4) Redis technology. Redis supports not only simple key/value data but also data structures such as list, set, and hash. Redis supports data backup in master-slave mode, and it supports data persistence: data in memory can be saved to disk and loaded again on restart. In addition, a Redis string has a large capacity, with a stated maximum of 1G.
5) Database technology. The database manages the large volume of collected data and the structured data related to it. The structured data include the metadata of the collected information and other information extracted from the data set, such as hyperlinks and anchor text. In the present invention the data acquisition module places two requirements on the database: first, efficient random access, so that information can be retrieved quickly; second, a high compression ratio, so that the database can store more collected data in less space. The present invention uses delta encoding compression, which can compress documents such as HTML by 70%.
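The idea behind delta encoding can be shown with a short sketch that stores a new version of a page as copy and insert operations against an earlier version. The patent does not specify its delta algorithm; using Python's `difflib.SequenceMatcher` is a stand-in chosen for illustration, the function names are hypothetical, and the sketch makes no claim about reaching the 70% figure.

```python
from difflib import SequenceMatcher


def make_delta(base: str, new: str):
    """Encode `new` as copy/insert operations relative to `base`."""
    ops = []
    matcher = SequenceMatcher(a=base, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))   # literal new text
        # "delete" needs no operation: the base span is simply not copied.
    return ops


def apply_delta(base: str, ops) -> str:
    """Rebuild the new document from the base and the delta operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

When two crawled versions of a page share most of their markup, the delta consists mainly of short copy references, which is where the space saving comes from.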
Several typical application scenarios of the method are described below:
With reference to Fig. 2, an embodiment of the method for distributing collected data according to the present invention is as follows.
Application scenarios one:
The crawler first connects to a Domain Name System (DNS) server, which converts the host name into an IP address. It then attempts to connect to the server at that IP address and finally sends an HTTP request to the web server to request a page.
The crawler component mainly comprises five modules:
A, pool of URLs to be collected: contains the URLs currently awaiting collection (in continuous collection, URLs that have already been collected may be put back into the pool to be collected again).
B, DNS resolution module: determines the IP address of the web server corresponding to a URL when the page at that URL is fetched.
C, page fetching module: uses the HTTP protocol to return the page corresponding to a URL.
D, analysis and processing module: extracts text and links from the collected pages.
E, URL deduplication module: determines whether an extracted link is already in the URL pool or has recently been fetched.
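How the modules listed above fit together can be sketched as a breadth-first crawl loop. This is a simplified illustration: DNS resolution and the real HTTP request are hidden behind a `fetch` function supplied by the caller (an assumption, so the sketch runs without network access), and the names `LinkExtractor` and `crawl` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Analysis module: collects the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    unvisited = deque([seed])   # pool of URLs to be collected
    visited = set()             # URLs already collected
    documents = {}              # stand-in for the document database
    while unvisited and len(visited) < max_pages:
        url = unvisited.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)       # page fetching module (DNS + HTTP in practice)
        if html is None:
            continue
        documents[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Deduplication module: skip links already collected or queued.
            if absolute not in visited and absolute not in unvisited:
                unvisited.append(absolute)
    return documents
```

Swapping the `deque` for a stack would turn the traversal into a depth-first one; a real deployment would also need politeness delays and error handling, which are left out here.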
Application scenarios two:
With reference to Fig. 3, the method for distributing collected data is applied to a waste-plastics recycling website. Industry regional information and trade trends are displayed to each customer in a targeted way, so that customers stay abreast of the latest developments in the recycled-plastics business; at the same time, customers are offered high-quality spot-goods resources.
It should be noted that all information currently published on the website can be viewed by anyone who registers free of charge as an ordinary user, but only members can view the contact details of users who publish self-service trading information; ordinary users cannot.
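The visibility rule in the note above can be sketched as a filter applied before a listing is shown. The field names in `CONTACT_FIELDS` and the function name `visible_listing` are hypothetical; the patent only states that contact details are hidden from ordinary users.

```python
CONTACT_FIELDS = {"contact_name", "phone"}   # hypothetical field names


def visible_listing(listing: dict, viewer_is_member: bool) -> dict:
    """Return the listing as shown to a registered viewer."""
    if viewer_is_member:
        return dict(listing)
    # Ordinary users see everything except the publisher's contact details.
    return {k: v for k, v in listing.items() if k not in CONTACT_FIELDS}
```

Returning a filtered copy rather than modifying the stored listing keeps the original record intact for members and for internal use.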
The foregoing is only a preferred embodiment of the present invention, is not intended to limit the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
It should be noted that those of ordinary skill in the art will understand that all or part of the flows in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The method and device for distributing collected data provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. For those of ordinary skill in the art, the specific implementation and the scope of application may change in accordance with the idea of the invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (9)

  1. A method for distributing collected data, characterized in that the method mainly comprises:
    A, data set collection;
    B, collection pool data;
    C, data pool;
    D, determine the category of the data: if it is text information, perform step E; if it is user data, perform step F;
    E, waste-materials information processing;
    F, CRM system;
    G, determine whether the user is valid: if invalid, perform step H; if valid, perform step I;
    H, end;
    I, member.
  2. The method for distributing collected data according to claim 1, characterized in that step C is specifically: the data captured by the crawler after sorting and reprocessing, deduplication, and filtering.
  3. The method for distributing collected data according to claim 1, characterized in that step D is specifically: the data categories include waste plastics, crushed material, and recycled pellets.
  4. The method for distributing collected data according to claim 1, characterized in that step E is specifically:
    it mainly covers the spot mall and general information; the spot mall holds goods in stock, and its transaction mode is consignment trading; general information refers to users trading on their own, that is, self-service trading that does not go through the platform; self-service trading does not involve payment amounts, whereas consignment trading does.
  5. The method for distributing collected data according to claim 1, characterized in that step I is specifically:
    Free member: a regular member who registers free of charge; may buy goods in the spot mall, take part in well-known-enterprise purchasing and supply of goods, publish self-service trading information, and view the self-service trading information published by other members;
    Spot-goods member: may buy goods in the spot mall, publish self-service trading information, publish spot-mall product information, and supply goods to well-known enterprises;
    Well-known-enterprise member: may buy goods in the spot mall, publish self-service trading information, publish well-known-enterprise purchasing intentions, place advance orders, supply goods to well-known enterprises, and view the self-service trading information published by other members.
  6. A device for distributing collected data, characterized in that it mainly comprises: a data set collection module, a data acquisition module, and a data distribution module.
  7. The device for distributing collected data according to claim 6, characterized in that the data set collection module specifically covers four categories of information: purchase requests, supply information, recycling customers, and headline information.
  8. The device for distributing collected data according to claim 6, characterized in that the data acquisition module specifically: builds collection tasks using crawler technology, analyzes and manages the data, crawls waste-materials industry data accurately, and classifies the data according to certain rules and screening criteria to form database files.
  9. The device for distributing collected data according to claim 6, characterized in that the data distribution module specifically: determines the category of each data item; if it is text information, waste-materials information processing is performed; if it is user data, the CRM system further determines whether the user is a member.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711018854.2A CN107766530A (en) 2017-10-27 2017-10-27 A kind of method and its device of gathered data distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711018854.2A CN107766530A (en) 2017-10-27 2017-10-27 A kind of method and its device of gathered data distribution

Publications (1)

Publication Number Publication Date
CN107766530A true CN107766530A (en) 2018-03-06

Family

ID=61270539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711018854.2A Pending CN107766530A (en) 2017-10-27 2017-10-27 A kind of method and its device of gathered data distribution

Country Status (1)

Country Link
CN (1) CN107766530A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150294256A1 (en) * 2014-04-11 2015-10-15 Microsoft Technology Licensing, Llc Scenario modeling and visualization
CN105007171A (en) * 2015-05-25 2015-10-28 上海欣方软件有限公司 User data analysis system and method based on big data in communication field
CN107124463A (en) * 2017-05-11 2017-09-01 广州德领物联科技有限责任公司 A kind of data-sharing systems applied to intelligent plant

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959855A (en) * 2018-04-25 2018-12-07 上海药明康德新药开发有限公司 A kind of computer coding method that the reagent for DNA encoding compound library screens
CN108959855B (en) * 2018-04-25 2021-05-18 上海药明康德新药开发有限公司 Computer coding method for screening reagent of DNA coding compound library
CN110417712A (en) * 2018-04-28 2019-11-05 北京资采信息技术有限公司 One kind being based on network data transmission equipment real-time data acquisition and analytic method
CN108596683A (en) * 2018-05-03 2018-09-28 新奥(中国)燃气投资有限公司 A kind of potential customers' information acquisition method and device

Similar Documents

Publication Publication Date Title
US20200073906A1 (en) Method, Device, Storage Medium and Processor for Data Acquisition and Query
CN102682059B (en) Method and system for distributing users to clusters
CN103559211B (en) Information processing method and information processor and browser in webpage
CN109658206A (en) Information recommendation method and device
CN107562818A (en) Information recommendation system and method
CN106126648B (en) It is a kind of based on the distributed merchandise news crawler method redo log
CN106708821A (en) User personalized shopping behavior-based commodity recommendation method
CN108268565B (en) Method and system for processing user browsing behavior data based on data warehouse
KR102225729B1 (en) Product information processing apparatus for multiple online shopping mall product registration and method thereof
CN103942712A (en) Product similarity based e-commerce recommendation system and method thereof
CA2893912C (en) Systems and methods for optimizing data analysis
CN107016587B (en) Personalized page pushing method and device
CN102982050A (en) Collecting and presenting temporal-based action information
CN107766530A (en) A kind of method and its device of gathered data distribution
CN109087121A (en) Marketing message release platform construction method and device
CN104965863B (en) A kind of clustering objects method and apparatus
CN107103062A (en) A kind of webpage recommending method and system
Dias et al. Automating the extraction of static content and dynamic behaviour from e-commerce websites
JP2020504879A (en) System and method for collecting data related to malicious content in a networked environment
CN103985049A (en) Method and device for setting marketing tool
CN108122153A (en) Personalized recommendation method based on cloud computing tupe under e-commerce environment
CN111914163A (en) Medicine combination recommendation method and device, electronic equipment and storage medium
CN109359998A (en) Customer data processing method, device, computer installation and storage medium
CN110232589B (en) Intention customer analysis system based on big data
CN107729330A (en) The method and apparatus for obtaining data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180306