CN105849730A - Data capture method and system - Google Patents

Info

Publication number
CN105849730A
Authority
CN
China
Prior art keywords
data
baidu
google search
scope
capture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201680000336.5A
Other languages
Chinese (zh)
Inventor
马岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data capture method and system. The method comprises the following steps: receiving the data range that a user needs to capture; performing data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range; and taking the data that is identical in the Baidu search results and the Google search results as the capture result. The technical solution provided by the present invention has the advantage of a good capture effect.

Description

Data capture method and system
Technical field
The present invention relates to the field of communication and data processing, and in particular to a data capture method and system.
Background
Data capture is widely used in big data and network data, but the accuracy of existing data capture is poor.
Summary of the invention
A data capture method is provided that addresses the poor capture accuracy of the prior art.
In one aspect, a data capture method is provided, the method comprising the following steps:
receiving the data range that a user needs to capture;
performing data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
taking the data that is identical in the Baidu search results and the Google search results as the result of this capture.
Optionally, the method further includes:
arranging the other Google search results after the identical data.
Optionally, the method further includes:
shielding Baidu promotion data and Baidu optimization data.
In another aspect, a data capture system is provided, the system including:
a receiving unit, configured to receive the data range that a user needs to capture;
a search unit, configured to perform data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
a judging unit, configured to take the data that is identical in the Baidu search results and the Google search results as the result of this capture.
Optionally, the system further includes:
a sorting unit, configured to arrange the other Google search results after the identical data.
Optionally, the system further includes:
a shielding unit, configured to shield Baidu promotion data and Baidu optimization data.
The technical solution provided by the specific embodiments of the present invention receives the data range that a user needs to capture, performs data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range, and takes the data that is identical in the Baidu search results and the Google search results as the result of this capture. Because it combines the strengths of Baidu and Google, it has the advantage of good accuracy.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a data capture method provided by the present invention;
Fig. 2 is a structural diagram of a data capture system provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of a data capture method provided by a first preferred embodiment of the present invention. The method is performed by a server and, as shown in Fig. 1, comprises the following steps:
Step S101: receive the data range that a user needs to capture;
Step S102: perform data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
Step S103: take the data that is identical in the Baidu search results and the Google search results as the result of this capture.
The technical solution provided by the specific embodiments of the present invention receives the data range that a user needs to capture, performs data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range, and takes the data that is identical in the Baidu search results and the Google search results as the result of this capture. Because it combines the strengths of Baidu and Google, it has the advantage of good accuracy.
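Steps S101 to S103 can be read as an intersection of two result lists. The following is a minimal sketch under stated assumptions, not the patent's implementation: the data range is modeled as a query string, "identical data" is taken to mean an identical URL, and the two backends (`fake_baidu`, `fake_google`) are hypothetical stubs with made-up data that stand in for real search calls. Keeping the shared results in Google's order is also an assumption, since the text does not specify an order.

```python
from typing import Callable, List

# Hypothetical type: a search backend takes the data range (modeled here as a
# query string) and returns an ordered list of result URLs.
SearchFn = Callable[[str], List[str]]


def capture(data_range: str, search_baidu: SearchFn, search_google: SearchFn) -> List[str]:
    """Steps S101-S103: query both engines and keep only the shared results."""
    baidu_results = search_baidu(data_range)    # S102, first engine
    google_results = search_google(data_range)  # S102, second engine
    baidu_set = set(baidu_results)
    # S103: keep results returned by both engines, in Google's order (assumption).
    return [url for url in google_results if url in baidu_set]


# Stub backends with made-up data; real search calls are out of scope here.
def fake_baidu(query: str) -> List[str]:
    return ["https://a.example", "https://b.example", "https://c.example"]


def fake_google(query: str) -> List[str]:
    return ["https://c.example", "https://d.example", "https://a.example"]


result = capture("data capture", fake_baidu, fake_google)
print(result)
```

With these stubs only the two URLs that appear in both lists survive. In practice, deciding whether two results are "identical" would probably need URL normalization, which the source does not describe.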
Optionally, the method may further include, after step S103:
arranging the other Google search results after the identical data.
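The optional ordering step can be sketched as a two-part list: the identical data first, followed by the remaining Google results. This is an illustrative sketch only; the function name `order_results` and the use of plain strings as result identifiers are assumptions, and Google's relative order is preserved within each part because the text does not say otherwise.

```python
from typing import List


def order_results(baidu_results: List[str], google_results: List[str]) -> List[str]:
    """Place results found by both engines first, then the other Google results."""
    baidu_set = set(baidu_results)
    identical = [r for r in google_results if r in baidu_set]
    google_only = [r for r in google_results if r not in baidu_set]
    return identical + google_only


ordered = order_results(["a", "b", "c"], ["c", "d", "a", "e"])
print(ordered)  # ['c', 'a', 'd', 'e']
```

Results that only Baidu returned ("b" in this example) are dropped, which matches the text: only the other Google results are appended.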
Optionally, the method may further include, after step S103:
shielding Baidu promotion data and Baidu optimization data.
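The source does not say how promotion or optimization data is recognized. The sketch below therefore assumes that each result already carries two hypothetical flags, `is_promotion` and `is_optimized`, and simply drops flagged entries; how those flags would be set is outside what the text describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BaiduResult:
    """Hypothetical result record; the flag fields are assumptions."""
    url: str
    is_promotion: bool = False   # paid promotion entry (assumed label)
    is_optimized: bool = False   # search-optimized entry (assumed label)


def shield(results: List[BaiduResult]) -> List[BaiduResult]:
    """Drop results flagged as promotion or optimization data."""
    return [r for r in results if not (r.is_promotion or r.is_optimized)]


sample = [
    BaiduResult("https://a.example"),
    BaiduResult("https://ad.example", is_promotion=True),
    BaiduResult("https://seo.example", is_optimized=True),
    BaiduResult("https://b.example"),
]
shielded = shield(sample)
print([r.url for r in shielded])
```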
Referring to Fig. 2, Fig. 2 shows a data capture system provided by a second preferred embodiment of the present invention. The system includes:
a receiving unit 201, configured to receive the data range that a user needs to capture;
a search unit 202, configured to perform data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
a judging unit 203, configured to take the data that is identical in the Baidu search results and the Google search results as the result of this capture.
The technical solution provided by the specific embodiments of the present invention receives the data range that a user needs to capture, performs data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range, and takes the data that is identical in the Baidu search results and the Google search results as the result of this capture. Because it combines the strengths of Baidu and Google, it has the advantage of good accuracy.
Optionally, the system may further include:
a sorting unit 204, configured to arrange the other Google search results after the identical data.
Optionally, the system may further include:
a shielding unit 205, configured to shield Baidu promotion data and Baidu optimization data.
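One way to picture how units 201 to 204 fit together is as small classes wired into a pipeline. This is a hedged sketch rather than the patent's design: all class and method names are hypothetical, the search backends are injected as plain callables with stub data, and the shielding unit 205 is left out because the source gives no criterion for recognizing promotion or optimization data.

```python
from typing import Callable, List, Optional, Tuple

SearchFn = Callable[[str], List[str]]  # hypothetical backend signature


class ReceivingUnit:
    """Unit 201: receives the data range the user needs to capture."""
    def receive(self, user_input: str) -> str:
        return user_input.strip()


class SearchUnit:
    """Unit 202: runs both searches for the same data range."""
    def __init__(self, search_baidu: SearchFn, search_google: SearchFn) -> None:
        self.search_baidu = search_baidu
        self.search_google = search_google

    def search(self, data_range: str) -> Tuple[List[str], List[str]]:
        return self.search_baidu(data_range), self.search_google(data_range)


class JudgingUnit:
    """Unit 203: keeps the data present in both result lists."""
    def judge(self, baidu: List[str], google: List[str]) -> List[str]:
        baidu_set = set(baidu)
        return [r for r in google if r in baidu_set]


class SortingUnit:
    """Unit 204 (optional): appends the other Google results."""
    def sort(self, identical: List[str], google: List[str]) -> List[str]:
        seen = set(identical)
        return identical + [r for r in google if r not in seen]


class CaptureSystem:
    """Wires the units together; sorting is enabled only on request."""
    def __init__(self, search_unit: SearchUnit, use_sorting: bool = False) -> None:
        self.receiver = ReceivingUnit()
        self.searcher = search_unit
        self.judge = JudgingUnit()
        self.sorter: Optional[SortingUnit] = SortingUnit() if use_sorting else None

    def run(self, user_input: str) -> List[str]:
        data_range = self.receiver.receive(user_input)
        baidu, google = self.searcher.search(data_range)
        identical = self.judge.judge(baidu, google)
        if self.sorter is not None:
            return self.sorter.sort(identical, google)
        return identical


# Stub backends with made-up data.
stub_search = SearchUnit(lambda q: ["a", "b"], lambda q: ["b", "c", "a"])
plain = CaptureSystem(stub_search).run("  data capture  ")
sorted_out = CaptureSystem(stub_search, use_sorting=True).run("  data capture  ")
print(plain, sorted_out)
```

Injecting the backends keeps the sketch testable without network access and mirrors the statement that units can be merged, divided, or deleted as needed.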
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and units involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The steps in the methods of the embodiments of the present invention can be reordered, merged, and deleted according to actual needs.
The units in the devices of the embodiments of the present invention can be merged, divided, and deleted according to actual needs. Those skilled in the art can combine the different embodiments described in this specification and the features of different embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented in hardware, in firmware, or in a combination thereof. When implemented in software, the above functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium can be any available medium that a computer can access. By way of example and not limitation, computer-readable media can include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection can properly be termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used in the present invention, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of protection of computer-readable media.
In summary, the foregoing are only preferred embodiments of the technical solution of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (6)

1. A data capture method, characterized in that the method comprises the following steps:
receiving the data range that a user needs to capture;
performing data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
taking the data that is identical in the Baidu search results and the Google search results as the result of this capture.
2. The method according to claim 1, characterized in that the method further includes:
arranging the other Google search results after the identical data.
3. The method according to claim 1, characterized in that the method further includes:
shielding Baidu promotion data and Baidu optimization data.
4. A data capture system, characterized in that the system includes:
a receiving unit, configured to receive the data range that a user needs to capture;
a search unit, configured to perform data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range;
a judging unit, configured to take the data that is identical in the Baidu search results and the Google search results as the result of this capture.
5. The system according to claim 4, characterized in that the system further includes:
a sorting unit, configured to arrange the other Google search results after the identical data.
6. The system according to claim 4, characterized in that the system further includes:
a shielding unit, configured to shield Baidu promotion data and Baidu optimization data.
CN201680000336.5A 2016-03-25 2016-03-25 Data capture method and system Pending CN105849730A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/077409 WO2017161578A1 (en) 2016-03-25 2016-03-25 Method and system for data capturing

Publications (1)

Publication Number Publication Date
CN105849730A true CN105849730A (en) 2016-08-10

Family

ID=56576345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680000336.5A Pending CN105849730A (en) 2016-03-25 2016-03-25 Data capture method and system

Country Status (2)

Country Link
CN (1) CN105849730A (en)
WO (1) WO2017161578A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294802A (en) * 2016-08-15 2017-01-04 马岩 The grasping means of voice data and system
CN106326373A (en) * 2016-08-15 2017-01-11 马岩 Grasping method and system of reliable video in big data
WO2018027928A1 (en) * 2016-08-12 2018-02-15 深圳市博信诺达经贸咨询有限公司 Forum big data capturing method and system
WO2018032247A1 (en) * 2016-08-15 2018-02-22 马岩 Search method and system for big data of videos
WO2018032252A1 (en) * 2016-08-15 2018-02-22 马岩 Secure search method and system for big data on forums
WO2018032253A1 (en) * 2016-08-15 2018-02-22 马岩 Secure search method and system for big data of images
WO2018032251A1 (en) * 2016-08-15 2018-02-22 马岩 Method and system for applying security level to data fetching of big data
WO2018032249A1 (en) * 2016-08-15 2018-02-22 马岩 Audio data fetching method and system
WO2018032254A1 (en) * 2016-08-15 2018-02-22 马岩 Method and system for fetching trusted video in big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214158A1 (en) * 2006-03-08 2007-09-13 Yakov Kamen Method and apparatus for conducting a robust search
CN102004782A (en) * 2010-11-25 2011-04-06 北京搜狗科技发展有限公司 Search result sequencing method and search result sequencer
CN102043834A (en) * 2010-11-25 2011-05-04 北京搜狗科技发展有限公司 Method for realizing searching by utilizing client and search client
US20140122475A1 (en) * 2012-10-29 2014-05-01 Alibaba Group Holding Limited Search result ranking method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079048A (en) * 2006-05-24 2007-11-28 上海万纬信息技术有限公司 Internet information search engine and method based on software robot exclusion standard
CN101477554A (en) * 2009-01-16 2009-07-08 西安电子科技大学 User interest based personalized meta search engine and search result processing method

Also Published As

Publication number Publication date
WO2017161578A1 (en) 2017-09-28

Similar Documents

Publication Publication Date Title
CN105849730A (en) Data capture method and system
CN105683966A (en) Searching method and searching system based on big data
CN105706136A (en) E-commerce platform analysis method and system based on big data
CN106446009A (en) Method and system for recommending house to user in agency app
CN106250516A (en) Synonym application process in big data search and system
CN106294637A (en) Realize the method and system of phonetic search
CN106250538A (en) Wechat is shared the method and system of big data
CN106293528A (en) Dropbox stores the method and system of big data
CN106294010A (en) The storage method and system of big data in distributed system
CN106294011A (en) The big date storage method of sort-type and system
CN106250509A (en) Key class searching method and system in big data
CN106294645A (en) Different part of speech realization method and systems in big data search
CN106331323A (en) Method and system for sorting apps according to place
CN105683967A (en) Web page grabbing method and web page grabbing system based on big data
CN106294856A (en) House matching process and system in house app
CN106254662A (en) Interior of mobile phone control method and system
CN106326404A (en) Method and system for searching house resources in internet
CN106303036A (en) The hidden method of app and system under specific use scene
CN106253787A (en) The rotating speed method and system of closed loop control horizontal coil winding machine
CN106294818A (en) Personalization realizes app sort method and system
CN106022930A (en) Logistic insurance premium claim settlement method and system
CN106294643A (en) Different language realizes real-time searching method and system in big data
CN106250530A (en) Key class searching method and system in big data
CN106294711A (en) Different part of speech realization method and systems in big data search
CN106250531A (en) Synonym application process in big data search and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160810