CN105849730A - Data capture method and system - Google Patents
- Publication number
- CN105849730A
- Authority
- CN
- China
- Prior art keywords
- data
- baidu
- google search
- scope
- capture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a data capture method and system. The method comprises the following steps: receiving the data range that a user needs to capture; performing data capture separately with a Baidu search algorithm and a Google search algorithm according to the data range; and taking the results that are identical in the Baidu and Google search results as the capture result. The technical solution provided by the present invention has the advantage of good capture performance.
Description
Technical field
The present invention relates to the field of communication and data processing, and in particular to a data capture method and system.
Background
Data capture is widely applied to big data and network data; however, the accuracy of existing data capture is poor.
Summary of the invention
A data capture method is provided, which overcomes the poor capture accuracy of the prior art.
In one aspect, a data capture method is provided, comprising the following steps:
Receiving the data range that the user needs to capture;
Performing data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
Taking the data that are identical in the Baidu search results and the Google search results as the result of this capture.
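The three steps above amount to intersecting two result lists. The minimal Python sketch below assumes each engine's results are already available as lists of URLs and that two results count as "identical" when their normalized URLs match; the patent does not say how identity is decided, so the normalization rule, the choice to keep Google's ranking order, and the names `normalize_url` and `capture_common` are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms compare equal (assumed rule)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower() or "http"
    netloc = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string unchanged.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def capture_common(baidu_results: list[str], google_results: list[str]) -> list[str]:
    """Return results present in both lists, in Google's order, without duplicates."""
    baidu_keys = {normalize_url(u) for u in baidu_results}
    common: list[str] = []
    seen: set[str] = set()
    for url in google_results:
        key = normalize_url(url)
        if key in baidu_keys and key not in seen:
            seen.add(key)
            common.append(url)
    return common
```

For example, with Baidu returning `https://example.com/a/` and `https://example.org/x`, and Google returning `http://example.net/z`, `https://EXAMPLE.com/a` and `https://example.org/x#top`, the sketch keeps only the last two Google entries.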
Optionally, the method further comprises:
Arranging the other Google search results after the identical data.
Optionally, the method further comprises:
Screening out data from Baidu promotion and Baidu optimization.
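The two optional steps can be sketched as one post-processing pass. This sketch assumes each result is a small record with a `url` and a boolean `sponsored` flag marking Baidu promotion or Baidu-optimized entries; the patent does not describe how such entries are recognized, so the flag and the `Result` and `finalize` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    # Hypothetical flag marking a Baidu promotion or Baidu-optimized entry;
    # the patent does not say how such entries are detected.
    sponsored: bool = False


def finalize(baidu: list[Result], google: list[Result]) -> list[str]:
    """Screen Baidu entries, put common URLs first, then the other Google URLs."""
    # Screening: drop Baidu promotion / Baidu-optimized entries.
    baidu_urls = {r.url for r in baidu if not r.sponsored}
    # Google URLs in rank order, duplicates removed (dict keeps insertion order).
    google_urls = list(dict.fromkeys(r.url for r in google))
    # Identical data: URLs returned by both engines.
    common = [u for u in google_urls if u in baidu_urls]
    # Sequencing: the other Google results follow the identical data.
    others = [u for u in google_urls if u not in baidu_urls]
    return common + others
```

With Baidu returning `u1`, a sponsored `u2` and `u3`, and Google returning `u2`, `u3`, `u4`, `u1`, the sketch yields `["u3", "u1", "u2", "u4"]`: a URL that Baidu supplies only as a promoted entry is not counted as identical data, but it can still appear among the other Google results.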
In another aspect, a data capture system is provided, comprising:
A receiving unit, configured to receive the data range that the user needs to capture;
A search unit, configured to perform data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
A judging unit, configured to take the data that are identical in the Baidu search results and the Google search results as the result of this capture.
Optionally, the system further comprises:
A sequencing unit, configured to arrange the other Google search results after the identical data.
Optionally, the system further comprises:
A screening unit, configured to screen out data from Baidu promotion and Baidu optimization.
The technical solution provided by the specific embodiments of the present invention receives the data range that the user needs to capture, performs data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range, and takes the data identical in both the Baidu and Google search results as the result of this capture. Because it combines the advantages of Baidu and Google, it offers good accuracy.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data capture method provided by the present invention;
Fig. 2 is a structural diagram of a data capture system provided by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of a data capture method provided by the first preferred embodiment of the present invention. The method is performed by a server and, as shown in Fig. 1, comprises the following steps:
Step S101: receive the data range that the user needs to capture;
Step S102: perform data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
Step S103: take the data that are identical in the Baidu search results and the Google search results as the result of this capture.
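Steps S101 to S103 can be sketched as a single server-side function. Because the two searches in step S102 are independent, this sketch issues them concurrently with Python's standard `concurrent.futures`; that is a design choice, not something the patent prescribes. The search callables are hypothetical placeholders, since the patent does not describe how the Baidu and Google search algorithms are invoked, and no real search API is called here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical backend signature: takes the data range (e.g. a query string)
# and returns a ranked list of result URLs.
SearchFn = Callable[[str], list[str]]


def capture(data_range: str, baidu_search: SearchFn, google_search: SearchFn) -> list[str]:
    # S101: receive the data range the user needs to capture.
    if not data_range.strip():
        raise ValueError("data range must not be empty")
    # S102: run both searches; they are independent, so run them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        baidu_future = pool.submit(baidu_search, data_range)
        google_future = pool.submit(google_search, data_range)
        baidu_results = baidu_future.result()
        google_results = google_future.result()
    # S103: keep results returned by both engines, in Google's order, deduplicated.
    baidu_set = set(baidu_results)
    return [url for url in dict.fromkeys(google_results) if url in baidu_set]
```

With stub backends where Baidu returns `site1`, `site2`, `site3` and Google returns `site3`, `site4`, `site1` (each as `https://siteN.example/<query>`), `capture("weather", ...)` returns the `site3` and `site1` URLs in that order.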
The technical solution provided by the specific embodiments of the present invention receives the data range that the user needs to capture, performs data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range, and takes the data identical in both the Baidu and Google search results as the result of this capture. Because it combines the advantages of Baidu and Google, it offers good accuracy.
Optionally, after step S103, the method may further include:
Arranging the other Google search results after the identical data.
Optionally, after step S103, the method may further include:
Screening out data from Baidu promotion and Baidu optimization.
Referring to Fig. 2, Fig. 2 shows a data capture system provided by the second preferred embodiment of the present invention. The system comprises:
A receiving unit 201, configured to receive the data range that the user needs to capture;
A search unit 202, configured to perform data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
A judging unit 203, configured to take the data that are identical in the Baidu search results and the Google search results as the result of this capture.
The technical solution provided by the specific embodiments of the present invention receives the data range that the user needs to capture, performs data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range, and takes the data identical in both the Baidu and Google search results as the result of this capture. Because it combines the advantages of Baidu and Google, it offers good accuracy.
Optionally, the system may further include:
A sequencing unit 204, configured to arrange the other Google search results after the identical data.
Optionally, the system may further include:
A screening unit 205, configured to screen out data from Baidu promotion and Baidu optimization.
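Units 201 to 205 can be mapped onto a single class, one method per unit, with the two optional units switched on or off by flags. This is a hedged sketch rather than the patent's design: the class and method names, the representation of results as dicts with a `url` key, and the hypothetical `promoted` key used by the screening unit are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical backend signature: maps the data range to ranked results, each a
# dict with a "url" key and an optional (hypothetical) "promoted" key.
SearchFn = Callable[[str], list[dict]]


@dataclass
class DataCaptureSystem:
    baidu_search: SearchFn
    google_search: SearchFn
    use_screening: bool = True   # optional screening unit 205
    use_sequencing: bool = True  # optional sequencing unit 204

    def receive(self, data_range: str) -> str:
        # Receiving unit 201: accept the data range the user needs to capture.
        return data_range.strip()

    def search(self, data_range: str) -> tuple[list[dict], list[dict]]:
        # Search unit 202: query both engines with the same data range.
        return self.baidu_search(data_range), self.google_search(data_range)

    def screen(self, baidu: list[dict]) -> list[dict]:
        # Screening unit 205: drop Baidu promotion / Baidu-optimized entries.
        return [r for r in baidu if not r.get("promoted", False)]

    def judge(self, baidu: list[dict], google: list[dict]) -> list[str]:
        # Judging unit 203: keep URLs present in both result lists (Google order).
        baidu_urls = {r["url"] for r in baidu}
        return [u for u in dict.fromkeys(r["url"] for r in google) if u in baidu_urls]

    def sequence(self, common: list[str], google: list[dict]) -> list[str]:
        # Sequencing unit 204: append the remaining Google results after the common ones.
        common_set = set(common)
        rest = [u for u in dict.fromkeys(r["url"] for r in google) if u not in common_set]
        return common + rest

    def run(self, data_range: str) -> list[str]:
        query = self.receive(data_range)
        baidu, google = self.search(query)
        if self.use_screening:
            baidu = self.screen(baidu)
        common = self.judge(baidu, google)
        return self.sequence(common, google) if self.use_sequencing else common
```

Keeping each unit as a separate method mirrors the description's note that units may be merged, divided, or deleted as needed: disabling a flag removes an optional unit without touching the others.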
It should be noted that, for brevity, each of the foregoing method embodiments is expressed as a series of combined actions; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and units involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The steps of the method in the embodiments of the present invention may be reordered, merged, or deleted according to actual needs.
The units of the device in the embodiments of the present invention may be merged, divided, or deleted according to actual needs. Those skilled in the art may join or combine the different embodiments described in this specification and the features of those different embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, in firmware, or in a combination thereof. When implemented in software, the above functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that a computer can access. By way of example and not limitation, computer-readable media may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In addition, any connection may properly be termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. As used in the present invention, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
In summary, the foregoing are merely preferred embodiments of the technical solution of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A data capture method, characterized in that the method comprises the following steps:
Receiving the data range that the user needs to capture;
Performing data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
Taking the data that are identical in the Baidu search results and the Google search results as the result of this capture.
2. The method according to claim 1, characterized in that the method further comprises:
Arranging the other Google search results after the identical data.
3. The method according to claim 1, characterized in that the method further comprises:
Screening out data from Baidu promotion and Baidu optimization.
4. A data capture system, characterized in that the system comprises:
A receiving unit, configured to receive the data range that the user needs to capture;
A search unit, configured to perform data capture separately with the Baidu search algorithm and the Google search algorithm according to the data range;
A judging unit, configured to take the data that are identical in the Baidu search results and the Google search results as the result of this capture.
5. The system according to claim 4, characterized in that the system further comprises:
A sequencing unit, configured to arrange the other Google search results after the identical data.
6. The system according to claim 4, characterized in that the system further comprises:
A screening unit, configured to screen out data from Baidu promotion and Baidu optimization.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/077409 WO2017161578A1 (en) | 2016-03-25 | 2016-03-25 | Method and system for data capturing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105849730A (en) | 2016-08-10 |
Family ID: 56576345
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680000336.5A (Pending, published as CN105849730A) | 2016-03-25 | 2016-03-25 | Data capture method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105849730A (en) |
WO (1) | WO2017161578A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294802A (en) * | 2016-08-15 | 2017-01-04 | 马岩 | The grasping means of voice data and system |
CN106326373A (en) * | 2016-08-15 | 2017-01-11 | 马岩 | Grasping method and system of reliable video in big data |
WO2018027928A1 (en) * | 2016-08-12 | 2018-02-15 | 深圳市博信诺达经贸咨询有限公司 | Forum big data capturing method and system |
WO2018032247A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Search method and system for big data of videos |
WO2018032252A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Secure search method and system for big data on forums |
WO2018032253A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Secure search method and system for big data of images |
WO2018032251A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Method and system for applying security level to data fetching of big data |
WO2018032249A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Audio data fetching method and system |
WO2018032254A1 (en) * | 2016-08-15 | 2018-02-22 | 马岩 | Method and system for fetching trusted video in big data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070214158A1 (en) * | 2006-03-08 | 2007-09-13 | Yakov Kamen | Method and apparatus for conducting a robust search |
CN102004782A (en) * | 2010-11-25 | 2011-04-06 | 北京搜狗科技发展有限公司 | Search result sequencing method and search result sequencer |
CN102043834A (en) * | 2010-11-25 | 2011-05-04 | 北京搜狗科技发展有限公司 | Method for realizing searching by utilizing client and search client |
US20140122475A1 (en) * | 2012-10-29 | 2014-05-01 | Alibaba Group Holding Limited | Search result ranking method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079048A (en) * | 2006-05-24 | 2007-11-28 | 上海万纬信息技术有限公司 | Internet information search engine and method based on software robot exclusion standard |
CN101477554A (en) * | 2009-01-16 | 2009-07-08 | 西安电子科技大学 | User interest based personalized meta search engine and search result processing method |
-
2016
- 2016-03-25 CN CN201680000336.5A patent/CN105849730A/en active Pending
- 2016-03-25 WO PCT/CN2016/077409 patent/WO2017161578A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2017161578A1 (en) | 2017-09-28 |
Similar Documents
Publication | Title |
---|---|
CN105849730A (en) | Data capture method and system |
CN105683966A (en) | Searching method and searching system based on big data |
CN105706136A (en) | E-commerce platform analysis method and system based on big data |
CN106446009A (en) | Method and system for recommending house to user in agency app |
CN106250516A (en) | Synonym application process in big data search and system |
CN106294637A (en) | Realize the method and system of phonetic search |
CN106250538A (en) | Wechat is shared the method and system of big data |
CN106293528A (en) | Dropbox stores the method and system of big data |
CN106294010A (en) | The storage method and system of big data in distributed system |
CN106294011A (en) | The big date storage method of sort-type and system |
CN106250509A (en) | Key class searching method and system in big data |
CN106294645A (en) | Different part of speech realization method and systems in big data search |
CN106331323A (en) | Method and system for sorting apps according to place |
CN105683967A (en) | Web page grabbing method and web page grabbing system based on big data |
CN106294856A (en) | House matching process and system in house app |
CN106254662A (en) | Interior of mobile phone control method and system |
CN106326404A (en) | Method and system for searching house resources in internet |
CN106303036A (en) | The hidden method of app and system under specific use scene |
CN106253787A (en) | The rotating speed method and system of closed loop control horizontal coil winding machine |
CN106294818A (en) | Personalization realizes app sort method and system |
CN106022930A (en) | Logistic insurance premium claim settlement method and system |
CN106294643A (en) | Different language realizes real-time searching method and system in big data |
CN106250530A (en) | Key class searching method and system in big data |
CN106294711A (en) | Different part of speech realization method and systems in big data search |
CN106250531A (en) | Synonym application process in big data search and system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160810 |