CN104516956B - Incremental crawling method for website information - Google Patents

Incremental crawling method for website information

Info

Publication number
CN104516956B
Authority
CN
China
Prior art keywords
data
crawling
length
preset value
queue
Prior art date
Application number
CN201410783643.8A
Other languages
Chinese (zh)
Other versions
CN104516956A (en)
Inventor
刘学
脱立恒
董微
刘照邻
Original Assignee
中国科学院声学研究所
上海尚恩华科网络科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院声学研究所, 上海尚恩华科网络科技股份有限公司
Priority to CN201410783643.8A
Publication of CN104516956A
Application granted
Publication of CN104516956B

Abstract

The present invention discloses an incremental crawling method for website information. The method comprises: crawling data of a set length in the order in which the website presents its data, and placing the data into a data queue in that same order, the tail of the data queue being provided with a comparison window; checking the degree of duplication between the data in the comparison window and previously crawled data; and stopping the crawl when the degree of duplication reaches a preset value. Otherwise, the above process is repeated until the degree of duplication between the data in the comparison window and the previously crawled data reaches the preset value, at which point crawling stops. For incremental crawling of website information that is not strictly ordered by time, the invention reduces crawling cost while keeping the miss rate within an allowable bound. During the workflow, the "set crawl length" and the "data queue length" can be adjusted dynamically to improve the efficiency of the algorithm and to meet different requirements on miss rate and crawling cost.

Description

Incremental crawling method for website information

Technical Field

[0001] The present invention relates to the technical field of network information crawling, and in particular to an incremental crawling method for website information, suitable for incrementally crawling website information that is not strictly ordered by time.

Background Art

[0002] Since entering commercial use in the 1990s, the global Internet has expanded rapidly and has become an important information infrastructure driving economic development and social progress in today's world. Although Internet development in China started later than it did internationally, it has likewise grown rapidly since the beginning of the new century. On the service side, the Internet has penetrated every field, with especially rapid growth in information search, communication, commercial transactions and mobile wireless applications. On the user side, as of December 2013 the number of Internet users in China had reached 618 million, making China the country with the most Internet users in the world.

[0003] With the development of Internet technologies and services, the volume of information on the Internet has become enormous. Network aggregation services have emerged to help users obtain content of interest from this mass of information more conveniently. Network aggregation means providing users with useful, more targeted information on the basis of manual or machine selection, analysis and classification of the massive information and resources on the Internet (such as blogs, forums, film and television, music, supply and demand information, and files).

[0004] Network aggregation must first solve the problem of obtaining information from target websites. On one class of websites, the information is not strictly ordered by time. When crawling such sites incrementally, it is hard to tell which information has already been crawled and which is new, and verifying every newly crawled item against what was crawled before incurs a large crawling cost. Analysis of such websites shows that information near the front of the site, that is, popular or relatively new information, is not strictly ordered by time, whereas information further back, that is, less popular or older information, is ordered comparatively closely by time. Take a video content website as an example: the first page of each column usually contains both newly released metadata and metadata recommended by operators or by the system, so the page as a whole is not ordered by time; the metadata on the later pages of the column, however, is largely in time order. Incrementally crawling such a website calls for a method that minimizes crawling cost while keeping the miss rate within an allowable bound.

Summary of the Invention

[0005] The object of the present invention is to solve the technical problem, left open by the prior art, of how to reduce crawling cost at an allowable miss rate when incrementally crawling website information that is not strictly ordered by time, and thereby to provide an incremental crawling method for website information.

[0006] To achieve the above object, the present invention provides an incremental crawling method for website information. The method comprises: (a) starting from the first item presented on the target website, crawling data of a set length in the order in which the website presents its data; (b) placing the crawled data of the set length into a data queue in the website's presentation order, the tail of the data queue being provided with a comparison window; (c) calculating the degree of duplication between the crawled data of the set length within the comparison window and the previously crawled data; (d) stopping the crawl or performing the next crawl according to the calculated degree of duplication; that is, when the degree of duplication reaches a preset value, stopping the crawl; when the degree of duplication does not reach the preset value, performing the next crawl and then executing steps (b)-(d).
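Steps (a)-(d) can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `incremental_crawl`, its parameters, and the use of an in-memory list `site_items` in place of real page fetches are all hypothetical. The degree of duplication is assumed here to be the fraction of items in the comparison window that already appear in `seen`, which the text does not define precisely.

```python
from collections import deque


def incremental_crawl(site_items, seen, batch_len=10, window_len=None, preset=0.8):
    """Return the items of `site_items` judged new, stopping early once the
    comparison window is mostly made of items already in `seen`.

    `site_items` stands in for the website in presentation order (assumption).
    """
    window_len = window_len or batch_len
    new_items = []
    pos = 0
    while pos < len(site_items):
        # (a) crawl the next batch of the set length, in presentation order
        batch = site_items[pos:pos + batch_len]
        pos += len(batch)
        # (b) place the batch in the queue; the comparison window is its tail
        queue = deque(batch)
        window = list(queue)[-window_len:]
        # (c) degree of duplication: share of window items crawled before
        dup = sum(1 for item in window if item in seen) / len(window)
        # keep whatever in the queue has not been crawled before
        new_items.extend(item for item in queue if item not in seen)
        # (d) stop once the degree of duplication reaches the preset value
        if dup >= preset:
            break
    return new_items
```

With a preset value of 1.0 and batches of two items, a front page that mixes new and old entries keeps the crawl going, and the first batch made entirely of known items ends it, so the items behind that batch are never fetched.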

[0007] Preferably, when the degree of duplication reaches the preset value and the preset value is less than 1, the data in the data queue that does not duplicate previously crawled data is saved to a database, and the whole crawling process is then stopped.

[0008] Preferably, when the degree of duplication reaches the preset value and the preset value is 1, the whole crawling process is stopped.

[0009] Preferably, performing the next crawl specifically comprises: when the degree of duplication does not reach the preset value, saving to the database the data in the data queue that does not duplicate previously crawled data, and emptying the data queue; then, in the website's presentation order, continuing to crawl data of the set length from the position where the previous crawl ended.

[0010] Preferably, the set crawl length is less than or equal to the data queue length.

[0011] Preferably, when the degree of duplication does not reach the preset value, the set crawl length and the data queue length are adjusted dynamically while the next crawl is performed.

[0012] For incremental crawling of website information that is not strictly ordered by time, the present invention reduces crawling cost while keeping the miss rate within an allowable bound. During the workflow, the "set crawl length" and the "data queue length" can be adjusted dynamically to improve the efficiency of the algorithm and to meet different requirements on miss rate and crawling cost.

Brief Description of the Drawings

[0013] FIG. 1 is a flowchart of an incremental crawling method for website information according to an embodiment of the present invention;

[0014] FIG. 1A is a schematic diagram of how the crawled data is processed when the degree of duplication in FIG. 1 reaches the preset value;

[0015] FIG. 1B is a schematic diagram of performing the next crawl when the degree of duplication in FIG. 1 does not reach the preset value;

[0016] FIG. 2 is a schematic diagram of adjusting the set crawl length and the data queue during incremental crawling of website information according to an embodiment of the present invention.

Detailed Description

[0017] To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features and advantages of the embodiments more apparent, the technical solutions of the invention are described in further detail below with reference to the drawings and embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

[0018] FIG. 1 is a flowchart of the incremental crawling method for website information of the present invention.

[0019] As shown in FIG. 1, the incremental crawling method for website information according to an embodiment of the present invention comprises the following steps:

[0020] In step 101, starting from the first item presented on the target website, data of a set length is crawled in the website's presentation order. For example, if a video website holds 100 items, the set crawl length may be 10 items.

[0021] In step 102, the crawled data of the set length, namely items 1-10 of website information from step 101, is placed into the data queue in the website's presentation order; the tail of the data queue is provided with a comparison window. Preferably, the set crawl length is less than or equal to the data queue length.

[0022] In step 103, the degree of duplication between the crawled data of the set length within the comparison window, namely items 1-10 from step 101, and the previously crawled data is calculated; alternatively, several of items 1-10 may be sampled and the calculation performed on the sample. If items 1-10 are taken to be today's first crawl, the previously crawled data may be, for example, the information crawled from the same video website yesterday.
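The calculation in step 103, including the optional sampling, might look like the following sketch. The text does not give a formula, so the degree of duplication is assumed here to be the fraction of checked items that already appear in the previously crawled set; the name `duplication_degree` and the parameters `sample_size` and `rng` are hypothetical.

```python
import random


def duplication_degree(window, seen, sample_size=None, rng=None):
    """Fraction of items in `window` (or in a random sample of it) found in `seen`."""
    items = list(window)
    if not items:
        return 0.0
    # Optionally check only a random subset of the window to save lookups
    if sample_size is not None and sample_size < len(items):
        rng = rng or random.Random()
        items = rng.sample(items, sample_size)
    return sum(1 for item in items if item in seen) / len(items)
```

Sampling trades accuracy for fewer lookups against the store of previously crawled data: a small sample can over- or under-estimate the true degree of duplication, which affects the miss rate.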

[0023] In step 104, the crawl is stopped or the next crawl is performed according to the calculated degree of duplication. The decision rule is: when the degree of duplication reaches the preset value, stop crawling (see step 104a); when the degree of duplication does not reach the preset value, perform the next crawl (see step 104b) and then execute steps 102-104.

[0024] FIG. 1A is a schematic diagram of how the crawled data is processed when the degree of duplication in FIG. 1 reaches the preset value.

[0025] As shown in FIG. 1A, when the degree of duplication calculated in step 104 of FIG. 1 reaches the preset value, step 105 is executed (determining whether the preset value is less than 1). If it is less than 1, step 106 is executed (saving to the database the data in the data queue that does not duplicate previously crawled data), followed by step 104a of FIG. 1 (stopping the whole crawling process). If the preset value is 1, step 104a of FIG. 1 is executed directly (stopping the whole crawling process).

[0026] FIG. 1B is a schematic diagram of performing the next crawl when the degree of duplication in FIG. 1 does not reach the preset value.

[0027] As shown in FIG. 1B, when the degree of duplication calculated in step 104 of FIG. 1 does not reach the preset value, step 104b is executed (performing the next crawl). Performing the next crawl specifically comprises: executing step 107 (saving to the database the data in the data queue that does not duplicate previously crawled data, and emptying the data queue); then executing step 108 (continuing, in the website's presentation order, to crawl data of the set length from the position where the previous crawl ended, the set length being the same as that set in step 101 of FIG. 1). Step 102 of FIG. 1 is then executed, and the loop repeats until the degree of duplication between the data in the comparison window and the previously crawled data reaches the preset value, whereupon crawling stops.
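The branching described for FIG. 1A and FIG. 1B (steps 105-108) can be summarized in one helper. This is a sketch only: `handle_batch` is a hypothetical name, the queue and database are modeled as plain Python lists, and the caller is assumed to perform the actual crawl of step 108.

```python
def handle_batch(queue, seen, database, dup, preset):
    """Apply steps 105-107 to one batch; return True if crawling should continue."""
    fresh = [item for item in queue if item not in seen]
    if dup >= preset:
        # Steps 105/106: save the non-duplicate items only when preset < 1
        if preset < 1:
            database.extend(fresh)
        return False  # step 104a: stop the whole crawling process
    # Step 107: save the non-duplicate items and empty the queue
    database.extend(fresh)
    queue.clear()
    return True  # step 108: caller resumes from where the last crawl ended
```

When the preset value is 1 and the window covers the whole queue, every queued item is already known, which is consistent with the text skipping the save in that branch.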

[0028] FIG. 2 is a schematic diagram of adjusting the set crawl length and the data queue during incremental crawling of website information according to an embodiment of the present invention.

[0029] As shown in FIG. 2, the crawl length set in FIG. 1 can be adjusted dynamically. When the degree of duplication of the next crawl of the set length does not reach the preset value, step 201 is further executed (determining whether the degree of duplication of the next crawl is greater than that of the previous crawl). If so, step 202 is executed (adjusting the currently set crawl length, for example dynamically shortening it to reduce crawling cost), followed by step 204 (performing the crawl after next using the newly adjusted crawl length). If not, step 203 is executed (performing the crawl after next using the originally set crawl length).
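A possible reading of steps 201-204 is sketched below. The text only says the crawl length may be shortened when the degree of duplication rises from one crawl to the next; the halving factor `shrink`, the floor `min_len`, and the function name `next_crawl_length` are assumptions made for illustration.

```python
def next_crawl_length(current_len, prev_dup, curr_dup, shrink=0.5, min_len=1):
    """Choose the crawl length for the crawl after next (steps 201-204)."""
    # Step 201: did the degree of duplication rise compared with the previous crawl?
    if curr_dup > prev_dup:
        # Step 202: shorten the crawl length (the factor is an assumed policy)
        return max(min_len, int(current_len * shrink))
    # Step 203: keep the originally set crawl length
    return current_len
```

A rising degree of duplication suggests the crawl is approaching data it already holds, so smaller batches limit how many known items are fetched before the stop condition triggers.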

[0030] In summary, the beneficial effects of the present invention are as follows: for incremental crawling of website information that is not strictly ordered by time, crawling cost is reduced while the miss rate stays within an allowable bound. The "set crawl length" and the "data queue length" can be adjusted dynamically to improve the efficiency of the algorithm and to meet different requirements on miss rate and crawling cost.

[0031] It should be understood that the present invention is not limited to any particular service category, data category or target website category, and variations of the above also fall within the scope of protection of the present invention.

[0032] The specific embodiments described above explain the objects, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. An incremental crawling method for website information, characterized by comprising the following steps: (a) starting from the first item presented on the target website, crawling data of a set length in the order in which the website presents its data; (b) placing the crawled data of the set length into a data queue in the website's presentation order, the tail of the data queue being provided with a comparison window; (c) calculating the degree of duplication between the crawled data of the set length within the comparison window and the previously crawled data; (d) stopping the crawl or performing the next crawl according to the calculated degree of duplication; that is, when the degree of duplication reaches a preset value, stopping the crawl; when the degree of duplication does not reach the preset value, performing the next crawl, dynamically adjusting the set crawl length and the data queue length, and then executing steps (b)-(d).
2. The incremental crawling method for website information according to claim 1, characterized in that, when the degree of duplication reaches the preset value and the preset value is less than 1, the data in the data queue that does not duplicate previously crawled data is saved to a database, and the whole crawling process is then stopped.
3. The incremental crawling method for website information according to claim 1, characterized in that, when the degree of duplication reaches the preset value and the preset value is 1, the whole crawling process is stopped.
4. The incremental crawling method for website information according to claim 1, characterized in that performing the next crawl specifically comprises: when the degree of duplication does not reach the preset value, saving to the database the data in the data queue that does not duplicate previously crawled data, and emptying the data queue; and, in the website's presentation order, continuing to crawl data of the set length from the position where the previous crawl ended.
5. The incremental crawling method for website information according to claim 1, characterized in that the set crawl length is less than or equal to the data queue length.
CN201410783643.8A 2014-12-16 2014-12-16 Incremental crawling method for website information CN104516956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410783643.8A CN104516956B (en) 2014-12-16 2014-12-16 Incremental crawling method for website information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410783643.8A CN104516956B (en) 2014-12-16 2014-12-16 Incremental crawling method for website information

Publications (2)

Publication Number Publication Date
CN104516956A (en) 2015-04-15
CN104516956B (en) 2017-12-01

Family

ID=52792255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410783643.8A CN104516956B (en) 2014-12-16 2014-12-16 Incremental crawling method for website information

Country Status (1)

Country Link
CN (1) CN104516956B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956069A (en) * 2016-04-28 2016-09-21 优品财富管理有限公司 Network information collection and analysis method and network information collection and analysis system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101676907A (en) * 2008-09-16 2010-03-24 北京雷速科技有限公司 Method and system of directionally acquiring Internet resources
CN103544213A (en) * 2013-09-16 2014-01-29 青岛英网资讯股份有限公司 Network content upgrading detection assessment method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7877368B2 (en) * 2007-11-02 2011-01-25 Paglo Labs, Inc. Hosted searching of private local area network information with support for add-on applications

Also Published As

Publication number Publication date
CN104516956A (en) 2015-04-15

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01