CN102799610A - Method and system for collecting network information - Google Patents

Publication number
CN102799610A
Authority
CN
China
Prior art keywords
information
classification
collection
web page
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101805210A
Other languages
Chinese (zh)
Other versions
CN102799610B (en)
Inventor
赵勇
党书国
阎飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shuqin Technology Co Ltd
Original Assignee
BEIJING QILEKE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING QILEKE TECHNOLOGY CO LTD
Priority to CN201210180521.0A
Publication of CN102799610A
Application granted
Publication of CN102799610B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a system for collecting network information. The method comprises: acquiring, according to a collection instruction from a user, the information that the user wants to collect (the information to be collected); analyzing the information to be collected to determine its classification; and storing the information to be collected together with the determined classification, wherein the information to be collected comprises website information and/or information related to web page content that the user wants to collect. Preferably, in the determining step, at least one of text analysis, semantic analysis, and word frequency statistical analysis is performed, based on a preset keyword library, on the web page content information corresponding to the information to be collected, so as to judge whether that web page content information contains at least one keyword that is also in the preset keyword library; the classification of the information to be collected is then determined according to the judgment result. The method and the system make network collection more convenient for the user.

Description

Network information collecting method and system
Technical field
The present invention relates to network technology and, more particularly, to a method and system for collecting network information, used to collect network content such as web addresses and web page content.
Background art
With the spread and development of Internet technology, the number of websites, blogs, and microblogs, and the amount of content they carry, have grown rapidly. A number of technical solutions have emerged to help users collect network content.
In one collection mode, the user can add a website's address and title, or a web page's address and title, to the browser's favorites folder as a bookmark (consisting of the web page title and the corresponding link). When the user wants to visit a collected web page, clicking the corresponding bookmark in the favorites folder switches the browser to that page for reading. In this mode, if the user does not manually classify the collected bookmarks, that is, does not place each newly added bookmark under a category the user selects in the favorites folder, the browser simply places the collected links in the favorites folder as bookmarks in the order in which the user added them. As a result, when the user later wants to revisit a link through its bookmark, the bookmark may be hard to find. Moreover, if the user wants the collected links to be stored in the favorites folder by category, the user has to set this up manually, which brings unnecessary trouble.
In another collection mode, the user visits a web page provided by a network service provider for collecting website links, and enters into that page the name or link of the website to be collected together with its classification, so as to save links to favorite websites. In this mode, the user has to set or select the classification of the website manually, and may even have to type in the website name or link by hand. These tedious manual operations greatly reduce the user-friendliness of the collection system.
The present invention is proposed to better help users organize and manage the content they collect.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for collecting network information that can automatically classify the network information to be collected.
To solve the above technical problem, the present invention provides a method for collecting network information. The method comprises:
an acquiring step of acquiring, according to a collection instruction from a user, the information to be collected that the user wants to collect;
a determining step of analyzing the information to be collected to determine the classification of the information to be collected; and
a collecting step of storing the information to be collected and the determined classification of the information to be collected,
wherein the information to be collected comprises website information and/or information related to web page content that the user wants to collect.
According to another aspect of the invention, in the determining step, at least one of text analysis, semantic analysis, and word frequency statistical analysis is performed, based on a preset keyword library, on the web page content information corresponding to the information to be collected, so as to judge whether there is at least one keyword that is contained both in that web page content information and in the preset keyword library; the classification of the information to be collected is determined according to the judgment result. The preset keyword library comprises a plurality of keywords, each corresponding to one or more classifications. The web page content information corresponding to the information to be collected comprises the web page content pointed to by the website information in the information to be collected, part or all of the web page content of the website corresponding to that website information, and/or the web page content included in the information to be collected.
According to another aspect of the invention, when the judgment result is yes, the classification corresponding to the at least one keyword is determined as the classification of the information to be collected; alternatively, the classification corresponding to the one or more keywords, among the at least one keyword, that occur most often in the web page content information corresponding to the information to be collected is determined as the classification of the information to be collected.
According to another aspect of the invention, in the determining step, when the judgment result is no, the classification of the information to be collected is determined to be a preset default classification, or a classification specified by the user is determined as the classification of the information to be collected.
According to another aspect of the invention, the method further comprises a setting step of adding, deleting, or modifying keywords in the preset keyword library according to the user's instruction.
According to another aspect of the invention, in the determining step, at least one of text analysis, semantic analysis, and word frequency statistical analysis is performed on the web page content information corresponding to the information to be collected, so as to obtain one or more keywords that are contained in that web page content information and reflect its characteristics; the classification of the information to be collected is determined according to the one or more keywords. The web page content information corresponding to the information to be collected comprises the web page content pointed to by the website information in the information to be collected, part or all of the web page content of the website corresponding to that website information, and/or the web page content included in the information to be collected.
According to another aspect of the invention, in the determining step, the classification corresponding to the one or more keywords is determined as the classification of the information to be collected.
According to another aspect of the invention, in the determining step, when none of the classifications corresponding to the one or more keywords is a classification that the user has used in previous collection processes, the classification of the information to be collected is determined to be a preset default classification, or a classification specified by the user is determined as the classification of the information to be collected.
According to another aspect of the invention, the classification corresponding to each keyword is set in advance, or is determined by performing text analysis and/or semantic analysis.
According to another aspect of the invention, the web page content comprises all or part of the text, pictures, and/or video in the web page.
According to another aspect of the invention, when the information to be collected is website information that the user wants to collect, the acquiring step further comprises analyzing, according to a machine learning algorithm, the web page content information corresponding to that website information.
According to another aspect of the invention, the machine learning algorithm is naive Bayes, a support vector machine, latent Dirichlet allocation, and/or a neural network.
According to another aspect of the invention, a system for collecting network information is also provided. The system comprises: an acquiring unit that acquires, according to a collection instruction from a user, the information to be collected that the user wants to collect; a determining unit that analyzes the information to be collected to determine its classification; and a collecting unit that stores the information to be collected and the determined classification, wherein the information to be collected comprises website information and/or information related to web page content that the user wants to collect.
According to another aspect of the invention, the determining unit further performs, based on a preset keyword library, at least one of text analysis, semantic analysis, and word frequency statistical analysis on the web page content information corresponding to the information to be collected, so as to judge whether there is at least one keyword that is contained both in that web page content information and in the preset keyword library, and determines the classification of the information to be collected according to the judgment result. The preset keyword library comprises a plurality of keywords, each corresponding to one or more classifications. The web page content information corresponding to the information to be collected comprises the web page content pointed to by the website information in the information to be collected, part or all of the web page content of the website corresponding to that website information, and/or the web page content included in the information to be collected.
According to another aspect of the invention, when the judgment result is yes, the determining unit determines the classification corresponding to the at least one keyword as the classification of the information to be collected; alternatively, it determines as the classification of the information to be collected the classification corresponding to the one or more keywords, among the at least one keyword, that occur most often in the web page content information corresponding to the information to be collected.
According to one or more embodiments of the present invention, after the information to be collected is acquired according to the user's collection instruction, the information is analyzed and its classification is determined from the analysis result, without any manual involvement in the classification process. The information to be collected can thus be stored automatically in the favorites folder of the corresponding category, making network collection more convenient for the user.
In other words, one or more embodiments of the present invention solve the problem that, when the user does not classify manually, classifying collected content is tedious and the collected content is disorganized, so that the collection operation becomes faster and more convenient.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention and constitute a part of the description. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a schematic structural diagram of the network information collection system according to the present embodiment;
Fig. 2A and Fig. 2B are flowcharts of the network information collecting method according to the first and second embodiments of the present invention, respectively;
Fig. 3 is a flowchart of an example of network collection according to the present invention;
Fig. 4 is a flowchart of another example of network collection according to the present invention;
Fig. 5 is a flowchart of yet another example of network collection according to the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
It should be noted that, provided they do not conflict, the embodiments of the invention and the features of the embodiments may be combined with one another, and all such combinations fall within the scope of protection of the invention. In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
First embodiment
Fig. 1 is a schematic structural diagram of the network information collection system according to the present embodiment. As shown in Fig. 1, a server 10 is connected over a network to a plurality of clients 20.
It should be noted that only one server 10 is shown in Fig. 1; however, the server 10 of the present invention may also consist of multiple machines. For example, multiple computing devices in a cloud platform may jointly act as the server. The client 20 may be any of various computing devices, such as a computer, a personal digital assistant (PDA), a tablet computer, or a smartphone.
In addition, the connection between the server 10 and the client 20 may be either a wired network or a wireless network. A browser or other network information processing software for accessing network information may be installed on the client 20.
Fig. 2A shows the steps of the network information collecting method according to the present embodiment. The method according to the present embodiment is described below with reference to Fig. 2A.
Step 210: according to the user's collection instruction, acquire the information to be collected that the user wants to collect, where the information to be collected comprises website information and/or information related to web page content that the user wants to collect, for example a web page address, a web page title, or all or part of the web page content;
Step 220: analyze the information to be collected to determine the classification of the information to be collected;
Step 230: store the information to be collected and the determined classification of the information to be collected.
More specifically, in step 210, when the user clicks in the browser of the client 20, the browser receives the user's collection instruction. According to this instruction, the browser can acquire the information to be collected that the user wants to collect.
In one example, the user clicks a custom collection button or a browser plug-in in the browser, and in response to this operation (which corresponds to the user's collection instruction) the browser acquires the website information of the web page currently being visited as the information to be collected. Of course, the information to be collected is not limited to website information; it may also be web page content that the user wants to collect, including pictures, text, and even video in the web page.
Preferably, the browser may also send the received collection instruction to the server 10, and the server 10 acquires the information to be collected according to that instruction. This makes it convenient to store the information to be collected on the server 10 in step 230.
For example, the user clicks a custom button, browser plug-in, menu item, or the like in the browser for local collection or network collection. The server 10 receives from the browser a message indicating that this click event has occurred (the message corresponds to the user's collection instruction) and determines from it the information to be collected. For example, the message contains a web address, and the server 10 takes all or part of the content of the web page at that address as the information to be collected. More specifically, the server 10 may use a machine learning algorithm such as naive Bayes (Naive Bayesian Model, NBC), a support vector machine (SVM), latent Dirichlet allocation (LDA), or a neural network to analyze and download the web page content pointed to by the website information in the information to be collected, part or all of the web page content of the website corresponding to that website information, and/or the web page content included in the information to be collected.
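The naive Bayes option named above can be illustrated with a minimal sketch. The description does not specify features, training data, or smoothing, so everything below is an assumption made for illustration: the class name `NaiveBayesClassifier`, the bag-of-words representation, add-one smoothing, and the two toy training pages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training pages
        self.vocabulary = set()

    def train(self, words, category):
        """Record the words of one labelled page."""
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, words):
        """Return the category with the highest log posterior for the given words."""
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category in self.doc_counts:
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training pages, already split into words.
classifier = NaiveBayesClassifier()
classifier.train(["engine", "sedan", "audi", "engine"], "cars")
classifier.train(["recipe", "oven", "flour"], "cooking")
predicted = classifier.classify(["audi", "engine"])
```

A production system would more likely rely on an established machine learning library and a much larger labelled corpus; the sketch only shows how page words could be mapped to a classification.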
In addition, other software or modules on the client 20 may also perform the operation of acquiring the information to be collected according to the user's collection instruction.
The processing in step 220, in which the information to be collected is analyzed to determine its classification, is described in detail below.
First, based on the preset keyword library, the web page content information corresponding to the information to be collected is analyzed to judge whether there is at least one keyword that is contained both in that web page content information and in the preset keyword library.
For example, word frequency analysis can be used to count how many times each keyword in the keyword library occurs in the web page content information corresponding to the information to be collected. This completes the judgment described above, namely whether there is at least one keyword that is contained both in that web page content information and in the preset keyword library.
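The word frequency judgment described above might be sketched as follows. This is a minimal illustration under stated assumptions: the library contents, the function name `count_library_keywords`, and the simple regular-expression tokenizer are hypothetical, and text in languages without spaces between words (such as Chinese) would need a word segmentation step that is not shown here.

```python
import re
from collections import Counter

# Hypothetical preset keyword library: keyword -> corresponding classifications.
KEYWORD_LIBRARY = {
    "audi": ["cars"],
    "engine": ["cars", "technology"],
    "recipe": ["cooking"],
}


def count_library_keywords(page_text, library):
    """Return the occurrence count of each library keyword found in the page text."""
    words = re.findall(r"\w+", page_text.lower())
    frequencies = Counter(words)
    return {kw: frequencies[kw] for kw in library if frequencies[kw] > 0}


hits = count_library_keywords("The Audi engine: a new engine design.", KEYWORD_LIBRARY)
# The judgment result: is at least one library keyword present in the page?
contains_library_keyword = bool(hits)
```

Only the keywords in the library are looked up, so the cost of the judgment depends on the size of the library rather than on ranking every word in the page.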
As another example, semantic analysis or text analysis can be used to identify, in the web page content information corresponding to the information to be collected, keywords that reflect the characteristics of that web page content information, and then to judge whether those keywords are in the preset keyword library, thereby completing the judgment described above.
When the judgment result is yes, that is, when at least one keyword is found to be contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, the classifications corresponding to these keywords can be determined as the classification of the information to be collected. Note that there may be one or more such keywords, one keyword may correspond to multiple classifications, and multiple keywords may correspond to the same classification; the information to be collected may therefore have multiple classifications.
Preferably, when many keywords are found to be contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, the classifications corresponding to the one or more keywords among them that occur most often in that web page content information are determined as the classification of the information to be collected, which can improve classification accuracy.
In addition, text analysis, semantic analysis, and word frequency statistical analysis may be combined during the analysis, so as to identify the keywords that best reflect the characteristics of the web page content information corresponding to the information to be collected.
In addition, among the classifications corresponding to the keywords, only the one or several classifications to which the most keywords correspond may be taken as the classification of the information to be collected, so as to improve classification accuracy.
When the judgment result is no, that is, when no keyword is found to be contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, the classification of the information to be collected is determined to be a preset default classification, or a classification specified by the user is determined as the classification of the information to be collected.
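The two branches of the judgment can be put together in a short sketch: take the classifications of the most frequent matching keywords when the result is yes, and fall back to a default or user-specified classification when it is no. The names `classify`, `DEFAULT_CATEGORY`, `top_n`, and `user_category` are hypothetical, and the library contents are invented for illustration.

```python
import re
from collections import Counter

DEFAULT_CATEGORY = "uncategorized"  # hypothetical preset default classification


def classify(page_text, library, top_n=1, user_category=None):
    """Return the classifications of the top_n most frequent library keywords,
    or a fallback classification when no library keyword occurs in the text."""
    frequencies = Counter(re.findall(r"\w+", page_text.lower()))
    hits = Counter({kw: frequencies[kw] for kw in library if frequencies[kw] > 0})
    if not hits:
        # Judgment result is "no": prefer the user's classification, else the default.
        return [user_category or DEFAULT_CATEGORY]
    categories = []
    for keyword, _count in hits.most_common(top_n):
        for category in library[keyword]:
            if category not in categories:
                categories.append(category)
    return categories


# Hypothetical keyword library: keyword -> corresponding classifications.
library = {"audi": ["cars"], "engine": ["cars", "technology"], "recipe": ["cooking"]}
matched = classify("Audi engine and another engine", library)
fallback = classify("hello world", library)
```

Because one keyword may map to several classifications, the function returns a list; raising `top_n` widens the set of keywords whose classifications are used.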
The processing of step 220 may be performed by the client 20, by the browser or other software on the client 20, or by the server 10.
The method then proceeds to step 230, in which the information to be collected and its determined classification are stored.
It should be noted that the information to be collected and its determined classification may be stored either on the client 20 or on the server 10.
For example, when the user wants the website information being collected to be saved locally, so that the web page can conveniently be visited next time by clicking a bookmark in the browser, the browser installed on the client 20 can save the website information and web page title, under the classification determined in the determining step, as a browser bookmark. When the user wants to visit the bookmark again, it can easily be found by category. This removes the need for manual classification by the user and also improves user-friendliness.
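Storing a bookmark under its determined classification could look like the following sketch. The `Bookmark` and `Favorites` types are hypothetical and keep everything in memory; the description does not say how a browser or server would actually persist the data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """A collected item: web page title plus its link."""
    title: str
    url: str


class Favorites:
    """In-memory favorites folder grouped by classification (illustrative only)."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def add(self, bookmark, categories):
        """Store the bookmark under every classification determined for it."""
        for category in categories:
            if bookmark not in self._by_category[category]:
                self._by_category[category].append(bookmark)

    def in_category(self, category):
        """Return a copy of the bookmarks stored under one classification."""
        return list(self._by_category.get(category, []))


favorites = Favorites()
favorites.add(
    Bookmark("Example car review", "https://example.com/cars"),
    ["cars", "technology"],
)
```

Looking up `favorites.in_category("cars")` then returns the bookmark without the user ever having filed it by hand.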
Similarly, when the user wants to store the collected information on the network, the server 10 can store the collected information and its classification. The user can then conveniently access the collected content over the network by category.
It should be noted that, in each of the foregoing embodiments, the classification corresponding to each keyword may be set in advance, or may be determined by performing text analysis and/or semantic analysis.
In addition, the preset keyword library may be configured manually by developers or determined in advance by a program. A settings interface may also be provided to the user, so that keywords in the preset keyword library can be added, deleted, or modified according to the user's instructions.
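The add, delete, and modify operations on the keyword library can be sketched as a small class. The name `KeywordLibrary` and its method names are hypothetical; a real settings interface would sit on top of something like this and persist the changes.

```python
class KeywordLibrary:
    """Editable mapping from keyword to classifications (illustrative only)."""

    def __init__(self, entries=None):
        # Copy the lists so later edits do not affect the caller's data.
        self._entries = {kw: list(cats) for kw, cats in (entries or {}).items()}

    def add(self, keyword, categories):
        """Add a keyword with its classifications."""
        self._entries[keyword] = list(categories)

    def delete(self, keyword):
        """Remove a keyword; removing an unknown keyword is a no-op."""
        self._entries.pop(keyword, None)

    def modify(self, keyword, categories):
        """Replace the classifications of an existing keyword."""
        if keyword not in self._entries:
            raise KeyError(keyword)
        self._entries[keyword] = list(categories)

    def categories_for(self, keyword):
        """Return the classifications of a keyword, or an empty list."""
        return list(self._entries.get(keyword, []))


library = KeywordLibrary({"audi": ["cars"]})
library.add("recipe", ["cooking"])
library.modify("audi", ["cars", "brands"])
library.delete("recipe")
```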
In the present embodiment, the web page content is analyzed against the preset keyword library, so that word frequency analysis is performed only for the keywords in the preset keyword library rather than for every word in the web page content, which reduces the complexity of the analysis.
Second embodiment
Fig. 2B is a flowchart of the network information collecting method according to the present embodiment. In Fig. 2B, steps that are the same as or similar to those in Fig. 2A use the same reference numerals.
Steps 210 and 230 of the present embodiment are substantially the same as in the first embodiment and are therefore not described again in detail. In the present embodiment, step 221 replaces step 220, so that the web page information to be collected can be classified automatically without a keyword library being set in advance.
More specifically, in step 221 of the present embodiment, analysis such as text analysis, semantic analysis, and/or word frequency analysis is performed on the web page content information corresponding to the information to be collected, so as to obtain keywords that are contained in that web page content information and reflect its characteristics; the classification of the information to be collected is then determined according to these keywords.
Compared with the preceding preferred embodiment, this embodiment does not necessarily require a preset keyword library. Instead, text analysis, semantic analysis, and/or word frequency statistical analysis is performed directly on the web page content to determine one or more keywords that best reflect its characteristics. For example, the words that occur most frequently can be taken as keywords. Words with specific meanings can also be identified as keywords: for example, if car brand names such as Volkswagen or Audi occur repeatedly, semantic analysis can indicate that the web address belongs to the automotive category, and these words are determined to be keywords.
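Keyword extraction without a preset library can be approximated by ranking words by frequency after removing common function words. The sketch below covers only the word frequency part; the semantic step (recognizing that brand names imply the automotive category) is not shown. The stop-word list, the function name `extract_keywords`, and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def extract_keywords(page_text, top_n=3):
    """Return the top_n most frequent non-stop-words as candidate keywords."""
    words = re.findall(r"[a-z]+", page_text.lower())
    frequencies = Counter(
        word for word in words if word not in STOP_WORDS and len(word) > 2
    )
    return [word for word, _count in frequencies.most_common(top_n)]


text = "Audi unveils the new Audi sedan. The sedan joins the Audi lineup."
keywords = extract_keywords(text, top_n=2)
```

Raw frequency tends to favor generic words on real pages, so a weighting scheme or a larger stop-word list would likely be needed in practice.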
The classifications corresponding to the keywords obtained by the analysis can then be determined as the classification of the information to be collected.
Further, it may also be judged whether the classifications corresponding to these keywords are classifications that the user has used in previous collection processes. If not, the classification of the information to be collected is determined to be a preset default classification, or a classification specified by the user is determined as the classification of the information to be collected.
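The check against previously used classifications might be sketched as below. The text only states what happens when none of the candidate classifications has been used before; keeping just the previously used ones when some match is an assumption of this sketch, as are the function and parameter names.

```python
def resolve_categories(candidates, used_before, default="uncategorized",
                       user_category=None):
    """Keep candidate classifications the user has used before; otherwise
    fall back to the user-specified or default classification."""
    known = [category for category in candidates if category in used_before]
    if known:
        # Assumption: only previously used classifications are kept.
        return known
    return [user_category or default]


# Hypothetical history of classifications this user has already used.
history = {"cars", "news"}
resolved = resolve_categories(["cars", "brands"], history)
unresolved = resolve_categories(["brands"], history)
```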
In the present embodiment, the information to be collected can be classified automatically without a keyword library being set in advance.
Instance one
Fig. 3 is a flowchart of an example of network collection according to the present invention.
After the user collects a website or link that the user likes, the system automatically extracts keywords, classifies the collected content, and adds the collected content to the favorites folder of the corresponding classification.
The steps by which this example performs automatic keyword extraction to classify content in the network favorites are described in detail below:
Step 310: the user logs in to the network favorites;
Step 320: the user either adds the web address or link to be collected directly in the system and clicks collect, or, while browsing web pages, chooses what to collect through a browser plug-in customized for the system, in which case the system automatically collects for the user the link of the current page or the pictures and content that the user selects (corresponding to the information to be collected);
Step 330: the system automatically analyzes, compares, and extracts the content of the corresponding web address or link according to a machine learning algorithm such as naive Bayes, a support vector machine, LDA, or a neural network, and extracts keywords describing the content through algorithms such as word segmentation and word frequency statistics;
Step 340: the system performs text and semantic analysis on the extracted keywords to obtain the classification of the keywords;
Step 350: according to the classification result, the system adds the link or the collected content to the favorites folder of the corresponding category.
Example 2
Fig. 4 shows a flowchart of another example of network collection according to the present invention.
In this example, when the user collects a web address, a link, or page, blog, or microblog content, the system automatically classifies it into the corresponding collection.
The steps of this example are described in detail below.
Step 410: the user clicks the web address or link to be collected;
Step 420: the system's crawler automatically fetches the content of the corresponding web address or link;
Step 430: the system automatically performs text analysis, semantic analysis, word frequency statistics, and similar processing on the fetched content;
Step 440: the system extracts one or more keywords from the fetched content according to a predefined keyword library, thereby achieving automatic keyword extraction;
Step 450: the system automatically classifies the corresponding content according to the category to which the keywords belong;
Step 460: the system adds the corresponding web address or link to the collection of the corresponding category.
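Steps 440 and 450 can be sketched as follows, assuming a small hypothetical keyword library in which each keyword maps to one or more categories. The tie-breaking rule (take the categories of the most frequent matched keyword) and the fallback to a default category are one possible reading, and `KEYWORD_LIBRARY` and the function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical predefined keyword library: keyword -> one or more categories.
KEYWORD_LIBRARY = {
    "python": ["Programming"],
    "compiler": ["Programming"],
    "marathon": ["Sports", "Health"],
    "nutrition": ["Health"],
}

def match_keywords(content):
    """Step 440: count how often each library keyword occurs in the fetched content."""
    words = re.findall(r"[a-z]+", content.lower())
    return Counter(w for w in words if w in KEYWORD_LIBRARY)

def determine_categories(content, default="Default"):
    """Step 450: use the categories of the most frequent matched keyword."""
    hits = match_keywords(content)
    if not hits:
        # No library keyword found: fall back to a default category.
        return [default]
    keyword, _ = hits.most_common(1)[0]
    return KEYWORD_LIBRARY[keyword]
```

Because a keyword may belong to several categories, `determine_categories` returns a list; step 460 would then add the link to each of the returned collections.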
Example 3
Fig. 5 shows a flowchart of a further example of network collection according to the present invention.
In this example, what the user collects is page, blog, or microblog content. The specific steps of this example are as follows:
Step 510: the user clicks the page, blog, or microblog to be collected;
Step 520: the system automatically performs text analysis, semantic analysis, word frequency statistics, and similar processing on the corresponding content;
Step 530: substantially the same as step 440 above, and not repeated here;
Step 540: substantially the same as step 450 above, and not repeated here;
Step 550: the system adds the corresponding page, blog, or microblog to the collection of the corresponding category.
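Before the analysis in step 520 can run on a collected page or blog post, its visible text usually has to be separated from the markup. The source does not describe how this is done; the sketch below is one assumed approach using Python's standard `html.parser` module, and the names `TextExtractor` and `page_text` are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The resulting plain text can then be passed to whatever word segmentation and word frequency analysis the system applies.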
Thus, the system automatically sorts the web addresses, links, or page, blog, and microblog content that the user wants to collect into the collections of the categories to which they belong, which makes it much easier for the user to revisit them.
The system is intelligent: collecting is a user behavior that reflects the user's collecting habits and preferences, and the website can classify the content collected by the user automatically. Collection can be done in many ways: through the website, through applications on various operating platforms (for example, Android, iOS, and Windows Phone), or through a customized browser plug-in.
The above embodiments are described using a browser as an example of network information processing software. It should be noted that, alternatively, other network information processing software built into or installed on the client may also be used.
In general, as illustrated in the above embodiments, the client and the server are two different devices connected over a network; as a special case, however, when both a web server and a browser are installed on the same computer, the client and the server may be the same device.
Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
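As an illustration of the software-only case on a single computing device, the acquiring, determining, and collecting functions can be sketched as one small Python class. This is an assumed structure, not the patented implementation; `Favorite`, `CollectionSystem`, and their fields are hypothetical names, and the classifier is passed in as an arbitrary callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Favorite:
    url: str
    content: str

@dataclass
class CollectionSystem:
    # Determining function: any callable mapping content to a category name.
    determine: Callable[[str], str]
    # Storage used by the collecting function: category name -> stored favorites.
    store: Dict[str, List[Favorite]] = field(default_factory=dict)

    def acquire(self, url: str, content: str) -> Favorite:
        """Acquire what the user asked to collect."""
        return Favorite(url=url, content=content)

    def collect(self, url: str, content: str) -> str:
        """Run acquire -> determine -> store and return the chosen category."""
        item = self.acquire(url, content)
        category = self.determine(item.content)
        self.store.setdefault(category, []).append(item)
        return category
```

Because the classifier is injected, the same class works with a keyword-library matcher, a trained model, or a stub used for testing.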
Although the embodiments of the present invention are disclosed above, the content described is only an embodiment adopted to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the technical field to which the present invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be defined by the appended claims.

Claims (10)

1. A network information collection method, characterized by comprising:
an acquiring step of acquiring, according to a collection instruction of a user, information to be collected that the user wants to collect;
a determining step of analyzing the information to be collected to determine a classification of the information to be collected; and
a collecting step of storing the information to be collected and the determined classification of the information to be collected,
wherein
the information to be collected comprises web address information that the user wants to collect and/or information related to web page content.
2. The method according to claim 1, characterized in that, in the determining step,
based on a preset keyword library, at least one of text analysis, semantic analysis, and word frequency statistical analysis is performed on web page content information corresponding to the information to be collected, so as to judge whether there is at least one keyword that is contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, and the classification of the information to be collected is determined according to the judgment result, wherein
the preset keyword library comprises a plurality of keywords, each keyword corresponding to one or more of the classifications; and
the web page content information corresponding to the information to be collected comprises the web page content pointed to by the web address information in the information to be collected, part or all of the web page content of the website corresponding to the web address information in the information to be collected, and/or the web page content included in the information to be collected.
3. The method according to claim 2, characterized in that, in the determining step, when the judgment result is yes,
the classification corresponding to the at least one keyword is determined as the classification of the information to be collected; or
the classification corresponding to the one or more keywords, among the at least one keyword, that occur more frequently in the web page content information corresponding to the information to be collected is determined as the classification of the information to be collected.
4. The method according to claim 2, characterized in that, in the analyzing step, when the judgment result is no,
a preset default classification is determined as the classification of the information to be collected; or
a classification specified by the user is determined as the classification of the information to be collected.
5. The method according to any one of claims 2 to 4, characterized by further comprising:
a setting step of adding, deleting, or modifying keywords in the preset keyword library according to an instruction of the user.
6. The method according to claim 1, characterized in that, in the determining step,
at least one of text analysis, semantic analysis, and word frequency statistical analysis is performed on web page content information corresponding to the information to be collected, so as to obtain at least one keyword that is contained in the web page content information corresponding to the information to be collected and embodies the characteristics of that web page content information, and the classification of the information to be collected is determined according to the at least one keyword, wherein
the web page content information corresponding to the information to be collected comprises the web page content pointed to by the web address information in the information to be collected, part or all of the web page content of the website corresponding to the web address information in the information to be collected, and/or the web page content included in the information to be collected.
7. The method according to claim 6, characterized in that, in the determining step,
the classification corresponding to the one or more keywords is determined as the classification of the information to be collected.
8. The method according to claim 6, characterized in that, in the determining step, when none of the classifications corresponding to the one or more keywords is a classification previously used by the user in the course of collecting,
a preset default classification is determined as the classification of the information to be collected; or
a classification specified by the user is determined as the classification of the information to be collected.
9. The method according to claim 3, 7, or 8, characterized in that
the classification corresponding to each keyword is set in advance, or the classification corresponding to each keyword is determined by performing text analysis and/or semantic analysis.
10. A network information collection system, characterized by comprising:
an acquiring unit that acquires, according to a collection instruction of a user, information to be collected that the user wants to collect;
a determining unit that analyzes the information to be collected to determine a classification of the information to be collected; and
a collecting unit that stores the information to be collected and the determined classification of the information to be collected,
wherein
the information to be collected comprises web address information that the user wants to collect and/or information related to web page content.
CN201210180521.0A 2012-06-01 2012-06-01 Method and system for collecting network information Active CN102799610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210180521.0A CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210180521.0A CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Publications (2)

Publication Number Publication Date
CN102799610A true CN102799610A (en) 2012-11-28
CN102799610B CN102799610B (en) 2017-04-12

Family

ID=47198720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210180521.0A Active CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Country Status (1)

Country Link
CN (1) CN102799610B (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324669A (en) * 2013-05-20 2013-09-25 北京奇虎科技有限公司 Method and client for processing web page bookmark
CN103885980A (en) * 2012-12-21 2014-06-25 腾讯科技(深圳)有限公司 Bookmark adding method and browser
CN104077314A (en) * 2013-03-28 2014-10-01 腾讯科技(深圳)有限公司 Method and system for adding browser into favorites and terminal equipment
CN104123316A (en) * 2013-04-28 2014-10-29 腾讯科技(深圳)有限公司 Resource collection method, device and facility
CN104125264A (en) * 2013-04-28 2014-10-29 腾讯科技(深圳)有限公司 Resource collecting method, device and equipment
WO2014194766A1 (en) * 2013-06-06 2014-12-11 Tencent Technology (Shenzhen) Company Limited Method, system and apparatus for collecting a webpage
CN104391936A (en) * 2014-11-21 2015-03-04 百度在线网络技术(北京)有限公司 Method and device for processing tags in browser favorite
CN104899270A (en) * 2015-05-26 2015-09-09 惠州Tcl移动通信有限公司 Intelligent terminal and information storage method for same
CN105095517A (en) * 2015-09-17 2015-11-25 安一恒通(北京)科技有限公司 Method and device for sorting favorites of browser
CN105793846A (en) * 2016-01-21 2016-07-20 马岩 Method and system for sorting member information based on app
CN105893584A (en) * 2016-04-03 2016-08-24 北京设集约科技有限公司 Method, client and system for displaying website label of favorites
CN106095985A (en) * 2016-06-20 2016-11-09 网际傲游(北京)科技有限公司 A kind of dynamic collection the method for cluster web pages information
CN106570061A (en) * 2016-09-30 2017-04-19 维沃移动通信有限公司 Webpage tag management method and mobile terminal
US9674271B2 (en) 2013-04-28 2017-06-06 Tencent Technology (Shenzhen) Company Limited Platform for sharing collected information with third-party applications
CN107193981A (en) * 2017-05-26 2017-09-22 腾讯科技(深圳)有限公司 Collection file is shown, processing method and processing device, computer-readable storage medium and equipment
CN107436907A (en) * 2016-05-27 2017-12-05 中国联合网络通信集团有限公司 Web text classification integration method and device
CN108959316A (en) * 2017-05-24 2018-12-07 北京搜狗科技发展有限公司 A kind of method and apparatus adding a webpage to collection
CN109033306A (en) * 2018-07-17 2018-12-18 佛山市灏金赢科技有限公司 A kind of browsing webpage method for sorting and system for mobile client
CN109493845A (en) * 2019-01-02 2019-03-19 百度在线网络技术(北京)有限公司 For generating the method and device of audio
CN109657168A (en) * 2018-11-30 2019-04-19 维沃移动通信有限公司 A kind of collection record display methods and device
CN110059268A (en) * 2018-12-27 2019-07-26 阿里巴巴集团控股有限公司 Collect the determination method, apparatus and client device of object type
CN110351183A (en) * 2019-06-03 2019-10-18 阿里巴巴集团控股有限公司 Resource collecting method and device in instant messaging
CN115248803A (en) * 2022-09-22 2022-10-28 天津联想协同科技有限公司 Collection method and device suitable for network disk file, network disk and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1756160A (en) * 2004-09-27 2006-04-05 戴志军 Individualized website convenient for user accessing Internet
CN101382954A (en) * 2008-09-25 2009-03-11 北京搜狗科技发展有限公司 Method and system for providing web site collection name
CN102043805A (en) * 2009-10-19 2011-05-04 阿里巴巴集团控股有限公司 Method and device for generating Internet navigation page
CN102298614A (en) * 2011-07-29 2011-12-28 百度在线网络技术(北京)有限公司 Method for determining collection category of page collection information and device and equipment

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103885980A (en) * 2012-12-21 2014-06-25 腾讯科技(深圳)有限公司 Bookmark adding method and browser
CN104077314A (en) * 2013-03-28 2014-10-01 腾讯科技(深圳)有限公司 Method and system for adding browser into favorites and terminal equipment
CN104123316A (en) * 2013-04-28 2014-10-29 腾讯科技(深圳)有限公司 Resource collection method, device and facility
CN104125264A (en) * 2013-04-28 2014-10-29 腾讯科技(深圳)有限公司 Resource collecting method, device and equipment
CN104123316B (en) * 2013-04-28 2018-12-04 腾讯科技(深圳)有限公司 Resource collecting method, device and equipment
CN104125264B (en) * 2013-04-28 2017-08-25 腾讯科技(深圳)有限公司 Resource collecting method, device and equipment
US9674271B2 (en) 2013-04-28 2017-06-06 Tencent Technology (Shenzhen) Company Limited Platform for sharing collected information with third-party applications
CN103324669B (en) * 2013-05-20 2016-12-28 北京奇虎科技有限公司 A kind of method that Web page bookmark is processed and client
CN103324669A (en) * 2013-05-20 2013-09-25 北京奇虎科技有限公司 Method and client for processing web page bookmark
WO2014194766A1 (en) * 2013-06-06 2014-12-11 Tencent Technology (Shenzhen) Company Limited Method, system and apparatus for collecting a webpage
CN104391936A (en) * 2014-11-21 2015-03-04 百度在线网络技术(北京)有限公司 Method and device for processing tags in browser favorite
CN104899270A (en) * 2015-05-26 2015-09-09 惠州Tcl移动通信有限公司 Intelligent terminal and information storage method for same
CN105095517A (en) * 2015-09-17 2015-11-25 安一恒通(北京)科技有限公司 Method and device for sorting favorites of browser
WO2017124367A1 (en) * 2016-01-21 2017-07-27 马岩 App-based member information classification method and system
CN105793846A (en) * 2016-01-21 2016-07-20 马岩 Method and system for sorting member information based on app
CN105893584A (en) * 2016-04-03 2016-08-24 北京设集约科技有限公司 Method, client and system for displaying website label of favorites
CN107436907A (en) * 2016-05-27 2017-12-05 中国联合网络通信集团有限公司 Web text classification integration method and device
CN106095985A (en) * 2016-06-20 2016-11-09 网际傲游(北京)科技有限公司 A kind of dynamic collection the method for cluster web pages information
CN106095985B (en) * 2016-06-20 2019-08-20 网际傲游(北京)科技有限公司 A kind of method of dynamic collection and cluster web pages information
CN106570061A (en) * 2016-09-30 2017-04-19 维沃移动通信有限公司 Webpage tag management method and mobile terminal
CN108959316A (en) * 2017-05-24 2018-12-07 北京搜狗科技发展有限公司 A kind of method and apparatus adding a webpage to collection
CN108959316B (en) * 2017-05-24 2021-08-20 北京搜狗科技发展有限公司 Method and device for adding webpage to favorites
CN107193981A (en) * 2017-05-26 2017-09-22 腾讯科技(深圳)有限公司 Collection file is shown, processing method and processing device, computer-readable storage medium and equipment
CN109033306A (en) * 2018-07-17 2018-12-18 佛山市灏金赢科技有限公司 A kind of browsing webpage method for sorting and system for mobile client
CN109657168A (en) * 2018-11-30 2019-04-19 维沃移动通信有限公司 A kind of collection record display methods and device
CN110059268A (en) * 2018-12-27 2019-07-26 阿里巴巴集团控股有限公司 Collect the determination method, apparatus and client device of object type
CN109493845A (en) * 2019-01-02 2019-03-19 百度在线网络技术(北京)有限公司 For generating the method and device of audio
CN110351183A (en) * 2019-06-03 2019-10-18 阿里巴巴集团控股有限公司 Resource collecting method and device in instant messaging
CN110351183B (en) * 2019-06-03 2021-06-08 创新先进技术有限公司 Resource collection method and device in instant messaging
CN115248803A (en) * 2022-09-22 2022-10-28 天津联想协同科技有限公司 Collection method and device suitable for network disk file, network disk and storage medium
CN115248803B (en) * 2022-09-22 2023-02-17 天津联想协同科技有限公司 Collection method and device suitable for network disk file, network disk and storage medium

Also Published As

Publication number Publication date
CN102799610B (en) 2017-04-12

Similar Documents

Publication Publication Date Title
CN102799610A (en) Method and system for collecting network information
CN102663064B (en) A kind of disposal route of favorites data and device
CN102024064B (en) Rapid searching method and mobile communication terminal
CN107256232B (en) Information recommendation method and device
CN103780677A (en) Method for performing classified information push and system thereof
US20160364373A1 (en) Method and apparatus for extracting webpage information
CN104899220A (en) Application program recommendation method and system
CN102968451B (en) The browser form page loads method and the client of website data
WO2014180130A1 (en) Method and system for recommending contents
CN102298614A (en) Method for determining collection category of page collection information and device and equipment
CN101996193A (en) Processing method and system for expressing network resource link and internet terminal
CN105991722B (en) Downloader recommendation method, application server, terminal and system
CN104010035A (en) Method and system for application program distribution
CN112818111B (en) Document recommendation method, device, electronic equipment and medium
CN105279206A (en) Intelligent recommendation method and system
CN105260459B (en) Searching method and device
CN105930488A (en) Information search processing method and apparatus
CN103902579A (en) Method and device for acquiring information
CN106557584A (en) A kind of web site collection method and device
CN104753979B (en) A kind of method, server, terminal and system showing site information
CN103577426A (en) Method, device and system for providing additional application messages of searching suggestion
CN105550179A (en) Webpage collection method and browser plug-in
CN103634470A (en) Human-computer interaction prediction method based on terminal mobile data access network Qos
CN105893584A (en) Method, client and system for displaying website label of favorites
CN105989171A (en) Media file processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230206

Address after: Room 1102, 11 / F, building 2, dingchuang wealth center, 1166 liangmu Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Patentee after: ZHEJIANG SHUQIN TECHNOLOGY CO.,LTD.

Address before: 405C, Building 106, Lize Zhongyuan, Chaoyang District, Beijing, 100102

Patentee before: BEIJING QILEKE TECHNOLOGY Co.,Ltd.