CN105808545A - Forum data extraction method and forum data extraction apparatus - Google Patents

Forum data extraction method and forum data extraction apparatus

Publication number
CN105808545A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410840255.9A
Other languages
Chinese (zh)
Inventor
谢伟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
Original Assignee
TCL Corp

Abstract

The invention discloses a forum data extraction method and a forum data extraction apparatus. The forum data extraction method comprises the steps of obtaining a target forum address and a query object, and taking the target forum address as a current traversal address; by taking the query object as an index, searching for information related to the query object in a webpage corresponding to the current traversal address; judging whether the current traversal address contains a next page link address or not; if the current traversal address contains the next page link address, obtaining the next page link address of the current traversal address, taking the obtained next page link address as the current traversal address, and returning to perform the step of performing data search in the webpage corresponding to the current traversal address and the subsequent steps; and if the current traversal address does not contain the next page link address, displaying all found information related to the query object. According to the technical scheme provided by the invention, information released by a user name can be quickly searched for in a forum.

Description

Forum data extraction method and forum data extraction apparatus
Technical field
The present invention relates to the field of Internet technology, and in particular to a forum data extraction method and a forum data extraction apparatus.
Background
Forums are now among the most common sites on the Internet. Some users publish novels or other continuously updated content in a forum, and many other users comment on the topic or express their own opinions.
Most forums provide an in-forum search function that lets users find relevant articles. When a user enters a keyword, the forum back end traverses all link addresses in the forum, searches the webpages corresponding to those link addresses for articles related to the keyword, and presents the results to the user. However, forums today often contain advertisements or other embedded external websites, so the back end also traverses and searches these external websites when looking up data, which reduces search efficiency.
Summary of the invention
The present invention provides a forum data extraction method and a forum data extraction apparatus for improving the efficiency of searching for data in a forum.
One aspect of the present invention provides a forum data extraction method, comprising:
obtaining a target forum address and a query object, and taking the target forum address as the current traversal address;
performing a data search in the webpage corresponding to the current traversal address, wherein the data search comprises: using the query object as an index, searching the webpage corresponding to the current traversal address for information related to the query object;
judging whether the current traversal address has a next page link address;
if the current traversal address has a next page link address, obtaining the next page link address of the current traversal address, taking the obtained next page link address as the current traversal address, and returning to the step of performing a data search in the webpage corresponding to the current traversal address and the step of judging whether the current traversal address has a next page link address, wherein the next page link address of the current traversal address is the link address of the next page of the webpage corresponding to the current traversal address;
if the current traversal address has no next page link address, displaying all found information related to the query object.
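The steps above amount to a simple loop over pages. The following is a minimal sketch of that loop, not the patented implementation: the names `extract_forum_data`, `search_page`, and `find_next_page` are hypothetical, page fetching and parsing are passed in as callables, and the `visited` set is an added safeguard against pagination cycles that the text does not mention.

```python
from typing import Callable, List, Optional


def extract_forum_data(
    start_url: str,
    query: str,
    search_page: Callable[[str, str], List[str]],
    find_next_page: Callable[[str], Optional[str]],
) -> List[str]:
    """Walk a thread page by page and collect everything related to `query`.

    search_page(url, query) returns the matching items on one page.
    find_next_page(url) returns the next page link address, or None.
    """
    results: List[str] = []
    current: Optional[str] = start_url  # the current traversal address
    visited = set()  # assumption: guard against loops, not part of the source

    while current is not None and current not in visited:
        visited.add(current)
        results.extend(search_page(current, query))  # data search step
        current = find_next_page(current)            # next page judgment step

    return results  # the caller displays these once traversal ends
```

Because only the next page link is followed, links to advertisements or external sites on a page are never visited, which is the efficiency gain the text claims.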
Another aspect of the present invention provides a forum data extraction apparatus, comprising:
a first obtaining unit, configured to obtain a target forum address and a query object, and to take the target forum address as the current traversal address;
a first search unit, configured to use the query object as an index and search the webpage corresponding to the current traversal address for information related to the query object;
a judging unit, configured to judge whether the current traversal address has a next page link address;
a second obtaining unit, configured to, when the result of the judging unit is yes, obtain the next page link address of the current traversal address, take the obtained next page link address as the current traversal address, and trigger the first search unit, wherein the next page link address of the current traversal address is the link address of the next page of the webpage corresponding to the current traversal address;
a display unit, configured to, when the result of the judging unit is no, display all found information related to the query object.
It can be seen that, after obtaining the target forum address and the query object, the present invention starts from the target forum address and performs both the data search and the search for a next page link address. After obtaining a next page link address, it repeats both searches at that address, and so on, until the current traversal address has no next page link address. The scheme thereby filters out advertisements and other embedded external websites in the forum, saves the time that would be spent searching those external websites, and improves the efficiency of searching for data in the forum.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of a forum data extraction method provided by the present invention;
Fig. 2 is a schematic flowchart of another embodiment of a forum data extraction method provided by the present invention;
Fig. 3-a is a schematic diagram of a display interface of a forum data extraction system provided by the present invention;
Fig. 3-b is a schematic diagram of another display interface of a forum data extraction system provided by the present invention;
Fig. 4 is a schematic structural diagram of one embodiment of a forum data extraction apparatus provided by the present invention.
Detailed description
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
A forum data extraction method in an embodiment of the present invention is described below. Referring to Fig. 1, the forum data extraction method in this embodiment includes:
101. Obtain a target forum address and a query object, and take the target forum address as the current traversal address.
In this embodiment, a forum generally refers to a website that lets users communicate and hold discussions, publish articles, and reply to articles published by other users. For example, a bulletin board system (BBS) is one form of forum.
In this embodiment, after the forum data extraction apparatus starts, it provides the user with a first input control for editing the target forum address, a second input control for editing the query object (for example, a user name or a publishing time), and an execution control for triggering step 101.
In one implementation, when the forum data extraction apparatus starts, it detects whether the default browser is running. If the default browser is running, the apparatus extracts the URL in the default browser's address bar and enters it into the first input control. If the default browser is not running, the apparatus extracts the URL most recently visited in the default browser's address bar and enters it into the first input control. Although the apparatus fills in a URL automatically in this implementation, the user can still modify that URL in the first input control.
In another implementation, when the forum data extraction apparatus starts, the user manually enters the forum address and the user name into the first input control and the second input control, respectively, by typing or pasting. In either implementation, when the user triggers step 101 through the execution control, the apparatus takes the input in the first input control as the target forum address and the input in the second input control as the query object.
Note that the address entered in the first input control must be a forum address. Otherwise, when the user triggers step 101 through the execution control, step 101 and the subsequent steps cannot run properly. Optionally, in that case the apparatus outputs a message indicating that there are no search results or that the search failed.
102. Using the query object as an index, search the webpage corresponding to the current traversal address for information related to the query object.
In this embodiment, the forum data extraction apparatus uses the query object as an index and searches the webpage corresponding to the current traversal address for information related to the query object.
Optionally, the query object is a user name. The apparatus then uses the user name as an index and searches the webpage corresponding to the current traversal address for information published under that user name. Specifically, the apparatus analyzes the HyperText Markup Language (HTML) file of the webpage corresponding to the current traversal address. For example, the apparatus finds keywords such as "usename" and "content" in the HTML of the webpage; the field following "usename" is the user name, and the field following the nearest "content" after that "usename" is the information published under that user name.
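One way to picture the keyword lookup is a pair of regular expressions. This is only a sketch under stated assumptions: the keywords are assumed to appear inside a tag (for example as a class name), and "the field following" a keyword is assumed to be the text up to the next `<`. The function name `posts_by_user` is hypothetical, and real forum markup varies widely.

```python
import re
from typing import List

# Assumption: the keyword sits inside a tag, and its "field" is the text
# that follows the tag's closing ">" up to the next "<".
FIELD = r"{kw}[^>]*>([^<]*)<"


def posts_by_user(
    html: str,
    target: str,
    user_kw: str = "usename",   # spelled as in the source text
    content_kw: str = "content",
) -> List[str]:
    """Return the content fields that follow each occurrence of `target`."""
    user_re = re.compile(FIELD.format(kw=re.escape(user_kw)))
    content_re = re.compile(FIELD.format(kw=re.escape(content_kw)))
    found: List[str] = []
    for m in user_re.finditer(html):
        if m.group(1).strip() != target:
            continue
        # Nearest "content" field after this user name.
        c = content_re.search(html, m.end())
        if c:
            found.append(c.group(1).strip())
    return found
```

A limitation of this sketch: if a post has no content field, the search picks up the next post's content, so a production parser would need to respect post boundaries.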
Optionally, the query object is a publishing time. The apparatus then uses the publishing time as an index and searches the webpage corresponding to the current traversal address for information published within that publishing time. Specifically, the apparatus analyzes the HTML file of the webpage corresponding to the current traversal address. For example, the apparatus finds keywords such as "time" and "replaytime" in the HTML of the webpage; the field following "time" or "replaytime" is the publishing time, and the field following the nearest "content" after that "time" or "replaytime" is the information published within that publishing time.
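The time-based lookup can be sketched the same way. The assumptions here go beyond the text and are illustrative only: the time field is assumed to look like `2014-10-28 09:15`, "within that publishing time" is interpreted as "on the same calendar day", and `posts_on_date` is a hypothetical name.

```python
import re
from datetime import date, datetime
from typing import List

# "replaytime" is listed first so the longer keyword wins over "time".
TIME_RE = re.compile(r"(?:replaytime|time)[^>]*>([^<]*)<")
CONTENT_RE = re.compile(r"content[^>]*>([^<]*)<")


def posts_on_date(html: str, day: date, fmt: str = "%Y-%m-%d %H:%M") -> List[str]:
    """Return the content fields whose preceding time field falls on `day`."""
    found: List[str] = []
    for m in TIME_RE.finditer(html):
        try:
            published = datetime.strptime(m.group(1).strip(), fmt)
        except ValueError:
            continue  # not a timestamp in the assumed format
        if published.date() != day:
            continue
        # Nearest "content" field after this time field.
        c = CONTENT_RE.search(html, m.end())
        if c:
            found.append(c.group(1).strip())
    return found
```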
Of course, the query object may also be another type of object, or a combination of several types of object, such as a user name combined with a publishing time; no limitation is imposed here.
103. Judge whether the current traversal address has a next page link address.
The forum data extraction apparatus judges whether the current traversal address has a next page link address. If it does, step 104 is performed; if it does not, step 105 is performed.
Optionally, the apparatus accesses the current traversal address in the background or the foreground and performs a deep traversal of it to obtain all linkable addresses in the current traversal address, then confirms those linkable addresses that begin with the current traversal address as next page link addresses of the current traversal address. If the apparatus detects that the current traversal address contains no linkable address, or that none of the linkable addresses obtained begins with the current traversal address, it determines that the current traversal address has no next page link address and performs step 105. Otherwise, it determines that the current traversal address has a next page link address and performs step 104. Specifically, the apparatus may obtain the HTML file of the webpage corresponding to the current traversal address and extract all linkable addresses in the current traversal address by finding the href attributes in that HTML file.
For example, suppose the target forum URL is http://bbs.abc.net/topics/3907. After traversing this URL as the current traversal address, the apparatus obtains a number of linkable addresses, including the following three:
http://bbs.abc.net/topics/3907?page=2#,
http://www.abc.net/article/20141028/28, and
http://bbs.abc.net/topics/3907?page=2#new_post.
Of these three URLs, the apparatus confirms those that begin with the current traversal address "http://bbs.abc.net/topics/3907" as next page link addresses of the current traversal address, and determines that the current traversal address has a next page link address.
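The href extraction and prefix test described above can be sketched with Python's standard `html.parser`. The names `LinkCollector` and `next_page_candidates` are hypothetical. The sketch assumes every href is an absolute URL, and it excludes links identical to the current address, which is an added assumption rather than something the text states.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the value of every href attribute in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def next_page_candidates(html: str, current_url: str) -> List[str]:
    """Keep only links that begin with the current traversal address."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        url
        for url in collector.links
        if url.startswith(current_url) and url != current_url
    ]
```

Applied to a page containing the three example links, the filter keeps the two `bbs.abc.net/topics/3907?page=2#...` addresses and drops the `www.abc.net` article link. An empty result corresponds to "no next page link address".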
A topic (or post) in a forum usually spans multiple pages, the URLs of those pages differ only in the parameter value of the pagination format in the URL, and the pagination format used within one forum is fixed (for example, the pagination format used in the CSDN forum is "page=[page number]#"). Efficiency can therefore be improved as follows. Optionally, if the current traversal address contains a pagination format, the apparatus increments the value of the pagination format in the current traversal address by one and attempts to access the resulting address. If the access succeeds, the apparatus determines that the current traversal address has a next page link address, namely the address obtained by modifying the pagination value, and performs step 104. If the access fails, the apparatus determines that the current traversal address has no next page link address and performs step 105. Specifically, several known pagination formats can be pre-configured in this embodiment; when the current traversal address contains any one of the pre-configured pagination formats, the apparatus determines that the current traversal address contains a pagination format. For example, suppose the current traversal address is http://bbs.abc.net/topics/3907?page=1#. After traversing the current traversal address, the apparatus detects that it contains a pagination format (such as "page=[page number]#"), increments the value of the pagination format by one to obtain the address "http://bbs.abc.net/topics/3907?page=2#", and attempts to access "http://bbs.abc.net/topics/3907?page=2#". If the access succeeds, the apparatus determines that the current traversal address has a next page link address, that the next page link address is "http://bbs.abc.net/topics/3907?page=2#", and performs step 104. If the access fails, the apparatus determines that the current traversal address has no next page link address and performs step 105.
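The pagination shortcut can be sketched as a URL rewrite followed by an access probe. This is an illustration, not the apparatus itself: `PAGINATION_FORMATS`, `increment_page`, and `next_page_by_format` are hypothetical names, only the "page=N#" format from the example is configured, and the access attempt is passed in as a callable so that the sketch makes no network requests.

```python
import re
from typing import Callable, Optional

# Pre-configured pagination formats. Only the "page=<n>#" format from the
# example is listed; others would be added as further patterns with the
# same three groups (prefix, number, suffix).
PAGINATION_FORMATS = [re.compile(r"(page=)(\d+)(#)")]


def increment_page(url: str) -> Optional[str]:
    """Return `url` with its page number plus one, or None if no format matches."""
    for fmt in PAGINATION_FORMATS:
        m = fmt.search(url)
        if m:
            bumped = m.group(1) + str(int(m.group(2)) + 1) + m.group(3)
            return url[: m.start()] + bumped + url[m.end():]
    return None


def next_page_by_format(url: str, can_access: Callable[[str], bool]) -> Optional[str]:
    """Probe the incremented URL; a failed access means there is no next page."""
    candidate = increment_page(url)
    if candidate is not None and can_access(candidate):
        return candidate  # access succeeded: continue the traversal here
    return None           # no format or access failed: display the results
```

With the example address, `increment_page("http://bbs.abc.net/topics/3907?page=1#")` yields the `page=2#` address. What counts as a successful access (status code, non-empty page, and so on) is left to the `can_access` callable because the text does not define it.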
104. Obtain the next page link address of the current traversal address, take the obtained next page link address as the current traversal address, and return to step 102.
105. Display all found information related to the query object.
Optionally, the apparatus counts the pieces of found information related to the query object and displays that count together with the found information. Further, when the count exceeds a preset threshold, the apparatus displays the found information in paged mode. Specifically, the number of items shown per page in paged mode is set in advance, and the total number of pages is then calculated from the count of found information and the preset number of items per page.
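The page count follows from a ceiling division of the result count by the items per page. A minimal sketch, with the hypothetical helper names `total_pages` and `page_slice`:

```python
from typing import List, Sequence


def total_pages(result_count: int, per_page: int) -> int:
    """Number of pages needed to show `result_count` items."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return -(-result_count // per_page)  # ceiling division on integers


def page_slice(results: Sequence[str], page: int, per_page: int) -> List[str]:
    """Items shown on the 1-based page number `page`."""
    start = (page - 1) * per_page
    return list(results[start:start + per_page])
```

For instance, 45 results at 20 per page need 3 pages, and the third page holds the last 5 items.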
Optionally, so that the user can view the search results on an external device, the apparatus stores all found information related to the query object as a text file or another type of file, so that the file containing that information can be sent to the external device.
Optionally, if the query object is a user name, step 102 further includes: searching the webpage corresponding to the current traversal address for information published under user names other than the query object, and, using the combination of a topic identifier in the forum (such as a topic ID or topic name) and a user name as the category, storing the information published under each found user name in a database by category, or updating it there. When the user later needs information published under another user name in the same forum, the storage path of that information can be found directly from the category and the information viewed.
Optionally, before step 102, the apparatus receives a user name and password, entered by the user, for logging in to the forum corresponding to the target forum address, and logs in with them. If the login succeeds, step 102 and the subsequent steps are performed. If the login fails, the user is asked to re-enter the user name and password for logging in to that forum, until the login succeeds.
Note that the forum data extraction apparatus in this embodiment may be a smartphone, a tablet computer, a notebook computer, or another device that can connect to the Internet; no limitation is imposed here.
It can be seen that, after obtaining the target forum address and the query object, the present invention starts from the target forum address and performs both the data search and the search for a next page link address. After obtaining a next page link address, it repeats both searches at that address, and so on, until the current traversal address has no next page link address. The scheme thereby filters out advertisements and other embedded external websites in the forum, saves the time that would be spent searching those external websites, and improves the efficiency of searching for data in the forum.
A forum data extraction method in an embodiment of the present invention is described below using a specific embodiment in which the query object is a user name. Referring to Fig. 2, the forum data extraction method in this embodiment includes:
201. Obtain a target forum address and a target user name, and take the target forum address as the current traversal address.
In this embodiment, a forum generally refers to a website that lets users communicate and hold discussions, publish articles, and reply to articles published by other users. For example, a BBS is one form of forum.
In this embodiment, after the forum data extraction apparatus starts, it provides the user with a first input control for editing the target forum address, a second input control for editing the target user name, and an execution control for triggering step 201.
In one implementation, when the forum data extraction apparatus starts, it detects whether the default browser is running. If the default browser is running, the apparatus extracts the URL in the default browser's address bar and enters it into the first input control. If the default browser is not running, the apparatus extracts the URL most recently visited in the default browser's address bar and enters it into the first input control. Although the apparatus fills in a URL automatically in this implementation, the user can still modify that URL in the first input control.
In another implementation, when the forum data extraction apparatus starts, the user manually enters the forum address and the user name into the first input control and the second input control, respectively, by typing or pasting. In either implementation, when the user triggers step 201 through the execution control, the apparatus takes the input in the first input control as the target forum address and the input in the second input control as the target user name.
Note that the address entered in the first input control must be a forum address. Otherwise, when the user triggers step 201 through the execution control, step 201 and the subsequent steps cannot run properly. Optionally, in that case the apparatus outputs a message indicating that there are no search results or that the search failed.
202. Search the webpage corresponding to the current traversal address for the information published under the target user name and the information published under other user names.
Optionally, the apparatus analyzes the HTML file of the webpage corresponding to the current traversal address and searches for the information published under the target user name. For example, the apparatus finds keywords such as "usename" and "content" in the HTML of the webpage; the field following "usename" is the user name, and the field following the nearest "content" after that "usename" is the information published under that user name.
203. Judge whether the current traversal address has a next page link address.
The forum data extraction apparatus judges whether the current traversal address has a next page link address. If it does, step 204 is performed; if it does not, steps 205 and 206 are performed.
Optionally, the apparatus accesses the current traversal address in the background or the foreground and performs a deep traversal of it to obtain all linkable addresses in the current traversal address, then confirms those linkable addresses that begin with the current traversal address as next page link addresses of the current traversal address. If the apparatus detects that the current traversal address contains no linkable address, or that none of the linkable addresses obtained begins with the current traversal address, it determines that the current traversal address has no next page link address and performs steps 205 and 206. Otherwise, it determines that the current traversal address has a next page link address and performs step 204. Specifically, the apparatus may obtain the HTML file of the webpage corresponding to the current traversal address and extract all linkable addresses in the current traversal address by finding the href attributes in that HTML file.
For example, suppose the target forum URL is http://bbs.abc.net/topics/3907. After traversing this URL as the current traversal address, the apparatus obtains a number of linkable addresses, including the following three:
http://bbs.abc.net/topics/3907?page=2#,
http://www.abc.net/article/20141028/28, and
http://bbs.abc.net/topics/3907?page=2#new_post.
Of these three URLs, the apparatus confirms those that begin with the current traversal address "http://bbs.abc.net/topics/3907" as next page link addresses of the current traversal address, and determines that the current traversal address has a next page link address.
A topic (or post) in a forum usually spans multiple pages, the URLs of those pages differ only in the parameter value of the pagination format in the URL, and the pagination format used within one forum is fixed (for example, the pagination format used in the CSDN forum is "page=[page number]#"). Efficiency can therefore be improved as follows. Optionally, if the current traversal address contains a pagination format, the apparatus increments the value of the pagination format in the current traversal address by one and attempts to access the resulting address. If the access succeeds, the apparatus determines that the current traversal address has a next page link address, namely the address obtained by modifying the pagination value, and performs step 204. If the access fails, the apparatus determines that the current traversal address has no next page link address and performs steps 205 and 206. Specifically, several known pagination formats can be pre-configured in this embodiment; when the current traversal address contains any one of the pre-configured pagination formats, the apparatus determines that the current traversal address contains a pagination format. For example, suppose the current traversal address is http://bbs.abc.net/topics/3907?page=1#. After traversing the current traversal address, the apparatus detects that it contains a pagination format (such as "page=[page number]#"), increments the value of the pagination format by one to obtain the address "http://bbs.abc.net/topics/3907?page=2#", and attempts to access "http://bbs.abc.net/topics/3907?page=2#". If the access succeeds, the apparatus determines that the current traversal address has a next page link address, that the next page link address is "http://bbs.abc.net/topics/3907?page=2#", and performs step 204. If the access fails, the apparatus determines that the current traversal address has no next page link address and performs steps 205 and 206.
204. Obtain the next-page link address of the current traversal address, take the obtained next-page link address as the current traversal address, and return to step 202.
205. Display the found information posted by the target user name.
Optionally, the forum data extraction apparatus counts the found items of information posted by the target user name and displays this count together with the information. Further, when the count exceeds a preset threshold, the forum data extraction apparatus displays the found information in a paged mode. Specifically, the number of items shown per page in the paged mode is set in advance, and the total number of pages can then be calculated from the count of found items and this preset per-page number.
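The page-count calculation mentioned above amounts to a ceiling division. The following sketch assumes 1-based page numbers; the function names total_pages and page_of are illustrative only.

```python
import math
from typing import List, Sequence


def total_pages(message_count: int, per_page: int) -> int:
    """Total number of display pages for message_count items at per_page items per page."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(message_count / per_page)


def page_of(messages: Sequence[str], page_number: int, per_page: int) -> List[str]:
    """Return the items shown on the given 1-based page."""
    start = (page_number - 1) * per_page
    return list(messages[start:start + per_page])
```

For instance, 31 found items at 10 items per page would need 4 pages, the last of which holds a single item.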
Optionally, so that the user can conveniently view the search results on an external device, the forum data extraction apparatus stores all found information posted by the target user name as a text file or another type of file, so that the file containing this information can be sent to the external device.
206. Using the combination of topic identifier and user name in the forum as the category, store the found information posted by each user name in the database by category, or update it there.
In this embodiment of the present invention, the topic identifier is specifically a topic ID or a topic name.
The forum data extraction apparatus uses the combination of topic identifier and user name in the forum as the category and stores or updates the found information posted by each user name in the database by category; that is, the information posted by the same user name in the same topic is stored at the same location in the database.
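As a rough illustration of this classification, the sketch below groups found posts under a (topic identifier, user name) key using an in-memory dictionary in place of the database. The Post record layout and the function name classify_posts are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (topic_id, user_name, text).
Post = Tuple[str, str, str]


def classify_posts(posts: Iterable[Post]) -> Dict[Tuple[str, str], List[str]]:
    """Group post texts under their (topic_id, user_name) category."""
    store: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for topic_id, user_name, text in posts:
        store[(topic_id, user_name)].append(text)
    return dict(store)
```

Everything posted by one user name in one topic ends up in the same list, which corresponds to "the same location in the database" in the description above.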
Optionally, in step 206, the forum data extraction apparatus stores the information posted by the same user name in the same topic in a single text file in the database and uses the total number of messages contained in that text file as the update flag; that is, the forum data extraction apparatus judges from the total number of messages in the text file whether the information in the file currently needs to be updated. For example, suppose user name B is found to have posted 30 messages in topic A, and a text file categorized by topic A and user name B is already stored. The forum data extraction apparatus checks the number of messages contained in that text file. If the detected total is less than 30, the forum data extraction apparatus updates the information in the text file, that is, it stores in the text file the messages posted by user B in topic A that are not yet stored there. If the detected total equals 30, the forum data extraction apparatus does not update the text file. Optionally, the forum data extraction apparatus stores a table containing the topic ID, user name, total number of messages, and storage path, so that the user can use this table to quickly locate the storage path of the information posted by a given user name in a given topic; the forum data extraction apparatus can also judge from the total number of messages in this table whether the stored information needs to be updated. Specifically, the table may be as shown in Table 1:
Table 1
No.  Topic ID   User name               Total messages  Storage path
1    390755981  Xewenfung               25              C:/a.txt
2    390755981  Cainiao                 28              C:/b.txt
3    390755981  Han Han Semen Phaseoli  30              C:/c.txt
4    390755971  Xewenfung               10              C:/d.txt
As Table 1 shows, the forum data extraction apparatus uses the combination of topic ID and user name in the forum as the category and stores the information posted by the same user name in the same topic as a text file. The table also records the total number of messages contained in each text file, so that the forum data extraction apparatus can judge from this total whether a text file needs to be updated.
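The role of such a table can be sketched as an index keyed by topic ID and user name. The entries below are taken from Table 1; the names IndexRow, storage_path and needs_update are hypothetical, and treating a missing entry as "needs storing" is an assumption of this sketch rather than something stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class IndexRow:
    total: int  # total number of messages stored in the text file
    path: str   # storage path of the text file


Index = Dict[Tuple[str, str], IndexRow]

# A few rows from Table 1, keyed by (topic ID, user name).
INDEX: Index = {
    ("390755981", "Xewenfung"): IndexRow(25, "C:/a.txt"),
    ("390755981", "Cainiao"): IndexRow(28, "C:/b.txt"),
    ("390755971", "Xewenfung"): IndexRow(10, "C:/d.txt"),
}


def storage_path(index: Index, topic_id: str, user_name: str) -> Optional[str]:
    """Look up where the messages of user_name in topic_id are stored."""
    row = index.get((topic_id, user_name))
    return row.path if row else None


def needs_update(index: Index, topic_id: str, user_name: str, found_total: int) -> bool:
    """Update when fewer messages are stored than were found (assumption: also when no entry exists)."""
    row = index.get((topic_id, user_name))
    return row is None or row.total < found_total
```

With these rows, finding 30 messages by Xewenfung in topic 390755981 would trigger an update of C:/a.txt, because only 25 are recorded, while finding 25 would not.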
Optionally, the information stored by category may also be exported to an external device, so that the user can view on the external device the information stored in step 206.
Optionally, before step 202, the forum data extraction apparatus receives a user name and password entered by the user for logging in to the forum corresponding to the target forum address, and logs in with the entered user name and password. If the login succeeds, step 202 and the subsequent steps are performed. If the login fails, the user is asked to re-enter the user name and password for logging in to the forum corresponding to the target forum address, until the login succeeds.
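The retry behaviour of this optional login step can be sketched as follows. The sketch models repeated user input as an iterable of (user name, password) pairs and takes the actual login attempt as a caller-supplied function; both login_until_success and try_login are hypothetical names, and no real forum login API is implied.

```python
from typing import Callable, Iterable, Tuple


def login_until_success(
    credentials: Iterable[Tuple[str, str]],
    try_login: Callable[[str, str], bool],
) -> Tuple[str, int]:
    """Try each entered (user_name, password) pair until one logs in.

    Returns the user name that succeeded and the number of attempts made.
    """
    attempts = 0
    for user_name, password in credentials:
        attempts += 1
        if try_login(user_name, password):
            return user_name, attempts
    # The text keeps prompting until success; a finite input source can run out.
    raise RuntimeError("ran out of credentials before a login succeeded")
```

Only after this function returns would the traversal of step 202 and the subsequent steps begin.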
It should be noted that the forum data extraction apparatus in this embodiment of the present invention may specifically be a smartphone, a tablet computer, a notebook computer, or another device that can connect to the Internet, which is not limited here.
As can be seen from the above, after obtaining the target forum address and the query object, the present invention starts from the target forum address and performs a data search and a search for the next-page link address; after a next-page link address is obtained, it performs the data search and the search for the next-page link address on the obtained address, and so on, until the current traversal address has no next-page link address. With this solution, advertisements and other embedded external websites in the forum can be filtered out, which saves the time that would otherwise be spent searching for data on those external websites and improves the efficiency of searching for data in the forum. Specifically, the query object in this embodiment of the present invention is a user name, so in practice the user only needs to enter a forum address and a user name as the target forum address and the query object in order to view the information posted by that user name in the forum corresponding to the forum address. The operation is simple and fast, so the user can quickly find the information posted by a given user name in any forum. Further, in addition to searching for the information posted by the target user name, this embodiment also searches for the information posted by other user names and, using the combination of topic identifier (such as topic ID or topic name) and user name in the forum as the category, stores the found information posted by each user name by category. When the user later needs to look up information posted by other user names in the same forum, the storage path of the desired information can be found directly by category and the corresponding information viewed, which further improves the efficiency of information lookup.
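The overall traversal summarized above can be sketched as a simple loop. Page fetching and next-page detection are passed in as functions so that the example stays self-contained; extract_forum_posts, fetch_posts and next_page are hypothetical names, and the guard against revisiting an address is an addition of this sketch, not part of the described method.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def extract_forum_posts(
    start_address: str,
    target_user: str,
    fetch_posts: Callable[[str], Iterable[Tuple[str, str]]],
    next_page: Callable[[str], Optional[str]],
) -> List[str]:
    """Collect the texts posted by target_user, following next-page links only."""
    found: List[str] = []
    seen: Set[str] = set()
    current: Optional[str] = start_address
    while current is not None and current not in seen:
        seen.add(current)  # guard against link cycles (sketch-only addition)
        for user_name, text in fetch_posts(current):
            if user_name == target_user:
                found.append(text)
        current = next_page(current)  # None means there is no next page
    return found
```

Because the loop only ever follows the next-page link, addresses of advertisements or other external sites embedded in a page are never visited.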
In one application scenario, the display interface of the forum data extraction system provided by the forum data extraction apparatus in this embodiment of the present invention may be as shown in Fig. 3-a and Fig. 3-b. Fig. 3-a shows the display interface of the forum data extraction system after the user starts the forum data extraction apparatus: the "URL" field is a control for editing the target forum address, the "User name" field is a control for editing the target user name, and the "Start" button is a control for triggering execution of the method flow in the embodiment shown in Fig. 1 or Fig. 2. Fig. 3-b shows the result displayed after the "Start" button is triggered and the forum data extraction apparatus performs a search based on the entered target forum address and target user name.
An embodiment of the present invention further provides a forum data extraction apparatus. As shown in Fig. 4, the forum data extraction apparatus 400 in this embodiment of the present invention includes:
a first acquiring unit 401, configured to obtain a target forum address and a query object, and to take the target forum address as the current traversal address;
a first lookup unit 402, configured to use the query object as an index and search the web page corresponding to the current traversal address for information related to the query object;
a judging unit 403, configured to judge whether the current traversal address has a next-page link address;
a second acquiring unit 404, configured to, when the judgment result of the judging unit 403 is yes, obtain the next-page link address of the current traversal address, take the obtained next-page link address as the current traversal address, and trigger the first lookup unit 402, where the next-page link address of the current traversal address is the link address of the next page of the web page corresponding to the current traversal address;
a display unit 405, configured to, when the judgment result of the judging unit 403 is no, display all found information related to the query object.
Optionally, the second acquiring unit 404 is specifically configured to: obtain all linkable addresses in the web page corresponding to the current traversal address, and confirm, among all these linkable addresses, the address that begins with the current traversal address as the next-page link address of the current traversal address; or, when the current traversal address contains a page-turning format, take the address obtained by adding one to the value of the page-turning format in the current traversal address as the next-page link address of the current traversal address.
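The first alternative, selecting the link that begins with the current traversal address, can be sketched with the Python standard library HTML parser. The sketch assumes that links in the page are absolute and that a link identical to the current address should be skipped; both are assumptions of the example, and the names LinkCollector and next_page_by_prefix are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def next_page_by_prefix(current_address: str, html: str) -> Optional[str]:
    """Return the first link that starts with the current traversal address."""
    collector = LinkCollector()
    collector.feed(html)
    for link in collector.links:
        # Skipping a link equal to the current address is an assumption of this sketch.
        if link != current_address and link.startswith(current_address):
            return link
    return None
```

Links to advertisements or other external sites do not share the prefix of the current traversal address, so they are never returned as the next page.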
Optionally, the query object is a user name; the first lookup unit 402 is specifically configured to: search the web page corresponding to the current traversal address for the information posted by the user name.
Optionally, the forum data extraction apparatus in this embodiment of the present invention further includes: a second lookup unit, configured to obtain, in the web page corresponding to the current traversal address, the information posted by user names other than the query object; and an update storage unit, configured to, when the judgment result of the judging unit 403 is yes, use the combination of topic identifier and user name in the forum as the category and store or update in the database, by category, the information posted by each user name found by the first lookup unit 402 and the second lookup unit. The second acquiring unit 404 is further configured to trigger the second lookup unit after taking the next-page link address as the current traversal address.
Optionally, the forum data extraction apparatus in this embodiment of the present invention further includes: a statistics unit, configured to, when the judgment result of the judging unit 403 is no, count the quantity of all information related to the query object found by the first lookup unit 402. The display unit 405 is further configured to display the quantity, counted by the statistics unit, of all information related to the query object.
It should be noted that the forum data extraction apparatus in this embodiment of the present invention may specifically be a smartphone, a tablet computer, a notebook computer, or another device that can connect to the Internet, which is not limited here. The forum data extraction apparatus in this embodiment may be the forum data extraction apparatus in the foregoing method embodiments and may be used to implement all the technical solutions in those method embodiments; for its specific implementation, refer to the related descriptions in the method embodiments, which are not repeated here.
As can be seen from the above, after obtaining the target forum address and the query object, the present invention starts from the target forum address and performs a data search and a search for the next-page link address; after a next-page link address is obtained, it performs the data search and the search for the next-page link address on the obtained address, and so on, until the current traversal address has no next-page link address. With this solution, advertisements and other embedded external websites in the forum can be filtered out, which saves the time that would otherwise be spent searching for data on those external websites and improves the efficiency of searching for data in the forum.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part that is not detailed in one embodiment, refer to the related descriptions of the other embodiments.
The above describes the forum data extraction method and the forum data extraction apparatus provided by the present invention. For those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application based on the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A forum data extraction method, characterized by comprising:
obtaining a target forum address and a query object, and taking the target forum address as a current traversal address;
performing a data search in the web page corresponding to the current traversal address, wherein the performing a data search in the web page corresponding to the current traversal address comprises: using the query object as an index, searching the web page corresponding to the current traversal address for information related to the query object;
judging whether the current traversal address has a next-page link address;
if the current traversal address has a next-page link address, obtaining the next-page link address of the current traversal address, taking the obtained next-page link address as the current traversal address, and returning to the step of performing a data search in the web page corresponding to the current traversal address and the step of judging whether the current traversal address has a next-page link address, wherein the next-page link address of the current traversal address is the link address of the next page of the web page corresponding to the current traversal address;
if the current traversal address has no next-page link address, displaying all found information related to the query object.
2. The method according to claim 1, characterized in that the obtaining the next-page link address of the current traversal address comprises:
obtaining all linkable addresses in the web page corresponding to the current traversal address, and confirming, among all the linkable addresses, the address that begins with the current traversal address as the next-page link address of the current traversal address;
or,
if the current traversal address contains a page-turning format, taking the address obtained by adding one to the value of the page-turning format in the current traversal address as the next-page link address of the current traversal address.
3. The method according to claim 1 or 2, characterized in that the query object is a user name;
the using the query object as an index, searching the web page corresponding to the current traversal address for information related to the query object is specifically:
searching the web page corresponding to the current traversal address for the information posted by the user name.
4. The method according to claim 3, characterized in that the performing a data search in the web page corresponding to the current traversal address further comprises:
obtaining, in the web page corresponding to the current traversal address, the information posted by user names other than the query object;
after the judging whether the web page corresponding to the current traversal address is the last page, the method comprises:
if the web page corresponding to the current traversal address is the last page, using the combination of topic identifier and user name in the forum as the category, and storing or updating in a database, by category, the found information posted by each user name.
5. The method according to claim 1 or 2, characterized in that, after the judging whether the current traversal address has a next-page link address, the method comprises:
if the current traversal address has no next-page link address,
counting the quantity of all found information related to the query object;
the displaying the found information posted by the target user name further comprises:
displaying the counted quantity of all information related to the query object.
6. A forum data extraction apparatus, characterized by comprising:
a first acquiring unit, configured to obtain a target forum address and a query object, and to take the target forum address as a current traversal address;
a first lookup unit, configured to use the query object as an index and search the web page corresponding to the current traversal address for information related to the query object;
a judging unit, configured to judge whether the current traversal address has a next-page link address;
a second acquiring unit, configured to, when the judgment result of the judging unit is yes, obtain the next-page link address of the current traversal address, take the obtained next-page link address as the current traversal address, and trigger the first lookup unit, wherein the next-page link address of the current traversal address is the link address of the next page of the web page corresponding to the current traversal address;
a display unit, configured to, when the judgment result of the judging unit is no, display all found information related to the query object.
7. The forum data extraction apparatus according to claim 6, characterized in that the second acquiring unit is specifically configured to:
obtain all linkable addresses in the web page corresponding to the current traversal address, and confirm, among all the linkable addresses, the address that begins with the current traversal address as the next-page link address of the current traversal address;
or,
when the current traversal address contains a page-turning format, take the address obtained by adding one to the value of the page-turning format in the current traversal address as the next-page link address of the current traversal address.
8. The forum data extraction apparatus according to claim 6 or 7, characterized in that the query object is a user name;
the first lookup unit is specifically configured to: search the web page corresponding to the current traversal address for the information posted by the user name.
9. The forum data extraction apparatus according to claim 8, characterized in that the forum data extraction apparatus further comprises:
a second lookup unit, configured to obtain, in the web page corresponding to the current traversal address, the information posted by user names other than the query object;
an update storage unit, configured to, when the judgment result of the judging unit is yes, use the combination of topic identifier and user name in the forum as the category, and store or update in a database, by category, the information posted by each user name found by the first lookup unit and the second lookup unit;
the second acquiring unit is further configured to trigger the second lookup unit after taking the next-page link address as the current traversal address.
10. The forum data extraction apparatus according to claim 6 or 7, characterized in that the forum data extraction apparatus further comprises:
a statistics unit, configured to, when the judgment result of the judging unit is no, count the quantity of all information related to the query object found by the first lookup unit;
the display unit is further configured to: display the quantity, counted by the statistics unit, of all information related to the query object.
CN201410840255.9A 2014-12-30 2014-12-30 Forum data extraction method and forum data extraction apparatus Pending CN105808545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410840255.9A CN105808545A (en) 2014-12-30 2014-12-30 Forum data extraction method and forum data extraction apparatus

Publications (1)

Publication Number Publication Date
CN105808545A true CN105808545A (en) 2016-07-27

Family

ID=56980223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410840255.9A Pending CN105808545A (en) 2014-12-30 2014-12-30 Forum data extraction method and forum data extraction apparatus

Country Status (1)

Country Link
CN (1) CN105808545A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1069515A1 (en) * 1999-07-15 2001-01-17 Information and Communications University Method and apparatus for web information extraction service
CN101727471A (en) * 2008-10-30 2010-06-09 鸿富锦精密工业(深圳)有限公司 Website content retrieval system and method
CN101957866A (en) * 2010-10-25 2011-01-26 中国农业大学 Network text information integration method and device
CN103455492A (en) * 2012-05-29 2013-12-18 腾讯科技(深圳)有限公司 Method and device for searching web pages

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106714001A (en) * 2016-11-30 2017-05-24 北京奇虎科技有限公司 Processing method and device for chat information in live broadcast page
CN107391559A (en) * 2017-06-08 2017-11-24 广东工业大学 Based on block, the universal forum text extraction algorithm of pattern-recognition and style of writing originally
CN107391559B (en) * 2017-06-08 2020-06-02 广东工业大学 General forum text extraction algorithm based on block, pattern recognition and line text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727
