CN101777074B - Method and device for searching page through keyword - Google Patents

Publication number
CN101777074B
Authority
CN
China
Prior art keywords
keyword
page
retrieval
reject
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010104946
Other languages
Chinese (zh)
Other versions
CN101777074A (en)
Inventor
柯宗贵
柯宗庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bluedon Information Security Technologies Co Ltd
Original Assignee
Bluedon Information Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bluedon Information Security Technologies Co Ltd
Priority to CN 201010104946
Publication of CN101777074A
Application granted
Publication of CN101777074B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method and a device for retrieving pages by keyword. It relates to the field of computers and the internet, and reduces the misjudgment, during page retrieval, of pages that contain a target keyword but are not the pages being sought. The method comprises: searching a page for a target keyword; determining the paragraph in which the target keyword is located from the position of the target keyword in the page; searching that paragraph for a keyword to be rejected; and filtering out of the retrieval results any page in which the keyword to be rejected is found. The retrieval involves at least one target keyword and at least one keyword to be rejected, and there is a correspondence between each target keyword and the respective keywords to be rejected. The correspondence is either a one-to-one relation between a target keyword and a keyword to be rejected, or a relation between one target keyword and at least two keywords to be rejected.

Description

Method and device for retrieving pages by keyword
Technical field
The present invention relates to the field of computers and the internet, and in particular to a method and device for retrieving pages by keyword.
Background art
When inspecting internet information or text pages, the page content usually has to be analyzed. The following requirement often arises: some pages contain the target keyword yet are not the pages being sought. A way of rejecting such pages is therefore needed.
The prior art proposes two schemes: the first rejects unwanted pages by configuring URL filtering; the second configures a list of reject keywords and directly rejects every page that contains any of them. Both schemes, however, are prone to misjudging some pages.
Summary of the invention
The invention provides a method and device for retrieving pages by keyword, in order to reduce the rate at which pages that contain the target keyword but are not the pages being sought are misjudged during page retrieval.
The method of the invention for retrieving pages by keyword comprises the following steps: searching a page for a target keyword; determining the paragraph in which the target keyword is located from the position of the target keyword in the page; searching said paragraph for a keyword to be rejected; and filtering out of the retrieval results any page in which the keyword to be rejected is found. The retrieval involves at least one target keyword and at least one keyword to be rejected, and there is a correspondence between each target keyword and the respective keywords to be rejected. The correspondence is either a one-to-one relation between a target keyword and a keyword to be rejected, or a relation between one target keyword and at least two keywords to be rejected.
The device of the invention for retrieving pages by keyword comprises: a first retrieval unit, for searching a page for a target keyword; a positioning unit, for determining the paragraph in which the target keyword is located from the position of the target keyword in the page; a second retrieval unit, for searching said paragraph for a keyword to be rejected; a filter unit, for filtering out of the retrieval results any page in which the keyword to be rejected is found; and a database unit, for storing the correspondence between each target keyword and the respective keywords to be rejected, the correspondence being called when the first retrieval unit needs to search for at least one target keyword and the second retrieval unit needs to search for at least one keyword to be rejected. The correspondence is either a one-to-one relation between a target keyword and a keyword to be rejected, or a relation between one target keyword and at least two keywords to be rejected.
The beneficial effect of the invention is as follows: because the invention performs a second search for the keyword to be rejected within the paragraph that contains the target keyword, and filters out any page in which some paragraph contains both the target keyword and the keyword to be rejected, it improves the ability to identify the pages being sought while reducing the probability of misjudging a page.
Description of drawings
Fig. 1 is a flowchart of the method steps in an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the device in an embodiment of the invention.
Embodiments
To reduce the rate at which pages that contain the target keyword but are not the pages being sought are misjudged during page retrieval, the invention provides a method and device for retrieving pages by keyword. The main idea is to delimit the paragraph in which the target keyword is located and to filter pages by searching that paragraph a second time for the keyword to be rejected.
Referring to Fig. 1, the method in the embodiment comprises the following main steps:
S1. Search the page for the target keyword.
S2. Determine the paragraph in which the target keyword is located from the position of the target keyword in the page.
S3. Search that paragraph for the keyword to be rejected.
S4. Filter out of the retrieval results any page in which the keyword to be rejected is found.
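Steps S1 to S4 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the description does not say how paragraph boundaries are detected, so the code assumes that paragraphs are separated by newline characters, and the names `paragraph_span`, `should_filter` and `filter_results` are hypothetical.

```python
def paragraph_span(page: str, pos: int) -> tuple[int, int]:
    """S2: return (start, end) offsets of the paragraph containing offset pos.

    Assumption: paragraphs are separated by newline characters.
    """
    start = page.rfind("\n", 0, pos) + 1  # rfind gives -1 when no earlier newline exists
    end = page.find("\n", pos)
    if end == -1:
        end = len(page)
    return start, end


def should_filter(page: str, target: str, reject: str) -> bool:
    """Return True when some paragraph holds both the target and the reject keyword."""
    pos = page.find(target)  # S1: search the page for the target keyword
    while pos != -1:
        start, end = paragraph_span(page, pos)  # S2: locate the paragraph
        if reject in page[start:end]:  # S3: second search, limited to that paragraph
            return True  # S4: the page is to be filtered out
        pos = page.find(target, end)  # continue in text order with later paragraphs
    return False


def filter_results(pages: list[str], target: str, reject: str) -> list[str]:
    """Keep pages that contain the target keyword and are not filtered by S4."""
    return [p for p in pages if target in p and not should_filter(p, target, reject)]
```

Note that a page in which the keyword to be rejected appears only in a paragraph without the target keyword is kept, which is the difference from a page-wide reject list.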
More specifically, the retrieval involves at least one target keyword and at least one keyword to be rejected, and there is a correspondence between each target keyword and the respective keywords to be rejected. For example, a target keyword and a keyword to be rejected may correspond one-to-one; as another example, one target keyword may correspond to at least two keywords to be rejected.
If one target keyword corresponds to at least two keywords to be rejected, the decision logic of step S4 may be either of the following: the page is filtered out of the retrieval results as soon as any one of the keywords to be rejected that correspond to the target keyword is found in said paragraph; or the page is filtered out of the retrieval results only when all of the keywords to be rejected that correspond to the target keyword are found in said paragraph.
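The two decision logics for step S4 can be illustrated with a small Python sketch. It assumes the correspondence is held in a plain dictionary from target keyword to a list of keywords to be rejected and that paragraphs are separated by newlines; both are assumptions made for illustration, and the names `paragraph_matches`, `should_filter` and `require_all` are hypothetical.

```python
def paragraph_matches(paragraph: str, rejects: list[str], require_all: bool) -> bool:
    """Apply the 'any' or the 'all' decision logic to one paragraph."""
    if not rejects:
        return False  # no keywords to be rejected are associated with the target
    found = [kw in paragraph for kw in rejects]
    return all(found) if require_all else any(found)


def should_filter(page: str, relations: dict[str, list[str]],
                  require_all: bool = False) -> bool:
    """Return True when a paragraph containing a target keyword also matches
    that target's keywords to be rejected under the chosen logic."""
    for paragraph in page.split("\n"):  # assumption: newline-separated paragraphs
        for target, rejects in relations.items():
            if target in paragraph and paragraph_matches(paragraph, rejects, require_all):
                return True
    return False


# Hypothetical correspondence: one target keyword, two keywords to be rejected.
relations = {"apple": ["phone", "laptop"]}
page = "apple phone on sale\napple orchard tour"
print(should_filter(page, relations))                    # 'any' logic
print(should_filter(page, relations, require_all=True))  # 'all' logic
```

The "any" logic filters more aggressively; the "all" logic only filters a page when every associated keyword to be rejected appears in the same paragraph as the target keyword.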
The following describes a concrete implementation, taking the content of the Background section of this description as the page to be retrieved and a one-to-one correspondence between target keyword and keyword to be rejected as the example. The target keyword is "keyword" and the keyword to be rejected is "prior art".
S101. Search the Background section for "keyword" in text order; "keyword" is found in the first paragraph of the Background section.
S102. Locate the paragraph containing the found "keyword": it is the first paragraph.
S103. Search the first paragraph for "prior art"; it is not found, so retrieval continues in text order.
S104. "keyword" is found in the second paragraph of the Background section.
S105. Locate the paragraph containing the found "keyword": it is the second paragraph.
S106. Search the second paragraph for "prior art"; it is found, so this page is filtered out of the retrieval results.
Afterwards, if other pages remain to be retrieved, retrieval continues with them.
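The walk-through S101 to S106 can be reproduced with a short Python trace. The two strings below are an English rendering of the two Background paragraphs written for this sketch, and the function name `trace` is hypothetical; the sketch only records, for each paragraph that contains the target keyword, whether the keyword to be rejected was also found there.

```python
# English rendering of the two Background paragraphs (wording is an assumption).
BACKGROUND = [
    "When inspecting internet information or text pages, the page content "
    "usually has to be analyzed. Some pages contain the target keyword yet "
    "are not the pages being sought, so a way of rejecting such pages is needed.",
    "The prior art proposes two schemes: the first rejects unwanted pages by "
    "URL filtering; the second configures a list of reject keywords and "
    "rejects every page containing them. Both schemes tend to misjudge pages.",
]


def trace(paragraphs: list[str], target: str, reject: str) -> list[tuple[int, bool]]:
    """Return (paragraph number, reject keyword found) for each paragraph that
    contains the target keyword, stopping at the first hit."""
    steps = []
    for number, paragraph in enumerate(paragraphs, start=1):
        if target not in paragraph:
            continue  # S101/S104: only paragraphs containing the target are examined
        hit = reject in paragraph  # S103/S106: second search within the paragraph
        steps.append((number, hit))
        if hit:
            break  # the page is filtered out; no further retrieval on this page
    return steps


print(trace(BACKGROUND, "keyword", "prior art"))  # [(1, False), (2, True)]
```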
Referring to Fig. 2, the device in the embodiment comprises a first retrieval unit, a positioning unit, a second retrieval unit and a filter unit.
The first retrieval unit is used to search the page for the target keyword.
The positioning unit is used to determine the paragraph in which the target keyword is located from the position in the page of the target keyword found by the first retrieval unit.
The second retrieval unit is used to search said paragraph for the keyword to be rejected.
The filter unit is used to filter out of the retrieval results any page in which the keyword to be rejected is found.
More specifically, the device may also comprise a database unit, which stores the correspondence between each target keyword and the respective keywords to be rejected; the correspondence is called when the first retrieval unit needs to search for at least one target keyword and the second retrieval unit needs to search for at least one keyword to be rejected. For example, the correspondence stored by the database unit may be a one-to-one relation between a target keyword and a keyword to be rejected; as another example, it may be a relation between one target keyword and at least two keywords to be rejected.
If the correspondence stored by the database unit is between one target keyword and at least two keywords to be rejected, the filter logic may be either of the following: the filter unit filters the page out of the retrieval results as soon as the second retrieval unit finds any one of the keywords to be rejected that correspond to the target keyword in said paragraph; or the filter unit filters the page out of the retrieval results only when the second retrieval unit finds all of the keywords to be rejected that correspond to the target keyword in said paragraph.
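One possible way to mirror the unit structure of Fig. 2 in software is sketched below in Python. The class and method names are hypothetical, the database unit is reduced to an in-memory dictionary, and paragraphs are assumed to be separated by newlines; none of these details come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseUnit:
    """Stores the correspondence: target keyword -> keywords to be rejected."""
    relations: dict[str, list[str]] = field(default_factory=dict)

    def rejects_for(self, target: str) -> list[str]:
        return self.relations.get(target, [])


class FirstRetrievalUnit:
    def find(self, page: str, target: str, start: int = 0) -> int:
        return page.find(target, start)  # -1 when the target keyword is absent


class PositioningUnit:
    def locate(self, page: str, pos: int) -> tuple[int, int]:
        # Assumption: paragraphs are separated by newline characters.
        start = page.rfind("\n", 0, pos) + 1
        end = page.find("\n", pos)
        return start, (len(page) if end == -1 else end)


class SecondRetrievalUnit:
    def search(self, paragraph: str, rejects: list[str]) -> list[str]:
        return [kw for kw in rejects if kw in paragraph]


class FilterUnit:
    def __init__(self, require_all: bool = False) -> None:
        self.require_all = require_all  # False: "any" logic, True: "all" logic

    def rejects_page(self, found: list[str], rejects: list[str]) -> bool:
        if not rejects:
            return False
        return len(found) == len(rejects) if self.require_all else bool(found)


class RetrievalDevice:
    def __init__(self, database: DatabaseUnit, require_all: bool = False) -> None:
        self.database = database
        self.first = FirstRetrievalUnit()
        self.positioning = PositioningUnit()
        self.second = SecondRetrievalUnit()
        self.filter = FilterUnit(require_all)

    def page_rejected(self, page: str) -> bool:
        for target in self.database.relations:
            rejects = self.database.rejects_for(target)
            pos = self.first.find(page, target)
            while pos != -1:
                start, end = self.positioning.locate(page, pos)
                found = self.second.search(page[start:end], rejects)
                if self.filter.rejects_page(found, rejects):
                    return True
                pos = self.first.find(page, target, end)
        return False

    def retrieve(self, pages: list[str]) -> list[str]:
        """Return pages that contain a target keyword and are not filtered out."""
        return [p for p in pages
                if any(self.first.find(p, t) != -1 for t in self.database.relations)
                and not self.page_rejected(p)]
```

A device could then be built with, for example, `RetrievalDevice(DatabaseUnit({"apple": ["phone", "laptop"]}))` and applied to a list of page texts through `retrieve`.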
The following describes a concrete implementation, again taking the content of the Background section of this description as the page to be retrieved. In this example the correspondence stored by the database unit is between one target keyword and two keywords to be rejected, and the filter logic is that the filter unit filters the page out of the retrieval results as soon as the second retrieval unit finds any one of the corresponding keywords to be rejected in said paragraph. The target keyword is "keyword", and the keywords to be rejected are "reject" and "prior art".
First, the first retrieval unit searches the Background section for "keyword" in text order and finds "keyword" in the first paragraph of the Background section.
Next, the positioning unit locates the paragraph containing the "keyword" found by the first retrieval unit: it is the first paragraph.
Then the second retrieval unit searches the first paragraph for "reject" and finds it, so the filter unit filters this page out of the retrieval results. Retrieval of this page does not continue.
Afterwards, if other pages remain to be retrieved, retrieval continues with them.
Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope; for example, the relation between target keywords and keywords to be rejected may also be many-to-many. Thus, if these changes and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims (4)

1. A method for retrieving pages by keyword, characterized in that it comprises the following steps:
searching a page for a target keyword;
determining the paragraph in which the target keyword is located on the basis of the position of the target keyword in the page;
searching said paragraph for a keyword to be rejected;
filtering out of the retrieval results any page in which the keyword to be rejected is found;
wherein the retrieval involves at least one target keyword and at least one keyword to be rejected, and there is a correspondence between each target keyword and the respective keywords to be rejected; said correspondence is either a one-to-one relation between a target keyword and a keyword to be rejected, or a relation between one target keyword and at least two keywords to be rejected.
2. The method for retrieving pages by keyword according to claim 1, characterized in that, when said correspondence is a relation between one target keyword and at least two keywords to be rejected, the page is filtered out of the retrieval results if any one of the keywords to be rejected that correspond to the target keyword is found in said paragraph; or the page is filtered out of the retrieval results if all of the keywords to be rejected that correspond to the target keyword are found in said paragraph.
3. A device for retrieving pages by keyword, characterized in that it comprises:
a first retrieval unit, for searching a page for a target keyword;
a positioning unit, for determining the paragraph in which the target keyword is located from the position of the target keyword in the page;
a second retrieval unit, for searching said paragraph for a keyword to be rejected;
a filter unit, for filtering out of the retrieval results any page in which the keyword to be rejected is found;
a database unit, for storing the correspondence between each target keyword and the respective keywords to be rejected, said correspondence being called when the first retrieval unit needs to search for at least one target keyword and the second retrieval unit needs to search for at least one keyword to be rejected; said correspondence is either a one-to-one relation between a target keyword and a keyword to be rejected, or a relation between one target keyword and at least two keywords to be rejected.
4. The device for retrieving pages by keyword according to claim 3, characterized in that, when the correspondence stored by the database unit is a relation between one target keyword and at least two keywords to be rejected, the filter unit filters the page out of the retrieval results if the second retrieval unit finds any one of the keywords to be rejected that correspond to the target keyword in said paragraph; or the filter unit filters the page out of the retrieval results if the second retrieval unit finds all of the keywords to be rejected that correspond to the target keyword in said paragraph.
CN 201010104946 2010-01-29 2010-01-29 Method and device for searching page through keyword Active CN101777074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010104946 CN101777074B (en) 2010-01-29 2010-01-29 Method and device for searching page through keyword

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010104946 CN101777074B (en) 2010-01-29 2010-01-29 Method and device for searching page through keyword

Publications (2)

Publication Number Publication Date
CN101777074A (en) 2010-07-14
CN101777074B (en) 2012-09-05

Family

ID=42513535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010104946 Active CN101777074B (en) 2010-01-29 2010-01-29 Method and device for searching page through keyword

Country Status (1)

Country Link
CN (1) CN101777074B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489683A (en) * 2020-11-24 2021-03-12 广州市久邦数码科技有限公司 Method and device for realizing fast forward and fast backward of audio based on key word positioning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1750002A (en) * 2005-10-26 2006-03-22 孙斌 Method for providing research result

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周天绮.面向网页文本内容的网页信息过滤系统设计.《电脑知识与技术》.2009,第5卷(第27期),论文第7775页第15行-第7776页倒数第8行,图2. *

Also Published As

Publication number Publication date
CN101777074A (en) 2010-07-14

Similar Documents

Publication Publication Date Title
CN105956669A (en) Vehicle maintenance strategy pushing method and device
EP2812815B1 (en) Web page retrieval method and device
CN102467516A (en) Method, device and system for recording logs in equipment control process
CN102542061B (en) Intelligent product classification method
CN101826099B (en) Method and system for identifying similar documents and determining document diffusance
US20090276378A1 (en) System and Method for Identifying Document Structure and Associated Metainformation and Facilitating Appropriate Processing
US20110173187A1 (en) Conflict of interest detection system and method using social interaction models
CN103455758A (en) Method and device for identifying malicious website
CN103827852A (en) Clustering WEB pages on a search engine results page
CN102385632A (en) Method and system for log automatic classification and notification
CN101777074B (en) Method and device for searching page through keyword
CN102521256B (en) High-reliability data protection method of real-time/historical database
CN102446230B (en) Method for merging GDSII layout data
CN103136212B (en) The method for digging of one kind neologisms and device
US20100198829A1 (en) Method and computer-program product for ranged indexing
CN106484788A (en) Patent search system based on industry keyword
KR101666440B1 (en) Data processing method in In-memory Database System based on Circle-Queue
CN106570091A (en) High availability method for reinforced distributed cluster file system
CN103491564B (en) Self-diagnostic method and system of mobile terminal
CN104915425A (en) Method and device for retrieving file content
CN106844495A (en) A kind of acquisition methods and device of website operation daily record
CN110471764A (en) A kind of processing method and processing device of memory cleaning
CN102402610A (en) Method and system for automatically classifying and informing logs
CN104008098A (en) Polysemy keyword based text filtering method and device
KR100849690B1 (en) search system of information using formula for International Patent Classification and method for the same

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Wu Bingtang

Document name: Notification of Passing Preliminary Examination of the Application for Invention

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
DD01 Delivery of document by public notice

Addressee: Bluedon Information Safety Technology Co., Ltd.

Document name: Notification of Publication and of Entering the Substantive Examination Stage of the Application for Invention

DD01 Delivery of document by public notice

Addressee: Wu Bingtang

Document name: Notification of Passing Examination on Formalities

C14 Grant of patent or utility model
GR01 Patent grant
PP01 Preservation of patent right
PP01 Preservation of patent right

Effective date of registration: 20220422

Granted publication date: 20120905