CN107357781B - System and method for identifying relevance between webpage title and text

Info

Publication number: CN107357781B
Application number: CN201710516064.0A
Authority: CN
Inventors: 胡玥莹
Original assignee: Shanghai Caitu Information Technology Co Ltd
Current assignee: Shanghai Caitu Information Technology Co., Ltd
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2020-12-29
Anticipated expiration: 2037-06-29
Also published as: CN107357781A

Abstract

The invention provides a system and a method for identifying the degree of association between a webpage title and its text, relating to the technical field of network communication. The system comprises: a link extraction unit for extracting link information; a keyword unit for extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract the text information; a judging unit for judging the sentence pattern of the title information, wherein the sentence patterns comprise statement sentences and question sentences; a relevancy calculation unit for calculating the frequency with which the keywords occur in the text information and calculating the weight of the keywords in the text information according to the sentence pattern of the title information; and a display unit for displaying the frequency and the weight beside the title; wherein the system starts after the cursor has stayed on the title for longer than a preset time. By moving the cursor onto a title, a user obtains information on the degree of association between the title and the text it points to, so that the system screens out invalid junk information on the user's behalf and avoids wasted reading time.

Description

System and method for identifying relevance between webpage title and text

Technical Field

The invention relates to the technical field of network communication, in particular to a system and a method for identifying the association degree of a webpage title and a text.

Background

Today, as information networks reach into every aspect of daily life, people read a large amount of information online every day to follow the news, pick up general knowledge, or pass the time. However, this information always includes a large number of articles or posts whose titles do not match their content: the authors deliberately use exaggerated, sensational titles to attract users to click and read. Reading articles whose title and content are seriously inconsistent wastes a great deal of users' time and leaves them feeling deceived.

Disclosure of Invention

It is an object of the present invention to provide a system and method for identifying the relevance between the title and the text of a web page, so that when a user sees an attractive title, the user can obtain information on the relevance between the title and the text it points to by moving the cursor onto the title. The invention thereby avoids wasted reading time and screens out invalid and junk information on the user's behalf.

In particular, the present invention provides a system for identifying a degree of association between a title of a web page and a body, wherein a cursor of a mouse points to a title including link information for opening the body associated with the title, the system comprising:

a link extraction unit for extracting the link information;

a keyword unit for extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract text information;

a judging unit for judging the sentence pattern of the title information, wherein the sentence pattern comprises statement sentences and question sentences;

a relevancy calculation unit for calculating the frequency with which the keywords occur in the text information and calculating the weight of the keywords in the text information according to the sentence pattern of the title information;

a display unit for displaying the frequency and the weight beside the title;

wherein the system is started after the cursor has stayed on the title for longer than a preset time.

Further, in the case where the sentence pattern is a statement sentence, the relevancy calculation unit calculates the weight according to the position of the keywords in the text information.

Further, in the case where the sentence pattern is a question sentence, the relevancy calculation unit calculates the weight according to how the question is answered in the text information.

Further, the keyword unit extracts nouns, verbs, and/or adjectives in the title information as the keywords.

Further, the system further comprises a part-of-speech analysis unit for analyzing whether the nouns in the keywords have multiple meanings.

According to another aspect of the present invention, the present invention also provides a method for identifying a degree of association between a title of a web page and a body text, wherein a cursor of a mouse points to a title including link information for opening the body text associated with the title, the method comprising the steps of:

a link extraction step: extracting the link information;

a keyword step: extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract text information;

a judging step: judging sentence patterns of the title information, wherein the sentence patterns comprise statement sentences and question sentences;

a relevancy calculation step: calculating the frequency with which the keywords occur in the text information, and calculating the weight of the keywords in the text information according to the sentence pattern of the title information;

a display step: displaying the frequency and the weight beside the title;

wherein the link extraction step is started after the cursor stays on the title for more than a predetermined time period.

Further, in the case where the sentence pattern is a statement sentence, the relevancy calculation step calculates the weight according to the position of the keywords in the text information.

Further, in the case where the sentence pattern is a question sentence, the relevancy calculation step calculates the weight according to how the question is answered in the text information.

Further, the keywords are nouns, verbs, and/or adjectives in the title information.

Further, the method also comprises a part-of-speech analysis step of analyzing whether the nouns in the keywords have multiple meanings.

The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Some specific embodiments of the invention will be described in detail hereinafter, by way of illustration and not limitation, with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

FIG. 1 is a schematic diagram of a system for identifying a degree of association of a web page title with a body text according to one embodiment of the present invention;

FIG. 2 is a flow chart of a method for identifying a degree of association of a web page title with a body text according to another aspect of the present invention.

The symbols in the drawings represent the following meanings:

1. title; 2. link information; 3. text; 4. link extraction unit; 5. keyword unit; 6. judging unit; 7. relevancy calculation unit; 8. display unit.

Detailed Description

The invention provides a system for identifying the relevance between a webpage title and a text, wherein a cursor of a mouse points to a title, and the title comprises link information for opening the text associated with the title. When a user sees an attractive title, the user moves the cursor onto the title to obtain information on the association between the title and the text it points to. The invention thereby avoids wasted reading time and screens out invalid and junk information on the user's behalf.

As shown in FIG. 1, the system includes: a link extraction unit 4, a keyword unit 5, a judging unit 6, a relevancy calculation unit 7, and a display unit 8. Generally, when the cursor changes from an arrow shape to a hand shape, the system determines that the cursor is resting on the title 1. While a webpage is being browsed, the system is started after the cursor has stayed on the title 1 for longer than a preset time. The preset time may be set to 3 seconds or another suitable duration.
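
The dwell-time trigger described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `DwellTrigger` class, its `update` method, and the explicit timestamps passed by the caller are hypothetical; only the 3-second default comes from the description.

```python
from typing import Optional


class DwellTrigger:
    """Fires once when the cursor has rested on one title longer than `threshold` seconds.

    Hypothetical helper: names and structure are illustrative only.
    """

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold
        self._title: Optional[str] = None  # title currently under the cursor
        self._since: float = 0.0           # time the cursor arrived on that title
        self._fired: bool = False          # prevents re-triggering during one hover

    def update(self, title: Optional[str], now: float) -> bool:
        """Report what is under the cursor at time `now`; return True when the system should start."""
        if title != self._title:
            # The cursor moved to another title (or off all titles): restart the clock.
            self._title, self._since, self._fired = title, now, False
            return False
        if title is None or self._fired:
            return False
        if now - self._since > self.threshold:
            self._fired = True
            return True
        return False
```

In a browser setting, `update` would presumably be driven by pointer events and a timer; here the caller supplies the time so that the logic can be exercised without a user interface.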

After the system is started, the link extraction unit 4 first extracts the link information 2. The keyword unit 5 opens the text 3 in the background according to the link information 2 to extract the text information. The keyword unit 5 then extracts the title information of the title 1 pointed to by the cursor and extracts a plurality of keywords from the title information. The keyword unit 5 extracts nouns, verbs, and/or adjectives in the title information as the keywords.
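
The link extraction and keyword extraction just described might look like the sketch below. It assumes the title is an HTML anchor element and uses a toy part-of-speech lexicon (`TOY_POS`) in place of a real tagger; the function names, the lexicon, and the sample markup are all hypothetical. Fetching the text in the background is omitted.

```python
import re
from html.parser import HTMLParser


class AnchorParser(HTMLParser):
    """Collects the href and the visible text of the first <a> element."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.text_parts = []
        self._in_anchor = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self._done and not self._in_anchor:
            self._in_anchor = True
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self._in_anchor = False
            self._done = True

    def handle_data(self, data):
        if self._in_anchor:
            self.text_parts.append(data)


def extract_link_and_title(anchor_html):
    """Return (link information, title information) for a title anchor."""
    parser = AnchorParser()
    parser.feed(anchor_html)
    parser.close()
    return parser.href, "".join(parser.text_parts).strip()


# Toy lexicon (assumption): a real system would use a part-of-speech tagger.
TOY_POS = {
    "scientists": "noun", "discover": "verb", "new": "adj",
    "planet": "noun", "a": "det", "the": "det",
}


def extract_keywords(title, pos_lexicon=TOY_POS):
    """Keep nouns, verbs, and adjectives from the title, in order, without repeats."""
    keywords = []
    for word in re.findall(r"[a-z']+", title.lower()):
        if pos_lexicon.get(word) in {"noun", "verb", "adj"} and word not in keywords:
            keywords.append(word)
    return keywords
```

For the anchor `<a href="/news/42">Scientists discover a <b>new</b> planet</a>`, this yields the link `/news/42` and the keywords `scientists`, `discover`, `new`, and `planet`.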

Further, the system comprises a part-of-speech analysis unit for analyzing whether the nouns among the keywords have multiple meanings. For example, "hackle" is both the name of a public figure and a kind of tree. As another example, the sense of "walkthrough" in "walkthrough graph" is distinct from the sense in which (everyone) has "walked through", that is, has left. The part-of-speech analysis unit may be implemented with an existing intelligent analysis system from the prior art, one that can automatically expand and learn its word bank.
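
A check for nouns with multiple meanings can be illustrated with a simple lookup. The sense inventory `SENSES` and the function `ambiguous_nouns` below are hypothetical stand-ins for the learned word bank mentioned above, and the English entries are examples chosen for the sketch rather than taken from the description.

```python
# Hypothetical sense inventory; a real system would query a learned word bank.
SENSES = {
    "apple": ["fruit", "technology company"],
    "jaguar": ["big cat", "car brand"],
    "planet": ["celestial body"],
}


def ambiguous_nouns(nouns, senses=SENSES):
    """Return the nouns that have more than one known meaning, with their senses."""
    return {n: senses[n] for n in nouns if len(senses.get(n, [])) > 1}
```

A noun flagged this way could then be matched against the text using only the sense that fits the title, although the description does not say how the unit resolves the ambiguity.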

Then, the judging unit 6 judges the sentence pattern of the title information. The sentence patterns comprise statement sentences and question sentences. Once the keywords and the text information have been obtained, the relevancy calculation unit 7 is started. The relevancy calculation unit 7 calculates the frequency with which the keywords occur in the text information and calculates the weight of the keywords in the text information according to the sentence pattern of the title information. In the case where the sentence pattern is a statement sentence, the relevancy calculation unit 7 calculates the weight according to the position of the keywords in the text information. In the case where the sentence pattern is a question sentence, the relevancy calculation unit 7 calculates the weight according to how the question is answered in the text information.
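
The description does not give formulas for the frequency or the weight, so the following is only one possible reading. The heuristics are assumptions made for illustration: a title counts as a question if it ends in a question mark or starts with an interrogative word; the position-based weight rewards keywords that first appear early in the text; and the answer-based weight is the largest share of keywords found together in a single sentence.

```python
import re

INTERROGATIVES = {"who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "do", "does", "did", "will"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def sentence_pattern(title):
    """Return 'question' or 'statement' using a simple surface heuristic."""
    t = title.strip().lower()
    if t.endswith(("?", "？")):
        return "question"
    first = t.split(" ", 1)[0] if t else ""
    return "question" if first in INTERROGATIVES else "statement"


def keyword_frequency(keywords, body):
    """Count whole-word occurrences of each keyword in the text."""
    tokens = tokenize(body)
    return {k: tokens.count(k.lower()) for k in keywords}


def position_weight(keywords, body):
    """Statement titles: the earlier a keyword first appears, the higher its score."""
    tokens = tokenize(body)
    if not tokens or not keywords:
        return 0.0
    scores = []
    for k in keywords:
        k = k.lower()
        scores.append(1.0 - tokens.index(k) / len(tokens) if k in tokens else 0.0)
    return sum(scores) / len(scores)


def answer_weight(keywords, body):
    """Question titles: best share of keywords co-occurring in one sentence (crude proxy)."""
    if not keywords:
        return 0.0
    best = 0.0
    for sentence in re.split(r"[.!?。！？]+", body.lower()):
        tokens = set(tokenize(sentence))
        hits = sum(1 for k in keywords if k.lower() in tokens)
        best = max(best, hits / len(keywords))
    return best


def keyword_weight(title, keywords, body):
    """Choose the weighting rule according to the sentence pattern of the title."""
    if sentence_pattern(title) == "question":
        return answer_weight(keywords, body)
    return position_weight(keywords, body)
```

Both weights fall between 0 and 1, which makes it straightforward to bin them into display levels afterwards.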

Further, so that the result can be read at a glance, the weight is expressed in multiple levels. In one embodiment, the levels are: completely unrelated, one-star related, two-star related, subject matter consistent, and so on.
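
Mapping a numeric weight onto these levels could be done as sketched below. The level names come from the embodiment; the score range of 0 to 1 and the cut-off values are assumptions, since the description does not specify them.

```python
# Upper bounds (exclusive) for each level; the thresholds are assumed, not from the description.
LEVELS = [
    (0.25, "completely unrelated"),
    (0.50, "one-star related"),
    (0.75, "two-star related"),
]


def weight_level(score):
    """Translate a weight in [0, 1] into one of the display levels."""
    for upper, label in LEVELS:
        if score < upper:
            return label
    return "subject matter consistent"
```

Because the embodiment ends its list with "and so on", further levels could be added by extending `LEVELS`.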

Once the frequency and the weight have been obtained, the display unit 8 displays them next to the title 1, for example in a small box beside the title 1. Thus, with the system enabled while browsing a webpage, a user only needs to let the cursor rest on a title 1 of interest for a few seconds to see the frequency and the weight that indicate how closely the title 1 is related to the text 3, and can decide on that basis whether to read the text 3.

According to another aspect of the present invention, as shown in fig. 2, the present invention further provides a method for identifying the association degree between the title and the body of a web page, wherein a cursor of a mouse points to a title, the title includes link information for opening the body associated with the title, and the link extraction step starts after the cursor stays on the title for more than a predetermined time. The method comprises the following steps:

S11: a link extraction step: extracting the link information;

S13: a keyword step: extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract text information;

S15: a judging step: judging the sentence pattern of the title information, wherein the sentence patterns comprise statement sentences and question sentences;

S17: a relevancy calculation step: calculating the frequency with which the keywords occur in the text information and, in the case where the sentence pattern is a statement sentence, calculating the weight according to the positions of the keywords in the text information;

S19: a relevancy calculation step: calculating the frequency with which the keywords occur in the text information and, in the case where the sentence pattern is a question sentence, calculating the weight according to how the question is answered in the text information;

S21: a display step: displaying the frequency and the weight next to the title.
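
The order of the steps above can be sketched as a single function. Everything here is a simplified assumption for illustration: `run_pipeline`, the `fetch_body` and `is_keyword` callbacks, the "first third of the text" position rule for statements, and the "share of keywords present" rule for questions are not taken from the method itself.

```python
import re
from typing import Callable


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


def run_pipeline(title: str, href: str,
                 fetch_body: Callable[[str], str],
                 is_keyword: Callable[[str], bool]) -> str:
    # S11: link extraction (the href is assumed to have been read from the title's link).
    link = href
    # S13: keywords from the title; text opened in the background via the callback.
    keywords = [w for w in dict.fromkeys(tokenize(title)) if is_keyword(w)]
    body_tokens = tokenize(fetch_body(link))
    # S15: sentence pattern of the title (question mark heuristic only).
    pattern = "question" if title.strip().endswith(("?", "？")) else "statement"
    # S17 / S19: frequency, then a weight that depends on the sentence pattern.
    freq = {k: body_tokens.count(k) for k in keywords}
    if not keywords:
        weight = 0.0
    elif pattern == "statement":
        # S17 (assumed rule): share of keywords found in the first third of the text.
        lead = set(body_tokens[: max(1, len(body_tokens) // 3)])
        weight = sum(k in lead for k in keywords) / len(keywords)
    else:
        # S19 (assumed rule): share of keywords that appear anywhere in the text.
        weight = sum(freq[k] > 0 for k in keywords) / len(keywords)
    # S21: the string that would be shown next to the title.
    return f"frequency={sum(freq.values())} weight={weight:.2f}"
```

Passing the fetch and part-of-speech decisions in as callbacks keeps the sketch runnable without network access or a tagger; a real embodiment would supply its own.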

In step S21, so that the result can be read at a glance, the weight is expressed in multiple levels. In one embodiment, the levels are: completely unrelated, one-star related, two-star related, subject matter consistent, and so on. Thus, with the system enabled while browsing a webpage, a user only needs to let the cursor rest on a title of interest for a few seconds to see the frequency and the weight that indicate how closely the title is related to the text, and can decide on that basis whether to read the text.

Further, the method also comprises a part-of-speech analysis step of analyzing whether the nouns among the keywords have multiple meanings. For example, "hackle" is both the name of a public figure and a kind of tree. As another example, the sense of "walkthrough" in "walkthrough graph" is distinct from the sense in which (everyone) has "walked through", that is, has left. The part-of-speech analysis step may be carried out with an existing intelligent analysis system from the prior art, one that can automatically expand and learn its word bank.

Thus, it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications consistent with the principles of the invention may be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.

Claims

1. A system for identifying a degree of association of a title of a web page with a body, wherein a cursor of a mouse points to a title including link information for opening the body associated with the title, the system comprising:

a link extraction unit for extracting the link information;

a keyword unit for extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract text information;

a judging unit for judging the sentence pattern of the title information, wherein the sentence patterns comprise statement sentences and question sentences;

an association calculation unit for calculating the frequency with which the keywords occur in the text information and calculating the weight of the keywords in the text information according to the sentence pattern of the title information;

a display unit for displaying the frequency and the weight beside the title;

wherein the system is started after the cursor has stayed on the title for longer than a preset time;

wherein, in the case where the sentence pattern is a statement sentence, the association calculation unit calculates the weight according to the position of the keyword in the text information;

wherein, in the case that the sentence pattern is a question, the association calculation unit calculates the weight according to the answer condition of the question in the text information;

wherein the keyword unit extracts nouns, verbs and adjectives in the title information as the keywords;

the system further comprises a part-of-speech analysis unit for analyzing whether the nouns in the keywords have multiple meanings;

wherein the cursor of the mouse points to a title, and the title comprises link information for opening a text associated with the title;

the method of operation of the system comprises the steps of:

a link extraction step: extracting the link information;

a keyword step: extracting the title information of the title pointed to by the cursor, extracting a plurality of keywords from the title information, and opening the text in the background according to the link information to extract text information;

a judging step: judging the sentence pattern of the title information, wherein the sentence patterns comprise statement sentences and question sentences;

an association calculation step: calculating the frequency with which the keywords occur in the text information, and calculating the weight of the keywords in the text information according to the sentence pattern of the title information;

a display step: displaying the frequency and the weight beside the title;

wherein the link extraction step is started after the cursor has stayed on the title for longer than a preset time;

wherein, in the case where the sentence pattern is a statement sentence, the association calculation step calculates the weight according to the position of the keyword in the text information;

wherein, in the case where the sentence pattern is a question sentence, the association calculation step calculates the weight according to an answer situation of the question sentence in the text information.

2. The system for identifying the association degree of a title and a body of a web page as claimed in claim 1, wherein the method for operating the system further comprises a part-of-speech analysis step of analyzing whether the noun in the keyword has multiple meanings.