JPH09101990A - Information filtering device - Google Patents

Information filtering device

Info

Publication number
JPH09101990A
Authority
JP
Japan
Prior art keywords
article
articles
user
information
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP33579095A
Other languages
Japanese (ja)
Other versions
JP3810463B2 (en)
Inventor
Masahiro Kajiura
Seiji Miike
Kenji Ono
Tetsuya Sakai
Kazuo Sumita
誠司 三池
一男 住田
顕司 小野
正浩 梶浦
哲也 酒井
Original Assignee
Toshiba Corp
株式会社東芝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to JP21293995 priority Critical
Priority to JP7-212939 priority
Application filed by Toshiba Corp, 株式会社東芝 filed Critical Toshiba Corp
Priority to JP33579095A priority patent/JP3810463B2/en
Priority claimed from US08/695,214 external-priority patent/US5907836A/en
Publication of JPH09101990A publication Critical patent/JPH09101990A/en
Publication of JP3810463B2 publication Critical patent/JP3810463B2/en
Application granted granted Critical
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

(57) [Abstract] [Problem] To add information on related articles to an article selected by information filtering and send it to the user, so that the user can make effective use of the delivered articles. An inter-article similarity calculation unit 16, which calculates the similarity between articles, is provided in an information filtering center 1 and is used to check for duplicate articles. One article from each group of duplicate articles is selected for presentation to the user, and the others are excluded. Information about the excluded articles is added to the selected article as related article information and sent to the user. The relationships between the delivered articles can therefore be presented to the user, who can make effective use of them.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an information filtering apparatus that selects, from a large number of text articles, those that meet a user's requests and interests, and periodically provides the selected information to the user.

[0002]

2. Description of the Related Art In recent years, with the spread of word processors and electronic computers, and the spread of electronic mail and electronic news via computer networks, the digitization of documents is accelerating.

[0003] As the term "electronic publishing" implies, it is considered that information on newspapers, magazines, and books will be provided electronically in the future. As a result, the amount of text information that can be obtained in real time for individuals is expected to be enormous.

Along with this, there is an increasing demand for an information filtering system or information filtering service that selects, from a large number of text articles such as newspapers and magazines, those that meet the user's requests and interests, and provides them to the user on a regular basis.

Conventional information filtering systems search for articles that match a user profile expressing the user's requests and interests, and present the user with a list of their headlines or with the full articles.

[0006] Usually, a user profile is created by specifying some topics that the user is interested in.

[0007] Further, the user judges the usefulness of the presented articles and this information is reflected in the user profile, realizing a function called relevance feedback that raises the matching rate of information filtering from the next time onward.

[0008]

However, because conventional systems merely list the selected articles when presenting them to the user, it is difficult for the user to grasp the relationships among the articles presented this time, or between the articles presented this time and those presented previously.

Further, the conventional simple presentation of articles omits information such as which search condition of which topic caused an article to be presented, and how the article has been read by other users. Judging the usefulness of articles therefore takes considerable effort, and the judgments are difficult to keep consistent.

Further, in an information filtering system, double filtering, in which the important text within an article is selected after the important articles themselves have been selected, is effective for gathering information from long articles. Conventionally, however, a text of some suitable length was simply extracted mechanically and displayed as an excerpt, so extra information was mixed in or necessary information was missing.

Further, conventionally, the texts to be provided to the user are selected simply according to the similarity between the texts distributed from the news sources and the search condition, so texts with the same content are output scattered among the results.

The present invention has been made in view of the above circumstances. A first object of the present invention is to provide an information filtering device that can present to the user the relationships among the articles provided by information filtering, so that the user can grasp those relationships.

A second object of the present invention is to provide an information filtering device that lets the user know which search condition a presented article satisfies, thereby deepening the user's understanding of and trust in information filtering.

A third object of the present invention is to provide an information filtering device that can adjust the length of the summary or abstract presented to the user according to the type of article, so that double filtering is performed efficiently.

A fourth object of the present invention is to provide an information filtering device that can provide articles to the user by grouping or associating articles with overlapping content, significantly reducing the time and labor the user needs to read text articles.

[0016]

The present invention is an information filtering device that receives articles such as texts and images from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to the user. The device comprises: means for holding a search condition designated in advance for each user; article search means for searching the delivered articles and selecting articles matching the search condition for each user; means for calculating the similarity between articles selected by the article search means, or between a selected article and other articles, and determining related articles for each article according to the similarity; and means for adding information on the determined related articles to the selected article and presenting it to the user.

In this information filtering device, for example, the similarity between articles is calculated by comparing their article expressions, and the articles to be presented to the user and their related articles are determined according to the similarity. Information on the related articles is added to the text information of the article presented to the user and sent to the user. It is preferable to calculate the similarity among the articles that have arrived this time, or between the articles that have arrived this time and those that arrived previously. As a result, the relationships among the articles selected by the article search means, and between the articles selected this time and those selected by past filtering, become clear, and the user can be notified of these relationships.

If duplicate articles are detected by calculating the similarity among the articles selected by the article search means, the text of a duplicate article need not be presented to the user; instead, only information such as its headline can be added as related article information and presented. This automatically avoids presenting the user with multiple articles about the same content obtained from different information sources.

Further, the present invention provides an information filtering apparatus that receives articles such as texts and images from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to the user, characterized by comprising: means for holding a search condition designated in advance for each user; article search means for searching the distributed articles, selecting articles that match the search condition for each user, and presenting them to the user; and means for adding to each article selected by the article search means information indicating the search condition that the article satisfies, and presenting it to the user, so that the user is informed of the basis on which the article was selected.

With this configuration, it is clear to the user which search conditions the presented article satisfies, such as which topic the presented article is suitable for. Therefore, the user can easily understand why the article is presented, and the usefulness of the article can be easily determined.

If a relevance feedback function is further provided, which receives feedback from the user on whether each article already sent was useful and corrects the search condition accordingly, the presentation of the basis for selecting an article can be put to effective use in relevance feedback.

Also, by presenting to the user how the presented article has been read by other users, instead of the basis on which the article was selected, relevance feedback that draws on the judgments of other users becomes possible, and relevance feedback can be used effectively.

Further, the present invention provides an information filtering apparatus that receives articles such as texts and images from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to the user, characterized by comprising: means for holding a search condition designated in advance for each user; article search means for searching the distributed articles, selecting articles that match the search condition for each user, and presenting them to the user; and means for generating a summary or abstract whose length corresponds to the type of each article selected by the article search means, and presenting the summary or abstract to the user.

According to this structure, a summary or abstract whose length corresponds to the type of article is created and presented to the user, so the proportion of useful text information within the text presented to the user increases. This enables efficient information collection.

It is preferable to classify article types by differences in the search conditions the articles satisfy, such as topics, or by differences in attributes of the articles themselves, such as the date and time of issue. For example, if the user specifies multiple topics as search conditions and assigns priorities to them, making the summary or abstract longer for articles that match higher-priority topics increases the proportion of text information that is useful to the user.

Further, the present invention is an information filtering device having means for receiving distribution of articles such as texts and images from one or more information sources, means for calculating the similarity between a search condition designated in advance by the user and the distributed articles, and output means for sorting the articles in order of the calculated similarity and outputting, in that order, only a fixed number of articles or only the articles whose similarity is at or above a predetermined threshold. The device is characterized in that it further comprises means for calculating the similarity between articles, and performs grouping and association of articles, or selection control of the output articles, according to the calculated inter-article similarity.

In this information filtering device, related articles can be provided to the user grouped or associated with one another. When related texts are output in no particular order, as in the past, the user has to switch context to understand each text, and understanding the filtering result as a whole takes time. In the information filtering device of the present invention, related articles are grouped or associated with each other before being provided, so the user's labor is significantly reduced.

The inter-article similarity is preferably calculated not only among the articles delivered on the same day but also against the articles output to the user on or before the previous day, and information is preferably added to the output that distinguishes whether an article group consists only of that day's articles or also includes articles from earlier days. This allows the user to organize and read related articles more efficiently.

[0029]

Embodiments of the present invention will be described below with reference to the drawings.

First, the overall configuration of the information filtering system of the present invention will be described with reference to FIG.

The information filtering system is an information providing system that receives text articles, including texts and images, from a plurality of information sources 2 such as newspaper publishers, news agencies, and publishers, and periodically sends text articles to each subscribing user terminal 3. The service is realized by the information filtering center 1. The information filtering center 1 is realized by a single computer system connected to the plurality of information sources 2 and the plurality of subscribing user terminals 3 via a communication network. It is composed of a central processing unit 4 that performs control and processing for information filtering; a storage unit 5, such as a semiconductor memory, magnetic disk, or optical disk, that stores programs and data; a receiving unit 6 that receives text articles from the information sources 2 via a communication network such as a wired line or radio waves; and a transmitting unit 7 that transmits text articles to the user terminals 3 via a communication network such as a wired line or radio waves.

Each user terminal 3 is an information processing terminal such as a personal computer or a workstation, and includes a text information receiving unit 8 for receiving text articles transmitted from the information filtering center 1, a display unit 9 for displaying the received text articles on a screen, and the like.

As shown in FIG. 2, the information filtering center 1 holds a kind of search condition called a user profile 10 for each user, and searches for the articles to be provided to the corresponding user according to that user profile 10. The user profile 10 is made up of a plurality of topics specified by the user; articles that match those topics are searched for, selected, and sent to the user. Next, a specific configuration of the information filtering center 1 will be described.

(Embodiment 1) FIG. 3 shows the configuration of the information filtering center 1 according to the first embodiment of the present invention. In the figure, solid arrows indicate the flow of data.

As shown in the figure, the information filtering center 1 comprises a user profile generation unit 11, a user profile storage unit 12, an article information extraction unit 13, an article search unit 14, an article selection unit 15, an inter-article similarity calculation unit 16, a presentation information generation unit 17, and an article information storage unit 18. Of these components, the user profile generation unit 11, the article information extraction unit 13, the article search unit 14, the article selection unit 15, the inter-article similarity calculation unit 16, and the presentation information generation unit 17, which are surrounded by broken lines, can be realized by software executed by the central processing unit 4 of FIG. 1, and the user profile storage unit 12 and the article information storage unit 18 can be realized by the storage unit 5.

The user profile generation unit 11 analyzes the requests and interests designated in advance by each user and generates a user profile required for the search for each user. These user profiles are stored in the user profile storage unit 12. The article information extraction unit 13 extracts information necessary for searching and similarity calculation between articles from the text articles arriving from each information source 2 and stores it in the article information storage unit 18 together with the raw text articles.

The article search unit 14 searches the articles arriving from each information source 2 for articles that match the user profile. In this search process, the degree of similarity between the user profile and each arriving article is examined, and the articles are sorted in descending order of similarity. The article selection unit 15 selects the articles to be presented to the user from the search results; for example, it selects all articles whose similarity exceeds a certain threshold, or a certain number of the top-ranked articles.

The inter-article similarity calculation unit 16 is for examining the similarity between articles and calculates the similarity between the selected articles. The presentation information generation unit 17 generates article information to be presented to the user based on the article selection result and the inter-article similarity calculation result. The article information storage unit 18 stores article information for retrieval, inter-article similarity calculation results, and the like.
The specific processing contents of the user profile generation unit 11, article information extraction unit 13, article search unit 14, article selection unit 15, inter-article similarity calculation unit 16 and presentation information generation unit 17 will be described below.

FIG. 4 shows the processing flow of the user profile generation unit 11.

The user profile generation unit 11 accepts the requests and interests of each user as input (step S1). These are expressed in natural language, such as "I want to read articles about XX and XX"; as a set of keywords that frequently appear in topics of interest, optionally with priorities or weights; or as a search formula of the kind used in ordinary document retrieval.

In response, the user profile generation unit 11 performs language processing such as word extraction and synonym expansion, using a word dictionary, a synonym dictionary, and the like (step S2), and creates a user profile by converting the result into a format that can be used for retrieval (steps S3 and S4). The created user profile is stored in the user profile storage unit 12 for each user and used as the search condition for article search.
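The profile generation steps can be sketched as follows. This is a minimal illustration, not the patent's implementation: the synonym dictionary contents, the function name build_profile, the regular-expression tokenizer (standing in for dictionary-based word extraction), and the weighted-term output format are all assumptions.

```python
import re

# Hypothetical synonym dictionary; the text only says that "a synonym dictionary" is used.
SYNONYMS = {
    "semiconductor": ["chip"],
    "pc": ["personal computer"],
}

def build_profile(request, weights=None):
    """Convert a keyword-style request into a weighted term dictionary."""
    weights = weights or {}
    profile = {}
    for word in re.findall(r"[a-z]+", request.lower()):    # step S2: word extraction (naive)
        w = weights.get(word, 1.0)
        profile[word] = max(profile.get(word, 0.0), w)
        for syn in SYNONYMS.get(word, []):                  # step S2: synonym expansion
            profile[syn] = max(profile.get(syn, 0.0), w)
    return profile                                          # steps S3-S4: searchable form

profile = build_profile("semiconductor memory", {"semiconductor": 2.0})
```

Synonyms inherit the weight of the word they expand, which is one simple choice among several.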

FIG. 5 shows an example of the processing flow of the article information extraction unit 13.

The article information extraction unit 13 receives an article arriving from an information source as input (step S11) and, using a dictionary for document analysis and a dictionary for information extraction, performs morphological analysis, syntactic analysis, format analysis, and the like to extract the article's information source and date, the frequency and position of occurrence of document constituents such as characters and words, and 5W1H-style information (step S12). Next, the article information extraction unit 13 expresses the article as a collection of this extracted information (step S13). For example, an article is represented as a vector whose elements are the frequencies of the words that appear in it, or as a 5W1H template filled in with actual values. Examples of these two expressions are shown in FIGS. 6 and 7, respectively. FIG. 6 is a vector whose elements are the frequencies of appearance (14, 9, 5, 2, 3, ...) of the words appearing in the article (semiconductor, memory, friction, depression, production, ...), and FIG. 7 is a template with items such as information source, number of characters, article headline, topic, date and time, place, subject, and main verb.
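The two article expressions can be sketched as below. The class name ArticleTemplate, its fields, and the sample text are hypothetical, and a regular-expression tokenizer stands in for the morphological analysis described above.

```python
import re
from collections import Counter
from dataclasses import dataclass

def term_vector(text):
    """Word-frequency vector in the style of FIG. 6 (naive tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

@dataclass
class ArticleTemplate:
    """A few template items in the style of FIG. 7; the field choice is illustrative."""
    source: str
    headline: str
    num_chars: int
    vector: Counter

def extract(source, headline, body):
    # Steps S12-S13: extract information and express the article as a collection of it.
    return ArticleTemplate(source, headline, len(body), term_vector(body))

art = extract("Newspaper M", "Memory output cut", "Memory makers cut memory production.")
```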

After expressing the article in this way, the article information extraction unit 13 performs indexing to enable high-speed article retrieval (step S14), and stores the article expressed as a vector or a template, together with the indexing information, in the article information storage unit 18 (step S15).

FIG. 8 shows the flow of processing of the article search unit 14.

The article search unit 14 refers to the article information extracted by the article information extraction unit 13 and searches the arrived articles for ones that match the user profile.

This is equivalent to calculating the similarity between the user profile and each arrived article. Depending on the search method, the similarity may take a discrete value, such as "matches the user profile" or "does not match the user profile", or a continuous value that becomes higher the better the article matches. The more general case, in which the similarity takes a continuous value, is described here.

The article retrieval unit 14 carries out the following processing for the user profile of each user.

First, the profile is read from the user profile storage unit 12 (step S21). Next, the article search unit 14 substitutes 1 into the variable i (step S22) and calculates the similarity between the i-th article (initially the first article) and the user profile (step S23). This similarity calculation corresponds to ordinary search processing, and refers to the article expressions and search index stored in the article information storage unit 18.

Next, the article search unit 14 increments the variable i by 1 and checks whether i now exceeds the number of arrived articles (steps S24 and S25). If it does not, articles whose similarity has not yet been calculated remain, and steps S23 to S25 are repeated until i exceeds the number of arrived articles. When the similarity to the user profile has been calculated for all arrived articles, that is, when the search processing for all arrived articles is complete, the article search unit 14 sorts the arrived articles in descending order of similarity to the user profile, thereby ranking them (step S26). The ranking result is stored in the article information storage unit 18. An example of a ranking result is shown in FIG. 9.
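The search loop and ranking (steps S21 to S26) can be sketched as follows. The text does not name a similarity measure; cosine similarity over word-frequency vectors is used here purely as an assumption, and the article identifiers and vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def rank_articles(profile, articles):
    """Score every arrived article against the profile, then sort in descending order."""
    scored = [(cosine(profile, vec), art_id) for art_id, vec in articles.items()]  # S22-S25
    scored.sort(key=lambda pair: pair[0], reverse=True)                            # S26
    return scored

profile = {"semiconductor": 1.0, "memory": 1.0}
articles = {
    "a1": {"semiconductor": 3, "memory": 1},
    "a2": {"memory": 1, "sports": 4},
    "a3": {"sports": 2},
}
ranked = rank_articles(profile, articles)
```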

FIG. 10 shows the flow of processing of the article selection unit 15.

The article selection unit 15 reads from the article information storage unit 18 the arrived articles that have been searched and ranked by the article search unit 14 (step S31), and selects those actually to be presented to the user (step S32). Information on the articles selected for presentation is stored back in the article information storage unit 18.

As a method of selecting articles, for example, the number of articles N to be presented to the user may be set in advance by the user or by the center and the top N ranked articles presented, or articles whose similarity to the user profile is at or above a certain threshold may be presented. FIG. 11 shows an example in which the top 10 articles are selected when the ranking result shown in FIG. 9 is obtained.

FIG. 12 shows an example in which articles having a similarity to the user profile of 0.86 or more are selected when the ranking results shown in FIG. 9 are obtained.
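Both selection rules are short enough to show directly. In this sketch the ranked list is assumed to hold (similarity, article id) pairs already sorted in descending order; the scores are invented for illustration, and only the 0.86 threshold comes from the text.

```python
def select_top_n(ranked, n):
    """Keep the N highest-ranked articles."""
    return ranked[:n]

def select_by_threshold(ranked, threshold):
    """Keep every article whose similarity is at or above the threshold."""
    return [(score, art_id) for score, art_id in ranked if score >= threshold]

ranked = [(0.95, "a"), (0.90, "b"), (0.86, "c"), (0.70, "d")]
top_two = select_top_n(ranked, 2)
above = select_by_threshold(ranked, 0.86)
```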

Further, FIG. 13 shows an example in which multiple searches and rankings are performed for one user, and the top portions of the resulting rankings are merged to select the articles presented to the user.

In this example, searches for the three topics "semiconductor technology", "low-priced personal computer", and "artificial intelligence" are performed separately, and the articles A1, B1, C1, A2, and B2 are selected from the tops of the three ranking results.

Articles A1 and A2 match the topic "semiconductor technology", articles B1 and B2 match the topic "low-priced personal computer", and article C1 matches the topic "artificial intelligence".

As the method of selecting articles here as well, a fixed number of articles may be selected as shown in FIG. 11, or articles whose similarity is at or above a fixed value may be selected as shown in FIG. 12.
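One way to merge several per-topic rankings is a round-robin over the ranking positions. The text does not specify the merge rule, so this is an assumption, chosen because it reproduces the order A1, B1, C1, A2, B2 of the example; the function name and the limit parameter are hypothetical.

```python
from itertools import zip_longest

def merge_rankings(rankings, limit):
    """Take the 1st article of each topic ranking, then the 2nd, and so on, up to limit."""
    merged = []
    for tier in zip_longest(*rankings):        # tier k holds the k-th article of each topic
        for art in tier:
            if art is not None and art not in merged:
                merged.append(art)
                if len(merged) == limit:
                    return merged
    return merged

rankings = [
    ["A1", "A2", "A3"],   # "semiconductor technology"
    ["B1", "B2"],         # "low-priced personal computer"
    ["C1", "C2"],         # "artificial intelligence"
]
selected = merge_rankings(rankings, 5)
```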

FIG. 14 shows a processing flow of the inter-article similarity calculation section 16.

Whereas the article search unit 14 calculates the similarity between the user profile and an article, in other words performs an ordinary search with the user profile as the search expression and the articles as the search targets, the inter-article similarity calculation unit 16 calculates the similarity between articles.

The similarity calculation is performed, for example, by comparing the expressions of articles as shown in FIGS. 6 and 7, and the calculation result is stored in the article information storage unit 18.

Here, it is assumed that there are a plurality of article information sources 2, such as newspaper companies, and that inter-article similarity is calculated between articles arriving from different information sources, for example between an article sent from newspaper company M and an article sent from newspaper company N.

The inter-article similarity may be calculated for all combinations of articles arriving from different information sources, but a method that lowers the calculation cost by calculating it only for the articles selected by the article selection unit 15 is described here.

That is, the inter-article similarity calculation unit 16 first reads the articles selected by the article selection unit 15 from the article information storage unit 18 (step S41). It then calculates the similarity between those of the read articles that arrived from different information sources, and stores the results in the article information storage unit 18 (step S42).

A specific example of inter-article similarity calculation will be described below.

FIG. 15 shows an example of articles selected by the article selecting section 15 and arriving from different information sources. In this example, four articles A to D are to be presented to the user.

Articles A and D arrived from newspaper company M, article B arrived from newspaper company N, and article C arrived from publisher O.

In this case, inter-article similarity is calculated for the combinations of article A and article B, article A and article C, article B and article C, article B and article D, and article C and article D. Since article A and article D arrived from the same information source, their similarity is not calculated.
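The pairs that need a similarity calculation can be enumerated mechanically. A small sketch using the four articles and three sources of this example; the function name is hypothetical.

```python
from itertools import combinations

# Article -> information source, as in the example of FIG. 15.
sources = {"A": "M", "B": "N", "C": "O", "D": "M"}

def cross_source_pairs(sources):
    """All article pairs whose two members arrived from different information sources."""
    return [(x, y) for x, y in combinations(sorted(sources), 2) if sources[x] != sources[y]]

pairs = cross_source_pairs(sources)
```

Of the six possible pairs, only A and D share a source, so five pairs remain.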

FIG. 16 shows the flow of processing of the presentation information generator 17.

The presentation information generation unit 17 reads from the article information storage unit 18 the information on the articles selected by the article selection unit 15 and the inter-article similarities calculated by the inter-article similarity calculation unit 16 (steps S51 and S52).

The presentation information generation unit 17 then classifies each set of articles that have a high mutual similarity and come from different information sources as a duplicate article set (step S53). Here, duplicate articles are articles created independently by multiple information sources about the same event, whose content may be regarded as the same or almost the same.

After that, to avoid presenting duplicate articles, the presentation information generation unit 17 selects from each duplicate article set one article (in general, N articles) to be presented to the user as a representative (step S54). It then adds information on the articles that were not selected to the body of the selected article as related article information, and generates and outputs the information to be presented to the user (steps S55 and S56).
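Step S53 can be sketched as grouping articles into connected components over the high-similarity pairs. The threshold, the similarity values, and the use of a union-find structure are assumptions for illustration; the text only says that highly similar articles from different sources form a duplicate article set.

```python
def duplicate_sets(pair_sims, threshold):
    """Group articles linked by a pair similarity at or above the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), sim in pair_sims.items():
        root_a, root_b = find(a), find(b)
        if sim >= threshold:
            parent[root_a] = root_b         # merge the two groups
    groups = {}
    for art in parent:
        groups.setdefault(find(art), []).append(art)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical similarities for the cross-source pairs of articles A to D.
pair_sims = {("A", "B"): 0.10, ("A", "C"): 0.90, ("B", "C"): 0.20,
             ("B", "D"): 0.85, ("C", "D"): 0.10}
sets_found = duplicate_sets(pair_sims, 0.8)
```

With these values the result is two duplicate article sets, one holding A and C and the other holding B and D.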

Specific examples of duplicate articles and related article information will be described below.

FIG. 17 shows an example in which duplicate articles are derived from one press release. A press release article P describing some event is released to newspaper companies M, N, and O, and each company edits it and adds commentary to create its own article M, N, or O. If articles M, N, O, and P are sent from their respective information sources to the information filtering center 1, articles M, N, O, and P are duplicate articles.

Further, FIG. 18 shows an example in which a duplicate article is created from one event.

In this example, newspaper publishers M, N, and O independently collect data on the same event to create articles M, N, and O. If these are sent to the information filtering center 1, the articles M, N, and O will be duplicate articles.

Since the original purpose of information filtering is to let the user access desired information within a huge amount of information as efficiently as possible, it is generally undesirable for the articles presented to the user to include many duplicate articles. For example, in the example of FIG. 18, if all of articles M, N, and O were presented, the user would have to read three articles to obtain information about a single event.

The presentation information generation unit 17 selects one article, typically N articles, to be presented to the user as a representative from the duplicate article set in order to avoid the above-described presentation of duplicate articles. Hereinafter, only the case of selecting only one will be described.

FIG. 19 shows an example of an overlapping article set obtained as a result of inter-article similarity calculation for the four articles in FIG.

In this example, since the similarity between the articles A and C and the articles B and D is high, two sets of duplicate articles are obtained.
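
The grouping into duplicate article sets can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the specification: the threshold of 0.9, the similarity values, and the name `duplicate_sets` are hypothetical, and articles are merged transitively with a simple union-find.

```python
from itertools import combinations

def duplicate_sets(articles, similarity, threshold=0.9):
    """Group articles whose pairwise similarity is at or above the threshold.

    `similarity` maps frozenset({a, b}) to a score; missing pairs count as 0.
    The threshold is a hypothetical value chosen for illustration.
    """
    parent = {a: a for a in articles}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in combinations(articles, 2):
        if similarity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in articles:
        groups.setdefault(find(a), []).append(a)
    # Only groups with two or more members are duplicate article sets.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Hypothetical similarities for four articles A to D.
sims = {
    frozenset("AC"): 0.95,
    frozenset("BD"): 0.92,
    frozenset("AB"): 0.10,
    frozenset("CD"): 0.05,
}
print(duplicate_sets("ABCD", sims))  # [['A', 'C'], ['B', 'D']]
```

With these made-up scores, articles A and C form one duplicate article set and articles B and D form the other.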

The presentation information generator 17 selects one article from each duplicate article set according to a certain strategy.

For example, if the user side or the service center side decides in advance that the newspaper company M is given the highest priority, the articles finally presented to the user are the articles A and D arrived from the newspaper company M.

Similarly, a strategy is conceivable in which the press release, which is generally considered to contain the largest amount of information, is selected with the highest priority.

It is also possible to select the highest ranked search result.

For example, in FIG. 19, the similarity to the user profile is highest for article C in duplicate article set 1 and for article D in duplicate article set 2, so the articles finally presented to the user are articles C and D.

Further, a strategy such as selecting the article having the longest length or the smallest length can be considered.
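
The selection strategies above can be sketched as a single function. This is a minimal sketch: the `Article` fields, the strategy names, and the sample values are assumptions introduced for illustration and do not appear in the specification.

```python
from dataclasses import dataclass

@dataclass
class Article:
    article_id: str
    source: str                # information source, e.g. newspaper company "M"
    length: int                # number of characters in the body
    profile_similarity: float  # similarity to the user profile

def pick_representative(dup_set, strategy, source_priority=()):
    """Return one article from a duplicate article set under the named strategy."""
    if strategy == "source_priority":
        rank = {s: i for i, s in enumerate(source_priority)}
        # Sources not listed in the priority order are ranked last.
        return min(dup_set, key=lambda a: rank.get(a.source, len(rank)))
    if strategy == "profile_similarity":
        return max(dup_set, key=lambda a: a.profile_similarity)
    if strategy == "longest":
        return max(dup_set, key=lambda a: a.length)
    if strategy == "shortest":
        return min(dup_set, key=lambda a: a.length)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical duplicate article set 1 (articles A and C); all values are made up.
set1 = [Article("A", "M", 800, 0.81), Article("C", "N", 1200, 0.88)]
print(pick_representative(set1, "source_priority", ["M"]).article_id)  # A
print(pick_representative(set1, "profile_similarity").article_id)      # C
```

Keeping the strategy as a parameter makes it possible for the user side or the center side to decide it in advance, as the text describes.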

By the processing described so far, duplicate articles are eliminated from the article candidates presented to the user. The information about the finally excluded duplicate article is added to the text information of each article and presented to the user.

FIG. 20 shows an example in which the information about the excluded duplicate article is added to the text information of the article and presented.

In this example, in addition to the body text information of the article presented to the user, the information regarding the article of another information source whose content is determined to be the same as this article is given as additional information. Specifically, the headline and information source of the article, the number of characters, and the degree of similarity with the article in which the text is currently presented are listed.
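
The additional information listed above (headline, information source, number of characters, and similarity to the displayed article) could be assembled as in the following sketch. The dictionary keys, the separator, and the ordering by similarity are assumptions made for illustration; the layout of FIG. 20 is not reproduced here.

```python
def related_article_lines(excluded, similarity_to_shown):
    """Format one line per excluded duplicate for the additional information area.

    `excluded` is a list of dicts with hypothetical keys "id", "headline",
    "source", and "body"; `similarity_to_shown` maps article id to the
    similarity with the article whose body is displayed.
    """
    lines = []
    # Listing the most similar article first is an assumption of this sketch.
    for art in sorted(excluded, key=lambda a: similarity_to_shown[a["id"]], reverse=True):
        sim = similarity_to_shown[art["id"]]
        lines.append(
            f"{art['headline']} | {art['source']} | "
            f"{len(art['body'])} characters | similarity {sim:.2f}"
        )
    return lines

# Made-up excluded duplicates of the displayed article.
excluded = [
    {"id": "d1", "headline": "Company exits information services",
     "source": "Source B", "body": "x" * 420},
    {"id": "d2", "headline": "Information service business to close",
     "source": "Source C", "body": "x" * 610},
]
for line in related_article_lines(excluded, {"d1": 0.88, "d2": 0.93}):
    print(line)
```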

In this example, the article "The company XX has withdrawn from the information service business" was obtained from three sources, XX newspaper company, △△ newspaper company, and □□ newspaper company, and the article of XX newspaper company was selected as the article to be presented to the user.

By adding the information on the eliminated duplicate articles to the text information of the presented article in this way, the user can avoid repeatedly reading articles that have the same content but come from different information sources, and can also get an overview of how each information source reports on the same event.

FIG. 21 shows a modification of the related information presentation form of FIG.

That is, in FIG. 20 the related information is displayed on the user terminal as plain text, whereas in FIG. 21 the text of the additional information is structured by hypertext or the like, and this structure is used to give access to the bodies of the eliminated duplicate articles.

In this example, each article headline in the additional information area is a button that can be selected with a device such as a mouse, and the user can refer to the text of related article 1 by selecting related article 1.

FIGS. 22 and 23 show examples of displaying the text of the related article 1 when the related article 1 is selected in FIG.

In the article "Semiconductor consultation ..." whose text was displayed in FIG. 21, only the information such as the headline is displayed in the additional information area in FIG.
Instead, the text of the related article 1 is displayed in the text information area.

To return from the state of FIG. 22 to that of FIG. 21, the user may select the button "Semiconductor consultation ... (Original article)" in the additional information area of FIG.

In FIG. 23, the text information of the related article 1 is displayed on the newly opened window while holding the information displayed in FIG. With such a display method, it is possible to compare a plurality of duplicate articles.

The transition from the screen of FIG. 21 to the screen of FIG. 22 is executed as follows in accordance with the flow of processing of FIG.

As shown in FIG. 21, the presentation information generation section 17 adds the information of the related article to the text information of the presentation article and displays it on the screen of the user terminal 3 (step S61). Next, when the event that the button of a related article is selected occurs, the presentation information generation unit 17 retrieves the text information of the selected related article from the article information storage unit 18 (steps S62 and S63). Then, as shown in FIG., the information of the original article is displayed in the additional information area, and the text of the selected related article is displayed in the text information area (step S64).

Note that such screen switching can also be performed under the control of the user terminal 3 side if the text information of the related article is transmitted from the center 1 to the user terminal 3 in advance.

The transition from the screen of FIG. 21 to the screen of FIG. 23 is executed as follows in accordance with the flow of processing of FIG.

As shown in FIG. 21, the presentation information generator 17 adds the information of the related article to the text information of the presentation article and displays it on the screen of the user terminal 3 (step S71). Next, when the event that the button of a related article is selected occurs, the presentation information generation unit 17 retrieves the text information of the selected related article from the article information storage unit 18 (steps S72 and S73). Then, as shown in FIG., the text of the selected related article is displayed in a window (step S74).

This screen switching can also be performed under the control of the user terminal 3 side if the text information of the related article is transmitted from the center 1 to the user terminal 3 in advance.

Further, as shown in FIGS. 20 and 21, related articles to be added to the additional information area may be narrowed down by the same strategy as that for selecting articles from the above-mentioned overlapping article set.

As shown in FIGS. 21 to 23, when the texts of the other duplicate articles are made accessible from the article whose text is displayed as the representative of the duplicate article set, the user can select and read another duplicate article even if the representative article selected by the information filtering system is inappropriate.

For example, even if the information filtering system follows a strategy of preferentially selecting articles of the N newspaper according to the wishes of the user, this is effective when, for a particular event only, the user wants to read the press release instead of the N newspaper article.

It is also possible to compare the views of a plurality of newspaper companies with respect to the same event.

FIG. 26 shows an example in which a list of articles to be presented to the user is displayed together with duplicate article information when article duplication occurs.

In this example, there are four articles to be presented to the user, and the third article, “Withdrawing from XX company information service business”, has two duplicate articles.

The similarity between the user profile and the article is displayed after the headline of each article; for a duplicate article, the similarity between the original article and the duplicate article is also displayed separately. This value indicates the degree to which the article is a duplicate. Here, the original article is the article “Withdrawing from ○× company information service business”.

In the above description, the processing for one user profile has been mainly described.

In general, since there are a plurality of users who receive the information filtering service, the information filtering center holds a user profile for each user and performs each filtering process.

(Modification 1 of Embodiment 1) Next, another configuration example of the inter-article similarity calculation unit 16 and the presentation information generation unit 17 will be described.

FIG. 27 shows the processing flow of the inter-article similarity calculation section 16.

Whereas the article search unit 14 calculates the similarity between the user profile and an article (in other words, performs an ordinary search with the user profile as the search expression and the articles as the search target), the inter-article similarity calculation unit 16 calculates the similarity between articles.

The similarity calculation is performed by comparing article expressions as shown in FIGS. 6 and 7, and the calculation result is stored in the article information storage unit 18.

Here, it is assumed that the article information obtained by the last N rounds of information filtering is stored in the article information storage unit 18.

For example, if the information filtering service is provided once a day and N is set to 1, it means that the article information obtained by yesterday's information filtering is stored. Hereinafter, the description will be made mainly with N = 1.

In this system, the target of the inter-article similarity calculation is the set of the article that has arrived this time and the articles that have arrived up to the previous time.

The similarity calculation may be performed for all combinations of the articles that have arrived this time and the articles that arrived up to the previous time. Hereinafter, however, a method with a lower calculation cost is described, in which the similarity is calculated only for combinations of an article selected this time by the article selection unit 15 and an article presented to the user up to the previous time.

That is, the inter-article similarity calculation unit 16 first reads the information of the articles selected by the article selection unit 15 from the article information storage unit 18, and then reads the information of the articles presented to the user by the filtering up to the previous time from the article information storage unit 18 (steps S81 and S82). The inter-article similarity calculation unit 16 then calculates the similarity for each combination of an article selected this time by the article selection unit 15 and an article presented to the user up to the previous time, and stores the result in the article information storage unit 18 (step S83).

FIG. 28 shows an example of the set of articles selected by the article selecting unit 15 this time and the set of articles presented to the user last time.

In this example, articles A, B, C, and D were presented to the user last time, and articles E, F, G, and H are about to be presented this time.

In this case, the similarity is calculated for 4 × 4 = 16 combinations, such as article A and article E, and article A and article F.

As a modified example, only articles satisfying a certain condition may be targets for similarity calculation.

For example, in FIG. 28, if only the similarities between articles having the same information source are calculated, the similarity calculation for article E, which arrived from newspaper company M this time, needs to be performed only against articles A and B, which arrived from newspaper company M last time.

Further, for example, in FIG. 28, it is conceivable that only articles whose degree of similarity with the user profile is a certain value or more are targeted for similarity degree calculation.

If only articles whose similarity to the user profile is 0.8 or more are targeted, it suffices to calculate only the combination of article E and article A and the combination of article G and article A.
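
The choice of which article pairs to compare can be sketched as follows. The dictionary keys, the function name, and all sample values are hypothetical; the data are chosen only so that the counts match the examples in the text (16 pairs without conditions, articles A and B for article E under the same-source condition, and two pairs under the 0.8 threshold).

```python
from itertools import product

def candidate_pairs(current, previous, same_source=False, min_profile_sim=None):
    """Return (current id, previous id) pairs to be compared.

    Each article is a dict with hypothetical keys "id", "source", and
    "profile_sim" (similarity to the user profile).
    """
    pairs = []
    for cur, prev in product(current, previous):
        if same_source and cur["source"] != prev["source"]:
            continue
        if min_profile_sim is not None and (
            cur["profile_sim"] < min_profile_sim
            or prev["profile_sim"] < min_profile_sim
        ):
            continue
        pairs.append((cur["id"], prev["id"]))
    return pairs

# Made-up values for the articles presented last time and selected this time.
previous = [
    {"id": "A", "source": "M", "profile_sim": 0.90},
    {"id": "B", "source": "M", "profile_sim": 0.70},
    {"id": "C", "source": "N", "profile_sim": 0.60},
    {"id": "D", "source": "O", "profile_sim": 0.50},
]
current = [
    {"id": "E", "source": "M", "profile_sim": 0.85},
    {"id": "F", "source": "N", "profile_sim": 0.70},
    {"id": "G", "source": "O", "profile_sim": 0.82},
    {"id": "H", "source": "N", "profile_sim": 0.60},
]
print(len(candidate_pairs(current, previous)))                # 16
print(candidate_pairs(current, previous, min_profile_sim=0.8))  # [('E', 'A'), ('G', 'A')]
```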

FIG. 29 shows the flow of processing of the presentation information generator 17.

The presentation information generation section 17 reads, from the article information storage section 18, the information of the articles selected by the article selection unit 15 this time, the information of the articles presented to the user up to the previous time, and the inter-article similarity calculated by the inter-article similarity calculation unit 16 (steps S91 to S93). Then, the text information of each article this time is presented to the user together with the information of related articles up to the previous time (steps S94 and S95).

FIGS. 30 and 31 show examples in which the information of the related articles up to the previous time is added to the text information of the article this time and presented.

In FIG. 30, in addition to the text information of the article “Semiconductor consultation ...” presented to the user this time, information on the articles about semiconductors up to yesterday is given as additional information. Specifically, the headline and information source of each article up to the previous time, its number of characters, and its similarity to the article presented this time are listed.

In this example, the article presented this time is the XX newspaper article dated the 15th, and the related articles up to the previous time are the articles of the XX newspaper and the XX newspaper dated the 14th.

Further, in FIG. 31, in addition to the text information of the article “Series: Semiconductor Friction (Part 3)” presented to the user this time, information about the articles “Series: Semiconductor Friction (Part 1)” and “Series: Semiconductor Friction (Part 2)” is displayed.

Modifications such as those of FIGS. 21 to 23 shown in the first embodiment are also conceivable for FIGS. 30 and 31 of this example.

That is, in this system as well, as in the first embodiment, the user can be allowed to access the text of related articles up to the previous time.

In FIGS. 21 to 23, the body text information and the additional information are completely separated, but it is possible to embed the article information up to the previous time in the body text information and present it.

FIG. 32 shows an example in which information about related articles up to the previous time is embedded in the text information of this article and presented.

In this example, the text of an article dated 19th, "XX earthquake off XX earthquake reactivated" is displayed. A part of "The XX earthquake that started ..." is a button that can be selected with a mouse.

When the user selects this, information on articles including information similar to this sentence among the articles up to the previous time is displayed.

FIG. 33 is an example in which, when the user selects the first sentence in FIG. 32, a list of the articles up to the previous time that are closely related to that sentence is displayed.

In this example, the headline and the information source of the article dated 14th, such as "Oki Oki Earthquake Magnitude 4", the number of characters, the similarity to the article this time, etc. are listed.

FIG. 34 is an example of displaying the text of an article when the user selects the related article "Earthquake Magnitude 4 Offshore" in FIG.

Further, immediately after the user selects the first sentence in FIG. 32, one or more texts of related articles may be displayed as shown in FIG.

To implement the form shown in FIG. 32, in which the information of related articles up to the previous time is embedded in the text information of the current article, the similarity is calculated not between the current article and each article up to the previous time, but between each component of the text of the current article and each article up to the previous time.

Conceivable components include paragraphs, sentences, sections, phrases, and words.
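
Component-level matching of this kind can be sketched with sentences as the component. This sketch rests on several assumptions that are not in the specification: word-frequency vectors compared by cosine similarity, a simple regular-expression tokenizer and sentence splitter suited to English text, and a threshold of 0.3.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Word-frequency vector; lowercase regex tokenization is an assumption."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def related_by_sentence(current_body, previous_articles, threshold=0.3):
    """Map each sentence of the current article to related previous articles."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", current_body) if s.strip()]
    result = {}
    for sentence in sentences:
        sv = term_vector(sentence)
        scored = [(aid, cosine(sv, term_vector(body))) for aid, body in previous_articles.items()]
        result[sentence] = [(aid, s) for aid, s in scored if s >= threshold]
    return result

# Made-up current article body and previous articles.
body = "The offshore earthquake resumed today. Markets were calm."
previous = {
    "p1": "An offshore earthquake of magnitude 4 occurred.",
    "p2": "Semiconductor talks continue.",
}
for sentence, hits in related_by_sentence(body, previous).items():
    print(sentence, [aid for aid, _ in hits])
```

Each sentence that has at least one hit would correspond to a selectable portion of the body, and its hit list to the related articles shown when the user selects it.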

As a further modification, the related article information up to the previous time may be presented not per article but per component of the body.

For example, instead of displaying the full text of the related article as shown in FIG. 34, it is possible to display only the first paragraph.

As described above, if the articles presented this time and the related articles up to the previous time can both be accessed, it becomes possible to follow the history of events whose situation changes over time, and it becomes easy to grasp information that spans multiple articles, such as serial articles.

This is also effective when, on reading the current article, the user recalls an article presented in the past and wants to reconfirm its content.

(Modification 2 of Embodiment 1) Next, still another configuration example of the inter-article similarity calculator 16 and the presentation information generator 17 will be described.

FIG. 35 shows a processing flow of the inter-article similarity calculation section 16.

Whereas the article search unit 14 calculates the similarity between the user profile and an article (in other words, performs an ordinary search with the user profile as the search expression and the articles as the search target), the inter-article similarity calculation unit 16 calculates the similarity between articles.

The similarity calculation is performed by comparing article expressions as shown in FIGS. 6 and 7, for example, and the calculation result is stored in the article information storage unit 18.

In this example, the target of the inter-article similarity calculation is the combination of the articles that have arrived this time.

The similarity may be calculated among all the articles that have arrived, but hereinafter a case with a lower calculation cost is described, in which the similarity is calculated only among the articles selected by the article selecting unit 15 this time.

As in the first embodiment, the similarity calculation is performed among the articles of this time. However, whereas the first embodiment calculates the similarity only between articles having different information sources, there is no such limitation here.

When four articles are selected by the article selection unit 15 as shown in FIG. 15, the inter-article similarity calculation unit 16 reads these articles from the article information storage unit 18 (step S101) and performs the similarity calculation for all combinations, such as article A and article B, article A and article C, article A and article D, and article B and article D (step S102).
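
Enumerating all combinations within the current batch can be sketched as follows. The function names, the word-set representation, and the Jaccard measure are stand-ins chosen only to make the sketch runnable; the specification compares article expressions such as frequency vectors.

```python
from itertools import combinations

def pairwise_similarities(articles, similarity):
    """Score every unordered pair of distinct articles in the current batch.

    `articles` maps article id to its representation and `similarity` is a
    caller-supplied function; both names are hypothetical.
    """
    return {
        (a, b): similarity(articles[a], articles[b])
        for a, b in combinations(sorted(articles), 2)
    }

def jaccard(x, y):
    """Stand-in similarity on word sets, used only for this illustration."""
    return len(x & y) / len(x | y) if x | y else 0.0

# Made-up word sets for four selected articles.
batch = {
    "A": {"semiconductor", "talks", "resume"},
    "B": {"low", "priced", "computer"},
    "C": {"semiconductor", "talks", "stall"},
    "D": {"computer", "price", "cut"},
}
scores = pairwise_similarities(batch, jaccard)
print(len(scores))  # 6 pairs for 4 articles
```

For n selected articles this yields n(n-1)/2 pairs, which is why restricting the calculation to the selected articles keeps the cost low.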

Further, only articles satisfying a certain condition may be targets for similarity calculation.

FIG. 36 shows the flow of processing of the presentation information generator 17.

The presentation information generation section 17 reads, from the article information storage section 18, the information of the articles selected by the article selection unit 15 and the inter-article similarity calculated by the inter-article similarity calculation unit 16 (steps S111 and S112). Then, the presentation information generation unit 17 presents the text information of each article this time to the user together with the information of the other related articles of this time (steps S113 and S114).

FIG. 37 shows an example in which the text information of the article this time is presented together with the information of other related articles this time.

In this example, in addition to the main body information of the article “Semiconductor consultation ...” dated 15th, the information of the article on the same semiconductor dated 15th is given as additional information. As a result, the duplicated article in the first embodiment may be displayed, but in such a case, the duplicated article deletion process of the first embodiment may be performed.

Further, when the text information of the article "XX company semiconductor share monopoly ...", displayed in the additional information area of FIG. 37, is viewed, the article "Semiconductor consultation ..." is displayed in the additional information area, as shown in FIG.

Modifications such as those of FIGS. 21 to 23 shown in the first embodiment are also conceivable for FIGS. 37 and 38 of this example.

That is, as in the first embodiment, it is conceivable to allow the user to directly access the text of the related articles of the same day.

(Reflection of inter-article similarity in the article presentation order)
So far, the description has mainly concerned adding related article information when presenting individual articles to the user, but it is also possible to use the similarity between the articles of this time to determine the order in which the articles are presented to the user.

FIG. 39 shows an example in which the inter-article similarity is reflected in the article presentation order.

In this example, the user profile is assumed to be a set of words related to three different fields: semiconductor technology, low-cost personal computer, and artificial intelligence.

When a search is performed in this way, a search result in which articles in three different fields are mixed is obtained, as shown in FIG.

If, for example, the top eight articles, or the articles whose similarity to the user profile is 0.80 or more, are selected and presented to the user in that order, the user may end up reading articles in an order such as semiconductors, low-priced personal computers, artificial intelligence, semiconductors, low-priced personal computers, and so on.

Reading articles in the order of similarity to the user profile may be effective in some cases. However, when articles from a plurality of fields are mixed in this way, it is considered easier for the user if articles with similar contents are collected and displayed as a group, as shown in FIG. 39(b).

In this example, the first three articles are about semiconductors, the next three are about low-priced personal computers, and the other two are about artificial intelligence.
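
One way to obtain such a grouped order is a greedy pass over the ranked list, sketched below. The greedy grouping, the threshold, and the toy similarity function are assumptions made for this illustration; the specification does not state how the groups are formed.

```python
def group_for_presentation(ranked_ids, inter_sim, threshold=0.5):
    """Reorder a ranked article list so that similar articles are adjacent.

    `ranked_ids` is in descending order of similarity to the user profile and
    `inter_sim(a, b)` returns the inter-article similarity. Each group is
    seeded by the highest-ranked article not yet placed.
    """
    remaining = list(ranked_ids)
    ordered = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed] + [a for a in remaining if inter_sim(seed, a) >= threshold]
        remaining = [a for a in remaining if a not in group]
        ordered.extend(group)
    return ordered

# Made-up ranking of eight articles: s = semiconductor, p = low-priced
# personal computer, a = artificial intelligence.
ranked = ["s1", "p1", "a1", "s2", "p2", "s3", "p3", "a2"]

def same_field(x, y):
    """Toy similarity: 1.0 when the ids share a field letter, else 0.0."""
    return 1.0 if x[0] == y[0] else 0.0

print(group_for_presentation(ranked, same_field))
# ['s1', 's2', 's3', 'p1', 'p2', 'p3', 'a1', 'a2']
```

Within each group the original ranking is preserved, so the articles most similar to the user profile still come first in their field.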

As described above, in the system according to the first embodiment, the similarity between articles is calculated by comparing article representations such as frequency vectors, and the related articles for each article to be presented to the user are determined according to that similarity. The information of the related articles is added to the text information of the article presented to the user and sent to the user. The similarity calculation is preferably performed among the articles presented this time, or between the articles that have arrived this time and the articles that have arrived so far. This clarifies the relationships among the articles presented this time and the relationships between the articles presented this time and the articles presented by past filtering, so that the user can be informed of the relationships between articles.

If the existence of duplicate articles is checked by calculating the similarity between articles, it is also possible not to present the text information of a duplicate article to the user, and instead to add only information such as its headline as related article information. This makes it possible to automatically avoid presenting the user with multiple articles about the same content obtained from different information sources.

Therefore, when presenting a plurality of articles to the user by one-time information filtering, the relationship between the articles can be presented clearly, and the user can easily understand the content of the article.

(Second Embodiment) Next, a second embodiment of the information filtering system of the present invention will be described. The configuration of the entire system is similar to that of FIG. 1, and a user profile is held for each user, and articles are searched using this user profile. Here, as described above, the user profile refers to a search condition for searching an article that matches a topic of high interest to the user.

FIG. 40 shows a conceptual diagram of a user profile used in the second embodiment.

In this example, user A selects two topics, “semiconductor technology” and “semiconductor trade”. Another user B selects three topics, "semiconductor trade," "low-cost personal computer," and "artificial intelligence."

At this time, the user profile of the user A is composed of a search condition for searching an article on “semiconductor technology” and a search condition for searching an article on “semiconductor trade”. Similarly, the user profile of the user B includes search conditions for articles related to "semiconductor trade," search conditions for articles related to "low-priced personal computers," and search conditions for articles related to "artificial intelligence."
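
The relationship between topics and user profiles can be pictured with the following data-structure sketch. The keyword lists are invented placeholders, and representing a search condition as a keyword list is an assumption; only the topic names and the users' selections come from the example above.

```python
# Topic storage: one search condition per topic, shared by all users.
# The keyword lists are invented placeholders, not taken from the specification.
TOPIC_CONDITIONS = {
    "semiconductor technology": ["semiconductor", "process", "wafer"],
    "semiconductor trade": ["semiconductor", "trade", "tariff"],
    "low-priced personal computer": ["personal computer", "price"],
    "artificial intelligence": ["artificial intelligence", "inference"],
}

# Topics chosen by each user, as in the example of users A and B.
USER_TOPICS = {
    "A": ["semiconductor technology", "semiconductor trade"],
    "B": ["semiconductor trade", "low-priced personal computer", "artificial intelligence"],
}

def user_profile(user):
    """A user profile is the collection of search conditions for the user's topics."""
    return {topic: TOPIC_CONDITIONS[topic] for topic in USER_TOPICS[user]}

def topics_in_use():
    """Topics selected by at least one user."""
    return sorted({t for topics in USER_TOPICS.values() for t in topics})

print(list(user_profile("A")))
print(topics_in_use())
```

Because "semiconductor trade" is shared by users A and B, a design along these lines would allow its search result to be computed once and reused for both users; whether the system does so is not stated at this point in the text.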

FIG. 41 shows the configuration of the information filtering center 1 according to the second embodiment. As illustrated, the information filtering center 1 includes a user profile generation unit 21, a topic storage unit 22, an article information extraction unit 23, an article search unit 24, an article selection unit 25, an additional information generation unit 26, and an article information storage unit 27.
Among these components, the user profile generation unit 21, the article information extraction unit 23, the article search unit 24, the article selection unit 25, and the additional information generation unit 26, which are surrounded by broken lines, can be realized, for example, by software executed by the central processing unit 14 of FIG. 1, and the topic storage unit 22 and the article information storage unit 27 can be realized by the storage device 5.

The user profile generator 21 receives the requests and interests of each user as input. These may be expressed in natural language, such as "I want to read articles about XX and XX," as a set of keywords that frequently appear in a topic of interest, as such keywords with weights attached, or as a search expression of the kind used in ordinary document retrieval.

The user profile generation unit 21 performs language processing such as word extraction and synonym expansion on this input and converts it into a searchable format to create a user profile. The user profile is stored in the topic storage unit 22 for each user. The user profile generation unit 21 also has a relevance feedback function: it receives feedback from the user on, for example, whether each article already transmitted was useful, and reflects this information to correct the search conditions in the topic storage unit 22.

The article information extraction unit 23 receives an article arriving from an information source as input, performs morphological analysis, syntactic analysis, format analysis, and the like on it, and extracts the article's information source and date, the frequency information and appearance positions of document components such as characters and words, and 5W1H-type information. The article is then expressed as a collection of this extracted information.
For example, an article is represented by a vector whose elements are the frequencies of the words that appear in it, or by a 5W1H template filled in with actual values. Examples of such article representations are the same as those of the first embodiment described with reference to FIGS. 6 and 7, respectively.

The article information extraction unit 23 also performs indexing processing for realizing article retrieval at high speed. The article information extracted by the article information extraction unit 23 is stored in the article information storage unit 27.

Next, the processing flow of the article search unit 24 is described with reference to FIG. 42.

The article search section 24 refers to the search conditions of each topic stored in the topic storage section 22 and to the article information extracted by the article information extraction section 23, and searches for arrived articles that match each topic. This is equivalent to calculating the similarity between the topic and each arrived article. Depending on the search method, this similarity may take a discrete value, such as "matches the topic" or "does not match the topic", or a continuous value that is higher the better the article matches. Here, the more general case in which the similarity takes a continuous value is described.

The article retrieval unit 24 performs the following processing for each topic.

First, the article retrieval unit 24 substitutes 1 into the variable i (step S121), and then retrieves the search condition of the i-th topic (initially topic 1) from the topic storage unit 22 (step S122). The article search unit 24 then substitutes 1 into the variable j (step S123), calculates the similarity between topic i (topic 1) and arrived article j (arrived article 1), and stores the similarity in the article information storage unit 27 together with the information on the search conditions that were satisfied (step S124). This similarity calculation corresponds to ordinary search processing, and refers to the article representations and the search index stored in the article information storage unit 18.

Next, the article search unit 24 increments the variable j by 1 and checks whether j is now larger than the number of arrived articles (steps S125 and S126). If it is not, articles whose similarity has not been calculated remain, so steps S124 to S126 are repeated until j exceeds the number of arrived articles. When the similarity to topic i has been calculated for all arrived articles, the article search unit 24 sorts the arrived articles in descending order of similarity to the user profile and ranks them (step S127). The result of this ranking is stored in the article information storage unit 27.

After that, the article search unit 24 increments the variable i by 1 and checks whether i is now larger than the total number of topics (steps S128 and S129). If it is not, topics for which the similarity has not been calculated remain, so steps S122 to S129 are repeated until i exceeds the total number of topics.
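
The double loop of steps S121 to S129 can be sketched as follows. The function and parameter names are hypothetical, and the keyword-overlap score is a stand-in for the actual search processing, which is not specified at this level of detail.

```python
def rank_articles_by_topic(topics, arrived, similarity):
    """For each topic, score every arrived article and rank by similarity.

    `topics` maps topic name to its search condition, `arrived` maps article
    id to article data, and `similarity(condition, article)` is a
    caller-supplied scoring function (all hypothetical names).
    """
    rankings = {}
    for topic, condition in topics.items():                  # loop over i (S121, S128, S129)
        scored = [
            (article_id, similarity(condition, article))     # S124
            for article_id, article in arrived.items()       # loop over j (S123, S125, S126)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # S127
        rankings[topic] = scored
    return rankings

def keyword_overlap(condition, text):
    """Stand-in score: fraction of the topic's keywords found in the text."""
    return len(condition & set(text.split())) / len(condition)

# Made-up topics and arrived articles.
topics = {
    "semiconductor trade": {"semiconductor", "trade"},
    "artificial intelligence": {"artificial", "intelligence"},
}
arrived = {
    "a1": "semiconductor trade talks resume",
    "a2": "artificial intelligence beats semiconductor forecast",
    "a3": "weather report",
}
rankings = rank_articles_by_topic(topics, arrived, keyword_overlap)
print(rankings["semiconductor trade"])  # [('a1', 1.0), ('a2', 0.5), ('a3', 0.0)]
```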

FIG. 43 is a conceptual diagram of the arriving articles for the topic i ranked by the article search unit 24.
In this way, the arrived articles are ranked by topic.

FIG. 44 shows the flow of processing of the article selecting section 25.

The article selection unit 25 selects an article to be presented to each user from the search results of each topic stored in the article information storage unit 27 by the article search unit 24.

That is, the article selecting unit 25 first substitutes 1 into the variable i (step S131), and then retrieves the user profile of user i (initially user 1) from the topic storage unit 22 (step S132). Next, the article selection unit 25 substitutes 1 into the variable j (step S133), retrieves the search result for topic j (initially topic 1) of user i from the article information storage unit 27, and selects from it the articles to be presented to the user (step S135). As a method of selecting articles, for example, the number N of articles to be presented may be determined in advance by the user side or the center side and the top N ranked articles presented, or the articles whose similarity to the user profile is a threshold or higher may be presented. Information on the selected articles is stored in the article information storage unit 27.

Next, the article selecting unit 25 increments the variable j by 1 and checks whether j is now larger than the number of topics specified by user i (steps S136 and S137). If it is not, search results for topics not yet processed remain, so steps S134 to S137 are repeated until j exceeds the number of topics of user i. When article selection has been completed for all topics of user i, the article selection unit 25 increments the variable i by 1 and checks whether i is now larger than the total number of users (steps S138 and S139). If it is not, users for whom articles have not yet been selected remain, so steps S132 to S139 are repeated until i exceeds the total number of users.

With the above processing, for example, as shown in FIG. 45, for a user who has selected the three topics "semiconductor trade", "low-priced personal computer", and "artificial intelligence", the search results of "semiconductor trade", "low-priced personal computer", and "artificial intelligence" are extracted, and the articles to be presented to the user are selected from the top-ranked articles among them.

FIG. 46 shows the flow of processing of the additional information generating section 26.

The additional information generator 26 performs the following for all users.

First, the additional information generation unit 26 substitutes 1 into the variable i (step S141), and then retrieves the user profile of user i (user 1) from the topic storage unit 22 (step S142). Next, the additional information generation unit 26 retrieves, from the article information storage unit 27, the articles selected by the article selection unit 25 for presentation to user i and the information on the search conditions satisfied by these articles (step S143).

Here, the information on the search conditions satisfied by an article is, for example, information on which of the topics selected by the user the article matches and which of that topic's search conditions it satisfies. A search condition describes conditions that an article must satisfy, such as which linguistic expression is contained at which position in the article and how often, or what the subject, action, and actor of the article are, in a format that the article search unit can process, such as the Boolean expressions used in ordinary document retrieval or natural language.

After that, the additional information generation unit 26 adds the information on the search conditions satisfied by these articles to the articles selected by the article selection unit 25 and presents them to user i (step S144). Then, the additional information generation unit 26 increments the value of the variable i by 1 and then checks whether the value of i is larger than the total number of users (steps S145 and S146). If it is not larger, it is recognized that users remain for whom additional information has not been generated, and steps S142 to S146 are repeated until the value of i becomes larger than the total number of users.

FIG. 47 shows a display example in which information on the topics that each article matches is added to the list of headlines of the articles selected for a certain user and presented to that user.

Here, it is assumed that the user has selected three topics, “semiconductor trade”, “low-priced personal computer”, and “artificial intelligence”.

In this example, the user is presented with the headlines of six articles: three match only "semiconductor trade", two match only "low-priced personal computer", and the remaining one matches both "semiconductor trade" and "low-priced personal computer".

As described above, even when one article may match a plurality of topics, the basis for presenting the article is displayed.

Further, in this example, the value of the degree of similarity between the matched topic and the article calculated at the time of retrieval by the article retrieval unit 24 is displayed in the last column of each row.

Since the article with article number 6 matches two topics, two similarities are displayed: its similarity to "semiconductor trade" is 1.05, and its similarity to "low-priced personal computer" is 0.80.
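One way to obtain, for each article, the list of matched topics and their similarities is to invert the per-topic selection results, as in the following Python sketch. The data structure and function name are assumptions for illustration; only the two similarity values of article number 6 are taken from the example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def topics_per_article(
    selected_by_topic: Dict[str, List[Tuple[int, float]]]
) -> Dict[int, List[Tuple[str, float]]]:
    """Invert {topic: [(article_id, similarity)]} into {article_id: [(topic, similarity)]}."""
    matches = defaultdict(list)
    for topic, hits in selected_by_topic.items():
        for article_id, sim in hits:
            matches[article_id].append((topic, sim))
    return dict(matches)


# Only article number 6 is shown; the other rows of the list are omitted here.
selected_by_topic = {
    "semiconductor trade": [(6, 1.05)],
    "low-priced personal computer": [(6, 0.80)],
    "artificial intelligence": [],
}
matches = topics_per_article(selected_by_topic)
for article_id in sorted(matches):
    basis = ", ".join(f"{topic} ({sim:.2f})" for topic, sim in matches[article_id])
    print(f"article {article_id}: {basis}")
```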

FIG. 48 shows a display example in which the information on the number of articles matching each topic is presented to the same user as in FIG.

In FIG. 48 (a), the number-of-items information of articles suitable for each topic selected by the user is displayed in a table format.

Since the articles matching "semiconductor trade" are the articles with article numbers 1, 2, 3, and 6, the number of articles is displayed as 4. Similarly, since the articles matching "low-priced personal computer" are the articles with article numbers 4, 5, and 6 in FIG. 47, the number of articles is displayed as 3. In this example, there are no articles that match "artificial intelligence", so the number of articles is 0.

The number of articles presented to the user is 6, not 7, because one article is counted both among the 4 articles of "semiconductor trade" and among the 3 articles of "low-priced personal computer".

As a modification, the number of articles that match a plurality of topics, such as article number 6 in FIG. 47, may be separately counted.

In this case, for example, the number for "semiconductor trade" in FIG. 48 (a) becomes three, meaning the number of articles that match only this topic.

In FIG. 48 (b), information on the number of articles matching each topic selected by the user is displayed in a Venn diagram format.

In this example, it is clearly shown that the articles with article numbers 1, 2, and 3 match only "semiconductor trade", the articles with article numbers 4 and 5 match only "low-priced personal computer", and the article with article number 6 matches both.

In this example, the relationship between the number of matching cases of each topic and the total number of articles is clearer than that in FIG. 48 (a).
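The counts shown in the table and in the Venn diagram follow from simple set operations on the article numbers given above. The following Python sketch, whose variable names are assumptions for illustration, reproduces them.

```python
# Article numbers matching each topic, as given in the example above.
matches = {
    "semiconductor trade": {1, 2, 3, 6},
    "low-priced personal computer": {4, 5, 6},
    "artificial intelligence": set(),
}

# Table format: number of matching articles per topic.
counts = {topic: len(ids) for topic, ids in matches.items()}

# Total number of distinct articles presented (duplicates counted once).
presented = set().union(*matches.values())

# Venn diagram format for the two non-empty topics.
trade = matches["semiconductor trade"]
pc = matches["low-priced personal computer"]
only_trade = trade - pc
only_pc = pc - trade
both = trade & pc

print(counts)
print(len(presented), sorted(only_trade), sorted(only_pc), sorted(both))
```

The total of 6 equals 4 + 3 - 1, the one subtracted article being the article that matches both topics.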

FIG. 49 shows a display example in which summary sentences, excerpt sentences, or body texts of articles selected for a user are collected by topic and presented to the user.

Here, a summary sentence is text obtained by processing the body of the original article so that its main points can be grasped, and an excerpt sentence is text extracted from the body of the original article without processing.

In this example, three articles concerning "semiconductor trade" are displayed first side by side, followed by an article concerning "low-priced personal computer".

As described above, by clearly indicating which topic each article presented to the user matches, the user can more easily understand the content of each article and judge which articles to read and which to skip, so that information can be collected more efficiently.

FIG. 50 shows a display example in which information on the search conditions satisfied by an article is added as header information of the article body and presented to the user.

In this example, it is clearly indicated in the row of "corresponding topic" that the article being displayed is one that matches "semiconductor trade" among the topics selected by the user.

Below that, it is displayed that the degree of similarity between "semiconductor trade" and the article was 1.32.

Further, the search conditions used to search for articles relating to "semiconductor trade" and, among these, the conditions that the displayed article satisfies are displayed side by side.

In the body of FIG. 50, a part of the text is highlighted.

Here, highlighting refers generally to displaying a part of the text by any means that makes it more conspicuous than the rest, such as adding a symbol such as an underline, using characters of a different font or size, or using a different color.

In this example, it is assumed that the condition "the text includes words such as semiconductor, IC, and procurement" has been set as a search condition for searching for articles that match the topic "semiconductor trade".

Since the article actually satisfies the above conditions, the words "semiconductor", "IC", and "procurement" in the first sentence of the text are highlighted in order to clearly show this.

As a modification, for example, the word "IC" in the "article headline" line may be highlighted.

By such highlighting, the user can understand on what basis the article being displayed was retrieved and presented.

Further, since the highlighted portions are often important in terms of content, it is considered that the user can efficiently grasp the content of the article by skimming them.

This also makes the work of judging the usefulness of presented articles for relevance feedback more efficient.
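As an illustration of highlighting the expressions that satisfied a search condition, the following Python sketch surrounds each matched expression with a marker. The function name, the `**` marker, the whole-word matching rule, and the sample sentence are all assumptions for illustration; the description leaves the display means open (underline, font, color, and so on).

```python
import re
from typing import List


def highlight(text: str, expressions: List[str], mark: str = "**") -> str:
    """Surround every whole-word occurrence of the given expressions with a marker."""
    if not expressions:
        return text
    # Try longer expressions first so that they win over shorter ones they contain.
    ordered = sorted(expressions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mark + m.group(0) + mark, text)


# Hypothetical first sentence of an article.
sentence = "Company A expanded its procurement of semiconductor IC products."
highlighted = highlight(sentence, ["semiconductor", "IC", "procurement"])
print(highlighted)
```

Whole-word matching is used here so that a short expression such as "IC" is not marked inside a longer word.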

FIGS. 51, 52, and 53 show examples in which the usefulness of an article can be determined efficiently by highlighting, in the article, the search conditions that it matches.

FIG. 51 (a) shows an example of search conditions for searching articles that match the topic "natural language processing".

In this example, if the language expressions "natural language processing", "NL", "machine translation", and "kana-kanji conversion" appear in the text of the article, the score of the article will be high.

When the expressions "natural language" and "analysis" appear in the same sentence, the score of the article increases.
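The two kinds of conditions described above (occurrence of an expression in the text, and co-occurrence of two expressions in the same sentence) can be sketched in Python as follows. The increment of 1.0 per satisfied condition, the sentence-splitting rule, and the sample text are assumptions for illustration; the description states only that the score of the article becomes higher.

```python
import re
from typing import List, Tuple


def count_expression(expression: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of an expression."""
    pattern = r"\b" + re.escape(expression) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score_article(
    text: str,
    expressions: List[str],
    same_sentence_pairs: List[Tuple[str, str]],
) -> float:
    # Each occurrence of a listed expression raises the score (weight assumed to be 1.0).
    score = float(sum(count_expression(e, text) for e in expressions))
    # Split into sentences at ., ! or ? followed by whitespace (a simplifying assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    for first, second in same_sentence_pairs:
        for sentence in sentences:
            if count_expression(first, sentence) and count_expression(second, sentence):
                score += 1.0
    return score


text = (
    "The system applies natural language processing. "
    "Analysis of natural language is hard."
)
expressions = ["natural language processing", "NL", "machine translation", "kana-kanji conversion"]
pairs = [("natural language", "analysis")]
score = score_article(text, expressions, pairs)
print(score)
```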

In addition, various conditions for retrieving articles are described.

FIG. 51 (b) is an example of an article retrieved using the search conditions of FIG. 51 (a) and presented to the user. Since this article satisfies the search condition "the text includes the linguistic expression natural language processing", the expression "natural language processing" in the article is highlighted. Here, reading around the highlighted expression "natural language processing" reveals the sentence "this software searches by simple character string matching without using natural language processing." It is thus easy to see that this is not really an article about natural language processing.

Since the user can determine at this point that it is not necessary to read this article, he or she can gather information by reading only useful articles, or can perform relevance feedback efficiently.

Similarly to FIG. 51, FIG. 52 is also an example of quickly determining that an article is not useful.

In this example, the search target is English text, and the search conditions for the topic "artificial intelligence" are shown in part (a).

Here, articles containing words such as "artificial" and "intelligence" are given higher scores.

FIG. 52 (b) is an example of an article retrieved using the retrieval condition of FIG. 52 (a) and presented to the user.
The word "artificial" is highlighted. As in FIG. 51, by browsing only around the highlighted word, it can be understood instantly that this article is about an "artificial hand" (prosthetic hand) and is unrelated to "artificial intelligence".

Whereas FIGS. 51 and 52 are display examples of articles that are not useful, FIG. 53 is a display example of an article that is useful to the user.

FIG. 53 (a) shows the search conditions for searching for articles that match the topic "new personal computer products". As words appearing in the articles, expressions indicating the type of personal computer, such as "notebook personal computer", "laptop", and "desktop", and names of personal computer manufacturers, such as "○○ company" and "ΔΔ company", are specified.

FIG. 53 (b) is a display example of an article presented to the user, which is obtained as a result of the search under the search conditions shown in FIG. 53 (a).

Since "ΔΔ company" is highlighted, it can be seen at a glance that the personal computers introduced in this article are made by ΔΔ company, not by ○○ company.

Similarly, since "notebook personal computer" is highlighted, it can be seen at a glance that the type of personal computer on sale is not a laptop or desktop but a notebook personal computer. As described above, even when the presented article is useful to the user, it is considered easy to understand the content of the article.

FIG. 50 showed an example of presenting to the user the topic search conditions and, among them, the search conditions that the article satisfies. An example of how these are displayed will now be described.

FIG. 54 is a specific example of search conditions for searching a document that matches the topic "semiconductor trade".

The condition on the first line is an example of a Boolean expression used in a normal document search, and language expressions such as "semiconductor" and "trade" are connected by operators such as AND and OR.

The condition on the second line shows that the language expressions "semiconductor" and "trade" appear in the same sentence.

For example, the condition on the fourth line represents the condition that linguistic expressions such as "semiconductor", "memory", and "IC" appear in the headline character string of the article.

The article shown in FIG. 50, which was retrieved using the search conditions shown in FIG. 54 and presented to the user, is displayed with information such as that shown in FIG. 55 added.

In this example, it is clearly indicated that the article currently displayed matches the topic "semiconductor trade", and the search conditions of the topic "semiconductor trade" shown in FIG. 54 are presented to the user as they are.

Below that, the conditions that the article actually satisfies are listed. For example, since the linguistic expressions "semiconductor" and "procurement" appear in the first sentence of FIG. 50, "First sentence: semiconductor (once), procurement (once)" is displayed under "search conditions satisfied by article 1" in FIG. 55.

Here, "(once)" represents the number of appearances.

Of the linguistic expressions written on the "word:" line of the topic search conditions, the three that actually appeared in the article, "semiconductor", "IC", and "procurement", are displayed on the "word:" line of "search conditions satisfied by article 1".

At the same time, information such as the position of appearance and the number of appearances is displayed.

Furthermore, the Boolean expression "(semiconductor OR memory) AND (trade OR procurement)" on the first line of the topic search conditions is satisfied because the expressions "semiconductor" and "procurement" appear in the article. The Boolean expression is therefore displayed under "search conditions satisfied by article 1", with the expressions "semiconductor" and "procurement" in it highlighted.
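The evaluation of such a Boolean expression, together with the collection of the terms that actually satisfied it and their numbers of appearances, can be sketched in Python as follows. The nested-tuple representation of the expression, the function names, and the sample sentence are assumptions for illustration and are not the format used by the article search unit.

```python
from collections import Counter


def evaluate(expr, words) -> bool:
    """expr is either a word (str) or a tuple ("AND" | "OR", operand, operand, ...)."""
    if isinstance(expr, str):
        return expr in words
    operator, *operands = expr
    results = [evaluate(operand, words) for operand in operands]
    return all(results) if operator == "AND" else any(results)


def terms(expr):
    """List every word that appears in the expression, in order."""
    if isinstance(expr, str):
        return [expr]
    return [term for operand in expr[1:] for term in terms(operand)]


# (semiconductor OR memory) AND (trade OR procurement)
condition = ("AND", ("OR", "semiconductor", "memory"), ("OR", "trade", "procurement"))

# Hypothetical first sentence of the article, already lower-cased and tokenized by spaces.
first_sentence = "company a expands procurement of semiconductor products"
counts = Counter(first_sentence.split())

satisfied = evaluate(condition, counts)
# Terms of the expression that actually appear, with their numbers of appearances.
matched = {term: counts[term] for term in terms(condition) if counts[term] > 0}
print(satisfied, matched)
```

The `matched` dictionary corresponds to the terms that would be highlighted within the displayed Boolean expression.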

FIG. 56 shows a modification of FIG. 55.

In FIG. 55, the topic search conditions and the search conditions that the article actually satisfies are displayed separately, whereas in FIG. 56, the search conditions that the article satisfies are embedded in the topic search conditions.

In this example, the words of the actually satisfied conditions such as "semiconductor" and "procurement" are highlighted.

As a result, it is possible to roughly understand what percentage of the topic search conditions is satisfied by the article.

As described above, by presenting the search conditions of the topic together with information on the search conditions that the displayed article actually satisfies, it is considered that the user can browse the article while judging its usefulness and can easily understand its content.

Further, since the user can know on what basis the article was retrieved and presented, it is considered that the user can return more detailed and effective relevance feedback information to the information filtering service side.

(Modification 1 of Embodiment 2) Next, another configuration example of the article search unit 24 and the additional information generation unit 26 will be described.

First, the article search unit 24 substitutes 1 into the variable i (step S151), and then retrieves the search conditions of the i-th topic (topic 1) from the topic storage unit 22 (step S152). After this, the article search unit 24 substitutes 1 into the variable j (step S153), and then calculates the similarity between topic i (topic 1) and arrived article j (arrived article 1) and stores it in the article information storage unit 27 (step S154). This similarity calculation corresponds to ordinary search processing, and refers to the representation of the article and the search index stored in the article information storage unit 18.

Here, the only difference from FIG. 42 of the second embodiment is that the article information storage unit 27 does not necessarily need to store information about the search conditions satisfied by each article.

This is because, whereas in the second embodiment the articles presented to the user are accompanied by information on why each article was retrieved, in this modification they are accompanied by information on how other users read the article.

Next, the article search unit 24 increments the value of the variable j by 1 and then checks whether the value of j is larger than the number of arrived articles (steps S155 and S156). If it is not larger, it is recognized that articles remain for which the similarity has not been calculated, and steps S154 to S156 are repeated until the value of j becomes larger than the number of arrived articles. When the calculation of the similarity to topic i has been completed for all the arrived articles, the article search unit 24 sorts the arrived articles in descending order of similarity to the user profile and ranks them (step S157). The result of this ranking is stored in the article information storage unit 27.

After that, the article search unit 24 increments the value of the variable i by 1 and then checks whether the value of i is larger than the total number of topics (steps S158 and S159). If it is not larger, it is recognized that topics remain for which the similarity has not been calculated, and steps S152 to S159 are repeated until the value of i becomes larger than the total number of topics.

FIG. 58 shows the flow of processing of the additional information generating section 26.

The additional information generator 26 carries out the following processing for all users.

First, the additional information generation unit 26 substitutes 1 into the variable i (step S161), and then retrieves the user profile of user i (user 1) from the topic storage unit 22 (step S162). Next, the additional information generation unit 26 retrieves, from the article information storage unit 27, the articles selected by the article selection unit 25 for presentation to user i and the information about the other users who receive these articles (step S163).

After that, the additional information generation unit 26 adds the information about the other users who receive these articles to the articles selected by the article selection unit 25 and presents them to user i (step S164). Then, the additional information generation unit 26 increments the value of the variable i by 1 and then checks whether the value of i is larger than the total number of users (steps S165 and S166). If it is not larger, it is recognized that users remain for whom additional information has not been generated, and steps S162 to S166 are repeated until the value of i becomes larger than the total number of users.

For example, as shown in FIG. 59, it is assumed that the article selecting section 25 stores information about which article is to be sent to which user.

In this example, it is recorded that articles 1 and 2 are presented to user 1, and articles 2, 3, and 4 are presented to user 2.

When presenting article 1 to user 1, the additional information generation unit 26 adds information about users 3 and 4, the other users who receive article 1. For example, when presenting the number of recipients of article 1, information on three people (users 1, 3, and 4) is added, or information on two people, excluding user 1, is added.

Similarly, when presenting the article 2 to the user 1, the information regarding the users 2 and 4 is added and presented.
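The lookup of the other recipients of an article from the stored delivery information can be sketched in Python as follows. The rows for users 1 and 2 follow the example above; the rows for users 3 and 4 are assumptions chosen only to be consistent with the text (users 3 and 4 receive article 1, and users 2 and 4 receive article 2), and the function name is likewise hypothetical.

```python
from typing import Dict, List

# Which articles are sent to which user.
deliveries: Dict[int, List[int]] = {
    1: [1, 2],
    2: [2, 3, 4],
    3: [1],      # assumed row, consistent with the text
    4: [1, 2],   # assumed row, consistent with the text
}


def other_recipients(deliveries: Dict[int, List[int]], user: int, article: int) -> List[int]:
    """Return the users other than `user` who receive `article`."""
    return sorted(
        other for other, article_ids in deliveries.items()
        if other != user and article in article_ids
    )


others_for_article_1 = other_recipients(deliveries, user=1, article=1)
others_for_article_2 = other_recipients(deliveries, user=1, article=2)

# Number of recipients including or excluding the user himself or herself.
print(len(others_for_article_1) + 1, len(others_for_article_1))
```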

FIG. 60 shows a display example in which information about other users who have received an article is added to the list of article headings of articles selected for a certain user and presented to that user.

In this example, the total number of users of the information filtering service is 4,000.

Then, for example, it can be seen that the number of users who have received the article with article number 1 is 250.

FIG. 61 shows a display example in which the summary sentence or excerpt sentence of an article selected for a certain user is added with information about other users who received the article and presented.

Similar to FIG. 60, information on the number of receiving users is shown.

FIG. 62 shows an example in which information about other users who have received the article is added as header information of the article body and presented to the user.

It is clearly indicated that, out of the total of 4,000 users, 250 are receiving the article being displayed.

FIG. 63 shows a modification of FIG.

In this example, a breakdown of the number of users who received the article is displayed.

It can be seen that, of the 250 people who received the displayed article, 150 are men, 100 are women, 200 are Japanese, 30 are Americans, and 20 are users from other countries.

Furthermore, it can be seen that, of the 250 people, 180 are users who have selected the topic "semiconductor trade", 50 are users who have selected the topic "IC", and 20 are users who have selected both "semiconductor trade" and "IC". In addition to this, statistical information such as the affiliation and age group of the recipients may be displayed to the extent that privacy is not infringed.

As described above, if information about how many other users receive the currently displayed article is presented, the user can understand whether the article is of general interest or is a specialized article read only by a particular group of users, and can use this as a basis for judging how useful the article is to him or her.

For example, if the user who has received article information such as that shown in FIG. 60 does not have time to read all six articles, he or she can, wishing for the time being to collect only information of general interest, read only the articles that many users are reading, such as article number 4.

FIG. 64 shows a display example in which the relevance feedback information previously performed by a user or another user is added to the article information presented this time and presented.

In this example, the articles arrived this time are b1 to b.
It is assumed that the user is trying to perform relevance feedback by judging the usefulness of all or some of these.

For example, if the user judges that article b1 is "not useful" and sends this information to the information filtering center 1, the information filtering center 1 can modify the user profile, for example by lowering the priority of articles on topics like that of article b1, so that from the next time onward more articles that match the user's request are presented.

In FIG. 64, as the reference information for the usefulness determination, information on the usefulness determination made by the user last time or before, and information on the usefulness determination of other users are presented.

In this example, there are six articles, a1 to a6, that the user received last time and whose usefulness the user judged. For example, it can be seen that the user judged article a1 to be "useful" and article a3 to be "unnecessary".

In general, human judgments of usefulness are inconsistent: a user may judge one article to be "useful" and yet judge a similar article to be only "somewhat useful".

Even if the user profile is corrected by feeding back the inconsistent determination information as described above, there is no guarantee that better filtering will be performed.

It is considered that the reliability and efficiency of the current usefulness judgment can be improved by allowing the user to access his or her own past usefulness judgments, as in this example. In addition, even if the user's request changes over time, the user can consciously change the usefulness judgment policy while referring to his or her past feedback results.

Further, in FIG. 64, the judgment information of other users is displayed in addition to the judgment information of the past of the person himself / herself.

For example, it can be seen that article a1 was received by 250 other users, of whom 100 judged it "useful", 100 judged it "somewhat useful", and 50 judged it "unnecessary". By referring in this way to the usefulness judgments made by other users in the past, the user can use them as a reference for future usefulness judgments, or can directly correct his or her own past judgments and have relevance feedback performed again.

It is considered that this enables more reliable and efficient relevance feedback.

A modification of FIG. 64 is shown in FIG.

In FIG. 64, the user judges usefulness with the discrete evaluation values "useful", "somewhat useful", and "unnecessary", whereas in this modification the judgment is made with a continuous score.

In the "last relevance feedback information", the average of the scores given by other users is displayed as the judgment information of other users.

For example, if the content of the current article b1 is similar to that of the previous article a1, the user may, seeing that he or she previously gave a1 10 points, also give b1 a high score this time.

Also, the row for the previous article a5 shows that, while the user gave it a low score of 1 point, the average of the other users' scores is a relatively high 7.4 points.

The user can therefore withdraw his or her own usefulness evaluation of a5 and assign a new evaluation value.
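A comparison of the user's own past scores with the averages of other users, for picking out evaluations that the user may wish to reconsider, can be sketched in Python as follows. The function name, the gap of 5.0 points, and the average assumed for article a1 are assumptions for illustration; only the scores of 10 points for a1 and 1 point for a5 and the average of 7.4 points for a5 are taken from the example above.

```python
from typing import Dict, List


def flag_for_review(
    own_scores: Dict[str, float],
    others_average: Dict[str, float],
    gap: float = 5.0,
) -> List[str]:
    """Return articles whose own score differs from the others' average by at least `gap`."""
    return sorted(
        article for article, own in own_scores.items()
        if article in others_average and abs(own - others_average[article]) >= gap
    )


own_scores = {"a1": 10, "a5": 1}
others_average = {"a1": 8.0, "a5": 7.4}  # the value for a1 is hypothetical

to_review = flag_for_review(own_scores, others_average)
print(to_review)
```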

As described above, in the system of the second embodiment, the user is clearly shown which search conditions each presented article satisfies, such as which of the topics it matches. This makes it easier for the user to understand why the article was presented and to judge its usefulness. The relevance feedback function, which receives feedback from the user on whether each article already sent was useful and corrects the search conditions accordingly, can therefore be used more effectively.

Also, by presenting to the user, instead of the basis on which the article was selected, how the presented article is read by other users, relevance feedback that takes the judgments of other users into account becomes possible, so that relevance feedback can be used effectively.

(Third Embodiment) Next, a third embodiment of the information filtering system of the present invention will be described. The configuration of the entire system is the same as in FIG. 1; a user profile is held for each user, and articles are searched using this user profile. Here, as described above, the user profile refers to a search condition for searching for articles that match a topic of high interest to the user.

FIG. 66 shows the structure of the information filtering center 1 according to the third embodiment. As illustrated, the information filtering center 1 includes a user profile generation unit 31, a topic storage unit 32, an article information extraction unit 33, an article search unit 34, an article selection unit 35, a summary / abstract generation unit 36, and an article information storage unit 37. Among these components, the user profile generation unit 31, the article information extraction unit 33, the article search unit 34, the article selection unit 35, and the summary / abstract generation unit 36, which are surrounded by broken lines, can be realized, for example, by software executed by the central processing unit 14 in FIG. 1, and the topic storage unit 32 and the article information storage unit 37 can be realized by the storage device 5.

The user profile generation unit 31 receives the requests and interests of each user as input. A user's request or interest may be expressed in natural language, such as "I want to read articles about XX and XX", as a set of keywords that frequently appear in a topic of interest, as a weighted or ordered set of such keywords, or as a search expression of the kind used in ordinary document retrieval.

The user profile generation unit 31 performs language processing such as word extraction and synonym expansion on this, and converts it into a format that enables retrieval to create a user profile. The user profile is stored in the topic storage unit 32 for each user. Further, the user profile generation unit 31 receives feedback from the user regarding information such as whether or not each article already transmitted to the user was useful for the user, and reflects the information to modify the search condition of the topic storage unit 32. It also has a relevance feedback function.

FIG. 67 shows an example of a user profile represented by keywords and their weights.

In this example, since the user is interested in articles related to semiconductors, related terms such as "memory" are listed, and the weight used for similarity calculation is defined for each term.
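A similarity calculation using such a profile of weighted keywords can be sketched in Python as follows. The weights, the weighted-sum formula, the tokenization rule, and the sample text are all assumptions for illustration; the actual weights of FIG. 67 and the actual similarity formula are not reproduced here.

```python
import re
from collections import Counter
from typing import Dict


def similarity(profile: Dict[str, float], text: str) -> float:
    """Weighted sum of keyword frequencies (one possible similarity, assumed here)."""
    frequencies = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(weight * frequencies[word] for word, weight in profile.items())


# Hypothetical user profile: keyword -> weight.
profile = {"semiconductor": 1.0, "memory": 0.5, "ic": 0.5}

text = "Memory prices fell as semiconductor makers raised IC output."
score = similarity(profile, text)
print(score)
```

With this formula the similarity is not normalized, so values above 1 can occur.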

The article information extraction unit 33 receives an article arriving from an information source as input, performs morphological analysis, syntactic analysis, format analysis, and the like on the article, and extracts the information source and date of the article, frequency information and appearance positions of document components such as characters and words, and 5W1H-style information. The article is then expressed as a collection of this extracted information. For example, an article is represented by a vector whose elements are the frequencies of the words that appear in it, or by a 5W1H template filled in with actual values. The article information extraction unit 33 also performs indexing processing for realizing article search at high speed. The article information extracted by the article information extraction unit 33 is stored in the article information storage unit 37.

The article search unit 34 refers to the search conditions of each topic stored in the topic storage unit 32 and to the article information extracted by the article information extraction unit 33, and searches for arrived articles that match each topic. This is equivalent to calculating the similarity between the topic and each arrived article. Depending on the search method, this similarity may take a discrete value such as "matches the topic" or "does not match the topic", or it may take a continuous value that is higher the better the article matches. Here, the more general case in which the similarity takes a continuous value will be described.
In this case, the processing performed by the article search unit 34 for each topic is the same as in the first and second embodiments. First, the search conditions for retrieving articles that match the topic are read from the topic storage unit 32. Next, the similarity with the topic is calculated for each arrived article. This similarity calculation is equivalent to ordinary search processing, and refers to the article representations and the search index stored in the article information storage unit 37. The similarity of each article and the search conditions that the article satisfied are stored in the article information storage unit 37. When the calculation of the similarity is completed for all the arrived articles, that is, when the search process for all the arrived articles is completed, the arrived articles are sorted in descending order of similarity with the topic; that is, the articles are ranked. The ranking result is also stored in the article information storage unit 37.
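The ranking step described above amounts to sorting the arrived articles by their similarity to the topic. A minimal sketch follows; the article identifiers and similarity values are hypothetical.

```python
def rank_articles(similarities):
    """Return article ids sorted in descending order of similarity with the topic.

    similarities: dict mapping article id -> continuous similarity value.
    """
    return sorted(similarities, key=similarities.get, reverse=True)

scores = {"a1": 0.42, "a2": 0.91, "a3": 0.67}  # hypothetical values
ranking = rank_articles(scores)
print(ranking)
```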

The article selecting section 35 selects the articles to be presented to each user from the search results of each topic stored in the article information storage section 37 by the article searching section 34. For example, for a user who has selected the three topics "semiconductor trade", "low-priced personal computer", and "artificial intelligence", the search results of "semiconductor trade", "low-priced personal computer", and "artificial intelligence" are retrieved, and the articles to be presented to the user are selected from the top-ranked articles among these.
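One way to picture this selection is to take the top of each topic's ranking and merge the results without repeating an article. The sketch below assumes a fixed cut-off per topic (`top_k`) and hypothetical article identifiers; the text does not prescribe a particular merging rule.

```python
def select_articles(rankings, top_k):
    """Merge the top_k articles of each topic ranking, keeping the first occurrence only."""
    selected = []
    seen = set()
    for ranked_ids in rankings.values():
        for article_id in ranked_ids[:top_k]:
            if article_id not in seen:
                seen.add(article_id)
                selected.append(article_id)
    return selected

rankings = {  # hypothetical ranking results per topic
    "semiconductor trade": ["a2", "a5", "a1"],
    "low-priced personal computer": ["a5", "a7", "a3"],
    "artificial intelligence": ["a9", "a2", "a4"],
}
chosen = select_articles(rankings, top_k=2)
print(chosen)
```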

FIG. 68 shows a processing flow of the summary/abstract generating unit 36 in the third embodiment. The summary/abstract generating unit 36 performs the following processing for each user.

First, the summary/abstract generating unit 36 substitutes 1 for the variable i (step S171) and extracts the user profile of user i from the topic storage unit 32 (step S172). Next, it retrieves from the article information storage unit 37 the set of articles to be presented to user i and information indicating which topics each article matched. Then, it substitutes 1 for the variable j and, for article j to be presented to the user, generates a summary or abstract of a length corresponding to the matched topic while referring to the matched-topic information (step S175).

Here, a summary means a text generated from the original text in order to express the subject of the article, and an abstract means a part of the original text of the article, such as its important sentences, extracted as is.

The "length" of the summary or abstract refers to the compression ratio relative to the original text, the number of sentences, the number of paragraphs, the number of characters, or the proportion of the entire presented text that it occupies.

Any summary or abstract generation method can be used in the third embodiment as long as the length can be adjusted in two or more steps.

For example, an automatic summary generation technique using natural language analysis may be used, or a simple method of displaying either only the first paragraph or the entire text may be used.

Next, the summary/abstract generating unit 36 checks whether the current value of j is larger than the number of articles presented to user i (step S176). If it is not, it is determined that there are articles that have not yet undergone the summary generation process, and steps S175 and S176 are repeated until the value of j becomes larger than the number of articles presented to user i.

After that, the summary/abstract generating unit 36 presents the summaries or abstracts of the articles to user i (step S177). Then, it checks whether the current value of i is larger than the total number of users (step S178). If the current value of i is not larger than the total number of users, the processing of steps S172 to S178 is repeated until it becomes larger.

Next, a procedure for generating a summary or abstract of a length corresponding to the topic matched by an article will be described with reference to the drawings.

FIG. 69 shows an example of topics selected by a certain user and priorities between them.

In this example, the user has selected four topics, A, B, C, and D, and is seeking articles on these topics. The priority decreases in the order of topics A, B, C, and D.

The priority may be set by the information filtering service center 1 side or may be designated by the user. Here, if it is specified by the user, it means that the user is more interested in articles that match topic A than articles that match topic B, for example.

FIG. 70 shows an example of a list of articles to be presented to the user who has selected the topic shown in FIG. 69, and topics that match them.

In this example, four articles 1 to 4 are selected for the user. Articles 1 and 2 match topic A, article 3 matches topic B, and article 4 matches topics C and D.

FIG. 71 shows a conceptual diagram of the article information presented to the user in this case. Since articles 1 and 2 match topic A, which has the highest priority among the topics selected by the user, a relatively long summary or abstract is presented for them. On the other hand, since article 4 matches topics C and D, which have the lowest priority among the topics selected by the user, a very short summary or abstract is presented.

In this way, the length of the summary or abstract is changed stepwise according to the priority of the topic.

In FIG. 71, the length of the summary or abstract is represented by the area in the figure, but the summary or abstract of an article that matches the highest-priority topic is not always the longest.

For example, suppose that the compression rate based on the number of sentences in the original text is adopted as the length of the summary, and that the original text of article 1 is 5 sentences and the original text of article 4 is 20 sentences.

In this case, since article 1 is an article of topic A, which has a high priority, its compression rate is 100%, and since article 4 is an article of topics C and D, which have a low priority, its compression rate is 50%. The summary of article 1 is therefore 5 sentences, whereas the summary of article 4 is 10 sentences.
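The arithmetic of this example can be checked with a short sketch. The function name, the rounding up, and the minimum of one sentence are assumptions added for illustration; only the sentence counts and compression rates come from the example above.

```python
import math

def summary_sentence_count(original_sentences, compression_rate):
    """Number of sentences kept when the compression rate is applied to the original.

    Rounding up and keeping at least one sentence are assumptions of this sketch.
    """
    return max(1, math.ceil(original_sentences * compression_rate))

article1_len = summary_sentence_count(5, 1.00)   # topic A, high priority
article4_len = summary_sentence_count(20, 0.50)  # topics C and D, low priority
print(article1_len, article4_len)
```

Although article 1 belongs to the highest-priority topic, its summary is shorter in absolute terms because its original text is shorter.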

With the functions as described above, the user can read articles with different degrees of detail for each topic.

It is considered to be effective when there is a clear priority between topics selected by the user.

(Relevance Feedback) In document retrieval, there is a technique called relevance feedback, in which the user is asked to judge the usefulness of the retrieved documents and the result is used to change the weights of the words in the search formula, so that documents closer to what the user wants are retrieved.

This function is being realized also in the field of information filtering.

In this embodiment, the usefulness determination information obtained at the time of relevance feedback can be reflected in the length of the summary or abstract.

For example, it is assumed that the user returns information "Article 3 was very useful" in response to the information presentation as shown in FIG.

At the same time, it is assumed that the priorities between the topics shown in FIG. 70 are specifically defined by the magnitude of the importance value as shown in FIG.

At this time, since article 3, which the user judged to be particularly useful, is an article that matches topic B, it is useful to increase the importance value of topic B by some calculation and, from then on, to present articles that match topic B at greater length.
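The text leaves the calculation open ("some calculation"). As one possible sketch, the importance value of each topic matched by a useful article could be multiplied by a fixed factor. The importance values and the factor of 1.5 below are hypothetical and are not those of the drawings.

```python
def apply_feedback(importance, useful_topics, boost=1.5):
    """Return a copy of the importance table with the useful topics boosted.

    The multiplicative boost is an assumption; any monotone update would serve.
    """
    updated = dict(importance)
    for topic in useful_topics:
        updated[topic] = updated[topic] * boost
    return updated

importance = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}  # hypothetical values
updated = apply_feedback(importance, ["B"])  # article 3 matched topic B
top_topic = max(updated, key=updated.get)
print(top_topic, updated["B"])
```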

FIG. 73 shows an example of article information presented to the user in the next filtering when such feedback is provided.

In FIG. 71, the priority of topic A is the highest, but in this figure the priority of topic B is the highest as a result of the feedback, and the summary or abstract of article 1', which matches topic B, is the longest.

(Modification 1 of Embodiment 3) Next, another configuration example of the summary/abstract generating unit 36 will be described.

FIG. 74 shows a processing flow of the summary/abstract generating unit 36 in this modification.

The summary/abstract generating unit 36 performs the following processing for each user.

First, the user profile of user i is retrieved from the topic storage section 32 (steps S181, S182). Next, the set of articles to be presented to user i and the attribute information added in advance to each article, such as the date, the newspaper company, morning or evening edition, the size of the headline, the number of lines, and the page on which the article is placed, are retrieved from the article information storage unit 37 (step S183). Then, for each article presented to user i, a summary or abstract of a length corresponding to the attribute information is generated while referring to that information (step S185). Here, the meanings of “summary”, “abstract”, and “length” are the same as in the third embodiment. The subsequent processing is also the same as in the third embodiment.

Below, the procedure for generating a summary or abstract of a length according to the attributes of an article will be described with reference to the drawings.

FIG. 75 shows an example of articles selected by the article selecting section 35 for presentation to a certain user. In this example, date information such as the issue date is adopted as the attribute added in advance to the article.

The dates of articles 1 to 4 are May 26, 23, 23, and 20, respectively.

For example, in a service in which information is collectively delivered every week, there is a possibility that new articles and old articles are mixed in this way.

FIG. 76 shows a conceptual diagram of the article information presented to the user in this case. In this example, the newer the article, the longer the summary or abstract that is displayed.

For example, the article 1 dated May 26 is displayed in detail, while the article 4 dated May 20 is displayed briefly.

Similarly, the length of the summary or abstract may be changed according to the time when the article arrived at the information filtering center, whether it is from a morning or evening edition, and the like.

Further, it is also possible to adopt a day of the week as a time attribute and to perform processing such as "display articles on Monday in more detail than articles on other days".

FIG. 77 shows an example of articles selected to be presented to a certain user when the newspaper company is adopted as the attribute.

In this example, article 1 is from the ○○ newspaper, articles 2 and 3 are from the △△ newspaper, and article 4 is from the ×× newspaper.

If the user or the information filtering service side sets the priority in the order of the ○○ newspaper, the △△ newspaper, and the ×× newspaper, the user is presented with information such as that shown in FIG. 78, for example.

Since article 1 is an article from the ○○ newspaper, which has the highest priority, a long summary or abstract is presented. On the other hand, since article 4 is an article from the ×× newspaper, which has the lowest priority, a short summary or abstract is presented.

As with the newspaper company described above, the length of the summary or abstract can be changed according to various attributes given in advance by the sender of each article, such as the page number, the position on the page, and the section (for example, the society page).

(Relevance Feedback) Also in this modification, as in the third embodiment, the usefulness determination information obtained at the time of relevance feedback can be reflected in the length of the summary or abstract. For example, it is assumed that the user returns information that "articles 2 and 3 were very useful" in response to the information presentation as shown in FIG.

Since articles 2 and 3 are both articles having the attribute "△△ newspaper", it may be useful to increase the importance value of the △△ newspaper by some calculation and, from then on, to present articles from the △△ newspaper at greater length.

FIG. 79 shows an example of article information presented to the user in the next filtering when such feedback is given.

In FIG. 78, the ○○ newspaper has the highest priority, but in this figure the △△ newspaper has the highest priority as a result of the feedback, and the summary or abstract of article 1', which has the attribute "△△ newspaper", is the longest.

As described above, in the system of the third embodiment, a summary or abstract of a length according to the type of article (the search conditions, such as topics, that the article satisfied, or attributes of the article itself, such as its issue date and time) is created and presented to the user, so the proportion of text information useful to the user in the text presented to the user is high. This enables efficient information collection.

(Embodiment 4) Next, an information filtering system according to a fourth embodiment of the present invention will be described. Since the overall system configuration is the same as that of the first embodiment, only the differences from the first embodiment will be described here.

The inter-article similarity calculation unit 16 of FIG. 3 performs the inter-article similarity calculation processing shown in FIG. 14. In this embodiment, the following formula is used to calculate the similarity between a certain article i and article j.

(Equation 1)

As a modified example of the similarity calculation formula, the following formula can be given, for example.

(Equation 2)

In this modified example of the similarity calculation formula, xi and xj are the frequency vectors of the words included in article i and article j, respectively.

In the above similarity calculation, all words in the article are targeted, but it is also possible to limit this to words of several kinds of parts of speech. For example, the similarity may be calculated by limiting the part of speech to nouns and verbs.
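The formulas themselves are not reproduced in this text, so the following sketch does not claim to be Equation 1 or Equation 2. It assumes cosine similarity, a common choice for word frequency vectors, and shows how restricting the vectors to nouns and verbs changes the result. The part-of-speech tags and example words are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(xi, xj):
    """Cosine of the angle between two sparse frequency vectors (an assumed measure)."""
    dot = sum(xi[w] * xj[w] for w in xi if w in xj)
    norm_i = math.sqrt(sum(v * v for v in xi.values()))
    norm_j = math.sqrt(sum(v * v for v in xj.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def frequency_vector(tagged_words, allowed_pos=None):
    """Build a frequency vector, optionally keeping only the given parts of speech."""
    return Counter(w for w, pos in tagged_words
                   if allowed_pos is None or pos in allowed_pos)

article_i = [("maker", "noun"), ("cuts", "verb"), ("the", "det"), ("price", "noun")]
article_j = [("maker", "noun"), ("raises", "verb"), ("the", "det"), ("price", "noun")]

sim_all = cosine_similarity(frequency_vector(article_i), frequency_vector(article_j))
content = {"noun", "verb"}
sim_content = cosine_similarity(frequency_vector(article_i, content),
                                frequency_vector(article_j, content))
print(sim_all, sim_content)
```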

In the similarity calculation between articles, the similarity may be obtained for each field on the format such as the headline and the first sentence, and the weighted average of the similarities may be defined as the overall inter-article similarity. In this case, the similarity corresponding to Expression 1 is as follows.

(Equation 3)

Here, Cfi is the set of words included in field f of article i, and Cfj is the set of words included in field f of article j.

A field can be detected by the presence of a headline, the first sentence, the first paragraph, blanks of the first character of the document, indent information, and punctuation. Similar modifications can be made to the equations 2 to 8.
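The field-wise weighted average described above can be sketched as follows. Since the formula is not reproduced in this text, the per-field measure used here (the overlap ratio of the two word sets) is an assumption, as are the field names, the weights, and the example words.

```python
def set_overlap(c_i, c_j):
    """Assumed per-field measure: size of the intersection over size of the union."""
    if not c_i and not c_j:
        return 0.0
    return len(c_i & c_j) / len(c_i | c_j)

def field_weighted_similarity(fields_i, fields_j, weights):
    """Weighted average of per-field similarities between two articles."""
    total_weight = sum(weights.values())
    weighted = sum(
        w * set_overlap(fields_i.get(f, set()), fields_j.get(f, set()))
        for f, w in weights.items()
    )
    return weighted / total_weight

weights = {"headline": 2.0, "first_sentence": 1.0}  # hypothetical field weights
art_i = {"headline": {"chip", "prices", "fall"},
         "first_sentence": {"makers", "cut", "prices"}}
art_j = {"headline": {"chip", "prices", "fall"},
         "first_sentence": {"makers", "raise", "output"}}
sim = field_weighted_similarity(art_i, art_j, weights)
print(sim)
```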

Further, a process that checks syntactic information and the like may be provided after the numerical similarity calculation described above, so that even articles whose similarity is equal to or higher than a certain threshold can be treated as not being similar articles. For example, in newspaper articles, the subject of the first sentence (specifically, the proper noun marked by the particle "ha") plays an important role. If this subject differs between articles, the articles are not treated as similar articles.

Next, with reference to FIG. 80, the presentation information generation processing by the presentation information generation unit 17 in FIG. 3 will be described.

In the first embodiment, the case was described in which, in order to avoid presenting duplicate articles, one article is selected from the duplicate article set as a representative to be presented to the user. Here, related articles are grouped or associated with each other and then presented to the user.

That is, first, the information of the articles selected by the article selection section 15 is read from the article information storage section 18 (step S201). Next, the inter-article similarity calculation unit 16 calculates the similarity between the selected articles using the above formula, and obtains sets of articles having a high similarity to each other (step S202). Then, output control such as grouping related articles, associating them, or selecting a specific article is performed, and the result is presented to the user (step S203).

Here, grouping means that the output lists of articles are aligned so that related articles are presented side by side to the user. Further, as the association, for example, hypertext is generated using link information connecting an article and an article related to the article, and the hypertext is presented to the user. In the specific article selection, one or several articles are selected from related articles, and only the selected article is presented to the user.

By performing such grouping and association, it is possible to prevent the related text articles from being output to the user in random order. Therefore, the user can efficiently read related articles.

(Fifth Embodiment) Next, an information filtering system according to a fifth embodiment of the present invention will be described, focusing on the differences from the first embodiment. The configuration of the fifth embodiment is shown in FIG. The difference from the first embodiment is that a sent article storage unit 19 for storing the articles output to the user is provided.

The sent article storage unit 19 stores the article provided to the user together with the date information of the article provided, in association with the user. This is done when the article is provided to the user.

FIG. 82 shows the flow of processing of the presentation information generator. First, the information of the articles selected by the article selection unit 15 is read (step S211). Then, referring to the articles selected on the current day, which are stored in the article information storage unit 18, and the articles up to the previous day, which are stored in the sent article storage unit 19, the inter-article similarity calculation unit 16 performs the inter-article similarity calculation with the articles up to the previous day also included, and obtains the duplicate article sets (step S212).

In this case, the duplicate article set φk can be defined as follows, with a certain article j as its core.

(Equation 4)

Specifically, the articles selected by the article selection unit 15 are scanned from the top, the articles whose similarity to article j is equal to or higher than a certain threshold are found, and those articles are taken as the duplicate article set.
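The scan described above can be sketched as follows. The formula is not reproduced in this text, so details such as whether the core article itself belongs to the set are assumptions (here it is included). The similarity table, the threshold of 0.8, and the article identifiers are hypothetical.

```python
def duplicate_set(core, articles, similarity, threshold):
    """Collect the articles whose similarity to the core article meets the threshold.

    Including the core article itself in the set is an assumption of this sketch.
    """
    members = [core]
    for other in articles:  # scanned from the top of the selected list
        if other != core and similarity(core, other) >= threshold:
            members.append(other)
    return members

# Hypothetical pairwise similarities with core article "a1".
table = {
    frozenset(("a1", "a2")): 0.92,
    frozenset(("a1", "a3")): 0.35,
    frozenset(("a1", "a4")): 0.88,
}

def lookup(a, b):
    return table.get(frozenset((a, b)), 0.0)

dups = duplicate_set("a1", ["a1", "a2", "a3", "a4"], lookup, threshold=0.8)
print(dups)
```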

Thereafter, output control such as grouping and associating related articles as described above or selecting a specific article is performed, and the output control is presented to the user as a filtering result (step S213).

FIG. 83 shows the flow of output processing of the filtering result output to the user. Starting from the top selected article, it is determined whether the article has a duplicate article set (steps S221 and S222). If there is no duplicate article, the article (for example, its title and newspaper company information) is output (step S223). On the other hand, when there is a duplicate article, it is checked whether the duplicate article set consists only of articles of the current day (step S224). If it consists only of articles of the current day, mark 2 is given, and if it includes earlier articles, mark 1 is given; the duplicate article set is then output together with the mark (steps S225, S226, S227). The processing of steps S222 to S227 is performed in the same way for the remaining selected articles (steps S228 and S229). When a duplicate article set is output and the titles are output as flat text, the duplicate articles are output side by side (grouping). FIG. 84 shows an output example. The articles enclosed by lines are duplicate articles. □ is a mark indicating a duplicate article set consisting only of articles of the current day, and Δ is a mark indicating that earlier articles are included. "8/4" is the date of an article. On the other hand, if output as hypertext is possible, only the representative article can be displayed at the top level, with the representative article linked to the other duplicate articles. Display examples of this hypertext are shown in FIGS.
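The choice between the two marks depends only on whether every article in the duplicate article set is dated the current day. A minimal sketch follows; the function name and the dates (including the year) are hypothetical.

```python
from datetime import date

SAME_DAY_MARK = "□"  # duplicate set consists only of articles of the current day
EARLIER_MARK = "Δ"   # duplicate set also includes earlier articles

def mark_for(duplicate_dates, today):
    """Pick the mark for a duplicate article set from the dates of its articles."""
    if all(d == today for d in duplicate_dates):
        return SAME_DAY_MARK
    return EARLIER_MARK

today = date(1995, 8, 5)  # hypothetical current day
print(mark_for([today, today], today))
print(mark_for([today, date(1995, 8, 4)], today))
```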

In FIG. 85, □ means that the article has a duplicate article set of the current day, and Δ means that it has a duplicate article set that includes earlier articles; the title of the representative article is displayed in each case. When the article list of the highest hierarchy shown in FIG. 85 is output, link information to the information of the duplicate article sets shown in FIGS. 86 and 87 is added to each mark. This can be realized by a known technique such as using HTML (Hyper Text Markup Language) notation. In this case, when the user selects the mark □ in FIG. 85 on the screen, the duplicate article information of FIG. 86 is displayed, and when the user selects the mark Δ in FIG. 85 on the screen, the duplicate article information of FIG. 87 is displayed.
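As a rough illustration of the HTML approach, the top-level list can be generated with each mark rendered as a link to the page holding its duplicate article set. The titles, file names, and list layout below are hypothetical and do not reproduce the drawings.

```python
from html import escape

def top_level_list(groups):
    """Render representative articles as an HTML list; each mark links to its duplicate set.

    groups: list of (mark, representative_title, href) tuples.
    """
    items = []
    for mark, title, href in groups:
        link = f'<a href="{escape(href, quote=True)}">{escape(mark)}</a>'
        items.append(f"<li>{link} {escape(title)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

page = top_level_list([
    ("□", "Chip prices fall", "dup_today.html"),              # hypothetical entries
    ("Δ", "Quake aftershocks continue", "dup_earlier.html"),
])
print(page)
```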

As described above, by adding information that distinguishes whether an article group consists only of articles of the current day or also includes articles of earlier days, the user can organize and read related articles more efficiently.

The above Embodiments 1 to 5 and their modifications can be used in various combinations as required. In the above description, an example was described in which the filtering system is implemented as a network system that sends filtering results to users via a communication network, but the filtering system can also be built integrally with the user terminal. In this case, since the user terminal and the filtering system are integrated, there is no communication network between them.

As described above, according to the present invention, information filtering makes the relationships between the articles presented to the user clear to the user and facilitates understanding of the article contents. In particular, it becomes easy to follow the history of events whose situation changes over time and to understand information that spans multiple articles, such as serial articles, so the performance of the filtering system can be improved. In addition, it is possible to automatically avoid presenting the user with duplicate articles about the same content obtained from a plurality of information sources.

Further, since the user is clearly shown which topic a presented article matches, the user can easily understand the content of the article. Also, since it is clear how the presented article has been read by other users, the user can distinguish articles that are commonly read from articles that are read by only some users. Furthermore, by allowing the user to access the usefulness judgments that the user made on articles presented in the past and the usefulness judgments made by other users, the user can give consistent relevance feedback, give relevance feedback again with reference to other users' judgments and relevance feedback, and correct usefulness judgments made in the past.

Furthermore, since a summary or abstract of a length according to the topic that the article matches, or of a length according to the attributes of the article, is presented to the user, the proportion of text information useful to the user in the text presented to the user can be increased, and efficient information collection becomes possible.

Further, since related articles are provided to the user after being grouped or associated with each other, the labor of the user can be greatly reduced. Furthermore, the similarity is obtained not only between the articles delivered on the same day but also with the articles output to the user up to the previous day, and information is added that distinguishes whether an output article group consists only of articles of the current day or also includes articles of earlier days, so the user can organize and read related articles more efficiently.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a system configuration of an entire information filtering system to which each embodiment of the present invention is applied.

FIG. 2 is a diagram conceptually showing an operational form of the information filtering system of FIG.

FIG. 3 is a block diagram showing a configuration of an information filtering center provided in the information filtering system according to the first embodiment of the present invention.

FIG. 4 is a flowchart showing a flow of user profile generation processing in the system of the first embodiment.

FIG. 5 is a flowchart showing the flow of article information extraction processing in the system of the first embodiment.

FIG. 6 is a diagram showing a representation example of an article in the system of the first embodiment.

FIG. 7 is a diagram showing another example of expression of articles in the system of the first embodiment.

FIG. 8 is a flowchart showing a flow of article search processing in the system of the first embodiment.

FIG. 9 is a view showing a state of arrived articles ranked by article search processing in the system of the first embodiment.

FIG. 10 is a flowchart showing a flow of article selection processing in the system of the first embodiment.

FIG. 11 is a diagram showing an example in which the top 10 articles are selected when the ranking result shown in FIG. 9 is obtained in the system of the first embodiment.

FIG. 12 is a diagram showing an example in which articles having a similarity with a user profile of 0.86 or more are selected when the ranking result as shown in FIG. 8 is obtained in the system of the first embodiment.

FIG. 13 is a diagram showing a state in which, when a plurality of searches and rankings are performed for one user in the system of the first embodiment, the upper parts of the plurality of ranking results are merged and the articles to be presented to the user are selected.

FIG. 14 is a flowchart showing a flow of inter-article similarity calculation processing in the system of the first embodiment.

FIG. 15 is a diagram showing an example of articles arriving from different information sources in the system of the first embodiment.

FIG. 16 is a flowchart showing a flow of presentation information generation processing in the system of the first embodiment.

FIG. 17 is a diagram showing how duplicate articles are derived from one press release in the system of the first embodiment.

FIG. 18 is a diagram showing how a duplicate article is created from one event in the system of the first embodiment.

FIG. 19 is a diagram showing an example of a duplicate article set obtained as a result of performing inter-article similarity calculation on the four articles in FIG. 15 in the system of the first embodiment.

FIG. 20 is a diagram showing an example in which information about duplicate articles excluded in the system of the first embodiment is added to the text information of articles and presented.

FIG. 21 is a diagram showing a display form of related article information in the system of the first embodiment.

FIG. 22 is a diagram showing another display form of related article information in the system of the first embodiment.

FIG. 23 is a diagram showing still another display form of related article information in the system of the first embodiment.

FIG. 24 is a flowchart showing a flow of display screen switching processing of related article information in the system of the first embodiment.

FIG. 25 is a flowchart showing the flow of another display screen switching process of related article information in the system of the first embodiment.

FIG. 26 is a diagram showing an example in which a list of articles to be presented to the user is displayed together with duplicate article information when article duplication occurs as shown in FIG. 20 in the system of the first embodiment.

FIG. 27 is a flowchart showing the flow of inter-article similarity calculation processing in the system of the first embodiment.

FIG. 28 is a diagram showing an example of a set of articles selected by an article selection unit this time and a set of articles presented to a user last time in the system of the first embodiment.

FIG. 29 is a flowchart showing the flow of presentation information generation processing in the system of the first embodiment.

FIG. 30 is a diagram showing an example in which information of related articles up to the previous time is added to the text information of the article of this time and presented in the system of the first embodiment.

FIG. 31 is a diagram showing another example in which information of related articles up to the previous time is added to the text information of the article this time and presented in the system of the first embodiment.

FIG. 32 is a diagram showing an example of embedding and presenting information of related articles up to the previous time in the text information of the article of this time in the system of the first embodiment.

FIG. 33 is a view showing a state in which, when the first sentence of FIG. 32 is selected in the system of the first embodiment, a list of articles up to the previous time, which is closely related to the sentence, is displayed.

FIG. 34 is a diagram showing an example in which the text of an article is displayed when a related article “Oki Oki Earthquake Magnitude 4” in FIG. 33 is selected in the system of the first embodiment.

FIG. 35 is a flowchart showing another example of the flow of inter-article similarity calculation processing in the system of the first embodiment.

FIG. 36 is a diagram showing another example of the flow of presentation information generation processing in the system of the first embodiment.

FIG. 37 is a diagram showing another example in which the text information of the article this time is presented together with the information of other related articles this time in the system of the first embodiment.

FIG. 38 is a diagram showing another example in which the text information of the article of this time is presented together with the information of other related articles of the present time in the system of the first embodiment.

FIG. 39 is a diagram showing an example in which the inter-article similarity is reflected in the article presentation order in the system of the first embodiment.

FIG. 40 is a diagram conceptually showing a user profile used in the information filtering system according to the second embodiment of the present invention.

FIG. 41 is a block diagram showing the configuration of an information filtering center provided in the system of the second embodiment.

FIG. 42 is a flowchart showing the flow of article search processing in the system of the second embodiment.

FIG. 43 is a view conceptually showing arrival articles ranked in the system of the second embodiment.

FIG. 44 is a flowchart showing the flow of article selection processing in the system of the second embodiment.

FIG. 45 is a diagram showing an example of topics and their search results in the system of the second embodiment.

FIG. 46 is a flowchart showing the flow of additional information generation processing in the system of the second embodiment.

FIG. 47 is a diagram showing a state in which information on topics matching each article is added to the list of article headings of articles selected for the user and presented to the user in the system of the second embodiment.

FIG. 48 is a diagram showing a state in which information on the number of articles matching each topic is presented to the user in the system of the second embodiment.

FIG. 49 is a diagram showing a state in which summary sentences, excerpt sentences, or body texts of articles selected for a user are grouped by topic and presented to the user in the system of the second embodiment.

FIG. 50 is a diagram showing a state in which information on the search conditions satisfied by an article is added as header information of the article body and presented to the user in the system of the second embodiment.

FIG. 51 is a diagram showing how matched search conditions are highlighted in an article in the system of the second embodiment.

FIG. 52 is a diagram showing another example in which matched search conditions are highlighted in an article in the system of the second embodiment.

FIG. 53 is a diagram showing still another example in which matched search conditions are highlighted in an article in the system of the second embodiment.

FIG. 54 is a diagram showing a specific example of search conditions for searching a document that matches a certain topic in the system of the second embodiment.

FIG. 55 is a view showing a display example of search conditions added to an article that was retrieved by the search conditions of FIG. 54 and presented to a user in the system of the second embodiment.

FIG. 56 is a diagram showing another display example of search conditions added to an article that was retrieved by the search conditions of FIG. 54 and presented to a user in the system of the second embodiment.

FIG. 57 is a flowchart showing another example of article search processing in the system of the second embodiment.

FIG. 58 is a flowchart showing another example of additional information generation processing in the system of the second embodiment.

FIG. 59 is a diagram showing a relationship between each of a plurality of users and articles transmitted to the users in the system of the second embodiment.

FIG. 60 is a diagram showing a state in which information on other users who have received an article is added to the list of article headings of the articles selected for a user and presented to that user in the system of the second embodiment.

FIG. 61 is a diagram showing a state in which information about other users who have received the article is added to the summary or excerpt of the article selected and presented to a user in the system of the second embodiment.

FIG. 62 is a diagram showing a state in which information regarding another user who has received an article is added as header information of the article text and presented to the user in the system of the second embodiment.

FIG. 63 is a diagram showing another example in which information about another user who has received an article is added as header information of the article text and presented to the user in the system of the second embodiment.

FIG. 64 is a diagram showing a display example in which relevance feedback previously given by the user or by another user is added to the article information presented in the current filtering in the system of the second embodiment.

FIG. 65 is a diagram showing another display example in which relevance feedback previously given by the user or by another user is added to the article information presented in the current filtering in the system of the second embodiment.

FIG. 66 is a block diagram showing the configuration of an information filtering center provided in the information filtering system according to the third embodiment of the present invention.

FIG. 67 is a diagram showing an example of a user profile represented by keywords and their weights in the system of the third embodiment.

FIG. 68 is a flowchart showing the flow of summary / abstract generation processing in the system of the third embodiment.

FIG. 69 is a diagram showing an example of topics selected by the user and priorities between them in the system of the third embodiment.

FIG. 70 is a diagram showing an example of a list of articles to be presented to the user who has selected the topic shown in FIG. 69 and examples of topics that match them in the system of the third embodiment.

FIG. 71 is a view conceptually showing article information presented to a user in the system of the third embodiment.

FIG. 72 is a view showing examples of topics selected by the user and priorities between them in the system of the third embodiment.

FIG. 73 is a diagram showing an example of article information presented to the user in the next filtering when feedback is performed in the system of the third embodiment.

FIG. 74 is a flowchart showing another example of the flow of summary / abstract generation processing in the system of the third embodiment.

FIG. 75 is a diagram showing an example of articles selected by an article selection unit in the system of the third embodiment.

FIG. 76 is a diagram conceptually showing another example of article information presented to the user in the system of the third embodiment.

FIG. 77 is a view showing an example of articles selected for presentation to a user when a newspaper company is adopted as an attribute in the system of the third embodiment.

FIG. 78 is a diagram conceptually showing article information presented to the user in the case of FIG. 77 in the system of the third embodiment.

FIG. 79 is a diagram showing another example of article information presented to the user in the next filtering when feedback is performed in the system of the third embodiment.

FIG. 80 is a flowchart showing the flow of presentation information generation processing in the information filtering system according to the fourth embodiment of the present invention.

FIG. 81 is a block diagram showing the configuration of an information filtering center provided in the information filtering system according to the fifth embodiment of the present invention.

FIG. 82 is a flowchart showing the flow of presentation information generation processing in the system of the fifth embodiment.

FIG. 83 is a flowchart showing the flow of output processing of a duplicate article set in the system of the fifth embodiment.

FIG. 84 is a view showing an example of article presentation to a user in the system of the fifth embodiment.

FIG. 85 is a view showing an example of article presentation to the user by hypertext in the system of the fifth embodiment.

FIG. 86 is a view showing an example of article presentation to the user by hypertext in the system of the fifth embodiment.

FIG. 87 is a diagram showing an example of article presentation to the user by hypertext in the system of the fifth embodiment.

[Explanation of symbols]

1 ... Information filtering center, 2 ... Information source, 3 ... User terminal, 10 ... User profile, 11, 21, 31 ... User profile generator, 12 ... User profile storage, 13, 23, 33 ... Article information extractor, 14, 24, 34 ... Article search unit, 15, 25, 35 ... Article selection unit, 16 ... Inter-article similarity calculation unit, 17 ... Presented information generation unit, 19 ... Sending article storage unit, 22, 32 ... Topic storage unit, 26 ... Additional information generating unit, 36 ... Summary / abstract generating unit.

Continuation of the front page
(72) Inventor: Masahiro Kajiura, Toshiba Corporation Research & Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki, Kanagawa Prefecture
(72) Inventor: Kenji Ono, Toshiba Corporation Research & Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki, Kanagawa Prefecture

Claims (6)

[Claims]
1. An information filtering device that receives articles such as texts and images delivered from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to a user, the device comprising: means for holding search conditions for each user; article retrieval means for searching the delivered articles and selecting, for each user, articles that match the search conditions; and means for calculating the degree of similarity among the articles selected by the article retrieval means, or between a selected article and another article, determining a related article for each article according to the degree of similarity, and presenting the selected article to the user with information on the determined related article added.
2. An information filtering device that receives articles such as texts and images delivered from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to a user, the device comprising: means for holding search conditions for each user; article retrieval means for searching the delivered articles, selecting, for each user, articles that match the search conditions, and presenting them to the user; and means for adding, to each article selected by the article retrieval means, information indicating the search condition that the article satisfies, and presenting it to the user.
3. An information filtering device that receives articles such as texts and images delivered from a plurality of information sources, selects predetermined articles from the delivered articles, and presents them to a user, the device comprising: means for holding search conditions for each user; article retrieval means for searching the delivered articles, selecting, for each user, articles that match the search conditions, and presenting them to the user; and means for generating, for each article selected by the article retrieval means, a summary or abstract having a length corresponding to its type, and presenting the summary or abstract to the user.
4. An information filtering device having means for receiving articles such as texts and images delivered from at least one information source, means for calculating the similarity between search conditions designated in advance by a user and the delivered articles, and output means for sorting the articles in order of similarity and outputting, in order of similarity, only a certain number of articles or only articles whose similarity is equal to or greater than a predetermined threshold, the device comprising means for calculating the similarity between articles, and for grouping articles, associating articles, or controlling the selection of output articles according to the calculated similarity between articles.
5. The information filtering device according to claim 4, wherein the means for calculating the similarity between articles obtains the similarity between articles for each format field, such as the first sentence, the first paragraph, and the headline, and calculates the weighted average of these as the similarity between articles.
6. An information filtering device having means for regularly receiving, on a daily basis, articles such as texts and images delivered from at least one information source, and means for calculating the similarity between search conditions designated in advance by a user and the delivered articles, the device sorting the articles in order of the calculated similarity and selecting only a certain number of articles or only articles whose similarity is equal to or greater than a predetermined threshold, the device comprising: output article storage means for storing the articles output to the user as filtering results; and means for calculating the similarity between articles over the combination of the articles stored in the output article storage means and the articles delivered on the current day, grouping or associating the articles according to the similarity, and outputting them to the user, wherein information for distinguishing whether an article group consists only of articles of the current day or includes articles of previous days is added to the output articles.
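The weighted per-field similarity of claim 5 and the current-day labelling of article groups in claim 6 might be sketched as follows. This is an illustrative sketch only, not the patented implementation: the bag-of-words cosine measure, the field names, the weight values, the 0.6 grouping threshold, and the greedy grouping strategy are all assumptions that do not appear in the claims.

```python
from collections import Counter
from math import sqrt

# Hypothetical field weights; the claims give no concrete values.
FIELD_WEIGHTS = {"headline": 0.5, "first_sentence": 0.3, "first_paragraph": 0.2}


def cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (an assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def article_similarity(x: dict, y: dict, weights: dict = FIELD_WEIGHTS) -> float:
    """Weighted average of the per-field similarities (cf. claim 5)."""
    total = sum(weights.values())
    return sum(w * cosine(x[f], y[f]) for f, w in weights.items()) / total


def group_articles(articles: list, threshold: float = 0.6) -> list:
    """Greedy grouping: an article joins the first group whose first member
    is at least `threshold` similar to it; otherwise it starts a new group."""
    groups = []
    for art in articles:
        for g in groups:
            if article_similarity(g[0], art) >= threshold:
                g.append(art)
                break
        else:
            groups.append([art])
    return groups


def label_group(group: list, today: str) -> str:
    """Distinguish groups holding only current-day articles (cf. claim 6)."""
    if all(a["date"] == today for a in group):
        return "today only"
    return "includes earlier articles"
```

In use, the articles stored from earlier output and the articles delivered on the current day would be passed together to group_articles, and each resulting group would be tagged with label_group before presentation. Each article here is a plain dictionary with the three text fields and a date string, which is likewise an assumed representation.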
JP33579095A 1995-07-31 1995-11-30 Information filtering device Expired - Fee Related JP3810463B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP21293995 1995-07-31
JP7-212939 1995-07-31
JP33579095A JP3810463B2 (en) 1995-07-31 1995-11-30 Information filtering device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP33579095A JP3810463B2 (en) 1995-07-31 1995-11-30 Information filtering device
US08/695,214 US5907836A (en) 1995-07-31 1996-07-31 Information filtering apparatus for selecting predetermined article from plural articles to present selected article to user, and method therefore

Publications (2)

Publication Number Publication Date
JPH09101990A (en) 1997-04-15
JP3810463B2 (en) 2006-08-16

Family

ID=26519516

Family Applications (1)

Application Number Title Priority Date Filing Date
JP33579095A Expired - Fee Related JP3810463B2 (en) 1995-07-31 1995-11-30 Information filtering device

Country Status (1)

Country Link
JP (1) JP3810463B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130073545A1 (en) * 2011-09-15 2013-03-21 Yahoo! Inc. Method and system for providing recommended content for user generated content on an article

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119117A (en) * 1997-07-15 2000-09-12 Kabushiki Kaisha Toshiba Document management method, document retrieval method, and document retrieval apparatus
JP2004506961A (en) * 2000-03-16 2004-03-04 Microsoft Corporation Generate and manage priorities
JP2002342246A (en) * 2001-05-15 2002-11-29 Pia Corp Mail magazine distributing system and computer program for realizing the same
JP4699632B2 (en) * 2001-05-15 2011-06-15 ぴあ株式会社 Mail magazine distribution system and computer program for realizing the same
US7555195B2 (en) 2002-04-16 2009-06-30 Nippon Telegraph And Telephone Corporation Content combination reproducer, content combination reproduction method, program executing the method, and recording medium recording therein the program
JP2005056359A (en) * 2003-08-07 2005-03-03 Sony Corp Information processor and method, program, and storage medium
JP2008511081A (en) * 2004-08-23 2008-04-10 トムソン グローバル リソーシーズ Duplicate document detection and display function
JP4919515B2 (en) * 2004-08-23 2012-04-18 トムソン ルーターズ グローバル リソーシーズ Duplicate document detection and display function
JP2007034961A (en) * 2005-07-29 2007-02-08 National Institute Of Information & Communication Technology Content processor, content processing program and content processing method
JP4523952B2 (en) * 2006-04-18 2010-08-11 エヌエイチエヌ コーポレーション Method and system for assigning weights to news articles provided online
JP2007287154A (en) * 2006-04-18 2007-11-01 Nhn Corp Method for assigning weight value to news article provided online and system for the method
JP2011134355A (en) * 2007-07-12 2011-07-07 Oki Data Corp Document retrieval system
JP2011517822A (en) * 2008-04-14 2011-06-16 アルカテル−ルーセント Method for aggregating web feeds that minimize duplication
JP2010020678A (en) * 2008-07-14 2010-01-28 Nippon Telegr & Teleph Corp <Ntt> Document summarization device, document summarization method, program and recording medium
JP2010055619A (en) * 2008-08-28 2010-03-11 Palo Alto Research Center Inc System and method for interfacing web browser widget with social indexing
JP2011002982A (en) * 2009-06-18 2011-01-06 Yahoo Japan Corp Content providing device, content providing method and content providing program
JP2013513140A (en) * 2009-12-07 2013-04-18 International Business Machines Corporation Contextual support for publish-subscribe systems
US9020959B2 (en) 2009-12-07 2015-04-28 International Business Machines Corporation Contextual support for publish-subscribe systems
KR101356035B1 (en) * 2012-05-14 2014-01-29 한국과학기술원 Method and system for removing abusive word
JP2016043537A (en) * 2014-08-21 2016-04-04 株式会社アシストシステム研究所 Newspaper page printing device and newspaper page printing method
WO2016147624A1 (en) * 2015-03-13 2016-09-22 日本電気株式会社 Search system, search method, and search program
WO2016147621A1 (en) * 2015-03-13 2016-09-22 日本電気株式会社 News article management system, news article management method, and news article management program
KR102114223B1 (en) * 2019-12-10 2020-05-22 셀렉트스타 주식회사 Method for filtering a similar image based on deep learning and apparatus using the same

Also Published As

Publication number Publication date
JP3810463B2 (en) 2006-08-16

Legal Events

Date Code Title Description
A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040119

A911 Transfer of reconsideration by examiner before appeal (zenchi)

Free format text: JAPANESE INTERMEDIATE CODE: A911

Effective date: 20040202

A912 Removal of reconsideration by examiner before appeal (zenchi)

Free format text: JAPANESE INTERMEDIATE CODE: A912

Effective date: 20040319

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A711

Effective date: 20041203

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20041203

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060421

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20060524

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100602

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110602

Year of fee payment: 5

LAPS Cancellation because of no payment of annual fees