WO2007029207A2

WO2007029207A2 - Method, device and system for providing search results

Info

Publication number: WO2007029207A2
Application number: PCT/IB2006/053177
Authority: WO
Inventors: Lalitha Agnihotri; Nevenka Dimitrova; Mauro Barbieri
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2005-09-09
Filing date: 2006-09-08
Publication date: 2007-03-15
Also published as: WO2007029207A3

Abstract

A method, device and system for providing a search result in response to a user request. The method includes the steps of entering one or more keywords to a network search engine, receiving a list of network links in response, and generating clusters of documents referred to by the network links. The clustering is based on content and modality (text, images, video clips etc.) of the documents, and for each cluster of documents, a summary is generated so as to reduce redundant information. Selected parts of text summaries are translated into a synthesized voice signal, and finally an audio-visual presentation is generated based on the generated summaries and including the synthesized voice signal. As a result, a user can enjoy a multimedia show of the search result in a lean-back mode, e.g. on a TV set. Since text documents are converted to synthesized voice, the presentation may be built up based on the content of text documents, and images, video clips and additional audio clips may be presented along with the synthesized voice, which has a narrative function. Thus, the user can absorb the result of a comprehensive search in an effective way using different modalities. A device with user input means, network connecting means and processing means adapted to perform the mentioned method may be formed by a computer, a video recorder, a hard disc recorder, a video camera or a digital TV set.

Description

METHOD, DEVICE AND SYSTEM FOR PROVIDING SEARCH RESULTS

The invention relates to the field of searching for digital information and for providing results thereof, e.g. for a search on a network. More specifically, the invention provides a method, a device and a system for providing a search result in response to a user request.

Comprehensive amounts of information on all kinds of topics are available to many people, either locally on their computer hard disc, via local networks with limited access, or via the Internet. People can easily find lists of links to sources of information using search engines, e.g. www.google.com, www.yahoo.com, and www.a9.com. However, there is no easy way to obtain a comprehensive overview of the large amount of information available.

For example, people like to travel and spend time collecting information on their travel destination. Internet, digital TV, digital magazines, e-books and other sources of digital content provide plenty of information on any travel destination. Additionally, many people publish photo and video albums of their vacation on public Internet web sites. Finding relevant information for planning a vacation in a place never visited before requires time, effort and skills due to the huge amount of data available from the multiple sources.

US patent application 2002/0023084 A1 describes a method of rendering search results from an Internet search engine and providing a user with a slideshow presentation of the search results.

WO 01/96978 A2 describes a robot agent for extracting web document information and layout images of web documents on the Internet. Embodiments are described that display scaled-down versions of the web images one by one in a slideshow.

It is preferred to provide a user with an easily accessible overview of a search result upon a user request on a topic.

According to a first aspect the invention provides a method for providing a search result in response to a user request, the method comprising the steps of:

1) entering, based on the user request, one or more keywords to a search engine,

2) receiving in response a list of information links,

3) generating clusters of documents referred to by the information links, based on content and modality of the documents,

4) generating a summary for each of the content of clusters of documents,

5) synthesizing voice signal based on selected text parts of the summaries, and

6) generating an audio-visual presentation based on the generated summaries and including the synthesized voice signal.

Instead of providing a list of links as a result of a search, the method creates an abridged version by summarizing all the information that is relevant for the user and combining it into a meaningful multimedia mosaic. The user can then enjoy a multimedia summary in lean-back mode. By including step 5), text can be completely eliminated from the audio-visual presentation, or at least significantly reduced. Hence, the user can consume a search result extract in a more relaxed fashion, since he/she can use both eyes and ears without the need to read comprehensive text, as is the case when a conventional manual Internet search is performed on a computer.
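The six steps above can be pictured as a small processing chain. The following is only a minimal sketch under stated assumptions: the `Document` record, the in-memory `index` standing in for a search engine, the pre-assigned `topic` field standing in for content analysis, and the placeholder `synthesize_voice` function are all hypothetical and are not taken from the description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    url: str
    modality: str  # e.g. "text", "image", "video"
    topic: str     # hypothetical stand-in for content analysis
    body: str


def search(keywords, index):
    """Steps 1-2: return documents whose body mentions any keyword."""
    return [d for d in index
            if any(k.lower() in d.body.lower() for k in keywords)]


def cluster(documents):
    """Step 3: cluster by (content, modality)."""
    clusters = defaultdict(list)
    for d in documents:
        clusters[(d.topic, d.modality)].append(d)
    return dict(clusters)


def summarize(docs):
    """Step 4: placeholder summary keeping one copy of each distinct body."""
    return list(dict.fromkeys(d.body for d in docs))


def synthesize_voice(text):
    """Step 5: placeholder for a real speech synthesizer."""
    return f"<voice:{text}>"


def build_presentation(keywords, index):
    """Step 6: assemble the summaries, voicing the text ones."""
    presentation = []
    for (topic, modality), docs in cluster(search(keywords, index)).items():
        for item in summarize(docs):
            if modality == "text":
                item = synthesize_voice(item)
            presentation.append((topic, modality, item))
    return presentation
```

In this sketch, two identical text documents about the same topic collapse into a single voiced entry, which illustrates the redundancy reduction intended in step 4.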

The one or more keywords to enter are extracted from the user request. The user request, i.e. user query, may be in form of only one or more keywords. Alternatively, the user request may comprise the step of the user filling out a questionnaire, e.g. interactively, so as to obtain additional information relevant for the user to the search to be performed, thus enabling a more selective search and the possibility to extract information that is most likely relevant for the user.

The method is suited to be performed in connection with a search engine, e.g. an Internet search engine such as Google, but the search engine may alternatively be a search engine on a local network. Alternatively, the search engine may be a search facility installed on a local device, e.g. a computer, such as a personal computer (PC). The method may be performed on the local device, e.g. on storage means of the local device such as a hard disc installed in the device. The method may be provided as an integrated functionality of an operating system of the device or as a desktop search functionality.

Thus, the method may equally well be performed in connection with information searches on a type of network or on a local information device without any network connections.

In step 3) the various documents behind the information links received in step 2) are preferably scanned in order to determine their content and their modality. In general, in case of an Internet search result provided by an Internet search engine such as Google, the information links may include links to single documents as well as links to websites that may contain plenty of documents. Thus, in such cases step 3) preferably includes the steps of entering a website referred to by an information link from the list of information links, scanning the website for relevant documents, and retrieving documents that are found to be relevant based on the user request.

By modality of a document is understood the type or category the document belongs to, thus modality may be: text, image, video clip, audio clip (e.g. including speech) etc. Preferably, in step 3) the documents are first grouped according to their modality. Subsequently, for each modality, clustering is performed. In each modality the clustering is performed in an N dimensional space. N can be different for each modality and it depends on the dimensionality of the features used. E.g. to cluster an audio document representing a song, more than ten features related to audio signal properties are used, and to cluster a video document tens of audio-visual features may be involved in the clustering process.
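As an illustration of grouping by modality followed by clustering in an N dimensional feature space, consider the sketch below. It assumes that each document already carries a numeric feature vector (feature extraction is not shown), and it uses plain k-means as one possible partitional clustering; the description does not prescribe this algorithm, and all function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def group_by_modality(documents):
    """documents: iterable of (doc_id, modality, feature_vector)."""
    groups = defaultdict(list)
    for doc_id, modality, features in documents:
        groups[modality].append((doc_id, features))
    return dict(groups)


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment


def cluster_per_modality(documents, k=2):
    """Cluster each modality separately; N may differ per modality."""
    result = {}
    for modality, items in group_by_modality(documents).items():
        ids = [doc_id for doc_id, _ in items]
        points = [features for _, features in items]
        labels = kmeans(points, min(k, len(points)))
        result[modality] = dict(zip(ids, labels))
    return result
```

Because each modality is clustered on its own, text documents may use, say, two features while image documents use three, without either affecting the other.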

The division of clusters or groups depending on content may be performed based on additional information provided in the user request, e.g. based on sub-keywords etc. Determination of content in case of a text document may be performed by extracting keywords from the text. Various feature analysis methods may be applied, in case of image, video or audio documents which have no keywords or summary attached thereto. The clustering process may be performed according to a hierarchical clustering or to a partitional clustering or according to a combination of a hierarchical and a partitional clustering. Clustering of video documents may be performed according to "Video clustering using superhistograms in large video archives", L. Agnihotri and N. Dimitrova, In Proc. Fourth International Conference on Advances in Visual Information Systems (Visual 2000), pages 62-93, 2000.
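For text documents, keyword extraction followed by hierarchical clustering can be sketched as follows. This is an assumption-laden illustration only: the stop-word list, the Jaccard similarity measure, the single-linkage merge rule and the threshold value are choices made for the example and are not specified in the description.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "at"}


def keywords(text, top_n=5):
    """Return the most frequent non-stop-words of a text as a set."""
    found = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    return {w for w, _ in Counter(found).most_common(top_n)}


def jaccard(a, b):
    """Overlap of two keyword sets, between 0.0 and 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def agglomerate(texts, threshold=0.2):
    """Single-linkage agglomerative clustering on keyword sets.

    Repeatedly merges the two most similar clusters until no pair
    reaches the similarity threshold. Returns lists of text indices.
    """
    keys = [keywords(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while True:
        best, pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(jaccard(keys[i], keys[j])
                          for i in clusters[x] for j in clusters[y])
                if sim >= best:
                    best, pair = sim, (x, y)
        if pair is None:
            return clusters
        x, y = pair
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
```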

In step 4) summarization is performed with the purpose of reducing redundancy within each cluster or group of documents. Various methods exist for automatic summarization for various modalities. Text summarization may be performed with one of the methods described in "Advances in Automatic Text Summarization", I. Mani and M. T. Maybury, editors, MIT Press, 1999. Speech summarization may be performed as described in "Automatic speech summarization based on word significance and linguistic likelihood", C. Hori and S. Furui, In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000. Video summarization may be performed according to "Summarization of video programs based on closed captioning", L. Agnihotri, K. Devara, T. McGee, and N. Dimitrova, In Proc. SPIE Conference on Storage and Retrieval in Media Databases, pages 599-607, 2001, or according to one of the methods reported in "Video summarization: Methods and landscape", M. Barbieri, L. Agnihotri, and N. Dimitrova, In SPIE ITCOM Conference on Internet Multimedia Management Systems, 2003.
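To make the idea of redundancy-reducing summarization concrete, the sketch below shows a very simple extractive text summarizer. It is not one of the cited methods: sentences are scored by average word frequency within the cluster, and a sentence is dropped when its word overlap with an already selected sentence exceeds a threshold. The function names and both parameters are assumptions of this example.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lower-cased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def summarize(documents, max_sentences=2, max_overlap=0.5):
    """Pick high-scoring sentences from a cluster, skipping near-duplicates."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in tokens(s))

    def score(s):
        t = tokens(s)
        return sum(freq[w] for w in t) / max(len(t), 1)

    chosen = []
    for s in sorted(sentences, key=score, reverse=True):
        t = tokens(s)
        if not t or len(chosen) >= max_sentences:
            continue
        # Jaccard overlap with every sentence already in the summary
        redundant = any(
            len(t & tokens(c)) / len(t | tokens(c)) > max_overlap
            for c in chosen
        )
        if not redundant:
            chosen.append(s)
    return chosen
```

When two documents in a cluster contain the same sentence, only one copy survives, so the summary spends its limited length on distinct information.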

In step 5) selected parts of text summaries are converted into a synthesized voice signal, i.e. synthesized speech. It may be preferred that the entire text summaries are converted into a synthesized voice signal. This produces a text summary that can be more easily absorbed by the user, since the text summary in the form of speech can be combined, in step 6), with summaries of other modalities such as summaries in the form of images or video clips. Synthesized speech may even be combined with other audio clips that support the message in the speech, e.g. birds singing in order to support an image from a forest, to form a complete audio-visual presentation. Thus, the user can be presented with the result of the search in a slideshow or even a TV-program-like presentation, assisted by summaries of text documents presented as synthesized voice together with a mix of other modalities. The voice synthesis of text may be performed in various ways known to the skilled person, such as dictionary based (unit selection) voice synthesis, diphone based voice synthesis, or voice synthesis based on simulating human voice sounds using a mathematical model of the human speech organs (articulatory synthesis).

In preferred embodiments, step 6) comprises combining images, video clips and the synthesized voice signal into the audio-visual presentation. Thus a more compressed presentation can be obtained, i.e. the user will be able to absorb the information in a "lean-back mode" in a short time. In step 6) an additional audio signal may be included in the audio-visual presentation in addition to the synthesized voice signal. For example, the additional audio signal may be a sound track supporting a video clip, i.e. motion pictures, thus giving the user additional information along with the pictures, e.g. the sound track enhancing the experience of differences between the city of London and the countryside.

The synthesized voice signal may be used as a narrative of the audio-visual presentation. Thus, the audio-visual presentation can be guided, i.e. outlined, based on the spoken summaries of text documents. The additional modalities, images, video clips, audio clips etc. are then arranged in accordance therewith to support a structured audio-visual presentation. Preferably, the synthesized narrative voice is aligned with the additional presentations so that e.g. when the narrative voice explains about Mount Everest then images or video clips are displayed along therewith.
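One simple way to picture the alignment of the narrative voice with supporting visuals is keyword matching between each narrated sentence and a caption attached to each image or clip. The sketch below assumes such captions exist and ignores words shorter than four letters as a crude stop-word filter; both are assumptions of this example, as are the function names.

```python
import re


def words(text):
    """Lower-cased words of at least four letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def align_visuals(narration_sentences, visuals):
    """Pair each narrated sentence with the visuals that share a word with it.

    visuals: list of (filename, caption).
    Returns a list of (sentence, [filenames]) in narration order.
    """
    timeline = []
    for sentence in narration_sentences:
        spoken = words(sentence)
        matching = [name for name, caption in visuals
                    if words(caption) & spoken]
        timeline.append((sentence, matching))
    return timeline
```

With a sentence about Mount Everest and an image captioned "Everest at sunrise", the image is scheduled while that sentence is spoken.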

Preferably, step 3) includes generating, for each modality, a set of clusters based on content of the documents. Thus, e.g. content is split into three different modality clusters: text, images and video clips. Each of these modality clusters is again divided according to its content, i.e. into sub-divisions of the overall subject or topic of the search.

Step 3) may be performed taking into account a source of the documents, i.e. where the links referring to the documents originate. Such information can be used to further cluster the documents and/or such information can be used to outline the audio-visual presentation. E.g. a search on a travel site may include links to documents in travel books, personal websites and local tourist information websites. The origin of links, i.e. the source of the documents, may then be used to determine in the audio-visual presentation an order of presentation, e.g. so that information originating from travel books is presented prior to that originating from tourist information websites, and finally the least important information may be gained from personal websites and therefore such information may be presented towards the end of the presentation. The origin of the links, i.e. the source of the documents, may in addition be used to rank the information with respect to relevance. I.e. an audio-visual presentation with a preset length of e.g. 3 minutes can be structured based on a relevance ranking of the documents where the source or origin of the documents is taken into account.
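The use of document source for ordering and for fitting a preset presentation length can be sketched as a greedy selection. The source ranking below mirrors the travel example in the text (travel books first, then tourist information sites, then personal websites); the per-item durations, the function name and the greedy strategy are assumptions of this example.

```python
# Lower rank means presented earlier; values follow the travel example.
SOURCE_RANK = {"travel_book": 0, "tourist_site": 1, "personal_site": 2}


def plan_by_source(items, max_seconds=180):
    """Order items by source rank and keep those that fit the time budget.

    items: list of (title, source, seconds).
    Returns the titles in presentation order.
    """
    ordered = sorted(
        items,
        key=lambda item: SOURCE_RANK.get(item[1], len(SOURCE_RANK)),
    )
    plan, used = [], 0
    for title, _source, seconds in ordered:
        if used + seconds <= max_seconds:
            plan.append(title)
            used += seconds
    return plan
```

With a 3 minute budget, a 90 second travel book item and a 60 second tourist site item are kept, while a further 60 second personal website item no longer fits and is left out.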

The method may further comprise a step of evaluating relevance of the documents referred to by the network links, based on additional information provided by the user, e.g. in the user request. For example, the additional information may include a list of priority provided by the user. Such additional list of priority may be information provided by the user related to a priority of source of documents. I.e. the user may rank the types of links so that the final audio-visual presentation will be based on documents from specific types of links that the user believes are the most relevant. E.g. the user may enter a priority list of links. In the above example, travel book information may be ranked higher than personal websites in case the user wants to search on a specific travel site during vacation planning. The method may include the step of omitting documents from specific sources, e.g. personal websites.

Other types of additional information provided by the user may be such as, in the example of a search related to planning a vacation on a specific travel site, information from the user if he/she has already visited the travel site before or if it is first time to visit the site. Such additional information from the user can then be taken into account by omitting the most basic information about the travel site if the user has already visited the site, as it can be assumed that such basic information is already known by the user.
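The filtering described in the two preceding paragraphs, i.e. omitting documents from specific sources and omitting basic information for a returning visitor, can be sketched as follows. The `level` field marking a document as basic or detailed is a hypothetical label; how such a label would be derived is not addressed in the description.

```python
def filter_for_user(documents, omitted_sources=(), visited_before=False):
    """Drop documents the user is unlikely to want.

    documents: list of dicts with 'title', 'source' and 'level'
    ('basic' or 'detailed'). Returns the titles that remain.
    """
    kept = []
    for doc in documents:
        if doc["source"] in omitted_sources:
            continue  # user excluded this type of source
        if visited_before and doc["level"] == "basic":
            continue  # assumed already known to a returning visitor
        kept.append(doc["title"])
    return kept
```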

The user request may, depending on the search topic, include further information that can be used by the method to rank relevance of found documents and to prepare the summarization of documents in accordance therewith with the purpose of preparing the audio-visual presentation so that it provides the user with the most relevant information based on the search result.

In preferred embodiments, an order of presentation in the audio-visual presentation is based on a priority provided by the user. Following the above descriptions, the user may provide additional information on a priority e.g. sub-topics of special interest, sub-topics of no interest, or various preferences with respect to source of documents, as described above. Such information may then be taken into account in the process of preparing an order of presentation in the audio-visual presentation. Hereby it is possible e.g. to present first information that the user finds most relevant, and then present next information that the user finds less relevant.

The method according to a first aspect may be applied within various devices and systems together with video on demand services and systems, web services etc.

In a second aspect, the invention provides a device adapted to provide a search result in response to a user request, the device comprising

- receiving means for receiving the user request,

- searching means adapted to entering one or more keywords, based on the user request, to perform a search, and to provide in response a list of information links, and

- processing means adapted to

- cluster documents referred to by the information links, based on their content and modality,

- generate a summary for each of the clusters,

- synthesize voice based on text parts of summaries, and

- generate an audio-visual presentation based on the generated summaries including the synthesized voice.

The device according to the second aspect provides the same advantages as mentioned in connection with the first aspect. In addition, the same embodiments related to the processing means exist as mentioned in connection with the first aspect.

The device may be selected from the group consisting of: personal video recorders, stationary consumer electronics, portable consumer electronics, personal infotainment companions, media servers, digital cameras, DVD recorders, hard disc recorders, TV sets etc. The device may also be implemented using a computer, e.g. a Personal Computer (PC).

The receiving means may include a keyboard where the user can enter search words etc. The receiving means may additionally or alternatively include a remote control, and/or a mouse, or a combination of the mentioned means for user interaction. The network connection means may include any type of electronic means, either wired or wireless, that is capable of providing a connection to an information network such as the Internet or a locally accessible network.

The device preferably includes storage means, e.g. RAM/ROM based or disc based, adapted to store the generated audio-visual presentation in a file so that a user can view the presentation at any convenient time. The device itself does not necessarily include display and loudspeaker means to present the presentation to the user. Thus, a generated audio-visual presentation file may be played back using e.g. a TV set by connection via any known audio-visual analog or digital connection. The device may also be able to store a presentation file on a portable memory medium, such as a memory stick, a memory card etc. This will enable the user to carry the generated presentation to another device with audio-visual capabilities.

In a third aspect, the invention provides a system adapted to provide a search result on a user request, the system comprising

- receiving means for receiving the user request,

- searching means adapted to entering one or more keywords, based on the user request, to perform a search, and to provide in response a list of information links,

- processing means adapted to

- generate a summary for each of the clusters,

- synthesize voice based on text parts of the summaries, and

- generate an audio-visual presentation based on the generated summaries including the synthesized voice,

- loudspeaker means adapted to present an audio part of the audio-visual presentation, and

- video display means adapted to present a visual part of the audio-visual presentation.

The advantages and embodiments described for the first and second aspects apply to the third aspect as well.

The system may be integrated into a single apparatus or the system may comprise a number of separate interconnected apparatuses. For example, the system may comprise a hard disc recorder connected to a TV set. The hard disc recorder may then comprise receiving means, network connecting means and processing means adapted to generate the audio-visual presentation, whereas the TV has loudspeaker means and video display means to present the audio-visual presentation provided by the hard disc recorder.

In a fourth aspect, the invention provides a computer readable program code, i.e. software, comprising algorithms implementing the method as described in the first aspect. Such computer executable program code may be adapted to run on a specific computer or processor; alternatively the program code may be a generic program code adapted to be translated into a processor dependent code for execution. The program code may be stored on a storage medium. The storage medium may be a memory, e.g. RAM/ROM, a memory stick, a memory card etc. Alternatively, the storage medium may be a disc such as a CD, a DVD or a hard disc.

In the following the invention is described in more detail with reference to the accompanying figures, of which

Fig. 1 illustrates in a block diagram essential steps of a preferred method according to the invention, and

Fig. 2 illustrates in schematic form a preferred device according to the invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Fig. 1 illustrates various steps of processing in an embodiment of the method according to the invention, the embodiment being adapted to provide a search result from a network NW. A user request UREQ regarding a search is first entered via a search engine on the network NW, e.g. the Internet. A list of information links, or network links, NL is received in response to the search. In case the information links include links to websites, these websites are then scanned for relevant documents. All found documents DC referred to in the received information links, i.e. documents directly referred to together with documents indirectly referred to via website links, are then clustered CLU according to their content and according to their modality. This clustering CLU leads to a set of documents SD that includes, for each content cluster C1, C2, C3, a clustering of documents DC into different modalities M1, M2, M3. For example the modality M1 may be text, modality M2 may be images, and modality M3 may be video clips. In the illustrated example, documents DC referring to content C1 are found for all three modalities M1, M2, M3, whereas only text documents are found for content C2.

After the clustering CLU, a summarization SMM is performed which results in a summary for each of the illustrated clusters of documents. I.e. a summary for the documents in cluster M1, C1 is generated and so on. Thus, for each content C1, C2, C3 a summary of documents belonging to the modalities available for that content is generated. In the illustrated example only text modality M1 is available for content C2, and thus for content C2 only a text summary is generated, whereas for content C1 a text summary, an image summary and a video clip summary are generated. Text summaries are then translated into synthesized voice VSYN before being included in an audio-visual presentation PRST. Summaries for other modalities are directly included into the audio-visual presentation PRST.

Clustering CLU and summarization SMM may be performed according to methods known in the art, i.e. methods that are different for different modalities.

The method will now be explained by an application example, namely a search in connection with vacation planning to a travel site. In the user request UREQ a user may enter one or more search keywords together with additional search information or parameters, e.g. including the time available and the travel details, e.g. first or second visit etc. This is used for summarization and ranking the relevance of the places and resorts that can be visited at the travel site. Accordingly, a "To-do" and "To-see" list is generated which covers major things to do at a given location.

Using a web search engine such as a9.com, google.com, etc., a list of relevant web sites, books, movies, images, etc. is retrieved. For each modality (text, video, images, audio, etc.) and source category (travel web sites, books, personal web pages, etc.) documents are clustered based on their content (e.g. keyword based clustering for text, content features clustering for video, audio and images). Thus, the source of the documents is taken into account in order to evaluate their relevance.

A single-modality summarization is then applied to each cluster. Multiple travel guides are summarized in order to provide related pictures and videos. Information from web crawlers, web information extraction, summaries of ratings from multiple guides, what people have said about different resorts, different things they did and described in their personal web sites, and various other items may be included.

A voice synthesis of the text summaries is then generated and, together with photo summaries, video summaries etc., a single multimedia lean-back version that can be consumed as a TV programme or multimedia presentation is then generated. Preferably, a duration of the multimedia presentation will be selected to be at least equal to a duration of the synthesized voice signal. To fill in the total duration, video summaries of a certain duration are then included into the presentation. Images are preferably included, and a number of images may be shown in a slideshow manner along with the synthesized voice signal that is used as a narrative voice.
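The duration rule in the preceding paragraph can be worked through with a small calculation. The sketch below assumes that the total length is the larger of the voice duration and an optional target length, that video summaries are added greedily while they fit into the time not covered by narration, and that images are spread evenly over the narration as a slideshow. These policies and all names are assumptions of this example.

```python
def plan_duration(voice_seconds, video_clips, image_count, target_seconds=None):
    """Plan presentation length, filler clips and slideshow pace.

    video_clips: list of (name, seconds).
    Returns (total_seconds, chosen_clip_names, seconds_per_image).
    """
    # The presentation is never shorter than the synthesized voice.
    total = max(voice_seconds, target_seconds or 0)
    remaining = total - voice_seconds
    chosen = []
    for name, seconds in video_clips:
        if seconds <= remaining:
            chosen.append(name)
            remaining -= seconds
    # Images are shown one after another while the narration plays.
    per_image = voice_seconds / image_count if image_count else 0.0
    return total, chosen, per_image
```

For example, with 120 seconds of narration, a 180 second target and clips of 40, 30 and 20 seconds, the 40 and 20 second clips fill the remaining minute, and eight images would each be shown for 15 seconds during the narration.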

In connection with a vacation planning system, separate audio-visual presentations of what-to-do and what-to-see at a travel site may be included in the audio-visual presentation, i.e. highlighted in a video that is automatically generated. According to user preferences the presentation can be personalized, e.g. include photos and videos of what the user likes best or finds most relevant according to additional user information provided in the user request UREQ.

Fig. 2 shows a device DEV embodiment according to the invention. The device includes user input means UIM adapted to receive a user request UIP, i.e. an input UIP from a user including at least one search keyword, e.g. "London travel" in the case of vacation planning. The user input means then communicates one or more search keywords to searching means SM. In this device embodiment DEV the searching means SM includes network connecting means that enters the one or more keywords to a search engine on a connected network NW, e.g. the Internet. In addition, the searching means SM may perform a search using searching facilities on a locally installed hard disc HD. In response to the search a list of information links to documents and websites on the network NW is received together with links to documents on the local hard disc HD. The searching means SM passes these links to processing means PM that essentially performs the steps illustrated and described in connection with Fig. 1, i.e. resulting in an audio-visual presentation. The generated audio-visual presentation may be played immediately or stored as a multimedia file e.g. on the hard disc HD of the device DEV.

The processing means PM may be implemented as software algorithms in a signal processor or main processor of an apparatus. Thus, the invention may be exercised without further hardware.

The audio part A and visual part V of the presentation may then be presented using one or more loudspeakers LSPK and a video display VDSP, e.g. using a TV set or similar that includes means for presenting both the audio A and visual V parts of the presentation.

The device illustrated in Fig. 2 may be a hard disc recorder, optionally with an Internet connection. The user input means UIM used to receive the user request may be buttons on a front panel of the hard disc recorder or a remote control, or alternatively a connected keyboard. The device may also be a digital TV that includes the features of the illustrated device together with stereo loudspeakers LSPK and a display VDSP.

It will be appreciated that the present invention may be applied within a wide range of different searches related to different topics. The vacation planning example used throughout the description is chosen for illustration purposes only. Another application example may be a user who wants to buy a new car. He may then enter a car make and receive in response an audio-visual presentation that provides him with some facts regarding different car models in synthesized voice together with photos of interior and exterior details of the car and video clips showing the car driving. In addition, audio clips may give the user an impression of the sound of the engine. Additionally, the user may be supplied with spoken summaries of test results from different car magazines for the specific model in question.

In the claims reference signs to the figures are included for clarity reasons only. These references to exemplary embodiments in the figures should not in any way be construed as limiting the scope of the claims.

Claims

1. A method for providing a search result in response to a user request (UREQ), the method comprising the steps of:

1) entering, based on the user request (UREQ), one or more keywords to a search engine,

2) receiving in response a list of information links (NL),

3) generating clusters (CLU) of documents (DC) referred to by the information links (NL), based on content and modality of the documents,

4) generating a summary (SMM) for each of the content of clusters of documents (DC),

5) synthesizing voice signal (VSYN) based on selected text parts of the summaries, and

6) generating an audio-visual presentation (PRST) based on the generated summaries and including the synthesized voice signal.

2. A method according to claim 1, wherein step 6) comprises combining images, video clips and the synthesized voice signal into the audio-visual presentation (PRST).

3. A method according to claim 1, wherein step 6) comprises including an audio signal in addition to the synthesized voice signal into the audio-visual presentation (PRST).

4. A method according to claim 1, wherein the synthesized voice signal is used as a narrative of the audio-visual presentation (PRST).

5. A method according to claim 1, wherein step 3) includes generating for each modality (M1, M2, M3), a set of clusters based on content (C1, C2, C3) of the documents (DC).

6. A method according to claim 1, wherein step 3) includes taking into account a source of the documents (DC).

7. A method according to claim 1, further comprising the step of evaluating relevance of the documents (DC) referred to by the information links, based on additional information provided by the user.

8. A method according to claim 7, wherein the additional information includes a list of priority provided by the user.

9. A method according to claim 1, wherein an order of presentation in the audiovisual presentation (PRST) is based on a priority provided by the user.

10. A method according to claim 1, wherein step 3) includes the steps of entering a website referred to by an information link from the list of information links, scanning the website for relevant documents (DC), and retrieving documents (DC) that are found to be relevant based on the user request.

11. A device (DEV) adapted to provide a search result in response to a user request (UIP), the device (DEV) comprising

- receiving means (UIM) for receiving the user request (UIP),

- searching means (SM) adapted to entering one or more keywords, based on the user request (UIP), to perform a search, and to provide in response a list of information links, and

- processing means (PM) adapted to

- cluster documents referred to by the information links, based on their content and modality,

- generate a summary for each of the clusters,

- synthesize voice based on text parts of summaries, and

- generate an audio-visual presentation (A, V) based on the generated summaries including the synthesized voice.

12. A system adapted to provide a search result on a user request (UIP), the system comprising

- receiving means (UIM) for receiving the user request (UIP),

- searching means (SM) adapted to entering one or more keywords, based on the user request, to perform a search, and to provide in response a list of information links,

- processing means (PM) adapted to

- generate a summary for each of the clusters,

- synthesize voice based on text parts of the summaries, and

- generate an audio-visual presentation (A, V) based on the generated summaries including the synthesized voice,

- loudspeaker means (LSPK) adapted to present an audio part (A) of the audio-visual presentation (A, V), and

- video display means (VDSP) adapted to present a visual part (V) of the audio-visual presentation (A, V).

13. Computer executable program code adapted to perform a method for providing a search result in response to a user request, the method comprising the steps of:

2) receiving in response a list of information links,

3) generating clusters of documents referred to by the information links, based on content and modality of the documents,

4) generating a summary for each of the content of clusters of documents,

5) synthesizing voice signal based on selected text parts of the summaries, and