WO2013128462A1 - Video search - Google Patents

Video search

Info

Publication number
WO2013128462A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
query terms
electronic document
section headings
present
Application number
PCT/IN2012/000133
Inventor
Amol Sunil DIXIT
Krishnan Ramanathan
Yogesh Sankarasubramaniam
Vidhya Govindaraju
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P.
Priority to CN201280070193.7A (CN104106064A)
Priority to PCT/IN2012/000133 (WO2013128462A1)
Priority to US14/373,493 (US20140379731A1)
Priority to EP12869857.8A (EP2820568A4)
Publication of WO2013128462A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73: Querying

Definitions

  • The computing system 210 is connected to a search engine portal through a network, such as the internet, and a user provides an input video search query to a video search engine through a web browser stored on the computing system 210.
  • The proposed solution may be implemented on the computing system 210 or on another computing device, such as a server computer used to host a search engine portal.
  • Embodiments within the scope of the present solution may be implemented in the form of a computer program product including computer-executable instructions, such as program code, which may be run on any suitable computing environment in conjunction with a suitable operating system, such as Microsoft Windows, Linux or UNIX operating system.
  • Embodiments within the scope of the present solution may also include program products comprising computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer.
  • Such computer-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions and which can be accessed by a general purpose or special purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a computer-implemented method of performing a video search. A search query is analyzed to identify a first set of query terms. The first set of query terms is used to query a knowledge repository, wherein the knowledge repository is a collection of electronic documents. An electronic document corresponding to the first set of query terms is identified and parsed to obtain a second set of query terms. Query terms present in the second set are ranked, and the top N ranked query terms are provided to a video search engine.

Description

VIDEO SEARCH
Background
The internet has emerged as the preferred medium for people looking for information. From finding word meanings to searching for a detailed essay on a scientific breakthrough, the World Wide Web (WWW) provides an immediate response to a user's information needs. By one estimate, more than three billion searches are performed on the internet each day. A typical internet search requires a user to provide a keyword or a set of keywords to a search engine. In response, the search engine displays the search results, which may be in the form of text documents, web pages, URLs (Uniform Resource Locators), etc.
Brief Description of the Drawings
For a better understanding of the solution, embodiments will now be described, purely by way of example, with reference to the accompanying drawings, in which:
FIG. 1 shows a flow chart of a method of performing a video search, according to an embodiment.
FIG. 2 illustrates a system for performing a video search, according to an embodiment.
Detailed Description of the Invention
As indicated earlier, millions of people perform billions of searches each day to find information on the internet. In most cases, the search results are in the form of text documents or website links (URLs). However, video search (searching for videos uploaded to a network) is not far behind. With thousands of people uploading videos every day, video search has become popular among people looking for a search experience that extends beyond written words. In fact, in some instances a video is better than a text response. For example, a user looking for how to wear a tie would be much better served by a video describing the "tie wearing" process than by a text document. Similarly, a user looking to assemble a sofa or a game set would appreciate a video describing the process more than a tedious text manual.
Given the increasing user preference for searching for videos rather than text documents (or web pages), whether for information or recreation, a number of mechanisms have been proposed to perform video search. However, it has been found that in most cases video search results do not match the user's intent. The results are either too generic or plainly vague, leaving the user with an unsatisfying experience. This is mainly because the results are produced by simple keyword matching, without a deeper understanding of the user's intent or query. This is not an ideal situation from a user's perspective.
Embodiments of the present solution provide a method and system for performing a video search on a computer network (such as an intranet or the internet) using a search engine (for example, a video search engine).
FIG. 1 shows a flow chart of a method of performing a video search, according to an embodiment.
The method may be implemented in a computing system, such as, but not limited to, a desktop computer, a notebook computer, a server computer, a personal digital assistant (PDA), a mobile device, a touch pad, a television (TV) set, a docking device, and the like. The computing system may be connected to a computer network, such as an intranet or the internet (World Wide Web), through wired (for example, coaxial cable) or wireless (for example, Wi-Fi) means.
At block 110, a computing system receives a video search query from a user. In an example, the computing system has an input interface to receive the video search query from the user. The input interface includes a software interface (such as a graphical user interface (GUI)) and/or a hardware interface for providing a search input (such as a keyboard). In an example, the video search query is a text-based search query comprising a keyword or a string of keywords chosen by the user. For example, if a user intends to search for "How to assemble a computer", he or she may provide the complete query string, i.e. "How to assemble a computer", or a string of keywords such as "assembling a computer". The choice of query keywords lies with the user.
In another example, the video search query is a voice (speech) based search query comprising a keyword or a string of keywords. For example, if a user intends to search for "How to assemble a computer", he or she may provide the query string as a speech command to the computing system. In such an instance, the computing device is provided with a speech capture device and speech recognition means. The computing device may also have a speech-to-text conversion computer program (a set of machine readable instructions which can be executed by a processor of the computing system).
Upon receipt of the video search query, the computing system analyses the video search query to identify a first set of query terms. This part of the method may be termed "question processing". The "question processing" part may be implemented by a question processing module which, in one example, may reside in the memory of the computing system. During question processing, the method analyses the user's video search query and tries to determine the user's information need. The idea is to identify those keywords in the search query that reflect the user's search intent. This is carried out by performing part-of-speech tagging on the search query to identify two types of words.
The first type of words is called "noun phrases". It includes all nouns and proper nouns in the search query. The second type of words is called "focus words". It includes all nouns, proper nouns, non-trivial verbs, adjectives and numbers.
The words identified as "noun phrases" and/or "focus words" are recognized as a first set of query terms. To provide an example, in the video search query "How to repair a computer", the terms "how", "repair" and "computer" would be identified as a first set of query terms. To provide another example, in the video search query "How to wear a tie", the terms "how", "wear" and "tie" would be identified as a first set of query terms.
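The part-of-speech rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: a hypothetical hand-written lexicon (`TOY_LEXICON`) stands in for a real part-of-speech tagger, each noun is treated as a one-word "noun phrase", and the set of trivial verbs is assumed. The examples above keep "how" even though a wh-adverb is neither a noun, verb, adjective nor number, so the hypothetical `keep_question_words` flag reproduces that behaviour.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicon standing in for a real part-of-speech tagger.
# Tags follow Penn Treebank names (NN = noun, VB = verb, WRB = wh-adverb, ...).
TOY_LEXICON: Dict[str, str] = {
    "how": "WRB", "what": "WP", "to": "TO", "a": "DT", "an": "DT", "the": "DT",
    "repair": "VB", "wear": "VB", "assemble": "VB",
    "is": "VBZ", "be": "VB", "do": "VB",
    "computer": "NN", "tie": "NN", "sofa": "NN", "red": "JJ",
}
# HYPOTHETICAL list of "trivial" verbs that are not treated as focus words.
TRIVIAL_VERBS = {"is", "are", "be", "do", "does", "did", "have", "has"}


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag tokens from the toy lexicon; numbers -> CD, unknown words -> NN."""
    tagged = []
    for tok in tokens:
        if tok in TOY_LEXICON:
            tagged.append((tok, TOY_LEXICON[tok]))
        elif tok.replace(".", "", 1).isdigit():
            tagged.append((tok, "CD"))
        else:
            tagged.append((tok, "NN"))
    return tagged


def question_processing(query: str, keep_question_words: bool = True
                        ) -> Tuple[List[str], List[str]]:
    """Return (noun phrases, first set of query terms) for a search query."""
    tokens = query.lower().replace("?", " ").split()
    noun_phrases: List[str] = []
    first_set: List[str] = []
    for word, tag in pos_tag(tokens):
        is_noun = tag.startswith("NN")            # nouns and proper nouns
        is_focus = (is_noun
                    or tag.startswith("JJ")       # adjectives
                    or tag == "CD"                # numbers
                    or (tag.startswith("VB") and word not in TRIVIAL_VERBS))
        if is_noun:
            noun_phrases.append(word)
        if is_focus or (keep_question_words and tag == "WRB"):
            first_set.append(word)
    return noun_phrases, first_set
```

For "How to repair a computer" this yields the noun phrase "computer" and the first set "how", "repair", "computer". A production system would replace the lexicon with a trained tagger and a noun-phrase chunker so that multi-word phrases survive intact.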
At block 120, after a first set of query terms has been identified, the computing system, in an example, uses the "noun phrases" from the first set of query terms to query a knowledge repository. In another example, the entire first set of query terms is used to query the knowledge repository.
A knowledge repository may be defined as a computerized system (or database) that systematically captures, organizes and categorizes knowledge in the form of a collection of electronic documents. The repository can be searched and data is retrievable. To provide an illustration, an online encyclopedia, such as Wikipedia or Britannica Online, is a knowledge repository.
The practice of using the first set of query terms to query a knowledge repository may be termed the "question understanding" part of the method. The "question understanding" part may be implemented by a question understanding module which, in one example, may reside in the memory of the computing system.
The query to a knowledge repository results in identifying electronic document(s) that correspond to the first set of query terms. In an example, the "noun phrases" from the first set of query terms are used to query the Wikipedia repository to identify electronic document(s) corresponding to the "noun phrases". This involves using the Wikipedia search API (Application Programming Interface) to query the Wikipedia repository. For example, the query "How did the Universe originate" might return the Wikipedia page on the Big Bang (http://en.wikipedia.org/wiki/Big_Bang).
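A Wikipedia lookup of this kind could be sketched against the MediaWiki Action API, whose full-text search is requested with `action=query&list=search`. The sketch below is an assumption-laden illustration: joining the noun phrases into a single `srsearch` string and taking the top-ranked title as the identified document are choices made here, not details given in the source. To keep it runnable offline, it only builds the request URL and parses a JSON response body.

```python
import json
from typing import List
from urllib.parse import urlencode

# MediaWiki Action API endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(noun_phrases: List[str], limit: int = 5) -> str:
    """Full-text search request (action=query, list=search) for the noun phrases."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(noun_phrases),  # ASSUMPTION: phrases joined into one query
        "srlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_response(body: str) -> List[str]:
    """Extract candidate article titles, best match first, from a JSON response."""
    data = json.loads(body)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

In a live system the URL would be fetched (for example with `urllib.request.urlopen`) and the first returned title treated as the electronic document corresponding to the query terms.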
An electronic document(s) corresponding to the first set of query terms is identified using Wikipedia categories. Wikipedia uses a category system, which provides links to all Wikipedia articles in the form of a hierarchy of categories. The categories allow articles to be placed in one or more groups, and allow those groups to be further categorized. Each article in Wikipedia belongs to at least one category. There are two kinds of categories in Wikipedia. Topic categories are named after a topic and usually share a name with the Wikipedia article on that topic. For example, the category "Cricket" would contain all articles related to cricket. Set categories are created for a class of objects. For example, the category "Wines of France" contains articles whose subjects are wines of France.
In an example, an electronic document(s) corresponding to a first set of query terms may include a web page(s). However, in other instances, an electronic document may include a document containing text, audio and/or video.
At block 130, once an electronic document corresponding to a first set of query terms has been identified, it is passed to a regular-expression-based parser to extract the following information.
(a) Section headings: a list of all section headings in the identified electronic document.
(b) Sub-section headings: a list of all sub-section headings in the identified electronic document.
(c) Hyperlinks: all hyperlinks in the identified electronic document.
(d) Important noun phrases: all noun phrases in the electronic document that are not present in the section headings, sub-section headings or hyperlinks of the identified electronic document.
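Items (a) to (c) can be illustrated with a small regular-expression parser. This sketch assumes the identified document is available as MediaWiki wikitext, where `== Heading ==` marks a section, `=== Heading ===` a sub-section, and `[[Target|label]]` a hyperlink; the skipped namespace prefixes are an assumption. Item (d) is omitted because finding noun phrases needs a part-of-speech tagger rather than a regular expression.

```python
import re
from typing import Dict, List

# Level-2 "== Heading ==" is a section, level-3 "=== Heading ===" a sub-section.
# The (?!=) lookahead stops deeper levels such as "==== Heading ====" matching.
HEADING_RE = re.compile(r"^(={2,3})(?!=)\s*(.+?)\s*\1\s*$", re.MULTILINE)
# [[Target]], [[Target|label]] or [[Target#Anchor|label]]; group 1 is the target.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# ASSUMPTION: namespaced links that are not article hyperlinks.
SKIP_PREFIXES = ("Category:", "File:", "Image:")


def parse_wikitext(text: str) -> Dict[str, List[str]]:
    """Extract section headings, sub-section headings and hyperlink targets."""
    sections: List[str] = []
    sub_sections: List[str] = []
    for m in HEADING_RE.finditer(text):
        (sections if len(m.group(1)) == 2 else sub_sections).append(m.group(2))
    targets = [m.group(1).strip() for m in LINK_RE.finditer(text)]
    links = [t for t in targets if not t.startswith(SKIP_PREFIXES)]
    return {"section_headings": sections,
            "sub_section_headings": sub_sections,
            "hyperlinks": links}
```

Regular expressions are adequate for this flat markup; nested templates or HTML-rendered pages would call for a proper wikitext or HTML parser.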
In an example, the extracted information (section headings, sub-section headings, hyperlinks and important noun phrases) is combined to form a second set of query terms. In another example, only some sections (or terms) of the extracted information are merged to obtain a second set of query terms. In one instance, duplicate terms are also removed to form a clean second set of query terms. It is to be noted that the phrase "query term", as used in this document, may include one word or a set of words.
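The merge-and-deduplicate step might look like the helper below. It is a sketch under one assumption not stated in the source: two terms count as duplicates when they match case-insensitively after collapsing whitespace. The first spelling encountered is kept, and the order of the input groups is preserved.

```python
from typing import Iterable, List


def merge_terms(*groups: Iterable[str]) -> List[str]:
    """Combine term groups into one list, dropping duplicates (first one wins)."""
    seen = set()
    merged: List[str] = []
    for group in groups:
        for term in group:
            key = " ".join(term.lower().split())  # ASSUMPTION: case/space-insensitive match
            if key and key not in seen:
                seen.add(key)
                merged.append(term.strip())
    return merged
```

Passing only some of the groups (for example, headings and hyperlinks) corresponds to the variant in which only part of the extracted information is merged.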
At block 140, the query terms in the second set are ranked. This part of the method may be termed "question term ranking". The "question term ranking" part may be implemented by a question term ranking module which, in one example, may reside in the memory of the computing system. During question term ranking, a weighting mechanism is used to assign weights to the query terms. Weighting may be carried out in different ways, some of which are described below. In one example, the final weight given to a query term is calculated by adding the individual weights assigned by the different weighting methods.
The weighting may be done as follows.
(1) Section/Sub-section headings: More weight is given to a query term present in a sub-section heading than to a term present in a section heading. This is based on the premise that sub-section headings represent the actual topic more closely than section headings.
(2) Word overlap: The extent of overlap between (a) a query term and the focus words, and (b) a query term and the page title (for example, the Wikipedia page title) is computed. A higher overlap indicates a more relevant query term, and therefore a higher weight.
(3) Hyperlinks: The hyperlinks in the electronic document are individually given a weight (as, in the case of the Wikipedia repository, they represent Wiki concepts). If a query term is present in a hyperlink, it is considered relatively more important and is given a higher weight.
(4) Important sections ranker: All section and sub-section headings with a non-zero word overlap (at least one shared word) with the noun phrases are considered important sections or sub-sections. Query terms present in the text associated with an important section or sub-section are given a higher weight, since they are more relevant. In other words, the method recognizes those section and sub-section headings of the electronic document that share at least one common term with the second set of query terms and, upon recognition, assigns relatively more weight to query terms present in the text associated with those headings.
Any one or all of the above methods may be used to assign weights to the query terms. A final weight may be given to a query term by adding the individual weights assigned by the different weighting methods. Assigning a final weight to each query term of the second set results in a ranked list of query terms.
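The additive scheme of methods (1) to (4) can be sketched as below. The numeric weights are hypothetical: the source fixes only their ordering (a sub-section hit outweighs a section hit; overlaps, hyperlinks and important sections raise the weight). "Present in" is interpreted here as every word of the term occurring in the heading or hyperlink, and the `ParsedDoc` container, including its per-section body text, is an assumed structure for the parser's output.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set

# HYPOTHETICAL weights; only their relative order follows the description.
W_SECTION, W_SUBSECTION = 1.0, 2.0
W_OVERLAP = 1.0      # per shared word
W_LINK = 1.5
W_IMPORTANT = 1.0


def words(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def any_contains(term_words: Set[str], texts: Iterable[str]) -> bool:
    """True if every word of the term occurs in at least one of the texts."""
    return bool(term_words) and any(term_words <= words(t) for t in texts)


@dataclass
class ParsedDoc:  # hypothetical container for the parsed electronic document
    title: str
    section_headings: List[str]
    sub_section_headings: List[str]
    hyperlinks: List[str]
    section_text: Dict[str, str] = field(default_factory=dict)  # heading -> body


def term_weight(term: str, doc: ParsedDoc,
                focus_words: List[str], noun_phrases: List[str]) -> float:
    tw = words(term)
    w = 0.0
    # (1) Heading level: a sub-section hit takes precedence over a section hit.
    if any_contains(tw, doc.sub_section_headings):
        w += W_SUBSECTION
    elif any_contains(tw, doc.section_headings):
        w += W_SECTION
    # (2) Word overlap with the focus words and with the page title.
    focus = set().union(*(words(f) for f in focus_words))
    w += W_OVERLAP * (len(tw & focus) + len(tw & words(doc.title)))
    # (3) Term appears in a hyperlink (a "Wiki concept").
    if any_contains(tw, doc.hyperlinks):
        w += W_LINK
    # (4) Important sections: headings sharing >= 1 word with the noun phrases;
    #     reward terms found in the body text under such a heading.
    np_words = set().union(*(words(n) for n in noun_phrases))
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    if any(words(h) & np_words and pattern.search(body.lower())
           for h, body in doc.section_text.items()):
        w += W_IMPORTANT
    return w  # final weight = sum of the individual weights


def rank_terms(terms: List[str], doc: ParsedDoc, focus_words: List[str],
               noun_phrases: List[str], n: int) -> List[str]:
    scored = [(term_weight(t, doc, focus_words, noun_phrases), t) for t in terms]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable on ties
    return [t for _, t in scored[:n]]
```

Because Python's sort is stable, terms with equal final weights keep their order from the second set, which gives a deterministic top-N list.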
At block 150, the top N (where N = 1, 2, 3, 4, ...) ranked query terms are provided as input to a video search engine. The value of N may be determined by the system or defined by the user. In an example, the video search engine may be accessed via a web browser such as Windows Internet Explorer, Mozilla Firefox, Google Chrome, Opera, etc. A non-limiting example of a video search engine is YouTube.
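Handing the top N terms to a video search engine could be as simple as building one results-page URL per term. This sketch assumes YouTube's public results-page pattern (`https://www.youtube.com/results` with a `search_query` parameter) and assumes one search per term; the source does not say whether terms are searched individually or combined into a single query.

```python
from typing import List
from urllib.parse import urlencode

YOUTUBE_RESULTS = "https://www.youtube.com/results"


def video_search_urls(ranked_terms: List[str], n: int) -> List[str]:
    """One search URL per top-N term (ASSUMPTION: one query per term)."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    return [YOUTUBE_RESULTS + "?" + urlencode({"search_query": term})
            for term in ranked_terms[:n]]
```

Joining the top N terms with spaces into a single `search_query` is an equally plausible reading and would return one combined result list instead.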
The video search engine displays the results for the top N ranked query terms to the user on a display coupled to the computing system.
In an example, the search results are sorted on the basis of video coverage, diversity and relevance before they are displayed.
FIG. 2 illustrates a system for performing a video search, according to an embodiment.
The system 200 includes a computing system 210 connected to a computer network 270. The computing system 210 may be, but is not limited to, a desktop computer, a notebook computer, a server computer, a personal digital assistant (PDA), a mobile device, a touch pad, a television (TV) set, a docking device, and the like.
Computing system 210 may include a processor 220 for executing machine readable instructions, a memory (storage medium) 230 for storing machine readable instructions (such as a web browser module), an input interface 240 and a display 250. These components may be coupled together through a system bus 260.
Processor 220 is arranged to execute machine readable instructions. The machine readable instructions may be in the form of a web browser module 240. In an example, processor 220 executes machine readable instructions to: identify a first set of query terms from the video search query; use the first set of query terms to query a knowledge repository, wherein the knowledge repository is a collection of electronic documents; identify an electronic document corresponding to the first set of query terms; parse the electronic document to obtain a second set of query terms; rank query terms obtained in the second set of query terms, by assigning a weight to the query terms; and provide top N ranked query terms to a video search engine. The memory 230 may include computer system memory such as, but not limited to, SDRAM (Synchronous DRAM), DDR (Double Data Rate SDRAM), Rambus DRAM (RDRAM), Rambus RAM, etc. or storage memory media, such as, a floppy disk, a hard disk, a CD-ROM, a DVD, a pen drive, etc. The memory 230 may include modules, such as, but not limited to, a web browser module 240.
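The sequence of steps executed by processor 220 can be wired together as a small pipeline. In this sketch each step is a pluggable callable with a hypothetical signature, so that the question processing, question understanding, parsing and ranking modules could be swapped in independently; none of these names come from the source.

```python
from typing import Callable, List, Tuple

# HYPOTHETICAL signatures for the pluggable steps.
Analyze = Callable[[str], Tuple[List[str], List[str]]]  # query -> (noun phrases, first set)
Lookup = Callable[[List[str]], str]                     # terms -> electronic document
Parse = Callable[[str], List[str]]                      # document -> second set of terms
Rank = Callable[[List[str], List[str]], List[str]]      # (second set, first set) -> ranked
Search = Callable[[List[str]], List[str]]               # top-N terms -> video results


def video_search(query: str, analyze: Analyze, lookup: Lookup, parse: Parse,
                 rank: Rank, search: Search, n: int) -> List[str]:
    noun_phrases, first_set = analyze(query)       # question processing
    # Query the repository with the noun phrases, else with the whole first set.
    document = lookup(noun_phrases or first_set)   # question understanding
    second_set = parse(document)                   # parse the electronic document
    ranked = rank(second_set, first_set)           # question term ranking
    return search(ranked[:n])                      # top N to the video search engine
```

Keeping the steps injectable also reflects the note that the components may be hosted on a single computing system or spread across several servers.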
The web browser module may be used to provide video search query terms to a video search engine. Examples of web browsers include Windows Internet Explorer, Mozilla Firefox, Google Chrome and Opera.
The input interface 240 may be used to provide an initial seed set input to the computing system 210. The input interface 240 may include an input device, such as a keyboard or a mouse, and other user interaction mechanisms, such as a touch interface, a voice interface (such as a microphone), a gesture interface, etc. The input interface may also include a software interface, such as a graphical user interface (GUI). In an example, input interface 240 is used to receive a video search query from a user.
The display device 250 may be any device that enables a user to receive visual feedback. For example, the display may be a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma display panel, a television, a computer monitor, and the like.
The computing system 210 may be connected to the computer network 270, which may be an intranet or the internet (World Wide Web), through wired (for example, co-axial cable) or wireless (for example, Wi-Fi) means. A network interface controller 280 is used to connect the computing system 210 to the computer network 270.
It is clarified that the term "module", as used in this document, may include a software component, a hardware component or a combination thereof. A module may include, by way of example, components such as software components, processes, functions, attributes, procedures, drivers, firmware, data, databases, and data structures. The module may reside on a volatile or non-volatile storage medium and may be configured to interact with a processor of a computer system. It will be appreciated that the system components depicted in FIG. 2 are for the purpose of illustration only and the actual components may vary depending on the computing system and architecture deployed to implement the present solution. The various components described above may be hosted on a single computing system or on multiple computing systems, including servers, connected together through suitable means.
In one example, during operation, the computing system 210 is connected to a search engine portal through a network such as the internet, and a user provides a video search query to a video search engine through a web browser stored on the computing system 210. The proposed solution may be implemented on the computing system 210 or on another computing device, such as a server computer used to host the search engine portal.
It will be appreciated that the embodiments within the scope of the present solution may be implemented in the form of a computer program product including computer-executable instructions, such as program code, which may be run on any suitable computing environment in conjunction with a suitable operating system, such as Microsoft Windows, Linux or UNIX. Embodiments within the scope of the present solution may also include program products comprising computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, such computer-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions and which can be accessed by a general purpose or special purpose computer. It should be noted that the above-described embodiment of the present solution is for the purpose of illustration only. Although the solution has been described in conjunction with a specific embodiment thereof, numerous modifications are possible without materially departing from the teachings and advantages of the subject matter described herein. Other substitutions, modifications and changes may be made without departing from the spirit of the present solution.

Claims

We claim:
1. A computer-implemented method of performing a video search, comprising: analyzing a search query, to identify a first set of query terms; using the first set of query terms to query a knowledge repository, wherein the knowledge repository is a collection of electronic documents; identifying an electronic document corresponding to the first set of query terms; parsing the electronic document to obtain a second set of query terms; ranking query terms obtained in the second set of query terms, by assigning a weight to the query terms; and providing top N ranked query terms to a video search engine.
2. The method of claim 1, wherein analyzing a text string search query, to identify a first set of query terms, includes identifying noun phrases and focus words in the text string search query, wherein the noun phrases include nouns and proper nouns, and the focus words include nouns, proper nouns, non-trivial verbs, adjectives and numerals.
3. The method of claim 1, wherein parsing the electronic document to obtain a second set of query terms includes: obtaining section headings present in the electronic document; obtaining sub-section headings present in the electronic document; obtaining hyperlinks present in the electronic document; and obtaining noun phrases present in the electronic document, wherein the noun phrases are those which are not present in the section headings, the sub-section headings, and the hyperlinks of the electronic document.
4. The method of claim 3, further comprising combining the section headings, the subsection headings, the hyperlinks, and said noun phrases to obtain the second set of query terms.
5. The method of claim 3, further comprising removing a duplicate entry.
6. The method of claim 1, wherein assigning a weight to the query terms, includes: assigning relatively more weight to a query term present in the sub-section headings of the electronic document than to a query term present in the section headings of the electronic document; assigning relatively more weight to a query term present in the hyperlinks of the electronic document than otherwise; and recognizing those section and sub-section headings of the electronic document which share at least one common term with the second set of query terms, and upon recognition assigning relatively more weight to those query terms which are present in a text associated with aforesaid section and sub-section headings of the electronic document.
7. The method of claim 1, wherein identifying an electronic document corresponding to the first set of query terms includes identifying an electronic document whose title corresponds to the first set of query terms.
8. A system for performing a video search, comprising: a user interface to obtain a video search query; and a processor programmed to: identify a first set of query terms from the video search query; use the first set of query terms to query a knowledge repository, wherein the knowledge repository is a collection of electronic documents; identify an electronic document corresponding to the first set of query terms; parse the electronic document to obtain a second set of query terms; rank query terms obtained in the second set of query terms, by assigning a weight to the query terms; and provide top N ranked query terms to a video search engine.
9. The system of claim 8, wherein to identify a first set of query terms includes identifying noun phrases and focus words in the text string search query, wherein the noun phrases include nouns and proper nouns, and the focus words include nouns, proper nouns, non-trivial verbs, adjectives and numerals.
10. The system of claim 8, wherein to parse the electronic document to obtain a second set of query terms includes: obtaining section headings present in the electronic document; obtaining sub-section headings present in the electronic document; obtaining hyperlinks present in the electronic document; and obtaining noun phrases present in the electronic document, wherein the noun phrases are those which are not present in the section headings, the sub-section headings, and the hyperlinks of the electronic document.
11. The system of claim 8, wherein to assign a weight to the query terms, includes: assigning relatively more weight to a query term present in the sub-section headings of the electronic document than to a query term present in the section headings of the electronic document; assigning relatively more weight to a query term present in the hyperlinks of the electronic document than otherwise; and recognizing those section and sub-section headings of the electronic document which share at least one common term with the first set of query terms, and upon recognition assigning relatively more weight to those query terms which are present in a text associated with aforesaid section and sub-section headings of the electronic document.
12. The system of claim 8, further comprising a display screen to display video search results provided by the video search engine.
13. The system of claim 8, wherein the knowledge repository is an external or an internal repository.
14. The system of claim 8, wherein the search query is a text input or a speech input.
15. A computer program product for performing a video search, the computer program product comprising: a computer readable storage medium having computer usable program code embodied therewith, the computer usable program code comprising: computer usable program code that analyzes a search query, to identify a first set of query terms; computer usable program code that uses the first set of query terms to query a knowledge repository, wherein the knowledge repository is a collection of electronic documents; computer usable program code that identifies an electronic document corresponding to the first set of query terms; computer usable program code that parses the electronic document to obtain a second set of query terms; computer usable program code that ranks query terms obtained in the second set of query terms, by assigning a weight to the query terms; and computer usable program code that provides top N ranked query terms to a video search engine.
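Claims 3 to 6 (and 10 and 11) fix only the relative order of the weights, not their values. The Python sketch below illustrates one way to build, de-duplicate and rank the second set of query terms from a document that has already been split into sections, sub-sections, hyperlink anchor texts and body text. Assumptions: the numeric weights are arbitrary values chosen to respect the claimed ordering; a capitalized-word regular expression stands in for a real noun-phrase chunker; duplicate entries are merged by keeping the highest weight; and the heading-match boost is tested against the first set of query terms, as in claim 11. `Section`, `rank_terms` and the `W_*` constants are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """One section or sub-section of an already-parsed electronic document."""
    heading: str
    text: str = ""
    links: list = field(default_factory=list)        # hyperlink anchor texts
    subsections: list = field(default_factory=list)  # nested Section objects, one level deep


# Hypothetical values; the claims only require
# sub-section heading > section heading and hyperlink > plain text.
W_SECTION, W_SUBSECTION, W_LINK, W_OTHER, W_MATCH_BONUS = 2.0, 3.0, 2.5, 1.0, 1.0

# Crude stand-in for a noun-phrase chunker: runs of capitalized words.
NOUN_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_terms(sections, first_terms, top_n=5):
    """Build, de-duplicate and weight the second set of query terms; return the top N."""
    weights = {}

    def add(term, weight):
        # A duplicate entry collapses into one key that keeps its highest weight.
        weights[term] = max(weights.get(term, 0.0), weight)

    parts = [(sec, W_SECTION) for sec in sections]
    parts += [(sub, W_SUBSECTION) for sec in sections for sub in sec.subsections]

    # Section headings, sub-section headings and hyperlinks.
    for part, heading_weight in parts:
        add(part.heading, heading_weight)
        for link in part.links:
            add(link, W_LINK)
    structural = set(weights)

    # Noun phrases that are not already a heading or a hyperlink.
    for part, _ in parts:
        for phrase in NOUN_PHRASE.findall(part.text):
            if phrase not in structural:
                add(phrase, W_OTHER)

    # Boost terms found in the text under any heading that shares a word with the
    # query (the bonus is added once per matching section or sub-section).
    query = {t.lower() for t in first_terms}
    for part, _ in parts:
        if words(part.heading) & query:
            body = part.text.lower()
            for term in weights:
                if term.lower() in body:
                    weights[term] += W_MATCH_BONUS

    # Stable sort: ties keep the order in which terms were first collected.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:top_n]]
```

Using a dictionary keyed by term makes the duplicate-removal step of claim 5 implicit. A different merge policy, such as summing weights across occurrences, would be an equally plausible reading of the claims.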

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201280070193.7A CN104106064A (en) 2012-02-27 2012-02-27 Video search
PCT/IN2012/000133 WO2013128462A1 (en) 2012-02-27 2012-02-27 Video search
US14/373,493 US20140379731A1 (en) 2012-02-27 2012-02-27 Video search
EP12869857.8A EP2820568A4 (en) 2012-02-27 2012-02-27 Video search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IN2012/000133 WO2013128462A1 (en) 2012-02-27 2012-02-27 Video search

Publications (1)

Publication Number Publication Date
WO2013128462A1 (en) 2013-09-06

Family

ID=49081747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2012/000133 WO2013128462A1 (en) 2012-02-27 2012-02-27 Video search

Country Status (4)

Country Link
US (1) US20140379731A1 (en)
EP (1) EP2820568A4 (en)
CN (1) CN104106064A (en)
WO (1) WO2013128462A1 (en)

Also Published As

Publication number Publication date
CN104106064A (en) 2014-10-15
EP2820568A4 (en) 2015-09-23
US20140379731A1 (en) 2014-12-25
EP2820568A1 (en) 2015-01-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12869857

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14373493

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2012869857

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE