WO2014029338A1 - Multimedia information retrieval method and electronic device - Google Patents

Multimedia information retrieval method and electronic device

Info

Publication number
WO2014029338A1
Authority
WO
WIPO (PCT)
Prior art keywords
multimedia
identification codes
retrieved
electronic device
retrieval
Application number
PCT/CN2013/081992
Inventor
胡鹏 (Hu Peng)
张腾 (Zhang Teng)
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to EP13831356.4A (EP2889786A4)
Priority to JP2015523408A (JP5948671B2)
Publication of WO2014029338A1
Priority to US14/613,989 (US9704485B2)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
              • G06F16/43 Querying
                • G06F16/432 Query formulation
                  • G06F16/433 Query formulation using audio data
            • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
              • G06F16/63 Querying
                • G06F16/638 Presentation of query results
              • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
                  • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/237 Lexical tools
                • G06F40/242 Dictionaries
              • G06F40/268 Morphological analysis
              • G06F40/279 Recognition of textual entities
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/04 Segmentation; Word boundary detection
              • G10L15/05 Word boundary detection
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L15/26 Speech to text systems

Definitions

  • The present invention relates to the field of information retrieval, and in particular to a multimedia information retrieval method and an electronic device.
  • The object of the present invention is to provide a multimedia information retrieval method and an electronic device based on recognition of lyric content, which can automatically, quickly and comprehensively present to the user the multimedia information the user wants to know, thereby greatly improving the user's retrieval efficiency and retrieval success rate.
  • This solves the technical problem that the existing multimedia retrieval process is complicated and error-prone.
  • The invention provides a multimedia information retrieval method, whose steps are described in the embodiment of FIG. 3 below.
  • The invention also provides an electronic device, comprising:
  • one or more processors;
  • a voice extraction module configured to extract the voice of the multimedia to be retrieved from that multimedia;
  • a voice recognition module configured to perform speech recognition on the voice of the multimedia to be retrieved to obtain recognized text; and
  • a retrieval module configured to perform retrieval on a multimedia database according to the recognized text, to obtain multimedia information of the multimedia to be retrieved.
  • The invention also provides another electronic device, comprising:
  • one or more processors;
  • a multimedia database configured to store multimedia information;
  • a download module configured to download the multimedia to be retrieved through its download link;
  • a voice extraction module configured to extract the voice of the multimedia to be retrieved from that multimedia;
  • a voice recognition module configured to perform speech recognition on the voice of the multimedia to be retrieved to obtain recognized text; and
  • a retrieval module configured to perform retrieval on the multimedia database according to the recognized text, to obtain the multimedia information of the multimedia to be retrieved.
  • The multimedia information retrieval method and electronic device of the invention can automatically, quickly and comprehensively present to the user the multimedia information the user wants to know, thereby greatly improving the user's retrieval efficiency and retrieval success rate.
  • FIG. 1 is a schematic structural view of a preferred embodiment of the electronic device of the present invention;
  • FIG. 2 is a schematic structural view of another preferred embodiment of the electronic device of the present invention;
  • FIG. 3 is a flowchart of a preferred embodiment of the multimedia information retrieval method of the present invention;
  • FIG. 4 is a detailed flowchart of step S303 of the multimedia information retrieval method shown in FIG. 3;
  • FIG. 5 is a schematic diagram of the use of the multimedia information retrieval method of the present invention at the user end;
  • FIG. 6 is a schematic diagram of the use of the multimedia information retrieval method of the present invention at the server end;
  • FIG. 7 is a schematic structural diagram of a working environment of the electronic device of the present invention.
  • The principles of the present invention can operate in many other general-purpose or special-purpose computing or communication environments or configurations.
  • Examples of well-known computing systems, environments and configurations suitable for use with the present invention include, but are not limited to, mobile phones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • FIG. 1 is a schematic structural diagram of a preferred embodiment of an electronic device according to the present invention.
  • The electronic device includes a voice extraction module 11, a voice recognition module 12, a retrieval module 13 and a multimedia presentation module 14.
  • The voice extraction module 11 is configured to extract the voice of the multimedia to be retrieved from that multimedia.
  • The voice recognition module 12 is configured to perform speech recognition on the voice of the multimedia to be retrieved to obtain recognized text.
  • The retrieval module 13 is configured to perform retrieval on the multimedia database using the recognized text, to obtain multimedia information of the multimedia to be retrieved.
  • The multimedia presentation module 14 is configured to display the multimedia information to the user.
  • The retrieval module 13 includes a pending identification code determining unit, an identification code determining unit and a retrieval unit.
  • The pending identification code determining unit is configured to perform word segmentation on the recognized text according to a preset thesaurus to obtain a plurality of pending identification codes; the identification code determining unit is configured to determine a plurality of identification codes according to the word frequencies of the pending identification codes in the thesaurus.
  • The retrieval unit is configured to perform retrieval on the multimedia database using the plurality of identification codes, to obtain multimedia information of the multimedia to be retrieved.
  • A user listening to songs plays local multimedia through a music player, or downloads network multimedia and plays it locally. If the user wants to know the multimedia information of that multimedia, the voice extraction module 11 extracts the voice from the multimedia.
  • The voice recognition module 12 performs speech recognition on the voice, obtains the recognized text, and sends it to the pending identification code determining unit of the retrieval module 13.
  • The pending identification code determining unit performs word segmentation on the recognized text according to the preset thesaurus to obtain a plurality of pending identification codes; the identification code determining unit of the retrieval module 13 then determines a plurality of identification codes according to the word frequencies of those pending identification codes in the thesaurus, and sends them to the retrieval unit of the retrieval module 13.
  • The retrieval unit requests the multimedia database to perform retrieval according to these identification codes and obtains the corresponding multimedia information. Finally, the multimedia presentation module 14 presents the multimedia information retrieved by the retrieval unit to the user (the multimedia information may of course also be returned to the user in other ways).
  • The specific working principle of this electronic device is the same as or similar to that of the embodiment of the multimedia information retrieval method described below; for details, refer to that embodiment.
  • The modules of the electronic device may be integrated with each other, or one module may be split into several modules with independent functions, and the modules may be connected directly or indirectly.
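The flow through modules 11 to 14 can be summarized as a chain of four stages. The Python sketch below is illustrative only: the class `RetrievalPipeline`, the `MediaInfo` dictionary type and all callable signatures are hypothetical, and each module is injected as a plain function so that any separation, recognition or retrieval technique could be plugged in. The stub stages at the end merely stand in for real modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MediaInfo = Dict[str, str]  # hypothetical record, e.g. {"title": ..., "singer": ...}


@dataclass
class RetrievalPipeline:
    """Hypothetical wiring of the four modules of FIG. 1."""
    extract_voice: Callable[[bytes], bytes]       # voice extraction module 11
    recognize: Callable[[bytes], str]             # voice recognition module 12
    retrieve: Callable[[str], List[MediaInfo]]    # retrieval module 13
    present: Callable[[List[MediaInfo]], None]    # multimedia presentation module 14

    def run(self, media: bytes) -> List[MediaInfo]:
        voice = self.extract_voice(media)   # isolate the vocals
        text = self.recognize(voice)        # speech -> recognized text
        results = self.retrieve(text)       # search the multimedia database
        self.present(results)               # show the results to the user
        return results


# Usage with stub stages (placeholders, not real extraction or recognition).
shown: List[List[MediaInfo]] = []
pipeline = RetrievalPipeline(
    extract_voice=lambda media: media,                    # pass audio through
    recognize=lambda voice: "月亮代表我的心",               # fixed fake transcript
    retrieve=lambda text: [{"title": "月亮代表我的心"}],    # fake database hit
    present=shown.append,                                 # record what was shown
)
print(pipeline.run(b"raw-audio"))
```

Injecting the stages as callables mirrors the statement that modules may be integrated or split: one callable can wrap a single module or several.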
  • FIG. 2 is a schematic structural diagram of another preferred embodiment of the electronic device of the present invention.
  • The electronic device includes a multimedia database 21, a download module 22, a voice extraction module 23, a voice recognition module 24, a retrieval module 25, a feedback module 26 and an association module 27.
  • The multimedia database 21 is configured to store multimedia information; the download module 22 is configured to download the multimedia to be retrieved through its download link; the voice extraction module 23 is configured to extract the voice of the multimedia to be retrieved from that multimedia; the voice recognition module 24 is configured to perform speech recognition on that voice to obtain recognized text; the retrieval module 25 is configured to perform retrieval on the multimedia database 21 according to the recognized text, to obtain multimedia information of the multimedia to be retrieved; the feedback module 26 is configured to feed the multimedia information back to the user; and the association module 27 is configured to associate the download link obtained by the download module 22 with the corresponding multimedia information.
  • The retrieval module 25 includes a pending identification code determining unit, an identification code determining unit and a retrieval unit.
  • The pending identification code determining unit is configured to perform word segmentation on the recognized text according to the preset thesaurus to obtain a plurality of pending identification codes; the identification code determining unit is configured to determine a plurality of identification codes according to the word frequencies of the pending identification codes in the thesaurus.
  • The retrieval unit is configured to perform retrieval on the multimedia database 21 using the plurality of identification codes, to obtain multimedia information of the multimedia to be retrieved.
  • The electronic device of this embodiment integrates the multimedia information retrieval device (i.e., the electronic device shown in FIG. 1) and the multimedia database 21 on the server side, so the user only needs to send the download link of the multimedia to the electronic device, and the electronic device feeds the multimedia information of the multimedia to be retrieved back to the user, which greatly simplifies the user's operation.
  • A user listening to songs accesses a music page through a browser. If the user wants to know the multimedia information of a piece of multimedia on the page, the download module 22 of the electronic device downloads the multimedia according to its download link.
  • The voice extraction module 23 extracts the voice from the multimedia and sends it to the voice recognition module 24; the voice recognition module 24 performs speech recognition on the voice to obtain recognized text and sends it to the pending identification code determining unit of the retrieval module 25; that unit performs word segmentation on the recognized text according to the preset thesaurus to obtain a plurality of pending identification codes.
  • The identification code determining unit of the retrieval module 25 then determines a plurality of identification codes according to the word frequencies of those pending identification codes in the thesaurus, and sends them to the retrieval unit of the retrieval module 25.
  • The retrieval unit requests the multimedia database 21 to perform retrieval according to these identification codes and obtains the corresponding multimedia information; finally, the feedback module 26 feeds the multimedia information retrieved by the retrieval unit back to the user.
  • The electronic device of this embodiment further includes the association module 27, which associates the download link obtained by the download module 22 with the corresponding multimedia information.
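For the server-side embodiment of FIG. 2, request handling can be sketched as follows. This is a hypothetical sketch: `RetrievalServer`, `handle` and the injected `download`/`identify` callables are illustrative names, and answering a repeated link from the stored association is an assumption about how the association module 27 might be used, not something the description specifies. The fake downloader avoids any real network access.

```python
from typing import Callable, Dict, List

MediaInfo = Dict[str, str]  # hypothetical multimedia information record


class RetrievalServer:
    """Hypothetical server-side flow of FIG. 2 (modules 21 to 27)."""

    def __init__(self,
                 download: Callable[[str], bytes],
                 identify: Callable[[bytes], List[MediaInfo]]) -> None:
        self.download = download   # download module 22: link -> audio bytes
        self.identify = identify   # modules 23 to 25: audio -> multimedia information
        self.associations: Dict[str, List[MediaInfo]] = {}  # association module 27

    def handle(self, link: str) -> List[MediaInfo]:
        # Assumption: a link that is already associated is answered from the
        # stored association instead of being downloaded and processed again.
        if link in self.associations:
            return self.associations[link]
        media = self.download(link)
        info = self.identify(media)
        self.associations[link] = info   # associate the link with its information
        return info                      # feedback module 26 returns this to the user


# Usage with a fake downloader that only records which links were fetched.
calls: List[str] = []

def fake_download(link: str) -> bytes:
    calls.append(link)
    return b"audio"

server = RetrievalServer(fake_download, lambda media: [{"title": "demo"}])
server.handle("http://example.com/song.mp3")
server.handle("http://example.com/song.mp3")
print(len(calls))  # the second request reuses the stored association
```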
  • FIG. 3 is a flowchart of a preferred embodiment of the multimedia information retrieval method of the present invention.
  • The multimedia information retrieval method includes:
  • Step S301: extracting the voice of the multimedia to be retrieved from that multimedia;
  • Step S302: performing speech recognition on the voice of the multimedia to be retrieved to obtain recognized text;
  • Step S303: performing retrieval on the multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved;
  • Step S304: presenting the multimedia information to the user.
  • FIG. 4 is a detailed flowchart of step S303 of the multimedia information retrieval method shown in FIG. 3.
  • In step S301, the voice of the multimedia to be retrieved is extracted from that multimedia.
  • This step mainly separates the lead vocalist's voice from the multimedia audio. The separation can be based on a speech separation method such as auditory scene analysis or blind signal separation, so that the voice signal of the lead vocalist is output.
  • Either the full length of the multimedia or only one segment of it may be selected. Generally, the longer the selected speech, the more computing resources are consumed, but the more information is provided to the subsequent steps, which helps them retrieve accurately.
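Real vocal separation by auditory scene analysis or blind signal separation is beyond a short example. The sketch below only illustrates the two choices of step S301 under simplifying assumptions: selecting either the whole track or one segment, and a crude center-channel average used as a stand-in for vocal extraction. Audio is assumed to be a list of stereo `(left, right)` float pairs, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, float]  # one stereo sample: (left, right)


def select_segment(frames: Sequence[Frame], sample_rate: int,
                   start_s: float = 0.0,
                   duration_s: Optional[float] = None) -> List[Frame]:
    """Return the whole track (duration_s=None) or one segment of it."""
    start = int(start_s * sample_rate)
    if duration_s is None:
        end = len(frames)
    else:
        end = min(len(frames), start + int(duration_s * sample_rate))
    return list(frames[start:end])


def center_channel(frames: Sequence[Frame]) -> List[float]:
    """Crude stand-in for vocal extraction: average the left and right channels.

    Lead vocals are often mixed to the center, so the mid signal (L + R) / 2
    favors them over hard-panned parts. This is NOT auditory scene analysis or
    blind signal separation; a real implementation would use one of those.
    """
    return [(left + right) / 2 for left, right in frames]
```

A longer `duration_s` feeds more speech to recognition at a higher computational cost, which is the trade-off the paragraph above describes.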
  • In step S302, speech recognition is performed on the voice of the multimedia to be retrieved to obtain recognized text.
  • This step mainly converts the lead vocalist's voice signal into recognized text, that is, it converts the lexical content of the human voice into computer-readable input such as keystrokes, binary codes or character sequences.
  • The recognized text includes a plurality of identification codes, including but not limited to Chinese characters, Chinese words, pinyin, English characters and/or English words.
  • The speech recognition can use a method such as statistical pattern recognition. Because speech recognition itself has an uncertain error rate, extracting a longer stretch of the multimedia's speech can reduce the probability that recognition errors cause the subsequent multimedia retrieval to fail.
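Since the recognized text mixes several kinds of identification codes, a retrieval system may need to tell them apart. A minimal, hypothetical helper that classifies a code by script is sketched below. It assumes Chinese text falls in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and it cannot distinguish toneless pinyin from English without a lexicon.

```python
import re

_CJK = re.compile(r"[\u4e00-\u9fff]+")   # basic CJK Unified Ideographs block only
_LATIN = re.compile(r"[A-Za-z]+")        # unaccented Latin letters


def code_type(code: str) -> str:
    """Classify one identification code by script (hypothetical helper)."""
    if _CJK.fullmatch(code):
        return "chinese_character" if len(code) == 1 else "chinese_word"
    if _LATIN.fullmatch(code):
        # Toneless pinyin and English both use Latin letters; telling them
        # apart would need a pinyin syllable list or a lexicon.
        return "latin"
    return "other"
```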
  • Step S303 specifically includes:
  • Step S3031: performing word segmentation on the recognized text according to the preset thesaurus to obtain a plurality of pending identification codes.
  • If the recognized text were used directly for retrieval, a few small recognition errors could make the whole retrieval fail. Therefore, word segmentation is performed on the recognized text using the preset thesaurus, splitting it into a plurality of small units, namely the pending identification codes.
  • The pending identification codes include, but are not limited to, Chinese characters, Chinese words, pinyin, English characters and/or English words, depending on the types the preset thesaurus supports. If the preset thesaurus supports only Chinese characters and words, the recognized text is divided only into pending identification codes of the Chinese character or Chinese word type. If the preset thesaurus supports both Chinese and pinyin, and the multimedia database also supports pinyin retrieval, Chinese characters that could not be recognized can be searched by their pinyin, which further ensures retrieval quality and avoids retrieval errors. Similarly, if the preset thesaurus supports both Chinese and English, multimedia containing some English, or purely English multimedia, can be retrieved directly.
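The description does not name a segmentation algorithm. One common dictionary-based choice is forward maximum matching, sketched below as an assumption: at each position it takes the longest thesaurus entry that matches, falls back to a single character, and skips whitespace. The thesaurus is modeled as a dictionary from word to word frequency, the frequencies shown are illustrative, and `max_len` (the longest entry length tried) is a hypothetical parameter.

```python
from typing import Dict, List


def forward_max_match(text: str, thesaurus: Dict[str, int],
                      max_len: int = 4) -> List[str]:
    """Split text into pending identification codes by forward maximum matching."""
    codes: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():            # skip separators between words
            i += 1
            continue
        # Try the longest candidate first and shrink until a thesaurus hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in thesaurus:
                codes.append(piece)
                i += size
                break
    return codes


thesaurus = {"月亮": 120, "代表": 300, "我的": 900, "心": 500}  # illustrative frequencies
print(forward_max_match("月亮代表我的心", thesaurus))
```

Latin-script words longer than `max_len`, or absent from the thesaurus, fall back to single letters here, so a thesaurus that also supports pinyin or English would likely tokenize those runs separately first.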
  • Step S3032: determining a plurality of identification codes according to the word frequencies of the pending identification codes in the thesaurus.
  • Each pending identification code selected from the thesaurus has a word frequency attribute in the thesaurus.
  • The word frequency of a pending identification code indicates how often it is used in daily life (the thesaurus is compiled by summarizing the language that people frequently use on the network). The more frequently a pending identification code is used in daily life, the higher its word frequency; the less frequently, the lower.
  • Some of the pending identification codes are then selected, according to their word frequencies in the thesaurus, as the final identification codes for multimedia retrieval. The specific process is as follows:
  • The pending identification codes are sorted by word frequency in the thesaurus from low to high; the n pending identification codes with the lowest word frequencies are selected; m further pending identification codes are then selected at random so that none of them repeats any of the n codes; finally, the n and m pending identification codes together are taken as the identification codes, where n ≥ 1 and m ≥ 0.
  • Because their word frequencies are low, the n pending identification codes help ensure the accuracy of the retrieval result to a certain extent, while the m randomly selected codes help, to some extent, avoid a retrieval failure caused by the low-frequency pending identification codes (the retrieval here is a fuzzy search: results that satisfy the search conditions to a certain degree are fed back to the user).
  • The values of n and m can be set flexibly according to the user's requirements.
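The selection rule of step S3032 (sort by word frequency from low to high, take the n lowest, then add m random codes that do not repeat them) can be written as a short function. In this sketch the function name and the treatment of codes missing from the thesaurus as frequency 0 are assumptions; duplicate pending codes are collapsed first so that the n and m codes cannot repeat.

```python
import random
from typing import Dict, List, Optional


def select_identification_codes(pending: List[str], freq: Dict[str, int],
                                n: int, m: int,
                                rng: Optional[random.Random] = None) -> List[str]:
    """Pick the n lowest-frequency codes plus m random, non-repeating others."""
    if n < 1 or m < 0:
        raise ValueError("n must be >= 1 and m must be >= 0")
    if rng is None:
        rng = random.Random()
    unique = list(dict.fromkeys(pending))      # de-duplicate, keep first-seen order
    # Sort by thesaurus word frequency, low to high. Assumption: a code that is
    # missing from the thesaurus is treated as frequency 0.
    ranked = sorted(unique, key=lambda code: freq.get(code, 0))
    lowest, rest = ranked[:n], ranked[n:]
    extra = rng.sample(rest, min(m, len(rest)))  # m random codes, no repeats
    return lowest + extra


pending = ["我的", "月亮", "代表", "心", "我的"]
freq = {"我的": 900, "月亮": 120, "代表": 300, "心": 500}   # illustrative values
print(select_identification_codes(pending, freq, n=2, m=1, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the random part reproducible, which is convenient when tuning n and m.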
  • Step S3033: performing retrieval on the multimedia database using the plurality of identification codes determined above, to obtain the multimedia information of the multimedia to be retrieved.
  • The retrieval can be adjusted according to its results. If it returns too many results, new identification codes can be added from the pending identification codes, or the retrieval conditions can be refined, and the results satisfying the retrieval conditions are fed back to the user. If the retrieval fails, the number of identification codes is reduced accordingly and the retrieval is performed again, to avoid failures caused by identification codes that stem from speech recognition errors.
  • The specific retrieval process can be set according to the actual situation; the particular process used does not affect the protection scope of the present invention.
  • The multimedia database can be a local multimedia database on the local computer or a network multimedia database on a server on the network.
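Step S3033 leaves the retrieval process open. The following hypothetical loop illustrates the adjustment it describes: if nothing matches, one identification code is dropped (it may stem from a recognition error); if too many records match, a spare pending code is added to narrow the search. The database is assumed to be a list of records with a `lyrics` field, a record is assumed to match when its lyrics contain every code, and `max_results` and `max_rounds` are illustrative parameters.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical record with at least a "lyrics" field


def adaptive_search(db: List[Record], codes: List[str], spare: List[str],
                    max_results: int = 5, max_rounds: int = 10) -> List[Record]:
    """Hypothetical fuzzy-retrieval loop for step S3033.

    A record matches when its lyrics contain every identification code.
    No hits -> drop a code; too many hits -> add a spare pending code.
    """
    codes, spare = list(codes), list(spare)   # work on copies
    hits: List[Record] = []
    for _ in range(max_rounds):
        hits = [rec for rec in db if all(code in rec["lyrics"] for code in codes)]
        if not hits and len(codes) > 1:
            codes.pop()                  # may be a misrecognized code; relax
        elif len(hits) > max_results and spare:
            codes.append(spare.pop(0))   # narrow with another pending code
        else:
            break
    return hits


db = [{"title": "A", "lyrics": "月亮代表我的心"},
      {"title": "B", "lyrics": "我的心"},
      {"title": "C", "lyrics": "心太软"}]
print([rec["title"] for rec in adaptive_search(db, ["月亮", "错字"], [])])
```

A production system would query an index rather than scan every record; the round limit only guarantees that the loop terminates.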
  • In step S304, the multimedia information of the retrieved multimedia is presented to the user.
  • The multimedia information here may be any information related to the multimedia the user is searching for, including but not limited to its title, singer, lyrics, album, background information, song score and download link. For some users, a download link of the multimedia can also be shown, such as a free download link for online multimedia on the Internet, or a free link to the user's local multimedia.
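The multimedia information of step S304 maps naturally onto a record type. The class name, field names and `summary` formatting below are hypothetical; the download link is optional because the description presents it only for some users.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultimediaInfo:
    """Hypothetical record for the information listed in step S304."""
    title: str
    singer: str
    album: str = ""
    lyrics: str = ""
    background: str = ""
    score: str = ""                        # song score, e.g. a reference to sheet music
    download_link: Optional[str] = None    # shown only when one is available

    def summary(self) -> str:
        """Short text for presenting the result to the user."""
        parts = [f"{self.title} - {self.singer}"]
        if self.album:
            parts.append(f"Album: {self.album}")
        if self.download_link:
            parts.append(f"Download: {self.download_link}")
        return "\n".join(parts)
```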
  • FIG. 5 is a schematic diagram of the use of the multimedia information retrieval method of the present invention at the user end.
  • Here, voice extraction, speech recognition and result presentation are performed at the user end, while the multimedia database used for retrieval is placed on the retrieval server, which is responsible only for retrieval.
  • FIG. 6 is a schematic diagram of the use of the multimedia information retrieval method of the present invention at the server end.
  • Here, the retrieval server performs not only multimedia retrieval but also multimedia downloading, voice extraction, speech recognition and result feedback, which further simplifies the user's operation.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
  • The multimedia information retrieval method, electronic device and storage medium of the invention can automatically, quickly and comprehensively present to the user the multimedia information the user wants to know, which greatly enhances the user's enjoyment of listening to songs.
  • The electronic device of the invention makes it easy for the user to perform multimedia retrieval on a computer; it can also perform the multimedia retrieval on a server at the user's request and feed only the retrieval result back to the user, which further simplifies the user's operation.
  • A component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable application, a thread of execution, a program and/or a computer.
  • By way of illustration, both an application running on a controller and the controller can be a component.
  • One or more components can reside within a process and/or thread of execution, and a component can be located on one computer and/or distributed between two or more computers.
  • The claimed subject matter can be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware or any combination thereof to control a computer to implement the disclosed subject matter.
  • The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier or media.
  • Example electronic devices 712 include, but are not limited to, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players and the like), multiprocessor systems, consumer electronic devices, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
  • Computer readable instructions may be distributed via computer readable media (discussed below).
  • Computer readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), data structures, etc. that perform particular tasks or implement particular abstract data types.
  • The functionality of the computer readable instructions can be combined or distributed as desired in various environments.
  • FIG. 7 illustrates an example of an electronic device 712 that implements one or more embodiments of the multimedia information retrieval method of the present invention.
  • Electronic device 712 includes at least one processing unit 716 and memory 718.
  • Memory 718 can be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. This configuration is illustrated in FIG. 7 by dashed line 714.
  • Electronic device 712 can include additional features and/or functionality.
  • For example, device 712 may also include additional storage (e.g., removable and/or non-removable), including but not limited to magnetic storage, optical storage and the like.
  • Such additional storage is illustrated in FIG. 7 by storage device 720.
  • Computer readable instructions for implementing one or more embodiments provided herein may be stored in storage device 720.
  • Storage device 720 can also store other computer readable instructions for implementing an operating system, applications, and the like. Computer readable instructions may be loaded into memory 718 for execution by, for example, processing unit 716.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data.
  • Memory 718 and storage device 720 are examples of computer storage media.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by electronic device 712. Any such computer storage media may be part of electronic device 712.
  • Electronic device 712 may also include a communication connection 726 that allows electronic device 712 to communicate with other devices.
  • Communication connection 726 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interface for connecting electronic device 712 to other electronic devices.
  • Communication connection 726 can include a wired connection or a wireless connection.
  • Communication connection 726 can transmit and/or receive communication media.
  • Computer readable medium can include a communication medium.
  • Communication media typically embodies computer readable instructions or other data in "modulated data signals" such as carrier waves or other transport mechanisms, and includes any information delivery media.
  • A modulated data signal can include a signal that has one or more of its characteristics set or changed in such a manner as to encode information into the signal.
  • Electronic device 712 can include an input device 724 such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device.
  • Output device 722 may also be included in device 712, such as one or more displays, speakers, printers, and/or any other output device.
  • Input device 724 and output device 722 can be connected to electronic device 712 via a wired connection, a wireless connection, or any combination thereof.
  • An input device or output device from another electronic device can be used as input device 724 or output device 722 of electronic device 712.
  • The components of electronic device 712 can be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI) (such as PCI Express), Universal Serial Bus (USB), FireWire (IEEE 1394), optical bus structures, and the like.
  • The components of electronic device 712 can be interconnected by a network.
  • Memory 718 can consist of multiple physical memory units located in different physical locations and interconnected by a network.
  • Storage devices for storing computer-readable instructions may be distributed across a network.
  • Electronic device 730, accessible via network 728, can store computer-readable instructions for implementing one or more embodiments of the present disclosure.
  • Electronic device 712 can access electronic device 730 and download part or all of the computer-readable instructions for execution.
  • Alternatively, electronic device 712 can download pieces of the computer-readable instructions as needed, or some of the instructions can be executed at electronic device 712 and some at electronic device 730.
  • The one or more operations may constitute computer-readable instructions stored on one or more computer-readable media that, when executed by an electronic device, cause the computing device to perform the operations.
  • The order in which some or all of the operations are described should not be construed as implying that the operations are necessarily order-dependent. Those skilled in the art having the benefit of this specification will appreciate alternative orderings. Moreover, it should be understood that not all operations are necessarily present in every embodiment provided herein.
  • The word “preferred” as used herein is intended to mean serving as an example, instance, or illustration. Any aspect or design described herein as “preferred” is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word “preferred” is intended to present concepts in a concrete fashion.
  • The term “or” as used in this application is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations: if X employs A, X employs B, or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

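The present invention relates to a multimedia information retrieval method and an electronic device. The multimedia information retrieval method comprises the steps of: extracting, from multimedia to be retrieved, the speech of the multimedia to be retrieved; performing speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and searching a multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved. The present invention further relates to an electronic device. The multimedia information retrieval method and electronic device of the present invention can present the multimedia information that the user wants to learn automatically, quickly and comprehensively, greatly improving the user's retrieval efficiency and retrieval success rate.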

Description

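Multimedia Information Retrieval Method and Electronic Device
Technical Field
The present invention relates to the field of information retrieval, and in particular to a multimedia information retrieval method and an electronic device.
Background
When listening to a song, a user sometimes wants to learn more about it. For example, on hearing a song they like on a computer, the user may want to know its title, its singer, or other background information about it. To obtain this information, the user typically first writes down a fragment of the lyrics and then searches for that fragment on the Internet. This requires the user to recognize and memorize the lyrics quickly and to be able to judge the search results, so the whole retrieval process is relatively complicated and error-prone.
Technical Problem
An object of the present invention is to provide a multimedia information retrieval method and electronic device based on lyric content recognition, which can present the multimedia information the user wants automatically, quickly and comprehensively, greatly improving the user's retrieval efficiency and retrieval success rate, and thereby solving the technical problem that existing multimedia retrieval processes are complicated and error-prone.
Technical Solution
To solve the above problem, the present invention provides the following technical solutions:
The present invention provides a multimedia information retrieval method, comprising the steps of:
extracting, from multimedia to be retrieved, the speech of the multimedia to be retrieved;
performing speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
searching a multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.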
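The present invention further provides an electronic device, comprising:
one or more processors;
a memory; and
one or more programs, stored in the memory and configured to be executed by the one or more processors to provide a multimedia information retrieval method, the one or more programs comprising, divided by function:
a speech extraction module, configured to extract, from multimedia to be retrieved, the speech of the multimedia to be retrieved;
a speech recognition module, configured to perform speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
a retrieval module, configured to search a multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.
The present invention further provides an electronic device, comprising:
one or more processors;
a memory; and
one or more programs, stored in the memory and configured to be executed by the one or more processors to provide a multimedia information retrieval method, the one or more programs comprising, divided by function:
a multimedia database, configured to store multimedia information;
a download module, configured to download the multimedia to be retrieved via a download link of the multimedia to be retrieved;
a speech extraction module, configured to extract, from the multimedia to be retrieved, the speech of the multimedia to be retrieved;
a speech recognition module, configured to perform speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
a retrieval module, configured to search the multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.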
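Beneficial Effects
Compared with existing multimedia information retrieval methods, the multimedia information retrieval method and electronic device of the present invention can present the multimedia information the user wants automatically, quickly and comprehensively, greatly improving the user's retrieval efficiency and retrieval success rate and solving the technical problem that existing multimedia retrieval processes are complicated and error-prone.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a preferred embodiment of the electronic device of the present invention;
FIG. 2 is a schematic structural diagram of another preferred embodiment of the electronic device of the present invention;
FIG. 3 is a flowchart of a preferred embodiment of the multimedia information retrieval method of the present invention;
FIG. 4 is a detailed flowchart of step S303 of the multimedia information retrieval method shown in FIG. 3;
FIG. 5 is a schematic diagram of the multimedia information retrieval method of the present invention used at the user end;
FIG. 6 is a schematic diagram of the multimedia information retrieval method of the present invention used at the server end;
FIG. 7 is a schematic structural diagram of the working environment of the electronic device of the present invention.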
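Referring to the drawings, in which like reference numerals denote like components, the principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present invention and should not be regarded as limiting other specific embodiments that are not detailed herein.
In the following description, unless otherwise stated, specific embodiments of the present invention are described with reference to steps and symbols of operations performed by one or more computers. It will therefore be understood that these steps and operations, several of which are referred to as being computer-executed, include the manipulation by a computer processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may reconfigure or otherwise alter the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the invention are described in the foregoing terms, this is not meant to be limiting; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The principles of the present invention operate with numerous other general-purpose or special-purpose computing or communication environments or configurations. Examples of well-known computing systems, environments and configurations suitable for use with the invention include, but are not limited to, mobile phones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.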
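The present invention provides an electronic device. Referring to FIG. 1, which is a schematic structural diagram of a preferred embodiment of the electronic device of the present invention, the electronic device includes a speech extraction module 11, a speech recognition module 12, a retrieval module 13 and a multimedia presentation module 14. The speech extraction module 11 extracts the speech of the multimedia to be retrieved from that multimedia; the speech recognition module 12 performs speech recognition on that speech to obtain a recognized text; the retrieval module 13 searches a multimedia database according to the recognized text to obtain the multimedia information of the multimedia to be retrieved; and the multimedia presentation module 14 presents the multimedia information to the user.
The retrieval module 13 includes a candidate identification code determining unit, an identification code determining unit and a retrieval unit. The candidate identification code determining unit performs word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes; the identification code determining unit determines a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon; and the retrieval unit searches the multimedia database using the plurality of identification codes to obtain the multimedia information of the multimedia to be retrieved.
When the electronic device of the present invention is in use, a listener plays local multimedia with a music player, or downloads network multimedia and plays it locally. If the user wants to learn the multimedia information of that multimedia, the speech extraction module 11 extracts the vocals from the multimedia and sends them to the speech recognition module 12. The speech recognition module 12 performs speech recognition on the vocals to obtain a recognized text and sends it to the candidate identification code determining unit of the retrieval module 13, which segments the recognized text according to the preset lexicon to obtain a plurality of candidate identification codes. The identification code determining unit of the retrieval module 13 then determines a plurality of identification codes according to the word frequencies, in the lexicon, of the candidate identification codes determined by the candidate identification code determining unit, and sends these identification codes to the retrieval unit of the retrieval module 13. The retrieval unit requests the multimedia database to search using these identification codes and obtains the corresponding multimedia information. Finally, the multimedia presentation module 14 presents the multimedia information retrieved by the retrieval unit to the user (the multimedia information may, of course, also be fed back to the user in other ways).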
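The module chain of FIG. 1 can be sketched as a pipeline of pluggable stages. This is a minimal sketch, not the patented implementation: the field names (`extract_vocals`, `recognize`, `search`, `present`) are hypothetical, and the lambdas in the usage part are toy stand-ins for a real source-separation model, speech recognizer and database client.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RetrievalPipeline:
    """Chains the FIG. 1 modules; every stage is a pluggable, hypothetical callable."""
    extract_vocals: Callable[[Sequence[float]], Sequence[float]]  # speech extraction module 11
    recognize: Callable[[Sequence[float]], str]                     # speech recognition module 12
    search: Callable[[str], List[dict]]                             # retrieval module 13
    present: Callable[[List[dict]], None]                           # multimedia presentation module 14

    def run(self, audio: Sequence[float]) -> List[dict]:
        vocals = self.extract_vocals(audio)  # isolate the lead vocal
        text = self.recognize(vocals)        # vocals -> recognized text
        results = self.search(text)          # recognized text -> multimedia information
        self.present(results)                # show the results to the user
        return results


# Usage with toy stand-ins (placeholders, not real models or databases):
shown: List[dict] = []
pipeline = RetrievalPipeline(
    extract_vocals=lambda audio: audio,             # pretend the input is already vocals
    recognize=lambda vocals: "hypothetical lyric",  # fixed fake transcript
    search=lambda text: [{"title": "Song A", "matched": text}],
    present=shown.extend,
)
results = pipeline.run([0.0, 0.1, -0.1])
```

Keeping each module behind a callable mirrors the note that the modules may be merged, split, or connected directly or indirectly.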
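The specific working principle of the electronic device of the present invention is the same as or similar to that of the embodiments of the multimedia information retrieval method described below; refer to those embodiments for details. The modules of the electronic device may be integrated with one another, or a module may be split into several modules with independent functions, and the modules may be connected directly or indirectly.
The present invention further provides an electronic device. Referring to FIG. 2, which is a schematic structural diagram of a preferred embodiment of the electronic device of the present invention, the electronic device includes a multimedia database 21, a download module 22, a speech extraction module 23, a speech recognition module 24, a retrieval module 25, a feedback module 26 and an association module 27. The multimedia database 21 stores multimedia information; the download module 22 downloads the multimedia to be retrieved via its download link; the speech extraction module 23 extracts the speech of the multimedia to be retrieved from that multimedia; the speech recognition module 24 performs speech recognition on that speech to obtain a recognized text; the retrieval module 25 searches the multimedia database 21 according to the recognized text to obtain the multimedia information of the multimedia to be retrieved; the feedback module 26 feeds the multimedia information back to the user; and the association module 27 associates the download link obtained by the download module 22 with the corresponding multimedia information.
The retrieval module 25 includes a candidate identification code determining unit, an identification code determining unit and a retrieval unit. The candidate identification code determining unit performs word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes; the identification code determining unit determines a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon; and the retrieval unit searches the multimedia database 21 using the plurality of identification codes to obtain the multimedia information of the multimedia to be retrieved.
This electronic device integrates the multimedia information retrieval device (that is, the electronic device shown in FIG. 1) with the server-side multimedia database 21, so that the user only needs to send the download link of the multimedia to the electronic device, and the electronic device feeds the multimedia information of the multimedia to be retrieved back to the listener. This greatly simplifies operation for the listener.
When this electronic device is in use, a listener visits a music page with a browser. If the listener wants to learn the multimedia information of a multimedia item on the page, the download module 22 of the electronic device downloads the multimedia to the electronic device according to its download link; the speech extraction module 23 extracts the vocals from the multimedia and sends them to the speech recognition module 24; the speech recognition module 24 performs speech recognition on the vocals to obtain a recognized text and sends it to the candidate identification code determining unit of the retrieval module 25, which segments the recognized text according to the preset lexicon to obtain a plurality of candidate identification codes. The identification code determining unit of the retrieval module 25 then determines a plurality of identification codes according to the word frequencies, in the lexicon, of the candidate identification codes determined by the candidate identification code determining unit, and sends them to the retrieval unit of the retrieval module 25. The retrieval unit requests the multimedia database 21 to search using these identification codes and obtains the corresponding multimedia information. Finally, the feedback module 26 feeds the multimedia information retrieved by the retrieval unit back to the user.
Further, the electronic device of the present invention includes the association module 27, which associates the download link obtained by the download module 22 with the corresponding multimedia information. Thus, if a download link received from a listener is the same as a download link stored in the electronic device, the retrieval result for the multimedia information associated by the association module 27 can be fed back to the user directly, greatly saving the resources otherwise spent on recognition and retrieval.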
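The link reuse performed by the association module 27 behaves like a cache keyed by download link. A minimal sketch, assuming exact string matching of links and an in-memory dictionary; the class name and the `retrieve` callable (standing in for modules 22 to 25) are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class LinkAssociationCache:
    """Hypothetical sketch of association module 27: download link -> multimedia information."""

    def __init__(self, retrieve: Callable[[str], dict]):
        # retrieve: assumed callable that downloads, recognizes and searches for one link
        self._retrieve = retrieve
        self._by_link: Dict[str, dict] = {}

    def lookup(self, link: str) -> Optional[dict]:
        return self._by_link.get(link)

    def get_info(self, link: str) -> dict:
        cached = self.lookup(link)
        if cached is not None:
            return cached              # same link seen before: skip recognition and retrieval
        info = self._retrieve(link)    # full pipeline only on a cache miss
        self._by_link[link] = info     # associate the link with its multimedia information
        return info


# Usage with a fake retrieval function that records how often it runs:
calls: List[str] = []

def fake_retrieve(link: str) -> dict:
    calls.append(link)
    return {"title": "Song A", "link": link}

cache = LinkAssociationCache(fake_retrieve)
first = cache.get_info("http://example.com/a.mp3")
second = cache.get_info("http://example.com/a.mp3")  # served from the association table
```

A production system would likely normalize links and bound the table's size; both are outside what the text specifies.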
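The specific working principle of this electronic device is the same as or similar to that of the embodiments of the multimedia information retrieval method described below; refer to those embodiments for details. The modules of the electronic device may be integrated with one another, or a module may be split into several modules with independent functions, and the modules may be connected directly or indirectly.
The present invention further provides a multimedia information retrieval method. Referring to FIG. 3, which is a flowchart of a preferred embodiment of the multimedia information retrieval method of the present invention, the method includes:
Step S301: extracting the speech of the multimedia to be retrieved from the multimedia to be retrieved;
Step S302: performing speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text;
Step S303: searching a multimedia database according to the recognized text to obtain the multimedia information of the multimedia to be retrieved; and
Step S304: presenting the multimedia information to the user.
The detailed flow of each step of the multimedia information retrieval method of the present invention is described below with reference to FIG. 3 and FIG. 4. FIG. 4 is a detailed flowchart of step S303 of the multimedia information retrieval method shown in FIG. 3.
In step S301, the speech of the multimedia to be retrieved is extracted from it. This step mainly separates the lead vocal from the multimedia audio. The separation may be based on speech separation methods such as auditory scene analysis or blind signal separation, and outputs a single lead-vocal signal. As for duration, either the full length of the multimedia or only a segment of it may be selected. Generally, the longer the selected audio, the more computing resources are consumed, but the more information is available to the subsequent steps, which enables more accurate retrieval.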
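The duration trade-off in step S301 comes down to choosing which stretch of the signal is passed on. The sketch below covers only that window selection on an assumed mono sample list; it does not implement auditory scene analysis or blind signal separation, and the function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def select_segment(samples: Sequence[float], sample_rate: int,
                   start_s: float = 0.0,
                   duration_s: Optional[float] = None) -> Sequence[float]:
    """Return the part of a mono signal to analyse; duration_s=None keeps the rest of the track."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    start = max(0, int(start_s * sample_rate))
    if duration_s is None:
        return samples[start:]                       # full length: most information, most compute
    end = min(len(samples), start + int(duration_s * sample_rate))
    return samples[start:end]                        # shorter window: cheaper, less information


# Usage: a 10-second toy signal at 8 samples per second.
signal = [0.0] * 80
clip = select_segment(signal, 8, start_s=2.0, duration_s=3.0)  # 3 s -> 24 samples
```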
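In step S302, speech recognition is performed on the speech of the multimedia to be retrieved to obtain a recognized text. This step mainly converts the lead-vocal signal into a recognized text, that is, it converts the lexical content of human speech into computer-readable input such as key presses, binary codes or character sequences. The recognized text includes a plurality of identification codes, including but not limited to Chinese characters, Chinese words, pinyin, English characters and/or English words. Speech recognition methods such as statistical pattern recognition may currently be used. Because speech recognition itself has an unpredictable error rate, extracting a longer stretch of the multimedia audio can reduce the probability that recognition errors lead to errors in the subsequent multimedia retrieval.
Referring to FIG. 4, step S303 specifically includes:
Step S3031: performing word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes.
Because speech recognition has a certain error rate, searching directly with the recognized text could fail because of a few small recognition errors. The recognized text is therefore segmented using the preset lexicon into multiple small units, which are the candidate identification codes.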
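The text does not name a segmentation algorithm. Forward maximum matching is one common lexicon-based choice and is used here as an assumption: at each position it takes the longest lexicon word that fits, falling back to a single character. The lexicon entries and frequencies below are hypothetical.

```python
from typing import Dict, List


def segment_fmm(text: str, lexicon: Dict[str, int], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word that fits."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # skip whitespace between recognized words
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)  # single characters are kept even if not in the lexicon
                i += size
                break
    return tokens


# Hypothetical lexicon: word -> word frequency (made-up numbers).
LEXICON = {"月亮": 120, "代表": 300, "我的": 900, "心": 500, "我": 2000, "的": 5000}
candidates = segment_fmm("月亮代表我的心", LEXICON)
# candidates == ["月亮", "代表", "我的", "心"]
```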
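The candidate identification codes include, but are not limited to, Chinese characters, Chinese words, pinyin, English characters and/or English words; how the text is divided depends on the type of the preset lexicon. If the preset lexicon supports only Chinese characters and words, the recognized text is divided only into candidate identification codes of the Chinese character or Chinese word type. If the preset lexicon supports both Chinese and pinyin, and the multimedia database also supports pinyin search, some unrecognizable Chinese characters can be searched by their pinyin, which further safeguards retrieval quality and avoids retrieval errors. If the preset lexicon supports both Chinese and English, multimedia that contains some English, or is entirely in English, can be searched directly.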
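The pinyin fallback for a Chinese-plus-pinyin lexicon can be sketched as a post-processing pass over the segmented tokens. This assumes that "unrecognizable" means "not found in the Chinese lexicon", and it uses a tiny hypothetical pinyin table in place of a real pinyin dictionary.

```python
from typing import Dict, List


def to_search_codes(tokens: List[str], lexicon: Dict[str, int],
                    pinyin_table: Dict[str, str]) -> List[str]:
    """Keep lexicon tokens; map out-of-lexicon Chinese tokens to pinyin where the table allows."""
    codes: List[str] = []
    for token in tokens:
        if token in lexicon or token.isascii():
            codes.append(token)  # known word, or English/pinyin already
        else:
            # hypothetical per-character lookup; characters missing from the table stay unchanged
            codes.append(" ".join(pinyin_table.get(ch, ch) for ch in token))
    return codes


# Usage: "芯" is a plausible misrecognition of "心" and is not in this hypothetical lexicon.
LEXICON = {"月亮": 120, "代表": 300, "我的": 900, "心": 500}
PINYIN = {"芯": "xin", "心": "xin"}
codes = to_search_codes(["月亮", "代表", "我的", "芯"], LEXICON, PINYIN)
# codes == ["月亮", "代表", "我的", "xin"]
```

Searching "xin" then matches lyrics containing either homophone, provided the database indexes pinyin as the text requires.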
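Step S3032: determining a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon.
Each candidate identification code selected from the lexicon has a word-frequency attribute in the lexicon. The word frequency of a candidate identification code indicates how often it is used in daily life (the lexicon is compiled by summarizing the language people frequently use on the Internet): the more often a candidate identification code is used, the higher its word frequency, and the less often, the lower.
To reduce the resources consumed by retrieval, some candidate identification codes are selected according to their word frequencies in the lexicon to serve as the identification codes finally used for multimedia retrieval. The procedure is as follows:
Sort the candidate identification codes by their word frequency in the lexicon, from lowest to highest; select the n candidate identification codes with the lowest word frequency; then randomly select m further candidate identification codes such that the n codes and the m codes do not overlap; finally, set the n candidate identification codes and the m candidate identification codes as the identification codes, where n ≥ 1 and m ≥ 0. Because the n candidate identification codes have low word frequencies, they help ensure the precision of the retrieval results to some extent, while the m candidate identification codes help avoid retrieval failures that the low-frequency codes might cause (the retrieval here is fuzzy: results that satisfy the retrieval conditions to a certain degree are fed back to the user). The values of n and m can be set flexibly according to the user's requirements.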
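The n-lowest-plus-m-random selection can be sketched directly. Two details are assumptions not stated in the text: duplicate candidates are removed first, and a code missing from the lexicon is treated as frequency 0 (that is, maximally rare).

```python
import random
from typing import Dict, List, Optional


def choose_identification_codes(candidates: List[str], lexicon: Dict[str, int],
                                n: int, m: int,
                                rng: Optional[random.Random] = None) -> List[str]:
    """Pick the n lowest-frequency candidates plus up to m further candidates at random."""
    if n < 1 or m < 0:
        raise ValueError("need n >= 1 and m >= 0")
    if rng is None:
        rng = random.Random()
    unique = list(dict.fromkeys(candidates))  # de-duplicate, keep first-seen order (assumption)
    # ascending word frequency; unknown codes count as frequency 0 (assumption)
    ranked = sorted(unique, key=lambda code: lexicon.get(code, 0))
    lowest, remaining = ranked[:n], ranked[n:]
    extra = rng.sample(remaining, min(m, len(remaining)))  # disjoint from `lowest` by construction
    return lowest + extra


# Usage with hypothetical frequencies: n = 2 rarest codes plus m = 1 random extra.
LEXICON = {"月亮": 120, "代表": 300, "我的": 900, "心": 500}
codes = choose_identification_codes(["月亮", "代表", "我的", "心"], LEXICON,
                                    n=2, m=1, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the random part reproducible for testing.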
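Step S3033: searching the multimedia database using the identification codes determined above to obtain the multimedia information of the multimedia to be retrieved.
The retrieval process can be adjusted according to the results. If too many results are returned, new identification codes can be added from the candidate identification codes, or the retrieval conditions can be refined so that only results that closely satisfy the conditions are fed back to the user. If retrieval fails, the number of identification codes is reduced and the search is repeated, so that identification codes produced by speech recognition errors do not cause the retrieval to fail. The specific retrieval process can be set according to the actual situation, and differences in it do not affect the protection scope of the present invention. The multimedia database may be a local multimedia database on the local computer or a network multimedia database on a server on the network.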
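The adjustment loop can be sketched against an AND-style search, where more codes give fewer hits. The drop policy is an assumption, since the text leaves it open: on failure this sketch removes the first code (the lowest-frequency one under the selection above, treated as the likeliest recognition error). The toy song list and `toy_search` are hypothetical.

```python
from typing import Callable, List, Sequence


def adaptive_search(search: Callable[[Sequence[str]], list], codes: List[str],
                    spare: List[str], max_results: int = 10, max_rounds: int = 10) -> list:
    """Refine an AND-style search: add spare codes when too many hits, drop codes on failure."""
    codes, spare = list(codes), list(spare)
    results: list = []
    for _ in range(max_rounds):          # bounded, so the loop cannot oscillate forever
        results = search(codes)
        if len(results) > max_results and spare:
            codes.append(spare.pop(0))   # too broad: narrow with another candidate code
        elif not results and len(codes) > 1:
            codes.pop(0)                 # failed: drop the most suspect code and retry
        else:
            break
    return results


# Toy database: a song matches if every code occurs in its lyrics.
SONGS = [
    {"title": "Song A", "lyrics": "月亮代表我的心"},
    {"title": "Song B", "lyrics": "我的心在这里"},
]

def toy_search(codes: Sequence[str]) -> List[str]:
    return [s["title"] for s in SONGS if all(c in s["lyrics"] for c in codes)]

found = adaptive_search(toy_search, ["芯", "月亮"], [])  # "芯" is a misrecognized code
```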
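In step S304, the multimedia information of the retrieved multimedia is presented to the user. This multimedia information may be any related information about the retrieved multimedia that the user wants to know, including but not limited to its title, singer, lyrics, album, background information, sheet music and download link. Some users may also be shown a download link for the multimedia, such as a free download link for online multimedia or a corresponding free licensed link for the user's local multimedia.
Steps S301 to S304 above complete the automatic retrieval of the multimedia to be retrieved and quickly feed the retrieval results back to the listener.
When the multimedia information retrieval method of the present invention is used at the user end, it may be as shown in FIG. 5, which is a schematic diagram of the method used at the user end. Speech extraction, speech recognition and result presentation all take place at the user end, while the multimedia database used for retrieval is located at the retrieval server, which performs only the retrieval.
When the multimedia information retrieval method of the present invention is used at the server end, it may be as shown in FIG. 6, which is a schematic diagram of the method used at the server end. The user sends the download link of the multimedia to be retrieved to the retrieval server via a web browser, and the retrieval server feeds the retrieval results back to the user. Here the retrieval server not only performs the multimedia retrieval but also downloads the multimedia, extracts the speech, performs speech recognition and feeds back the results, which further simplifies operation for the user.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The multimedia information retrieval method, electronic device and storage medium of the present invention can present the multimedia information the user wants automatically, quickly and comprehensively, greatly enhancing the user's enjoyment of listening to songs and solving the technical problem that existing multimedia retrieval processes are complicated and error-prone. In addition, the electronic device of the present invention makes it convenient for users to search for multimedia on their own computers, and the electronic device can also perform the retrieval on a server at the user's request and return only the results to the user, further simplifying operation.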
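As used in this application, the terms “component”, “module”, “system”, “interface” and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable application, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a controller and the controller can be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
Furthermore, the claimed subject matter may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier or medium. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
FIG. 7 and the following discussion provide a brief, general description of the working environment of an electronic device implementing the present invention. The working environment of FIG. 7 is only one example of a suitable working environment and is not intended to suggest any limitation as to the scope of use or functionality of the working environment. Example electronic devices 712 include, but are not limited to, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players and the like), multiprocessor systems, consumer electronic devices, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments are described in the general context of “computer-readable instructions” being executed by one or more electronic devices. Computer-readable instructions may be distributed via computer-readable media (discussed below). Computer-readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), data structures and the like, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer-readable instructions may be combined or distributed as desired in various environments.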
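FIG. 7 illustrates an example of an electronic device 712 that includes one or more embodiments of the multimedia information retrieval method of the present invention. In one configuration, the electronic device 712 includes at least one processing unit 716 and memory 718. Depending on the exact configuration and type of electronic device, the memory 718 may be volatile (such as RAM), non-volatile (such as ROM or flash memory) or some combination of the two. This configuration is illustrated in FIG. 7 by dashed line 714.
In other embodiments, the electronic device 712 may include additional features and/or functionality. For example, the device 712 may also include additional storage (for example, removable and/or non-removable), including but not limited to magnetic storage, optical storage and the like. Such additional storage is illustrated in FIG. 7 by storage 720. In one embodiment, computer-readable instructions for implementing one or more embodiments provided herein may be in the storage 720. The storage 720 may also store other computer-readable instructions for implementing an operating system, application programs and the like. Computer-readable instructions may be loaded into the memory 718 for execution by, for example, the processing unit 716.
The term “computer-readable media” as used herein includes computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions or other data. The memory 718 and the storage 720 are examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the electronic device 712. Any such computer storage media may be part of the electronic device 712.
The electronic device 712 may also include a communication connection 726 that allows the electronic device 712 to communicate with other devices. The communication connection 726 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting the electronic device 712 to other electronic devices. The communication connection 726 may include a wired or wireless connection. The communication connection 726 may transmit and/or receive communication media.
The term “computer-readable media” may include communication media. Communication media typically embody computer-readable instructions or other data in a “modulated data signal” such as a carrier wave or other transport mechanism, and include any information delivery media. The term “modulated data signal” may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
The electronic device 712 may include an input device 724, such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device and/or any other input device. An output device 722, such as one or more displays, speakers, printers and/or any other output device, may also be included in the device 712. The input device 724 and the output device 722 may be connected to the electronic device 712 via a wired connection, a wireless connection or any combination thereof. In one embodiment, an input device or output device of another electronic device may be used as the input device 724 or output device 722 of the electronic device 712.
The components of the electronic device 712 may be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI) (such as PCI Express), Universal Serial Bus (USB), FireWire (IEEE 1394), optical bus structures and the like. In another embodiment, the components of the electronic device 712 may be interconnected by a network. For example, the memory 718 may consist of multiple physical memory units located in different physical locations and interconnected by a network.
Those skilled in the art will recognize that storage devices used to store computer-readable instructions may be distributed across a network. For example, an electronic device 730 accessible via a network 728 may store computer-readable instructions for implementing one or more embodiments provided by the present invention. The electronic device 712 may access the electronic device 730 and download part or all of the computer-readable instructions for execution. Alternatively, the electronic device 712 may download pieces of the computer-readable instructions as needed, or some instructions may be executed at the electronic device 712 and some at the electronic device 730.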
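Various operations of the embodiments are provided herein. In one embodiment, one or more of the operations described may constitute computer-readable instructions stored on one or more computer-readable media which, when executed by an electronic device, cause the computing device to perform the operations described. The order in which some or all of the operations are described should not be construed as implying that these operations are necessarily order-dependent. Those skilled in the art having the benefit of this specification will appreciate alternative orderings. Further, it will be understood that not all operations are necessarily present in every embodiment provided herein.
Moreover, the word “preferred” is used herein to mean serving as an example, instance or illustration. Any aspect or design described herein as “preferred” is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word “preferred” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise or clear from context, “X uses A or B” is intended to mean any of the natural inclusive permutations: if X uses A, X uses B, or X uses both A and B, then “X uses A or B” is satisfied under any of the foregoing instances.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based on a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above-described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (i.e., that is functionally equivalent), even though it is not structurally equivalent to the disclosed structure that performs the function in the exemplary implementations of the disclosure illustrated herein. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “includes”, “has”, “contains” or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.
In summary, although the present invention has been disclosed above by way of preferred embodiments, these preferred embodiments are not intended to limit the invention. A person of ordinary skill in the art may make various changes and refinements without departing from the spirit and scope of the invention; the protection scope of the invention is therefore defined by the claims.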
Embodiments of the Invention
Industrial Applicability
Sequence Listing Free Text

Claims (16)

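  1. A multimedia information retrieval method, comprising the steps of:
    extracting, from multimedia to be retrieved, the speech of the multimedia to be retrieved;
    performing speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
    searching a multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.
  2. The multimedia information retrieval method according to claim 1, wherein the step of searching a multimedia database according to the recognized text to obtain the multimedia information of the multimedia to be retrieved comprises:
    performing word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes;
    determining a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon; and
    searching the multimedia database using the plurality of identification codes to obtain the multimedia information of the multimedia to be retrieved.
  3. The multimedia information retrieval method according to claim 2, wherein the candidate identification codes are at least one of characters, pinyin and English.
  4. The multimedia information retrieval method according to claim 2, wherein the step of determining a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon is specifically:
    selecting, from all the candidate identification codes, the n candidate identification codes with the lowest word frequency in the lexicon, then randomly selecting m candidate identification codes, the n candidate identification codes and the m candidate identification codes not overlapping, and setting the n candidate identification codes and the m candidate identification codes as the identification codes, wherein n is greater than or equal to 1 and m is greater than or equal to 0.
  5. The multimedia information retrieval method according to claim 1, further comprising the step of:
    presenting the multimedia information to a user, wherein the multimedia information includes at least one of a song title, a singer, lyrics, an album, background information, sheet music and a multimedia download link.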
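  6. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more programs, stored in the memory and configured to be executed by the one or more processors to provide a multimedia information retrieval method, the one or more programs comprising, divided by function:
    a speech extraction module, configured to extract, from multimedia to be retrieved, the speech of the multimedia to be retrieved;
    a speech recognition module, configured to perform speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
    a retrieval module, configured to search a multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.
  7. The electronic device according to claim 6, wherein the retrieval module comprises:
    a candidate identification code determining unit, configured to perform word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes;
    an identification code determining unit, configured to determine a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon; and
    a retrieval unit, configured to search the multimedia database using the plurality of identification codes to obtain the multimedia information of the multimedia to be retrieved.
  8. The electronic device according to claim 7, wherein the candidate identification codes are at least one of characters, pinyin and English.
  9. The electronic device according to claim 7, wherein the identification code determining unit is specifically configured to select, from all the candidate identification codes, the n candidate identification codes with the lowest word frequency in the lexicon, then randomly select m candidate identification codes, the n candidate identification codes and the m candidate identification codes not overlapping, and set the n candidate identification codes and the m candidate identification codes as the identification codes, wherein n is greater than or equal to 1 and m is greater than or equal to 0.
  10. The electronic device according to claim 6, further comprising:
    a multimedia presentation module, configured to present the multimedia information to a user, wherein the multimedia information includes at least one of a song title, a singer, lyrics, an album, background information, sheet music and a multimedia download link.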
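  11. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more programs, stored in the memory and configured to be executed by the one or more processors to provide a multimedia information retrieval method, the one or more programs comprising, divided by function:
    a multimedia database, configured to store multimedia information;
    a download module, configured to download multimedia to be retrieved via a download link of the multimedia to be retrieved;
    a speech extraction module, configured to extract, from the multimedia to be retrieved, the speech of the multimedia to be retrieved;
    a speech recognition module, configured to perform speech recognition on the speech of the multimedia to be retrieved to obtain a recognized text; and
    a retrieval module, configured to search the multimedia database according to the recognized text to obtain multimedia information of the multimedia to be retrieved.
  12. The electronic device according to claim 11, wherein the retrieval module comprises:
    a candidate identification code determining unit, configured to perform word segmentation on the recognized text according to a preset lexicon to obtain a plurality of candidate identification codes;
    an identification code determining unit, configured to determine a plurality of identification codes according to the word frequencies of the candidate identification codes in the lexicon; and
    a retrieval unit, configured to search the multimedia database using the plurality of identification codes to obtain the multimedia information of the multimedia to be retrieved.
  13. The electronic device according to claim 12, wherein the candidate identification codes are at least one of characters, pinyin and English.
  14. The electronic device according to claim 12, wherein the identification code determining unit is specifically configured to select, from all the candidate identification codes, the n candidate identification codes with the lowest word frequency in the lexicon, then randomly select m candidate identification codes, the n candidate identification codes and the m candidate identification codes not overlapping, and set the n candidate identification codes and the m candidate identification codes as the identification codes, wherein n is greater than or equal to 1 and m is greater than or equal to 0.
  15. The electronic device according to claim 11, further comprising:
    a feedback module, configured to feed the multimedia information back to a user, wherein the multimedia information includes at least one of a song title, a singer, lyrics, an album, background information, sheet music and a multimedia download link.
  16. The electronic device according to claim 11, further comprising:
    an association module, configured to associate the download link obtained by the download module with the corresponding multimedia information.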
PCT/CN2013/081992 2012-08-24 2013-08-21 多媒体信息检索方法及电子设备 WO2014029338A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP13831356.4A EP2889786A4 (en) 2012-08-24 2013-08-21 MULTIMEDIA INFORMATION CALL PROCESS AND ELECTRONIC DEVICE THEREFOR
JP2015523408A JP5948671B2 (ja) 2012-08-24 2013-08-21 マルチメディア情報検索方法及び電子機器
US14/613,989 US9704485B2 (en) 2012-08-24 2015-02-04 Multimedia information retrieval method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210303990.7A CN103631802B (zh) 2012-08-24 2012-08-24 歌曲信息检索方法、装置及相应的服务器
CN201210303990.7 2012-08-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/613,989 Continuation US9704485B2 (en) 2012-08-24 2015-02-04 Multimedia information retrieval method and electronic device

Publications (1)

Publication Number Publication Date
WO2014029338A1 true WO2014029338A1 (zh) 2014-02-27

Family

ID=50149454

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/081992 WO2014029338A1 (zh) 2012-08-24 2013-08-21 多媒体信息检索方法及电子设备

Country Status (5)

Country Link
US (1) US9704485B2 (zh)
EP (1) EP2889786A4 (zh)
JP (1) JP5948671B2 (zh)
CN (1) CN103631802B (zh)
WO (1) WO2014029338A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105828210A (zh) * 2016-03-15 2016-08-03 武汉斗鱼网络科技有限公司 一种基于弹幕的点播歌曲的方法及装置

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104010063B (zh) * 2014-05-09 2018-01-02 郑明 移动终端回铃信息的显示方法及设备
CN104598515A (zh) * 2014-12-03 2015-05-06 百度在线网络技术(北京)有限公司 歌曲搜索方法、装置和系统
CN104882146B (zh) * 2015-05-12 2018-05-15 北京音之邦文化科技有限公司 音频推广信息的处理方法及装置
CN105677711A (zh) * 2015-12-28 2016-06-15 小米科技有限责任公司 信息显示方法和装置
CN105956014A (zh) * 2016-04-22 2016-09-21 成都涂鸦科技有限公司 一种基于深度学习的音乐播放方法
WO2018018283A1 (zh) * 2016-07-24 2018-02-01 张鹏华 歌曲信息识别技术的使用情况统计方法和识别系统
CN106896933B (zh) * 2017-01-19 2019-12-06 深圳情景智能有限公司 将语音输入转换成文本输入的方法、装置和语音输入设备
US11017771B2 (en) * 2019-01-18 2021-05-25 Adobe Inc. Voice command matching during testing of voice-assisted application prototypes for languages with non-phonetic alphabets
US10964322B2 (en) 2019-01-23 2021-03-30 Adobe Inc. Voice interaction tool for voice-assisted application prototypes
CN110795593A (zh) * 2019-10-12 2020-02-14 百度在线网络技术(北京)有限公司 语音包的推荐方法、装置、电子设备和存储介质
CN111368136A (zh) * 2020-03-31 2020-07-03 北京达佳互联信息技术有限公司 歌曲识别方法、装置、电子设备及存储介质
KR102362815B1 (ko) * 2020-05-18 2022-02-14 니나노 주식회사 음성 인식 선곡 서비스 제공 방법 및 음성 인식 선곡 장치
CN113658594A (zh) * 2021-08-16 2021-11-16 北京百度网讯科技有限公司 歌词识别方法、装置、设备、存储介质及产品

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
CN101021857A (zh) * 2006-10-20 2007-08-22 鲍东山 基于内容分析的视频搜索系统
CN101634987A (zh) * 2008-07-21 2010-01-27 上海天统电子科技有限公司 多媒体播放器

Family Cites Families (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4852170A (en) * 1986-12-18 1989-07-25 R & D Associates Real time computer speech recognition system
US4829572A (en) * 1987-11-05 1989-05-09 Andrew Ho Chung Speech recognition system
DE3931638A1 (de) * 1989-09-22 1991-04-04 Standard Elektrik Lorenz Ag Verfahren zur sprecheradaptiven erkennung von sprache
WO1995002879A1 (en) * 1993-07-13 1995-01-26 Theodore Austin Bordeaux Multi-language speech recognition system
US6014615A (en) * 1994-08-16 2000-01-11 International Business Machines Corporation System and method for processing morphological and syntactical analyses of inputted Chinese language phrases
SG42314A1 (en) * 1995-01-30 1997-08-15 Mitsubishi Electric Corp Language processing apparatus and method
US5749066A (en) * 1995-04-24 1998-05-05 Ericsson Messaging Systems Inc. Method and apparatus for developing a neural network for phoneme recognition
US5737489A (en) * 1995-09-15 1998-04-07 Lucent Technologies Inc. Discriminative utterance verification for connected digits recognition
US5832478A (en) * 1997-03-13 1998-11-03 The United States Of America As Represented By The National Security Agency Method of searching an on-line dictionary using syllables and syllable count
US6032111A (en) * 1997-06-23 2000-02-29 At&T Corp. Method and apparatus for compiling context-dependent rewrite rules and input strings
ITTO980383A1 (it) * 1998-05-07 1999-11-07 Cselt Centro Studi Lab Telecom Procedimento e dispositivo di riconoscimento vocale con doppio passo di riconoscimento neurale e markoviano.
US6243713B1 (en) * 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
US7403888B1 (en) * 1999-11-05 2008-07-22 Microsoft Corporation Language input user interface
US7165019B1 (en) * 1999-11-05 2007-01-16 Microsoft Corporation Language input architecture for converting one text form to another text form with modeless entry
US6892191B1 (en) * 2000-02-07 2005-05-10 Koninklijke Philips Electronics N.V. Multi-feature combination generation and classification effectiveness evaluation using genetic algorithms
US7107204B1 (en) * 2000-04-24 2006-09-12 Microsoft Corporation Computer-aided writing system and method with cross-language writing wizard
US7072827B1 (en) * 2000-06-29 2006-07-04 International Business Machines Corporation Morphological disambiguation
US6973427B2 (en) * 2000-12-26 2005-12-06 Microsoft Corporation Method for adding phonetic descriptions to a speech recognition lexicon
JP2002258874A (ja) * 2001-03-01 2002-09-11 Alpine Electronics Inc 音楽試聴方法、システムおよび情報端末、音楽検索サーバ
US7013273B2 (en) * 2001-03-29 2006-03-14 Matsushita Electric Industrial Co., Ltd. Speech recognition based captioning system
US7124080B2 (en) * 2001-11-13 2006-10-17 Microsoft Corporation Method and apparatus for adapting a class entity dictionary used with language models
US7395203B2 (en) * 2003-07-30 2008-07-01 Tegic Communications, Inc. System and method for disambiguating phonetic input
US20050038814A1 (en) * 2003-08-13 2005-02-17 International Business Machines Corporation Method, apparatus, and program for cross-linking information sources using multiple modalities
US20050071148A1 (en) * 2003-09-15 2005-03-31 Microsoft Corporation Chinese word segmentation
TW200538969A (en) * 2004-02-11 2005-12-01 America Online Inc Handwriting and voice input with automatic correction
JP2005266198A (ja) * 2004-03-18 2005-09-29 Pioneer Electronic Corp 音響情報再生装置および音楽データのキーワード作成方法
CN1993692A (zh) * 2004-05-24 2007-07-04 紫熊猫有限公司 字符显示系统
CN1750117A (zh) * 2004-09-16 2006-03-22 乐金电子(惠州)有限公司 伴唱机歌曲搜索系统及其旋律数据库构成方法
US7996208B2 (en) * 2004-09-30 2011-08-09 Google Inc. Methods and systems for selecting a language for text segmentation
US7680648B2 (en) * 2004-09-30 2010-03-16 Google Inc. Methods and systems for improving text segmentation
US8463611B2 (en) * 2004-10-13 2013-06-11 Hewlett-Packard Development Company, L.P. Method and system for improving the fidelity of a dialog system
JP2006186426A (ja) * 2004-12-24 2006-07-13 Toshiba Corp 情報検索表示装置、情報検索表示方法および情報検索表示プログラム
TWI277949B (en) * 2005-02-21 2007-04-01 Delta Electronics Inc Method and device of speech recognition and language-understanding analysis and nature-language dialogue system using the method
US7516125B2 (en) * 2005-08-01 2009-04-07 Business Objects Americas Processor for fast contextual searching
NO326770B1 (no) * 2006-05-26 2009-02-16 Tandberg Telecom As Fremgangsmate og system for videokonferanse med dynamisk layout basert pa orddeteksjon
US8694318B2 (en) * 2006-09-19 2014-04-08 At&T Intellectual Property I, L. P. Methods, systems, and products for indexing content
US20080085099A1 (en) * 2006-10-04 2008-04-10 Herve Guihot Media player apparatus and method thereof
US20080221866A1 (en) * 2007-03-06 2008-09-11 Lalitesh Katragadda Machine Learning For Transliteration
US20080300872A1 (en) * 2007-05-31 2008-12-04 Microsoft Corporation Scalable summaries of audio or visual content
US20090031885A1 (en) * 2007-07-31 2009-02-05 Christopher Lee Bennetts Networked karaoke system and method
CN100470633C (zh) * 2007-11-30 2009-03-18 清华大学 语音点歌方法
US8155961B2 (en) * 2008-12-09 2012-04-10 Nokia Corporation Adaptation of automatic speech recognition acoustic models
JP2010157080A (ja) * 2008-12-26 2010-07-15 Ntt Communications Kk コンテンツ関連情報検索システム、コンテンツ関連情報検索方法、およびコンテンツ関連情報検索プログラム
JP5697860B2 (ja) * 2009-09-09 2015-04-08 クラリオン株式会社 情報検索装置,情報検索方法及びナビゲーションシステム
CN102236686A (zh) * 2010-05-07 2011-11-09 盛乐信息技术(上海)有限公司 语音分段式歌曲检索方法
CN102404278A (zh) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 一种基于声纹识别的点歌系统及其应用方法
US20140180762A1 (en) * 2012-12-12 2014-06-26 Ishlab, Inc. Systems and methods for customized music selection

Also Published As

Publication number Publication date
CN103631802A (zh) 2014-03-12
EP2889786A1 (en) 2015-07-01
JP5948671B2 (ja) 2016-07-06
US9704485B2 (en) 2017-07-11
US20150154958A1 (en) 2015-06-04
EP2889786A4 (en) 2016-03-30
CN103631802B (zh) 2015-05-20
JP2015522892A (ja) 2015-08-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13831356

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013831356

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2015523408

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE