WO2014048348A1 - Multimedia device voice control system and method, and computer storage medium - Google Patents
- Publication number
- WO2014048348A1 (application PCT/CN2013/084348)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- module
- user
- control instruction
- control
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/0304—Detection arrangements using opto-electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
- H04N21/4415—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
- G06F9/453—Help systems
Definitions
- the present invention relates to voice remote control technology, and more particularly to a multimedia device voice control system and method, and a computer storage medium.
- Multimedia devices such as smart TVs are generally equipped with high-performance control chips and run an open platform and operating system. Users can install and uninstall applications themselves; these applications extend the capabilities of the device and let users browse the web and use social networking services.
- Taking a smart TV as an example, the television is no longer limited to traditional program playback; by running applications it also provides audio and video sharing, interactive entertainment, games and the like.
- the traditional push-button remote control has been unable to meet a variety of multimedia function selection and operation needs.
- the prior art proposes various human-computer interaction schemes including touch control, sound control, gesture recognition, and somatosensory control to realize intelligent control.
- However, because of the usage scenarios and viewing habits associated with television, no intelligent control method can yet completely replace the handheld remote; the user must still operate a combination of function keys and numeric keys on the handheld remote.
- The touch control scheme requires a touch sensing module installed on the remote controller. The gesture recognition scheme cannot quickly perform common channel jumps: if the user wants to switch from channel 1 to channel 55 using gesture recognition alone, the operation is clearly slower than with a conventional remote control.
- The somatosensory control scheme has a similar problem, and it usually also requires an expensive depth image sensing module to achieve accurate somatosensory control.
- a problem with prior art voice recognition control schemes is that the microphone module is typically mounted on the remote control to clearly capture the user's voice, still requiring the use of a handheld remote control.
- voice recognition and semantic recognition have basically reached a practical stage.
- With cloud computing technology, many cloud-based voice recognition service providers have combined their services with smart TVs to realize voice-controlled television.
- In most existing technical solutions, a microphone pickup module is installed on the remote controller, and the user's voice is processed and transmitted to the cloud for recognition. Even when microphone array technology capable of long-distance pickup is used, the sound output by the TV, ambient sound, and the user's non-command speech can be misinterpreted as control commands, which degrades voice control of televisions and other multimedia devices.
- the technical problem to be solved by the present invention is to propose a multimedia device voice control system.
- The technical solution adopted by the present invention is a multimedia device voice control system comprising: an image sensing module for collecting user motion images; an image recognition module for determining a control instruction type or state according to a user motion image; a voice recognition state management module that activates or suspends voice recognition according to the current control instruction type; a sound collection module that collects voice data; a voice recognition module that recognizes the collected voice data to form a control command; and a multimedia function module that executes the control command to provide the user with the corresponding multimedia function.
- The image recognition module compares the user motion image with the preset image templates and selects the control instruction type that matches the user motion image. If a matching control instruction type is found, the user's location is taken as the target sound source location, and the location information of the target sound source, start-voice-recognition information and/or the control instruction type are sent to the voice recognition state management module. If no matching control instruction type is found, comparison failure information is sent to the voice recognition state management module.
- the multimedia device voice control system further includes a sound beam forming module that determines the sound pickup direction and the sound pickup receiving angle according to the position information of the target sound source.
- The sound collecting module is an array sound collecting module comprising at least one sound collecting sensor arranged in a regular pattern. It collects the voice signal emitted by the target sound source within the limits set by the sound pickup direction and the sound receiving angle, digitizes it to form voice data, and sends the voice data to the voice recognition module.
- On receiving start-voice-recognition information, the voice recognition state management module sends a start command and the control command type to the voice recognition module to activate voice recognition, sends the location information of the target sound source to the sound beam forming module, and controls the multimedia function module to reduce the volume of the multimedia output sound. On receiving comparison failure information, it sends an instruction to pause voice recognition to the voice recognition module.
- the voice recognition module identifies the voice data from the sound pickup module according to the start command and the control command type from the voice recognition state management module, and forms a control command belonging to the control command type, and sends the control command to the multimedia function module.
- The voice recognition module comprises a local voice recognition module and a cloud voice recognition module. The local voice recognition module recognizes the voice data, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module. The cloud voice recognition module performs semantic recognition processing on voice data that the local voice recognition module could not recognize, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module.
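- The present invention also provides a multimedia device voice control method, comprising: a step in which the image sensing module collects a user motion image; a step of determining a control instruction type or state according to the user motion image; a step in which the voice recognition state management module activates or suspends voice recognition according to the current control instruction type; a step of collecting voice data; a step of recognizing the collected voice data to form a control command; and a step of executing the control command to provide the user with the corresponding multimedia function.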
- In the method, the image sensing module collects a user motion image; the image recognition module compares the user motion image with the preset image templates and selects the control instruction type that matches it. If a matching control instruction type is found, the user's location is taken as the target sound source location, and the location information of the target sound source, start-voice-recognition information and/or the control instruction type are sent to the voice recognition state management module; if no matching control instruction type is found, comparison failure information is sent to the voice recognition state management module. On receiving start-voice-recognition information, the voice recognition state management module sends a start command and the control command type to the voice recognition module to activate voice recognition, sends the location information of the target sound source to the sound beam forming module, and controls the multimedia function module to reduce the volume of the multimedia output sound; on receiving comparison failure information, it sends an instruction to pause voice recognition to the voice recognition module. The sound beam forming module determines the sound pickup direction and the sound pickup receiving angle according to the location of the target sound source.
- In the multimedia device voice control method, the voice recognition module comprises a local voice recognition module and a cloud voice recognition module, and presets a voice command dictionary. The local voice recognition module compares the voice data with the word models in the voice command dictionary. If the similarity between the voice data and at least one word model is greater than a preset threshold, the voice data is interpreted as the control instruction corresponding to that word model and sent to the multimedia function module. If no word model has a similarity greater than the preset threshold, the voice data is sent to the cloud voice recognition module through the network; the cloud voice recognition module performs semantic recognition processing on the voice data to form a control command, which is sent to the multimedia function module through the network.
- the invention also provides a computer storage medium for storing computer executable instructions.
- The computer-readable storage medium stores one or more programs, which are executed by one or more processors to perform the multimedia device voice control method.
- By combining image recognition with speech recognition technology and a computer storage medium, the invention realizes free and convenient voice control that does not rely on a handheld remote control and is not limited to a short-distance sound collection module. It effectively prevents the sound output of the multimedia device, environmental background sound, and the user's non-command speech from interfering with recognition of control commands, which helps achieve accurate recognition of the control commands issued by the user.
- FIG. 1 is a schematic structural diagram of a module of a multimedia device voice control system according to an embodiment of the present invention
- FIG. 2 is a schematic diagram of a preset image template according to an embodiment of the present invention.
- FIG. 3 is a detailed working flow chart of a voice control system for a multimedia device according to an embodiment of the present invention.
- FIG. 4 is a schematic view showing the arrangement of the array sound pickup module 14 according to an embodiment of the present invention.
- FIG. 5 is a basic working flow chart of a voice control system for a multimedia device according to an embodiment of the present invention.
- FIG. 6 is a detailed flow chart of the speech recognition module 15 according to an embodiment of the present invention.
- the multimedia device 1 includes an image sensing module 10 for collecting a user motion image.
- the image recognition module 11 determines a control command type or state according to a user motion image.
- the voice recognition state management module 12 activates or wakes up the voice recognition according to the current control command type;
- the sound collection module 14 collects voice data;
- the voice recognition module 15 identifies the collected voice data to form a control command;
- the multimedia function module 16 executes control commands to provide users with corresponding multimedia functions.
- The image recognition module 11 presets at least one image template, with different control instruction types corresponding to different image templates, and compares the user action image with the at least one image template. If an image template matching the user action image is found, the user is considered to be the target sound source, and the next voice input from the user is treated as a control instruction belonging to the corresponding control instruction type. If the comparison fails, that is, no image template matching the user action image is found, the user's action is considered not to be issuing a control command, and recognition of the user's voice is suspended.
- The image recognition module 11 processes the user motion image sent by the image sensing module 10, compares the processing result with the preset image template data, and selects the control instruction type that matches the user action image.
- If a match is found, the location of the user is regarded as the target sound source location, and the location information of the target sound source, start-voice-recognition information and/or the control instruction type are transmitted to the voice recognition state management module 12.
- If no match is found, comparison failure information is sent to the voice recognition state management module 12.
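The matching decision described above can be sketched as follows. This is only an illustration: the patent does not say how images are compared, so the feature vectors, the cosine similarity measure, the 0.9 threshold, and the names `MatchResult` and `match_action` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class MatchResult:
    matched: bool                                   # False means "comparison failure"
    instruction_type: Optional[str] = None          # e.g. "start voice remote control"
    source_position: Optional[Tuple[float, float]] = None


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two feature vectors (an assumed stand-in for image comparison)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_action(features: Sequence[float],
                 user_position: Tuple[float, float],
                 templates: Dict[str, Sequence[float]],
                 threshold: float = 0.9) -> MatchResult:
    """Pick the best-matching template; report failure if none reaches the threshold."""
    best_type, best_score = None, threshold
    for instruction_type, template in templates.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_type, best_score = instruction_type, score
    if best_type is None:
        return MatchResult(matched=False)
    # On a match, the user's position becomes the target sound source position.
    return MatchResult(True, best_type, user_position)
```

A successful result carries what the text says is forwarded to the voice recognition state management module 12 (position and instruction type); an unmatched result corresponds to the comparison failure information.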
- image recognition module 11 requires training for specific user actions.
- By playing human-computer interaction content, the multimedia device 1 guides the user to naturally raise the right hand to the mouth in a calling-out gesture until the action conforms to the preset first image template corresponding to the "start voice remote control" control command type.
- the multimedia device 1 can guide the user to cover the mouth with the palm of the hand until the action conforms to the preset second image template corresponding to the type of the "mute" control command.
- the embodiment of the present invention further includes a sound beam forming module 13, which determines the sound pickup direction and the sound pickup receiving angle according to the position information of the target sound source, and combines the array sound pickup technology to effectively eliminate noise and improve the accuracy of voice recognition.
- The sound collecting module 14 of the embodiment is an array sound collecting module comprising at least one sound collecting sensor arranged in a regular pattern. It collects the voice signal emitted by the target sound source within the limits set by the sound pickup direction and the sound receiving angle and digitizes it, eliminating background noise. Once the voice data is formed, it is sent to the voice recognition module 15.
- The array sound pickup module 14 may include a plurality of sound pickup sensors arranged in a regular geometric shape, for example an equidistant linear arrangement in which the sound pickup sensors are placed horizontally at equal intervals on both sides of the image sensing module 10.
- The sound beam forming module 13 determines the direction and width of the main beam of the array sound pickup module 14, that is, the sound pickup direction and the sound receiving angle, and the array sound pickup module 14 is constrained accordingly to collect the voice signal emitted by the target sound source.
- Common existing beamforming methods include delay-and-sum (the traditional beam method), the adaptive beam method, and post-adaptive filtering. Each has advantages and disadvantages: the delay-and-sum beam method and the post-adaptive filtering method are suited to cancelling incoherent and weakly coherent noise, while the adaptive beam method is suited to eliminating coherent noise and performs poorly on incoherent or scattered noise.
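As a rough illustration of the delay-and-sum method named above, the sketch below steers an equidistant linear array toward a known source angle. It assumes a far-field source, whole-sample delays, and a speed of sound of 343 m/s; the function names and the angle convention (measured from broadside toward the positive end of the array axis) are choices made here, not details from the patent.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def steering_delays(mic_positions: Sequence[float],
                    angle_rad: float,
                    sample_rate: int) -> List[int]:
    """Arrival delay of each microphone, in whole samples, relative to the
    microphone the wavefront reaches first.

    mic_positions are coordinates (metres) along the array axis.
    """
    # A microphone further along the axis toward the source hears the wave earlier.
    raw = [-x * math.sin(angle_rad) / SPEED_OF_SOUND * sample_rate
           for x in mic_positions]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its arrival delay, then average the channels.

    Sound from the steered direction adds in phase; sound from other
    directions is misaligned and is attenuated by the averaging.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]
```

In the system described here, the angle would come from the target sound source position supplied by image recognition rather than from acoustic localization.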
- the manner of determining the position of the target sound source by image recognition is used to skillfully realize the determination of the pickup direction and the pickup reception angle. Even if there are multiple TV viewers and they are within the range of image sensing recognition, only the voice signals from the target user are recognized.
- the voice recognition state management module 12 is mainly responsible for managing and controlling the recognition state of the multimedia device voice control system.
- When start-voice-recognition information is received, the start command and the control command type are sent to the voice recognition module 15 to activate voice recognition, and the position information of the target sound source is sent to the sound beam forming module 13; the voice signal then uttered by the user is treated as a control command, collected by the array sound pickup module 14, and sent to the voice recognition module 15 for processing. When comparison failure information is received, an instruction to pause voice recognition is sent to the voice recognition module 15.
- When the voice recognition state management module 12 activates voice recognition, it controls the multimedia function module 16 to reduce the volume of the multimedia output sound.
- The sound output by the TV is reduced to an intensity suitably lower than the current voice signal strength of the target sound source. Without loss of generality, the sound output of the smart TV can be temporarily muted, which prevents the TV's own audio from becoming noise that interferes with speech recognition. Once speech recognition finishes, or when voice recognition is paused because the comparison failed, the voice recognition module 15 is not activated, the smart TV's output is restored to normal volume, and the user's voice signal is ignored, which prevents the user's unintentional speech from being taken as a command.
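The behaviour of the state management module can be pictured as a small two-state controller. The sketch below is hypothetical: the class name, the message dictionaries, and the numeric volume levels are invented for illustration, and the patent does not prescribe any particular data format.

```python
class RecognitionStateManager:
    """Toy model of the voice recognition state management module 12."""

    def __init__(self, normal_volume: int = 30, ducked_volume: int = 5):
        self.normal_volume = normal_volume   # assumed normal TV output level
        self.ducked_volume = ducked_volume   # assumed reduced level (0 would mute)
        self.volume = normal_volume
        self.active = False

    def on_match(self, instruction_type, source_position):
        """Handle start-voice-recognition information from image recognition."""
        self.active = True
        # Lower the TV output so it does not interfere with recognition.
        self.volume = min(self.volume, self.ducked_volume)
        return {
            "recognizer": ("start", instruction_type),  # to voice recognition module 15
            "beamformer": source_position,              # to sound beam forming module 13
        }

    def on_failure(self):
        """Handle comparison failure information: pause and restore the volume."""
        self.active = False
        self.volume = self.normal_volume
        return {"recognizer": ("pause", None), "beamformer": None}
```

Calling `on_failure()` after a recognition session models the restore step: recognition is paused and the output returns to the normal volume.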
- The present invention further proposes that the voice recognition module 15 recognizes the voice data from the sound pickup module 14 according to the start command and the control command type from the voice recognition state management module 12, forms a control command belonging to that control command type, and transmits it to the multimedia function module 16.
- The voice recognition module 15 has a built-in preset voice command dictionary, which stores the processed word models of control command voice signals, including but not limited to “previous channel”, “next channel”, “increase volume”, “reduce volume”, “central one”, “Hunan TV”, etc.
- The speech recognition module 15 compares the speech data with the word models in the speech instruction dictionary. If the similarity between the speech data and at least one word model is greater than a preset threshold, the speech data is interpreted as the control instruction corresponding to that word model and sent to the multimedia function module 16.
- The speech recognition module 15 includes a local speech recognition module 151 and a cloud speech recognition module 152. The former is responsible for recognizing and processing simple control instructions, including but not limited to changing channels, adjusting the volume, and powering on or off; the latter is responsible for recognizing and processing complex control instructions that require semantic recognition, and is implemented by means of speech recognition cloud services.
- the local voice recognition module 151 recognizes the voice data, forms a control command belonging to the type of control command, and sends it to the multimedia function module 16;
- The cloud speech recognition module 152 can employ an online service provided by a speech recognition service provider with semantic recognition capabilities, such as Keda Xunfei. If the user's voice data is not recognized by the local voice recognition module 151, that is, the similarity between the voice data and every word model in the voice command dictionary is not greater than the preset threshold, the voice data is sent through the network to the cloud voice recognition module 152, which performs semantic recognition processing to form a control instruction belonging to the control instruction type and sends it to the multimedia function module 16.
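The local-first, cloud-fallback logic can be sketched as below. Note the substitution: the patent compares voice data against acoustic word models, whereas this sketch compares already-transcribed text using `difflib.SequenceMatcher` from the Python standard library purely to make the threshold logic concrete. The `recognize` function, the 0.8 threshold, the command strings, and the `cloud_recognize` callback are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple


def recognize(utterance: str,
              dictionary: Dict[str, str],
              cloud_recognize: Callable[[str], str],
              threshold: float = 0.8) -> Tuple[str, str]:
    """Return (where it was recognized, control command)."""
    best_command, best_score = None, 0.0
    for phrase, command in dictionary.items():
        # Text similarity stands in for the word-model similarity of the patent.
        score = SequenceMatcher(None, utterance, phrase).ratio()
        if score > best_score:
            best_command, best_score = command, score
    if best_score > threshold:
        return ("local", best_command)
    # No dictionary entry is similar enough: defer to the cloud service.
    return ("cloud", cloud_recognize(utterance))
```

Keeping short fixed commands local avoids a network round trip for the most frequent operations, while free-form requests still reach the semantic recognition service.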
- the present invention also provides a multimedia device voice control method.
- The basic workflow of the multimedia device voice control system includes:
- Step S1: the image sensing module 10 collects a user motion image;
- Step S2: the image recognition module 11 determines a control instruction type or state according to the user motion image;
- Step S3: the voice recognition state management module 12 activates or wakes up voice recognition according to the current control instruction type;
- Step S4: the sound beam forming module 13 determines a sound pickup direction and a sound pickup reception angle;
- Step S5: the array sound collection module 14 collects the voice signal issued by the user within the limits of the sound pickup direction and sound pickup reception angle, and digitizes it to form voice data;
- Step S6: the voice recognition module 15 recognizes the collected voice data to form a control instruction;
- Step S7: the multimedia function module 16 executes the control instruction to provide the corresponding multimedia function to the user.
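Steps S1 to S7 can be read as a single pass through a pipeline of modules. The sketch below is only one way to wire such a pass together; each module is passed in as a plain callable, and the function and parameter names (`run_voice_control_cycle`, `classify_gesture`, `locate_user`, and so on) are hypothetical and do not come from the patent.

```python
def run_voice_control_cycle(capture_image, classify_gesture, locate_user,
                            form_beam, capture_audio, recognize, execute):
    """Run one pass of steps S1 to S7; return the executed command or None."""
    image = capture_image()                              # S1: collect user motion image
    instruction_type = classify_gesture(image)           # S2: determine instruction type
    if instruction_type is None:
        return None                                      # S3: recognition stays inactive
    position = locate_user(image)                        # target sound source position
    direction, angle = form_beam(position)               # S4: pickup direction and angle
    voice_data = capture_audio(direction, angle)         # S5: collect and digitize speech
    command = recognize(voice_data, instruction_type)    # S6: form a control instruction
    if command is not None:
        execute(command)                                 # S7: provide the multimedia function
    return command
```

Passing the modules in as callables keeps the control flow testable without a camera or microphone array; real modules would replace the stand-ins.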
- The present invention provides an embodiment, including:
- Step S1: the image sensing module 10 collects a user motion image;
- Step S21: the image recognition module 11 compares the user motion image with the preset image templates and selects the control instruction type that matches the user motion image; if a matching control instruction type is found, proceed to step S22; if no control instruction type matching the user motion image is found, proceed to step S23;
- Step S22: the image recognition module 11 takes the location of the user as the target sound source location, and sends the location information of the target sound source, the start-voice-recognition information, and/or the control instruction type to the voice recognition state management module 12;
- Step S23: the image recognition module 11 sends comparison failure information to the voice recognition state management module 12;
- Step S31: the voice recognition state management module 12 analyzes the received information; if it is start-voice-recognition information, proceed to step S32; if it is comparison failure information, proceed to step S35;
- Step S32: the voice recognition state management module 12 sends a start command and the control instruction type to the voice recognition module 15 to activate voice recognition;
- Step S33: the voice recognition state management module 12 sends the location information of the target sound source to the sound beam forming module 13;
- Step S34: the voice recognition state management module 12 controls the multimedia function module 16 to reduce the volume of the multimedia output sound;
- Step S35: the voice recognition state management module 12 sends an instruction to pause voice recognition to the voice recognition module 15;
- Step S4: the sound beam forming module 13 determines the sound pickup direction and the sound pickup reception angle according to the location information of the target sound source;
- Step S51: the array sound collection module 14 collects the voice signal emitted by the target sound source within the limits of the sound pickup direction and sound pickup reception angle;
- Step S52: the array sound collection module 14 digitizes the collected voice signal to form voice data and sends it to the voice recognition module 15;
- Step S61: the voice recognition module 15 recognizes the voice data from the array sound collection module 14 according to the start command and the control instruction type from the voice recognition state management module 12, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module 16;
- Step S7: the multimedia function module 16 executes the control instruction to provide the corresponding multimedia function to the user.
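The branching in steps S31 to S35 can be sketched as a small state manager. The message format (a dict with a `kind` field), the class and method names, and the ducked volume of 0.2 are all assumptions for illustration. Restoring the volume once capture finishes follows the behaviour stated in the claims; the three stand-in classes exist only to make the sketch runnable.

```python
class VoiceRecognitionStateManager:
    """Hypothetical sketch of module 12: routes messages from the image recognizer."""

    def __init__(self, recognizer, beamformer, media, ducked_volume=0.2):
        self.recognizer = recognizer
        self.beamformer = beamformer
        self.media = media
        self.ducked_volume = ducked_volume
        self._normal_volume = None  # remembered so the volume can be restored later

    def handle(self, message):
        if message["kind"] == "start":                         # S31 -> S32
            self.recognizer.activate(message["instruction_type"])
            self.beamformer.set_target(message["position"])    # S33
            if self._normal_volume is None:
                self._normal_volume = self.media.volume
            self.media.volume = min(self.media.volume, self.ducked_volume)  # S34
        elif message["kind"] == "match_failed":                # S31 -> S35
            self.recognizer.pause()
        else:
            raise ValueError(f"unknown message kind: {message['kind']!r}")

    def capture_finished(self):
        """Restore the normal output volume once the voice signal has been collected."""
        if self._normal_volume is not None:
            self.media.volume = self._normal_volume
            self._normal_volume = None


# Stand-in collaborators, for illustration only.
class FakeRecognizer:
    def __init__(self):
        self.state, self.instruction_type = "paused", None

    def activate(self, instruction_type):
        self.state, self.instruction_type = "active", instruction_type

    def pause(self):
        self.state = "paused"


class FakeBeamformer:
    def __init__(self):
        self.target = None

    def set_target(self, position):
        self.target = position


class FakeMedia:
    def __init__(self, volume):
        self.volume = volume
```

Remembering the pre-ducking volume inside the manager means repeated start messages cannot overwrite it with an already reduced value.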
- The image sensing module 10 of the smart television 1 captures user A performing the action shown in FIG. 2 within the sensing range.
- The image recognition module 11 compares the user motion image with the preset image templates and finds that it matches the image template corresponding to the preset "start voice remote control" control instruction type. It therefore takes the position of user A as the target sound source position and sends the location information of the target sound source, the start-voice-recognition information, and/or the control instruction type to the voice recognition state management module 12.
- According to the received start-voice-recognition information, the voice recognition state management module 12 sends the start command and the control instruction type to the voice recognition module 15 to activate voice recognition, and sends the location information of the target sound source to the sound beam forming module 13. This ensures that even if several viewers are within the image sensing range, only user A is the target user and only the voice signal he issues is recognized.
- The sound beam forming module 13 determines the sound pickup direction and the sound pickup reception angle according to the location information of the target sound source; the array sound collection module 14 collects the voice signal "Hunan Satellite TV" issued by user A within the limits of the sound pickup direction and sound pickup reception angle, digitizes it to form voice data, and sends it to the voice recognition module 15.
- The voice recognition module 15 recognizes the voice data, finds that the similarity between the voice data and a word model is greater than the preset threshold, forms a "switch channel to Hunan Satellite TV" control command, and sends the control command to the multimedia function module 16.
- The multimedia function module 16 executes the control command and switches to the Hunan Satellite TV channel.
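The example above does not specify how the sound beam forming module derives the pickup direction and reception angle from the target position. The sketch below shows one plausible geometry under explicit assumptions: the user position is given as a lateral offset `x` and a distance `z` in metres relative to the array centre, the microphones lie on a line along the x axis, the source is in the far field, and a delay-and-sum beamformer (a technique swapped in here, not named in the text) is steered by per-microphone delays. All function names and the 0.6 m source width are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees Celsius


def pickup_direction(x, z):
    """Azimuth of the target in degrees: 0 is straight ahead, positive toward +x."""
    return math.degrees(math.atan2(x, z))


def reception_angle(z, source_width=0.6):
    """Full angular width (degrees) covering source_width metres at distance z."""
    return math.degrees(2 * math.atan2(source_width / 2, z))


def steering_delays(mic_x, azimuth_deg):
    """Delays (seconds) to apply per microphone so a far-field wavefront aligns.

    A microphone at larger x is closer to a source at positive azimuth and hears
    it earlier, so it must be delayed more. Delays are shifted to be non-negative.
    """
    sin_a = math.sin(math.radians(azimuth_deg))
    raw = [mx * sin_a / SPEED_OF_SOUND for mx in mic_x]
    base = min(raw)
    return [d - base for d in raw]
```

Summing the delayed channels then reinforces sound arriving from the target direction and attenuates sound from elsewhere, which is how a pickup direction and angle can limit capture to one viewer.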
- the present invention also provides a multimedia device voice control method.
- the voice recognition module 15 includes a local voice recognition module 151 and a cloud voice recognition module 152.
- A voice command dictionary is preset in the voice recognition module 15, and step S61 further includes:
- Step S611: the local speech recognition module 151 recognizes the speech data by comparing it with the word models in the voice command dictionary; if the similarity between the speech data and at least one word model is greater than the preset threshold, proceed to step S612; otherwise proceed to step S613;
- Step S612: the local speech recognition module 151 interprets the speech data as the control command corresponding to that word model and sends it to the multimedia function module 16;
- Step S613: the voice data is sent to the cloud speech recognition module 152 through the network;
- Step S614: the cloud speech recognition module 152 performs semantic recognition processing on the speech data to form a control instruction, and sends the control instruction to the multimedia function module 16 through the network.
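Steps S611 to S614 amount to a local-first recognizer with a cloud fallback. The following sketch treats both recognizers as plain callables. The names and the handling of network errors are assumptions; the text does not describe what happens when the cloud service is unreachable.

```python
def recognize_with_fallback(voice_data, local_recognizer, cloud_recognizer):
    """Try local recognition first (S611/S612); fall back to the cloud (S613/S614).

    Returns (command, source), where source is "local", "cloud", or "unrecognized".
    """
    command = local_recognizer(voice_data)       # S611: compare with word models
    if command is not None:
        return command, "local"                  # S612: simple command recognized
    try:
        command = cloud_recognizer(voice_data)   # S613/S614: semantic recognition
    except OSError:
        # Assumed behaviour: treat a network failure as an unrecognized utterance.
        return None, "unrecognized"
    if command is None:
        return None, "unrecognized"
    return command, "cloud"
```

Trying the local dictionary first keeps simple commands such as volume changes fast and usable offline, and only utterances that need semantic recognition incur a network round trip.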
- Steps S1 to S51 of the present example are the same as the previous specific application example, and therefore are not described again.
- The array sound collection module 14 collects the user's voice signal "Give me a song by Andy Lau" within the limits of the sound pickup direction and sound pickup reception angle, digitizes it to form voice data, and sends it to the voice recognition module 15.
- The local speech recognition module 151 of the speech recognition module 15 recognizes the speech data by comparing it with the word models in the voice command dictionary; since no word model whose similarity to the speech data exceeds the preset threshold is found, the voice data is sent through the network to the cloud speech recognition module 152.
- The cloud speech recognition module 152 performs semantic recognition processing on the speech data, forms a "play a song by Andy Lau" control command according to the user's voice data, and transmits it to the multimedia function module 16 through the network.
- The multimedia function module 16 executes the control instruction: it automatically searches for a song by Andy Lau via a search engine, downloads the audio and video data, and passes the data to the music playing module built into the smart TV 1 for playback.
- The multimedia device voice control system and method and the computer storage medium of the invention combine image recognition and speech recognition technology to realize free and convenient voice control without relying on a handheld remote controller and without being limited to a close-range sound pickup device. This effectively prevents the sound output by the multimedia device, environmental background sound, and the user's non-command speech from interfering with the recognition of control commands, which helps the control commands issued by the user to be recognized accurately, and also enables multiple users to control the multimedia device separately or jointly.
- The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Social Psychology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- User Interface Of Digital Computer (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
- Circuit For Audible Band Transducer (AREA)
Claims (19)
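- A multimedia device voice control system, characterized by comprising: an image sensing module, configured to collect a user motion image; an image recognition module, configured to determine a control instruction type or state according to the user motion image, determine the position of the user who made the user motion image as the target sound source position, send the location information of the target sound source, and determine a target user according to the target sound source position, the target user being the operator; a voice recognition state management module, configured to activate or wake up voice recognition according to the current control instruction type, send the location information of the target sound source to a sound beam forming module, and control a multimedia function module to reduce the volume of the multimedia output sound; the sound beam forming module, configured to determine a sound pickup direction and a sound pickup reception angle according to the position of the target sound source; a sound pickup module, configured to collect, according to the sound pickup direction and the sound pickup reception angle, the voice signal emitted by the target sound source and digitize it to form voice data; a voice recognition module, configured to recognize the collected voice data and form a control instruction; and the multimedia function module, configured to execute the control instruction and provide the corresponding multimedia function to the user.
- The multimedia device voice control system according to claim 1, characterized in that the image recognition module is configured to compare the user motion image with preset image templates and select the control instruction type that matches the user motion image; if the comparison finds a control instruction type matching the user motion image, the position of the user is taken as the target sound source position, and the location information of the target sound source, start-voice-recognition information, and/or the control instruction type are sent to the voice recognition state management module; if no control instruction type matching the user motion image is found, comparison failure information is sent to the voice recognition state management module.
- The multimedia device voice control system according to claim 2, characterized in that the image recognition module is configured to play human-computer interaction content to the user, guiding the user to perform an action until the action conforms to a preset image template.
- The multimedia device voice control system according to claim 2, characterized in that the sound pickup module is an array sound pickup module or at least one sound pickup sensor, the sound pickup sensors being arranged regularly or irregularly; the sound pickup sensors collect the voice signal emitted by the target sound source within the limits of the sound pickup direction and the sound pickup reception angle, digitize it to form voice data, and send the voice data to the voice recognition module.
- The multimedia device voice control system according to claim 2, characterized in that the voice recognition state management module, according to the received start-voice-recognition information, sends a start command and the control instruction type to the voice recognition module to activate or wake up voice recognition, sends the location information of the target sound source to the sound beam forming module, controls the multimedia function module to reduce the volume of the multimedia output sound, and restores the multimedia output sound to normal volume after the sound pickup module has finished collecting the voice signal.
- The multimedia device voice control system according to claim 5, characterized in that the voice recognition module recognizes the voice data from the sound pickup module according to the start command and the control instruction type from the voice recognition state management module, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module.
- The multimedia device voice control system according to claim 6, characterized in that a built-in voice command dictionary is preset in the voice recognition module, the voice command dictionary storing processed word models of control instruction voice signals; the voice recognition module compares the voice data with the word models in the voice command dictionary, and if the similarity between the voice data and at least one word model is greater than a preset threshold, interprets the voice data as the control instruction corresponding to that word model and sends it to the multimedia function module.
- The multimedia device voice control system according to claim 6, characterized in that the voice recognition module comprises a local voice recognition module and a cloud voice recognition module; the local voice recognition module recognizes the voice data, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module; the cloud voice recognition module performs semantic recognition processing on voice data that the local voice recognition module cannot recognize, forms a control instruction belonging to the control instruction type, and sends it to the multimedia function module.
- The multimedia device voice control system according to claim 1, characterized in that the multimedia function module executes the control instruction, automatically searches via a search engine according to the control instruction to obtain audio and video data, and downloads and plays the audio and video data.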
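- A multimedia device voice control method, comprising: collecting a user motion image; determining a control instruction type or state according to the user motion image, determining the position of the user who made the user motion image as the target sound source position, sending the location information of the target sound source, and determining a target user according to the target sound source position, the target user being the operator; activating or waking up voice recognition according to the control instruction type, sending the location information of the target sound source, and reducing the volume of the multimedia output sound; determining a sound pickup direction and a sound pickup reception angle according to the position of the target sound source; collecting the voice signal issued by the user within the limits of the sound pickup direction and the sound pickup reception angle, and digitizing it to form voice data; recognizing the collected voice data to form a control instruction; and executing the control instruction to provide the corresponding multimedia function to the user.
- The multimedia device voice control method according to claim 10, characterized in that the step of determining a control instruction type or state according to the user motion image, determining the position of the user who made the user motion image as the target sound source position, and sending the location information of the target sound source is: comparing the user motion image with preset image templates and selecting the control instruction type that matches the user motion image; if the comparison finds a control instruction type matching the user motion image, taking the position of the user as the target sound source position and sending the location information of the target sound source, start-voice-recognition information, and/or the control instruction type; if no control instruction type matching the user motion image is found, issuing comparison failure information.
- The multimedia device voice control method according to claim 11, characterized by further comprising: playing human-computer interaction content to the user, guiding the user to perform an action until the action conforms to a preset image template.
- The multimedia device voice control method according to claim 11, characterized in that the step of collecting the voice signal emitted by the target sound source according to the sound pickup direction and the sound pickup reception angle and forming voice data is: with at least one sound pickup sensor arranged regularly or irregularly, collecting, by the sound pickup sensor, the voice signal emitted by the target sound source within the limits of the sound pickup direction and the sound pickup reception angle, digitizing it to form voice data, and then sending the voice data.
- The multimedia device voice control method according to claim 11, characterized in that the step of activating voice recognition according to the control instruction type, sending the location information of the target sound source, and reducing the volume of the multimedia output sound further comprises: according to the received start-voice-recognition information, sending a start command and the control instruction type to activate or wake up voice recognition, sending the location information of the target sound source, reducing the volume of the multimedia output sound, and restoring the multimedia output sound to normal volume after voice signal collection is complete.
- The multimedia device voice control method according to claim 14, characterized in that the step of sending a start command and the control instruction type according to the received start-voice-recognition information to activate voice recognition is: recognizing the voice data according to the start command and the control instruction type, forming a control instruction belonging to the control instruction type, and sending the control instruction.
- The multimedia device voice control method according to claim 15, characterized in that the step of recognizing the voice data, forming a control instruction belonging to the control instruction type, and sending the control instruction is: comparing the voice data with the word models in a voice command dictionary, the voice command dictionary storing processed word models of control instruction voice signals; if the similarity between the voice data and at least one word model is greater than a preset threshold, interpreting the voice data as the control instruction corresponding to that word model and sending the control instruction.
- The multimedia device voice control method according to claim 15, characterized in that the step of recognizing the voice data, forming a control instruction belonging to the control instruction type, and sending the control instruction is: recognizing the voice data locally, forming a control instruction belonging to the control instruction type, and sending the control instruction; and performing semantic recognition processing on voice data that cannot be recognized locally, forming a control instruction belonging to the control instruction type, and sending the control instruction.
- The multimedia device voice control method according to claim 10, characterized in that the step of executing the control instruction and providing the corresponding multimedia function to the user is: executing the control instruction, automatically searching via a search engine according to the control instruction to obtain audio and video data, and downloading and playing the audio and video data.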
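- A computer storage medium for storing computer-executable instructions, the computer-readable storage medium storing one or more programs, the one or more programs being used by one or more processors to execute instructions for performing a multimedia device voice control method, characterized in that the method comprises: collecting a user motion image; determining a control instruction type or state according to the user motion image, determining the position of the user who made the user motion image as the target sound source position, sending the location information of the target sound source, and determining a target user according to the target sound source position, the target user being the operator; activating or waking up voice recognition according to the control instruction type, sending the location information of the target sound source, and reducing the volume of the multimedia output sound; determining a sound pickup direction and a sound pickup reception angle according to the position of the target sound source; collecting the voice signal issued by the user within the limits of the sound pickup direction and the sound pickup reception angle, and digitizing it to form voice data; recognizing the collected voice data to form a control instruction; and executing the control instruction to provide the corresponding multimedia function to the user.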
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13841489.1A EP2897126B1 (en) | 2012-09-29 | 2013-09-26 | Multimedia device voice control system and method, and computer storage medium |
US14/421,900 US9955210B2 (en) | 2012-09-29 | 2013-09-26 | Multimedia device voice control system and method, and computer storage medium |
JP2015533437A JP6012877B2 (ja) | 2012-09-29 | 2013-09-26 | マルチメディアデバイス用音声制御システム及び方法、及びコンピュータ記憶媒体 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210374809.1 | 2012-09-29 | ||
CN2012103748091A CN102945672B (zh) | 2012-09-29 | 2012-09-29 | 一种多媒体设备语音控制系统及方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014048348A1 (zh) | 2014-04-03 |
Family
ID=47728610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/084348 WO2014048348A1 (zh) | 2012-09-29 | 2013-09-26 | 一种多媒体设备语音控制系统及方法、计算机存储介质 |
Country Status (5)
Country | Link |
---|---|
US (1) | US9955210B2 (zh) |
EP (1) | EP2897126B1 (zh) |
JP (1) | JP6012877B2 (zh) |
CN (1) | CN102945672B (zh) |
WO (1) | WO2014048348A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104298349A (zh) * | 2014-09-24 | 2015-01-21 | 联想(北京)有限公司 | 信息处理方法及电子设备 |
TWI668979B (zh) * | 2017-12-29 | 2019-08-11 | 智眸科技有限公司 | 多媒體視聽系統 |
Families Citing this family (88)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102945672B (zh) * | 2012-09-29 | 2013-10-16 | 深圳市国华识别科技开发有限公司 | 一种多媒体设备语音控制系统及方法 |
CN104049721B (zh) * | 2013-03-11 | 2019-04-26 | 联想(北京)有限公司 | 信息处理方法及电子设备 |
CN104049723B (zh) * | 2013-03-12 | 2017-05-24 | 联想(北京)有限公司 | 在便携设备中启动关联应用的方法和便携设备 |
CN104065806A (zh) * | 2013-03-20 | 2014-09-24 | 辉达公司 | 对移动信息设备的语音控制 |
CN103268408A (zh) * | 2013-05-13 | 2013-08-28 | 云南瑞攀科技有限公司 | 多维交互平台 |
CN103456299B (zh) * | 2013-08-01 | 2016-06-15 | 百度在线网络技术(北京)有限公司 | 一种控制语音识别的方法和装置 |
CN203338756U (zh) * | 2013-08-03 | 2013-12-11 | 袁志贤 | 语音图像识别双控无线汽车音响 |
CN103581726A (zh) * | 2013-10-16 | 2014-02-12 | 四川长虹电器股份有限公司 | 一种电视设备上采用语音实现游戏控制的方法 |
CN104216351B (zh) * | 2014-02-10 | 2017-09-29 | 美的集团股份有限公司 | 家用电器语音控制方法及系统 |
CN103902373B (zh) * | 2014-04-02 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | 智能终端控制方法、服务器和智能终端 |
US9569174B2 (en) * | 2014-07-08 | 2017-02-14 | Honeywell International Inc. | Methods and systems for managing speech recognition in a multi-speech system environment |
CN104269172A (zh) * | 2014-07-31 | 2015-01-07 | 广东美的制冷设备有限公司 | 基于视频定位的语音控制方法和系统 |
CN104200816B (zh) * | 2014-07-31 | 2017-12-22 | 广东美的制冷设备有限公司 | 语音控制方法和系统 |
CN104200817B (zh) * | 2014-07-31 | 2017-07-28 | 广东美的制冷设备有限公司 | 语音控制方法和系统 |
CN106796786B (zh) * | 2014-09-30 | 2021-03-02 | 三菱电机株式会社 | 语音识别系统 |
CN104681023A (zh) * | 2015-02-15 | 2015-06-03 | 联想(北京)有限公司 | 一种信息处理方法及电子设备 |
CN104882141A (zh) * | 2015-03-03 | 2015-09-02 | 盐城工学院 | 一种基于时延神经网络和隐马尔可夫模型的串口语音控制投影系统 |
CN104820556A (zh) * | 2015-05-06 | 2015-08-05 | 广州视源电子科技股份有限公司 | 唤醒语音助手的方法及装置 |
CN106325481A (zh) * | 2015-06-30 | 2017-01-11 | 展讯通信(天津)有限公司 | 一种非接触式控制系统及方法以及移动终端 |
CN106488286A (zh) * | 2015-08-28 | 2017-03-08 | 上海欢众信息科技有限公司 | 云端信息收集系统 |
CN106504753A (zh) * | 2015-09-07 | 2017-03-15 | 上海隆通网络系统有限公司 | 一种在it运维管理系统中的语音识别方法及系统 |
CN105976814B (zh) * | 2015-12-10 | 2020-04-10 | 乐融致新电子科技(天津)有限公司 | 头戴设备的控制方法和装置 |
CN105975060A (zh) * | 2016-04-26 | 2016-09-28 | 乐视控股(北京)有限公司 | 虚拟现实终端及其控制方法和装置 |
CN105976818B (zh) * | 2016-04-26 | 2020-12-25 | Tcl科技集团股份有限公司 | 指令识别的处理方法及装置 |
CN106023990A (zh) * | 2016-05-20 | 2016-10-12 | 深圳展景世纪科技有限公司 | 一种基于投影设备的语音控制方法及装置 |
CN107506165A (zh) * | 2016-06-14 | 2017-12-22 | 深圳市三诺声智联股份有限公司 | 一种智能电子宠物语音交互系统及方法 |
CN106920551A (zh) * | 2016-06-28 | 2017-07-04 | 广州零号软件科技有限公司 | 共用一套麦克风阵列的服务机器人双语音识别方法 |
WO2018013564A1 (en) * | 2016-07-12 | 2018-01-18 | Bose Corporation | Combining gesture and voice user interfaces |
CN107665708B (zh) * | 2016-07-29 | 2021-06-08 | 科大讯飞股份有限公司 | 智能语音交互方法及系统 |
CN106338711A (zh) * | 2016-08-30 | 2017-01-18 | 康佳集团股份有限公司 | 一种基于智能设备的语音定向方法及系统 |
CN106409294B (zh) * | 2016-10-18 | 2019-07-16 | 广州视源电子科技股份有限公司 | 防止语音命令误识别的方法和装置 |
CN106356061A (zh) * | 2016-10-24 | 2017-01-25 | 合肥华凌股份有限公司 | 基于声源定位的语音识别方法和系统、及智能家电设备 |
US10210863B2 (en) * | 2016-11-02 | 2019-02-19 | Roku, Inc. | Reception of audio commands |
KR20180049787A (ko) * | 2016-11-03 | 2018-05-11 | 삼성전자주식회사 | 전자 장치, 그의 제어 방법 |
EP4220630A1 (en) | 2016-11-03 | 2023-08-02 | Samsung Electronics Co., Ltd. | Electronic device and controlling method thereof |
CN106775562A (zh) * | 2016-12-09 | 2017-05-31 | 奇酷互联网络科技(深圳)有限公司 | 音频参数处理的方法及装置 |
KR102398390B1 (ko) | 2017-03-22 | 2022-05-16 | 삼성전자주식회사 | 전자 장치 및 전자 장치의 제어 방법 |
CN107103906B (zh) * | 2017-05-02 | 2020-12-11 | 网易(杭州)网络有限公司 | 一种唤醒智能设备进行语音识别的方法、智能设备和介质 |
US10435148B2 (en) * | 2017-05-08 | 2019-10-08 | Aurora Flight Sciences Corporation | Systems and methods for acoustic radiation control |
CN108986801B (zh) * | 2017-06-02 | 2020-06-05 | 腾讯科技(深圳)有限公司 | 一种人机交互方法、装置及人机交互终端 |
US11178280B2 (en) * | 2017-06-20 | 2021-11-16 | Lenovo (Singapore) Pte. Ltd. | Input during conversational session |
CN107195304A (zh) * | 2017-06-30 | 2017-09-22 | 珠海格力电器股份有限公司 | 一种电器设备的语音控制电路和方法 |
KR102392087B1 (ko) * | 2017-07-10 | 2022-04-29 | 삼성전자주식회사 | 원격 조정 장치 및 원격 조정 장치의 사용자 음성 수신방법 |
US10599377B2 (en) | 2017-07-11 | 2020-03-24 | Roku, Inc. | Controlling visual indicators in an audio responsive electronic device, and capturing and providing audio using an API, by native and non-native computing devices and services |
US11062710B2 (en) | 2017-08-28 | 2021-07-13 | Roku, Inc. | Local and cloud speech recognition |
US11062702B2 (en) | 2017-08-28 | 2021-07-13 | Roku, Inc. | Media system with multiple digital assistants |
US10777197B2 (en) | 2017-08-28 | 2020-09-15 | Roku, Inc. | Audio responsive device with play/stop and tell me something buttons |
CN107656977A (zh) * | 2017-09-05 | 2018-02-02 | 捷开通讯(深圳)有限公司 | 多媒体文件的获取及播放方法以及装置 |
CN107657956B (zh) * | 2017-10-23 | 2020-12-22 | 吴建伟 | 一种多媒体设备语音控制系统及方法 |
CN108064007A (zh) * | 2017-11-07 | 2018-05-22 | 苏宁云商集团股份有限公司 | 用于智能音箱的增强人声识别的方法及微控制器和智能音箱 |
KR102527278B1 (ko) | 2017-12-04 | 2023-04-28 | 삼성전자주식회사 | 전자 장치, 그 제어 방법 및 컴퓨터 판독가능 기록 매체 |
CN109961781B (zh) * | 2017-12-22 | 2021-08-27 | 深圳市优必选科技有限公司 | 基于机器人的语音信息接收方法、系统及终端设备 |
CN108319171B (zh) * | 2018-02-09 | 2020-08-07 | 广景视睿科技(深圳)有限公司 | 一种基于语音控制的动向投影方法、装置及动向投影系统 |
US11145298B2 (en) | 2018-02-13 | 2021-10-12 | Roku, Inc. | Trigger word detection with multiple digital assistants |
CN108536418A (zh) * | 2018-03-26 | 2018-09-14 | 深圳市冠旭电子股份有限公司 | 一种无线音箱播放模式切换的方法、装置及无线音箱 |
CN110321201A (zh) * | 2018-03-29 | 2019-10-11 | 努比亚技术有限公司 | 一种后台程序处理方法、终端及计算机可读存储介质 |
CN108469772B (zh) * | 2018-05-18 | 2021-07-20 | 创新先进技术有限公司 | 一种智能设备的控制方法和装置 |
TWI704490B (zh) * | 2018-06-04 | 2020-09-11 | 和碩聯合科技股份有限公司 | 語音控制裝置及方法 |
CN108806682B (zh) * | 2018-06-12 | 2020-12-01 | 奇瑞汽车股份有限公司 | 获取天气信息的方法和装置 |
CN110719553B (zh) * | 2018-07-13 | 2021-08-06 | 国际商业机器公司 | 具有认知声音分析和响应的智能扬声器系统 |
WO2020014899A1 (zh) * | 2018-07-18 | 2020-01-23 | 深圳魔耳智能声学科技有限公司 | 语音控制方法、中控设备和存储介质 |
KR20200013162A (ko) | 2018-07-19 | 2020-02-06 | 삼성전자주식회사 | 전자 장치 및 그의 제어 방법 |
CN109410931A (zh) * | 2018-10-15 | 2019-03-01 | 四川长虹电器股份有限公司 | 以电视为中心的移动终端物联网语音控制系统及方法 |
CN109348164A (zh) * | 2018-11-19 | 2019-02-15 | 国网山东省电力公司信息通信公司 | 一种电视电话会议自助保障控制系统 |
CN109727596B (zh) * | 2019-01-04 | 2020-03-17 | 北京市第一〇一中学 | 控制遥控器的方法和遥控器 |
WO2020140271A1 (zh) * | 2019-01-04 | 2020-07-09 | 珊口(上海)智能科技有限公司 | 移动机器人的控制方法、装置、移动机器人及存储介质 |
CN110136707B (zh) * | 2019-04-22 | 2021-03-02 | 云知声智能科技股份有限公司 | 一种用于进行多设备自主决策的人机交互系统 |
CN110099295B (zh) * | 2019-05-30 | 2022-04-12 | 深圳创维-Rgb电子有限公司 | 电视机语音控制方法、装置、设备及存储介质 |
CN112435660A (zh) * | 2019-08-08 | 2021-03-02 | 上海博泰悦臻电子设备制造有限公司 | 车辆控制方法、系统及车辆 |
CN110364176A (zh) * | 2019-08-21 | 2019-10-22 | 百度在线网络技术(北京)有限公司 | 语音信号处理方法及装置 |
JP6886118B2 (ja) * | 2019-08-27 | 2021-06-16 | 富士通クライアントコンピューティング株式会社 | 情報処理装置およびプログラム |
CN110689884A (zh) * | 2019-09-09 | 2020-01-14 | 苏州臻迪智能科技有限公司 | 智能设备控制方法及装置 |
CN110597122A (zh) * | 2019-09-17 | 2019-12-20 | 电子科技大学中山学院 | 一种嵌入式多媒体的控制系统 |
WO2021051403A1 (zh) * | 2019-09-20 | 2021-03-25 | 深圳市汇顶科技股份有限公司 | 一种语音控制方法、装置、芯片、耳机及系统 |
CN111208736B (zh) * | 2019-12-17 | 2023-10-27 | 中移(杭州)信息技术有限公司 | 智能音箱控制方法、装置、电子设备及存储介质 |
CN111462744B (zh) * | 2020-04-02 | 2024-01-30 | 深圳创维-Rgb电子有限公司 | 一种语音交互方法、装置、电子设备及存储介质 |
CN111356022A (zh) * | 2020-04-18 | 2020-06-30 | 徐琼琼 | 一种基于语音识别的视频文件处理方法 |
CN111554283A (zh) * | 2020-04-23 | 2020-08-18 | 海信集团有限公司 | 一种智能设备及其控制方法 |
CN111767793A (zh) * | 2020-05-25 | 2020-10-13 | 联想(北京)有限公司 | 一种数据处理方法及装置 |
EP4163764A4 (en) * | 2020-07-03 | 2023-11-22 | Huawei Technologies Co., Ltd. | IN-VEHICLE AIR GESTURE INTERACTION METHOD, ELECTRONIC DEVICE AND SYSTEM |
CN112333534B (zh) * | 2020-09-17 | 2023-11-14 | 深圳Tcl新技术有限公司 | 杂音消除方法、装置、智能电视系统及可读存储介质 |
CN112201237B (zh) * | 2020-09-23 | 2024-04-19 | 安徽中科新辰技术有限公司 | 一种基于com口实现语音集中控制指挥大厅多媒体设备的方法 |
CN112141834A (zh) * | 2020-10-26 | 2020-12-29 | 华中科技大学同济医学院附属协和医院 | 一种电梯的语音控制系统及控制方法 |
CN112383822B (zh) * | 2020-11-16 | 2022-03-15 | 四川长虹电器股份有限公司 | 一种电视机管控语音模块的方法 |
CN113470637A (zh) * | 2021-05-10 | 2021-10-01 | 辛巴网络科技(南京)有限公司 | 一种车载多个音频媒体的语音控制方法 |
CN113450795A (zh) * | 2021-06-28 | 2021-09-28 | 深圳七号家园信息技术有限公司 | 一种具有语音唤醒功能的图像识别方法及系统 |
CN116417006A (zh) * | 2021-12-31 | 2023-07-11 | 华为技术有限公司 | 声音信号处理方法、装置、设备及存储介质 |
CN115190243B (zh) * | 2022-07-08 | 2024-04-05 | 上海西派埃智能化系统有限公司 | 一种行车停止位监测系统及方法 |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6243683B1 (en) * | 1998-12-29 | 2001-06-05 | Intel Corporation | Video control of speech recognition |
CN1397063A (zh) * | 2000-11-27 | 2003-02-12 | 皇家菲利浦电子有限公司 | 对具有声音输出装置的设备进行控制的方法 |
JP2007094104A (ja) * | 2005-09-29 | 2007-04-12 | Sony Corp | 情報処理装置および方法、並びにプログラム |
US20070233321A1 (en) * | 2006-03-29 | 2007-10-04 | Kabushiki Kaisha Toshiba | Position detecting device, autonomous mobile device, method, and computer program product |
CN201115599Y (zh) * | 2007-10-19 | 2008-09-17 | 深圳市壹声通语音科技有限公司 | 一种具有声控识别功能的智能烹饪装置 |
US7538711B2 (en) * | 2004-09-24 | 2009-05-26 | Samsung Electronics Co., Ltd. | Integrated remote control device receiving multimodal input and method of the same |
CN102306051A (zh) * | 2010-06-18 | 2012-01-04 | 微软公司 | 复合姿势-语音命令 |
WO2012070812A2 (en) * | 2010-11-22 | 2012-05-31 | Lg Electronics Inc. | Control method using voice and gesture in multimedia device and multimedia device thereof |
CN102682770A (zh) * | 2012-02-23 | 2012-09-19 | 西安雷迪维护系统设备有限公司 | 基于云计算的语音识别系统 |
CN102945672A (zh) * | 2012-09-29 | 2013-02-27 | 深圳市国华识别科技开发有限公司 | 一种多媒体设备语音控制系统及方法 |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57196300A (en) | 1981-05-28 | 1982-12-02 | Mitsubishi Electric Corp | Voice output controller |
JPH1124694A (ja) * | 1997-07-04 | 1999-01-29 | Sanyo Electric Co Ltd | 命令認識装置 |
US6690618B2 (en) * | 2001-04-03 | 2004-02-10 | Canesta, Inc. | Method and apparatus for approximating a source position of a sound-causing event for determining an input used in operating an electronic device |
US20030069733A1 (en) * | 2001-10-02 | 2003-04-10 | Ryan Chang | Voice control method utilizing a single-key pushbutton to control voice commands and a device thereof |
JP2005122128A (ja) * | 2003-09-25 | 2005-05-12 | Fuji Photo Film Co Ltd | 音声認識システム及びプログラム |
JP4581441B2 (ja) * | 2004-03-18 | 2010-11-17 | パナソニック株式会社 | 家電機器システム、家電機器および音声認識方法 |
JP2007041089A (ja) * | 2005-08-01 | 2007-02-15 | Hitachi Ltd | 情報端末および音声認識プログラム |
JP4845183B2 (ja) * | 2005-11-21 | 2011-12-28 | 独立行政法人情報通信研究機構 | 遠隔対話方法及び装置 |
JP2008263422A (ja) * | 2007-04-12 | 2008-10-30 | Yasumasa Muto | 画像撮像装置および画像撮像方法 |
CN100449468C (zh) * | 2007-04-26 | 2009-01-07 | 上海交通大学 | 基于视觉跟踪与语音识别的鼠标系统 |
JP2009069202A (ja) * | 2007-09-10 | 2009-04-02 | Teac Corp | 音声処理装置 |
JP2009098217A (ja) * | 2007-10-12 | 2009-05-07 | Pioneer Electronic Corp | 音声認識装置、音声認識装置を備えたナビゲーション装置、音声認識方法、音声認識プログラム、および記録媒体 |
CN101464773A (zh) * | 2007-12-19 | 2009-06-24 | 神基科技股份有限公司 | 随使用者位置而显示程序执行视窗的方法与电脑系统 |
US7934161B1 (en) * | 2008-12-09 | 2011-04-26 | Jason Adam Denise | Electronic search interface technology |
JP2011061461A (ja) | 2009-09-09 | 2011-03-24 | Sony Corp | 撮像装置、指向性制御方法及びそのプログラム |
CN102483918B (zh) * | 2009-11-06 | 2014-08-20 | 株式会社东芝 | 声音识别装置 |
US8676581B2 (en) * | 2010-01-22 | 2014-03-18 | Microsoft Corporation | Speech recognition analysis via identification information |
JP2011209787A (ja) | 2010-03-29 | 2011-10-20 | Sony Corp | 情報処理装置、および情報処理方法、並びにプログラム |
JP2011257943A (ja) * | 2010-06-08 | 2011-12-22 | Canon Inc | ジェスチャ操作入力装置 |
US8381108B2 (en) * | 2010-06-21 | 2013-02-19 | Microsoft Corporation | Natural user input for driving interactive stories |
JP2013529794A (ja) * | 2010-06-24 | 2013-07-22 | 本田技研工業株式会社 | 車載音声認識システム及び車両外音声認識システム間の通信システム及び方法 |
WO2012091185A1 (en) * | 2010-12-27 | 2012-07-05 | Lg Electronics Inc. | Display device and method of providing feedback for gestures thereof |
JP5039214B2 (ja) * | 2011-02-17 | 2012-10-03 | 株式会社東芝 | 音声認識操作装置及び音声認識操作方法 |
TWI569258B (zh) * | 2012-01-02 | 2017-02-01 | 晨星半導體股份有限公司 | 電子裝置的聲控系統以及相關控制方法 |
- 2012-09-29 CN CN2012103748091A patent/CN102945672B/zh active Active
- 2013-09-26 WO PCT/CN2013/084348 patent/WO2014048348A1/zh active Application Filing
- 2013-09-26 US US14/421,900 patent/US9955210B2/en active Active
- 2013-09-26 EP EP13841489.1A patent/EP2897126B1/en active Active
- 2013-09-26 JP JP2015533437A patent/JP6012877B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
JP6012877B2 (ja) | 2016-10-25 |
CN102945672A (zh) | 2013-02-27 |
US9955210B2 (en) | 2018-04-24 |
EP2897126A4 (en) | 2016-05-11 |
US20150222948A1 (en) | 2015-08-06 |
EP2897126B1 (en) | 2017-09-20 |
CN102945672B (zh) | 2013-10-16 |
JP2015535952A (ja) | 2015-12-17 |
EP2897126A1 (en) | 2015-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13841489; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 14421900; Country of ref document: US |
| REEP | Request for entry into the european phase | Ref document number: 2013841489; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2015533437; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |