WO2015146179A1 - Voice command input device and voice command input method - Google Patents
Voice command input device and voice command input method
- Publication number
- WO2015146179A1 (PCT/JP2015/001721)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- information
- voice command
- unit
- input
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the microphone 601 is installed in each room of the house, and is connected by a signal line to one control unit 602 installed in the house. For example, when a speaker instructs that the air conditioner 610 be turned on or off, the control unit 602 transmits, based on the voice recognition result, a control signal instructing power on/off switching from the interface 607 to the remote controller 608 of the air conditioner 610. The power supply of the air conditioner 610 is then turned on or off via the remote controller 608.
- the control unit 602 is configured as follows.
- the control unit 602 includes an analog-digital conversion circuit (hereinafter referred to as “A / D circuit”) 603, an arbitration circuit 605, a speech recognition processor 606, and an interface 607.
- the present disclosure provides a voice command input device capable of processing appropriately even when a plurality of users utter voices at the same time, or when a plurality of microphones collects the sound of a single speaker.
- the voice command input device includes a first voice input unit, a second voice input unit, and a voice command identification unit.
- the first voice input unit includes a first identification information generation unit that outputs first identification information, and a first voice recognition unit that converts voice into first voice command information.
- the first voice input unit outputs first voice information including the first identification information and the first voice command information.
- the second voice input unit includes a second identification information generation unit that outputs the second identification information, and a second voice recognition unit that converts voice into second voice command information.
- the second voice input unit outputs second voice information including the second identification information and the second voice command information.
- the voice command identification unit is configured to generate and output a control signal for controlling the operation target device based on the first voice information and the second voice information. Then, the voice command identification unit generates a control signal with reference to the first identification information and the second identification information.
- a voice command input method includes a step of generating first identification information, a step of converting voice into first voice command information, a step of generating second identification information, a step of converting voice into second voice command information, a step of referring to the first identification information and the second identification information, and a step of generating, based on the result of the reference, the first voice command information, and the second voice command information, a control signal for controlling the operation target device.
- FIG. 1 is a block diagram illustrating a configuration example of a voice command input device according to the first embodiment.
- FIG. 2 is a flowchart showing an operation example of the voice command input device according to the first embodiment.
- FIG. 3 is a block diagram illustrating a configuration example of the voice command input device according to the second embodiment.
- FIG. 4 is a block diagram illustrating a configuration example of the voice command input device according to the third embodiment.
- FIG. 5 is a flowchart illustrating an operation example of the voice command input device according to the third embodiment.
- FIG. 6 is a block diagram showing the configuration of a conventional voice command input device.
- a person who uses the voice command input device is referred to as a “user” or a “speaker”.
- a speaker is a person who utters a voice command to the voice command input device.
- a “voice command” is the voice that the user utters in order to instruct the voice command input device to perform a device operation.
- FIG. 1 is a block diagram illustrating a configuration example of the voice command input device 100 according to the first embodiment.
- the voice command input device 100 includes a first voice input unit 114, a second voice input unit 115, a voice command identification unit 107, and a command issue unit 108.
- the first voice input unit 114 includes a first microphone 101, a first voice recognition unit 102, and a first time stamp addition unit 103.
- the second voice input unit 115 includes a second microphone 104, a second voice recognition unit 105, and a second time stamp adding unit 106.
- the user's voice collected by the first microphone 101 is input to the first voice recognition unit 102 for voice recognition processing.
- the voice collected by the first microphone 101 is referred to as a voice command 111.
- the first voice recognition unit 102 recognizes the voice command 111 and converts it into first voice command information.
- the first time stamp adding unit 103 outputs first time stamp information indicating the time when the voice command 111 is input to the first voice input unit 114.
- the first time stamp adding unit 103 is an example of a first identification information generating unit, and the first time stamp information is an example of first identification information.
- the first voice recognition unit 102 outputs the first voice information 109 including the first voice command information and the first time stamp information to the voice command identification unit 107.
- the user's voice collected by the second microphone 104 is input to the second voice recognition unit 105 for voice recognition processing.
- the voice collected by the second microphone 104 is referred to as a voice command 112.
- the second voice recognition unit 105 recognizes the voice command 112 and converts it into second voice command information.
- the second time stamp adding unit 106 outputs second time stamp information indicating the time when the voice command 112 is input to the second voice input unit 115.
- the second time stamp adding unit 106 is an example of a second identification information generating unit, and the second time stamp information is an example of second identification information.
- the second voice recognition unit 105 outputs the second voice information 110 including the second voice command information and the second time stamp information to the voice command identification unit 107.
- since the speech recognition process performed by the first speech recognition unit 102 and the second speech recognition unit 105 can be realized using generally available speech recognition technology, a detailed description thereof is omitted.
- it is desirable that the first time stamp adding unit 103 and the second time stamp adding unit 106 both refer to a single time managed within the voice command input device 100 when generating the first time stamp information and the second time stamp information.
- information other than the time of day may be referenced, as long as it can indicate the timing at which the voice was input to the voice input unit. For example, a counter value that is counted up or down at fixed intervals may be referenced instead of the time.
- based on the first voice information 109 output from the first voice input unit 114 and the second voice information 110 output from the second voice input unit 115, the voice command identification unit 107 generates a control signal corresponding to the device to be operated (hereinafter referred to as “operation target device”) by the voice command input device 100, and outputs the control signal to the command issuing unit 108.
- the command issuing unit 108 converts the control signal output from the voice command identifying unit 107 into a device control signal 113 for controlling the operation target device of the voice command input device 100, and outputs the device control signal 113.
- the command issuing unit 108 is appropriately configured according to the operation target device of the voice command input device 100.
- if the device to be operated is a television receiver (hereinafter referred to as “TV”) having an infrared remote control (hereinafter abbreviated as “remote control”) signal light receiving unit, the command issuing unit 108 outputs an infrared remote control code.
- the control signal output from the voice command identification unit 107 is a remote control code for controlling the television.
- the command issuing unit 108 converts the control signal input from the voice command identification unit 107 into an infrared remote control code and outputs the infrared remote control code. Therefore, the device control signal 113 is the infrared remote control code.
- the command issuing unit 108 is not limited to a specific configuration, and is appropriately configured according to the operation target device of the voice command input device 100.
- if there are a plurality of operation target devices, a plurality of configurations corresponding to the respective operation target devices are provided.
- the voice command identification unit 107 is configured to output a control signal appropriately in accordance with the configuration of the command issuing unit 108.
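As a rough illustration only, a command issuing unit for an infrared-controlled TV might map control signals to remote control codes as sketched below; the names and hex values are hypothetical placeholders, not codes for any actual device:

```python
# hypothetical mapping from control signals to infrared remote control
# codes; the hex values are placeholders, not a real TV's codes
IR_CODES = {
    "tv_power": 0x20DF10EF,
    "volume_up": 0x20DF40BF,
}

def issue_command(control_signal: str) -> int:
    """Sketch of the command issuing unit 108: convert a control signal
    from the voice command identification unit 107 into the device
    control signal 113 (here, an infrared remote control code)."""
    return IR_CODES[control_signal]
```

A real command issuing unit would then drive an IR emitter with this code; other operation target devices would call for a different output stage.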
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, even when a plurality of users utter voice commands into the respective microphones at the same time, the voice command input device 100 can recognize each of the plurality of voice commands and execute a plurality of processes based on the results of the voice recognition.
- on the other hand, if a plurality of microphones are installed in a relatively narrow area, one voice command uttered by one speaker may be collected redundantly by the plurality of microphones. In that case, it may be mistakenly recognized as a plurality of voice commands even though it is a single voice command, and the corresponding processing may be executed redundantly.
- FIG. 2 is a flowchart showing an operation example of the voice command input device 100 according to the first embodiment.
- an operation example of the voice command identification unit 107 will be described for the case where two pieces of voice information (for example, the first voice information 109 and the second voice information 110) are input to the voice command identification unit 107 almost simultaneously.
- when only one piece of voice information is input, the voice command identification unit 107 generates and outputs a control signal corresponding to the voice command information included in that voice information; a description of this operation is omitted.
- the voice command input device 100 may be configured to include three or more voice input units, and three or more voice information may be input to the voice command identification unit 107.
- the voice command identification unit 107 extracts voice command information (for example, the first voice command information and the second voice command information) and time stamp information (for example, the first time stamp information and the second time stamp information) from each of the two pieces of voice information (for example, the first voice information 109 and the second voice information 110) (step S200).
- the voice command identification unit 107 compares the two pieces of voice command information extracted in step S200 with each other and confirms whether or not they are substantially the same (step S201).
- when it is determined in step S201 that the pieces of voice command information are not the same (No), the voice command identification unit 107 determines that the pieces of voice command information extracted in step S200 are separate voice commands derived from voices uttered by different speakers (step S202).
- the voice command identification unit 107 generates control signals (for example, two control signals) corresponding to the respective pieces of voice command information extracted in step S200, and outputs them to the command issuing unit 108 (step S203).
- when it is determined in step S201 that the pieces of voice command information are the same (Yes), the voice command identification unit 107 calculates the time difference between the two pieces of time stamp information extracted in step S200, and compares the calculated time difference with a predetermined recognition threshold (step S204).
- the recognition threshold is set to 1 second, for example, but is not limited to this value, and may be set to a value other than 1 second.
- the voice command identification unit 107 may hold a recognition threshold value in advance, or may acquire or set the recognition threshold value from the outside.
- when it is determined in step S204 that the time difference between the two pieces of time stamp information is less than or equal to the recognition threshold (Yes), the voice command identification unit 107 determines that the pieces of voice command information extracted in step S200 are one and the same, derived from a voice uttered by a single speaker (step S205).
- the voice command identifying unit 107 generates one control signal corresponding to the voice command information and outputs it to the command issuing unit 108 (step S206).
- when it is determined in step S204 that the time difference between the two pieces of time stamp information is larger than the recognition threshold (No), the voice command identification unit 107 determines that the pieces of voice command information extracted in step S200 are separate, derived from voices uttered by different speakers (step S207). That is, the voice command identification unit 107 determines that the two pieces of voice command information have the same content but should be processed as different voice commands from different speakers.
- the voice command identification unit 107 generates control signals (for example, two control signals) corresponding to the respective pieces of voice command information extracted in step S200, and outputs them to the command issuing unit 108 (step S208).
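The determination flow of steps S200 through S208 can be sketched as follows; `VoiceInfo`, `identify`, and the concrete 1-second threshold value are illustrative assumptions, not names taken from the disclosure:

```python
from dataclasses import dataclass

RECOGNITION_THRESHOLD = 1.0  # seconds; the example value given in the text

@dataclass
class VoiceInfo:
    command: str      # voice command information
    timestamp: float  # time stamp information, in seconds

def identify(first: VoiceInfo, second: VoiceInfo) -> list:
    """Return the control signals to issue for two pieces of voice
    information that arrived almost simultaneously."""
    # step S201: compare the two pieces of voice command information
    if first.command != second.command:
        # steps S202-S203: different content -> different speakers
        return [first.command, second.command]
    # step S204: same content -> compare the time difference with the threshold
    if abs(first.timestamp - second.timestamp) <= RECOGNITION_THRESHOLD:
        # steps S205-S206: one utterance collected by both microphones
        return [first.command]
    # steps S207-S208: same content, but uttered separately
    return [first.command, second.command]
```

For example, two identical commands stamped 0.4 s apart collapse into one control signal, while the same pair stamped 3 s apart yields two.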
- the voice command input device includes the first voice input unit, the second voice input unit, and the voice command identification unit.
- the first voice input unit includes a first identification information generation unit that outputs first identification information, and a first voice recognition unit that converts voice into first voice command information.
- the first voice input unit outputs first voice information including the first identification information and the first voice command information.
- the second voice input unit includes a second identification information generation unit that outputs the second identification information, and a second voice recognition unit that converts voice into second voice command information.
- the second voice input unit outputs second voice information including the second identification information and the second voice command information.
- the voice command identification unit is configured to generate and output a control signal for controlling the operation target device based on the first voice information and the second voice information. Then, the voice command identification unit generates a control signal with reference to the first identification information and the second identification information.
- the first identification information generation unit outputs, as the first identification information, first time stamp information indicating the timing at which the voice is input to the first voice input unit, and the second identification information generation unit outputs, as the second identification information, second time stamp information indicating the timing at which the voice is input to the second voice input unit.
- the voice command identification unit generates a control signal based on the time difference between the first time stamp information and the second time stamp information.
- the first voice input unit 114 is an example of the first voice input unit
- the first voice recognition unit 102 is an example of the first voice recognition unit
- the first time stamp adding unit 103 is an example of the first identification information generation unit
- the second voice input unit 115 is an example of a second voice input unit
- the second voice recognition unit 105 is an example of a second voice recognition unit
- the second time stamp adding unit 106 is an example of a second identification information generation unit
- the voice command identification unit 107 is an example of a voice command identification unit
- the first voice information 109 is an example of first voice information.
- the second voice information 110 is an example of the second voice information.
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, even if a plurality of users utter voice commands into the respective microphones at the same time, each of the plurality of voice commands can be recognized, and a plurality of processes can be executed based on the results of the voice recognition.
- the voice command input device 100 determines, based on the time stamp information, whether the voice commands collected by the plurality of microphones were uttered by one speaker or by a plurality of speakers, and can generate the control signal based on the determination result. Therefore, even if a plurality of microphones are installed in a relatively small area and one voice command uttered by one speaker is collected by a plurality of microphones, the processing based on the voice command can be executed appropriately, without being executed redundantly.
- the recognition threshold is 1 second.
- the recognition threshold may be shorter than 1 second or longer than 1 second.
- if the recognition threshold is set to a relatively long time and the user tends to repeat the same words when uttering a voice command, the interval between the repeated voice commands falls within the recognition threshold. Therefore, it is possible to increase the likelihood that a repeated voice command is determined to be one voice command, and thereby to suppress malfunction.
- the voice command identifying unit may erroneously recognize two voices as separate voice commands.
- in that case, a control signal for switching the television on or off is output twice from the voice command identification unit to the command issuing unit, and the command issuing unit receiving the control signals issues the device control signal 113 for switching the TV on or off twice; as a result, a malfunction occurs in which the television is turned off and then turned on again.
- such an unintended malfunction can be prevented by setting the recognition threshold to a relatively long time.
- conversely, if the recognition threshold is set to a relatively short time, even when a plurality of users utter the same voice command into different microphones almost simultaneously, the likelihood that the commands are recognized as different voice commands and processed correctly can be increased.
- the time information referred to by the first time stamp adding unit 103 and the time information referred to by the second time stamp adding unit 106 may be the same time information or different time information. However, if they are different, it is desirable that they be synchronized with each other.
- the first time stamp adding unit 103 and the second time stamp adding unit 106 may regularly communicate with each other so that their time information is synchronized.
- alternatively, the voice command input device may be configured so that the time information is synchronized periodically with the same time information source (a clock source, for example, a time distribution device such as an NTP (Network Time Protocol) server) by communicating with that source.
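The periodic synchronization with a shared clock source described above can be sketched as follows; the `fetch_reference_time` callback is a hypothetical stand-in for an actual NTP exchange, not an API from the disclosure:

```python
import time

class SyncedClock:
    """Time stamp source that periodically aligns itself with a shared
    time information source; fetch_reference_time stands in for a real
    NTP exchange."""

    def __init__(self, fetch_reference_time, sync_interval=60.0):
        self.fetch_reference_time = fetch_reference_time
        self.sync_interval = sync_interval  # seconds between syncs
        self.offset = 0.0
        self.last_sync = float("-inf")

    def now(self):
        local = time.monotonic()
        if local - self.last_sync >= self.sync_interval:
            # periodic communication with the shared clock source
            self.offset = self.fetch_reference_time() - local
            self.last_sync = local
        return local + self.offset
```

If each time stamp adding unit draws its time stamps from such a clock, the time differences compared in step S204 remain meaningful even across separate devices.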
- if the recognition threshold is set to a relatively short time, as described above, a malfunction may occur when the user tends to repeat the same words when uttering a voice command.
- in that case, a same-input threshold may be set in the voice command identification unit 107 as second threshold information. Even if the time difference between the two pieces of time stamp information exceeds the recognition threshold, which is the first threshold information, the two pieces of voice command information are treated as one piece of voice command information as long as the time difference is less than or equal to the same-input threshold, which is the second threshold information. This makes it possible to respond appropriately both when a plurality of users utter voice commands almost simultaneously and when a user tends to repeat the same words when uttering a voice command.
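A minimal sketch of this two-threshold rule, assuming illustrative values (a 0.5 s recognition threshold and a 3 s same-input threshold) that the disclosure does not fix:

```python
# illustrative values only; the disclosure does not fix these numbers
RECOGNITION_THRESHOLD = 0.5  # seconds, first threshold information
SAME_INPUT_THRESHOLD = 3.0   # seconds, second threshold information

def count_commands(command_a, time_a, command_b, time_b):
    """Return how many distinct voice commands two inputs represent."""
    if command_a != command_b:
        return 2  # different content -> separate commands
    diff = abs(time_a - time_b)
    if diff <= RECOGNITION_THRESHOLD:
        return 1  # duplicate pickup of one utterance
    if diff <= SAME_INPUT_THRESHOLD:
        return 1  # repeated utterance, treated as one command
    return 2      # same content, but genuinely separate commands
```

The short first threshold keeps near-simultaneous commands from different users distinct, while the longer second threshold absorbs one user's repetitions of the same command.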
- the first voice input unit 114 and the second voice input unit 115 included in the voice command input device 100 may be installed at locations separated from each other.
- alternatively, only the first microphone 101 and the second microphone 104 may be installed at locations separated from each other, with the other blocks incorporated in one device.
- in the present embodiment, the first time stamp adding unit 103 is provided as the first identification information generation unit, the first time stamp information is used as the first identification information, the second time stamp adding unit 106 is provided as the second identification information generation unit, and the second time stamp information is used as the second identification information.
- Embodiment 2 will be described with reference to FIG.
- FIG. 3 is a block diagram illustrating a configuration example of the voice command input device 300 according to the second embodiment.
- the voice command input device 300 includes a first voice input unit 318, a second voice input unit 319, a third voice input unit 320, a voice command identification unit 310, and a command issue unit 311.
- the first voice input unit 318 includes a first microphone 301, a first voice recognition unit 302, and a first position information addition unit 303.
- the second voice input unit 319 includes a second microphone 304, a second voice recognition unit 305, and a second position information addition unit 306.
- the third voice input unit 320 includes a third microphone 307, a third voice recognition unit 308, and a third position information addition unit 309.
- the user's voice collected by the first microphone 301 is input to the first voice recognition unit 302 for voice recognition processing.
- the voice collected by the first microphone 301 is referred to as a voice command 315.
- the first voice recognition unit 302 recognizes the voice command 315 and converts it into first voice command information.
- the first position information adding unit 303 outputs first position information indicating position information of a place where the first microphone 301 is installed.
- the first position information adding unit 303 is an example of a first identification information generating unit, and the first position information is an example of first identification information.
- the first voice recognition unit 302 outputs the first voice information 312 including the first voice command information and the first position information to the voice command identification unit 310.
- the user's voice collected by the second microphone 304 is input to the second voice recognition unit 305 for voice recognition processing.
- the voice collected by the second microphone 304 is referred to as a voice command 316.
- the second voice recognition unit 305 recognizes the voice command 316 and converts it into second voice command information.
- the second position information adding unit 306 outputs second position information indicating position information of a place where the second microphone 304 is installed.
- the second position information adding unit 306 is an example of a second identification information generating unit, and the second position information is an example of second identification information.
- the second voice recognition unit 305 outputs the second voice information 313 including the second voice command information and the second position information to the voice command identification unit 310.
- the user's voice collected by the third microphone 307 is input to the third voice recognition unit 308 for voice recognition processing.
- the voice collected by the third microphone 307 is assumed to be a voice command 317.
- the third voice recognition unit 308 recognizes the voice command 317 and converts it into third voice command information.
- the third position information adding unit 309 outputs, for example, third position information indicating position information of a place where the third microphone 307 is installed.
- the third position information adding unit 309 is an example of a second identification information generation unit, and the third position information is an example of second identification information.
- the third voice recognition unit 308 outputs the third voice information 314 including the third voice command information and the third position information to the voice command identification unit 310.
- each position information adding unit may be configured to detect position information using a generally available position detection technique, or may be configured to hold position information registered in advance.
- it is assumed that the first voice input unit 318 and the second voice input unit 319 are installed in the vicinity of each other (for example, in the same room), and that the name of the place is “place 1”.
- the third audio input unit 320 is assumed to be installed in a place different from the place 1 (for example, a room different from the place 1), and the name of the place is “place 2”.
- the voice uttered at the place 1 is collected by one or both of the first microphone 301 and the second microphone 304 but not collected by the third microphone 307. Also, it is assumed that the voice uttered at the place 2 is collected by the third microphone 307 but not collected by the first microphone 301 and the second microphone 304.
- it is assumed that the first position information adding unit 303 and the second position information adding unit 306 hold the same position information indicating place 1, and that the third position information adding unit 309 holds position information indicating place 2.
- based on the first voice information 312 output from the first voice input unit 318, the second voice information 313 output from the second voice input unit 319, and the third voice information 314 output from the third voice input unit 320, the voice command identification unit 310 generates a control signal corresponding to the device to be operated (operation target device) by the voice command input device 300, and outputs it to the command issuing unit 311.
- the command issuing unit 311 converts the control signal output from the voice command identifying unit 310 into a device control signal 330 for controlling the operation target device of the voice command input device 300, and outputs the device control signal 330.
- the command issuing unit 311 is appropriately configured according to the operation target device of the voice command input device 300.
- if the operation target device is a television provided with an infrared remote control signal light receiving unit, the command issuing unit 311 is an infrared remote control code output device.
- in this case, the control signal output from the voice command identification unit 310 is a remote control code for controlling the television, and the command issuing unit 311 converts the control signal input from the voice command identification unit 310 into an infrared remote control code and outputs it. Therefore, the device control signal 330 is the infrared remote control code.
- the command issuing unit 311 is not limited to a specific configuration, and is appropriately configured according to the operation target device of the voice command input device 300.
- the voice command identification unit 310 is configured to output a control signal appropriately according to the configuration of the command issuing unit 311.
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, like the voice command input device 100 shown in the first embodiment, even if a plurality of users utter voice commands into the respective microphones at the same time, the voice command input device 300 can recognize each of the plurality of voice commands and execute a plurality of processes based on the results of the voice recognition.
- like the voice command input device 100 shown in the first embodiment, the voice command input device 300 shown in the present embodiment can distinguish whether the voice commands collected by the plurality of microphones were uttered by one speaker or by a plurality of speakers, and can process them appropriately, although it does so based on position information rather than time stamp information.
- in the present embodiment, an operation example will be described for the case where three pieces of voice information (here, the first voice information 312, the second voice information 313, and the third voice information 314) are input to the voice command identification unit 310 almost simultaneously, that is, within a predetermined time (a time short enough that the input periods overlap, for example, 5 seconds). When only one piece of voice information is input, the voice command identification unit 310 generates and outputs a control signal corresponding to the voice command information included in that voice information; a description of this operation is omitted. Further, the voice command input device 300 may be configured to include two voice input units, or four or more voice input units, in which case two pieces, or four or more pieces, of voice information are input to the voice command identification unit 310.
- when a plurality of pieces of voice information are input within the predetermined time, the voice command identification unit 310 first extracts the position information from each piece of voice information and compares the pieces of position information with one another. It also extracts the voice command information from each piece of voice information and compares the pieces of voice command information with one another.
- if the pieces of position information differ from each other, the voice command identification unit 310 determines that the pieces of voice command information contained in the respective pieces of voice information are separate utterances from different speakers, and generates and outputs a control signal corresponding to each piece of voice command information.
- if the pieces of position information are the same and the pieces of voice command information are also the same, the voice command identification unit 310 determines that they are one and the same utterance from a single speaker, and generates and outputs a single control signal corresponding to that voice command information.
- if the pieces of position information are the same but the pieces of voice command information differ, the voice command identification unit 310 determines that they are separate utterances from different speakers, and generates and outputs a control signal corresponding to each piece of voice command information.
- in summary, for pieces of voice information whose position information differs, the voice command identifying unit 310 determines that the voice command information they contain consists of separate utterances from different speakers. For pieces of voice information whose position information is the same, if the voice command information they contain is the same, it is determined to be one and the same utterance from a single speaker; if the voice command information differs, it is determined to consist of separate utterances from different speakers. Based on this determination result, control signals corresponding to the voice command information are generated and output.
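The position-based determination described above can be sketched in a few lines. This is an illustrative sketch only; the function name, data shapes, and string values are assumptions for the example, not part of the disclosure:

```python
def resolve_commands(voice_infos):
    """Decide which control signals to issue from a list of
    (position_info, voice_command_info) pairs received within the
    predetermined time.

    Pairs with identical position and command information are treated
    as one utterance from a single speaker (one control signal); all
    other pairs are treated as separate utterances from different
    speakers (one control signal each)."""
    commands = []
    seen = set()
    for position, command in voice_infos:
        key = (position, command)
        if key not in seen:  # duplicate (position, command) pairs collapse to one
            seen.add(key)
            commands.append(command)
    return commands
```

For example, `resolve_commands([("kitchen", "LIGHT_ON"), ("kitchen", "LIGHT_ON"), ("living", "TV_OFF")])` collapses the duplicated kitchen utterance into one control signal while keeping the living-room command separate.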
- the voice command identifying unit 310 determines that the first voice command information and the third voice command information are separate utterances from different speakers. It therefore generates a control signal corresponding to the first voice command information and a control signal corresponding to the third voice command information, and outputs them to the command issuing unit 311.
- the voice command identification unit 310 determines that the first voice command information and the second voice command information are one and the same utterance from a single speaker. It then generates one control signal corresponding to the first voice command information (or the second voice command information) and outputs it to the command issuing unit 311.
- the voice command identifying unit 310 determines that the first voice command information and the second voice command information are separate utterances from different speakers. It therefore generates a control signal corresponding to the first voice command information and a control signal corresponding to the second voice command information, and outputs them to the command issuing unit 311.
- the voice command identification unit 310 determines that the first voice command information and the second voice command information are one and the same utterance from a single speaker. Further, if the first position information (and the second position information) and the third position information differ from each other, the voice command identification unit 310 determines that the first voice command information (and the second voice command information) and the third voice command information are separate utterances from different speakers.
- the voice command identifying unit 310 therefore generates a control signal corresponding to the first voice command information (or the second voice command information) and a control signal corresponding to the third voice command information, and outputs them to the command issuing unit 311.
- as described above, when a plurality of pieces of voice information are input within the predetermined time, the voice command identification unit 310 of the present embodiment extracts the position information from each piece of voice information and compares the pieces of position information with one another. It then generates and outputs control signals based on the comparison result and the voice command information contained in each piece of voice information. When only one piece of voice information is input within the predetermined time, it generates and outputs a control signal corresponding to the voice command information contained in that voice information.
- whether inputs fall within the predetermined time can be determined, for example, by providing a timer in the voice command identification unit 310, setting a threshold for the predetermined time in advance, and comparing the time measured by the timer with the threshold.
- the timer may be configured to start measuring when the first voice information is input and to be reset when the control signal is output.
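A minimal sketch of such a timer follows; the class and method names are assumed for illustration, the 5-second threshold follows the example above, and the injectable `now` argument exists only to make the sketch easy to exercise:

```python
import time

class CommandWindow:
    """Sketch of the timer described above.  Time measurement starts
    when the first voice information is input and is reset when the
    control signal is output; an input counts as 'within the
    predetermined time' while the measured time is at or below the
    threshold."""

    def __init__(self, threshold_s=5.0):
        self.threshold_s = threshold_s
        self.start = None

    def note_input(self, now=None):
        """Record a voice-information input; return True if it falls
        within the predetermined time window."""
        now = time.monotonic() if now is None else now
        if self.start is None:
            self.start = now  # the first input starts the timer
            return True
        return (now - self.start) <= self.threshold_s

    def reset(self):
        """Reset the timer when the control signal is output."""
        self.start = None
```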
- the voice command input device includes the first voice input unit, the second voice input unit, and the voice command identification unit.
- the first voice input unit includes a first identification information generation unit that outputs first identification information, and a first voice recognition unit that converts voice into first voice command information, and it outputs first voice information including the first identification information and the first voice command information.
- the second voice input unit includes a second identification information generation unit that outputs second identification information, and a second voice recognition unit that converts voice into second voice command information, and it outputs second voice information including the second identification information and the second voice command information.
- the voice command identification unit is configured to generate and output a control signal for controlling the operation target device based on the first voice information and the second voice information. Then, the voice command identification unit generates a control signal with reference to the first identification information and the second identification information.
- the first identification information generation unit outputs, as the first identification information, first position information indicating the place where the first voice input unit is installed, and the second identification information generation unit outputs, as the second identification information, second position information indicating the place where the second voice input unit is installed.
- the voice command identification unit generates a control signal based on the comparison between the first position information and the second position information.
- in the present embodiment, the first voice input unit 318 is an example of the first voice input unit, the first voice recognition unit 302 is an example of the first voice recognition unit, and the first position information addition unit 303 is an example of the first identification information generation unit.
- the second voice input unit 319 and the third voice input unit 320 are examples of the second voice input unit, the second voice recognition unit 305 and the third voice recognition unit 308 are examples of the second voice recognition unit, and the second position information addition unit 306 and the third position information addition unit 309 are examples of the second identification information generation unit.
- the voice command identification unit 310 is an example of the voice command identification unit, the first voice information 312 is an example of the first voice information, and the second voice information 313 and the third voice information 314 are examples of the second voice information.
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, even when a plurality of users utter voice commands to the respective microphones at the same time, each of the plurality of voice commands can be recognized, and a plurality of processes based on the results of that speech recognition can be executed.
- in addition, the voice command input device 300 can determine, based on the position information, whether a voice command collected by each of the plurality of microphones was uttered by one speaker or by a plurality of speakers, and can generate control signals based on that determination. Therefore, even when a plurality of microphones are installed in a relatively small area and one voice command uttered by one speaker is collected by a plurality of microphones, the processing based on that voice command is not executed redundantly and can be carried out appropriately.
- the position information may be set in advance, or may be obtained from commonly used position detection means, for example Wi-Fi (Wireless Fidelity) access point information, beacon information, or GPS (Global Positioning System).
- a configuration that includes such position detection means is effective when the voice input unit may move, for example when a microphone is worn by a user or when a voice input unit is installed in a moving body such as a car.
- Embodiment 3 will be described below with reference to FIGS. 4 and 5.
- FIG. 4 is a block diagram illustrating a configuration example of the voice command input device 400 according to the third embodiment.
- the voice command input device 400 includes a first voice input unit 418, a second voice input unit 419, a voice command identification unit 407, and a command issue unit 408.
- the first voice input unit 418 includes a first microphone 401, a first personal identification unit 402, and a first voice recognition unit 403.
- the second voice input unit 419 includes a second microphone 404, a second personal identification unit 405, and a second voice recognition unit 406.
- the user's voice collected by the first microphone 401 is input to the first voice recognition unit 403 for voice recognition processing.
- the voice collected by the first microphone 401 is referred to as the first voice command 409.
- the first voice command 409 is also input to the first personal identification unit 402.
- the first personal identification unit 402 analyzes the voice to identify the speaker who uttered the first voice command 409, and outputs first speaker information 414 indicating that speaker to the first voice recognition unit 403.
- the second personal identification unit 405 analyzes the voice to identify the speaker who uttered the second voice command 410, and outputs second speaker information 415 indicating that speaker to the second voice recognition unit 406.
- based on the first voice information 411 output from the first voice input unit 418 and the second voice information 412 output from the second voice input unit 419, the voice command identification unit 407 generates a control signal corresponding to the device to be operated by the voice command input device 400 (the operation target device) and outputs the control signal to the command issuing unit 408.
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, like the voice command input device 100 of the first embodiment and the voice command input device 300 of the second embodiment, the voice command input device 400 can recognize each of a plurality of voice commands even when a plurality of users utter them to the respective microphones at the same time, and can execute a plurality of processes based on the results of that speech recognition.
- in addition, the voice command input device 400 of the present embodiment can distinguish, by a technique different from those of the voice command input device 100 of the first embodiment and the voice command input device 300 of the second embodiment, whether the voice commands collected by the respective microphones were uttered by a single speaker or by a plurality of speakers, and can process them appropriately.
- FIG. 5 is a flowchart illustrating an operation example of the voice command input device 400 according to the third embodiment.
- an operation example will be described for the case where two pieces of voice information (here, the first voice information 411 and the second voice information 412) are input to the voice command identification unit 407 almost simultaneously, or are input such that their input periods overlap. When only one piece of voice information is input, the voice command identification unit 407 simply generates and outputs a control signal corresponding to the voice command information contained in that voice information, and a description of that operation is omitted.
- the voice command input device 400 may also be configured to include three or more voice input units, in which case three or more pieces of voice information are input to the voice command identification unit 407.
- the voice command identification unit 407 compares the pieces of speaker information extracted in step S500 with each other and checks whether they indicate the same speaker (step S501).
- when it is determined in step S501 that the pieces of speaker information indicate the same speaker (Yes), the voice command identification unit 407 determines that the pieces of voice command information extracted in step S500 are one and the same utterance from a single speaker (step S502).
- the voice command identifying unit 407 generates one control signal corresponding to the voice command information and outputs it to the command issuing unit 408 (step S503).
- when it is determined that the pieces of speaker information indicate different speakers, the voice command identification unit 407 generates a control signal corresponding to each piece of voice command information extracted in step S500 (for example, two control signals) and outputs them to the command issuing unit 408 (step S505).
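The flow of steps S501 through S505 can be sketched as follows. This is an illustrative sketch only; the data shapes are assumptions, with each voice information reduced to a (speaker_info, voice_command_info) pair:

```python
def resolve_by_speaker(first_info, second_info):
    """Sketch of steps S501-S505: each argument is a
    (speaker_info, voice_command_info) pair.  Returns the list of
    voice command information for which control signals are issued."""
    (speaker1, command1), (speaker2, command2) = first_info, second_info
    if speaker1 == speaker2:
        # S502/S503: same speaker -> the two inputs are one and the
        # same utterance, so a single control signal is issued.
        return [command1]
    # S505: different speakers -> separate utterances, so one control
    # signal is issued per voice command information.
    return [command1, command2]
```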
- the voice command input device includes the first voice input unit, the second voice input unit, and the voice command identification unit.
- the first voice input unit includes a first identification information generation unit that outputs first identification information, and a first voice recognition unit that converts voice into first voice command information, and it outputs first voice information including the first identification information and the first voice command information.
- the second voice input unit includes a second identification information generation unit that outputs second identification information, and a second voice recognition unit that converts voice into second voice command information, and it outputs second voice information including the second identification information and the second voice command information.
- the voice command identification unit is configured to generate and output a control signal for controlling the operation target device based on the first voice information and the second voice information. Then, the voice command identification unit generates a control signal with reference to the first identification information and the second identification information.
- the first identification information generation unit outputs first speaker information indicating the speaker of the voice input to the first voice input unit as the first identification information.
- the second identification information generation unit outputs second speaker information indicating the speaker of the voice input to the second voice input unit as second identification information.
- the voice command identification unit generates a control signal based on the comparison between the first speaker information and the second speaker information.
- in the present embodiment, the first voice input unit 418 is an example of the first voice input unit, the first voice recognition unit 403 is an example of the first voice recognition unit, and the first personal identification unit 402 is an example of the first identification information generation unit.
- the second voice input unit 419 is an example of the second voice input unit, the second voice recognition unit 406 is an example of the second voice recognition unit, and the second personal identification unit 405 is an example of the second identification information generation unit.
- the voice command identification unit 407 is an example of the voice command identification unit, the first voice information 411 is an example of the first voice information, and the second voice information 412 is an example of the second voice information.
- each of the plurality of voice input units can simultaneously receive voice commands uttered by different users. Therefore, even when a plurality of users utter voice commands to the respective microphones at the same time, each of the plurality of voice commands can be recognized, and a plurality of processes based on the results of that speech recognition can be executed.
- Embodiments 1 to 3 have been described above as examples of the technology disclosed in the present application. However, the technology in the present disclosure is not limited to them, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are made.
- in Embodiment 1, a configuration in which a time stamp adding unit is provided in each voice recognition unit has been described.
- alternatively, a time information source such as a timer may be provided inside the voice command identification unit. In that case, when voice command information is input, the voice command identification unit generates time stamp information by referring to the time information source and links that time stamp information to the voice command information. The voice command input device may be configured in this way.
- in Embodiment 2, the voice command identification unit determines whether the pieces of position information are the same by comparing them directly with each other.
- alternatively, the voice command identification unit may be configured to calculate the separation distance from two pieces of position information and to determine whether the pieces of position information are the same by comparing the calculated distance with a threshold. In this configuration, if the distance calculated from the two pieces of position information is equal to or less than the threshold (for example, 20 m), the two pieces of position information are determined to be the same; otherwise, they are determined to be different. Longitude information, latitude information, altitude information, and the like may be used as the position information.
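As a sketch of this variation, assuming latitude/longitude position information and the 20 m example threshold above (the haversine formula and spherical-Earth radius are implementation choices, not specified in the disclosure):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an implementation choice

def same_position(p1, p2, threshold_m=20.0):
    """Treat two (latitude, longitude) pairs, in degrees, as 'the same'
    position when their great-circle separation is at or below
    threshold_m metres (20 m, following the example threshold)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    # Haversine formula for the great-circle distance.
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return distance_m <= threshold_m
```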
- in Embodiment 3, a configuration has been described in which the first speaker information and the second speaker information are information that identifies the speaker (for example, a personal identification ID or a personal name).
- alternatively, the speaker information may be voiceprint information extracted from a voice command, or information indicating the feature points of a voiceprint, and the voice command identification unit may be configured to compare the pieces of voiceprint information or the voiceprint feature points with each other.
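One conceivable comparison is cosine similarity between voiceprint feature vectors, sketched below; the patent does not specify a comparison metric or threshold, so both are assumptions here:

```python
import math

def same_speaker(voiceprint_a, voiceprint_b, threshold=0.95):
    """Compare two voiceprint feature vectors by cosine similarity and
    treat them as the same speaker above a threshold.  Both the metric
    and the threshold are assumptions for this sketch."""
    dot = sum(a * b for a, b in zip(voiceprint_a, voiceprint_b))
    norm_a = math.sqrt(sum(a * a for a in voiceprint_a))
    norm_b = math.sqrt(sum(b * b for b in voiceprint_b))
    if norm_a == 0 or norm_b == 0:
        return False  # no usable features extracted
    return dot / (norm_a * norm_b) >= threshold
```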
- with this configuration as well, the voice command input device can determine whether a plurality of pieces of voice command information come from the same speaker or from different speakers.
- the voice command input device configured in this way does not need to register voiceprint information in advance, which makes it effective in places used by a large number of unspecified users, such as public spaces.
- each component shown in the present embodiment may be implemented as an electronic circuit, or may be realized by causing a processor to execute a program.
- the present disclosure is applicable to a voice command input device that allows a plurality of users to perform voice operations on devices.
- specifically, the present disclosure can be applied to systems and apparatuses configured such that a plurality of microphones are installed in different locations, the sound collected by each microphone is input to one control device, and the operation target device is controlled from that control device based on the sound.
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- User Interface Of Digital Computer (AREA)
- Selective Calling Equipment (AREA)
Abstract
Description
Hereinafter, Embodiment 1 will be described with reference to FIG. 1 and FIG. 2.
FIG. 1 is a block diagram showing a configuration example of the voice command input device 100 according to Embodiment 1.
FIG. 2 is a flowchart showing an operation example of the voice command input device 100 according to Embodiment 1.
As described above, in the present embodiment, the voice command input device includes a first voice input unit, a second voice input unit, and a voice command identification unit. The first voice input unit includes a first identification information generation unit that outputs first identification information and a first voice recognition unit that converts voice into first voice command information, and is configured to output first voice information including the first identification information and the first voice command information. The second voice input unit includes a second identification information generation unit that outputs second identification information and a second voice recognition unit that converts voice into second voice command information, and is configured to output second voice information including the second identification information and the second voice command information. The voice command identification unit is configured to generate and output, based on the first voice information and the second voice information, a control signal for controlling the operation target device. The voice command identification unit generates the control signal with reference to the first identification information and the second identification information.
In Embodiment 1, an example was described in which the first time stamp addition unit 103 is provided as the first identification information generation unit and first time stamp information is used as the first identification information, while the second time stamp addition unit 106 is provided as the second identification information generation unit and second time stamp information is used as the second identification information.
FIG. 3 is a block diagram showing a configuration example of the voice command input device 300 according to Embodiment 2.
In the present embodiment, an operation example is described for the case where three pieces of voice information (here, the first voice information 312, the second voice information 313, and the third voice information 314) are input to the voice command identification unit 310 almost simultaneously, or within a predetermined time (a time short enough that the input periods overlap; for example, 5 seconds). When only one piece of voice information is input, the voice command identification unit 310 generates and outputs a control signal corresponding to the voice command information contained in that voice information, and a description of that operation is omitted. The voice command input device 300 may also be configured with two voice input units or with four or more voice input units, in which case two or four or more pieces of voice information are input to the voice command identification unit 310.
As described above, in the present embodiment, the voice command input device includes a first voice input unit, a second voice input unit, and a voice command identification unit. The first voice input unit includes a first identification information generation unit that outputs first identification information and a first voice recognition unit that converts voice into first voice command information, and is configured to output first voice information including the first identification information and the first voice command information. The second voice input unit includes a second identification information generation unit that outputs second identification information and a second voice recognition unit that converts voice into second voice command information, and is configured to output second voice information including the second identification information and the second voice command information. The voice command identification unit is configured to generate and output, based on the first voice information and the second voice information, a control signal for controlling the operation target device. The voice command identification unit generates the control signal with reference to the first identification information and the second identification information.
In the present embodiment, an example will be described in which a personal identification unit is provided as the identification information generation unit and speaker information is used as the identification information.
FIG. 4 is a block diagram showing a configuration example of the voice command input device 400 according to Embodiment 3.
FIG. 5 is a flowchart showing an operation example of the voice command input device 400 according to Embodiment 3.
As described above, in the present embodiment, the voice command input device includes a first voice input unit, a second voice input unit, and a voice command identification unit. The first voice input unit includes a first identification information generation unit that outputs first identification information and a first voice recognition unit that converts voice into first voice command information, and is configured to output first voice information including the first identification information and the first voice command information. The second voice input unit includes a second identification information generation unit that outputs second identification information and a second voice recognition unit that converts voice into second voice command information, and is configured to output second voice information including the second identification information and the second voice command information. The voice command identification unit is configured to generate and output, based on the first voice information and the second voice information, a control signal for controlling the operation target device. The voice command identification unit generates the control signal with reference to the first identification information and the second identification information.
As described above, Embodiments 1 to 3 have been described as examples of the technology disclosed in the present application. However, the technology in the present disclosure is not limited to these, and can also be applied to embodiments in which changes, replacements, additions, omissions, and the like are made. It is also possible to combine the components described in Embodiments 1 to 3 to form new embodiments.
101,301,401 first microphone
102,302,403 first voice recognition unit
103 first time stamp addition unit
104,304,404 second microphone
105,305,406 second voice recognition unit
106 second time stamp addition unit
107,310,407 voice command identification unit
108,311,408 command issuing unit
109,312,411 first voice information
110,313,412 second voice information
111,112,315,316,317,409,410 voice command
113,330,413 device control signal
114,318,418 first voice input unit
115,319,419 second voice input unit
303 first position information addition unit
306 second position information addition unit
307 third microphone
308 third voice recognition unit
309 third position information addition unit
314 third voice information
320 third voice input unit
402 first personal identification unit
405 second personal identification unit
414 first speaker information
415 second speaker information
Claims (6)
- A voice command input device comprising: a first voice input unit including a first identification information generation unit configured to output first identification information, and a first voice recognition unit configured to convert voice into first voice command information, the first voice input unit being configured to output first voice information including the first identification information and the first voice command information; a second voice input unit including a second identification information generation unit configured to output second identification information, and a second voice recognition unit configured to convert voice into second voice command information, the second voice input unit being configured to output second voice information including the second identification information and the second voice command information; and a voice command identification unit configured to generate and output, based on the first voice information and the second voice information, a control signal for controlling an operation target device, wherein the voice command identification unit generates the control signal with reference to the first identification information and the second identification information.
- The voice command input device according to claim 1, wherein the first identification information generation unit outputs, as the first identification information, first time stamp information indicating a timing at which voice is input to the first voice input unit, the second identification information generation unit outputs, as the second identification information, second time stamp information indicating a timing at which voice is input to the second voice input unit, and the voice command identification unit generates the control signal based on a time difference between the first time stamp information and the second time stamp information.
- The voice command input device according to claim 1, wherein the first identification information generation unit outputs, as the first identification information, first position information indicating a place where the first voice input unit is installed, the second identification information generation unit outputs, as the second identification information, second position information indicating a place where the second voice input unit is installed, and the voice command identification unit generates the control signal based on a comparison between the first position information and the second position information.
- The voice command input device according to claim 1, wherein the first identification information generation unit outputs, as the first identification information, first speaker information indicating a speaker of the voice input to the first voice input unit, the second identification information generation unit outputs, as the second identification information, second speaker information indicating a speaker of the voice input to the second voice input unit, and the voice command identification unit generates the control signal based on a comparison between the first speaker information and the second speaker information.
- The voice command input device according to claim 1, wherein the first identification information generation unit outputs, as the first identification information, first speaker information indicating a feature of the voice input to the first voice input unit, the second identification information generation unit outputs, as the second identification information, second speaker information indicating a feature of the voice input to the second voice input unit, and the voice command identification unit generates the control signal based on a comparison between the first speaker information and the second speaker information.
- A voice command input method comprising: generating first identification information; converting voice into first voice command information; generating second identification information; converting voice into second voice command information; referring to the first identification information and the second identification information; and generating, based on a result of the referring and on the first voice command information and the second voice command information, a control signal for controlling an operation target device.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016510046A JP6436400B2 (ja) | 2014-03-28 | 2015-03-26 | 音声コマンド入力装置および音声コマンド入力方法 |
US15/122,429 US10074367B2 (en) | 2014-03-28 | 2015-03-26 | Voice command input device and voice command input method |
US16/055,821 US10304456B2 (en) | 2014-03-28 | 2018-08-06 | Voice command input device and voice command input method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014068192 | 2014-03-28 | ||
JP2014-068192 | 2014-03-28 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/122,429 A-371-Of-International US10074367B2 (en) | 2014-03-28 | 2015-03-26 | Voice command input device and voice command input method |
US16/055,821 Continuation US10304456B2 (en) | 2014-03-28 | 2018-08-06 | Voice command input device and voice command input method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015146179A1 true WO2015146179A1 (ja) | 2015-10-01 |
Family
ID=54194723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/001721 WO2015146179A1 (ja) | 2014-03-28 | 2015-03-26 | 音声コマンド入力装置および音声コマンド入力方法 |
Country Status (3)
Country | Link |
---|---|
US (2) | US10074367B2 (ja) |
JP (2) | JP6436400B2 (ja) |
WO (1) | WO2015146179A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017145373A1 (ja) * | 2016-02-26 | 2017-08-31 | 三菱電機株式会社 | 音声認識装置 |
JP2019032479A (ja) * | 2017-08-09 | 2019-02-28 | レノボ・シンガポール・プライベート・リミテッド | 音声アシストシステム、サーバ装置、デバイス、その音声アシスト方法、及びコンピュータが実行するためのプログラム |
JP2019095835A (ja) * | 2017-11-17 | 2019-06-20 | キヤノン株式会社 | 音声制御システム、制御方法及びプログラム |
US10448762B2 (en) | 2017-09-15 | 2019-10-22 | Kohler Co. | Mirror |
US10663938B2 (en) | 2017-09-15 | 2020-05-26 | Kohler Co. | Power operation of intelligent devices |
US10887125B2 (en) | 2017-09-15 | 2021-01-05 | Kohler Co. | Bathroom speaker |
US11093554B2 (en) | 2017-09-15 | 2021-08-17 | Kohler Co. | Feedback for water consuming appliance |
US11099540B2 (en) | 2017-09-15 | 2021-08-24 | Kohler Co. | User identity in household appliances |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
EP2954514B1 (en) | 2013-02-07 | 2021-03-31 | Apple Inc. | Voice trigger for a digital assistant |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10748539B2 (en) * | 2014-09-10 | 2020-08-18 | Crestron Electronics, Inc. | Acoustic sensory network |
US10204622B2 (en) * | 2015-09-10 | 2019-02-12 | Crestron Electronics, Inc. | Acoustic sensory network |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10783888B2 (en) * | 2015-09-10 | 2020-09-22 | Crestron Electronics Inc. | System and method for determining recipient of spoken command in a control system |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10388273B2 (en) * | 2016-08-10 | 2019-08-20 | Roku, Inc. | Distributed voice processing system |
EP3291580A1 (en) | 2016-08-29 | 2018-03-07 | Oticon A/s | Hearing aid device with speech control functionality |
JP6659514B2 (ja) * | 2016-10-12 | 2020-03-04 | 東芝映像ソリューション株式会社 | 電子機器及びその制御方法 |
US11276395B1 (en) * | 2017-03-10 | 2022-03-15 | Amazon Technologies, Inc. | Voice-based parameter assignment for voice-capturing devices |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770427A1 (en) | 2017-05-12 | 2018-12-20 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
US10983753B2 (en) | 2017-06-09 | 2021-04-20 | International Business Machines Corporation | Cognitive and interactive sensor based smart home solution |
CN107146616B (zh) * | 2017-06-13 | 2020-05-08 | Oppo广东移动通信有限公司 | 设备控制方法及相关产品 |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
CN112437955A (zh) * | 2018-05-18 | 2021-03-02 | 施耐德电气公司亚洲 | 继电器装置 |
US10892996B2 (en) * | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
CN109065051B (zh) * | 2018-09-30 | 2021-04-09 | 珠海格力电器股份有限公司 | 一种语音识别处理方法及装置 |
EP3709194A1 (en) | 2019-03-15 | 2020-09-16 | Spotify AB | Ensemble-based data comparison |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
CN110177033B (zh) * | 2019-05-21 | 2021-08-31 | 四川虹美智能科技有限公司 | 基于物联网控制家电的方法、终端及物联网系统 |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11094319B2 (en) | 2019-08-30 | 2021-08-17 | Spotify Ab | Systems and methods for generating a cleaned version of ambient sound |
US10827028B1 (en) | 2019-09-05 | 2020-11-03 | Spotify Ab | Systems and methods for playing media content on a target device |
US11328722B2 (en) | 2020-02-11 | 2022-05-10 | Spotify Ab | Systems and methods for generating a singular voice audio stream |
US11308959B2 (en) | 2020-02-11 | 2022-04-19 | Spotify Ab | Dynamic adjustment of wake word acceptance tolerance thresholds in voice-controlled devices |
US11908480B1 (en) * | 2020-03-23 | 2024-02-20 | Amazon Technologies, Inc. | Natural language processing using context |
US11386887B1 (en) | 2020-03-23 | 2022-07-12 | Amazon Technologies, Inc. | Natural language processing using context |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
CN113990298B (zh) * | 2021-12-24 | 2022-05-13 | 广州小鹏汽车科技有限公司 | 语音交互方法及其装置、服务器和可读存储介质 |
WO2023150491A1 (en) * | 2022-02-02 | 2023-08-10 | Google Llc | Speech recognition using word or phoneme time markers based on user input |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5657425A (en) * | 1993-11-15 | 1997-08-12 | International Business Machines Corporation | Location dependent verbal command execution in a computer based control system |
JPH0836480A (ja) | 1994-07-22 | 1996-02-06 | Hitachi Ltd | Information processing device |
JP3357629B2 (ja) | 1999-04-26 | 2002-12-16 | Asahi Kasei Corporation | Equipment control system |
US20030093281A1 (en) * | 1999-05-21 | 2003-05-15 | Michael Geilhufe | Method and apparatus for machine to machine communication using speech |
DE60120062T2 (de) * | 2000-09-19 | 2006-11-16 | Thomson Licensing | Voice control of electronic devices |
JP2010047093A (ja) * | 2008-08-20 | 2010-03-04 | Fujitsu Ten Ltd | Speech recognition processing device and speech recognition processing method |
KR101972955B1 (ko) * | 2012-07-03 | 2019-04-26 | Samsung Electronics Co., Ltd. | Method and apparatus for service connection between user devices using voice |
2015
- 2015-03-26 WO PCT/JP2015/001721 patent/WO2015146179A1/ja active Application Filing
- 2015-03-26 US US15/122,429 patent/US10074367B2/en active Active
- 2015-03-26 JP JP2016510046A patent/JP6436400B2/ja active Active

2018
- 2018-06-26 JP JP2018120925A patent/JP6624575B2/ja active Active
- 2018-08-06 US US16/055,821 patent/US10304456B2/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003289587A (ja) * | 2002-03-28 | 2003-10-10 | Fujitsu Ltd | Device control apparatus and device control method |
JP2005151471A (ja) * | 2003-11-19 | 2005-06-09 | Sony Corp | Audio collection/video imaging device and imaging condition determination method |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2017145373A1 (ja) * | 2016-02-26 | 2018-08-09 | Mitsubishi Electric Corporation | Speech recognition device |
WO2017145373A1 (ja) * | 2016-02-26 | 2017-08-31 | Mitsubishi Electric Corporation | Speech recognition device |
US10867596B2 (en) | 2017-08-09 | 2020-12-15 | Lenovo (Singapore) Pte. Ltd. | Voice assistant system, server apparatus, device, voice assistant method therefor, and program to be executed by computer |
JP2019032479A (ja) * | 2017-08-09 | 2019-02-28 | Lenovo (Singapore) Pte. Ltd. | Voice assistant system, server device, device, voice assistant method therefor, and program to be executed by a computer |
US11093554B2 (en) | 2017-09-15 | 2021-08-17 | Kohler Co. | Feedback for water consuming appliance |
US10663938B2 (en) | 2017-09-15 | 2020-05-26 | Kohler Co. | Power operation of intelligent devices |
US10448762B2 (en) | 2017-09-15 | 2019-10-22 | Kohler Co. | Mirror |
US10887125B2 (en) | 2017-09-15 | 2021-01-05 | Kohler Co. | Bathroom speaker |
US11099540B2 (en) | 2017-09-15 | 2021-08-24 | Kohler Co. | User identity in household appliances |
US11314214B2 (en) | 2017-09-15 | 2022-04-26 | Kohler Co. | Geographic analysis of water conditions |
US11314215B2 (en) | 2017-09-15 | 2022-04-26 | Kohler Co. | Apparatus controlling bathroom appliance lighting based on user identity |
US11892811B2 (en) | 2017-09-15 | 2024-02-06 | Kohler Co. | Geographic analysis of water conditions |
US11921794B2 (en) | 2017-09-15 | 2024-03-05 | Kohler Co. | Feedback for water consuming appliance |
US11949533B2 (en) | 2017-09-15 | 2024-04-02 | Kohler Co. | Sink device |
JP2019095835A (ja) * | 2017-11-17 | 2019-06-20 | Canon Inc. | Voice control system, control method, and program |
JP7057647B2 (ja) | 2017-11-17 | 2022-04-20 | Canon Inc. | Voice control system, control method, and program |
Also Published As
Publication number | Publication date |
---|---|
JPWO2015146179A1 (ja) | 2017-04-13 |
US10304456B2 (en) | 2019-05-28 |
JP6436400B2 (ja) | 2018-12-12 |
US20170069321A1 (en) | 2017-03-09 |
JP2018173653A (ja) | 2018-11-08 |
JP6624575B2 (ja) | 2019-12-25 |
US10074367B2 (en) | 2018-09-11 |
US20180350367A1 (en) | 2018-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6436400B2 (ja) | Voice command input device and voice command input method | |
US10743107B1 (en) | Synchronization of audio signals from distributed devices | |
EP4345816A2 (en) | Speaker attributed transcript generation | |
US11023690B2 (en) | Customized output to optimize for user preference in a distributed system | |
US11875796B2 (en) | Audio-visual diarization to identify meeting attendees | |
US11138980B2 (en) | Processing overlapping speech from distributed devices | |
US20180090138A1 (en) | System and method for localization and acoustic voice interface | |
US10812921B1 (en) | Audio stream processing for distributed device meeting | |
US20170330566A1 (en) | Distributed Volume Control for Speech Recognition | |
US8666750B2 (en) | Voice control system | |
US9792901B1 (en) | Multiple-source speech dialog input | |
US11557306B2 (en) | Method and system for speech enhancement | |
JP2018197855A (ja) | Coordination among multiple speech recognition devices | |
US11468895B2 (en) | Distributed device meeting initiation | |
JP2007168972A (ja) | Elevator control device | |
US11083069B2 (en) | Lighting control system, lighting control method, and program | |
KR101544671B1 (ko) | Sound-based low-power front-end event detection method and apparatus | |
JPWO2011118021A1 (ja) | Elevator apparatus | |
KR20140072727A (ko) | Speech recognition apparatus and method |
Legal Events
Code | Title | Details
---|---|---
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15767737; Country of ref document: EP; Kind code of ref document: A1
ENP | Entry into the national phase | Ref document number: 2016510046; Country of ref document: JP; Kind code of ref document: A
WWE | WIPO information: entry into national phase | Ref document number: 15122429; Country of ref document: US
NENP | Non-entry into the national phase | Ref country code: DE
122 | EP: PCT application non-entry in European phase | Ref document number: 15767737; Country of ref document: EP; Kind code of ref document: A1