WO2015167008A1 - Guidance device, guidance method, program, and information storage medium - Google Patents
- Publication number
- WO2015167008A1 (international application PCT/JP2015/063064)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- information
- guidance
- input
- recognition
Classifications
- G: PHYSICS / G10: MUSICAL INSTRUMENTS; ACOUSTICS / G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
  - G10L15/00: Speech recognition
    - G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    - G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
      - G10L2015/223: Execution procedure of a spoken command
      - G10L2015/225: Feedback of the input speech
  - G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    - G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
      - G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    - G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G: PHYSICS / G06: COMPUTING; CALCULATING OR COUNTING / G06F: ELECTRIC DIGITAL DATA PROCESSING
  - G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    - G06F3/16: Sound input; Sound output
      - G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
Definitions
- The present invention relates to a guidance device, a guidance method, a program, and an information storage medium.
- Speech recognition technology is known in which a plurality of pieces of information, ranked by an index such as likelihood, are identified as the recognition result of received speech.
- In such speech recognition technology, if the volume of the input voice is too low, recognition accuracy drops because of the influence of environmental sound. Recognition accuracy also drops if the volume of the input voice is too high. When recognition accuracy is low, the information the user wants recognized is either identified as a recognition result with a low recognition rank or not identified as a recognition result at all. To have that information identified as a recognition result with a high rank, the user therefore needs to speak at an appropriate volume.
- For example, when the volume of the input voice is low, outputting guidance about the speaking volume, such as “If you speak louder, your voice will be recognized more easily,” lets the user grasp the appropriate volume.
- Suppose further that this guidance is output in a manner corresponding to the recognition rank of given information, such as a magic word, spoken by the user.
- The lower the speech recognition accuracy, the lower the recognition rank of that information. The user can therefore infer the speech recognition accuracy from the way the guidance is output and, as a result, grasp the appropriate speaking volume more accurately.
- However, the recognition rank of given information has not been used to control the output of guidance about the speaking volume.
- The present invention has been made in view of the above problems, and one of its objects is to provide a guidance device, a guidance method, a program, and an information storage medium capable of controlling the output of guidance about the speaking volume by using the recognition rank obtained for received voice.
- A guidance device according to the present invention includes a reception unit that receives voice, and an output control unit that, when given information is identified as the recognition result of the voice, performs control so that guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of that information.
- Another guidance device according to the present invention includes a reception unit that receives voice; a state changing unit that, when given information is identified as the recognition result of the voice, changes the execution state from a normal state, in which a predetermined command cannot be input by voice, to a voice recognition state, in which the command can be input by voice; and an output control unit that performs control so that, after the change, guidance about the volume of voice to be input is output according to the recognized volume of the voice.
- A guidance method according to the present invention includes a step of receiving voice, and a step of performing control so that, when given information is identified as the recognition result of the voice, guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of that information.
- A program according to the present invention causes a computer to execute a procedure of receiving voice, and a procedure of performing control so that, when given information is identified as the recognition result of the voice, guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of that information.
- An information storage medium according to the present invention is a computer-readable information storage medium storing a program that causes a computer to execute a procedure of receiving voice, and a procedure of performing control so that, when given information is identified as the recognition result of the voice, guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of that information.
- In one aspect, the guidance device further includes a state changing unit that, when the given information is identified as the recognition result of voice received in a normal state, in which a predetermined command cannot be input by voice, and the recognition rank of that information is first, changes the execution state to a voice recognition state, in which the command can be input by voice; the output control unit then performs control so that the guidance is output after the change to the voice recognition state.
- In this aspect, when the given information is identified as the recognition result of voice received in the normal state and its recognition rank is other than first, the output control unit may perform control so that the guidance is output while the normal state is maintained.
- In one aspect, the output control unit performs control so that the guidance is output only when the additional condition that the volume of the recognized voice is lower than a predetermined volume is satisfied.
- FIG. 1 is a diagram illustrating an example of the overall configuration of an information processing system 10 according to an embodiment of the present invention.
- The information processing system 10 includes an information processing apparatus 12, a display 14, a camera microphone unit 16, and a controller 18.
- The information processing apparatus 12 is a computer, for example an entertainment apparatus such as a game console, and includes a control unit 20, a storage unit 22, a communication unit 24, and an input/output unit 26, as shown in FIG.
- The control unit 20 is a program control device, such as a CPU, that operates according to a program installed in the information processing apparatus 12.
- The storage unit 22 is, for example, a storage element such as a ROM or RAM, or a hard disk drive.
- The storage unit 22 stores programs executed by the control unit 20.
- The communication unit 24 is, for example, a communication interface such as a network board or a wireless LAN module.
- The input/output unit 26 is an input/output port such as an HDMI (registered trademark) (High-Definition Multimedia Interface) port or a USB port.
- The display 14 according to the present embodiment is a liquid crystal display or the like, and displays screens generated by the information processing apparatus 12.
- The display 14 also includes a speaker that outputs sound represented by audio data generated by the information processing apparatus 12.
- The camera microphone unit 16 according to the present embodiment includes, for example, a camera 16a that outputs images of a subject to the information processing apparatus 12, and a microphone 16b that picks up surrounding sound, converts it into audio data, and outputs the data to the information processing apparatus 12.
- The information processing apparatus 12 and the display 14 are connected via, for example, an AV cable or an HDMI cable.
- The information processing apparatus 12 and the camera microphone unit 16 are connected via, for example, a USB cable, an AV cable, or an HDMI (registered trademark) cable.
- The controller 18 according to the present embodiment is an operation input device for making operation inputs to the information processing apparatus 12.
- The controller 18 is provided with operating elements such as buttons, a touch panel, and operation sticks.
- The controller 18 also includes sensors such as a gyro sensor that detects angular velocity and an acceleration sensor that detects acceleration.
- The controller 18 also includes a jack; inserting the plug of a microphone into the jack enables voice input through that microphone.
- Voice input to the microphone plugged into the controller 18 is converted into audio data by the controller 18 and output to the information processing apparatus 12.
- When the user speaks into the microphone 16b of the camera microphone unit 16, the information processing apparatus 12 recognizes the voice and executes various processes according to the recognition result. In this way, in the present embodiment, the user can operate the information processing apparatus 12 by voice.
- The recognition result of voice input to the microphone plugged into the controller 18 is handled with priority over the recognition result of voice input to the microphone 16b of the camera microphone unit 16.
- The user can make various operation inputs with the controller 18 by pressing buttons or tilting the operation sticks.
- The controller 18 outputs input data associated with each operation input to the information processing apparatus 12.
- The controller 18 includes a USB port.
- When connected to the information processing apparatus 12 with a USB cable, the controller 18 can output input data to the information processing apparatus 12 by wire via the input/output unit 26.
- The controller 18 also includes a wireless communication module and the like, and can output input data to the information processing apparatus 12 wirelessly via the communication unit 24.
- A known speech recognition engine is installed in the information processing apparatus 12 according to the present embodiment.
- The speech recognition engine identifies a plurality of pieces of information, ranked by an index such as likelihood, as the recognition result of voice input to, that is, received by, the information processing apparatus 12.
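The ranked recognition result can be pictured as an n-best list. The following Python sketch is illustrative only: the `Candidate` type, the `recognition_rank` helper and the likelihood values are hypothetical and are not the API of any particular speech recognition engine. It sorts candidates by descending likelihood and returns the 1-based rank at which a given phrase, such as the magic word MW, appears.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One entry of a hypothetical n-best recognition result."""
    text: str
    likelihood: float


def recognition_rank(candidates: Sequence[Candidate], phrase: str) -> Optional[int]:
    """Return the 1-based rank of `phrase`, or None if it was not recognized.

    Candidates are ranked by descending likelihood, mirroring the
    "ranked by an index such as likelihood" wording of the description.
    """
    ranked = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
    for rank, candidate in enumerate(ranked, start=1):
        if candidate.text == phrase:
            return rank
    return None


if __name__ == "__main__":
    n_best = [
        Candidate("start voice recognition", 0.31),
        Candidate("take a screen shot", 0.42),
        Candidate("login", 0.05),
    ]
    print(recognition_rank(n_best, "start voice recognition"))  # 2
```

Returning `None` rather than a large number keeps "not recognized at all" distinct from "recognized with a low rank", which the description treats as separate outcomes.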
- The speech recognition engine can also identify, based on the input voice, the relative volume of the sound that the information processing apparatus 12 recognizes as the user's voice.
- For example, it can identify the ratio of the volume of sound recognized by the information processing apparatus 12 as the user's voice to the volume of sound recognized as ambient noise, that is, environmental sound.
- The ratio identified in this way is hereinafter referred to as the SNR (signal-to-noise ratio).
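The threshold L1 used later is given in decibels (5 dB in the example), but the description does not say how the ratio is converted. A minimal sketch, assuming the conventional definition for power quantities, SNR_dB = 10 log10(P_voice / P_noise); `voice_power`, `noise_power` and the `mean_square` helper are hypothetical, and the engine is assumed to have already separated voice from environmental sound. If amplitudes rather than powers were compared, the factor would be 20 instead of 10.

```python
import math


def snr_db(voice_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels for two power (mean-square) levels.

    Assumes both powers are positive and measured in the same units.
    """
    if voice_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(voice_power / noise_power)


def mean_square(samples: list[float]) -> float:
    """Mean-square power of a block of audio samples (hypothetical helper)."""
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    # Voice about 3.16 times the noise power sits right at the 5 dB example threshold.
    print(round(snr_db(10 ** 0.5, 1.0), 6))  # 5.0
```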
- The information processing apparatus 12 executes various processes in one of two execution states: a normal state, in which predetermined commands cannot be input by voice, and a voice recognition state, in which they can. The user can switch the execution state of the information processing apparatus 12 as appropriate.
- FIG. 3 is a diagram illustrating an example of the home screen 30 displayed on the display 14 according to the present embodiment.
- On the home screen 30, the user can select one of the programs installed in the information processing apparatus 12.
- Program icon images 32 (program icon images 32-1 to 32-5 in the example of FIG. 3), each associated with a program, are arranged on the home screen 30 for at least some of the programs installed in the information processing apparatus 12.
- One of the program icon images 32, namely the program icon image 32-1 in the example of FIG. 3, is in the selected state.
- The selected program icon image 32-1 is displayed differently from the other program icon images 32.
- Hereinafter, the option in the selected state is referred to as the option of interest.
- In this example, the program icon image 32-1 is the option of interest.
- A character string representing the name of the program associated with the program icon image 32-1 is arranged at the lower right of the program icon image 32-1, the option of interest.
- The program icon image 32-1, the option of interest, is highlighted relative to the other program icon images 32 (32-2 to 32-5).
- Specifically, the program icon image 32-1 is larger than the other program icon images 32 (32-2 to 32-5), and a frame is arranged around it.
- These features indicate that the program icon image 32-1 is selected, that is, that it is the option of interest.
- The method of indicating that a program icon image 32 is the option of interest is not limited to that shown in FIG.
- When a predetermined time, for example 10 seconds, elapses after the home screen 30 shown in FIG. 3 is displayed, the home screen 30 displayed on the display 14 changes to the state shown in FIG.
- Thereafter, the home screen 30 alternates between the state shown in FIG. 4 and the state shown in FIG. 5 at a predetermined interval, for example every 3 seconds.
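To make this timing concrete, the following sketch computes which home screen state is shown t seconds after the screen of FIG. 3 appears, using the example values of 10 seconds and 3 seconds. It assumes, as a reading of the description rather than a stated fact, that the cycle begins with the state of FIG. 4; the function name and the string labels are hypothetical.

```python
def home_screen_state(t: float, initial_hold: float = 10.0, period: float = 3.0) -> str:
    """Return the figure label of the home screen state at time t (seconds).

    Before `initial_hold` elapses the FIG. 3 state is shown; afterwards the
    screen alternates between the FIG. 4 and FIG. 5 states every `period`
    seconds, starting (by assumption) with FIG. 4.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t < initial_hold:
        return "FIG. 3"
    slot = int((t - initial_hold) // period)
    return "FIG. 4" if slot % 2 == 0 else "FIG. 5"
```

For example, `home_screen_state(11.0)` falls in the first 3-second slot after the hold and therefore returns the FIG. 4 label.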
- A controller operation guide image 34 is arranged on the home screen 30.
- In the controller operation guide image 34, operation guide information OI is arranged that guides the user through operations on the controller 18.
- A magic word guidance image 36 is arranged on the home screen 30.
- In the magic word guidance image 36, magic word guidance information MI is arranged that prompts the user to speak the given information representing the voice for starting voice recognition.
- The magic word guidance information MI includes the given information representing the voice for starting voice recognition.
- FIG. 5 shows the phrase “start voice recognition” as an example of the given information.
- Hereinafter, the given phrase representing the voice for starting voice recognition is referred to as the magic word MW.
- In this way, the user is guided on both the operations to be performed on the controller 18 and the voice to be input.
- In some cases, the home screen 30 is not switched to the state in which the magic word guidance image 36 is arranged.
- At this point, the execution state of the information processing apparatus 12 is the normal state.
- Suppose that the information processing apparatus 12 receives the user's voice while the displayed home screen 30 is in the state shown in FIG. 3, FIG. 4, or FIG. 5. In this case, the process that the rank processing relation data shown in FIG. 6 associates with the combination of the SNR value identified from the voice and the recognition rank of the magic word MW identified from the sound recognized as the user's voice is executed.
- The rank processing relation data shown in FIG. 6 indicates the relationship between the recognition rank of given information, such as the magic word MW, and the name of the process to be executed.
- In the rank processing relation data, combinations of a volume condition and the recognition rank of the given information are managed in association with the names of the processes to be executed. More specifically, for example, combinations of a condition on the SNR value identified from the received voice and the recognition rank of the magic word MW are managed in association with the names of the processes to be executed.
- For example, suppose that the identified SNR value is greater than a predetermined value L1 (here, for example, 5 dB) and the recognition result with the first recognition rank is the magic word MW (here, for example, “start voice recognition”). In this case, the execution state of the information processing apparatus 12 is changed to the voice recognition state.
- When the identified SNR value is equal to or less than L1 and the recognition rank of the magic word MW is first, the execution state is changed to the voice recognition state and guidance about the volume of voice to be input is displayed. In this case, the home screen 30 is switched to the state shown in FIG.
- When the identified SNR value is equal to or less than L1 and the recognition rank of the magic word MW is second, guidance about the volume of voice to be input is output while the execution state of the information processing apparatus 12 remains the normal state. In this case, the home screen 30 is switched to the state shown in FIG.
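The rank processing relation data of FIG. 6 can be read as a lookup from an (SNR condition, magic word rank) pair to a process. The sketch below encodes the combinations this description gives: SNR above L1 with MW ranked first, SNR at or below L1 with MW ranked first, and SNR at or below L1 with MW ranked second, using the example threshold L1 = 5 dB. The `State` enum, the `Decision` type and `decide` are hypothetical names, and treating every other combination (for instance an SNR above L1 with MW ranked second) as "no change" is an assumption, since those cases are not listed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

L1_DB = 5.0  # example value of the predetermined value L1


class State(Enum):
    NORMAL = "normal state"
    VOICE_RECOGNITION = "voice recognition state"


@dataclass(frozen=True)
class Decision:
    next_state: State
    show_volume_guidance: bool


def decide(snr_db: float, magic_word_rank: Optional[int],
           current: State = State.NORMAL) -> Decision:
    """Pick the process for received voice from the SNR and MW rank.

    magic_word_rank is the 1-based rank of the magic word MW in the
    recognition result, or None if MW was not recognized at all.
    """
    if current is not State.NORMAL:
        # In the voice recognition state, VI is not shown even at low SNR.
        return Decision(current, False)
    if magic_word_rank == 1:
        # MW ranked first: enter the voice recognition state and add
        # volume guidance only when the voice was relatively quiet.
        return Decision(State.VOICE_RECOGNITION, snr_db <= L1_DB)
    if magic_word_rank == 2 and snr_db <= L1_DB:
        # MW ranked second at low SNR: stay in the normal state, show guidance.
        return Decision(State.NORMAL, True)
    return Decision(State.NORMAL, False)
```

Keeping the state change and the guidance flag in one value mirrors the way FIG. 6 associates each condition pair with a single named process.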
- When the execution state of the information processing apparatus 12 is the voice recognition state, a voice input guidance image 38, which guides the user on the voice to be input when entering commands by voice on the home screen 30, is arranged as shown in FIGS. 7 and 8.
- When the execution state of the information processing apparatus 12 is the voice recognition state and the user speaks, the information processing apparatus 12 recognizes the information represented by the voice and, based on the recognition result, identifies the command represented by the voice. The information processing apparatus 12 then executes processing according to that command.
- In the voice input guidance image 38, at least one piece of command information CI indicating a command is arranged.
- In the present embodiment, a word indicating a command is arranged as the command information CI.
- Instead of a word, other information such as an icon image symbolizing the command may be arranged as the command information CI.
- Each piece of command information CI is associated with a command that the information processing apparatus 12 can accept.
- When the user speaks the voice represented by a piece of command information CI, processing corresponding to the command associated with that command information CI is executed.
- Pieces of command information CI are arranged on the home screen 30 illustrated in FIGS.
- A command identification image CIP is arranged to the left of each piece of command information CI on the home screen 30.
- The command identification image CIP lets the user recognize that speaking the voice represented by the command information CI to its right causes the processing corresponding to the associated command to be executed.
- Suppose that the home screen 30 shown in FIG. 7 or FIG. 8 is displayed on the display 14. When the information processing apparatus 12 receives a voice representing the phrase “begin”, execution of the program associated with the program icon image 32 that is the option of interest is started.
- In response to another voice command, a screen from which the information processing apparatus 12 can be turned off is displayed on the display 14.
- When the information processing apparatus 12 receives a voice representing the phrase “take a screen shot”, an image capturing the display content of the home screen 30 as a still image is stored in the storage unit 22 of the information processing apparatus 12.
- When the information processing apparatus 12 receives a voice representing the phrase “login”, a screen showing a list of users is displayed on the display 14. On that screen, the user logs in by speaking a user identifier registered in the information processing apparatus 12.
- An operating element such as a button of the controller 18 is assigned to each command represented by the command information CI.
- When such a button is pressed, processing corresponding to the command associated with the button is executed.
- Thus, processing according to the command represented by the command information CI can be executed either by operating the operating element or by voice input.
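Because each command can be triggered either by voice or by an assigned operating element, one natural structure is a single set of command identifiers reached from both input paths. In the following sketch only the phrases “begin”, “take a screen shot” and “login” come from the description; the command identifiers, the button names, the helper functions, and the choice to look up only the first-ranked recognition result are hypothetical.

```python
from typing import Optional

# Hypothetical command identifiers keyed by the spoken phrase (command information CI).
VOICE_COMMANDS = {
    "begin": "start_program",
    "take a screen shot": "capture_screen",
    "login": "show_user_list",
}

# Hypothetical assignment of controller buttons to the same commands.
BUTTON_COMMANDS = {
    "BUTTON_A": "start_program",
    "BUTTON_SHARE": "capture_screen",
}


def command_from_voice(top_result: str) -> Optional[str]:
    """Map the first-ranked recognition result to a command, if any."""
    return VOICE_COMMANDS.get(top_result)


def command_from_button(button: str) -> Optional[str]:
    """Map a pressed operating element to a command, if one is assigned."""
    return BUTTON_COMMANDS.get(button)


def execute(command: Optional[str], log: list[str]) -> None:
    """Stand-in for the process executing unit: records what would run."""
    if command is not None:
        log.append(command)
```

Both paths converge on the same identifier, so the handler for, say, the screenshot command is written once regardless of how it was triggered.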
- Each displayed program icon image 32 is associated with at least one piece of information, such as the name, abbreviation, or common name of the associated program.
- In the voice input guidance image 38 shown in FIGS., program name input guidance information PI is arranged that prompts the user to speak the name of a program such as a game.
- When the user speaks such information, the program icon image 32 associated with the corresponding program is identified as the option of interest.
- When the spoken information is associated with a plurality of programs, the program icon image 32 associated with one of those programs is identified as the option of interest.
- For example, suppose that the home screen 30 shown in FIG. 7 is displayed and the information processing apparatus 12 receives a voice representing the phrase “dragon game”. In this case, the program icon image 32-4 is identified as the option of interest, and the displayed home screen 30 is updated so that the program icon image 32-4 is highlighted.
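Each program icon can be reached through several spoken forms (name, abbreviation, common name). A minimal sketch, assuming a hypothetical alias table and simple case-insensitive exact matching; only the phrase “dragon game” and the icon 32-4 come from the example above, the other titles are invented, and picking the first match in table order when several programs share an alias is just one possible reading of “one of those programs”.

```python
from typing import Optional

# Hypothetical alias table: icon identifier -> spoken forms of its program.
PROGRAM_ALIASES = {
    "32-1": ["space racer", "racer"],
    "32-4": ["dragon game", "dragon"],
    "32-5": ["dragon puzzle", "dragon"],
}


def option_of_interest(utterance: str) -> Optional[str]:
    """Return the icon to highlight for a spoken program name, or None.

    When several programs share the alias, the first one in table order
    is chosen.
    """
    spoken = utterance.strip().lower()
    for icon_id, aliases in PROGRAM_ALIASES.items():
        if spoken in (alias.lower() for alias in aliases):
            return icon_id
    return None
```

A real system would more likely match against the ranked recognition candidates than a single string, but exact matching keeps the association itself easy to see.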
- The command corresponding to the received voice may be identified only when the volume of the received voice is within a predetermined volume range.
- The lower limit of this predetermined volume range may be larger than the predetermined value L1.
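The acceptance condition can be written as a simple range test. In this sketch the voice level is assumed to be measured as an SNR in dB so that it can be compared with L1, and both range limits are hypothetical example values (the description gives no range), with the lower limit placed above L1 as the description allows.

```python
L1_DB = 5.0            # example value of the predetermined value L1
LOWER_LIMIT_DB = 8.0   # hypothetical lower limit, chosen above L1
UPPER_LIMIT_DB = 30.0  # hypothetical upper limit


def accepts_command(snr_db: float,
                    lower: float = LOWER_LIMIT_DB,
                    upper: float = UPPER_LIMIT_DB) -> bool:
    """Return True when a command may be identified for voice at this level.

    With lower > L1, voice whose level lies between L1 and `lower` is not
    accepted as a command even though it would not trigger the low-volume
    guidance.
    """
    return lower <= snr_db <= upper


assert LOWER_LIMIT_DB > L1_DB  # the optional configuration described above
```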
- In the present embodiment, volume guidance information VI, that is, guidance about the volume of voice to be input, is displayed.
- The volume guidance information VI is arranged in the voice input guidance image 38.
- For example, the character string “Speaking louder makes your voice easier to recognize” is arranged in the voice input guidance image 38.
- The volume guidance information VI may be displayed only at the timing when the execution state of the information processing apparatus 12 changes from the normal state to the voice recognition state.
- While the execution state of the information processing apparatus 12 is the voice recognition state, the volume guidance information VI is not displayed even if the SNR value identified from the received voice is equal to or less than the predetermined value L1.
- In some cases, the volume guidance information VI is arranged in place of the magic word guidance information MI shown in FIG.
- As described above, in the present embodiment, guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of given information, such as the magic word MW, identified as the speech recognition result.
- For example, when the recognition rank of given information such as the magic word MW is first, the volume guidance information VI described above is arranged in the voice input guidance image 38 displayed when the execution state of the information processing apparatus 12 is the voice recognition state.
- When the recognition rank of given information such as the magic word MW is second, the volume guidance information VI described above is arranged in the magic word guidance image 36 displayed when the execution state of the information processing apparatus 12 is the normal state.
- Because guidance about the volume of voice to be input is output in a manner corresponding to the recognition rank of the given information identified as the speech recognition result, the user can infer the speech recognition accuracy from the way the guidance is output. As a result, the user can grasp the appropriate speaking volume more accurately.
- In the present embodiment, when the volume of the recognized user's voice is relatively low compared with the volume of ambient noise, that is, environmental sound, guidance prompting the user to speak louder is output.
- However, speech recognition accuracy also drops when the volume of the input voice is too high. Therefore, for example, when the volume of the recognized voice is higher than a predetermined volume, guidance about the volume of voice to be input may be output in a manner corresponding to the recognition rank of given information, such as the magic word MW, identified as the speech recognition result.
- In this case, for example, the character string “Speaking more quietly makes your voice easier to recognize” may be displayed.
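Combining the low-volume and high-volume cases, the choice of guidance text can be sketched as follows. Both cases are assumed here to be judged on the same SNR measure; L1 = 5 dB is the description's example, while the upper threshold is invented for illustration, and the two message strings are the examples given above.

```python
from typing import Optional

L1_DB = 5.0         # example value of L1: at or below this, voice is too quiet
TOO_LOUD_DB = 35.0  # hypothetical upper threshold, not from the description

LOUDER = "Speaking louder makes your voice easier to recognize"
QUIETER = "Speaking more quietly makes your voice easier to recognize"


def volume_guidance(snr_db: float) -> Optional[str]:
    """Return the volume guidance VI text to show, or None if none is needed."""
    if snr_db <= L1_DB:
        return LOUDER
    if snr_db > TOO_LOUD_DB:
        return QUIETER
    return None
```

Where and whether this text is shown would still depend on the magic word's recognition rank, as in the rank-based rules described earlier.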
- In this case as well, guidance about the volume of voice to be input may be output in a manner corresponding to the recognition rank of given information, such as the magic word MW, identified as the speech recognition result.
- The guidance about the volume of voice to be input may also be output as audio.
- For example, suppose that the SNR value identified from the input voice is equal to or less than the predetermined value L1.
- In this case, when the recognition rank of given information such as the magic word MW is first, the execution state of the information processing apparatus 12 may be changed to the voice recognition state, after which audio with content such as “speaking louder makes it easier to recognize” is output.
- When the recognition rank of the given information such as the magic word MW is second, the execution state of the information processing apparatus 12 remains the normal state, and audio representing the content “” may be output.
- The execution state of the information processing apparatus 12 is also changed to the voice recognition state when a predetermined operating element is pressed while the home screen 30 is in the state shown in FIG. 3, FIG. 4, or FIG. 5.
- In this case, the home screen 30 is switched to the state shown in FIG.
- In this way, in the present embodiment, the execution state of the information processing apparatus 12 can be set to the voice recognition state either by operating the controller 18 or by speaking the magic word MW.
- When neither an operation on the controller 18 nor voice input is performed for a predetermined time, for example 10 seconds, after the execution state of the information processing apparatus 12 becomes the voice recognition state, the execution state of the information processing apparatus 12 is changed to the normal state.
- Likewise, when the controller 18 is operated after the execution state of the information processing apparatus 12 becomes the voice recognition state, the execution state of the information processing apparatus 12 is changed to the normal state.
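These transitions can be modelled as a small state machine. This is a sketch under stated assumptions: timestamps in seconds are passed in explicitly from any monotonic clock, `ExecutionStateMachine` and its method and parameter names (including `is_voice_button` for the predetermined operating element) are hypothetical, and the 10-second timeout is the description's example value. It covers both ways of entering the voice recognition state and both ways of returning to the normal state.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal state"
    VOICE_RECOGNITION = "voice recognition state"


class ExecutionStateMachine:
    """Hypothetical model of the execution-state rules described above."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.state = State.NORMAL
        self.timeout = timeout
        self._last_activity = 0.0

    def _enter_voice_recognition(self, now: float) -> None:
        self.state = State.VOICE_RECOGNITION
        self._last_activity = now

    def on_magic_word(self, now: float) -> None:
        """Magic word MW accepted in the normal state."""
        if self.state is State.NORMAL:
            self._enter_voice_recognition(now)

    def on_controller(self, now: float, is_voice_button: bool = False) -> None:
        """Controller operation: may enter, or always leaves, voice recognition."""
        if self.state is State.NORMAL:
            if is_voice_button:
                self._enter_voice_recognition(now)
        else:
            # Any controller operation in the voice recognition state ends it.
            self.state = State.NORMAL

    def on_voice_input(self, now: float) -> None:
        """Voice input in the voice recognition state counts as activity."""
        if self.state is State.VOICE_RECOGNITION:
            self._last_activity = now

    def tick(self, now: float) -> State:
        """Return to the normal state after `timeout` seconds of inactivity."""
        if (self.state is State.VOICE_RECOGNITION
                and now - self._last_activity >= self.timeout):
            self.state = State.NORMAL
        return self.state
```

Passing `now` explicitly instead of reading a clock inside the class keeps the timeout rule deterministic and easy to check.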
- When the execution state of the information processing apparatus 12 changes to the normal state while the home screen 30 shown in FIG. 7 or FIG. 8 is displayed on the display 14, the displayed home screen 30 is changed to the one shown in FIG.
- The home screen 30 may also be updated to the one shown in FIG. 7, in which the volume guidance information VI is not arranged. Further, suppose that neither an operation on the controller 18 nor voice input is performed for a predetermined time while the home screen 30 in the normal state is displayed as illustrated in FIG. In this case, the home screen 30 may be updated to the one shown in FIG. 5, in which the volume guidance information VI is not arranged.
- The volume of sound output from the speaker may be controlled to be low.
- The information processing apparatus 12 may be able to recognize the voice represented by the command information CI even when its execution state is the normal state.
- Suppose that the displayed home screen 30 is in the state shown in FIG. 3, FIG. 4, or FIG. and that the SNR value identified from the received voice is equal to or less than the predetermined value L1.
- Suppose further that the recognition result with the first recognition rank is one of the pieces of command information CI described above, and the recognition result with the second recognition rank is the magic word MW.
- In this case, the home screen 30 may be switched to the state shown in FIG.
- Alternatively, the home screen 30 may be switched to the state shown in FIG. Further, the home screen 30 may be switched to the state shown in FIG. 8 even when the recognition rank of the magic word MW is third or lower.
- FIG. 10 is a functional block diagram illustrating an example of functions related to execution state change control and volume guidance information VI display control of the information processing apparatus 12 implemented in the information processing apparatus 12 according to the present embodiment. Note that the information processing apparatus 12 according to the present embodiment does not have to include all the functions illustrated in FIG. 10, and functions other than the functions illustrated in FIG. 10 may be mounted.
- the information processing apparatus 12 functionally includes, for example, a rank processing relation data storage unit 40, a state management data storage unit 42, an operation reception unit 44, a voice reception unit 46, and a voice.
- a recognition unit 48, a process specifying unit 50, a state changing unit 52, a process executing unit 54, a screen generating unit 56, and an output control unit 58 are included.
- The rank process relation data storage unit 40 and the state management data storage unit 42 are mainly implemented by the storage unit 22.
- The operation reception unit 44 is mainly implemented by the communication unit 24 or the input/output unit 26.
- The voice reception unit 46 is mainly implemented by the input/output unit 26.
- The output control unit 58 is mainly implemented by the input/output unit 26.
- The other functions are mainly implemented by the control unit 20.
- The voice recognition unit 48 corresponds to a function implemented by the voice recognition engine described above.
- These functions are implemented by the control unit 20 executing a program. The program is installed in the information processing apparatus 12, which is a computer, and includes instructions corresponding to the functions.
- This program is supplied to the information processing apparatus 12 via a computer-readable information storage medium such as an optical disk, a magnetic disk, a magnetic tape, a magneto-optical disk, or a flash memory, or via communication means such as the Internet.
- The rank process relation data storage unit 40 stores the rank process relation data illustrated in FIG.
- The state management data storage unit 42 stores state management data for managing the execution state of the information processing apparatus 12.
- The state management data takes one of two values: “normal state” or “voice recognition state”.
- The operation reception unit 44 receives operations on the controller 18, for example a signal indicating which button was pressed.
- The voice reception unit 46 receives voice, for example voice that the user inputs to the microphone 16b.
- The voice recognition unit 48 recognizes the voice received by the voice reception unit 46.
- As the recognition result for that voice, the voice recognition unit 48 identifies at least one piece of information, ranked by an index such as likelihood.
- Based on the received voice, the voice recognition unit 48 also determines the relative volume of the sound that the information processing apparatus 12 recognizes as the user's voice. An example of this relative volume is the SNR value described above.
- The process specifying unit 50 identifies the process to be executed.
- For example, the process specifying unit 50 identifies the names of the processes to be executed based on three inputs:
  - the rank process relation data stored in the rank process relation data storage unit 40;
  - the SNR value determined by the voice recognition unit 48;
  - the information identified as the recognition result by the voice recognition unit 48.
- The process specifying unit 50 also identifies the process corresponding to a command associated with the command information CI described above. It does so based on the recognition result from the voice recognition unit 48 or on the operation received by the operation reception unit 44.
- The state changing unit 52 changes the execution state of the information processing apparatus 12.
- For example, the state changing unit 52 changes the value of the state management data stored in the state management data storage unit 42. The change is based on the recognition result from the voice recognition unit 48 or on the operation received by the operation reception unit 44.
- For example, the state changing unit 52 may act when given information such as the magic word MW is identified at a predetermined recognition rank, such as first. In that case, it may change the execution state of the information processing apparatus 12 to the voice recognition state by changing the value of the state management data to “voice recognition state”.
- When the process specifying unit 50 identifies a process corresponding to a command associated with the command information CI described above, the process executing unit 54 executes that process.
- The screen generation unit 56 generates data representing a screen such as the home screen 30.
- The screen generation unit 56 generates such data at a predetermined frame rate. It bases the data on the SNR value determined by the voice recognition unit 48, the value of the state management data stored in the state management data storage unit 42, the result of the process executed by the process executing unit 54, and similar inputs.
- Suppose that the voice recognition unit 48 identifies given information, such as the magic word MW, as its voice recognition result. The output control unit 58 then performs control so that guidance regarding the volume of the voice to be input is output. The manner of output corresponds to the recognition rank of that information. For example, suppose the given information is identified in voice received while the execution state of the information processing apparatus 12 is the normal state, and the information is ranked first. In this case, the output control unit 58 according to the present embodiment first has the execution state changed from the normal state to the voice recognition state. It then performs control so that the guidance is output.
- When the given information is ranked other than first, the output control unit 58 performs control so that the guidance is output while the execution state of the information processing apparatus 12 remains the normal state.
- The output control unit 58 may add one more condition for outputting the guidance: the volume of the recognized voice must be lower than a predetermined volume.
- The output control unit 58 also performs control so that the screen generated by the screen generation unit 56 is output.
- For example, every time the screen generation unit 56 generates data representing a screen at the predetermined frame rate, the output control unit 58 outputs that data to the display 14.
- The display 14 then displays the screen corresponding to the data, so screens are displayed on the display 14 at the predetermined frame rate.
- The output control unit 58 may also control the output of guidance regarding the volume of the voice to be input in the following way.
- After the execution state of the information processing apparatus 12 is changed to the voice recognition state, the screen generation unit 56 may generate data representing a screen on which volume guidance information VI is placed. The volume guidance information VI corresponds to the volume of the recognized voice.
- In this way, once the execution state of the information processing apparatus 12 has been changed, the output control unit 58 performs control so that guidance regarding the volume of the voice to be input is output. This guidance corresponds to the volume of the recognized voice.
- The flowchart shown in FIG. illustrates an example of the processing that the information processing apparatus 12 according to the present embodiment performs when the voice reception unit 46 receives voice. This flow is described below. In this example, the execution state of the information processing apparatus 12 is the normal state. The value of the state management data stored in the state management data storage unit 42 is therefore “normal state”.
- First, the voice recognition unit 48 determines the SNR value based on the voice received by the voice reception unit 46 (S101). The voice recognition unit 48 then identifies at least one piece of ranked information as the recognition result of the same voice (S102).
- The process specifying unit 50 then identifies the names of the processes to be executed (S103). It bases this on the rank process relation data stored in the rank process relation data storage unit 40, the SNR value determined in S101, and the information identified in S102.
- Suppose, for example, that the SNR value determined in S101 is greater than 5 dB and that the magic word MW is the first-ranked recognition result in S102. In this case, “change to voice recognition state” is identified as the name of the process to be executed. Suppose instead that the SNR value determined in S101 is 5 dB or less and that the magic word MW is again the first-ranked recognition result in S102. In this case, both “change to voice recognition state” and “guidance display” are identified as the names of the processes to be executed.
- Next, the state changing unit 52 checks whether the names of the processes identified in S103 include “change to voice recognition state” (S104). If they do (S104: Y), the state changing unit 52 changes the value of the state management data stored in the state management data storage unit 42 to “voice recognition state” (S105).
- The screen generation unit 56 then generates the home screen 30 (S106). This happens either when S104 finds that the names identified in S103 do not include “change to voice recognition state” (S104: N), or once S105 has finished. In S106, the home screen 30 depends on two things: the value of the state management data stored in the state management data storage unit 42, and whether “guidance display” is among the names of the processes identified in S103. For example, when the value of the state management data is “voice recognition state”, the screen generation unit 56 generates a home screen 30 on which the voice input guidance image 38 is placed.
- A home screen 30 on which the magic word guidance image 36 is placed may also be generated. When “guidance display” is among the names of the processes identified in S103, the screen generation unit 56 generates a home screen 30 on which the volume guidance information VI is placed. In this processing example, if none of the above conditions applies, the screen generation unit 56 generates the same screen as the one currently displayed in S106.
- Finally, the output control unit 58 outputs the data representing the home screen 30 generated in S106 to the display 14 (S107), and the processing in this example ends.
- The display 14 then displays the screen corresponding to that data.
- Voice data representing spoken guidance regarding the volume of the voice to be input may also be output to the display 14. In that case, the display 14 outputs the voice represented by that voice data.
- The screen generation unit 56 may also generate, in S106, a home screen 30 on which the volume guidance information VI is not placed.
- The information processing apparatus 12 may be a portable game device that includes the camera 16a and the microphone 16b. It may also be a personal computer, a tablet terminal, a smartphone, or the like. The division of roles among the information processing apparatus 12, the display 14, and the camera microphone unit 16 is not limited to the one described above. The information processing apparatus 12 may also consist of a plurality of housings.
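The voice recognition unit 48 determines an SNR value for received voice, but the text does not say how it is computed. As an assumption, the sketch below uses the common definition of SNR as a power ratio in decibels, `10 * log10(P_voice / P_noise)`. It estimates this from two buffers: one captured while the user speaks and one captured while the user is silent. The helper names are hypothetical. Treating the voice buffer's power as pure speech power is a simplification.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean of the squared sample amplitudes."""
    if len(samples) == 0:
        raise ValueError("empty sample buffer")
    return sum(s * s for s in samples) / len(samples)


def snr_db(voice: Sequence[float], noise: Sequence[float]) -> float:
    """Rough SNR estimate in dB (hypothetical helper).

    `voice` holds samples captured while the user speaks; `noise` holds
    samples captured while the user is silent. Because the voice buffer
    also contains noise, this slightly overestimates the true SNR.
    """
    p_voice = mean_power(voice)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        return math.inf   # no measurable background noise
    if p_voice == 0.0:
        return -math.inf  # silence while "speaking"
    return 10.0 * math.log10(p_voice / p_noise)


if __name__ == "__main__":
    noise = [0.01, -0.01, 0.01, -0.01]
    voice = [0.1, -0.1, 0.1, -0.1]
    print(round(snr_db(voice, noise), 1))  # 20.0
```

A value like this could then be compared against a threshold such as the 5 dB used in the S103 example.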
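The voice recognition unit 48 ranks its recognition results by an index such as likelihood. The state changing unit 52 may switch the state management data to “voice recognition state” when given information such as the magic word MW has a predetermined rank, for example first. The following is a minimal sketch of that rule. It assumes each hypothesis carries a numeric likelihood. The `Hypothesis` type, the helper names, and the `"MAGIC_WORD"` placeholder for the magic word MW are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ExecutionState(Enum):
    """The two values the state management data can take."""
    NORMAL = "normal state"
    VOICE_RECOGNITION = "voice recognition state"


@dataclass(frozen=True)
class Hypothesis:
    """One recognition result with its likelihood (hypothetical type)."""
    text: str
    likelihood: float


def rank_of(results: List[Hypothesis], target: str) -> Optional[int]:
    """1-based recognition rank of `target`, ranking by descending likelihood.

    Returns None when `target` is not among the results. Ties keep their
    input order because `sorted` is stable even with reverse=True.
    """
    ranked = sorted(results, key=lambda h: h.likelihood, reverse=True)
    for rank, hyp in enumerate(ranked, start=1):
        if hyp.text == target:
            return rank
    return None


def next_state(state: ExecutionState, results: List[Hypothesis],
               magic_word: str, required_rank: int = 1) -> ExecutionState:
    """Enter the voice recognition state when the magic word has the required rank."""
    if state is ExecutionState.NORMAL and rank_of(results, magic_word) == required_rank:
        return ExecutionState.VOICE_RECOGNITION
    return state


if __name__ == "__main__":
    MAGIC_WORD = "MAGIC_WORD"  # placeholder for the magic word MW
    results = [Hypothesis("command A", 0.3), Hypothesis(MAGIC_WORD, 0.6)]
    print(rank_of(results, MAGIC_WORD))                                   # 1
    print(next_state(ExecutionState.NORMAL, results, MAGIC_WORD).value)  # voice recognition state
```

Making `required_rank` a parameter reflects the text's wording that the rank is "predetermined", with first given only as an example.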
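In step S103, the process specifying unit 50 looks up process names in the rank process relation data. The text only describes that table through examples. The sketch below models it as a small hypothetical table built from those examples:

- the two 5 dB cases given for S103;
- the case where the magic word MW is ranked second or lower at a low SNR.

Two points are assumptions: that the predetermined value L1 equals the same 5 dB threshold, and that the lookup depends only on the magic word's rank. All names are also assumptions.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

CHANGE_STATE = "change to voice recognition state"
GUIDANCE = "guidance display"


class RankProcessRow(NamedTuple):
    """One hypothetical row of the rank process relation data."""
    min_rank: int               # best recognition rank of the magic word this row covers
    max_rank: Optional[int]     # worst rank covered; None means no upper bound
    low_snr: bool               # True: row applies when SNR <= threshold
    processes: Tuple[str, ...]  # names of the processes to execute


RANK_PROCESS_RELATION: Tuple[RankProcessRow, ...] = (
    RankProcessRow(1, 1, False, (CHANGE_STATE,)),
    RankProcessRow(1, 1, True, (CHANGE_STATE, GUIDANCE)),
    RankProcessRow(2, None, True, (GUIDANCE,)),
)


def specify_processes(magic_word_rank: Optional[int], snr_db: float,
                      threshold_db: float = 5.0,
                      table: Sequence[RankProcessRow] = RANK_PROCESS_RELATION,
                      ) -> Tuple[str, ...]:
    """Return the names of the processes to execute (S103); empty if no row matches."""
    if magic_word_rank is None:
        return ()
    low_snr = snr_db <= threshold_db
    for row in table:
        in_range = magic_word_rank >= row.min_rank and (
            row.max_rank is None or magic_word_rank <= row.max_rank)
        if in_range and row.low_snr == low_snr:
            return row.processes
    return ()


if __name__ == "__main__":
    print(specify_processes(1, 3.2))  # ('change to voice recognition state', 'guidance display')
```

Keeping the table as data, separate from the lookup, mirrors the text's separation between the rank process relation data storage unit 40 and the process specifying unit 50.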
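Steps S104 to S106 update the state management data and then decide what the home screen 30 shows. A minimal sketch follows. It assumes the home screen can be modelled by two flags: one for the voice input guidance image 38 and one for the volume guidance information VI. The magic word guidance image 36 is left out because the text does not state the condition for it. `HomeScreen` and `handle_processes` are hypothetical names. S107, which sends the screen to the display 14, is represented by simply returning the screen.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

NORMAL = "normal state"
VOICE_RECOGNITION = "voice recognition state"
CHANGE_STATE = "change to voice recognition state"
GUIDANCE = "guidance display"


@dataclass(frozen=True)
class HomeScreen:
    """Hypothetical model of the guidance elements placed on the home screen 30."""
    voice_input_guidance: bool = False  # voice input guidance image 38
    volume_guidance: bool = False       # volume guidance information VI


def handle_processes(state: str, processes: Sequence[str],
                     displayed: HomeScreen) -> Tuple[str, HomeScreen]:
    """Apply S104 to S106 and return (new state value, screen to output)."""
    # S104/S105: change the state management data if the process names ask for it.
    if CHANGE_STATE in processes:
        state = VOICE_RECOGNITION
    # S106: build the screen from the state value and the process names.
    in_voice_state = state == VOICE_RECOGNITION
    show_volume = GUIDANCE in processes
    if in_voice_state or show_volume:
        screen = HomeScreen(voice_input_guidance=in_voice_state,
                            volume_guidance=show_volume)
    else:
        screen = displayed  # no condition applies: keep the displayed screen
    # S107 would send `screen` to the display 14; here it is simply returned.
    return state, screen


if __name__ == "__main__":
    new_state, screen = handle_processes(NORMAL, (CHANGE_STATE, GUIDANCE), HomeScreen())
    print(new_state, screen)
```

With the low-SNR, first-rank input above, the sketch ends in the voice recognition state with both guidance flags set.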
Claims (8)
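- A guidance device comprising:
a reception unit that receives voice; and
an output control unit that, when given information is identified as a recognition result of the voice, performs control so that guidance regarding the volume of voice to be input is output in a manner corresponding to the recognition rank of the information.
- The guidance device according to claim 1, further comprising a state changing unit that changes to a voice recognition state, in which a predetermined command can be input by voice, when the given information is identified as a recognition result of voice received in a normal state, in which the command cannot be input by voice, and the recognition rank of the information is first,
wherein the output control unit performs control so that the guidance is output after the change to the voice recognition state.
- The guidance device according to claim 2, wherein, when the given information is identified as a recognition result of voice received in the normal state and the recognition rank of the information is other than first, the output control unit performs control so that the guidance is output while the normal state is maintained.
- The guidance device according to any one of claims 1 to 3, wherein the output control unit performs control so that the guidance is output only when a further condition is satisfied, namely that the volume of the recognized voice is lower than a predetermined volume.
- A guidance device comprising:
a reception unit that receives voice;
a state changing unit that, when given information is identified as a recognition result of the voice, changes from a normal state, in which a predetermined command cannot be input by voice, to a voice recognition state, in which the command can be input by voice; and
an output control unit that performs control so that guidance regarding the volume of voice to be input, corresponding to the volume of the recognized voice, is output after the change.
- A guidance method comprising:
a step of receiving voice; and
a step of performing control so that, when given information is identified as a recognition result of the voice, guidance regarding the volume of voice to be input is output in a manner corresponding to the recognition rank of the information.
- A program causing a computer to execute:
a procedure of receiving voice; and
a procedure of performing control so that, when predetermined information is identified as a recognition result of the voice, guidance regarding the volume of voice to be input is output in a manner corresponding to the recognition rank of the information.
- A computer-readable information storage medium storing a program that causes a computer to execute:
a procedure of receiving voice; and
a procedure of performing control so that, when given information is identified as a recognition result of the voice, guidance regarding the volume of voice to be input is output in a manner corresponding to the recognition rank of the information.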
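Claims 1 to 4 make the form of the volume guidance depend on the recognition rank of the given information. Claim 4 optionally adds a dependence on the recognized volume. The sketch below expresses that decision as a pure function. The enum, the function name, and the use of a dB value compared against a threshold are assumptions. The comparison follows the embodiment's “equal to or less than L1” wording; claim 4 itself says “lower than”.

```python
from enum import Enum
from typing import Optional


class GuidanceMode(Enum):
    AFTER_STATE_CHANGE = "switch to the voice recognition state, then show guidance"  # claim 2
    IN_NORMAL_STATE = "show guidance while staying in the normal state"               # claim 3


def guidance_mode(rank: Optional[int], volume_db: float,
                  threshold_db: Optional[float] = None) -> Optional[GuidanceMode]:
    """Decide whether and how to output volume guidance (hypothetical helper).

    rank: 1-based recognition rank of the given information, or None if it
        is not among the recognition results.
    volume_db: recognized voice volume, for example an SNR in dB.
    threshold_db: if given, guidance is output only when volume_db is at or
        below it (the optional condition of claim 4).
    """
    if rank is None:
        return None  # given information not recognized: no guidance
    if threshold_db is not None and volume_db > threshold_db:
        return None  # voice loud enough: no guidance needed
    return GuidanceMode.AFTER_STATE_CHANGE if rank == 1 else GuidanceMode.IN_NORMAL_STATE
```

Leaving `threshold_db` as `None` models claim 1 alone. Passing a value adds the volume condition of claim 4.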
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020167030516A KR101883414B1 (ko) | 2014-05-02 | 2015-05-01 | Guidance device, guidance method, program, and information storage medium |
JP2016516421A JP6383409B2 (ja) | 2014-05-02 | 2015-05-01 | Guidance device, guidance method, program, and information storage medium |
US15/303,642 US9870772B2 (en) | 2014-05-02 | 2015-05-01 | Guiding device, guiding method, program, and information storage medium |
EP15785826.7A EP3139377B1 (en) | 2014-05-02 | 2015-05-01 | Guidance device, guidance method, program, and information storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014095233 | 2014-05-02 | ||
JP2014-095233 | 2014-05-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015167008A1 true WO2015167008A1 (ja) | 2015-11-05 |
Family
ID=54358729
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/063064 WO2015167008A1 (ja) | 2014-05-02 | 2015-05-01 | Guidance device, guidance method, program, and information storage medium |
Country Status (5)
Country | Link |
---|---|
US (1) | US9870772B2 (ja) |
EP (1) | EP3139377B1 (ja) |
JP (1) | JP6383409B2 (ja) |
KR (1) | KR101883414B1 (ja) |
WO (1) | WO2015167008A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11170768B2 (en) * | 2017-04-17 | 2021-11-09 | Samsung Electronics Co., Ltd | Device for performing task corresponding to user utterance |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
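JPH06236196A (ja) * | 1993-02-08 | 1994-08-23 | Nippon Telegr & Teleph Corp <Ntt> | Speech recognition method and device |
JP2000081891A (ja) * | 1998-09-03 | 2000-03-21 | Seiko Epson Corp | Method for notifying the input state of speech to be recognized, speech recognition device, and recording medium storing a program for notifying the input state of speech to be recognized |
JP2000322078A (ja) * | 1999-05-14 | 2000-11-24 | Sumitomo Electric Ind Ltd | In-vehicle speech recognition device |
JP2001042891A (ja) * | 1999-07-27 | 2001-02-16 | Suzuki Motor Corp | Speech recognition device, speech-recognition-equipped device, speech-recognition-equipped system, speech recognition method, and storage medium |
JP2003148987A (ja) * | 2001-11-09 | 2003-05-21 | Mitsubishi Electric Corp | Navigation device |
JP2006227499A (ja) * | 2005-02-21 | 2006-08-31 | Toyota Motor Corp | Speech recognition device |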
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19939102C1 (de) * | 1999-08-18 | 2000-10-26 | Siemens Ag | Method and arrangement for recognizing speech |
DE19956747C1 (de) * | 1999-11-25 | 2001-01-11 | Siemens Ag | Method and device for speech recognition, and a telecommunication system |
GB2417812B (en) * | 2003-05-08 | 2007-04-18 | Voice Signal Technologies Inc | A signal-to-noise mediated speech recognition algorithm |
US7480618B2 (en) * | 2004-09-02 | 2009-01-20 | Microsoft Corporation | Eliminating interference of noisy modality in a multimodal application |
JP4786384B2 (ja) * | 2006-03-27 | 2011-10-05 | 株式会社東芝 | Speech processing device, speech processing method, and speech processing program |
US8140325B2 (en) | 2007-01-04 | 2012-03-20 | International Business Machines Corporation | Systems and methods for intelligent control of microphones for speech recognition applications |
US20110166862A1 (en) * | 2010-01-04 | 2011-07-07 | Eyal Eshed | System and method for variable automated response to remote verbal input at a mobile device |
KR101661767B1 (ko) * | 2010-08-19 | 2016-09-30 | 현대모비스 주식회사 | Speech recognition method and device providing a user interface using speech |
US20120089392A1 (en) | 2010-10-07 | 2012-04-12 | Microsoft Corporation | Speech recognition user interface |
JP5790238B2 (ja) | 2011-07-22 | 2015-10-07 | ソニー株式会社 | Information processing device, information processing method, and program |
US9047857B1 (en) * | 2012-12-19 | 2015-06-02 | Rawles Llc | Voice commands for transitioning between device states |
US20140257799A1 (en) * | 2013-03-08 | 2014-09-11 | Daniel Shepard | Shout mitigating communication device |
2015
- 2015-05-01 EP EP15785826.7A patent/EP3139377B1/en active Active
- 2015-05-01 WO PCT/JP2015/063064 patent/WO2015167008A1/ja active Application Filing
- 2015-05-01 US US15/303,642 patent/US9870772B2/en active Active
- 2015-05-01 KR KR1020167030516A patent/KR101883414B1/ko active IP Right Grant
- 2015-05-01 JP JP2016516421A patent/JP6383409B2/ja active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3139377A4 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10448762B2 (en) | 2017-09-15 | 2019-10-22 | Kohler Co. | Mirror |
US10663938B2 (en) | 2017-09-15 | 2020-05-26 | Kohler Co. | Power operation of intelligent devices |
US10887125B2 (en) | 2017-09-15 | 2021-01-05 | Kohler Co. | Bathroom speaker |
US11093554B2 (en) | 2017-09-15 | 2021-08-17 | Kohler Co. | Feedback for water consuming appliance |
US11099540B2 (en) | 2017-09-15 | 2021-08-24 | Kohler Co. | User identity in household appliances |
US11314214B2 (en) | 2017-09-15 | 2022-04-26 | Kohler Co. | Geographic analysis of water conditions |
US11314215B2 (en) | 2017-09-15 | 2022-04-26 | Kohler Co. | Apparatus controlling bathroom appliance lighting based on user identity |
US11892811B2 (en) | 2017-09-15 | 2024-02-06 | Kohler Co. | Geographic analysis of water conditions |
US11921794B2 (en) | 2017-09-15 | 2024-03-05 | Kohler Co. | Feedback for water consuming appliance |
US11949533B2 (en) | 2017-09-15 | 2024-04-02 | Kohler Co. | Sink device |
Also Published As
Publication number | Publication date |
---|---|
EP3139377A1 (en) | 2017-03-08 |
KR20160138572A (ko) | 2016-12-05 |
EP3139377A4 (en) | 2018-01-10 |
KR101883414B1 (ko) | 2018-07-31 |
US20170032782A1 (en) | 2017-02-02 |
JPWO2015167008A1 (ja) | 2017-04-20 |
EP3139377B1 (en) | 2024-04-10 |
JP6383409B2 (ja) | 2018-08-29 |
US9870772B2 (en) | 2018-01-16 |
Similar Documents
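Publication | Title | Publication Date |
---|---|---|
JP6383409B2 (ja) | Guidance device, guidance method, program, and information storage medium | |
JP5955299B2 (ja) | Display control device, display control method, program, and information storage medium | |
KR102111983B1 (ko) | Control device, control method, and information storage medium | |
JP6405316B2 (ja) | Entertainment device, display control method, program, and information storage medium | |
JP6482911B2 (ja) | Device control method and electrical appliance | |
US20140267933A1 (en) | Electronic Device with Embedded Macro-Command Functionality | |
JP2015509680A (ja) | Method and device for controlling the lock/unlock state of a terminal through voice recognition | |
US10678563B2 (en) | Display apparatus and method for controlling display apparatus | |
JP6229071B2 (ja) | Control device, control method, program, and information storage medium | |
WO2017020373A1 (zh) | Method for starting a terminal application, and terminal | |
JP6216892B2 (ja) | Capture device, capture method, program, and information storage medium | |
JPWO2019235013A1 (ja) | Information processing device and information processing method | |
WO2018045882A1 (zh) | Method and system for controlling soft operation keys of an intelligent terminal application | |
US20180350359A1 (en) | Methods, systems, and media for controlling a media content presentation device in response to a voice command | |
KR102662558B1 (ko) | Display device and control method of the display device | |
US20140085187A1 (en) | Display apparatus and control method thereof | |
JP2007219600A (ja) | Multimodal input device | |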
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15785826; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016516421; Country of ref document: JP; Kind code of ref document: A |
REEP | Request for entry into the European phase | Ref document number: 2015785826; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2015785826; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 15303642; Country of ref document: US |
ENP | Entry into the national phase | Ref document number: 20167030516; Country of ref document: KR; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |