WO2016088410A1 - Information processing apparatus, information processing method, and program - Google Patents
Information processing apparatus, information processing method, and program
- Publication number
- WO2016088410A1 (PCT/JP2015/073488)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- display
- information processing
- volume
- voice
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
- G06F3/1423—Digital output to display device ; Cooperation and interconnection of the display device with other functional units controlling a plurality of local displays, e.g. CRT and flat panel display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
Definitions
- This disclosure relates to an information processing apparatus, an information processing method, and a program.
- According to the present disclosure, an information processing apparatus is provided that includes: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit so that a display object is displayed on the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice-recognizable volume.
- According to the present disclosure, an information processing method is provided that includes: determining a user utterance volume based on an input voice; controlling a display unit such that a display object is displayed by the display unit; and displaying, on the display unit, a first moving object that moves toward the display object when the user utterance volume exceeds a voice-recognizable volume.
- According to the present disclosure, a program is provided that causes a computer to function as an information processing apparatus including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit so that a display object is displayed on the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice-recognizable volume.
- In this specification and the drawings, a plurality of constituent elements having substantially the same functional configuration may be distinguished by attaching different letters or numbers after the same reference numeral.
- However, when it is not necessary to distinguish each of a plurality of constituent elements having substantially the same functional configuration, only the same reference numeral is given.
- 1. Embodiment of the present disclosure: 1.1. System configuration example; 1.2. Functional configuration example; 1.3. Display of first moving object; 1.4. Setting of recognizable volume; 1.5. Display of second moving object; 1.6. Example of operation; 1.7. Modification of display mode; 1.8. Hardware configuration example. 2. Conclusion
- FIG. 1 is a diagram illustrating a configuration example of an information processing system 10 according to an embodiment of the present disclosure.
- the information processing system 10 includes an image input unit 110, an operation input unit 115, a voice input unit 120, and a display unit 130.
- the information processing system 10 can perform voice recognition on a voice uttered by a user U (hereinafter also simply referred to as “user”).
- the image input unit 110 has a function of inputting an image.
- the image input unit 110 includes two cameras embedded in the table Tbl.
- the number of cameras included in the image input unit 110 is not particularly limited as long as it is one or more, and the position where each of those cameras is provided is also not particularly limited.
- the one or more cameras may include a monocular camera or a stereo camera.
- the operation input unit 115 has a function of inputting a user U operation.
- the operation input unit 115 includes one camera suspended from the ceiling that exists above the table Tbl.
- the position where the camera included in the operation input unit 115 is provided is not particularly limited.
- the camera may include a monocular camera or a stereo camera.
- the operation input unit 115 need not be a camera as long as it has a function of inputting the operation of the user U.
- the operation input unit 115 may be a touch panel or a hardware button.
- the display unit 130 has a function of displaying a screen on the table Tbl.
- the display unit 130 is suspended from the ceiling above the table Tbl.
- the position where the display unit 130 is provided is not particularly limited.
- the display unit 130 may be a projector that can project a screen onto the top surface of the table Tbl.
- the display unit 130 may be a display of another form.
- the display surface of the screen may be other than the top surface of the table Tbl.
- the display surface of the screen may be a wall, a building, a floor, the ground, a ceiling, or a surface at another location.
- the display surface of the screen may be a display surface that the display unit 130 has.
- the voice input unit 120 has a function of inputting voice.
- the voice input unit 120 includes a total of six microphones: three microphones above the table Tbl and three microphones on the top surface of the table Tbl.
- the number of microphones included in the voice input unit 120 is not particularly limited as long as it is one or more, and the position where each of those microphones is provided is also not particularly limited.
- if the voice input unit 120 includes a plurality of microphones, the sound source direction can be estimated based on the sound input to each of the plurality of microphones. Likewise, if the voice input unit 120 includes a directional microphone, the sound source direction can be estimated based on the sound input to that directional microphone.
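As a rough sketch of the multi-microphone case above: with two microphones, a source bearing can be estimated from the difference in arrival time between them. The array geometry, speed-of-sound constant, and function names below are illustrative assumptions, not details from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees Celsius (assumed constant)

def estimate_direction(delay_s: float, mic_distance_m: float) -> float:
    """Estimate the sound-source bearing (degrees from broadside) of a
    two-microphone array from the inter-microphone arrival delay.

    A positive delay means the sound reached the second microphone later,
    i.e. the source lies toward the first microphone.
    """
    # Path-length difference implied by the delay, clamped to the
    # physically possible range [-d, +d] before taking the arcsine.
    path_diff = max(-mic_distance_m,
                    min(mic_distance_m, SPEED_OF_SOUND * delay_s))
    return math.degrees(math.asin(path_diff / mic_distance_m))

# A source directly in front of the array (equal arrival times) is at 0 degrees;
# a delay of ~0.146 ms over a 10 cm baseline puts the source near 30 degrees.
print(estimate_direction(0.0, 0.1))
print(estimate_direction(0.000146, 0.1))
```

In practice a system like this would estimate the delay itself from the microphone signals (for example by cross-correlation), which is omitted here.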
- FIG. 2 is a block diagram illustrating a functional configuration example of the information processing system 10 according to the embodiment of the present disclosure.
- the information processing system 10 according to the embodiment of the present disclosure includes an image input unit 110, an operation input unit 115, a voice input unit 120, a display unit 130, and an information processing apparatus 140 (hereinafter also referred to as "control unit 140").
- the information processing apparatus 140 executes control of each unit of the information processing system 10. For example, the information processing apparatus 140 generates the information output from the display unit 130. Further, for example, the information processing apparatus 140 reflects information input by the image input unit 110, the operation input unit 115, and the voice input unit 120 in the information output from the display unit 130. As illustrated in FIG. 2, the information processing apparatus 140 includes an input image acquisition unit 141, an input voice acquisition unit 142, an operation detection unit 143, a determination unit 144, a voice recognition unit 145, and a display control unit 146. Details of these functional blocks will be described later.
- the information processing apparatus 140 may be configured by, for example, a CPU (Central Processing Unit).
- the processing device can be configured by an electronic circuit.
- FIG. 3 is a diagram illustrating an example of a screen displayed by the display unit 130.
- the display control unit 146 displays the voice recognition cancel operation object Bu1, the voice recognition end operation object Bu2, and the display object Sb.
- the voice recognition cancel operation object Bu1 is an object for receiving an input of an operation for canceling voice recognition.
- the voice recognition end operation object Bu2 is an object for receiving an input of an operation to end voice recognition.
- the display object Sb is not particularly limited as long as the object is visible to the user.
- the display object Sb may be a stationary object or a moving object.
- the determination unit 144 determines the utterance volume by the user U based on the input voice.
- the determination method of the user utterance volume is not particularly limited.
- the determination unit 144 may estimate the sound source direction Du of the uttered voice by the user, and determine the volume input from the sound source direction Du of the uttered voice by the user as the user uttered volume.
- the estimation method of the sound source direction Du of the speech sound by the user is not particularly limited.
- the determination unit 144 may estimate the arrival direction of voice input by the voice input unit 120 at a volume exceeding a threshold as the sound source direction Du of the uttered voice by the user. When there are a plurality of arrival directions of voice input at a volume exceeding the threshold, the determination unit 144 may estimate, as the sound source direction Du, the arrival direction from which voice exceeding the threshold was input first among the plurality of arrival directions.
- the similarity range may be determined in advance.
- the finger direction may be obtained by analyzing the input image.
- the determination unit 144 may estimate the arrival direction of the voice input with the highest volume by the voice input unit 120 as the sound source direction Du of the uttered voice by the user.
- the determination unit 144 may estimate, as the sound source direction Du of the uttered voice by the user, the one arrival direction that matches or is similar to the finger direction of the user who performed an operation of selecting a voice recognition start object (not shown).
- the determination unit 144 may determine voice input to the voice input unit 120 from a direction other than the sound source direction Du of the uttered voice by the user to be noise, and may determine the volume input to the voice input unit 120 from such a direction to be the noise volume.
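The direction-based separation just described can be sketched as follows: any arrival direction outside a tolerance window around the user's estimated direction is treated as noise. The tolerance window, the use of a maximum over noise directions, and all names are illustrative assumptions, not specifics from the disclosure.

```python
def split_volumes(direction_volumes, user_direction_deg, tolerance_deg=15.0):
    """Split per-direction input levels into a user-utterance volume and a
    noise volume. Directions within tolerance_deg of the user's estimated
    direction count as the user's voice; everything else counts as noise.
    Volumes are linear levels; multiple noise directions are combined by
    taking their maximum. (No angle wraparound handling, for brevity.)
    """
    user_volume = 0.0
    noise_volume = 0.0
    for direction_deg, volume in direction_volumes.items():
        if abs(direction_deg - user_direction_deg) <= tolerance_deg:
            user_volume = max(user_volume, volume)
        else:
            noise_volume = max(noise_volume, volume)
    return user_volume, noise_volume

# The level from 0 degrees falls inside the user's window; 90 and -120 degrees
# are treated as noise, of which -120 degrees is the loudest.
user_v, noise_v = split_volumes({0.0: 0.8, 90.0: 0.3, -120.0: 0.5},
                                user_direction_deg=5.0)
```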
- the display control unit 146 may cause the display unit 130 to display a first moving object Mu that moves toward the display object Sb when the user utterance volume exceeds the volume recognizable by the voice recognition unit 145 (hereinafter also referred to as "recognizable volume"). This makes it possible for the user to grasp that the uttered voice is being input at a volume sufficient for voice recognition.
- the display control unit 146 may move the first moving object Mu toward the display object Sb in the direction opposite to the sound source direction Du of the uttered voice by the user.
- the movement of the first moving object Mu is not limited to such an example.
- the display control unit 146 may control a parameter related to the first moving object Mu based on predetermined information corresponding to the input voice.
- the input voice used at this time may be input voice from the sound source direction of the uttered voice by the user.
- the parameter relating to the first moving object may include at least one of the size, shape, color, and moving speed of the first moving object Mu.
- the predetermined information corresponding to the input voice is at least one of user utterance volume, input voice frequency, recognition character string acquisition speed, feature amount extracted from the input voice, and user identified from the input voice.
- One may be included.
- the display control unit 146 may increase the moving speed of the first moving object Mu as the recognition character string acquisition speed increases. Further, when the recognition character string acquisition speed exceeds a predetermined speed, the display control unit 146 may give a predetermined movement (for example, a movement in which the object bounces off the display object Sb) to the first moving object Mu.
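A minimal sketch of this parameter control, mapping the recognition character string acquisition speed to the moving speed of the first moving object Mu and to a bounce trigger. The linear mapping, all constants, and all names are illustrative assumptions rather than values from the disclosure.

```python
def moving_object_speed(chars_per_second, base_speed=100.0, gain=20.0,
                        bounce_threshold=8.0):
    """Map the recognized-character-string acquisition speed (chars/s) to
    the first moving object's on-screen speed (px/s): faster acquisition
    yields a faster object. Above bounce_threshold the object is also
    given a 'bounce off the display object Sb' motion.

    Returns (speed_px_per_s, bounce_flag).
    """
    speed = base_speed + gain * chars_per_second
    return speed, chars_per_second > bounce_threshold
```

For example, an acquisition speed of 2 chars/s yields a moderate object speed with no bounce, while 10 chars/s exceeds the threshold and triggers the bounce movement.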
- the voice recognition unit 145 may acquire the recognized character string by performing voice recognition on the input voice from the sound source direction of the uttered voice by the user. Compared to performing voice recognition directly on all voice input by the voice input unit 120, voice recognition is then performed on voice containing less noise, so the accuracy of voice recognition is expected to improve.
- the display control unit 146 may display the recognized character string on the display unit 130. This makes it possible for the user to confirm the recognized character string obtained by voice recognition.
- FIG. 4 is a diagram for explaining the display start of the first moving object Mu. As shown in FIG. 4, it is assumed that the noise volume and the user utterance volume change with time. As shown in FIG. 4, the display control unit 146 displays the first moving object when the user utterance volume exceeds the recognizable volume V_able (or when the user utterance volume becomes equal to the recognizable volume V_able). Mu may be displayed on the display unit 130.
- the recognizable volume will be described in detail. It is conceivable that the recognizable volume described above is not always constant and changes based on the noise volume.
- FIGS. 5 to 7 are diagrams for explaining the recognizable volume. For example, as shown in FIG. 5, it is considered that the recognizable volume V_able does not change while the noise volume average value N_ave is below a predetermined lower limit value (hereinafter also referred to as the "noise volume lower limit value") N_min. Therefore, the determination unit 144 may set the specified value V_able_min as the recognizable volume V_able when the noise volume average value N_ave is lower than the noise volume lower limit value N_min. Note that the noise volume itself may be used instead of the noise volume average value N_ave.
- when the noise volume average value N_ave exceeds the noise volume lower limit value N_min, the determination unit 144 may set the recognizable volume V_able to a volume corresponding to the noise volume average value N_ave (in the example illustrated in FIG. 6, a value obtained by multiplying the noise volume average value N_ave by V_ratio). Note that the noise volume itself may be used instead of the noise volume average value N_ave.
- FIG. 7 shows the relationship between the noise volume average value N_ave and the recognizable volume V_able based on the examples shown in FIGS. 5 and 6.
- when the noise volume average value N_ave is lower than the noise volume lower limit value N_min, the specified value V_able_min is set as the recognizable volume V_able; when the noise volume average value N_ave exceeds the noise volume lower limit value N_min, a value obtained by multiplying the noise volume average value N_ave by V_ratio is set as the recognizable volume V_able.
- the change in the recognizable sound volume V_able when the noise sound volume average value N_ave exceeds the noise sound volume lower limit value N_min may not be a linear change.
- in that case, the specified value V_able_min may be set as the recognizable volume V_able, or a volume corresponding to the noise volume average value N_ave may be set as the recognizable volume V_able.
- values such as the specified value V_able_min, the noise volume lower limit value N_min, and V_ratio may be set in advance according to the use environment, use case, and the like of the product that performs voice recognition, or may be updated dynamically, for example at the start of voice recognition or by a software update.
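The relationship of FIGS. 5 to 7 can be summarized in a small function. The concrete values of V_able_min, N_min, and V_ratio below are placeholders (the disclosure leaves them to the use environment), chosen here so that the two regimes meet at N_min.

```python
def recognizable_volume(noise_avg, v_able_min=0.2, n_min=0.1, v_ratio=2.0):
    """Compute the recognizable volume V_able from the noise-volume
    average N_ave, following FIGS. 5-7: below the noise volume lower
    limit N_min, the specified value V_able_min is used; at or above it,
    V_able scales linearly with the noise average by the factor V_ratio.
    The constants are illustrative placeholders, not patent values.
    """
    if noise_avg < n_min:
        return v_able_min
    return noise_avg * v_ratio
```

With these placeholders, a quiet room (N_ave = 0.05) keeps V_able at the floor of 0.2, while a noisier one (N_ave = 0.3) raises it to 0.6, so the user must speak louder before the first moving object Mu appears.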
- FIG. 8 is a diagram illustrating another example of a screen displayed by the display unit 130. Referring to FIG. 8, noise sound sources Ns1 and Ns2 exist. Here, a case where two noise sound sources are present will be described, but the number of noise sound sources is not limited.
- the determination unit 144 determines the noise volume based on the input voice.
- the noise volume determination method is not particularly limited.
- the determination unit 144 may estimate the noise sound source directions Dn1 and Dn2 and determine the sound volume input from the noise sound source directions Dn1 and Dn2 as the noise sound volume.
- the estimation method of the noise sound source directions Dn1 and Dn2 is not particularly limited.
- when there are a plurality of arrival directions of voice input at a volume exceeding the threshold, the determination unit 144 may estimate, as the noise sound source directions Dn1 and Dn2, the arrival directions of voice input at a volume exceeding the threshold second and later among the plurality of arrival directions.
- alternatively, the determination unit 144 may estimate, as the noise sound source directions Dn1 and Dn2, the arrival directions of the voices input by the voice input unit 120 at the second-highest and subsequent volumes.
- the display control unit 146 may cause the display unit 130 to display second moving objects Mn1 and Mn2, different from the first moving object Mu, when the noise volume exceeds the voice-recognizable volume. This makes it possible for the user U to grasp that noise exceeding the recognizable volume is being input.
- the display control unit 146 may display the second moving objects Mn1 and Mn2 on the display unit 130 based on the noise sound source directions. This makes it possible for the user to grasp the directions from which the noise arrives.
- the display control unit 146 may move the second moving objects Mn1 and Mn2 so that their movement toward the display object Sb is blocked. For example, as shown in FIG. 8, the display control unit 146 may move the second moving objects Mn1 and Mn2 so that they do not go outside a predetermined range. Then, when voice recognition is performed on the input voice from the sound source direction of the uttered voice by the user, the user can grasp more intuitively that voice recognition is not performed on the sounds emitted from the noise sound source directions Dn1 and Dn2.
- FIG. 9 is a diagram for explaining display start of the second moving objects Mn1 and Mn2.
- the display control unit 146 may cause the display unit 130 to display the second moving object Mn1 when the first noise volume exceeds the recognizable volume V_able (or when the first noise volume becomes equal to the recognizable volume V_able).
- similarly, the display control unit 146 may cause the display unit 130 to display the second moving object Mn2 when the second noise volume exceeds the recognizable volume V_able (or when the second noise volume becomes equal to the recognizable volume V_able).
- the display start of the first moving object Mu is as already described.
- FIGS. 10A and 10B are flowcharts illustrating an example of the operation flow of the information processing system 10 according to the embodiment of the present disclosure. Note that these flowcharts merely illustrate one example; the operation flow of the information processing system 10 is not limited to the example shown in the flowcharts of FIGS. 10A and 10B.
- the input image acquisition unit 141 acquires the input image input by the image input unit 110 (S11). Further, the input voice acquisition unit 142 acquires the input voice input by the voice input unit 120 (S12). Subsequently, the information processing apparatus 140 shifts the operation to S11 and S12 when the sound source direction of the user uttered voice cannot be specified based on the input image and the input voice (“No” in S13). If the sound source direction of the user uttered voice can be specified based on the input image and the input voice (“Yes” in S13), the operation is shifted to S14.
- the determination unit 144 determines the sound source direction of the user uttered voice and the user utterance volume (S14), and determines the direction of the noise sound source and the noise volume (S15). Subsequently, when the noise volume exceeds the noise volume lower limit value N_min ("No" in S16), the determination unit 144 sets the recognizable volume V_able to a value obtained by multiplying the noise volume average value N_ave by V_ratio (S17), and shifts the operation to S19. On the other hand, when the noise volume is lower than the noise volume lower limit value N_min ("Yes" in S16), the determination unit 144 sets the specified value V_able_min as the recognizable volume V_able (S18), and shifts the operation to S19.
- when the user utterance volume does not exceed the recognizable volume V_able ("No" in S19), the information processing apparatus 140 shifts the operation to S24.
- the voice recognition unit 145 performs voice recognition from the input voice. At this time, the voice recognition unit 145 may perform voice recognition on the input voice from the sound source direction of the uttered voice by the user.
- the display control unit 146 causes the display unit 130 to display the first moving object Mu according to the sound source direction of the user uttered voice (S21). Then, when there is a noise sound source emitting a noise volume exceeding the recognizable volume V_able ("Yes" in S22), the display control unit 146 displays a second moving object corresponding to the direction of that noise sound source (S23), and shifts the operation to S13. On the other hand, when there is no noise sound source emitting a noise volume exceeding the recognizable volume V_able ("No" in S22), the information processing apparatus 140 shifts the operation to S24. When the operation is shifted to S24, the information processing apparatus 140 causes the input image acquisition unit 141 and the input voice acquisition unit 142 to acquire the input image and the input voice in the next time unit (S24), and shifts the operation to S13.
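The per-frame decision logic of steps S16 through S23 can be condensed into one function. The step numbering in the comments follows the flowchart description above; the threshold constants and all names are illustrative assumptions.

```python
def process_frame(user_volume, noise_volumes, noise_avg,
                  v_able_min=0.2, n_min=0.1, v_ratio=2.0):
    """One pass of the decision logic sketched in FIGS. 10A/10B:
    update the recognizable volume V_able from the noise average
    (S16-S18), then decide which moving objects to display (S21-S23).
    All threshold constants are illustrative placeholders.
    """
    # S16-S18: derive the recognizable volume from the noise level.
    v_able = v_able_min if noise_avg < n_min else noise_avg * v_ratio
    return {
        "v_able": v_able,
        # First moving object Mu: shown when the user speaks loudly enough.
        "show_first_object": user_volume > v_able,
        # Second moving objects Mn: one per noise source louder than V_able.
        "noise_objects": [i for i, v in enumerate(noise_volumes)
                          if v > v_able],
    }

# With a noisy environment (N_ave = 0.3 -> V_able = 0.6), a user volume of
# 0.5 is not enough to display Mu, while noise source index 1 (0.9) gets Mn.
decision = process_frame(user_volume=0.5, noise_volumes=[0.1, 0.9],
                         noise_avg=0.3)
```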
- FIG. 11 is a diagram illustrating a first modification of the display form by the display unit 130.
- the display unit 130 may be included in the mobile terminal.
- the type of mobile terminal is not particularly limited; it may be a tablet terminal, a smartphone, or a mobile phone.
- FIG. 12 is a diagram showing a second modification of the display form by the display unit 130.
- the display unit 130 may be included in the television receiver.
- the display control unit 146 may display the first moving object Mu on the display unit 130 based on the sound source direction of the uttered voice by the user U, and may display the second moving object Mn on the display unit 130 based on the direction of the noise sound source Ns.
- FIG. 13 is a diagram showing a third modification of the display form by the display unit 130.
- the display unit 130 may be a head mounted display.
- the display control unit 146 may cause the display unit 130 to display the display object Sb when the object Ob corresponding to the display object Sb is recognized from an image captured by the camera included in the head mounted display.
- the display control unit 146 may recognize the three-dimensional position and posture of the object Ob, and place the display object Sb in an AR (augmented reality) space according to the recognized three-dimensional position and posture.
- the first moving object Mu may also be moved based on the three-dimensional position of the object Ob.
- the display control unit 146 may display the first moving object Mu so that, when the user utterance volume exceeds the voice-recognizable volume, it moves in the AR space toward the three-dimensional position of the object Ob, from the front toward the back.
- the first moving object Mu appears from the vicinity of the user's mouth and moves toward the display object Sb.
- it is assumed that the information processing apparatus 140 causes the object Ob to execute an operation corresponding to the recognized character string obtained by the voice recognition (for example, switching the power of the lighting fixture ON and OFF). However, the object Ob may be any object other than a lighting fixture.
- FIG. 14 is a diagram showing a fourth modification of the display form by the display unit 130.
- the display unit 130 may be included in the three-dimensional stereoscopic display.
- the display control unit 146 may display the first moving object Mu so that it moves toward the display object Sb with an expression of moving from the front toward the back.
- the expression of moving from the front to the back can be realized by using the parallax between the left and right eyes of the user U.
- in this example, the expression of moving from the front toward the back is realized by having the user U wear stereoscopic glasses L, but such an expression may also be realized by a naked-eye stereoscopic display that does not require the user U to wear the stereoscopic glasses L.
- the depth of the display object Sb displayed on the object G1 is increased.
- FIG. 15 is a diagram illustrating a fifth modification of the display form by the display unit 130.
- the display control unit 146 may display the virtual object Vr on the display unit 130 and display a predetermined object included in the virtual object Vr on the display unit 130 as the display object Sb.
- the virtual object Vr corresponds to the game controller Cr, but the virtual object Vr may correspond to an object other than the game controller Cr.
- the predetermined object corresponds to the microphone Mc included in the game controller Cr, but the predetermined object is not limited to the microphone Mc.
- by viewing the destination of the first moving object Mu displayed by the display unit 130, the user U can easily grasp where his or her uttered voice is being input. Further, if the virtual object Vr and the display object Sb are displayed in this way, it is possible to prevent the user U from uttering toward an incorrect position (for example, the position of the display unit 130).
- FIG. 16 is a block diagram illustrating a hardware configuration example of the information processing system 10 according to the embodiment of the present disclosure.
- the information processing system 10 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 903, and a RAM (Random Access Memory) 905.
- the information processing system 10 may also include a host bus 907, a bridge 909, an external bus 911, an interface 913, an input device 915, an output device 917, a storage device 919, a drive 921, a connection port 923, and a communication device 925.
- the information processing system 10 may include an imaging device 933 and a sensor 935 as necessary.
- the information processing system 10 may include a processing circuit called DSP (Digital Signal Processor) or ASIC (Application Specific Integrated Circuit) instead of or in addition to the CPU 901.
- the CPU 901 functions as an arithmetic processing unit and a control unit, and controls all or part of the operation in the information processing system 10 according to various programs recorded in the ROM 903, the RAM 905, the storage device 919, or the removable recording medium 927.
- the ROM 903 stores programs and calculation parameters used by the CPU 901.
- the RAM 905 temporarily stores programs used in the execution of the CPU 901, parameters that change as appropriate during the execution, and the like.
- the CPU 901, the ROM 903, and the RAM 905 are connected to each other by a host bus 907 configured by an internal bus such as a CPU bus. Further, the host bus 907 is connected to an external bus 911 such as a PCI (Peripheral Component Interconnect / Interface) bus via a bridge 909.
- the input device 915 is a device operated by the user, such as a mouse, a keyboard, a touch panel, a button, a switch, and a lever.
- the input device 915 may include a microphone that detects the user's voice.
- the input device 915 may be, for example, a remote control device using infrared rays or other radio waves, or may be an external connection device 929 such as a mobile phone that supports the operation of the information processing system 10.
- the input device 915 includes an input control circuit that generates an input signal based on information input by the user and outputs the input signal to the CPU 901. The user operates the input device 915 to input various data to the information processing system 10 and instruct processing operations.
- An imaging device 933, which will be described later, can also function as an input device by imaging the movement of the user's hand, the user's fingers, and the like. At this time, the pointing position may be determined according to the movement of the hand or the direction of the fingers.
- the output device 917 is a device that can notify the user of the acquired information visually or audibly.
- the output device 917 is, for example, a display device such as an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), an organic EL (Electro-Luminescence) display, a projector, or a hologram display device; an audio output device such as a speaker or headphones; or a printer device.
- the output device 917 outputs the result obtained by the processing of the information processing system 10 as video such as text or an image, or as audio such as voice or sound.
- the output device 917 may include a light or the like to brighten the surroundings.
- the storage device 919 is a data storage device configured as an example of a storage unit of the information processing system 10.
- the storage device 919 includes, for example, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device.
- the storage device 919 stores programs executed by the CPU 901, various data, various data acquired from the outside, and the like.
- the drive 921 is a reader / writer for a removable recording medium 927 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and is built in or externally attached to the information processing system 10.
- the drive 921 reads information recorded on the attached removable recording medium 927 and outputs the information to the RAM 905.
- the drive 921 also writes records to the attached removable recording medium 927.
- the connection port 923 is a port for directly connecting a device to the information processing system 10.
- the connection port 923 can be, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, or the like.
- the connection port 923 may be an RS-232C port, an optical audio terminal, an HDMI (registered trademark) (High-Definition Multimedia Interface) port, or the like.
- Various data can be exchanged between the information processing system 10 and the external connection device 929 by connecting the external connection device 929 to the connection port 923.
- the communication device 925 is a communication interface configured with, for example, a communication device for connecting to the communication network 931.
- the communication device 925 can be, for example, a communication card for wired or wireless LAN (Local Area Network), Bluetooth (registered trademark), or WUSB (Wireless USB).
- the communication device 925 may be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various communication.
- the communication device 925 transmits and receives signals to and from the Internet and other communication devices using a predetermined protocol such as TCP/IP, for example.
- the communication network 931 connected to the communication device 925 is a wired or wireless network, such as the Internet, a home LAN, infrared communication, radio wave communication, or satellite communication.
- the imaging device 933 is an apparatus that images a real space and generates a captured image, using various members such as an imaging element such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor and a lens for controlling the formation of a subject image on the imaging element.
- the imaging device 933 may capture a still image or may capture a moving image.
- the sensor 935 is various sensors such as an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, and a sound sensor.
- the sensor 935 acquires information related to the state of the information processing system 10 itself, such as the posture of the information processing system 10, and information related to the surrounding environment, such as the brightness and noise around the information processing system 10.
- the sensor 935 may include a GPS sensor that receives a GPS (Global Positioning System) signal and measures the latitude, longitude, and altitude of the apparatus.
- Each component described above may be configured using a general-purpose member, or may be configured by hardware specialized for the function of each component. Such a configuration can be appropriately changed according to the technical level at the time of implementation.
- As described above, there is provided an information processing apparatus 140 including a determination unit 144 that determines the user utterance volume based on the input voice, and a display control unit 146 that controls the display unit 130 so that the display object Sb is displayed on the display unit 130, wherein the display control unit 146 causes the display unit 130 to display a first moving object that moves toward the display object Sb when the user utterance volume exceeds the voice recognizable volume.
- According to this configuration, it is possible to let the user know whether an utterance is being made at a volume that allows voice recognition. For example, when voice recognition is not performed correctly, the user can grasp how to change the utterance. Further, if the user changes the utterance based on this feedback, the success rate of speech recognition is expected to improve.
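As an illustration only (not part of the publication), the volume check described above can be sketched as follows; the function and parameter names are assumptions introduced for this sketch:

```python
def update_feedback(utterance_volume_db: float,
                    recognizable_volume_db: float) -> str:
    """Return which visual feedback to show for one audio frame.

    A first moving object travelling toward the display object is shown
    only while the user's utterance volume exceeds the volume at which
    speech recognition is possible; otherwise nothing is spawned.
    """
    if utterance_volume_db > recognizable_volume_db:
        return "first_moving_object"  # moves toward the display object Sb
    return "none"
```

In use, the display control unit would call such a check once per input frame and animate a new moving object toward the display object whenever it returns `"first_moving_object"`.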
- the display form by the display unit 130 is not limited to the above-described example.
- the display unit 130 may be a display provided in a wearable terminal (for example, a watch, glasses, etc.) other than the head mounted display.
- the display unit 130 may be a display provided in an in-vehicle navigation system.
- the display unit 130 may be a display used in the healthcare field.
- the display control unit 146 can control the display unit 130 by generating display control information for causing the display unit 130 to display the display content and outputting the generated display control information to the display unit 130.
- the contents of the display control information may be changed as appropriate according to the system configuration.
- the program for realizing the information processing apparatus 140 may be a web application.
- the display control information may be realized by a markup language such as HTML (HyperText Markup Language), SGML (Standard Generalized Markup Language), XML (Extensible Markup Language), or the like.
- the position of each component is not particularly limited as long as the operation of the information processing system 10 described above is realized.
- the image input unit 110, the operation input unit 115, the voice input unit 120, the display unit 130, and the information processing device 140 may be provided in different devices connected via a network.
- the information processing apparatus 140 corresponds to a server such as a web server or a cloud server, for example, and the image input unit 110, the operation input unit 115, the voice input unit 120, and the display unit 130 may correspond to clients connected to the server via a network.
- (1) An information processing apparatus including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
- (2) The information processing apparatus according to (1), wherein the determination unit determines a sound source direction of the voice uttered by the user, and the display control unit causes the display unit to display the first moving object based on the sound source direction of the voice uttered by the user.
- (3) The information processing apparatus according to (2), including a voice recognition unit that acquires a recognized character string by performing voice recognition on an input voice from the sound source direction of the voice uttered by the user.
- (4) The information processing apparatus according to (3), wherein the display control unit causes the display unit to display the recognized character string.
- (5) The information processing apparatus according to any one of (1) to (4), wherein the determination unit determines a noise volume based on the input voice, and the display control unit causes the display unit to display a second moving object different from the first moving object when the noise volume exceeds the voice recognizable volume.
- (6) The information processing apparatus according to (5), wherein the determination unit determines a noise source direction, and the display control unit causes the display unit to display the second moving object based on the noise source direction.
- (7) The information processing apparatus according to (6), wherein the second moving object moves such that its movement to the display object is blocked.
- (8) The information processing apparatus according to any one of (1) to (7), wherein the display control unit controls a parameter related to the first moving object based on predetermined information corresponding to the input voice.
- (9) The information processing apparatus according to (8), wherein the parameter related to the first moving object includes at least one of a size, a shape, a color, and a moving speed of the first moving object.
- (10) The information processing apparatus according to (8) or (9), wherein the predetermined information corresponding to the input voice includes at least one of the user utterance volume, a frequency of the input voice, an acquisition speed of a recognized character string, a feature amount extracted from the input voice, and a user identified from the input voice.
- (11) The information processing apparatus according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at a volume exceeding a threshold.
- (12) The information processing apparatus according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at the largest volume.
- (13) The information processing apparatus according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on a direction from a fingertip to the base of the finger.
- (14) The information processing apparatus according to (6), wherein the determination unit sets a prescribed value as the voice recognizable volume when the noise volume falls below a lower limit.
- (15) The information processing apparatus according to (6), wherein the determination unit sets, as the voice recognizable volume, a volume according to an average value of the noise volume or a volume according to the noise volume when the noise volume exceeds the lower limit.
- (16) The information processing apparatus according to any one of (1) to (15), wherein the display control unit causes the display unit to display the display object when an object corresponding to the display object is recognized from a captured image.
- (17) The information processing apparatus according to any one of (1) to (16), wherein the display control unit causes the display unit to display the first moving object moving toward the display object with an expression of moving from the front to the back when the user utterance volume exceeds the voice recognizable volume.
- (18) The information processing apparatus according to any one of (1) to (17), wherein the display control unit causes the display unit to display a virtual object and causes the display unit to display a predetermined object included in the virtual object as the display object.
- (19) An information processing method including: determining a user utterance volume based on an input voice; controlling a display unit such that a display object is displayed by the display unit; and causing the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
- (20) A program for causing a computer to function as an information processing apparatus including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
1. Embodiment of the present disclosure
1.1. System configuration example
1.2. Functional configuration example
1.3. Display of the first moving object
1.4. Setting of the recognizable volume
1.5. Display of the second moving object
1.6. Operation example
1.7. Modifications of the display form
1.8. Hardware configuration example
2. Conclusion
[1.1. System configuration example]
First, a configuration example of the information processing system 10 according to the embodiment of the present disclosure will be described with reference to the drawings. FIG. 1 is a diagram illustrating a configuration example of the information processing system 10 according to the embodiment of the present disclosure. As illustrated in FIG. 1, the information processing system 10 according to the embodiment of the present disclosure includes an image input unit 110, an operation input unit 115, a voice input unit 120, and a display unit 130. The information processing system 10 can perform voice recognition on a voice uttered by a user U (hereinafter also simply referred to as the "user").
Next, a functional configuration example of the information processing system 10 according to the embodiment of the present disclosure will be described. FIG. 2 is a block diagram illustrating a functional configuration example of the information processing system 10 according to the embodiment of the present disclosure. As illustrated in FIG. 2, the information processing system 10 according to the embodiment of the present disclosure includes an image input unit 110, an operation input unit 115, a voice input unit 120, a display unit 130, and an information processing device 140 (hereinafter also referred to as the "control unit 140").
First, when an operation of selecting a voice recognition start object (not illustrated) is detected by the operation detection unit 143, the voice recognition unit 145 starts voice recognition on the input voice. FIG. 3 is a diagram illustrating an example of a screen displayed by the display unit 130. Referring to FIG. 3, the display control unit 146 displays a voice recognition cancel operation object Bu1, a voice recognition end operation object Bu2, and a display object Sb. The voice recognition cancel operation object Bu1 is an object for accepting an input of an operation for canceling voice recognition. The voice recognition end operation object Bu2 is an object for accepting an input of an operation for ending voice recognition.
Next, the recognizable volume will be described in detail. The recognizable volume described above is not necessarily constant and may change based on the noise volume. FIGS. 5 to 7 are diagrams for explaining the recognizable volume. For example, as illustrated in FIG. 5, when the noise volume average value N_ave falls below a predetermined lower limit (hereinafter also referred to as the "noise volume lower limit") N_min, the recognizable volume V_able is considered not to change. Therefore, when the noise volume average value N_ave falls below the noise volume lower limit N_min, the determination unit 144 may set a prescribed value V_able_min as the recognizable volume V_able. Note that the noise volume itself may be used instead of the noise volume average value N_ave.
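As an illustration only (not taken from the publication), the adaptation of V_able to N_ave described above can be sketched as follows; all numeric values, and the choice of a fixed margin above the noise average, are assumptions introduced for this sketch:

```python
def recognizable_volume(noise_avg_db: float,
                        noise_floor_db: float = 30.0,   # N_min (assumed value)
                        v_able_min_db: float = 40.0) -> float:
    """Set the recognizable volume V_able from the noise average N_ave.

    Below the noise lower limit N_min, the prescribed value V_able_min is
    used; above it, V_able is raised with the noise (here a fixed margin
    above N_ave, one of several plausible mappings).
    """
    if noise_avg_db < noise_floor_db:
        return v_able_min_db
    return noise_avg_db + 10.0  # margin above the noise average; illustrative
```

The noise volume itself could be passed in place of the average, as the paragraph above notes.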
If the first moving object Mu is displayed as described above, the user can grasp that the utterance is being made at a volume at which voice recognition is possible. On the other hand, even if the utterance is made at a recognizable volume, voice recognition may still be obstructed by noise. It is therefore useful to let the user grasp the presence of noise. FIG. 8 is a diagram illustrating another example of a screen displayed by the display unit 130. Referring to FIG. 8, noise sources Ns1 and Ns2 are present. Although a case with two noise sources is described here, the number of noise sources is not limited.
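As an illustration only (not part of the publication), one plausible way to decide whether arriving sound should produce a first moving object (utterance) or a second moving object (noise) is to compare its arrival direction with the estimated utterance source direction Du; the angular-tolerance approach and all names here are assumptions for this sketch:

```python
def classify_direction(arrival_deg: float,
                       utterance_deg: float,
                       tolerance_deg: float = 20.0) -> str:
    """Classify sound arriving from `arrival_deg` as the user's utterance
    or as noise, based on angular distance from the estimated utterance
    source direction Du (`utterance_deg`)."""
    # Wrap-around-safe angular difference in [0, 180].
    diff = abs((arrival_deg - utterance_deg + 180.0) % 360.0 - 180.0)
    if diff <= tolerance_deg:
        return "first_moving_object"   # moves toward the display object
    return "second_moving_object"      # displayed based on the noise source direction
```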
Next, the operation flow of the information processing system 10 according to the embodiment of the present disclosure will be described. FIGS. 10A and 10B are flowcharts illustrating an example of the operation flow of the information processing system 10 according to the embodiment of the present disclosure. Note that the flowcharts in FIGS. 10A and 10B are merely an example of the operation flow of the information processing system 10, and the operation flow is not limited to the example illustrated in these flowcharts.
The above description has dealt with an example in which the display unit 130 is a projector capable of projecting a screen onto the top surface of the table Tbl. However, the display form of the display unit 130 is not limited to this example. Modifications of the display form of the display unit 130 are described below. FIG. 11 is a diagram illustrating a first modification of the display form of the display unit 130. As illustrated in FIG. 11, when the information processing system 10 is a mobile terminal, the display unit 130 may be provided in the mobile terminal. The type of mobile terminal is not particularly limited and may be a tablet terminal, a smartphone, or a mobile phone.
Next, the hardware configuration of the information processing system 10 according to the embodiment of the present disclosure will be described with reference to FIG. 16. FIG. 16 is a block diagram illustrating a hardware configuration example of the information processing system 10 according to the embodiment of the present disclosure.
As described above, according to the embodiment of the present disclosure, there is provided an information processing device 140 including a determination unit 144 that determines the user utterance volume based on an input voice, and a display control unit 146 that controls the display unit 130 so that the display object Sb is displayed by the display unit 130, wherein the display control unit 146 causes the display unit 130 to display a first moving object that moves toward the display object Sb when the user utterance volume exceeds the voice recognizable volume.
(1)
An information processing device including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
(2)
The information processing device according to (1), wherein the determination unit determines a sound source direction of the voice uttered by the user, and the display control unit causes the display unit to display the first moving object based on the sound source direction of the voice uttered by the user.
(3)
The information processing device according to (2), including a voice recognition unit that acquires a recognized character string by performing voice recognition on an input voice from the sound source direction of the voice uttered by the user.
(4)
The information processing device according to (3), wherein the display control unit causes the display unit to display the recognized character string.
(5)
The information processing device according to any one of (1) to (4), wherein the determination unit determines a noise volume based on the input voice, and the display control unit causes the display unit to display a second moving object different from the first moving object when the noise volume exceeds the voice recognizable volume.
(6)
The information processing device according to (5), wherein the determination unit determines a noise source direction, and the display control unit causes the display unit to display the second moving object based on the noise source direction.
(7)
The information processing device according to (6), wherein the second moving object moves such that its movement to the display object is blocked.
(8)
The information processing device according to any one of (1) to (7), wherein the display control unit controls a parameter related to the first moving object based on predetermined information corresponding to the input voice.
(9)
The information processing device according to (8), wherein the parameter related to the first moving object includes at least one of a size, a shape, a color, and a moving speed of the first moving object.
(10)
The information processing device according to (8) or (9), wherein the predetermined information corresponding to the input voice includes at least one of the user utterance volume, a frequency of the input voice, an acquisition speed of a recognized character string, a feature amount extracted from the input voice, and a user identified from the input voice.
(11)
The information processing device according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at a volume exceeding a threshold.
(12)
The information processing device according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at the largest volume.
(13)
The information processing device according to (2), wherein the determination unit determines the sound source direction of the voice uttered by the user based on a direction from a fingertip to the base of the finger.
(14)
The information processing device according to (6), wherein the determination unit sets a prescribed value as the voice recognizable volume when the noise volume falls below a lower limit.
(15)
The information processing device according to (6), wherein the determination unit sets, as the voice recognizable volume, a volume according to an average value of the noise volume or a volume according to the noise volume when the noise volume exceeds the lower limit.
(16)
The information processing device according to any one of (1) to (15), wherein the display control unit causes the display unit to display the display object when an object corresponding to the display object is recognized from a captured image.
(17)
The information processing device according to any one of (1) to (16), wherein the display control unit causes the display unit to display the first moving object moving toward the display object with an expression of moving from the front to the back when the user utterance volume exceeds the voice recognizable volume.
(18)
The information processing device according to any one of (1) to (17), wherein the display control unit causes the display unit to display a virtual object and causes the display unit to display a predetermined object included in the virtual object as the display object.
(19)
An information processing method including: determining a user utterance volume based on an input voice; controlling a display unit such that a display object is displayed by the display unit; and causing the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
(20)
A program for causing a computer to function as an information processing device including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
110 Image input unit
115 Operation input unit
120 Voice input unit
130 Display unit
140 Information processing device (control unit)
141 Input image acquisition unit
142 Input voice acquisition unit
143 Operation detection unit
144 Determination unit
145 Voice recognition unit
146 Display control unit
Mu First moving object
Mn, Mn1, Mn2 Second moving objects
Ns, Ns1, Ns2 Noise sources
Du Sound source direction of the voice uttered by the user
Dn, Dn1, Dn2 Noise source directions
Sb Display object
Claims (20)
- An information processing device comprising: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
- The information processing device according to claim 1, wherein the determination unit determines a sound source direction of the voice uttered by the user, and the display control unit causes the display unit to display the first moving object based on the sound source direction of the voice uttered by the user.
- The information processing device according to claim 2, comprising a voice recognition unit that acquires a recognized character string by performing voice recognition on an input voice from the sound source direction of the voice uttered by the user.
- The information processing device according to claim 3, wherein the display control unit causes the display unit to display the recognized character string.
- The information processing device according to claim 1, wherein the determination unit determines a noise volume based on the input voice, and the display control unit causes the display unit to display a second moving object different from the first moving object when the noise volume exceeds the voice recognizable volume.
- The information processing device according to claim 5, wherein the determination unit determines a noise source direction, and the display control unit causes the display unit to display the second moving object based on the noise source direction.
- The information processing device according to claim 6, wherein the second moving object moves such that its movement to the display object is blocked.
- The information processing device according to claim 1, wherein the display control unit controls a parameter related to the first moving object based on predetermined information corresponding to the input voice.
- The information processing device according to claim 8, wherein the parameter related to the first moving object includes at least one of a size, a shape, a color, and a moving speed of the first moving object.
- The information processing device according to claim 8, wherein the predetermined information corresponding to the input voice includes at least one of the user utterance volume, a frequency of the input voice, an acquisition speed of a recognized character string, a feature amount extracted from the input voice, and a user identified from the input voice.
- The information processing device according to claim 2, wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at a volume exceeding a threshold.
- The information processing device according to claim 2, wherein the determination unit determines the sound source direction of the voice uttered by the user based on an arrival direction of a voice input at the largest volume.
- The information processing device according to claim 2, wherein the determination unit determines the sound source direction of the voice uttered by the user based on a direction from a fingertip to the base of the finger.
- The information processing device according to claim 6, wherein the determination unit sets a prescribed value as the voice recognizable volume when the noise volume falls below a lower limit.
- The information processing device according to claim 6, wherein the determination unit sets, as the voice recognizable volume, a volume according to an average value of the noise volume or a volume according to the noise volume when the noise volume exceeds the lower limit.
- The information processing device according to claim 1, wherein the display control unit causes the display unit to display the display object when an object corresponding to the display object is recognized from a captured image.
- The information processing device according to claim 1, wherein the display control unit causes the display unit to display the first moving object moving toward the display object with an expression of moving from the front to the back when the user utterance volume exceeds the voice recognizable volume.
- The information processing device according to claim 1, wherein the display control unit causes the display unit to display a virtual object and causes the display unit to display a predetermined object included in the virtual object as the display object.
- An information processing method including: determining a user utterance volume based on an input voice; controlling a display unit such that a display object is displayed by the display unit; and causing the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
- A program for causing a computer to function as an information processing device including: a determination unit that determines a user utterance volume based on an input voice; and a display control unit that controls a display unit such that a display object is displayed by the display unit, wherein the display control unit causes the display unit to display a first moving object that moves toward the display object when the user utterance volume exceeds a voice recognizable volume.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201580057995.8A CN107148614B (zh) | 2014-12-02 | 2015-08-21 | 信息处理设备、信息处理方法和程序 |
US15/521,322 US10642575B2 (en) | 2014-12-02 | 2015-08-21 | Information processing device and method of information processing for notification of user speech received at speech recognizable volume levels |
EP15866106.6A EP3229128A4 (en) | 2014-12-02 | 2015-08-21 | Information processing device, information processing method, and program |
JP2016562324A JP6627775B2 (ja) | 2014-12-02 | 2015-08-21 | 情報処理装置、情報処理方法およびプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-243906 | 2014-12-02 | ||
JP2014243906 | 2014-12-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016088410A1 true WO2016088410A1 (ja) | 2016-06-09 |
Family
ID=56091368
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/073488 WO2016088410A1 (ja) | 2014-12-02 | 2015-08-21 | 情報処理装置、情報処理方法およびプログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US10642575B2 (ja) |
EP (1) | EP3229128A4 (ja) |
JP (1) | JP6627775B2 (ja) |
CN (1) | CN107148614B (ja) |
WO (1) | WO2016088410A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2019146032A1 (ja) * | 2018-01-25 | 2020-07-02 | 三菱電機株式会社 | ジェスチャー操作装置およびジェスチャー操作方法 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7250547B2 (ja) * | 2019-02-05 | 2023-04-03 | 本田技研工業株式会社 | エージェントシステム、情報処理装置、情報処理方法、およびプログラム |
JP7169921B2 (ja) * | 2019-03-27 | 2022-11-11 | 本田技研工業株式会社 | エージェント装置、エージェントシステム、エージェント装置の制御方法、およびプログラム |
CN111265851B (zh) * | 2020-02-05 | 2023-07-04 | 腾讯科技(深圳)有限公司 | 数据处理方法、装置、电子设备及存储介质 |
CN113934289A (zh) | 2020-06-29 | 2022-01-14 | 北京字节跳动网络技术有限公司 | 数据处理方法、装置、可读介质及电子设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11352995A (ja) * | 1998-06-08 | 1999-12-24 | Toshiba Tec Corp | 音声認識装置 |
JP2000000377A (ja) * | 1998-06-12 | 2000-01-07 | Umbrella:Kk | 音声入力式ヒューマンインタフェースに特徴を有するビデオゲーム機およびプログラム記録媒体 |
JP2001079265A (ja) * | 1999-09-14 | 2001-03-27 | Sega Corp | ゲーム装置 |
JP2006227499A (ja) * | 2005-02-21 | 2006-08-31 | Toyota Motor Corp | 音声認識装置 |
JP2007329702A (ja) * | 2006-06-08 | 2007-12-20 | Toyota Motor Corp | 受音装置と音声認識装置とそれらを搭載している可動体 |
JP2011227199A (ja) * | 2010-04-16 | 2011-11-10 | Nec Casio Mobile Communications Ltd | 雑音抑圧装置、雑音抑圧方法及びプログラム |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7096185B2 (en) * | 2000-03-31 | 2006-08-22 | United Video Properties, Inc. | User speech interfaces for interactive media guidance applications |
US7023498B2 (en) * | 2001-11-19 | 2006-04-04 | Matsushita Electric Industrial Co. Ltd. | Remote-controlled apparatus, a remote control system, and a remote-controlled image-processing apparatus |
US7260538B2 (en) * | 2002-01-08 | 2007-08-21 | Promptu Systems Corporation | Method and apparatus for voice control of a television control device |
JP2007142840A (ja) * | 2005-11-18 | 2007-06-07 | Canon Inc | 情報処理装置及び情報処理方法 |
JP4887911B2 (ja) * | 2006-05-31 | 2012-02-29 | 船井電機株式会社 | 電子機器 |
US8175885B2 (en) * | 2007-07-23 | 2012-05-08 | Verizon Patent And Licensing Inc. | Controlling a set-top box via remote speech recognition |
WO2012169679A1 (ko) | 2011-06-10 | 2012-12-13 | 엘지전자 주식회사 | 디스플레이 장치, 디스플레이 장치의 제어 방법 및 디스플레이 장치의 음성인식 시스템 |
US9563265B2 (en) | 2012-01-12 | 2017-02-07 | Qualcomm Incorporated | Augmented reality with sound and geometric analysis |
US8793136B2 (en) | 2012-02-17 | 2014-07-29 | Lg Electronics Inc. | Method and apparatus for smart voice recognition |
US9020825B1 (en) * | 2012-09-25 | 2015-04-28 | Rawles Llc | Voice gestures |
CN104077105B (zh) * | 2013-03-29 | 2018-04-27 | 联想(北京)有限公司 | 一种信息处理方法以及一种电子设备 |
JP2014203207A (ja) * | 2013-04-03 | 2014-10-27 | ソニー株式会社 | 情報処理装置、情報処理方法及びコンピュータプログラム |
-
2015
- 2015-08-21 US US15/521,322 patent/US10642575B2/en active Active
- 2015-08-21 EP EP15866106.6A patent/EP3229128A4/en not_active Withdrawn
- 2015-08-21 CN CN201580057995.8A patent/CN107148614B/zh not_active Expired - Fee Related
- 2015-08-21 WO PCT/JP2015/073488 patent/WO2016088410A1/ja active Application Filing
- 2015-08-21 JP JP2016562324A patent/JP6627775B2/ja active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11352995A (ja) * | 1998-06-08 | 1999-12-24 | Toshiba Tec Corp | 音声認識装置 |
JP2000000377A (ja) * | 1998-06-12 | 2000-01-07 | Umbrella:Kk | 音声入力式ヒューマンインタフェースに特徴を有するビデオゲーム機およびプログラム記録媒体 |
JP2001079265A (ja) * | 1999-09-14 | 2001-03-27 | Sega Corp | ゲーム装置 |
JP2006227499A (ja) * | 2005-02-21 | 2006-08-31 | Toyota Motor Corp | 音声認識装置 |
JP2007329702A (ja) * | 2006-06-08 | 2007-12-20 | Toyota Motor Corp | 受音装置と音声認識装置とそれらを搭載している可動体 |
JP2011227199A (ja) * | 2010-04-16 | 2011-11-10 | Nec Casio Mobile Communications Ltd | 雑音抑圧装置、雑音抑圧方法及びプログラム |
Non-Patent Citations (1)
Title |
---|
See also references of EP3229128A4 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2019146032A1 (ja) * | 2018-01-25 | 2020-07-02 | 三菱電機株式会社 | ジェスチャー操作装置およびジェスチャー操作方法 |
Also Published As
Publication number | Publication date |
---|---|
EP3229128A4 (en) | 2018-05-30 |
US20180150279A1 (en) | 2018-05-31 |
US10642575B2 (en) | 2020-05-05 |
CN107148614B (zh) | 2020-09-08 |
JPWO2016088410A1 (ja) | 2017-09-14 |
CN107148614A (zh) | 2017-09-08 |
JP6627775B2 (ja) | 2020-01-08 |
EP3229128A1 (en) | 2017-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10546582B2 (en) | Information processing device, method of information processing, and program | |
JP6729555B2 (ja) | 情報処理システムおよび情報処理方法 | |
JP6627775B2 (ja) | 情報処理装置、情報処理方法およびプログラム | |
US20190019512A1 (en) | Information processing device, method of information processing, and program | |
US11373650B2 (en) | Information processing device and information processing method | |
JP6750697B2 (ja) | 情報処理装置、情報処理方法及びプログラム | |
US20200018926A1 (en) | Information processing apparatus, information processing method, and program | |
JP2016109726A (ja) | 情報処理装置、情報処理方法およびプログラム | |
WO2018139036A1 (ja) | 情報処理装置、情報処理方法およびプログラム | |
JP6575518B2 (ja) | 表示制御装置、表示制御方法およびプログラム | |
US20180063283A1 (en) | Information processing apparatus, information processing method, and program | |
WO2019021566A1 (ja) | 情報処理装置、情報処理方法、及びプログラム | |
JP2016156877A (ja) | 情報処理装置、情報処理方法およびプログラム | |
US20200342229A1 (en) | Information processing device, information processing method, and program | |
US20200380733A1 (en) | Information processing device, information processing method, and program | |
WO2019054037A1 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
JP2016180778A (ja) | 情報処理システムおよび情報処理方法 | |
WO2019187593A1 (ja) | 情報処理装置、情報処理方法およびプログラム | |
JP2016170584A (ja) | 情報処理装置、情報処理方法およびプログラム | |
US10855639B2 (en) | Information processing apparatus and information processing method for selection of a target user | |
WO2019082520A1 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
WO2019026392A1 (ja) | 情報処理装置、情報処理方法、およびプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15866106 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016562324 Country of ref document: JP Kind code of ref document: A |
|
REEP | Request for entry into the european phase |
Ref document number: 2015866106 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15521322 Country of ref document: US |