WO2020189410A1 - Speech recognition device - Google Patents

Speech recognition device

Info

Publication number
WO2020189410A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice recognition
recognition device
surface portion
microphone
unit
Prior art date
Application number
PCT/JP2020/010283
Other languages
French (fr)
Japanese (ja)
Inventor
優 坂西
Original Assignee
優 坂西
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2019179273A (patent JP7432177B2)
Application filed by 優 坂西
Publication of WO2020189410A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones

Definitions

  • the present invention relates to a voice recognition device.
  • Patent Document 1 describes a simultaneous interpretation device including two displays.
  • a state in which the simultaneous interpretation device is used while being placed on a stationary body such as a desk or a table is illustrated.
  • An object of the present invention is to provide a voice recognition device that can be efficiently used according to a situation in which a conversation that is the target of voice recognition is performed.
  • the voice recognition device includes a plurality of microphones, a sensor for determining whether or not the voice recognition device is in a stationary state, a microphone control unit that controls whether each of the plurality of microphones is turned on or off according to the determination result by the sensor, and a voice recognition unit that recognizes a voice input to at least one microphone in the on state.
  • the microphones are arranged on the left side surface portion, the right side surface portion, and the upper surface portion of the device.
  • according to the present invention, it is possible to provide a voice recognition device that can be used efficiently according to the situation in which the conversation subject to voice recognition takes place.
  • the outer shape of the voice recognition device 1 is a substantially triangular prism.
  • This substantially triangular prism is a substantially pentahedron having two faces that are substantially triangular and three faces that are substantially rectangular. The short sides of the three faces that are substantially rectangular are also the sides of the two faces that are substantially triangular.
  • a front portion is indicated by reference numeral 11.
  • a display unit D visually recognized by the user of the voice recognition device 1 is arranged on the front portion 11.
  • consider the voice recognition device 1 viewed horizontally from the front, with the two long side portions 11a and 11b of the front portion 11 horizontal and one long side portion 11a located directly above the other long side portion 11b.
  • in this state, the substantially triangular surfaces on the left and right are referred to as the left side surface portion 13 and the right side surface portion 14, respectively; the substantially rectangular surface that shares the long side portion 11a with the front portion 11 is referred to as the upper surface portion 15, and the substantially rectangular surface that shares the long side portion 11b with the front portion 11 is referred to as the lower surface portion 16.
  • the front portion 11, the upper surface portion 15, and the lower surface portion 16 of the voice recognition device 1 are substantially rectangular, and each side of the left side surface portion 13 and the right side surface portion 14 is shorter than the long side portions 11a and 11b of the front portion 11. That is, the voice recognition device 1 has an elongated shape in the left-right direction.
  • the angle α formed by the front portion 11 and the left side surface portion 13 is an acute angle (FIG. 5). Similarly, the angle formed by the front portion 11 and the right side surface portion 14 is also an acute angle. Further, the angle β formed by the left side surface portion 13 and the side portion 12, which is shared by the upper surface portion 15 and the lower surface portion 16, is an obtuse angle. Similarly, the angle formed by the side portion 12 and the right side surface portion 14 is also an obtuse angle.
  • the voice recognition device 1 is easy to carry and has a handy size so that it can be used even when it is held by hand.
  • as an example, the length of the front portion 11 in the left-right direction is 120 mm, the distance between the long side portion 11a and the long side portion 11b is 30 mm, and the depth (the length from the front portion 11 to the side portion 12) is 15 mm.
  • the first microphone M1 is arranged on the upper surface portion 15.
  • a second microphone M2 is arranged on the left side surface portion 13.
  • a third microphone M3 is arranged on the right side surface portion 14.
  • the fourth microphone M4 is arranged below the display unit D in the front portion 11.
  • the first microphone M1, the second microphone M2, the third microphone M3, and the fourth microphone M4 are all directional microphones having directivity in front of each microphone. Compared to an omnidirectional microphone, a directional microphone can capture the voice of a speaker located farther away from the front of the directional microphone. Each microphone converts the input voice into an electrical signal when it is on. The control of whether each microphone is turned on or off will be described later.
  • the voice recognition device 1 further includes a voice recognition unit 21, a translation unit 22, a sensor 23, and a microphone control unit 24.
  • the translation unit 22 is not an essential element.
  • the voice recognition unit 21 receives an electric signal from at least one microphone in the ON state among the first to fourth microphones M1 to M4, and performs voice recognition.
  • the voice recognition unit 21 recognizes the electric signal as the first language. This voice recognition result is displayed as a character string on the display unit D.
  • the translation unit 22 translates the voice recognized as the first language by the voice recognition unit 21 into a second language different from the first language. This translation is done by the translation engine.
  • the translation engine is built in the voice recognition device 1 or is provided outside the voice recognition device 1 so as to be able to communicate with the translation unit 22.
  • the translation result by the translation unit 22 is displayed as a character string on the display unit D.
  • the acceleration sensor 23 measures the acceleration of the voice recognition device 1 and determines whether or not the voice recognition device 1 is in a stationary state based on the measured acceleration.
  • the acceleration sensor 23 determines that the voice recognition device 1 is in a stationary state based on the measured acceleration.
  • the acceleration sensor 23 determines that the voice recognition device 1 is not in a stationary state based on the measured acceleration.
  • the microphone control unit 24 controls whether each microphone is turned on or off according to the determination result of the acceleration sensor 23.
  • when the acceleration sensor 23 determines that the voice recognition device 1 is in the stationary state, all the first to fourth microphones M1 to M4 are controlled to be in the ON state.
  • when the acceleration sensor 23 determines that the voice recognition device 1 is not in the stationary state, the first microphone M1 is controlled to be turned on and the second to fourth microphones M2 to M4 are controlled to be turned off.
  • as its computer hardware configuration, the voice recognition device 1 further includes a processor, an interface device capable of communicating with an external computer, an input device used by the user of the voice recognition device 1 for input, and a storage device.
  • FIGS. 8 and 9 show a state in which the user P1 of the voice recognition device 1 has a conversation with the speaker P2 while holding the voice recognition device 1 in his hand. Since the user P1 holds the voice recognition device 1 in his hand, the acceleration sensor 23 determines that the voice recognition device 1 is not in a stationary state. As a result, the microphone control unit 24 performs control so that the first microphone M1 is turned on and the second to fourth microphones M2 to M4 are turned off. The region where the first microphone M1 can acquire voice is indicated by reference numeral S1.
  • the upper surface portion 15 almost faces the speaker P2.
  • the region S1 from which the first microphone M1 can acquire voice has a spread from the upper surface portion 15 toward the speaker P2.
  • the voice emitted by the speaker P2 is input to the first microphone M1 in the on state, and is recognized as the first language (English in this example) by the voice recognition unit 21.
  • the recognition result is displayed in the area on the left side of the display unit D (FIG. 8).
  • the voice recognized by the voice recognition unit 21 is translated into a second language (Japanese in this example) by the translation unit 22.
  • the translation result is displayed in the area on the right side of the display unit D (FIG. 8).
  • the voice recognition device 1 is tilted so that the user P1 can easily see the display unit D, and at the same time, since the region S1 spreads toward the speaker P2, the voice emitted by the speaker P2 can be efficiently acquired. Further, since the direction of the speaker P2 is easy for the user P1 to see, the user P1 can naturally have a conversation with the speaker P2 while using the voice recognition device 1.
  • the voice recognition device 1 has an elongated shape in the left-right direction, and during use the user P1 holds the voice recognition device 1 in his hand so that the left-right direction (longitudinal direction) of the voice recognition device 1 is perpendicular to the direction in which the user P1 and the speaker P2 face each other. Therefore, the feeling of oppression on the speaker P2 due to the use of the voice recognition device 1 can be reduced as compared with the case where the longitudinal direction of the voice recognition device 1 is parallel to the direction in which the user P1 and the speaker P2 face each other.
  • when a specific operation is performed on the voice recognition device 1 by the user P1, the fourth microphone M4 may be controlled to be on. Alternatively, the first microphone M1 and the fourth microphone M4 may be controlled to be in the ON state regardless of whether or not the specific operation is performed.
  • the voice emitted by the user P1 is captured by the fourth microphone M4 in the ON state, and can be recognized by the voice recognition unit 21. Further, the recognition result can be translated into another language by the translation unit 22. The translation result is displayed on the display unit D and can be visually recognized by the user P1. User P1 can utter the content while looking at the translation result.
  • FIGS. 10 and 11 show a state in which the user P1 of the voice recognition device 1 talks with the speakers P3 to P5 with the voice recognition device 1 placed on the table 9 as a stationary body.
  • the voice recognition device 1 is placed on the table 9 so that the lower surface portion 16 is a contact surface with the table 9.
  • Both the user P1 and the speakers P3 to P5 are seated.
  • the speaker P3 sits to the right of the user P1, the speaker P4 sits facing the user P1 across the table 9, and the speaker P5 sits facing the speaker P3 across the table 9.
  • the acceleration sensor 23 determines that the voice recognition device 1 is in a stationary state.
  • the microphone control unit 24 controls all the first to fourth microphones M1 to M4 to be in the ON state.
  • the areas where the second to fourth microphones M2 to M4 can acquire sound are shown as S2 to S4, respectively.
  • the display unit D is easy for the user P1 to see. Further, the upper surface portion 15 on which the first microphone M1 is arranged faces the speakers P4 and P5, and the right side surface portion 14 on which the third microphone M3 is arranged faces the speakers P3 and P5.
  • the region S1 from which the voice can be acquired by the first microphone M1 spreads toward the speakers P4 and P5, and the region S3 from which the voice can be acquired by the third microphone M3 spreads toward the speakers P3 and P5.
  • the voice emitted by the speaker P3 is input to the third microphone M3 and recognized by the voice recognition unit 21.
  • the voice emitted by the speaker P4 is input to the first microphone M1 and recognized by the voice recognition unit 21.
  • the voice emitted by the speaker P5 is input to at least one of the first microphone M1 and the third microphone M3, and is recognized by the voice recognition unit 21. With the voice recognition device 1 placed in front of the user P1, the voice recognition device 1 can efficiently acquire the voices emitted by the speakers P3 to P5.
  • the speakers P3 to P5 are also easy for the user P1 to see. That is, the user P1 can naturally have a conversation with the speakers P3 to P5 while using the voice recognition device 1.
  • the fourth microphone M4 may be controlled so as to be turned on when a specific operation is performed on the voice recognition device 1 by the user P1.
  • depending on the situation in which the conversation takes place, whether the voice recognition device 1 is held in the hand or placed on a stationary body such as a table 9 or a desk, the on/off state of each of the plurality of microphones is controlled accordingly.
  • the voice recognition device 1 can efficiently acquire the voice of the speaker by the microphone turned on. At the same time, the voice recognition result and the translation result can be easily visually recognized by the user through the display unit D.
  • although the outer shape of the voice recognition device 1 is a substantially triangular prism, another shape such as a rectangular parallelepiped may be used.
  • the sensor is not limited to an acceleration sensor; the voice recognition device 1 may be provided with any sensor means capable of determining whether or not the voice recognition device 1 is in a stationary state.
  • the voice recognition device 1 does not have to include the translation unit 22.
  • the voice recognition device 1 can recognize the input voice as a language and store the recognition result in the storage device in the voice recognition device 1.
  • the voice recognition result may be displayed on the display unit D, or may be sent to another device outside the voice recognition device 1.
  • This voice recognition result can be used as a material for recording minutes and the like, and can also be used as a material for analyzing the productivity of conversations and meetings.
  • the voice recognition device 1 is small and lightweight enough to be used even when held by hand, and is equipped with a plurality of directional microphones.
  • the plurality of directional microphones can capture and recognize the voice emitted by the participants of the target conversation or conference even if they are relatively far from the voice recognition device. In other words, there is an increased possibility that the target conversation and the voices of all the participants in the conference can be captured and recognized.
  • the voice recognition device 1 includes a plurality of microphones, a sensor 23 for determining whether or not the voice recognition device is in a stationary state, a microphone control unit 24 that controls whether each of the plurality of microphones is turned on or off according to the determination result by the sensor, and a voice recognition unit 21 that recognizes the voice input to at least one microphone in the on state.
  • the on / off of each microphone is controlled according to the state of the voice recognition device. As a result, it is possible to efficiently acquire the voice that is the target of voice recognition.
  • the outer shape of the voice recognition device 1 is a substantially triangular prism having two faces having a substantially triangular shape and three faces having a substantially rectangular shape, and a microphone is arranged on each of the two substantially triangular faces (left side surface portion 13 and right side surface portion 14) and on each of two surfaces (front surface portion 11 and upper surface portion 15) of the three substantially rectangular surfaces (front surface portion 11, upper surface portion 15 and lower surface portion 16).
  • the microphone control unit turns on all of the plurality of microphones.
  • the microphone control unit turns on the microphones arranged on each of two of the three substantially rectangular surfaces (front surface portion 11 and upper surface portion 15), and turns off the microphones arranged on each of the two substantially triangular surfaces (left side surface portion 13 and right side surface portion 14).
  • the voice recognition device 1 further includes a display unit D that displays a voice recognition result by the voice recognition unit 21. As a result, the user of the voice recognition device can visually recognize the voice recognition result through the display unit D.
  • the voice recognition device 1 further includes a translation unit 22 that translates the voice recognized as the first language by the voice recognition unit 21 into a second language different from the first language.
  • the display unit D further displays the translation result by the translation unit. As a result, the voice recognition result is translated into a language different from the recognized language, and the translation result can be visually recognized by the user through the display unit D.
  • the quadrangular frustum (truncated quadrangular pyramid) is a three-dimensional figure obtained by cutting a quadrangular pyramid in two along a plane parallel to the bottom surface and removing the portion that includes the apex of the original quadrangular pyramid.
  • a quadrangular frustum is a three-dimensional figure surrounded by the four pyramid surfaces of a quadrangular pyramid, the bottom surface of the quadrangular pyramid, and a plane parallel to the bottom surface.
  • the flat surface portion having a large surface area is called a front surface portion and is indicated by reference numeral 11A.
  • the other flat portion having a small surface area is called a back surface portion and is indicated by reference numeral 12A.
  • Both the front portion 11A and the back portion 12A are substantially rectangular.
  • a display unit D visually recognized by the user of the voice recognition device 1A is arranged on the front portion 11A.
  • the voice recognition device 1A is provided so that the two long sides 11a 1 and 11b 1 of the front portion 11A are horizontal, and one long side 11a 1 is located above the other long side 11b 1.
  • the pyramid surface portion located on the left side is called the left side surface portion of the voice recognition device 1A and is indicated by reference numeral 13A, and the pyramid surface portion located on the right side is called the right side surface portion of the voice recognition device 1A and is indicated by reference numeral 14A.
  • the pyramid surface portion located on the upper side is referred to as the upper surface portion of the voice recognition device 1A and is indicated by reference numeral 15A, and the pyramid surface portion located on the lower side is referred to as the lower surface portion of the voice recognition device 1A and is indicated by reference numeral 16A.
  • the front portion 11A and the back portion 12A of the voice recognition device 1A are substantially rectangular, and the length from the front portion 11A to the back portion 12A is shorter than the length of the long side portions 11a 1 and 11b 1 of the front portion 11A. That is, the voice recognition device 1A has an elongated shape in the left-right direction.
  • the first microphone M1 is arranged on the upper surface portion 15A.
  • the second microphone M2 is arranged on the left side surface portion 13A, and the third microphone M3 is arranged on the right side surface portion 14A. Further, in the front portion 11A, the fourth microphone M4 is arranged below the display portion D.
  • the flat surface portion having a large surface area is called a front surface portion and is indicated by reference numeral 11B.
  • the other flat portion having a small surface area is called a back surface portion and is indicated by reference numeral 12B. Both the front portion 11B and the back portion 12B are substantially rectangular.
  • a display unit D visually recognized by the user of the voice recognition device 1B is arranged on the front portion 11B.
  • the voice recognition device 1B is set so that the two long sides 11a 2 and 11b 2 of the front portion 11B are horizontal, and one long side 11a 2 is located above the other long side 11b 2.
  • the pyramid surface portion located on the left side is called the left side surface portion of the voice recognition device 1B and is indicated by reference numeral 13B, and the pyramid surface portion located on the right side is called the right side surface portion of the voice recognition device 1B and is indicated by reference numeral 14B.
  • the pyramid surface portion located on the upper side is referred to as the upper surface portion of the voice recognition device 1B and is indicated by reference numeral 15B, and the pyramid surface portion located on the lower side is referred to as the lower surface portion of the voice recognition device 1B and is indicated by reference numeral 16B.
  • the front portion 11B and the back portion 12B of the voice recognition device 1B are substantially rectangular, and the length from the front portion 11B to the back portion 12B is shorter than the length of the long side portions 11a 2 and 11b 2 of the front portion 11B. That is, the voice recognition device 1B has an elongated shape in the left-right direction.
  • the first microphone M1 is arranged on the upper surface portion 15B.
  • the second microphone M2 is arranged on the left side surface portion 13B, and the third microphone M3 is arranged on the right side surface portion 14B. Further, on the front portion 11B, a fourth microphone M4 is arranged below the display portion D.
  • the voice recognition device having a substantially quadrangular frustum shape as described above can also be used efficiently according to the situation in which the conversation subject to voice recognition takes place.
  • the same effect can be obtained with a voice recognition device whose outer shape is a pentahedron, including a substantially triangular prism, and with a voice recognition device whose outer shape is a hexahedron, including a substantially quadrangular frustum.
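
The microphone on/off rules listed above can be summarized in a short sketch. This is only an illustration of the described behavior, not the applicant's implementation: the names `Mic` and `microphones_to_enable` and the `user_operation` flag are hypothetical, and the sketch covers the default rule plus the optional variant in which the fourth microphone M4 is turned on by a specific user operation.

```python
from enum import Enum


class Mic(Enum):
    """The four microphones and the surface each one is arranged on."""
    M1 = "upper surface portion 15"
    M2 = "left side surface portion 13"
    M3 = "right side surface portion 14"
    M4 = "front portion 11, below display unit D"


def microphones_to_enable(stationary: bool, user_operation: bool = False) -> set[Mic]:
    """Return the set of microphones that should be in the on state.

    stationary:     determination result of the sensor (hypothetical input).
    user_operation: True if the user performed the specific operation that
                    enables M4 in the hand-held case (optional variant).
    """
    if stationary:
        # Placed on a table or desk: all four microphones are turned on.
        return set(Mic)
    # Hand-held: only M1, which faces the other speaker, is on by default.
    enabled = {Mic.M1}
    if user_operation:
        # Optional variant: M4 also captures the user's own voice.
        enabled.add(Mic.M4)
    return enabled
```

For example, `microphones_to_enable(False)` yields only M1, while `microphones_to_enable(True)` yields all four microphones.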

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

Provided is a speech recognition device that can be used efficiently in accordance with a situation in which a conversation as the object of speech recognition takes place. The speech recognition device 1 is provided with: a plurality of microphones; a sensor 23 which determines whether the speech recognition device is in a stationary state; a microphone control unit 24 which controls whether to place each of the plurality of the microphones in on-state or off-state depending on the result of determination by the sensor; and a speech recognition unit 21 which recognizes speech that has been input to at least one microphone in on-state.

Description

Voice recognition device
The present invention relates to a voice recognition device.
With the development of voice recognition technology, demand for devices equipped with a voice recognition function is increasing. Patent Document 1 describes a simultaneous interpretation device including two displays. That document illustrates the simultaneous interpretation device being used while placed on a stationary body such as a desk or a table.
Japanese Unexamined Patent Application Publication No. 2018-195276 (JP-A-2018-195276)
An object of the present invention is to provide a voice recognition device that can be used efficiently according to the situation in which the conversation subject to voice recognition takes place.
To achieve the above object, the voice recognition device according to the present invention includes a plurality of microphones, a sensor that determines whether or not the voice recognition device is in a stationary state, a microphone control unit that controls whether each of the plurality of microphones is turned on or off according to the determination result by the sensor, and a voice recognition unit that recognizes voice input to at least one microphone in the on state. The microphones are arranged on the left side surface portion, the right side surface portion, and the upper surface portion of the voice recognition device.
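
The text states that a sensor determines whether the device is stationary, and elsewhere names an acceleration sensor as one example, but it does not say how the determination is made. The following is a minimal sketch of one possible rule, assuming a three-axis accelerometer that reports samples in m/s^2. The function name `is_stationary`, the windowed peak-to-peak test, and the `max_spread` threshold of 0.3 are all assumptions made for illustration and are not taken from the source.

```python
import math
from typing import Iterable, Tuple

Sample = Tuple[float, float, float]  # (ax, ay, az) in m/s^2


def is_stationary(samples: Iterable[Sample], max_spread: float = 0.3) -> bool:
    """Hypothetical rule: the device is treated as stationary when the
    magnitude of the measured acceleration barely varies over the window."""
    magnitudes = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    if not magnitudes:
        raise ValueError("at least one acceleration sample is required")
    # Peak-to-peak variation of the magnitude over the window.
    spread = max(magnitudes) - min(magnitudes)
    return spread <= max_spread
```

A device resting on a table reports an almost constant magnitude (gravity only), whereas a hand-held device shows small but continuous fluctuations. A real product would need a threshold and window length tuned to its sensor.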
According to the present invention, it is possible to provide a voice recognition device that can be used efficiently according to the situation in which the conversation subject to voice recognition takes place.
A front view of the voice recognition device.
A rear view of the voice recognition device.
A left side view of the voice recognition device.
A right side view of the voice recognition device.
A plan view of the voice recognition device.
A bottom view of the voice recognition device.
A functional block diagram of the voice recognition device.
An explanatory diagram showing the voice recognition device used while held in the hand.
Another explanatory diagram showing the voice recognition device used while held in the hand.
An explanatory diagram showing the voice recognition device used while placed on a surface.
Another explanatory diagram showing the voice recognition device used while placed on a surface.
A front view of a voice recognition device according to another embodiment.
A rear view of the voice recognition device according to another embodiment.
A left side view of the voice recognition device according to another embodiment.
A right side view of the voice recognition device according to another embodiment.
A plan view of the voice recognition device according to another embodiment.
A bottom view of the voice recognition device according to another embodiment.
A front view of a voice recognition device according to still another embodiment.
A rear view of the voice recognition device according to still another embodiment.
A left side view of the voice recognition device according to still another embodiment.
A right side view of the voice recognition device according to still another embodiment.
A plan view of the voice recognition device according to still another embodiment.
A bottom view of the voice recognition device according to still another embodiment.
A perspective view of the voice recognition device according to still another embodiment.
A perspective view of the voice recognition device according to still another embodiment.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. However, the present invention is not limited to the following embodiments.
As shown in FIGS. 1 to 6, the outer shape of the voice recognition device 1 is a substantially triangular prism. This substantially triangular prism is a substantially pentahedral body having two faces that are substantially triangular and three faces that are substantially rectangular. The short sides of the three substantially rectangular faces are also the sides of the two substantially triangular faces. Of the three substantially rectangular faces, one is referred to as the front portion and is indicated by reference numeral 11. A display unit D visually recognized by the user of the voice recognition device 1 is arranged on the front portion 11.
Consider the voice recognition device 1 viewed horizontally from the front, with the two long side portions 11a and 11b of the front portion 11 horizontal and one long side portion 11a located directly above the other long side portion 11b. In this state, the substantially triangular faces on the left and right are referred to as the left side surface portion 13 and the right side surface portion 14, respectively; the substantially rectangular face that shares the long side portion 11a with the front portion 11 is referred to as the upper surface portion 15, and the substantially rectangular face that shares the long side portion 11b with the front portion 11 is referred to as the lower surface portion 16.
The front portion 11, the upper surface portion 15, and the lower surface portion 16 of the voice recognition device 1 are substantially rectangular, and each side of the left side surface portion 13 and the right side surface portion 14 is shorter than the long side portions 11a and 11b of the front portion 11. That is, the voice recognition device 1 has an elongated shape in the left-right direction.
The angle α formed by the front portion 11 and the left side surface portion 13 is an acute angle (FIG. 5). Similarly, the angle formed by the front portion 11 and the right side surface portion 14 is also an acute angle. Further, the angle β formed by the left side surface portion 13 and the side portion 12, which is shared by the upper surface portion 15 and the lower surface portion 16, is an obtuse angle. Similarly, the angle formed by the side portion 12 and the right side surface portion 14 is also an obtuse angle.
 音声認識装置1は、持ち運びが容易であり、かつ手で持った状態でも使用可能な程度にハンディーなサイズである。一例として、正面部11の左右方向の長さは120ミリメートルであり、長辺部11aと長辺部11bとの間隔は30ミリメートルであり、奥行き(正面部11から辺部12までの長さ)は15ミリメートルである。 The voice recognition device 1 is easy to carry and has a handy size so that it can be used even when it is held by hand. As an example, the length of the front portion 11 in the left-right direction is 120 mm, the distance between the long side portion 11a and the long side portion 11b is 30 mm, and the depth (the length from the front portion 11 to the side portion 12). Is 15 millimeters.
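As a rough check of these example dimensions, the cross-section can be worked out under one assumption that the text does not state: that the edge 12 lies directly behind the midline of the front portion 11, so that the triangular cross-section is isosceles. Here h is the spacing between the long sides 11a and 11b and d is the depth; θ (the angle between the front portion and the upper or lower surface portion) and w (the width of the upper surface portion) are symbols introduced only for this sketch.

```latex
\begin{align*}
h &= 30\ \text{mm}, \qquad d = 15\ \text{mm},\\
\theta &= \arctan\!\left(\frac{d}{h/2}\right) = \arctan\!\left(\frac{15}{15}\right) = 45^\circ,\\
w &= \sqrt{\left(\tfrac{h}{2}\right)^2 + d^2} = \sqrt{15^2 + 15^2} \approx 21.2\ \text{mm}.
\end{align*}
```

Under that assumption, resting the device on the lower surface portion 16 would incline the front portion 11, and with it the display unit D, at about 45° to the supporting surface.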
As shown in FIG. 5, a first microphone M1 is arranged on the upper surface portion 15. As shown in FIG. 3, a second microphone M2 is arranged on the left side surface portion 13. As shown in FIG. 4, a third microphone M3 is arranged on the right side surface portion 14. When the voice recognition device 1 is viewed horizontally from the front, a fourth microphone M4 is arranged on the front portion 11 below the display unit D.
The first microphone M1, the second microphone M2, the third microphone M3, and the fourth microphone M4 are all directional microphones whose directivity points toward the front of the respective microphone. Compared with an omnidirectional microphone, a directional microphone can capture the voice of a speaker located farther away, provided the speaker is in front of the directional microphone. Each microphone converts input voice into an electrical signal while it is in the on state. How each microphone is switched between the on state and the off state is described later.
As shown in FIG. 7, the voice recognition device 1 further includes a voice recognition unit 21, a translation unit 22, a sensor 23, and a microphone control unit 24. However, as described later, the translation unit 22 is not an essential element.
The voice recognition unit 21 receives an electrical signal from at least one of the first to fourth microphones M1 to M4 that is in the on state and performs voice recognition. The voice recognition unit 21 recognizes the electrical signal as speech in a first language. The recognition result is displayed on the display unit D as a character string.
The translation unit 22 translates the voice recognized as the first language by the voice recognition unit 21 into a second language different from the first language. The translation is performed by a translation engine. The translation engine is either built into the voice recognition device 1 or located outside the voice recognition device 1 and arranged so that it can communicate with the translation unit 22. The translation result produced by the translation unit 22 is displayed on the display unit D as a character string.
The acceleration sensor 23 measures the acceleration of the voice recognition device 1 and determines, based on the measured acceleration, whether the voice recognition device 1 is in a stationary state. When the voice recognition device 1 is resting on a stationary body such as a desk or a table, the acceleration sensor 23 determines from the measured acceleration that the voice recognition device 1 is in a stationary state. In contrast, when the voice recognition device 1 is not at rest, for example because the user is holding it in the hand, the acceleration sensor 23 determines from the measured acceleration that the voice recognition device 1 is not in a stationary state.
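The text does not say how the stationary determination is computed from the measured acceleration. One plausible approach, shown here purely as a sketch, is to check how much the magnitude of the acceleration fluctuates over a short window. The function name `is_stationary`, the unit (g), and the threshold value are all assumptions made for illustration and do not come from the source.

```python
import math
from statistics import pstdev


def is_stationary(samples, threshold=0.05):
    """Return True if the device appears to be at rest.

    samples: sequence of (ax, ay, az) accelerometer readings in g (assumed unit).
    threshold: hypothetical limit on the standard deviation of the
               acceleration magnitude; not a value given in the source.
    """
    magnitudes = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    # A device resting on a table reads an almost constant 1 g (gravity only),
    # so the spread of the magnitudes stays very small.
    return pstdev(magnitudes) < threshold


# Device resting on a table: constant gravity vector.
on_table = [(0.0, 0.0, 1.0)] * 20
# Device held in a hand: magnitude alternates between 0.8 g and 1.2 g.
in_hand = [(0.0, 0.0, 0.8), (0.0, 0.0, 1.2)] * 10

print(is_stationary(on_table))
print(is_stationary(in_hand))
```

A real device would likely also need some debouncing so that a brief bump on the table does not toggle the microphones, but that is beyond what the text describes.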
The microphone control unit 24 controls whether each microphone is turned on or off according to the determination result of the acceleration sensor 23. When the acceleration sensor 23 determines that the voice recognition device 1 is in a stationary state, all of the first to fourth microphones M1 to M4 are turned on. On the other hand, when it is determined that the voice recognition device 1 is not in a stationary state, the first microphone M1 is turned on and the second to fourth microphones M2 to M4 are turned off.
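The control rule in this paragraph can be sketched as a small function. This is a minimal illustration, not the device's actual firmware; the names `Mic` and `microphones_to_enable` are hypothetical, while the microphone labels and their faces follow the text.

```python
from enum import Enum


class Mic(Enum):
    M1 = "upper surface portion 15"
    M2 = "left side surface portion 13"
    M3 = "right side surface portion 14"
    M4 = "front portion 11"


def microphones_to_enable(stationary: bool) -> set:
    """Return the set of microphones to turn on; all others are turned off."""
    if stationary:
        # Resting on a table or desk: every microphone is on.
        return set(Mic)
    # Held in the hand: only the microphone on the upper surface is on.
    return {Mic.M1}


print(sorted(m.name for m in microphones_to_enable(True)))
print(sorted(m.name for m in microphones_to_enable(False)))
```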
Although not shown in the drawings, the voice recognition device 1 further includes, as its computer hardware configuration, a processor, an interface device capable of communicating with an external computer, an input device that the user of the voice recognition device 1 uses for input, and a storage device.
FIGS. 8 and 9 show a user P1 of the voice recognition device 1 holding the voice recognition device 1 in the hand while conversing face to face with a speaker P2. Because the user P1 is holding the voice recognition device 1, the acceleration sensor 23 determines that the voice recognition device 1 is not in a stationary state. As a result, the microphone control unit 24 turns the first microphone M1 on and turns the second to fourth microphones M2 to M4 off. The region in which the first microphone M1 can pick up voice is denoted by reference sign S1.
When the user P1 tilts the voice recognition device 1 toward himself or herself so that the display unit D is easy to see, the upper surface portion 15 comes to substantially face the speaker P2. As a result, the region S1 in which the first microphone M1 can pick up voice extends from the upper surface portion 15 toward the speaker P2.
The voice uttered by the speaker P2 is input to the first microphone M1, which is in the on state, and is recognized by the voice recognition unit 21 as the first language (English in this example). The recognition result is displayed in the left-hand region of the display unit D (FIG. 8). The voice recognized by the voice recognition unit 21 is translated by the translation unit 22 into the second language (Japanese in this example). The translation result is displayed in the right-hand region of the display unit D (FIG. 8).
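The flow from microphone signal to the two display regions can be sketched as follows. The recognizer and translator are stand-in stubs, since the text does not identify any particular recognition or translation engine; `DisplayContent`, `process_utterance`, `fake_recognize`, and `fake_translate` are names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DisplayContent:
    left: str   # recognition result in the first language
    right: str  # translation result in the second language


def process_utterance(
    signal: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> DisplayContent:
    """Recognize the signal, translate the text, and lay out both results."""
    recognized = recognize(signal)
    translated = translate(recognized)
    return DisplayContent(left=recognized, right=translated)


# Stubs standing in for the voice recognition unit 21 and translation unit 22.
def fake_recognize(signal: bytes) -> str:
    return "Nice to meet you."


def fake_translate(text: str) -> str:
    return {"Nice to meet you.": "はじめまして。"}.get(text, text)


content = process_utterance(b"", fake_recognize, fake_translate)
print(content.left, "|", content.right)
```

Passing the recognizer and translator in as callables mirrors the statement that the translation engine may sit either inside or outside the device.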
Thus, because the outer shape of the voice recognition device 1 is a substantially triangular prism, when the user P1 tilts the voice recognition device 1 so that the display unit D is easy to see, the region S1 at the same time extends toward the speaker P2, and the voice uttered by the speaker P2 can be acquired efficiently. In addition, because it is easy for the user P1 to look toward the speaker P2, the user P1 can converse naturally with the speaker P2 while using the voice recognition device 1.
Further, the voice recognition device 1 is elongated in the left-right direction, and in use the user P1 holds it so that its left-right (longitudinal) direction is perpendicular to the direction in which the user P1 and the speaker P2 face each other. Therefore, compared with the case where the longitudinal direction of the voice recognition device 1 is parallel to the direction in which the user P1 and the speaker P2 face each other, the sense of intimidation that the use of the voice recognition device 1 gives the speaker P2 can be reduced.
Note that, while the user P1 is holding the voice recognition device 1 in the hand as shown in FIGS. 8 and 9, the fourth microphone M4 may be turned on when the user P1 performs a specific operation on the voice recognition device 1. Alternatively, the first microphone M1 and the fourth microphone M4 may both be turned on regardless of whether the specific operation has been performed.
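The two handheld variants described here can be expressed as one small function. This is only a sketch: the text does not say what the "specific operation" is, so it is modeled as a boolean flag, and the names `handheld_microphones`, `user_operation`, and `always_enable_m4` are hypothetical.

```python
def handheld_microphones(user_operation: bool, always_enable_m4: bool = False) -> set:
    """Return the microphones that are on while the device is held in the hand.

    user_operation: True once the user has performed the (unspecified)
                    specific operation on the device.
    always_enable_m4: True for the variant in which M1 and M4 are both on
                      regardless of any operation.
    """
    active = {"M1"}  # the upper-surface microphone faces the other speaker
    if user_operation or always_enable_m4:
        active.add("M4")  # the front microphone faces the user
    return active


print(sorted(handheld_microphones(user_operation=False)))
print(sorted(handheld_microphones(user_operation=True)))
print(sorted(handheld_microphones(user_operation=False, always_enable_m4=True)))
```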
The fourth microphone M4, when in the on state, captures the voice uttered by the user P1, which the voice recognition unit 21 can then recognize. The recognition result can further be translated into another language by the translation unit 22. The translation result is displayed on the display unit D, where the user P1 can see it. The user P1 can then speak the content aloud while looking at the translation result.
FIGS. 10 and 11 show the user P1 of the voice recognition device 1 conversing with speakers P3 to P5 with the voice recognition device 1 placed on a table 9, which serves as a stationary body. The voice recognition device 1 is placed on the table 9 with the lower surface portion 16 in contact with the table 9. The user P1 and the speakers P3 to P5 are all seated. The speaker P3 sits to the immediate right of the user P1, the speaker P4 sits facing the user P1 across the table 9, and the speaker P5 sits facing the speaker P3 across the table 9.
Because the voice recognition device 1 is placed on the table 9, the acceleration sensor 23 determines that the voice recognition device 1 is in a stationary state. As a result, the microphone control unit 24 turns on all of the first to fourth microphones M1 to M4. In addition to the region S1 in which the first microphone M1 can pick up voice, the regions in which the second to fourth microphones M2 to M4 can pick up voice are denoted by S2 to S4, respectively.
When the voice recognition device 1 is placed on the table 9 with the lower surface portion 16 in contact with the table 9, the display unit D is easy for the user P1 to see because the outer shape of the voice recognition device 1 is a substantially triangular prism. Further, the upper surface portion 15, on which the first microphone M1 is arranged, faces toward the speakers P4 and P5, and the right side surface portion 14, on which the third microphone M3 is arranged, faces toward the speakers P3 and P5.
Therefore, the region S1 in which the first microphone M1 can pick up voice extends toward the speakers P4 and P5, and the region S3 in which the third microphone M3 can pick up voice extends toward the speakers P3 and P5.
The voice uttered by the speaker P3 is input to the third microphone M3 and recognized by the voice recognition unit 21. The voice uttered by the speaker P4 is input to the first microphone M1 and recognized by the voice recognition unit 21. The voice uttered by the speaker P5 is input to at least one of the first microphone M1 and the third microphone M3 and recognized by the voice recognition unit 21. With the voice recognition device 1 left in place in front of the user P1, it can efficiently acquire the voices uttered by the speakers P3 to P5.
It is also easy for the user P1 to look toward the speakers P3 to P5. In other words, the user P1 can converse naturally with the speakers P3 to P5 while using the voice recognition device 1.
Note that the fourth microphone M4 may be turned on when the user P1 performs a specific operation on the voice recognition device 1.
As described above, with the voice recognition device 1, the on/off state of each of the plurality of microphones is controlled according to the situation in which the conversation takes place, whether the voice recognition device 1 is held in the hand or placed on a stationary body such as the table 9 or a desk. The voice recognition device 1 can efficiently acquire a speaker's voice through the microphones that are turned on. At the same time, the user can easily view the voice recognition result and the translation result on the display unit D.
[Other]
Although an example has been shown in which the outer shape of the voice recognition device 1 is a substantially triangular prism, it may have another shape such as a rectangular parallelepiped. Further, the sensor is not limited to an acceleration sensor; it suffices that the voice recognition device 1 is provided with sensor means capable of determining whether the voice recognition device 1 is in a stationary state.
The voice recognition device 1 does not have to include the translation unit 22. In this case, the voice recognition device 1 can recognize the input voice as a language and store the recognition result in the storage device within the voice recognition device 1. Alternatively, the voice recognition result may be displayed on the display unit D or sent to another device outside the voice recognition device 1. The voice recognition result can be used as material for records such as meeting minutes, and also as material for analyzing the productivity of the conversation or meeting.
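As one way to picture the stored recognition results being reused for minutes, the sketch below serializes each recognized utterance as one JSON line. The record layout (timestamp, microphone label, text) and the names `make_record` and `to_json_lines` are assumptions for illustration; the source does not specify any storage format.

```python
import json
from datetime import datetime, timezone


def make_record(text, microphone, when=None):
    """Build one minutes entry for a recognized utterance (hypothetical layout)."""
    when = when or datetime.now(timezone.utc)
    return {"time": when.isoformat(), "microphone": microphone, "text": text}


def to_json_lines(records):
    """Serialize records one per line, suitable for appending to a log file."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


fixed_time = datetime(2020, 3, 10, 9, 0, tzinfo=timezone.utc)
records = [
    make_record("Let us begin the meeting.", "M1", fixed_time),
    make_record("I agree with the proposal.", "M3", fixed_time),
]
print(to_json_lines(records))
```

Recording which microphone captured each utterance is a design choice made here because it could help attribute statements to seats around the table; the text itself only says the results can serve as material for minutes and productivity analysis.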
The voice recognition device 1 is small and light enough to be used while held in the hand, and it includes a plurality of directional microphones. With the plurality of directional microphones, even when a participant in the target conversation or meeting is relatively far from the voice recognition device, the voice uttered by that participant can be captured and recognized. In other words, the likelihood of capturing and recognizing the voices of all participants in the target conversation or meeting increases.
The above embodiment is summarized again below.
[Part 1]
The voice recognition device 1 includes a plurality of microphones, a sensor 23 that determines whether the voice recognition device is in a stationary state, a microphone control unit 24 that controls whether each of the plurality of microphones is turned on or off according to the determination result of the sensor, and a voice recognition unit 21 that recognizes voice input to at least one of the microphones that is in the on state.
As a result, the on/off state of each microphone is controlled according to the state of the voice recognition device, and the voice to be recognized can be acquired efficiently.
[Part 2]
The outer shape of the voice recognition device 1 is a substantially triangular prism having two substantially triangular faces and three substantially rectangular faces. A microphone is arranged on each of the two substantially triangular faces (the left side surface portion 13 and the right side surface portion 14) and on each of two (the front portion 11 and the upper surface portion 15) of the three substantially rectangular faces (the front portion 11, the upper surface portion 15, and the lower surface portion 16).
This increases the likelihood that the voice-acquisition region of each microphone extends toward the speaker with whom the user of the voice recognition device is conversing, which leads to efficient acquisition of voice.
[Part 3]
When the sensor determines that the voice recognition device is in a stationary state, the microphone control unit turns on all of the plurality of microphones. When the sensor determines that the voice recognition device is not in a stationary state, the microphone control unit turns on the microphones arranged on two of the three substantially rectangular faces (the front portion 11 and the upper surface portion 15) and turns off the microphones arranged on the two substantially triangular faces (the left side surface portion 13 and the right side surface portion 14).
This increases the likelihood that voice can be acquired efficiently both when the voice recognition device is in a stationary state and when it is not.
[Part 4]
The voice recognition device 1 further includes a display unit D that displays the voice recognition result produced by the voice recognition unit 21. As a result, the user of the voice recognition device can view the voice recognition result on the display unit D.
[Part 5]
The voice recognition device 1 further includes a translation unit 22 that translates the voice recognized as a first language by the voice recognition unit 21 into a second language different from the first language. The display unit D additionally displays the translation result produced by the translation unit. As a result, the voice recognition result is translated into a language other than the recognized language, and the user can view the translation result on the display unit D.
Other examples of the voice recognition device are described below.
FIGS. 12 to 17 show a voice recognition device 1A whose outer shape is a substantially truncated quadrangular pyramid. A truncated quadrangular pyramid is the solid obtained by cutting a quadrangular pyramid in two along a plane parallel to its base and removing the part that contains the apex of the original pyramid. In other words, a truncated quadrangular pyramid is the solid bounded by the four lateral faces of a quadrangular pyramid, the base of the pyramid, and a plane parallel to the base.
Of the two parallel flat faces that form the outer shape of the voice recognition device 1A, the one with the larger area is referred to as the front portion and is denoted by reference numeral 11A. The other, with the smaller area, is referred to as the back portion and is denoted by reference numeral 12A. The front portion 11A and the back portion 12A are both substantially rectangular. A display unit D, which the user of the voice recognition device 1A views, is arranged on the front portion 11A.
Consider the state in which the voice recognition device 1A is viewed horizontally from the front, with the two long sides 11a₁ and 11b₁ of the front portion 11A horizontal and the long side 11a₁ located above the long side 11b₁. In this state, of the four lateral faces, the one on the left is referred to as the left side surface portion of the voice recognition device 1A and is denoted by reference numeral 13A, and the one on the right is referred to as the right side surface portion and is denoted by reference numeral 14A. In the same state, the lateral face on the upper side is referred to as the upper surface portion of the voice recognition device 1A and is denoted by reference numeral 15A, and the lateral face on the lower side is referred to as the lower surface portion and is denoted by reference numeral 16A.
The front portion 11A and the back portion 12A of the voice recognition device 1A are substantially rectangular, and the distance from the front portion 11A to the back portion 12A is shorter than the long sides 11a₁ and 11b₁ of the front portion 11A. In other words, the voice recognition device 1A is elongated in the left-right direction.
The first microphone M1 is arranged on the upper surface portion 15A. The second microphone M2 is arranged on the left side surface portion 13A, and the third microphone M3 is arranged on the right side surface portion 14A. Further, the fourth microphone M4 is arranged on the front portion 11A below the display unit D.
FIGS. 18 to 25 show another voice recognition device 1B whose outer shape is a substantially truncated quadrangular pyramid.
Of the two parallel flat faces that form the outer shape of the voice recognition device 1B, the one with the larger area is referred to as the front portion and is denoted by reference numeral 11B. The other, with the smaller area, is referred to as the back portion and is denoted by reference numeral 12B. The front portion 11B and the back portion 12B are both substantially rectangular. A display unit D, which the user of the voice recognition device 1B views, is arranged on the front portion 11B.
Consider the state in which the voice recognition device 1B is viewed horizontally from the front, with the two long sides 11a₂ and 11b₂ of the front portion 11B horizontal and the long side 11a₂ located above the long side 11b₂. In this state, of the four lateral faces, the one on the left is referred to as the left side surface portion of the voice recognition device 1B and is denoted by reference numeral 13B, and the one on the right is referred to as the right side surface portion and is denoted by reference numeral 14B. In the same state, the lateral face on the upper side is referred to as the upper surface portion of the voice recognition device 1B and is denoted by reference numeral 15B, and the lateral face on the lower side is referred to as the lower surface portion and is denoted by reference numeral 16B.
The front portion 11B and the back portion 12B of the voice recognition device 1B are substantially rectangular, and the distance from the front portion 11B to the back portion 12B is shorter than the long sides 11a₂ and 11b₂ of the front portion 11B. In other words, the voice recognition device 1B is elongated in the left-right direction.
The first microphone M1 is arranged on the upper surface portion 15B. The second microphone M2 is arranged on the left side surface portion 13B, and the third microphone M3 is arranged on the right side surface portion 14B. Further, the fourth microphone M4 is arranged on the front portion 11B below the display unit D.
Like the voice recognition device whose outer shape is a substantially triangular prism, a voice recognition device whose outer shape is a substantially truncated quadrangular pyramid as described above can be used efficiently according to the situation in which the conversation subject to voice recognition takes place. The same effect is obtained for voice recognition devices whose outer shape is a pentahedron, including a substantially triangular prism, and for those whose outer shape is a hexahedron, including a substantially truncated quadrangular pyramid.
Although specific embodiments of the present invention have been described, the present invention is not limited to these embodiments, and various modifications based on the technical idea of the present invention are included in the concept of the present invention.
1 Voice recognition device
11 Front portion, 13 Left side surface portion, 14 Right side surface portion, 15 Upper surface portion, 16 Lower surface portion
D Display unit (display), M1 to M4 Microphones
21 Voice recognition unit, 22 Translation unit, 23 Sensor, 24 Microphone control unit

Claims (6)

  1.  A voice recognition device comprising:
     a plurality of microphones;
     a sensor that determines whether the voice recognition device is in a stationary state;
     a microphone control unit that controls whether each of the plurality of microphones is turned on or off according to the determination result of the sensor; and
     a voice recognition unit that recognizes voice input to at least one of the microphones that is in the on state,
     wherein the microphones are arranged on a left side surface portion, a right side surface portion, and an upper surface portion of the voice recognition device.
  2.  The voice recognition device according to claim 1, wherein the outer shape of the voice recognition device is a substantially truncated quadrangular pyramid enclosed by the four lateral faces of a substantially quadrangular pyramid, the base of the substantially quadrangular pyramid, and a plane parallel to the base.
  3.  The voice recognition device according to claim 1 or 2, wherein
     when the sensor determines that the voice recognition device is in the stationary state, the microphone control unit turns on all of the plurality of microphones, and
     when the sensor determines that the voice recognition device is not in the stationary state, the microphone control unit turns on the microphone arranged on the upper surface portion and turns off the microphones arranged on the left side surface portion and the right side surface portion.
  4.  The voice recognition device according to any one of claims 1 to 3, further comprising a display unit that displays the voice recognition result of the voice recognition unit.
  5.  The voice recognition device according to claim 4, further comprising a translation unit that translates voice recognized by the voice recognition unit as a first language into a second language different from the first language,
     wherein the display unit further displays the translation result of the translation unit.
  6.  The voice recognition device according to claim 5, wherein
     the voice recognition result of the voice recognition unit is displayed in a left-side region of the display unit, and
     the translation result of the translation unit is displayed in a right-side region of the display unit.
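
As a non-limiting illustration, the microphone control recited in claims 1 and 3 and the display arrangement recited in claim 6 might be sketched in Python as follows. This is only a sketch under stated assumptions: the class and function names, the face labels, and the assignment of M1 to M3 to particular surface portions are hypothetical and do not appear in the application, and the stationary/non-stationary determination of the sensor is represented by a plain boolean.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical labels for the surface portions named in claim 1.
TOP, LEFT, RIGHT = "top", "left", "right"


@dataclass(frozen=True)
class Microphone:
    name: str  # e.g. "M1"; the mapping of names to faces is an assumption
    face: str  # one of TOP, LEFT, RIGHT


def select_microphone_states(
    mics: Iterable[Microphone], is_stationary: bool
) -> Dict[str, bool]:
    """Return a name -> on/off map following the rule of claim 3.

    Stationary: every microphone is turned on.
    Not stationary: the upper-surface microphone is on and the
    left/right side microphones are off. Microphones on any other
    face are also treated as off here, which is an assumption the
    claims do not address.
    """
    if is_stationary:
        return {m.name: True for m in mics}
    return {m.name: m.face == TOP for m in mics}


def layout_display(recognized: str, translated: str) -> Dict[str, str]:
    """Place the recognition result on the left and the translation
    on the right, as in claim 6."""
    return {"left": recognized, "right": translated}


if __name__ == "__main__":
    mics = [
        Microphone("M1", LEFT),
        Microphone("M2", RIGHT),
        Microphone("M3", TOP),
    ]
    print(select_microphone_states(mics, is_stationary=True))
    print(select_microphone_states(mics, is_stationary=False))
    print(layout_display("recognized text", "translated text"))
```

Keeping the on/off decision in a pure function of the sensor result makes the rule easy to check separately from any hardware; how the sensor actually determines the stationary state is outside the scope of this sketch.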
PCT/JP2020/010283 2019-03-15 2020-03-10 Speech recognition device WO2020189410A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2019-049070 2019-03-15
JP2019049070 2019-03-15
JP2019179273A JP7432177B2 (en) 2019-03-15 2019-09-30 voice recognition device
JP2019-179273 2019-09-30

Publications (1)

Publication Number Publication Date
WO2020189410A1 (en) 2020-09-24

Family

ID=72520982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/010283 WO2020189410A1 (en) 2019-03-15 2020-03-10 Speech recognition device

Country Status (1)

Country Link
WO (1) WO2020189410A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004294945A (en) * 2003-03-28 2004-10-21 Toshiba Corp Speech recognition apparatus
WO2007099908A1 (en) * 2006-02-27 2007-09-07 Matsushita Electric Industrial Co., Ltd. Wearable terminal, mobile imaging sound collecting device, and device, method, and program for implementing them
JP2008051882A (en) * 2006-08-22 2008-03-06 Canon Inc Speech information processing apparatus and its control method
JP2009295236A (en) * 2008-06-05 2009-12-17 Kenwood Corp Recording device, control method, and program
JP2011248140A (en) * 2010-05-27 2011-12-08 Fujitsu Toshiba Mobile Communications Ltd Voice recognition device
JP2015069600A (en) * 2013-09-30 2015-04-13 株式会社東芝 Voice translation system, method, and program
WO2016117421A1 (en) * 2015-01-21 2016-07-28 シャープ株式会社 Voice-input device, information processing device, method of controlling voice-input device, and control program
JP3209755U (en) * 2017-01-24 2017-04-06 シェンジェン ロイオル テクノロジーズ カンパニー リミテッドShenzhen Royole Technologies Co., Ltd. Multimedia processing terminal
JP2018085091A (en) * 2016-11-11 2018-05-31 パナソニックIpマネジメント株式会社 Translation device control method, translation device, and program

Similar Documents

Publication Publication Date Title
JP6494286B2 (en) In-vehicle gesture interaction space audio system
EP3424229B1 (en) Systems and methods for spatial audio adjustment
US20190028817A1 (en) System and method for a directional speaker selection
JP6738342B2 (en) System and method for improving hearing
EP2839675B1 (en) Auto detection of headphone orientation
US9747068B2 (en) Audio processing based upon camera selection
EP3226579B1 (en) Information-processing device, information-processing system, control method, and program
US20140328505A1 (en) Sound field adaptation based upon user tracking
JP2012220959A (en) Apparatus and method for determining relevance of input speech
WO2017135194A1 (en) Information processing device, information processing system, control method and program
WO2020189410A1 (en) Speech recognition device
JP7432177B2 (en) voice recognition device
JP2020149035A (en) Voice recognition device
US20230362571A1 (en) Information processing device, information processing terminal, information processing method, and program
JP2006094315A (en) Stereophonic reproduction system
JP2011150657A (en) Translation voice reproduction apparatus and reproduction method thereof
JP2013055428A (en) Information processing apparatus and information processing method
AU2014321133A1 (en) Multi-channel microphone mapping
WO2023195048A1 (en) Voice augmented reality object reproduction device and information terminal system
EP4054181A1 (en) Virtual space sharing system, virtual space sharing method, and virtual space sharing program
WO2023054047A1 (en) Information processing device, information processing method, and program
EP4184507A1 (en) Headset apparatus, teleconference system, user device and teleconferencing method
WO2006106671A1 (en) Image processing device, image display device, reception device, transmission device, communication system, image processing method, image processing program, and recording medium containing the image processing program
JP2022072454A (en) Virtual event system, information processing device, output control method, and program
KR20230112688A (en) Head-mounted computing device with microphone beam steering

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20772563

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20772563

Country of ref document: EP

Kind code of ref document: A1