WO2016063587A1 - Voice Processing System - Google Patents
Voice processing system (音声処理システム)
- Publication number
- WO2016063587A1 (PCT/JP2015/070040)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- user
- processing system
- unit
- control unit
- Prior art date
- 2014-10-20
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets; Supports therefor; Mountings therein
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor; Earphones; Monophonic headphones
- H04R1/1008—Earpieces of the supra-aural or circum-aural type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/02—Details casings, cabinets or mounting therein for transducers covered by H04R1/02 but not provided for in any of its subgroups
- H04R2201/023—Transducers incorporated in garment, rucksacks or the like
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/20—Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/11—Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
Definitions
- This disclosure relates to a voice processing system.
- Wearable devices, worn somewhere on the user's body to sense the user's state, capture or record the surroundings, and present various information to the user, are becoming widespread.
- Wearable devices are used in a variety of fields, such as life logging and sports support.
- The information acquired by a wearable device can be greatly influenced by its mounting location, the user's condition, and the surrounding environment.
- In particular, voice emitted from the user's mouth (hereinafter also referred to as user voice) may be buried in noise such as the sound of friction and vibration between the wearable device and clothing, and ambient environmental sound.
- For example, Patent Document 1 discloses a technique for acquiring a voice signal in which noise is suppressed and the user voice is emphasized, by providing two microphones in a headset and performing microphone array processing on the voice signals input from each microphone.
- The present disclosure therefore proposes a new and improved voice processing system that can acquire the user voice more clearly.
- According to the present disclosure, there is provided a voice processing system including a mounting unit worn by a user, wherein the mounting unit includes at least three voice acquisition units that acquire voice data for beamforming.
- In this specification and the drawings, elements having substantially the same functional configuration may be distinguished by appending different letters to the same reference numeral.
- For example, a plurality of elements having substantially the same functional configuration are distinguished as necessary, as in the voice acquisition units 110A, 110B, and 110C.
- When they need not be distinguished, the voice acquisition units 110A, 110B, and 110C are simply referred to as the voice acquisition unit 110.
- FIGS. 1 to 3 are diagrams showing an example of the external configuration of the voice processing system according to the present embodiment.
- As shown in FIGS. 1 to 3, the voice processing system 1 according to the present embodiment includes a mounting unit shaped to run halfway around the neck, from both sides toward the back. The mounting unit is worn by the user by being hung around the neck.
- FIGS. 1 to 3 show the mounting unit worn by a user from various viewpoints: FIG. 1 is a perspective view, FIG. 2 is a side view seen from the user's right side, and FIG. 3 is a plan view seen from above the user.
- In the following description, directions such as up/down/left/right and front/rear are used. These directions are defined as seen from the center of the user's body (for example, the pit of the stomach) with the user in the upright posture shown in FIG. 2.
- “right” indicates the direction of the user's right side,
- “left” indicates the direction of the user's left side,
- “up” indicates the direction of the user's head, and
- “down” indicates the direction of the user's feet.
- “front” indicates the direction in which the user's body faces, and
- “rear” indicates the direction of the user's back.
- As shown in FIGS. 1 to 3, the mounting unit may be a neck-hanging type worn around the user's neck.
- The mounting unit may be worn in close contact with the user's neck or with some separation from it.
- Other possible shapes for a neck-hanging mounting unit include, for example, a pendant type attached to the user by a neck strap, and a headset type with a neckband that passes behind the neck instead of a headband worn over the head.
- the usage form of the wearable unit may be a form that is used by being worn directly on the human body.
- the form used by being directly worn refers to a form used without any object between the wearable unit and the human body.
- For example, the case where the mounting unit shown in FIGS. 1 to 3 is worn in contact with the skin of the user's neck corresponds to this form.
- various forms such as a headset type and a glasses type that are directly attached to the head are conceivable.
- the usage form of the wearable unit may be a form that is used by being indirectly worn on the human body.
- the form that is used by being indirectly attached refers to a form that is used in a state where some object exists between the wearable unit and the human body.
- For example, the case where the mounting unit shown in FIGS. 1 to 3 is worn hidden under the collar of a shirt, contacting the user through clothing, corresponds to this form.
- Besides this, various forms are conceivable, such as a pendant type attached to the user by a neck strap and a brooch type fastened to clothing with a clasp or the like.
- As shown in FIGS. 1 to 3, the mounting unit has a plurality of voice acquisition units 110 (110A, 110B, 110C, and 110D).
- The voice acquisition units 110 acquire voice data such as the user voice, voice uttered by the user's conversation partner, and ambient environmental sound.
- The voice data acquired by the voice acquisition units 110 are the target of the beamforming process, which makes the user voice clearer, makes the voice of the user's conversation partner clearer, and suppresses other noise.
- Although FIGS. 1 to 3 show a configuration in which four voice acquisition units 110 are provided in the mounting unit, the present technology is not limited to this example.
- the mounting unit may have at least three voice acquisition units, or may have five or more.
- the voice processing system 1 may be realized as a single mounting unit or may be realized as a combination of a plurality of devices.
- For example, the voice processing system 1 may be realized as a combination of the neck-hanging mounting unit shown in FIGS. 1 to 3 and a wristband-type mounting unit worn on the arm.
- The voice processing system 1 may then perform the beamforming process using voice data acquired by a plurality of voice acquisition units provided in a plurality of devices.
- In the following, the description assumes that the voice processing system 1 is realized as the single mounting unit shown in FIGS. 1 to 3.
- FIGS. 4 and 5 are diagrams showing another example of the external configuration of the sound processing system according to the present embodiment.
- FIG. 4 shows an external configuration of the sound processing system 1 composed of a single glasses-type wearing unit.
- FIG. 5 shows an external configuration of the voice processing system 1 composed of a single neckband-type mounting unit. In both FIGS. 4 and 5, the voice processing system 1 includes a plurality of voice acquisition units 110 (110A, 110B, 110C, and 110D), as in the examples shown in FIGS. 1 to 3.
- FIG. 6 is a diagram illustrating an example of an external configuration of a sound processing system according to a comparative example.
- the left diagram and the right diagram in FIG. 6 show an example of the external configuration of a so-called Bluetooth (registered trademark) headset.
- the audio processing system according to the comparative example includes two audio acquisition units 910 (910A and 910B), and is worn by the user by being put on the user's right ear.
- the audio processing system according to the comparative example includes two audio acquisition units 910 (910C and 910D) provided symmetrically on the cable connected to the left and right earphones.
- In either comparative example, the distance between the microphones and the user's mouth can become large during use, and the user voice may be buried in noise. Such a problem is difficult to solve even if beamforming is performed using audio data acquired by only two voice acquisition units, as in the comparative examples.
- FIG. 7 is a diagram for explaining an arrangement policy of the voice acquisition unit 110 according to the present embodiment.
- the first arrangement policy is to arrange the voice acquisition unit 110 linearly with respect to the direction 210 in which the target sound arrives.
- the second arrangement policy is to arrange the voice acquisition unit 110 linearly with respect to the direction 220 in which the noise to be suppressed arrives.
- As shown in the left diagram of FIG. 7, the voice acquisition units 110A and 110B may be arranged on a line toward the direction 210 of the user's mouth, from which the target sound (the user voice) arrives. An arrangement that satisfies the first and second policies emphasizes the user voice efficiently and suppresses noise components arriving from the opposite direction 220.
- This is because sound traveling along the line of the array produces the largest phase difference (time difference) between the two channels: the user voice arriving from the mouth direction 210 reaches the voice acquisition unit 110A before 110B, and noise arriving from the opposite direction 220 reaches 110B before 110A.
- As shown by the polar pattern in the right diagram of FIG. 7, the beamforming process by the control unit 160 described later emphasizes the user voice arriving from the mouth direction 210 and suppresses the noise components arriving from the opposite directions 220A, 220B, and 220C.
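To make the directivity formation concrete: the patent does not name a specific algorithm, so the following is a minimal delay-and-sum sketch under assumed conditions (far-field sound, known microphone coordinates in metres, one mono channel per microphone). Sound from the look direction is time-aligned across channels and adds coherently, while sound from other directions, including the opposite directions 220, stays misaligned and averages down.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_sum(signals, mic_positions, look_direction, fs):
    """Steer an array toward look_direction by delay-and-sum beamforming.

    signals:        array (n_mics, n_samples), one channel per microphone
    mic_positions:  array (n_mics, 3), metres, array-local coordinates
    look_direction: 3-vector pointing from the array toward the target
    fs:             sampling rate in Hz
    """
    look = np.asarray(look_direction, float)
    look /= np.linalg.norm(look)
    # A plane wave from look_direction reaches a microphone at position p
    # earlier by (p . look) / c; delaying each channel by that amount aligns
    # every channel on the target direction.
    taus = np.asarray(mic_positions, float) @ look / SPEED_OF_SOUND
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    aligned = [
        np.fft.irfft(np.fft.rfft(ch) * np.exp(-2j * np.pi * freqs * tau), n)
        for ch, tau in zip(signals, taus)
    ]
    return np.mean(aligned, axis=0)
```

Steering toward the mouth direction 210 emphasizes the user voice; steering toward a noise direction and subtracting that output is the complementary suppression case.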
- FIG. 8 is a diagram for explaining an arrangement policy of the voice acquisition unit 110 according to the present embodiment.
- the third arrangement policy is to arrange the voice acquisition units 110A and 110B linearly in the downward direction.
- When the voice processing system 1 is used outdoors, most outdoor noise arrives from the ground direction (below) or from the horizontal direction relative to the user's mouth, as shown in FIG. 8. Noise arriving from the ground direction is also referred to as road noise.
- With the voice acquisition units 110A and 110B arranged on a vertical line, road noise can be suppressed efficiently by the beamforming process.
- Note that the voice acquisition units 110A and 110B need not be aligned strictly with the vertical (downward) direction and may be inclined.
- FIG. 9 is a diagram for explaining an arrangement policy of the voice acquisition unit 110 according to the present embodiment.
- the fourth arrangement policy is to arrange a plurality of sound acquisition units 110 in a three-dimensional manner.
- the shape formed by connecting the positions where the four sound acquisition units 110 are provided is a three-dimensional shape.
- the remaining one voice acquisition unit 110 does not exist on the plane including the positions of any three voice acquisition units 110.
- As shown in FIG. 9, the shape formed by connecting the positions at which the four voice acquisition units 110 are provided is a regular tetrahedron.
- The shape formed by connecting the positions of the plurality of voice acquisition units 110 is desirably a regular polyhedron, such as a regular tetrahedron, in which the distances from the voice acquisition units 110 to the user's mouth are equal.
- However, depending on the shape of the mounting unit, the shape formed by connecting the positions of the four voice acquisition units 110 may be a tetrahedron other than a regular tetrahedron.
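As a numerical illustration of this policy (the 5 cm edge length below is an assumed wearable-scale spacing, not a dimension from the patent), four alternating vertices of a cube form a regular tetrahedron, and a non-zero volume confirms that no microphone lies in the plane of the other three:

```python
import itertools

import numpy as np

EDGE = 0.05  # assumed 5 cm microphone spacing

# Four alternating cube vertices form a regular tetrahedron.
mics = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
mics *= EDGE / (2 * np.sqrt(2))  # scale so every edge is EDGE metres long

# All six pairwise distances are equal, as required of a regular tetrahedron.
for i, j in itertools.combinations(range(4), 2):
    assert np.isclose(np.linalg.norm(mics[i] - mics[j]), EDGE)

# Non-zero volume: the four points form a solid, not a plane, so the
# inter-channel time differences constrain all three spatial axes of an
# arriving sound instead of leaving one axis ambiguous.
volume = abs(np.linalg.det(mics[1:] - mics[0])) / 6.0
assert volume > 0.0
```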
- FIG. 10 is a diagram for explaining an arrangement policy of the voice acquisition unit 110 according to the present embodiment.
- the fifth arrangement policy is to bring at least one of the sound acquisition units 110 closer to the user's mouth.
- at least one voice acquisition unit 110 can acquire a user voice at a louder volume than other noises.
- the enhancement effect of the user voice by the beam forming process can be further increased.
- a fifth voice acquisition unit 110E may be provided at a position closer to the user's mouth than the four voice acquisition units 110 forming the tetrahedron.
- Alternatively, as shown in the right diagram of FIG. 9, one of the voice acquisition units 110 located at the vertices of the tetrahedron (the voice acquisition unit 110A in this example) may be provided at the position closest to the user's mouth compared with the others.
- As shown in FIGS. 1 to 3, with the mounting unit worn by the user, the voice acquisition unit 110A and the voice acquisition unit 110B are arranged in the same direction as viewed from the user's mouth. Further, with the mounting unit worn, the distance between the voice acquisition unit 110A (first voice acquisition unit) among the four voice acquisition units 110 and the user's mouth differs from the distance between the voice acquisition unit 110B (second voice acquisition unit) and the user's mouth.
- Since the voice acquisition units 110A and 110B are arranged on a line toward the direction of the user's mouth, from which the target sound arrives, the beamforming process can emphasize the user voice efficiently.
- With the mounting unit worn by the user, the voice acquisition unit 110A (first voice acquisition unit) and the voice acquisition unit 110B (second voice acquisition unit) are both provided on the foot side of the user's mouth.
- Since the voice acquisition units 110A and 110B are also arranged on a line toward the direction of the ground, from which the noise to be suppressed arrives, the beamforming process can suppress that noise efficiently.
- In addition, the shape formed by connecting the positions at which the voice acquisition units 110A, 110B, 110C, and 110D are provided is three-dimensional. Since the plurality of voice acquisition units 110 are arranged three-dimensionally in the example shown in FIGS. 1 to 3, the beamforming process can suppress noise arriving from any direction.
- Further, with the mounting unit worn by the user, the voice acquisition unit 110A (first voice acquisition unit) is provided at the position closest to the user's mouth compared with the other voice acquisition units. Because it is close to the user's mouth, the voice acquisition unit 110A can acquire the user voice at a higher volume than other noise.
- Further, in the user's upright posture, the voice acquisition unit 110B (second voice acquisition unit) is provided on the user's foot side of the voice acquisition unit 110A (first voice acquisition unit). The example shown in FIGS. 1 to 3 thereby achieves both the user-voice emphasis effect and the noise suppression effect. Note that although the voice acquisition unit 110A is provided below the user's mouth in FIGS. 1 to 3, it may instead be provided above the mouth.
- The arrangement of the voice acquisition units 110 in the voice processing system 1 according to the present embodiment has been described above. Next, the internal configuration of the voice processing system 1 according to the present embodiment will be described with reference to FIG. 11.
- FIG. 11 is a block diagram illustrating an example of an internal configuration of the voice processing system 1 according to the present embodiment.
- the voice processing system 1 includes voice acquisition units 110A to 110D, an imaging unit 120, an operation unit 130, a sensor unit 140, a communication unit 150, and a control unit 160.
- the sound acquisition unit 110 has a function of acquiring sound data for beam forming.
- the voice acquisition unit 110 acquires a user voice uttered by a user wearing the voice processing system 1 (a mounting unit) or surrounding sounds.
- the sound acquisition unit 110 is realized by a microphone.
- The voice acquisition units 110 may be provided in a single mounting unit, in a device separate from the mounting unit, or distributed across a plurality of devices.
- For example, in addition to the neck-hanging mounting unit shown in FIGS. 1 to 3, voice acquisition units 110 may be provided in a wristband-type mounting unit, a glasses-type mounting unit, and a smartphone.
- The voice acquisition unit 110 need not be a directional microphone.
- For example, the voice acquisition unit 110 may be a microphone having sensitivity in all directions. Having sensitivity in all directions means that the polar pattern has no insensitive region (orientation). Such a microphone may also be referred to as a semi-directional microphone.
- The voice acquisition unit 110 may also be a microphone whose sensitivity is uniform or substantially uniform in all directions. Sensitivity being uniform or substantially uniform in all directions means that the polar pattern is circular, though not necessarily a perfect circle. That is, the voice acquisition unit 110 may be an omnidirectional microphone.
- The voice acquisition unit 110 may include a microphone amplifier circuit that amplifies the sound signal obtained by the microphone, and an A/D converter.
- the voice acquisition unit 110 outputs the acquired voice data to the control unit 160.
- (Imaging unit 120) The imaging unit 120 includes a lens system made up of an imaging lens, a diaphragm, a zoom lens, and a focus lens; a drive system that causes the lens system to perform focus and zoom operations; and a solid-state image sensor array that photoelectrically converts the imaging light obtained by the lens system to generate an imaging signal.
- The solid-state image sensor array may be realized by, for example, a CCD (Charge Coupled Device) sensor array or a CMOS (Complementary Metal Oxide Semiconductor) sensor array.
- For example, the imaging unit 120 may be provided so that it can image the area in front of the user while the voice processing system 1 (mounting unit) is worn by the user.
- the imaging unit 120 can capture, for example, the user's talking partner.
- the imaging unit 120 may be provided so that the user's face can be imaged in a state where the voice processing system 1 is attached to the user.
- the voice processing system 1 can specify the position of the user's mouth from the captured image.
- the imaging unit 120 outputs the captured image data, which is a digital signal, to the control unit 160.
- (Operation unit 130) The operation unit 130 is operated by the user and has a function of receiving input from the user.
- the operation unit 130 may be realized as a camera button that receives an input instructing the imaging unit 120 to capture a still image and an input instructing the start or stop of moving image imaging.
- the operation unit 130 may be realized as a voice input button that receives an input for instructing start or stop of voice input by the voice acquisition unit 110.
- the operation unit 130 may be realized as a touch slider that accepts a touch operation or a slide operation.
- the operation unit 130 may be realized as a power button that receives an operation for instructing to turn on or off the power of the voice processing system 1.
- the operation unit 130 outputs information indicating user input to the control unit 160.
- (Sensor unit 140) The sensor unit 140 has a function of sensing the state of the user wearing the voice processing system 1 or the surrounding state.
- the sensor unit 140 may include at least one of an acceleration sensor, a speed sensor, a gyro sensor, a geomagnetic sensor, a GPS (Global Positioning System) module, or a vibration sensor.
- the sensor unit 140 may be provided in a device different from the mounting unit, or may be provided dispersedly in a plurality of devices.
- a pulse sensor may be provided in a wristband type device, and a vibration sensor may be provided in a smartphone.
- the sensor unit 140 outputs information indicating the sensing result to the control unit 160.
- (Communication unit 150) The communication unit 150 is a communication module for transmitting and receiving data between the voice processing system 1 and other devices by wire or wirelessly.
- The communication unit 150 communicates with external devices directly or via a network access point, using a scheme such as wired LAN (Local Area Network), wireless LAN, Wi-Fi (Wireless Fidelity, registered trademark), infrared communication, Bluetooth, or NFC (Near Field Communication).
- For example, the communication unit 150 may transmit data acquired by the voice acquisition unit 110, the imaging unit 120, the operation unit 130, and the sensor unit 140 to another device; in that case, the beamforming process, the speech recognition process, and the like are performed by that other device.
- Alternatively, when the voice acquisition unit 110, the imaging unit 120, the operation unit 130, or the sensor unit 140 is provided in a separate device, the communication unit 150 may receive the data acquired by them and output it to the control unit 160. The communication unit 150 may also transmit the voice data resulting from the beamforming process by the control unit 160 to a storage device for storage.
- (Control unit 160) The control unit 160 functions as an arithmetic processing device and a control device, and controls overall operation in the voice processing system 1 according to various programs.
- the control unit 160 is realized by an electronic circuit such as a CPU (Central Processing Unit) or a microprocessor, for example.
- the control unit 160 may include a ROM (Read Only Memory) that stores programs to be used, calculation parameters, and the like, and a RAM (Random Access Memory) that temporarily stores parameters that change as appropriate.
- the control unit 160 performs beam forming processing that forms directivity for acquiring sound from the direction of the user's mouth, using the plurality of sound data acquired by the sound acquisition unit 110.
- The beamforming process is a process that changes the degree of emphasis for each region from which sound arrives.
- the beam forming process performed by the control unit 160 may include a process of suppressing sound arriving from a specific area, or may include a process of enhancing sound from a desired direction.
- For example, the control unit 160 may suppress, as noise, voice arriving from directions other than the direction of the user's mouth, and may emphasize the voice arriving from the direction of the user's mouth.
- the voice acquisition unit 110 itself may not have directivity.
- the control unit 160 controls the directivity by performing beam forming processing on the audio data acquired by each audio acquisition unit 110.
- the control unit 160 can perform the beam forming process using the phase difference between the audio data acquired by each audio acquisition unit 110.
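As an illustration of estimating that phase (time) difference from two channels, a common estimator is the generalized cross-correlation with phase transform (GCC-PHAT); the patent states only that phase differences are used, so the choice of estimator here is an assumption:

```python
import numpy as np


def gcc_phat(sig_a, sig_b, fs, max_tau=None):
    """Estimate the arrival-time offset between two microphone channels, in seconds."""
    n = len(sig_a) + len(sig_b)        # zero-pad so the correlation does not wrap
    spec_a = np.fft.rfft(sig_a, n)
    spec_b = np.fft.rfft(sig_b, n)
    cross = spec_a * np.conj(spec_b)
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2
    if max_tau is not None:            # limit the search to physically possible delays
        max_shift = min(int(round(max_tau * fs)), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

For a microphone pair separated by d metres, a delay tau corresponds to an arrival angle of roughly arcsin(c·tau / d), which is the quantity a beamformer steers on.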
- the control unit 160 can control the beam forming process from various viewpoints.
- the control unit 160 can control the direction and / or range in which directivity is formed from the viewpoint described below as an example.
- For example, the control unit 160 may control the beamforming process based on the positional relationship between a noise source and the voice acquisition units 110. Since the source of road noise is the ground, as described above, the control unit 160 may control the beamforming process so as to suppress sound from the direction of the ground. When it can be determined from position information that a railway or a heavily trafficked road lies in a specific direction, the control unit 160 may control the beamforming process so as to suppress sound from that direction. When a user instruction specifies the position of a noise source, the control unit 160 may control the beamforming process so as to suppress sound from the position indicated by the instruction.
- control unit 160 may control the beam forming process based on the position of a speaker other than the user.
- the control unit 160 may perform a beamforming process that emphasizes voices from speakers other than the user.
- Conversely, the control unit 160 may perform a beamforming process that suppresses voice from speakers other than the user.
- Various methods for specifying the presence or position (direction) of speakers other than the user can be considered. For example, when the spoken voice is acquired from a direction other than the user, the control unit 160 may determine that there is another speaker and specify the direction. The control unit 160 may determine that another speaker is present when it is recognized that the voice of another speaker has been acquired by voice recognition.
- control unit 160 may specify the presence and position of another speaker based on the image recognition result of the captured image captured by the imaging unit 120.
- The control unit 160 may identify the presence and position of another speaker by comparing the user's position information acquired by the GPS module of the sensor unit 140 with position information of the other speaker.
- The control unit 160 may also identify the presence and position of another speaker by measuring the intensity of radio waves emitted from a device carried by that speaker (for example, Wi-Fi radio wave intensity).
- the control unit 160 may control the beam forming process based on information indicating the user state.
- the user state may refer to an exercise state such as a user running, walking, or riding a vehicle.
- the control unit 160 can estimate the user's motion state according to the sensing result acquired by the sensor unit 140.
- The control unit 160 may estimate a more detailed motion state by combining multiple sensing results.
- For example, by combining the sensing results of the vibration sensor and the speed sensor, the control unit 160 may estimate that the user is riding a bicycle when the vibration level and the speed are higher than during walking.
- The control unit 160 may estimate that the user is riding in an automobile when the vibration level is lower and the speed is higher than when riding a bicycle.
- The control unit 160 may enlarge or reduce the range in which directivity is formed according to the estimated motion state. For example, when the intensity of motion indicated by the motion state (for example, a numerical value output from each sensor) is relatively large, the control unit 160 may enlarge the directivity range compared with when it is small. Enlarging or reducing the directivity range may be understood as enlarging or reducing the region that shows at least a predetermined sensitivity to incoming sound.
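One way the motion-state logic above could look in code is sketched below; the thresholds, sensor readings, and beam-width values are illustrative assumptions, not values from the patent:

```python
from dataclasses import dataclass


@dataclass
class MotionEstimate:
    state: str             # "still", "walking", "cycling", or "driving"
    beam_width_deg: float  # directivity range to form for this state


def estimate_motion(vibration_rms: float, speed_mps: float) -> MotionEstimate:
    """Coarse motion state from combined vibration and speed sensing results."""
    if speed_mps > 8.0 and vibration_rms < 0.5:
        return MotionEstimate("driving", 40.0)   # lower vibration, higher speed than cycling
    if speed_mps > 3.0 and vibration_rms >= 0.5:
        return MotionEstimate("cycling", 60.0)   # stronger vibration and speed than walking
    if speed_mps > 0.5:
        return MotionEstimate("walking", 45.0)
    return MotionEstimate("still", 30.0)         # narrowest beam when motion is calm
```

Widening the beam during intense motion keeps the mouth inside the directivity range even as the mounting unit shifts against the body.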
- the user's state may refer to the posture of the user such as the orientation and posture of the user's face.
- The control unit 160 may estimate the orientation of the user's face from the image recognition result of a captured image from the imaging unit 120, and control the direction of directivity according to that orientation. In this case, even when the orientation of the face changes and the positional relationship between the user's mouth and the voice acquisition units 110 changes with it, the control unit 160 can control the directivity so that the voice emitted from the user's mouth is still acquired clearly.
- control unit 160 may perform a process according to the result of the voice recognition performed based on the voice data subjected to the beam forming process.
- the voice recognition process may be executed by the control unit 160 or may be executed by another device such as a server on the cloud.
- the control unit 160 may control the operation of the speech processing system 1 based on the result of speech recognition.
- the control unit 160 may control the directivity related to the beam forming process based on the result of speech recognition.
- the user can instruct the voice to direct the directivity in the direction of the voice to be recorded, for example.
- the control unit 160 may start or stop imaging with a camera or record a specific sensing result based on the result of voice recognition.
- the user can instruct by voice to record, for example, the scenery or motion state to be recorded.
- control unit 160 may be realized as a mobile processor, for example. As described above, the control unit 160 may be included in the mounting unit, or may be included in another arbitrary device such as a smartphone or a server on the cloud.
- the speech processing system 1 may have various components.
- the voice processing system 1 may have a battery.
- Because the mounting unit may have a curved shape, as shown in FIGS. 1 to 3, the battery is preferably a curved battery.
- the voice processing system 1 may have a charging connector to which a cable for charging the battery can be connected.
- the charging connector may be a charging communication connector having a function as a communication connector to which a communication cable can be connected.
- the voice processing system 1 may have a vibrator that functions as an output device to the user.
- the voice processing system 1 may have a speaker that functions as an output device to the user.
- the audio processing system 1 may have an earphone connector to which an earphone that functions as an output device for a user can be connected.
- the earphone connector may have a magnetic force, and the earphone connector and the earphone may be detachable by the magnetic force.
- the sound processing system 1 may include a storage unit for storing sound data after the beam forming process by the control unit 160.
- FIG. 12 is a flowchart showing an example of the flow of audio signal processing executed in the audio processing system 1 according to the present embodiment.
- First, in step S102, the voice processing system 1 acquires voice data.
- each of the sound acquisition units 110A, 110B, 110C, and 110D acquires sound data and outputs the sound data to the control unit 160.
- Next, in step S104, the voice processing system 1 acquires information indicating the positional relationship between sound sources and the voice acquisition units 110.
- the sound source may be a noise generation source, a user's mouth that is a generation source of user voice, or a speaker other than the user.
- the control unit 160 acquires information indicating the positional relationship between these sound sources and the sound acquisition unit 110, specifically, the direction viewed from the sound acquisition unit 110.
- As sources of such information, the sensing result from the sensor unit 140, information acquired from another device by the communication unit 150, and the like can be used.
- Next, in step S106, the voice processing system 1 acquires information indicating the user state.
- the control unit 160 acquires information indicating the user's exercise state or the user's posture.
- As sources of such information, the result of voice recognition on the voice acquired by the voice acquisition units 110, the image recognition result of a captured image from the imaging unit 120, information indicating user input acquired by the operation unit 130, the sensing result from the sensor unit 140, information acquired from another device by the communication unit 150, and the like can be used.
- Then, in step S108, the voice processing system 1 performs the beamforming process.
- the control unit 160 performs beam forming processing for forming directivity for acquiring sound from the direction of the mouth of the user, using the plurality of sound data acquired in step S102.
- the control unit 160 may control the beam forming process so as to suppress noise based on the positional relationship between the noise generation source and the voice acquisition unit 110.
- the control unit 160 may perform a beamforming process that emphasizes or suppresses sound from a speaker other than the user based on the position of the speaker other than the user.
- The control unit 160 may also control the direction and/or range in which directivity is formed according to the user state.
- Then, in step S110, the voice processing system 1 performs the speech recognition process.
- the control unit 160 performs a voice recognition process based on the voice data that has undergone the beamforming process.
- The control unit 160 may then control the operation of the voice processing system 1 according to the result of the speech recognition.
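Putting steps S102 to S110 together, the per-frame flow of FIG. 12 could be sketched as below; the helper names (read, locate_sources, estimate_user_state, beamform, recognize, dispatch) are hypothetical stand-ins for the units described above, not interfaces defined by the patent:

```python
def process_frame(mics, sensors, controller):
    """One pass through the FIG. 12 pipeline (hypothetical object interfaces)."""
    signals = [m.read() for m in mics]                            # S102: acquire voice data
    geometry = controller.locate_sources(sensors)                 # S104: source positions
    user_state = controller.estimate_user_state(sensors)          # S106: user state
    steered = controller.beamform(signals, geometry, user_state)  # S108: beamforming
    text = controller.recognize(steered)                          # S110: speech recognition
    controller.dispatch(text)                                     # act on the recognition result
    return steered, text
```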
- As described above, the voice processing system 1 according to the present embodiment has at least three voice acquisition units in the mounting unit. The voice processing system 1 can thereby acquire voice data well suited to the beamforming process that makes the user voice clearer.
- each device described in this specification may be realized using any of software, hardware, and a combination of software and hardware.
- the program constituting the software is stored in advance in a storage medium (non-transitory medium) provided inside or outside each device.
- Each program is read into a RAM when executed by a computer and executed by a processor such as a CPU.
- (1) A voice processing system including a mounting unit worn by a user, wherein the mounting unit includes at least three voice acquisition units that acquire voice data for beamforming.
- (2) The voice processing system according to (1), wherein the mounting unit includes at least four of the voice acquisition units, and the shape formed by connecting the positions at which the four voice acquisition units are provided is a solid.
- (3) The voice processing system according to (1) or (2), wherein, with the mounting unit worn by the user, the distance between a first voice acquisition unit included in the four voice acquisition units and the user's mouth and the distance between a second voice acquisition unit included in the four voice acquisition units and the user's mouth are made different from each other.
- (4) The voice processing system according to (3), wherein, with the mounting unit worn by the user, the first voice acquisition unit is provided at the position closest to the user's mouth compared with the other voice acquisition units, and the second voice acquisition unit is provided on the user's foot side of the first voice acquisition unit in the user's upright posture.
- (8) The voice processing system according to any one of (2) to (7), further including a control unit that performs a beamforming process of forming directivity for acquiring voice from the direction of the user's mouth, using a plurality of voice data acquired by the voice acquisition units.
- (9) The voice processing system according to (8), wherein the beamforming process is a process of changing the degree of emphasis for each region from which sound arrives.
- (10) The voice processing system according to (9), wherein the beamforming process includes a process of suppressing sound arriving from a specific region.
Description
Note that the above effects are not necessarily limiting; together with or instead of the above effects, any of the effects shown in this specification, or other effects that can be grasped from this specification, may be achieved.
1. External configuration
2. Arrangement of the voice acquisition units
2-1. Arrangement policy
2-2. Actual arrangement example
3. Internal configuration
4. Operation processing
5. Conclusion
First, the external configuration of the voice processing system according to an embodiment of the present disclosure will be described with reference to FIGS. 1 to 6.
[2-1. Arrangement policy]
First, the arrangement policy for the voice acquisition units 110 will be described with reference to FIGS. 7 to 10.
Next, an actual arrangement example of the voice acquisition units 110 following the above arrangement policies will be described with reference again to FIGS. 1 to 3. Note that, because of constraints such as the shape of the mounting unit and the weight of each component, the actual arrangement of the voice acquisition units 110 need not follow the above arrangement policies exactly.
FIG. 11 is a block diagram showing an example of the internal configuration of the voice processing system 1 according to the present embodiment. As shown in FIG. 11, the voice processing system 1 includes voice acquisition units 110A to 110D, an imaging unit 120, an operation unit 130, a sensor unit 140, a communication unit 150, and a control unit 160.
The voice acquisition unit 110 has a function of acquiring voice data for beamforming. For example, the voice acquisition unit 110 acquires the user voice uttered by the user wearing the voice processing system 1 (mounting unit), or surrounding sound. For example, the voice acquisition unit 110 is realized by a microphone. The voice acquisition units 110 may be provided in a single mounting unit, in a device separate from the mounting unit, or distributed across a plurality of devices. For example, in addition to the neck-hanging mounting unit shown in FIGS. 1 to 3, voice acquisition units 110 may be provided in a wristband-type mounting unit, a glasses-type mounting unit, and a smartphone.
The imaging unit 120 includes a lens system made up of an imaging lens, a diaphragm, a zoom lens, and a focus lens; a drive system that causes the lens system to perform focus and zoom operations; and a solid-state image sensor array that photoelectrically converts the imaging light obtained by the lens system to generate an imaging signal. The solid-state image sensor array may be realized by, for example, a CCD (Charge Coupled Device) sensor array or a CMOS (Complementary Metal Oxide Semiconductor) sensor array. For example, the imaging unit 120 may be provided so that it can image the area in front of the user while the voice processing system 1 (mounting unit) is worn by the user. In this case, the imaging unit 120 can image, for example, the user's conversation partner. The imaging unit 120 may also be provided so that it can image the user's face while the voice processing system 1 is worn. In this case, the voice processing system 1 can identify the position of the user's mouth from the captured image. The imaging unit 120 outputs captured image data, converted to a digital signal, to the control unit 160.
The operation unit 130 is operated by the user and has a function of receiving input from the user. For example, the operation unit 130 may be realized as a camera button that receives input instructing the imaging unit 120 to capture a still image, or to start or stop capturing a moving image. The operation unit 130 may be realized as a voice input button that receives input instructing the voice acquisition unit 110 to start or stop voice input. The operation unit 130 may be realized as a touch slider that receives touch and slide operations. The operation unit 130 may be realized as a power button that receives an operation instructing the voice processing system 1 to power on or off. The operation unit 130 outputs information indicating the user input to the control unit 160.
The sensor unit 140 has a function of sensing the state of the user wearing the voice processing system 1 or the surrounding state. For example, the sensor unit 140 may include at least one of an acceleration sensor, a speed sensor, a gyro sensor, a geomagnetic sensor, a GPS (Global Positioning System) module, and a vibration sensor. The sensor unit 140 may be provided in a device separate from the mounting unit, or distributed across a plurality of devices. For example, a pulse sensor may be provided in a wristband-type device, and a vibration sensor in a smartphone. The sensor unit 140 outputs information indicating the sensing results to the control unit 160.
The communication unit 150 is a communication module for transmitting and receiving data between the voice processing system 1 and other devices by wire or wirelessly. The communication unit 150 communicates with external devices directly or via a network access point, using a scheme such as wired LAN (Local Area Network), wireless LAN, Wi-Fi (Wireless Fidelity, registered trademark), infrared communication, Bluetooth, or NFC (Near Field Communication).
The control unit 160 functions as an arithmetic processing device and a control device, and controls overall operation in the voice processing system 1 according to various programs. The control unit 160 is realized by an electronic circuit such as a CPU (Central Processing Unit) or a microprocessor. The control unit 160 may include a ROM (Read Only Memory) that stores programs, operation parameters, and the like to be used, and a RAM (Random Access Memory) that temporarily stores parameters that change as appropriate.
In addition, the voice processing system 1 may have various other components. For example, the voice processing system 1 may have a battery. Because the mounting unit may have a curved shape, as shown in FIGS. 1 to 3, the battery is preferably a curved battery. The voice processing system 1 may have a charging connector to which a cable for charging the battery can be connected. The charging connector may be a charging/communication connector that also functions as a communication connector to which a communication cable can be connected. The voice processing system 1 may have a vibrator that functions as an output device to the user. The voice processing system 1 may have a speaker that functions as an output device to the user. The voice processing system 1 may have an earphone connector to which an earphone functioning as an output device to the user can be connected. The earphone connector may be magnetic, and the earphone connector and the earphone may be attachable and detachable by magnetic force. The voice processing system 1 may have a storage unit for storing the voice data after the beamforming process by the control unit 160.
FIG. 12 is a flowchart showing an example of the flow of voice signal processing executed in the voice processing system 1 according to the present embodiment.
An embodiment of the present disclosure has been described in detail above with reference to FIGS. 1 to 12. As described above, the voice processing system 1 according to the present embodiment has at least three voice acquisition units in the mounting unit. The voice processing system 1 can thereby acquire voice data well suited to the beamforming process that makes the user voice clearer.
(1)
A voice processing system including a mounting unit worn by a user,
wherein the mounting unit includes at least three voice acquisition units that acquire voice data for beamforming.
(2)
The voice processing system according to (1), wherein the mounting unit includes at least four of the voice acquisition units, and the shape formed by connecting the positions at which the four voice acquisition units are provided is a solid.
(3)
The voice processing system according to (1) or (2), wherein, with the mounting unit worn by the user, the distance between a first voice acquisition unit included in the four voice acquisition units and the user's mouth and the distance between a second voice acquisition unit included in the four voice acquisition units and the user's mouth are made different from each other.
(4)
The voice processing system according to (3), wherein, with the mounting unit worn by the user, the first voice acquisition unit is provided at the position closest to the user's mouth compared with the other voice acquisition units, and the second voice acquisition unit is provided on the user's foot side of the first voice acquisition unit in the user's upright posture.
(5)
The voice processing system according to (3) or (4), wherein the first voice acquisition unit and the second voice acquisition unit are provided on the foot side of the user's mouth in the user's upright posture.
(6)
The voice processing system according to any one of (2) to (5), wherein the voice acquisition units are microphones having sensitivity in all directions.
(7)
The voice processing system according to (6), wherein the voice acquisition units are microphones whose sensitivity is uniform or substantially uniform in all directions.
(8)
The voice processing system according to any one of (2) to (7), further including a control unit that performs a beamforming process of forming directivity for acquiring voice from the direction of the user's mouth, using a plurality of voice data acquired by the voice acquisition units.
(9)
The voice processing system according to (8), wherein the beamforming process is a process of changing the degree of emphasis for each region from which sound arrives.
(10)
The voice processing system according to (9), wherein the beamforming process includes a process of suppressing sound arriving from a specific region.
(11)
The voice processing system according to any one of (8) to (10), wherein the control unit controls the beamforming process based on a positional relationship between a noise source and the voice acquisition units.
(12)
The voice processing system according to any one of (8) to (11), wherein the control unit controls the beamforming process based on a position of a speaker other than the user.
(13)
The voice processing system according to any one of (8) to (12), wherein the control unit controls the beamforming process based on information indicating a state of the user.
(14)
The voice processing system according to any one of (8) to (13), wherein the control unit performs processing according to a result of speech recognition performed on the voice data subjected to the beamforming process.
(15)
The voice processing system according to (14), wherein the control unit controls operation of the voice processing system based on the result of the speech recognition.
(16)
The voice processing system according to (15), wherein the control unit controls the directivity based on the result of the speech recognition.
(17)
The voice processing system according to any one of (8) to (16), wherein the mounting unit includes the control unit.
(18)
The voice processing system according to any one of (2) to (17), wherein the mounting unit is worn around the user's neck.
110 Voice acquisition unit
120 Imaging unit
130 Operation unit
140 Sensor unit
150 Communication unit
160 Control unit
Claims (18)
- A voice processing system comprising a mounting unit worn by a user, wherein the mounting unit includes at least three voice acquisition units that acquire voice data for beamforming.
- The voice processing system according to claim 1, wherein the mounting unit includes at least four of the voice acquisition units, and the shape formed by connecting the positions at which the four voice acquisition units are provided is a solid.
- The voice processing system according to claim 1, wherein, with the mounting unit worn by the user, the distance between a first voice acquisition unit included in the four voice acquisition units and the user's mouth and the distance between a second voice acquisition unit included in the four voice acquisition units and the user's mouth are made different from each other.
- The voice processing system according to claim 3, wherein, with the mounting unit worn by the user, the first voice acquisition unit is provided at the position closest to the user's mouth compared with the other voice acquisition units, and the second voice acquisition unit is provided on the user's foot side of the first voice acquisition unit in the user's upright posture.
- The voice processing system according to claim 3, wherein the first voice acquisition unit and the second voice acquisition unit are provided on the foot side of the user's mouth in the user's upright posture.
- The voice processing system according to claim 2, wherein the voice acquisition units are microphones having sensitivity in all directions.
- The voice processing system according to claim 6, wherein the voice acquisition units are microphones whose sensitivity is uniform or substantially uniform in all directions.
- The voice processing system according to claim 2, further comprising a control unit that performs a beamforming process of forming directivity for acquiring voice from the direction of the user's mouth, using a plurality of voice data acquired by the voice acquisition units.
- The voice processing system according to claim 8, wherein the beamforming process is a process of changing the degree of emphasis for each region from which sound arrives.
- The voice processing system according to claim 9, wherein the beamforming process includes a process of suppressing sound arriving from a specific region.
- The voice processing system according to claim 8, wherein the control unit controls the beamforming process based on a positional relationship between a noise source and the voice acquisition units.
- The voice processing system according to claim 8, wherein the control unit controls the beamforming process based on a position of a speaker other than the user.
- The voice processing system according to claim 8, wherein the control unit controls the beamforming process based on information indicating a state of the user.
- The voice processing system according to claim 8, wherein the control unit performs processing according to a result of speech recognition performed on the voice data subjected to the beamforming process.
- The voice processing system according to claim 14, wherein the control unit controls operation of the voice processing system based on the result of the speech recognition.
- The voice processing system according to claim 15, wherein the control unit controls the directivity based on the result of the speech recognition.
- The voice processing system according to claim 8, wherein the mounting unit includes the control unit.
- The voice processing system according to claim 2, wherein the mounting unit is worn around the user's neck.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016555104A JP6503559B2 (ja) | 2014-10-20 | 2015-07-13 | Voice processing system |
EP15852448.8A EP3211918B1 (en) | 2014-10-20 | 2015-07-13 | Voice processing system |
EP18186728.4A EP3413583A1 (en) | 2014-10-20 | 2015-07-13 | Voice processing system |
US15/504,063 US10306359B2 (en) | 2014-10-20 | 2015-07-13 | Voice processing system |
US16/012,473 US10674258B2 (en) | 2014-10-20 | 2018-06-19 | Voice processing system |
US16/816,479 US11172292B2 (en) | 2014-10-20 | 2020-03-12 | Voice processing system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-213496 | 2014-10-20 | ||
JP2014213496 | 2014-10-20 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/504,063 A-371-Of-International US10306359B2 (en) | 2014-10-20 | 2015-07-13 | Voice processing system |
US16/012,473 Continuation US10674258B2 (en) | 2014-10-20 | 2018-06-19 | Voice processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016063587A1 true WO2016063587A1 (ja) | 2016-04-28 |
Family
ID=55760637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/070040 WO2016063587A1 (ja) | Voice processing system | 2014-10-20 | 2015-07-13 |
Country Status (5)
Country | Link |
---|---|
US (3) | US10306359B2 (ja) |
EP (2) | EP3211918B1 (ja) |
JP (2) | JP6503559B2 (ja) |
CN (3) | CN205508399U (ja) |
WO (1) | WO2016063587A1 (ja) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018007256A (ja) * | 2016-07-04 | 2018-01-11 | EM-TECH Co., Ltd. | Voice amplification device with an audio focusing function |
WO2018051663A1 (ja) * | 2016-09-13 | 2018-03-22 | Sony Corporation | Sound source position estimation device and wearable device |
WO2018116678A1 (ja) | 2016-12-22 | 2018-06-28 | Sony Corporation | Information processing device and control method therefor |
WO2018216339A1 (ja) | 2017-05-23 | 2018-11-29 | Sony Corporation | Information processing device, control method therefor, and recording medium |
JPWO2018207453A1 (ja) * | 2017-05-08 | 2020-03-12 | Sony Corporation | Information processing device |
JP2021082904A (ja) * | 2019-11-15 | 2021-05-27 | Fairy Devices Inc. | Neck-hanging device |
JP2021083079A (ja) * | 2019-11-20 | 2021-05-27 | Daikin Industries, Ltd. | Remote work support system |
JP2021527853A (ja) * | 2018-06-21 | 2021-10-14 | Magic Leap, Inc. | Wearable system speech processing |
JP2022533391A (ja) * | 2019-05-22 | 2022-07-22 | Solos Technology Limited | Microphone placement for eyeglass devices, systems, apparatus, and methods |
US11642431B2 (en) | 2017-12-08 | 2023-05-09 | Sony Corporation | Information processing apparatus, control method of the same, and recording medium |
US11790935B2 (en) | 2019-08-07 | 2023-10-17 | Magic Leap, Inc. | Voice onset detection |
US11854550B2 (en) | 2019-03-01 | 2023-12-26 | Magic Leap, Inc. | Determining input for speech processing engine |
US11917384B2 (en) | 2020-03-27 | 2024-02-27 | Magic Leap, Inc. | Method of waking a device using spoken voice commands |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017027397A2 (en) * | 2015-08-07 | 2017-02-16 | Cirrus Logic International Semiconductor, Ltd. | Event detection for playback management in an audio device |
WO2017060828A1 (en) | 2015-10-08 | 2017-04-13 | Cordio Medical Ltd. | Assessment of a pulmonary condition by speech analysis |
JP1589838S (ja) * | 2017-01-20 | 2017-11-06 | ||
JP6766086B2 (ja) | 2017-09-28 | 2020-10-07 | Canon Inc. | Imaging device and control method therefor |
CN108235165B (zh) * | 2017-12-13 | 2020-09-15 | 安克创新科技股份有限公司 | 一种麦克风颈环耳机 |
CN108235164B (zh) * | 2017-12-13 | 2020-09-15 | 安克创新科技股份有限公司 | 一种麦克风颈环耳机 |
JP2019186630A (ja) * | 2018-04-03 | 2019-10-24 | Canon Inc. | Imaging device, control method therefor, and program |
US11172293B2 (en) * | 2018-07-11 | 2021-11-09 | Ambiq Micro, Inc. | Power efficient context-based audio processing |
US10847177B2 (en) * | 2018-10-11 | 2020-11-24 | Cordio Medical Ltd. | Estimating lung volume by speech analysis |
US11024327B2 (en) | 2019-03-12 | 2021-06-01 | Cordio Medical Ltd. | Diagnostic techniques based on speech models |
US11011188B2 (en) | 2019-03-12 | 2021-05-18 | Cordio Medical Ltd. | Diagnostic techniques based on speech-sample alignment |
JP7408414B2 (ja) * | 2020-01-27 | 2024-01-05 | Sharp Corporation | Wearable microphone speaker |
US11484211B2 (en) | 2020-03-03 | 2022-11-01 | Cordio Medical Ltd. | Diagnosis of medical conditions using voice recordings and auscultation |
US11417342B2 (en) | 2020-06-29 | 2022-08-16 | Cordio Medical Ltd. | Synthesizing patient-specific speech models |
JP6786139B1 (ja) | 2020-07-06 | 2020-11-18 | Fairy Devices Inc. | Voice input device |
JP2022119582A (ja) * | 2021-02-04 | 2022-08-17 | Hitachi-LG Data Storage, Inc. | Voice acquisition device and voice acquisition method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001095098A (ja) * | 1999-09-27 | 2001-04-06 | Harada Denshi Kogyo Kk | Body-sensation hearing aid |
JP2009017083A (ja) * | 2007-07-03 | 2009-01-22 | Data Bank Commerce:Kk | Noise cancellation device |
JP2009514312A (ja) * | 2005-11-01 | 2009-04-02 | Koninklijke Philips Electronics N.V. | Hearing aid comprising sound tracking means |
JP2012133250A (ja) * | 2010-12-24 | 2012-07-12 | Sony Corp | Sound information display device, sound information display method, and program |
JP2012524505A (ja) * | 2010-02-18 | 2012-10-11 | Qualcomm Incorporated | Microphone array subset selection for robust noise reduction |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0631835Y2 (ja) | 1988-06-27 | 1994-08-22 | Audio-Technica Corporation | Throat microphone holding device |
US5793875A (en) * | 1996-04-22 | 1998-08-11 | Cardinal Sound Labs, Inc. | Directional hearing system |
US6091832A (en) * | 1996-08-12 | 2000-07-18 | Interval Research Corporation | Wearable personal audio loop apparatus |
US7099821B2 (en) * | 2003-09-12 | 2006-08-29 | Softmax, Inc. | Separation of target acoustic signals in a multi-transducer arrangement |
JP2005303574A (ja) | 2004-04-09 | 2005-10-27 | Toshiba Corporation | Speech recognition headset |
US7522738B2 (en) * | 2005-11-30 | 2009-04-21 | Otologics, Llc | Dual feedback control system for implantable hearing instrument |
CN101390440B (zh) * | 2006-02-27 | 2012-10-10 | Panasonic Corporation | Wearable terminal, and processor and method for controlling the wearable terminal |
SE530137C2 (sv) | 2006-06-27 | 2008-03-11 | Bo Franzen | Headset with a throat microphone combined with an ear speaker that seals against the ear canal |
JP5401760B2 (ja) * | 2007-02-05 | 2014-01-29 | Sony Corporation | Headphone device, audio reproduction system, and audio reproduction method |
US10226234B2 (en) * | 2011-12-01 | 2019-03-12 | Maui Imaging, Inc. | Motion detection using ping-based and multiple aperture doppler ultrasound |
JP2010187218A (ja) * | 2009-02-12 | 2010-08-26 | Sony Corp | Control device, control method, and control program |
BR112013012539B1 (pt) | 2010-11-24 | 2021-05-18 | Koninklijke Philips N.V. | Method for operating a device, and device |
WO2012083989A1 (en) * | 2010-12-22 | 2012-06-28 | Sony Ericsson Mobile Communications Ab | Method of controlling audio recording and electronic device |
US9711127B2 (en) * | 2011-09-19 | 2017-07-18 | Bitwave Pte Ltd. | Multi-sensor signal optimization for speech communication |
CN102393520B (zh) * | 2011-09-26 | 2013-07-31 | Harbin Engineering University | Sonar moving-target imaging method based on the Doppler characteristics of target echoes |
JP5772448B2 (ja) * | 2011-09-27 | 2015-09-02 | Fuji Xerox Co., Ltd. | Voice analysis system and voice analysis device |
CN103024629B (zh) * | 2011-09-30 | 2017-04-12 | Skype | Processing signals |
JP6031761B2 (ja) | 2011-12-28 | 2016-11-24 | Fuji Xerox Co., Ltd. | Voice analysis device and voice analysis system |
JP2013191996A (ja) | 2012-03-13 | 2013-09-26 | Seiko Epson Corp | Acoustic device |
EP2736272A1 (en) * | 2012-11-22 | 2014-05-28 | ETH Zurich | Wearable microphone array apparatus |
CN202998463U (zh) * | 2012-12-11 | 2013-06-12 | 启通科技有限公司 | Neck-hanging hearing aid |
RU2520184C1 (ru) * | 2012-12-28 | 2014-06-20 | Aleksei Leonidovich Ushakov | Headset for a mobile electronic device |
CN103135092A (zh) * | 2013-02-05 | 2013-06-05 | Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences | Micro-aperture acoustic array method for direction finding of moving targets |
US9525938B2 (en) * | 2013-02-06 | 2016-12-20 | Apple Inc. | User voice location estimation for adjusting portable device beamforming settings |
US10229697B2 (en) * | 2013-03-12 | 2019-03-12 | Google Technology Holdings LLC | Apparatus and method for beamforming to obtain voice and noise signals |
JP6375362B2 (ja) * | 2013-03-13 | 2018-08-15 | コピン コーポレーション | 雑音キャンセリングマイクロホン装置 |
US9343053B2 (en) * | 2013-05-13 | 2016-05-17 | Sound In Motion | Adding audio sound effects to movies |
EP2840807A1 (en) * | 2013-08-19 | 2015-02-25 | Oticon A/s | External microphone array and hearing aid using it |
-
2015
- 2015-07-13 JP JP2016555104A patent/JP6503559B2/ja active Active
- 2015-07-13 EP EP15852448.8A patent/EP3211918B1/en active Active
- 2015-07-13 EP EP18186728.4A patent/EP3413583A1/en active Pending
- 2015-07-13 WO PCT/JP2015/070040 patent/WO2016063587A1/ja active Application Filing
- 2015-07-13 US US15/504,063 patent/US10306359B2/en active Active
- 2015-09-23 CN CN201520742860.2U patent/CN205508399U/zh active Active
- 2015-09-23 CN CN201510612564.5A patent/CN105529033B/zh active Active
- 2015-09-23 CN CN201810681854.9A patent/CN108683972B/zh active Active
-
2018
- 2018-06-19 US US16/012,473 patent/US10674258B2/en active Active
-
2019
- 2019-03-04 JP JP2019038212A patent/JP6747538B2/ja active Active
-
2020
- 2020-03-12 US US16/816,479 patent/US11172292B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001095098A (ja) * | 1999-09-27 | 2001-04-06 | Harada Denshi Kogyo Kk | Body-sensation hearing aid |
JP2009514312A (ja) * | 2005-11-01 | 2009-04-02 | Koninklijke Philips Electronics N.V. | Hearing aid comprising sound tracking means |
JP2009017083A (ja) * | 2007-07-03 | 2009-01-22 | Data Bank Commerce:Kk | Noise cancellation device |
JP2012524505A (ja) * | 2010-02-18 | 2012-10-11 | Qualcomm Incorporated | Microphone array subset selection for robust noise reduction |
JP2012133250A (ja) * | 2010-12-24 | 2012-07-12 | Sony Corp | Sound information display device, sound information display method, and program |
Non-Patent Citations (1)
Title |
---|
See also references of EP3211918A4 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018007256A (ja) * | 2016-07-04 | 2018-01-11 | EM-TECH Co., Ltd. | Voice amplification device with an audio focusing function |
WO2018051663A1 (ja) * | 2016-09-13 | 2018-03-22 | Sony Corporation | Sound source position estimation device and wearable device |
US11402461B2 (en) | 2016-09-13 | 2022-08-02 | Sony Corporation | Sound source position estimation device and wearable device |
JPWO2018051663A1 (ja) * | 2016-09-13 | 2019-06-24 | Sony Corporation | Sound source position estimation device and wearable device |
WO2018116678A1 (ja) | 2016-12-22 | 2018-06-28 | Sony Corporation | Information processing device and control method therefor |
JP7103353B2 (ja) | 2017-05-08 | 2022-07-20 | Sony Group Corporation | Information processing device |
JPWO2018207453A1 (ja) * | 2017-05-08 | 2020-03-12 | Sony Corporation | Information processing device |
US11468884B2 (en) | 2017-05-08 | 2022-10-11 | Sony Corporation | Method, apparatus and computer program for detecting voice uttered from a particular position |
US10869124B2 (en) | 2017-05-23 | 2020-12-15 | Sony Corporation | Information processing apparatus, control method, and recording medium |
WO2018216339A1 (ja) | 2017-05-23 | 2018-11-29 | Sony Corporation | Information processing device, control method therefor, and recording medium |
US11642431B2 (en) | 2017-12-08 | 2023-05-09 | Sony Corporation | Information processing apparatus, control method of the same, and recording medium |
JP2021527853A (ja) * | 2018-06-21 | 2021-10-14 | Magic Leap, Inc. | Wearable system speech processing |
US11854566B2 (en) | 2018-06-21 | 2023-12-26 | Magic Leap, Inc. | Wearable system speech processing |
JP7419270B2 (ja) | 2018-06-21 | 2024-01-22 | Magic Leap, Inc. | Wearable system speech processing |
US11854550B2 (en) | 2019-03-01 | 2023-12-26 | Magic Leap, Inc. | Determining input for speech processing engine |
JP2022533391A (ja) * | 2019-05-22 | 2022-07-22 | Solos Technology Limited | Microphone placement for eyeglass devices, systems, apparatus, and methods |
JP7350092B2 (ja) | 2019-05-22 | 2023-09-25 | Solos Technology Limited | Microphone placement for eyeglass devices, systems, apparatus, and methods |
US11790935B2 (en) | 2019-08-07 | 2023-10-17 | Magic Leap, Inc. | Voice onset detection |
US12094489B2 (en) | 2019-08-07 | 2024-09-17 | Magic Leap, Inc. | Voice onset detection |
JP2021082904A (ja) * | 2019-11-15 | 2021-05-27 | Fairy Devices Inc. | Neck-hanging device |
JP2021083079A (ja) * | 2019-11-20 | 2021-05-27 | Daikin Industries, Ltd. | Remote work support system |
JP7270154B2 (ja) | 2019-11-20 | 2023-05-10 | Daikin Industries, Ltd. | Remote work support system |
US11917384B2 (en) | 2020-03-27 | 2024-02-27 | Magic Leap, Inc. | Method of waking a device using spoken voice commands |
Also Published As
Publication number | Publication date |
---|---|
CN108683972B (zh) | 2022-08-09 |
EP3211918A1 (en) | 2017-08-30 |
JP6747538B2 (ja) | 2020-08-26 |
US20170280239A1 (en) | 2017-09-28 |
US11172292B2 (en) | 2021-11-09 |
US10306359B2 (en) | 2019-05-28 |
EP3211918A4 (en) | 2018-05-09 |
CN205508399U (zh) | 2016-08-24 |
CN105529033B (zh) | 2020-11-10 |
CN108683972A (zh) | 2018-10-19 |
CN105529033A (zh) | 2016-04-27 |
US20200213730A1 (en) | 2020-07-02 |
JP6503559B2 (ja) | 2019-04-24 |
US10674258B2 (en) | 2020-06-02 |
JP2019134441A (ja) | 2019-08-08 |
JPWO2016063587A1 (ja) | 2017-08-03 |
US20180317005A1 (en) | 2018-11-01 |
EP3413583A1 (en) | 2018-12-12 |
EP3211918B1 (en) | 2021-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6747538B2 (ja) | 情報処理装置 | |
KR102378762B1 (ko) | 지향성의 사운드 변형 | |
US10257637B2 (en) | Shoulder-mounted robotic speakers | |
US20160249141A1 (en) | System and method for improving hearing | |
JPWO2019053993A1 (ja) | 音響処理装置及び音響処理方法 | |
JP2019054337A (ja) | イヤホン装置、ヘッドホン装置及び方法 | |
CN113393856A (zh) | 拾音方法、装置和电子设备 | |
US11785389B2 (en) | Dynamic adjustment of earbud performance characteristics | |
JP2023514462A (ja) | 眼鏡フレーム内に一体化可能な補聴システム | |
JP7065353B2 (ja) | ヘッドマウントディスプレイ及びその制御方法 | |
KR20230112688A (ko) | 마이크로폰 빔 스티어링이 있는 머리-착용형 컴퓨팅 장치 | |
US20230036986A1 (en) | Processing of audio signals from multiple microphones | |
TW202314478A (zh) | 音訊事件資料處理 | |
TW202314684A (zh) | 對來自多個麥克風的音訊信號的處理 | |
US11689878B2 (en) | Audio adjustment based on user electrical signals | |
CN118020314A (zh) | 音频事件数据处理 | |
CN118020313A (zh) | 处理来自多个麦克风的音频信号 | |
CN117499837A (zh) | 音频处理方法、装置以及音频播放设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15852448 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016555104 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15504063 Country of ref document: US |
|
REEP | Request for entry into the european phase |
Ref document number: 2015852448 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2015852448 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |