JPWO2021230180A5

Info

Publication number: JPWO2021230180A5
Application number: JP2022521892A
Authority: JP
Filing date: 2021-05-10
Publication date: 2024-05-21

Claims

1. An information processing device comprising:
means for acquiring sounds collected by a plurality of microphones;
means for estimating a direction of arrival of the acquired sound;
means for extracting a sound corresponding to the estimated direction of arrival by beamforming processing based on the estimated direction of arrival;
means for generating a text image corresponding to the extracted sound;
means for determining a presentation manner of the text image by referring to the estimated direction of arrival; and
means for presenting the text image in the determined presentation manner.
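
The estimation and extraction steps recited above can be illustrated with a minimal sketch. It assumes only two microphones, a brute-force cross-correlation to find the inter-microphone delay, and delay-and-sum as the beamforming process; the claims name none of these choices. The function names, the speed-of-sound constant, and the sign convention (a positive angle means the sound reaches the first microphone first) are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def estimate_delay(a, b, max_lag):
    """Return the lag in samples by which channel b trails channel a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    n = len(a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing):
    """Convert a delay in samples to an arrival angle in degrees.

    0 degrees is broadside (perpendicular to the microphone axis).
    """
    s = (lag / sample_rate) * SPEED_OF_SOUND / mic_spacing
    s = max(-1.0, min(1.0, s))  # guard against values just outside asin's domain
    return math.degrees(math.asin(s))


def delay_and_sum(a, b, lag):
    """Undo the lag on channel b, then average the two channels.

    Where the shifted channel b has no sample, only channel a contributes.
    """
    n = len(a)
    out = []
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            out.append(0.5 * (a[i] + b[j]))
        else:
            out.append(0.5 * a[i])
    return out
```

Sound from the estimated direction adds coherently after alignment, while sound from other directions is attenuated by the averaging. A practical device would likely use more microphones and a frequency-domain method.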

2. The information processing device according to claim 1, wherein, when the means for estimating the direction of arrival estimates a plurality of directions of arrival, the means for extracting the sound extracts the sound corresponding to each of the plurality of directions of arrival by beamforming processing based on the plurality of directions of arrival.

3. The information processing device according to claim 1, wherein the means for determining the presentation manner determines the presentation manner such that the text image is presented at a position corresponding to the estimated direction of arrival.

4. The information processing device according to claim 3, wherein the position corresponding to the direction of arrival is a position that corresponds to the angle in the left-right direction between a predetermined axial direction and the direction of arrival, and that lies in a predetermined elevation angle direction.
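
One way to read the position rule above is as a mapping from the left-right angle to a horizontal display coordinate, with the vertical coordinate fixed by the predetermined elevation. The following minimal sketch assumes a linear mapping over a display field of view; the function name, the 90-degree field of view, and the row fraction are hypothetical parameters not taken from the claims.

```python
def text_position(azimuth_deg, screen_width, screen_height,
                  horizontal_fov_deg=90.0, elevation_row_fraction=0.25):
    """Map a left-right arrival angle to a pixel position (x, y).

    azimuth_deg: angle between the reference axis and the direction of
        arrival; negative is assumed to mean "to the left".
    elevation_row_fraction: fixed vertical placement standing in for the
        predetermined elevation angle (0.0 = top row, 1.0 = bottom row).
    """
    half = horizontal_fov_deg / 2.0
    # Directions outside the field of view are pinned to the nearest edge.
    clamped = max(-half, min(half, azimuth_deg))
    x = round((clamped + half) / horizontal_fov_deg * (screen_width - 1))
    y = round(elevation_row_fraction * (screen_height - 1))
    return x, y
```

Because the vertical coordinate does not depend on the angle, every text image lands on the same row and only its horizontal placement tells the viewer where the sound came from.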

5. The information processing device according to claim 1, wherein the means for determining the presentation manner determines the presentation manner so as to present the text image in a format including at least one of a character string and a symbol corresponding to the estimated direction of arrival.
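
As an illustration of a format that includes both a symbol and a character string for the direction, the sketch below prefixes the recognized text with an ASCII arrow and a direction name. The three-way split, the 15-degree threshold, and the function names are assumptions made for the example only.

```python
def direction_label(azimuth_deg, threshold_deg=15.0):
    """Return (symbol, name) for a left-right arrival angle.

    Negative angles are assumed to mean "to the left" of the reference axis.
    """
    if azimuth_deg < -threshold_deg:
        return "<-", "left"
    if azimuth_deg > threshold_deg:
        return "->", "right"
    return "^", "front"


def format_caption(text, azimuth_deg):
    """Build the caption string that would be rendered as the text image."""
    symbol, name = direction_label(azimuth_deg)
    return f"{symbol} [{name}] {text}"
```

For example, `format_caption("Hello", -40.0)` produces a caption marked as coming from the left.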

6. The information processing device according to claim 1, further comprising:
means for estimating a speaker attribute by analyzing the acquired sound,
wherein the means for determining the presentation manner determines the presentation manner of the text image by referring to the estimated speaker attribute.

7. The information processing device according to claim 1, further comprising:
means for acquiring, by a sensor, a sensing signal relating to an area where sound is collected by the plurality of microphones,
wherein the means for determining the presentation manner determines the presentation manner of the text image by referring to the acquired sensing signal.

8. The information processing device according to claim 7, wherein the sensing signal is an imaging signal obtained by imaging the area using an image sensor.

9. The information processing device according to claim 1, further comprising:
means for acquiring an imaging signal obtained by imaging the area; and
means for converting the acquired imaging signal into a captured image,
wherein the means for presenting the text image presents the text image by superimposing it on the captured image.

10. The information processing device according to claim 8, further comprising:
means for estimating a speaker attribute by analyzing the imaging signal,
wherein the means for determining the presentation manner determines the presentation manner of the text image by referring to the estimated speaker attribute.

11. The information processing device according to claim 1, further comprising:
means for extracting, from the acquired sound, a speech sound uttered by a person,
wherein the means for estimating the direction of arrival estimates the direction of arrival of the extracted speech sound, and
the means for generating the text image generates a text image corresponding to the extracted speech sound.

12. A display device comprising:
means for acquiring sounds collected by a plurality of microphones;
means for estimating a direction of arrival of the acquired sound;
means for extracting a sound corresponding to the estimated direction of arrival by beamforming processing based on the estimated direction of arrival;
means for generating a text image corresponding to the extracted sound;
means for determining a presentation manner of the text image by referring to the estimated direction of arrival; and
a display that presents the text image in the determined presentation manner.

13. The display device according to claim 12, wherein the display device is at least one of a glasses-type display device, a mobile terminal, and a conference system.

14. The display device according to claim 12 or 13, wherein the display device is a retinal projection display device.

15. The display device according to any one of claims 12 to 14, further comprising:
means for communicating with a microphone module disposed separately from the display device,
wherein the means for acquiring the sounds acquires, via the means for communicating, the sounds collected by the plurality of microphones included in the microphone module.

16. A program for causing a computer to implement each of the means according to any one of claims 1 to 15.

17. A method for presenting an image corresponding to a sound, the method comprising:
acquiring sounds collected by a plurality of microphones;
estimating a direction of arrival of the acquired sound;
extracting a sound corresponding to the estimated direction of arrival by beamforming processing based on the estimated direction of arrival;
generating a text image corresponding to the extracted sound;
determining a presentation manner of the text image by referring to the estimated direction of arrival; and
presenting the text image in the determined presentation manner.