JP2017097330A5

Info

Publication number: JP2017097330A5
Application number: JP2016151383A
Authority: JP
Filing date: 2016-08-01
Publication date: 2017-07-13
Anticipated expiration: 2036-08-01

Claims

A speech recognition method comprising:
A reference value determining step for determining a reference value for determining the length of the first silent section included in the processing section;
A processing mode determination step for determining a processing mode to be used according to the reference value from a plurality of processing modes of voice processing having different processing amounts from each other;
An end-of-speech determination step of acquiring, using the reference value, voice information of the processing section, which includes a target section and the first silent section following the target section, from voice information of an input section that includes the processing section;
A voice processing step of performing voice processing in the determined processing mode on the voice information of the target section among the voice information of the processing section;
A voice recognition step of performing voice recognition processing on the voice information of the target section on which the voice processing has been performed,
In the reference value determining step, a threshold value indicating the length of the first silent section is determined as the reference value, the threshold value being information for determining the end of the processing section,
In the processing mode determination step, the processing mode is determined based on the threshold value,
The speech recognition method further includes:
A detection step of detecting a silent section from the voice information of the input section,
In the end-of-speech determination step, the voice information of the processing section is extracted from the voice information of the input section by determining that the time when the length of the silent section exceeds the threshold value is the end of the processing section,
The voice processing is noise suppression processing of the voice information,
In the processing mode determination step, a noise suppression algorithm or a noise suppression parameter is determined as the processing mode.
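
The end-of-speech determination and the threshold-based choice of a noise suppression mode recited in this claim can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the 10 ms frame length, the energy floor used to call a frame silent, the mode names, the cut-off values, and the idea that a longer threshold permits a heavier mode. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

FRAME_MS = 10  # assumed frame length in milliseconds


def choose_noise_suppression_mode(threshold_ms: int) -> str:
    """Pick a noise suppression mode from the silence threshold.

    Assumption: a longer threshold leaves more time for processing,
    so a heavier mode is selected. The cut-offs are illustrative.
    """
    if threshold_ms >= 800:
        return "heavy"
    if threshold_ms >= 400:
        return "medium"
    return "light"


def find_processing_section(
    energies: List[float], threshold_ms: int, energy_floor: float = 0.01
) -> Optional[Tuple[int, int, int]]:
    """Return (start, target_end, section_end) as frame indices.

    Frames start..target_end-1 form the target section; frames
    target_end..section_end-1 form the first silent section. The section
    ends when a silent run becomes longer than the threshold. Returns None
    if no end of speech is found in the input section.
    """
    threshold_frames = threshold_ms // FRAME_MS
    start = None
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= energy_floor:
            if start is None:
                start = i  # first non-silent frame opens the target section
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > threshold_frames:
                target_end = i - silent_run + 1
                return start, target_end, i + 1
    return None


if __name__ == "__main__":
    # 2 silent frames, 5 speech, 3-frame pause, 2 speech, long silence
    energies = [0.0] * 2 + [0.5] * 5 + [0.0] * 3 + [0.5] * 2 + [0.0] * 10
    print(find_processing_section(energies, threshold_ms=40))
    print(choose_noise_suppression_mode(40))
```

In this sketch a pause shorter than the threshold stays inside the target section, and only a silent run longer than the threshold closes the processing section.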

A speech recognition method comprising:
A reference value determining step for determining a reference value for determining the length of the first silent section included in the processing section;
A processing mode determination step for determining a processing mode to be used according to the reference value from a plurality of processing modes of voice processing having different processing amounts from each other;
An end-of-speech determination step of acquiring, using the reference value, voice information of the processing section, which includes a target section and the first silent section following the target section, from voice information of an input section that includes the processing section;
A voice processing step of performing voice processing in the determined processing mode on the voice information of the target section among the voice information of the processing section;
A voice recognition step of performing voice recognition processing on voice information of the target section on which the voice processing has been executed,
In the reference value determining step, a threshold value indicating the length of the first silent section is determined as the reference value, the threshold value being information for determining the end of the processing section,
In the processing mode determination step, the processing mode is determined based on the threshold value,
The speech recognition method further includes:
A detection step of detecting a silent section from the voice information of the input section,
In the end-of-speech determination step, the voice information of the processing section is extracted from the voice information of the input section by determining that the time when the length of the silent section exceeds the threshold value is the end of the processing section,
The voice processing is encoding processing of the voice information,
In the processing mode determination step, an encoding algorithm or an encoding parameter is determined as the processing mode,
The speech recognition method further includes:
A transmission step of transmitting the voice information encoded by the voice processing to a voice recognition device; and
A decoding step of decoding the transmitted voice information in the voice recognition device,
In the voice recognition step, the voice recognition processing is executed on the decoded voice information by the voice recognition device.
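
The encode, transmit, decode, and recognize flow of this claim can be sketched as follows. This is a rough illustration under stated assumptions: Python's `zlib` compression levels stand in for encoding parameters with different processing amounts (a real system would presumably use a speech codec), the threshold cut-offs are invented, and `recognize` is a hypothetical placeholder rather than a real recognizer.

```python
import zlib


def choose_compression_level(threshold_ms: int) -> int:
    """Map the silence threshold to an encoding parameter.

    Assumption: a longer threshold allows a more expensive setting.
    The cut-offs are illustrative only.
    """
    if threshold_ms >= 800:
        return 9  # slowest, smallest output
    if threshold_ms >= 400:
        return 6
    return 1  # fastest


def encode(pcm: bytes, threshold_ms: int) -> bytes:
    """Client side: encode the target section before transmission."""
    return zlib.compress(pcm, choose_compression_level(threshold_ms))


def decode(payload: bytes) -> bytes:
    """Recognition-device side: decode the transmitted voice information."""
    return zlib.decompress(payload)


def recognize(pcm: bytes) -> str:
    """Hypothetical placeholder for the recognition step."""
    return f"<recognized {len(pcm)} bytes of audio>"


if __name__ == "__main__":
    pcm = bytes(range(256)) * 4          # stand-in for target-section audio
    payload = encode(pcm, threshold_ms=500)
    print(recognize(decode(payload)))
```

Because the stand-in encoder is lossless, the decoded bytes equal the input; with a lossy speech codec only the recognition result, not the byte content, would be expected to survive the round trip.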

3. The speech recognition method according to claim 1, wherein in the speech processing step, a silent section included in the target section is removed, and the speech processing is executed on the speech information of the target section from which the silent section is removed.
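
Removing silent sections from the target section before the voice processing might look like the sketch below. It assumes the target section is available as a list of per-frame energies and that a frame counts as silent when its energy falls below a hypothetical `energy_floor`; neither detail comes from the claim.

```python
from typing import List


def remove_internal_silence(
    energies: List[float], energy_floor: float = 0.01
) -> List[float]:
    """Drop frames judged silent so later processing sees only speech frames."""
    return [e for e in energies if e >= energy_floor]


if __name__ == "__main__":
    target_section = [0.5, 0.0, 0.0, 0.4, 0.0, 0.3]
    print(remove_internal_silence(target_section))
```

Dropping these frames shortens the data the voice processing has to handle, which is one plausible reason for the step.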

The speech recognition method according to any one of claims 1 to 3, further comprising:
A processing time measuring step of measuring a processing time of the voice processing in the determined processing mode; and
A processing mode changing step of changing the processing mode of the voice processing based on the measured processing time.
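
The measuring and changing steps can be sketched as a small feedback rule. The mode names, the time budget, and the rule itself (step down when over budget, step up when under half the budget) are assumptions for illustration; the claim only says the mode is changed based on the measured time.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

MODES = ["light", "medium", "heavy"]  # assumed order of increasing processing amount


def timed_call(func: Callable[[], T]) -> Tuple[T, float]:
    """Run func and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def adjust_mode(mode: str, elapsed_s: float, budget_s: float) -> str:
    """Change the processing mode based on the measured processing time."""
    index = MODES.index(mode)
    if elapsed_s > budget_s and index > 0:
        return MODES[index - 1]  # too slow: switch to a lighter mode
    if elapsed_s < 0.5 * budget_s and index < len(MODES) - 1:
        return MODES[index + 1]  # plenty of headroom: switch to a heavier mode
    return mode


if __name__ == "__main__":
    _, elapsed = timed_call(lambda: sum(range(100_000)))
    print(adjust_mode("medium", elapsed, budget_s=1.0))
```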

The speech recognition method according to any one of claims 1 to 4, wherein in the processing mode determination step, the processing mode is determined such that the voice processing on the voice information of the target section is executed in the voice processing step within the time length of the processing section.
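
One way to read this constraint is as a budget check: pick the heaviest mode whose estimated processing time for the target section fits within the time length of the processing section. The per-mode cost figures below (seconds of computation per second of audio) are invented for illustration, as are the mode names and the fallback to the lightest mode.

```python
from typing import List, Tuple

# (mode name, assumed seconds of computation per second of audio), heaviest first
MODE_COSTS: List[Tuple[str, float]] = [
    ("heavy", 1.5),
    ("medium", 1.0),
    ("light", 0.5),
]


def pick_mode(target_s: float, section_s: float) -> str:
    """Return the heaviest mode expected to finish within the processing section.

    target_s  : duration of the target section in seconds
    section_s : duration of the whole processing section in seconds
    Falls back to the lightest mode when nothing fits.
    """
    for name, cost in MODE_COSTS:
        if cost * target_s <= section_s:
            return name
    return MODE_COSTS[-1][0]


if __name__ == "__main__":
    print(pick_mode(target_s=2.0, section_s=2.4))
```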

A speech recognition device comprising:
A reference value determining unit that determines a reference value for determining the length of the first silent section included in the processing section;
A processing mode determination unit that determines a processing mode to be used from a plurality of processing modes of voice processing having different processing amounts according to the reference value;
A voice acquisition unit that acquires voice information of an input section including the processing section;
An end-of-speech determination unit that acquires, using the reference value, voice information of the processing section, which includes a target section and the first silent section following the target section, from the voice information of the input section;
A voice processing unit that performs voice processing in the determined processing mode on voice information of the target section among voice information of the processing section;
A speech recognition unit that performs speech recognition processing on the voice information of the target section on which the voice processing has been performed,
The reference value determining unit determines, as the reference value, a threshold value indicating the length of the first silent section, the threshold value being information for determining the end of the processing section,
The processing mode determination unit determines the processing mode based on the threshold value,
The voice recognition device further includes:
A detection unit for detecting a silent section from the voice information of the input section;
The end-of-speech determination unit extracts the voice information of the processing section from the voice information of the input section by determining that the time when the length of the silent section exceeds the threshold value is the end of the processing section,
The voice processing is noise suppression processing of the voice information,
The processing mode determination unit determines a noise suppression algorithm or a noise suppression parameter as the processing mode.

A speech recognition device comprising:
A reference value determining unit that determines a reference value for determining the length of the first silent section included in the processing section;
A processing mode determination unit that determines a processing mode to be used from a plurality of processing modes of voice processing having different processing amounts according to the reference value;
An end-of-speech determination unit that acquires, using the reference value, voice information of the processing section, which includes a target section and the first silent section following the target section, from voice information of an input section that includes the processing section;
A voice processing unit that performs voice processing in the determined processing mode on voice information of the target section among voice information of the processing section;
A voice recognition unit that performs voice recognition processing on voice information of the target section on which the voice processing has been executed,
The reference value determining unit determines, as the reference value, a threshold value indicating the length of the first silent section, the threshold value being information for determining the end of the processing section,
The processing mode determination unit determines the processing mode based on the threshold value,
The voice recognition device further includes:
A detection unit for detecting a silent section from the voice information of the input section,
The end-of-speech determination unit extracts the voice information of the processing section from the voice information of the input section by determining that the time when the length of the silent section exceeds the threshold value is the end of the processing section,
The voice processing is encoding processing of the voice information,
The processing mode determination unit determines an encoding algorithm or an encoding parameter as the processing mode,
The voice recognition device further includes:
A decoding unit that decodes the voice information that has been encoded by the voice processing,
The voice recognition unit performs the voice recognition processing on the decoded voice information.

A program for causing a computer to execute the speech recognition method according to claim 1.

A program for causing a computer to execute the speech recognition method according to claim 2.