JP3942831B2 - Voice communication terminal and voice communication system - Google Patents

Voice communication terminal and voice communication system

Info

Publication number
JP3942831B2
Authority
JP
Japan
Prior art keywords
speech
code
voice
means
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2001029757A
Other languages
Japanese (ja)
Other versions
JP2002229595A (en)
Inventor
裕久 田崎
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to JP2001029757A
Publication of JP2002229595A
Application granted
Publication of JP3942831B2
Application status is Active
Anticipated expiration

Abstract

PROBLEM TO BE SOLVED: To solve the problem that, in a conventional voice communication terminal, a call becomes difficult because the speech of the own terminal returns as a greatly delayed echo when it is superimposed on the speech of another terminal, processed, and output. SOLUTION: The voice communication terminal receives input speech and an input speech code at separate inputs, outputs an output speech code obtained by encoding the input speech, and outputs output speech obtained by decoding the input speech code. The terminal is provided with correction means that, on the basis of the similarity between the speech code obtained by encoding the input speech and the input speech code, corrects the input speech code, the decoded speech, or the operation of the speech decoding means so as to reduce the amplitude of the portion of the output speech that is attributable to the input speech.

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a voice communication terminal that receives an input voice and an input voice code, and outputs an output voice code obtained by encoding the input voice and an output voice obtained by decoding the input voice code. The present invention also relates to a voice communication system to which this voice communication terminal is applied. In particular, it relates to a voice communication system in which input voices from a plurality of voice communication terminals are delivered to every terminal over a single low-speed line, so that each terminal can listen to the contents of a one-to-one call or a conference call, and to a voice communication terminal suitable for such a system.
[0002]
[Prior art]
As a conventional voice communication terminal and voice communication system of this kind, there is a configuration in which two voices are superimposed as analog voice signals and the superimposed signal is transmitted over one line. For example, when speaker A and speaker B are on a call, the voice communication terminal A on the speaker A side superimposes the received voice signal of speaker B on the voice signal of speaker A and transmits the superimposed voice signal to the voice communication terminal B on the speaker B side and to other terminals.
[0003]
FIG. 8 shows the overall configuration of such a conventional voice communication system.
In the figure, 1 is a voice communication terminal A used by a speaker A, 2 is a voice communication terminal B used by a speaker B, and 3 is a voice communication terminal C used by other related members for monitoring.
Hereinafter, the voice communication terminal A, the voice communication terminal B, and the voice communication terminal C are simply referred to as terminal A, terminal B, and terminal C.
[0004]
The terminal A is provided with analog demodulation means 4 for demodulating the modulation signal B from the terminal B to produce the output voice A, voice superimposing means 5 for superimposing the input voice A uttered by the speaker A and the output voice A to generate a superimposed voice, and analog modulation means 6 for modulating the superimposed voice.
[0005]
The terminal B is provided with an analog modulation means 7 for modulating the input speech B uttered by the speaker B and an analog demodulation means 8 for demodulating the modulation signal A from the terminal A.
Further, the terminal C is provided with analog demodulation means 9 for demodulating the modulation signal A from the terminal A.
[0006]
Reference numeral 10 denotes a line A for transmitting the modulated signal A from the terminal A, and reference numeral 11 denotes a line B for transmitting the modulated signal B from the terminal B.
The operation of this conventional voice communication system will be described below with reference to the drawings.
[0007]
A voice uttered by the speaker A is input to the terminal A as an input voice A. Also, a modulation signal B is input from a terminal B described later via the line B. The analog demodulator 4 demodulates the modulated signal B and outputs the obtained audio signal as output audio A. The sound superimposing means 5 generates a superimposed sound in which the input sound A and the output sound A are superimposed, and inputs this to the analog modulation means 6. The analog modulation means 6 modulates the input superimposed voice and outputs the obtained modulated signal to the line A as a modulated signal A.
[0008]
A voice uttered by the speaker B is input to the terminal B as the input voice B. Further, the modulation signal A is input from the terminal A through the line A. The analog modulation means 7 modulates the input voice B and outputs the obtained modulated signal as a modulated signal B to the line B. The analog demodulator 8 demodulates the modulated signal A and outputs the obtained audio signal as output audio B.
The modulation signal A is input from the terminal A to the terminal C via the line A. The analog demodulator 9 demodulates the modulated signal A and outputs the obtained audio signal as output audio C.
[0009]
With the above configuration, when the speaker A and the speaker B are on a call using the terminal A and the terminal B, respectively, the call between them can be monitored at the terminal C.
[0010]
FIG. 9 shows another configuration in which the above-described conventional voice communication system is digitized in order to improve line utilization efficiency (for example, by introducing data communication), to improve quality in areas where radio waves are weak by combining with an error correction code, and to ensure secrecy by combining with encryption.
In the figure, 1 is a terminal A used by a speaker A, 2 is a terminal B used by a speaker B, and 3 is a terminal C used when other related members monitor.
[0011]
The terminal A is provided with demodulation means 12 for demodulating the modulated signal B received from the terminal B via the line B11; speech decoding means 13 for decoding the speech code B obtained by the demodulation means 12; D-A conversion means 14 for performing digital-to-analog conversion (D-A conversion) on the digital output of the speech decoding means 13 and outputting the output voice A as an analog signal; voice superimposing means 15 for superimposing the input voice A uttered by the speaker A and the output voice A from the D-A conversion means 14; A-D conversion means 16 for performing analog-to-digital conversion (A-D conversion) on the superimposed voice; speech encoding means 17 for encoding the output of the A-D conversion means 16; and modulation means 18 for digitally modulating the output of the speech encoding means 17 and outputting the result to the line A10.
[0012]
The terminal B includes demodulation means 20 for demodulating the modulated signal A from the line A10, speech decoding means 21 for decoding the output of the demodulation means 20, and D-A conversion means 22 for performing digital-to-analog conversion (D-A conversion) on the output of the speech decoding means 21 and outputting the output voice B, which is an analog signal.
[0013]
Further, the terminal B includes A-D conversion means 23 for performing analog-to-digital conversion (A-D conversion) on the input voice B uttered by the speaker B, speech encoding means 24 for encoding the output of the A-D conversion means 23, and modulation means 25 for digitally modulating the output of the speech encoding means 24 and outputting the result to the line B11.
[0014]
The terminal C also includes demodulation means 26 for demodulating the modulated signal A from the line A10, speech decoding means 27 for decoding the output of the demodulation means 26, and D-A conversion means 28 for performing digital-to-analog conversion (D-A conversion) on the output of the speech decoding means 27 and outputting the output voice C, which is an analog signal.
[0015]
Next, the operation of this other conventional voice communication system will be described.
The voice uttered by the speaker A is input to the terminal A as the input voice A, and the modulation signal B is input from the terminal B described later via the line B. The demodulating means 12 demodulates the modulated signal B and outputs the obtained speech code B to the speech decoding means 13. The voice decoding unit 13 decodes the voice code B and outputs the obtained digital output voice A to the DA conversion unit 14. The DA converter 14 performs digital / analog conversion (DA conversion) on the digital output sound A and outputs an output sound A which is an analog signal.
[0016]
In the terminal A, a superimposed voice in which the input voice A and the output voice A are superimposed is generated by the voice superimposing means 15 and input to the A-D conversion means 16. The A-D conversion means 16 performs analog / digital conversion (A-D conversion) on the input superimposed voice to obtain a digital input voice A which is a digital signal. The voice encoding unit 17 encodes the digital input voice A and outputs the obtained voice code A to the modulation unit 18. The modulation means 18 digitally modulates the input voice code A, and outputs the obtained modulation signal to the line A as the modulation signal A.
[0017]
A voice uttered by the speaker B is input to the terminal B as the input voice B. Further, the modulation signal A is input from the terminal A through the line A. The A-D conversion means 23 performs analog / digital conversion (A-D conversion) on the input voice B to obtain a digital input voice B which is a digital signal. The speech encoding unit 24 encodes the digital input speech B and outputs the obtained speech code B to the modulation unit 25. The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B.
[0018]
The demodulating means 20 in the terminal B demodulates the modulated signal A and outputs the obtained speech code A to the speech decoding means 21. The voice decoding means 21 decodes the voice code A and outputs the obtained digital output voice B to the DA conversion means 22. The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output sound B and outputs an output sound B which is an analog signal.
[0019]
The modulation signal A is input from the terminal A to the terminal C via the line A. The demodulating means 26 demodulates the modulated signal A and outputs the obtained speech code A to the speech decoding means 27. The voice decoding means 27 decodes the voice code A and outputs the obtained digital output voice C to the DA conversion means 28. The DA conversion means 28 performs digital / analog conversion (DA conversion) on the digital output sound C and outputs an output sound C which is an analog signal.
[0020]
[Problems to be solved by the invention]
The conventional voice communication system in which the above digitization is performed and the voice communication terminal constituting the same have the following problems.
[0021]
In many cases, speech coding and speech decoding at a low bit rate are performed in units of a predetermined time frame of about 10 to 40 ms. Until the input voice B returns to the terminal B via the terminal A and is included in the output voice B and output, normally, a delay time of 6 times or more of this frame, that is, several hundred ms is generated.
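As a rough check on these figures, the lower bound on the echo delay can be computed as the number of delayed frames times the frame length. The sketch below is illustrative only; the function name and the three sample frame lengths are assumptions chosen from the stated 10 to 40 ms range.

```python
def min_echo_delay_ms(frame_ms: float, frames: int = 6) -> float:
    """Lower bound on the echo delay: delayed frames times frame length."""
    return frame_ms * frames


# Sample frame lengths (assumed) within the 10 to 40 ms range given in the text.
delays = {frame_ms: min_echo_delay_ms(frame_ms) for frame_ms in (10, 20, 40)}
```

Because six frames is a minimum, actual delays on a real line would be longer than these values.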
[0022]
For this reason, the speaker B must speak while hearing his or her own voice, delayed by several hundred ms, as a large-amplitude echo of greatly deteriorated quality, and the call becomes extremely difficult. It is known that talking while listening to one's own voice delayed by about 500 ms is very difficult, and that many speakers stop talking under such conditions.
[0023]
In addition, although the voices of speakers A and B can be heard at all terminals, the input voice B uttered by speaker B is encoded twice, by the speech encoding means 24 and the speech encoding means 17, before it is output from terminals B and C via terminal A, so there is a problem that its quality is greatly deteriorated.
[0024]
When line utilization efficiency is to be improved by digitization, the bit rate of the speech coding is set low, so the quality degradation caused by each encoding becomes large, and encoding twice degrades the sound to a quality that is very hard to listen to.
[0025]
Furthermore, in a normal two-way call, speaker A and speaker B often speak at the same time (so-called double talk). Low bit rate speech coding methods achieve efficient information compression by using a model of a single speaker's utterance, so the encoding quality of the speech encoding means 17 in the terminal A is poor during double talk, and in many cases the content spoken by both speakers cannot be understood.
[0026]
The present invention has been made to solve these problems. Its purpose is to avoid a call becoming difficult because of a greatly delayed echo, quality deteriorating greatly because of two encodings, and quality deteriorating greatly because double-talk speech is encoded.
[0027]
[Means for Solving the Problems]
  The voice communication terminal according to the present invention receives a voice and a voice code at separate inputs, encodes the input voice and outputs it as an output voice code, and decodes the input voice code and outputs it as an output voice at separate outputs, and comprises:
  voice encoding means for encoding the input voice and outputting the obtained voice code as the output voice code;
  voice decoding means for decoding the input voice code and outputting the obtained decoded voice as the output voice;
  storage means for storing a predetermined number of output voice codes encoded by the voice encoding means;
  similarity evaluation means for evaluating the similarity between the output voice code stored in the storage means and the input voice code, and outputting the obtained similarity; and
  correction means for correcting the input voice code, the decoded voice, or the operation of the voice decoding means on the basis of the similarity from the similarity evaluation means, so as to reduce the amplitude of the portion of the output voice that is caused by the input voice,
wherein the similarity evaluation means uses, as the similarity, the number of matching bits or the bit matching rate between the voice codes being compared.
[0028]
Further, in the voice communication terminal according to the present invention, the correction means comprises gain value control means for determining, on the basis of the similarity from the similarity evaluation means, a gain value by which the decoded voice is to be multiplied, and
multiplication means for multiplying the decoded voice by the gain value output by the gain value control means and outputting the obtained result as the output voice.
[0029]
  Further, in the speech communication terminal according to the present invention, the correction means comprises determination means for determining, on the basis of the similarity from the similarity evaluation means, whether the input speech code is to be corrected, and
  code replacement means for outputting a speech code in which the input speech code is replaced with a predetermined code when the determination means determines that correction is to be performed, and for outputting the input speech code as it is when the determination means determines that correction is not to be performed,
  and the speech decoding means is configured to decode the speech code output from the code replacement means and output the obtained decoded speech as the output voice.
[0030]
In the speech communication terminal according to the present invention, the speech code substituted by the code replacement means is a fixed speech code that decodes to low-amplitude speech.
[0031]
In the voice communication terminal according to the present invention, the voice code is composed of an information code representing voice information and an error correction code for it, and when the determination means determines that correction is to be performed, the code replacement means replaces the input voice code with a voice code for which the voice decoding means will judge that errors exceeding the correction limit of the error correction code are superimposed.
The speech decoding means is configured to perform decoding so that the amplitude of the decoded speech is progressively reduced when superposition of errors exceeding the correction limit is detected.
[0032]
Further, the voice communication terminal according to the present invention includes: a determination unit that determines whether the correction unit corrects the operation of the voice decoding unit based on the similarity;
A control means for outputting a signal that gives a correction to a gain value or an error detection flag obtained in the decoding process of the speech decoding means when the determination means decides to make a correction;
The speech decoding means is configured to correct the gain value or error detection flag obtained in the decoding process and perform decoding in accordance with the signal from the control means.
[0033]
In the voice communication terminal according to the present invention, each voice code is constituted by an information code representing voice information and an error correction code for it, and
the similarity evaluation means evaluates the similarity between the information code in the output speech code stored in the storage means and the error-corrected information code obtained by correcting the information code in the input speech code with the error correction code in the input speech code.
[0036]
A voice communication system according to the present invention includes a first voice communication terminal having a configuration described in any of paragraph numbers [0027] to [0034],
and a second voice communication terminal that, for each frame of a fixed duration, selects according to a predetermined criterion and outputs either a speech code obtained by encoding its input speech with speech encoding means or the output speech code output by the first voice communication terminal, and that decodes the output speech code output by the first voice communication terminal with speech decoding means and outputs the result.
[0038]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0039]
Embodiment 1.
FIG. 1 shows the configuration of a voice communication system according to the present invention. In the figure, the voice communication terminal B corresponds to the voice communication terminal according to the present invention.
In the figure, 1 is a terminal A used by a speaker A, 2 is a terminal B used by a speaker B, 3 is a terminal C used when other related members monitor, 10 is a line A, 11 is Line B.
[0040]
In the terminal A, 12 is demodulation means, 13 is speech decoding means, 14 is DA conversion means, 16 is AD conversion means, 17 is speech encoding means, and 18 is modulation means. These are the same as in the conventional digitized voice communication system described above.
[0041]
Reference numeral 30 denotes selection means for selecting one of the outputs of the speech encoding means 17 and the demodulation means 12 in accordance with a predetermined selection criterion and outputting the selected output to the modulation means 18. It is provided in place of the speech superimposing means 15 of the conventional digitized voice communication system.
[0042]
In the terminal B, 20 is demodulation means, 21 is speech decoding means, 22 is DA conversion means, 23 is AD conversion means, 24 is speech encoding means, and 25 is modulation means. These are the same as in the conventional digitized voice communication system described above.
[0043]
31 is storage means having a memory capable of storing N frames of the speech code B input from the speech encoding means 24, and 32 is similarity evaluation means for comparing the speech code A ′ from the demodulation means 20 with the speech code B in the storage means 31 and evaluating their similarity. 33 is correction means formed from gain value control means 34, which outputs a gain value based on the similarity from the similarity evaluation means 32, and multiplication means 35, which multiplies the decoded speech output from the speech decoding means 21 by the gain value from the gain value control means 34 and outputs the obtained result to the DA conversion means 22 as digital output speech B. These are provided in addition to the components of the terminal B of the conventional digitized voice communication system.
[0044]
The terminal C includes demodulation means 26, speech decoding means 27, and DA conversion means 28, as in the conventional digitized voice communication system described above.
[0045]
Hereinafter, the operation will be described with reference to the drawings.
A voice uttered by the speaker A is input to the terminal A as an input voice A. Also, a modulation signal B is input from a terminal B described later via the line B. The demodulator 12 demodulates the modulated signal B and outputs the obtained speech code B to the speech decoder 13 and the selector 30.
[0046]
The voice decoding unit 13 decodes the voice code B and outputs the obtained digital output voice A to the DA conversion unit 14. The DA converter 14 performs digital / analog conversion (DA conversion) on the digital output sound A and outputs an output sound A which is an analog signal.
[0047]
The A / D conversion means 16 performs analog / digital conversion (A / D conversion) on the input voice A to obtain a digital input voice A which is a digital signal. The voice encoding unit 17 encodes the digital input voice A and outputs the obtained voice code A to the selection unit 30. The selection unit 30 selects one of the input speech code A and speech code B according to a predetermined selection criterion, and outputs the selected speech code to the modulation unit 18 as a speech code A ′.
[0048]
As an example of the selection criterion, there is a method of actually decoding the speech code A and the speech code B, comparing the amplitudes of the two obtained decoded speech, and selecting the larger one. Then, the modulation means 18 digitally modulates the voice code A ′ input from the selection means 30 and outputs the obtained modulation signal as the modulation signal A to the line A10.
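The amplitude-based selection criterion described above could be sketched as follows. This is a minimal illustration, not the specified implementation: the `decode` callable stands in for the speech decoder, peak absolute amplitude is assumed as the amplitude measure, ties are assumed to go to speech code A, and `toy_decode` is a hypothetical stand-in decoder used only to make the example runnable.

```python
from typing import Callable, Sequence


def select_code(code_a: bytes, code_b: bytes,
                decode: Callable[[bytes], Sequence[float]]) -> bytes:
    """Return whichever speech code decodes to the larger-amplitude frame."""
    def peak(samples: Sequence[float]) -> float:
        # Peak absolute amplitude of the decoded frame (assumed measure).
        return max((abs(s) for s in samples), default=0.0)

    # Assumption: on a tie the terminal's own code A is kept.
    return code_a if peak(decode(code_a)) >= peak(decode(code_b)) else code_b


def toy_decode(code: bytes) -> list:
    """Hypothetical decoder: each byte is one sample centred on 128."""
    return [b - 128 for b in code]
```

With a real codec, `decode` would be the same decoder used by the speech decoding means, so both candidates are compared in the decoded domain.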
[0049]
A voice uttered by the speaker B is input to the terminal B as the input voice B. Further, the modulation signal A is input from the terminal A through the line A. The A-D conversion means 23 performs analog / digital conversion (A-D conversion) on the input voice B to obtain a digital input voice B which is a digital signal. The speech encoding unit 24 encodes the digital input speech B and outputs the obtained speech code B to the modulation unit 25 and the storage unit 31. The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B11.
[0050]
The demodulating means 20 in the terminal B demodulates the modulated signal A and outputs the obtained speech code A ′ to the speech decoding means 21 and the similarity evaluation means 32. The storage means 31 stores therein the speech code B input from the speech encoding means 24 for N frames, and outputs a part or all thereof to the similarity evaluation means 32.
[0051]
Note that the storage unit 31 has a memory that can hold N frames of the speech code B. After the stored speech codes are output to the similarity evaluation unit 32, the stored contents are updated by overwriting the location that holds the speech code B from N frames earlier with the speech code B of the current frame. The configuration and the updating method of the storage unit 31 are not limited to this, as long as N frames of the speech code B can be stored.
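One way to picture this overwrite-the-oldest update is a fixed-size circular buffer. The class below is a sketch under that assumption; the names `CodeStore`, `push`, and `stored` are hypothetical and do not appear in the specification.

```python
class CodeStore:
    """Holds the last n_frames speech codes, overwriting the oldest slot."""

    def __init__(self, n_frames: int):
        self.slots = [None] * n_frames
        self.pos = 0  # index of the slot that holds the oldest code

    def push(self, code: bytes) -> None:
        """Overwrite the oldest slot with the current frame's speech code."""
        self.slots[self.pos] = code
        self.pos = (self.pos + 1) % len(self.slots)

    def stored(self) -> list:
        """Return the codes currently held (unfilled slots are skipped)."""
        return [c for c in self.slots if c is not None]
```

In use, the stored codes would be read out for comparison first and `push` called afterwards, matching the order given in the text.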
[0052]
Here, the N frames are stored in order to absorb the delay until the voice code B returns to the terminal B via the terminal A, so the value of N is larger than the number of delay frames assumed. Must be set.
[0053]
Also, depending on the type of line and the configuration of terminal A, the number of delay frames may not be constant. At that time, it is necessary to output a plurality (partially or all) of the stored speech code B.
[0054]
The similarity evaluation unit 32 compares the speech code A ′ input from the demodulation unit 20 with each of the one or more speech codes B input from the storage unit 31, evaluates the similarity for each in turn, and outputs the maximum value among them to the gain value control means 34 in the correction means 33. As the similarity used here, the number of matching bits or the bit match rate of the two speech codes can be used.
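The bit match rate and the maximum over the stored codes can be illustrated as below. This is a sketch assuming speech codes are equal-length byte strings; the function names are hypothetical.

```python
def bit_match_rate(code_x: bytes, code_y: bytes) -> float:
    """Fraction of bit positions at which two equal-length codes agree."""
    if len(code_x) != len(code_y):
        raise ValueError("speech codes must have the same length")
    total_bits = 8 * len(code_x)
    if total_bits == 0:
        return 0.0
    # XOR leaves a 1 bit wherever the two codes differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(code_x, code_y))
    return (total_bits - differing) / total_bits


def max_similarity(received: bytes, stored_codes) -> float:
    """Largest bit match rate between the received code and any stored code."""
    return max((bit_match_rate(received, c) for c in stored_codes), default=0.0)
```

Taking the maximum over several stored frames is what lets the evaluation tolerate a delay whose frame count is not known exactly.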
[0055]
When the similarity input from the similarity evaluation means 32 exceeds a predetermined threshold value, the gain value control means 34 determines that the speech code B has returned via the terminal A and outputs a small gain value less than 1 to the multiplication means 35. When the similarity input from the similarity evaluation unit 32 is equal to or less than the predetermined threshold, a gain value of 1 is output to the multiplication unit 35.
[0056]
Note that if the gain value changes abruptly from 1 to the small value or vice versa, a discontinuity will occur in the output of the multiplication means 35 described later. The gain value is therefore decreased or increased gradually, sample by sample, so that it shifts smoothly to the small value or to 1.
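The threshold decision and the gradual transition can be combined in a short sketch. A linear ramp across one frame is an assumption made here for illustration; the text only requires that the gain change gradually. The function and parameter names are hypothetical.

```python
def gain_ramp(current: float, similarity: float, threshold: float,
              low_gain: float, frame_len: int) -> list:
    """Per-sample gains moving linearly from `current` to this frame's target."""
    # Target is the small gain when an echo is detected, otherwise 1.
    target = low_gain if similarity > threshold else 1.0
    step = (target - current) / frame_len
    return [current + step * (i + 1) for i in range(frame_len)]
```

The last element of the returned list would be carried over as `current` for the next frame, so consecutive frames join without a jump.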
[0057]
Also, the predetermined threshold value may always be a fixed value. However, by using a smaller value when the similarity of the previous frame exceeded the predetermined threshold value and, conversely, a larger value when the similarity of the previous frame was equal to or lower than the predetermined threshold value, it is possible to suppress frequent changes in the gain value.
Furthermore, it is possible to suppress a frequent change in the gain value for each frame with reference to past control results.
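This frame-to-frame threshold adjustment is a form of hysteresis and might look like the following. The base threshold of 0.85 and margin of 0.05 are illustrative assumptions, as are the function names.

```python
def effective_threshold(prev_exceeded: bool, base: float, margin: float) -> float:
    """Lower the threshold after a detection, raise it otherwise."""
    return base - margin if prev_exceeded else base + margin


def detect_sequence(similarities, base: float = 0.85, margin: float = 0.05) -> list:
    """Return per-frame echo decisions using the hysteresis threshold."""
    decisions, prev = [], False
    for s in similarities:
        prev = s > effective_threshold(prev, base, margin)
        decisions.append(prev)
    return decisions
```

A similarity that hovers around the base threshold then keeps the previous decision instead of toggling the gain every frame.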
[0058]
The voice decoding unit 21 decodes the input voice code A ′ and outputs the obtained decoded voice to the multiplication unit 35.
The multiplication unit 35 multiplies each sample value of the decoded speech input from the speech decoding unit 21 by the per-sample gain value input from the gain value control unit 34, and outputs the obtained result to the DA conversion means 22 as the digital output sound B. The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output sound B and outputs an output sound B which is an analog signal.
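The per-sample multiplication amounts to an element-wise product of the decoded frame and the gain sequence. A minimal sketch, assuming both are plain Python sequences of equal length (the function name is hypothetical):

```python
def apply_gain(decoded, gains) -> list:
    """Multiply each decoded sample by the gain supplied for that sample."""
    if len(decoded) != len(gains):
        raise ValueError("one gain value is required per sample")
    return [s * g for s, g in zip(decoded, gains)]
```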
[0059]
The voice communication terminal C receives the modulation signal A from the terminal A via the line A. The demodulating means 26 demodulates the modulated signal A and outputs the obtained speech code A ′ to the speech decoding means 27. The speech decoding unit 27 decodes the speech code A ′ and outputs the obtained digital output speech C to the DA conversion unit 28. The DA conversion means 28 performs digital / analog conversion (DA conversion) on the digital output sound C and outputs an output sound C which is an analog signal.
[0060]
In the above embodiment, the multiplication means 35 always multiplies by the gain value. However, since multiplying by a gain value of 1 does not change the result, the multiplication may be skipped in that case and the decoded speech used directly as the digital output speech B.
[0061]
In the above embodiment, the configuration in which a wireless line, an analog modem transmission line, or the like is used as the communication path has been described. However, a configuration using other types of lines such as ATM is also possible. At that time, the configuration of the modulation means and the demodulation means is changed according to the type of line.
[0062]
Normally, the voice communication terminal C has the same configuration as the voice communication terminal B. When the voice communication terminal A and the voice communication terminal C are on a call, no voice code is stored in the storage means of the voice communication terminal B, so a gain value of 1 is always input to the multiplying means 35 and the call between the voice communication terminal A and the voice communication terminal C is output as the output voice B. Of course, a configuration including a plurality of voice communication terminals having the same configuration as the voice communication terminal B and the voice communication terminal C is also possible.
[0063]
According to the first embodiment, the terminal is provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech code B stored in the storage means and the speech code A ′ and outputting the obtained similarity, and correction means for correcting the decoded speech on the basis of at least the similarity so as to reduce the amplitude of the portion of the finally output voice B that is caused by the input speech B. This yields a voice communication terminal that avoids calls becoming difficult because of a greatly delayed echo.
[0064]
Further, the correction means includes gain value control means for determining, on the basis of at least the similarity, a gain value by which the decoded speech is multiplied, and multiplication means for multiplying the decoded speech by the gain value output from the gain value control means. The amplitude of the portion of the finally output voice B that is caused by the input voice B can therefore be reduced, which yields a voice communication terminal that avoids calls becoming difficult because of a greatly delayed echo.
[0065]
Further, since the similarity evaluation means uses the number of matching bits or the bit matching rate between the speech codes being compared as the similarity, it can correctly detect that a voice code B output in the past has returned as the voice code A ′ even if a few bit errors are superimposed on the transmission path. This yields a voice communication terminal that avoids calls becoming difficult because of a greatly delayed echo.
[0066]
Further, the voice communication system comprises a first voice communication terminal (voice communication terminal B) provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech codes B stored in the storage means and the speech code A′ and outputting the obtained similarity, and correction means for correcting the decoded speech based on at least the similarity so as to reduce the amplitude of the portion of the finally output speech B that is caused by the input speech B; and a second voice communication terminal (voice communication terminal A) that decodes and outputs the output voice code output by the first voice communication terminal and, for each frame of a fixed duration, selects and outputs either the voice code obtained by encoding the second input voice (input voice A) or the output voice code output by the first voice communication terminal. This yields a voice communication system that avoids the large quality degradation caused by encoding twice and avoids the difficulty in conversation caused by a greatly delayed echo.
[0067]
Furthermore, since only a speech code obtained by encoding the input speech of the speaker A or of the speaker B is ever decoded, a voice communication system is obtained that avoids the significant quality degradation caused by encoding double-talk speech.
[0068]
Embodiment 2.
FIG. 2 shows another configuration of the voice communication terminal according to the present invention. Note that this voice communication terminal is the voice communication terminal B of FIG. 1, which shows the overall configuration of the voice communication system. In the figure, parts that also appear in the earlier figure bear the same reference numerals. Reference numeral 33 denotes correction means, which consists of determination means 36 for determining whether the similarity from the similarity evaluation means 32 exceeds a predetermined threshold, and code replacement means 37 for performing a replacement process on the speech code A′ from the demodulation means 20 based on the determination result.
[0069]
Hereinafter, the operation will be described with reference to the drawings.
The voice communicated by the speaker B is input to the voice communication terminal B as the input voice B. Also, the modulation signal A is input from the voice communication terminal A via the line A. The A-D conversion means 23 performs analog / digital conversion (A-D conversion) on the input voice B to obtain a digital input voice B which is a digital signal. The speech encoding unit 24 encodes the digital input speech B and outputs the obtained speech code B to the modulation unit 25 and the storage unit 31. The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B.
[0070]
The demodulating means 20 in the voice communication terminal B demodulates the modulated signal A and outputs the obtained voice code A ′ to the similarity evaluating means 32 and the code replacing means 37 in the correcting means 33. The storage means 31 stores therein the speech code B input from the speech encoding means 24 for N frames, and outputs a part or all thereof to the similarity evaluation means 32.
[0071]
The similarity evaluation unit 32 compares the speech code A′ input from the demodulation unit 20 with each of the one or more speech codes B input from the storage unit 31, evaluates the similarity for each in turn, and outputs the maximum value among them to the determination unit 36 in the correction unit 33. As the similarity, the number of matching bits or the bit match rate between the two speech codes can be used.
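The similarity evaluation described above can be sketched as follows. This is a minimal illustration only: representing each speech code as an integer of `n_bits` bits, and the function names `bit_match_rate` and `max_similarity`, are assumptions made for the example and do not appear in the specification.

```python
from typing import Sequence


def bit_match_rate(code_a: int, code_b: int, n_bits: int) -> float:
    """Fraction of bit positions at which two n_bits-wide codes agree."""
    mask = (1 << n_bits) - 1
    differing = bin((code_a ^ code_b) & mask).count("1")
    return (n_bits - differing) / n_bits


def max_similarity(code_a_prime: int, stored_codes_b: Sequence[int], n_bits: int) -> float:
    """Largest bit match rate between the received code and any stored code.

    Returns 0.0 when nothing is stored (an assumption of this sketch).
    """
    return max(
        (bit_match_rate(code_a_prime, b, n_bits) for b in stored_codes_b),
        default=0.0,
    )
```

With a bit match rate, a returned code that differs from a stored code in only a few bit positions still scores close to 1, which is why a small number of transmission errors does not prevent detection.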
[0072]
The determination unit 36 determines whether the similarity input from the similarity evaluation unit 32 exceeds a predetermined threshold, and outputs the determination result to the code replacement unit 37. Note that this predetermined threshold may be a fixed value or adaptively controlled as in the gain value control unit 34 of the first embodiment.
[0073]
The code replacement unit 37 performs the replacement process on the speech code A′ input from the demodulation unit 20 only when the determination result indicates that the similarity exceeds the predetermined threshold, and outputs the obtained speech code A″ to the speech decoding means 21. If the determination result indicates that the similarity is equal to or less than the predetermined threshold, the speech code A′ input from the demodulation means 20 is output unchanged to the speech decoding means 21 as the speech code A″.
[0074]
FIG. 3 is a diagram for explaining an example of the replacement process in the code replacement unit 37. In the figure, (a) is the speech code (speech code A′) output from the demodulating means 20, and (b) is the speech code (speech code A″) after replacement by the code replacing means 37. In this example, the code replacement means 37 replaces only the gain code a (0110011), the portion of the speech code that represents gain information, with a fixed gain code b (0000000). A gain code that produces decoded speech of small amplitude is searched for in advance and used as the fixed gain code for replacement, so that the replacement reduces the amplitude of the decoded speech. For example, when the speech coding system is a general CELP system, the amplitude of the decoded speech can be made very small by replacing the gain code with one for which the adaptive excitation gain is almost 0 and the driving excitation gain is very small.
[0075]
Note that the replacement code is not limited to the gain code, and a configuration in which a part or all of the speech code is replaced may be used as long as the amplitude of the decoded speech can be finally reduced. Depending on the speech coding method, there may be no gain code. At that time, a code related to amplitude such as a code related to power may be replaced.
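The gain code replacement can be illustrated with a short sketch. The 7-bit gain code and the fixed code 0000000 follow the example in the text; the bit-string representation, the field offset `GAIN_FIELD_START`, and the function name are hypothetical, since the specification does not state the frame layout.

```python
GAIN_FIELD_START = 10         # hypothetical offset of the gain code within the frame
GAIN_FIELD_LEN = 7            # the example gain code in the text is 7 bits long
FIXED_GAIN_CODE = "0000000"   # fixed gain code b from the text


def replace_gain_code(speech_code: str, similarity: float, threshold: float) -> str:
    """Return speech code A'' given speech code A' as a string of '0'/'1'.

    The gain field is overwritten only when the similarity exceeds the
    threshold; otherwise the code is passed through unchanged.
    """
    if similarity <= threshold:
        return speech_code
    end = GAIN_FIELD_START + GAIN_FIELD_LEN
    return speech_code[:GAIN_FIELD_START] + FIXED_GAIN_CODE + speech_code[end:]
```

Because only the gain field changes, the decoder itself needs no modification in this arrangement; it simply decodes a frame whose gain happens to be very small.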
[0076]
The speech decoding means 21 decodes the input speech code A″ and outputs the obtained decoded speech as the digital output speech B to the DA conversion means 22. The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output voice B and outputs an output voice B which is an analog signal.
[0077]
As in the first embodiment, a configuration using other types of lines such as ATM is also possible. Normally, the voice communication terminal C has the same configuration as the voice communication terminal B. When the voice communication terminal A and the voice communication terminal C are in a call, no voice code of that call is stored in the storage means in the voice communication terminal B, so the call between the voice communication terminal A and the voice communication terminal C is output as the output voice B without replacement. Of course, a configuration including a plurality of voice communication terminals having the same configuration as the voice communication terminal B and the voice communication terminal C is also possible.
[0078]
According to the second embodiment, the terminal is provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech codes B stored in the storage means and the speech code A′ and outputting the obtained similarity, and correction means for correcting the speech code A′ based on at least the similarity so as to reduce the amplitude of the portion of the finally output speech B that is caused by the input speech B. This yields a voice communication terminal that avoids the difficulty in conversation caused by a greatly delayed echo.
[0079]
In addition, the correction means includes determination means for determining, based on at least the similarity, whether to correct the speech code A′, and code replacement means which, when the determination means decides to make a correction, outputs a speech code A″ in which part or all of the speech code A′ is replaced with a predetermined code and, when the determination means decides not to make a correction, outputs the input speech code unchanged as the speech code A″. The code can thus be replaced so that the portion of the finally output speech B that is caused by the input speech B is decoded with a small amplitude, and a voice communication terminal is obtained that avoids the difficulty in conversation caused by a greatly delayed echo.
[0080]
Further, since the similarity evaluation means uses, as the similarity, the number of matching bits or the bit matching rate between the speech codes being compared, it can correctly detect that a voice code B output in the past has returned as the voice code A′ even if a few bit errors are superimposed on the transmission path. This yields a voice communication terminal that avoids the difficulty in conversation caused by a greatly delayed echo.
[0081]
Further, the voice communication system comprises a first voice communication terminal (voice communication terminal B) provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech codes B stored in the storage means and the speech code A′ and outputting the obtained similarity, and correction means for correcting the speech code A′ based on at least the similarity so as to reduce the amplitude of the portion of the finally output speech B that is caused by the input speech B; and a second voice communication terminal (voice communication terminal A) that decodes and outputs the output voice code output by the first voice communication terminal and, for each frame of a fixed duration, selects and outputs either the voice code obtained by encoding the second input voice (input voice A) or the output voice code output by the first voice communication terminal. This yields a voice communication system that avoids the large quality degradation caused by encoding twice and avoids the difficulty in conversation caused by a greatly delayed echo. Furthermore, since only a speech code obtained by encoding the input speech of the speaker A or of the speaker B is ever decoded, a voice communication system is obtained that avoids the significant quality degradation caused by encoding double-talk speech.
[0082]
Embodiment 3.
The third embodiment is an example applied when the speech code is configured by an information code that purely represents speech information and its error correction code.
[0083]
In general, a voice code transmitted through a wireless line is composed of an information code that purely represents voice information and an error correction code thereof. Regarding information codes representing speech information, since the importance of each bit is biased, it is often the case that a predetermined number of bits having high importance are collected to calculate an error correction code for this.
[0084]
The configuration of the voice communication terminal in the third embodiment is the same as that shown in FIG. 2 showing the second embodiment, but the voice code is composed of an information code that purely represents voice information and its error correction code. Therefore, the internal configurations of the speech encoding unit 24, the speech decoding unit 21, and the code replacement unit 37 are different.
[0085]
FIG. 4 is a diagram for explaining another example of the substitution process of the code substitution unit 37 in the voice communication terminal according to the third embodiment. In the figure, (a) is a speech code (speech code A ′) output from the demodulating means 20, and (b) is a speech code (speech code A ″) replaced by the code replacing means 37.
[0086]
The voice encoding means 24 generates an information code that purely represents the voice information, collects only the most important bits of it, calculates an error correction code for them, and outputs the combination of the information code and the error correction code as a voice code.
[0087]
The voice decoding means 21 performs error correction processing of the information code using the error correction code in the voice code, and decodes the information code after error correction.
Further, the speech decoding means 21 determines whether an error exceeding the error correction limit is superimposed and generates an error detection flag indicating the result. When this flag indicates that the correction limit has been exceeded, decoding is performed by discarding the information code subject to correction (the predetermined number of important bits) and replacing it with the value of the previous frame.
[0088]
Further, when the flag indicating the error superposition exceeding the correction limit continues, control is performed so that the amplitude of the decoded speech is gradually reduced. By doing so, it is possible to realize an effective error tolerance improvement within a limited amount of transmission information.
[0089]
The code replacement unit 37 performs the replacement on the speech code A′ input from the demodulation unit 20 only when the determination unit 36 indicates that the similarity from the similarity evaluation unit 32 exceeds a predetermined threshold, and outputs the obtained speech code A″ to the speech decoding means 21. In this replacement, the error correction code and the information code c subject to correction in the speech code shown in FIG. 4(a) are replaced with the error correction code and the information code d subject to correction shown in (b).
[0090]
This replacement is different from the replacement performed in the speech decoding means 21; its purpose is to make the speech decoding means 21 determine that an error exceeding the correction limit is superimposed. Specifically, a fixed code that causes an error exceeding the error correction limit to be detected is prepared as the replacement code, and the replacement is performed using it.
[0091]
When this replacement is performed, the speech decoding unit 21 performs decoding by discarding the information code (an important predetermined number of bits) to be corrected and replacing it with the value of the previous frame. Further, when the replacement by the code replacement means 37 is continued, the speech decoding means 21 makes corrections so that the amplitude of the decoded speech is gradually reduced.
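The decoder-side behaviour that this replacement triggers can be sketched as below. The sketch models only the two effects named in the text (reuse of the previous frame's value and gradual amplitude reduction while the flag persists). The class name, the per-frame factor `attenuation_step`, and the reset to full amplitude on the next error-free frame are assumptions of the example, not details given in the specification.

```python
class ConcealingDecoder:
    """Tracks which parameters to decode and how much to scale the amplitude."""

    def __init__(self, initial_params, attenuation_step: float = 0.5):
        self.previous_params = initial_params      # value used if the first frame is flagged
        self.attenuation_step = attenuation_step   # hypothetical per-frame reduction factor
        self.consecutive_errors = 0

    def select_params(self, received_params, error_flag: bool):
        """Return (parameters to decode, amplitude scale) for one frame."""
        if error_flag:
            # Discard the received protected bits and reuse the previous frame.
            self.consecutive_errors += 1
            params = self.previous_params
        else:
            self.consecutive_errors = 0
            params = received_params
            self.previous_params = received_params
        scale = self.attenuation_step ** self.consecutive_errors
        return params, scale
```

Because the amplitude falls in steps over successive flagged frames rather than dropping at once, the decoded sound changes smoothly, which matches the benefit this embodiment claims.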
[0092]
According to the third embodiment, in the configuration shown in FIG. 2, the code replacement means 37 replaces the error correction code included in the speech code A ′ so that it is determined that an error exceeding the correction limit is superimposed. Thus, in addition to the effect of the second embodiment, there is an effect that a smooth decoded sound can be obtained with little sudden change in amplitude.
[0093]
Embodiment 4.
FIG. 5 shows the configuration of a voice communication terminal according to the fourth embodiment. Note that this voice communication terminal is the voice communication terminal B of FIG. 1, which shows the overall configuration of the voice communication system. Parts that also appear in FIG. 1 and the other earlier figures bear the same reference numerals. Reference numeral 33 denotes correction means, which consists of determination means 36 for determining whether the similarity from the similarity evaluation means 32 exceeds a predetermined threshold, and control means 38 which, when the determination finds that the similarity exceeds the predetermined threshold, outputs to the speech decoding means 21 a correction coefficient smaller than 1 by which the gain value obtained in the decoding process of the speech decoding means 21 is multiplied.
[0094]
Hereinafter, the operation will be described with reference to the drawings.
The voice communicated by the speaker B is input to the voice communication terminal B as the input voice B. Also, the modulation signal A is input from the voice communication terminal A via the line A. The A-D conversion means 23 performs analog / digital conversion (A-D conversion) on the input voice B to obtain a digital input voice B which is a digital signal. The speech encoding unit 24 encodes the digital input speech B and outputs the obtained speech code B to the modulation unit 25 and the storage unit 31. The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B.
[0095]
The demodulating means 20 in the voice communication terminal B demodulates the modulated signal A and outputs the obtained voice code A ′ to the voice decoding means 21 and the similarity evaluation means 32. The storage means 31 stores therein the speech code B input from the speech encoding means 24 for N frames, and outputs a part or all thereof to the similarity evaluation means 32.
[0096]
The similarity evaluation unit 32 compares the speech code A′ input from the demodulation unit 20 with each of the one or more speech codes B input from the storage unit 31, evaluates the similarity for each in turn, and outputs the maximum value among them to the determination unit 36 in the correction unit 33.
The determination unit 36 determines whether the similarity input from the similarity evaluation unit 32 exceeds a predetermined threshold, and outputs the determination result to the control unit 38.
[0097]
Only when the determination result indicates that the similarity exceeds the predetermined threshold, the control unit 38 outputs to the speech decoding unit 21 a correction coefficient smaller than 1 by which the gain value obtained in the decoding process is multiplied. Discontinuity can also be suppressed by controlling the correction coefficient so that it decreases stepwise according to the number of consecutive determination results indicating that the similarity exceeds the predetermined threshold.
[0098]
The voice decoding unit 21 decodes the input voice code A′ and outputs the obtained decoded voice to the DA conversion unit 22 as the digital output voice B. However, when a correction coefficient for the gain value is input from the control means 38, the gain value for the frame or subframe (a part of a frame) obtained in the decoding process is multiplied by the correction coefficient before the subsequent decoding process is performed.
The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output sound B and outputs an output sound B which is an analog signal.
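The stepwise control of the correction coefficient can be sketched as follows. The schedule values (0.5, 0.25, 0.1) and the class name are hypothetical; the text specifies only that the coefficient is smaller than 1 and may decrease stepwise with the number of consecutive detections. Returning 1.0 when the threshold is not exceeded models "no correction" in this sketch.

```python
class GainCorrectionControl:
    """Chooses the coefficient applied to the decoder's frame or subframe gain."""

    def __init__(self, schedule=(0.5, 0.25, 0.1)):
        if not schedule or any(c >= 1.0 for c in schedule):
            raise ValueError("every correction coefficient must be smaller than 1")
        self.schedule = tuple(schedule)
        self.consecutive = 0

    def update(self, exceeds_threshold: bool) -> float:
        """Return the coefficient for the current frame."""
        if not exceeds_threshold:
            self.consecutive = 0
            return 1.0  # gain left unchanged
        self.consecutive += 1
        index = min(self.consecutive, len(self.schedule)) - 1
        return self.schedule[index]


def corrected_gain(decoded_gain: float, coefficient: float) -> float:
    """Gain value actually used in the rest of the decoding process."""
    return decoded_gain * coefficient
```

One multiplication per frame or subframe is enough here, in contrast to a per-sample multiplication applied to the decoder output.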
[0099]
The control means 38 may output an instruction to correct the error detection flag in the speech decoding means 21 described in the third embodiment, instead of the gain value correction coefficient.
[0100]
As described above, the error detection flag is information indicating the result of determining whether an error exceeding the correction limit is superimposed on the speech code input to the speech decoding means 21. When the control means 38 corrects the error detection flag to the value it takes when an error exceeding the correction limit is superimposed, the speech decoding means 21 executes the same processing as in the third embodiment for the case where an error exceeding the correction limit is superimposed.
[0101]
Specifically, decoding is performed by discarding the information code (an important predetermined number of bits) to be corrected and replacing it with the value of the previous frame. When the error detection flag is continuously corrected, the speech decoding means 21 performs decoding while correcting the amplitude of the decoded speech so as to gradually decrease.
[0102]
According to the fourth embodiment, the terminal is provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech codes B stored in the storage means and the speech code A′ and outputting the obtained similarity, and correction means for correcting the operation of the speech decoding means based on the similarity so as to reduce the amplitude of the portion of the finally output speech B that is caused by the input speech B. This yields a voice communication terminal that avoids the difficulty in conversation caused by a greatly delayed echo.
[0103]
Further, the correction means includes determination means for determining, based on at least the similarity, whether to correct the operation of the speech decoding means, and control means for issuing, when the determination means decides to make a correction, an instruction to correct the gain value or the error detection flag obtained in the decoding process; and the speech decoding means performs decoding while correcting the gain value or the error detection flag obtained in the decoding process in accordance with the instruction input from the control means. The operation of the decoding process can thus be corrected so that the portion of the finally output speech B that is caused by the input speech B is decoded with a small amplitude, and a voice communication terminal is obtained that avoids the difficulty in conversation caused by a greatly delayed echo.
[0104]
Further, the voice communication system comprises a first voice communication terminal (voice communication terminal B) provided with storage means for storing a predetermined number of the speech codes B, similarity evaluation means for evaluating the similarity between the speech codes B stored in the storage means and the speech code A′ and outputting the obtained similarity, and correction means for correcting the operation of the speech decoding means based on at least the similarity so as to reduce the amplitude of the portion of the finally output speech B that is caused by the input speech B; and a second voice communication terminal (voice communication terminal A) that decodes and outputs the output voice code output by the first voice communication terminal and, for each frame of a fixed duration, selects and outputs either the voice code obtained by encoding the second input voice (input voice A) or the output voice code output by the first voice communication terminal. This yields a voice communication system that avoids the large quality degradation caused by encoding twice and avoids the difficulty in conversation caused by a greatly delayed echo.
[0105]
Furthermore, since only a speech code obtained by encoding the input speech of the speaker A or of the speaker B is ever decoded, a voice communication system is obtained that avoids the significant quality degradation caused by encoding double-talk speech.
[0106]
In the first embodiment, the output of the speech decoding unit 21 is multiplied by a gain for each sample, whereas in this embodiment the gain value for each frame or subframe inside the speech decoding unit 21 is corrected. The configuration of the first embodiment requires many per-sample gain multiplications and a smoothing process, but because the means are highly independent it has the advantage that the speech decoding means 21 need not be modified. This embodiment reduces the independence of the means, but has the advantage that the gain correction process is simplified.
[0109]
Embodiment 5.
In the first to fourth embodiments, the number of matching bits or the bit matching rate between the speech codes being compared is used as the similarity. When each speech code consists of an information code representing speech information and its error correction code, however, the number of matching bits or the bit match rate between the information code in the voice code B and the information code obtained by error-correcting the voice code A′ with its error correction code may be used as the similarity instead.
[0110]
FIG. 6 shows the configuration of a voice communication terminal that operates as described above. Note that this voice communication terminal is the voice communication terminal B of FIG. 1, which shows the overall configuration of the voice communication system. Parts that also appear in the earlier figures bear the same reference numerals. In the figure, the speech encoding means 24 comprises information source encoding means 39 for encoding an information source that purely represents speech information and error correction encoding means 40 for encoding the error correction information, and the speech decoding means 21 comprises error correction decoding means 41 for decoding the error correction code and information source decoding means 42 for decoding the information source code.
[0111]
Hereinafter, the operation will be described with reference to the drawings.
The voice communicated by the speaker B is input to the voice communication terminal B as the input voice B. Also, the modulation signal A is input from the voice communication terminal A via the line A. The A-D conversion means 23 performs analog / digital conversion (A-D conversion) on the input voice B to obtain a digital input voice B which is a digital signal.
[0112]
The information source encoding unit 39 in the speech encoding unit 24 encodes the digital input speech B into a small amount of information and outputs the obtained information code B to the error correction encoding unit 40 in the speech encoding unit 24 and to the storage unit 31. A typical method applicable to the information source encoding means 39 is the CELP system.
[0113]
The error correction coding means 40 calculates an error correction code B for all or part of the information code B, appends it to the information code B, and outputs the result to the modulation means 25 as a voice code B. Typical error correction coding methods used here include convolutional coding and CRC coding. When CELP is used for information source coding, a configuration is usually used in which a plurality of important bits in the information code B are collected, an error detection code (CRC) is added to them, and the bits covered by the CRC together with the CRC code are convolutionally coded.
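The CRC step of this arrangement can be sketched as follows. Only the CRC is shown; the convolutional coding stage is omitted. The generator polynomial (an 8-bit CRC with polynomial 0x07), the bit-list representation, and the choice of which positions count as "important" are all assumptions made for illustration and are not taken from the specification.

```python
def crc_bits(bits, poly=0x07, width=8):
    """Bitwise MSB-first CRC of a list of 0/1 values, returned as `width` bits."""
    register = 0
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    for bit in bits:
        feedback = bool(register & top_bit) ^ bool(bit)
        register = (register << 1) & mask
        if feedback:
            register ^= poly
    return [(register >> shift) & 1 for shift in range(width - 1, -1, -1)]


def protect(info_bits, important_positions):
    """Collect the important bits and append their CRC."""
    important = [info_bits[i] for i in important_positions]
    return important + crc_bits(important)


def crc_ok(protected_bits):
    """True when the block (important bits followed by CRC) passes the check."""
    return not any(crc_bits(protected_bits))
```

Running the CRC over the protected bits followed by their CRC gives an all-zero remainder when no error is present, so the receiver can produce the error detection result with the same routine used by the sender.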
[0114]
The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B11.
[0115]
The demodulating means 20 in the voice communication terminal B demodulates the modulated signal A and outputs the obtained voice code A ′ to the error correction decoding means 41 in the voice decoding means 21.
The error correction decoding means 41 extracts from the audio code A′ the error correction code and the bit group protected by it, executes the error correction decoding process, and outputs the combination of the corrected bit group and the remaining unprotected bits as information code A′ to the information source decoding unit 42 and the similarity evaluation unit 32. When the error correction code includes an error detection code such as a CRC, the error detection result is output together with it to the information source decoding unit 42 and the similarity evaluation unit 32.
[0116]
The storage unit 31 stores N frames of the information code B input from the information source encoding unit 39 and outputs part or all of them to the similarity evaluation unit 32. The storage means 31 has a memory capable of storing the information code B for N frames. After the information codes have been output to the similarity evaluation means 32, the stored content is updated by overwriting the position that holds the information code B from N frames earlier with the information code B of the current frame. As long as the information code B for N frames can be stored, the configuration of the storage unit 31 and its updating method are not limited to this.
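The update rule of the storage means can be sketched as a fixed-size circular buffer. The class and method names are hypothetical, and information codes are treated as opaque values.

```python
class CodeStore:
    """Holds the most recent N information codes, overwriting the oldest."""

    def __init__(self, n_frames: int):
        self.slots = [None] * n_frames
        self.next_slot = 0  # position holding the code from N frames earlier

    def stored(self):
        """Codes currently held, for output to the similarity evaluation."""
        return [code for code in self.slots if code is not None]

    def update(self, current_code):
        """Overwrite the oldest position with the current frame's code."""
        self.slots[self.next_slot] = current_code
        self.next_slot = (self.next_slot + 1) % len(self.slots)
```

In each frame, `stored()` would be called first and `update()` afterwards, matching the order described in the text. Python's `collections.deque` with a `maxlen` argument gives the same behaviour if positional overwriting is not needed.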
[0117]
The similarity evaluation unit 32 compares the information code A′ input from the error correction decoding unit 41 with each of the one or more information codes B input from the storage unit 31, evaluates the similarity for each in turn, and outputs the maximum value to the gain value control means 34 in the correction means 33. As the similarity, the number of matching bits or the bit match rate between the two information codes can be used.
[0118]
The gain value control unit 34 determines that the voice code B has returned via the voice communication terminal A when the similarity input from the similarity evaluation unit 32 exceeds a predetermined threshold, and is smaller than 1. The gain value is output to the multiplication means 35. When the similarity input from the similarity evaluation unit 32 is equal to or less than a predetermined threshold, a gain value of 1 is output to the multiplication unit 35.
[0119]
The information source decoding unit 42 in the audio decoding unit 21 decodes the input information code A′ and outputs the obtained decoded audio to the multiplication unit 35. If the error detection result indicates that there is an error, decoding is performed with interpolation using the information code A′ of the previous frame or the decoded speech of the previous frame, and the obtained decoded speech is output to the multiplication means 35.
[0120]
The multiplication unit 35 multiplies each sample value of the decoded speech input from the information source decoding unit 42 by the per-sample gain value input from the gain value control unit 34 and outputs the result to the DA conversion means 22 as the digital output sound B. The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output sound B and outputs an output sound B which is an analog signal.
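The gain value control and per-sample multiplication can be sketched as below. The threshold test follows the text; the reduced gain of 0.1 and the linear ramp from the previous frame's final gain to the new target are assumptions of this sketch, shown as one possible way to obtain a gain value for each sample without an abrupt jump.

```python
def frame_gains(similarity, threshold, previous_gain, frame_len, reduced_gain=0.1):
    """Per-sample gains for one frame, ramping toward the target gain.

    The target is `reduced_gain` (smaller than 1) when the similarity exceeds
    the threshold, and 1.0 otherwise.
    """
    target = reduced_gain if similarity > threshold else 1.0
    step = (target - previous_gain) / frame_len
    return [previous_gain + step * (i + 1) for i in range(frame_len)]


def apply_gains(decoded_samples, gains):
    """Multiply each decoded sample by its gain value."""
    return [sample * gain for sample, gain in zip(decoded_samples, gains)]
```

The last element of the returned list would be passed as `previous_gain` for the following frame, so the gain stays continuous across frame boundaries.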
[0121]
Of course, in this voice communication system, the voice coding means and the voice decoding means in the other voice communication terminals must all be configured in the same way as the voice coding means 24 and the voice decoding means 21.
Here, the similarity calculation using the information code is performed by configuring the speech encoding means of Embodiment 1 shown in FIG. 1 from information source coding means and error correction coding means, and the speech decoding means from error correction decoding means and information source decoding means. By changing the configurations of the speech encoding means and the speech decoding means in the second to fourth embodiments shown in FIGS. 2 and 5 in the same way, the similarity calculation using information codes can be performed in those cases as well.
[0122]
Specifically, in each figure, the voice encoding means is replaced with the information source encoding means, error correction encoding means is inserted between it and the modulation means 25, error correction decoding means is inserted immediately after the demodulation means 20, and the audio decoding means 21 is replaced with information source decoding means.
[0123]
According to the fifth embodiment, the similarity evaluation unit evaluates the similarity between the information code stored in the storage unit and the error-corrected information code obtained by correcting the information code in the speech code A ′ with the error correction code in the same speech code A ′. Therefore, even if a few bit errors are superimposed on the transmission path, it can be correctly detected that the speech code B output in the past has returned as the speech code A ′, and a voice communication terminal is obtained that avoids the difficulty in calling caused by a greatly delayed echo.
[0124]
Embodiment 6.
FIG. 7 shows another configuration of the voice communication system according to the present invention. In the figure, the voice communication terminal B corresponds to the voice communication terminal according to the present invention. The same reference numerals as those in FIG. 1 and FIG. Reference numeral 31 denotes storage means for storing N frames of the digital input speech B input from the A / D conversion means 23. Reference numeral 32 denotes similarity evaluation means for comparing the decoded speech from the speech decoding means 21 with the digital input speech B from the storage means 31 and evaluating their similarity.
[0125]
Hereinafter, the operation will be described with reference to the drawings.
The voice uttered by the speaker A is input to the voice communication terminal A as the input voice A. Also, a modulation signal B is input from a voice communication terminal B (described later) via the line B. The demodulating means 12 demodulates the modulated signal B and outputs the obtained speech code B to the speech decoding means 13. The voice decoding unit 13 decodes the voice code B and outputs the obtained digital output voice A to the DA conversion unit 14. The DA converter 14 performs digital / analog conversion (DA conversion) on the digital output sound A and outputs an output sound A which is an analog signal.
[0126]
In the voice communication terminal A, a superimposed voice is generated by superimposing the input voice A and the output voice A and is input to the A / D conversion means 16. The A-D conversion means 16 performs analog / digital conversion (A-D conversion) on the input superimposed voice to obtain a digital input voice A which is a digital signal. The voice encoding unit 17 encodes the digital input voice A and outputs the obtained voice code A to the modulation unit 18. The modulation means 18 digitally modulates the input voice code A and outputs the obtained modulated signal as a modulated signal A to the line A10.
[0127]
The voice uttered by the speaker B is input to the voice communication terminal B as the input voice B. Further, the modulated signal A is input from the voice communication terminal A through the line A10. The A-D conversion means 23 performs analog-digital conversion (A-D conversion) on the input speech B to generate a digital input speech B which is a digital signal, and outputs it to the speech encoding means 24 and the storage means 31. The speech encoding unit 24 encodes the digital input speech B and outputs the obtained speech code B to the modulation unit 25. The modulation means 25 digitally modulates the input speech code B and outputs the obtained modulated signal as a modulated signal B to the line B11.
[0128]
The storage means 31 stores N frames of the digital input speech B input from the A-D conversion means 23, and outputs part or all of the stored digital input speech B to the similarity evaluation means 32 in units of frames. The storage means 31 has a memory capable of storing N frames of digital input speech B. After outputting the digital input speech to the similarity evaluation means 32, it updates the stored contents by overwriting the location that holds the digital input speech B from N frames earlier with the digital input speech B of the current frame. As long as N frames of the digital input speech B can be stored, the configuration of the storage means 31 and the updating method are not limited to this.
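The overwrite-the-oldest update described for the storage means behaves like a ring buffer. The following is a minimal sketch under that reading; the class name `FrameStore` and its method names are hypothetical, and frames are represented as plain lists of samples.

```python
class FrameStore:
    """Holds the most recent N frames of digital input speech (ring buffer)."""

    def __init__(self, n_frames: int):
        if n_frames < 1:
            raise ValueError("n_frames must be at least 1")
        self._frames = [None] * n_frames  # slots are empty until first written
        self._next = 0                    # slot holding the oldest frame

    def stored(self):
        """Return the frames currently held, for the similarity evaluation."""
        return [f for f in self._frames if f is not None]

    def update(self, frame):
        """Overwrite the frame from N frames earlier with the current frame."""
        self._frames[self._next] = list(frame)
        self._next = (self._next + 1) % len(self._frames)
```

In this sketch the caller would read `stored()` for the similarity evaluation first and call `update()` afterwards, in the order the paragraph above describes.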
[0129]
The demodulating means 20 in the voice communication terminal B demodulates the modulated signal A and outputs the obtained voice code A ′ to the voice decoding means 21. The speech decoding unit 21 decodes the input speech code A ′ and outputs the obtained decoded speech to the similarity evaluation unit 32 and the multiplication unit 35.
[0130]
The similarity evaluation unit 32 compares the decoded speech for one frame input from the speech decoding unit 21 with each of the one or more frames of digital input speech B input from the storage unit 31, evaluates the similarity for each in turn, and outputs the maximum value to the gain value control means 34 in the correction means 33. As the similarity used here, the reciprocal of the vector distance between the two voices can be used.
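A sketch of this evaluation is given below, assuming the "vector distance" is the Euclidean distance between the sample vectors of two equal-length frames. The function names and the small constant `eps`, added so that identical frames do not cause a division by zero, are assumptions of this example rather than part of the description.

```python
import math


def similarity(decoded, stored, eps=1e-9):
    """Reciprocal of the Euclidean distance between two equal-length frames."""
    if len(decoded) != len(stored):
        raise ValueError("frames must have the same number of samples")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(decoded, stored)))
    return 1.0 / (distance + eps)  # eps is an assumed guard for distance == 0


def max_similarity(decoded, stored_frames):
    """Evaluate every stored frame in turn and return the largest similarity.

    Returns 0.0 when nothing is stored, so the gain stays at 1 in that case.
    """
    return max((similarity(decoded, f) for f in stored_frames), default=0.0)
```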
[0131]
The gain value control unit 34 determines that the voice code B has returned via the voice communication terminal A when the similarity input from the similarity evaluation unit 32 exceeds a predetermined threshold, and outputs a gain value smaller than 1 to the multiplication means 35. When the similarity input from the similarity evaluation unit 32 is equal to or less than the predetermined threshold, it outputs a gain value of 1 to the multiplication means 35.
[0132]
The multiplication unit 35 multiplies each sample value of the decoded speech input from the speech decoding unit 21 by the per-sample gain value input from the gain value control unit 34, and outputs the result as the digital output sound B to the DA conversion means 22. The DA conversion means 22 performs digital / analog conversion (DA conversion) on the digital output sound B and outputs an output sound B which is an analog signal.
[0133]
In the above embodiment, the multiplication means 35 always performs the multiplication by the gain value. However, since multiplying by a gain value of 1 does not change the result, the multiplication may be skipped in that case and the decoded speech used as the digital output speech B as it is.
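The per-sample multiplication, together with the optional bypass for a gain of 1, might look like the following. The function name `apply_gain` and the list-based frame representation are assumptions for illustration.

```python
def apply_gain(decoded, gains):
    """Multiply each decoded sample by its per-sample gain value."""
    if len(decoded) != len(gains):
        raise ValueError("one gain value is required per sample")
    # A gain of 1 leaves the samples unchanged, so the multiplication
    # can be skipped and the decoded speech passed through as it is.
    if all(g == 1.0 for g in gains):
        return list(decoded)
    return [sample * g for sample, g in zip(decoded, gains)]
```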
[0134]
In the above embodiment, the configuration in which a wireless line, an analog modem transmission line, or the like is used as the communication path has been described. However, a configuration using other types of lines such as ATM is also possible. At that time, the configuration of the modulation means and the demodulation means is changed according to the type of line.
[0135]
Also, normally, the voice communication terminal C has the same configuration as the voice communication terminal B. When the voice communication terminal A and the voice communication terminal C are making a call, no voice code is stored in the storage means in the voice communication terminal B, so a gain value of 1 is always input to the multiplying means 35 and the call between the voice communication terminal A and the voice communication terminal C is output as the output voice B. Of course, a configuration including a plurality of voice communication terminals having the same configuration as the voice communication terminal B and the voice communication terminal C is also possible.
[0136]
According to the sixth embodiment, the terminal is provided with storage means for storing the digital input speech B for a predetermined length, similarity evaluation means for evaluating the similarity between the digital input speech B stored in the storage means and the decoded speech and outputting the obtained similarity, gain value control means for determining a gain value to be multiplied by the decoded speech based on at least the similarity, and multiplication means for multiplying the decoded speech by the gain value output by the gain value control means and outputting the result. This has the effect of providing a voice communication terminal that avoids the difficulty in calling caused by a greatly delayed echo.
[0137]
Further, in addition to this voice communication terminal (voice communication terminal B), the system is provided with a second voice communication terminal (voice communication terminal A) that decodes the voice code output by the voice communication terminal B and outputs it as the output voice A, encodes the voice signal obtained by adding the output voice A and a second input voice (input voice A), and outputs the obtained voice code. This has the effect of providing a voice communication system that avoids the difficulty in calling.
[0138]
[Effects of the invention]
As described above, according to the present invention, a voice communication terminal that inputs a voice and a voice code from separate terminals, encodes the input voice into an output voice code, and decodes the voice code and outputs it as output voice from a separate terminal is provided with correction means for reducing the amplitude of the portion of the output voice that is caused by the input voice. This has the effect of providing a voice communication terminal that avoids the difficulty in calling caused by a greatly delayed echo.
[0139]
Further, the correction means is provided with gain value control means for determining a gain value to be multiplied by the decoded speech based on the similarity between the output speech code and the input speech code, and multiplication means for multiplying the decoded speech by the gain value output by the gain value control means. The amplitude of the portion caused by the input voice B included in the output voice B output from the voice communication terminal can therefore be reduced, which has the effect of providing a voice communication terminal that avoids the difficulty in calling caused by a greatly delayed echo.
[0140]
In addition, similarity evaluation means for evaluating the similarity between the output speech code and the input speech code is provided, and the number of matching bits or the bit match rate between the speech codes being compared is used as the similarity. Therefore, even if a few bit errors are superimposed on the transmission path, it can be correctly detected that the speech code B output in the past has returned as the input speech code A ′, which has the effect of providing a voice communication terminal that avoids the difficulty in calling caused by a greatly delayed echo.
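The bit match rate used as the similarity can be computed by XOR-ing the two speech codes and counting the bits that differ. The sketch below assumes each speech code frame is available as a `bytes` object of equal length; that representation and the function name `bit_match_rate` are assumptions for illustration.

```python
def bit_match_rate(code_a: bytes, code_b: bytes) -> float:
    """Fraction of bit positions at which two equal-length speech codes agree."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("speech codes must be non-empty and of equal length")
    # XOR leaves a 1 in every position where the two codes differ.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(code_a, code_b))
    total_bits = 8 * len(code_a)
    return (total_bits - differing) / total_bits
```

Because a few bit errors on the transmission path lower the rate only slightly, a returned copy of a past output code still scores close to 1, whereas unrelated codes would be expected to score well below that.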
[0141]
According to this invention, the speech code is composed of an information code representing speech information and its error correction code, and the similarity evaluation means evaluates the similarity between the information code of the output speech code and the error-corrected information code obtained by correcting the information code in the input speech code with the error correction code in the same input speech code. Therefore, even if bit errors are superimposed on the transmission path, it can be correctly detected that the speech code output in the past has returned as the input speech code, which has the effect of providing a voice communication terminal that avoids the difficulty in calling caused by a greatly delayed echo.
[0142]
According to the voice communication terminal of the present invention, the gain value for each frame or subframe in the voice decoding means for decoding the voice code into the output voice is corrected, which has the effect of simplifying the gain correction process.
[0143]
According to the present invention, the voice communication system comprises a first voice communication terminal and a second voice communication terminal. The first voice communication terminal inputs a voice and a voice code from separate terminals, encodes the input voice into an output voice code, decodes the voice code and outputs it as output voice from a separate terminal, and is provided with correction means for reducing the amplitude of the portion of the output voice that is caused by the input voice. The second voice communication terminal likewise inputs a voice and a voice code from separate terminals, decodes the voice code, and outputs it as output voice from a separate terminal. This configuration has the effect of avoiding the large quality deterioration caused by encoding twice and of avoiding the difficulty in calling caused by a greatly delayed echo.
[0144]
Furthermore, since only a speech code obtained by encoding the input speech of the speaker A of the first speech communication terminal or of the speaker B of the second speech communication terminal is decoded, there is an effect of obtaining a voice communication system that avoids the large quality deterioration caused by encoding double-talk speech.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a voice communication system according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram of a voice communication terminal according to Embodiment 2 of the present invention.
FIG. 3 is a diagram for explaining an example of substitution processing in a code substitution means according to Embodiment 2 of the present invention.
FIG. 4 is a diagram for explaining an example of code substitution means according to Embodiment 3 of the present invention.
FIG. 5 is a block diagram of a voice communication terminal according to Embodiment 4 of the present invention.
FIG. 6 is a block diagram of a voice communication terminal according to Embodiment 5 of the present invention.
FIG. 7 is a configuration diagram of a voice communication system according to a sixth embodiment of the present invention.
FIG. 8 is an overall configuration diagram of a conventional voice communication system.
FIG. 9 is a configuration diagram of a conventional voice communication system in which the system is digitized.
[Explanation of symbols]
1, 2, 3: voice communication terminal, 12: demodulation means, 13: voice decoding means, 14: DA conversion means, 15: voice superimposition means, 16: AD conversion means, 17: voice coding means, 18: modulation means, 20: demodulation means, 21: speech decoding means, 23: A-D conversion means, 24: speech encoding means, 25: modulation means, 30: selection means, 31: storage means, 32: similarity evaluation means, 33: correction means, 34: gain value control means, 35: multiplication means, 36: determination means, 37: code replacement means, 38: control means, 39: information source coding means, 40: error correction encoding means, 41: error correction decoding means, 42: information source decoding means.

Claims (8)

  1. A voice communication terminal that inputs a voice and a voice code from respective separate terminals, encodes the input voice into an output voice code, and decodes the voice code and outputs it as an output voice from each separate terminal,
    A voice encoding means for encoding the input voice and outputting the obtained voice code as an output voice code;
    Audio decoding means for decoding the input audio code and outputting the obtained decoded audio as output audio;
    Storage means for storing a predetermined number of output speech codes encoded by the speech encoding means;
    Similarity evaluation means for evaluating the similarity between the output speech code stored in the storage means and the input speech code, and outputting the obtained similarity;
    Correction means for modifying the input speech code, the decoded speech, or the operation of the speech decoding means based on the similarity output by the similarity evaluation means, thereby reducing the amplitude of the portion of the output speech output from the terminal that is caused by the input speech,
    The speech communication terminal characterized in that the similarity evaluation means uses the number of matching bits or the bit match rate between the speech codes being compared as the similarity.
  2. The voice communication terminal according to claim 1, wherein the correction means comprises:
    Gain value control means for determining a gain value to be multiplied by the decoded speech based on the similarity of the similarity evaluation means; and
    Multiplication means for multiplying the decoded voice by the gain value output by the gain value control means and outputting the obtained result as an output voice signal.
  3. The voice communication terminal according to claim 1, wherein the correction means comprises:
    Determining means for determining whether to modify the input speech code based on the similarity of the similarity evaluation means; and
    Code replacement means for outputting, when the determination means determines that correction is to be performed, a speech code obtained by replacing the input speech code with a predetermined code, and for outputting the input speech code as it is when the determination means determines that correction is not to be performed,
    And wherein the voice decoding means is configured to decode the voice code output from the code replacement means and output the obtained decoded voice as output voice.
  4. The voice communication terminal according to claim 3, wherein the voice code substituted by the code replacing means is a fixed voice code that decodes to a low-amplitude decoded voice.
  5. The voice communication terminal according to claim 3, wherein the speech code is composed of an information code representing speech information and its error correction code,
    The code replacement means is configured, when the determination means determines that correction is to be performed, to replace the input speech code with a speech code for which the speech decoding means determines that errors exceeding the correction limit of the error correction code have been superimposed, and
    The voice decoding means is configured to perform decoding so that the amplitude of the decoded voice is successively reduced when errors exceeding the correction limit are detected.
  6. The voice communication terminal according to claim 1, wherein the correction means comprises:
    Determining means for determining whether to modify the operation of the speech decoding means based on the similarity; and
    Control means for outputting, when the determination means determines that correction is to be performed, a signal that applies a correction to a gain value or an error detection flag obtained in the decoding process of the speech decoding means,
    And wherein the speech decoding means performs decoding by correcting the gain value or the error detection flag obtained in the decoding process according to the signal from the control means.
  7. The voice communication terminal according to claim 1, wherein the voice code is composed of an information code representing voice information and an error correction code thereof, and
    The similarity evaluation means is configured to evaluate the similarity between the information code in the output speech code stored in the storage means and the error-corrected information code obtained by correcting the information code in the input speech code with the error correction code in the same input speech code.
  8. A voice communication system comprising:
    A first voice communication terminal having a structure according to any one of claims 1 to 7; and
    A second voice communication terminal that, for each frame of a fixed duration, selects by a predetermined criterion and outputs either a speech code obtained by encoding input speech with speech encoding means or the output speech code output by the first voice communication terminal, and that decodes the output speech code output from the first voice communication terminal with speech decoding means.
JP2001029757A 2001-02-06 2001-02-06 Voice communication terminal and voice communication system Active JP3942831B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2001029757A JP3942831B2 (en) 2001-02-06 2001-02-06 Voice communication terminal and voice communication system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2001029757A JP3942831B2 (en) 2001-02-06 2001-02-06 Voice communication terminal and voice communication system
TW90116446A TW515190B (en) 2001-02-06 2001-07-05 Voice communication terminal and voice communication system
CN 01132642 CN1183734C (en) 2001-02-06 2001-09-05 Voice communication terminal and a voice communication system

Publications (2)

Publication Number Publication Date
JP2002229595A JP2002229595A (en) 2002-08-16
JP3942831B2 true JP3942831B2 (en) 2007-07-11

Family

ID=18894063

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2001029757A Active JP3942831B2 (en) 2001-02-06 2001-02-06 Voice communication terminal and voice communication system

Country Status (3)

Country Link
JP (1) JP3942831B2 (en)
CN (1) CN1183734C (en)
TW (1) TW515190B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8521520B2 (en) * 2010-02-03 2013-08-27 General Electric Company Handoffs between different voice encoder systems
CN102300240A (en) * 2011-08-26 2011-12-28 北京邮电大学 A method of assessment of the similarity of the two systems based on the output performance parameters
DE102012213609B4 (en) * 2012-08-01 2014-06-05 Continental Automotive Gmbh Method for outputting information with synthetic speech

Also Published As

Publication number Publication date
TW515190B (en) 2002-12-21
JP2002229595A (en) 2002-08-16
CN1183734C (en) 2005-01-05
CN1368820A (en) 2002-09-11

Similar Documents

Publication Publication Date Title
DE69912075T2 (en) TURBOENCODER / DECODER AND FRAME PROCESSING PROCESS DEPENDING ON SERVICE QUALITY (QoS)
CN1303585C (en) Noise suppression
EP0861531B1 (en) Acoustic echo elimination in a digital mobile communications system
US7653350B2 (en) Wireless terminals and methods for communicating over cellular and enhanced mode bluetooth communication links
US7693708B2 (en) System and method for adaptive transmission of comfort noise parameters during discontinuous speech transmission
CN1124726C (en) Method and apparatus for determining rate of received data in variable rate communication system
US6985856B2 (en) Method and device for compressed-domain packet loss concealment
JP3241961B2 (en) Linear prediction coefficient signal generating method
JP3241978B2 (en) How to improve the performance of the encoding system
JP4313570B2 (en) A system for error concealment of speech frames in speech decoding.
US7069208B2 (en) System and method for concealment of data loss in digital audio transmission
US7092875B2 (en) Speech transcoding method and apparatus for silence compression
EP1126437A2 (en) Variable rate vocoder
CA2299535C (en) Method for the transmission of speech inactivity with reduced power in a tdma system
US20010028634A1 (en) Packet loss compensation method using injection of spectrally shaped noise
US20070116300A1 (en) Channel decoding for wireless telephones with multiple microphones and multiple description transmission
RU2130693C1 (en) Method for improving quality of current voice frame in multiple-station access radio system with time division of channels and device which implements said method
US6223154B1 (en) Using vocoded parameters in a staggered average to provide speakerphone operation based on enhanced speech activity thresholds
JP2518765B2 (en) Voice coding communication method and apparatus
CA2081441C (en) Method and apparatus for the transmission of speech signals
US5572622A (en) Rejected frame concealment
KR100191295B1 (en) Method and apparatus for determining data rate of transmitted variable rate data in a communication receiver
CA2117587C (en) System for adaptively reducing noise in speech signals
US6363340B1 (en) Transmission system with improved speech encoder
KR100581413B1 (en) Improved spectral parameter substitution for the frame error concealment in a speech decoder

Legal Events

Date Code Title Description
RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7421

Effective date: 20040702

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20050214

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20050301

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7421

Effective date: 20050405

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20050428

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20060307

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060426

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20070306

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20070404

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100413

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110413

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120413

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130413

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140413

Year of fee payment: 7

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250
