EP0643380B1 - Method and apparatus for speech speed conversion - Google Patents

Publication number
EP0643380B1
Authority
EP
European Patent Office
Prior art keywords
speech
speed conversion
switch
frame
speech speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP19940114160
Other languages
German (de)
English (en)
Other versions
EP0643380A2 (fr
EP0643380A3 (fr
Inventor
Yoshito Hitachi Koyasudai Nejime
Yukio Kumagai
Tadashi Takamiya
Yasunori Kawauchi
Nobuo Hataoka
Juichi Morikawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Publication of EP0643380A2
Publication of EP0643380A3
Application granted
Publication of EP0643380B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion

Definitions

  • the present invention relates to a speech speed conversion method, a speech speed conversion apparatus, and an electronic apparatus for modulating the speed of a voice, and, particularly, to a technique which is effective on application to a control technique for using such an apparatus in conversation and so on.
  • the amplification of the amplitude of a speech signal and the compression of a dynamic range are generally performed with every frequency in accordance with the hearing characteristic of a user.
  • such a process is realized by an analog circuit.
  • this process is realized by a software such as a digital filter, or the like, so that adaptation to the hearing characteristic of the user can be made more in detail.
  • a broadcasted voice over the television/radio or the like or a voice recorded in a tape recorder or the like was used as the voice to be subjected to speech speed conversion. That is, the subject of speech speed conversion was only a voice one-sidedly given to a listener.
  • the speech speed conversion apparatus also can use other voices than the aforementioned voices as the input voice.
  • the apparatus can be used not only as a hearing aid for aged or hearing-impaired people but also as a listening aid for people with normal hearing in conversation in an unfamiliar foreign language, and so on.
  • Speech speed conversion methods and apparatus with the features included in the first part of claim 1 and claim 2, respectively, are known from DE-A-4 227 826.
  • data are stored on a frame by frame basis, so that writing/reading efficiency can be improved.
  • the decision about waveform expansion and reduction processes, silent-part elimination process, etc. in the speech speed conversion process is performed based on comparison between power of a frame and a threshold, and the threshold is changed in accordance with the loudness of the input speech. Accordingly, the speech speed conversion process can be carried out in accordance with the environmental condition in use.
  • a speech speed selection switch for selecting the speed of the speech, and means for changing the speed of the speech to the speed selected by that switch, are provided. Accordingly, the listener can select the speed of the speech to be heard at will.
  • means (AV control) for controlling an audio/video apparatus is provided in the speech speed conversion apparatus. When the memory capacity is insufficient, a signal for pausing the reproducing operation of the external apparatus is issued to temporarily stop the input of speech to the speech speed conversion apparatus; when some free area becomes available in the memory, the pause signal is withdrawn and input from the external apparatus restarts. This series of operations is repeated irrespective of the expansion/reduction rate of the speech speed conversion. As a result, speech speed conversion can be used continuously for a long time.
  • a repeat switch and means for repeating a reproduced speech while the repeat switch is turned on are provided. Accordingly, speech speed conversion can be applied to the repeated speech.
  • the catching-up means is provided such that widening of the range of application of the speech speed conversion apparatus, reduction in operating time, improvement in handling property, and so on, can be attained.
  • At least one of the speech speed conversion switch, speech speed selection switch, repeat switch and reset switch is provided in a peripheral portion on a side surface of the speech speed conversion apparatus so as to perform handling easily. Accordingly, widening of the range of application of the speech speed conversion apparatus, reduction in operating time, improvement in handling property, and so on, can be attained.
  • the speech speed conversion means is provided as a software executed by a digital signal processor having an input terminal for receiving an interruption request signal from the outside, so that controlling of the speech speed conversion process or switching of the speech speed conversion rate on the basis of the speech speed conversion switch is given to the digital signal processor via the interruption request signal input terminal.
  • the microphone does not pick up click noise of each switch, so that loud noise at the time of the manipulation of the switch can be prevented.
  • the switches have surfaces that differ in tactile feel so that they can be identified without looking, so that handling property can be improved.
  • a display means is provided at a predetermined position of the speech speed conversion apparatus so that the quantity of a time lag from the real time can be indicated visually. Accordingly, reduction in operating time, improvement in handling property, and so on, can be attained.
  • a ring buffer is used as the memory means, and there is provided means for managing a lag time by a counter indicating a time lag on the ring buffer. Accordingly, the repeat process, the catching-up process, and so on, can be carried out easily.
  • a standby mode is provided besides the through mode, so that reduction in consumed electric power can be attained.
  • an electric source switch operated in three stages, consisting of an ON stage, an OFF stage and an intermediate stage between ON and OFF, is provided so that an analog through mode is available. Accordingly, power consumption can be reduced.
  • the speech speed conversion means is provided between a handset of a telephone and a body of the telephone. Accordingly, a speech to be subjected to speech speed conversion can be selected by the listener without any disturbance of the listener's own speech.
  • the voice can be heard at a slow speech speed without any change of the characteristic of the talker's voice.
  • the speech speed conversion means is provided in a telephone line switching system. Accordingly, the voice to be subjected to speech speed conversion can be selected by the listener without any disturbance of the listener's own speech.
  • Fig. 1 is a block diagram showing the schematic structure of internal circuits according to the present invention.
  • the reference numeral 1 designates a DSP (Digital Signal Processor); 11, a software for performing a speech speed conversion process; 12, a serial port; 13, a terminal for external interruption flag; 14, a flag register; 2, a memory (output buffer); 3, a selector switch; 4, a PTL (Push-To-Listen) switch; 5, an A/D converter; 6, a D/A converter; 7, a low-pass filter; 8, a low-pass filter; 9, an analog amplifier; 10, an analog amplifier; 321, a microphone; and 325, a binaural headphone (earphone).
  • a voice is inputted to the microphone 321 and outputted as a voice signal (an electric signal).
  • This voice signal is inputted via the amplifier 10 and the low-pass filter 7 to the A/D converter 5, in which the voice signal is converted from an analog value into a digital value at intervals of a time set in advance.
  • the voice signal converted into a digital value as described above is inputted to the DSP 1. Then, the speech speed conversion process of the voice signal is realized by the software 11 on the DSP 1.
  • the PTL switch 4 is connected to the external interruption flag terminal 13 contained in the DSP 1, so that the state of the PTL switch 4 is expressed as a numerical value of the flag register 14 which is provided in the inside of the DSP 1 so as to correspond to this terminal 13.
  • a judgment in accordance with the numerical value of the flag register 14 is made as to whether the speech speed conversion process is to be performed or not to be performed.
  • the digital voice data subjected to the speech speed conversion process is stored in the output buffer memory 2.
  • the D/A converter 6 converts the data of the output buffer memory 2 from a digital value into an analog value at intervals of a time set in advance.
  • the analog signal obtained by this conversion is inputted via the low-pass filter 8 to the analog amplifier 9 and outputted as a voice from the binaural headphone 325 at the listener's preferred volume.
  • two kinds of switches are prepared for the PTL switch 4. One is a switch that conducts current only while its pushbutton is held down. The other is a switch that remains in the conducting state even after the pushbutton is released.
  • the former is used in conversation, whereas the latter is used for continuous speech speed conversion of a one-sidedly given voice such as a radio voice, which is the conventional use.
  • the selector switch 3 as well as the PTL switch 4 is connected to the external interruption flag terminal 13 contained in the DSP 1.
  • the numerical value of the flag register 14 is changed by the changeover of the selector switch 3, so that the software 11 changes the expansion rate of the speech speed conversion process in accordance with this numerical value.
  • Fig. 2 is a view for explaining the speech speed conversion process which is performed in the DSP 1 in this embodiment.
  • the speech speed conversion process in this embodiment is a method of detecting the pitch (basic period) of a voice signal and expanding the length of a waveform with the detected pitch as a unit, in which a voice data set of the order of tens of milliseconds (hereinafter referred to as a frame) is made a unit for one process.
  • at least two input buffers, each one frame long, are prepared inside the DSP 1 so that while data from the A/D converter is being written to one buffer, the data stored in the other buffer is processed (pipe-line process). After processing, the data is stored in the output buffer 2, which has a sufficiently large capacity.
  • the procedure of processing data in each frame is as follows.
  • the speech speed conversion process in Fig. 2 is not applied to all frames; it is applied only when the calculated average power of a frame exceeds a threshold Th set in advance. Data in a frame whose power does not exceed the threshold Th is therefore transferred to the output buffer unchanged.
  • Fig. 3 shows a concept of this threshold process.
  • the portion in which the power of each frame exceeds the threshold Th is expressed as a duration of expansion. Because this threshold process leaves the leading and trailing portions of the voice signal unprocessed and outputs them in their original condition, there is an advantage in that the characteristics contained in the leading and trailing portions of the voice, for example consonantal characteristics, are not destroyed.
  • a second threshold To is provided in the threshold process for the average power of each frame as shown in Fig. 3.
  • frames whose power remains below the threshold To continuously for one second or longer are therefore not outputted. Accordingly, the quantity of data stored in the output buffer is reduced.
  • this not-outputted portion is expressed as a duration of elimination.
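The two-threshold decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_frames`, the label strings and the 50 ms frame length are assumptions, as is the choice to keep the first second of a silent run and drop only what follows.

```python
from typing import List

FRAME_MS = 50            # assumed frame length; the text only says "tens of milliseconds"
SILENCE_LIMIT_MS = 1000  # silence lasting one second or longer is subject to elimination


def classify_frames(powers: List[float], th: float, to: float) -> List[str]:
    """Label each frame 'expand', 'pass' or 'eliminate' from its average power."""
    limit = SILENCE_LIMIT_MS // FRAME_MS  # number of frames in one second
    labels: List[str] = []
    run = 0  # length of the current run of frames whose power is below To
    for p in powers:
        if p > th:
            labels.append("expand")       # duration of expansion
            run = 0
        elif p < to:
            run += 1
            # Assumption: the first second of silence is kept, the rest dropped
            labels.append("eliminate" if run > limit else "pass")
        else:
            labels.append("pass")         # copied to the output buffer unchanged
            run = 0
    return labels
```

With `th=10.0` and `to=1.0`, a loud frame is labelled for expansion, a quiet frame between the thresholds is passed through, and a silent stretch is only marked for elimination after it has lasted more than twenty 50 ms frames.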
  • data are outputted one by one to the D/A converter 6 at regular time intervals, in parallel with the writing of speech-speed-conversion-processed data one frame at a time.
  • Addresses in the output buffer 2 are set in the form of a ring so that the last address is continued to the first address.
  • an operation is carried out so that an address pointer Po, which points to the data to be fed to the D/A converter, runs after an address pointer Pi, which points to the destination for writing the speech-speed-conversion-processed data.
  • Pi will overtake Po sooner or later because Pi advances faster than Po.
  • when that happens, information stored in the output buffer 2 is overwritten before it has been outputted.
  • the time from the start of the speech speed conversion operation until this state is reached is the length of input voice that the speech speed conversion process of this embodiment can handle.
  • the reduction in the quantity of data based on the aforementioned threshold To has the effect of lengthening this time.
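The ring-shaped output buffer with its two pointers can be modelled roughly as below. This is a sketch under stated assumptions: the class name `RingBuffer`, the `lag` counter and the policy of advancing Po when unread data is overwritten are illustrative choices, not details given in the text.

```python
class RingBuffer:
    """Circular output buffer with a write pointer Pi and a read pointer Po."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size
        self.size = size
        self.pi = 0   # next address to receive processed data
        self.po = 0   # next address to be fed to the D/A converter
        self.lag = 0  # samples written but not yet read

    def write_frame(self, samples) -> bool:
        """Write one processed frame; return False if unread data was overwritten."""
        ok = True
        for s in samples:
            if self.lag == self.size:
                # Pi has caught up with Po: the oldest unread sample is lost
                # (assumed policy: move Po forward past the overwritten sample)
                self.po = (self.po + 1) % self.size
                self.lag -= 1
                ok = False
            self.data[self.pi] = s
            self.pi = (self.pi + 1) % self.size  # last address wraps to the first
            self.lag += 1
        return ok

    def read_sample(self):
        """Return the next sample for output, or None if Po has reached Pi."""
        if self.lag == 0:
            return None
        s = self.data[self.po]
        self.po = (self.po + 1) % self.size
        self.lag -= 1
        return s
```

Because expansion writes more samples than are read per unit time, `lag` grows until `write_frame` returns `False`; the time taken to reach that point corresponds to the length of input voice the buffer can handle.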
  • Fig. 4 is a view showing the form of use of the speech speed conversion apparatus according to this embodiment.
  • although Fig. 4 shows the case where the PTL switch 4 is disposed on the upper surface of the apparatus, the switch may of course be placed in another position.
  • the selector switch 3 for changing the expansion rate of speech speed conversion is provided beside the PTL switch 4. Because the state of the selector switch 3, like that of the PTL switch 4, can be observed by the software on the DSP 1 through the external interruption flag terminal of the DSP 1, the value of n in the aforementioned speech speed conversion process is changed in accordance with the state of the selector switch 3 when the PTL switch 4 is pushed.
  • the expansion rate can be changed for every speech by operating the PTL switch 4 and the selector switch 3 alternately.
  • Fig. 5 shows the aforementioned control procedure expressed in a flow chart.
  • the A/D conversion and D/A conversion are carried out at regular intervals of a much smaller time pitch than the frame-by-frame process, for example on the order of tens of microseconds.
  • the A/D conversion, the D/A conversion and their attendant process are realized as an interruption process. While the speech speed conversion process and a process of waiting for interruption are carried out, the interruption process is carried out in accordance with an interruption signal from the serial port to which the A/D converter and the D/A converter are connected.
  • the speech speed conversion apparatus can be used not only for the voice one-sidedly given to a listener like a radio broadcasting voice but also in the situation of conversation, so that the listener can select the voice subjected to the speech speed conversion without any disturbance of listener's own speech.
  • the speech speed conversion apparatus of this embodiment can be used to compensate for the deterioration of voice hearing ability as observed in the aged or the like. It is further needless to say that the apparatus can be used even in the situation in which a listener who has no difficulty in hearing hears an unfamiliar foreign language.
  • Fig. 6 is a front view;
  • Fig. 7 is a back view;
  • Fig. 8 is a top view;
  • Fig. 9 is a left side view;
  • Fig. 10 is a right side view.
  • the reference numeral 101 designates a body of the speech speed conversion apparatus; 102, a back cover; 103, a finger stop hollow; 104, a slow switch (slow pushbutton); 105, a repeat switch (repeat pushbutton); 106, a reset switch (reset pushbutton); 321, a microphone; 108, a voice volume; 109, an electric source switch; 110, an earphone terminal; 111, an external input terminal; 112, an AV control terminal; and 113, a speech speed changeover switch (speech speed setting switch).
  • the slow switch 104, the repeat switch 105 and the reset switch 106 are provided in positions where they are easy to operate with one hand holding the body 101 of the speech speed conversion apparatus, for example in the front upper side portion, and the speech speed changeover switch 113 is provided on the right side (Fig. 10).
  • the pushbutton of the aforementioned slow switch 104 is made larger than the other pushbuttons because it is pushed more frequently. Further, because holding the slow pushbutton down continuously is tiring, the pushbutton is provided so that it can be locked. For example, there are used (1) a slide lock type in which the pushbutton is locked when it is pushed and slid laterally, (2) a double click type in which the pushbutton is locked when it is clicked twice, (3) a type in which the lock of the pushbutton is released when the reset pushbutton is pushed, and so on.
  • the aforementioned speech speed changeover switch (speech speed setting switch) 113 is disposed close to a range allowing the operation thereof with the same finger so that this switch and the slow switch 104 can be operated alternately.
  • Besides the position in the aforementioned embodiment, a ring switch, a slide switch, and so on, may be used to make the operation easier.
  • the aforementioned voice volume 108 is also disposed within reach of the same finger so that it is easy to adjust and the listener can always hear at an appropriate volume.
  • a switch which has a feeling of soft touch so that the microphone 321 does not pick up click noise of the switch is preferably used as the aforementioned switches high in the frequency of use, such as the slow switch 104, the repeat switch 105, the reset switch 106, the speech speed changeover switch 113, and so on.
  • a switch using electrically conductive rubber or the like is used.
  • the aforementioned switches are preferably given surfaces that differ in tactile feel so that each kind of switch can be identified without looking.
  • the internal circuit structure of the speech speed conversion apparatus in this embodiment is formed so as to be identical to the aforementioned circuit structure shown in Fig. 1.
  • the slow switch 104, the repeat switch 105, the reset switch 106, and so on, are used as described above.
  • in place of the selector switch 3 of the previous embodiment, the speech speed changeover switch (speech speed setting switch) 113 is used.
  • the speech speed changeover switch (speech speed setting switch) 113 is connected to the external interruption flag terminal 13 contained in the DSP 1. The numerical value of the flag register 14 is changed by the changeover of the speech speed changeover switch 113, so that the software 11 changes the expansion rate of the speech speed conversion process in accordance with this numerical value.
  • Fig. 11 is a block diagram showing the functional structure of the speech speed conversion apparatus in this embodiment, in which the reference numeral 21 designates speech input devices; 22, input buffers; 23, a central processing unit (CPU); 24, a ring buffer memory (which corresponds to the memory 2 in Fig. 1); 25, a function chooser; 26, output buffers; and 27, speech output devices.
  • the speech input devices 21 are constituted by the microphone 321, analog amplifier 10, low-pass filter 7 and A/D converter 5 of Fig. 1.
  • the aforementioned input buffers 22 serve to hold a speech converted into a digital signal by the aforementioned speech input devices 21 and have a size enough to hold data of the length of one frame which is a unit for signal processing after that.
  • These input buffers 22 can be realized by the allocation of a part of addresses of the ring buffer memory 24 (which corresponds to the memory 2 in Fig. 1).
  • the aforementioned central processing unit (CPU) 23 which corresponds to the portion of software executed on the DSP 1 shown in Fig. 1, has an encoder 23A, a silent-part elimination process 23B, a decoder 23C, a wave-form manipulation process (speech speed conversion process) 23D, and a controller 23E.
  • the aforementioned function chooser 25 which corresponds to the portion constituted by the switches 3 and 4 and the external interruption flag terminal 13 shown in Fig. 1, is constituted by the slow switch 104, the repeat switch 105, the reset switch 106, the speech speed changeover switch 113, and so on, as described above.
  • the aforementioned output buffers 26 which serve to hold resulting data processed by the aforementioned wave-form manipulation process 23D are two in practice and each of them has a size enough to store data of the length of one frame expanded by wave-form manipulation.
  • in the previous embodiment, two input buffers are provided so that a pipe-line process is realized by using them alternately; in this embodiment, a pipe-line process is realized in the same manner by using two output buffers alternately.
  • These output buffers 26 can be realized by the allocation of a part of addresses of the ring buffer memory 24 (which corresponds to the memory 2 in Fig. 1).
  • the inputting of data to the input buffers 22 and the outputting of data from the output buffers 26 are carried out at intervals of the sampling rate of the A/D converter 5 and of the D/A converter 6 in the same manner as in the previous embodiment.
  • the process executed by the DSP 1 is therefore constituted by a wave-form manipulation process for each frame and an interruption process executed at sampling intervals.
  • the interruption process is executed many times while the wave-form manipulation process is applied to data of the length of one frame, so that the two processes appear to be executed simultaneously.
  • as the ring buffer memory 24, a well-known type of memory is used in which writing/reading is performed frame by frame. The details thereof will be described below.
  • in Fig. 11, speech data inputted through the speech input devices 21 are held in the input buffers 22.
  • the input buffers 22 have a capacity sufficient to hold the number of samples corresponding to one frame, with a code length of 16 bits allocated to each sample, and are realized by allocating a part of the addresses of the memory 2 shown in Fig. 1.
  • the controller 23E shown in Fig. 11 monitors the state of these input buffers 22 and transfers speech data of the length of one frame to the encoder 23A whenever the input buffers 22 are filled with the data of the length of one frame.
  • the input speech data of the length of one frame is subjected to an information compression process, so that the data as a result of the compression is held in the ring buffer memory 24.
  • Several methods are conceivable for this compression process.
  • One example thereof is a difference data holding method shown in Figs. 12A and 12B.
  • Figs. 12A and 12B are typical graphs for explaining the compression process in the encoder 23A in this embodiment.
  • the output data of the compression process are data obtained by arranging the aforementioned difference data Δ1, Δ2, ... into the code length of 8 bits per one data after dividing the leading data of the frame into upper 8 bits and lower 8 bits.
  • Each input data has a digital code length of 16 bits.
  • the difference from the previous sampling value is, however, small enough to be expressed in a code length of 8 bits, half of the original, as shown in Fig. 12B.
  • the volume of data after the compression process is therefore about half that before the compression process, and nothing is lost from the contents as long as no difference becomes too large to be expressed in a code length of 8 bits.
  • a frame header is added to the head of the compressed data of each frame in order to indicate a break between frames.
  • along with the aforementioned compression process in Fig. 12, the sum of the absolute values of all data in the frame is calculated and the result is recorded in the aforementioned frame header as the power value of the frame.
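The difference-data compression and the frame power calculation can be illustrated with a short sketch. The function name `encode_frame`, the two's-complement byte layout and the clamping of out-of-range differences are assumptions made for the example; the text only states that the method is lossless while every difference fits in 8 bits.

```python
from typing import List, Tuple


def encode_frame(samples: List[int]) -> Tuple[int, bytes]:
    """Compress one frame of signed 16-bit samples.

    Returns (power, payload). power is the sum of absolute sample values,
    intended for the frame header. payload holds the leading sample as an
    upper and a lower byte, followed by one signed 8-bit difference per
    remaining sample.
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    power = sum(abs(s) for s in samples)
    first = samples[0] & 0xFFFF                  # two's-complement 16-bit view
    payload = bytearray([first >> 8, first & 0xFF])
    prev = samples[0]
    for s in samples[1:]:
        # Assumption: a difference outside -128..127 is clamped (lossy in that case)
        diff = max(-128, min(127, s - prev))
        payload.append(diff & 0xFF)
        prev += diff                             # follow the value a decoder would rebuild
    return power, bytes(payload)
```

For a frame of N samples the payload is N + 1 bytes instead of 2N bytes, which matches the "about half" figure given above for realistic frame lengths.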
  • the determination of a frame to be subjected to the waveform expansion/reduction process is performed on the basis of comparison between the power of the frame and the threshold Th. Further, the silent-part elimination process is carried out on the basis of comparison between the power of the frame and the threshold To.
  • these thresholds are not used as fixed values but changed in accordance with the loudness of the input voice. For example, between the case of use in a quiet room and the case of use in a situation of large background noise, speech speed conversion, of course, cannot be performed well unless these thresholds are adjusted well.
  • the maximum/minimum values of frame power in the past period of several seconds are stored so that the aforementioned thresholds are determined on the basis of these values.
  • if these thresholds are to be changed at intervals of five seconds and the time length of one frame is 50 milliseconds (msec), the process of changing the threshold Th can be carried out once whenever 100 frames are processed.
  • the power of each frame is always calculated with respect to all inputs whenever information compression is performed for each frame by the encoder in Fig. 11, so that information thereof is recorded in the frame header and held in the ring buffer 24.
  • the power of each new frame is compared with the maximum frame power Pmax and the minimum frame power Pmin, which are updated if necessary. If the maximum frame power Pmax and the minimum frame power Pmin are reset at intervals of five seconds (100 frames), the maximum frame power and the minimum frame power in the past period of five seconds are always available.
  • Th and To are set to 10 % and 5 % of the difference between the maximum frame power Pmax and the minimum frame power Pmin, respectively, added to Pmin. These are given by the following expressions (1) and (2).
  • $Th = (P_{max} - P_{min}) \times 0.10 + P_{min} \quad (1)$
  • $To = (P_{max} - P_{min}) \times 0.05 + P_{min} \quad (2)$
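The threshold update can be sketched as below: Th and To sit 10 % and 5 % of the observed power range above the minimum frame power, and are recomputed once per window of 100 frames (five seconds at 50 ms per frame). The names `thresholds` and `ThresholdTracker`, and the need for caller-supplied initial thresholds, are assumptions for illustration.

```python
from typing import Tuple


def thresholds(p_max: float, p_min: float) -> Tuple[float, float]:
    """Return (Th, To) from the maximum and minimum frame power."""
    th = (p_max - p_min) * 0.10 + p_min
    to = (p_max - p_min) * 0.05 + p_min
    return th, to


class ThresholdTracker:
    """Recompute Th and To once per window of frames from the observed power."""

    def __init__(self, th: float, to: float, window: int = 100) -> None:
        self.th, self.to = th, to  # initial values are an assumption
        self.window = window       # 5 s / 50 ms = 100 frames
        self._reset()

    def _reset(self) -> None:
        self.p_max = float("-inf")
        self.p_min = float("inf")
        self.count = 0

    def update(self, power: float) -> None:
        """Feed the power of one frame, as recorded in its frame header."""
        self.p_max = max(self.p_max, power)
        self.p_min = min(self.p_min, power)
        self.count += 1
        if self.count == self.window:
            self.th, self.to = thresholds(self.p_max, self.p_min)
            self._reset()          # start collecting the next window
```

In a quiet room Pmin is small and both thresholds drop; with loud background noise Pmin rises and the thresholds follow it, which is the adaptation to the environment that the text describes.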
  • the function of silent-part elimination serves to eliminate a silent part (a duration in which power is lower than the voice-part/silent-part threshold To) continued for a time not smaller than one second.
  • the silent-part elimination process is carried out by the silent-part elimination process 23B shown in Fig. 11.
  • This silent-part elimination process is a process independent of a later-described process executed for each frame (hereinafter referred to as a main process) so that the process is carried out after the main process for one frame is terminated.
  • in the silent-part elimination process 23B, data accumulated in the input buffers 22 are added up at intervals of a predetermined unit (for example, 1/4 frame) to calculate power, and the silent-part elimination operation is started when the power crosses the voice-part/silent-part threshold upwards.
  • the frame headers in the ring buffer memory 24 are then searched backward in time. Data on the ring buffer memory 24 are compressed frame by frame and, as described above, the power value of each frame is recorded in its frame header. If frames having power lower than To have continued for one second or longer, silent-part elimination is enabled and the input pointer to the ring buffer memory 24 is returned to the point at which the silent part had continued for one second. The next compressed data are then written from the returned point, overwriting what was there. Accordingly, any silent part beyond one second immediately before the current point of time is always eliminated.
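The backward header search and pointer rewind might look like the following sketch. It models the stored frames as a simple list of header power values rather than a real ring buffer; the function name `rewind_after_silence` and the default of 20 frames per second (50 ms frames) are assumptions.

```python
from typing import List


def rewind_after_silence(header_powers: List[float], to: float,
                         frames_per_second: int = 20) -> int:
    """Return the frame index at which the next compressed frame should be written.

    header_powers[i] is the power recorded in the header of stored frame i,
    oldest first. If the most recent frames have stayed below `to` for one
    second or longer, the index is moved back so that only the first second
    of that silence is kept.
    """
    silent = 0
    for p in reversed(header_powers):  # search the headers back into the past
        if p >= to:
            break
        silent += 1
    if silent >= frames_per_second:
        return len(header_powers) - (silent - frames_per_second)
    return len(header_powers)          # nothing to eliminate
```

In this list model the caller would discard everything from the returned index onward (for example `del frames[idx:]`) before appending the next frame, which plays the role of overwriting from the returned input pointer.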
  • the later-described main process in the apparatus of this embodiment is carried out for each frame.
  • the wave-form manipulation process 23D shown in Fig. 11 therefore holds the frame data currently being processed, and reading from the ring buffer memory 24 is performed collectively for each frame. When data are picked out collectively, addressing the ring buffer memory 24 requires only incrementing the address one by one, so this is more efficient than picking data out one at a time.
  • the decoder 23C shown in Fig. 11 is provided for this purpose. Taking one frame of compressed data as input, the first two 8-bit data are placed in the upper and lower halves of a 16-bit word to generate the leading data. Then the value of the third compressed data is added to the leading data to restore the second data. Then the value of the next compressed data is added to the second data to restore the third data. Thereafter, adding each compressed data to the previously restored data is repeated until all data of the frame are restored.
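The restoration steps can be written out as a short sketch. The function name `decode_frame` is hypothetical, and the byte order (upper byte first) and two's-complement interpretation of the 16-bit leading value and the 8-bit differences are assumptions consistent with, but not stated in, the description.

```python
from typing import List


def decode_frame(payload: bytes) -> List[int]:
    """Restore signed 16-bit samples from a difference-coded frame payload."""
    if len(payload) < 2:
        raise ValueError("payload must hold at least the two leading bytes")
    first = (payload[0] << 8) | payload[1]  # upper byte, then lower byte
    if first >= 0x8000:                     # reinterpret as signed 16-bit
        first -= 0x10000
    samples = [first]
    for b in payload[2:]:
        diff = b - 256 if b >= 128 else b   # signed 8-bit difference
        samples.append(samples[-1] + diff)  # add to the previously restored data
    return samples
```

For example, the four bytes `03 E8 0A FB` decode to the leading value 1000 followed by 1000 + 10 and 1010 - 5.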
  • the speech converted into a digital signal by the speech input devices 21 is first inputted to the input buffers 22.
  • the speech signal read from the input buffers 22 is fed to the encoder 23A contained in the CPU 23 of the DSP 1 (Fig. 1), subjected to the data compression process and stored in the ring buffer memory 24.
  • the aforementioned speech signal is also fed to the silent-part elimination process 23B so that the silent-part elimination process is applied to the data stored in the ring buffer memory 24 if necessary.
  • the data of the speech signal stored in the ring buffer memory 24 are frame-by-frame fed to the decoder 23C, so that the compressed speech data are decoded by the decoder 23C and inputted to the wave-form manipulation process (speech speed conversion process) 23D.
  • in the wave-form manipulation process (speech speed conversion process) 23D, speech speed conversion or the like is carried out on the basis of the condition set by the function chooser 25.
  • the digital speech data subjected to the speech speed conversion process or the like are held in the output buffers 26.
  • the data of the output buffers 26 are read out so that the speech subjected to the speech speed conversion process or the like is outputted from the speech output devices 27.
  • the data of the output buffers 26 are read out so that the data are converted from a digital value into an analog value at intervals of a set time by the D/A converter 6 as shown in Fig. 1.
  • the analog signal thus obtained by this conversion is inputted to the analog amplifier 9 via the low-pass filter 8 and outputted as a voice from the binaural headphone 325 at the amplitude preferred by the listener.
  • Figs. 13 and 14 are flow charts showing the procedure of the main process in this embodiment.
  • the "fade-in" step is carried out (S131) at power-on. Just after the power source is switched on, the data stored in the output buffers 26 are indefinite, so data unrelated to the speech may be outputted. If such data were outputted as they are from the speech output devices 27, they could form noise of a very large level. To prevent this, in this embodiment the values of data in the output buffers are adjusted by the fade-in step so that the output of the speech output devices is increased gradually over a predetermined time after power-on, irrespective of the data in the output buffers.
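A fade-in of this kind can be illustrated with a linear gain ramp. The text only says that the output is increased gradually over a predetermined time, so the linear shape, the function name `fade_in` and the parameter `ramp_len` (ramp length in samples) are assumptions made for this sketch.

```python
def fade_in(samples, ramp_len):
    """Scale samples by a gain that rises linearly from 0 to 1.

    The gain reaches 1 after ramp_len samples and stays there, so
    whatever was in the buffer at power-on starts out silent.
    """
    out = []
    for i, s in enumerate(samples):
        gain = min(1.0, i / ramp_len) if ramp_len > 0 else 1.0
        out.append(int(s * gain))
    return out
```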
  • the "reading pointer coincidence" step is carried out (S132).
  • This reading pointer coincidence process is a process in which, when data from the speech input devices 21 are inputted, the same data are passed to the output buffers 26 just after they are inputted to the input buffers 22. This operation is realized by making the value of the input pointer, which points to an input address in memory, coincide with the value of the output pointer, which points to an output data address in memory, just after data are inputted to the input buffers 22. In Fig. 11, this operation is carried out by the controller 23E.
  • the pushed states (ON states) of the slow switch 104 and the repeat switch 105 are checked (S133 and S144).
  • both switches are in non-pushed states (OFF states)
  • the situation of the routine goes back to the previous reading pointer coincidence step (S132) so that the through mode is continued. Accordingly, in the interruption process which occurs while the through mode is continued, input data is always outputted intact, so that the same speech as the input speech is outputted from the speech output devices 27.
  • the aforementioned respective switches such as the slow switch 104, the repeat switch 105 and the reset switch 106 are contained in the function chooser 25 and the states thereof are checked by the controller 23E.
  • Fig. 16 shows a flow chart of the internal procedure of this reading pointer return routine. The explanation of Fig. 16 will be described later.
  • Fig. 17 shows a flow chart of the internal procedure of this routine. The explanation of Fig. 17 will be described later.
  • Figs. 18 and 19 show flow charts of the internal procedure of this one-frame waveform expansion process. The explanation of Figs. 18 and 19 will be described later.
  • the situation of the routine goes to the step for checking the states of the respective switches as to whether each switch is pushed or not.
  • the process is completed in the order of tens of milliseconds (msec).
  • this apparatus uses switching devices that hold the pushed state for at least the duration of the push, however short, when a switch (pushbutton) is pushed by the user. Accordingly, as long as the pushed states of the switches are checked each time the one-frame process is carried out, the routine can shift to the desired operation without a time lag long enough to give the user a feeling of slow response.
  • the situation of the routine goes to the next judgment as to the repeat pushed-down state (S140).
  • the case where the pushing-down of the repeat switch 105 is detected at this point of time is either the case where "the repeat switch is pushed at the time of repeat reproduction" or the case where "the repeat switch is pushed at the time of catching-up reproduction".
  • the situation of the routine branches into the reading pointer return routine so that the repeat reproduction is started from the silent part near a position returned back to the past by about five seconds from the current position of the output pointer of the ring buffer memory 24.
  • the situation of the routine goes to the following repeat end judgment (S141).
  • the repeat operation is continued until the output pointer goes back to the output pointer position where the through mode was changed to the repeat operation by the pushing-down of the repeat switch 105. That is, in the case where this judgment shows that the repeat mode is used currently and that the position of the output pointer does not yet go back to the output pointer position where the repeat was started, a processing loop is formed so that the situation of the routine goes back to the aforementioned one-frame waveform expansion/reduction process.
  • the subsequent process is a process for catching-up reproduction.
  • the catching-up reproduction means an operation in which a time lag from the real time as caused by the repeat or slow reproduction is made up for by fast reproduction realized by the repetition of the one-frame waveform reduction process.
  • the setting of parameter is performed for the waveform reduction process for the catching-up reproduction (S142).
  • the quantity of the lag from the real time increases when the repeat button is pushed down or when the waveform expansion process is carried out. On the contrary, it decreases when the waveform reduction process is carried out.
  • the time lag from the real time as caused by the speech speed conversion or repeat operation is managed as "lag quantity" by using a counter.
  • the time lag from the real time can also be managed as the difference between the position on the ring buffer 24 at which the currently sampled data are written and the position on the ring buffer 24 from which the output data are read, that is, as the difference between the addresses pointed to by the two pointers.
  • the management method using the lag quantity counter as described above is employed in the present invention. This is because the quantity of the lag may be unable to be expressed correctly in the address difference between the input and output pointers on the ring buffer 24.
  • the ring buffer 24 is realized by the handling of the memory address space in a manner of "next to address 1000, jump to address 0" in the program. Therefore, in the case where the input and output pointers lie across this break between addresses, the quantity of data therebetween cannot be expressed easily by taking the difference between address values simply.
  • address value calculation including complex classification that takes into account the histories of the two pointers up to their current positions is required.
  • the value of the lag quantity counter is changed to thereby manage the quantity of the time lag to prevent the increase of the quantity of processing based on the complex address calculation.
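The difficulty with a plain address difference, and the counter used instead, can be illustrated as follows. This is only a sketch: the class `LagCounter`, its method names and the function `naive_difference` are hypothetical, and the ring size of 1001 simply mirrors the "next to address 1000, jump to address 0" example.

```python
RING_SIZE = 1001  # addresses 0..1000, as in the example in the text


def naive_difference(in_ptr, out_ptr):
    """Plain address difference; wrong when the pointers straddle the wrap."""
    return in_ptr - out_ptr


class LagCounter:
    """Tracks the lag from real time directly, in samples."""

    def __init__(self):
        self.lag = 0

    def add(self, n):
        # Called when the repeat button rewinds the output pointer or
        # when waveform expansion inserts n samples.
        self.lag += n

    def subtract(self, n):
        # Called when waveform reduction removes n samples.
        self.lag = max(0, self.lag - n)
```

With the input pointer at address 5 and the output pointer at address 995, the plain difference is -990 even though the output is only 11 samples behind. The counter never consults the addresses, so it is unaffected by the wrap and can also represent states that a single address difference cannot.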
  • the aforementioned main process is provided in the form of an infinite loop in which the aforementioned process is repeated until the electric source switch is turned off (S144).
  • Fig. 15 is a state transition diagram schematically showing the transitions between the respective modes in this embodiment as described above. The way in which modes are switched by the switching operations will be understood well from Fig. 15. The standby mode in Fig. 15 will be described later in detail.
  • Fig. 16 is a flow chart showing the procedure of the reading pointer return routine.
  • the reading pointer return routine in this embodiment is a specific method for changing the value of the output pointer pointing the position of data to be read from the ring buffer 24, which method is necessary for realizing a repeat function.
  • the position of the output pointer at the current point of time is set to Pout (S161). Then, the quantity of the lag from the real time at the current point of time is set to D (S162).
  • this routine is terminated without any change of Pout and D (S169 and S170).
  • the quantity of the lag is also further increased by one frame (+F).
  • a judgment is made as to whether the total quantity of the lag would exceed the size of the ring buffer memory if the lag were increased by one more frame (S166). If so (the case of Yes in S166), the search for the silent part is stopped, Pout and D at this time are set as the output pointer value and the lag quantity respectively (S169 and S170), and this routine is terminated.
  • the pointer is further returned back by one frame and the search for the silent part is continued to detect the silent part in the same manner as described above but the search is continued until the quantity of the lag exceeds the size of the ring buffer memory.
  • the output pointer return process at the time of the pushing of the repeat switch 105 is completed.
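The backward search for a silent part can be sketched as below, working in units of frames. The function name `return_read_pointer` and its parameters are hypothetical. The early exit when the initial step back does not fit in the buffer is an assumption, since the condition under which the routine ends without changing Pout and D is not stated in this passage.

```python
def return_read_pointer(powers, pout, lag, back_frames, silence_th):
    """Move the output pointer back to a silent frame for repeat playback.

    powers[i] is the power in the header of frame slot i of the ring
    buffer; pout and lag (Pout and D in the text) are counted in frames.
    """
    n = len(powers)
    if lag + back_frames > n:
        return pout, lag  # assumption: not enough history, leave unchanged
    # First step back by the nominal repeat distance (about five seconds).
    pout = (pout - back_frames) % n
    lag += back_frames
    # Then keep stepping back one frame at a time until a silent frame
    # is found or one more step would exceed the buffer size (S166).
    while powers[pout] >= silence_th:
        if lag + 1 > n:
            break
        pout = (pout - 1) % n
        lag += 1
    return pout, lag
```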
  • Figs. 17 and 18 are flow charts showing the procedure of the one-frame waveform expansion/reduction process in this embodiment.
  • as a specific method for realizing the consonant emphasis process, there may be considered, for example, a method in which a frame having power lower than the threshold Th just prior to a frame having power higher than the threshold Th is regarded as a consonant and the values of data in that frame are increased.
  • the pitch extraction process is carried out from the head of the frame (S175).
  • in the pitch extraction process, for example, the pitch length at the head of the frame is extracted on the basis of a well-known algorithm using autocorrelation.
  • the quantity of data corresponding to twice the thus extracted pitch length is compared with the quantity of not-yet-processed data (S176), and in the case where the quantity Z of not-yet-processed data is smaller than the quantity of data corresponding to twice the extracted pitch, this process is stopped.
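A basic autocorrelation pitch estimate, together with the check against the remaining data quantity Z, might look like the sketch below. The text only refers to "a well-known algorithm using autocorrelation", so this unnormalized variant, the function names and the search range parameters `min_lag` and `max_lag` are assumptions.

```python
def estimate_pitch(samples, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def has_two_pitches(z, pitch):
    """True when the not-yet-processed quantity Z covers two pitch lengths."""
    return z >= 2 * pitch
```

Because the sum is not normalized by the number of terms, shorter lags are slightly favoured, which helps avoid picking a multiple of the true period in this simple form.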
  • a pre-transfer process is carried out (S178).
  • the pre-transfer process means a process in which a part of input data is transferred intact to the output buffer 26 before a reproduced wave pattern insertion process which will be described later.
  • the pre-transfer process corresponds to the portion of (b) in Fig. 2.
  • the number of data to be transferred by the pre-transfer process is set with the pitch as a unit but the number thereof varies in accordance with the wave-form expansion/reduction rate.
  • the number Npf is set (S177) by a parameter setting routine which will be described later with reference to Fig. 19.
  • the quantity Z of not-yet-processed data is reduced by the number of transferred data (S179).
  • the position at which the triangular window function for generating a reproduced wave pattern is applied is determined (S180) in accordance with another parameter Ptri set in the parameter setting routine shown in Fig. 19. The only difference between expansion and reduction is the position on the current waveform to which the window function is applied when the reproduced wave pattern is generated.
  • the triangular window function is applied so that a waveform two pitches long is generated from a waveform one pitch long (S181). Conversely, in the case of waveform reduction, as shown in Figs. 20 to 22, the triangular window function is applied so that a waveform two pitches long is generated from a waveform three or four pitches long.
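As a rough illustration of window-based waveform insertion, the sketch below synthesizes one extra pitch period by cross-fading two adjacent periods with complementary linear weights. It assumes that the window is triangular, which the parameter name Ptri suggests, and it shows one common pitch-synchronous arrangement rather than the exact layout of the figures. The function names are hypothetical.

```python
def synthesize_period(a, b):
    """Cross-fade two adjacent pitch periods into one new period.

    a is the earlier period and b the later one. The new period starts
    like b (so it follows a smoothly) and ends like a (so that b follows
    it smoothly); the two weights always sum to 1.
    """
    if len(a) != len(b):
        raise ValueError("both periods must have the same length")
    p = len(a)
    return [b[i] * (1 - i / p) + a[i] * (i / p) for i in range(p)]


def expand_once(a, b):
    """Return a, the synthesized period, then b: three periods from two."""
    return list(a) + synthesize_period(a, b) + list(b)
```

For a perfectly periodic signal the two inputs are identical and the inserted period is an exact copy, so the expansion lengthens the sound without changing its pitch.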
  • the quantity of the lag from the real time is changed by the insertion of the reproduced wave pattern (though not shown).
  • the quantity Z of not-yet-processed data is reduced by the number of the thus processed data (S182).
  • the quantity of data twice as much as the newly extracted pitch is compared with the number of not-yet-processed data (S184). If the quantity of data of the length of two pitches does not remain (the case of No in S184), this process is stopped immediately.
  • the post-transfer process means a process similar to the pre-transfer process and corresponds to the portion of (e) in Fig. 2 in the previous embodiment.
  • the number of data to be transferred by the post-transfer process is set with the pitch as a unit but the number thereof varies in accordance with the waveform expansion/reduction rate.
  • the number Npr is set (S185) by the parameter setting routine which will be described later with reference to Fig. 19.
  • Fig. 19 is a flow chart showing the procedure of the parameter setting routine for setting parameter for the expansion process in this embodiment.
  • the parameter setting routine shown in Fig. 19 is used twice in the main process shown in Figs. 13 and 14: once just before the aforementioned one-frame waveform expansion/reduction routine, and once in the "process for setting parameter for the reduction process" after the repeat end judgment.
  • the waveform reduction process is a process for realizing the "catching-up process (fast hearing process)" which is continued after slow hearing or after repeating.
  • when the generation of a reproduced wave pattern with use of the triangular window function, as carried out in the waveform expansion process, is carried out with the position subjected to the window function shifted in the direction opposite to that used for expansion, waveform reduction is obtained.
  • in parameter setting for the expansion process after this discrimination, the position of the speech speed selection switch is checked (S192), the expansion rate e is set in accordance with the position of the switch (S193), the values of parameters Npf and Npr used in the waveform expansion process are set in accordance with the expansion rate e, and parameter Ptri, indicating the position at which the weighted summation with the triangular window carried out in the waveform expansion process starts, is set, whereafter this routine is terminated.
  • the catching mode (Mcat) practically serves not to "catch up" but to jump at the very moment that the hold of the slow switch (slow pushbutton) is released (S199). Specifically, a branching process for forcibly returning to the through mode is carried out in this portion.
  • at the time of the catching-up mode, the reduction rate s is set through the center flow in Fig. 19 (S201), the values of parameters Npf and Npr used in the waveform reduction process are set in accordance with the reduction rate s (S202) and, further, parameter Ptri, indicating the position at which the weighted summation with the triangular window carried out in the waveform reduction process starts, is set (S203), whereafter this routine is terminated.
  • Figs. 23 and 24 are flow charts showing the procedure of the total operation of a speech speed conversion apparatus provided with a continuous speech speed conversion means according to the present invention.
  • continuous speech speed conversion in the speech speed conversion apparatus provided with the continuous speech speed conversion means is substantially an operation in which the pushing of the slow switch (slow pushbutton) 104 is continued so that slow reproduction is continued.
  • the time lag is however accumulated rapidly when waveform expansion at a constant waveform expansion rate is continued, so that the quantity of the lag from the real time finally exceeds the capacity of the ring buffer 24 to make it impossible to continue slow hearing any more.
  • the continuous speech speed conversion means is therefore provided to mix a waveform expansion period and a waveform reduction period reverse thereto at the time of slow reproduction so that the lag from the real time is not increased rapidly.
  • whether the continuous speech speed conversion process is intended or not is first checked in step S231 in Figs. 23 and 24. If the continuous speech speed conversion process is intended (the case of Yes in S231), the one-frame waveform expansion/reduction process is carried out (S232). Then, a judgment is made as to whether the reset switch 106 is pushed (turned on) or not (S233). In the case where the reset switch 106 is not pushed (turned off), the frame counter is incremented by one frame (S234) and a judgment is made as to whether the expansion period is still in progress (S235). If so (the case of Yes in S235), the routine goes back to the step S232.
  • parameter is set for the reduction process (S236). Then, whether the lag quantity is zero or not is checked (S237). In the case where the lag quantity is not zero (the case of No in S237), the routine goes back to the step S232. In the case where the lag quantity is zero (the case of Yes in S237), parameter is set for the expansion process (S238) and the frame counter is reset (S239), whereafter the routine goes back to the step S232 so that the continuous speech speed conversion operation is repeated. In the case where the continuous speech speed conversion process is not intended in the aforementioned step S231 (the case of No in S231), the mode is shifted to the aforementioned main process routine (through mode).
  • the continuous speech speed conversion means in this embodiment is a method in which slow reproduction and catching-up reproduction are repeated alternately at intervals of a preliminarily set time. According to this method, catching up to the real time at intervals of a predetermined time is always made possible.
  • the management of the changeover between waveform expansion and waveform reduction is performed on the basis of the count of the number of frames. For example, when the expansion process for a number of frames corresponding to about five seconds is completed, the reduction process is then carried out repeatedly, and when the lag quantity reaches zero, the frame count is returned to zero and the expansion process is repeated again.
  • escape out of the continuous speech speed conversion mode is achieved by the pushing-down of the reset switch 106 to return the mode to the through mode.
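The alternation between expansion and reduction can be simulated with a small frame-by-frame model. This is a sketch under simplifying assumptions: every expanded frame adds a fixed lag `gain`, every reduced frame removes a fixed lag `loss`, and the function name `schedule` and its parameters are hypothetical. The reset switch is left out.

```python
def schedule(total_frames, expand_frames, gain, loss):
    """Return the per-frame mode sequence and the final lag."""
    modes = []
    lag = 0
    count = 0          # frame counter for the expansion period
    expanding = True
    for _ in range(total_frames):
        if expanding:
            modes.append("expand")
            lag += gain
            count += 1
            if count >= expand_frames:
                expanding = False   # expansion period over: start reducing
        else:
            modes.append("reduce")
            lag = max(0, lag - loss)
            if lag == 0:
                expanding = True    # caught up: reset counter, expand again
                count = 0
    return modes, lag
```

Because reduction always runs until the lag returns to zero, the lag is bounded by `expand_frames * gain`, which is what keeps it from exceeding the ring buffer capacity.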
  • Figs. 25 and 26 are flow charts showing the procedure of the total operation of a speech speed conversion apparatus provided with a continuous speech speed conversion means different from that in the embodiment shown in Figs. 23 and 24.
  • the continuous speech speed conversion in the speech speed conversion apparatus provided with the continuous speech speed conversion means in this embodiment is an operation for applying waveform expansion to a frame of high power and applying waveform reduction to a frame of low power.
  • whether the continuous speech speed conversion process is intended or not is first checked in step S251 in Figs. 25 and 26. If the continuous speech speed conversion process is intended (the case of Yes in S251), a judgment is made as to whether the reset switch 106 is pushed (turned on) or not (S252). In the case where the reset switch 106 is not pushed (turned off), one-frame power is calculated (S253). Then, whether the calculated one-frame power is higher than the threshold Th or not is checked (S254). In the case where the calculated one-frame power is lower than the threshold Th (the case of No in S254), parameter is set for the reduction process (S256) and the routine goes to step S257.
  • one-frame power is calculated so that either expansion or reduction is applied to each frame on the basis of comparison between the one-frame power and the threshold Th. Escape out of the continuous speech speed conversion mode is achieved by the pushing-down of the reset switch 106.
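The per-frame decision can be sketched as follows. The text does not define how one-frame power is computed, so the mean of squared sample values used here is an assumption, as are the function names.

```python
def frame_power(frame):
    """Mean squared sample value of one frame (assumed definition)."""
    return sum(s * s for s in frame) / len(frame)


def choose_process(frame, th):
    """Pick expansion for frames above the threshold Th, else reduction."""
    return "expand" if frame_power(frame) > th else "reduce"
```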
  • the speech is made slow or fast in accordance with the power thereof.
  • the speech speed control in this embodiment is characterized in that an output voice nearer to the natural voice is obtained.
  • the probability of appearance of the high-power portion and the probability of appearance of the low-power portion are however not always equal to each other, so that catching up to the real time at intervals of a predetermined time as in the case of the previous embodiment of Figs. 23 and 24 is not always ensured.
  • as methods by which the user instructs entry into the continuous speech speed conversion mode, there may be considered a method in which the slow switch (slow pushbutton) 104 is pushed and then slid laterally so as to be locked, a method in which the slow switch (slow pushbutton) 104 is double-clicked (pushed down twice in quick succession), and so on. With these methods, the intention of "executing slow reproduction" and the intention of "continuing" that operation can both be expressed by different ways of pushing the same pushbutton, which gives an operating system that is more intuitive and easier to understand than one in which a separate continuous speech speed conversion pushbutton is provided.
  • the through mode in the aforementioned embodiments is executed.
  • the waveform expansion process is employed so that "slow reproduction” is executed with a lag from the real time.
  • the waveform reduction process is employed reversely so that fast reproduction is executed (until the lag from the real time reaches zero).
  • the controller changes the waveform expansion/reduction rate in accordance with the distance from the center of the slide switch.
  • the expansion/reduction rate can, however, be set only to a few values that can be expressed as integer ratios. In practice, therefore, the expansion/reduction rate is preferably set so that one of several stages of values is selected in accordance with the distance from the center of the slide switch.
  • Fig. 28 is a block diagram showing the functional structure of a speech speed conversion apparatus provided with an AV control means.
  • Fig. 29 is a view for explaining the operation of the AV control means in this embodiment of Fig. 28.
  • Figs. 30 and 31 are flow charts showing the operating procedure of the main process in the speech speed conversion apparatus provided with the AV control means in this embodiment.
  • the speech speed conversion apparatus provided with the AV control means in this embodiment is provided as a functional structure in which an AV controller 28 is added to the functional structure of the speech speed conversion apparatus in the aforementioned embodiment shown in Fig. 11 and connected to the controller 23E.
  • the aforementioned controller 23E judges whether a condition for outputting an AV control signal is satisfied or not and operates the AV controller 28 to start/stop the outputting of the AV control signal.
  • the AV control means is software in which the AV control signal is outputted when the quantity of the lag from the real time caused by slow or repeat reproduction exceeds a predetermined value (30 seconds in Fig. 29) and in which the outputting of the same signal is stopped when the lag quantity subsequently reaches zero through catching-up reproduction.
  • the AV control signal is picked out of this apparatus and used for temporarily stopping the reproducing operation of a recording/reproducing apparatus such as a tape recorder, a video tape recorder, or the like.
  • the portion surrounded by the broken line shows the steps of the operating procedure of the AV control means that are added to the flow charts in Figs. 13 and 14.
  • a judgment is made as to whether the condition for outputting the AV control signal is satisfied or not (S301).
  • the judgment with respect to the outputting of the AV control signal is realized by a judgment as to whether the quantity of the lag from the real time in a loop in which the one-frame waveform expansion/reduction process is repeated for slow or repeat reproduction is over 30 seconds or not (S301) and by starting the outputting of the AV control signal when the lag from the real time is over 30 seconds (S302).
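The start and stop conditions form a simple hysteresis, which can be sketched as below. The class name `AVControl` and its interface are hypothetical. The 30-second start threshold follows the example of Fig. 29.

```python
class AVControl:
    """Decides whether the AV control (pause) signal is being outputted."""

    def __init__(self, start_lag_s=30.0):
        self.start_lag_s = start_lag_s
        self.active = False

    def update(self, lag_s):
        """Feed the current lag in seconds; return the signal state."""
        if not self.active and lag_s > self.start_lag_s:
            self.active = True    # lag exceeded the limit: start the signal
        elif self.active and lag_s <= 0:
            self.active = False   # catching-up finished: stop the signal
        return self.active
```

Once started, the signal stays on while the lag falls from 30 seconds toward zero, so the external recorder is not restarted until the listener has fully caught up.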
  • Figs. 32A, 32B and 32C are views for explaining the arrangement of a microphone in a speech speed conversion apparatus according to the present invention.
  • the reference numeral 101 designates a body of the speech speed conversion apparatus; 321, a microphone; 322, a prop capable of expansion and contraction for supporting the microphone 321; 323, a flexible prop for supporting the microphone 321; and 324, an electric cord for electrically connecting the microphone 321 to the speech speed conversion apparatus body 101 by wire.
  • Fig. 33 is a view showing a modified example of this embodiment of Fig. 32, in which the reference numeral 101 designates a body of the speech speed conversion apparatus; 104, a slow switch; 105, a repeat switch; 106, a reset switch; 321, a microphone; 324, an electric cord for electrically connecting the microphone 321 to the speech speed conversion apparatus body 101; 325, an earphone; and 300, a connection member.
  • the microphone 321 is supported by the prop 322 capable of expansion and contraction. Because the aforementioned supporting of the microphone 321 makes the microphone 321 far away from the speech speed conversion apparatus body 101, the rustle of clothes can be prevented from being produced when the apparatus body is put into a breast pocket in use.
  • the microphone 321 is supported by the flexible prop 323. Supported in this manner, the microphone 321 is separated from the speech speed conversion apparatus body 101 and can be bent in a desired direction. Accordingly, the rustle of clothes can be prevented from being picked up when the apparatus body is put into a breast pocket in use.
  • the microphone 321 and the speech speed conversion apparatus body 101 are electrically connected to each other by wire (or wirelessly).
  • the S/N ratio can be improved because the microphone 321 and the speech speed conversion apparatus body 101 are electrically connected to each other by wire (or wirelessly) as described above, so that the microphone 321 can be disposed near the listener independently of the speech speed conversion apparatus body 101.
  • the speech speed conversion apparatus body 101 and the microphone 321 are electrically connected to each other by the electric cord through the earphone 325 and the connection member 300. Further, operation switches such as the slow switch 104, the repeat switch 105, the reset switch 106, and so on, are provided on the aforementioned connection member 300. In this manner, not only the rustle of clothes can be prevented from being produced when the apparatus body is put into a breast pocket in use but also both the S/N ratio and the handling property can be improved.
  • Fig. 34 is a view for explaining a lag time display means in a speech speed conversion apparatus according to a further embodiment of the present invention.
  • the reference numeral 341 designates a display portion; and 342, a display screen.
  • the lag time display means in this embodiment displays how far the reproduced speech of a speaker is delayed from the real time during the aforementioned slow/repeat reproduction. For example, assuming that one human image represents a time lag of 10 seconds in Fig. 34, the time lag from the current time is expressed by the number of displayed human images. In this manner, the quantity of the time lag from the current time can be recognized visually. Accordingly, both speaker and listener can adjust the speech speed conversion easily, which makes the apparatus easy to handle.
  • the visual display of the time lag is realized, for example, by the provision of a liquid crystal display in the front center of the speech speed conversion apparatus body shown in Fig. 6 and by the display of the display screen as shown in Fig. 34 on the liquid crystal display. Further, this display portion is controlled by a "liquid crystal display driver" (not shown) connected to the controller 23E in Fig. 11.
  • this lag quantity counter can be converted at the conversion rate of one to 10 seconds so that a corresponding number of human images can be indicated on the aforementioned display.
  • This displaying operation is carried out by the controller 23E in Fig. 11 through the aforementioned display driver, and it is sufficient to rewrite the display whenever the processing of one frame is completed. For example, this displaying process is carried out between the steps S137 and S138 in Fig. 14.
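The conversion from the lag quantity counter to a number of displayed images can be sketched as follows. The function name `images_for_lag` is hypothetical, and two points are assumptions not stated in the text: the counter is taken to count samples at a known sampling rate, and partial intervals are rounded down.

```python
def images_for_lag(lag_samples, sample_rate, seconds_per_image=10):
    """Number of human images to display for the given lag.

    One image stands for seconds_per_image seconds of lag; a lag
    shorter than one full interval shows no image (assumed rounding).
    """
    return lag_samples // (sample_rate * seconds_per_image)
```

Calling this once per processed frame matches the rewrite timing described above, since the counter only changes when a frame is expanded, reduced or repeated.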
  • the reference numeral 1000 designates an apparatus portion concerning the speech speed conversion apparatus; 1, a DSP; 5, an A/D converter; 6, a D/A converter; 9, an analog amplifier; 10, an analog amplifier; 1001, an electric source; 1002, an electric power supply line; and 1003, a changeover switch.
  • a standby mode is provided besides the through mode so that entry into the standby mode is made automatically when the through mode is continued for a predetermined time. That is, when either slow switch or repeat switch is pushed (turned on), clock frequency is heightened so that each process is carried out.
  • the DSP 1 operates on a fast clock, but power is wasted because the speech speed conversion process or the like is not executed. In the standby mode, therefore, the operating clock for the DSP 1 is lowered so that only I/O of data is performed, thereby reducing the electric power consumed. Further, only storage into the memory is performed. In this manner, a voice memory function is realized.
  • the changeover switch 1003 is connected to a contact side to cut off the electric power supply line 1002 and also connected to a contact side to connect the analog amplifiers 10 and 9 directly so that electric power is not supplied to the DSP 1, the A/D converter 5, the D/A converter 6 and peripheral digital circuits. At this time, the storage into the memory is not performed. That is, I/O analog systems are connected directly so as to be operated simply as an analog amplifier.
  • the aforementioned changeover switch is provided as a switch of three stages, namely, an ON stage, an OFF stage and an ON-OFF intermediate stage as shown in Fig. 35, so that the analog through mode is provided.
  • a switch of three stages consisting of an ON stage, an OFF stage and an ON-OFF intermediate stage is formed so that the analog through mode is provided. Accordingly, not only can electric power consumption be reduced but the range of use of the electric source can also be widened.
  • the reference numeral 2000 designates the speech speed conversion means according to the present invention; 3000, a body of the telephone; 3001, a transceiver; and 3002, a telephone line.
  • the telephone in this embodiment is formed by inserting the speech speed conversion means 2000 according to the present invention between the handset 3001 and the telephone body 3000.
  • the speech speed conversion means 2000 is, for example, shaped like a mount on which the telephone body 3000 is put.
  • the speech speed conversion means 2000 is inserted between the transceiver 3001 and the telephone body 3000 by wireless connection.
  • the speech speed conversion means according to the present invention may be used as a speech speed conversion means in a switching system so that it can be operated at the user's request.
  • a voice can be heard slowly over the telephone. Further, because the voice is fed back to the speaker side as a through voice while it is heard slowly on the listener side, the speaker can speak normally during a telephone conversation with an aged person or the like, without any difficulty in speaking.
  • no A/D means needs to be provided inside the speech speed conversion means as long as the speech speed conversion means is provided as a digital circuit.
  • the reference numeral 2000 designates the speech speed conversion means; 321, a microphone; 325, an earphone; 4003, an amplifier; and 4004, a speaker.
  • the speech speed conversion means 2000 is inserted between the microphone 321, the earphone 325 and the amplifier 4003 for the speaker 4004.
  • the listener can hear a voice at a suitable speech speed even in the case where the speaking person does not control the speech speed conversion operation. For example, even in the case where the speaking person talks volubly at a high speech speed (impetuous speed) without regard for the listener, the listener can hear at a suitable speech speed.
  • the listener can hear at a suitable speech speed from a speaker even in the case where a speaking person speaks slowly.
  • the present invention can be applied to technical fields requiring speech speed conversion, such as hearing aids, language learning, travel abroad, music, and so on, besides telephones, telephone line switching systems and premises broadcasting.
  • the present invention can be applied to the following cases.
  • no A/D converter needs to be provided in the speech speed conversion apparatus as long as the audio apparatus has a digital output.
  • the present invention can be applied as long as changes are made as follows.
  • the pitch extraction range is widened compared with the case of a voice.
  • the waveform expansion process is carried out on the basis of the pitch of a fixed length.
  • the pitch is detected so that processing is made on the basis of the detected pitch.
  • a foot switch is provided so that a converting operation can be carried out by the foot switch. This makes it possible to control the conversion while playing a musical instrument.
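
The through, standby and analog through modes described in the list above can be read as a small state machine. The Python sketch below is only an illustration under stated assumptions: the names (`ModeController`, `PowerSwitch`, `Mode`, `STANDBY_TIMEOUT_S`) are hypothetical, the 10-second timeout stands in for the unspecified "predetermined time", and returning to the through mode as soon as the slow or repeat switch is released is a simplification, since the claims describe a catch-up step before that.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PowerSwitch(Enum):
    OFF = auto()
    INTERMEDIATE = auto()  # the ON-OFF intermediate stage
    ON = auto()


class Mode(Enum):
    OFF = auto()
    ANALOG_THROUGH = auto()  # digital section unpowered, amplifiers linked directly
    THROUGH = auto()         # fast clock, input passed straight to output
    STANDBY = auto()         # slow clock, only data I/O and storage
    CONVERTING = auto()      # fast clock, slow or repeat processing active


STANDBY_TIMEOUT_S = 10.0  # hypothetical stand-in for the "predetermined time"


@dataclass
class ModeController:
    mode: Mode = Mode.OFF
    idle_s: float = 0.0  # time spent continuously in the through mode

    def step(self, switch: PowerSwitch, slow_or_repeat_on: bool, dt_s: float) -> Mode:
        """Advance the controller by dt_s seconds and return the new mode."""
        if switch is PowerSwitch.OFF:
            self.mode, self.idle_s = Mode.OFF, 0.0
        elif switch is PowerSwitch.INTERMEDIATE:
            self.mode, self.idle_s = Mode.ANALOG_THROUGH, 0.0
        elif slow_or_repeat_on:
            # A pushed slow or repeat switch raises the clock again.
            self.mode, self.idle_s = Mode.CONVERTING, 0.0
        elif self.mode is Mode.THROUGH:
            self.idle_s += dt_s
            if self.idle_s >= STANDBY_TIMEOUT_S:
                self.mode = Mode.STANDBY
        elif self.mode is not Mode.STANDBY:
            # Power-on, leaving analog through, or switch released.
            self.mode, self.idle_s = Mode.THROUGH, 0.0
        return self.mode

    def fast_clock(self) -> bool:
        return self.mode in (Mode.THROUGH, Mode.CONVERTING)

    def stores_to_memory(self) -> bool:
        # Storage continues in every mode in which the digital section is powered.
        return self.mode in (Mode.THROUGH, Mode.STANDBY, Mode.CONVERTING)
```

Calling `step` once per polling interval with the current switch position is enough to drive the sketch. In the analog through mode `stores_to_memory()` returns `False`, which matches the statement that storage into the memory is not performed there.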

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Claims (28)

  1. A speech speed conversion method for receiving and storing an input speech and changing a speed of said input speech without any change in the pitch of said input speech, wherein the speed conversion for said input speech is performed in a period designated by a listener when speech speed conversion is needed, and no speech speed conversion is performed in any period other than said designated period,
       characterized in that the amount of delay from real time is adjusted in a period in which the stored speech is reproduced in the case where the delay is caused by a speech speed conversion or a repeat operation.
  2. A speech speed conversion apparatus comprising:
    means (321) for receiving an input speech,
    storage means (2) for storing information representing said input speech,
    speed conversion means (1) for changing the speed of said input speech,
    means (325) for delivering an output of said speed conversion means (1) to the ears of a listener as a speech output,
    a speech speed conversion switch (113), and
    means adapted to output a speech while changing the speed of said input speech when said speed conversion switch (113) is on, and to output a speech without changing the speed when said switch is off,
       characterized by catch-up means for adjusting the amount of delay from real time in a period in which said stored information is reproduced in the case where the delay is caused by a speech speed conversion or a repeat operation.
  3. The apparatus according to claim 2, wherein said storage means (2) comprise means for storing data on a frame-by-frame basis.
  4. The apparatus according to claim 3, further comprising means for deciding on waveform expansion and reduction processes in said speed conversion process on the basis of a comparison between the power of a frame and a threshold provided as a variable.
  5. The apparatus according to claim 2, further comprising a speed selection switch (113) for selecting the speed of said speech, and means for changing the speed of said speech to the speed selected by said speed selection switch.
  6. The apparatus according to claim 5, further comprising means for controlling an audio/video apparatus.
  7. The apparatus according to claim 2, further comprising a repeat switch (105) and means for repeating a reproduced speech when said repeat switch is on.
  8. The apparatus according to claim 7, wherein said repeating means comprise means for moving the speech back by several seconds each time said repeat switch (105) is operated, means for producing intermittent sounds at times while the speech is being moved back, means for stopping the moving back of the speech when the speech reaches the end of a ring buffer (24), and/or means for selecting the speed at the time of repeating.
  9. The apparatus according to claim 8, wherein said means for selecting the speed at the time of repeating have at least two of the following modes: repeating at a default speed, slow repeating, fast repeating, and gradually accelerated repeating.
  10. The apparatus according to claim 2, wherein said catch-up means comprise means for starting the catch-up when a slow reproduction mode is finished, means for starting the catch-up when, after repeating, the reproduction has returned to the point in time at which the repeating started, means for selecting the speech speed at the time of catch-up, means for automatically switching the current mode to a through mode for outputting the input speech directly when the catch-up is finished, and/or means for producing a warning sound when the catch-up is finished.
  11. The apparatus according to claim 10, wherein said means for selecting the speed at the time of catch-up comprise means for performing a non-stop jump to real time, means for catching up with real time by fast listening, and/or means for performing a parallel movement with a time delay.
  12. The apparatus according to claim 2, further comprising a speed selection switch, a repeat switch (105), and/or a reset switch (106), said switch or each switch being arranged in a peripheral portion on a side surface of said speed conversion apparatus so as to facilitate handling.
  13. The apparatus according to claim 12, wherein said reset switch (106) comprises means for stopping the repeat or catch-up operation and jumping to real time when said switch is on at the time of repeating or catch-up, and then switching the current mode to the through mode.
  14. The apparatus according to claim 2, wherein said speed conversion means are provided with software (11) executed by a digital signal processor (1) having an input terminal for receiving an interrupt request signal from the outside, so that a command for the speed conversion process or a switching of the speed conversion rate based on said speed conversion switch (113) is sent to said digital signal processor (1) via said interrupt request signal input terminal.
  15. The apparatus according to claim 2, further comprising means for hearing said output speech through a binaural earphone (325).
  16. The apparatus according to claim 2, further comprising:
    a microphone (321) for converting a sound signal into an electric signal,
    an analog amplifier (10) for amplifying an output of said microphone (321),
    a low-pass filter (7) for removing high-frequency components from the output of said analog amplifier (10),
    an A/D converter (5) for converting the analog output signal of said low-pass filter (7) into a digital signal,
    a digital signal processor (1) for executing the speed changing process,
    means (113) for changing a processing parameter,
    a D/A converter (6) for converting the digital speech data into an analog value,
    a second low-pass filter (8) for removing high-frequency components from the output of said D/A converter (6),
    a second analog amplifier for amplifying the output of said second low-pass filter (8), and
    an earphone (325) for converting the output of said second analog amplifier into a sound signal and delivering the sound signal to both ears.
  17. The apparatus according to claim 16, wherein said speed conversion means carry out a series of processes on an entire frame repeatedly through a frame-by-frame pipeline process using a plurality of input frame buffers, said series of processes comprising the steps of:
    applying a pitch extraction process to a front portion of the frame to detect the pitch of the front portion,
    transferring data having a length of one pitch thus detected to output buffers,
    multiplying data having a length of two pitches by a window function which changes from 0 to 1, and by a window function which changes from 1 to 0,
    adding up the respective data obtained by the multiplications by the window functions to produce a reproduced wave pattern having a duration of two pitches,
    inserting the reproduced wave pattern behind the previously transferred data having a length of one pitch,
    performing a pitch detection process again while advancing to a position at a distance of two pitches from the position previously subjected to the pitch extraction process, so as to carry out pitch detection at said position, and
    transferring to the output buffers data having a length of n pitches (n being an integer) on the basis of the pitch length obtained by the final pitch detection.
  18. The apparatus according to claim 17, wherein said speed conversion process is executed only if the average power of the data in an input frame is greater than a threshold set in advance, the data contained in said frame being transferred directly to the output buffers if said average power is less than said threshold.
  19. The apparatus according to claim 18, wherein a second threshold is provided in the threshold process for the average power of the data in the input frame so that, when frames having an average power below said second threshold continue for a time longer than a time threshold set in advance, the data in the frames having an average power below the second threshold and continuing for longer than said time threshold are prohibited from being transferred to the output buffers.
  20. The apparatus according to any one of claims 2 to 15, wherein said switch or each of said switches has a soft touch so that the microphone (321) does not pick up the click noise of the switch.
  21. The apparatus according to claim 20, wherein said switch or each of said switches has a respectively different surface shape to the touch so that the switches can be identified without being seen.
  22. The apparatus according to any one of claims 2 to 21, further comprising rustle prevention means for changing the distance between a microphone (321) and the apparatus body (101) so that said microphone (321) does not touch clothing directly when said apparatus body (101) is used in a breast pocket.
  23. The apparatus according to claim 2, further comprising display means arranged at a predetermined position of said speed conversion apparatus for visually indicating the amount of time delay from real time.
  24. The apparatus according to any one of claims 2 to 23, wherein a ring buffer (24) is used as said storage means (2), and said apparatus further comprises means for managing the delay time by a counter indicating the time delay on said ring buffer (24).
  25. The apparatus according to claim 16, wherein a standby mode, in which the clock of the processor (1) is slowed down and the same process as in the through mode is performed, is provided in addition to the through mode.
  26. The apparatus according to claim 2, further comprising an electric source switch (109) operated in three stages consisting of an ON stage, an OFF stage and an ON-OFF intermediate stage, and electric source supply means operated in an analog through mode in which the analog input/output systems are short-circuited so as to be connected directly to each other, to stop the supply of electric power to a digital processing system between said analog input/output systems when said switch is set to said intermediate stage.
  27. A telephone comprising the speech speed conversion apparatus of claim 2 between the handset and the telephone body.
  28. A telephone line switching system comprising the speech speed conversion apparatus of claim 2.
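
The frame processing recited in claims 17 and 18 can be illustrated with a short Python sketch. It is a simplified reading, not the claimed procedure itself: pitch is estimated by autocorrelation (the claims do not name a detector), the inserted cross-faded segment is one pitch period long, the unprocessed tail of the frame is simply copied, and every name and parameter (`mean_power`, `detect_pitch`, `expand_two_periods`, `convert_frame`, `min_lag`, `max_lag`, `power_threshold`) is hypothetical.

```python
import math


def mean_power(frame):
    """Average squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_pitch(segment, min_lag, max_lag):
    """Return the lag (in samples) with the largest mean autocorrelation.

    Autocorrelation is an assumption; the claims do not name a pitch detector.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(segment) - lag
        score = sum(segment[i] * segment[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def expand_two_periods(frame, start, pitch):
    """Turn two pitch periods into three by inserting a cross-faded period."""
    a = frame[start:start + pitch]
    b = frame[start + pitch:start + 2 * pitch]
    inserted = []
    for i in range(pitch):
        up = i / pitch    # window rising from 0 towards 1
        down = 1.0 - up   # window falling from 1 towards 0
        # Starts like b (which follows a) and ends like a (which precedes b).
        inserted.append(b[i] * down + a[i] * up)
    return a + inserted + b


def convert_frame(frame, power_threshold, min_lag, max_lag):
    """Slow down one frame; low-power frames pass through unchanged."""
    if mean_power(frame) <= power_threshold:
        return list(frame)
    out, pos = [], 0
    while len(frame) - pos >= 3 * max_lag:
        pitch = detect_pitch(frame[pos:pos + 3 * max_lag], min_lag, max_lag)
        out.extend(expand_two_periods(frame, pos, pitch))
        pos += 2 * pitch  # advance two pitch periods, then detect again
    out.extend(frame[pos:])  # remainder copied unchanged
    return out
```

In this sketch each processed pair of pitch periods becomes three, so the processed part of a voiced frame is stretched by a factor of about 1.5 while the pitch period itself is left unchanged. A frame whose mean power does not exceed `power_threshold` is copied straight to the output, in the spirit of claim 18.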
EP19940114160 1993-09-10 1994-09-08 Method and apparatus for speech speed conversion Expired - Lifetime EP0643380B1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP22544993 1993-09-10
JP22544993 1993-09-10
JP225449/93 1993-09-10
JP167232/94 1994-07-19
JP16723294A JPH07129190A (ja) 1993-09-10 1994-07-19 Speech speed conversion method, speech speed conversion apparatus, and electronic device
JP16723294 1994-07-19

Publications (3)

Publication Number Publication Date
EP0643380A2 EP0643380A2 (fr) 1995-03-15
EP0643380A3 EP0643380A3 (fr) 1995-05-24
EP0643380B1 EP0643380B1 (fr) 1999-11-24

Family

ID=26491341

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19940114160 Expired - Lifetime EP0643380B1 (fr) 1993-09-10 1994-09-08 Method and apparatus for speech speed conversion

Country Status (4)

Country Link
EP (1) EP0643380B1 (fr)
JP (1) JPH07129190A (fr)
CA (1) CA2131730A1 (fr)
DE (1) DE69421774T2 (fr)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1517299A3 (fr) * 1997-04-30 2012-08-29 Nippon Hoso Kyokai Speech interval detection method and system, and speech speed modification method and system using the speech interval detection method and system
JP2990265B1 (ja) * 1998-08-03 1999-12-13 北陸先端科学技術大学院大学長 Hearing aid and method for setting its frequency characteristics
JP4367808B2 (ja) 1999-12-03 2009-11-18 富士通株式会社 Audio data compression/decompression apparatus and method
JP2001255894A (ja) * 2000-03-13 2001-09-21 Sony Corp Playback speed conversion apparatus and method
JP4513163B2 (ja) * 2000-04-07 2010-07-28 ソニー株式会社 Speech speed conversion device and speaker device
JP2001290500A (ja) * 2000-04-07 2001-10-19 Sony Corp Speech speed conversion device, speaker device, and television receiver
JP5336026B2 (ja) * 2000-08-10 2013-11-06 トムソン ライセンシング Memory addressing method for a system with a variable-speed data playback function
US7149412B2 (en) * 2002-03-01 2006-12-12 Thomson Licensing Trick mode audio playback
US7260035B2 (en) 2003-06-20 2007-08-21 Matsushita Electric Industrial Co., Ltd. Recording/playback device
EP1840877A4 (fr) * 2005-01-18 2008-05-21 Fujitsu Ltd Speech speed changing method and speech speed changing device
DE102005021524A1 (de) * 2005-05-10 2006-11-16 Siemens Ag Method and device for entering characters into a data processing system
JP4876245B2 (ja) * 2006-02-17 2012-02-15 国立大学法人九州大学 Consonant processing device, speech information transmission device, and consonant processing method
WO2007145079A1 (fr) * 2006-06-12 2007-12-21 Kazuo Ishikawa Repeat playback learning machine and program
JP5176391B2 (ja) * 2007-05-24 2013-04-03 ヤマハ株式会社 Audio transmission device
US8840400B2 (en) 2009-06-22 2014-09-23 Rosetta Stone, Ltd. Method and apparatus for improving language communication
JP6181921B2 (ja) * 2012-11-20 2017-08-16 日本放送協会 Audio playback device, speech synthesis playback device, and programs therefor
JP6904255B2 (ja) * 2015-10-19 2021-07-14 ソニーグループ株式会社 Information processing system and program
CN110364177A (zh) * 2019-07-11 2019-10-22 努比亚技术有限公司 Speech processing method, mobile terminal, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4630301B1 (en) * 1985-06-04 1999-09-07 Well Made Toy Mfg Co Voice activated echo generator
DE4227826C2 (de) * 1991-08-23 1999-07-22 Hitachi Ltd Digital processing device for acoustic signals

Also Published As

Publication number Publication date
CA2131730A1 (fr) 1995-03-11
EP0643380A2 (fr) 1995-03-15
JPH07129190A (ja) 1995-05-19
EP0643380A3 (fr) 1995-05-24
DE69421774T2 (de) 2000-08-10
DE69421774D1 (de) 1999-12-30

Similar Documents

Publication Publication Date Title
US5717818A (en) Audio signal storing apparatus having a function for converting speech speed
EP0643380B1 (fr) Method and apparatus for speech speed conversion
KR100283421B1 (ko) Speech speed conversion method and apparatus therefor
CN110870201B (zh) Audio signal adjustment method, device, storage medium, and terminal
US9531338B2 (en) Signal processing apparatus, signal processing method, program, signal processing system, and communication terminal
KR100916726B1 (ko) Hearing threshold measuring apparatus and method, and audio signal output apparatus and method using the same
CN111508531B (zh) Audio processing method and device
CN112185324B (zh) Tuning method, device, storage medium, smart device, and tuning system
JP2012083746A (ja) Sound processing device
CN111837179A (zh) System and method for capturing noise for pattern recognition processing
CN110286874A (zh) Processing method and electronic device
US7233200B2 (en) AGC circuit, AGC circuit gain control method, and program for the AGC circuit gain control method
JP3367592B2 (ja) Automatic gain adjustment device
EP2849341A1 (fr) Volume control for audio rendering of an audio signal
CN110660376B (zh) Audio processing method, device, and storage medium
WO2022135071A1 (fr) Playback control method and apparatus for earphones, electronic device, and storage medium
JPH09311696A (ja) Automatic gain adjustment device
JP2837639B2 (ja) Remote controller
CA3161269A1 (fr) Apparatus and method for automatic volume control with ambient noise compensation
JPH1049191A (ja) Speech speed conversion device
CN112532788A (zh) Audio playback method, terminal, and storage medium
JPH11308062A (ja) Automatic volume adjustment device for an audio output device
KR101696997B1 (ko) Device for automatically adjusting output sound level according to noise using a codec with built-in DSP
JPH09146587A (ja) Speech speed conversion device
CN212724693U (zh) Miniaturized home karaoke system

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE DK NL SE

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE DK NL SE

17P Request for examination filed

Effective date: 19951109

17Q First examination report despatched

Effective date: 19980930

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE DK NL SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: THE PATENT HAS BEEN ANNULLED BY A DECISION OF A NATIONAL AUTHORITY

Effective date: 19991124

REF Corresponds to:

Ref document number: 69421774

Country of ref document: DE

Date of ref document: 19991230

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20000224

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20010401

NLV4 Nl: lapsed or annulled due to non-payment of the annual fee

Effective date: 20010401

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20040903

Year of fee payment: 11

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060401