GB2213623A - Phoneme recognition - Google Patents

Phoneme recognition

Info

Publication number
GB2213623A
GB2213623A GB8828532A
Authority
GB
United Kingdom
Prior art keywords
phoneme
transient
voice signal
detecting
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB8828532A
Other versions
GB2213623B (en)
GB8828532D0 (en)
Inventor
Makoto Akabane
Yoichiro Sako
Atsunobu Hiraiwa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP62310569A external-priority patent/JP2643202B2/en
Priority claimed from JP32330787A external-priority patent/JPH01165000A/en
Priority claimed from JP62331656A external-priority patent/JPH01170998A/en
Application filed by Sony Corp filed Critical Sony Corp
Publication of GB8828532D0 publication Critical patent/GB8828532D0/en
Publication of GB2213623A publication Critical patent/GB2213623A/en
Application granted granted Critical
Publication of GB2213623B publication Critical patent/GB2213623B/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection

Description

PHONEME RECOGNITION AND VOICE SIGNAL STATUS DETECTION SYSTEMS

This invention relates to phoneme recognition and voice signal status detection systems, which may form phoneme segment information for segmenting input speech into phoneme segments, particularly for phoneme recognition in speech recognition.
Phoneme recognition is the basis of the recognition of continuous speech and large vocabulary speech. Objective input speech must be segmented into phoneme segments for phoneme recognition. For example, when a syllable "SU" is pronounced, the sound waveform can be segmented into a phoneme segment of the consonant "S" and a phoneme segment of the vowel "U".
A method of obtaining a segment boundary by comparing the power or zero-crossing rate of speech with a threshold has been used as a method of phonemic segmentation. However, it has been difficult to achieve accurate phonemic segmentation simply by comparing the power or zero-crossing rate of speech with a threshold, because the setting of the threshold is difficult. According to this method, a transient detection parameter is compared with a threshold to detect a transient part which is greater than the threshold, and a stationary part which is smaller than the threshold. The principal object of the transient detection parameter is the detection of a point where the speech spectrum varies most sharply, namely, a peak point. Therefore, it is difficult to measure the transient state and the stationary state through the simple application of the transient detection parameter. Since it is difficult to set the threshold, it is accordingly difficult to discriminate accurately between the stationary part and the transient part.
According to an aspect of the invention there is provided a system for detecting voice signal status, the system comprising:
sound analysing means arranged to receive an input voice signal for acoustically analysing said input voice signal and for providing a speech spectrum thereof;
means receiving said speech spectrum from said sound analysing means for deriving transient detection parameters;
means receiving said derived transient detection parameters for generating the difference between two successive transient detection parameters;
first detecting means receiving said difference for detecting a stationary part of said input voice signal; and
second detecting means receiving said difference for detecting a transient part of said input voice signal.
According to another aspect of the invention there is provided a system for recognising phoneme boundaries in a voice signal, the system comprising:
means arranged to receive an input voice signal for acoustically analysing said input voice signal and for generating a plurality of acoustic parameters;
means receiving a plurality of said acoustic parameters for detecting feature points of said acoustic parameters;
means for producing phoneme segment boundary candidates and phoneme boundary characteristics corresponding to each of said phoneme segment boundary candidates, based on said detected feature points; and
means for recognising phoneme boundaries in said input voice signal according to said phoneme segment boundary candidates and said phoneme boundary characteristics corresponding thereto.

A preferred embodiment of the present invention, to be described in greater detail hereinafter, provides a phoneme recognition system capable of detecting the stationary part, the transient part, and an undecided part (namely, a part which is neither stationary nor transient) of input speech, at higher accuracy, from transient detection parameters each equivalent to the sum of variance of the frequency channels within a block on the time axis, and from the difference between successive transient detection parameters.

The preferred embodiment also provides a phoneme recognition system capable of forming phoneme segment information by obtaining phoneme segment boundary candidates from feature point information, including the rising points, falling points and peak points of a plurality of phoneme segment parameters obtained through the sound analysis of input speech, and the phoneme boundary characteristics of the boundary candidates, including a rise from silence and consonant-to-vowel and vowel-to-vowel transitions, and which is capable of discriminating the phoneme segment accurately and efficiently on the basis of the peak feature points of the transient detection parameters, by using each transient detection parameter as one of the phoneme segment parameters.
According to a further aspect of the invention there is provided a system comprising a sound analyser for acoustically analysing an input voice signal and for providing a speech spectrum thereof, a first generator for generating a transient detection parameter from the speech spectrum, a second generator for generating the difference between two adjacent transient detection parameters, and a detector for detecting stationary portions and transient portions of the input voice signal according to the generated difference between the adjacent two of the transient detection parameters.
The system may further comprise another generator for generating a plurality of acoustic parameters of the input voice signal such as a logarithm power spectrum and a zero cross rate, another detector for detecting feature points of the acoustic parameters such as rising points, increasing points, peak points and the like, so that the system may provide phoneme segment boundary features of the input voice signal.
The invention will now be described by way of example with reference to the accompanying drawings, throughout which like parts are referred to by like references, and in which:
Figure 1 is a block diagram of a phoneme recognition system according to an embodiment of the invention;
Figure 2 illustrates an example of a waveform of an input voice signal and a transient detection parameter corresponding thereto;
Figure 3 shows examples of feature points;
Figure 4 is a more detailed block diagram of a stationary and transient portion detector shown in Figure 1;
Figure 5 shows examples of waveforms of an input voice signal and acoustic parameters thereof;
Figure 6 is a more detailed block diagram of a phoneme boundary candidate generator shown in Figure 1;
Figure 7 is a table showing the relationship of phoneme boundary characteristics to acoustic parameters and feature points;
Figure 8 shows the relative priorities of each acoustic parameter; and
Figure 9 is a flow chart of the operation for detecting the stationary and transient portions of the input voice signal.
A phoneme recognition system in accordance with an embodiment of the present invention obtains phoneme segment information on the basis of the peak feature points of transient detection parameters. Prior to description of the phoneme recognition system, the transient detection parameters will be explained.
As an example, when the syllable "SU" is pronounced, a voice waveform as shown in Figure 2A is obtained. Thus, the syllable "SU" can be phonemised into a consonant "S" and a vowel "U". As can be seen from the voice waveform, a phoneme boundary exists in a transient part of the voice waveform where the phoneme changes. Accurate phoneme recognition can be achieved by recognising the phoneme in a stationary part of a phoneme segment.
Use of transient detection parameters is an effective means of detecting the transient state and the stationary state.
The transient detection parameter is represented by the variation of a speech spectrum defined by the sum of variance within a block on the time axis of each frequency channel.
That is, the speech spectrum Si(n) is first gain-normalised by the average Savg(n) in the direction of frequency:

    Savg(n) = (1/q) Σ(i=1 to q) Si(n)    ...(1)

where i is the channel number, and q is the number of channels. The information relating to each of the q channels is sampled on the time axis. A block of information concerning the q channels at the same time point is designated a frame. In expression (1), n is the frame number of a frame for recognition.

The gain-normalised voice spectrum Ŝi(n) is expressed by

    Ŝi(n) = Si(n) / Savg(n)    ...(2)

A transient detection parameter T(n) is represented by the sum of variance on the time axis of each channel within the block [n-M, n+M], which comprises the (2M+1) frames made up of frame n and the M frames before and after it:

    T(n) = Σ(i=1 to q) Σ(j=-M to M) {Ŝi(n+j) - Ai(n)}²    ...(3)

    Ai(n) = Σ(j=-M to M) Ŝi(n+j) / (2M+1)    ...(4)

where Ai(n) is the average on the time axis within the block of each channel.
In particular, since the variation in the central part of the [n-M, n+M] block is liable to pick up fluctuation in sound and noise, the expression (3) is developed into an expression (5) to eliminate the variation in the central part in calculating the transient detection parameter T(n).
    T(n) = 1/(2q(M-m+1)) [ Σ(i=1 to q) Σ(j=-M to -m) {Ŝi(n+j) - Ai(n)}^2a + Σ(i=1 to q) Σ(j=m to M) {Ŝi(n+j) - Ai(n)}^2a ]    ...(5)

The transient detection parameter T(n) may be determined, for example, by substituting a = 1, M = 28, m = 3 and q = 32 into the expression (5). In the case of the input speech "SU", a transient detection parameter as shown in Figure 2B is obtained.
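The normalisation of expressions (1) and (2) and the parameter of expression (5) can be sketched in Python as follows; the array layout and the function names are ours, not the patent's, and the expression (5) form is reconstructed from partly illegible text:

```python
import numpy as np

def normalise(S):
    """Gain-normalise the spectrum: S^i(n) = Si(n) / Savg(n), expressions (1)-(2).
    S is a (q, N) array: q frequency channels by N frames."""
    return S / S.mean(axis=0, keepdims=True)

def transient_parameter(S_hat, n, M=5, m=2, a=1):
    """Transient detection parameter T(n), expression (5).
    The central frames (|j| < m) are excluded to suppress fluctuation
    from sound and noise."""
    q = S_hat.shape[0]
    j = np.arange(-M, M + 1)
    block = S_hat[:, n + j]                    # the [n-M, n+M] block
    A = block.mean(axis=1, keepdims=True)      # Ai(n), expression (4)
    dev = (block - A) ** (2 * a)
    keep = (j <= -m) | (j >= m)                # j in [-M, -m] and [m, M] only
    return dev[:, keep].sum() / (2 * q * (M - m + 1))
```

A perfectly stationary spectrum gives T(n) = 0, and a spectral step inside the block gives a positive T(n), which is the behaviour the segmentation relies on.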
The peak points of the transient detection parameter T(n) are stable features in the transient parts. Determination of phoneme boundary candidates on the basis of the transient detection parameters T(n) enables the avoidance of erroneous selection of phoneme boundary candidates. The system embodying the present invention particularly utilises such characteristics of the transient detection parameters.
Figure 1 shows a phoneme recognition system according to a preferred embodiment of the present invention, which is equipped with a phoneme segment information forming apparatus. A speech signal generated by a microphone 1 is transmitted through an amplifier 2 and a low-pass filter 3 for limiting bandwidth to an analog-to-digital (A/D) converter 4 which samples the speech signal, for example, at a sampling frequency of 12.5 kHz to convert the analog speech signal into a digital speech signal which is then supplied to a sound analysing unit 5.
The sound analysing unit 5 comprises a band pass filter bank 51 and a sound analyser 54. The band pass filter bank 51 may comprise, for example, thirty-two channels of digital band pass filters 511₀, 511₁, 511₂, ... 511₃₁. The digital band pass filters 511₀, 511₁, ... 511₃₁ may, for example, be Butterworth digital filters of fourth degree having equal division bands of a bandwidth between 250 Hz and 5.5 kHz on a logarithmic axis. The output signals of the digital band pass filters 511₀, 511₁, ... 511₃₁ are applied to rectifiers 512₀, 512₁, ... 512₃₁, respectively. The output signals of the rectifiers 512₀, 512₁, ... 512₃₁ are applied to digital low-pass filters 513₀, 513₁, ... 513₃₁, respectively. The digital low-pass filters 513₀, 513₁, ... 513₃₁ may, for example, be FIR low-pass filters having a cutoff frequency of 52.8 Hz. The output signals of the digital low-pass filters 513₀, 513₁, ... 513₃₁ are applied to a sampler 52. The sampler 52 samples the output signals of the digital low-pass filters 513₀, 513₁, ... 513₃₁ at a frame period of 5.12 ms. Thus a sample time series forming the speech spectrum Si(n) (i = 1, 2, ... 32; n = 1, 2, ... N (frame number)) is obtained.
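The filter bank stage can be approximated with SciPy as a minimal sketch; this is our own approximation, not the patent's implementation (the patent's filters are fourth-degree Butterworth IIR and an unspecified FIR low-pass, and the function and constant names here are assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfilt, firwin, lfilter

FS = 12500                 # sampling frequency of the A/D converter, Hz
Q = 32                     # number of band pass channels
# band edges equally divided between 250 Hz and 5.5 kHz on a logarithmic axis
EDGES = np.logspace(np.log10(250.0), np.log10(5500.0), Q + 1)

def filter_bank_spectrum(x, frame_period=0.00512):
    """Band-pass -> rectify -> low-pass -> sample, per channel."""
    lp = firwin(101, 52.8, fs=FS)              # FIR low-pass, 52.8 Hz cutoff
    hop = int(round(frame_period * FS))        # 64 samples = one 5.12 ms frame
    channels = []
    for lo, hi in zip(EDGES[:-1], EDGES[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=FS, output='sos')
        y = np.abs(sosfilt(sos, x))            # band-pass, then rectify
        y = lfilter(lp, 1.0, y)                # smooth the rectified envelope
        channels.append(y[::hop])              # one sample per frame
    return np.array(channels)                  # (q, N) array: Si(n)
```

Second-order sections (`output='sos'`) are used instead of a plain transfer function for numerical robustness in the narrow low-frequency bands.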
The output signal of the sampler 52, namely, the sample time series Si(n), is applied to a normalisation circuit 53 to obtain a time series Ŝi(n) of the normalised speech spectrum.
The sample time series Ŝi(n) of the speech spectrum provided by the normalisation circuit 53 is applied to a transient detection parameter computing unit 6, which executes computation by using expression (5) to obtain the transient detection parameters T(n). In the computation using expression (5), for example, M = 5 and m = 2 (which are smaller than the M = 28 and m = 3 used in the foregoing computation) may be used to detect the transient parts and the stationary parts and to reduce the number of computations.
The transient detection parameter T(n) for an input speech "ASA", for instance, is shown in Figure 5A. Figure 5G is the waveform of the input speech signal.
The sound analyser 54 of this embodiment comprises a logarithmic power detector 541 for detecting the logarithmic power of the input speech signal, a zero-crossing rate computer 542, a computer 543 for computing a primary PARCOR coefficient indicating the degree of correlation between the successive samples, a computer 544 for computing the inclination of the power spectrum, and a pitch period detector 545 for detecting the pitch period of the input speech signal. The detected pitch period is applied to a phoneme recognising unit 10.
In the computation of these parameters, namely, the logarithmic power, the zero-crossing rate, the primary PARCOR coefficient, the inclination of power spectrum and the pitch period, a window having a time width corresponding to M frames before a time point (frame) and M frames after the time point is shifted successively by one sampling point at a time on the time axis to generate the parameters by carrying out computation within each window. These parameters are given to a sampler 55, which samples the parameters at the same sampling pulses as those for the sampler 52. Accordingly, the sampler 55 provides parameters of analysed information in the same time series as that for the speech spectrum Si(n).
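Two of these sliding-window parameters, the logarithmic power and the zero-crossing rate, can be sketched as follows; the window length, hop and scaling are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def frame_parameters(x, frame_len=128, hop=64):
    """Per-frame logarithmic power and zero-crossing rate computed over a
    window slid along the input signal x."""
    log_power, zcr = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        w = x[start:start + frame_len]
        log_power.append(10.0 * np.log10(np.mean(w ** 2) + 1e-12))
        # fraction of adjacent sample pairs whose signs differ
        zcr.append(np.mean(np.abs(np.diff(np.sign(w))) > 0))
    return np.array(log_power), np.array(zcr)
```

A higher-frequency input yields a higher zero-crossing rate, which is what makes this parameter useful for separating fricatives such as "S" from vowels.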
Figures 5B, 5C, 5D and 5E respectively show the logarithmic power, the zero-crossing rate, the primary PARCOR coefficient and the inclination of the power spectrum thus obtained. Figure 5F shows the pitch period of the speech.
The parameters thus obtained by the sound analysing unit 5 are fed as parameters for a recognising process to the phoneme recognising unit 10. The transient detection parameters T(n) computed by the transient detection parameter computing unit 6 and the parameters determined by the sound analysing unit 54 excluding the pitch period are fed to a feature point extracting unit 7.
The feature point extracting unit 7 extracts general feature points to obtain phoneme boundary candidates from the parameters for segmentation. In this example, the following seven feature points (1) to (7), as shown in Figure 3, are used.
(1) Rising point.
(2) Falling point.
(3) Increasing turning point.
(4) Decreasing turning point.
(5) Peak point.
(6) Positive zero-crossing point.
(7) Negative zero-crossing point.
The feature point extracting unit 7 extracts the feature points of the parameters with reference to feature point information provided by a feature point information storage unit 71. In Figures 5A to 5E, positions on the time axis indicated by vertical lines are the feature points of the parameters. For example, peak points (5) may be extracted as the feature points of the transient detection parameters T(n), and rising points (1), falling points (2), increasing turning points (3) and decreasing turning points (4) may be extracted as the feature points of the parameters of the logarithmic power and the zero-crossing rate.
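Two of the simpler feature point extractors, peak points (5) and rising points (1), can be sketched as follows; the strict-inequality peak test and the threshold-crossing rise test are our own minimal definitions, not the patent's stored feature point information:

```python
def peak_points(p):
    """Feature point (5): local maxima of a parameter time series p."""
    return [n for n in range(1, len(p) - 1)
            if p[n] > p[n - 1] and p[n] > p[n + 1]]

def rising_points(p, thresh):
    """Feature point (1): frames where p first climbs above thresh."""
    return [n for n in range(1, len(p))
            if p[n] >= thresh and p[n - 1] < thresh]
```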
The feature point information obtained by the feature point extracting unit 7 is applied to a phoneme boundary candidate forming unit 9, which determines phoneme boundary candidates on the basis of the transient detection parameters T(n) and extracts the features of the phoneme boundary candidates.
The phoneme boundary candidate forming unit 9 makes reference to a decision output provided by a transient part, stationary part and undecided part deciding unit 8. The deciding unit 8 receives the transient detection parameters T(n) from the transient detection parameter computing unit 6, and the peak feature point information on the transient detection parameters T(n) from the feature point extracting unit 7, and then the deciding unit 8 decides which are the undecided parts, belonging to neither the transient parts nor the stationary parts of the input speech.
The transient part, stationary part and undecided part deciding unit 8 is shown in Figure 4 as comprising a difference computing unit 80, a parameter memory 81, a difference memory 82, a stationary part deciding unit 83, a transient point detecting unit 84, a transient part deciding unit 85 and an undecided part deciding unit 86.
The transient detection parameters T(n) provided by the transient detection parameter computing unit 6 are applied to the difference computing unit 80 to compute the difference dT(n) between successive transient detection parameters:

    dT(n) = T(n+1) - T(n)    ...(6)

The parameter memory 81 stores the transient detection parameters T(n) provided by the transient detection parameter computing unit 6, and the difference memory 82 stores the difference dT(n).
The deciding operation will now be described.
(i) The stationary part deciding unit 83 sends a search signal to the memories 81 and 82 to read the transient detection parameters T(n) and the difference dT(n) sequentially from the memories 81 and 82, and decides a segment to be a stationary part when the segment meets the condition

    T(n) ≤ Ts1    ...(7)

or

    T(n) ≤ Ts2 (Ts1 < Ts2) and |dT(n)| ≤ d0    ...(8)

where Ts1, Ts2 and d0 are set thresholds, for example, Ts1 = 1.0, Ts2 = 1.5 and d0 = 0.1.
(ii) The transient point detecting unit 84 detects the peak points of the transient detection parameters T(n) (Figure 5A) from the feature point extracting unit 7, regards the peak points as transient points each representing the centre of a transient part, and then gives position information (frame numbers) about the transient points to the transient part deciding unit 85.
(iii) The transient part deciding unit 85 sends a search signal based on the transient point to the difference memory 82 to read the difference dT(n). The past differences are searched backwards with respect to time from the transient point as a basic point (hereinafter, this mode of search will be referred to as "backward search"), and a segment having a difference dT(n) meeting the condition

    dT(n) ≥ d1 (d1 is a threshold)    ...(9)

is decided to be a backward transient part. For example, d1 = 0.2.
(iv) In the backward search, when a segment meeting the expression (9) overlaps a stationary part decided by the stationary part deciding unit 83, only the portion of the segment after the stationary part is regarded as a transient part.
(v) Then, the transient part deciding unit 85 makes a search forwards with respect to time (hereinafter, this mode of search is referred to as "forward search") from the transient point as a basic point and decides a segment having a dT(n) meeting the inequality

    dT(n) ≤ -d1    ...(10)

to be a forward transient part.
(vi) In the forward search, when the segment meeting the expression (10) overlaps a stationary part, a portion of the segment immediately before the stationary part is regarded as a transient part.
(vii) A transient part having its centre on a transient point is detected from the backward transient part and the forward transient part. The foregoing procedure is executed for all the transient points to discriminate all the transient parts.
(viii) Then, the undecided part deciding unit 86 makes reference to the respective decision outputs of the stationary part deciding unit 83 and the transient part deciding unit 85 and decides segments that are neither a stationary part nor a transient part to be undecided parts. In Figure 5A, parts indicated by thick solid lines are transient parts, parts indicated by thin solid lines are stationary parts, and parts indicated by broken lines are undecided parts.
The decision output of the undecided part deciding unit 86 is supplied, together with the respective decision outputs of the stationary part deciding unit 83 and the transient part deciding unit 85, to the phoneme boundary candidate forming unit 9.
For phoneme recognition, attention is directed particularly to the stationary parts among the data included in the decision output of the deciding unit 8 applied to the phoneme recognising unit 10; the undecided parts are ignored to achieve accurate phoneme recognition, because the undecided parts are factors of variation. A computer may be employed for carrying out the foregoing operation. Figure 9 is a flow chart showing procedures for deciding the stationary part, the transient part and the undecided part.
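The deciding procedure can be sketched in Python as follows; the thresholds are the example values given above, the exact forms of conditions (7) to (10) are reconstructed from partly illegible text, and all names are our own:

```python
def difference(T):
    """dT(n) = T(n+1) - T(n), expression (6)."""
    return [T[n + 1] - T[n] for n in range(len(T) - 1)]

def stationary_frames(T, dT, Ts1=1.0, Ts2=1.5, d0=0.1):
    """Conditions (7)/(8): T(n) <= Ts1, or T(n) <= Ts2 with |dT(n)| <= d0."""
    return {n for n in range(len(dT))
            if T[n] <= Ts1 or (T[n] <= Ts2 and abs(dT[n]) <= d0)}

def transient_part(dT, peak, d1=0.2, stationary=frozenset()):
    """Grow a transient part around a peak frame: backward search while
    dT(n) >= d1 (expression (9)), forward search while dT(n) <= -d1
    (expression (10)), stopping where a stationary part begins."""
    lo = peak
    while lo - 1 >= 0 and dT[lo - 1] >= d1 and lo - 1 not in stationary:
        lo -= 1
    hi = peak
    while hi < len(dT) and dT[hi] <= -d1 and hi + 1 not in stationary:
        hi += 1
    return lo, hi

def undecided_frames(N, stationary, transients):
    """Frames belonging to neither a stationary nor a transient part."""
    trans = {n for lo, hi in transients for n in range(lo, hi + 1)}
    return [n for n in range(N) if n not in stationary and n not in trans]
```

Running the three deciders in this order (stationary, then transient around each peak, then undecided) mirrors the Figure 9 flow.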
The phoneme boundary candidate forming unit 9 will be described hereinafter with reference to Figure 6.
The phoneme boundary candidate forming unit 9 determines phoneme boundary candidates. The following eight phoneme boundary characteristics are used.
(1) Rise from silence (S-R).
(2) Consonant-to-vowel transition (C-V).
(3) Consonant-to-consonant transition (C-C).
(4) Vowel-to-vowel transition (V-V).
(5) Vowel-to-fall transition (V-F).
(6) Vowel-to-consonant transition (V-C).
(7) Fall-to-silence transition (F-S).
(8) Sound-to-silence transition (S-S).
A phoneme boundary characteristics information storage unit 91 stores data representing these eight phoneme boundary characteristics. A phoneme boundary candidate and characteristics discriminating unit 93 discriminates phoneme boundary characteristics of phoneme boundary candidates with reference to information fetched from the phoneme boundary characteristics information storage unit 91. In Figure 7, the phoneme boundary characteristics data are represented by the symbols S-R, C-C, C-V and the like. Also shown in Figure 7 are sound parameters constituting phoneme boundaries, and the numbers (1), (2), (3), ... of the feature points extracted by the feature point extracting unit 7 shown in Figure 3. Each of the phoneme boundary characteristics may correspond to a plurality of sound parameters and feature points.
A reference priority information storage unit 92 stores reference priority information of the sound parameters as shown in Figure 8, in which the priority of the right-hand parameter is higher than that of the left-hand parameter.
A phoneme boundary candidate and characteristics discriminating unit 93 collects the feature points of the parameters obtained by the feature point extracting unit 7 to decide a phoneme boundary candidate, and determines the phoneme boundary characteristics of the phoneme boundary candidate, even when feature points are dislocated or undetected for some of the parameters.
In this operation, the discriminating unit 93 makes reference to the transient part decision output provided by the deciding unit 8. The discriminating unit 93 regards the transient point in the transient part, namely, the peak feature point of the transient detection parameter, as the first phoneme boundary candidate, and examines the feature points of the other sound parameters in the vicinity of the transient point to determine a phoneme boundary candidate. In this operation, the discriminating unit 93 decides the reference priority of each parameter with reference to the reference priority information provided by the storage unit 92, and discriminates a phoneme boundary characteristic corresponding to the feature point of the sound parameter regarded as the phoneme boundary candidate with reference to the phoneme boundary characteristics information provided by the storage unit 91.
Thus, the discriminating unit 93 discriminates the phoneme boundary characteristics of C-V, C-C, V-V, V-F, V-C and F-S.
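The priority-based discrimination can be sketched as follows; the table contents are hypothetical stand-ins for the storage units 91 and 92 (the patent's actual tables are in Figures 7 and 8 and are not reproduced in this text), and every name below is our own:

```python
# Hypothetical priority list: rightmost entry has the highest priority,
# matching the convention stated for Figure 8.
PRIORITY = ["zero_cross", "parcor", "slope", "log_power", "transient"]

# Hypothetical (parameter, feature point) -> boundary characteristic table.
CHARACTERISTICS = {
    ("log_power", "rising"): "S-R",
    ("transient", "peak"): "C-V",
    ("log_power", "falling"): "S-S",
}

def discriminate(nearby_features):
    """Pick the boundary characteristic whose parameter has the highest
    priority among feature points found near a transient point."""
    best = None
    for param, fp in nearby_features:
        if (param, fp) in CHARACTERISTICS:
            if best is None or PRIORITY.index(param) > PRIORITY.index(best[0]):
                best = (param, fp)
    return CHARACTERISTICS[best] if best else None
```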
Another feature discriminating unit 94 makes reference to the transient part decision output provided by the deciding unit 8 to search for further feature points before the transient point other than the phoneme boundary candidate discriminated by the discriminating unit 93. If any feature point is found, the feature discriminating unit 94 discriminates the phoneme boundary characteristics of C-V and C-C by using the detected feature point. The discriminating unit 94 deals with the following cases. For example, a bilabial voiced plosive "BA" has little stationary part between transient parts, the two transient parts being close to each other, and hence only one feature point can be detected. Therefore, a peak feature point which must originally be before the transient point is detected from the feature point of another parameter. A feature point after the transient point is not searched for because, in the Japanese language, a vowel is preceded by a consonant, and the peak of the vowel is higher than that of the consonant.
Naturally, different languages differ in the expected position of a feature point to be searched for, and hence a method suitable for the specific language is applied to searching for the feature point.
A sound/silence discriminating unit 95 receives the stationary part decision output of the deciding unit 8, and discriminates between the stationary part of sound and the stationary part of silence from the feature point information about the logarithmic power and the zero-crossing rate.
An S-R/S-S discriminating unit 96 receives the sound/silence discrimination output of the sound/silence discriminating unit 95 and the feature point information about the logarithmic power and the zero-crossing rate, and then discriminates between the phoneme boundary characteristic S-R of rise from silence and the phoneme boundary characteristic S-S of transition from sound to silence.
The results of discrimination of the discriminating units 93, 94 and 96 are given to a phoneme boundary candidate deciding unit 97. Then, the phoneme boundary candidate deciding unit 97 applies collectively the position (frame) of the phoneme boundary candidate and the phoneme boundary characteristics obtained by the discriminating units 93, 94 and 96 to the phoneme recognising unit 10. The phoneme boundary candidate and the phoneme boundary features of the specific example are shown under the speech waveform shown in Figure 5G.
In this example, a transient part characteristics output unit 98 receives the phoneme boundary characteristics from the phoneme boundary candidate deciding unit 97, and the transient part decision output from the deciding unit 8. Then, the transient part characteristics output unit 98 gives a phoneme boundary characteristic of the transient part including the phoneme boundary to the phoneme recognising unit 10.
The phoneme recognising unit 10 carries out phoneme recognition by using the parameters provided by the sound analysing unit 5 and making reference to the phoneme segment information provided by the phoneme boundary candidate forming unit 9. Then, the phoneme recognising unit 10 determines a phoneme symbol and gives the phoneme symbol, for example, to a continuous speech and large vocabulary speech recognising unit (not shown).

The hardware part of this embodiment, namely, the feature point extracting unit 7, the transient part, stationary part and undecided part deciding unit 8, the phoneme boundary candidate forming unit 9, and the operating elements of the sound analysing unit 5, may be substituted by computer software. Thus, the phoneme recognition system extracts feature points expected to be phoneme boundaries from a plurality of parameters obtained through sound analysis, and decides a phoneme segment candidate from the data of the feature points of the plurality of parameters. Accordingly, more accurate phoneme segment information can readily be obtained. Furthermore, since the phoneme segment information includes the characteristics of the phoneme segment candidate, phoneme recognition can easily be achieved. Further, since the phoneme boundary candidate is decided on the basis of the peak point of the transient detection parameter, which is a stable feature point in a transient part of the input speech, the selection of an erroneous phoneme boundary candidate is obviated.
Since the difference between the transient detection parameters is calculated, the stationary part is decided on the basis of the transient detection parameters and the difference, and the transient part is decided on the basis of the difference, instead of deciding the stationary part and the transient part through the simple comparison of the transient detection parameters with a threshold; accurate decision of the stationary part and the transient part is thus achieved.
Furthermore, since the system identifies a segment which is neither a stationary part nor a transient part to be an undecided part, phoneme segment decision and phoneme recognition can be achieved by using segments excluding the undecided parts, which are the factors of variation, and the undecided part decision output.

Claims (6)

1. A system for detecting voice signal status, the system comprising:
sound analysing means arranged to receive an input voice signal for acoustically analysing said input voice signal and for providing a speech spectrum thereof;
means receiving said speech spectrum from said sound analysing means for deriving transient detection parameters;
means receiving said derived transient detection parameters for generating the difference between two successive transient detection parameters;
first detecting means receiving said difference for detecting a stationary part of said input voice signal; and
second detecting means receiving said difference for detecting a transient part of said input voice signal.
2. A system according to claim 1, comprising third detecting means receiving output signals from said first detecting means and said second detecting means for detecting an undecided part of said input voice signal, said undecided part being neither said stationary part nor said transient part.
3. A system according to claim 1 or claim 2, comprising peak detecting means for detecting a peak of said transient detection parameters.
4. A system for detecting voice signal status, the system being substantially as herein described with reference to the accompanying drawings.
5. A system for recognising phoneme boundaries in a voice signal, the system comprising:
means arranged to receive an input voice signal for acoustically analysing said input voice signal and for generating a plurality of acoustic parameters; means receiving a plurality of said acoustic parameters for detecting feature points of said acoustic parameters; means for producing phoneme segment boundary candidates and phoneme boundary characteristics corresponding to each of said phoneme segment boundary candidates, based on said detected feature points; and means for recognising phoneme boundaries in said input voice signal according to said phoneme segment boundary candidates and said phoneme boundary characteristics corresponding thereto.
6. A system according to claim 5, wherein said plural acoustic parameters include transient detection parameters and said feature point detecting means is operable to detect a peak point of a said transient detection parameter as one of said feature points, said producing means being operable to produce said phoneme segment boundary candidates and phoneme boundary characteristics corresponding thereto only within a predetermined period from said detected peak point.
7. A system for recognising phoneme boundaries in a voice signal, the system being substantially as herein described with reference to the accompanying drawings.
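Claims 5 and 6 above can be illustrated with a short sketch: peaks of a transient detection parameter serve as feature points, and phoneme segment boundary candidates are accepted only within a predetermined period around each detected peak. All names, the peak threshold, and the window size below are assumptions for illustration, not details from the patent.

```python
def detect_peaks(t, threshold=0.5):
    """Return frame indices where t has a local maximum above threshold.

    t -- transient detection parameter values, one per frame (assumed input).
    """
    return [n for n in range(1, len(t) - 1)
            if t[n] > threshold and t[n - 1] < t[n] >= t[n + 1]]

def boundary_candidates(candidates, peaks, window=3):
    """Keep only candidate boundary frames within `window` frames of a peak.

    This mirrors the claim-6 restriction that candidates are produced only
    within a predetermined period from a detected peak; `window` is hypothetical.
    """
    return [c for c in candidates
            if any(abs(c - p) <= window for p in peaks)]
```

Candidates surviving this filter would then be passed, with their boundary characteristics, to the phoneme boundary recognition stage of claim 5.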
Published 1989 at The Patent Office, State House, 66/71 High Holborn, London WC1R 4TP. Further copies may be obtained from The Patent Office, Sales Branch, St Mary Cray, Orpington, Kent BR5 3RD. Printed by Multiplex techniques ltd, St Mary Cray, Kent. Con. 1/87
GB8828532A 1987-12-08 1988-12-07 Voice signal status detection systems Expired - Lifetime GB2213623B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP62310569A JP2643202B2 (en) 1987-12-08 1987-12-08 Detection device for steady, transient and uncertain parts of input speech
JP32330787A JPH01165000A (en) 1987-12-21 1987-12-21 Vocal sound section information forming apparatus
JP62331656A JPH01170998A (en) 1987-12-25 1987-12-25 Phoneme section information generating device

Publications (3)

Publication Number Publication Date
GB8828532D0 GB8828532D0 (en) 1989-01-11
GB2213623A true GB2213623A (en) 1989-08-16
GB2213623B GB2213623B (en) 1991-07-24

Family

ID=27339113

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8828532A Expired - Lifetime GB2213623B (en) 1987-12-08 1988-12-07 Voice signal status detection systems

Country Status (5)

Country Link
KR (1) KR0136608B1 (en)
AU (1) AU612737B2 (en)
DE (1) DE3841376A1 (en)
FR (1) FR2624297B1 (en)
GB (1) GB2213623B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0538536A1 (en) * 1991-10-25 1993-04-28 International Business Machines Corporation Method for detecting voice presence on a communication line
US5884260A (en) * 1993-04-22 1999-03-16 Leonhard; Frank Uldall Method and system for detecting and generating transient conditions in auditory signals
EP1293961A1 (en) * 1998-03-13 2003-03-19 LEONHARD, Frank Uldall A signal processing method to analyse transients of a speech signal
ITMI20110103A1 (en) * 2011-01-28 2012-07-29 Universal Multimedia Access S R L PROCEDURE AND MEANS OF SCANDING AND / OR SYNCHRONIZING AUDIO / VIDEO EVENTS

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10317502A1 (en) * 2003-04-16 2004-11-18 Daimlerchrysler Ag Evaluation method, e.g. for analysis of sound signals, evaluating a sound signal through a band-pass filter, with the sound signal in the frequency range of the first band-pass filter

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5782896A (en) * 1980-11-12 1982-05-24 Hitachi Ltd Continuous voice recognition system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0538536A1 (en) * 1991-10-25 1993-04-28 International Business Machines Corporation Method for detecting voice presence on a communication line
US5255340A (en) * 1991-10-25 1993-10-19 International Business Machines Corporation Method for detecting voice presence on a communication line
US5884260A (en) * 1993-04-22 1999-03-16 Leonhard; Frank Uldall Method and system for detecting and generating transient conditions in auditory signals
EP1293961A1 (en) * 1998-03-13 2003-03-19 LEONHARD, Frank Uldall A signal processing method to analyse transients of a speech signal
ITMI20110103A1 (en) * 2011-01-28 2012-07-29 Universal Multimedia Access S R L PROCEDURE AND MEANS OF SCANDING AND / OR SYNCHRONIZING AUDIO / VIDEO EVENTS
WO2012101586A1 (en) * 2011-01-28 2012-08-02 Universal Multimedia Access S.R.L. Process and means for scanning and/or synchronizing audio/video events
US8903524B2 (en) 2011-01-28 2014-12-02 Universal Multimedia Access S.R.L. Process and means for scanning and/or synchronizing audio/video events

Also Published As

Publication number Publication date
GB2213623B (en) 1991-07-24
FR2624297B1 (en) 1992-01-24
DE3841376A1 (en) 1989-06-22
AU612737B2 (en) 1991-07-18
AU2661788A (en) 1989-06-22
GB8828532D0 (en) 1989-01-11
FR2624297A1 (en) 1989-06-09
KR0136608B1 (en) 1998-11-16
KR890010791A (en) 1989-08-10

Similar Documents

Publication Publication Date Title
US4821325A (en) Endpoint detector
US4736429A (en) Apparatus for speech recognition
US5526466A (en) Speech recognition apparatus
US4956865A (en) Speech recognition
US5621850A (en) Speech signal processing apparatus for cutting out a speech signal from a noisy speech signal
US5097509A (en) Rejection method for speech recognition
US4592085A (en) Speech-recognition method and apparatus for recognizing phonemes in a voice signal
Lu et al. Content analysis for audio classification and segmentation
US4481593A (en) Continuous speech recognition
EP0237934B1 (en) Speech recognition system
KR950013551B1 (en) Noise signal predictting dvice
US4074069A (en) Method and apparatus for judging voiced and unvoiced conditions of speech signal
US4677673A (en) Continuous speech recognition apparatus
GB2107101A (en) Continous word string recognition
US4665548A (en) Speech analysis syllabic segmenter
US4937870A (en) Speech recognition arrangement
EP0200347A1 (en) Knowledge-guided automatic speech recognition apparatus and method
US4885791A (en) Apparatus for speech recognition
US6470311B1 (en) Method and apparatus for determining pitch synchronous frames
WO1997040491A1 (en) Method and recognizer for recognizing tonal acoustic sound signals
GB2213623A (en) Phoneme recognition
EP0474496B1 (en) Speech recognition apparatus
EP0192898B1 (en) Speech recognition
EP0310636B1 (en) Distance measurement control of a multiple detector system
US5852799A (en) Pitch determination using low time resolution input signals

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 19951207