AU612737B2 - A phoneme recognition system - Google Patents

A phoneme recognition system

Info

Publication number
AU612737B2
AU612737B2 (application AU26617/88A; also published as AU2661788A)
Authority
AU
Australia
Prior art keywords
transient
phoneme
detecting
unit
detection parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU26617/88A
Other versions
AU2661788A (en)
Inventor
Makoto Akabane
Atsunobu Hiraiwa
Yoichiro Sako
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP62310569A external-priority patent/JP2643202B2/en
Priority claimed from JP32330787A external-priority patent/JPH01165000A/en
Priority claimed from JP62331656A external-priority patent/JPH01170998A/en
Application filed by Sony Corp filed Critical Sony Corp
Publication of AU2661788A publication Critical patent/AU2661788A/en
Application granted granted Critical
Publication of AU612737B2 publication Critical patent/AU612737B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

S&F Ref: 71806

FORM
COMMONWEALTH OF AUSTRALIA
PATENTS ACT 1952
COMPLETE SPECIFICATION
(ORIGINAL)
FOR OFFICE USE:
Class: Int. Class:
Complete Specification Lodged: Accepted: Published:
Priority:
Related Art:

Name and Address of Applicant: Sony Corporation, 7-35, Kitashinagawa 6-chome, Shinagawa-ku, Tokyo, JAPAN

Address for Service: Spruson & Ferguson, Patent Attorneys, Level 33 St Martins Tower, 31 Market Street, Sydney, New South Wales, 2000, Australia

Complete Specification for the invention entitled: A Phoneme Recognition System

The following statement is a full description of this invention, including the best method of performing it known to me/us
ABSTRACT
This invention is directed to an improved phoneme recognition system. The improved system comprises a parameter generator for generating a plurality of acoustic parameters including a transient detection parameter corresponding to an input voice signal, a detector for detecting feature points of the acoustic parameters, a generator for generating the difference between adjacent two of the transient detection 0 0 parameters, and another detector for detecting stationary and transient o parts of the input voice signal according to the generated difference so o 0 that the system may provide phoneme segmentation more precisely.
TITLE OF INVENTION

A Phoneme Recognition System

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a phoneme recognition system which forms phoneme segment information essential to segmenting input speech into phoneme segments, particularly for phoneme recognition in speech recognition.
Description of the Prior Art

Phoneme recognition is the basis of the recognition of continuous speech and large-vocabulary speech. Objective input speech must be segmented into phoneme segments for phoneme recognition.
For example, when the syllable "SU" is pronounced, the sound waveform can be segmented into a phoneme segment of a consonant "S" and a phoneme segment of a vowel "U". A method of obtaining a segment boundary by comparing the power or zero-crossing rate of speech with a threshold has been used as a method of phonemic segmentation.
However, it has been difficult to achieve accurate phonemic segmentation simply by comparing the power or zero-crossing rate of speech with a threshold, because the setting of the threshold is difficult.
A transient detection parameter, which will be described afterward, is compared with a threshold to detect a transient part, which is greater than the threshold, and a stationary part, which is smaller than the threshold.
The principal object of the transient detection parameter T(n) is the detection of a point where the speech spectrum varies most sharply, namely, a peak point. Therefore it is difficult to measure the transient state and the stationary state through the simple application of the transient detection parameter. That is, it is difficult to set the threshold, and hence it is difficult to discriminate accurately between the stationary part and the transient part.
SUMMARY OF THE INVENTION

Accordingly, it is an object of the present invention to provide a phoneme recognition system capable of providing information for more accurate phonemic segmentation.
According to one aspect of the present invention there is disclosed a system for detecting the transient and stationary status of a voice signal, comprising: sound analyzing means receiving an input voice signal for acoustically analyzing said input voice signal and for providing a speech spectrum thereof; means receiving said speech spectrum from said sound analysing means for deriving transient detection parameters therefrom based on a predetermined relationship of the sum of variance of the speech spectrum on a time base axis; means receiving said transient detection parameters for detecting feature points contained therein by comparing said transient detection parameters with predetermined feature point information; detecting means receiving said transient detection parameters and a predetermined one of said detected feature points for detecting a stationary part, a transient part, and an undecided part in said transient detection parameters; and means for forming a phoneme boundary candidate from said detected feature points and from said detected stationary part and detected transient part by discriminating the detected feature points, and producing a phoneme boundary candidate fed to a phoneme recognizing unit.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of an embodiment of a phoneme recognition system according to the invention;
Soo BRIEF DESCRIPTION OF THE DRAWINGS o00 Fig. 1 is a block diagram of an embodiment of a phoneme recognition 0 0 0 system according to the invention; 0O 00 0 0 0000 a0 0 0 0 0 oooo 0 0 0 0 0 0 0 9 0 0 to so o ,020 HRF/0487y 4 Fig. 2 illustrates an example of waveform of an input voice signal and transient detection parameter corresponding thereto; Fig. 3 shows an example of feature points; Fig. 4 is a more detailed block diagram of the stationary and transient portion detector shown in Fig. 1; Fig. 5 is an example of waveforms of an input voice signal and 9o acoustic parameters thereof; Fig. 6 is a more detailed block diagram of a phoneme boundary candidate generator shown in Fig. 1; Fig. 7 is a table showing relationship of phoneme boundary 0 characteristic to acoustic parameters and feature points; Fig. 8 shows relative priorities of each acoustic parameter; and oo Fig. 9 is a flow chart of the operation for detecting the sta- Soo tionary and transient portion of the input voice signal.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A phoneme recognition system in accordance with the present invention obtains phoneme segment information on the basis of the peak feature points of transient detection parameters. Prior to the description of the phoneme recognition system, the transient detection parameters will be explained.
For example, when the syllable "SU" is pronounced, a voice waveform as shown in Fig. 2A is obtained. Thus, the syllable "SU" can be phonemized into a consonant "S" and a vowel "U". As is obvious from the voice waveform, a phoneme boundary exists in a transient part of the voice waveform where the phoneme changes. Accurate phoneme recognition can be achieved by recognizing the phoneme in a stationary part of a phoneme segment.
Use of transient detection parameters is an effective means for detecting the transient state and the stationary state.
The transient detection parameter is represented by the variation of a speech spectrum, defined by the sum of variance within a block on the time axis of each channel (frequency).
That is, first the gain of the speech spectrum Si(n) is normalized by the average Savg(n) in the direction of frequency:

    Savg(n) = (1/q) Σ_{i=1}^{q} Si(n)    (1)

where i is the channel number and q is the number of channels. The information about each of the q channels is sampled on the time axis. A block of information about the q channels at the same time point is designated as a frame. In expression (1), n is the number of a frame for recognition.
The gain-normalized voice spectrum Ŝi(n) is expressed by

    Ŝi(n) = Si(n) − Savg(n)    (2)

A transient detection parameter T(n) is represented by the sum of variance on the time axis of each channel within a block [n−M, n+M], which consists of (2M+1) frames: the M frames before and the M frames after frame n.

    T(n) = Σ_{i=1}^{q} Σ_{j=−M}^{M} (Ŝi(n+j) − Ai(n))²    (3)

    Ai(n) = (1/(2M+1)) Σ_{j=−M}^{M} Ŝi(n+j)    (4)

where Ai(n) is the average on the time axis within the block of each channel.
Particularly, since the variation in the central part of the [n−M, n+M] block is liable to pick up the fluctuation of sound and noise, expression (3) is modified into expression (5) to eliminate the variation in the central part in calculating the transient detection parameter T(n):

    T(n) = (1/(2q(M−m+1))) Σ_{i=1}^{q} [ Σ_{j=−M}^{−m} |Ŝi(n+j) − Ai(n)|^{2a} + Σ_{j=m}^{M} |Ŝi(n+j) − Ai(n)|^{2a} ]    (5)

The transient detection parameter T(n) is determined, for example, by substituting a = 1, M = 28, m = 3 and q = 32 into expression (5). In the case of the input speech, a transient detection parameter as shown in Fig. 2B is obtained.
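As a concrete illustration, expressions (1) to (5) can be sketched in a few lines of NumPy. The function names and array shapes below are illustrative rather than part of the specification; the spectrum S is assumed to hold q channels by N frames of log-magnitude values.

```python
import numpy as np

def normalize(S_log):
    """Expressions (1)-(2): subtract the per-frame average over channels."""
    return S_log - S_log.mean(axis=0, keepdims=True)

def transient_parameter(S, M=5, m=2, alpha=1):
    """Compute T(n) of expression (5) for a normalized spectrum S of shape (q, N).

    Frames whose [n-M, n+M] block is incomplete are skipped (T = 0 there).
    """
    q, N = S.shape
    T = np.zeros(N)
    norm = 2 * q * (M - m + 1)
    j = np.arange(-M, M + 1)
    keep = np.abs(j) >= m            # exclude the central frames per expression (5)
    for n in range(M, N - M):
        block = S[:, n - M:n + M + 1]            # shape (q, 2M+1)
        A = block.mean(axis=1, keepdims=True)     # A_i(n), expression (4)
        dev = np.abs(block - A) ** (2 * alpha)
        T[n] = dev[:, keep].sum() / norm
    return T
```

Gain normalization per expression (2) removes any change common to all channels, so only changes in spectral shape contribute to T(n), which is the behavior the specification relies on.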
The peak points of the transient detection parameter T(n) are stable features of the transient parts. Determination of phoneme boundary candidates on the basis of the transient detection parameters T(n) enables the avoidance of erroneous selection of phoneme boundary candidates. The present invention particularly utilizes these characteristics of the transient detection parameters.
A preferred embodiment of the present invention will be described hereinafter with reference to the accompanying drawings. Shown in Fig. 1 is a phoneme recognition system in a preferred embodiment according to the present invention, equipped with a phoneme segment information forming apparatus.
A speech signal generated by a microphone 1 is transmitted through an amplifier 2 and a low-pass filter 3 for limiting bandwidth to an A/D converter 4, which samples the speech signal, for example, at a sampling frequency of 12.5 kHz to convert the speech signal into a digital speech signal, and gives the digital speech signal to a sound analyzing unit 5.

The sound analyzing unit 5 comprises a band pass filter bank 51 and a sound analyzer 54. The band pass filter bank 51 comprises, for example, thirty-two channels of digital band pass filters 511_0, 511_1, 511_2, ..., 511_31. The digital band pass filters are, for example, fourth-order Butterworth digital filters whose pass bands divide the bandwidth between 250 Hz and 5.5 kHz into equal divisions on a logarithmic axis. The output signals of the digital band pass filters are applied to rectifiers 512_0, 512_1, ..., 512_31, respectively. The output signals of the rectifiers are applied to digital low-pass filters 513_0, 513_1, ..., 513_31, respectively. The digital low-pass filters are, for example, FIR low-pass filters having a cutoff frequency of 52.8 Hz. The output signals of the digital low-pass filters are applied to a sampler 52. The sampler 52 samples the output signals of the digital low-pass filters at a frame period of 5.12 msec. Thus a sample time series, namely, a speech spectrum Si(n) (i = 1, 2, ..., 32; n = 1, 2, ..., N, where N is the frame number), is obtained.
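The layout of the 32-band analyzer can be sketched as follows. This is only an approximation: a moving average stands in for the fourth-order Butterworth band-pass filters and the 52.8 Hz FIR low-pass stage of the embodiment, and the function names are illustrative.

```python
import numpy as np

def band_edges(n_bands=32, f_lo=250.0, f_hi=5500.0):
    """Edges of n_bands bands of equal width on a logarithmic frequency axis."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def channel_envelope(x, win=64):
    """Rectify one band-pass output and smooth it -- a moving-average
    stand-in for the rectifier + low-pass stage of each channel."""
    r = np.abs(x)                       # full-wave rectification
    kernel = np.ones(win) / win
    return np.convolve(r, kernel, mode="same")
```

Sampling each smoothed envelope every 5.12 ms would then yield the spectrum time series Si(n) described above.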
The output signal of the sampler 52, namely, the sample time series Si(n), is applied to a normalization circuit 53 to obtain a time series Ŝi(n) of the normalized speech spectrum.
The sample time series Ŝi(n) of the speech spectrum provided by the normalization circuit 53 is applied to a transient detection parameter computing unit 6, which executes the computation of expression (5) to obtain the transient detection parameters T(n). In this computation, for example, M = 5 and m = 2, which are smaller than the M = 28 and m = 3 used in the foregoing computation, are used to detect transient parts and stationary parts and to reduce the computational quantity.
The transient detection parameter T(n) for an input speech "ASA", for instance, is shown in Fig. 5A. Fig. 5G is the waveform of the input speech signal.
The sound analyzer 54 of this embodiment comprises a logarithmic power detector 541 for detecting the logarithmic power of the input speech signal, a zero-crossing rate computer 542, a computer 543 for computing a primary PARCOR coefficient indicating the degree of correlation between successive samples, a computer 544 for computing the inclination of the power spectrum, and a pitch period detector 545 for detecting the pitch period of the input speech signal. The pitch period is applied to a phoneme recognizing unit 10.

In the computation of these parameters, namely, the logarithmic power, the zero-crossing rate, the primary PARCOR coefficient, the inclination of the power spectrum and the pitch period, a window having a time width corresponding to M frames before a time point (frame) and M frames after the time point is shifted successively by one sampling point at a time on the time axis, and the parameters are generated by carrying out the computation within each window. These parameters are given to a sampler 55, which samples the parameters at the same sampling pulses as those for the sampler 52. Accordingly, the sampler 55 provides parameters of analyzed information in the same time series as that for the speech spectrum Si(n).
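Two of the windowed parameters, the logarithmic power and the zero-crossing rate, are simple enough to sketch directly. The window length and hop below are placeholders, not the embodiment's values.

```python
import numpy as np

def log_power(frame, eps=1e-10):
    """Logarithmic power of one analysis window."""
    return np.log10(np.sum(frame ** 2) + eps)

def zero_crossing_rate(frame):
    """Fraction of sign changes between successive samples in the window."""
    s = np.sign(frame)
    s[s == 0] = 1
    return np.mean(s[1:] != s[:-1])

def sliding_params(x, win, hop):
    """Evaluate both parameters on windows shifted by `hop` samples."""
    out = []
    for start in range(0, len(x) - win + 1, hop):
        f = x[start:start + win]
        out.append((log_power(f), zero_crossing_rate(f)))
    return out
```

A high-frequency segment (such as the fricative "S") yields a markedly higher zero-crossing rate than a vowel, which is what makes this parameter useful for boundary detection.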
Figs. 5B, 5C, 5D and 5E show the logarithmic power, the zero-crossing rate, the primary PARCOR coefficient and the inclination of the power spectrum thus obtained, respectively.
Fig. 5F shows the speech pitch, namely, the pitch period of the speech.
The parameters thus obtained by the sound analyzing unit 5 are fed as parameters for the recognition process to the phoneme recognizing unit 10. The transient detection parameters T(n) computed by the transient detection parameter computing unit 6, and the parameters determined by the sound analyzer 54 excluding the pitch period, are fed to a feature point extracting unit 7.
The feature point extracting unit 7 extracts general feature points to obtain phoneme boundary candidates from the parameters for segmentation. In this example, the following seven feature points (1) to (7), as shown in Fig. 3, are used.

(1) Rising point
(2) Falling point
(3) Increasing turning point
(4) Decreasing turning point
(5) Peak point
(6) Positive zero-crossing point
(7) Negative zero-crossing point

The feature point extracting unit 7 extracts the feature points of the parameters with reference to feature point information provided by a feature point information storage unit 71. In Figs. 5A to 5F, positions on the time axis indicated by vertical lines are the feature points of the parameters.
For example, peak points (5) are extracted as the feature points of the transient detection parameters, and rising points (1), falling points (2), increasing turning points (3) and decreasing turning points (4) are extracted as the feature points of the parameters of the logarithmic power and the zero-crossing rate.
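A minimal sketch of two of these feature-point detectors follows. The strict rise-then-fall test used here is one plausible reading of Fig. 3; the stored feature point information of the storage unit 71 is not reproduced, and the function names are illustrative.

```python
def peak_points(p):
    """Indices where p rises then falls -- feature point (5), the peak point."""
    return [n for n in range(1, len(p) - 1) if p[n - 1] < p[n] >= p[n + 1]]

def rising_points(p, thr):
    """Indices where p first exceeds thr -- feature point (1), the rising point."""
    return [n for n in range(1, len(p)) if p[n - 1] < thr <= p[n]]
```

Applied to the transient detection parameter T(n), `peak_points` yields the transient points used below; applied to the logarithmic power, `rising_points` marks candidate rises from silence.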
The feature point information obtained by the feature point extracting unit 7 is applied to a phoneme boundary candidate forming unit 9, which determines phoneme boundary candidates on the basis of the transient detection parameters T(n) and extracts the features of the phoneme boundary candidates.
The phoneme boundary candidate forming unit 9 makes reference to a decision output provided by a transient part, stationary part and undecided part deciding unit 8. The deciding unit 8 receives the transient detection parameters T(n) from the transient detection parameter computing unit 6, and the peak feature point information on the transient detection parameters T(n) from the feature point extracting unit 7; the deciding unit 8 then decides the undecided parts belonging to neither the transient parts of the input speech nor the stationary parts of the input speech.
Shown in Fig. 4 is the transient part, stationary part and undecided part deciding unit 8, comprising a difference calculating unit 80, a parameter memory 81, a difference memory 82, a stationary part deciding unit 83, a transient point detecting unit 84, a transient part deciding unit 85 and an undecided part deciding unit 86.
The transient detection parameters T(n) provided by the transient detection parameter computing unit 6 are applied to the difference computing unit 80 to compute the difference dT(n) between successive transient detection parameters:

    dT(n) = T(n+1) − T(n)    (6)

The parameter memory 81 stores the transient detection parameters T(n) provided by the transient detection parameter computing unit 6, and the difference memory 82 stores the difference dT(n).
The deciding operation will be described hereinafter.
(i) The stationary part deciding unit 83 sends a search signal to the memories 81 and 82 to read the transient detection parameters T(n) and the difference dT(n) sequentially from the memories 81 and 82, and decides a segment to be a stationary part when the segment meets

    T(n) ≤ Ts1    (7)

or

    T(n) ≤ Ts2 (Ts1 < Ts2) and |dT(n)| ≤ d0    (8)

where Ts1, Ts2 and d0 are preset thresholds; for example, Ts1 = 1.0 and d0 = 0.1.
(ii) The transient point detecting unit 84 detects the peak points of the transient detection parameters T(n) (Fig. 5A) from the feature point extracting unit 7, regards the peak points as transient points each representing the center of a transient part, and then gives position information (frame numbers) about the transient points to the transient part deciding unit 85.

(iii) The transient part deciding unit 85 sends a search signal having its basic point on the transient point to the difference memory 82 to read the difference dT(n). The past difference is searched backward with respect to time from the transient point as a basic point (hereinafter, this mode of search will be referred to as "backward search"), and a segment having a difference dT(n) meeting

    dT(n) ≥ d1 (d1 is a threshold)    (9)

is decided to be a rear transient part. For example, d1 = 0.2.

(iv) In the backward search, when a segment meeting expression (9) overlaps a stationary part decided by the stationary part deciding unit 83, the portion of the segment up to the stationary part is regarded as the transient part.

(v) Then, the transient part deciding unit 85 makes a search forward with respect to time (hereinafter, this mode of search is referred to as "forward search") from the transient point as a basic point, and decides a segment having a dT(n) meeting the inequality

    dT(n) ≤ −d1    (10)

to be a forward transient part.
(vi) In the forward search, when the segment meeting expression (10) overlaps a stationary part, the portion of the segment immediately before the stationary part is regarded as a transient part.
(vii) A transient part having its center on a transient point is detected from the rear transient part and the forward transient part. The foregoing procedure is executed for all the transient points to discriminate all the transient parts.
(viii) Then, the undecided part deciding unit 86 makes reference to the respective decision outputs of the stationary part deciding unit 83 and the transient part deciding unit 85, and decides segments decided to be neither a stationary part nor a transient part to be undecided parts. In Fig. 5A, parts indicated by thick solid lines are transient parts, parts indicated by thin solid lines are stationary parts, and parts indicated by broken lines are undecided parts.
The decision output of the undecided part deciding unit 86 is given, together with the respective decision outputs of the stationary part deciding unit 83 and the transient part deciding unit 85, to the phoneme boundary candidate forming unit 9.
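The decision procedure of steps (i) through (viii) can be sketched as follows. This simplification applies the threshold of expression (7) only (the Ts2 branch of expression (8) is omitted), and the function name and one-letter labels are illustrative.

```python
def classify_parts(T, peaks, Ts1=1.0, d1=0.2):
    """Label each frame 'S' (stationary), 'T' (transient) or 'U' (undecided).

    T is the transient detection parameter sequence; peaks are its peak
    points (the transient points of step (ii)).
    """
    N = len(T)
    dT = [T[n + 1] - T[n] for n in range(N - 1)]   # expression (6)
    label = ['U'] * N
    # (i) stationary frames: small T(n), expression (7)
    for n in range(N):
        if T[n] <= Ts1:
            label[n] = 'S'
    # (ii)-(vi) grow a transient part around each peak point
    for p in peaks:
        label[p] = 'T'
        n = p - 1                  # backward search: rising slope, dT >= d1
        while n >= 0 and dT[n] >= d1 and label[n] != 'S':
            label[n] = 'T'
            n -= 1
        n = p                      # forward search: falling slope, dT <= -d1
        while n < N - 1 and dT[n] <= -d1 and label[n + 1] != 'S':
            label[n + 1] = 'T'
            n += 1
    # (viii) whatever is neither stationary nor transient stays 'U'
    return label
```

The backward and forward loops stop at a stationary frame, mirroring steps (iv) and (vi) where an overlapping stationary part truncates the transient segment.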
Attention is directed particularly to the stationary parts among the data included in the decision output of the deciding unit 8 applied to the phoneme recognizing unit 10 for phoneme recognition, and the undecided parts are ignored in phoneme recognition to achieve accurate phoneme recognition, because the undecided parts are factors of variation. A computer may be employed for carrying out the foregoing operation. Fig. 9 is a flow chart showing the procedures for deciding the stationary part, the transient part and the undecided part.
The phoneme boundary candidate forming unit 9 will be described hereinafter with reference to Fig. 6.
The phoneme boundary candidate forming unit 9 determines phoneme boundary candidates. The following eight phoneme boundary characteristics are used.
(1) Rise from silence (S-R)
(2) Consonant-to-vowel transition (C-V)
(3) Consonant-to-consonant transition (C-C)
(4) Vowel-to-vowel transition (V-V)
(5) Vowel-to-fall transition (V-F)
(6) Vowel-to-consonant transition (V-C)
(7) Fall-to-silence transition (F-S)
(8) Sound-to-silence transition (S-S)

A phoneme boundary characteristics information storage unit 91 stores these eight phoneme boundary characteristics data. A phoneme boundary candidate and characteristics discriminating unit 93 discriminates the phoneme boundary characteristics of phoneme boundary candidates with reference to information fetched from the phoneme boundary characteristics information storage unit 91. In Fig. 7, the phoneme boundary characteristics data are represented by the symbols S-R, C-C, C-V and the like. Also shown in Fig. 7 are the sound parameters constituting the phoneme boundaries, and the numbers of the feature points extracted by the feature point extracting unit 7 shown in Fig. 3.
Each of the phoneme boundary characteristics may correspond to a plurality of sound parameters and feature points.
A reference priority information storage unit 92 stores reference priority information of the sound parameters as shown in Fig. 8, in which the priority of the right-hand parameter is higher than that of the left-hand parameter.
The phoneme boundary candidate and characteristics discriminating unit 93 collects the feature points of the parameters to decide a phoneme boundary candidate, and determines the phoneme boundary characteristics of the phoneme boundary candidate, since the feature points obtained by the feature point extracting unit 7 may be dislocated or undetected depending on the parameters.
In this operation, the discriminating unit 93 makes reference to the transient part decision output provided by the deciding unit 8. The discriminating unit 93 regards the transient point in the transient part, namely, the peak feature point of the transient detection parameter, as the first phoneme boundary candidate, and examines the feature points of the other sound parameters in the vicinity of the transient point to determine a phoneme boundary candidate. In this operation, the discriminating unit 93 decides the reference priority of each parameter with reference to the reference priority information provided by the storage unit 92, and discriminates a phoneme boundary characteristic corresponding to the feature point of the sound parameter regarded as the phoneme boundary candidate, with reference to the phoneme boundary characteristics information provided by the storage unit 91.
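The priority-driven matching performed by the discriminating unit 93 can be sketched as below. The priority list, the tolerance window and the function name are assumptions for illustration, since the actual contents of Fig. 8 and the storage unit 92 are given only in the drawings.

```python
def pick_boundary(transient_point, feature_points, priority, tol=3):
    """Choose the phoneme boundary candidate nearest the transient point.

    feature_points maps a parameter name to its feature-point frames;
    `priority` lists parameter names from highest to lowest priority
    (the actual order of Fig. 8 is not reproduced here).
    """
    for param in priority:
        near = [n for n in feature_points.get(param, [])
                if abs(n - transient_point) <= tol]
        if near:
            # take the feature point of the highest-priority parameter
            # that lies close enough to the transient point
            return min(near, key=lambda n: abs(n - transient_point)), param
    # fall back to the transient point itself when nothing matches
    return transient_point, 'transient'
```

Reordering the priority list changes which parameter's feature point wins when several lie within the tolerance window, which is exactly the role of the reference priority information of Fig. 8.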
Thus, the discriminating unit 93 discriminates the phoneme boundary characteristics of C-V, C-C, V-V, V-F, V-C and F-S.
Another feature discriminating unit 94 makes reference to the transient part decision output provided by the deciding unit 8 to search for further feature points before the transient point other than the phoneme boundary candidate discriminated by the discriminating unit 93.
If any feature point is found, the feature discriminating unit 94 discriminates the phoneme boundary characteristics of C-V and C-C by using the feature point. The discriminating unit 94 deals with the following cases.
For example, a bilabial voiced plosive "BA" has little stationary part between transient parts; the two transient parts are close to each other, and hence only one feature point can be detected. Therefore, a peak feature point which must originally be before the transient point is detected from the feature point of another parameter. A feature point after the transient point is not searched for because, in the Japanese language, a vowel is preceded by a consonant and the peak of the vowel is higher than that of the consonant.
Naturally, different languages differ from each other in the expected position of a feature point to be searched for, and hence a method suitable for a specific language is applied to searching for the feature point.
A sound/silence discriminating unit 95 receives the stationary part decision output of the deciding unit 8, and discriminates between the stationary part of sound and the stationary part of silence from the feature point information about the logarithmic power and the zero-crossing rate.
An S-R/S-S discriminating unit 96 receives the sound/silence discrimination output of the sound/silence discriminating unit 95 and the feature point information about the logarithmic power and the zero-crossing rate, and then discriminates between the phoneme boundary characteristic S-R of rise from silence and the phoneme boundary characteristic S-S of transition from sound to silence.
The results of discrimination of the discriminating units 93, 94 and 96 are given to a phoneme boundary candidate deciding unit 97.
Then, the phoneme boundary candidate deciding unit 97 applies collectively the position (frame) of the phoneme boundary candidate and the phoneme boundary characteristics obtained by the discriminating units 93, 94 and 96 to the phoneme recognizing unit 10. The phoneme boundary candidates and the phoneme boundary characteristics of the example shown in Fig. 5 are shown under the speech waveform in Fig. 5. In this example, a transient part characteristics output unit 98 receives the phoneme boundary characteristics from the phoneme boundary candidate deciding unit 97, and the transient part decision output from the deciding unit 8. Then, the transient part characteristics output unit 98 gives the phoneme boundary characteristics of the transient part including the phoneme boundary to the phoneme recognizing unit 10.

The phoneme recognizing unit 10 carries out phoneme recognition by using the parameters provided by the sound analyzing unit 5 and making reference to the phoneme segment information provided by the phoneme boundary candidate forming unit 9. Then, the phoneme recognizing unit 10 determines a phoneme symbol and gives the phoneme symbol, for example, to a continuous speech and large-vocabulary speech recognizing unit, not shown.
The hardware of this embodiment, namely, the feature point extracting unit 7; the transient part, stationary part and undecided part deciding unit 8; the phoneme boundary candidate forming unit 9; and the operating elements of the sound analyzing unit 5, may be substituted by computer software.
Thus, in accordance with the present invention, the phoneme recognition system extracts feature points expected to be phoneme boundaries from a plurality of parameters obtained through sound analysis, and decides a phoneme segment candidate from the data of the feature points of the plurality of parameters. Accordingly, more accurate phoneme segment information can readily be obtained. Furthermore, since the phoneme segment information includes the characteristics of the phoneme segment candidate, phoneme recognition can easily be achieved.
Still further, according to the present invention, since the phoneme boundary candidate is decided on the basis of the peak point of the transient detection parameter, which is a stable feature point in a transient part of the input speech, the selection of an erroneous phoneme boundary candidate is obviated.
Since the difference between the transient detection parameters is calculated, the stationary part is decided on the basis of both the transient detection parameters and the difference, and the transient part is decided on the basis of the difference, instead of deciding the stationary part and the transient part through simple comparison of the transient detection parameters with a threshold, accurate decision of the stationary part and the transient part is achieved.
Furthermore, since the present invention regards a segment which is neither a stationary part nor a transient part as an undecided part, phoneme segment decision and phoneme recognition can be achieved by using segments excluding the undecided parts, which are factors of variation, together with the undecided part decision output.

Claims (3)

1. A system for detecting the transient and stationary status of a voice signal, comprising: sound analyzing means receiving an input voice signal for acoustically analyzing said input voice signal and for providing a speech spectrum thereof; means receiving said speech spectrum from said sound analysing means for deriving transient detection parameters therefrom based on a predetermined relationship of the sum of variance of the speech spectrum on a time base axis; means receiving said transient detection parameters for detecting feature points contained therein by comparing said transient detection parameters with predetermined feature point information; detecting means receiving said transient detection parameters and a predetermined one of said detected feature points for detecting a stationary part, a transient part, and an undecided part in said transient detection parameters; and means for forming a phoneme boundary candidate from said detected feature points and from said detected stationary part and detected transient part by discriminating the detected feature points and producing a phoneme boundary candidate fed to a phoneme recognizing unit.
2. A system for detecting a status of a voice signal according to claim 1, further comprising: peak detecting means for detecting a peak of said transient detection parameters.
3. A phoneme recognition system substantially as hereinbefore described with reference to the accompanying drawings.

DATED this FIRST day of MAY 1991

Sony Corporation

Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU26617/88A 1987-12-08 1988-12-06 A phoneme recognition system Ceased AU612737B2 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP62310569A JP2643202B2 (en) 1987-12-08 1987-12-08 Detection device for steady, transient and uncertain parts of input speech
JP62-310569 1987-12-08
JP32330787A JPH01165000A (en) 1987-12-21 1987-12-21 Vocal sound section information forming apparatus
JP62-323307 1987-12-21
JP62331656A JPH01170998A (en) 1987-12-25 1987-12-25 Phoneme section information generating device
JP62-331656 1987-12-25

Publications (2)

Publication Number Publication Date
AU2661788A AU2661788A (en) 1989-06-22
AU612737B2 true AU612737B2 (en) 1991-07-18

Family

ID=27339113

Family Applications (1)

Application Number Title Priority Date Filing Date
AU26617/88A Ceased AU612737B2 (en) 1987-12-08 1988-12-06 A phoneme recognition system

Country Status (5)

Country Link
KR (1) KR0136608B1 (en)
AU (1) AU612737B2 (en)
DE (1) DE3841376A1 (en)
FR (1) FR2624297B1 (en)
GB (1) GB2213623B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0538536A1 (en) * 1991-10-25 1993-04-28 International Business Machines Corporation Method for detecting voice presence on a communication line
DK46493D0 (en) * 1993-04-22 1993-04-22 Frank Uldall Leonhard METHOD OF SIGNAL TREATMENT FOR DETERMINING TRANSIT CONDITIONS IN AUDITIVE SIGNALS
ATE282879T1 (en) * 1998-03-13 2004-12-15 Frank Uldall Leonhard SIGNAL PROCESSING METHOD FOR ANALYZING VOICE SIGNAL TRANSIENTS
DE10317502A1 (en) * 2003-04-16 2004-11-18 Daimlerchrysler Ag Evaluation method e.g. for analysis of sounds signals, evaluating sound signal, through band pass filter with sound signal is in frequency range of first band-pass filter
IT1403658B1 (en) 2011-01-28 2013-10-31 Universal Multimedia Access S R L PROCEDURE AND MEANS OF SCANDING AND / OR SYNCHRONIZING AUDIO / VIDEO EVENTS

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU551342B2 (en) * 1980-11-12 1986-04-24 Hitachi Limited Speech phoneme recognition


Also Published As

Publication number Publication date
GB2213623A (en) 1989-08-16
GB2213623B (en) 1991-07-24
FR2624297B1 (en) 1992-01-24
DE3841376A1 (en) 1989-06-22
AU2661788A (en) 1989-06-22
GB8828532D0 (en) 1989-01-11
FR2624297A1 (en) 1989-06-09
KR0136608B1 (en) 1998-11-16
KR890010791A (en) 1989-08-10

Similar Documents

Publication Publication Date Title
Lu et al. Content analysis for audio classification and segmentation
US4736429A (en) Apparatus for speech recognition
EP1083542B1 (en) A method and apparatus for speech detection
US5625749A (en) Segment-based apparatus and method for speech recognition by analyzing multiple speech unit frames and modeling both temporal and spatial correlation
JP3162994B2 (en) Method for recognizing speech words and system for recognizing speech words
KR101688240B1 (en) System and method for automatic speech to text conversion
US8428945B2 (en) Acoustic signal classification system
US5526466A (en) Speech recognition apparatus
US5596680A (en) Method and apparatus for detecting speech activity using cepstrum vectors
JPH0990974A (en) Signal processor
Niyogi et al. Detecting stop consonants in continuous speech
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
US4665548A (en) Speech analysis syllabic segmenter
US4937870A (en) Speech recognition arrangement
US5995924A (en) Computer-based method and apparatus for classifying statement types based on intonation analysis
US5809453A (en) Methods and apparatus for detecting harmonic structure in a waveform
EP0200347A1 (en) Knowledge-guided automatic speech recognition apparatus and method
AU612737B2 (en) A phoneme recognition system
US4885791A (en) Apparatus for speech recognition
US6823304B2 (en) Speech recognition apparatus and method performing speech recognition with feature parameter preceding lead voiced sound as feature parameter of lead consonant
CN116230018A (en) Synthetic voice quality evaluation method for voice synthesis system
KR100391123B1 (en) speech recognition method and system using every single pitch-period data analysis
Cole et al. The C-MU phonetic classification system
Vicsi et al. Continuous speech recognition using different methods
JP2757356B2 (en) Word speech recognition method and apparatus