CN1106618C - Method for changing pronunciation speed

Info

Publication number: CN1106618C
Application number: CN99104829A
Authority: CN
Inventors: 刘晓波; 宋建福; 林光信
Original assignee: Inventec Corp
Current assignee: Inventec Corp
Priority date: 1999-04-08
Filing date: 1999-04-08
Publication date: 2003-04-23
Anticipated expiration: 2019-04-08
Also published as: CN1270356A

Abstract

The present invention relates to a method for changing the rate of articulation, and in particular to a method for processing digital speech signals and changing their playback speed, so that the original pitch of each syllable is preserved when the speech is played at a non-standard speed. Each note signal segment in the speech signal is duplicated or deleted in proportion to the preset playback speed, and an audio processing unit then plays each note signal segment at the original sampling frequency. The played speech thus conforms to the preset playback speed while retaining its original pitch.

Description

Method for changing the rate of articulation

The present invention relates to a method for changing the rate of articulation. It is applied to the playback of digitized voice information, so that the pitch of the speech is not distorted after its rate of articulation has been changed.

With reference to Fig. 1, whether in ActiveMovie or MCI developed by Microsoft, or in voice editing software developed by other companies, voice is captured, stored and played on a computer in the following way. The voice signal produced by a sound source device 10 (e.g. a microphone or a cassette recorder) is sampled by an audio processing unit 20 (e.g. a sound card) and converted into a corresponding digitized voice signal by a logic processing unit 30. With reference to Fig. 2, the digitized voice signal 40 is made up of a plurality of note signal segments 41, 51, 61, and each note signal segment 41 in turn contains a plurality of signal sampling points 411. The digitized voice signal 40 is finally stored in a voice file on a recording medium 50. To play the voice, the signal sampling points 411 in each note signal segment 41 of the voice file are sent to the audio processing unit 30, which amplifies them and outputs them to a voice output unit 60, and the voice output unit 60 emits the audible sound.

The data most closely related to the sound are the signal sampling points 411. They are obtained by sampling the original voice signal (i.e. the signal produced by a device such as a microphone or a cassette recorder) at a preset sampling frequency; the note signal segments 41 formed from these signal sampling points 411 are then processed and stored in the voice file on the recording medium 50. For playback, the audio processing unit 30 reproduces the signal sampling points at the same frequency as the sampling frequency. Among current voice signal formats, 22 kHz, 8 bit is mono radio quality and 44 kHz, 16 bit is stereo CD quality, where 22 kHz (44 kHz) is the sampling frequency and 8 bit (16 bit) is the number of bits occupied by one signal sampling point 411. The audio processing unit 30 plays sound at a fixed playback rate determined by the voice format: 172 kb/s for stereo CD quality and 22 kb/s for mono radio quality.
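The two playback rates quoted above follow from the sampling format. The following is a minimal sketch, assuming that the nominal 22 kHz and 44 kHz figures stand for the common 22,050 Hz and 44,100 Hz rates and that "kb/s" means units of 1024 bytes per second; the function name is illustrative and does not come from the patent.

```python
def playback_rate_kib(sample_rate_hz, bits_per_sample, channels):
    """Bytes consumed per second of audio, expressed in units of 1024 bytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second / 1024

# Assumed exact rates behind the nominal "22 kHz" and "44 kHz" figures.
mono_radio = playback_rate_kib(22050, 8, 1)    # mono radio quality
stereo_cd = playback_rate_kib(44100, 16, 2)    # stereo CD quality

print(round(mono_radio, 1), round(stereo_cd, 1))
```

Under these assumptions the results come out at roughly 22 and 172, in line with the figures given in the text.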

The conventional method of changing the speed of speech takes each signal sampling point 411 as the basic unit and duplicates or deletes signal sampling points 411 to speed up or slow down playback. To slow the original speech to half speed, each signal sampling point 411 in each note signal segment 41 is duplicated once and the copy is inserted after the original signal sampling point. The waveform period of the note signal segment 41 is thereby doubled, so if the sampling frequency is kept unchanged during playback, the speech rate is halved and, at the same time, the sound becomes lower and coarser. Fig. 3 is the waveform diagram of an original note signal segment 41: it contains a sampled signal with an amplitude of 156 and has a playing time of 2 milliseconds. To play this note signal segment 41 at half speed by the conventional method, each signal sampling point 411 in the note signal segment 41 must be duplicated, and the copied signal sampling point 411a inserted into the note signal segment 41 immediately after the original signal sampling point 411. The processed note signal segment 41a, shown in Fig. 4, then contains many pairs of adjacent signal sampling points 411, 411a with the same sampling frequency. If the sound is reproduced and played at the preset sampling frequency, the note signal segment 41, which originally needed 2 milliseconds to complete one vibration period, becomes a note signal segment 41a that needs 4 milliseconds to complete one vibration period.

Playback has indeed been slowed down, but because the vibration period and frequency of the original sound have been changed, the pitch of the speech shifts. In effect, a note signal segment 41 originally recorded at a 22 kHz sampling frequency has been converted by this processing into a note signal segment 41a recorded at a 44 kHz sampling frequency, yet it is still played at the original rate of 22 kb/s. The frequency of the reproduced sound is therefore half that of the recording, and since the pitch of a sound is directly related to the vibration frequency of the sound wave, the pitch shifts.
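The effect described above can be reproduced on a toy signal. The sketch below is an illustration only: it assumes a sampling rate of exactly 22,000 Hz and a pure 500 Hz tone (so that one vibration period is 2 ms, i.e. 44 samples), and all function names are hypothetical. It duplicates every sampling point and then measures the vibration period from the spacing of rising zero crossings.

```python
import math

SAMPLE_RATE = 22000   # Hz; assumed so that one 500 Hz period is exactly 44 samples
PERIOD = 44           # samples per 2 ms vibration period

def make_tone(n_periods):
    """Toy 500 Hz tone; the half-sample phase offset keeps zero crossings between samples."""
    return [math.sin(2 * math.pi * (n + 0.5) / PERIOD)
            for n in range(n_periods * PERIOD)]

def duplicate_each_sample(samples):
    """Conventional slow-down: insert a copy after every sampling point."""
    out = []
    for s in samples:
        out.extend([s, s])
    return out

def period_ms(samples, sample_rate):
    """Average spacing of rising zero crossings, in milliseconds."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    return 1000 * sum(gaps) / len(gaps) / sample_rate

tone = make_tone(10)
slowed = duplicate_each_sample(tone)
original_period = period_ms(tone, SAMPLE_RATE)
slowed_period = period_ms(slowed, SAMPLE_RATE)
print(original_period, slowed_period)
```

On this toy tone the measured period grows from 2 ms to 4 ms, which corresponds to the halved frequency, and hence the lowered pitch, described in the text.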

Existing speed-change techniques alter the vibration frequency of the reproduced sound wave while processing the sampled signal of the original voice file, so the pitch shifts. Whether the frequency is lowered or raised, the sound becomes blurred and unpleasant to the listener. In language teaching in particular, learners generally find speaking and listening difficult, and part of the reason is that the other speaker talks too fast for a beginner to react in time. If the speech could be slowed down, the effect of the training would improve greatly.

The main object of the present invention is to provide a method that can play a voice signal faster or slower at an arbitrary speed without any pitch shift, so that after the playback speed is adjusted the speech remains clear, the intonation remains unchanged and the sound is not distorted.

The principle of the speed-change processing of the present invention is to use as the basic unit of duplication or deletion not each signal sampling point 411 of Fig. 1, but the note signal segment 41 (i.e. one complete vibration period) of the original voice signal 40. With reference to Fig. 3 and Fig. 5, to output the note signal segment 41 at half the standard playback speed, the note signal segment 41 of Fig. 3 is duplicated and the copied note signal segment 41a is placed after the original note signal segment 41, forming a new note signal segment 42 as shown in Fig. 5. The audio processing unit 30 then plays the note signal segment 42 at the playback speed of the original sampling frequency. In this way the original frequency of each note signal segment 41, 41a is not changed, and the intonation (frequency) of the original speech is preserved even after the playback speed has been changed.
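As a contrast to duplicating individual sampling points, the principle just described can be sketched on a toy signal. The sketch assumes a 22,000 Hz sampling rate and a pure 500 Hz tone whose period (44 samples) is known in advance, so that cutting the signal into whole vibration periods is trivial; real speech would need the segment detection described later in the patent. All names are hypothetical.

```python
import math

SAMPLE_RATE = 22000   # Hz; assumed so that one 500 Hz period is exactly 44 samples
PERIOD = 44           # samples per note signal segment in this toy signal

def make_tone(n_periods):
    """Toy 500 Hz tone; the half-sample phase offset keeps zero crossings between samples."""
    return [math.sin(2 * math.pi * (n + 0.5) / PERIOD)
            for n in range(n_periods * PERIOD)]

def split_segments(samples, period):
    """Cut the signal into whole vibration periods (period assumed known here)."""
    return [samples[i:i + period] for i in range(0, len(samples), period)]

def slow_down_by_segments(segments, copies=2):
    """Write each whole segment `copies` times in a row."""
    out = []
    for seg in segments:
        for _ in range(copies):
            out.extend(seg)
    return out

def period_ms(samples, sample_rate):
    """Average spacing of rising zero crossings, in milliseconds."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    return 1000 * sum(gaps) / len(gaps) / sample_rate

tone = make_tone(10)
slowed = slow_down_by_segments(split_segments(tone, PERIOD))
print(len(tone), len(slowed))
print(period_ms(tone, SAMPLE_RATE), period_ms(slowed, SAMPLE_RATE))
```

The output is twice as long as the input, so it takes twice as long to play, while the measured vibration period is unchanged. That is the property the patent relies on to keep the pitch constant.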

The details and techniques of the present invention are described below with reference to the accompanying drawings:

Fig. 1 is a block diagram of the apparatus for variable-speed voice playback;

Fig. 2 is a waveform diagram of a voice signal;

Fig. 3 is a waveform diagram of an original note signal segment;

Fig. 4 is a waveform diagram after slow-playback processing by the conventional method;

Fig. 5 is a waveform diagram of the note signal segment of Fig. 3 after processing for playback at half speed;

Fig. 6 is a waveform diagram of the voice signal of Fig. 2 after processing for playback at half speed;

Fig. 7 is a waveform diagram of the voice signal of Fig. 2 after processing for playback slowed by one half;

Fig. 8 is a waveform diagram of the voice signal of Fig. 2 after processing for playback at double speed;

Fig. 9 is a schematic diagram of the structure linked list;

Fig. 10A is a partial flowchart of the method of the present invention for variable-speed voice playback;

Fig. 10B is a partial flowchart of the method of the present invention for variable-speed voice playback;

Fig. 10C is a partial flowchart of the method of the present invention for variable-speed voice playback.

With reference to Fig. 2, when the method of the present invention plays the voice signal 40 at a changed speed, it does not duplicate or delete individual signal sampling points 411 in the voice signal 40. Instead, depending on whether the voice signal 40 is to be played faster or slower, it duplicates or deletes the note signal segments 41 (each one complete vibration period of the sound wave) within it. Before the voice signal 40 is processed for variable-speed playback, each note signal segment 41 in the voice signal 40 must therefore be found. The conditions that determine a note signal segment in the voice signal are listed below:

1. The starting point 44 and the terminating point 45 of the note signal segment must each be either a central point or a point whose connecting line to its next signal sampling point intersects the center line 46, and the trends of the sampled signal formed by the starting point 44 and by the terminating point 45 with their respective next signal sampling points must both be rising or both be falling.

2. The time interval between the starting point 44 and the terminating point 45 should lie within the range of one vibration period of a 440 Hz fundamental wave, i.e. the starting point and the terminating point are 2 to 3 milliseconds apart.

3. A note signal segment and the adjacent next note signal segment should have something in common: the difference between the maxima of the two note signal segments above the center line 46, or between their minima below the center line 46, is less than one tenth of the maximum excursion from the center line.

4. Data that do not satisfy the above conditions cannot be treated as a note signal segment; during speed-change processing such data are left unchanged, neither duplicated nor deleted.
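Conditions 1 and 2 above can be sketched as a simple search over the samples. This is a simplified illustration under stated assumptions, not the patented procedure: the center line is taken to be the value 0, only rising crossings are paired (falling ones would work the same way), condition 3 on amplitude similarity is omitted, and the function names and the `min_ms`/`max_ms` parameters are hypothetical.

```python
import math

def center_crossings(samples, center=0.0):
    """Return (index, direction) for every sample that lies on the center line
    or whose line to the next sample crosses it; direction is +1 (rising) or -1."""
    found = []
    for i in range(len(samples) - 1):
        a, b = samples[i] - center, samples[i + 1] - center
        if (a == 0 and b != 0) or a * b < 0:
            found.append((i, 1 if b > a else -1))
    return found

def candidate_segments(samples, sample_rate, min_ms=2.0, max_ms=3.0, center=0.0):
    """Pair consecutive rising crossings (condition 1) and keep the pairs whose
    spacing lies between min_ms and max_ms (condition 2)."""
    rising = [i for i, d in center_crossings(samples, center) if d == 1]
    segments = []
    for start, end in zip(rising, rising[1:]):
        duration_ms = 1000 * (end - start) / sample_rate
        if min_ms <= duration_ms <= max_ms:
            segments.append((start, end))
    return segments

# Toy signals at an assumed 22,000 Hz sampling rate.
tone_440 = [math.sin(2 * math.pi * (n + 0.5) / 50) for n in range(500)]    # 440 Hz
tone_100 = [math.sin(2 * math.pi * (n + 0.5) / 220) for n in range(880)]   # 100 Hz

print(candidate_segments(tone_440, 22000)[:2])
print(candidate_segments(tone_100, 22000))
```

With these toy inputs, the 440 Hz tone yields segments of 50 samples (about 2.27 ms each), while the 100 Hz tone, whose period is 10 ms, yields none and would be left untouched under condition 4.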

With reference to Fig. 10A to Fig. 10C, which show the flowchart of the variable-speed playback processing of the present invention, the processing steps are, in order:

Step A1: scan the digitized voice signal by comparing every two adjacent signal sampling points 411, and record in a structure linked list 47 the signal sampling points 411 that lie on the center line 46, the sampling points whose connecting line to the following signal sampling point intersects the center line 46, and the information of all inflection points (i.e. the turning points at crests and troughs); the structure of each list node 471 is shown in Table 1:

Table 1. Structure of a linked-list node

Difference between the signal sampling point and the center line
Whether the signal sampling point is a central point
Whether the signal sampling point is a rising point or a falling point
Offset of the signal sampling point from the starting point
Pointer to the next list node

Step A2: filter the redundant inflection-point records out of the structure linked list 47, keeping at most one inflection point, the one farthest from the center line 46, between two adjacent central points;

Step A3: starting from the head of the structure linked list 47, search forward for a central point with a rising or falling trend;

Step A4: judge whether a central point with a rising or falling trend exists; if yes, skip to step A6; otherwise carry out the next step;

Step A5: search for the next central point with a rising or falling trend, and skip to step A4;

Step A6: judge whether a record of a central point exists; if yes, carry out the next step; otherwise execute step A8;

Step A7: record the central point, and skip to step A9;

Step A8: record the position of the central point in the recording medium;

Step A9: judge whether the recording medium holds two central points with the same characteristics; if yes, skip to step A11; if not, carry out the next step;

Step A10: judge whether the search is complete; if yes, carry out the next step; otherwise skip to step A5;

Step A11: calculate the offset between the two central points;

Step A12: calculate the time interval between the two central points from the sampling frequency;

Step A13: judge whether the interval is less than 2 to 3 milliseconds; if yes, carry out the next step; otherwise skip to step A5;

Step A14: treat the signal sampling points between the two central points as one note signal segment, and record them in a temporary recording medium;

Step A15: repeat steps A8 to A14 to find the next note signal segment;

Step A16: compare whether the offset between the center line and the maximum point in the next note signal segment is far smaller than the offset between the center line and the maximum in the previous note signal segment; if yes, the sound can be regarded as fading out, this section of sound is given no special treatment during speed-change processing, and the method skips to step A19; if not, carry out the next step;

Step A17: compare whether the offset between the center line and the maximum point in the next note signal segment is approximately equal to the offset between the center line and the maximum in the previous note signal segment; if yes, skip to step A19; if not, carry out the next step;

Step A18: take the second central point of the first note signal segment as the reference, and skip to step A5;

Step A19: judge whether all the points recorded in the structure linked list have been compared and identified; if yes, carry out the next step; if not, skip to step A5;

Step A20: determine the note signal segments in the voice signal;

Step A21: duplicate all the note signal segments into a recording medium according to the rate-of-articulation setting;

Step A22: use the audio processing unit 20 to convert the note signal segments duplicated into the recording medium into an audible voice signal;

Step A23: judge whether all the duplicated note signal segments have been processed; if yes, execute step A25; if not, carry out the next step;

Step A24: take the next duplicated note signal segment, and skip to step A22; and

Step A25: place the audio processing unit 20 in a waiting state.
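Steps A1 and A2, together with the node layout of Table 1, can be sketched as follows. This is a hedged illustration: the class and function names are hypothetical, the center line is assumed to be the value 0, and a plain Python list is built first and then chained through `next` pointers, which is a convenience of this sketch rather than something the patent prescribes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    diff_from_center: float        # difference between the sample and the center line
    is_center: bool                # whether the sample is a central point
    rising: bool                   # rising point (True) or falling point (False)
    offset: int                    # offset of the sample from the start of the signal
    next: Optional["Node"] = None  # pointer to the next list node

def scan(samples, center=0.0):
    """Step A1 (sketch): record central points and turning points in order."""
    nodes = []
    for i in range(len(samples) - 1):
        a, b = samples[i] - center, samples[i + 1] - center
        crosses = (a == 0 and b != 0) or a * b < 0
        turning = i > 0 and (samples[i] - samples[i - 1]) * (samples[i + 1] - samples[i]) < 0
        if crosses or turning:
            nodes.append(Node(a, crosses, b > a, i))
    return nodes

def filter_extrema(nodes):
    """Step A2 (sketch): between adjacent central points keep only the
    turning point that lies farthest from the center line."""
    kept, best = [], None
    for node in nodes:
        if node.is_center:
            if best is not None:
                kept.append(best)
            kept.append(node)
            best = None
        elif best is None or abs(node.diff_from_center) > abs(best.diff_from_center):
            best = node
    if best is not None:
        kept.append(best)
    return kept

def link(nodes):
    """Chain the nodes through their `next` pointers and return the head."""
    for cur, nxt in zip(nodes, nodes[1:]):
        cur.next = nxt
    return nodes[0] if nodes else None

signal = [0.5, 1.0, 0.6, 0.9, 0.2, -0.3, -1.0, -0.4, 0.3, 0.8, 0.1]
kept = filter_extrema(scan(signal))
head = link(kept)
print([n.offset for n in kept])
```

In the example signal the three turning points before the first crossing are reduced to the single highest one, which is the filtering that step A2 describes. The remaining steps then walk this list to pair central points into note signal segments.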

In step A21 above, if the rate of articulation is set to half the standard rate, the voice signal 40 of Fig. 2 will, after processing, be as shown in Fig. 6: each note signal segment 41, 51, 61 is copied twice into the recording medium, so that a note signal segment 41a, 51a, 61a is produced after each original note signal segment 41, 51, 61 respectively. If the rate of articulation is set to be slower than the standard rate by one half, then, as shown in Fig. 7, the odd-numbered note signal segments 41, 61 of the voice signal are copied twice into the recording medium, producing note signal segments 41a, 61a, while the even-numbered note signal segment 51 is copied only once. In addition, with reference to Fig. 8, if the rate of articulation is set to twice the standard rate, only every other note signal segment is copied into the recording medium, i.e. only the odd-numbered note signal segments 41, 61 of the voice signal are copied, which achieves fast playback of the speech.
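The three copy patterns of step A21 can be written as a small scheduling function. This is a sketch only: the mode names are hypothetical labels for the cases of Fig. 6, Fig. 7 and Fig. 8, and the segments are represented by their reference numerals rather than by sample data.

```python
def copy_schedule(segments, mode):
    """Return the segments in the order they are written to the recording medium.

    mode (hypothetical names):
      "slow_2x"   every segment written twice             (Fig. 6)
      "slow_1_5x" odd-numbered twice, even-numbered once  (Fig. 7)
      "fast_2x"   only odd-numbered segments written      (Fig. 8)
    """
    out = []
    for position, seg in enumerate(segments, start=1):
        odd = position % 2 == 1
        if mode == "slow_2x":
            copies = 2
        elif mode == "slow_1_5x":
            copies = 2 if odd else 1
        elif mode == "fast_2x":
            copies = 1 if odd else 0
        else:
            raise ValueError(f"unknown mode: {mode}")
        out.extend([seg] * copies)
    return out

segments = ["41", "51", "61"]
for mode in ("slow_2x", "slow_1_5x", "fast_2x"):
    print(mode, copy_schedule(segments, mode))
```

For the three segments of Fig. 2 this gives six, five and two segments respectively. In the second case three segments are written for every two originals in the long run, which is where the slowing by one half comes from.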

The method of the present invention can apply speed-change processing to voice files of various formats, so that after the playback speed is adjusted the resulting speech is clear, the intonation remains unchanged and the sound is undistorted.

The above is only a preferred embodiment of the present invention, and implementation is not limited to the hardware devices described above. Any modification with equivalent effect made by a person skilled in the art within the field of the invention shall fall within the scope of the claims.

Claims

1. A method for changing the rate of articulation, applied to the playback of a digitized voice signal so that an audio processing unit can play the voice signal at a predetermined rate of articulation, the method comprising:

obtaining the note signal segments in the voice signal;

setting a playback speed for the voice signal;

setting an original playback speed of the voice signal and, according to the playback speed, duplicating the note signal segments into a recording medium by means of an arithmetic logic unit; and

converting, by means of the audio processing unit, the note signal segments stored in the recording medium into an audible voice signal.

2. The method for changing the rate of articulation according to claim 1, wherein the note signal segment consists of a plurality of signal sampling points.

3. The method for changing the rate of articulation according to claim 1, wherein the arithmetic logic unit duplicates the note signal segment twice into the recording medium.

4. The method for changing the rate of articulation according to claim 1, wherein the arithmetic logic unit duplicates the odd-numbered note signal segments of the voice signal twice into the recording medium and duplicates the even-numbered note signal segments once into the recording medium.

5. The method for changing the rate of articulation according to claim 1, wherein the arithmetic logic unit duplicates only the odd-numbered note signal segments of the voice signal, once each, into the recording medium.