WO1996008813A1

WO1996008813A1 - Sound characteristic convertor, sound/label associating apparatus and method to form them

Info

Publication number: WO1996008813A1
Application number: PCT/JP1995/001806
Authority: WO
Inventors: Seiichi Tenpaku; Yohichi TOHKURA
Original assignee: Arcadia, Inc.; Atr Human Information Processing Research Laboratories Co. Ltd.
Priority date: 1994-09-12
Filing date: 1995-09-12
Publication date: 1996-03-21
Also published as: AU3400395A; JP3066452B2

Abstract

A sound characteristic convertor for accomplishing desired sound characteristic conversion by an easy operation. The waveform of input sound data is displayed on a sound display portion (80). This sound data is divided in accordance with its sound level. The result of division is displayed by lines (84a, 84b, 84c, 84d). An operator inputs a corresponding label to a label display portion (82) on the basis of this display. In this instance, punctuation marks are also inputted for the correspondence between sound and its label. Visual modifications such as highlighting, underlines, etc. are applied to the label by icons (90, 92, 94, 96), and conversion of the sound characteristics corresponding to the visual modifications is made as to the division of the corresponding sound data.

Description

Sound characteristic conversion device, sound/label associating device, and methods therefor

The present invention relates to characteristic conversion for sounds such as voices, musical sounds, and natural sounds, and more particularly to making the conversion operation easier. The present invention also relates to the association between a sound and a label, on which the characteristic conversion is based.

It is common practice to convert the characteristics of a sound such as a voice in order to obtain desired characteristics. To convert the characteristics, the time domain waveform or the frequency spectrum of the sound must be changed, and this is typically done by transforming the waveform or the spectrum. For example, an analog audio signal is captured and converted into digital data. After the waveform has been transformed in accordance with the desired characteristic conversion, it is converted back into an analog signal. This makes it possible to convert the characteristics of the voice into the desired characteristics.

However, conventional characteristic conversion as described above has the following problems. The conversion is carried out by displaying parameters such as the time domain waveform of the sound, its frequency spectrum, and linear prediction (LPC) parameters on a display and modifying them. To obtain the desired characteristics in this way, specialized knowledge of the time domain waveform, the frequency spectrum, and linear predictive analysis (LPC) parameters is required. Moreover, even an operator with such specialized knowledge needs considerable training before being able to perform the desired characteristic conversion.

An object of the present invention is to solve the above problems and to provide a sound characteristic conversion device and method that make the conversion easy to perform, as well as a sound/label associating device and method suitable for them.

The sound characteristic conversion device of claim 1 comprises: sound/label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data; display control means for, when modification data is given to the label data, visually modifying and displaying the label represented by the label data on the basis of the modification data; and conversion means for applying, to the sound data associated with the label data, the characteristic conversion corresponding to the modification data given to that label data.

The sound characteristic conversion device of claim 2 further comprises: sound data classification means for dividing input sound data on the basis of sound breaks; label data classification means for receiving label data that carries delimiters corresponding to the sound breaks and dividing the label data on the basis of the delimiters; and correspondence forming means for associating the divided sound data and the divided label data with each other.

The sound characteristic conversion device of claim 3 is characterized in that the visual modification of the label is character decoration of the label.

The sound characteristic conversion device of claim 4 is characterized in that the visual modification of the label is the order of the labels.

In the sound characteristic conversion method of claim 5, label data is associated with sound data, and the content of each sound characteristic conversion is associated with a modification process. The label represented by the label data is visually displayed on the basis of the modification process applied to it, and the characteristic conversion corresponding to the modification process applied to the label data is performed on the sound data associated with that label data.

The sound characteristic conversion method of claim 6 divides the input sound data on the basis of sound breaks, divides the label data in accordance with the sound breaks, and associates the divided labels with the divided sound data.

The sound characteristic conversion device of claim 7 comprises: sound/label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data; and conversion means for applying, to the sound data associated with the label data, the characteristic conversion corresponding to the modification data given to that label data.

In the sound characteristic conversion method of claim 8, label data is associated with sound data, the content of each characteristic conversion is associated with a modification process, and the characteristic conversion corresponding to the modification process applied to the label data is performed on the sound data associated with that label data.

The sound transmission system of claim 9 has a transmitting device and a receiving device that can communicate through a communication path, and transmits sound from the transmitting device to the receiving device.

The transmitting device includes data input means for inputting label data and modification data, and communication means for transmitting the label data and the modification data to the receiving device via the communication path.

The receiving device includes: communication means for receiving the label data and the modification data from the transmitting device; standard sound data generating means for generating standard sound data on the basis of the label data; and conversion means for converting the sound characteristics of the standard sound data on the basis of the modification data to generate sound characteristic conversion data.

The sound transmission method of claim 10 is a method of transmitting sound from a transmitting side to a receiving side via a communication path.

The transmitting side inputs label data and modification data and transmits the label data and the modification data to the receiving side via the communication path.

The receiving side receives the label data and the modification data from the transmitting side, generates standard sound data on the basis of the label data, and converts the sound characteristics of the standard sound data on the basis of the modification data to generate sound characteristic conversion data.

The sound/label associating device of claim 11 comprises: sound data input means for inputting sound data; sound data classification means for dividing the sound data on the basis of the loudness of the sound represented by the sound data; label data input means for inputting label data having delimiters at positions corresponding to the divisions of the sound data; label data classification means for dividing the label data on the basis of the delimiters; and correspondence forming means for associating the divided sound data and the divided label data with each other.

The sound/label associating device of claim 12 comprises: sound data input means for inputting sound data; label data input means for inputting label data corresponding to the sound data; and detailed correspondence forming means for dividing the sound data in association with each label on the basis of the average duration of each label represented by the label data and the duration of the sound data.

The sound/label associating device of claim 13 further comprises detailed correspondence forming means that, for the label data and sound data associated by the correspondence forming means, divides the sound data in association with each label on the basis of the average duration of each label represented by the label data and the duration of the sound data.

The sound/label associating device of claim 14 comprises a sound display section for visually displaying the properties of the sound represented by the sound data and a label display section for displaying the label represented by the label data, and displays a division mark indicating a sound division on the sound display section.

The sound/label associating method of claim 15 divides sound data on the basis of the loudness of the sound represented by the sound data, receives label data to which delimiters have been added at positions corresponding to the divisions of the sound data, divides the label data on the basis of the delimiters, and associates the divided sound data and the divided label data with each other.

The sound/label associating method of claim 16 is a method of associating sound data with labels in which an average duration for each label is prepared in advance and the sound data is divided in association with each label on the basis of the average duration of each label represented by the label data and the duration of the sound data.

In the sound/label associating method of claim 17, an average duration for each label is prepared in advance, label data and sound data that have already been associated are received, and the sound data is divided in association with each label on the basis of the average duration of each label represented by the label data and the duration of the sound data.

The display method for sound/label association of claim 18 provides a sound display section for visually displaying the properties of the sound represented by the sound data and a label display section for displaying the label represented by the label data, and displays a division mark indicating a sound division on the sound display section.

In the present invention, label data refers to a character string, a string of figures or symbols, or the like, or a combination of these, that can be associated with a sound such as a voice or a natural sound. It includes text data, icon data corresponding to characters, and the like. Sound data is data that represents a sound waveform directly or indirectly, and includes, for example, data obtained by digitizing an analog sound waveform and data representing a sound by LPC parameters.

Modification refers to emphasis, underlining, addition of symbols, specification of order, and the like, applied to labels in order to convert the characteristics of the sound obtained from the sound data. The modification data may indicate the content of the characteristic conversion, but when the label is visually modified on the basis of the modification data, it is preferable that the modification data indicate the content of the visual modification.

Sound characteristic conversion refers to changing some property of a sound. It is a concept that includes changes of sound quality such as changing the pitch, changing the intensity, adding vibrato, changing the frequency spectrum, changing the duration, changing the sampling interval, making the voice more feminine or more masculine, making it clearer or less clear, and combinations of these, as well as changing the order in which sounds are output and deleting part of a sound.

The sound characteristic conversion device of claim 1 and the sound characteristic conversion method of claim 5 associate label data with sound data, associate the content of each sound characteristic conversion with a modification process, display the label visually on the basis of the modification process applied to it, and apply to the sound data associated with the label data the characteristic conversion corresponding to the modification process applied to that label data. Therefore, a characteristic conversion can be applied to a sound merely by visually modifying the corresponding label.

The sound characteristic conversion device of claim 2 and the sound characteristic conversion method of claim 6 divide input sound data on the basis of sound breaks, divide the label data in accordance with the sound breaks, and associate the divided labels with the divided sound data. Therefore, the two can be associated merely by inputting the sound data and the label data.

The sound characteristic conversion device of claim 7 and the sound characteristic conversion method of claim 8 associate label data with sound data, associate the content of each characteristic conversion with a modification process, and apply to the sound data associated with the label data the characteristic conversion corresponding to the modification process applied to that label data. Therefore, a characteristic conversion can be applied to a sound merely by modifying the label, whose divisions are clearer than those of the sound data.

In the sound transmission system of claim 9 and the sound transmission method of claim 10, the transmitting side inputs label data and modification data, and the receiving side generates standard sound data on the basis of the label data and converts the sound characteristics of the standard sound data on the basis of the modification data to generate sound characteristic conversion data. Therefore, a sound with the desired sound characteristics can be sent merely by sending the label data and the modification data.

The sound/label associating device of claim 11 and the sound/label associating method of claim 15 divide the sound data on the basis of the loudness of the sound represented by the sound data, receive label data with delimiters attached at positions corresponding to those divisions, divide the label data on the basis of the delimiters, and associate the divided sound data and the divided label data with each other. Therefore, the divided sound data can easily be associated with the divided label data.

The sound/label associating device of claim 12 and the sound/label associating method of claim 16 prepare an average duration for each label in advance and divide the sound data in association with each label on the basis of the average duration of each label represented by the label data and the duration of the sound data. Therefore, sound data can easily be associated with each label.

The sound/label associating device of claim 14 and the display method for sound/label association of claim 18 provide a sound display section for visually displaying the properties of the sound represented by the sound data and a label display section for displaying the label represented by the label data, and display a division mark indicating a sound division on the sound display section. Therefore, label data can be input and displayed while the division positions of the sound data are checked.

Brief description of the drawings

FIG. 1 is a diagram showing a display screen of the sound characteristic conversion device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the overall configuration of the sound characteristic conversion device according to an embodiment of the present invention.

FIG. 3 is a diagram showing a hardware configuration in which a CPU is used to realize the functions of FIG. 2.

FIG. 4 is a flowchart showing the operation of the sound characteristic conversion device.

FIG. 5 is a flowchart showing the operation of the sound characteristic conversion device.

FIG. 6 is a diagram showing label data stored in association with audio data.

FIG. 7 is a diagram showing the storage state of audio data.

FIG. 8 is a diagram showing a label displayed on the CRT 16.

FIG. 9 is a diagram showing the correspondence between visual modifications and the contents of sound characteristic conversion.

FIG. 10 is a diagram showing label data to which modification data has been added.

FIG. 11 is a diagram showing a label to which a visual modification has been applied.

FIG. 12 is a diagram for explaining the division of audio data.

FIG. 13 is a diagram showing the pitch conversion process.

FIG. 14A is a diagram showing the sound source waveform before pitch conversion.

FIG. 14B is a diagram showing the sound source waveform after pitch conversion.

FIG. 15A is a diagram showing the voice data before the power change and its short-time average power.

FIG. 15B is a diagram showing the voice data after the power change and its short-time average power.

FIG. 16A is a diagram showing the original audio data.

FIG. 16B is a diagram showing audio data in which the duration of the sound has been changed.

FIG. 16C is a diagram showing audio data to which vibrato has been applied.

FIG. 17 is a diagram showing examples of symbols used as modifications.

FIG. 18 is a diagram showing how the order of sounds is changed.

FIG. 19 is a diagram showing an example of the division process for Japanese speech.

FIG. 20 is a diagram showing an example of division for each label element.

FIG. 21 is a diagram showing an embodiment of a voice transmission device.

FIG. 22 is a diagram showing an example of data transmitted in the embodiment of FIG. 21.

FIG. 23 is a diagram showing a table in which the modification data is converted into codes.

Best mode for carrying out the invention

FIG. 2 shows the overall configuration of a sound characteristic conversion device according to one embodiment of the present invention. Sound data classification means 2 divides input sound data on the basis of sound breaks. Label data with delimiters corresponding to the sound breaks is input to label data classification means 4, which divides the label data on the basis of the delimiters. The divided sound data and the divided label data are input to correspondence forming means 6 and are associated with each other division by division. The associated sound data and label data are stored in the sound/label data holding means.

Display control means 10 receives the modification data for each division, modifies the corresponding label data, and displays the modified label on display means 14. Conversion means 12 receives the modification data for each division, modifies the corresponding sound data, and outputs the modified sound data.

FIG. 3 shows a hardware configuration that realizes the configuration of FIG. 2 using a CPU. Connected to the bus line 40 are a CRT 16 serving as the display means, a CPU 18, a memory 20 serving as the sound/label data holding means, an input interface 22, a hard disk 24, an output interface 26, and a floppy disk drive (FDD) 15. A microphone 30 is connected to the input interface 22 via an A/D converter 28. A keyboard 32 and a mouse 34 are also connected to the input interface 22. A speaker 38 is connected to the output interface 26 via a D/A converter 36. The hard disk 24 stores a program whose flowchart is shown in FIGS. 4 and 5. The program is installed on the hard disk 24 from a floppy disk (recording medium) by the FDD 15. It may of course be installed from another recording medium such as a CD-ROM. The memory 20 is used as the sound/label data holding means and also as a work area for executing the program.

The processing performed by the CPU 18 will be described with reference to FIGS. 4 and 5. First, in step S1, an audio signal (analog audio data) is input through the microphone 30. When the audio signal is input, the CPU 18 takes in the digital data (digital audio data) produced by the A/D converter 28. The CPU 18 also displays the waveform of the audio data on the sound display section 80 of the CRT 16. This display state is shown in FIG. 1.

Next, the digital audio data is divided on the basis of sound breaks (step S2). This division is performed as follows. For example, suppose that the voice "Hi my name is John Nice to meet you" is input. The digital audio data obtained at this time is shown in the upper part of FIG. 12, which is a waveform display of the digital audio data. The CPU 18 calculates the short-time average power from the digital audio data. The calculated short-time average power is shown in the lower part of FIG. 12.

Next, the CPU 18 performs the division on the basis of two thresholds, a data level and a skip level. After a division has ended, a new division starts when the short-time average power exceeds the data level continuously for 100 ms or more. After a division has started, it ends when the short-time average power falls below the skip level continuously for 80 ms or more. The division is performed in this manner. In this embodiment, the data level is set to 50 dB and the skip level to 40 dB. On the basis of this division, as shown in FIG. 12, it can be determined, for example, that 630 ms to 1890 ms is the second division and 2060 ms to 2390 ms is the third division. On the basis of the determined divisions, the CPU 18 displays lines 84a, 84b, 84c and 84d indicating the division positions on the waveform in the sound display section 80 of the CRT 16 (see FIG. 1).
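The two-threshold division described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the non-overlapping frames, the dB reference (amplitude 1.0), and the choice to place each boundary at the first frame of the qualifying run are not given in the text; only the 50 dB and 40 dB levels and the 100 ms and 80 ms hold times come from the embodiment.

```python
import math

def short_time_power_db(samples, frame_len):
    """Average power of each non-overlapping frame in dB (assumed reference: amplitude 1.0)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(mean_sq) if mean_sq > 0 else float("-inf"))
    return levels

def segment(levels_db, frame_ms, data_level=50.0, skip_level=40.0,
            min_on_ms=100.0, min_off_ms=80.0):
    """Return (start_ms, end_ms) pairs found with the data-level / skip-level rule."""
    segments = []
    in_segment = False
    run = 0      # consecutive frames satisfying the condition currently being waited for
    start = 0
    for i, level in enumerate(levels_db):
        if not in_segment:
            # Waiting for the power to stay above the data level long enough.
            run = run + 1 if level > data_level else 0
            if run * frame_ms >= min_on_ms:
                start = i - run + 1          # assumed: boundary at first loud frame
                in_segment, run = True, 0
        else:
            # Waiting for the power to stay below the skip level long enough.
            run = run + 1 if level < skip_level else 0
            if run * frame_ms >= min_off_ms:
                end = i - run + 1            # assumed: boundary at first quiet frame
                segments.append((start * frame_ms, end * frame_ms))
                in_segment, run = False, 0
    if in_segment:
        # Input ended while a division was still open.
        segments.append((start * frame_ms, len(levels_db) * frame_ms))
    return segments
```

Using a higher level to open a division than to close it gives hysteresis, so brief dips inside a phrase do not split it, and the hold times keep short noise bursts from opening a division.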

The CPU 18 stores the divided digital audio data in the memory 20 (step S3). The audio data stored in the memory 20 in this way is shown in FIG. 7. The first division is stored at address ADRS1 and below, the second division at address ADRS2 and below, and the third division at address ADRS3 and below.

Next, the label data corresponding to the above sound data is input from the keyboard 32 into the label display section 82 of the CRT 16 shown in FIG. 1 (step S4). The label is entered with punctuation marks as delimiters. For example, for the above audio data, "Hi, my name is John. Nice to meet you." is entered. On receiving this, the CPU 18 divides the label data at the punctuation marks into three parts: "Hi", "my name is John" and "Nice to meet you".
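Dividing the label at punctuation marks can be illustrated with a short sketch. The function name `split_label` and the particular set of delimiter characters are assumptions made for illustration; the text only says that punctuation marks serve as delimiters.

```python
import re

def split_label(text, delimiters=",.!?"):
    """Split label text at delimiter characters and drop empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

parts = split_label("Hi, my name is John. Nice to meet you.")
# parts == ["Hi", "my name is John", "Nice to meet you"]
```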

In this embodiment, as shown in FIG. 1, lines 84a, 84b, 84c and 84d indicating the division positions of the audio data are displayed. Therefore, when inputting label data, it is easy to enter the delimiters so that they correspond to these positions.

The CPU 18 sequentially stores the divided label data in association with the divided audio data (step S5). That is, as shown in FIG. 6, the first address of the corresponding audio data is stored together with each division of the label data.

If the number of audio data divisions does not match the number of label data divisions, it is preferable to correct the number of audio data divisions on the basis of the number of label data divisions. For example, the thresholds (data level and skip level) are changed and the audio data is divided again so that the numbers of divisions match. Alternatively, division positions in the audio data may be added or deleted, by inference from the number of characters in each label, so that the numbers of divisions match. The operator may also correct the divisions using the mouse 34 or the keyboard 32.
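The association of step S5, together with the check on the number of divisions, might look like the following sketch. The record layout (a dictionary with `label` and `address` keys) and the function name `associate` are hypothetical; the text states only that each label division is stored with the first address of its audio division.

```python
def associate(labels, segment_addresses):
    """Pair each label division with the first address of its audio division."""
    if len(labels) != len(segment_addresses):
        # The text suggests re-dividing the audio (for example with different
        # thresholds) or manual correction when the counts differ.
        raise ValueError(
            "division counts differ: %d labels, %d audio divisions"
            % (len(labels), len(segment_addresses))
        )
    return [
        {"label": label, "address": address}
        for label, address in zip(labels, segment_addresses)
    ]

table = associate(
    ["Hi", "my name is John", "Nice to meet you"],
    ["ADRS1", "ADRS2", "ADRS3"],
)
```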

Next, the CPU 18 displays a label based on the input label data on the label display section 82 of the CRT 16 (see FIG. 1) (step S6). FIG. 8 shows the displayed label. The user then applies a predetermined visual modification to the displayed label in accordance with the content of the desired sound characteristic conversion. FIG. 9 shows an example of the correspondence between visual modifications and sound characteristic conversions. Because this is stored as a correspondence table, the correspondence between visual modifications and sound characteristic conversions can be changed by changing its contents. The contents of FIG. 9 are also displayed as icons on the CRT 16, as shown in FIG. 1, which serves as guidance and makes operation easier.

To increase the power only of the portion "my name is John", the following operation is performed. First, using the keyboard 32 or the mouse 34, the portion "my name is John" is selected in the label display section 82 of FIG. 1. Next, with that portion selected, the icon 90 is clicked with the mouse 34 to turn it into emphasized characters. As a result, as shown in FIG. 10, the modification data "\emphasis" is added in the memory to "my name is John". Here "\" is a code indicating that the string that follows is a control code (modification data). In step S7, the label modified on the basis of the modification data is displayed on the label display section 82 of the CRT 16 (see FIG. 11). As is clear from FIG. 11, the label is displayed with the visual modification that corresponds to the characteristic conversion, so the content of the conversion can be grasped easily.
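How a stored label entry carrying a control code could be read back is sketched below. The exact layout of FIG. 10 is not reproduced in the text, so the layout assumed here (zero or more leading tokens that start with a backslash, followed by the label text) and the function name `parse_label_entry` are hypothetical.

```python
def parse_label_entry(entry):
    """Split a stored entry into (modification names, label text).

    Assumed layout: control codes such as "\\emphasis" come first,
    separated from the label text by whitespace.
    """
    tokens = entry.split()
    modifications = []
    while tokens and tokens[0].startswith("\\"):
        modifications.append(tokens.pop(0)[1:])  # drop the leading backslash
    return modifications, " ".join(tokens)

mods, text = parse_label_entry("\\emphasis my name is John")
# mods == ["emphasis"], text == "my name is John"
```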

Next, the CPU 18 reads the first division of the label data shown in FIG. 10 and reads the corresponding audio data on the basis of the first address ADRS1 (step S8). As a result, the digital audio data of the portion "Hi" shown in FIG. 12 is read out. Next, it is determined whether modification data has been added to the label data (step S9). Here, since no modification data has been added, the process proceeds to step S11.

In step S11, the CPU 18 determines whether all divisions have been processed. If not, the process moves to the next division (step S12) and steps S8 onward are executed again. Modification data has been added to the next division, "my name is John". Therefore, the process proceeds from step S9 to step S10.

In step S10, the predetermined characteristic conversion is applied to the digital audio data of "my name is John". Here, the power of the audio data is increased in accordance with the table of FIG. 9. The power is increased by enlarging the amplitude of the waveform represented by the digital audio data. The converted digital audio data is stored again at address ADRS2 and below (it may instead be stored at another address so that the original audio data is retained).
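Increasing the power by enlarging the waveform amplitude can be sketched as a simple gain applied to each sample. The 6 dB default, the 16-bit sample range, and the clipping behaviour are assumptions for illustration; the text does not state how much the amplitude is enlarged.

```python
def increase_power(samples, gain_db=6.0, limit=32767):
    """Scale integer PCM samples by gain_db and clip to the assumed 16-bit range."""
    factor = 10.0 ** (gain_db / 20.0)   # amplitude ratio for the given dB gain
    converted = []
    for sample in samples:
        value = int(round(sample * factor))
        converted.append(max(-limit - 1, min(limit, value)))
    return converted
```

Returning a new list rather than overwriting the input corresponds to the option mentioned above of keeping the original audio data at its own address.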

When all divisions have been processed, the CPU 18 outputs the digital audio data that has undergone the characteristic conversion from the output interface 26 (step S13). FIG. 15A shows the audio data before the characteristic conversion, and FIG. 15B shows the audio data after it. It can be seen that the power of "my name is John" has been increased. The digital audio data converted in this way is converted into analog audio data by the D/A converter 36 and output from the speaker 38 as a voice with converted characteristics. That is, "my name is John" is output louder.
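The loop of steps S8 to S13 amounts to walking through the divisions and dispatching on the modification data. The sketch below assumes a dictionary as a stand-in for the correspondence table of FIG. 9 and a hypothetical record layout; `amplify` is a placeholder conversion, not the embodiment's actual processing.

```python
def amplify(samples, factor=2.0):
    """Placeholder conversion: enlarge the amplitude of every sample."""
    return [s * factor for s in samples]

# Hypothetical stand-in for the correspondence table of FIG. 9.
CONVERSIONS = {"emphasis": amplify}

def convert_all(entries, audio):
    """Apply each division's conversions and concatenate the result."""
    output = []
    for entry in entries:                        # steps S8, S11, S12
        samples = audio[entry["address"]]
        for name in entry["modifications"]:      # step S9: any modification data?
            samples = CONVERSIONS[name](samples) # step S10
        output.extend(samples)
    return output                                # passed to the output in step S13

entries = [
    {"address": "ADRS1", "modifications": []},
    {"address": "ADRS2", "modifications": ["emphasis"]},
]
audio = {"ADRS1": [1, 2], "ADRS2": [3, 4]}
result = convert_all(entries, audio)
```

Keeping the table separate from the loop mirrors the remark that the correspondence can be changed simply by editing the table contents.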

As described above, a sound can be converted simply by applying a visual modification to its label, which makes operation extremely easy. In addition, it is easy to see which sound characteristic conversion has been applied to which division.

The pitch can be raised in the same manner. In this case, after the portion whose pitch is to be raised has been selected, the icon 92 is selected (see FIG. 1). FIG. 13 shows the procedure for raising the pitch. The CPU 18 first performs linear prediction analysis (LPC) on the target digital audio data and separates it into sound source data and vocal tract transfer characteristic data. The pitch of the sound source data is then changed. After that, the sound source data is recombined with the vocal tract transfer characteristic data to obtain digital audio data with a raised pitch. For linear prediction analysis, see "Linear Prediction of Speech" (J. D. Markel and A. H. Gray, Jr., translated by Hisashi Suzuki, Corona Co.). FIG. 14A shows part of the sound source waveform before the pitch is raised, and FIG. 14B shows part of the sound source waveform after the pitch is raised.

Other examples of sound characteristic conversion are shown in FIGS. 16A, 16B and 16C. FIG. 16A shows the audio data before conversion. FIG. 16B shows the audio data after the duration of "my name is John" has been changed. The label is processed so that its size corresponds to the duration.

FIG. 16C shows the audio data after vibrato has been applied to "my name is John". The label is underlined. The underline may be varied according to the vibrato.

A description of every sound characteristic conversion is omitted for reasons of space, but the present invention applies to sound characteristic conversion in general. When the sound characteristic conversion is performed in the frequency domain, the processing may be performed using a frequency spectrum obtained by FFT or the like.

Sound characteristic conversion includes not only changes that mainly affect sound quality, as described above, but also processing such as changing the order of sounds, removing some sounds, and repeating sounds. As shown in FIG. 18, the order in which sounds are output may be changed by changing the order of the labels associated with the sounds. In this example, "Hi my name is John Nice to meet you" is changed to "Hi Nice to meet you my name is John".

In the same way, a sound can be deleted by deleting its label, and a sound can be repeated by duplicating its label. Although the description above uses character strings as labels, symbols or the like may be used, and icons may also be used. In addition, to make a voice more masculine, the male symbol shown in FIG. 17A may be displayed overlapping the label, and to make it more feminine, the female symbol shown in FIG. 17B may be displayed in the same way.
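Reordering, deleting and repeating sounds through their labels reduces to replaying the audio divisions in the order in which the labels now appear. The following is a minimal sketch; the mapping from label to samples and the function name `render_by_label_order` are hypothetical.

```python
def render_by_label_order(label_order, audio_by_label):
    """Concatenate audio divisions in the order given by the edited labels."""
    output = []
    for label in label_order:
        output.extend(audio_by_label[label])
    return output

audio_by_label = {
    "Hi": [1, 1],
    "my name is John": [2, 2, 2],
    "Nice to meet you": [3, 3],
}
# Labels reordered as in the FIG. 18 example.
reordered = render_by_label_order(
    ["Hi", "Nice to meet you", "my name is John"], audio_by_label
)
```

Leaving a label out of `label_order` deletes its sound, and listing it twice repeats it.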

Alternatively, the content of a sound characteristic conversion may be associated with a face photograph displayed on the screen, and the content of the conversion may be chosen by selecting the face photograph with a mouse or the like.

Furthermore, although speech was described in the above embodiment, the invention can be applied to all kinds of sound, including musical sounds and natural sounds such as wind and waves.

In the above embodiment, the sound is input from the microphone 30. However, the sound may instead be synthesized on the basis of the label data. In this case, a basic sound is synthesized on the basis of the label data, and the synthesized basic sound is subjected to characteristic conversion on the basis of the modification applied to the label data and then output. The sound may also be held as data described by LPC parameters or the like.

In the above example, English speech has been described, but the present invention is applicable regardless of language. FIG. 19 shows the division process for a Japanese voice input.

In each of the above embodiments, the sound characteristics are converted by applying a modification to each division. The sound data may be divided further on the basis of the label characters in each division. If it is divided into syllables, the sound characteristics can be converted syllable by syllable. Such finer division is described below, taking a Japanese voice input as an example (it is equally applicable to other languages).

First, the average duration of each label element is measured from utterances by several subjects. The results are stored in advance on the hard disk 24 as the table shown in Table 1.

Table 1

Category   Elements                                                  Average duration
CV         consonant + vowel kana (e.g. "ta", "ru")                  204.0 ms (40.0 ms)
CYV        contracted kana (e.g. "mya", "myu", "myo", "rya", "gyo")  169.3 ms (19.6 ms)
DV         geminate + CV kana (e.g. "ttsu")                          381.0 ms (49.4 ms)
DYV        geminate + CYV kana (e.g. "kkya", "kkyu")                 356.3 ms (24.2 ms)
V          vowels "a", "i", "u", "e", "o"                            about 143 ms (34.0 ms)
N          syllabic "n"                                              118.5 ms (29.4 ms)

First, the total duration T of the input voice data "tatsuru" is actually measured. Here, for example, the measured total duration T is assumed to be 802 ms. Next, each element of the label data associated with the audio data is assigned to a category according to Table 1. That is, the label is decomposed into the elements "ta", "ttsu" and "ru", and the CPU 18 determines the category of each element and obtains the average durations t1, t2 and t3 from Table 1. The total of the average durations is t = 204.0 ms + 381.0 ms + 204.0 ms = 789.0 ms. From the total t and the average durations t1, t2 and t3 of the elements, the duration ratios r1, r2 and r3 of the elements are calculated. For example, the duration ratio r1 of the element "ta" is 204.0/789.0. Similarly, the duration ratios r2 and r3 of the elements "ttsu" and "ru" are 381.0/789.0 and 204.0/789.0, respectively.

Based on the time ratios r1, r2, and r3 of each element calculated in this way, the measured total duration T is allocated to each element. For example, the actual duration T1 allocated to the element "ta" is calculated as

T1 = T × r1

Similarly, the actual durations T2 and T3 allocated to the elements "ttsu" and "ru" are calculated as

T2 = T × r2

T3 = T × r3

Based on the actual durations T1, T2, and T3 calculated in this way, the voice data is divided as shown in FIG. 20. By the processing described above, the voice data can be divided more finely (for each label element) and associated with the label (formation of a detailed correspondence). It is therefore possible to perform various characteristic conversions on a label-element basis. For example, vibrato can be applied only to "ru" by underlining only "ru".

In this way, the voice data can be divided into syllables in a simple manner. Note that each syllable may be estimated more accurately by using a speech recognition method.
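The proportional allocation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the names `AVERAGE_MS`, `allocate_durations`, and `boundaries` are hypothetical, and only the two category averages used in the worked example (CV and DV from Table 1) are included.

```python
# Average durations per category in ms, taken from the CV and DV rows of Table 1.
# The dictionary name and layout are assumptions made for this sketch.
AVERAGE_MS = {"CV": 204.0, "DV": 381.0}


def allocate_durations(total_ms, categories):
    """Split a measured total duration T among label elements.

    Each element receives T * r_i, where r_i is the element's average
    duration divided by the sum of the averages of all elements.
    """
    averages = [AVERAGE_MS[c] for c in categories]
    total_average = sum(averages)
    return [total_ms * a / total_average for a in averages]


def boundaries(durations):
    """Return the cumulative end time (ms) of each element."""
    ends, elapsed = [], 0.0
    for d in durations:
        elapsed += d
        ends.append(elapsed)
    return ends


# Worked example from the text: elements "ta" (CV), "ttsu" (DV), "ru" (CV), T = 802 ms.
durations = allocate_durations(802.0, ["CV", "DV", "CV"])
for name, d in zip(["ta", "ttsu", "ru"], durations):
    print(f"{name}: {d:.1f} ms")
```

With the figures from the example, "ta" and "ru" each receive roughly 207 ms and "ttsu" roughly 387 ms, and the three allocations sum to the measured 802 ms.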

In the above embodiment, the display control means 10 and the display means 14 are provided so that modifications can be applied while the label data is displayed. However, even without these means 10 and 14, modification data can be entered if its structure is known (as shown in the figure). In this case, the modification cannot be confirmed on the display means 14, but the following effect is obtained.

Modification data can also be attached to the sound data, but it is then difficult to perform sound quality conversion over a given range of syllables, because the syllable boundaries are not clear in the sound data itself. In the label data, on the other hand, the division between characters (corresponding to the division between syllables) is clear, so it is easy to perform sound quality conversion on the syllables in a desired range.

In the above embodiment, the CPU is used to realize the function of each block in FIG. 2, but a part or the whole may be realized by hardware logic.

FIG. 21 shows an embodiment of the voice transmission system. The transmitting-side device 52 and the receiving-side device 60 are connected via the communication path 50. The communication path 50 may be wired or wireless. The transmitting-side device 52 includes data input means 54, such as a keyboard, and communication means 56. The receiving-side device 60 includes standard voice data generating means 62, communication means 64, conversion means 66, and audio output means 68.

Hereinafter, a case where voice is transmitted from the transmitting-side device 52 to the receiving-side device 60 will be described as an example. First, the label data and the modification data are input from the data input means 54, as shown in the figure. The "\female" and "\male" parts are modification data, which determine the content of the sound quality conversion of the label data in the braces { } that follow. In this example, "\female" means conversion into a feminine voice, and "\male" means conversion into a masculine voice.

Next, this data is transmitted to the receiving-side device 60 via the communication path 50 by the communication means 56. The communication means 64 of the receiving-side device 60 receives the data and holds it. The standard voice data generating means 62 acquires the held data and extracts only the label data from it. Here, "Good morning" and "OK" are extracted. The standard voice data generating means 62 then generates the corresponding standard voice data from this label data by speech synthesis. Meanwhile, the conversion means 66 extracts only the modification data from the data held in the communication means 64. Here, "\female" and "\male" are extracted. The conversion means 66 converts the sound quality of the corresponding part of the standard voice data based on the modification data. The relationship between the modification data and the content of the sound quality conversion is predetermined. Here, "Good morning" is converted into feminine voice data, and "OK" is converted into masculine voice data. The conversion means 66 then outputs the resulting sound-quality-converted data.

The audio output means 68 converts the sound quality conversion data into an analog signal, and outputs the analog signal from a speaker.
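The receiving-side separation of label data and modification data can be illustrated with a short Python sketch. The exact wire format is not given in the text, so the form `\name{label}` is an assumption based on the "\female" and "\male" example, and `SEGMENT`, `parse_message`, `split_message`, and `CONVERSIONS` are hypothetical names introduced only for illustration. Speech synthesis and the sound quality conversion itself are not modeled.

```python
import re

# Assumed message format: a backslash, a modification name, then the label in braces,
# e.g. \female{Good morning}\male{OK}. This format is a guess made for this sketch.
SEGMENT = re.compile(r"\\(\w+)\s*\{([^{}]*)\}")

# Predetermined relationship between modification data and conversion content.
CONVERSIONS = {"female": "feminine voice", "male": "masculine voice"}


def parse_message(message):
    """Return a list of (modification, label) pairs in message order."""
    return [(m.group(1), m.group(2)) for m in SEGMENT.finditer(message)]


def split_message(message):
    """Separate label data (for synthesis) from modification data (for conversion)."""
    segments = parse_message(message)
    labels = [label for _, label in segments]
    modifications = [mod for mod, _ in segments]
    return labels, modifications


def plan_conversions(message):
    """Pair each label with the conversion its modification data calls for."""
    return [(label, CONVERSIONS[mod]) for mod, label in parse_message(message)]


received = r"\female{Good morning}\male{OK}"
labels, modifications = split_message(received)
print(labels)         # label data handed to the standard voice data generator
print(modifications)  # modification data handed to the conversion step
```

Keeping the label text and the modification names in parallel lists mirrors the description, in which the standard voice data generating means 62 reads only the labels and the conversion means 66 reads only the modification data.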

As described above, voice is transmitted from the transmitting-side device 52 to the receiving-side device 60. According to this embodiment, voice can be sent merely by sending the label data and the modification data, which have a small data amount. Moreover, not only the standard voice but also voice of the desired sound quality can be sent, based on the modification data.

In conventional devices, the transmission speed was low because voice data with a large data amount was transmitted. According to this embodiment, however, the transmission speed can be improved significantly. In this case, a code may be assigned to the modification data and stored in the receiving-side device 60, so that only the code is transmitted. For example, as shown in FIG. 23, it is convenient to represent the modification data "\inflection \italic \25 points" by the code "A".
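The code-substitution idea can be sketched as a shared lookup table. This is a hypothetical illustration: the table contents follow the single example given for FIG. 23, while `CODE_TABLE`, `encode`, and `decode` are names invented for this sketch, and how the table is actually stored in the receiving-side device is not specified in the text.

```python
# Shared table held by both sides; the single entry follows the FIG. 23 example.
CODE_TABLE = {"A": ("inflection", "italic", "25 points")}


def encode(modifications, table=CODE_TABLE):
    """Return the short code for a modification sequence, or None if unregistered."""
    wanted = tuple(modifications)
    for code, registered in table.items():
        if registered == wanted:
            return code
    return None


def decode(code, table=CODE_TABLE):
    """Expand a received code back into its modification sequence."""
    return table[code]


code = encode(["inflection", "italic", "25 points"])
print(code, decode(code))
```

Only the one-character code then needs to travel over the communication path, and the receiving side expands it before conversion.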

In addition, so that the operator of the transmitting-side device can confirm what modification data is attached to the label data, the label may be displayed with the modification applied, as in the embodiment described above.

Note that the various modifications, applications, and extensions described for the embodiment of FIG. 2 can be applied to this embodiment as well. For example, although the transmission target in this embodiment is voice, the embodiment can be applied to sound in general.

Claims


1. A sound characteristic conversion device comprising:

sound data and label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data;

display control means for, when modification data is given for the label data, visually modifying the label represented by the label data based on the modification data and displaying the label on display means; and

conversion means for performing a corresponding characteristic conversion on the sound data associated with the label data, based on the modification data given for the label data.

2. The sound characteristic conversion device according to claim 1, further comprising:

sound data dividing means for dividing input sound data based on sound breaks;

label data dividing means for dividing input label data, which is input with delimiters at positions corresponding to said sound breaks, based on the delimiters; and

correspondence forming means for associating the divided sound data and the divided label data with each other.

3. The sound characteristic conversion device according to claim 1 or 2,

wherein the visual modification of said label is a character decoration applied to the label.

4. The sound characteristic conversion device according to claim 1 or 2,

wherein the visual modification of said label is the order of the labels.

5. A sound characteristic conversion method comprising:

associating sound data with label data, and associating contents of sound characteristic conversion with modification processing;

displaying the label represented by the label data with a visual modification based on the given modification processing; and

performing, on the sound data associated with the label data, the characteristic conversion corresponding to the modification processing given to the label data.

6. The sound characteristic conversion method according to claim 5,

wherein input sound data is divided based on sound breaks, label data is divided in accordance with said sound breaks, and the divided label data is associated with the divided sound data.

7. A sound characteristic conversion device comprising:

sound data and label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data; and

conversion means for performing, on the sound data associated with the label data, a corresponding characteristic conversion based on modification data given for the label data.

8. A sound characteristic conversion method comprising:

associating label data with sound data, and associating characteristic conversion contents with modification processing; and

performing, on the sound data associated with the label data, the characteristic conversion corresponding to the modification processing applied to the label data.

9. A sound transmission system having a transmitting-side device and a receiving-side device capable of communicating via a communication path, the system transmitting sound data from the transmitting-side device to the receiving-side device, wherein

the transmitting-side device comprises:

data input means for inputting label data and modification data; and

communication means for transmitting the label data and the modification data to the receiving-side device via the communication path; and

the receiving-side device comprises:

communication means for receiving the label data and the modification data from the transmitting-side device;

standard sound data generating means for generating standard sound data based on the label data; and

conversion means for converting the sound characteristics of the standard sound data based on the modification data to generate sound-characteristic-converted data.

10. A sound transmission method for transmitting sound data from a transmitting side to a receiving side via a communication path, wherein

on the transmitting side,

label data and modification data are input, and

the label data and the modification data are transmitted to the receiving side via the communication path; and

on the receiving side,

the label data and the modification data are received from the transmitting side,

standard sound data is generated based on the label data, and

the sound characteristics of the standard sound data are converted based on the modification data to generate sound-characteristic-converted data.

11. A sound/label associating device comprising:

sound data input means for inputting sound data;

sound data dividing means for dividing the sound data based on the loudness of the sound represented by the sound data;

label data input means for inputting label data with a delimiter at each position corresponding to said division of the sound data;

label data dividing means for dividing the label data based on the delimiters; and

correspondence forming means for associating the divided sound data and the divided label data with each other.

12. A sound/label associating device comprising:

sound data input means for inputting sound data;

label data input means for inputting label data corresponding to the sound data; and

detailed correspondence forming means for dividing the sound data in association with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

13. The sound/label associating device according to claim 11,

further comprising detailed correspondence forming means for dividing, with respect to the label data and sound data associated by the correspondence forming means, the sound data in association with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

14. A sound/label associating device comprising a sound display section for visually displaying the properties of the sound represented by sound data, and a label display section for displaying a label represented by label data,

wherein the sound display section displays a break mark indicating a sound break.

15. A sound/label associating method comprising:

dividing sound data based on the loudness of the sound represented by the sound data;

receiving label data in which a delimiter is placed at each position corresponding to a division of the sound data, and dividing the label data based on the delimiters; and

associating the divided sound data and the divided label data with each other.

16. A method for associating sound data with label data, comprising:

preparing the average duration of each label in advance; and

dividing the sound data in correspondence with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

17. The sound/label associating method according to claim 15, further comprising:

preparing the average duration of each label in advance; and

for the associated label data and sound data, dividing the sound data in correspondence with each label, based on the average duration of each label represented by the label data and the duration of the sound data.

18. A display method for associating sounds with labels, using a sound display section for visually displaying the properties of the sound represented by sound data and a label display section for displaying a label represented by label data,

wherein a break mark indicating a sound break is displayed in the sound display section.

19. A storage medium on which is recorded a program for implementing, by means of a computer, the device or method according to any one of claims 1 to 18.