WO1996008813A1 - Convertisseur de caracteristiques sonores, dispositif d'association son/marque et leur procede de realisation - Google Patents


Info

Publication number
WO1996008813A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sound
label
sound data
modification
Prior art date
Application number
PCT/JP1995/001806
Other languages
English (en)
Japanese (ja)
Inventor
Seiichi Tenpaku
Yohichi TOHKURA
Original Assignee
Arcadia, Inc.
Atr Human Information Processing Research Laboratories Co. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arcadia, Inc., Atr Human Information Processing Research Laboratories Co. Ltd. filed Critical Arcadia, Inc.
Priority to JP8510060A priority Critical patent/JP3066452B2/ja
Priority to AU34003/95A priority patent/AU3400395A/en
Publication of WO1996008813A1 publication Critical patent/WO1996008813A1/fr
Priority to US08/815,306 priority patent/US5956685A/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/033: Voice editing, e.g. manipulating the voice of the synthesiser
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/003: Changing voice quality, e.g. pitch or formants
    • G10L 21/007: Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L 21/013: Adapting to target pitch
    • G10L 2021/0135: Voice conversion or morphing

Definitions

  • The present invention relates to characteristic conversion for sounds such as voices, musical sounds, and natural sounds, and more particularly to facilitating the conversion operation.
  • The present invention also relates to the association between a sound and a label that serves as a basis for the characteristic conversion.
  • Conventionally, characteristic conversion has been performed by displaying the time-domain waveform of the sound, the frequency spectrum, linear prediction (LPC) parameters, and the like on a display, and modifying them.
  • The conventional characteristic conversion described above has the following problems.
  • The present invention solves the above-described problems, and is intended to provide a sound characteristic conversion device and method with which the conversion can be performed easily, and a sound/label association device and method suitable for these.
  • The sound characteristic conversion device of claim 1 comprises sound/label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data; display means for visually displaying, when modification data is given for the label data, the label based on the label data in a form modified according to the modification data; and conversion means for performing, on the sound data associated with the label data, the characteristic conversion corresponding to the modification data given for the label data.
  • The sound characteristic conversion device of claim 2 comprises sound data classification means for dividing input sound data based on sound divisions;
  • label data classification means for receiving label data containing delimiters corresponding to the sound divisions and dividing the label data based on the delimiters; and correspondence forming means for associating the divided sound data and the divided label data with each other.
  • The sound characteristic conversion device of claim 4 is characterized in that the visual modification applied to the label is a change of the order of the labels.
  • In the sound characteristic conversion method of claim 5, sound data is associated with label data, and sound characteristic conversion contents are associated with modification processing. The label is visually displayed based on the given modification processing, and the characteristic conversion corresponding to the modification processing given to the label data is performed on the sound data associated with that label data.
  • The sound characteristic conversion method of claim 6 divides input sound data based on sound divisions, divides label data according to the sound divisions, and associates the divided label data with the divided sound data.
  • The sound characteristic conversion device of claim 7 comprises sound/label data holding means for holding sound data divided according to predetermined divisions and label data associated with each division of the sound data, and conversion means for performing, on the sound data associated with the label data, the characteristic conversion corresponding to the modification data given for that label data.
  • In the sound characteristic conversion method of claim 8, label data is associated with sound data and characteristic conversion contents are associated with modification processing, and the characteristic conversion corresponding to the modification processing given to the label data is performed on the sound data associated with that label data.
  • The sound transmission system of claim 9 has a transmitting device and a receiving device that can communicate through a communication path, and transmits sound data from the transmitting device to the receiving device.
  • The transmitting device includes data input means for inputting label data and modification data, and communication means for transmitting the label data and the modification data to the receiving device via the communication path.
  • The receiving device comprises communication means for receiving the label data and the modification data from the transmitting device, standard sound data generating means for generating standard sound data based on the label data, and conversion means for converting the sound characteristics of the standard sound data based on the modification data and outputting sound characteristic conversion data.
  • The transmission method of claim 10 is a method of transmitting sound data from the transmitting side to the receiving side via a communication path.
  • The transmitting side inputs the label data and the modification data, and transmits the label data and the modification data to the receiving side via the communication path.
  • The receiving side receives the label data and the modification data from the transmitting side, generates standard sound data based on the label data, converts the sound characteristics of the standard sound data based on the modification data, and outputs the sound characteristic conversion data.
  • The sound/label associating device of claim 11 comprises sound data input means for inputting sound data, sound data classification means for dividing the sound data based on the loudness of the sound represented by the sound data, label data input means for inputting label data having delimiters at positions corresponding to the divisions of the sound data, label data classification means for dividing the label data based on the delimiters, and correspondence forming means for associating the divided sound data and the divided label data with each other.
  • The sound/label associating device of claim 12 comprises sound data input means for inputting sound data, label data input means for inputting label data corresponding to the sound data, and detailed correspondence forming means for dividing the sound data in association with each label based on the average duration of each label represented by the label data and the duration of the sound data.
  • The sound/label associating device of claim 13 further comprises detailed correspondence forming means for, with respect to the labels and sound data associated by the correspondence forming means, dividing the sound data in association with each label based on the average duration of each label represented by the label data and the duration of the sound data.
  • The sound/label associating device of claim 14 comprises a sound display section for visually displaying the properties of the sound represented by the sound data and a label display section for displaying the label represented by the label data, and a separation mark indicating a sound division is displayed on the sound display section.
  • The sound/label associating method of claim 15 is a method of associating a label with sound data, in which the sound data is divided based on the loudness of the sound represented by the sound data, label data having delimiters at positions corresponding to the divisions of the sound data is received, the label data is divided based on the delimiters, and the divided sound data and the divided label data are associated with each other.
  • The sound/label associating method of claim 16 is a method of associating sound data with labels, in which an average duration for each label is prepared in advance, and the sound data is divided in association with each label based on the average duration of each label represented by the label data and the duration of the sound data. In the sound/label associating method of claim 17, likewise, the average duration for each label is prepared in advance, associated label data and sound data are given, and the sound data is divided in association with each label based on the average duration of each label represented by the label data and the duration of the sound data.
  • Label data refers to a character string, figures, a syllabary sequence, a sequence of phonetic symbols, or a combination of these, which can be associated with a sound such as a voice, a musical sound, or a natural sound.
  • Sound data is data representing a sound waveform directly or indirectly, and includes, for example, data obtained by digitizing an analog sound waveform and data representing a sound by LPC parameters.
  • Modification includes emphasis (bold), addition of underlines, addition of marks, change of order, and the like, performed on labels in order to convert the characteristics of the sound obtained based on the sound data.
  • The contents of the modification data may directly indicate the contents of the characteristic conversion; however, when the label data is visually modified based on the modification data, it may be preferable for the modification data to indicate the contents of the visual modification.
  • Sound characteristic conversion refers to changing some property of a sound: for example, changing the pitch, changing the intensity, adding vibrato, changing the frequency spectrum, changing the duration, changing the sampling interval, feminizing, masculinizing, or clarifying the voice quality, and the like, as well as combinations of these. It is a concept that also includes changing the order of sound output, deleting parts of the sound, and the like.
  • The sound characteristic conversion device of claim 1 and the sound characteristic conversion method of claim 5 associate label data with sound data, associate sound characteristic conversion contents with modification processing, visually display the label data based on the given modification processing, and perform, on the sound data associated with the label data, the characteristic conversion corresponding to the modification processing given to the label data. Therefore, characteristic conversion can be performed on the sound simply by visually modifying the corresponding label.
  • The sound characteristic conversion device of claim 2 and the sound characteristic conversion method of claim 6 divide input sound data based on sound divisions and divide label data according to the sound divisions.
  • The divided labels are made to correspond to the divided sound data. Therefore, the two can be associated simply by inputting the sound data and the label data.
  • The sound characteristic conversion device of claim 7 and the sound characteristic conversion method of claim 8 associate label data with sound data and characteristic conversion contents with modification processing, and perform, on the sound data associated with the label data, the characteristic conversion corresponding to the modification processing applied to the label data. Therefore, characteristic conversion can be performed on the sound simply by applying modification processing to the label, whose syllable divisions are clearer than those of the sound data.
  • The sound transmission system of claim 9 and the sound transmission method of claim 10 are configured so that the transmitting side inputs label data and modification data, and the receiving side generates standard sound data based on the label data.
  • The standard sound data is converted based on the modification data to generate sound quality conversion data. Therefore, a sound having the desired sound characteristics can be transmitted simply by sending the label data and the modification data.
  • The sound/label associating device of claim 11 and the sound/label associating method of claim 15 divide the sound data based on the loudness of the sound represented by the sound data.
  • The sound/label associating device of claim 12 and the sound/label associating method of claim 16 associate sound data with labels: an average duration for each label is prepared in advance, and the sound data is divided in association with each label based on the average duration of each label represented by the label data and the duration of the sound data. Therefore, it is easy to associate the sound data with each label.
  • The sound/label associating device of claim 14 and the display method for sound/label association of claim 18 are provided with a sound display section for visually displaying the properties of the sound represented by the sound data
  • and a label display section for displaying the label represented by the label data, and a delimiter mark indicating a sound division is displayed on the sound display section. Therefore, label data can be input and displayed while the delimiter positions of the sound data are confirmed.
  • FIG. 1 shows a display screen of the sound characteristic conversion device according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing an overall configuration of a sound quality conversion device according to an embodiment of the present invention.
  • FIG. 3 is a diagram showing a hardware configuration in which a CPU is used to realize the functions of FIG. 2.
  • FIG. 4 is a flowchart showing the operation of the sound characteristic conversion device.
  • FIG. 5 is a flowchart showing the operation of the sound characteristic conversion device.
  • FIG. 6 is a diagram showing label data written in association with audio data.
  • FIG. 7 is a diagram showing the storage state of the audio data.
  • FIG. 8 is a diagram showing labels displayed on the CRT 16.
  • FIG. 9 is a diagram showing the correspondence between the visual modification and the sound quality change content.
  • FIG. 10 is a diagram showing label data to which modification data is added.
  • FIG. 11 is a diagram showing the label with the visual modification applied.
  • FIG. 12 is a diagram for explaining the division of audio data.
  • FIG. 13 is a diagram showing the pitch conversion process.
  • FIG. 14A is a diagram showing the sound source waveform before pitch conversion.
  • FIG. 14B is a diagram showing the sound source waveform after pitch conversion.
  • FIG. 15A is a diagram showing the voice data before the power change and its short-time average power.
  • FIG. 15B is a diagram showing the voice data after the power change and its short-time average power.
  • FIG. 16A shows the original audio data.
  • FIG. 16B is a diagram showing audio data in which the duration of the sound has been changed.
  • FIG. 16C is a diagram showing audio data subjected to vibrato.
  • FIG. 17 is a diagram showing examples of symbols used as modifications for the label.
  • FIG. 18 is a diagram showing an example of changing the order of the sounds.
  • FIG. 19 is a diagram illustrating an example of the division processing for Japanese speech.
  • FIG. 20 is a diagram showing an example of classification for each label.
  • FIG. 21 is a diagram showing an embodiment of the voice transmission system.
  • FIG. 22 is a diagram showing an example of the data transmitted in the embodiment of FIG. 21.
  • FIG. 23 is a diagram showing a table in which the modification data is converted into codes.

Best Mode for Carrying Out the Invention
  • FIG. 2 shows the overall configuration of a sound quality conversion apparatus according to one embodiment of the present invention.
  • The sound data classification means 2 divides input sound data based on sound divisions. Label data containing delimiters corresponding to the sound divisions is input to the label data classification means 4,
  • which divides the label data based on the delimiters.
  • The divided sound data and the divided label data are input to the correspondence forming means 6 and are associated with each other division by division. The associated sound data and label data are stored in the sound/label data holding means.
  • The display control means 10 receives the modification data for each division, modifies the corresponding label data, and displays the modified label on the display means 14.
  • The conversion means 12 receives the modification data for each division, converts the corresponding sound data, and outputs the converted sound data.
  • FIG. 3 shows a hardware configuration that realizes the configuration of FIG. 2 using a CPU.
  • To the bus line 40 are connected the CRT 16 as display means, the CPU 18, the memory 20 serving as sound/label data holding means, the input interface 22, the hard disk 24, the output interface 26, and the floppy disk drive (FDD) 15.
  • A microphone 30 is connected to the input interface 22 via an A/D converter 28.
  • the input interface 22 also has a keyboard 32 and a mouse 34 connected to it.
  • A speaker 38 is connected to the output interface 26 via a D/A converter 36.
  • The hard disk 24 stores a program whose flowcharts are shown in FIGS. 4 and 5.
  • the program is installed on the hard disk 24 from a floppy disk (recording medium) by the FDD 15. It is a thing.
  • the memory 20 is used as a sound / label data holding unit and also as a work area for executing a program.
  • In step S1, an audio signal (analog audio data) is input from the microphone 30.
  • The CPU 18 takes in the digital audio data converted from the analog signal by the A/D converter 28.
  • the CPU 18 displays the waveform of the audio data on the sound display section 80 of the CRT 16. This display state is shown in FIG.
  • In step S2, the digital audio data is divided based on sound divisions. This division is performed as follows. Suppose, for example, that the voice "Hi my name is John Nice to meet you" is input. The digital data obtained at this time is as shown in the upper row of FIG. 12, which is a waveform display of the digital voice data. The CPU 18 calculates the short-time average power based on the digital voice data. The calculated short-time average power is shown in the lower row of FIG. 12.
  • The CPU 18 performs the division based on two thresholds, a data level and a skip level. While no division is in progress, a division is started when the short-time average power exceeds the data level continuously for 100 ms or more. After a division has started, it is ended when the short-time average power falls below the skip level continuously for 80 ms or more. The division is performed in this manner. In this embodiment, the data level is set to 50 dB and the skip level to 40 dB. Based on the above division, as shown in FIG. 12, it can be determined that 630 ms to 1890 ms is the second division and 2060 ms to 2390 ms is the third division. Based on the determined divisions, the CPU 18 displays lines 84a, 84b, 84c, and 84d representing the division positions on the waveform in the sound display section 80 of the CRT 16 (see FIG. 1).
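
The two-threshold division described above can be sketched as follows. This is a minimal illustration assuming a 10 ms analysis frame and a fixed dB reference for 16-bit samples; neither detail is specified in the source.

    import numpy as np

    def segment_by_power(samples, rate, frame_ms=10,
                         data_level_db=50.0, skip_level_db=40.0,
                         start_ms=100, end_ms=80):
        """Divide audio with two thresholds, as in step S2: a division
        starts once the short-time average power stays above the data
        level for start_ms, and ends once it stays below the skip level
        for end_ms."""
        frame = int(rate * frame_ms / 1000)
        n_frames = len(samples) // frame
        power_db = np.empty(n_frames)
        for i in range(n_frames):
            chunk = samples[i * frame:(i + 1) * frame].astype(np.float64)
            power_db[i] = 10 * np.log10(np.mean(chunk ** 2) + 1e-12)

        need_start = start_ms // frame_ms  # frames above the data level
        need_end = end_ms // frame_ms      # frames below the skip level
        segments, seg_start, run = [], None, 0
        for i, p in enumerate(power_db):
            if seg_start is None:
                run = run + 1 if p >= data_level_db else 0
                if run >= need_start:
                    seg_start = (i - need_start + 1) * frame_ms
                    run = 0
            else:
                run = run + 1 if p < skip_level_db else 0
                if run >= need_end:
                    segments.append((seg_start, (i - need_end + 1) * frame_ms))
                    seg_start, run = None, 0
        if seg_start is not None:
            segments.append((seg_start, n_frames * frame_ms))
        return segments  # list of (start_ms, end_ms) divisions
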
  • the CPU 18 stores the divided digital audio data in the memory 20 (step S3).
  • The state of each division of the audio data recorded in the memory 20 in step S3 is shown in FIG. 7.
  • The audio data of the first division is stored from address ADRS1, that of the second division from address ADRS2, and that of the third division from address ADRS3.
  • The label data corresponding to the above sound data is input from the keyboard 32 into the label display section 82 of the CRT 16 shown in FIG. 1 (step S4).
  • The label data is entered using punctuation marks as delimiters. For example, for the above audio data, "Hi, my name is John. Nice to meet you." is entered. The received label data is thereby divided at the punctuation marks into three parts: "Hi", "my name is John", and "Nice to meet you".
  • On the sound display section 80, the lines 84a, 84b, 84c, and 84d indicating the division positions of the audio data are displayed. Therefore, when inputting the label data, it is easy to enter the delimiters in correspondence with these positions.
  • The CPU 18 sequentially writes the divided label data in correspondence with the divided audio data (step S5). That is, as shown in FIG. 6, each divided label is written in association with the first address of the corresponding audio data.
  • If the number of audio data divisions does not match the number of label data divisions, it is preferable to correct the number of audio data divisions based on the number of label data divisions.
  • For example, the thresholds may be changed and the audio data divided again so that the numbers of divisions match.
  • Alternatively, division positions of the audio data may be newly added or deleted to match the number of divisions.
  • The operator may also correct the divisions using the mouse 34 or the keyboard 32.
  • The CPU 18 displays labels based on the input label data in the label display section 82 of the CRT 16 (see FIG. 1) (step S6).
  • FIG. 8 shows the displayed labels.
  • The user applies a predetermined visual modification to the displayed labels in accordance with the desired content of each sound characteristic conversion.
  • FIG. 9 shows an example of the correspondence between visual modifications and sound characteristic conversions. If this correspondence is held as a table, the correspondence between visual modification and sound characteristic conversion can be changed simply by changing the table's contents. Note that the contents of FIG. 9 are displayed as icons on the CRT 16, as shown in FIG. 1, providing guidance for easy operation.
  • For example, to increase the power only for the "my name is John" part, the following operation is performed. First, using the keyboard 32 or the mouse 34, the "my name is John" part is selected in the label display section 82 of FIG. 1. Next, the icon 90 for making the selected part "my name is John" a bold character is clicked using the mouse 34. As shown in FIG. 10, the modification data "\emphasis" is thereby added to "my name is John", where "\" is a code indicating that the string that follows is a control code (modification data). In step S7, the label modified based on the modification data is displayed in the label display section 82 of the CRT 16 (see FIG. 11). As is clear from FIG. 11, the modification is displayed visually, so its contents can be grasped easily.
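
The control-code convention of FIG. 10 can be modeled as below. The record layout and the table pairings are illustrative guesses; the source states only that "\" introduces a control code and that the FIG. 9 correspondence is held as a changeable table.

    # Hypothetical rendering of a FIG. 10 label record: a backslash
    # introduces modification data for the division's label text.
    def parse_label(label):
        """'\\emphasis my name is John' -> ('emphasis', 'my name is John');
        a label without a control code maps to (None, label)."""
        if label.startswith("\\"):
            code, _, text = label[1:].partition(" ")
            return code, text
        return None, label

    # FIG. 9 correspondence held as data, so the mapping between visual
    # modification and conversion can be changed by editing the table
    # (these two pairings follow the examples given in the text).
    MODIFICATION_TABLE = {
        "emphasis": "increase power",   # bold label -> larger power
        "underline": "add vibrato",     # underlined label -> vibrato
    }
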
  • The CPU 18 reads the first division of the label data shown in FIG. 10, and reads the corresponding audio data based on the first address ADRS1 (step S8). As a result, the digital audio data of the "Hi" part shown in FIG. 12 is read out.
  • In step S11, the CPU 18 determines whether or not processing has been performed for all divisions. If not, it moves to the next division (step S12) and executes step S8 and the subsequent steps. Modification data has been added to "my name is John", so for that division the process proceeds from step S9 to step S10.
  • In step S10, the characteristic conversion designated by "\emphasis" is performed on the digital voice data of "my name is John".
  • Here, in accordance with the table of FIG. 9, the power of the voice data is increased. The power is increased by enlarging the amplitude of the waveform represented by the digital audio data. The converted data is stored at address ADRS2 and below (it may instead be recorded at another address in order to retain the original audio data).
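
A sketch of this amplitude enlargement, assuming 16-bit samples and an illustrative 6 dB gain (the patent does not state a scaling factor):

    import numpy as np

    def increase_power(samples, gain_db=6.0):
        """Enlarge the waveform amplitude to raise the power (step S10)."""
        gain = 10 ** (gain_db / 20)
        out = samples.astype(np.float64) * gain
        # clip to the 16-bit range to avoid wrap-around on conversion
        return np.clip(out, -32768, 32767).astype(np.int16)
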
  • The CPU 18 outputs the digital voice data subjected to the sound quality conversion from the output interface 26 (step S13). FIG. 15A shows the audio data before the characteristic conversion,
  • and FIG. 15B shows the audio data after the characteristic conversion.
  • It can be seen that the power of the "my name is John" part has been converted to be larger.
  • The digital audio data thus converted is converted into analog audio data by the D/A converter 36, and is output from the speaker 38 as a voice whose characteristics have been converted. That is, "my name is John" is output with increased loudness.
  • FIG. 13 shows the procedure of the pitch-raising process.
  • The CPU 18 first performs linear prediction (LPC) analysis on the target digital voice data and separates it into sound source data and vocal tract transfer characteristic data. The pitch of the sound source data is changed, and the result is then re-synthesized with the vocal tract transfer characteristic data to obtain digital voice data with a raised pitch.
  • For details of LPC analysis, see J. D. Markel and A. H. Gray, Jr., "Linear Prediction of Speech" (Japanese translation by Hisashi Suzuki, Corona Publishing).
  • FIG. 14A shows a part of the sound source waveform before the pitch raise, and FIG. 14B shows a part of the sound source waveform after the pitch raise.
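
A rough sketch of the FIG. 13 procedure using off-the-shelf LPC routines. The analysis order, the use of librosa.lpc, and resampling of the residual as the pitch-change step are all assumptions; this simple resampling also shortens the duration as a side effect, which a real implementation would compensate for.

    import numpy as np
    import librosa
    import scipy.signal as sig

    def raise_pitch_lpc(samples, rate, order=12, factor=1.2):
        """Separate the signal into an excitation (sound source) and a
        vocal tract filter by LPC, shift the excitation pitch, then
        re-synthesize through the filter."""
        x = samples.astype(np.float64)
        a = librosa.lpc(x, order=order)        # A(z), with a[0] == 1.0
        residual = sig.lfilter(a, [1.0], x)    # inverse filter -> excitation
        # shorten the residual's period by `factor` to raise the pitch
        n_out = int(len(residual) / factor)
        shifted = np.interp(np.linspace(0, len(residual) - 1, n_out),
                            np.arange(len(residual)), residual)
        return sig.lfilter([1.0], a, shifted)  # re-synthesize through 1/A(z)
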
  • FIG. 16A shows the audio data before conversion.
  • FIG. 16B shows the audio data of "my name is John" after the duration of the sound has been changed. The label may be displayed with a size corresponding to the duration.
  • FIG. 16C shows the audio data of "my name is John" after vibrato has been applied. The corresponding label is underlined; the style of the underline may be varied according to the vibrato.
  • The sound characteristic conversion may also include processing that changes the order of the sounds, deletes some of the sounds, or repeats them, as shown in FIG. 18.
  • The output order of the sounds can be changed by changing the order of the labels associated with the sounds.
  • For example, the label sequence "Hi my name is John Nice to meet you"
  • may be rearranged as "Hi Hi Nice to meet you my name is".
  • A sound can be deleted by deleting its label, and a sound can be replayed by duplicating its label.
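
Reordering, repetition, and deletion can all be expressed as index operations on the label list; a minimal sketch (the index sequence is illustrative):

    def edit_by_labels(labels, sounds, new_order):
        """Reorder, repeat, or delete divisions purely by editing the
        label order, as in FIG. 18; the sounds follow the same indices."""
        return ([labels[i] for i in new_order],
                [sounds[i] for i in new_order])

    # ["Hi", "my name is John", "Nice to meet you"] with
    # new_order = [0, 0, 2, 1] repeats "Hi", advances "Nice to meet you",
    # and places "my name is John" last.
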
  • Not only character strings but also symbols, icons, and the like may be used as labels.
  • For example, a mark representing a male voice is shown in FIG. 17A,
  • and a mark representing a female voice is shown in FIG. 17B. Such a mark may be displayed so as to overlap the label.
  • The content of the sound quality conversion may also be determined in association with face photographs displayed on the screen, the conversion content being selected by choosing a face photograph with a mouse or the like.
  • In the above embodiment, the sound is input from the microphone 30, but the sound may instead be synthesized based on the label data.
  • In that case, a basic sound is synthesized based on the label data,
  • and the synthesized basic sound is subjected to the characteristic conversion and output.
  • The sound may also be described as data using LPC parameters or the like.
  • FIG. 19 shows the state of the division processing for a Japanese voice input.
  • In the above-described embodiment, the sound quality is converted by applying a modification to each division. The sound data may, however, be divided further based on the number of labels in each division. If it is divided into syllables, sound quality conversion can be performed for each syllable. Such finer division is described below, taking a Japanese voice input as an example (the same applies to other languages).
  • For this purpose, average durations are prepared in advance for syllable elements such as sha, shu, sho, cha, chu, cho, ja, kya, kyu, kyo, hya, hyu, hyo, rya, and so on.
  • The measured total time length T is allocated to each element according to its average duration. For example, a duration T1 is allocated to the first element.
  • the audio data is divided as shown in FIG. 20.
  • In this way, the audio data can be divided for each label element and associated with each label (formation of detailed correspondence). Therefore, it is possible to perform characteristic conversion of the sound on a label-element basis. For example, vibrato can be applied only to "ru" by underlining only "ru".
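
A minimal sketch of this allocation, following claims 12 and 16: the measured total duration of a division is split across its label elements in proportion to prepared average durations (the element names and averages below are illustrative):

    def allocate_durations(elements, total_ms, avg_ms):
        """Split total_ms across label elements in proportion to each
        element's prepared average duration (detailed correspondence)."""
        weights = [avg_ms[e] for e in elements]
        scale = total_ms / sum(weights)
        bounds, t = [], 0.0
        for e, w in zip(elements, weights):
            start, t = t, t + w * scale
            bounds.append((e, round(start), round(t)))
        return bounds  # (element, start_ms, end_ms) per label element

    # e.g. allocate_durations(["te", "n", "ki"], total_ms=420,
    #                         avg_ms={"te": 130, "n": 80, "ki": 120})
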
  • The boundaries of each syllable may be estimated more accurately by using a speech recognition method.
  • In the above-described embodiment, the display control means 10 and the display means 14 are provided so that the label data can be displayed while the modification operations are performed.
  • Even if these means 10 and 14 are not used, the modification data can be entered directly, provided the structure of the modification data shown in FIG. 10 is known. In this case, the modified label cannot be displayed by the display means 14, but the following effect is still obtained.
  • Modification data could also be given for the sound data itself, but it is then difficult to perform sound quality conversion over a given syllable range, because the syllable divisions are not clear in the sound data itself.
  • In the label data, by contrast, the divisions between characters correspond to the divisions between syllables, so the conversion range can be specified easily.
  • In the above embodiment, a CPU is used to realize the function of each block in FIG. 2, but a part or the whole may be realized by hardware logic.
  • FIG. 21 shows an embodiment of the voice transmission system.
  • The transmitting device 52 and the receiving device 60 are connected via the communication path 50.
  • The communication path 50 may be wired or wireless.
  • The transmitting device 52 includes data input means 54, such as a keyboard, and communication means 56. The receiving device 60 includes standard voice data generating means 62, communication means 64, conversion means 66, and audio output means 68.
  • The data input means 54 is used to input the label data and the modification data as shown in FIG. 22.
  • The "\female" and "\male" parts are modification data; they determine the content of the sound quality conversion of the label data that follows them.
  • "\female" means converting into a feminine voice,
  • and "\male" means converting into a masculine voice.
  • This data is transmitted to the receiving device 60 via the communication path 50 by the communication means 56.
  • The communication means 64 of the receiving device 60 receives and holds this data.
  • The standard voice data generating means 62 acquires the held data and extracts only the label data from it. Here, "Good morning" and "OK" are extracted.
  • The standard voice data generating means 62 then generates the corresponding standard voice data from this label data by a voice synthesis method.
  • The conversion means 66 extracts only the modification data from the data held in the communication means 64. Here, "\female" and "\male" are extracted.
  • The conversion means 66 converts the sound quality of the corresponding parts of the standard voice data based on the modification data; the relationship between the modification data and the content of the sound quality conversion is determined in advance. Here, "Good morning" is converted into feminized voice data and "OK" into masculinized voice data, and the resulting sound quality conversion data is output.
  • the audio output means 68 converts the sound quality conversion data into an analog signal, and outputs the analog signal from a speaker.
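
The FIG. 21/22 exchange can be sketched end to end as below. The message syntax follows FIG. 22; synthesize() and convert() stand in for the standard voice data generating means 62 and the conversion means 66, which the patent describes only functionally.

    from typing import Callable, List, Tuple

    def encode_message(parts: List[Tuple[str, str]]) -> str:
        """(modification, label) pairs -> '\\female Good morning \\male OK'"""
        return " ".join(f"\\{mod} {label}" for mod, label in parts)

    def receive(message: str,
                synthesize: Callable[[str], bytes],
                convert: Callable[[bytes, str], bytes]) -> List[bytes]:
        """Split the message into (modification, label) pairs, generate
        standard voice data for each label, then convert its quality."""
        out = []
        for chunk in message.split("\\")[1:]:
            mod, _, label = chunk.strip().partition(" ")
            out.append(convert(synthesize(label), mod))
        return out
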
  • In this way, voice is transmitted from the transmitting device 52 to the receiving device 60.
  • Voice can be sent merely by transmitting the label data and the modification data, which have a small data volume.
  • Conventionally, the transmission speed was limited because voice data with a large data volume had to be transmitted.
  • With this system, the transmission speed can be improved significantly.
  • A code may be assigned to each piece of modification data and registered in the receiving device 60 in advance, so that only the code needs to be transmitted (FIG. 23 shows such a code table; a sketch follows).
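
The FIG. 23 table might be held on the receiving side as a simple mapping; the numeric assignments below are illustrative only, since the source does not give the actual codes.

    # Codes registered in the receiving device in advance, so that only
    # the short code needs to travel over the communication path.
    MODIFICATION_CODES = {
        1: "female",    # convert into a feminine voice
        2: "male",      # convert into a masculine voice
        3: "emphasis",  # increase the power
    }
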
  • As in the embodiment described above, the label modified based on the modification data may also be displayed on the receiving side.
  • In the above embodiment, the transmission target is voice, but other sounds such as musical sounds and natural sounds can be transmitted in the same way.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A sound characteristic converter that performs a desired sound characteristic conversion through a simple operation. The waveform of the input sound data is displayed in a sound display section (80). The sound data is divided according to its sound level, and the result of this division is displayed by lines (84a, 84b, 84c, 84d). Based on this display, an operator enters the corresponding label in a label display section (82). Punctuation marks are also entered to establish the correspondence between the sound and its label. Visual modifications such as emphasis and underlining are applied to the label via the icons (90, 92, 94, 96), and the sound characteristic conversion corresponding to the visual modification is performed on the corresponding division of the sound data.
PCT/JP1995/001806 1994-09-12 1995-09-12 Convertisseur de caracteristiques sonores, dispositif d'association son/marque et leur procede de realisation WO1996008813A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP8510060A JP3066452B2 (ja) 1994-09-12 1995-09-12 音特性変換装置、音・ラベル対応付け装置およびこれらの方法
AU34003/95A AU3400395A (en) 1994-09-12 1995-09-12 Sound characteristic convertor, sound/label associating apparatus and method to form them
US08/815,306 US5956685A (en) 1994-09-12 1997-03-11 Sound characteristic converter, sound-label association apparatus and method therefor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP21700694 1994-09-12
JP6/217006 1994-09-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US08/815,306 Continuation US5956685A (en) 1994-09-12 1997-03-11 Sound characteristic converter, sound-label association apparatus and method therefor

Publications (1)

Publication Number Publication Date
WO1996008813A1 (fr)

Family

ID=16697351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1995/001806 WO1996008813A1 (fr) 1994-09-12 1995-09-12 Convertisseur de caracteristiques sonores, dispositif d'association son/marque et leur procede de realisation

Country Status (3)

Country Link
JP (1) JP3066452B2 (fr)
AU (1) AU3400395A (fr)
WO (1) WO1996008813A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3305685A1 (de) * 1982-03-05 1983-09-15 Sensormatic Electronics Corp., 33441 Deerfield Beach, Fla. Mit kodierten erkennungsmarken arbeitendes durchgangsueberwachungssystem
EP0306598A2 (fr) * 1987-09-08 1989-03-15 Clifford Electronics, Inc. Systèmes d'accès avec commande à distance électroniquement programmables
WO1991006926A1 (fr) * 1989-10-31 1991-05-16 Security Dynamics Technologies, Inc. Methode et dispositif destines a assurer la securite de l'identification et de la verification

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000016310A1 (fr) * 1998-09-11 2000-03-23 Hans Kull Procede et dispositif de traitement numerique de la voix
AU769036B2 (en) * 1998-09-11 2004-01-15 Hans Kull Device and method for digital voice processing
WO2002080087A1 (fr) * 2001-03-29 2002-10-10 Kosaido Co., Ltd. Code bidimensionnel, procede de lecture de code bidimensionnel, appareil de conversion de code bidimensionnel en discours, procede de conversion de code bidimensionnel en discours, codeur de code bidimensionnel, procede de codage de code bidimensionnel, procede de conversion de texte en discours, appareil de creation de don

Also Published As

Publication number Publication date
AU3400395A (en) 1996-03-29
JP3066452B2 (ja) 2000-07-17

Similar Documents

Publication Publication Date Title
Harrington Phonetic analysis of speech corpora
Kohler Glottal stops and glottalization in German
EP0831460B1 (fr) Synthèse de la parole utilisant des informations auxiliaires
CN1758330B (zh) 用于通过交互式话音响应系统防止语音理解的方法和设备
Olaszy et al. Profivox—A Hungarian text-to-speech system for telecommunications applications
Zue et al. Transcription and alignment of the TIMIT database
Bellegarda et al. Statistical prosodic modeling: from corpus design to parameter estimation
Wells et al. Cross-lingual transfer of phonological features for low-resource speech synthesis
JP2001265375A (ja) 規則音声合成装置
Hirst The Symbolic Coding of Segmental Duration and Tonal Alignment: an Extension to the INTSINT System.
WO1996008813A1 (fr) Convertisseur de caracteristiques sonores, dispositif d'association son/marque et leur procede de realisation
Kasparaitis Diphone Databases for Lithuanian Text‐to‐Speech Synthesis
Kumar et al. Significance of durational knowledge for speech synthesis system in an Indian language
Soman et al. Corpus driven malayalam text-to-speech synthesis for interactive voice response system
Grover et al. Designing prosodic databases for automatic modelling in 6 languages
Hirst ProZed: a multilingual prosody editor for speech synthesis
JP4026512B2 (ja) 歌唱合成用データ入力プログラムおよび歌唱合成用データ入力装置
Carlson et al. The KTH speech database
JP2000047683A (ja) セグメンテーション補助装置及び媒体
Tanner Prosodic and durational influences on the formant dynamics of Japanese vowels
JP2015060038A (ja) 音声合成装置、言語辞書修正方法及び言語辞書修正用コンピュータプログラム
US20230245644A1 (en) End-to-end modular speech synthesis systems and methods
WO2009092139A1 (fr) Système de translitération et de prononciation
JPH11109988A (ja) 音情報可視化方法及び装置及び音情報可視化プログラムを格納した記憶媒体
Demir Phonotactic Properties of Turkish Folk Music Phonetic Notation System/TFMPNS: Urfa Region Sample

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KR KZ LK LR LT LU LV MD MG MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TT UA UG US UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD SZ UG AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 08815306

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase