WO2005093713A1 - Speech synthesis apparatus (音声合成装置) - Google Patents
Speech synthesis apparatus
- Publication number
- WO2005093713A1 (PCT/JP2005/005815; application JP2005005815W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- speech
- data
- voice
- waveform
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
Definitions
- The present invention relates to a speech synthesizing apparatus, and more particularly to a speech synthesizer provided with a speech database in which speech data of predetermined sentences, recorded in advance, are stored so that they can be retrieved in predetermined speech units, the synthesizer being realized as a built-in microcomputer incorporated in another device to perform corpus-based speech synthesis from that database.
- Conventionally, predetermined words and phrases are recorded in advance as sound sources and combined so that a machine can produce sentences, as in automatic telephone guidance. Another conventional approach, the rule synthesis method, stores sound data approximating a voice waveform for each character in advance and outputs an approximation of natural speech.
- Because the rule synthesis method ignores differences in context and word boundaries and connects the sound data one character at a time by signal processing so that single sounds simply follow one another, the output is a mechanical voice: the sound quality is inevitably degraded, the result lacks the naturalness of human utterance, and it becomes uncomfortable to listen to for long.
- Patent Document 1 Japanese Patent No. 2894447
- Patent Document 2 Japanese Patent No. 2975586
- On the other hand, conventional corpus-based speech synthesis requires large-scale equipment, so it is difficult to incorporate into small products such as welfare-related devices for the hearing impaired, toys, or home appliances, and its introduction has been limited to call centers and other operations of companies with large-scale facilities.
- A first object of the present invention is to reduce the size of an apparatus for performing corpus-based speech synthesis and to provide a speech synthesizer that can be incorporated and installed in another device.
- A second object of the present invention is to provide a speech synthesizer whose speech database, used for corpus-based speech synthesis, holds speech data selectively recorded for each application and can be attached and detached.
- The device of the present invention is a speech synthesizer realized as a built-in microcomputer incorporated in another device. It comprises: a text analysis unit that analyzes an arbitrary sentence in text data and generates phonetic symbol data corresponding to the sentence; a prosody prediction unit that, according to a prosodic knowledge base set in advance for accent and intonation, generates prosody parameters indicating the accent and intonation of the phonetic symbol data produced by the text analysis unit; a speech database that stores only the predetermined speech data selected and recorded in advance so as to contain only the speech units required for the application of the speech synthesizer; a speech unit extraction unit that, for each prosody parameter generated by the prosody prediction unit, extracts the speech unit waveform data of the corresponding predetermined speech unit from the speech data having the closest matching speech unit; and a waveform connection unit that generates synthesized voice data by sequentially connecting the extracted speech unit waveform data so that the speech waveforms they indicate are continuous in text order.
- The above objects of the present invention are achieved by adopting the new characteristic configurations, from the broadest concept to the most specific, listed below.
- The first feature of the device of the present invention is the adoption of a speech synthesizer for performing corpus-based speech synthesis on arbitrary text data, comprising: a speech database that stores a plurality of speech data of predetermined sentences, selected and recorded in advance so as to contain only the speech units required for the application of the speech synthesizer, such that they can be extracted as speech unit waveform data for each predetermined speech unit; a data input unit that acquires text data as serial data; a text analysis unit that generates, as phonetic symbol data, phonograms representing in vowels and consonants the sound corresponding to an arbitrary sentence in the text data; a prosody prediction unit that, according to a prosodic knowledge base set in advance for accent and intonation, generates prosody parameters indicating the accent and intonation corresponding to each item of the phonetic symbol data; a speech unit extraction unit that, for each prosody parameter, extracts from the speech data having the closest matching predetermined speech unit all of the corresponding speech unit waveform data; a waveform connection unit that generates synthesized voice data by sequentially connecting the waveforms so that the speech waveform indicated by the extracted speech unit waveform data group is continuous in written order; and a voice conversion processing unit that converts the synthesized voice data to analog voice and outputs it.
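- The first feature thus describes a pipeline from text input to analog output. As a minimal structural sketch (not the patented implementation; every function name and type below is a hypothetical placeholder), the composition of the stages can be written as follows, with each stage elaborated later in the description.

```python
# Structural sketch of the claimed pipeline (hypothetical names):
# data input -> text analysis -> prosody prediction -> speech unit extraction
# -> waveform connection -> voice conversion (analog output).
from typing import Any, List

def text_analysis(sentence: str) -> List[str]:
    """Generate phonetic symbols (vowels/consonants) for an arbitrary sentence."""
    raise NotImplementedError  # a concrete toy version is sketched later in the text

def prosody_prediction(symbols: List[str]) -> List[Any]:
    """Attach accent/intonation parameters using the prosodic knowledge base."""
    raise NotImplementedError

def unit_extraction(params: List[Any], speech_db: Any) -> List[Any]:
    """Cut the closest-matching speech unit waveform out of the corpus for each parameter."""
    raise NotImplementedError

def waveform_connection(unit_waveforms: List[Any]) -> Any:
    """Connect the unit waveforms so the speech is continuous in text order."""
    raise NotImplementedError

def synthesize(sentence: str, speech_db: Any) -> Any:
    """End-to-end flow of the first feature; the result is handed to the voice
    conversion processing unit (D/A converter) for analog output."""
    return waveform_connection(
        unit_extraction(prosody_prediction(text_analysis(sentence)), speech_db))
```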
- The second feature of the device of the present invention is that, in the first feature, the speech database is constructed on a memory card removable from the speech synthesizer and becomes readable when the memory card is inserted.
- The third feature of the device of the present invention is that, in the first feature, the data input unit is connected to the other device in which the speech synthesizer is incorporated and mounted, and receives serial data from that device.
- The fourth feature of the device of the present invention is that, in the first feature, a speech speed conversion unit is provided between the waveform connection unit and the voice conversion processing unit, which adjusts the reading speed of the synthesized speech data by reflecting on the synthesized speech data generated by the waveform connection unit a speed parameter acquired together with the arbitrary sentence by the data input unit.
- The fifth feature of the device of the present invention is that, in the first feature, the data input unit, the text analysis unit, the prosody prediction unit, the speech database, the speech unit extraction unit, the waveform connection unit, and the voice conversion processing unit are integrally provided in a single case.
- The sixth feature of the device of the present invention is that, in the first feature, the data input unit, the waveform connection unit, and the voice conversion processing unit are incorporated in another device as a built-in microcomputer, while a personal computer at the center, separately installed on the same network, converts text data into speech unit waveform data through the data input unit, the text analysis unit, the prosody prediction unit, and the speech unit extraction unit directly connected to the speech database, and transmits that waveform data to the built-in microcomputer via the network, so that synthesized speech is delivered from the waveform connection unit to the voice conversion processing unit of the built-in microcomputer.
- The seventh feature of the device of the present invention is that, in the first feature, the data input unit is connected to an arbitrary, separately arranged personal computer so that the text data to be analyzed by the text analysis unit can be acquired from that personal computer, and the voice conversion processing unit is connected to an arbitrary, separately arranged speaker so that the synthesized voice data generated by the waveform connection unit is output through that speaker.
- The eighth feature of the device of the present invention is that the predetermined speech unit in the first feature is one or more of a phoneme, a word, a phrase, and a syllable.
- The ninth feature of the device of the present invention is that, in the first feature, the data input unit and the text analysis unit are provided in a personal computer used only at the time of initial setting, with an initial setting function that receives serial data and outputs phonetic symbol data, while the prosody prediction unit, the speech database, the speech unit extraction unit, the waveform connection unit, and the voice conversion processing unit are incorporated in another device as a built-in microcomputer; the personal computer is connected to the built-in microcomputer only at the time of initial setting, the phonetic symbol data output from the personal computer is input to the prosody prediction unit of the built-in microcomputer, and thereafter serial data input to the built-in microcomputer is output as analog speech sequentially through the prosody prediction unit, the speech unit extraction unit directly connected to the speech database, the waveform connection unit, and the voice conversion processing unit.
- The tenth feature of the device of the present invention is that, in the first feature, the data input unit, the waveform connection unit, and the voice conversion processing unit are incorporated as a built-in microcomputer in a terminal used for emergency alert or guidance/communication output, while the text analysis unit, the prosody prediction unit, the speech database, and the speech unit extraction unit form a system capable of one-way transmission to that built-in microcomputer via a network.
- The eleventh feature of the device of the present invention is that, in the first feature, the prosody prediction unit, the speech database, the speech unit extraction unit, the waveform connection unit, and the voice conversion processing unit are separated from the data input unit and the text analysis unit after initial setting and incorporated as a microcomputer in a toy or other device.
- With the present invention, corpus-based speech synthesis, which conventionally could not avoid large-scale equipment, is realized on an embedded microcomputer and can be made significantly smaller than before. Because it can be incorporated into other devices, it can serve as a communication tool that conveys information by voice in welfare-related equipment, in toys such as dolls that speak with a character's voice, in home appliances, and in various other products.
- Since the speech database is constructed on a removable memory card and can be replaced according to the application, the speech synthesizer can remain small while using a database suited to each application; by recording appropriate data, the reading accuracy and accent accuracy of the synthesis are improved, more natural speech can be output, and the output voice quality can be switched to the user's preference.
- Furthermore, whereas transmitting voice has conventionally required a medium- to high-speed line, in the present invention text data is received by the device on the receiving side and converted into voice there, so audio broadcasting over a low-speed line becomes possible. Applied to push-type services, only text data needs to be delivered and is output as audio on the receiving device, saving labor, and prompt service can be provided even in emergencies such as disaster-prevention radio.
- FIG. 1 is a functional configuration diagram of a speech synthesizer according to an embodiment of the present invention.
- FIG. 2 is a functional configuration diagram of the speech synthesizer obtained by adding a function of a speech speed conversion unit to the speech synthesizer described above.
- FIG. 3 is a schematic diagram showing an example of a hardware configuration of the above-described speech synthesizer.
- FIG. 4 is a diagram for explaining the data configuration of the above speech synthesizer: FIG. 4(a) shows text data, FIG. 4(b) shows phonetic symbol data, FIG. 4(c) illustrates the prosody knowledge base, FIG. 4(d) illustrates the prosody parameters, and FIG. 4(e) illustrates the speech database.
- FIG. 5 is a functional configuration diagram of a speech synthesis device according to a functional configuration example 2 of the present invention.
- FIG. 6 is a functional configuration diagram of a speech synthesis device according to a third functional configuration example of the present invention.
- FIG. 7 is a schematic diagram showing an example of a hardware configuration in which a speech synthesizer according to an embodiment of the present invention is mounted on a personal computer.
- FIG. 1 is a functional configuration diagram of a speech synthesizer according to an embodiment of the present invention.
- The speech synthesizer α stores speech data of predetermined sentences recorded in advance so that they can be extracted in predetermined speech units such as phonemes, words, phrases, and syllables, and performs corpus-based speech synthesis as a built-in microcomputer incorporated in another device.
- The microcomputer need not contain all of the above functional units; depending on its scale, it may be provided with a selected set of the functional units, with the remaining units executed on a personal computer.
- The speech database 1 is a corpus for performing corpus-based speech synthesis. It stores a plurality of predetermined speech data selected and recorded in advance so as to contain only the speech units required for the application of the speech synthesizer α, and is divided and constructed according to that application.
- The text analysis unit 2 is configured to analyze an arbitrary sentence in the input text data and generate phonetic symbol data corresponding to the sentence. The prosody prediction unit 3 internally holds a prosodic knowledge base 3A, preset with recognition rules for the accent and intonation of phonetic symbol data, and is configured to generate, according to the prosodic knowledge base 3A, prosody parameters indicating the accent and intonation corresponding to each item of the phonetic symbol data generated by the text analysis unit 2.
- The speech unit extraction unit 4 is configured to extract from the speech database 1, using an evaluation function approximating human auditory characteristics, the speech data whose accent and intonation are closest to each of the prosody parameters generated by the prosody prediction unit 3, and then to cut out from each extracted speech data only the speech unit waveform data of the predetermined speech unit, such as a phoneme, corresponding to that prosody parameter.
- The waveform connection unit 5 is configured to generate synthesized speech data with natural prosody by successively connecting the plurality of speech unit waveform data extracted by the speech unit extraction unit 4 so that the speech waveform they represent is smooth and continuous in sentence order.
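- The patent requires only that the connected waveform be smooth and continuous; it does not fix a joining algorithm. One common way to achieve a smooth joint is a short linear cross-fade at each unit boundary, sketched below with NumPy purely as an assumed realization.

```python
import numpy as np

def connect_units(units, fade_len=64):
    """Concatenate speech unit waveforms, cross-fading `fade_len` samples at each
    joint so there is no audible discontinuity. This is one possible method; the
    patent only requires that the waveform be continuous in text order."""
    out = units[0].astype(np.float32)
    ramp = np.linspace(0.0, 1.0, fade_len, dtype=np.float32)
    for u in units[1:]:
        u = u.astype(np.float32)
        overlap = out[-fade_len:] * (1.0 - ramp) + u[:fade_len] * ramp
        out = np.concatenate([out[:-fade_len], overlap, u[fade_len:]])
    return out
```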
- The built-in microcomputer of the speech synthesizer α may further include a data input unit 6 that is connected to the other device in which the synthesizer is mounted and receives serial data, for example from input means such as a keyboard and mouse of that device, from data transmitted and received via a network, or from a recording medium, acquiring the serial data and passing the text data to the text analysis unit 2.
- With this, the speech synthesizer α can synthesize not only preset text data but also any sentence entered by, for example, a user of the speech synthesizer α; it can respond to arbitrary text input and secure real-time responsiveness, receiving a desired sentence as needed and immediately outputting it as synthesized speech.
- A voice conversion processing unit 7 may also be provided that converts the synthesized voice data generated by the waveform connection unit 5 into an analog signal and outputs it as speech to a separately connected speaker or the like. Alternatively, the speech synthesizer α may be configured to acquire text data and output synthesized speech without the data input unit 6 and the voice conversion processing unit 7 being mounted in the speech synthesizer α itself.
- FIG. 2 is a block diagram of the speech synthesizer α of FIG. 1 with an added function for adjusting the reading speed of the synthesized speech. In this speech synthesizer α1, a speech speed conversion unit 8, which adjusts the reading speed by reflecting on the synthesized voice data generated by the waveform connection unit 5 a speed parameter input together with the text data from the other equipment in which the speech synthesizer α1 is mounted, may be provided on the microcomputer.
- FIG. 3 is a schematic diagram showing a hardware configuration example of the speech synthesizer α of the present embodiment.
- The speech synthesizer α includes a CPU (Central Processing Unit) 11 that sequentially controls each functional unit of the speech synthesizer α, together with a ROM (Read Only Memory) 12 and a RAM (Random Access Memory) 13 accessible from the CPU 11. The ROM 12 stores a real-time OS (Operating System) and the processing programs, executed by the CPU 11, that implement the text analysis unit 2, the prosody prediction unit 3, the speech unit extraction unit 4, and the waveform connection unit 5.
- The speech synthesizer α also uses a memory card 14, constituted by a flash memory or the like, that can be detachably mounted on α. By constructing the speech database 1 on the memory card 14, the database can be replaced to suit the device or application in which the speech synthesizer α is incorporated, and the speech unit extraction unit 4 is configured to operate on the speech database 1 in the inserted memory card 14.
- In addition, a serial interface 15 functioning as the data input unit 6 and a D/A (Digital to Analog) converter 16 functioning as the voice conversion processing unit 7 may be mounted.
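- As an illustration of how the serial interface 15 (data input unit 6) and the D/A converter 16 (voice conversion processing unit 7) bracket the pipeline, the sketch below polls a serial port for text and hands the synthesized samples to an output stub. The port name, baud rate, and the use of the third-party pyserial package are assumptions made for the example, not part of the patent.

```python
import serial  # third-party "pyserial" package; assumed here, not specified by the patent

def dac_write(samples):
    """Stand-in for the D/A converter 16; a real build would stream samples to hardware."""
    print(f"output {len(samples)} samples")

def run(synthesize, speech_db, port="/dev/ttyS0", baud=9600):
    """Poll the serial interface (data input unit 6), synthesize each received line
    of text, and pass the result to the D/A stage (voice conversion processing unit 7)."""
    with serial.Serial(port, baud, timeout=1.0) as link:
        while True:
            line = link.readline().decode("utf-8", errors="ignore").strip()
            if line:
                dac_write(synthesize(line, speech_db))
```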
- FIG. 4 is a diagram for explaining the data configuration of the speech synthesizer α of the present embodiment: FIG. 4(a) schematically shows text data, FIG. 4(b) phonetic symbol data, FIG. 4(c) the prosody knowledge base, FIG. 4(d) the prosody parameters, and FIG. 4(e) the speech database.
- The text data input to the text analysis unit 2 is, for example, an arbitrary sentence such as "cross the bridge" (hashi wo wataru) contained in the serial data acquired by the data input unit 6. The text data may mix kana and kanji; as long as it can be converted into voice, the characters used in the text data are not limited.
- The text data is not limited to a text-format data file. It may be extracted from an HTML (HyperText Markup Language) data file by removing the HTML tags, or it may be text generated directly by the user from an Internet homepage, e-mail, or input means such as a keyboard and mouse.
- The phonetic symbol data generated by the text analysis unit 2 uses, for example, phonetic symbols that represent the sound of the text data in vowels and consonants; the phonetic symbol data generated from the text data shown in FIG. 4(a) is, for example, "ha shi wo wa ta ru".
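- The patent leaves the analysis algorithm open; as a toy illustration of the text analysis step for this one example sentence, a small reading dictionary (contents invented for the example) maps the characters of "hashi wo wataru" to the phonetic symbols shown in FIG. 4(b).

```python
# Toy reading dictionary covering only the example sentence (hypothetical data).
READINGS = {"橋": ["ha", "shi"], "を": ["wo"], "渡": ["wa", "ta"], "る": ["ru"]}

def text_analysis(text):
    """Expand each character into phonetic symbols. Real text analysis would also
    resolve kanji readings from context, which this sketch does not attempt."""
    symbols = []
    for ch in text:
        symbols.extend(READINGS.get(ch, []))
    return symbols

print(text_analysis("橋を渡る"))  # ['ha', 'shi', 'wo', 'wa', 'ta', 'ru']
```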
- The prosodic knowledge base 3A determines the accent and intonation of phonetic symbol data. For example, for the "ha shi" in the phonetic symbol data shown in FIG. 4(b), whether it means "bridge" or "chopsticks" is determined from the context, and the knowledge base provides an algorithm that can decide the accent and intonation of such phonetic symbol data.
- Based on the prosodic knowledge base 3A, the prosody prediction unit 3 generates, for each predetermined speech unit of "ha shi" in the phonetic symbol data corresponding to "bridge", a prosody parameter carrying the appropriate accent; according to the prosodic knowledge base 3A, the accent, intonation, pauses, speed, and the like can be determined for all of the phonetic symbol data.
- Although accents and intonation are schematically illustrated in the figures by underlines or overlines superimposed on the phonograms, any form of description may be used as long as the information the speech synthesizer α needs is recorded in an identifiable manner.
- The prosody parameters generated according to the prosodic knowledge base 3A, shown in FIG. 4(d), express as parameters the accent, intonation, and pauses appropriate to the context of the text data; the break between "wo" and "wa" shown in the figure indicates a predetermined interval between the phonetic symbols.
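- To illustrate how a context rule in the prosodic knowledge base can fix the accent of an ambiguous word such as "hashi" and insert the pause between "wo" and "wa", here is a minimal rule-table sketch; the accent labels, the single rule, and the pause heuristic are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ProsodyParameter:
    phoneme: str
    accent: str            # "H" (high) / "L" (low); labels invented for this sketch
    pause_after: bool = False

# One invented knowledge-base rule: "ha shi" followed by the object particle "wo"
# (as in "cross the bridge") takes the low-high accent pattern of "bridge".
KNOWLEDGE_BASE = {("ha", "shi", "wo"): ["L", "H"]}

def predict_prosody(symbols):
    """Assign an accent label to every phonetic symbol, overriding the default
    wherever a context rule matches, and mark the pause after the particle 'wo'."""
    accents = ["L"] * len(symbols)
    for i in range(len(symbols)):
        for context, pattern in KNOWLEDGE_BASE.items():
            if tuple(symbols[i:i + len(context)]) == context:
                accents[i:i + len(pattern)] = pattern
    return [ProsodyParameter(s, a, pause_after=(s == "wo"))
            for s, a in zip(symbols, accents)]
```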
- In the speech database 1 accessed by the speech unit extraction unit 4, speech data recorded with accents and intonation conforming to a prosodic knowledge base are stored so as to be extractable as speech unit waveform data for each predetermined speech unit such as a phoneme.
- When the speech unit extraction unit 4 receives prosody parameters such as those shown in FIG. 4(d) from the prosody prediction unit 3, it searches the speech database 1 for the speech data whose accent and intonation are closest to those indicated by the prosody parameters for each of the phonetic symbols "ha", "shi", "wo", "wa", "ta", and "ru".
- The speech unit extraction unit 4 then cuts out from the retrieved speech data, such as recorded sentences like "spring has come", only the speech unit waveform data of "ha", "shi", "wo", "wa", "ta", and "ru" that match the prosody parameters, so that the waveform connection unit 5 can smoothly connect the speech unit waveform data into synthesized speech data.
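- The patent states only that the candidate whose accent and intonation are closest to each prosody parameter is chosen with an evaluation function approximating human auditory characteristics; it does not give the function. The sketch below substitutes a trivial mismatch-count cost purely to illustrate the selection-and-cut step; the candidate fields and the cost are invented.

```python
def select_unit(param, candidates):
    """Return the waveform of the database unit closest to the requested prosody.
    `candidates` are dicts such as {"phoneme": "ha", "accent": "L",
    "pause_after": False, "waveform": ...} cut from the recorded corpus; the cost
    below is an invented stand-in for the patent's unspecified evaluation function."""
    def cost(unit):
        return (unit["accent"] != param.accent) + (unit["pause_after"] != param.pause_after)
    matching = [u for u in candidates if u["phoneme"] == param.phoneme]
    return min(matching, key=cost)["waveform"]

def unit_extraction(params, speech_db):
    """Pick one unit waveform per prosody parameter from the database
    (represented in this sketch as a flat list of candidate dicts)."""
    return [select_unit(p, speech_db) for p in params]
```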
- In this example a phoneme is employed as the predetermined speech unit. When the input text data contains a word or phrase already stored in the speech database 1, however, the speech unit extraction unit 4 can extract that word or phrase without dividing it, and by outputting such words or phrases as they are, or in combination, even more natural speech can be synthesized.
- Functional configuration example 1 is the speech synthesizer α in which all of the functional units 1 to 7 shown in the functional configuration diagram of FIG. 1 are integrally provided in a single case. It can execute speech synthesis on its own, without distributing functions to other equipment or devices, running the series of functional units 1 to 7 from serial data input to analog output within the one case. The configuration is not limited as long as all of the functional units can be executed in the single case; for example, a speaker for the voice conversion processing unit 7 and an input device for the data input unit 6 (not shown) may also be incorporated and mounted.
- Functional configuration example 2 is the speech synthesizer α1 in which a speech speed conversion unit 8, a function for adjusting the reading speed of the synthesized speech, is added to the speech synthesizer α of configuration example 1, with all of the functional units 1 to 8 described with reference to FIG. 2 integrated in a single case.
- The speech speed conversion unit 8 adjusts the speed of the synthesized speech by reflecting the speed parameter in the synthesized speech data. The speed parameter is input to the data input unit as serial data together with the text data, is passed from the data input unit 6 to the waveform connection unit 5 while remaining attached to each item of converted data and parameters, and is first acted upon by the speech speed conversion unit 8, which applies the value of the speed parameter to the synthesized speech data received together with it from the waveform connection unit 5 and changes the reading speed of the synthesized speech.
- The purpose of configuration example 2 is to change the speed according to the usage situation and convey the synthesized speech to the user accurately; for example, setting the reading speed slower makes the speech easier to hear, which is effective in situations where calm judgment tends to be lacking, such as emergencies.
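- The patent does not specify how the speed parameter is applied to the synthesized waveform. The simplest assumed realization is to resample the sample index by the speed factor, as sketched below with NumPy; note that plain resampling also shifts pitch, whereas a production speech-rate converter would preserve it.

```python
import numpy as np

def change_speed(samples, speed=1.0):
    """Stretch or compress synthesized speech by `speed`; e.g. speed=0.8 gives a
    slower, easier-to-hear reading for emergency announcements. Plain index
    resampling is used here as an assumed, pitch-shifting approximation."""
    positions = np.arange(0, len(samples), speed)
    return np.interp(positions, np.arange(len(samples)),
                     np.asarray(samples, dtype=np.float32))
```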
- FIG. 5 is a functional block diagram showing an example of a speech synthesis system in which the waveform connection unit 5 and the voice conversion processing unit 7 of the speech synthesizer α shown in FIG. 1 are extracted and mounted on a built-in microcomputer α2, with the other functional units installed on separate equipment, the two together performing a series of speech synthesis. This speech synthesis system is intended for an output terminal that converts text data input at the time of a disaster such as a fire or an earthquake into synthesized speech and issues an emergency alert. It is used by connecting over a network the built-in microcomputer α2, which has the waveform connection unit 5 and the voice conversion processing unit 7, and a machine such as a personal computer, which has the speech database 1 and the other functional units shown in FIG. 1, from the data input unit 6 to the speech unit extraction unit 4.
- The built-in microcomputer α2 may be connected to the network on its own or may be used incorporated into another device.
- Typical network connections include Internet lines and telephone lines, which can easily be used even at home or in small-scale facilities, but the connection means is not limited as long as data can be communicated with the separately installed equipment, for example over a dedicated line.
- The output is not limited to emergency alerts and may also be used for guidance and communication. The speech speed conversion unit 8 shown in configuration example 2 can also be used in this configuration example to change the reading speed as required.
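- To make the division of labor in FIG. 5 concrete: the center side runs text analysis through unit extraction and pushes the resulting unit waveform data one way over the network, while the terminal's built-in microcomputer only connects the waveforms and plays them. The bare-bones TCP sketch below shows that split; the host name, port number, and pickle framing are invented, and a real disaster-broadcast system would add authentication and error handling.

```python
import pickle
import socket

def center_send(unit_waveforms, host="alert-terminal.local", port=5815):
    """Center side: transmit the extracted unit waveform data to the terminal
    (host, port, and pickle framing are invented for this sketch)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(pickle.dumps(unit_waveforms))

def terminal_serve(connect_and_play, port=5815):
    """Terminal side (built-in microcomputer): receive the unit waveforms,
    connect them, and output the synthesized alert as audio."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            data = b"".join(iter(lambda: conn.recv(4096), b""))
    connect_and_play(pickle.loads(data))
```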
- FIG. 6 is a functional configuration diagram of a built-in microcomputer α3 that incorporates part of the functional units of the speech synthesizer α shown in FIG. 1, namely units 1, 3 to 5, and 7. The built-in microcomputer α3 is configured so that it can acquire phonetic symbol data output from an arbitrary personal computer incorporating the data input unit 6 and the text analysis unit 2, and it incorporates the speech database 1 and the series of functional units from the prosody prediction unit 3 to the voice conversion processing unit 7 that output synthesized speech. The personal computer is disconnected after the initial setting.
- The built-in microcomputer α3 is intended to be mounted in other devices, particularly small devices such as toys; examples of target devices include toys, mobile phones, and welfare-related devices such as hearing aids. The targets are not limited to such small devices and also include devices whose output synthesized speech is limited in content, such as vending machines, car navigation systems, and unmanned reception facilities; simply adding the built-in microcomputer α3 gives these devices a speech synthesis function.
- FIG. 7 is a schematic diagram showing an example of a hardware configuration in which the speech synthesizer α of the present embodiment is mounted on a personal computer β as the other device. The speech synthesizer α is mounted on and connected to the separately arranged personal computer β: the data input unit 6 receives serial data from, for example, the input means 21 of the personal computer β, while the synthesized voice data generated by the speech synthesizer α from that serial data is output from the voice conversion processing unit 7 as an analog signal to the speaker 22 built into the personal computer β, so that sound is output from the speaker 22. The memory card 14 recording the speech database 1 may be fixedly and exclusively mounted in the speech synthesizer α in advance, or it may be freely replaced with another memory card 14 by the user via the personal computer.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/592,071 US20070203703A1 (en) | 2004-03-29 | 2005-03-29 | Speech Synthesizing Apparatus |
JP2006511572A JP4884212B2 (ja) | 2004-03-29 | 2005-03-29 | 音声合成装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-094071 | 2004-03-29 | ||
JP2004094071 | 2004-03-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005093713A1 true WO2005093713A1 (ja) | 2005-10-06 |
Family
ID=35056415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/005815 WO2005093713A1 (ja) | 2004-03-29 | 2005-03-29 | 音声合成装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20070203703A1 (ja) |
JP (1) | JP4884212B2 (ja) |
WO (1) | WO2005093713A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007240988A (ja) * | 2006-03-09 | 2007-09-20 | Kenwood Corp | 音声合成装置、データベース、音声合成方法及びプログラム |
JP2007240987A (ja) * | 2006-03-09 | 2007-09-20 | Kenwood Corp | 音声合成装置、音声合成方法及びプログラム |
JP2007240989A (ja) * | 2006-03-09 | 2007-09-20 | Kenwood Corp | 音声合成装置、音声合成方法及びプログラム |
JP2007240990A (ja) * | 2006-03-09 | 2007-09-20 | Kenwood Corp | 音声合成装置、音声合成方法及びプログラム |
JP2015172658A (ja) * | 2014-03-12 | 2015-10-01 | 東京テレメッセージ株式会社 | 地域に設置された複数の屋外拡声器により音声メッセージを同報するシステムにおける聴き取りやすさの改善 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070203705A1 (en) * | 2005-12-30 | 2007-08-30 | Inci Ozkaragoz | Database storing syllables and sound units for use in text to speech synthesis system |
US8510113B1 (en) | 2006-08-31 | 2013-08-13 | At&T Intellectual Property Ii, L.P. | Method and system for enhancing a speech database |
EP2188729A1 (en) * | 2007-08-08 | 2010-05-26 | Lessac Technologies, Inc. | System-effected text annotation for expressive prosody in speech synthesis and recognition |
RU2421827C2 (ru) * | 2009-08-07 | 2011-06-20 | Общество с ограниченной ответственностью "Центр речевых технологий" | Способ синтеза речи |
TWI413105B (zh) * | 2010-12-30 | 2013-10-21 | Ind Tech Res Inst | 多語言之文字轉語音合成系統與方法 |
US10469623B2 (en) * | 2012-01-26 | 2019-11-05 | ZOOM International a.s. | Phrase labeling within spoken audio recordings |
US10192541B2 (en) * | 2014-06-05 | 2019-01-29 | Nuance Communications, Inc. | Systems and methods for generating speech of multiple styles from text |
JP6695069B2 (ja) * | 2016-05-31 | 2020-05-20 | パナソニックIpマネジメント株式会社 | 電話装置 |
CN110782871B (zh) | 2019-10-30 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | 一种韵律停顿预测方法、装置以及电子设备 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11143483A (ja) * | 1997-08-15 | 1999-05-28 | Hiroshi Kurita | 音声発生システム |
JP2000231395A (ja) * | 1999-02-08 | 2000-08-22 | Nippon Telegr & Teleph Corp <Ntt> | 音声合成方法及び装置 |
JP2001296878A (ja) * | 2000-04-14 | 2001-10-26 | Fujitsu Ltd | 音声合成用辞書作成装置及び方法 |
JP2003036089A (ja) * | 2001-07-24 | 2003-02-07 | Matsushita Electric Ind Co Ltd | テキスト音声合成方法とテキスト音声合成装置 |
JP2003114692A (ja) * | 2001-10-05 | 2003-04-18 | Toyota Motor Corp | 音源データの提供システム、端末、玩具、提供方法、プログラム、および媒体 |
JP2003186489A (ja) * | 2001-12-14 | 2003-07-04 | Omron Corp | 音声情報データベース作成システム,録音原稿作成装置および方法,録音管理装置および方法,ならびにラベリング装置および方法 |
JP2003271200A (ja) * | 2002-03-18 | 2003-09-25 | Matsushita Electric Ind Co Ltd | 音声合成方法および音声合成装置 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1138989A (ja) * | 1997-07-14 | 1999-02-12 | Toshiba Corp | 音声合成装置及び方法 |
JP3450237B2 (ja) * | 1999-10-06 | 2003-09-22 | 株式会社アルカディア | 音声合成装置および方法 |
JP3728172B2 (ja) * | 2000-03-31 | 2005-12-21 | キヤノン株式会社 | 音声合成方法および装置 |
US6865533B2 (en) * | 2000-04-21 | 2005-03-08 | Lessac Technology Inc. | Text to speech |
JP2002328694A (ja) * | 2001-03-02 | 2002-11-15 | Matsushita Electric Ind Co Ltd | 携帯端末装置及び読み上げシステム |
US20020156630A1 (en) * | 2001-03-02 | 2002-10-24 | Kazunori Hayashi | Reading system and information terminal |
JP2003223181A (ja) * | 2002-01-29 | 2003-08-08 | Yamaha Corp | 文字−音声変換装置およびそれを用いた携帯端末装置 |
-
2005
- 2005-03-29 WO PCT/JP2005/005815 patent/WO2005093713A1/ja active Application Filing
- 2005-03-29 JP JP2006511572A patent/JP4884212B2/ja not_active Expired - Fee Related
- 2005-03-29 US US10/592,071 patent/US20070203703A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
JPWO2005093713A1 (ja) | 2008-07-31 |
JP4884212B2 (ja) | 2012-02-29 |
US20070203703A1 (en) | 2007-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2005093713A1 (ja) | 音声合成装置 | |
US7483832B2 (en) | Method and system for customizing voice translation of text to speech | |
US5774854A (en) | Text to speech system | |
Prahallad et al. | The IIIT-H Indic speech databases. | |
Eide et al. | A corpus-based approach to< ahem/> expressive speech synthesis | |
EP2704092A2 (en) | System for creating musical content using a client terminal | |
JP3270356B2 (ja) | 発話文書作成装置,発話文書作成方法および発話文書作成手順をコンピュータに実行させるプログラムを格納したコンピュータ読み取り可能な記録媒体 | |
AU769036B2 (en) | Device and method for digital voice processing | |
Burkhardt et al. | Emotional speech synthesis: Applications, history and possible future | |
JP4409279B2 (ja) | 音声合成装置及び音声合成プログラム | |
JPH08335096A (ja) | テキスト音声合成装置 | |
Henton | Challenges and rewards in using parametric or concatenative speech synthesis | |
JP3578961B2 (ja) | 音声合成方法及び装置 | |
JP2003029774A (ja) | 音声波形辞書配信システム、音声波形辞書作成装置、及び音声合成端末装置 | |
JP2894447B2 (ja) | 複合音声単位を用いた音声合成装置 | |
KR0134707B1 (ko) | 다이폰 단위를 이용한 엘에스피(lsp)방식의 음성 합성 방법 | |
JP4056647B2 (ja) | 波形接続型音声合成装置および方法 | |
Ojala | Auditory quality evaluation of present Finnish text-to-speech systems | |
Hande | A review on speech synthesis an artificial voice production | |
Khudoyberdiev | The Algorithms of Tajik Speech Synthesis by Syllable | |
KR100269215B1 (ko) | 음성 합성을 위한 발화구의 기본 주파수 궤적 생성 방법 | |
KR20230099934A (ko) | 복수의 화자음성을 이용한 음성 변환 장치 및 그 방법 | |
JPH03214197A (ja) | 音声合成装置 | |
JP3192981B2 (ja) | テキスト音声合成装置 | |
Venkatagiri | Digital speech technology: An overview |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 10592071 Country of ref document: US Ref document number: 2007203703 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2006511572 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: DE |
|
122 | Ep: pct application non-entry in european phase | ||
WWP | Wipo information: published in national office |
Ref document number: 10592071 Country of ref document: US |