WO2007141993A1 - Speech Synthesis Device (音声合成装置) - Google Patents
Speech Synthesis Device
- Publication number
- WO2007141993A1 (PCT/JP2007/059765, JP2007059765W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- unit
- synthesized
- small
- speech unit
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
Description
- the present invention relates to an audio content editing and generation method based on a speech synthesis technique.
- Editing of audio content here means customizing the synthesized sound to the editor's own preferences: for example, adding expressive inflections such as a "high school girl" style or Kansai dialect, changing the prosody and voice quality so that the intended nuance is conveyed, or emphasizing the ending of an utterance. Such customization is rarely completed in a single pass; the user creates the desired content by repeatedly editing and auditioning.
- It is therefore desirable that a synthesized sound being edited can be auditioned easily.
- FIG. 1 is a block diagram showing the structure of the conventional speech synthesis apparatus described in Patent Document 1.
- The conventional speech synthesizer receives as input a synthesis instruction 002, obtained by analyzing the text to be synthesized, selects and connects appropriate speech units from those held in a speech unit DB (database) 001, and outputs a synthesized speech waveform 019.
- the speech synthesizer includes a multistage preliminary selection unit 003, a segment selection unit 004, and a connection unit 005.
- The multistage preselection unit 003 receives the synthesis instruction 002, performs multistage preselection of the speech units designated by it, as described later, and selects a preselection candidate group 018.
- The segment selection unit 004 receives the synthesis instruction 002 and selects, from the preselection candidate group 018, the unit with the lowest cost calculated from all the sub-costs.
- Connection unit 005 connects the speech units selected by unit selection unit 004 and outputs synthesized speech waveform 019.
- The preselection candidate group 018 is used only for selecting units, and therefore includes only the feature amounts necessary for cost calculation; it does not include the speech unit data itself.
- the connection unit 005 can obtain the speech unit data of the unit selected by the unit selection unit 004 with reference to the speech unit DB001.
- The sub-costs used in the conventional speech synthesizer comprise six terms: fundamental frequency (F0) error, duration error, MFCC (Mel Frequency Cepstrum Coefficient) error, F0 discontinuity error, MFCC discontinuity error, and phoneme environment error.
- the former three belong to the target cost and the latter three belong to the connection cost.
- The total cost is calculated from these sub-costs.
- the multistage preliminary selection unit 003 includes four preliminary selection units 006, 009, 012 and 015.
- The first preselection unit 006 receives the synthesis instruction 002, performs preselection from the speech unit candidates in the speech unit DB 001 based on the F0 error and the duration error at each time, and outputs a first candidate group 007.
- The second preselection unit 009 performs preselection from the units in the first candidate group 007 based on the F0 error, the duration error, and the MFCC error at each time, and outputs a second candidate group.
- the third preliminary selection unit 012 and the fourth preliminary selection unit 015 similarly perform preliminary selection using a part of the sub-cost.
- Patent Document 1 Japanese Patent Laid-Open No. 2005-265895 (Fig. 1)
- The aim of the present invention is the creation of audio content, which requires means for editing synthesized sound.
- When the synthesized sound, that is, the audio content, is edited using the technique of Patent Document 1, the following problems arise.
- the speech synthesizer described in Patent Document 1 can reduce the total calculation cost by introducing a preliminary selection unit when selecting a segment.
- the first preliminary selection unit 006 needs to perform preliminary selection from all speech segments.
- the connection unit 005 must finally select the optimum speech unit from the speech unit DB001 every time.
- Moreover, the speech unit DB 001 must store a large number of speech units; the total playback time is usually several hours to several hundred hours, making it a very large database.
- the present invention solves the above-described conventional problems, and an object of the present invention is to provide a speech synthesizer capable of executing speech content editing at high speed and easily creating speech content.
- A speech synthesizer according to the present invention generates a synthesized sound that conforms to phonetic symbols and prosodic information, and comprises: a small-scale database that holds synthesized-sound generation data used to generate the synthesized sound; a large-scale database that holds a larger number of speech units than the synthesized-sound generation data held in the small-scale database; synthesized-sound generation data selection means for selecting, from the small-scale database, synthesized-sound generation data whose generated synthesized sound matches the phonetic symbols and prosodic information; adaptive speech unit selection means for selecting, from the large-scale database, speech units corresponding to the synthesized-sound generation data selected by the synthesized-sound generation data selection means; and speech unit connection means for generating a synthesized sound by connecting the speech units selected by the adaptive speech unit selection means.
- the synthetic sound generation data selection means selects synthetic sound generation data from the small-scale database.
- the adaptive speech segment selection means selects a high-quality speech segment corresponding to the selected synthesized speech generation data from the large-scale database. In this way, by selecting speech units in two stages, it is possible to select speech units with high sound quality at high speed.
- The large-scale database may be provided in a server connected to the speech synthesizer via a computer network, and the adaptive speech unit selection means may select the speech units from the large-scale database provided in the server.
- The speech synthesizer may further include small-scale speech unit connection means for generating a simple synthesized sound by connecting the speech units selected by the synthesized-sound generation data selection means, and prosody information correction means for receiving information for correcting the prosodic information of the simple synthesized sound and correcting the prosodic information based on that information.
- When the prosodic information of the simple synthesized sound is corrected, the synthesized-sound generation data selection means reselects, from the small-scale database, synthesized-sound generation data whose generated synthesized sound matches the phonetic symbols and the corrected prosodic information, and outputs the reselected data to the small-scale speech unit connection means.
- The adaptive speech unit selection means may receive the synthesized-sound generation data determined by the correction and reselection, and select the corresponding speech units from among those in the large-scale database.
- While repeating the correction of prosodic information and the reselection of synthesized-sound generation data, the user arrives at the desired synthesized-sound generation data. The selection of speech units from the large-scale database then needs to be performed only once, at the end, so a high-quality synthesized sound can be created efficiently.
- The present invention can be realized not only as a speech synthesizer having such characteristic means, but also as a speech synthesis method having those characteristic means as steps, or as a program that causes a computer to execute the characteristic steps included in the speech synthesis method. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or via a communication network such as the Internet.
- a synthetic sound can be created using only a terminal and a small-scale database.
- The prosody modification means allows the user to edit the synthesized sound.
- audio content can be edited even on terminals with relatively small resources such as mobile terminals.
- the synthesized sound can be created on the terminal side using a small database, the edited synthesized sound can be played back only on the terminal, and the user can audition the reproduced synthesized sound.
- the user can perform high-quality sound processing using a large-scale database stored in the server.
- Furthermore, the already determined small speech unit sequence is associated with candidates in the large-scale database, so unit selection by the large-scale speech unit selector only has to search a limited space compared with selecting units from scratch, and the amount of calculation can be reduced.
- As an example of scale, a large speech unit DB may be several GB or more, while a small speech unit DB may be about 0.5 MB.
- Moreover, the communication between the terminal and the server for obtaining the units stored in the large-scale database needs to be performed only once, when the sound quality enhancement processing is executed, so the time loss associated with communication can be reduced. That is, by separating the editing work from the sound quality enhancement processing, the responsiveness required for audio content editing can be improved.
- FIG. 1 is a configuration diagram of a conventional multi-stage unit selection type speech synthesizer.
- FIG. 2 is a configuration diagram of a multi-quality speech synthesizer according to Embodiment 1 of the present invention.
- FIG. 3 shows an example of a corresponding DB according to the first embodiment of the present invention.
- FIG. 4 is a conceptual diagram when the multi-quality speech synthesizer according to Embodiment 1 of the present invention is implemented as a system.
- FIG. 5 is a flowchart showing the operation of the multi-quality speech synthesizer in Embodiment 1 of the present invention.
- FIG. 6 is a diagram showing an operation example of high-quality sound processing according to Embodiment 1 of the present invention.
- FIG. 7 is a conceptual diagram when hierarchical clustering is performed on the speech unit group held in the large speech unit DB.
- FIG. 8 is a flowchart of multi-quality speech synthesis processing in Modification 1 of Embodiment 1 of the present invention.
- FIG. 9 is a flowchart of multi-quality speech synthesis processing in Modification 2 of Embodiment 1 of the present invention.
- FIG. 10 is a flowchart of multi-quality speech synthesis processing in Modification 3 of Embodiment 1 of the present invention.
- FIG. 11 is a flowchart of multi-quality speech synthesis processing in Modification 4 of Embodiment 1 of the present invention.
- FIG. 12 is a configuration diagram of a text-to-speech synthesizer using an HMM speech synthesis method, which is one of speech synthesis methods based on a statistical model.
- FIG. 13 is a configuration diagram of a multi-quality speech synthesizer according to Embodiment 2 of the present invention.
- FIG. 14 is a flowchart showing an operation of the multi-quality speech synthesizer in Embodiment 2 of the present invention.
- FIG. 15 is a diagram showing an operation example of high-quality sound processing according to Embodiment 2 of the present invention.
- FIG. 16 is a conceptual diagram when context clustering is performed on a speech unit group held in a large speech unit DB.
- FIG. 17 is an example of a corresponding DB according to the second embodiment of the present invention.
- FIG. 18 is a diagram showing an operation example in the case where a plurality of HMMs are assigned to each unit in the sound quality enhancement processing according to Embodiment 2 of the present invention.
- FIG. 19 is a block diagram showing a configuration of a multi-quality speech synthesis system according to Embodiment 3 of the present invention.
- FIG. 20 is a flowchart showing a flow of processing by the multi-quality speech synthesis system according to Embodiment 3.
- FIG. 21 is a flowchart showing a flow of processing by the multi-quality speech synthesis system according to Embodiment 3.
- Embodiment 1 of the present invention is characterized in that audio content editing is performed efficiently by hierarchizing the speech unit DB into a small speech unit DB and a large speech unit DB.
- FIG. 2 is a configuration diagram of the multi-quality speech synthesizer according to Embodiment 1 of the present invention.
- The multi-quality speech synthesizer is a device that synthesizes speech of multiple qualities, and includes a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, a prosody modification unit 104, a large speech unit DB 105, a corresponding DB 106, a speech unit candidate acquisition unit 107, a large speech unit selection unit 108, and a large speech unit connection unit 109.
- the small speech unit DB101 is a database that holds small speech units.
- A speech unit stored in the small speech unit DB 101 is referred to in this specification as a "small speech unit".
- The small speech unit selection unit 102 is a processing unit that receives as input the phoneme information and prosodic information that are the targets of the synthesized sound to be created, and selects an optimal speech unit sequence from the units held in the small speech unit DB 101.
- The small speech unit connection unit 103 is a processing unit that generates a synthesized sound by connecting the speech unit sequence selected by the small speech unit selection unit 102.
- The prosody modification unit 104 is a processing unit that accepts user input of information for modifying the prosodic information and modifies the target prosodic information created by the multi-quality speech synthesizer.
- the large speech unit DB105 is a database that holds large speech units.
- A speech unit stored in the large speech unit DB 105 is referred to in this specification as a "large speech unit".
- The corresponding DB 106 is a database that holds information indicating the correspondence between the speech units held in the small speech unit DB 101 and the speech units held in the large speech unit DB 105.
- The speech unit candidate acquisition unit 107 is a processing unit that receives the speech unit sequence selected by the small speech unit selection unit 102 as input and, based on the correspondence information stored in the corresponding DB 106, obtains speech unit candidates corresponding to each speech unit of the input sequence from the large speech unit DB 105 via the network 113 or the like.
- The large speech unit selection unit 108 is a processing unit that receives as inputs the phoneme information received by the small speech unit selection unit 102, together with either the prosodic information received by the small speech unit selection unit 102 or the prosodic information modified by the prosody modification unit 104, and selects an optimal speech unit sequence from the speech unit candidates obtained by the speech unit candidate acquisition unit 107.
- the large-scale speech unit connection unit 109 is a processing unit that generates a synthesized sound by connecting the speech unit sequences selected by the large-scale speech unit selection unit 108.
- FIG. 3 is a diagram showing an example of the information, stored in the corresponding DB 106, that indicates the correspondence between speech units stored in the small speech unit DB 101 and speech units stored in the large speech unit DB 105.
- “small speech unit number” and “large speech unit number” are stored in association with each other.
- “Small speech unit number” identifies the speech unit stored in the small speech unit DB101.
- the “large speech unit number” is a speech unit number for identifying a speech unit stored in the large speech unit DB 105.
- the speech unit with small speech unit number “2” is associated with speech units with large speech unit numbers “1” and “2”.
- speech units having the same number indicate the same speech unit. That is, the speech unit with the small speech unit number “2” and the speech unit with the large speech unit number “2” indicate the same speech unit.
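To make the one-to-many structure of FIG. 3 concrete, the following minimal Python sketch represents the corresponding DB 106 as a lookup table. The concrete numbers and the dict layout are illustrative assumptions, not the patent's storage format.

```python
# Minimal sketch of the corresponding DB 106 of FIG. 3: each small speech
# unit number maps to one or more large speech unit numbers. Entry "1" is
# hypothetical; entries "2" and "3" mirror the FIG. 3 example.
corresponding_db = {
    1: [5, 7],     # hypothetical entry
    2: [1, 2],     # small unit "2" maps to large units "1" and "2"
    3: [3, 4],     # small unit "3" maps to large units "3" and "4"
}

def large_unit_candidates(small_unit_number: int) -> list[int]:
    """Return the large speech unit numbers associated with a small unit."""
    return corresponding_db.get(small_unit_number, [])

print(large_unit_candidates(2))  # [1, 2] -- includes the identical unit "2"
```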
- FIG. 4 is a conceptual diagram when the multi-quality speech synthesizer according to the present embodiment is realized as a system.
- The multi-quality speech synthesis system includes a terminal 111 and a server 112 connected to each other via a network 113, and the multi-quality speech synthesizer is realized by the cooperative operation of the terminal 111 and the server 112.
- The terminal 111 has a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, a prosody modification unit 104, a corresponding DB 106, and a speech unit candidate acquisition unit 107.
- the server 112 is composed of a large-scale speech unit DB105.
- In other words, the large speech unit DB 105 need not be provided in the terminal 111 and can be held centrally by the server 112.
- the operation of the multi-quality speech synthesizer can be broadly divided into synthetic sound editing processing and edited synthetic sound quality enhancement processing.
- The editing process of the synthesized sound and the sound quality enhancement process are described in order below.
- prosodic information is generated based on the phoneme sequence and accent symbols (step S001).
- The method of generating the prosodic information is not particularly limited. For example, it may be generated with reference to templates, or estimated using quantification theory type I.
- the prosody information may be directly input from the outside.
- This prosodic information group includes, for example, prosody information t1 corresponding to the phoneme "a", prosody information t2 corresponding to the phoneme "r", and so on.
- Next, based on the prosodic information t1 to tn obtained in step S001, the small speech unit selection unit 102 uses the Viterbi algorithm to search for the speech unit sequence U = (u1, u2, ..., un) that minimizes the total cost, cost(U) = Σi target_cost(ti, ui) + Σi connection_cost(u(i-1), ui) ... (1), where the first sum is the target cost and the second the connection cost (step S002).
- the calculation method of the target cost and the connection cost is not particularly limited.
- For example, the target cost may be calculated as a weighted sum of the differences in prosodic information (fundamental frequency, duration, and power).
- The connection cost is calculated, for example, using the cepstral distance between the end of unit u(i-1) and the start of unit ui.
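The cost minimization of formula (1) can be sketched as a standard Viterbi dynamic program over the per-phoneme candidate lists. This is a minimal illustration under assumed feature representations (per-unit F0, duration, and power values plus boundary cepstral vectors) and equal weights; the patent does not fix these details.

```python
import math

def target_cost(t, u, w=(1.0, 1.0, 1.0)):
    """Weighted sum of prosodic differences (F0, duration, power)."""
    return (w[0] * abs(t["f0"] - u["f0"])
            + w[1] * abs(t["dur"] - u["dur"])
            + w[2] * abs(t["pow"] - u["pow"]))

def connection_cost(u_prev, u):
    """Cepstral (Euclidean) distance between the end of u_prev and start of u."""
    return math.dist(u_prev["cep_end"], u["cep_start"])

def viterbi_select(targets, candidates):
    """Find the unit sequence minimizing formula (1) by dynamic programming.

    targets:    list of prosody targets t1..tn
    candidates: candidates[i] is the list of units available for target i
    """
    n = len(targets)
    # best[i][j] = (cost of best path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, n):
        row = []
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            cost, ptr = min(
                (best[i-1][k][0] + connection_cost(v, u) + tc, k)
                for k, v in enumerate(candidates[i-1]))
            row.append((cost, ptr))
        best.append(row)
    # Trace back the minimum-cost path.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```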
- The small speech unit connection unit 103 synthesizes a speech waveform using the speech unit sequence selected by the small speech unit selection unit 102, and presents the synthesized sound to the user by outputting it (step S003).
- the method for synthesizing the speech waveform is not particularly limited.
- the prosody modification unit 104 receives an input as to whether or not the user is satisfied with the synthesized sound (step S004). If the user is satisfied with the synthesized sound (YES in step S004), the editing process ends, and the processes after step S006 are executed.
- If the user is not satisfied (NO in step S004), the prosody modification unit 104 accepts user input of information for modifying the prosodic information and corrects the target prosodic information (step S005).
- “Modification of prosodic information” includes, for example, changing the position of the accent, changing the fundamental frequency, and changing the duration.
- By repeating steps S002 to S005, the user can create a synthesized sound with the desired prosody.
- Let S = (s1, s2, ..., sn) be the speech unit sequence selected in this way.
- the interface of the prosody modification unit 104 is not particularly limited.
- For example, the prosodic information may be corrected with sliders, or the user may specify an intuitive style such as "high school girl" or Kansai dialect.
- the user may input prosodic information by voice.
- Next, speech unit candidates are obtained from the large speech unit DB 105 (step S006).
- The speech unit candidate acquisition unit 107 uses the information in the corresponding DB 106, which indicates the correspondence between the speech units held in the small speech unit DB 101 and those held in the large speech unit DB 105, to obtain from the large speech unit DB 105 the speech unit candidates corresponding to each speech unit of the sequence S = (s1, s2, ..., sn).
- the method for creating the corresponding DB 106 will be described later.
- FIG. 6 illustrates the speech unit candidate acquisition processing (step S006) by the speech unit candidate acquisition unit 107.
- Using the corresponding DB 106, the speech unit candidate group of the large speech unit DB 105 corresponding to each small speech unit si is acquired. For example, in FIG. 6, the small speech unit s1 determined by the editing process for the phoneme "a" is mapped, via the corresponding DB 106, to large speech unit candidates h11, h12, h13, and h14, which are real speech waveforms acoustically similar to s1.
- Acquiring candidates in this way for every unit yields the large speech unit candidate group sequence 602 shown in the figure, which corresponds to the small speech unit sequence S.
- Next, the large speech unit selection unit 108 selects, from the large speech unit candidate group sequence 602, the speech unit sequence best suited to the prosodic information edited by the user (step S007).
- The selected sequence H = (h1, h2, ..., hn) is drawn from the large speech unit candidate group sequence 602 and consists of speech units held in the large speech unit DB 105.
- The large speech unit connection unit 109 connects the speech unit sequence H, selected in step S007 and held in the large speech unit DB 105, to generate a synthesized sound (step S008).
- connection method is not particularly limited.
- When connecting the units, they may be appropriately deformed before connection.
- In this way, it is possible to generate a high-quality synthesized sound whose prosody and voice quality are similar to those of the simplified synthesized sound edited in the editing process.
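Putting steps S006 to S008 together, the enhancement pass can be sketched as below, reusing the hypothetical viterbi_select and corresponding_db from the sketches above. The record layout of large_db and the byte-string waveforms are assumptions for illustration.

```python
# Sketch of the sound quality enhancement flow (steps S006-S008), reusing the
# hypothetical viterbi_select and corresponding_db from the earlier sketches.
# large_db maps large unit numbers to unit records; this layout is assumed.

def enhance(small_unit_sequence, targets, corresponding_db, large_db):
    # Step S006: expand each confirmed small unit into its large-DB candidates.
    candidates = [
        [large_db[num] for num in corresponding_db[s["number"]]]
        for s in small_unit_sequence
    ]
    # Step S007: select the optimal large unit sequence against the edited
    # prosody targets, searching only the restricted candidate space.
    best_sequence = viterbi_select(targets, candidates)
    # Step S008: connect the selected units (plain concatenation stands in for
    # the patent's waveform connection, which may also deform units at joins).
    return b"".join(u["waveform"] for u in best_sequence)
```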
- As described above, the corresponding DB 106 is a database that holds information indicating the correspondence between the speech units held in the small speech unit DB 101 and those held in the large speech unit DB 105.
- During the sound quality enhancement process, it is used to select, from the large speech unit DB 105, speech units similar to the simple synthesized sound created in the editing process.
- The small speech unit DB 101 is a subset of the unit group held by the large speech unit DB 105, and is characterized by satisfying the following relationship.
- Each speech unit held in the small speech unit DB 101 is associated with one or more speech units held in the large speech unit DB 105. Furthermore, the speech units of the large speech unit DB 105 associated by the corresponding DB 106 are acoustically similar to the corresponding small speech unit. Similarity criteria include prosodic information (fundamental frequency, power, duration, etc.) and vocal tract information (formants, cepstral coefficients, etc.).
- Because of this relationship, speech units with similar prosody and voice quality can be selected during the sound quality enhancement process, and the large speech unit DB 105 allows an optimal unit to be chosen from abundant candidates. This reduces the cost of unit selection by the large speech unit selection unit 108 while improving the sound quality of the synthesized sound.
- Since the large speech unit DB 105 can hold a large amount of data, the large speech unit selection unit 108 can select from it a speech unit sequence with high connectivity between units (this can be realized, for example, by the method described in Patent Document 1).
- To establish the above association, a clustering technique is used. "Clustering" is a technique for classifying individuals into several sets based on an index of similarity between individuals determined by multiple characteristics.
- Clustering methods are broadly divided into hierarchical clustering, in which similar individuals are merged step by step so that they are eventually included in the same set, and non-hierarchical clustering, which partitions the original set directly.
- As hierarchical clustering, a technique called "hierarchical clustering using heaps" is known; as non-hierarchical clustering, the "k-means method" is known.
- FIG. 7 shows a conceptual diagram when hierarchical clustering is performed on a speech unit group held in the large speech unit DB105.
- the initial layer 301 is composed of each speech unit held in the large speech unit DB 105.
- the speech unit held in the large speech unit DB 105 is shown by a rectangle.
- the number given to the rectangle is an identifier for identifying a speech unit, that is, a speech unit number.
- the cluster group 302 of the first hierarchy is a set of clusters clustered as the first hierarchy by hierarchical clustering, and each cluster is shown as a circle.
- the cluster 303 is one of the clusters clustered as the first hierarchy, and specifically includes speech unit numbers “1” and “2”.
- the number shown in each cluster is an identifier of a speech unit representing the cluster.
- the speech element representing the cluster 303 is the speech element having the speech element number “2”.
- As a method for determining the representative speech unit, there is a method using the centroid of the speech unit group belonging to the cluster.
- the speech unit closest to the centroid of the speech unit group belonging to the cluster is used as the cluster representative.
- In the example of FIG. 7, this makes the speech unit with speech unit number "2" the representative of the cluster 303.
- representative speech segments can be determined for other clusters.
- Specifically, the similarity between the vector of each speech unit in the cluster and the centroid vector of the cluster is computed, and the speech unit with the maximum similarity is taken as the representative unit. Alternatively, the distance (e.g., Euclidean distance) between the centroid vector and each speech unit's vector may be computed, and the speech unit with the smallest distance taken as the representative.
- the cluster group 304 in the second hierarchy is a cluster obtained by clustering clusters belonging to the cluster group 302 in the first hierarchy based on the above-described similarity. Therefore, the number of clusters is smaller than the number of clusters in the first layer cluster group 302. At this time, the representative speech segment can be similarly determined for the cluster 305 in the second layer.
- the speech element with the segment number “2” is a speech element representing the cluster 305.
- the large speech unit DB 105 can be divided into a cluster group 302 in the first hierarchy and a cluster group 304 in the second hierarchy.
- Here, a unit group consisting only of the representative speech units of the clusters in the first-layer cluster group 302 can be used as the small speech unit DB 101.
- Specifically, the speech units with unit numbers 2, 3, 6, 8, 9, 12, 14, and 15 can be used as the small speech unit DB 101.
- Alternatively, a unit group consisting only of the representative speech units of the clusters in the second-layer cluster group 304 can be used as the small speech unit DB 101.
- For example, the speech units with unit numbers 2, 8, 12, and 15 can then be used as the small speech unit DB 101.
- FIG. 3 shows the corresponding DB 106 for the case where the representative units of the first-layer cluster group 302 are used as the small speech unit DB 101.
- the speech unit with the small speech unit number “2” is associated with the speech units with the large speech unit numbers “1” and “2” in the large speech unit DB105.
- the speech unit with the small speech unit number “3” is associated with the speech units with the large speech unit numbers “3” and “4” in the large speech unit DB105.
- the representative speech units of all the first layer cluster groups 302 and the large speech unit numbers of the large speech unit DB 105 can be associated with each other.
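One plausible way to derive both the small speech unit DB 101 and the corresponding DB 106 from the large speech unit DB 105, following the procedure just described, is sketched below using SciPy's hierarchical clustering. The feature representation and the Ward linkage are assumptions; the patent does not fix the clustering algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_small_db(features, n_clusters):
    """features: (num_units, dim) array of prosodic/cepstral vectors,
    where row i belongs to large speech unit number i (assumed layout)."""
    tree = linkage(features, method="ward")          # hierarchical clustering
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    small_db, corresponding_db = [], {}
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        centroid = features[members].mean(axis=0)
        # Representative = member unit closest to the cluster centroid.
        rep = members[np.argmin(np.linalg.norm(features[members] - centroid,
                                               axis=1))]
        small_db.append(int(rep))
        corresponding_db[int(rep)] = members.tolist()  # one small -> many large
    return small_db, corresponding_db
```

Choosing n_clusters per layer directly gives the scalable sizing of the small DB mentioned below.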
- By holding the correspondence in this way, the corresponding DB 106 can be referred to at very high speed.
- In addition, the scale of the small speech unit DB 101 can be changed scalably. That is, either the representative speech units of the first-layer cluster group 302 or those of the second-layer cluster group 304 can be used as the small speech unit DB 101. The small speech unit DB 101 can therefore be configured according to the storage capacity of the terminal 111.
- The small speech unit DB 101 and the large speech unit DB 105 constructed in this way satisfy the relationship described above. For example, when the representative speech units of the first-layer cluster group 302 are used as the small speech unit DB 101, the speech unit with speech unit number "2" held in the small speech unit DB 101 corresponds to the speech units with speech unit numbers "1" and "2" in the large speech unit DB 105, and those units are similar, by the above criteria, to the representative speech unit "2" of the cluster 303.
- Therefore, when the speech unit with unit number "2" is selected from the small speech unit DB 101, the speech unit candidate acquisition unit 107 uses the corresponding DB 106 to obtain the speech units with unit numbers "1" and "2".
- From the acquired speech unit candidates, the large speech unit selection unit 108 selects the speech units that minimize the above-mentioned formula (1), that is, units that are close to the target prosody and connect well with the preceding and following units.
- the cost value of the speech unit sequence selected by the large speech unit selection unit 108 is less than or equal to the cost value of the speech unit sequence selected by the small speech unit selection unit 102.
- This is because the candidate set includes the speech units selected by the small speech unit selection unit 102, with a plurality of similar speech units added as further candidates.
- the corresponding DB 106 is configured using hierarchical clustering. However, the corresponding DB 106 may be configured using non-hierarchical clustering.
- For example, the k-means method may be used.
- The k-means method is non-hierarchical clustering that partitions an element group (here, a speech unit group) into a preset number of clusters (k).
- the size of the small speech unit DB101 required by the terminal 111 can be calculated at the time of design. Further, by determining a representative speech unit for each of the k divided clusters and using it as the small speech unit DB 101, the same effect as in the case of hierarchical clustering can be obtained.
- Clustering may also be performed separately for each type of speech unit (for example, phoneme, syllable, mora, CV (C: consonant, V: vowel), or VCV), which allows efficient clustering.
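A per-phoneme k-means variant can be sketched as follows; since k is fixed per phoneme type, the small DB size is known at design time, as noted above. The data layout and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_small_db_kmeans(units_by_phoneme, k):
    """units_by_phoneme: dict mapping a phoneme label to a (num, dim) feature
    array for that phoneme's large speech units (assumed layout). The small
    DB then holds exactly k representatives per phoneme type."""
    small_db = {}
    for phoneme, feats in units_by_phoneme.items():
        km = KMeans(n_clusters=k, n_init=10).fit(feats)
        # Representative = unit nearest each of the k centroids.
        reps = [int(np.argmin(np.linalg.norm(feats - c, axis=1)))
                for c in km.cluster_centers_]
        small_db[phoneme] = reps
    return small_db
```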
- terminal 111 has small speech unit DB101, small speech unit selection unit 102, small speech unit connection unit 103, prosody modification unit 104, and corresponding DB 106.
- Since the large speech unit DB 105 is held centrally on the server 112, even if there are multiple terminals 111, only one large speech unit DB 105 needs to be stored on the server 112.
- a synthesized sound can be created using only the terminal 111 and the small speech unit DB101.
- the prosody modification unit 104 allows the user to edit the synthesized sound.
- Moreover, the communication between the terminal 111 and the server 112 needs to be performed only once, when the sound quality enhancement process is executed, so the time loss accompanying communication can be reduced. That is, by separating the editing work from the sound quality enhancement processing, the responsiveness required for audio content editing can be improved. Note that the sound quality enhancement processing may be performed by the server 112, with the result transmitted to the terminal 111 via the network 113.
- In the above description, the small speech unit DB 101 is constructed as a subset of the large speech unit DB 105.
- Alternatively, the small speech unit DB 101 may be created by compressing the information in the large speech unit DB 105, for example by reducing the sampling frequency, reducing the number of quantization bits, or lowering the analysis order at analysis time.
- the correspondence DB 106 may be configured to associate the small speech unit DB101 and the large speech unit DB105 on a one-to-one basis.
- the load varies depending on how each component of the present embodiment is shared between the terminal and the server.
- In addition, the information communicated between the terminal and the server differs between configurations, and therefore so does the amount of communication. Combinations of components and their effects are described below.
- the terminal 111 includes a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, and a prosody modification unit 104.
- the server 112 includes a large-scale speech unit DB 105, a corresponding DB 106, a speech unit candidate acquisition unit 107, a large-scale speech unit selection unit 108, and a large-scale speech unit connection unit 109.
- Editing processing is performed using terminal 111. Specifically, prosodic information is generated (step S001).
- the small speech element selection unit 102 selects a small speech element sequence from the small speech element DB 101 (step S002).
- the small speech element connection unit 103 connects the small speech elements and generates a simplified synthesized sound (step S003).
- The user then judges whether he or she is satisfied with the generated synthesized sound (step S004). If not satisfied (NO in step S004), the prosody modification unit 104 corrects the prosodic information (step S005). By repeating steps S002 to S005, the target synthesized sound is generated.
- If the user is satisfied with the simplified synthesized sound (YES in step S004), the terminal 111 transmits the identifier of the small speech unit sequence selected in step S002 and the prosodic information to the server 112 (step S010).
- the speech element candidate acquisition unit 107 refers to the corresponding DB 106 based on the identifier of the small speech element sequence acquired from the terminal 111, and acquires the speech element group as a selection candidate from the large speech element DB 105. (Step S006). Based on the prosodic information received from the terminal 111, the large speech unit selection unit 108 selects an optimal large speech unit sequence from the acquired speech unit candidate group (step S007). The large-scale speech unit connecting unit 109 connects the selected large-scale speech unit sequence to generate a high-quality sound synthesized sound (step S008).
- the server 112 transmits the high-quality sound version synthesized sound created as described above to the terminal 111.
- High-quality synthesized sound can be created by the above processing.
- According to the configuration of Modification 1, the terminal 111 can be configured with only the small speech unit DB 101, the small speech unit selection unit 102, the small speech unit connection unit 103, and the prosody modification unit 104, so the required memory capacity can be reduced. In addition, since the terminal 111 generates synthesized sound using only small speech units, the amount of calculation can be reduced. Communication from the terminal 111 to the server 112 consists only of prosodic information and a small speech unit sequence identifier, so the amount of communication can be very small. Furthermore, the communication load on the server 112 can be kept low because the high-quality synthesized sound is transmitted to the terminal 111 only once.
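The step S010 message of Modification 1 might look like the following JSON payload. The field names are hypothetical, but they illustrate why the uplink traffic stays small: only identifiers and prosody targets cross the network, never waveforms.

```python
import json

# Hypothetical payload for step S010 of Modification 1: the terminal sends
# only identifiers and prosody, never waveforms, keeping traffic small.
request = {
    "small_unit_sequence": [2, 3, 9],          # identifiers of s1..sn
    "prosody": [                               # edited targets t1..tn
        {"phoneme": "a", "f0": 220.0, "dur": 0.09, "pow": 0.70},
        {"phoneme": "r", "f0": 210.0, "dur": 0.05, "pow": 0.60},
        {"phoneme": "i", "f0": 230.0, "dur": 0.08, "pow": 0.65},
    ],
}
payload = json.dumps(request)
# The server resolves candidates via the corresponding DB, runs selection and
# connection (steps S006-S008), and returns the enhanced waveform once.
```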
- the terminal 111 includes a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, a prosody modification unit 104, a corresponding DB 106, and a speech unit candidate acquisition unit 107. It comprises.
- The server 112 includes a large speech unit DB 105, a large speech unit selection unit 108, and a large speech unit connection unit 109.
- First, prosodic information is generated (step S001).
- the small speech element selection unit 102 selects a small speech element sequence from the small speech element DB 101 (step S002).
- the small speech element connection unit 103 connects the small speech elements and generates a simplified synthesized sound (step S003).
- the user determines whether or not he / she is satisfied by listening to the generated synthesized sound (step S004). If not satisfied (NO in step S004), the prosody modification unit 104 corrects the prosody information (step S005). By repeating the process from step S002 to step S005, the target synthesized sound is generated.
- If the user is satisfied (YES in step S004), the speech unit candidate acquisition unit 107 uses the corresponding DB 106 to obtain the identifiers of the candidate units in the large speech unit DB 105.
- The terminal 111 then transmits the identifiers of the large speech unit selection candidate group and the confirmed prosodic information to the server 112 (step S011).
- the large speech unit selector 108 selects an optimal large speech unit sequence from the acquired speech unit candidate group based on the prosodic information received from the terminal 111 (step S007).
- the large-scale speech unit connection unit 109 connects the selected large-scale speech unit sequence to generate a high-quality sound synthesized sound (step S008).
- the server 112 transmits the high-quality sound synthesized sound generated as described above to the terminal 111.
- a high-quality synthesized sound is created by the above processing.
- According to the configuration of Modification 2, the terminal 111 can be configured with only the small speech unit DB 101, the small speech unit selection unit 102, the small speech unit connection unit 103, the prosody modification unit 104, and the corresponding DB 106, so the required memory capacity can be reduced. In addition, since the terminal 111 generates synthesized sound using only small speech units, the amount of calculation can be reduced. Providing the corresponding DB 106 on the terminal 111 side reduces the processing on the server 112. Communication from the terminal 111 to the server 112 consists only of the prosodic information and the identifiers of the speech unit candidate group; since only identifiers need to be transmitted for the candidate group, the amount of communication can be very small.
- the server 112 since the server 112 does not need to perform the process of acquiring the speech segment candidate, the processing load on the server 112 can be reduced.
- For communication to the terminal 111, only the high-quality synthesized sound needs to be transmitted, and only once, so the amount of communication can be kept small.
- In Modification 3, the terminal 111 includes a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, a prosody modification unit 104, a corresponding DB 106, a speech unit candidate acquisition unit 107, a large speech unit selection unit 108, and a large speech unit connection unit 109.
- The server 112 includes the large speech unit DB 105.
- Editing processing is performed using terminal 111. Specifically, prosodic information is generated (step S001).
- the small speech element selection unit 102 selects a small speech element sequence from the small speech element DB 101 (step S002).
- the small speech element connection unit 103 connects the small speech elements and generates a simplified synthesized sound (step S003).
- the user determines whether or not he / she is satisfied by listening to the generated synthesized sound (step S004). If not satisfied (NO in step S004), the prosody modification unit 104 corrects the prosody information (step S005). By repeating the process from step S002 to step S005, the target synthesized sound is generated.
- If the user is satisfied (YES in step S004), the terminal 111 uses the corresponding DB 106 to obtain the identifiers of the candidate units in the large speech unit DB 105, and transmits the identifiers of the large speech unit selection candidate group to the server 112 (step S009).
- Based on the received identifiers of the selection candidate group, the server 112 retrieves the speech unit candidate group from the large speech unit DB 105 and transmits it to the terminal 111 (step S006).
- The large speech unit selection unit 108 then calculates the optimal large speech unit sequence from the acquired speech unit candidate group and the already determined prosodic information (step S007).
- the large-scale speech unit connection unit 109 connects the selected large-scale speech unit sequence to generate a high-quality synthesized sound (step S008).
- According to the configuration of Modification 3, the server 112 only has to transmit the unit candidates to the terminal 111 based on the identifiers of the speech unit candidate group sent from the terminal 111, so the calculation load on the server 112 can be greatly reduced.
- The terminal 111 selects the optimal speech unit sequence from the limited candidate group that the corresponding DB 106 associates with the small speech units, so this is possible without a large amount of calculation.
- the terminal 111 includes a small speech unit DB 101, a small speech unit selection unit 102, a small speech unit connection unit 103, a prosody modification unit 104, a large speech unit selection unit 108, and A large-scale speech unit connection unit 109 is provided.
- the server 112 includes a large speech unit DB1 05, a corresponding DB 106, and a speech unit candidate acquisition unit 107.
- the difference between the present modification and the third modification is that the server 112 includes the corresponding DB 106.
- Editing processing is performed using terminal 111. Specifically, prosodic information is generated (step S001).
- the small speech element selection unit 102 selects a small speech element sequence from the small speech element DB 101 (step S002).
- the small speech element connection unit 103 connects the small speech elements and generates a simplified synthesized sound (step S003).
- The user judges whether he or she is satisfied by listening to the generated synthesized sound (step S004). If not satisfied (NO in step S004), the prosody modification unit 104 corrects the prosodic information (step S005). By repeating steps S002 to S005, the target synthesized sound is generated.
- If the user is satisfied with the simplified synthesized sound (YES in step S004), control of the processing is transferred to the server 112 side.
- The server 112 uses the corresponding DB 106 to obtain the candidate speech unit group from the large speech unit DB 105 and transmits the large speech unit selection candidate group to the terminal 111 (step S006).
- The large speech unit selection unit 108 then calculates the optimal large speech unit sequence from the obtained speech unit candidate group based on the already determined prosodic information (step S007).
- the large-scale speech unit connection unit 109 connects the selected large-scale speech unit sequence to generate a high-quality synthesized sound (step S008).
- According to the configuration of Modification 4, the server 112 only has to receive the identifier of the small speech unit sequence and, using the corresponding DB 106, transmit the corresponding speech unit candidate group from the large speech unit DB 105 to the terminal 111, so the calculation load on the server 112 can be greatly reduced. Compared with Modification 3, the communication from the terminal 111 to the server 112 requires only the small speech unit sequence identifier, so the amount of communication can be reduced.
- In Embodiment 1, a synthesized sound is generated by connecting speech unit sequences, whereas this embodiment differs in that the synthesized sound is generated using the HMM (Hidden Markov Model) speech synthesis method.
- the HMM speech synthesis method is a speech synthesis method based on a statistical model, and is characterized in that the capacity of the statistical model is compact and a synthesized sound with stable sound quality can be generated. Since the HMM speech synthesis method is a publicly known technique, its detailed description will not be repeated.
- FIG. 12 is a block diagram of a text-to-speech synthesizer that uses the HMM speech synthesis method, which is one of the speech synthesis methods based on a statistical model (reference: JP 2002-268660 A).
- the text-to-speech synthesizer includes a learning unit 030 and a speech synthesizer 031.
- The learning unit 030 includes a speech DB (database) 032, an excitation source parameter extraction unit 033, a spectrum parameter extraction unit 034, and an HMM learning unit 035.
- the voice synthesis unit 031 includes a context-dependent HMM file 036, a text analysis unit 037, a parameter generation unit 038, an excitation source generation unit 039, and a synthesis filter 040.
- the learning unit 030 has a function of learning the context-dependent HMM file 036 using audio information stored in the audio DB032.
- The speech DB 032 stores a large number of speech samples prepared in advance. The speech information consists of the audio signals together with label information identifying each phoneme section of the waveform.
- the excitation source parameter extraction unit 033 and the spectral parameter extraction unit 034 extract the excitation source parameter sequence and the spectral parameter sequence for each audio signal extracted from the audio DB032.
- The HMM learning unit 035 performs HMM learning on the extracted excitation source parameter sequences and spectrum parameter sequences, using the label information and time information extracted from the speech DB 032 together with the speech signals.
- the learned HMM is stored in the context-dependent HMM file 036.
- the parameters of the excitation source model are learned using a multi-space distribution HMM.
- The multi-space distribution HMM is an HMM extended so that the dimension of the parameter vector may differ at each time; the pitch parameter, which includes a voiced/unvoiced flag, is an example of a parameter sequence whose dimension changes in this way. In other words, it is a one-dimensional parameter vector when voiced and a zero-dimensional parameter vector when unvoiced.
- the learning unit 030 performs learning using this multi-space distribution HMM.
- The label information specifically refers to attributes such as the preceding and succeeding phonemes, and each HMM has these as attribute names (contexts).
- Such an HMM is called a context-dependent HMM.
- the speech synthesis unit 031 has a function of generating a speech signal sequence in a reading format from an arbitrary electronic text.
- the text analysis unit 037 analyzes the input text and converts it into label information that is an array of phonemes.
- the parameter generation unit 038 searches the context-dependent HMM file 036 based on the label information, connects the obtained context-dependent HMMs, and constructs a sentence HMM.
- the parameter generation unit 038 further generates a sequence of excitation source parameters and spectral parameters from the obtained sentence HMM by a parameter generation algorithm.
- the excitation source generation unit 039 and the synthesis filter 040 generate a synthesized sound based on the excitation source parameter and spectral parameter sequences.
- As described above, HMM speech synthesis can generate stable synthesized speech based on a statistical model.
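The pipeline of FIG. 12 can be reduced to a skeleton for illustration: look up the context-dependent models for a label sequence, concatenate them into a sentence HMM, and emit a parameter trajectory. The sketch below crudely substitutes "repeat each state mean for its duration" for the real maximum-likelihood parameter generation algorithm, and the data layout is an assumption.

```python
import numpy as np

def synthesize_parameters(labels, hmm_file):
    """Crude stand-in for parameter generation from a sentence HMM.

    labels:   context-dependent labels from text analysis (unit 037)
    hmm_file: dict label -> list of states, each with a mean vector and a
              duration in frames (a gross simplification of a real HMM;
              actual systems use ML parameter generation with dynamic
              features rather than repeating state means).
    """
    frames = []
    for label in labels:                 # build the sentence HMM (unit 038)
        for state in hmm_file[label]:
            frames.extend([state["mean"]] * state["duration"])
    return np.array(frames)              # fed to excitation source + filter
```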
- FIG. 13 is a configuration diagram of the multi-quality speech synthesizer according to Embodiment 2 of the present invention.
- The multi-quality speech synthesizer is a device that synthesizes speech of multiple qualities, and includes an HMM model DB 501, an HMM model selection unit 502, a synthesis unit 503, a prosody modification unit 104, a large speech unit DB 105, a corresponding DB 506, a speech unit candidate acquisition unit 107, a large speech unit selection unit 108, and a large speech unit connection unit 109.
- the HMM model DB501 is a database that holds an HMM model learned based on speech data.
- The HMM model selection unit 502 is a processing unit that receives at least phoneme information and prosodic information as inputs and selects the optimal HMM model from the HMM model DB 501.
- the synthesis unit 503 is a processing unit that generates synthesized sound using the HMM model selected by the HMM model selection unit 502.
- The corresponding DB 506 is a database that associates the HMM models held in the HMM model DB 501 with the speech units held in the large speech unit DB 105.
- This embodiment can also be implemented as a multi-quality speech synthesis system as shown in FIG.
- In that case, the terminal 111 includes an HMM model DB 501, an HMM model selection unit 502, a synthesis unit 503, a prosody modification unit 104, a corresponding DB 506, a speech unit candidate acquisition unit 107, a large speech unit selection unit 108, and a large speech unit connection unit 109.
- the server 112 is composed of a large-scale speech unit DB105.
- In this configuration, the HMM model file is model-based, so the storage capacity required of the terminal 111 can be small (approximately several megabytes). The large speech unit DB 105 (several hundred megabytes to several gigabytes) is held centrally on the server 112.
- The operation of the multi-quality speech synthesizer according to this embodiment, like that of Embodiment 1, is divided into a synthesized sound editing process and a sound quality enhancement process for the edited synthesized sound. These are described in order below.
- prosodic information is generated based on the phoneme sequence and the accent symbol (S101).
- The method of generating the prosodic information is not particularly limited. For example, it may be generated with reference to templates, or estimated using quantification theory type I.
- Alternatively, the prosodic information may be input directly from the outside.
- HMM model selection section 502 performs HMM speech synthesis based on the phoneme information and prosodic information obtained in step S101 (step S102). Specifically, the HMM model selection unit 502 selects an optimum HMM model from the HMM model DB 501 based on the input phoneme information and prosody information, and then selects a synthesis parameter from the selected HMM model. Is generated. Since the details have already been described, they are omitted here.
- the synthesis unit 503 synthesizes a speech waveform based on the synthesis parameter generated by the HMM model selection unit 502 (step S103).
- the synthesis method is not particularly limited.
- the synthesizing unit 503 outputs the synthesized sound created in step S103 to present it to the user (step S104).
- The prosody modification unit 104 receives an input indicating whether the user is satisfied with the synthesized sound. If the user is satisfied (YES in step S104), the editing process ends and the processing from step S106 onward is executed.
- If the user is not satisfied (NO in step S104), the prosody modification unit 104 accepts input by the user of information for modifying the prosody information and modifies the target prosody information (step S105).
- "Modification of prosodic information" includes, for example, changing an accent position, changing the fundamental frequency, and changing the duration. This allows the user to correct the parts of the current synthesized sound's prosody with which he or she is not satisfied. When the modification is completed, the process returns to step S102. By repeating the processing from step S102 to step S105, the user can create a synthesized sound with the desired prosody. Through the above steps, the user can create audio content based on HMM synthesis.
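- Putting steps S101 to S105 together, the editing stage is a simple synthesize-listen-correct loop. The following is a minimal runnable sketch with stub helpers; every name in it is illustrative and none of it is the specification's API.

```python
# A minimal sketch of the editing loop (steps S101-S105). All helpers are
# illustrative stubs standing in for text analysis, HMM model selection,
# waveform synthesis, and user interaction.

def analyze_text(text):
    # step S101: derive a phoneme sequence and initial prosodic targets
    phonemes = list(text)
    prosody = [{"f0": 120.0, "duration_ms": 80.0} for _ in phonemes]
    return phonemes, prosody

def select_and_synthesize(phonemes, prosody):
    # steps S102-S103: stand-in for HMM selection and waveform generation
    return [f"model_for_{p}" for p in phonemes]

def edit_loop(text, edits_until_satisfied=2):
    phonemes, prosody = analyze_text(text)
    while True:
        models = select_and_synthesize(phonemes, prosody)  # steps S102-S103
        # step S104: the sound is presented; the user's judgment is faked here
        if edits_until_satisfied == 0:
            return models, prosody        # satisfied -> proceed to step S106
        # step S105: the user raises the pitch targets, then re-synthesizes
        for target in prosody:
            target["f0"] *= 1.05
        edits_until_satisfied -= 1

models, prosody = edit_loop("aiu")
print(models, prosody[0]["f0"])
```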
- FIG. 15 is a diagram showing an operation example of the sound quality enhancement processing.
- In step S106, the speech unit candidate acquisition unit 107 uses the HMM model sequence (m1, m2, ..., mn) selected in step S102 and the correspondence DB 506, which holds information indicating the correspondence between the HMM models held in the HMM model DB 501 and the units of the large-scale speech unit DB 105, to acquire from the large-scale speech unit DB 105 the large-scale speech unit candidates associated with each selected HMM model.
- For example, the speech unit candidate acquisition unit 107 acquires from the large-scale speech unit DB 105 the large-scale speech units (h1, h2, h3, ...) corresponding to the HMM model (m1) selected to synthesize the phoneme /a/.
- Similarly, for the HMM models m2, ..., mn, the speech unit candidate acquisition unit 107 refers to the correspondence DB 506 and acquires large-scale speech unit candidates from the large-scale speech unit DB 105. How the correspondence DB 506 is created will be described later.
- The large-scale speech unit selection unit 108 then selects an optimal large-scale speech unit sequence from the acquired candidates, and the large-scale speech unit connection unit 109 connects the selected units to generate a high-quality synthesized sound (step S008). Since the selection and connection methods may be the same as those in Embodiment 1, their description is omitted.
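- The candidate acquisition of step S106 can be pictured as a simple table lookup. The following is a minimal sketch under assumed data structures (a dict from model number to the unit numbers used in learning); the layouts and names are illustrative, not the specification's format.

```python
# A minimal sketch of step S106: expanding each selected HMM model into its
# large-scale speech unit candidates via the correspondence DB.

correspondence_db = {        # model number -> unit numbers used in learning
    "m1": [1, 2, 3],
    "m2": [4, 5],
}
large_unit_db = {uid: f"waveform_{uid}" for uid in range(1, 6)}

def acquire_candidates(model_sequence):
    """Return, for each selected model, its candidate units and waveforms."""
    return [
        [(uid, large_unit_db[uid]) for uid in correspondence_db[model]]
        for model in model_sequence
    ]

candidates = acquire_candidates(["m1", "m2"])
# [[(1, 'waveform_1'), (2, 'waveform_2'), (3, 'waveform_3')],
#  [(4, 'waveform_4'), (5, 'waveform_5')]]
```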
- Next, the method of creating the correspondence DB 506 is described. The learning process of the HMM model is used to associate the HMM models held in the HMM model DB 501 with the speech units held in the large-scale speech unit DB 105.
- In HMM speech synthesis, the HMM model is usually a "context-dependent model" defined by a combination of contexts such as the preceding phoneme, the phoneme itself, and the subsequent phoneme.
- Because the number of context combinations is enormous, context clustering is usually performed. Since context clustering is a well-known technique, its detailed description is not repeated here.
- In the present embodiment, this HMM model is learned using the large-scale speech unit DB 105.
- FIG. 16 shows an example of the result of context clustering performed on the speech unit group held in the large-scale speech unit DB 105.
- Each speech unit of the speech unit group 702 in the large-scale speech unit DB 105 is represented by a rectangle, and its number represents the speech unit identifier.
- In context clustering, speech samples are classified by context (for example, whether the preceding phoneme is a voiced sound). The speech units are thus clustered in stages, as in the decision tree shown in the figure.
- Speech units sharing the same context are classified into the same leaf node 703 of the decision tree.
- For example, the speech units with speech unit numbers 1 and 2, whose preceding phoneme is voiced, is a vowel, and is /a/, are classified into leaf node 703.
- An HMM model is then learned using the speech units with speech unit numbers 1 and 2 as learning data, and the HMM model with model number "A" is created.
- In other words, the HMM model with model number "A" is learned from the speech units with speech unit numbers 1 and 2 in the large-scale speech unit DB 105.
- Note that this figure is a conceptual diagram; in reality, HMM models are learned from a much larger number of speech units.
- Information indicating the correspondence between the HMM model with model number "A" and the speech units used when learning it (the speech units with speech unit numbers 1 and 2) is held in the correspondence DB 506.
- In this way, the correspondence DB 506 shown in FIG. 17 can be created.
- For example, the HMM model with model number "A" is associated with the speech units with speech unit numbers "1" and "2" in the large-scale speech unit DB 105.
- Similarly, the HMM model with model number "B" is associated with the speech units with speech unit numbers "3" and "4" in the large-scale speech unit DB 105.
- In this way, the correspondence between the model numbers of the HMM models at all leaf nodes and the speech unit numbers of the large-scale speech unit DB 105 can be stored as a table. A sketch of this construction follows.
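- As a concrete picture of this table construction, the following minimal sketch turns the clustering leaves of the FIG. 16 example into the correspondence table of FIG. 17; the data layout is an illustrative assumption, not the specification's format.

```python
# A minimal sketch of building the correspondence DB 506 from the context-
# clustering result: the units that trained each leaf's HMM model become
# that model's table entry. Leaf contents follow the FIG. 16/17 example.

leaf_learning_units = {
    "A": [1, 2],   # HMM model "A" was learned from speech units 1 and 2
    "B": [3, 4],   # HMM model "B" was learned from speech units 3 and 4
}

def build_correspondence_db(leaf_learning_units):
    """Map each model number to the unit numbers of its learning samples."""
    return {model: sorted(units) for model, units in leaf_learning_units.items()}

correspondence_db = build_correspondence_db(leaf_learning_units)
print(correspondence_db)  # {'A': [1, 2], 'B': [3, 4]} -- the table of FIG. 17
```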
- With this correspondence DB 506, the HMM model used to generate the sound is associated with the speech units of the large-scale speech unit DB 105 that were used to learn it. The speech unit candidates that the speech unit candidate acquisition unit 107 acquires from the large-scale speech unit DB 105 are therefore the actual waveforms of the learning samples of the HMM model that the HMM model selection unit 502 selected from the HMM model DB 501.
- Accordingly, the prosodic information and voice quality of the speech unit candidates are naturally similar to those of the HMM model.
- However, the HMM model is created by statistical processing. Compared with the speech units used for its learning, the reproduced waveform is therefore blunted:
- the fine structure that the waveform should have is lost through statistical operations such as averaging over the learning samples.
- Because the speech units in the large-scale speech unit DB 105 are not statistically processed, they retain this fine structure as-is. In terms of sound quality, it is therefore possible to obtain a synthesized sound of higher quality than the synthesized sound output by the synthesis unit 503 using the HMM model. A small numerical illustration follows.
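- The loss of fine structure can be illustrated numerically. The following sketch (an illustration of the general point, not of the specification's model) averages noisy realizations of a waveform: the coarse shape survives, but the per-sample detail is almost entirely cancelled.

```python
import numpy as np

# Averaging learning samples, as a statistical model effectively does,
# keeps the coarse waveform shape but smooths away the fine structure
# that an individual stored speech unit retains.

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
coarse = np.sin(2 * np.pi * 2 * t)                    # shared coarse shape
samples = [coarse + 0.3 * rng.standard_normal(t.size) for _ in range(50)]

mean_waveform = np.mean(samples, axis=0)              # "model" output
print(np.std(samples[0] - coarse))     # ~0.3  : detail kept by a real unit
print(np.std(mean_waveform - coarse))  # ~0.04 : detail left after averaging
```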
- In the above description, the HMM model is learned in units of phonemes, but the unit of learning need not be the phoneme.
- For example, a plurality of states may be held in the HMM model for one phoneme, and statistics may be learned individually for each state.
- For example, FIG. 18 shows an example of an HMM model composed of three states for the phoneme /a/.
- In this case, the correspondence DB 506 stores information associating each state of the HMM model with the speech units stored in the large-scale speech unit DB 105.
- For example, the initial state "m11" can be expanded, using the correspondence DB 506, to the speech units of the large-scale speech unit DB 105 used for its learning (speech unit numbers 1, 2, and 3).
- Similarly, the second state "m12" can be expanded to the speech units of the large-scale speech unit DB 105 with speech unit numbers 1, 2, 3, 4, and 5 using the correspondence DB 506.
- The final state "m13" can be expanded to the speech units with speech unit numbers 1, 3, 4, and 6 using the correspondence DB 506.
- In this case, the speech unit candidate acquisition unit 107 can select speech unit candidates based on any of the following three criteria.
- (Criterion 1) The union of the large-scale speech units associated with each state of the HMM is taken as the speech unit candidates.
- In the example of FIG. 18, the large-scale speech units with speech unit numbers {1, 2, 3, 4, 5, 6} are then selected as candidates.
- (Criterion 2) The intersection of the large-scale speech units associated with each state of the HMM is taken as the speech unit candidates.
- In the example of FIG. 18, the large-scale speech units with speech unit numbers {1, 3} are then selected as candidates.
- (Criterion 3) The speech units belonging to at least a predetermined threshold number of the sets of large-scale speech units associated with each state of the HMM are taken as the speech unit candidates.
- If the predetermined threshold is "2", in the example of FIG. 18 the large-scale speech units with speech unit numbers {1, 2, 3, 4} are selected as candidates.
- These criteria may also be used in combination. For example, the speech unit candidate acquisition unit 107 may be designed to switch among the criteria so as to select a fixed number of speech unit candidates. A sketch of the three criteria follows.
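- As a concrete sketch of the three criteria, the following code uses the per-state unit sets of the FIG. 18 example (m11, m12, m13); the dict layout is an illustrative assumption, not the specification's format.

```python
from collections import Counter

# Per-state speech unit sets taken from the FIG. 18 example.
state_units = {
    "m11": {1, 2, 3},
    "m12": {1, 2, 3, 4, 5},
    "m13": {1, 3, 4, 6},
}

def union_candidates(state_units):
    """Criterion 1: units associated with any state."""
    return set().union(*state_units.values())

def intersection_candidates(state_units):
    """Criterion 2: units associated with every state."""
    return set.intersection(*state_units.values())

def threshold_candidates(state_units, threshold):
    """Criterion 3: units appearing in at least `threshold` state sets."""
    counts = Counter(u for units in state_units.values() for u in units)
    return {u for u, c in counts.items() if c >= threshold}

print(union_candidates(state_units))         # {1, 2, 3, 4, 5, 6}
print(intersection_candidates(state_units))  # {1, 3}
print(threshold_candidates(state_units, 2))  # {1, 2, 3, 4}
```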
- As described above, according to the multi-quality speech synthesizer of the present embodiment, the terminal 111 includes the HMM model DB 501, the HMM model selection unit 502, the synthesis unit 503, the prosody modification unit 104, the correspondence DB 506, the speech unit candidate acquisition unit 107, the large-scale speech unit selection unit 108, and the large-scale speech unit connection unit 109, while the server 112 includes the large-scale speech unit DB 105. As a result, the storage capacity required on the terminal 111 need not be large.
- Moreover, because the large-scale speech unit DB 105 is held centrally on the server 112, only one copy needs to be stored even when there are multiple terminals 111.
- The terminal 111 can generate a synthesized sound using HMM speech synthesis.
- The prosody modification unit 104 allows the user to edit that synthesized sound.
- HMM speech synthesis can generate synthesized speech very quickly compared with searching the large-scale speech unit DB 105 and synthesizing from it. The calculation cost during editing is therefore low, and the synthesized sound can be edited with good response even over repeated edits.
- Furthermore, since the correspondence DB 506 associates the model numbers of the HMM models determined in the editing process with the speech unit numbers of the speech unit candidates in the large-scale speech unit DB 105, the large-scale speech unit selection unit 108 only needs to search a limited search space when selecting speech units. Compared with reselecting speech units from scratch, the amount of calculation can therefore be greatly reduced.
- In addition, since the communication between the terminal 111 and the server 112 need only be performed once, when the sound quality enhancement processing is carried out, the time lost to communication can be reduced. In other words, separating the editing work from the sound quality enhancement processing improves the responsiveness required for audio content editing.
- Note that the components may be divided between the terminal and the server in other ways.
- Compared with Embodiment 1, the small-scale speech unit DB 101, the small-scale speech unit selection unit 102, the small-scale speech unit connection unit 103, and the correspondence DB 106 correspond respectively to the HMM model DB 501, the HMM model selection unit 502, the synthesis unit 503, and the correspondence DB 506.
- A style in which created audio content is provided to a third party can also be considered; that is, the content creator and the content user may be different.
- For example, a distribution form can be considered in which the audio content creator sends the created audio content over a network or the like, and the receiver receives the audio content as a message.
- That is, a service in which the creator transmits the created audio content to the other party can be considered.
- In such a case, a method may be considered in which the creator performs the audio content editing process, and the receiver listens to the received audio content and applies the sound quality enhancement processing if he or she likes it.
- Embodiment 3 of the present invention relates to a method of communicating created audio content and a method of sound quality enhancement processing.
- FIG. 19 is a block diagram showing the configuration of a multi-quality speech synthesis system according to Embodiment 3 of the present invention.
- In the present embodiment, the audio content creator performs the editing process, the audio content receiver performs the sound quality enhancement process, and a communication means is provided between the terminal used by the creator and the terminal used by the receiver. In these respects, the present embodiment differs from Embodiments 1 and 2.
- The multi-quality speech synthesis system includes a creation terminal 121, a receiving terminal 122, and a server 123.
- The creation terminal 121, the receiving terminal 122, and the server 123 are connected to one another via a network 113.
- The creation terminal 121 is a device used by an audio content creator to edit audio content.
- The receiving terminal 122 is a device that receives the audio content created by the creation terminal 121.
- The receiving terminal 122 is used by the audio content receiver.
- The server 123 is a device that holds the large-scale speech unit DB 105 and is used for the sound quality enhancement processing of the audio content.
- The creation terminal 121 includes a small-scale speech unit DB 101, a correspondence DB 106, a small-scale speech unit selection unit 102, a small-scale speech unit connection unit 103, and a prosody modification unit 104.
- The receiving terminal 122 includes a speech unit candidate acquisition unit 107, a large-scale speech unit selection unit 108, and a large-scale speech unit connection unit 109.
- The server 123 holds the large-scale speech unit DB 105.
- FIGS. 20 and 21 are flowcharts showing the flow of processing by the multi-quality speech synthesis system according to Embodiment 3.
- The processing by the multi-quality speech synthesis system is divided into four processes: editing, communication, confirmation, and sound quality enhancement. Each process is described below.
- The editing process is executed on the creation terminal 121.
- The processing content may be the same as in Embodiment 1.
- First, as preprocessing, the text information input by the user is analyzed, and prosodic information is generated based on the phoneme sequence and the accent symbols (step S001).
- Based on the prosodic information obtained in step S001, the small-scale speech unit selection unit 102 selects from the small-scale speech unit DB 101 the optimal speech unit sequence, considering both the distance to the target prosody (target cost Ct) and the connectivity between speech units (connection cost Cc) (step S002). Specifically, the speech unit sequence that minimizes the cost shown in the above equation (1) is searched for by the Viterbi algorithm. A sketch of this search is shown below.
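- The search of step S002 is a standard Viterbi dynamic program over unit candidates. The following is a minimal sketch, assuming equation (1) is the usual sum of target costs Ct and connection costs Cc; the function names and cost callbacks are illustrative, not the specification's API.

```python
# A minimal sketch of Viterbi unit selection: for each target segment,
# keep the cheapest path ending in each candidate unit, then trace back.

def viterbi_unit_selection(targets, candidates, target_cost, connection_cost):
    # best[i][u] = (cumulative cost of best path ending in unit u, back-pointer)
    best = [{u: (target_cost(targets[0], u), None) for u in candidates[0]}]
    for i in range(1, len(targets)):
        layer = {}
        for u in candidates[i]:
            prev, cost = min(
                ((p, best[i - 1][p][0] + connection_cost(p, u))
                 for p in candidates[i - 1]),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + target_cost(targets[i], u), prev)
        best.append(layer)
    # trace the minimum-cost path back from the last layer
    u = min(best[-1], key=lambda k: best[-1][k][0])
    path = [u]
    for i in range(len(targets) - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path))

# toy usage: pick unit values close to the targets with smooth transitions
targets = [1.0, 2.0, 3.0]
candidates = [[0.5, 1.2], [1.8, 2.6], [2.9, 3.5]]
path = viterbi_unit_selection(
    targets, candidates,
    target_cost=lambda t, u: abs(t - u),
    connection_cost=lambda p, u: 0.5 * abs(p - u),
)
print(path)  # [1.2, 1.8, 2.9]
```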
- The small-scale speech unit connection unit 103 synthesizes a speech waveform using the speech unit sequence selected by the small-scale speech unit selection unit 102, and presents the synthesized sound to the user by outputting it (step S003).
- The prosody modification unit 104 receives an input indicating whether the user is satisfied with the synthesized sound. If the user is satisfied (YES in step S004), the editing process ends and the processing from step S201 onward is executed.
- If the user is not satisfied (NO in step S004), the prosody modification unit 104 accepts input by the user of information for modifying the prosody information and modifies the target prosody information (step S005). When the modification is completed, the process returns to step S002. By repeating the processing from step S002 to step S005, the user can create a synthesized sound with the prosody he or she desires.
- In the communication process, the creation terminal 121 transmits the small-scale speech unit sequence and the prosodic information determined by the editing process to the receiving terminal 122 via a network such as the Internet (step S201).
- The communication method is not particularly limited.
- The receiving terminal 122 receives the prosodic information and the small-scale speech unit sequence transmitted in step S201 (step S202).
- In this way, the receiving terminal 122 obtains the minimum information needed to reconstruct the audio content created by the creation terminal 121.
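- As an illustration of how little needs to cross the network, the following sketch serializes one possible payload for step S201: unit identifiers plus prosodic targets, with no waveform data. The field names are assumptions for illustration, not the specification's format.

```python
import json
from dataclasses import dataclass, asdict

# A minimal sketch of a step-S201 payload: the content is reconstructed on
# the receiving side from unit identifiers and prosodic targets alone.

@dataclass
class SegmentSpec:
    unit_id: int        # speech unit number in the small-scale unit DB 101
    f0: float           # target fundamental frequency (Hz)
    duration_ms: float  # target duration

payload = [SegmentSpec(1, 120.0, 80.0), SegmentSpec(2, 118.5, 95.0)]
message = json.dumps([asdict(s) for s in payload])
# `message` is what the creation terminal sends; the receiving terminal
# decodes it and looks the units up in its own copy of DB 101.
print(message)
```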
- In the confirmation process, the receiving terminal 122 retrieves from the small-scale speech unit DB 101 the speech units of the small-scale speech unit sequence received in step S202, and the small-scale speech unit connection unit 103 creates a synthesized sound that matches the received prosodic information (step S203).
- The synthesized sound creation process is the same as in step S003.
- The receiver listens to the simple synthesized sound created in step S203, and the receiving terminal 122 accepts the receiver's judgment (step S204). If the receiver decides that the simple version of the synthesized sound is acceptable as-is (NO in step S204), the receiving terminal 122 adopts the simple synthesized sound as the audio content. On the other hand, if the receiver requests higher sound quality (YES in step S204), the sound quality enhancement processing from step S006 onward is performed.
- In the sound quality enhancement processing, the speech unit candidate acquisition unit 107 of the receiving terminal 122 transmits the small-scale speech unit sequence to the server 123, and speech unit candidates are acquired from the large-scale speech unit DB 105 by referring to the correspondence DB 106 of the receiving terminal 122 (step S006).
- The large-scale speech unit selection unit 108 selects, based on the prosodic information and the speech unit candidates acquired in step S006, a large-scale speech unit sequence that minimizes the cost of the above-described equation (1) (step S007).
- The large-scale speech unit connection unit 109 connects the large-scale speech unit sequence selected in step S007 and generates a high-quality synthesized sound (step S008).
- As described above, according to the present embodiment, when the audio content created by the creation terminal 121 is sent to the receiving terminal 122, only the prosodic information and the small-scale speech unit sequence need to be transmitted. The communication volume between the creation terminal 121 and the receiving terminal 122 can therefore be reduced compared with transmitting the synthesized sound itself.
- Also, because the creation terminal 121 can edit a synthesized sound using only the small-scale speech unit sequence, audio content that does not necessarily require high-quality synthesis can be created simply, without going through the server 123.
- Furthermore, the receiving terminal 122 can create a synthesized sound based on the prosodic information and the small-scale speech unit sequence, and the receiver can confirm it by listening before the sound quality enhancement processing is performed.
- In other words, the audio content can be auditioned without accessing the server 123.
- Since the receiver needs to access the server 123 for sound quality enhancement only when he or she likes the auditioned audio content, the receiver can freely choose between the simple version and the high-quality version of the audio content.
- In the present embodiment, the receiving terminal 122 holds the correspondence DB 106, the speech unit candidate acquisition unit 107, the large-scale speech unit selection unit 108, and the large-scale speech unit connection unit 109, and the server 123 holds the large-scale speech unit DB 105.
- However, the server 123 may instead be provided with the large-scale speech unit DB 105, the speech unit candidate acquisition unit 107, the large-scale speech unit selection unit 108, and the large-scale speech unit connection unit 109.
- In the present embodiment, the description has been based on the configuration of Embodiment 1, but the functions of the creation terminal 121, the receiving terminal 122, and the server 123 may instead be configured based on Embodiment 2.
- In that case, the creation terminal 121 includes the HMM model DB 501, the HMM model selection unit 502, the synthesis unit 503, and the prosody modification unit 104; the receiving terminal 122 includes the correspondence DB 506, the speech unit candidate acquisition unit 107, the large-scale speech unit selection unit 108, and the large-scale speech unit connection unit 109; and the server 123 includes the large-scale speech unit DB 105.
- The present invention is applicable to speech synthesizers, and in particular to speech synthesizers used to create speech content for mobile phones and the like.
Priority Applications (3)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/303,455 US20090254349A1 (en) | 2006-06-05 | 2007-05-11 | Speech synthesizer |
| CN2007800208718A CN101490740B (zh) | 2006-06-05 | 2007-05-11 | 声音合成装置 |
| JP2008520470A JP4246790B2 (ja) | 2006-06-05 | 2007-05-11 | 音声合成装置 |
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2006-156429 | 2006-06-05 | | |
| JP2006156429 | | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2007141993A1 (ja) | 2007-12-13 |
Family
ID=38801258
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2007/059765 WO2007141993A1 (ja) | 音声合成装置 | 2006-06-05 | 2007-05-11 |
Country Status (4)

| Country | Link |
|---|---|
| US (1) | US20090254349A1 (ja) |
| JP (1) | JP4246790B2 (ja) |
| CN (1) | CN101490740B (zh) |
| WO (1) | WO2007141993A1 (ja) |
Families Citing this family (14)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5238205B2 (ja) * | 2007-09-07 | 2013-07-17 | ニュアンス コミュニケーションズ,インコーポレイテッド | 音声合成システム、プログラム及び方法 |
| JP4455633B2 (ja) * | 2007-09-10 | 2010-04-21 | 株式会社東芝 | 基本周波数パターン生成装置、基本周波数パターン生成方法及びプログラム |
| US20100305949A1 (en) * | 2007-11-28 | 2010-12-02 | Masanori Kato | Speech synthesis device, speech synthesis method, and speech synthesis program |
| JP4528839B2 (ja) * | 2008-02-29 | 2010-08-25 | 株式会社東芝 | 音素モデルクラスタリング装置、方法及びプログラム |
| CN101897189B (zh) * | 2008-10-10 | 2016-07-06 | 松下电器产业株式会社 | 图像解码装置以及图像解码方法 |
| JP5743625B2 (ja) * | 2011-03-17 | 2015-07-01 | 株式会社東芝 | 音声合成編集装置および音声合成編集方法 |
| WO2013018294A1 (ja) * | 2011-08-01 | 2013-02-07 | パナソニック株式会社 | 音声合成装置および音声合成方法 |
| CN103366731B (zh) * | 2012-03-31 | 2019-02-01 | 上海果壳电子有限公司 | 语音合成方法及系统 |
| EP2685448B1 (en) * | 2012-07-12 | 2018-09-05 | Harman Becker Automotive Systems GmbH | Engine sound synthesis |
| JP6048726B2 (ja) | 2012-08-16 | 2016-12-21 | トヨタ自動車株式会社 | リチウム二次電池およびその製造方法 |
| JP5726822B2 (ja) * | 2012-08-16 | 2015-06-03 | 株式会社東芝 | 音声合成装置、方法及びプログラム |
| US9401140B1 (en) * | 2012-08-22 | 2016-07-26 | Amazon Technologies, Inc. | Unsupervised acoustic model training |
| US9997154B2 (en) | 2014-05-12 | 2018-06-12 | At&T Intellectual Property I, L.P. | System and method for prosodically modified unit selection databases |
| JP6483578B2 (ja) * | 2015-09-14 | 2019-03-13 | 株式会社東芝 | 音声合成装置、音声合成方法およびプログラム |
Family Cites Families (10)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| FR2636163B1 (fr) * | 1988-09-02 | 1991-07-05 | Hamon Christian | Procede et dispositif de synthese de la parole par addition-recouvrement de formes d'onde |
| US6366883B1 (en) * | 1996-05-15 | 2002-04-02 | Atr Interpreting Telecommunications | Concatenation of speech segments by use of a speech synthesizer |
| US7027568B1 (en) * | 1997-10-10 | 2006-04-11 | Verizon Services Corp. | Personal message service with enhanced text to speech synthesis |
| US20030028380A1 (en) * | 2000-02-02 | 2003-02-06 | Freeland Warwick Peter | Speech system |
| US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
| US7266497B2 (en) * | 2002-03-29 | 2007-09-04 | At&T Corp. | Automatic segmentation in speech synthesis |
| CN1217311C (zh) * | 2002-04-22 | 2005-08-31 | 安徽中科大讯飞信息科技有限公司 | 分布式语音合成系统 |
| EP1471499B1 (en) * | 2003-04-25 | 2014-10-01 | Alcatel Lucent | Method of distributed speech synthesis |
| US7496512B2 (en) * | 2004-04-13 | 2009-02-24 | Microsoft Corporation | Refining of segmental boundaries in speech waveforms using contextual-dependent models |
| CN100347741C (zh) * | 2005-09-02 | 2007-11-07 | 清华大学 | 移动语音合成方法 |
-
2007
- 2007-05-11 US US12/303,455 patent/US20090254349A1/en not_active Abandoned
- 2007-05-11 JP JP2008520470A patent/JP4246790B2/ja not_active Expired - Fee Related
- 2007-05-11 CN CN2007800208718A patent/CN101490740B/zh not_active Expired - Fee Related
- 2007-05-11 WO PCT/JP2007/059765 patent/WO2007141993A1/ja active Search and Examination
Patent Citations (8)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09135264A (ja) * | 1995-11-08 | 1997-05-20 | Nippon Telegr & Teleph Corp <Ntt> | 電子メール通信メディア変換システム |
| JPH11249676A (ja) * | 1998-02-27 | 1999-09-17 | Secom Co Ltd | 音声合成装置 |
| JP2001242882A (ja) * | 2000-02-29 | 2001-09-07 | Toshiba Corp | 音声合成方法及び音声合成装置 |
| JP2002156988A (ja) * | 2000-11-21 | 2002-05-31 | Matsushita Electric Ind Co Ltd | 情報提供システム及び音声合成装置 |
| JP2002244689A (ja) * | 2001-02-22 | 2002-08-30 | Rikogaku Shinkokai | 平均声の合成方法及び平均声からの任意話者音声の合成方法 |
| JP2002268660A (ja) * | 2001-03-13 | 2002-09-20 | Japan Science & Technology Corp | テキスト音声合成方法および装置 |
| JP2003233386A (ja) * | 2002-02-08 | 2003-08-22 | Nippon Telegr & Teleph Corp <Ntt> | 音声合成方法、音声合成装置および音声合成プログラム |
| JP2004013122A (ja) * | 2002-06-11 | 2004-01-15 | Fujitsu Ltd | テキスト読み上げシステム及び方法 |
Cited By (11)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2010018649A1 (ja) * | 2008-08-13 | 2010-02-18 | 日本電気株式会社 | 音声合成システム |
| WO2010018648A1 (ja) * | 2008-08-13 | 2010-02-18 | 日本電気株式会社 | 音声合成システム |
| US8606583B2 (en) | 2008-08-13 | 2013-12-10 | Nec Corporation | Speech synthesis system for generating speech information obtained by converting text into speech |
| US8620663B2 (en) | 2008-08-13 | 2013-12-31 | Nec Corporation | Speech synthesis system for generating speech information obtained by converting text into speech |
| JP5408133B2 (ja) * | 2008-08-13 | 2014-02-05 | 日本電気株式会社 | 音声合成システム |
| JP5408134B2 (ja) * | 2008-08-13 | 2014-02-05 | 日本電気株式会社 | 音声合成システム |
| JP2012103668A (ja) * | 2010-11-08 | 2012-05-31 | Voiceware Co Ltd | 使用者の候補合成単位選択による音声合成方法およびシステム (Voice Synthesizing Method and System Based on User Directed Candidate-Unit Selection) |
| JP2012163721A (ja) * | 2011-02-04 | 2012-08-30 | Toshiba Corp | 読み記号列編集装置および読み記号列編集方法 |
| JP2014026222A (ja) * | 2012-07-30 | 2014-02-06 | Brother Ind Ltd | データ生成装置、及びデータ生成方法 |
| JP2015153294A (ja) * | 2014-02-18 | 2015-08-24 | 株式会社東芝 | 情報処理装置及び画像データ送信方法 |
| JP2016105210A (ja) * | 2016-03-04 | 2016-06-09 | 株式会社東芝 | 読み記号列編集装置および読み記号列編集方法 |
Also Published As

| Publication number | Publication date |
|---|---|
| JPWO2007141993A1 (ja) | 2009-10-22 |
| JP4246790B2 (ja) | 2009-04-02 |
| CN101490740B (zh) | 2012-02-22 |
| US20090254349A1 (en) | 2009-10-08 |
| CN101490740A (zh) | 2009-07-22 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 200780020871.8; Country of ref document: CN |
| | DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07743200; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2008520470; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 12303455; Country of ref document: US |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07743200; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |