WO2005050624A1 - Modificateur de la voix - Google Patents

Modificateur de la voix (Voice Modifier)

Info

Publication number
WO2005050624A1
WO2005050624A1 PCT/JP2004/017139
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voice quality
range
conversion
presenting
Prior art date
Application number
PCT/JP2004/017139
Other languages
English (en)
Japanese (ja)
Inventor
Natsuki Saito
Takahiro Kamai
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Publication of WO2005050624A1 publication Critical patent/WO2005050624A1/fr

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013Adapting to target pitch
    • G10L2021/0135Voice conversion or morphing

Definitions

  • the present invention relates to a voice quality conversion device that converts voice quality of voice.
  • Some voice synthesizers that artificially generate voice include a voice quality conversion device that converts the voice quality of the synthesized voice (see, for example, Patent Documents 1 and 2).
  • The voice quality conversion device of Patent Document 1 includes a database in which synthesis unit data generated from the voices of a plurality of speakers is stored in advance.
  • The voice quality conversion device first selects, from the database, the synthesis unit data closest to a specified synthesis unit.
  • Next, the device checks how much the voice quality of the speaker of the selected synthesis unit data differs from the specified voice quality; if the difference exceeds a predetermined level, voice quality conversion is applied to the synthesis unit data so that it approaches the specified voice quality.
  • Specifically, the device performs codebook mapping from the codebook (information representing the characteristics of the voice quality) of the selected synthesis unit data to a codebook whose voice quality matches the specified voice quality.
  • In this way, the voice quality of the synthesis unit data is converted to the specified voice quality.
  • the voice quality conversion device of Patent Document 2 converts voice quality of synthesized voice by converting a sampling frequency when converting digital voice data into an analog voice signal.
  • Furthermore, this voice quality conversion device appropriately sets prosodic information such as the fundamental frequency and phoneme duration in accordance with the change in sampling frequency so that the output voice remains appropriate.
  • Patent Document 1: Japanese Patent Application Laid-Open No. 07-319495
  • Patent Document 2: Japanese Patent Application Laid-Open No. 08-152900
  • Patent Document 3: Japanese Patent Application Laid-Open No. 2000-187491
  • However, the voice quality conversion device of Patent Document 3 described above has a problem in that it is not easy to use from the viewpoint of the user interface.
  • That is, although the voice quality conversion device of Patent Document 3 can prevent the voice quality from breaking down, the user cannot grasp the extent to which the voice quality can be converted without breakdown. The user may therefore instruct the device to convert to a desired voice quality at which the voice quality would break down. As a result, in order to prevent the breakdown, the device converts the voice to a voice quality different from the one the user specified.
  • The present invention has been made in view of such a problem, and an object of the present invention is to provide a voice quality conversion device with improved usability from the viewpoint of the user interface. Means for solving the problem
  • In order to achieve this object, the voice quality conversion device according to the present invention converts feature data indicating features of a voice into converted feature data indicating a voice of a different voice quality, and includes:
  • acquisition means for acquiring the feature data;
  • presentation means for presenting a range in which the voice quality can be converted;
  • reception means for receiving a voice quality specified by the user within the range presented by the presentation means; and
  • range changing means for changing the range presented by the presentation means, according to the feature data and the voice quality received by the reception means, to an appropriate range in which the voice quality indicated by the converted feature data does not break down.
  • With this configuration, the convertible range of voice quality presented by the presentation means is changed to an appropriate range according to the feature data and the voice quality specified by the user.
  • Therefore, when the user wishes to specify another voice quality, the user need not be aware of whether the voice quality of the converted feature data will break down: as long as a voice quality within the presented range is specified, converted feature data indicating the voice quality the user expects can be generated.
  • As a result, usability can be improved from the viewpoint of the user interface.
  • Further, the presenting means may present, for each of a plurality of types of voice quality, a range in which that voice quality can be converted, and the accepting means may accept a parameter specified by the user within the range presented for each voice quality.
  • The conversion means may then convert the feature data into the converted feature data according to the parameter of each voice quality accepted by the accepting means.
  • the presenting means presents, for each of the plurality of voice qualities, a graphic and a pointer that moves on the graphic according to a user operation, thereby presenting a range in which the voice qualities can be converted.
  • the accepting unit identifies a parameter specified by the user based on the position of the pointer on the graphic, and accepts the parameter.
  • For example, when the user has the accepting means accept a parameter that increases brightness within the convertible range presented for the voice quality indicating brightness, the presenting means reduces the convertible range presented for the voice quality indicating fast-talking.
  • As a result, when the user then specifies a fast-talking voice quality, the user can choose a parameter within the reduced fast-talking range without being aware of whether the voice quality of the converted feature data will break down, and converted feature data indicating the voice quality the user expects can be generated.
  • the range changing means may change the range that can be converted by moving the pointer.
  • the presenting means displays the graphic in a bar shape, and the range changing means changes the convertible range by moving the pointer along the longitudinal direction of the graphic.
  • Further, the presenting means may arrange the figures and pointers for the respective voice qualities in parallel so that the more similar the changes based on those voice qualities, the narrower the gap between them.
  • Alternatively, the presenting means may arrange the figure and pointer for each voice quality along the same circumference so that the more similar the changes based on those voice qualities, the smaller the angle between them.
  • Suppose, for example, that the change based on the voice quality indicating brightness and the change based on the voice quality indicating fast-talking are similar.
  • In that case, when one of them is increased, the range changing means also moves the pointer corresponding to the other (for example, brightness) toward one end of its figure so that the range in which it can be increased is reduced. Therefore, by arranging and displaying the figures and pointers corresponding to these voice qualities near each other, the user can easily recognize changes in the convertible ranges.
  • the range changing means may change the convertible range by deforming the figure.
  • Specifically, the presenting means displays the figure in a bar shape,
  • and the range changing means changes the convertible range by expanding or contracting the length of the figure in its longitudinal direction.
  • Further, the speech synthesis device according to the present invention converts text indicated by text data into synthesized speech, and includes:
  • feature data generating means for acquiring the text data and generating feature data indicating features of speech corresponding to the text of the text data;
  • acquiring means for acquiring the feature data generated by the feature data generating means;
  • presenting means for presenting a convertible range of voice quality;
  • receiving means for receiving a voice quality specified by the user within the range presented by the presenting means;
  • range changing means for changing the range presented by the presenting means, according to the feature data acquired by the acquiring means and the voice quality received by the receiving means, to an appropriate range in which the voice quality of the synthesized speech does not break down;
  • conversion means for converting the feature data acquired by the acquiring means into converted feature data indicating speech of the voice quality received by the receiving means; and speech output means for generating and outputting the synthesized speech based on the converted feature data produced by the conversion means.
  • With this configuration, the convertible range of voice quality presented by the presenting means is changed to an appropriate range according to the feature data and the voice quality specified by the user. Therefore, when the user wishes to specify another voice quality, the user can do so within the presented range without being aware of whether the voice quality of the converted feature data will break down, and the text indicated by the text data is converted into synthesized speech with the voice quality the user expects. As a result, usability can be improved from the viewpoint of the user interface.
  • The present invention can be realized not only as such a voice quality conversion device or speech synthesis device, but also as a method comprising the operations performed by such a device, as a program, and as a storage medium storing the program.
  • The voice quality conversion device of the present invention thus has the effect of improving usability from the viewpoint of the user interface.
  • FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation of the voice quality conversion device according to the first embodiment.
  • FIG. 3 is an explanatory diagram for explaining another example of the operation of the voice quality conversion device of the above.
  • FIG. 4 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device of the above.
  • FIG. 5 is an explanatory diagram for explaining still another example of the operation of the above voice quality conversion device.
  • FIG. 6 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device of the above.
  • FIG. 7 is an explanatory diagram for explaining a constraint condition used with the Indigo algorithm.
  • FIG. 8 is a configuration diagram of a voice quality conversion device according to Embodiment 2 of the present invention.
  • FIG. 9 is an explanatory diagram for explaining the content presented by the voice quality adjustment unit according to the embodiment.
  • FIG. 10 is a flowchart showing an operation of an adjustment control unit according to the embodiment.
  • FIG. 11 is an explanatory diagram for explaining the content presented by the voice quality adjustment unit according to the first modification of the above.
  • FIG. 12 is an explanatory diagram for describing contents presented by a voice quality adjustment unit according to the second modification of the above.
  • FIG. 13A is an explanatory diagram for describing a distance between voice qualities according to Modification 3 of the above.
  • FIG. 13B is a diagram showing a display content of a voice quality adjustment unit according to the third modification of the above.
  • FIG. 14A is a diagram showing a display content of a voice quality adjustment unit according to Modification 4 of the above.
  • FIG. 14B is an explanatory diagram for explaining how the voice quality adjusting unit according to Modification 4 of the above changes the display content.
  • FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
  • FIG. 16 is a configuration diagram of a speech synthesis device according to a first modification of the above.
  • FIG. 17 is a configuration diagram of a speech synthesis device according to Modification 2 of the above.
  • FIG. 18 is a configuration diagram of a speech synthesizer according to a third modification of the above.
  • FIG. 19 is a configuration diagram of a speech synthesizer according to a fourth modification of the above.
  • FIG. 20 is a configuration diagram of a speech synthesizer according to a fifth modification of the above embodiment.
  • FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
  • This voice quality conversion device converts voice quality while preventing voice quality breakdown, and includes a conversion unit 101, a voice quality adjustment unit 103, an adjustment control unit 104, a conversion coefficient storage unit 105, and a limit value storage unit 106.
  • The conversion unit 101 acquires a feature parameter sequence p1 indicating the acoustic features of the speech.
  • The feature parameter sequence p1 is data indicating, as parameters, the acoustic features obtained by analyzing the speech frame by frame; the original speech can be obtained by resynthesis based on it.
  • The conversion unit 101 generates a deformed feature parameter sequence p2 by converting the acoustic feature parameters indicated by the feature parameter sequence p1 according to the instruction from the voice quality adjustment unit 103.
  • Like the feature parameter sequence p1, the deformed feature parameter sequence p2 indicates the acoustic features of the speech as parameters and is used to generate synthesized speech.
  • The voice quality of the synthesized speech (speech waveform) generated using the deformed feature parameter sequence p2 therefore differs from the voice quality of the synthesized speech (speech waveform) generated using the feature parameter sequence p1.
  • the conversion coefficient storage unit 105 holds coefficient data serving as a template when the conversion unit 101 performs the conversion process.
  • When operated by the user, the voice quality adjustment unit 103 receives the converted voice quality the user expects, and also receives instructions to change the voice quality from the adjustment control unit 104. Using the coefficient data stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103 specifies the conversion content corresponding to the user's operation and the instruction from the adjustment control unit 104, and instructs the conversion unit 101 to perform that conversion.
  • Specifically, for each type of voice quality, for example brightness and fast-talking, the voice quality adjustment unit 103 displays a range bar B indicating the convertible range of that voice quality and a pointer P that is movable on the range bar B and indicates the degree of voice quality conversion. The user sets a desired voice quality by moving the pointer P along the range bar B.
  • The limit value storage unit 106 stores limit conditions (such as limit values of the parameters indicating each acoustic feature) under which the deformed feature parameter sequence p2 yields synthesized speech that maintains naturalness.
  • The adjustment control unit 104 obtains the feature parameter sequence p1 and also obtains the result of the user's operation on the voice quality adjustment unit 103.
  • The adjustment control unit 104 estimates the deformed feature parameter sequence p2 based on the feature parameter sequence p1 and the operation result, and then compares the estimated deformed feature parameter sequence p2 with the limit conditions in the limit value storage unit 106. If the deformed feature parameter sequence p2 does not satisfy the limit conditions, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to change the user's operation result so that the conditions are satisfied. That is, based on the conversion content set in the voice quality adjustment unit 103, the adjustment control unit 104 determines whether the deformed feature parameter sequence p2 would cause sound quality deterioration (voice quality breakdown), and adjusts the conversion content so that no deterioration occurs.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation of the voice quality conversion device according to the present embodiment.
  • The voice quality adjustment unit 103 accepts the voice quality expected by the user through the user's operation of the plurality of pointers P.
  • In this example, the voice quality adjustment unit 103 presents four convertible voice qualities: brightness, darkness, masculinity, and fast-talking.
  • the conversion range of these voice qualities is indicated by a range bar B scaled from 0 to 10.
  • the user designates the voice quality and the conversion amount to be converted by moving the pointer P corresponding to each voice quality within the range of 0 to 10 scales on the range bar B.
  • When an indicated value is 0, the voice quality adjustment unit 103 determines that no conversion is required for that voice quality; as the indicated value approaches 10, it determines that a larger conversion is required.
  • the voice quality adjustment unit 103 may be constituted by a volume switch or the like.
  • The feature parameter sequence p1 consists of the adjustable acoustic feature parameters for each analysis frame: the fundamental frequency F0, the first formant frequency F1, the second formant frequency F2, the frame duration FR, and the sound source power PW.
  • The coefficient data 105a held in the conversion coefficient storage unit 105 indicates, for each voice quality of the voice quality adjustment unit 103, the value (coefficient) to be added to each of the above five acoustic feature parameters of the feature parameter sequence p1 when the indicated value of that voice quality is increased by one.
  • In the example of FIG. 2, where no conversion is specified, the conversion unit 101 acquires the feature parameter sequence p1 and outputs a deformed feature parameter sequence p2 identical to the feature parameter sequence p1.
  • FIG. 3 is an explanatory diagram for describing another example of the operation of the voice quality conversion device according to the present embodiment.
  • In the example of FIG. 3, the user sets the indicated value of brightness on the voice quality adjustment unit 103 to 5 and the indicated value of fast-talking to 3.
  • In this case, the conversion unit 101 multiplies the coefficient of each acoustic feature for brightness in the coefficient data 105a by the brightness indicated value (5). It likewise multiplies the coefficient of each acoustic feature for fast-talking by the fast-talking indicated value (3). The conversion unit 101 then sums these products for each acoustic feature and adds the result to the corresponding value of the feature parameter sequence p1, thereby generating the deformed feature parameter sequence p2.
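  • As an illustration of the computation just described, the following Python sketch applies per-voice-quality coefficients scaled by the indicated values to one frame of p1. The coefficient values, feature values, and function names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of the conversion described above. Each voice quality has a
# coefficient per acoustic feature (like coefficient data 105a); the deformed
# frame p2 is p1 plus the sum of coefficient * indicated value over all
# voice qualities. All numbers here are made up for illustration.

FEATURES = ("F0", "F1", "F2", "FR", "PW")

COEFFICIENTS = {
    "brightness":   {"F0": 5.0, "F1": 10.0, "F2": 20.0, "FR": 0.0, "PW": 1.0},
    "fast_talking": {"F0": 1.0, "F1": 0.0,  "F2": 0.0,  "FR": -2.0, "PW": 0.0},
}

def convert_frame(p1_frame, indicated_values):
    """Return the deformed frame p2 for one analysis frame of p1."""
    p2 = dict(p1_frame)
    for quality, value in indicated_values.items():
        for feat in FEATURES:
            # coefficient * indicated value, accumulated over all voice qualities
            p2[feat] += COEFFICIENTS[quality][feat] * value
    return p2

# Example corresponding to the text: brightness = 5, fast-talking = 3.
p1_frame = {"F0": 280.0, "F1": 500.0, "F2": 1500.0, "FR": 90.0, "PW": 30.0}
p2_frame = convert_frame(p1_frame, {"brightness": 5, "fast_talking": 3})
```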
  • FIG. 4 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device according to the present embodiment.
  • Starting from the state of the voice quality adjustment unit 103 shown in FIG. 3, the user then sets the indicated value of darkness on the voice quality adjustment unit 103 to 7.
  • adjustment control section 104 changes the instruction value set by voice quality adjustment section 103 to an instruction value that is easy for the user to operate.
  • Specifically, the adjustment control unit 104 identifies the relationship between brightness and darkness from the coefficient data 105a, and when the darkness indicated value is set to 7, it first instructs the voice quality adjustment unit 103 to reduce the brightness indicated value from 5 to 0, and then instructs it to reduce the darkness indicated value from 7 to 2.
  • That is, the indicated value of darkness has been set to 7 while the indicated value of brightness is 5.
  • Because the effect of lowering the brightness indicated value from 5 to 0 is the same as the effect of increasing the darkness indicated value by 5, the adjustment control unit 104 determines that, instead of setting the darkness indicated value to 7, the brightness indicated value should be set to 0 and the darkness indicated value to 2. The adjustment control unit 104 then instructs the voice quality adjustment unit 103 of this determination and changes the indicated values set by the user.
  • In this way, the adjustment control unit 104 adjusts the indicated values set by the user on the voice quality adjustment unit 103 so that each indicated value becomes as small as possible, which yields an interface that is easy for the user to operate.
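  • The following short Python sketch illustrates this reallocation for a hypothetical pair of exactly opposing voice qualities (for example brightness and darkness); the assumption that their effects cancel one-for-one is made only for illustration.

```python
# Sketch of the reallocation described above: when two voice qualities are
# exact opposites, the overlapping amount cancels, leaving the smallest
# equivalent pair of indicated values.

def minimize_opposing(value_a, value_b):
    """Return an equivalent (a, b) with the common amount cancelled out."""
    common = min(value_a, value_b)
    return value_a - common, value_b - common

# The example in the text: brightness 5 and darkness 7 become 0 and 2.
brightness, darkness = minimize_opposing(5, 7)
assert (brightness, darkness) == (0, 2)
```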
  • FIG. 5 is an explanatory diagram for explaining still another example of the operation of the voice conversion device according to the present embodiment.
  • In the example of FIG. 5, the user sets the indicated value of brightness to 10.
  • In this example, the sound source power PW of the feature parameter sequence p1 indicates 30.
  • limit value storage section 106 stores limit conditions indicating that the maximum value of fundamental frequency F0 is 350. That is, the limit condition indicates that when the value of the fundamental frequency F0 of the modified feature parameter sequence p2 exceeds 350, the sound quality of the synthesized sound generated based on the modified feature parameter sequence p2 is significantly deteriorated.
  • FIG. 6 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device according to the present embodiment.
  • From the state of the voice quality adjustment unit 103 shown in FIG. 5, that is, the state in which the brightness indicated value is 10 and all other indicated values are 0, the user further sets the indicated value of fast-talking to 5.
  • The adjustment control unit 104 estimates the deformed feature parameter sequence p2 that would result if the conversion corresponding to the indicated values set on the voice quality adjustment unit 103 were applied to the feature parameter sequence p1.
  • the adjustment control unit 104 determines whether or not the parameters of each acoustic feature of the estimated deformation feature parameter sequence p2 satisfy the limit condition of the limit value storage unit 106.
  • If a parameter does not satisfy a limit condition, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to change the indicated values so that the parameter satisfies it.
  • At this time, the adjustment control unit 104 may, for example, give priority to the indicated value most recently set by the user, or give priority to the largest indicated value.
  • In this example, the adjustment control unit 104 estimates the deformed feature parameter sequence p2 and determines that its fundamental frequency F0 (355) does not satisfy the limit condition (350 or less). The adjustment control unit 104 therefore instructs the voice quality adjustment unit 103 to reduce the brightness indicated value by 1 so that the fundamental frequency F0 of the deformed feature parameter sequence p2 becomes 350 or less.
  • As a result, the voice quality adjustment unit 103 changes the indicated value of brightness from 10 to 9.
  • Because the adjustment control unit 104 adjusts the indicated values according to the limit conditions in this way, the user can operate the voice quality conversion without the voice quality breaking down and without being aware of the limit values of the individual acoustic feature parameters.
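  • A possible form of this priority-based adjustment is sketched below in Python. The priority order, the linear F0 model, and all coefficient values are assumptions made for illustration; the patent does not prescribe a particular implementation.

```python
# Sketch of the adjustment described above: lower a chosen indicated value
# one step at a time until the estimated F0 of p2 satisfies the limit.
# Coefficients and starting values are illustrative only.

F0_LIMIT = 350.0

def estimated_f0(f0_p1, indicated, f0_coeffs):
    """Linear estimate of F0 in p2 for the given slider settings."""
    return f0_p1 + sum(f0_coeffs[q] * v for q, v in indicated.items())

def enforce_f0_limit(f0_p1, indicated, f0_coeffs, adjust_order):
    """Reduce sliders in the given order until the F0 limit is satisfied."""
    indicated = dict(indicated)
    for quality in adjust_order:
        while (estimated_f0(f0_p1, indicated, f0_coeffs) > F0_LIMIT
               and indicated[quality] > 0):
            indicated[quality] -= 1   # e.g. "reduce the brightness value by 1"
    return indicated

f0_coeffs = {"brightness": 5.0, "fast_talking": 1.0}
# With F0 of p1 = 300, brightness 10 and fast-talking 5 give an estimated
# F0 of 355; lowering brightness to 9 brings it back to 350.
settings = enforce_f0_limit(300.0, {"brightness": 10, "fast_talking": 5},
                            f0_coeffs, adjust_order=["brightness"])
assert settings == {"brightness": 9, "fast_talking": 5}
```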
  • the adjustment control unit 104 may refer to the limit conditions stored in the limit value storage unit 106 as needed.
  • A limit condition may indicate a limit value for an individual acoustic feature parameter, for example that the fundamental frequency F0 must not exceed 350, or it may indicate a condition over several parameters, for example that the sum of the fundamental frequency F0 and the second formant frequency F2 must not exceed 2000.
  • The conversion applied to the feature parameter sequence p1 by the conversion unit 101 need not be uniform across all analysis frames; the coefficient data 105a in the conversion coefficient storage unit 105 may differ for each analysis frame.
  • the adjustment of the indicated value by the adjustment control unit 104 may be automatically performed using a constraint satisfaction algorithm.
  • An example of such a constraint satisfaction algorithm is the Indigo algorithm (A. Borning, R. Anderson, B. Freeman-Benson: The Indigo Algorithm, TR
  • FIG. 7 is an explanatory diagram for describing a constraint condition by the Indigo algorithm.
  • the constraint condition shown in FIG. 7 is for adjusting the indicated value shown in FIG. 6 with respect to the fundamental frequency F0, and is described as follows in the constraint hierarchy of the Indigo algorithm.
  • The variables t1 to t8 hold intermediate results of the calculation. Although omitted from FIG. 7 for simplicity, to obtain more desirable results it is desirable to add a REQUIRED constraint binding each indicated value to a value between 0 and 10.
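  • The Indigo algorithm itself is described in the cited report. Purely as an illustration of the idea of a constraint hierarchy, the tiny brute-force sketch below treats the limit condition and the 0-10 scale as REQUIRED constraints and the user's settings as a weak preference; this simplification is not the Indigo algorithm and does not reproduce the exact constraints of FIG. 7.

```python
from itertools import product

# Illustrative stand-in for a constraint-hierarchy solver. REQUIRED
# constraints are hard; the WEAK preference to keep the user's settings is
# approximated by minimizing total deviation from them. Coefficients are
# invented for the example and do not come from the patent figures.

def estimated_f0(values):
    return 300 + 5 * values["brightness"] + 1 * values["fast_talking"]

REQUIRED = [
    lambda v: all(0 <= x <= 10 for x in v.values()),  # stay on the 0-10 scale
    lambda v: estimated_f0(v) <= 350,                  # F0 limit condition
]

def solve(user):
    best, best_dev = None, None
    for b, f in product(range(11), repeat=2):
        v = {"brightness": b, "fast_talking": f}
        if not all(check(v) for check in REQUIRED):
            continue
        dev = sum(abs(v[q] - user[q]) for q in user)   # WEAK: stay near the user's values
        if best_dev is None or dev < best_dev:
            best, best_dev = v, dev
    return best

# Asking for brightness 10 and fast-talking 5 yields brightness 9,
# fast-talking 5 under these toy numbers.
print(solve({"brightness": 10, "fast_talking": 5}))
```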
  • FIG. 8 is a configuration diagram of a voice quality conversion device according to Embodiment 2 of the present invention.
  • This voice quality conversion device has improved usability from the viewpoint of the user interface, and includes a conversion unit 101, a voice quality adjustment unit 103a, an adjustment control unit 104a, a conversion coefficient storage unit 105, and a limit value storage unit 106.
  • When operated by the user, the voice quality adjustment unit 103a receives the converted voice quality the user expects; that is, it functions as receiving means that receives a voice quality specified by the user. Using the coefficient data 105a stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103a specifies the conversion content corresponding to the user's operation and instructs the conversion unit 101 accordingly. Specifically, like the voice quality adjustment unit 103 of Embodiment 1, the voice quality adjustment unit 103a displays, for each type of voice quality, for example brightness or fast-talking, a range bar B indicating the convertible range of that voice quality and a pointer P that is movable on the range bar B and indicates the degree of that voice quality.
  • the user operates the pointer P to move along the range bar B to set a desired voice quality.
  • That is, the voice quality adjustment unit 103a functions as presenting means that, by presenting the range bar B and the pointer P, presents the range over which the voice quality can still be converted from its current degree of conversion.
  • Further, the voice quality adjustment unit 103a of the present embodiment receives an instruction specifying the conversion range for each voice quality from the adjustment control unit 104a, and presents only the instructed conversion range to the user. That is, the voice quality adjustment unit 103a changes the length of the range bar B to a length corresponding to the conversion range instructed by the adjustment control unit 104a, and prohibits the pointer P from being moved to any position off the range bar B.
  • The adjustment control unit 104a acquires the feature parameter sequence p1, the result of the user's operation on the voice quality adjustment unit 103a, and the limit conditions in the limit value storage unit 106. Based on these, it derives an appropriate conversion range for each voice quality of the voice quality adjustment unit 103a and instructs the voice quality adjustment unit 103a of the derived range. That is, the adjustment control unit 104a functions as range changing means that changes the range presented by the voice quality adjustment unit 103a, according to the feature parameter sequence p1 and the voice quality received from the user by the voice quality adjustment unit 103a, to an appropriate range in which the voice quality indicated by the deformed feature parameter sequence p2 does not break down.
  • FIG. 9 is an explanatory diagram for describing the content presented by voice quality adjusting section 103a of the present embodiment.
  • Suppose the user has set the pointers P for the respective voice qualities of the voice quality adjustment unit 103a so that the indicated values are all 0.
  • The user then sets the pointer P of the voice quality indicating brightness on the voice quality adjustment unit 103a so that its indicated value becomes 10.
  • Suppose also that the limit conditions are that, in the deformed feature parameter sequence p2, the fundamental frequency F0 is 350 or less, the first formant frequency F1 is 600 or less, the second formant frequency F2 is 1700 or less, the frame duration FR is 100 or less, and the sound source power PW is 50 or less.
  • the adjustment control unit 104a compares each parameter of the deformation feature parameter sequence p2 with the limit condition, and determines that the fundamental frequency F0 cannot be further increased. That is, the adjustment control unit 104a determines that the conversion range of the voice quality indicating the fast-talking is limited to only the 0 scale, and instructs the voice quality adjustment unit 103a of the determination result.
  • Upon receiving this instruction, the voice quality adjustment unit 103a shortens the range bar B corresponding to the fast-talking voice quality from a length of 10 scale divisions to a length of 0 scale divisions and displays it.
  • As a result, the length of the fast-talking range bar B is only 0 divisions, so the user cannot move the fast-talking pointer P. This prevents the sound quality deterioration, that is, the voice quality breakdown, that would otherwise be caused by increasing the fast-talking indicated value.
  • Similarly, based on the current settings, the adjustment control unit 104a again derives, for the voice qualities other than the one indicating brightness, an appropriate range from the limit condition in the limit value storage unit 106 (the fundamental frequency F0 of the deformed feature parameter sequence p2 must be 350 or less). In this case it determines that the conversion range of the voice quality indicating fast-talking is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of this determination.
  • Upon receiving this instruction, the voice quality adjustment unit 103a sets the range bar B corresponding to the fast-talking voice quality to a length of five scale divisions, that is, a length covering scales 0 to 5, and then displays it.
  • FIG. 10 is a flowchart showing the operation of adjustment control section 104a in the present embodiment.
  • First, the adjustment control unit 104a acquires the feature parameter sequence p1 (step S100) and identifies the settings made by the user on the voice quality adjustment unit 103a (step S102).
  • Next, the adjustment control unit 104a estimates the deformed feature parameter sequence p2 based on the feature parameter sequence p1 and the settings of the voice quality adjustment unit 103a (step S104).
  • The adjustment control unit 104a then derives an appropriate conversion range for each voice quality of the voice quality adjustment unit 103a based on the estimated deformed feature parameter sequence p2 and the limit conditions of the limit value storage unit 106 (step S106).
  • Finally, the adjustment control unit 104a instructs the voice quality adjustment unit 103a of the derived conversion ranges, and the voice quality adjustment unit 103a displays range bars B with lengths corresponding to those ranges (step S108).
  • As described above, the convertible range of voice quality presented by the voice quality adjustment unit 103a is changed to an appropriate range according to the feature parameter sequence p1 and the voice quality specified by the user. Therefore, when the user wishes to specify another voice quality, the user can do so within the presented range without being aware of whether the voice quality of the deformed feature parameter sequence p2 will break down, and a deformed feature parameter sequence indicating the voice quality expected by the user can be generated. As a result, usability can be improved from the viewpoint of the user interface.
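  • The range derivation of steps S100 to S108 can be pictured with the Python sketch below: for each voice quality, it finds the largest indicated value that keeps every estimated parameter of p2 within the limit conditions, given the other sliders' current settings. The coefficients, limits, and frame values are illustrative assumptions, not the patent's figures.

```python
# Sketch of deriving the convertible range per voice quality. Limits are
# treated as upper bounds only, for simplicity.

FEATURES = ("F0", "F1", "F2", "FR", "PW")

COEFFS = {
    "brightness":   {"F0": 5.0, "F1": 10.0, "F2": 20.0, "FR": 0.0, "PW": 1.0},
    "fast_talking": {"F0": 1.0, "F1": 0.0,  "F2": 0.0,  "FR": -2.0, "PW": 0.0},
}
LIMITS = {"F0": 350.0, "F1": 600.0, "F2": 1700.0, "FR": 100.0, "PW": 50.0}

def estimate_p2(p1_frame, settings):
    return {f: p1_frame[f] + sum(COEFFS[q][f] * v for q, v in settings.items())
            for f in FEATURES}

def within_limits(frame):
    return all(frame[f] <= LIMITS[f] for f in FEATURES)

def convertible_ranges(p1_frame, settings):
    """Largest legal scale value per voice quality, holding the others fixed."""
    ranges = {}
    for quality in COEFFS:
        upper = 0
        for candidate in range(0, 11):
            trial = dict(settings, **{quality: candidate})
            if within_limits(estimate_p2(p1_frame, trial)):
                upper = candidate            # length to which range bar B is drawn
        ranges[quality] = upper
    return ranges

p1 = {"F0": 300.0, "F1": 480.0, "F2": 1450.0, "FR": 95.0, "PW": 30.0}
# With brightness already at 10, fast-talking is limited to 0 divisions here.
print(convertible_ranges(p1, {"brightness": 10, "fast_talking": 0}))
```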
  • Modification 1: A first modification of the display method of the voice quality adjustment unit 103a in the present embodiment will now be described.
  • In this modification, the voice quality adjustment unit 103a does not change the length of the range bar B; instead, it changes the position of the pointer P according to the conversion range instructed by the adjustment control unit 104a.
  • FIG. 11 is an explanatory diagram for describing the content presented by the voice quality adjusting unit 103a according to the present modification.
  • Suppose the user has set the pointers P for the respective voice qualities of the voice quality adjustment unit 103a so that the indicated values are all 0.
  • the user sets the pointer P of the voice quality indicating the brightness of the voice quality adjusting unit 103a so that the indicated value becomes 10.
  • adjustment control section 104a determines that the conversion range of voice quality indicating fast-talking is limited to only 0 scales, and instructs voice quality adjustment section 103a of the determination result.
  • Upon receiving this instruction, the voice quality adjustment unit 103a moves the pointer P corresponding to the fast-talking voice quality to the position of scale 10 and displays it, as shown in (b) of FIG. 11. That is, the instruction from the adjustment control unit 104a indicates that the conversion range of the voice quality indicating fast-talking is limited to the 0 scale only, meaning that the pointer P cannot be moved in the scale-increasing direction at all. The voice quality adjustment unit 103a of this modification therefore moves the pointer P to a position from which it cannot be moved any further in the scale-increasing direction, that is, the position of scale 10, and displays it.
  • Note that the voice quality adjustment unit 103a merely moves the pointer P corresponding to the fast-talking voice quality; it does not instruct the conversion unit 101 to convert the acoustic feature parameters as if the fast-talking indicated value were 10. Because the fast-talking pointer P is displayed at scale 10 (the maximum value) in conjunction with the brightness indicated value being set to 10, the sound quality deterioration, that is, the voice quality breakdown, that would be caused by increasing the fast-talking indicated value can be prevented.
  • Similarly, based on the current settings, the adjustment control unit 104a again determines, as described above, that the conversion range of the voice quality indicating fast-talking is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of this determination.
  • Upon receiving this instruction, the voice quality adjustment unit 103a moves the pointer P corresponding to the fast-talking voice quality to the position of scale 5 and displays it, as shown in FIG. 11. That is, the instruction from the adjustment control unit 104a indicates that the conversion range of the voice quality indicating fast-talking is limited to five scale divisions, meaning that the pointer P can be moved in the scale-increasing direction by up to five divisions. The voice quality adjustment unit 103a of this modification therefore moves the pointer P to a position from which it can still be moved by five divisions in the scale-increasing direction, that is, the position of scale 5, and displays it.
  • Here too, the voice quality adjustment unit 103a merely moves the pointer P corresponding to the fast-talking voice quality; it does not instruct the conversion unit 101 to convert the acoustic feature parameters as if the fast-talking indicated value were 5.
  • Modification 2: In a second modification, the voice quality adjustment unit 103a indicates the movable range of the pointer P with text, without changing the length of the range bar B.
  • FIG. 12 is an explanatory diagram for describing the content presented by voice quality adjusting section 103a according to the present modification.
  • the user sets the pointers P of each voice quality of the voice quality adjustment unit 103a such that the indicated values are all zero.
  • the user sets the pointer P of the voice quality indicating the brightness of the voice quality adjusting unit 103a so that the indicated value becomes 10.
  • adjustment control section 104a determines that the conversion range of voice quality indicating fast-talking is limited to only 0 scales, and instructs voice quality adjustment section 103a of the determination result.
  • Upon receiving this instruction, the voice quality adjustment unit 103a displays the words "up to here" at the position of scale 0 on the range bar B corresponding to the fast-talking voice quality. While these words are displayed, even if the user tries to move the fast-talking pointer P, the voice quality adjustment unit 103a does not accept the operation and keeps the position of the pointer P fixed.
  • Similarly, based on the current settings, the adjustment control unit 104a again determines, as described above, that the conversion range of the voice quality indicating fast-talking is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of this determination.
  • Upon receiving this instruction, the voice quality adjustment unit 103a displays the words "up to here" at the position of scale 5 on the range bar B corresponding to the fast-talking voice quality. While these words are displayed, even if the user tries to move the fast-talking pointer P beyond scale 5, the voice quality adjustment unit 103a does not accept the operation and keeps the pointer P at or below scale 5.
  • Modification 3: In a third modification, the voice quality adjustment unit 103a arranges the range bars B and pointers P of the voice qualities so that the more similar the changes produced by two voice qualities, the closer their bars and pointers are placed, and presents them to the user in this arrangement.
  • Specifically, the voice quality adjustment unit 103a obtains the coefficient data 105a stored in the conversion coefficient storage unit 105 and, based on it, determines the degree of similarity between the changes produced by voice qualities such as brightness and darkness. For example, the voice quality adjustment unit 103a takes, for each acoustic feature, the difference between the coefficients of two voice qualities indicated by the coefficient data 105a, computes from these differences the Euclidean distance (hereinafter simply called the distance) between the two voice qualities, and determines their similarity based on this distance.
  • FIG. 13A is an explanatory diagram for describing the distance between voice qualities.
  • The voice quality adjustment unit 103a calculates the distances between voice qualities as shown in FIG. 13A. For example, the distance between the voice quality indicating masculinity and the voice quality indicating darkness is 5.4, and the distance between the voice quality indicating brightness and the voice quality indicating darkness is 11.3.
  • The voice quality adjustment unit 103a judges that two voice qualities are more similar the smaller the calculated distance between them, and places their range bars B and pointers P closer to each other.
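  • The similarity computation and the resulting layout can be sketched as follows in Python; the coefficient values are invented for illustration, and the greedy ordering is just one simple way to keep similar voice qualities adjacent.

```python
from math import sqrt

# Distance between two voice qualities = Euclidean distance between their
# coefficient rows; similar qualities are then laid out next to each other.

COEFFS = {
    "masculinity":  {"F0": -4.0, "F1": -8.0,  "F2": -15.0, "FR": 1.0,  "PW": 2.0},
    "darkness":     {"F0": -5.0, "F1": -10.0, "F2": -20.0, "FR": 0.0,  "PW": -1.0},
    "brightness":   {"F0": 5.0,  "F1": 10.0,  "F2": 20.0,  "FR": 0.0,  "PW": 1.0},
    "fast_talking": {"F0": 1.0,  "F1": 0.0,   "F2": 0.0,   "FR": -2.0, "PW": 0.0},
}

def distance(a, b):
    return sqrt(sum((COEFFS[a][f] - COEFFS[b][f]) ** 2 for f in COEFFS[a]))

def layout_order(start):
    """Greedy ordering: always place the nearest not-yet-placed voice quality next."""
    order, remaining = [start], set(COEFFS) - {start}
    while remaining:
        nearest = min(remaining, key=lambda q: distance(order[-1], q))
        order.append(nearest)
        remaining.remove(nearest)
    return order

print(layout_order("masculinity"))
```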
  • FIG. 13B is a diagram showing display contents of voice quality adjustment section 103a.
  • For example, the voice quality adjustment unit 103a presents the range bars B and pointers P of the voice qualities in the order: the voice quality indicating masculinity, the voice quality indicating darkness, the voice quality indicating brightness, and the voice quality indicating fast-talking.
  • When combined with Modification 1, this modification can provide a voice quality conversion operation that is intuitive and easy for the user to grasp.
  • That is, because the range bars B and pointers P of voice qualities with similar changes are arranged close to each other, when the user operates the pointer P of one voice quality,
  • the pointers P of nearby voice qualities move in the same direction, while the pointers P of voice qualities placed farther away move in the opposite direction. The user can therefore intuitively understand how the voice quality is converted by operating the pointer P.
  • Modification 4: Whereas the voice quality adjustment unit 103a of Modification 3 arranges the range bars B and pointers P of the voice qualities in a line so that the more similar their changes, the closer they are placed,
  • the voice quality adjustment unit 103a of the present modification arranges the range bar B and pointer P of each voice quality along the same circumference so that the more similar the changes, the smaller the angle between them.
  • FIG. 14A is a diagram showing the display content of voice quality adjusting section 103a.
  • That is, the voice quality adjustment unit 103a brings the lower ends of the range bars B of the voice qualities together at one point and displays the range bars B along the same circumference.
  • For example, the voice quality adjustment unit 103a presents the range bars B so that the angle between the masculinity range bar B and the darkness range bar B is the smallest, and the angle between the masculinity range bar B and the fast-talking range bar B is the largest.
  • the voice quality adjustment unit 103a according to the present modification may also have the function of changing the position of the pointer P based on the display method described in the first modification, that is, the instruction from the adjustment control unit 104a.
  • FIG. 14B is an explanatory diagram for explaining how the voice quality adjusting unit 103a changes the display content.
  • For example, when the user moves the pointer P of the voice quality indicating darkness in the scale-increasing direction, the voice quality adjustment unit 103a, based on the instruction from the adjustment control unit 104a, moves the pointer P of the voice quality indicating masculinity in the scale-increasing direction and moves the pointers P of the voice qualities indicating brightness and fast-talking in the scale-decreasing direction.
  • FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
  • This speech synthesizer is a device capable of acquiring text data and performing speech synthesis with various voice qualities.
  • This speech synthesis device includes the voice quality conversion device described above, a speech synthesis unit 201, a speech synthesis database 202, a waveform generation unit 203, and a speaker 204.
  • The speech synthesis database 202 stores segment data representing a plurality of speech segments. Upon acquiring text data td1 through the user's operation, the speech synthesis unit 201 selects from the speech synthesis database 202 the segment data corresponding to the text indicated by the text data td1, generates a feature parameter sequence p1 using the selected segment data, and outputs the feature parameter sequence p1 to the voice quality conversion device.
  • Upon acquiring the feature parameter sequence p1, the voice quality conversion device converts the voice quality of the speech it represents, and generates and outputs a deformed feature parameter sequence p2 indicating the result of the conversion.
  • Upon acquiring the deformed feature parameter sequence p2 from the voice quality conversion device, the waveform generation unit 203 generates a waveform signal s1 representing the deformed feature parameter sequence p2 as a speech waveform and outputs the waveform signal s1 to the speaker 204.
  • The speaker 204 outputs synthesized speech corresponding to the waveform signal s1.
  • Because this speech synthesis device includes the voice quality conversion device of Embodiment 2, it can output the content of the text data td1 as speech with the voice quality desired by the user, without breakdown, and its usability is improved accordingly.
  • the voice conversion device of the first embodiment may be provided in the speech synthesis device of the present embodiment.
  • FIG. 16 is a configuration diagram of a speech synthesizer according to the present modification.
  • The adjustment control unit 104b of the voice quality conversion device has the same functions as the adjustment control unit 104a of Embodiment 2, but differs in that, instead of acquiring the feature parameter sequence p1, it acquires the segment data stored in the speech synthesis database 202.
  • That is, the adjustment control unit 104b estimates sound quality deterioration of the synthesized speech based on the segment data of the speech synthesis database 202 instead of the feature parameter sequence p1, and accordingly changes the position of the pointer P or the length of the range bar B of the voice quality adjustment unit 103a.
  • Specifically, the adjustment control unit 104b predicts the tendency of the acoustic feature parameters that the feature parameter sequence p1 would indicate, using part or all of the segment data stored in the speech synthesis database 202, and changes the position of the pointer P and the length of the range bar B based on the prediction result.
  • For example, the adjustment control unit 104b may take each piece of segment data from the speech synthesis database 202 one by one, determine whether the quality of the synthesized speech would deteriorate if that segment data were converted according to the settings of the voice quality adjustment unit 103a, and change the position of the pointer P with reference to the result.
  • With this configuration, as long as the speech synthesis database 202 is not replaced, the processing of the adjustment control unit 104b is the same regardless of which text data td1 is input, so the processing can be simplified. However, if the content of the feature parameter sequence p1 varies greatly depending on the content of the text data td1, the quality of the synthesized speech may deteriorate for some text data td1.
  • Note that the feature parameter sequence p1 in this modification does not have to be generated from the segment data of the speech synthesis database 202 by the speech synthesis processing of the speech synthesis unit 201.
  • As long as the voice quality it indicates is sufficiently similar to that of a feature parameter sequence p1 generated in this way, the feature parameter sequence p1 used in this modification may be generated by some other method.
  • FIG. 17 is a configuration diagram of a speech synthesizer according to the present modification.
  • The speech synthesis device according to the present modification includes a feature table storage unit 205 that holds, as a feature table, only the data needed to estimate quality deterioration of the synthesized speech out of the plurality of segment data stored in the speech synthesis database 202.
  • The feature table held in the feature table storage unit 205 is obtained by extracting, for example, only the upper limit value, the lower limit value, and the average value of each acoustic feature parameter across all the segment data stored in the speech synthesis database 202.
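  • The following Python sketch shows one way such a feature table could be built; the field names and the segment values are assumptions for illustration only.

```python
from statistics import mean

# Build a compact feature table: only the maximum, minimum and average of
# each acoustic feature over all segment data are kept.

FEATURES = ("F0", "F1", "F2", "FR", "PW")

def build_feature_table(segments):
    """segments: iterable of dicts with one value per acoustic feature."""
    table = {}
    for feat in FEATURES:
        values = [seg[feat] for seg in segments]
        table[feat] = {"max": max(values), "min": min(values), "mean": mean(values)}
    return table

# Hypothetical segment data standing in for the speech synthesis database 202.
segments = [
    {"F0": 120.0, "F1": 500.0, "F2": 1500.0, "FR": 80.0, "PW": 20.0},
    {"F0": 310.0, "F1": 560.0, "F2": 1650.0, "FR": 95.0, "PW": 35.0},
]
feature_table = build_feature_table(segments)
```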
  • The adjustment control unit 104c of this modification has the same functions as the adjustment control unit 104a of Embodiment 2, but differs in that, instead of acquiring the feature parameter sequence p1, it acquires the feature table stored in the feature table storage unit 205.
  • That is, the adjustment control unit 104c estimates quality deterioration of the synthesized speech based on the feature table in the feature table storage unit 205 instead of the feature parameter sequence p1, and accordingly changes the position of the pointer P and the length of the range bar B of the voice quality adjustment unit 103a.
  • Unlike the adjustment control unit 104b of Modification 1, which uses the large amount of segment data in the speech synthesis database 202, the adjustment control unit 104c of this modification uses a feature table with a small amount of information, and can therefore change the position of the pointer P and the length of the range bar B quickly.
  • As in Modification 1, the feature parameter sequence p1 in this modification does not have to be generated by the speech synthesis processing of the speech synthesis unit 201; as long as the voice quality it indicates is sufficiently similar to that of a feature parameter sequence p1 generated in this way, it may be generated by some other method.
  • FIG. 18 is a configuration diagram of a speech synthesizer according to the present modification.
  • the speech synthesis device includes a speech synthesis unit 201a instead of speech synthesis unit 201 in the present embodiment. Further, the voice quality conversion device according to the present modification includes a conversion unit 101a and an adjustment control unit 104b instead of the conversion unit 101 and the adjustment control unit 104a.
  • As in Modification 1, the adjustment control unit 104b changes the position of the pointer P or the length of the range bar B of the voice quality adjustment unit 103a based on the segment data of the speech synthesis database 202.
  • The conversion unit 101a operates on the segment data stored in the speech synthesis database 202 and converts the acoustic features indicated by that segment data.
  • Upon acquiring the text data td1, the speech synthesis unit 201a obtains from the conversion unit 101a the segment data that corresponds to the text indicated by the text data td1 and whose voice quality (acoustic features) has been converted. The speech synthesis unit 201a then generates a deformed feature parameter sequence p2 using the obtained converted segment data and outputs it to the waveform generation unit 203.
  • This speech synthesis device may also include the feature table storage unit 205 of Modification 2 and, in place of the adjustment control unit 104b of the voice quality conversion device, the adjustment control unit 104c of Modification 2.
  • FIG. 19 is a configuration diagram of a speech synthesis device according to the present modification.
  • the speech synthesis apparatus includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202.
  • The speech analysis unit 206 acquires speech waveform data d1, which is real speech and indicates its waveform, and generates a feature parameter sequence p1 based on the speech waveform data d1.
  • The conversion unit 101 and the adjustment control unit 104a of the voice quality conversion device obtain the feature parameter sequence p1 generated in this way from the speech analysis unit 206.
  • In this way, the speech synthesis device of this modification converts the voice quality of speech uttered by the user in real time and outputs it as synthesized speech. With this configuration, voice quality conversion can be applied to synthesized speech generated from the real-speech waveform data d1 while preventing quality deterioration, through an interface that is intuitive and easy to operate.
  • the voice quality conversion device may include the voice analysis unit 206.
  • FIG. 20 is a configuration diagram of a speech synthesizer according to the present modification.
  • the speech synthesis apparatus includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202, similarly to the speech synthesis apparatus of the fourth modification. Further, the voice quality conversion device according to the present modification includes an adjustment control unit 104d instead of the adjustment control unit 104a.
  • the adjustment control unit 104d acquires the waveform feature table td2 instead of acquiring the feature parameter sequence pi as in the force adjustment control unit 104a having the same function as the adjustment control unit 104a. That is, the adjustment control unit 104d according to the present modification estimates the position of the pointer P of the voice quality adjustment unit 103a by estimating the quality deterioration of the synthesized speech based on the waveform feature table td2 instead of the feature parameter sequence pi. Or change the length of the range bar B.
  • the waveform feature table td2 contains, for example, only the data needed to estimate the quality degradation of the synthesized speech, extracted from the result of analyzing sample speech previously uttered by the same speaker who produced the speech waveform data d1.
  • the waveform feature table td2 is obtained by extracting only the upper limit, the lower limit, and the average value of the parameters of each acoustic feature in the analysis result of the sample speech (see the table-building sketch after this list).
  • the adjustment control unit 104d may acquire a plurality of waveform feature tables td2 and select any one of them for use.
  • the adjustment control unit 104d selects and uses the waveform feature table td2 that best represents the features of the speech waveform data d1 and the feature parameter sequence p1, based on attributes such as the age and gender of the speaker.
  • by using the waveform feature table td2, which carries a small amount of information, instead of the feature parameter sequence p1, which carries a large amount of information, the adjustment control unit 104d of this modification can quickly change the position of the pointer P and the length of the range bar B.
  • the voice quality conversion device of the present invention has the effect of improving usability from the standpoint of the user interface.
  • it is useful for agent applications and text-to-speech reading applications that use synthesized speech, for communication devices that use a voice quality conversion function, and for voice quality editor devices.
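
The following is a minimal Python sketch of the kind of range adjustment described above for the pointer P and range bar B: the presented range is narrowed to the span actually covered by the available data so that a requested voice quality never falls where conversion would break down. The SliderState class, the use of fundamental frequency as the adjusted feature, and the margin heuristic are assumptions made for illustration; the patent does not specify this implementation.

```python
# Hypothetical sketch (not the patent's implementation): narrow the range bar B
# of a voice-quality slider to the spread of one acoustic feature (here F0)
# found in the available segment data, and clamp the pointer P to that range.

from dataclasses import dataclass

@dataclass
class SliderState:
    pointer: float     # current position of pointer P, in feature units (Hz here)
    range_min: float   # lower end of range bar B
    range_max: float   # upper end of range bar B

def adjust_slider(f0_values: list[float], state: SliderState,
                  margin: float = 0.1) -> SliderState:
    """Shrink range bar B to the span covered by the data (plus a small margin)
    and clamp pointer P so the requested voice quality stays convertible."""
    lo, hi = min(f0_values), max(f0_values)
    span = hi - lo
    new_min, new_max = lo - margin * span, hi + margin * span
    new_pointer = min(max(state.pointer, new_min), new_max)
    return SliderState(new_pointer, new_min, new_max)

if __name__ == "__main__":
    db_f0 = [110.0, 125.0, 140.0, 132.0, 118.0]  # toy per-segment mean F0 values
    state = SliderState(pointer=300.0, range_min=50.0, range_max=400.0)
    print(adjust_slider(db_f0, state))
```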
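
The fourth and fifth modifications describe a data flow: real speech d1 is analyzed into a feature parameter sequence p1, p1 is converted toward the requested voice quality, and a waveform is generated from the deformed sequence p2. The toy sketch below shows only that flow; the frame-RMS "analysis", scalar "conversion" and noise-excited "synthesis" are stand-ins chosen for brevity, not the algorithms of the patent.

```python
# Toy end-to-end flow: waveform d1 -> feature sequence p1 -> converted p2 -> waveform.
# All processing here is a placeholder that shows the data flow, not a real vocoder.

import numpy as np

FRAME = 256  # samples per analysis frame

def analyze(d1: np.ndarray) -> np.ndarray:
    """Speech analysis stand-in: one RMS value per frame (feature sequence p1)."""
    n_frames = len(d1) // FRAME
    frames = d1[: n_frames * FRAME].reshape(n_frames, FRAME)
    return np.sqrt((frames ** 2).mean(axis=1))

def convert(p1: np.ndarray, strength: float) -> np.ndarray:
    """Voice-quality conversion stand-in: scale the features by the slider value."""
    return p1 * strength

def synthesize(p2: np.ndarray) -> np.ndarray:
    """Waveform generation stand-in: noise shaped by the converted features."""
    rng = np.random.default_rng(0)
    return np.concatenate([rng.standard_normal(FRAME) * amp for amp in p2])

if __name__ == "__main__":
    d1 = np.sin(2 * np.pi * 120 * np.arange(16000) / 16000)  # 1 s of a 120 Hz tone
    p1 = analyze(d1)
    p2 = convert(p1, strength=0.8)
    out = synthesize(p2)
    print(p1.shape, p2.shape, out.shape)
```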
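
As described above, the waveform feature table td2 keeps only the upper limit, lower limit and average of each acoustic feature, and one of several such tables may be chosen by speaker attributes such as age and gender. The sketch below illustrates both steps under the assumption that a feature parameter sequence is a dictionary of named numeric tracks; the table format, the attribute keys and the matching rule are illustrative only.

```python
# Illustrative only: build a compact "waveform feature table" (min/max/mean per
# acoustic feature track) and pick the table whose speaker attributes best match.

import numpy as np

def build_feature_table(tracks: dict[str, np.ndarray]) -> dict[str, dict[str, float]]:
    """Reduce each acoustic feature track to lower limit, upper limit and average."""
    return {
        name: {"min": float(v.min()), "max": float(v.max()), "mean": float(v.mean())}
        for name, v in tracks.items()
    }

def select_table(tables: list[dict], speaker: dict) -> dict:
    """Pick the stored table whose speaker attributes agree most with the input."""
    def score(t):
        attrs = t["speaker"]
        return sum(attrs.get(k) == v for k, v in speaker.items())
    return max(tables, key=score)

if __name__ == "__main__":
    sample = {"f0": np.array([100.0, 180.0, 140.0]), "energy": np.array([0.2, 0.9, 0.5])}
    td2_a = {"speaker": {"gender": "female", "age": "20s"}, "table": build_feature_table(sample)}
    td2_b = {"speaker": {"gender": "male", "age": "40s"}, "table": build_feature_table(sample)}
    chosen = select_table([td2_a, td2_b], {"gender": "male", "age": "40s"})
    print(chosen["speaker"], chosen["table"]["f0"])
```

Keeping only three statistics per feature is what makes such a table far smaller than a full feature parameter sequence, which is why the pointer and range bar can be updated quickly.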

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The invention relates to a voice quality modifier that is more user-friendly from the standpoint of the user interface. The modifier comprises: a voice quality adjustment unit (103) that indicates the range of possible modifications and receives from a user a voice quality type within that range; an adjustment control unit (104) that acquires a feature parameter sequence (p1) and changes the range indicated by the voice quality adjustment unit (103) to an appropriate range so that the modified feature parameter sequence (p2) contains no breaks; and a conversion unit (101) that acquires the feature parameter sequence (p1) and converts it into a modified sequence (p2) indicating the voice quality received by the voice quality adjustment unit (103).
PCT/JP2004/017139 2003-11-21 2004-11-18 Modificateur de la voix WO2005050624A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003392672A JP2007041012A (ja) 2003-11-21 2003-11-21 声質変換装置および音声合成装置
JP2003-392672 2003-11-21

Publications (1)

Publication Number Publication Date
WO2005050624A1 true WO2005050624A1 (fr) 2005-06-02

Family

ID=34616459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2004/017139 WO2005050624A1 (fr) 2003-11-21 2004-11-18 Modificateur de la voix

Country Status (2)

Country Link
JP (1) JP2007041012A (fr)
WO (1) WO2005050624A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008058696A (ja) * 2006-08-31 2008-03-13 Nara Institute Of Science & Technology 声質変換モデル生成装置及び声質変換システム
US7792673B2 (en) 2005-11-08 2010-09-07 Electronics And Telecommunications Research Institute Method of generating a prosodic model for adjusting speech style and apparatus and method of synthesizing conversational speech using the same
CN102527039A (zh) * 2010-12-30 2012-07-04 德信互动科技(北京)有限公司 声效控制装置及方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6497025B2 (ja) * 2013-10-17 2019-04-10 ヤマハ株式会社 音声処理装置
JP6483578B2 (ja) * 2015-09-14 2019-03-13 株式会社東芝 音声合成装置、音声合成方法およびプログラム
JP6639285B2 (ja) 2016-03-15 2020-02-05 株式会社東芝 声質嗜好学習装置、声質嗜好学習方法及びプログラム

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09230893A (ja) * 1996-02-22 1997-09-05 N T T Data Tsushin Kk 規則音声合成方法及び音声合成装置
JPH1097267A (ja) * 1996-09-24 1998-04-14 Hitachi Ltd 声質変換方法および装置
JPH11249679A (ja) * 1998-03-04 1999-09-17 Ricoh Co Ltd 音声合成装置
JP2000194390A (ja) * 1998-12-25 2000-07-14 Matsushita Electric Ind Co Ltd 音声合成方法とその装置
JP2001195604A (ja) * 1999-10-20 2001-07-19 Hitachi Kokusai Electric Inc 動画像情報の編集方法
JP2002297176A (ja) * 2001-03-29 2002-10-11 Sanyo Electric Co Ltd 電子書籍装置
JP2003066984A (ja) * 2001-04-30 2003-03-05 Sony Computer Entertainment America Inc ユーザが指定する特性に基づいてネットワーク上を伝送したコンテンツデータを改変する方法
JP2003140678A (ja) * 2001-10-31 2003-05-16 Matsushita Electric Ind Co Ltd 合成音声の音質調整方法と音声合成装置

Also Published As

Publication number Publication date
JP2007041012A (ja) 2007-02-15

Similar Documents

Publication Publication Date Title
US7991616B2 (en) Speech synthesizer
US20090234652A1 (en) Voice synthesis device
US6405169B1 (en) Speech synthesis apparatus
WO2005109399A1 (fr) Dispositif de synthèse vocale et procédé
US20090204395A1 (en) Strained-rough-voice conversion device, voice conversion device, voice synthesis device, voice conversion method, voice synthesis method, and program
WO2006040908A1 (fr) Synthetiseur de parole et procede de synthese de parole
JPH05333900A (ja) 音声合成方法および装置
JP2008268477A (ja) 韻律調整可能な音声合成装置
EP3480810A1 (fr) Dispositif de synthèse vocale et procédé de synthèse vocale
WO2005050624A1 (fr) Modificateur de la voix
JP4664194B2 (ja) 声質制御装置および方法およびプログラム記憶媒体
JPH05260082A (ja) テキスト読み上げ装置
US11437016B2 (en) Information processing method, information processing device, and program
JP4841339B2 (ja) 韻律補正装置、音声合成装置、韻律補正方法、音声合成方法、韻律補正プログラム、および、音声合成プログラム
JP3685648B2 (ja) 音声合成方法及び音声合成装置、並びに音声合成装置を備えた電話機
JP5518621B2 (ja) 音声合成装置およびコンピュータプログラム
JP2956936B2 (ja) 音声合成装置の発声速度制御回路
JPH08272388A (ja) 音声合成装置及びその方法
US12014723B2 (en) Information processing method, information processing device, and program
JPH09179576A (ja) 音声合成方法
JP6191094B2 (ja) 音声素片切出装置
JP2003271200A (ja) 音声合成方法および音声合成装置
JP3892691B2 (ja) 音声合成方法及びその装置並びに音声合成プログラム
JP3292218B2 (ja) 音声メッセージ作成装置
Ebihara et al. Speech synthesis software with a variable speaking rate and its implementation on a 32-bit microprocessor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP