WO2005050624A1 - Voice changer - Google Patents

Voice changer

Info

Publication number
WO2005050624A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voice quality
range
conversion
presenting
Prior art date
Application number
PCT/JP2004/017139
Other languages
French (fr)
Japanese (ja)
Inventor
Natsuki Saito
Takahiro Kamai
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd.
Publication of WO2005050624A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Definitions

  • The present invention relates to a voice quality conversion device that converts the voice quality of speech.
  • Some speech synthesizers that artificially generate speech include a voice quality conversion device that converts the voice quality of the synthesized speech (see, for example, Patent Documents 1 and 2).
  • The voice quality conversion device of Patent Document 1 is provided with a database that stores, in advance, synthesis unit data generated from the voices of a plurality of speakers.
  • The voice quality conversion device first selects, from the database, the synthesis unit data closest to the specified synthesis unit.
  • The device then checks how far the voice quality of the speaker of the selected synthesis unit data differs from the specified voice quality. If the difference exceeds a predetermined level, the device performs voice quality conversion on the synthesis unit data so that it approaches the specified voice quality.
  • To do so, the device performs codebook mapping from the codebook (information representing characteristics of voice quality) of the selected synthesis unit data to a codebook whose voice quality matches the specified voice quality.
  • In this way, the voice quality of the synthesis unit data is converted to the specified voice quality.
  • The voice quality conversion device of Patent Document 2 converts the voice quality of synthesized speech by changing the sampling frequency used when digital speech data is converted into an analog speech signal.
  • This voice quality conversion device also sets so-called prosodic information (spectral parameters), such as the fundamental frequency and the phoneme duration, in accordance with the change in the sampling frequency so that the output speech remains appropriate.
  • Patent Document 1 Japanese Patent Application Laid-Open No. 07-319495
  • Patent Document 2 JP 08-152900A
  • Patent Document 3 Japanese Patent Application Laid-Open No. 2000-187491
  • However, the voice quality conversion device of Patent Document 3 described above has a problem in that, from the viewpoint of the user interface, it is not easy to use.
  • With the voice quality conversion device of Patent Document 3, the voice quality can be prevented from breaking down, but the user cannot grasp how far the voice quality can be converted without breaking down. Therefore, the user may instruct the device to convert to a desired voice quality even though that voice quality would break down. As a result, to prevent the breakdown, the device converts the voice to a voice quality different from the one the user specified.
  • The present invention has been made in view of such a problem, and an object of the present invention is to provide a voice quality conversion device whose usability is improved from the viewpoint of the user interface.
  • Means for solving the problem
  • To achieve the above object, a voice quality conversion device according to the present invention converts feature data indicating features of a voice into converted feature data indicating a voice whose voice quality differs from that voice, and comprises:
  • acquisition means for acquiring the feature data;
  • presentation means for presenting a range in which voice quality can be converted; and
  • reception means for receiving a voice quality specified by a user within the range presented by the presentation means.
  • The range presented by the presentation means is changed to an appropriate range in which the voice quality indicated by the converted feature data does not break down.
  • The convertible range of voice quality presented by the presentation means is thus changed to an appropriate range according to the feature data and the voice quality specified by the user.
  • Therefore, when the user goes on to specify another voice quality, the user only has to specify it within the appropriate range, without being aware of whether the voice quality of the converted feature data will break down, and converted feature data indicating the voice quality the user expects can be generated.
  • As a result, usability can be improved from the viewpoint of the user interface.
  • The presenting means may present, for each of a plurality of types of voice quality, a range in which that voice quality can be converted, and the accepting means may accept, for each voice quality, a parameter specified by the user within the range presented by the presenting means.
  • The conversion means may then convert the feature data into the converted feature data according to the parameters of each voice quality accepted by the accepting means.
  • The presenting means presents, for each of the plurality of voice qualities, a graphic and a pointer that moves on the graphic according to a user operation, thereby presenting the range in which that voice quality can be converted.
  • The accepting means identifies the parameter specified by the user based on the position of the pointer on the graphic, and accepts that parameter.
  • For example, when the user causes the accepting means to accept a parameter that increases brightness, within the convertible range presented for the voice quality indicating brightness, the convertible range presented for, say, the voice quality indicating fast-talking is reduced.
  • Consequently, even when the user then tries to specify a voice quality that further increases fast-talking, the user only has to specify a parameter within the reduced fast-talking range, without being aware of whether the voice quality of the converted feature data will break down, and converted feature data showing the voice quality expected by the user can be generated.
  • the range changing means may change the range that can be converted by moving the pointer.
  • the presenting means displays the graphic in a bar shape, and the range changing means changes the convertible range by moving the pointer along the longitudinal direction of the graphic.
  • The presenting means may arrange the figures and pointers for the respective voice qualities in parallel so that the more similar the changes based on two voice qualities are, the narrower the gap between their figures.
  • Alternatively, the presenting means may arrange the figure and pointer for each voice quality along the same circumference so that the more similar the changes based on two voice qualities are, the smaller the angle between them.
  • For example, suppose that the change based on the voice quality indicating brightness and the change based on the voice quality indicating fast-talking are similar.
  • In that case, the range changing means also moves the pointer corresponding to the voice quality indicating brightness toward one end of its figure so that the range over which brightness can be increased is reduced. Therefore, by arranging and displaying the figures and pointers corresponding to these voice qualities near each other, the user can easily recognize a change in the range in which the voice qualities can be converted.
  • the range changing means may change the convertible range by deforming the figure.
  • For example, the presenting means displays the graphic in a bar shape, and the range changing means changes the convertible range by extending or contracting the length of the graphic in its longitudinal direction.
  • A speech synthesizer according to the present invention converts a text indicated by text data into synthesized speech, and comprises:
  • feature data generating means for acquiring the text data and generating feature data indicating features of the speech corresponding to the text of the text data;
  • acquisition means for acquiring the feature data generated by the feature data generating means;
  • presenting means for presenting a convertible range of voice quality;
  • receiving means for receiving a voice quality specified by the user within the range presented by the presenting means;
  • range changing means for changing the range presented by the presenting means, according to the feature data acquired by the acquisition means and the voice quality received by the receiving means, to an appropriate range in which the voice quality of the synthesized speech does not break down;
  • conversion means for converting the feature data acquired by the acquisition means into converted feature data indicating speech of the voice quality received by the receiving means; and
  • speech output means for generating and outputting the synthesized speech based on the converted feature data produced by the conversion means.
  • Because the convertible range of voice quality presented by the presenting means is changed to an appropriate range according to the feature data and the voice quality specified by the user, the user need not be conscious of whether the voice quality of the converted feature data will break down when specifying another voice quality. As long as the voice quality is specified within the appropriate range, the text indicated by the text data can be converted into synthesized speech with the voice quality expected by the user. As a result, usability can be improved from the viewpoint of the user interface.
  • The present invention can be realized not only as such a voice quality conversion device or speech synthesizer, but also as a method or a program for the operations performed by the device, and as a storage medium that stores the program.
  • The voice quality conversion device of the present invention has the effect that usability can be improved from the viewpoint of the user interface.
  • FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation of the voice quality conversion device according to the first embodiment.
  • FIG. 3 is an explanatory diagram for explaining another example of the operation of the voice quality conversion device of the above.
  • FIG. 4 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device of the above.
  • FIG. 5 is an explanatory diagram for explaining still another example of the operation of the above voice quality conversion device.
  • FIG. 6 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device of the above.
  • FIG. 7 is an explanatory diagram for explaining a constraint condition by the Indigo algorithm of the above.
  • FIG. 8 is a configuration diagram of a voice quality conversion device according to Embodiment 2 of the present invention.
  • FIG. 9 is an explanatory diagram for explaining the content presented by the voice quality adjustment unit according to the embodiment.
  • FIG. 10 is a flowchart showing an operation of an adjustment control unit according to the embodiment.
  • FIG. 11 is an explanatory diagram for explaining the content presented by the voice quality adjustment unit according to the first modification of the above.
  • FIG. 12 is an explanatory diagram for describing contents presented by a voice quality adjustment unit according to the second modification of the above.
  • FIG. 13A is an explanatory diagram for describing a distance between voice qualities according to Modification 3 of the above.
  • FIG. 13B is a diagram showing a display content of a voice quality adjustment unit according to the third modification of the above.
  • FIG. 14A is a diagram showing a display content of a voice quality adjustment unit according to Modification 4 of the above.
  • FIG. 14B is an explanatory diagram for explaining how the voice quality adjusting unit according to Modification 4 of the above changes the display content.
  • FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
  • FIG. 16 is a configuration diagram of a speech synthesis device according to a first modification of the above.
  • FIG. 17 is a configuration diagram of a speech synthesis device according to Modification 2 of the above.
  • FIG. 18 is a configuration diagram of a speech synthesizer according to a third modification of the above.
  • FIG. 19 is a configuration diagram of a speech synthesizer according to a fourth modification of the above.
  • FIG. 20 is a configuration diagram of a speech synthesizer according to a fifth modification of the above embodiment.
  • FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
  • The voice quality conversion device converts voice quality while preventing the voice quality from breaking down, and includes a conversion unit 101, a voice quality adjustment unit 103, an adjustment control unit 104, a conversion coefficient storage unit 105, and a limit value storage unit 106.
  • The conversion unit 101 acquires a feature parameter sequence p1 indicating the acoustic features of speech.
  • The feature parameter sequence p1 is data that expresses, as parameters, the acoustic features of the speech obtained by analyzing the speech frame by frame; the original speech can be obtained by resynthesis based on it.
  • The conversion unit 101 generates a modified feature parameter sequence p2 by converting the acoustic-feature parameters indicated by the feature parameter sequence p1 according to the instruction from the voice quality adjustment unit 103.
  • Like the feature parameter sequence p1, the modified feature parameter sequence p2 expresses the acoustic features of speech as parameters, and is used to generate synthesized speech.
  • The voice quality of the synthesized speech (speech waveform) generated using the modified feature parameter sequence p2 differs from that of the synthesized speech (speech waveform) generated using the feature parameter sequence p1, depending on the conversion applied by the conversion unit 101.
  • the conversion coefficient storage unit 105 holds coefficient data serving as a template when the conversion unit 101 performs the conversion process.
  • When operated by the user, the voice quality adjustment unit 103 receives the converted voice quality expected by the user, and it also receives instructions to change the voice quality from the adjustment control unit 104. Further, using the coefficient data stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103 specifies the conversion content according to the user's operation result and the instruction from the adjustment control unit 104, and instructs the conversion unit 101 of that conversion content.
  • Specifically, the voice quality adjustment unit 103 displays, for each type of voice quality (for example, brightness and fast-talking), a range bar B indicating the convertible range of that voice quality and a pointer P that is movable on the range bar B and indicates the degree of voice quality conversion. The user operates the pointer P and moves it along the range bar B to set the desired voice quality.
  • The limit value storage unit 106 stores limit conditions (such as a limit value for the parameter indicating each acoustic feature) that the modified feature parameter sequence p2 must satisfy in order to yield synthesized speech that remains natural.
  • The adjustment control unit 104 acquires the feature parameter sequence p1 and the result of the user's operation on the voice quality adjustment unit 103.
  • The adjustment control unit 104 estimates the modified feature parameter sequence p2 based on the feature parameter sequence p1 and the operation result, and compares the estimated modified feature parameter sequence p2 with the limit conditions in the limit value storage unit 106. If the modified feature parameter sequence p2 does not satisfy the limit conditions, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to change the user's operation result so that the limit conditions are satisfied. That is, the adjustment control unit 104 determines whether the conversion content set in the voice quality adjustment unit 103 would cause sound quality deterioration (breakdown of voice quality) in the modified feature parameter sequence p2, and adjusts the conversion content so that no such deterioration occurs.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation of the voice quality conversion device according to the present embodiment.
  • The voice quality adjustment unit 103 accepts the voice quality expected by the user through the user's operation of the plurality of pointers P.
  • The voice quality adjustment unit 103 presents four voice qualities that can be converted: brightness, darkness, masculinity, and fast-talking.
  • The conversion range of each of these voice qualities is indicated by a range bar B graduated from 0 to 10.
  • The user designates the voice quality to be converted and the amount of conversion by moving the pointer P corresponding to each voice quality within the range of scale values 0 to 10 on the range bar B.
  • When the indicated value is 0, the voice quality adjustment unit 103 determines that no conversion is required for that voice quality; as the indicated value approaches 10, it determines that a larger conversion is required.
  • the voice quality adjustment unit 103 may be constituted by a volume switch or the like.
  • The feature parameter sequence p1 holds, for each analysis frame, the adjustable acoustic-feature parameters: the fundamental frequency F0, the first formant frequency F1, the second formant frequency F2, the frame duration FR, and the sound source power PW.
  • The coefficient data 105a held in the conversion coefficient storage unit 105 indicates, for each voice quality of the voice quality adjustment unit 103, the values (coefficients) to be added to the above five acoustic-feature parameters of the feature parameter sequence p1 when the indicated value of that voice quality is increased by 1.
  • In this example, the conversion unit 101 of the voice quality conversion device acquires the feature parameter sequence p1 and outputs a modified feature parameter sequence p2 that is the same as the feature parameter sequence p1.
  • FIG. 3 is an explanatory diagram for describing another example of the operation of the voice quality conversion device according to the present embodiment.
  • Suppose the user sets the indicated value of brightness in the voice quality adjustment unit 103 to 5 and the indicated value of fast-talking to 3.
  • The conversion unit 101 multiplies the coefficient of each acoustic feature for brightness in the coefficient data 105a by the indicated value of brightness (5). Likewise, the conversion unit 101 multiplies the coefficient of each acoustic feature for fast-talking in the coefficient data 105a by the indicated value of fast-talking (3). The conversion unit 101 sums these products for each acoustic feature and adds the result to the corresponding value of the feature parameter sequence p1. As a result, the conversion unit 101 generates the modified feature parameter sequence p2.
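  • The multiply-and-add conversion described above can be sketched for a single analysis frame as follows. This is a minimal illustration only: the coefficient values, the frame values in `p1`, and the names `COEFFS` and `convert_frame` are hypothetical, since the contents of the coefficient data 105a are not reproduced in this text.

```python
# Acoustic features named in the text.
FEATURES = ("F0", "F1", "F2", "FR", "PW")

# Hypothetical per-unit coefficients standing in for coefficient data 105a.
COEFFS = {
    "brightness":   {"F0": 5.0, "F1": 2.0, "F2": 10.0, "FR": 0.0, "PW": 1.0},
    "fast_talking": {"F0": 1.0, "F1": 0.0, "F2": 0.0, "FR": -2.0, "PW": 0.0},
}


def convert_frame(p1, settings, coeffs=COEFFS):
    """Return p2 = p1 + sum over voice qualities of (coefficient * indicated value)."""
    p2 = dict(p1)
    for quality, value in settings.items():
        for feature in FEATURES:
            p2[feature] += coeffs[quality][feature] * value
    return p2


# Hypothetical frame of the feature parameter sequence p1.
p1 = {"F0": 300.0, "F1": 500.0, "F2": 1500.0, "FR": 20.0, "PW": 30.0}

# Brightness 5 and fast-talking 3, as in the example above.
p2 = convert_frame(p1, {"brightness": 5, "fast_talking": 3})
print(p2)
```

  • With all indicated values at 0 the function returns a copy of `p1`, which corresponds to the case where no conversion is requested.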
  • FIG. 4 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device according to the present embodiment.
  • Suppose that, from the state of the voice quality adjustment unit 103 shown in FIG. 3, the user sets the indicated value of darkness to 7.
  • In this case, the adjustment control unit 104 changes the indicated values set in the voice quality adjustment unit 103 to values that are easier for the user to operate.
  • The adjustment control unit 104 identifies the relationship between brightness and darkness from the coefficient data 105a. When the indicated value of darkness is set to 7, it first instructs the voice quality adjustment unit 103 to reduce the indicated value of brightness from 5 to 0, and then instructs it to reduce the indicated value of darkness from 7 to 2.
  • That is, when the indicated value of darkness is set to 7 while the indicated value of brightness is 5, changing the indicated value of brightness from 5 to 0 has the same effect as increasing the indicated value of darkness by 5. Therefore, instead of leaving the indicated value of darkness at 7, the adjustment control unit 104 determines that the indicated value of brightness should be 0 and the indicated value of darkness should be 2. The adjustment control unit 104 then notifies the voice quality adjustment unit 103 of this determination, and the indicated values set by the user are changed accordingly.
  • In this way, the adjustment control unit 104 adjusts the indicated values set in the voice quality adjustment unit 103 by the user so that each indicated value is as small as possible, which makes it possible to build an interface that is easy for the user to operate.
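  • The minimization of opposing indicated values can be sketched as follows. The sketch assumes that brightness and darkness have exactly opposite effects, so equal amounts of each cancel; the function name `cancel_opposites` and the pair table are hypothetical and are not taken from the patent.

```python
def cancel_opposites(settings, opposite_pairs=(("brightness", "darkness"),)):
    """Reduce each opposing pair so that at most one of the two values is non-zero.

    Assumes one unit of the first quality exactly undoes one unit of the second.
    """
    result = dict(settings)
    for first, second in opposite_pairs:
        common = min(result.get(first, 0), result.get(second, 0))
        result[first] = result.get(first, 0) - common
        result[second] = result.get(second, 0) - common
    return result


# Brightness 5 and fast-talking 3 (FIG. 3), then darkness set to 7 (FIG. 4).
adjusted = cancel_opposites(
    {"brightness": 5, "darkness": 7, "masculinity": 0, "fast_talking": 3}
)
print(adjusted)
```

  • For the values in the text, the result is brightness 0 and darkness 2, with the other indicated values unchanged.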
  • FIG. 5 is an explanatory diagram for explaining still another example of the operation of the voice conversion device according to the present embodiment.
  • the user sets, for example, an instruction value of brightness to 10.
  • Indicates PW 30.
  • limit value storage section 106 stores limit conditions indicating that the maximum value of fundamental frequency F0 is 350. That is, the limit condition indicates that when the value of the fundamental frequency F0 of the modified feature parameter sequence p2 exceeds 350, the sound quality of the synthesized sound generated based on the modified feature parameter sequence p2 is significantly deteriorated.
  • FIG. 6 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device according to the present embodiment.
  • Suppose that, from the state of the voice quality adjustment unit 103 shown in FIG. 5, that is, the state in which the indicated value of brightness is 10 and all other indicated values are 0, the user further sets the indicated value of fast-talking to 5.
  • The adjustment control unit 104 estimates the modified feature parameter sequence p2 that would result if the conversion corresponding to the indicated values set by the user in the voice quality adjustment unit 103 were applied to the feature parameter sequence p1.
  • The adjustment control unit 104 then determines whether the parameter of each acoustic feature in the estimated modified feature parameter sequence p2 satisfies the limit conditions in the limit value storage unit 106.
  • If a parameter does not satisfy the limit conditions, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to change the indicated values so that the parameter satisfies them. In doing so, the adjustment control unit 104 may, for example, give priority to the indicated value most recently set by the user, or give priority to the largest indicated value.
  • In this case, the adjustment control unit 104 estimates the modified feature parameter sequence p2 and determines that its fundamental frequency F0 (355) does not satisfy the limit condition (350 or less). As a result, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to reduce the indicated value of brightness by 1 so that the fundamental frequency F0 of the modified feature parameter sequence p2 becomes 350 or less.
  • The voice quality adjustment unit 103 accordingly changes the indicated value of brightness from 10 to 9.
  • Because the adjustment control unit 104 adjusts the indicated values according to the limit conditions in this way, the user can perform the voice quality conversion operation without the voice quality breaking down and without being aware of the limit value of each acoustic-feature parameter.
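  • The adjustment described above can be sketched for the fundamental frequency alone. The base value `BASE_F0` and the per-unit coefficients in `F0_COEFF` are hypothetical, chosen only so that brightness 10 with fast-talking 5 gives the F0 of 355 mentioned in the text; the policy shown keeps the most recently set indicated value and lowers the others, which is one of the two priorities the text mentions.

```python
# Hypothetical values chosen to reproduce the numbers in the example above.
BASE_F0 = 300.0                                      # F0 of p1 (assumed)
F0_COEFF = {"brightness": 5.0, "fast_talking": 1.0}  # per-unit F0 coefficients (assumed)
F0_MAX = 350.0                                       # limit condition from the text


def estimate_f0(settings):
    """Estimate F0 of the modified feature parameter sequence p2."""
    return BASE_F0 + sum(F0_COEFF[q] * v for q, v in settings.items())


def enforce_limit(settings, latest):
    """Lower indicated values in steps of 1 until F0 <= F0_MAX.

    The most recently set quality (`latest`) is kept if possible; among the
    others, the one with the largest indicated value is lowered first.
    """
    adjusted = dict(settings)
    while estimate_f0(adjusted) > F0_MAX:
        others = [q for q in adjusted if q != latest and adjusted[q] > 0]
        if others:
            target = max(others, key=lambda q: adjusted[q])
        elif adjusted[latest] > 0:
            target = latest
        else:
            break  # nothing left to lower
        adjusted[target] -= 1
    return adjusted


before = {"brightness": 10, "fast_talking": 5}
after = enforce_limit(before, latest="fast_talking")
print(estimate_f0(before), after, estimate_f0(after))
```

  • Under these assumptions the estimate for the user's setting is 355, and lowering brightness from 10 to 9 brings F0 to exactly 350, matching the behaviour described for FIG. 6.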
  • the adjustment control unit 104 may refer to the limit conditions stored in the limit value storage unit 106 as needed.
  • The limit condition may indicate a limit value for the parameter of each acoustic feature, for example that the value of the fundamental frequency F0 must not exceed 350, or it may indicate a compound condition, for example that the sum of the value of the fundamental frequency F0 and the value of the second formant frequency F2 must not exceed 2000.
  • The conversion applied to the feature parameter sequence p1 by the conversion unit 101 need not be uniform over all analysis frames, and the coefficient data 105a in the conversion coefficient storage unit 105 may differ for each analysis frame.
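  • Simple and compound limit conditions of this kind could be represented uniformly as named predicates over a frame of the modified feature parameter sequence. The sketch below is only one possible representation; the names `LIMITS` and `violated` are hypothetical, and the two conditions are the examples given in the text.

```python
# Each limit condition is a (description, predicate) pair over one frame of p2.
LIMITS = [
    ("F0 <= 350", lambda p: p["F0"] <= 350),
    ("F0 + F2 <= 2000", lambda p: p["F0"] + p["F2"] <= 2000),
]


def violated(p2_frame):
    """Return the descriptions of all limit conditions the frame fails."""
    return [name for name, ok in LIMITS if not ok(p2_frame)]


print(violated({"F0": 300, "F2": 1500}))  # satisfies both conditions
print(violated({"F0": 340, "F2": 1700}))  # fails only the compound condition
print(violated({"F0": 355, "F2": 1500}))  # fails only the simple condition
```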
  • the adjustment of the indicated value by the adjustment control unit 104 may be automatically performed using a constraint satisfaction algorithm.
  • An example of the constraint satisfaction algorithm is the Indigo algorithm (A. Borning, R. Anderson, B. Freeman-Benson: The Indigo Algorithm, TR
  • FIG. 7 is an explanatory diagram for describing a constraint condition by the Indigo algorithm.
  • the constraint condition shown in FIG. 7 is for adjusting the indicated value shown in FIG. 6 with respect to the fundamental frequency F0, and is described as follows in the constraint hierarchy of the Indigo algorithm.
  • The variables t1 to t8 hold intermediate results of the calculation. Although omitted from FIG. 7 for simplicity, it is desirable, in order to obtain better results, to add a REQUIRED constraint that binds each indicated value to a value between 0 and 10.
  • FIG. 8 is a configuration diagram of a voice quality conversion device according to Embodiment 2 of the present invention.
  • The voice quality conversion device of this embodiment improves usability from the viewpoint of the user interface, and includes a conversion unit 101, a voice quality adjustment unit 103a, an adjustment control unit 104a, a conversion coefficient storage unit 105, and a limit value storage unit 106.
  • The voice quality adjustment unit 103a receives the converted voice quality expected by the user when operated by the user. That is, the voice quality adjustment unit 103a functions as a receiving means that receives the voice quality specified by the user. Further, using the coefficient data 105a stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103a specifies the conversion content according to the user's operation result and instructs the conversion unit 101 of that conversion content.
  • Specifically, like the voice quality adjustment unit 103 of the first embodiment, the voice quality adjustment unit 103a displays, for each type of voice quality (for example, brightness or fast-talking), a range bar B indicating the convertible range (absolute range) of that voice quality and a pointer P that is movable on the range bar B and indicates the degree of conversion of that voice quality.
  • The user operates the pointer P, moving it along the range bar B, to set the desired voice quality.
  • That is, the voice quality adjustment unit 103a functions as a presentation means that, by presenting the range bar B and the pointer P, presents the range over which the voice quality can be further converted from its current degree of conversion.
  • In addition, the voice quality adjustment unit 103a of this embodiment receives an instruction of a conversion range for each voice quality from the adjustment control unit 104a and presents only the instructed conversion range to the user. That is, the voice quality adjustment unit 103a changes the length of the range bar B to a length corresponding to the conversion range instructed by the adjustment control unit 104a, and prohibits the pointer P from moving to any position off the range bar B.
  • The adjustment control unit 104a acquires the feature parameter sequence p1, the result of the user's operation on the voice quality adjustment unit 103a, and the limit conditions in the limit value storage unit 106. Based on the feature parameter sequence p1, the operation result, and the limit conditions, the adjustment control unit 104a derives an appropriate conversion range for each voice quality in the voice quality adjustment unit 103a, and instructs the voice quality adjustment unit 103a of the derived conversion range. That is, the adjustment control unit 104a functions as a range changing means that, according to the feature parameter sequence p1 and the voice quality received from the user by the voice quality adjustment unit 103a, changes the range presented by the voice quality adjustment unit 103a to an appropriate range in which the voice quality indicated by the modified feature parameter sequence p2 does not break down.
  • FIG. 9 is an explanatory diagram for describing the content presented by voice quality adjusting section 103a of the present embodiment.
  • the user sets the pointers P of the respective voice qualities of the voice timbre adjusting unit 103a so that the indicated values are all zero.
  • the user sets the pointer P of the voice quality indicating the brightness of the voice quality adjusting unit 103a so that the indicated value becomes 10.
  • the limit conditions are that the fundamental frequency F0 of the deformed feature parameter sequence p2 is 350 or less, the first formant frequency F1 is 600 or less, the second formant frequency F2 is 1700 or less, the frame duration FR is 100, and the sound source power PW is 50.
  • The adjustment control unit 104a compares each parameter of the deformed feature parameter sequence p2 with the limit conditions and determines that the fundamental frequency F0 cannot be increased any further. That is, the adjustment control unit 104a determines that the conversion range of the fast-talking voice quality is limited to 0 scale divisions, and instructs the voice quality adjustment unit 103a of the determination result.
  • The voice quality adjustment unit 103a shortens the range bar B corresponding to the fast-talking voice quality from a length of 10 scale divisions to a length of 0 scale divisions, and displays it.
  • As a result, the fast-talking range bar B has a length of 0 scale divisions, so the user cannot move the fast-talking pointer P. This prevents the voice deterioration, that is, the breakdown of voice quality, that would be caused by increasing the indicated value for fast talking.
  • Based on the setting, the adjustment control unit 104a again derives, as described above, a conversion range for each voice quality other than brightness that satisfies the limit conditions in the limit value storage unit 106 (the fundamental frequency F0 of the deformed feature parameter sequence p2 is 350 or less). That is, the adjustment control unit 104a determines that the conversion range of the fast-talking voice quality is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of the determination result.
  • The voice quality adjustment unit 103a sets the range bar B corresponding to the fast-talking voice quality to a length of five scale divisions, that is, a length corresponding to scales 0 to 5, and displays it.
  • FIG. 10 is a flowchart showing the operation of adjustment control section 104a in the present embodiment.
  • The adjustment control unit 104a acquires the feature parameter sequence p1 (step S100), and identifies the settings made by the user on the voice quality adjustment unit 103a (step S102).
  • The adjustment control unit 104a estimates the deformed feature parameter sequence p2 based on the feature parameter sequence p1 and the settings of the voice quality adjustment unit 103a (step S104).
  • The adjustment control unit 104a derives an appropriate conversion range for each voice quality of the voice quality adjustment unit 103a based on the estimated deformed feature parameter sequence p2 and the limit conditions in the limit value storage unit 106 (step S106).
  • The adjustment control unit 104a instructs the voice quality adjustment unit 103a of the derived appropriate conversion range, causing it to display a range bar B with a length corresponding to that range (step S108).
  • Thus, the convertible range of voice quality presented by the voice quality adjustment unit 103a is changed to an appropriate range according to the feature parameter sequence p1 and the voice quality specified by the user. When the user wants to specify a further voice quality, the user can therefore specify it within the appropriate range without being aware of whether the voice quality of the deformed feature parameter sequence p2 breaks down, and a deformed feature parameter sequence indicating the voice quality the user expects is generated. As a result, usability is improved from the viewpoint of the user interface.
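The flow of steps S100 to S108 can be illustrated with a minimal Python sketch. The text does not state how p2 is estimated from p1, so the sketch assumes a simple linear model in which each scale division of a voice quality adds a fixed amount to each acoustic feature. The names `COEFFS`, `estimate_p2`, and `conversion_range`, the coefficient values, and the use of a single peak value per feature are all hypothetical; only the upper limits for F0, F1, and F2 come from the example limit conditions above.

```python
# Upper limits taken from the example limit conditions in the text.
LIMITS = {"F0": 350.0, "F1": 600.0, "F2": 1700.0}

# Hypothetical coefficients (assumption): change in each acoustic feature
# when a voice quality's indicated value rises by one scale division.
COEFFS = {
    "brightness": {"F0": 10.0, "F1": 5.0, "F2": 20.0},
    "fast_talking": {"F0": 10.0, "F1": 0.0, "F2": 0.0},
}

MAX_SCALE = 10  # length of the full range bar in scale divisions


def estimate_p2(p1_peak, settings):
    """Step S104: estimate peak feature values after conversion (assumed linear model)."""
    p2 = dict(p1_peak)
    for quality, value in settings.items():
        for feature, coeff in COEFFS[quality].items():
            p2[feature] += coeff * value
    return p2


def conversion_range(p1_peak, settings, quality):
    """Step S106: largest indicated value for `quality` that keeps every limit satisfied."""
    allowed = settings[quality]
    for value in range(settings[quality] + 1, MAX_SCALE + 1):
        trial = dict(settings, **{quality: value})
        p2 = estimate_p2(p1_peak, trial)
        if any(p2[f] > LIMITS[f] for f in LIMITS):
            break  # raising the value further would violate a limit condition
        allowed = value
    return allowed
```

With an assumed input whose peak F0 is 250, setting brightness to 10 leaves no room for fast talking (range 0), while setting brightness to 5 leaves five scale divisions, which mirrors the two cases described for FIG. 9. Step S108 would then pass the returned value to the presentation side as the new range bar length.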
  • Modification 1: A first modified example regarding the display method of the voice quality adjustment unit 103a in the present embodiment will be described.
  • The voice quality adjustment unit 103a according to this modification does not change the length of the range bar B. Instead, it changes the position of the pointer P so that the pointer P can be moved only by the conversion range instructed by the adjustment control unit 104a.
  • FIG. 11 is an explanatory diagram for describing the content presented by the voice quality adjusting unit 103a according to the present modification.
  • the user sets the pointers P of the respective voice qualities of the voice timbre adjusting unit 103a such that the indicated values are all zero.
  • the user sets the pointer P of the voice quality indicating the brightness of the voice quality adjusting unit 103a so that the indicated value becomes 10.
  • adjustment control section 104a determines that the conversion range of voice quality indicating fast-talking is limited to only 0 scales, and instructs voice quality adjustment section 103a of the determination result.
  • The voice quality adjustment unit 103a that has received such an instruction moves the pointer P corresponding to the fast-talking voice quality to the position of scale 10 and displays it, as shown in (b) of FIG. That is, the instruction from the adjustment control unit 104a indicates that the conversion range of the fast-talking voice quality is limited to 0 scale divisions, meaning that the pointer P cannot be moved in the scale-increasing direction. The voice quality adjustment unit 103a according to the present modification therefore moves the pointer P to a position from which it cannot be moved in the scale-increasing direction, that is, the position of scale 10, and displays it.
  • Note that the voice quality adjustment unit 103a merely moves the pointer P corresponding to the fast-talking voice quality; it does not instruct the conversion unit 101 to convert the acoustic feature parameters with the fast-talking indicated value set to 10. In this way, the fast-talking pointer P is displayed at scale 10 (the maximum value) in conjunction with the brightness indicated value being set to 10, which prevents the voice deterioration, that is, the breakdown of voice quality, that would be caused by increasing the fast-talking indicated value.
  • Based on the setting, the adjustment control unit 104a again determines, as described above, that the conversion range of the fast-talking voice quality is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of the determination result.
  • Upon receiving such an instruction, the voice quality adjustment unit 103a moves the pointer P corresponding to the fast-talking voice quality to the position of scale 5 and displays it, as shown in FIG. That is, the instruction from the adjustment control unit 104a indicates that the conversion range of the fast-talking voice quality is limited to five graduations, meaning that the pointer P can be moved in the increasing direction by only five graduations. The voice quality adjustment unit 103a according to the present modification therefore moves the pointer P to the position from which it can be moved by five graduations in the increasing direction, that is, the position of graduation 5, and displays it.
  • Here too, the voice quality adjustment unit 103a merely moves the pointer P corresponding to the fast-talking voice quality; it does not instruct the conversion unit 101 to convert the acoustic feature parameters with the fast-talking indicated value set to 5.
  • The voice quality adjustment unit 103a displays the movable range of the pointer P in characters, without changing the length of the range bar B.
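The relationship between the instructed conversion range and the displayed pointer position in this modification can be written as a one-line rule. The sketch below is only an illustration inferred from the two examples in the text (range 0 gives scale 10, range 5 gives scale 5); the function name `displayed_pointer_position` and the fixed bar length of 10 are assumptions.

```python
MAX_SCALE = 10  # top of the unchanged range bar in the examples of the text


def displayed_pointer_position(allowed_steps: int, max_scale: int = MAX_SCALE) -> int:
    """Return the scale at which the pointer is drawn so that exactly
    `allowed_steps` scale divisions remain above it on the range bar."""
    if not 0 <= allowed_steps <= max_scale:
        raise ValueError("allowed_steps must lie between 0 and max_scale")
    return max_scale - allowed_steps
```

The returned value is a display position only. As the text notes, it is not passed to the conversion unit 101 as an indicated value, so an implementation along these lines would need to keep the displayed position and the value used for conversion as separate quantities.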
  • FIG. 12 is an explanatory diagram for describing the content presented by voice quality adjusting section 103a according to the present modification.
  • the user sets the pointers P of each voice quality of the voice quality adjustment unit 103a such that the indicated values are all zero.
  • the user sets the pointer P of the voice quality indicating the brightness of the voice quality adjusting unit 103a so that the indicated value becomes 10.
  • adjustment control section 104a determines that the conversion range of voice quality indicating fast-talking is limited to only 0 scales, and instructs voice quality adjustment section 103a of the determination result.
  • On receiving such an instruction, the voice quality adjustment unit 103a displays the words "up to here" at the position of scale 0 on the range bar B corresponding to the fast-talking voice quality. While these words are displayed, even if the user tries to move the pointer P corresponding to the fast-talking voice quality, the voice quality adjustment unit 103a does not accept the operation, and the position of the pointer P stays fixed.
  • Based on the setting, the adjustment control unit 104a again determines, as described above, that the conversion range of the fast-talking voice quality is limited to five scale divisions, and instructs the voice quality adjustment unit 103a of the determination result.
  • On receiving such an instruction, the voice quality adjustment unit 103a displays the words "up to here" at the position of scale 5 on the range bar B corresponding to the fast-talking voice quality. While these words are displayed, even if the user tries to move the pointer P corresponding to the fast-talking voice quality beyond scale 5, the voice quality adjustment unit 103a does not accept the operation and keeps the pointer P at scale 5 or less.
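The behavior of the "up to here" label can be sketched as a small acceptance check. This is a hedged illustration, not the device's actual logic: the function name `accept_pointer_move` and the choice to reject an out-of-range request outright (rather than snap the pointer to the limit) are assumptions based on the statement that the operation is "not accepted".

```python
def accept_pointer_move(current: int, requested: int, limit: int) -> int:
    """Return the pointer position after the user requests a move.

    `limit` is the scale at which the "up to here" label is displayed.
    A request beyond the label is not accepted, so the pointer stays put.
    """
    if requested > limit:
        return current  # operation rejected; position unchanged
    return max(requested, 0)  # the range bar starts at scale 0
```

With the label at scale 0, every upward request is rejected and the pointer remains fixed; with the label at scale 5, requests up to and including scale 5 are accepted.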
  • The voice quality adjustment unit 103a arranges the range bars B and pointers P corresponding to the voice qualities so that voice qualities whose changes are more similar are placed closer to each other, and presents them to the user.
  • The voice quality adjustment unit 103a obtains the coefficient data 105a stored in the transform coefficient storage unit 105 and, based on the coefficient data 105a, specifies the degree of similarity of the change content between voice qualities such as brightness and darkness. For example, it derives, between the voice qualities indicated by the coefficient data 105a, the difference in the coefficient for each acoustic feature, and obtains the Euclidean distance (hereinafter simply "distance") between the voice qualities from these differences. Based on this distance, the voice quality adjustment unit 103a specifies the similarity between voice qualities.
  • FIG. 13A is an explanatory diagram for describing the distance between voice qualities.
  • The voice quality adjustment unit 103a calculates the distance between the voice qualities as shown in FIG. 13A. For example, the distance between the voice quality indicating masculinity and the voice quality indicating loudness is 5.4, and the distance between the voice quality indicating brightness and the voice quality indicating loudness is 11.3.
  • The voice quality adjustment unit 103a judges two voice qualities to be more similar the smaller the calculated distance between them, and places the range bars B and pointers P of such voice qualities closer to each other.
  • FIG. 13B is a diagram showing display contents of voice quality adjustment section 103a.
  • The voice quality adjustment unit 103a presents the range bar B and pointer P of each voice quality in the order of the voice quality indicating masculinity, the voice quality indicating length, the voice quality indicating brightness, and the voice quality indicating fast talking.
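The distance computation and the resulting ordering can be sketched as follows. The Euclidean distance over per-feature coefficient differences follows the description above; the ordering rule is not specified in the text, so the greedy nearest-neighbor chain used here is an assumption, as are the function names and the representation of the coefficient data 105a as one coefficient vector per voice quality.

```python
import math
from typing import Dict, List, Sequence


def quality_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between the coefficient vectors of two voice qualities."""
    return math.dist(a, b)


def order_by_similarity(coeffs: Dict[str, Sequence[float]], start: str) -> List[str]:
    """Order voice qualities so that similar ones end up adjacent.

    Assumed rule: starting from `start`, repeatedly append the not-yet-placed
    voice quality closest to the one placed last.
    """
    order = [start]
    remaining = set(coeffs) - {start}
    while remaining:
        last = coeffs[order[-1]]
        # sorted() makes tie-breaking deterministic
        nearest = min(sorted(remaining), key=lambda q: quality_distance(last, coeffs[q]))
        order.append(nearest)
        remaining.remove(nearest)
    return order
```

A greedy chain is simple and keeps each bar next to its nearest unplaced neighbor, but it does not minimize the total distance along the line; other layout rules would be equally consistent with the text.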
  • This modification, when combined with the first modification, can provide a voice quality conversion operation that is intuitive and easy for the user to understand.
  • Because the range bars B and pointers P of voice qualities with similar changes are arranged close to each other, when the user operates the pointer P of a certain voice quality, the pointers P of the voice qualities arranged nearby move in the same direction, and the pointers P of the voice qualities arranged farther away move in the opposite direction. Therefore, the user can intuitively understand how the voice quality is converted by operating the pointer P.
  • The voice quality adjustment unit 103a of the third modification arranges the range bars B and pointers P corresponding to the voice qualities in a line so that voice qualities whose changes are more similar are closer together.
  • The voice quality adjustment unit 103a according to the present modification arranges the range bars B and pointers P corresponding to the voice qualities along the same circumference so that the more similar the changes of two voice qualities are, the smaller the angle between them.
  • FIG. 14A is a diagram showing the display content of voice quality adjusting section 103a.
  • The voice quality adjustment unit 103a gathers the lower ends of the range bars B of the voice qualities at one point and displays each range bar B along the same circle.
  • The voice quality adjustment unit 103a presents the range bars B so that the angle between the masculinity range bar B and the darkness range bar B is the smallest, and the angle between the masculinity range bar B and the fast-talking range bar B is the largest.
  • The voice quality adjustment unit 103a according to the present modification may also use the display method described in the first modification, that is, the function of changing the position of the pointer P based on the instruction from the adjustment control unit 104a.
  • FIG. 14B is an explanatory diagram for explaining how the voice quality adjusting unit 103a changes the display content.
  • When the user moves the pointer P of the voice quality indicating the tone in the scale-increasing direction, the voice quality adjustment unit 103a, based on the instruction from the adjustment control unit 104a, moves the pointer P of the voice quality indicating masculinity in the scale-increasing direction and moves the pointers P of the voice qualities indicating brightness and fast talking in the scale-decreasing direction.
  • FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
  • This speech synthesizer is a device capable of acquiring text data and performing speech synthesis with various voice qualities.
  • It includes the voice quality conversion device, a speech synthesis unit 201, a speech synthesis database 202, a waveform generation unit 203, and a speaker 204.
  • The speech synthesis database 202 stores segment data representing a plurality of speech segments. Upon acquiring text data td1 based on the user's operation, the speech synthesis unit 201 selects from the speech synthesis database 202 the segment data corresponding to the text indicated by the text data td1. The speech synthesis unit 201 then generates a feature parameter sequence p1 using the selected segment data and outputs it to the voice quality conversion device.
  • Upon acquiring the feature parameter sequence p1, the voice quality conversion device converts the voice quality of the speech represented by the feature parameter sequence p1. The voice quality conversion device then generates and outputs a deformed feature parameter sequence p2 indicating the result of the conversion.
  • Upon acquiring the deformed feature parameter sequence p2 from the voice quality conversion device, the waveform generation unit 203 generates a waveform signal s1 representing the deformed feature parameter sequence p2 as a speech waveform, and outputs the waveform signal s1 to the speaker 204.
  • The speaker 204 outputs synthesized speech corresponding to the waveform signal s1.
  • The speech synthesis device includes the voice quality conversion device according to the second embodiment and can therefore output the contents of the text data td1 in a voice of the user's desired voice quality without breakdown, and usability can be further improved.
  • the voice conversion device of the first embodiment may be provided in the speech synthesis device of the present embodiment.
  • FIG. 16 is a configuration diagram of a speech synthesizer according to the present modification.
  • The adjustment control unit 104b of the voice quality conversion device has the same functions as the adjustment control unit 104a of the second embodiment, but instead of acquiring the feature parameter sequence p1, it acquires the segment data stored in the speech synthesis database 202.
  • That is, the adjustment control unit 104b detects quality deterioration of the synthesized speech based on the segment data in the speech synthesis database 202 instead of the feature parameter sequence p1, and thereby changes the position of the pointer P or the length of the range bar B of the voice quality adjustment unit 103a.
  • The adjustment control unit 104b predicts the tendency of the acoustic feature parameters indicated by the feature parameter sequence p1 by using part or all of the segment data stored in the speech synthesis database 202, and changes the position of the pointer P and the length of the range bar B based on the prediction result.
  • The adjustment control unit 104b selects all the segment data one by one from the speech synthesis database 202, determines whether the quality of the synthesized speech is degraded when each piece of segment data is converted according to the settings of the voice quality adjustment unit 103a, and changes the position of the pointer P based on the result.
  • Unless the speech synthesis database 202 is replaced, the speech synthesizer can keep the processing content of the adjustment control unit 104b the same no matter what text data td1 is input, which simplifies the processing. However, if the contents of the feature parameter sequence p1 differ greatly depending on the contents of the text data td1, the quality of the synthesized speech may be degraded depending on the contents of the text data td1.
  • The feature parameter sequence p1 in the present modification does not have to be generated from the segment data of the speech synthesis database 202 by the speech synthesis processing of the speech synthesis unit 201.
  • As long as its voice quality is sufficiently similar to that indicated by a feature parameter sequence p1 generated in this way, the feature parameter sequence p1 used in the present modification may be one generated by some other method.
  • FIG. 17 is a configuration diagram of a speech synthesizer according to the present modification.
  • The speech synthesizer according to the present modification includes a feature table storage unit 205 that holds, as a feature table, only the data necessary for estimating quality degradation of synthesized speech out of the plurality of segment data stored in the speech synthesis database 202.
  • The feature table held in the feature table storage unit 205 is obtained, for example, by extracting only the upper limit value, the lower limit value, and the average value of the parameter of each acoustic feature from all the segment data stored in the speech synthesis database 202.
  • The adjustment control unit 104c according to the present modification has the same functions as the adjustment control unit 104a of the second embodiment, but instead of acquiring the feature parameter sequence p1, it acquires the feature table stored in the feature table storage unit 205.
  • That is, the adjustment control unit 104c estimates the quality degradation of the synthesized speech based on the feature table in the feature table storage unit 205 instead of the feature parameter sequence p1, and changes the position of the pointer P and the length of the range bar B of the voice quality adjustment unit 103a.
  • Unlike the adjustment control unit 104b according to the first modification, which uses a large amount of segment data from the speech synthesis database 202, the adjustment control unit 104c according to the present modification uses a feature table with a small amount of information, so the position of the pointer P and the length of the range bar B can be changed quickly.
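A feature table of this kind can be sketched in a few lines. The sketch assumes that each piece of segment data is reduced to one value per acoustic feature and that degradation is estimated by checking whether the shifted upper value would exceed a limit; the names `build_feature_table` and `may_exceed_limit` and the dictionary layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def build_feature_table(segments: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Extract only the upper limit, lower limit, and average of each acoustic feature."""
    if not segments:
        raise ValueError("at least one segment is required")
    table = {}
    for feature in segments[0]:
        values = [segment[feature] for segment in segments]
        table[feature] = {
            "upper": max(values),
            "lower": min(values),
            "average": mean(values),
        }
    return table


def may_exceed_limit(table, feature: str, shift: float, limit: float) -> bool:
    """Assumed check: would adding `shift` push the largest stored value past `limit`?"""
    return table[feature]["upper"] + shift > limit
```

Because the table holds three numbers per feature regardless of the database size, a check like `may_exceed_limit` runs in constant time per feature, which is the speed advantage the text attributes to this modification.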
  • As in the first modification, the feature parameter sequence p1 in this modification does not have to be generated by the speech synthesis processing of the speech synthesis unit 201. In other words, as long as its voice quality is sufficiently similar to that indicated by a feature parameter sequence p1 generated in this way, the feature parameter sequence p1 used in this modification may be one generated by some other method.
  • FIG. 18 is a configuration diagram of a speech synthesizer according to the present modification.
  • the speech synthesis device includes a speech synthesis unit 201a instead of speech synthesis unit 201 in the present embodiment. Further, the voice quality conversion device according to the present modification includes a conversion unit 101a and an adjustment control unit 104b instead of the conversion unit 101 and the adjustment control unit 104a.
  • The adjustment control unit 104b changes the position of the pointer P of the voice quality adjustment unit 103a or the length of the range bar B based on the segment data in the speech synthesis database 202.
  • The conversion unit 101a operates on the segment data stored in the speech synthesis database 202 and converts the acoustic features indicated by the segment data.
  • Upon acquiring the text data td1, the speech synthesis unit 201a obtains from the conversion unit 101a the segment data that corresponds to the text indicated by the text data td1 and whose voice quality (acoustic features) has been converted. The speech synthesis unit 201a then generates a deformed feature parameter sequence p2 using the obtained converted segment data and outputs it to the waveform generation unit 203.
  • the voice synthesizing apparatus includes the feature table storage unit 205 according to the second modification, and includes the adjustment control unit 104c according to the second modification instead of the adjustment control unit 104b of the voice conversion device. May be.
  • FIG. 19 is a configuration diagram of a speech synthesis device according to the present modification.
  • the speech synthesis apparatus includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202.
  • The voice analysis unit 206 acquires voice waveform data d1 representing the waveform of real speech, and generates a feature parameter sequence p1 based on the voice waveform data d1.
  • The conversion unit 101 and the adjustment control unit 104a of the voice quality conversion device obtain the feature parameter sequence p1 generated in this way from the voice analysis unit 206.
  • The speech synthesizer of the present modification converts the voice quality of speech spoken by the user in real time and outputs it as synthesized speech. Further, with this configuration, voice quality conversion can be applied to synthesized speech generated from the voice waveform data d1 of real speech while preventing quality deterioration, through an interface that is intuitively easy to operate.
  • the voice quality conversion device may include the voice analysis unit 206.
  • FIG. 20 is a configuration diagram of a speech synthesizer according to the present modification.
  • the speech synthesis apparatus includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202, similarly to the speech synthesis apparatus of the fourth modification. Further, the voice quality conversion device according to the present modification includes an adjustment control unit 104d instead of the adjustment control unit 104a.
  • The adjustment control unit 104d has the same functions as the adjustment control unit 104a, but acquires a waveform feature table td2 instead of the feature parameter sequence p1. That is, the adjustment control unit 104d according to the present modification estimates the quality deterioration of the synthesized speech based on the waveform feature table td2 instead of the feature parameter sequence p1, and changes the position of the pointer P of the voice quality adjustment unit 103a or the length of the range bar B.
  • The waveform feature table td2 is obtained, for example, by extracting only the data necessary for estimating the quality degradation of the synthesized speech from the result of analyzing sample speech uttered in advance by the same speaker who uttered the speech of the voice waveform data d1.
  • the waveform feature table td2 is obtained by extracting only the upper limit value, the lower limit value, and the average value from the parameters of each acoustic feature that is the analysis result of the sample voice.
  • The adjustment control unit 104d may acquire a plurality of waveform feature tables td2 and select any one of them.
  • The adjustment control unit 104d selects and uses the waveform feature table td2 that best represents the features of the voice waveform data d1 and the feature parameter sequence p1, based on attributes such as the age and gender of the speaker.
  • The adjustment control unit 104d of the present modification uses the waveform feature table td2, which has a small amount of information, instead of the feature parameter sequence p1, which has a large amount of information, so the position of the pointer P and the length of the range bar B can be changed quickly.
  • The voice quality conversion device of the present invention has the effect of improving usability from the viewpoint of the user interface.
  • It is useful, for example, for an agent application or a text-to-speech application using synthesized speech, a communication device using a voice conversion function, or a voice quality editor device.

Abstract

There is provided a voice changer having improved user-friendliness from the viewpoint of user interface. The voice changer includes: a voice quality adjustment unit (103) for indicating a range where the voice quality can be changed and receiving a voice quality specified by a user in the indicated range; an adjustment control unit (104) for acquiring a feature parameter string (p1) and changing the range indicated by the voice quality adjustment unit (103) to an appropriate range where no crash occurs in the voice quality indicated by a changed feature parameter string (p2) according to the acquired feature parameter string (p1) and the voice quality received by the voice quality adjustment unit (103); and a conversion unit (101) for acquiring the feature parameter string (p1) and converting the acquired feature parameter string (p1) to the changed feature parameter string (p2) indicating the voice of the quality received by the voice quality adjustment unit (103).

Description

Technical field
[0001] The present invention relates to a voice quality conversion device that converts the voice quality of speech.
Background art
[0002] Some speech synthesizers that artificially generate speech include a voice quality conversion device that converts the voice quality of the synthesized speech (see, for example, Patent Documents 1 and 2).
[0003] The voice quality conversion device of Patent Document 1 includes a database in which synthesis unit data generated from the voices of a plurality of speakers is stored in advance. When a synthesis unit to be used for speech synthesis and a voice quality are specified, the voice quality conversion device first selects from the database the synthesis unit data closest to the specified synthesis unit. Next, it checks how much the voice quality of the speaker of the selected synthesis unit data differs from the specified voice quality, and if they differ by more than a predetermined degree, it performs voice quality conversion on the synthesis unit data so that it approaches the specified voice quality. Specifically, the voice quality conversion device converts the voice quality of the selected synthesis unit data to the specified voice quality by performing codebook mapping from the codebook (information representing characteristics of voice quality) of the selected synthesis unit data to a codebook whose voice quality matches the specified voice quality.
[0004] The voice quality conversion device of Patent Document 2 converts the voice quality of synthesized speech by changing the sampling frequency used when converting digital speech data into an analog speech signal. In addition, so that the output speech is appropriate, this voice quality conversion device sets so-called prosodic information (spectral parameters), such as the fundamental frequency and phoneme duration, appropriately in accordance with the change in sampling frequency.
[0005] In the voice quality conversion devices of Patent Documents 1 and 2, the converted voice quality may break down. A voice quality conversion device has therefore been proposed that corrects the parameters indicating voice quality so that the converted voice quality does not break down (see, for example, Patent Document 3).
Patent Document 1: Japanese Patent Application Laid-Open No. 07-319495
Patent Document 2: Japanese Patent Application Laid-Open No. 08-152900
Patent Document 3: Japanese Patent Application Laid-Open No. 2000-187491
発明の開示  Disclosure of the invention
発明が解決しょうとする課題  Problems to be solved by the invention
[0006] し力しながら、上記特許文献 3の声質変換装置では、ユーザインターフェースの観 点から使 、勝手が悪 、と 、う問題がある。  [0006] However, the voice quality conversion device of Patent Document 3 described above has a problem in that it is used from the viewpoint of a user interface, and is not easy to use.
[0007] 即ち、上記特許文献 3の声質変換装置では、声質の破綻を防 、だ状態でどの程度 の範囲で声質を変換することができるかをユーザは把握することができな 、。したが つて、ユーザは、声質の破綻が生じるにも関わらず所望の声質に変換するように声質 変換装置に対して指示してしまうことがある。その結果、声質変換装置は声質の破綻 を防ぐために、ユーザが指定する声質とは異なる声質に変換してしまう。  [0007] That is, with the voice quality conversion device of Patent Document 3, it is possible to prevent the voice quality from breaking down, and the user cannot grasp the extent to which the voice quality can be converted while the voice quality is still low. Therefore, the user may instruct the voice conversion device to convert to a desired voice quality even though the voice quality is broken. As a result, the voice quality conversion device converts the voice quality to a voice quality different from the voice quality specified by the user in order to prevent the voice quality from being broken.
[0008] The present invention has been made in view of this problem, and its object is to provide a voice quality conversion device with improved usability from the standpoint of the user interface.
Means for Solving the Problems
[0009] To achieve the above object, a voice quality conversion device according to the present invention converts feature data indicating the features of a voice into converted feature data indicating a voice whose voice quality differs from that voice, and comprises: acquisition means for acquiring the feature data; presentation means for presenting a range within which the voice quality can be converted; reception means for receiving a voice quality specified by the user within the range presented by the presentation means; range changing means for changing the range presented by the presentation means, according to the feature data acquired by the acquisition means and the voice quality received by the reception means, to an appropriate range within which the voice quality indicated by the converted feature data does not break down; and conversion means for converting the feature data acquired by the acquisition means into converted feature data indicating a voice of the voice quality received by the reception means.
[0010] With this configuration, the convertible range of voice quality presented by the presentation means is changed to an appropriate range according to the feature data and the voice quality specified by the user. When the user wants to move from the specified voice quality to yet another voice quality, the user need only specify a voice quality within that appropriate range, without considering whether the voice quality of the converted feature data will break down, and converted feature data indicating the voice quality the user expects is generated. As a result, usability is improved from the standpoint of the user interface.
[0011] The presentation means may present, for each of a plurality of types of voice quality, the range of degree over which that voice quality can be converted; the reception means may receive, as a parameter, the degree of voice quality specified by the user within each range presented by the presentation means; the range changing means may change the range of another voice quality presented by the presentation means according to the parameter of the voice quality that the reception means has received for conversion; and the conversion means may convert the feature data into the converted feature data according to the parameters of the respective voice qualities received by the reception means. For example, the presentation means presents the range of convertible degree by displaying, for each of the voice qualities, a figure and a pointer that moves on the figure in response to user operation, and the reception means identifies and receives the parameter specified by the user based on the position of the pointer on the figure.
[0012] Thus, for example, when the user has the reception means accept a parameter that increases brightness within the convertible range presented for the brightness voice quality, the convertible range presented for, say, the fast-talking voice quality is reduced. When the user then wants to increase the degree of fast-talking, the user need only specify a parameter within the reduced fast-talking range, without considering whether the voice quality of the converted feature data will break down, and converted feature data indicating the voice quality the user expects is generated.
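The shrinking of one range as another setting grows can be sketched as follows. This is a minimal illustration, not the claimed implementation: it borrows the figures that appear in Embodiment 1 of this document (a base fundamental frequency of 300, an upper limit of 350, and per-unit F0 coefficients of +5 for brightness and +1 for fast-talking), and the names `F0_COEFF` and `convertible_max` are hypothetical. It also assumes that only the F0 limit matters and that all coefficients are positive.

```python
# F0 change per unit of indicated value (illustrative figures from Embodiment 1).
F0_COEFF = {"brightness": 5, "fast": 1}

def convertible_max(target, settings, base_f0=300, f0_max=350, scale_max=10):
    """Largest value of `target` (0..scale_max) that keeps the estimated F0 within f0_max."""
    # F0 already consumed by the other voice qualities.
    others = sum(F0_COEFF[q] * v for q, v in settings.items() if q != target)
    headroom = f0_max - base_f0 - others
    return max(0, min(scale_max, headroom // F0_COEFF[target]))

print(convertible_max("fast", {"brightness": 0}))   # 10
print(convertible_max("fast", {"brightness": 9}))   # 5
print(convertible_max("fast", {"brightness": 10}))  # 0
```

Under these assumptions, raising brightness from 0 to 10 reduces the range available for fast-talking from the full scale to nothing.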
[0013] The range changing means may change the range of convertible degree by moving the pointer. For example, the presentation means displays the figure as a bar, and the range changing means changes the range of convertible degree by moving the pointer along the longitudinal direction of the figure.
[0014] This lets the user readily see that the span from the pointer position to one end of the figure is the range of convertible degree.
[0015] The presentation means may arrange the figures and pointers for the voice qualities side by side so that the more similar the changes produced by two voice qualities, the narrower the gap between them. Alternatively, the presentation means arranges the figures and pointers for the voice qualities along a common circumference so that the more similar the changes produced by two voice qualities, the smaller the angle between them.
[0016] For example, the change produced by the brightness voice quality and the change produced by the fast-talking voice quality are similar. Consequently, when the user moves the pointer for brightness toward one end of its figure to increase the degree of brightness, the range changing means also moves the pointer for fast-talking toward one end of its figure, so that the range over which fast-talking can be increased shrinks. Displaying the figures and pointers for such voice qualities close to each other lets the user readily notice changes in the range of convertible degree.
[0017] The range changing means may change the range of convertible degree by deforming the figure. For example, the presentation means displays the figure as a bar, and the range changing means changes the range of convertible degree by lengthening or shortening the figure in its longitudinal direction.
[0018] This lets the user readily see that the span from the pointer position to one end of the figure is the range of convertible degree.
[0019] A speech synthesis device according to the present invention converts text indicated by text data into synthesized speech, and comprises: feature data generation means for acquiring the text data and generating feature data indicating the features of speech corresponding to the text of the text data; acquisition means for acquiring the feature data generated by the feature data generation means; presentation means for presenting a range within which voice quality can be converted; reception means for receiving a voice quality specified by the user within the range presented by the presentation means; range changing means for changing the range presented by the presentation means, according to the feature data acquired by the acquisition means and the voice quality received by the reception means, to an appropriate range within which the voice quality of the synthesized speech does not break down; conversion means for converting the feature data acquired by the acquisition means into converted feature data indicating a voice of the voice quality received by the reception means; and speech output means for generating and outputting the synthesized speech based on the converted feature data produced by the conversion means.
[0020] With this configuration, the convertible range of voice quality presented by the presentation means is changed to an appropriate range according to the feature data and the voice quality specified by the user. When the user wants to move from the specified voice quality to yet another voice quality, the user need only specify a voice quality within that appropriate range, without considering whether the voice quality of the converted feature data will break down, and the text indicated by the text data is converted into synthesized speech of the voice quality the user expects. As a result, usability is improved from the standpoint of the user interface.
[0021] The present invention can be realized not only as such a voice quality conversion device or speech synthesis device, but also as a method or program for the operations those devices perform, and as a storage medium that stores the program.
Effects of the Invention
[0022] The voice quality conversion device of the present invention has the effect of improving usability from the standpoint of the user interface.
Brief Description of the Drawings
[0023] [FIG. 1] FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
[FIG. 2] FIG. 2 is an explanatory diagram for describing an example of the operation of the above voice quality conversion device.
[FIG. 3] FIG. 3 is an explanatory diagram for describing another example of the operation of the above voice quality conversion device.
[FIG. 4] FIG. 4 is an explanatory diagram for describing yet another example of the operation of the above voice quality conversion device.
[FIG. 5] FIG. 5 is an explanatory diagram for describing yet another example of the operation of the above voice quality conversion device.
[FIG. 6] FIG. 6 is an explanatory diagram for describing yet another example of the operation of the above voice quality conversion device.
[FIG. 7] FIG. 7 is an explanatory diagram for describing the constraints imposed by the above Indigo algorithm.
[FIG. 8] FIG. 8 is a configuration diagram of a voice quality conversion device according to Embodiment 2 of the present invention.
[FIG. 9] FIG. 9 is an explanatory diagram for describing the content presented by the above voice quality adjustment unit.
[FIG. 10] FIG. 10 is a flowchart showing the operation of the above adjustment control unit.
[FIG. 11] FIG. 11 is an explanatory diagram for describing the content presented by a voice quality adjustment unit according to Modification 1 of the above embodiment.
[FIG. 12] FIG. 12 is an explanatory diagram for describing the content presented by a voice quality adjustment unit according to Modification 2 of the above embodiment.
[FIG. 13A] FIG. 13A is an explanatory diagram for describing the distances between voice qualities according to Modification 3 of the above embodiment.
[FIG. 13B] FIG. 13B is a diagram showing the display content of a voice quality adjustment unit according to Modification 3 of the above embodiment.
[FIG. 14A] FIG. 14A is a diagram showing the display content of a voice quality adjustment unit according to Modification 4 of the above embodiment.
[FIG. 14B] FIG. 14B is an explanatory diagram for describing how the voice quality adjustment unit according to Modification 4 of the above embodiment changes its display content.
[FIG. 15] FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
[FIG. 16] FIG. 16 is a configuration diagram of a speech synthesis device according to Modification 1 of the above embodiment.
[FIG. 17] FIG. 17 is a configuration diagram of a speech synthesis device according to Modification 2 of the above embodiment.
[FIG. 18] FIG. 18 is a configuration diagram of a speech synthesis device according to Modification 3 of the above embodiment.
[FIG. 19] FIG. 19 is a configuration diagram of a speech synthesis device according to Modification 4 of the above embodiment.
[FIG. 20] FIG. 20 is a configuration diagram of a speech synthesis device according to Modification 5 of the above embodiment.
Explanation of Reference Numerals
101 Conversion unit
103, 103a Voice quality adjustment unit
104, 104a to 104d Adjustment control unit
105 Conversion coefficient storage unit
105a Coefficient data
106 Limit value storage unit
201 Speech synthesis unit
202 Speech synthesis database
203 Waveform generation unit
204 Speaker
205 Feature table storage unit
206 Speech analysis unit
B Range bar
P Pointer
p1 Feature parameter sequence
p2 Modified feature parameter sequence
s1 Waveform signal
td1 Text data
td2 Waveform feature table
Best Mode for Carrying Out the Invention
[0025] Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1)
FIG. 1 is a configuration diagram of a voice quality conversion device according to Embodiment 1 of the present invention.
[0026] The voice quality conversion device of this embodiment converts voice quality while preventing the voice quality from breaking down, and comprises a conversion unit 101, a voice quality adjustment unit 103, an adjustment control unit 104, a conversion coefficient storage unit 105, and a limit value storage unit 106.
[0027] The conversion unit 101 acquires a feature parameter sequence p1 indicating the acoustic features of speech. The feature parameter sequence p1 is data that expresses, as parameters, the acoustic features obtained by analyzing the speech frame by frame; resynthesizing from it yields the original speech. In response to an instruction from the voice quality adjustment unit 103, the conversion unit 101 converts the acoustic feature parameters indicated by the feature parameter sequence p1 and thereby generates a modified feature parameter sequence p2. Like the feature parameter sequence p1, the modified feature parameter sequence p2 expresses the acoustic features of speech as parameters and is used to generate synthesized speech. The voice quality of the synthesized speech (speech waveform) generated from the modified feature parameter sequence p2 differs from that of the synthesized speech generated from the feature parameter sequence p1 according to the conversion performed by the conversion unit 101.
[0028] The conversion coefficient storage unit 105 holds coefficient data that serves as a template when the conversion unit 101 performs the conversion.
[0029] The voice quality adjustment unit 103 is operated by the user to receive the post-conversion voice quality that the user expects, and also receives from the adjustment control unit 104 instructions to change that voice quality. Using the coefficient data stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103 determines the conversion content that corresponds to the user's operation and to the instruction from the adjustment control unit 104, and instructs the conversion unit 101 to perform that conversion.
[0030] Specifically, for each type of voice quality, such as brightness or fast-talking, the voice quality adjustment unit 103 displays a range bar B indicating the convertible range of that voice quality and a pointer P that is movable on the range bar B and indicates the degree of conversion of that voice quality. The user sets the desired voice quality by moving the pointer P along the range bar B.
[0031] The limit value storage unit 106 stores the limit conditions (such as limit values of the parameters indicating each acoustic feature) under which the modified feature parameter sequence p2 still yields natural-sounding synthesized speech.
[0032] The adjustment control unit 104 acquires the feature parameter sequence p1 and also acquires the result of the user's operation on the voice quality adjustment unit 103.
[0033] The adjustment control unit 104 estimates the modified feature parameter sequence p2 from the feature parameter sequence p1 and the operation result, and compares the estimated sequence p2 with the limit conditions in the limit value storage unit 106. If the modified feature parameter sequence p2 does not satisfy the limit conditions, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to change the user's operation result so that the limit conditions are satisfied. In other words, the adjustment control unit 104 determines whether the conversion content set in the voice quality adjustment unit 103 would cause sound quality degradation (breakdown of voice quality) in the modified feature parameter sequence p2, and adjusts the conversion content so that no degradation occurs.
[0034] The processing performed when the voice quality conversion device of this embodiment converts voice quality is described in detail below.
FIG. 2 is an explanatory diagram for describing an example of the operation of the voice quality conversion device according to this embodiment.
[0035] The voice quality adjustment unit 103 receives the voice quality the user expects when the user operates a plurality of pointers P. For example, as shown in FIG. 2, the voice quality adjustment unit 103 presents four convertible voice qualities: brightness, darkness, masculinity, and fast-talking. The conversion range of each voice quality is indicated by a range bar B graduated from 0 to 10.
[0036] The user specifies the voice quality to convert and the amount of conversion by moving the pointer P for each voice quality within the 0 to 10 scale on the range bar B. When the scale value at the pointer P (the indicated value) is 0, the voice quality adjustment unit 103 determines that no conversion is required for that voice quality; the closer the indicated value is to 10, the larger the conversion it determines to be required for that voice quality.
[0037] In FIG. 2, brightness is labeled "明", darkness "暗", masculinity "男", and fast-talking "早". The voice quality adjustment unit 103 may also be implemented with volume switches or the like.
[0038] The feature parameter sequence p1 indicates, for each analysis frame, five adjustable acoustic feature parameters: fundamental frequency F0, first formant frequency F1, second formant frequency F2, frame duration FR, and sound source power PW.
[0039] The coefficient data 105a held in the conversion coefficient storage unit 105 indicates, for each voice quality of the voice quality adjustment unit 103, the value (coefficient) to be added to each of the above five acoustic feature parameters of the feature parameter sequence p1 when the indicated value increases by 1.
[0040] That is, as shown in FIG. 2, when all indicated values in the voice quality adjustment unit 103 are set to 0, the conversion unit 101 of the voice quality conversion device, on acquiring the feature parameter sequence p1, outputs a modified feature parameter sequence p2 identical to p1.
[0041] FIG. 3 is an explanatory diagram for describing another example of the operation of the voice quality conversion device according to this embodiment.
[0042] The user sets the indicated value for brightness in the voice quality adjustment unit 103 to 5 and the indicated value for fast-talking to 3.
The conversion unit 101 multiplies the coefficient of each acoustic feature for brightness in the coefficient data 105a by the indicated value for brightness (5). Likewise, it multiplies the coefficient of each acoustic feature for fast-talking in the coefficient data 105a by the indicated value for fast-talking (3). For each acoustic feature, the conversion unit 101 sums these products and adds the result to the corresponding value in the feature parameter sequence p1. In this way the conversion unit 101 generates the modified feature parameter sequence p2.
[0043] For example, since the fundamental frequency F0 of one analysis frame of the feature parameter sequence p1 is 300, the conversion unit 101 obtains the fundamental frequency F0 of the corresponding analysis frame of the modified feature parameter sequence p2 as 300 + 5 × (+5) + 3 × (+1) = 328. The other acoustic feature parameters are calculated in the same way.
[0044] FIG. 4 is an explanatory diagram for describing yet another example of the operation of the voice quality conversion device according to this embodiment.
[0045] From the state of the voice quality adjustment unit 103 shown in FIG. 3, the user sets the indicated value for darkness in the voice quality adjustment unit 103 to 7.
[0046] When this setting is made, the adjustment control unit 104 changes the indicated values set in the voice quality adjustment unit 103 to indicated values that are easier for the user to operate.
[0047] That is, as shown in the coefficient data 105a of the conversion coefficient storage unit 105, the coefficients of the acoustic features for darkness are the inverse of those for brightness. Consequently, first reducing the indicated value for brightness, instead of increasing the indicated value for darkness, has the same effect as increasing the indicated value for darkness. The adjustment control unit 104 therefore identifies this relationship between brightness and darkness from the coefficient data 105a and, when the indicated value for darkness is set to 7, instructs the voice quality adjustment unit 103 first to reduce the indicated value for brightness from 5 to 0 and then to reduce the indicated value for darkness from 7 to 2.
[0048] Specifically, in the case shown in FIG. 4, the indicated value for darkness is set to 7 while the indicated value for brightness is 5. Changing the indicated value for brightness from 5 to 0 has the same effect as increasing the indicated value for darkness by 5. The adjustment control unit 104 therefore determines that, instead of setting the indicated value for darkness to 7, it suffices to set the indicated value for brightness to 0 and the indicated value for darkness to 2. The adjustment control unit 104 then passes this determination to the voice quality adjustment unit 103 and has it change the indicated values set by the user.
[0049] In this way, the adjustment control unit 104 adjusts the indicated values that the user has set in the voice quality adjustment unit 103 so that each indicated value is as small as possible, which makes it possible to build an interface that is easy for the user to operate.
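The cancellation of opposing settings described above can be sketched as follows. This is only an illustration under stated assumptions: the text says the adjustment control unit 104 identifies the brightness/darkness relationship from the coefficient data 105a, whereas this sketch simply takes the opposing pair as a given argument, and the function name `cancel_opposites` is hypothetical.

```python
def cancel_opposites(settings, pairs=(("brightness", "darkness"),)):
    """Reduce both members of each opposing pair by their common amount."""
    out = dict(settings)
    for a, b in pairs:
        # The overlapping part of two opposite settings has no net effect.
        common = min(out.get(a, 0), out.get(b, 0))
        out[a] = out.get(a, 0) - common
        out[b] = out.get(b, 0) - common
    return out

# State of FIG. 3 (brightness 5, fast-talking 3) after darkness is set to 7.
print(cancel_opposites({"brightness": 5, "darkness": 7, "fast": 3}))
# {'brightness': 0, 'darkness': 2, 'fast': 3}
```

Because the coefficients for darkness are the inverse of those for brightness, the modified feature parameter sequence is the same before and after this adjustment; only the displayed indicated values become smaller.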
[0050] FIG. 5 is an explanatory diagram for describing yet another example of the operation of the voice quality conversion device according to this embodiment.
[0051] The user sets, for example, the indicated value for brightness to 10. In one analysis frame, the feature parameter sequence p1 indicates fundamental frequency F0 = 300, first formant frequency F1 = 500, second formant frequency F2 = 1600, frame duration FR = 50, and sound source power PW = 30. In this case, as described above, the conversion unit 101 outputs, for that analysis frame, a modified feature parameter sequence p2 indicating fundamental frequency F0 = 300 + 10 × (+5) = 350, first formant frequency F1 = 500 + 10 × (+2) = 520, second formant frequency F2 = 1600 + 10 × (+1) = 1610, frame duration FR = 50 + 10 × (−1) = 40, and sound source power PW = 30 + 10 × (+1) = 40.
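The calculation performed by the conversion unit 101 can be sketched as follows. The brightness coefficients and the F0 coefficient for fast-talking are the ones that appear in the worked examples of this description; the remaining fast-talking coefficients are not given in the text, so this sketch omits them and treats them as 0, which is an assumption made purely for illustration. The names `COEFFS` and `convert_frame` are hypothetical.

```python
# Amount added to each acoustic feature per unit of indicated value.
COEFFS = {
    "brightness": {"F0": 5, "F1": 2, "F2": 1, "FR": -1, "PW": 1},
    "fast": {"F0": 1},  # only the F0 coefficient is stated in the text
}

def convert_frame(frame, settings, coeffs=COEFFS):
    """Return one frame of p2: p1 plus the sum of (indicated value x coefficient)."""
    out = dict(frame)
    for quality, value in settings.items():
        for feature, coeff in coeffs.get(quality, {}).items():
            out[feature] += value * coeff
    return out

frame = {"F0": 300, "F1": 500, "F2": 1600, "FR": 50, "PW": 30}

print(convert_frame(frame, {"brightness": 10}))
# {'F0': 350, 'F1': 520, 'F2': 1610, 'FR': 40, 'PW': 40}
print(convert_frame(frame, {"brightness": 5, "fast": 3})["F0"])
# 328
```

With every indicated value at 0 the function returns the frame unchanged, which matches the behavior described for FIG. 2.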
[0052] Here, the limit value storage unit 106 stores a limit condition indicating that the maximum value of the fundamental frequency F0 is 350. That is, the limit condition indicates that if the fundamental frequency F0 of the modified feature parameter sequence p2 exceeds 350, the sound quality of the synthesized speech generated from that sequence degrades markedly.
[0053] 図 6は、本実施の形態における声質変換装置の動作のさらに他の例を説明するた めの説明図である。  FIG. 6 is an explanatory diagram for explaining still another example of the operation of the voice quality conversion device according to the present embodiment.
[0054] ユーザは、図 5に示す声質調整部 103の状態、つまり明るさの指示値が 10であつ て他の指示値が全て 0である状態から、さらに早口の指示値を 5に設定する。  From the state of voice quality adjustment section 103 shown in FIG. 5, that is, the state in which the brightness instruction value is 10 and all other instruction values are 0, the user further sets the instruction value of the fast-talk to 5 .
[0055] 調整制御部 104は、声質調整部 103に対してユーザにより設定された指示値に従 つた変換処理が特徴パラメタ列 piに対して行われたときの変形特徴パラメタ列 p2を 推定する。調整制御部 104は、その推定した変形特徴パラメタ列 p2の各音響的特徴 のパラメタが限界値格納部 106の限界条件を満たしているカゝ否かを判断する。調整 制御部 104は、推定した変形特徴パラメタ列 p2のうち少なくとも 1つのパラメタが限界 条件を満たしていないときには、そのパラメタが限界条件を満たすように、声質調整 部 103に対して指示値の変更の指示を行う。このとき、調整制御部 104は、例えば、 ユーザによって最近設定された指示値を優先して変更するように指示したり、最も大 きな指示値を優先して変更するように指示したりする。  [0055] Adjustment control section 104 estimates modified feature parameter sequence p2 when conversion processing according to the instruction value set by the user for voice quality adjustment portion 103 is performed on feature parameter sequence pi. The adjustment control unit 104 determines whether or not the parameters of each acoustic feature of the estimated deformation feature parameter sequence p2 satisfy the limit condition of the limit value storage unit 106. When at least one parameter of the estimated deformation feature parameter sequence p2 does not satisfy the limit condition, the adjustment control unit 104 controls the voice quality adjustment unit 103 to change the indicated value so that the parameter satisfies the limit condition. Make instructions. At this time, for example, the adjustment control unit 104 gives an instruction to give priority to the instruction value recently set by the user, or gives an instruction to give priority to the largest instruction value.
[0056] 具体的に、図 6に示すように、早口の指示値が 5だけ増加されると、調整制御部 10 4は、この場合の変形特徴パラメタ列 p2を推定して、その変形特徴パラメタ列 p2の基 本周波数 F0 (355)が限界条件 (350以下)を満たさないと判断する。その結果、調 整制御部 104は、明るさの指示値を 1だけ減らすように声質調整部 103に指示するこ とで、変形特徴パラメタ列 p2の基本周波数 F0の値を 350以下に収めようとする。  Specifically, as shown in FIG. 6, when the indicated value of the fast-talk is increased by 5, the adjustment control unit 104 estimates the deformed feature parameter sequence p2 in this case, and It is determined that the fundamental frequency F0 (355) in column p2 does not satisfy the limit condition (350 or less). As a result, the adjustment control unit 104 instructs the voice quality adjustment unit 103 to reduce the brightness instruction value by 1 so that the value of the fundamental frequency F0 of the deformed feature parameter sequence p2 is set to 350 or less. I do.
[0057] その結果、声質調整部 103は、明るさの指示値を 10から 9に変更する。このように、 調整制御部 104が指示値を限界条件に応じて調整することで、ユーザは各音響的 特徴のパラメタの限界値を意識することなぐ声質の破綻が生じないように声質変換 の操作を行うことができる。 As a result, voice quality adjusting section 103 changes the indicated value of brightness from 10 to 9. in this way, The adjustment control unit 104 adjusts the indicated value according to the limit condition, so that the user can perform the voice conversion operation so that the voice quality does not break down without being aware of the limit value of the parameter of each acoustic feature. it can.
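The adjustment in this example can be sketched as a small loop, assuming the "largest instruction value first" policy mentioned above. The names are illustrative, and only the F0 limit is modeled. The fast-talking coefficient of +1 for F0 is the one that reproduces the quoted estimate 355 = 300 + 10 × 5 + 5 × 1.

```python
# F0 change per unit of each instruction value (assumed layout; values from the text).
F0_COEFFS = {"brightness": 5, "fast_talk": 1}
F0_MAX = 350  # limit condition held in the limit value storage unit


def estimate_f0(input_f0, settings):
    """Estimate the F0 of p2 for the given instruction values."""
    return input_f0 + sum(F0_COEFFS[name] * value for name, value in settings.items())


def adjust_settings(input_f0, settings):
    """Lower the largest instruction value one step at a time until F0 fits the limit."""
    adjusted = dict(settings)
    while estimate_f0(input_f0, adjusted) > F0_MAX:
        largest = max(adjusted, key=adjusted.get)
        if adjusted[largest] <= 0:
            break  # nothing left to reduce
        adjusted[largest] -= 1
    return adjusted


requested = {"brightness": 10, "fast_talk": 5}
print(estimate_f0(300, requested))      # estimated F0 before adjustment
print(adjust_settings(300, requested))  # settings after adjustment
```

A policy that prefers the most recently set value would instead reduce `fast_talk`; the specification lists both policies as options.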
[0058] The adjustment control unit 104 may refer to the limit conditions stored in the limit value storage unit 106 as needed. A limit condition may specify a limit value for an individual acoustic feature parameter, for example that the fundamental frequency F0 must not exceed 350, or it may specify a condition over several parameters, for example that the sum of the fundamental frequency F0 and the second formant frequency F2 must not exceed 2000.
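Both kinds of limit condition can be represented uniformly as predicates over a frame. The following is a minimal sketch under that assumption; the list structure and function name are hypothetical, and the two conditions are the examples given in the text.

```python
# Each limit condition is a (label, predicate) pair evaluated on one frame of p2.
LIMIT_CONDITIONS = [
    ("F0 <= 350", lambda f: f["F0"] <= 350),                  # single-parameter limit
    ("F0 + F2 <= 2000", lambda f: f["F0"] + f["F2"] <= 2000),  # combined limit
]


def violated_conditions(frame):
    """Return the labels of all limit conditions the frame does not satisfy."""
    return [label for label, check in LIMIT_CONDITIONS if not check(frame)]


print(violated_conditions({"F0": 350, "F2": 1610}))
print(violated_conditions({"F0": 355, "F2": 1650}))
```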
[0059] The conversion that the conversion unit 101 applies to the feature parameter sequence p1 need not be uniform across all analysis frames; the coefficient data 105a in the conversion coefficient storage unit 105 may differ from one analysis frame to another.
[0060] The adjustment of instruction values by the adjustment control unit 104 may be performed automatically using a constraint satisfaction algorithm. One such algorithm is the Indigo algorithm (see A. Borning, R. Anderson, B. Freeman-Benson: The Indigo Algorithm, TR 96-05-01, Department of Computer Science and Engineering, University of Washington, July 1996).
[0061] FIG. 7 is an explanatory diagram illustrating the constraints used with the Indigo algorithm.
The constraints shown in FIG. 7 are for performing the instruction value adjustment of FIG. 6 on the fundamental frequency F0. Written as an Indigo constraint hierarchy, they are as follows.
[0062] REQUIRED constraint C1: output F0 ≤ 350
REQUIRED constraint C2: input F0 = 300
REQUIRED constraint C3: brightness × 5 = t1
REQUIRED constraint C4: darkness × (−5) = t2
REQUIRED constraint C5: masculinity × (−3) = t3
REQUIRED constraint C6: fast-talking × 1 = t4
REQUIRED constraint C7: t1 + t2 = t5
REQUIRED constraint C8: t3 + t4 = t6
REQUIRED constraint C9: t5 + t6 = t7
REQUIRED constraint C10: input F0 + t7 = t8
REQUIRED constraint C11: t8 = output F0
STRONG constraint C12: fast-talking = 5
WEAK constraint C13: masculinity = 0
WEAK constraint C14: darkness = 0
WEAK constraint C15: brightness = 10
The variables t1 to t8 hold intermediate results of the calculation. Although omitted from FIG. 7 for simplicity, it is desirable, in order to obtain better results, to also provide REQUIRED constraints that bind each instruction value to a value from 0 to 10 inclusive.
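For this particular hierarchy, the effect of solving the constraints can be sketched as a backward propagation of upper bounds from the output F0 to the brightness value. The sketch below is not the Indigo algorithm itself: it hard-codes the chain C1 to C15, treats the STRONG and WEAK values for fast-talking, masculinity, and darkness as fixed (they do not conflict with anything here), and relaxes only the WEAK constraint C15. The function names are illustrative assumptions.

```python
import math


def brightness_upper_bound(input_f0=300, output_f0_max=350,
                           fast_talk=5, masculinity=0, darkness=0):
    """Propagate upper bounds back from output F0 to the brightness value."""
    t8_hi = output_f0_max       # C1 and C11: t8 = output F0 <= 350
    t7_hi = t8_hi - input_f0    # C2 and C10: input F0 + t7 = t8
    t4 = fast_talk * 1          # C12 and C6
    t3 = masculinity * -3       # C13 and C5
    t6 = t3 + t4                # C8
    t5_hi = t7_hi - t6          # C9: t5 + t6 = t7
    t2 = darkness * -5          # C14 and C4
    t1_hi = t5_hi - t2          # C7: t1 + t2 = t5
    return t1_hi / 5            # C3: brightness * 5 = t1


def solve_brightness(desired=10, **fixed):
    """Satisfy WEAK constraint C15 as closely as the REQUIRED constraints allow."""
    return min(desired, math.floor(brightness_upper_bound(**fixed)))


print(solve_brightness())             # fast-talking fixed at 5
print(solve_brightness(fast_talk=0))  # no fast-talking requested
```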
An outline of the processing when the above constraints are solved with the Indigo algorithm is shown below.
Initial state: the range of every variable is [−∞, +∞]
Add C1: the range of output F0 becomes [−∞, 350]
Add C2: the range of input F0 becomes [300, 300]
Add C3 to C10: no change in the range of any variable
Add C11: the range of t8 becomes [−∞, 350]
    Propagate through C10: the range of t7 becomes [−∞, 50]
Add C12: the range of fast-talking becomes [5, 5]
    Propagate through C6: the range of t4 becomes [5, 5]
Add C13: the range of masculinity becomes [0, 0]
    Propagate through C5: the range of t3 becomes [0, 0]
    Propagate through C8: the range of t6 becomes [5, 5]
    Propagate through C9: the range of t5 becomes [−∞, 45]
Add C14: the range of darkness becomes [0, 0]
    Propagate through C4: the range of t2 becomes [0, 0]
    Propagate through C7: the range of t1 becomes [−∞, 45]
    Propagate through C3: the range of brightness becomes [−∞, 9]
Add C15: the range of brightness becomes [9, 9]
[0064] (Embodiment 2)
FIG. 8 is a configuration diagram of the voice quality conversion device according to Embodiment 2 of the present invention.
[0065] The voice quality conversion device of the present embodiment improves usability from the standpoint of the user interface, and comprises a conversion unit 101, a voice quality adjustment unit 103a, an adjustment control unit 104a, a conversion coefficient storage unit 105, and a limit value storage unit 106. In the present embodiment, components bearing the same reference numerals as in Embodiment 1 are identical to those of Embodiment 1, and their detailed description is omitted.
[0066] The voice quality adjustment unit 103a is operated by the user and thereby accepts the post-conversion voice quality that the user expects. That is, the voice quality adjustment unit 103a functions as accepting means for accepting the voice quality specified by the user. Using the coefficient data 105a stored in the conversion coefficient storage unit 105, the voice quality adjustment unit 103a also determines the conversion that corresponds to the user's operation and instructs the conversion unit 101 to perform it. Specifically, like the voice quality adjustment unit 103 of Embodiment 1, the voice quality adjustment unit 103a displays, for each type of voice quality (for example, brightness or fast-talking), a range bar B indicating the convertible range of that voice quality (the absolute range) and a pointer P that is movable along the range bar B and indicates the degree of that voice quality. The user sets the desired voice quality by moving the pointer P along the range bar B. By displaying the range bar B and the pointer P, the voice quality adjustment unit 103a also functions as presenting means for presenting the range over which the voice quality can be converted further from its current degree of conversion (the relative range).
[0067] The voice quality adjustment unit 103a of the present embodiment also accepts, from the adjustment control unit 104a, an instruction specifying the conversion range for each voice quality, and presents only the specified conversion range to the user. That is, the voice quality adjustment unit 103a changes the length of the range bar B to the length corresponding to the conversion range specified by the adjustment control unit 104a, and prohibits the pointer P from being moved off the range bar B.
[0068] The adjustment control unit 104a acquires the feature parameter sequence p1, the result of the user's operation on the voice quality adjustment unit 103a, and the limit conditions in the limit value storage unit 106. Based on the feature parameter sequence p1, the operation result, and the limit conditions, the adjustment control unit 104a derives a proper conversion range for each voice quality in the voice quality adjustment unit 103a, and specifies the derived conversion range to the voice quality adjustment unit 103a. That is, the adjustment control unit 104a functions as range changing means that, according to the feature parameter sequence p1 and the voice quality accepted from the user by the voice quality adjustment unit 103a, changes the range presented by the voice quality adjustment unit 103a to a proper range in which the voice quality indicated by the modified feature parameter sequence p2 does not break down.
[0069] FIG. 9 is an explanatory diagram illustrating what the voice quality adjustment unit 103a of the present embodiment presents.
[0070] For example, as shown in FIG. 9(a), the user first sets the pointers P of all voice qualities in the voice quality adjustment unit 103a so that every instruction value is 0. The user then sets the pointer P for brightness so that its instruction value is 10.
[0071] Here, as described with reference to FIG. 5, the adjustment control unit 104a estimates that the setting shown in FIG. 9(a) will produce a modified feature parameter sequence p2 whose fundamental frequency F0 is 350. The adjustment control unit 104a then derives, for the voice qualities other than brightness, proper conversion ranges within which the limit conditions in the limit value storage unit 106 are satisfied. For example, the estimated sequence p2 indicates a fundamental frequency F0 = 350, a first formant frequency F1 = 520, a second formant frequency F2 = 1610, a frame duration FR = 40, and a sound source power PW = 40. The limit conditions specify, for the sequence p2, a fundamental frequency F0 of 350 or less, a first formant frequency F1 of 600 or less, a second formant frequency F2 of 1700 or less, a frame duration FR of 100, and a sound source power PW of 50 or less. The adjustment control unit 104a compares each parameter of p2 with the limit conditions and determines that the fundamental frequency F0 cannot be increased any further. That is, the adjustment control unit 104a determines that the conversion range for fast-talking is limited to 0 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0072] As a result, as shown in FIG. 9(b), the voice quality adjustment unit 103a shortens the range bar B for fast-talking from a length of 10 scale divisions to a length of 0 scale divisions and displays it. Because the fast-talking range bar B shrinks to 0 scale divisions as soon as the brightness instruction value is set to 10, the user cannot move the fast-talking pointer P. This prevents the speech degradation, that is, the breakdown of voice quality, that increasing the fast-talking instruction value would cause.
[0073] When the user adjusts the pointer P for brightness in the voice quality adjustment unit 103a and changes its instruction value from 10 to 9, the adjustment control unit 104a again derives, in the same way as above, proper conversion ranges for the voice qualities other than brightness such that the limit condition in the limit value storage unit 106 (a fundamental frequency F0 of p2 of 350 or less) is satisfied. That is, the adjustment control unit 104a determines that the conversion range for fast-talking is limited to 5 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0074] As a result, as shown in FIG. 9(c), the voice quality adjustment unit 103a lengthens the range bar B for fast-talking to 5 scale divisions, that is, to the length covering scale marks 0 to 5, and displays it.
[0075] FIG. 10 is a flowchart showing the operation of the adjustment control unit 104a in the present embodiment.
First, the adjustment control unit 104a acquires the feature parameter sequence p1 (step S100) and identifies the settings that the user has made in the voice quality adjustment unit 103a (step S102).
[0076] Next, the adjustment control unit 104a estimates the modified feature parameter sequence p2 based on the feature parameter sequence p1 and the settings of the voice quality adjustment unit 103a (step S104). Based on the estimated sequence p2 and the limit conditions in the limit value storage unit 106, the adjustment control unit 104a derives a proper conversion range for each voice quality of the voice quality adjustment unit 103a (step S106).
[0077] The adjustment control unit 104a then specifies the derived conversion range to the voice quality adjustment unit 103a, causing it to display a range bar B whose length corresponds to that conversion range (step S108).
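Steps S100 to S106 can be sketched as follows for a single analysis frame. This is an illustrative sketch, not the claimed implementation: the names are hypothetical, only upper limits stated as "or less" are modeled (the FR limit of 100 is left out because the text does not say how it bounds FR), and for fast-talking only its F0 coefficient (+1, from constraint C6) is known, so its range here reflects the F0 limit alone.

```python
def estimate_p2(p1, settings, coeffs):
    """Step S104: apply every instruction value to the input frame."""
    p2 = dict(p1)
    for quality, value in settings.items():
        for feature, c in coeffs[quality].items():
            p2[feature] += value * c
    return p2


def derive_ranges(p2, settings, coeffs, upper_limits, scale_max=10):
    """Step S106: how many more scale divisions each voice quality may be raised."""
    ranges = {}
    for quality, row in coeffs.items():
        room = scale_max - settings[quality]
        for feature, c in row.items():
            # Only features that this quality increases can hit an upper limit.
            if c > 0 and feature in upper_limits:
                room = min(room, (upper_limits[feature] - p2[feature]) // c)
        ranges[quality] = max(room, 0)
    return ranges


P1 = {"F0": 300, "F1": 500, "F2": 1600, "FR": 50, "PW": 30}  # step S100
COEFFS = {
    "brightness": {"F0": 5, "F1": 2, "F2": 1, "FR": -1, "PW": 1},
    "fast_talk": {"F0": 1},  # partial: other coefficients are not given in the text
}
UPPER_LIMITS = {"F0": 350, "F1": 600, "F2": 1700, "PW": 50}


def ranges_for(settings):
    """Steps S102 to S106 for one user setting."""
    p2 = estimate_p2(P1, settings, COEFFS)
    return derive_ranges(p2, settings, COEFFS, UPPER_LIMITS)


print(ranges_for({"brightness": 10, "fast_talk": 0}))
print(ranges_for({"brightness": 9, "fast_talk": 0}))
```

With brightness at 10 the fast-talking range is 0 divisions, and with brightness at 9 it is 5 divisions, matching FIG. 9(b) and FIG. 9(c). Step S108 would pass these ranges to the display.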
[0078] Thus, in the present embodiment, the convertible range of voice quality presented by the voice quality adjustment unit 103a is changed to a proper range according to the feature parameter sequence p1 and the voice quality specified by the user. Consequently, when the user wishes to specify a further voice quality in addition to the one already specified, the user need only specify it within the proper range, without having to consider whether the voice quality of the modified feature parameter sequence p2 will break down, and a modified feature parameter sequence exhibiting the voice quality the user expects can be generated. Usability is thereby improved from the standpoint of the user interface.
[0079] (Modification 1) A first modification relating to the display method of the voice quality adjustment unit 103a in the present embodiment is described here.
[0080] The voice quality adjustment unit 103a according to this modification does not change the length of the range bar B; instead, it changes the position of the pointer P so that the pointer P can be moved only by the conversion range specified by the adjustment control unit 104a.
[0081] FIG. 11 is an explanatory diagram illustrating what the voice quality adjustment unit 103a according to this modification presents.
[0082] For example, as shown in FIG. 11(a), the user first sets the pointers P of all voice qualities in the voice quality adjustment unit 103a so that every instruction value is 0. The user then sets the pointer P for brightness so that its instruction value is 10.
[0083] Here, as above, the adjustment control unit 104a determines that the conversion range for fast-talking is limited to 0 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0084] On receiving this instruction, the voice quality adjustment unit 103a moves the pointer P for fast-talking to scale mark 10 and displays it there, as shown in FIG. 11(b). The instruction from the adjustment control unit 104a states that the conversion range for fast-talking is limited to 0 scale divisions, which means that the pointer P cannot be moved in the increasing direction. The voice quality adjustment unit 103a of this modification therefore moves the pointer P to a position from which it cannot be moved in the increasing direction, namely scale mark 10, and displays it there. Here, however, the voice quality adjustment unit 103a has merely moved the pointer P for fast-talking; it does not instruct the conversion unit 101 to convert the acoustic feature parameters with a fast-talking instruction value of 10. Because the fast-talking pointer P is displayed at scale mark 10 (the maximum) as soon as the brightness instruction value is set to 10, the speech degradation, that is, the breakdown of voice quality, that increasing the fast-talking instruction value would cause is prevented.
[0085] When the user adjusts the pointer P for brightness in the voice quality adjustment unit 103a and changes its instruction value from 10 to 9, the adjustment control unit 104a again determines, in the same way as above, that the conversion range for fast-talking is limited to 5 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0086] On receiving this instruction, the voice quality adjustment unit 103a moves the pointer P for fast-talking to scale mark 5 and displays it there, as shown in FIG. 11(c). The instruction from the adjustment control unit 104a states that the conversion range for fast-talking is limited to 5 scale divisions, which means that the pointer P can be moved by 5 scale divisions in the increasing direction. The voice quality adjustment unit 103a of this modification therefore moves the pointer P to a position from which it can be moved by exactly 5 scale divisions in the increasing direction, namely scale mark 5, and displays it there. In this case too, the voice quality adjustment unit 103a has merely moved the pointer P for fast-talking; it does not instruct the conversion unit 101 to convert the acoustic feature parameters with a fast-talking instruction value of 5.
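The pointer placement of this modification amounts to leaving exactly the permitted number of divisions above the pointer on a fixed 10-division bar. A minimal sketch, assuming a hypothetical function name and a scale maximum of 10 as in the figures:

```python
SCALE_MAX = 10  # length of the range bar B in scale divisions


def displayed_pointer_position(allowed_increase, scale_max=SCALE_MAX):
    """Place the pointer so that exactly `allowed_increase` divisions remain above it."""
    return scale_max - allowed_increase


print(displayed_pointer_position(0))  # FIG. 11(b): pointer at the top of the bar
print(displayed_pointer_position(5))  # FIG. 11(c): pointer at the middle of the bar
```

Note that the value returned here is only a display position; as stated above, it is not passed to the conversion unit 101 as a fast-talking instruction value.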
[0087] (Modification 2)
A second modification relating to the display method of the voice quality adjustment unit 103a in the present embodiment is described here.
[0088] The voice quality adjustment unit 103a according to this modification does not change the length of the range bar B; instead, it indicates the range over which the pointer P can be moved by displaying text.
[0089] FIG. 12 is an explanatory diagram illustrating what the voice quality adjustment unit 103a according to this modification presents.
[0090] For example, as shown in FIG. 12(a), the user first sets the pointers P of all voice qualities in the voice quality adjustment unit 103a so that every instruction value is 0. The user then sets the pointer P for brightness so that its instruction value is 10.
[0091] Here, as above, the adjustment control unit 104a determines that the conversion range for fast-talking is limited to 0 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0092] On receiving this instruction, the voice quality adjustment unit 103a displays the text "Up to here" at scale mark 0 of the range bar B for fast-talking, as shown in FIG. 12(b). While this text is displayed, even if the user tries to move the pointer P for fast-talking, the voice quality adjustment unit 103a does not accept the operation and keeps the pointer P where it is.
[0093] When the user adjusts the pointer P for brightness in the voice quality adjustment unit 103a and changes its instruction value from 10 to 9, the adjustment control unit 104a again determines, in the same way as above, that the conversion range for fast-talking is limited to 5 scale divisions, and notifies the voice quality adjustment unit 103a of that determination.
[0094] On receiving this instruction, the voice quality adjustment unit 103a displays the text "Up to here" at scale mark 5 of the range bar B for fast-talking, as shown in FIG. 12(c). While this text is displayed, even if the user tries to move the pointer P for fast-talking beyond scale mark 5, the voice quality adjustment unit 103a does not accept the operation and holds the pointer P at scale mark 5 or below.
[0095] Although this modification displays the text "Up to here", other text, graphics, or the like may be displayed as long as they indicate the range over which the pointer P can be moved.
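The behavior of this modification can be sketched as a marker placed at the limit plus a move handler that rejects out-of-range requests. The function names and the dictionary returned for the marker are illustrative assumptions.

```python
def limit_marker(limit, text="Up to here"):
    """Describe the marker shown on the range bar at the permitted limit."""
    return {"scale_mark": limit, "text": text}


def move_pointer(current, requested, limit):
    """Return the pointer position after a move request.

    Requests outside 0..limit are not accepted, so the pointer stays put.
    """
    if 0 <= requested <= limit:
        return requested
    return current


print(limit_marker(5))
print(move_pointer(current=0, requested=3, limit=0))  # FIG. 12(b): pointer is fixed
print(move_pointer(current=0, requested=3, limit=5))  # FIG. 12(c): move accepted
print(move_pointer(current=3, requested=7, limit=5))  # move past the marker rejected
```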
[0096] (変形例 3)  [0096] (Modification 3)
本実施の形態における声質調整部 103aの範囲バー B及びポインタ Pの配置に関 する変形例を説明する。  A modification example regarding the arrangement of the range bar B and the pointer P of the voice quality adjustment unit 103a according to the present embodiment will be described.
[0097] 本変形例に係る声質調整部 103aは、各声質に対応する範囲バー B及びポインタ P を、声質の変化内容が類似するものほど近付くように配置してユーザに提示する。 [0097] The voice quality adjustment unit 103a according to the present modification arranges the range bar B and the pointer P corresponding to each voice quality such that the closer the change in the voice quality, the closer to each other, and presents it to the user.
[0098] 声質調整部 103は、変換係数格納部 105に格納されている係数データ 105aを取 得すると、その係数データ 105aに基づいて、明るさや暗さなどの各声質間の変化内 容の類似度を特定する。例えば、声質調整部 103は、係数データ 105aに示される 声質間において音響的特徴ごとに係数の差分値を導出し、その差分値から声質間 のユークリッド距離 (以下、単に距離という)を求める。声質調整部 103は、この距離 に基づ!/、て声質間の類似度を特定する。 [0098] The voice quality adjustment unit 103 obtains the coefficient data 105a stored in the transform coefficient storage unit 105, and based on the coefficient data 105a, determines the similarity of the change content between the voice qualities such as brightness and darkness. Specify the degree. For example, the voice quality adjustment unit 103 derives a difference value of a coefficient for each acoustic feature between voice qualities indicated by the coefficient data 105a, and obtains a Euclidean distance (hereinafter simply referred to as a distance) between voice qualities from the difference value. Based on this distance, voice quality adjusting section 103 specifies the similarity between voice qualities.
[0099] FIG. 13A is an explanatory diagram illustrating the distances between voice qualities.
As shown in FIG. 13A, the voice quality adjustment unit 103a calculates the distance between each pair of voice qualities. For example, the distance between the voice quality indicating masculinity and the voice quality indicating darkness is 5.4, and the distance between the voice quality indicating brightness and the voice quality indicating darkness is 11.3.
[0100] The voice quality adjustment unit 103a judges that the smaller the distance calculated in this way, the more similar the two voice qualities are, and presents the range bars B and pointers P so that those of similar voice qualities are closer together.
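The distance calculation described above can be sketched as follows. This is only an illustration: the layout of the coefficient data 105a (one coefficient per acoustic feature for each voice quality), the name `coefficient_data`, and all numeric values are assumptions made for the example and are not taken from the specification.

```python
import math

# Hypothetical coefficient data 105a: one conversion coefficient per
# acoustic feature for each voice quality (values are invented).
coefficient_data = {
    "masculinity": [3.0, 0.0, 1.0],
    "darkness": [0.0, 4.0, 1.0],
}


def voice_quality_distance(coeffs_a, coeffs_b):
    """Euclidean distance built from per-feature coefficient differences."""
    diffs = [a - b for a, b in zip(coeffs_a, coeffs_b)]
    return math.sqrt(sum(d * d for d in diffs))


distance = voice_quality_distance(
    coefficient_data["masculinity"], coefficient_data["darkness"]
)
```

A smaller value of `distance` corresponds to voice qualities that are treated as more similar.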
[0101] FIG. 13B is a diagram showing the display content of the voice quality adjustment unit 103a.
For example, taking the voice quality indicating masculinity as the reference, its distance to the voice quality indicating darkness is 5.4, its distance to the voice quality indicating brightness is 10.2, and its distance to the voice quality indicating fast-talking is 10.8. Accordingly, as shown in FIG. 13B, the voice quality adjustment unit 103 presents the range bars B and pointers P of the voice qualities in the order masculinity, darkness, brightness, fast-talking.
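The ordering in this example amounts to sorting the other voice qualities by their distance from the reference. The sketch below uses the three distances quoted above; the function name `display_order` is hypothetical, and the specification does not say how the order is derived when no single reference is chosen.

```python
# Distances from the reference voice quality (masculinity), as quoted
# in the example above.
distances_from_masculinity = {
    "brightness": 10.2,
    "fast-talking": 10.8,
    "darkness": 5.4,
}


def display_order(reference, distances):
    """Reference first, then the other voice qualities from nearest to farthest."""
    return [reference] + sorted(distances, key=distances.get)


order = display_order("masculinity", distances_from_masculinity)
```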
[0102] When combined with Modification 1, this modification can provide a voice quality conversion operation that is intuitive and easy for the user to understand.
[0103] That is, in this example, range bars B and pointers P whose voice quality changes are similar are placed near each other. When the user operates the pointer P of one voice quality, the nearer the pointer P of another voice quality is placed, the more it is drawn along in the same direction, and the farther away it is placed, the more it is drawn in the opposite direction. The user can therefore grasp intuitively how the voice quality will be converted by operating a pointer P.
[0104] (Modification 4)
Another modification concerning the arrangement of the range bars B and pointers P of the voice quality adjustment unit 103a in the present embodiment will now be described.
[0105] The voice quality adjustment unit 103a of Modification 3 arranges the range bars B and pointers P for the voice qualities in a row so that those whose changes are more similar are closer together. In contrast, the voice quality adjustment unit 103a according to this modification arranges the range bars B and pointers P along a common circumference so that the more similar the changes of two voice qualities, the smaller the angle between them.
[0106] FIG. 14A is a diagram showing the display content of the voice quality adjustment unit 103a.
As shown in FIG. 14A, the voice quality adjustment unit 103a brings the lower ends of the range bars B of the voice qualities together at a single point and displays the range bars B along a common circumference.
[0107] As shown in FIG. 13A, the distance between the voice quality indicating masculinity and the voice quality indicating darkness is 5.4, the distance between masculinity and brightness is 10.2, and the distance between masculinity and fast-talking is 10.8. Accordingly, as shown in FIG. 14A, the voice quality adjustment unit 103 presents the range bars B so that the angle between the masculinity range bar B and the darkness range bar B is the smallest and the angle between the masculinity range bar B and the fast-talking range bar B is the largest.
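One way to obtain such angles is to scale each distance from the reference linearly onto an angular span. The text only states which angle is smallest and which is largest, so the linear mapping, the 180-degree span, and the names `bar_angles` and `max_angle_deg` are all assumptions of this sketch. It also considers only distances from the reference, not the distances between the other voice qualities.

```python
def bar_angles(reference, distances, max_angle_deg=180.0):
    """Return an angle in degrees for each range bar, with the reference at 0.

    The farthest voice quality is placed at max_angle_deg and the others
    are placed proportionally to their distance (an assumed mapping).
    """
    farthest = max(distances.values())
    angles = {reference: 0.0}
    for name, dist in distances.items():
        angles[name] = max_angle_deg * dist / farthest
    return angles


angles = bar_angles(
    "masculinity",
    {"darkness": 5.4, "brightness": 10.2, "fast-talking": 10.8},
)
```

With these inputs the darkness bar lands closest to the masculinity bar and the fast-talking bar farthest from it, matching the ordering described for FIG. 14A.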
[0108] The voice quality adjustment unit 103a according to this modification may also incorporate the display method of Modification 1, that is, the function of changing the position of a pointer P based on an instruction from the adjustment control unit 104a.
[0109] FIG. 14B is an explanatory diagram illustrating how the voice quality adjustment unit 103a changes the display content.
[0110] For example, when the user moves the pointer P of the voice quality indicating darkness in the direction of increasing scale values, the voice quality adjustment unit 103a, based on an instruction from the adjustment control unit 104a, moves the pointer P of the voice quality indicating masculinity in the increasing direction and moves the pointers P of the voice qualities indicating brightness and fast-talking in the decreasing direction.
[0111] In this way, as the darkness pointer P moves, the other pointers P appear to be drawn along in similar directions. This makes it possible to provide a voice quality conversion interface that is intuitive and easy for the user to understand.
[0112] (Embodiment 3)
FIG. 15 is a configuration diagram of a speech synthesis device according to Embodiment 3 of the present invention.
[0113] This speech synthesis device can acquire text data and synthesize speech in various voice qualities. It comprises the voice quality conversion device of Embodiment 2, a speech synthesis unit 201, a speech synthesis database 202, a waveform generation unit 203, and a speaker 204.
[0114] The speech synthesis database 202 stores segment data representing a plurality of speech segments. Upon acquiring text data td1 in response to a user operation, the speech synthesis unit 201 selects from the speech synthesis database 202 the segment data corresponding to the text indicated by the text data td1. The speech synthesis unit 201 then generates a feature parameter sequence p1 using the selected segment data and outputs the feature parameter sequence p1 to the voice quality conversion device.
[0115] As described above, upon acquiring the feature parameter sequence p1, the voice quality conversion device converts the voice quality represented by the feature parameter sequence p1. The voice quality conversion device then generates and outputs a modified feature parameter sequence p2 representing the result of the conversion.
[0116] Upon acquiring the modified feature parameter sequence p2 from the voice quality conversion device, the waveform generation unit 203 generates a waveform signal s1 that represents the modified feature parameter sequence p2 as a speech waveform, and outputs the waveform signal s1 to the speaker 204. The speaker 204 outputs synthesized speech corresponding to the waveform signal s1.
[0117] By including the voice quality conversion device of Embodiment 2, the speech synthesis device of the present embodiment can output the content of the text data td1 as speech in the voice quality the user desires without breakdown, which further improves usability.
[0118] The speech synthesis device of the present embodiment may include the voice quality conversion device of Embodiment 1 instead of the voice quality conversion device of Embodiment 2.
[0119] (Modification 1)
A modification concerning the operation of the adjustment control unit of the voice quality conversion device in the present embodiment will now be described.
[0120] FIG. 16 is a configuration diagram of a speech synthesis device according to this modification.
The adjustment control unit 104b of the voice quality conversion device according to this modification has the same functions as the adjustment control unit 104a of Embodiment 2, but instead of acquiring the feature parameter sequence p1 as the adjustment control unit 104a does, it acquires the segment data stored in the speech synthesis database 202.
[0121] That is, the adjustment control unit 104b according to this modification detects degradation in the sound quality of the synthesized speech based on the segment data in the speech synthesis database 202 rather than on the feature parameter sequence p1, and thereby causes the position of a pointer P of the voice quality adjustment unit 103 or the length of a range bar B to be changed. In other words, the adjustment control unit 104b uses some or all of the segment data stored in the speech synthesis database 202 to predict the tendency of the acoustic feature parameters indicated by the feature parameter sequence p1, and causes the position of the pointer P or the length of the range bar B to be changed based on that prediction. For example, the adjustment control unit 104b picks every item of segment data from the speech synthesis database 202 one at a time and causes the position of the pointer P to be changed according to whether the quality of the synthesized speech would degrade if that segment data were converted in accordance with the voice quality adjustment unit 103a.
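The per-segment check in the last example might look like the sketch below. This passage does not give the conversion formula or the degradation criterion, so the additive model (`value + coefficient * amount`), the per-feature limits, and every name and number here are hypothetical stand-ins chosen only to make the loop concrete.

```python
def causes_degradation(segments, coeffs, amount, limits):
    """Return True if any converted segment parameter leaves its limits.

    segments: list of {feature: value} dicts (hypothetical segment data)
    coeffs:   {feature: coefficient} for one voice quality (assumed)
    amount:   pointer position requested for that voice quality
    limits:   {feature: (lower, upper)} outside which quality is assumed
              to break down
    """
    for segment in segments:
        for feature, value in segment.items():
            # Assumed additive conversion model, not taken from the text.
            converted = value + coeffs.get(feature, 0.0) * amount
            lower, upper = limits[feature]
            if not lower <= converted <= upper:
                return True
    return False


segments = [{"f0": 100.0}, {"f0": 180.0}]
coeffs = {"f0": 10.0}
limits = {"f0": (60.0, 250.0)}

ok_at_5 = not causes_degradation(segments, coeffs, 5.0, limits)
bad_at_8 = causes_degradation(segments, coeffs, 8.0, limits)
```

Under these assumptions a pointer position of 5 keeps every segment inside its limits, while a position of 8 pushes one segment past the upper limit, so the pointer would be moved back.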
[0122] In such a speech synthesis device, the processing of the adjustment control unit 104b can be kept the same whatever text data td1 is input, as long as the speech synthesis database 202 is not replaced, which simplifies the processing. However, if the content of the feature parameter sequence p1 varies greatly with the content of the text data td1, the quality of the synthesized speech may degrade for some text data td1.
[0123] The feature parameter sequence p1 in this modification need not be generated from the segment data of the speech synthesis database 202 by the speech synthesis processing of the speech synthesis unit 201. That is, the feature parameter sequence p1 used in this modification may be generated by some other method, provided that its voice quality sufficiently approximates the voice quality indicated by a feature parameter sequence p1 generated in this way.
[0124] (Modification 2)
Another modification of the present embodiment will now be described.
[0125] FIG. 17 is a configuration diagram of a speech synthesis device according to this modification.
The speech synthesis device according to this modification includes a feature table storage unit 205 that holds, as a feature table, only the data needed to estimate quality degradation of the synthesized speech, out of the segment data stored in the speech synthesis database 202.
[0126] Specifically, the feature table held in the feature table storage unit 205 is, for example, an extract of only the upper limit, lower limit, and mean of the parameter of each acoustic feature across all the segment data stored in the speech synthesis database 202.
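A feature table of this kind can be built in a single pass over the segment data. In the sketch below the representation of a segment as a `{feature: value}` dictionary, the feature names, and the values are assumptions made for illustration.

```python
def build_feature_table(segments):
    """Map each acoustic feature to (lower limit, upper limit, mean)."""
    values = {}
    for segment in segments:
        for feature, value in segment.items():
            values.setdefault(feature, []).append(value)
    return {
        feature: (min(vals), max(vals), sum(vals) / len(vals))
        for feature, vals in values.items()
    }


# Hypothetical segment data: one dict of acoustic feature parameters each.
segments = [
    {"f0": 100.0, "duration": 0.08},
    {"f0": 180.0, "duration": 0.12},
    {"f0": 140.0, "duration": 0.10},
]
feature_table = build_feature_table(segments)
```

The table holds three numbers per acoustic feature regardless of how many segments the database contains.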
[0127] The adjustment control unit 104c according to this modification has the same functions as the adjustment control unit 104a of Embodiment 2, but instead of acquiring the feature parameter sequence p1 as the adjustment control unit 104a does, it acquires the above feature table stored in the feature table storage unit 205.
[0128] That is, the adjustment control unit 104c according to this modification estimates quality degradation of the synthesized speech based on the feature table in the feature table storage unit 205 rather than on the feature parameter sequence p1, and thereby causes the position of a pointer P of the voice quality adjustment unit 103 or the length of a range bar B to be changed.
[0129] As a result, the adjustment control unit 104c according to this modification uses the feature table, which carries little information, rather than the large amount of segment data in the speech synthesis database 202 used by the adjustment control unit 104b of Modification 1, and can therefore have the position of the pointer P or the length of the range bar B changed quickly.
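To illustrate why the upper and lower limits in the feature table are enough for a fast estimate, the sketch below derives the interval of pointer positions that keeps every feature within bounds. It assumes a hypothetical additive conversion model (`value + coefficient * amount`) and hypothetical per-feature limits; neither the model nor the names are taken from the specification.

```python
def allowed_amount_range(table, coeffs, limits):
    """Interval of pointer positions that keeps every feature within limits.

    table:  {feature: (lower, upper, mean)}, as in the feature table
    coeffs: {feature: coefficient} for one voice quality (assumed)
    limits: {feature: (lower_limit, upper_limit)} (assumed limit data)
    """
    low, high = float("-inf"), float("inf")
    for feature, (f_min, f_max, _mean) in table.items():
        c = coeffs.get(feature, 0.0)
        if c == 0.0:
            continue  # this feature is unaffected by the voice quality
        lim_lo, lim_hi = limits[feature]
        # Need lim_lo <= f_min + c * a and f_max + c * a <= lim_hi.
        a = (lim_lo - f_min) / c
        b = (lim_hi - f_max) / c
        lo_a, hi_a = (a, b) if c > 0 else (b, a)
        low, high = max(low, lo_a), min(high, hi_a)
    return low, high


table = {"f0": (100.0, 180.0, 140.0)}
amount_range = allowed_amount_range(
    table, {"f0": 10.0}, {"f0": (60.0, 250.0)}
)
```

Only the extremes of each feature are examined, so the cost depends on the number of acoustic features rather than on the number of segments.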
[0130] As in Modification 1, the feature parameter sequence p1 in this modification need not be generated from the segment data of the speech synthesis database 202 by the speech synthesis processing of the speech synthesis unit 201. That is, the feature parameter sequence p1 used in this modification may be generated by some other method, provided that its voice quality sufficiently approximates the voice quality indicated by a feature parameter sequence p1 generated in this way.
[0131] (Modification 3)
Another modification of the present embodiment will now be described.
[0132] FIG. 18 is a configuration diagram of a speech synthesis device according to this modification.
The speech synthesis device according to this modification includes a speech synthesis unit 201a instead of the speech synthesis unit 201 of the present embodiment. The voice quality conversion device according to this modification includes a conversion unit 101a and an adjustment control unit 104b instead of the conversion unit 101 and the adjustment control unit 104a.
[0133] As described in Modification 1, the adjustment control unit 104b causes the position of a pointer P of the voice quality adjustment unit 103a or the length of a range bar B to be changed, based on the segment data in the speech synthesis database 202.
[0134] The conversion unit 101a converts the acoustic features indicated by the segment data stored in the speech synthesis database 202, in accordance with instructions from the voice quality adjustment unit 103a.
[0135] Upon acquiring the text data td1, the speech synthesis unit 201a acquires from the conversion unit 101a the segment data that corresponds to the text indicated by the text data td1 and whose voice quality (acoustic features) has been converted. The speech synthesis unit 201a then generates the modified feature parameter sequence p2 using the acquired converted segment data and outputs the modified feature parameter sequence p2 to the waveform generation unit 203.
[0136] The speech synthesis device according to this modification may additionally include the feature table storage unit 205 of Modification 2, with the adjustment control unit 104c of Modification 2 in place of the adjustment control unit 104b of the voice quality conversion device.
[0137] (Modification 4)
Another modification of the present embodiment will now be described.
FIG. 19 is a configuration diagram of a speech synthesis device according to this modification.
The speech synthesis device according to this modification includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202.
[0138] The speech analysis unit 206 acquires speech waveform data d1 representing the waveform of a natural human voice and generates the feature parameter sequence p1 based on the speech waveform data d1.
[0139] The conversion unit 101 and the adjustment control unit 104a of the voice quality conversion device acquire the feature parameter sequence p1 generated in this way from the speech analysis unit 206.
[0140] The speech synthesis device of this modification converts the voice quality of speech spoken by the user in real time and outputs it as synthesized speech. This configuration also makes it possible to apply voice quality conversion to the synthesized speech generated from the natural-voice speech waveform data d1 through an intuitive, easy-to-operate interface while preventing quality degradation.
[0141] The voice quality conversion device may itself include the speech analysis unit 206.
[0142] (Modification 5)
Another modification of the present embodiment will now be described.
[0143] FIG. 20 is a configuration diagram of a speech synthesis device according to this modification.
Like the speech synthesis device of Modification 4, the speech synthesis device according to this modification includes a speech analysis unit 206 instead of the speech synthesis unit 201 and the speech synthesis database 202. The voice quality conversion device according to this modification includes an adjustment control unit 104d instead of the adjustment control unit 104a.
[0144] The adjustment control unit 104d has the same functions as the adjustment control unit 104a, but instead of acquiring the feature parameter sequence p1 as the adjustment control unit 104a does, it acquires a waveform feature table td2. That is, the adjustment control unit 104d according to this modification estimates quality degradation of the synthesized speech based on the waveform feature table td2 rather than on the feature parameter sequence p1, and thereby causes the position of a pointer P of the voice quality adjustment unit 103a or the length of a range bar B to be changed.
[0145] The waveform feature table td2 is, for example, a pre-extracted set of only the data needed to estimate quality degradation of the synthesized speech, taken from the results of analyzing sample speech uttered in advance by the same speaker who utters the speech waveform data d1. For example, the waveform feature table td2 is an extract of only the upper limit, lower limit, and mean from the parameters of each acoustic feature obtained by analyzing the sample speech.
[0146] The adjustment control unit 104d may acquire a plurality of waveform feature tables td2 and select any one of them. For example, based on speaker attributes such as age and gender, the adjustment control unit 104d selects and uses the waveform feature table td2 that best represents the features of the speech waveform data d1 and the feature parameter sequence p1.
[0147] By using the waveform feature table td2, the speech synthesis device of this modification can change the position of the pointer P and the length of the range bar B of the voice quality adjustment unit 103 before the conversion unit 101 acquires the feature parameter sequence p1. In addition, because the adjustment control unit 104d of this modification uses the waveform feature table td2, which carries little information, rather than the feature parameter sequence p1, which carries a large amount, it can have the position of the pointer P or the length of the range bar B changed quickly.
Industrial Applicability
[0148] The voice quality conversion device of the present invention has the effect of improving usability from the standpoint of the user interface, and is useful, for example, in agent applications and text-to-speech applications that use synthesized speech, communication devices that use a voice quality conversion function, and voice quality editing devices for speech.

Claims

[1] A voice quality conversion device that converts feature data indicating features of speech into converted feature data indicating speech of a voice quality different from that of said speech, the device comprising:
acquisition means for acquiring the feature data;
presentation means for presenting a range over which voice quality can be converted;
reception means for receiving a voice quality specified by a user within the range presented by the presentation means;
range changing means for changing the range presented by the presentation means, according to the feature data acquired by the acquisition means and the voice quality received by the reception means, to an appropriate range in which the voice quality indicated by the converted feature data does not break down; and
conversion means for converting the feature data acquired by the acquisition means into converted feature data indicating speech of the voice quality received by the reception means.
[2] The voice quality conversion device according to claim 1, wherein
the presentation means presents, for each of a plurality of types of voice quality, a range of degrees to which that voice quality can be converted,
the reception means receives, as a parameter, a degree of voice quality specified by the user within each per-voice-quality range presented by the presentation means,
the range changing means changes the range of another voice quality presented by the presentation means according to the parameter of the voice quality received for conversion by the reception means, and
the conversion means converts the feature data into the converted feature data according to the parameters of the respective voice qualities received by the reception means.
[3] The voice quality conversion device according to claim 2, wherein
the presentation means presents the range of degrees to which each voice quality can be converted by displaying, for each of the plurality of types of voice quality, a figure and a pointer that moves on the figure in response to a user operation, and
the reception means identifies and receives the parameter specified by the user based on the position of the pointer on the figure.
[4] The voice quality conversion device according to claim 3, wherein
the range changing means changes the range of convertible degrees by moving the pointer.
[5] The voice quality conversion device according to claim 4, wherein
the presentation means displays the figure in the shape of a bar, and
the range changing means changes the range of convertible degrees by moving the pointer along the longitudinal direction of the figure.
[6] The voice quality conversion device according to claim 5, wherein
the presentation means arranges the figures and pointers for the respective voice qualities side by side so that the more similar the changes based on the respective voice qualities, the narrower the spacing between them.
[7] The voice quality conversion device according to claim 5, wherein
the presentation means arranges the figures and pointers for the respective voice qualities along a common circumference so that the more similar the changes based on the respective voice qualities, the smaller the angle between them.
[8] The voice quality conversion device according to claim 3, wherein
the range changing means changes the range of convertible degrees by deforming the figure.
[9] The voice quality conversion device according to claim 8, wherein
the presentation means displays the figure in the shape of a bar, and
the range changing means changes the range of changeable degrees by extending or shortening the length of the figure in its longitudinal direction.
[10] The voice quality conversion device according to claim 3, further comprising
limit storage means storing limit data indicating the limits of acoustic features within which voice quality does not break down, wherein
the range changing means identifies the appropriate range based on the feature data, the parameter received by the reception means, and the limits indicated by the limit data, and changes the range presented by the presentation means to the appropriate range.
[11] The voice quality conversion device according to claim 3, wherein
the plurality of types of voice quality presented by the presentation means are at least two of a voice quality indicating brightness, a voice quality indicating darkness, a voice quality indicating masculinity, and a voice quality indicating fast-talking.
[12] The voice quality conversion device according to claim 11, further comprising
data generation means for acquiring speech and generating the feature data indicating that speech.
[13] A voice quality conversion method for converting feature data indicating features of speech into converted feature data indicating speech of a voice quality different from that of said speech, the method comprising:
an acquisition step of acquiring the feature data;
a presentation step of presenting a range over which voice quality can be converted;
a reception step of receiving a voice quality specified by a user within the range presented in the presentation step;
a range changing step of changing the range presented in the presentation step, according to the feature data acquired in the acquisition step and the voice quality received in the reception step, to an appropriate range in which the voice quality indicated by the converted feature data does not break down, and presenting the changed range; and
a conversion step of converting the feature data acquired in the acquisition step into converted feature data indicating speech of the voice quality received in the reception step.
[14] The voice quality conversion method according to claim 13, wherein
in the presenting step, for each of a plurality of types of voice quality, a range of the extent to which that voice quality can be converted is presented,
in the accepting step, the degree of voice quality specified by the user within each per-quality range presented in the presenting step is accepted as a parameter,
in the range changing step, the range of another voice quality presented in the presenting step is changed and presented in accordance with the parameter of the voice quality accepted for conversion in the accepting step, and
in the converting step, the feature data is converted into the converted feature data in accordance with the parameters of the respective voice qualities accepted in the accepting step.
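The order of steps in claims 13 and 14 can be sketched as a small interactive session: each accepted parameter narrows the ranges presented for the other voice qualities before conversion takes place. This is only an illustration under stated assumptions. The class `VoiceQualitySession`, the single pitch feature `f0_hz`, the gain table, and the pitch limits are all hypothetical and are not taken from the application.

```python
class VoiceQualitySession:
    # Hypothetical: pitch shift in Hz per unit of each voice quality parameter.
    GAIN_HZ = {"brightness": 60.0, "masculinity": -50.0}
    # Hypothetical: pitch range outside which the voice is assumed to break down.
    F0_LIMITS = (80.0, 200.0)

    def __init__(self, features):
        # Acquiring step: keep a copy of the feature data.
        self.features = dict(features)
        self.params = {q: 0.0 for q in self.GAIN_HZ}

    def ranges(self):
        """Presenting / range changing step: selectable range per quality."""
        out = {}
        for q, gain in self.GAIN_HZ.items():
            # Pitch that results from the other qualities' current parameters.
            others = sum(
                self.GAIN_HZ[o] * v for o, v in self.params.items() if o != q
            )
            f0 = self.features["f0_hz"] + others
            ends = sorted(
                ((self.F0_LIMITS[0] - f0) / gain, (self.F0_LIMITS[1] - f0) / gain)
            )
            out[q] = (max(-1.0, ends[0]), min(1.0, ends[1]))
        return out

    def accept(self, quality, value):
        """Accepting step: only values inside the presented range are taken."""
        lo, hi = self.ranges()[quality]
        if not lo <= value <= hi:
            raise ValueError(f"{quality}={value} is outside [{lo:.2f}, {hi:.2f}]")
        self.params[quality] = value
        return self.ranges()  # updated ranges for the other qualities

    def convert(self):
        """Converting step: apply all accepted parameters to the feature data."""
        shift = sum(self.GAIN_HZ[q] * v for q, v in self.params.items())
        return {**self.features, "f0_hz": self.features["f0_hz"] + shift}


session = VoiceQualitySession({"f0_hz": 120.0})
print(session.ranges())
print(session.accept("masculinity", 0.5))
print(session.convert())
```

Because every accepted value keeps the resulting pitch inside the assumed limits, each quality's current value always lies within its own recomputed range, so the ranges never become empty in this model.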
[15] The voice quality conversion method according to claim 14, wherein
in the presenting step, for each of the plurality of types of voice quality, a figure and a pointer that moves on the figure in response to a user operation are displayed, thereby presenting the range of the extent to which that voice quality can be converted, and
in the accepting step, the parameter specified by the user is identified based on the position of the pointer on the figure, and that parameter is accepted.
[16] The voice quality conversion method according to claim 15, wherein
in the range changing step, the range of the convertible extent is changed and presented by moving the pointer.
[17] The voice quality conversion method according to claim 15, wherein
in the range changing step, the range of the convertible extent is changed and presented by deforming the figure.
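The two presentation styles of claims 16 and 17 can be pictured with a text-mode slider. The sketch below is a hypothetical rendering, not the display described in the application: `render_bar` shortens the drawn bar to the appropriate range (deforming the figure), and `move_pointer` pulls the pointer back inside a changed range (moving the pointer). The function names, the characters used, and the bar width are assumptions.

```python
def render_bar(lo, hi, value, full=(-1.0, 1.0), width=21):
    """Draw one slider as text: '=' is the selectable part of the bar,
    '.' is the part outside the appropriate range, '|' is the pointer."""
    def col(x):
        # Map a parameter value to a character column.
        return round((x - full[0]) / (full[1] - full[0]) * (width - 1))

    cells = ["=" if col(lo) <= i <= col(hi) else "." for i in range(width)]
    cells[col(value)] = "|"
    return "".join(cells)


def move_pointer(value, lo, hi):
    """Clamp the pointer position into the (possibly changed) range."""
    return min(max(value, lo), hi)


print(render_bar(-1.0, 1.0, 0.0))   # full range
print(render_bar(-0.5, 1.0, 0.0))   # bar shortened on the left
print(move_pointer(0.8, -1.0, 0.5))
```

Either style tells the user which parameter values remain selectable. Shortening the bar leaves the pointer where the user put it, while moving the pointer changes the selected value itself.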
[18] A program for converting feature data indicating features of a voice into converted feature data indicating a voice whose voice quality differs from that of the voice, the program causing a computer to execute:
an acquiring step of acquiring the feature data;
a presenting step of presenting a convertible range of voice quality;
an accepting step of accepting a voice quality specified by a user within the range presented in the presenting step;
a range changing step of changing the range presented in the presenting step, in accordance with the feature data acquired in the acquiring step and the voice quality accepted in the accepting step, to an appropriate range in which the voice quality indicated by the converted feature data does not break down, and presenting the appropriate range; and
a converting step of converting the feature data acquired in the acquiring step into converted feature data indicating a voice of the voice quality accepted in the accepting step.
[19] A speech synthesis device for converting text indicated by text data into synthesized speech, the device comprising:
feature data generating means for acquiring the text data and generating feature data indicating features of a voice corresponding to the text of the text data;
acquiring means for acquiring the feature data generated by the feature data generating means;
presenting means for presenting a convertible range of voice quality;
accepting means for accepting a voice quality specified by a user within the range presented by the presenting means;
range changing means for changing the range presented by the presenting means, in accordance with the feature data acquired by the acquiring means and the voice quality accepted by the accepting means, to an appropriate range in which the voice quality of the synthesized speech does not break down;
converting means for converting the feature data acquired by the acquiring means into converted feature data indicating a voice of the voice quality accepted by the accepting means; and
voice output means for generating and outputting the synthesized speech based on the converted feature data produced by the converting means.
[20] A speech synthesis method for converting text indicated by text data into synthesized speech, the method comprising:
a feature data generating step of acquiring the text data and generating feature data indicating features of a voice corresponding to the text of the text data;
an acquiring step of acquiring the feature data generated in the feature data generating step;
a presenting step of presenting a convertible range of voice quality;
an accepting step of accepting a voice quality specified by a user within the range presented in the presenting step;
a range changing step of changing the range presented in the presenting step, in accordance with the feature data acquired in the acquiring step and the voice quality accepted in the accepting step, to an appropriate range in which the voice quality of the synthesized speech does not break down, and presenting the appropriate range;
a converting step of converting the feature data acquired in the acquiring step into converted feature data indicating a voice of the voice quality accepted in the accepting step; and
a voice output step of generating and outputting the synthesized speech based on the converted feature data produced in the converting step.
PCT/JP2004/017139 2003-11-21 2004-11-18 Voice changer WO2005050624A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2003-392672 2003-11-21
JP2003392672A JP2007041012A (en) 2003-11-21 2003-11-21 Voice quality converter and voice synthesizer

Publications (1)

Publication Number Publication Date
WO2005050624A1 (en) 2005-06-02

Family

ID=34616459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2004/017139 WO2005050624A1 (en) 2003-11-21 2004-11-18 Voice changer

Country Status (2)

Country Link
JP (1) JP2007041012A (en)
WO (1) WO2005050624A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008058696A (en) * 2006-08-31 2008-03-13 Nara Institute Of Science & Technology Voice quality conversion model generation device and voice quality conversion system
US7792673B2 (en) 2005-11-08 2010-09-07 Electronics And Telecommunications Research Institute Method of generating a prosodic model for adjusting speech style and apparatus and method of synthesizing conversational speech using the same
CN102527039A (en) * 2010-12-30 2012-07-04 德信互动科技(北京)有限公司 Sound effect control device and method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6497025B2 (en) * 2013-10-17 2019-04-10 ヤマハ株式会社 Audio processing device
JP6483578B2 (en) * 2015-09-14 2019-03-13 株式会社東芝 Speech synthesis apparatus, speech synthesis method and program
JP6639285B2 (en) 2016-03-15 2020-02-05 株式会社東芝 Voice quality preference learning device, voice quality preference learning method and program

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09230893A (en) * 1996-02-22 1997-09-05 N T T Data Tsushin Kk Regular speech synthesis method and device therefor
JPH1097267A (en) * 1996-09-24 1998-04-14 Hitachi Ltd Method and device for voice quality conversion
JPH11249679A (en) * 1998-03-04 1999-09-17 Ricoh Co Ltd Voice synthesizer
JP2000194390A (en) * 1998-12-25 2000-07-14 Matsushita Electric Ind Co Ltd Method and device for synthesizing voice
JP2001195604A (en) * 1999-10-20 2001-07-19 Hitachi Kokusai Electric Inc Method for editing moving picture information
JP2002297176A (en) * 2001-03-29 2002-10-11 Sanyo Electric Co Ltd Electronic book device
JP2003066984A (en) * 2001-04-30 2003-03-05 Sony Computer Entertainment America Inc Method for altering network transmitting content data based on user specified characteristics
JP2003140678A (en) * 2001-10-31 2003-05-16 Matsushita Electric Ind Co Ltd Voice quality control method for synthesized voice and voice synthesizer

Also Published As

Publication number Publication date
JP2007041012A (en) 2007-02-15

Similar Documents

Publication Publication Date Title
US7991616B2 (en) Speech synthesizer
US8073696B2 (en) Voice synthesis device
JP3083640B2 (en) Voice synthesis method and apparatus
US6405169B1 (en) Speech synthesis apparatus
WO2005109399A1 (en) Speech synthesis device and method
US20090204395A1 (en) Strained-rough-voice conversion device, voice conversion device, voice synthesis device, voice conversion method, voice synthesis method, and program
WO2006040908A1 (en) Speech synthesizer and speech synthesizing method
JP2008268477A (en) Rhythm adjustable speech synthesizer
EP3480810A1 (en) Voice synthesizing device and voice synthesizing method
WO2005050624A1 (en) Voice changer
JP4664194B2 (en) Voice quality control device and method, and program storage medium
JPH05260082A (en) Text reader
JP5152588B2 (en) Voice quality change determination device, voice quality change determination method, voice quality change determination program
JP4841339B2 (en) Prosody correction device, speech synthesis device, prosody correction method, speech synthesis method, prosody correction program, and speech synthesis program
JP5518621B2 (en) Speech synthesizer and computer program
JPH07140996A (en) Speech rule synthesizer
JP2956936B2 (en) Speech rate control circuit of speech synthesizer
JP6727477B1 (en) Pitch pattern correction device, program and pitch pattern correction method
JPH09179576A (en) Voice synthesizing method
JP6191094B2 (en) Speech segment extractor
JP2003271200A (en) Method and device for synthesizing voice
JP3892691B2 (en) Speech synthesis method and apparatus, and speech synthesis program
JP3292218B2 (en) Voice message composer
Ebihara et al. Speech synthesis software with a variable speaking rate and its implementation on a 32-bit microprocessor
JPH045694A (en) Rule synthesizing device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP