EP1256933B1 - Method and apparatus for controlling the operation of an emotion synthesising device - Google Patents

Method and apparatus for controlling the operation of an emotion synthesising device

Info

Publication number
EP1256933B1
EP1256933B1 (application EP20010402176 / EP01402176A)
Authority
EP
European Patent Office
Prior art keywords
emotion
variable
value
parameter
conveyed
Prior art date
Legal status
Expired - Lifetime
Application number
EP20010402176
Other languages
English (en)
French (fr)
Other versions
EP1256933A3 (de)
EP1256933A2 (de)
Inventor
Pierre-Yves Oudeyer
Current Assignee
Sony France SA
Original Assignee
Sony France SA
Priority date
Filing date
Publication date
Priority claimed from EP01401203A external-priority patent/EP1256931A1/de
Priority claimed from EP20010401880 external-priority patent/EP1256932B1/de
Application filed by Sony France SA filed Critical Sony France SA
Priority to EP20010402176 priority Critical patent/EP1256933B1/de
Priority to JP2002206013A priority patent/JP2003177772A/ja
Priority to US10/217,002 priority patent/US7457752B2/en
Publication of EP1256933A2 publication Critical patent/EP1256933A2/de
Publication of EP1256933A3 publication Critical patent/EP1256933A3/de
Application granted granted Critical
Publication of EP1256933B1 publication Critical patent/EP1256933B1/de
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/033: Voice editing, e.g. manipulating the voice of the synthesiser
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 13/10: Prosody rules derived from text; Stress or intonation

Definitions

  • The invention relates to the field of emotion synthesis, in which an emotion is simulated e.g. in a voice signal, and more particularly aims to provide a new degree of freedom in controlling the possibilities offered by emotion synthesis systems and algorithms.
  • The latter can be intelligible words or unintelligible vocalisations or sounds, such as babble or animal-like noises.
  • Such emotion synthesis finds applications in the animation of communicating objects, such as robotic pets, humanoids and interactive machines, in educational training, in systems for reading out texts, in the creation of sound tracks for films and animations, etc.
  • Figure 1 illustrates the basic concept of a classical voiced emotion synthesis system 2 based on an emotion simulation algorithm.
  • The system receives at an input 4 voice data Vin, which is typically neutral, and produces at an output 6 voice data Vout, which is an emotion-tinted form of the input voice data Vin.
  • The voice data is typically in the form of a stream of data elements, each corresponding to a sound element such as a phoneme or syllable.
  • A data element generally specifies one or several values concerning the pitch and/or intensity and/or duration of the corresponding sound element.
  • The voice emotion synthesis operates by performing algorithmic steps which modify at least one of these values in a specified manner to produce the required emotion.
  • The emotion simulation algorithm is governed by a set of input parameters P1, P2, P3, ..., PN, referred to as emotion-setting parameters, applied at an appropriate input 8 of the system 2. These parameters are normally numerical values, and possibly indicators, for parameterising the emotion simulation algorithm, and are generally determined empirically.
  • Each emotion E to be portrayed has its specific set of emotion-setting parameters.
  • For example, the values of the emotion-setting parameters P1, P2, P3, ..., PN are respectively C1, C2, C3, ..., CN for calm; A1, A2, A3, ..., AN for angry; H1, H2, H3, ..., HN for happy; and S1, S2, S3, ..., SN for sad.
  • The invention proposes, according to a first object, a method of controlling the operation of a device for synthesising an emotion conveyed in a sound, the device having at least one input for a parameter whose value is used to set a type of emotion to be conveyed, the method comprising the steps of: programming the input(s) with a parameterisation to produce a given type of emotion, and conferring a variability in the quantity of the given type of emotion to be conveyed, the variability being obtained by making at least one emotion-setting parameter undergo an excursion from its initial standard value within a determined control range.
  • At least one variable parameter is made variable according to a local model over the control range, the model relating a quantity of emotion control variable to the variable parameter, whereby the quantity of emotion control variable is used to variably establish a value of the variable parameter.
  • The local model can be based on the assumption that, while different sets of one or several parameter value(s) can produce different identifiable emotions, a chosen set of parameter value(s) for establishing a given type of emotion is sufficiently stable to allow local excursions from the parameter value(s) without causing an uncontrolled change in the nature of the corresponding emotion. As it turns out, the change is then in the quantity of the emotion. The determined control range will be within the range of these local excursions.
  • The model is advantageously a locally linear model for the control range and for a given type of emotion, the variable parameter being made to vary linearly over the control range by means of the quantity of emotion control variable δ. In a preferred form, the variable parameter obeys formula (1): VPi = A + δ·B, where A and B are values allowed for the control range.
  • A is a value inside the control range, whereby the quantity of emotion control variable is variable in an interval which contains the value zero.
  • The value of A can be substantially the mid value of the control range, and the quantity of emotion control variable can be variable in an interval whose mid value is zero.
  • The quantity of emotion control variable is preferably variable in an interval of from -1 to +1.
  • The value A can be equal to the standard parameter value originally specified to set a type of emotion to be conveyed, and B can be determined as B = Eimax - A or B = Eimin + A, where Eimax and Eimin are the parameter values producing respectively the maximum and the minimum quantity of the type of emotion to be conveyed in the control range.
  • The value Eimax or Eimin can be determined experimentally by excursion from the standard parameter value originally specified to set a type of emotion to be conveyed, and by determining a maximum excursion in an increasing or decreasing direction yielding a desired limit to the quantity of emotion to be conferred by the control range.
  • The invention makes it possible to use a same quantity of emotion control variable to collectively establish a plurality of variable parameters of the emotion synthesising device.
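As an illustration, here is a minimal sketch (in Python; not part of the patent, all names and figures are illustrative assumptions) of formula (1), VPi = A + δ·B, with a single control variable δ in [-1, +1] driving several parameters collectively:

```python
# Minimal sketch of formula (1): VPi = A + delta * B, taking A as the
# standard value Ei and B = Eimax - A (values made up for illustration).

def variable_parameter(a, b, delta):
    """Formula (1): VPi = A + delta * B, with delta in [-1, +1]."""
    if not -1.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [-1, +1]")
    return a + delta * b

# Two parameters driven collectively by the same delta. Note that B may be
# negative (Eimax < Ei): increasing the quantity of emotion can require
# *decreasing* a parameter value, which the same formula handles.
params = {"mean_pitch_hz": (400.0, 200.0),     # A = Ei = 400, Eimax = 600
          "mean_duration_ms": (170.0, -50.0)}  # A = Ei = 170, Eimax = 120

for delta in (-1.0, 0.0, +1.0):
    print(delta, {name: variable_parameter(a, b, delta)
                  for name, (a, b) in params.items()})
```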
  • According to a further object, the invention relates to an apparatus for controlling the operation of a system for synthesising an emotion conveyed in a sound, the system having at least one input for a parameter whose value is used to set a type of emotion to be conveyed, characterised in that it comprises: means for programming the input(s) with a parameterisation to produce a given type of emotion, and variation means for conferring a variability in the quantity of the type of emotion to be conveyed within a determined control range.
  • According to another object, the invention relates to the use of the above apparatus to adjust a quantity of emotion in a device for synthesising an emotion conveyed in a sound.
  • The invention also relates to a system comprising an emotion synthesis device having at least one input for receiving at least one parameter whose value is used to set a type of emotion to be conveyed, and an apparatus according to the above aspect, operatively connected to deliver a variable to the at least one input, thereby to confer a variability in an amount of a type of emotion to be conveyed.
  • Finally, the invention relates to a computer program providing computer-executable instructions which, when loaded onto a data processor, cause the data processor to operate the above method.
  • The computer program can be embodied in a recording medium of any suitable form.
  • Figure 2 illustrates the functional units and operation of a quantity of emotion variation system 10 according to a preferred embodiment of the invention, operating in conjunction with a voice-based emotion simulation algorithm system 12.
  • The latter is of the generative type, i.e. it has its own means for generating voice data conveying a determined emotion E.
  • The embodiment 10 can of course operate equally well with any other type of emotion simulation algorithm system, such as that described with reference to figure 1, in which a stream of neutral voice data is supplied at an input. Both these types of emotion simulation algorithm systems, as well as others with which the embodiment can operate, are known in the art. More information on voice-based emotion simulation algorithms and systems can be found, inter alia, in: Cahn, J.
  • The emotion simulation algorithm system 12 uses a number N of emotion-setting parameters P1, P2, P3, ..., PN (generically designated P) to produce a given emotion E, as explained above with reference to figure 1.
  • The number N of these parameters can vary considerably from one algorithm to another, typically from 1 to 16 or considerably more.
  • These parameters P are empirically determined numerical values or indicators exploited in calculation or decision steps of the algorithm. They can be loaded into the emotion simulation algorithm system 12 either through a purpose-designed interface or by a parameter-loading routine. In the example, the insertion of the parameters P is shown symbolically by lines entering the system 12, a suitable interface or loading unit being integrated to allow these parameters to be introduced from the outside.
  • The emotion simulation algorithm system 12 can thus produce different types of emotions E, such as calm, angry, happy and sad, by a suitable set of N values for the respective parameters P1, P2, P3, ..., PN.
  • The quantity of emotion variation system 10 operates to impose a variation on the standard values E1-EN of these parameters (Ei designating the standard value of the parameter Pi) according to a linear model.
  • A linear - or progressive - variation of E1-EN causes a progressive variation in the response of the emotion simulation algorithm system 12.
  • The response in question will be a variation in the quantity, i.e. intensity, of the emotion E, at least for a given variation range of the values E1-EN.
  • To this end, a range of possible variation for each of these values is initially determined.
  • For a parameter Pi (i being an arbitrary integer between 1 and N inclusive), an exploration of the emotion simulation algorithm system 12 is undertaken, during which the parameter Pi is subjected to an excursion from its initial standard value Ei to a value Eimax which is found to correspond to a maximum intensity of the emotion E.
  • This value Eimax is determined experimentally. It will generally correspond to a value above which that parameter either no longer contributes to a significant increase in the intensity of the emotion E (i.e. a saturation occurs), or beyond which the type of emotion E becomes modified or distorted.
  • The value Eimax can be either greater than or less than the standard value Ei: depending on the parameter Pi, the increase in the intensity of the emotion can result from increasing or decreasing the standard value Ei.
  • The determination of the maximum intensity value Eimax for the parameter Pi can be performed either by keeping all the other parameters at their initial standard values, or by varying some or all of the others according to a knowledge of the interplay of the different parameters P1-PN.
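The exploration just described lends itself to a simple search loop. The following sketch is a hypothetical rendering (names and stopping criteria are assumptions; the intensity judgement would in practice come from a human listener or an evaluation routine), stepping a parameter away from Ei until saturation or distortion is reached:

```python
# Hypothetical sketch of the experimental determination of Eimax: step the
# parameter away from its standard value Ei and stop at saturation (no further
# gain in intensity) or when the type of emotion becomes distorted.

def find_eimax(ei, step, rate_intensity, still_same_emotion, max_steps=50):
    """rate_intensity and still_same_emotion stand in for a listener's judgement."""
    best_value, best_score = ei, rate_intensity(ei)
    value = ei
    for _ in range(max_steps):
        value += step                      # excursion in the chosen direction
        if not still_same_emotion(value):  # emotion type distorted: stop
            break
        score = rate_intensity(value)
        if score <= best_score:            # saturation: no significant increase
            break
        best_value, best_score = value, score
    return best_value

# Toy usage: intensity peaks at 600 and the emotion stays recognisable below 650.
print(find_eimax(400.0, 25.0,
                 rate_intensity=lambda v: -abs(v - 600.0),
                 still_same_emotion=lambda v: v < 650.0))  # -> 600.0
```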
  • The system 10 comprises a variable parameter generator unit 16, whose function is to replace the parameters P1-PN of the emotion simulation algorithm system 12 by corresponding variable parameters VP1-VPN. The values Ei and Eimax determined for each parameter are stored in a memory unit 14 of the system 10, from which the variable parameter generator unit 16 produces each variable parameter according to formula (1) with A = Ei and B = Eimax - Ei, i.e. VPi = Ei + δ(Eimax - Ei).
  • The variable parameter values VP1-VPN thus produced by the variable parameter generator unit 16 are delivered at respective outputs 17-1 to 17-N, which are connected to respective parameter-accepting inputs 13-1 to 13-N of the emotion simulation algorithm system 12.
  • The connections shown schematically from the variable parameter generator unit 16 to the emotion simulation algorithm system 12 can be embodied in any suitable form: parallel or serial data bus, wireless link, etc., using any suitable data transfer protocol.
  • The loading of the variable parameters VP can be controlled by a routine at the level of the emotion simulation algorithm system 12.
  • The control variable δ is in the range of -1 to +1 inclusive. Its value is set by an emotion quantity selector unit 18, which can be a user-accessible interface or an electronic control unit operating according to a program which determines the quantity of emotion to be produced, e.g. as a function of an external command indicating that quantity, or automatically depending on the environment, the history, the context, etc. of operation, e.g. of a robotic pet or the like.
  • The range of variation of δ is illustrated as a scale 20 along which a pointer 22 can slide to designate the required value of δ in the interval [-1, +1].
  • The scale 20 and pointer 22 can be embodied through a graphic interface so as to be displayed as a cursor on a monitor screen of a computer, or of a screen forming part of a robotic pet.
  • The pointer 22 can then be displaceable through a keyboard, buttons, a mouse or the like.
  • The scale can also be defined by a potentiometer or similar variable component.
  • The variation of δ can be, to all intents and purposes, continuous or stepwise incremental over the range [-1, +1].
  • The value of δ designated by the pointer 22 is generated by the emotion quantity selector unit 18 and supplied to an input of the variable parameter generator unit 16 adapted to receive the control variable, so as to enter it into formula (1) above.
  • A scale normalised in the interval [-1, +1] is advantageous in that it simplifies the management of the values used by the variable parameter generator unit 16. More specifically, it allows the values of the memory unit 14 to be used directly as they are in formula (1), without the need to introduce a scaling factor.
  • The embodiment is remarkable in that the same variable δ serves for varying each of the N variable parameter values VPi for the emotion simulation algorithm system 12, while covering the respective ranges of values for the parameters P1-PN.
  • The variation law according to formula (1) is able to manage both parameters whose value needs to be increased to produce an increased quantity of emotion and parameters whose value needs to be decreased to produce an increased quantity of emotion.
  • In the latter case, the value Eimax in question will be less than Ei.
  • The term δ(Eimax - Ei) of formula (1) will then be negative, with a magnitude which increases as the quantity of emotion chosen through the variable δ increases in the region between 0 and +1.
  • Conversely, when δ is chosen in the region between 0 and -1, the term δ(Eimax - Ei) will be positive and contribute to increasing VPi, and thereby to reducing the quantity of the emotion.
  • The variable parameters VP will each have the same relative position in their respective range, whereby the variation produced by the emotion quantity selector 18 is well balanced and homogeneous throughout the variable parameters.
  • Alternatively, a value Eimin can be determined experimentally for each parameter to be made variable, in a manner analogous to that described above: Eimin is identified as the value which yields the lowest useful amount of emotion, below which there is either no practically useful lowering of emotional intensity or there is a distortion in the type of emotion. The memory will then store values Eimin instead of Eimax.
  • Also, the mid-range value can be a value different from the standard value Ei.
  • Example 1 concerns a robotic pet able to express emotions by modulated sounds produced by a voice synthesiser which has a set of input parameters defining an emotional state to be conveyed by the voice.
  • The emotion synthesis algorithm is based on the notion that an emotion can be expressed in a feature space consisting of an arousal component and a valence component. For example, anger, sadness, happiness and comfort are represented in particular regions of the arousal-valence feature space.
  • The algorithm refers to tables representing a set of parameters P, including at least the duration (DUR), the pitch (PITCH) and the sound volume (VOLUME) of a phoneme, defined in advance for each basic emotion. These parameters are numerical values or states (such as "rising" or "falling"). The state parameters can be kept as per the standard setting and not be controlled by the quantity of emotion variation system 10.
  • Table I is an example of the parameters and their attributed values for the emotion "happiness".
  • The named parameters apply to unintelligible words of one or a few syllables or phonemes, specified inter alia in terms of pitch characteristics, duration, contour, volume, etc., in recognised units. These characteristics are expressed in a formatted data structure recognised by the algorithm.
  • Table I: parameter settings for the emotion "happiness" (characteristic: numerical value or state)
    Last word accentuated: true
    Mean pitch: 400 Hz
    Pitch variation: 100 Hz
    Maximum pitch: 600 Hz
    Mean duration: 170 milliseconds
    Duration variation: 50 milliseconds
    Probability of accentuating a word: 0.3 (30%)
    Default contour: rising
    Contour of last word: rising
    Volume: 2 (specific units)
  • The robotic pet incorporating this algorithm is made to switch from one set of parameter values to another according to the emotion it decides to portray.
  • For the mean pitch, for instance, the standard parameter value of 400 Hz becomes the value Ei in equation (1) for that parameter.
  • There is performed a step i) of determining in which direction (increase or decrease) this value can be modified to produce a more intense portrayal of happiness, and a step ii) of determining how far in that direction this parameter can be changed to usefully increase this intensity.
  • This limit value is the Eimax of equation (1).
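Continuing the example, a short sketch (again Python; the Eimax figures below are assumptions standing in for the results of steps i) and ii), not values from the patent) applies equation (1) to two of the Table I parameters:

```python
# Hypothetical continuation of Example 1: Table I values as Ei, with assumed
# exploration results Eimax (more intense happiness is taken here to mean
# raising the mean pitch and shortening the mean syllable duration).

EI    = {"mean_pitch_hz": 400.0, "mean_duration_ms": 170.0}  # Table I values
EIMAX = {"mean_pitch_hz": 550.0, "mean_duration_ms": 130.0}  # assumed limits

def happiness_parameters(delta):
    """Equation (1) with A = Ei and B = Eimax - Ei, delta in [-1, +1]."""
    return {name: ei + delta * (EIMAX[name] - ei) for name, ei in EI.items()}

print(happiness_parameters(+1.0))  # maximum usable intensity of happiness
print(happiness_parameters(0.0))   # the standard Table I setting
print(happiness_parameters(-1.0))  # weakest rendering within the local model
```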
  • Example 2 concerns a system able to add an emotion content to incoming voice data corresponding to intelligible words or unintelligible sounds in a neutral tone, so that the added emotion can be sensed when the thus-processed voice data is played.
  • The system comprises an emotion simulation algorithm system which, as in the case of figure 1, has an input for receiving sound data and an output for delivering the sound data in the same format, but with data values modified according to the emotion to be conveyed.
  • The system can thus be effectively placed along a chain between a source of sound data and a sound data playing device, such as an interpolator plus synthesiser, in a completely transparent manner.
  • The sound data will be in the form of successive data elements, each corresponding to a sound element, e.g. a syllable or phoneme to be played by a synthesiser.
  • A data element will specify e.g. the duration of the sound element, and one or several pitch value(s) to be present over this duration.
  • The data element may also designate the syllable to be reproduced, and there can be an associated indication as to whether or not that data element can be accentuated.
  • For example, a data element for the syllable "be" may have the following data structure: "be: 100, P1, P2, P3, P4, P5".
  • The first number, 100, expresses the duration in milliseconds.
  • The following five values (symbolised by P1-P5) indicate the pitch value (F0) at five respective and successive intervals within that duration.
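Such a data element maps naturally onto a small record type. A hedged sketch follows (the textual format and field names are assumptions based only on the example above):

```python
# Illustrative parser for a syllable data element of the form
# "be: 100, P1, P2, P3, P4, P5" (duration in ms, then five F0 values).

from dataclasses import dataclass

@dataclass
class SyllableData:
    syllable: str
    duration_ms: float
    pitch_hz: list  # five pitch values P1-P5 over successive intervals

def parse_element(text):
    """Parse e.g. 'be: 100, 220, 230, 240, 235, 225' into a SyllableData."""
    syllable, rest = text.split(":", 1)
    values = [float(v) for v in rest.split(",")]
    duration, pitch = values[0], values[1:]
    if len(pitch) != 5:
        raise ValueError("expected five pitch values P1-P5")
    return SyllableData(syllable.strip(), duration, pitch)

elem = parse_element("be: 100, 220, 230, 240, 235, 225")
print(elem.syllable, elem.duration_ms, elem.pitch_hz)
```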
  • Figure 3 is a block diagram showing in functional terms how the emotion simulation algorithm system 26 integrates with the above emotion synthesiser to produce variable-intensity, emotion-tinted voice data.
  • The emotion simulation algorithm system 26 operates by selectively applying operators O on the syllable data read out from a vocalisation data file 28. Depending on their type, these operators can modify either the pitch data (pitch operator) or the syllable duration data (duration operator). These modifications take place upstream of an interpolator 30, e.g. before a voice data decoder 32, so that the interpolation is performed on the operator-modified values. As explained below, the modification is such as to transform selectively a neutral form of speech into speech conveying a chosen emotion (sad, calm, happy, angry) in a chosen quantity.
  • The basic operator forms are stored in an operator set library 34, from which they can be selectively retrieved by an operator set configuration unit 36.
  • The latter serves to prepare and parameterise the operators in accordance with current requirements.
  • To this end, there is provided an operator parameterisation unit 38, which determines the parameterisation of the operators in accordance with: i) the emotion to be imprinted on the voice (calm, sad, happy, angry, etc.), ii) the degree - or intensity - of the emotion to apply, and iii) the context of the syllable, as explained below.
  • The operator parameterisation unit 38 incorporates the variable parameter generator unit 16 and the memory 14 of the quantity of emotion variation system 10.
  • The emotion and degree of emotion are instructed to the operator parameterisation unit 38 by an emotion selection interface 40, which presents inputs accessible by a user U.
  • This user interface incorporates the quantity of emotion selector 18 (cf. figure 2), the pointer 22 being a physically or electronically user-displaceable device. Accordingly, among the commands issued by the interface unit 40 will be the variable δ.
  • The emotion selection interface 40 can be in the form of a computer interface with on-screen menus and icons, allowing the user U to indicate all the necessary emotion characteristics and other operating parameters.
  • The context of the syllable which is operator sensitive is: i) the position of the syllable in a phrase, as some operator sets are applied only to the first and last syllables of the phrase, ii) whether the syllables relate to intelligible word sentences or to unintelligible sounds (babble, etc.), and iii) as the case arises, whether or not a considered syllable is allowed to be accentuated, as indicated in the vocalisation data file 28.
  • To this end, there are provided a first and last syllables identification unit 42 and an authorised syllable accentuation detection unit 44, both having access to the vocalisation data file unit 28 and informing the operator parameterisation unit 38 of the appropriate context-sensitive parameters.
  • For the syllables allowed to be accentuated, the random selection is provided by a controllable-probability random draw unit 46, operatively connected between the authorised syllable accentuation detection unit 44 and the operator parameterisation unit 38.
  • The random draw unit 46 has a controllable degree of probability of selecting a syllable from the candidates. Specifically, if N is the probability of a candidate being selected, with N ranging controllably from 0 to 1, then for P candidate syllables, N·P syllables shall be selected on average for being subjected to a specific operator set associated with random accentuation. The distribution of the randomly selected candidates is substantially uniform over the sequence of syllables.
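A minimal sketch of such a draw (Python; names are assumptions), selecting each candidate independently with probability N so that N·P candidates are chosen on average, spread uniformly over the sequence:

```python
# Illustrative random draw with controllable selection probability N,
# as performed by the random draw unit 46 for syllable accentuation.

import random

def draw_accentuated(candidates, n_probability, rng=random):
    """Select each candidate independently with probability N in [0, 1]."""
    if not 0.0 <= n_probability <= 1.0:
        raise ValueError("N must lie in [0, 1]")
    return [s for s in candidates if rng.random() < n_probability]

candidates = ["ba", "du", "ke", "mi", "no", "pa", "ru", "se"]
print(draw_accentuated(candidates, 0.3))  # ~30% of the candidates on average
```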
  • The suitably configured operator sets from the operator set configuration unit 36 are sent to a syllable data modifier unit 48, where they operate on the syllable data.
  • The syllable data modifier unit 48 receives the syllable data directly from the vocalisation data file 28.
  • The thus-received syllable data are modified by the unit 48 as a function of the operator set, notably in terms of pitch and duration data.
  • The resulting modified syllable data (new syllable data) are then outputted by the syllable data modifier unit 48 to the decoder 32, with the same structure as presented in the vocalisation data file.
  • The decoder can thus process the new syllable data exactly as if they originated directly from the vocalisation data file. From there, the new syllable data are interpolated (interpolator unit 30) and processed by an audio frequency sound processor, audio amplifier and speaker. However, the sound produced at the speaker then no longer corresponds to a neutral tone, but rather to the sound with a simulation of an emotion as defined by the user U.
  • All the above functional units are under the overall control of an operations sequencer unit 50, which governs the complete execution of the emotion generation procedure in accordance with a prescribed set of rules.
  • Figure 4 illustrates graphically the effect of the pitch operator set OP on a pitch curve of a synthesised sound element originally specified by its sound data.
  • The figure shows - respectively in left and right columns - a pitch curve (fundamental frequency f against time t) before and after the action of a pitch operator.
  • The input pitch curves are identical for all operators and happen to be relatively flat.
  • The rising slope and falling slope operators OPrs and OPfs have the following characteristic: the pitch at the central point in time (1/2 t1 for a pitch duration of t1) remains substantially unchanged after the operator. In other words, the operators act to pivot the input pitch curve about the pitch value at the central point in time, so as to impose the required slope. This means that in the case of a rising slope operator OPrs the pitch values before the central point in time are in fact lowered, and that in the case of a falling slope operator OPfs the pitch values before the central point in time are in fact raised, as shown in the figure.
  • There are also provided intensity operators, designated OI.
  • The effects of these operators are shown in figure 5, which is directly analogous to the illustration of figure 4.
  • These operators are also four in number and are identical to the pitch operators OP, except that they act on the curve of intensity I over time t. Accordingly, these operators shall not be detailed separately, for the sake of conciseness.
  • The pitch and intensity operators can each be parameterised by a respective value setting the degree of their effect: Prs, Pfs, Psu and Psd for the pitch operators, and Irs, Ifs, Isu and Isd for the intensity operators (these are the parameter values listed among the variable parameters below).
  • Figure 6 illustrates graphically the effect of a duration (or time) operator OD on a syllable.
  • The illustration shows in left and right columns respectively the duration of the syllable (in terms of a horizontal line expressing an initial length of time t1) before and after the effect of a duration operator.
  • The duration operator can be a dilation operator ODd, which lengthens the duration, or a contraction operator ODc, which shortens it, each parameterised by a value D.
  • The operator can also be neutralised, or made a neutral operator, simply by inserting the value 0 for the parameter D.
  • While the duration operator has been represented as being of two different types, respectively dilation and contraction, it is clear that the only difference resides in the sign, plus or minus, placed before the parameter D.
  • A same operator mechanism can produce both operator functions (dilation and contraction) if it can handle both positive and negative numbers.
  • The range of possible values for D, and its possible incremental values within that range, can be chosen according to requirements.
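A hedged sketch of such a duration operator (Python; the patent does not fix the arithmetic, so D is treated here, as an assumption, as a signed fractional change):

```python
# Illustrative duration operator: D > 0 dilates, D < 0 contracts, D = 0 is the
# neutral operator. D is assumed here to be a signed fractional change.

def apply_duration_operator(duration_ms, d):
    """Return the new syllable duration t1 after dilation or contraction."""
    return duration_ms * (1.0 + d)

print(apply_duration_operator(100.0, +0.2))  # dilation: 120.0 ms
print(apply_duration_operator(100.0, -0.2))  # contraction: 80.0 ms
print(apply_duration_operator(100.0, 0.0))   # neutral operator: 100.0 ms
```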
  • The embodiment further uses a separate operator which establishes the probability N for the random draw unit 46.
  • This value is selected from a range of 0 (no possibility of selection) to 1 (certainty of selection).
  • The value N serves to control the density of accentuated syllables in the vocalised output, as appropriate for the emotional quality to reproduce.
  • A variable parameter VPi can correspond to one of the following above-defined parameter values to be made variable: Prs, Pfs, Psu, Psd, Irs, Ifs, Isu, Isd, Dd, Dc.
  • The number and selection of the values to be made variable is selectable through the user interface 40.
  • Figures 7A and 7B constitute a flow chart indicating the process of forming and applying selectively the above operators to syllable data, on the basis of the system described with reference to figure 3.
  • Figure 7B is a continuation of figure 7A.
  • The process starts with an initialisation phase P1, which involves loading input syllable data from the vocalisation data file 28 (step S2).
  • Next, the emotion to be conveyed on the phrase or passage of which the loaded syllable data forms a part is loaded using the interface unit 40 (step S4).
  • The emotions can be calm, sad, happy, angry, etc.
  • The interface also inputs the quantity (degree) of emotion to be given, e.g. by attributing a weighting value (step S6).
  • The system then enters into a universal operator phase P2, in which a universal operator set OS(U) is applied systematically to all the syllables.
  • The universal operator set OS(U) contains all the operators of figures 4 and 6, i.e. OPrs, OPfs, OPsu and OPsd, forming the four pitch operators, plus ODd and ODc, forming the two duration operators.
  • Each of these operators of operator set OS(U) is parameterised by a respective associated value, respectively Prs(U), Pfs(U), Psu(U), Psd(U), Dd(U) and Dc(U), as explained above (step S8). This step involves attributing numerical values to these parameters, and is performed by the operator set configuration unit 36.
  • The choice of parameter values for the universal operator set OS(U) is determined by the operator parameterisation unit 38 as a function of the programmed emotion and quantity of emotion, plus other factors as the case arises.
  • Each of these parameters is made variable by the variable δ, whereupon they shall be designated respectively as VPrs(U), VPfs(U), VPsu(U), VPsd(U), VDd(U) and VDc(U).
  • In general, any parameter value or operator/operator set which is thus made variable by the variable δ is identified as such by the letter "V" placed as the initial letter of its designation.
  • The universal operator set VOS(U) is then applied systematically to all the syllables of a phrase or group of phrases (step S10).
  • The action involves modifying the numerical values t1, P1-P5 of the syllable data.
  • For the pitch operators, the slope parameter VPrs or VPfs is translated into a group of five difference values to be applied arithmetically to the values P1-P5 respectively. These difference values are chosen to move each of the values P1-P5 according to the parameterised slope, the middle value P3 remaining substantially unchanged, as explained earlier.
  • Typically, the first two values for the rising slope parameter will be negative, to cause the first half of the pitch curve to be lowered, and the last two values will be positive, to cause the last half of the pitch curve to be raised, so creating the rising slope articulated at the centre point in time, as shown in figure 4.
  • The degree of slope forming the variable parameterisation is expressed in terms of these difference values.
  • A similar approach, in reverse, is used for the falling slope parameter.
  • The shift up and shift down operators can be applied before or after the slope operators. They simply add or subtract a same value, determined by the parameterisation, to or from the five pitch values P1-P5.
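The pitch arithmetic just described reduces to a few lines. Here is a sketch (Python; the specific difference-value scheme is an assumption consistent with the description: negative before the centre, zero at P3, positive after, for a rising slope):

```python
# Illustrative pitch operators on the five pitch values P1-P5: a slope operator
# pivoting about the middle value P3, and a shift operator adding a same value.

def slope_differences(slope):
    """Five difference values: P3 unchanged, rising for slope > 0."""
    return [slope * k for k in (-2, -1, 0, +1, +2)]

def apply_pitch_operators(pitch, slope=0.0, shift=0.0):
    """Apply a slope and then a shift to the five pitch values P1-P5."""
    assert len(pitch) == 5, "expected pitch values P1-P5"
    sloped = [p + d for p, d in zip(pitch, slope_differences(slope))]
    return [p + shift for p in sloped]

p = [240.0] * 5                               # a relatively flat input curve
print(apply_pitch_operators(p, slope=+10.0))  # rising slope, P3 unchanged
print(apply_pitch_operators(p, slope=-10.0))  # falling slope
print(apply_pitch_operators(p, shift=+30.0))  # shift up
```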
  • Normally, the operators form mutually exclusive pairs, i.e. a rising slope operator will not be applied if a falling slope operator is to be applied, and likewise for the shift up and shift down operators and for the duration operators.
  • The application of the operators, i.e. the calculation to modify the data parameters t1, P1-P5, is performed by the syllable data modifier unit 48.
  • Once the syllables have thus been processed by the universal operator set VOS(U), they are provisionally buffered for further processing if necessary.
  • The system then enters into a probabilistic accentuation phase P3, for which another operator set VOS(PA), for probabilistic accentuation, is prepared.
  • This operator set has the same operators as the universal operator set, but with different variable values for the parameterisation.
  • The operator set VOS(PA) is parameterised by the respective values VPrs(PA), VPfs(PA), VPsu(PA), VPsd(PA), VDd(PA) and VDc(PA).
  • These parameter values are likewise calculated by the operator parameterisation unit 38 as a function of the emotion, degree of emotion and other factors provided by the interface unit 40.
  • The choice of the parameters is generally made to add a degree of intonation (prosody) to the speech according to the emotion considered.
  • An additional parameter of the probabilistic accentuation operator set VOS(PA) is the value of the probability N, as defined above, which is also made variable (VN) by the variable δ. This value depends on the emotion and degree of emotion, as well as on other factors, e.g. the nature of the syllable file.
  • Next is determined which of the syllables are to be submitted to this operator set VOS(PA), as established by the random draw unit 46 (step S14).
  • The latter supplies the list of the randomly drawn syllables to be accentuated by this operator set.
  • The candidate syllables are those indicated in the vocalisation data file 28 as being allowed to be accentuated, as detected by the authorised syllable accentuation detection unit 44.
  • The randomly selected syllables among the candidates are then submitted for processing by the probabilistic accentuation operator set VOS(PA) by the syllable data modifier unit 48 (step S16).
  • The actual processing performed is the same as explained above for the universal operator set, with the same technical considerations, the only difference being in the parameter values involved.
  • At this point, the syllable data modifier unit 48 will supply modified forms of the syllable data (generically denoted S) originally in the file 28: syllables processed by the universal operator set VOS(U) only, and randomly drawn syllables processed both by VOS(U) and by the probabilistic accentuation operator set VOS(PA).
  • The system then enters into a phase P4 of processing an accentuation specific to the first and last syllables of a phrase.
  • In the embodiment, this phase P4 acts to accentuate all the syllables of the first and last words of the phrase.
  • The term phrase can be understood in the normal grammatical sense for intelligible text to be spoken, e.g. in terms of pauses in the recitation.
  • For unintelligible sounds, a phrase is understood in terms of a beginning and end of the utterance, marked by a pause. Typically, such a phrase can last from around one to three or four seconds.
  • The phase P4 of accentuation applies to at least the first and last syllables, and preferably to the first m and last n syllables, where m and n are typically equal to around 2 or 3 and can be the same or different.
  • The resulting operator set VOS(FL) is then applied to the first and last syllables of each phrase (step S20), these syllables being identified by the first and last syllables identification unit 42.
  • The syllable data on which the operator set VOS(FL) is applied will have previously been processed by the universal operator set VOS(U) at step S10. Additionally, it may happen that a first or last syllable has also been drawn at the random selection step S14 and has thereby also been processed by the probabilistic accentuation operator set VOS(PA).
  • The parameterisation is of the same general type for all operator sets, only the actual values being different.
  • The values are usually chosen so that the least amount of change is produced by the universal operator set and the largest amount of change by the first and last syllable accentuation, the probabilistic accentuation operator set producing an intermediate amount of change.
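Putting the three phases together, a compact end-to-end sketch (Python; the operator sets are reduced to hypothetical (slope, shift, duration) triples, and all figures are illustrative, not taken from the patent):

```python
# Illustrative three-phase pipeline of figures 7A/7B: universal set VOS(U) on
# all syllables, probabilistic accentuation VOS(PA) on randomly drawn
# accentuable syllables, and VOS(FL) on the first and last syllables.

import random

def apply_operator_set(syllable, op):
    """syllable is {"t1": ms, "pitch": [P1..P5]}; op is (slope, shift, d)."""
    slope, shift, d = op
    diffs = [slope * k for k in (-2, -1, 0, +1, +2)]
    return {"t1": syllable["t1"] * (1.0 + d),
            "pitch": [p + dp + shift for p, dp in zip(syllable["pitch"], diffs)]}

def process_phrase(syllables, accent_ok, vos_u, vos_pa, vos_fl, n_prob):
    out = [apply_operator_set(s, vos_u) for s in syllables]      # phase P2
    for i, s in enumerate(out):                                  # phase P3
        if accent_ok[i] and random.random() < n_prob:
            out[i] = apply_operator_set(s, vos_pa)
    for i in (0, len(out) - 1):                                  # phase P4
        out[i] = apply_operator_set(out[i], vos_fl)
    return out

phrase = [{"t1": 100.0, "pitch": [240.0] * 5} for _ in range(6)]
accent_ok = [True, False, True, True, False, True]
result = process_phrase(phrase, accent_ok,
                        vos_u=(2.0, 5.0, 0.05),    # least change
                        vos_pa=(6.0, 15.0, 0.10),  # intermediate change
                        vos_fl=(10.0, 25.0, 0.15), # largest change
                        n_prob=0.3)
print(result[0]["t1"], result[0]["pitch"])
```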
  • The system can also be made to use the intensity operators OI in its sets, depending on the parameterisation used.
  • The interface unit 40 can be integrated into a computer interface to provide different controls. Among these can be a direct choice of the parameters of the different operator sets mentioned above, in order to allow the user U to fine-tune the system.
  • The interface can be made user-friendly by providing visual scales, showing e.g. graphically the slope values, shift values and contraction/dilation values for the different parameters.
  • Finally, the invention can cover many other types of emotion synthesis systems. While being particularly suitable for synthesis systems that convey an emotion on voice or sound, the invention can also be envisaged for other types of emotion synthesis systems in which the emotion is conveyed in other forms (facial or body expressions, visual effects, motion of animated objects, etc.), where the parameters involved reflect a type of emotion to be conveyed.

Claims (20)

  1. A method of controlling the operation of a device (2, 12) for synthesising an emotion conveyed in a sound, the device having at least one input for a parameter (Pi) whose value (Ei) is used to set a type of emotion to be conveyed, the method comprising the steps of:
    - programming the input(s) with a parameterisation to produce a given type of emotion (E), and
    - conferring a variability in the quantity of the given type of emotion to be conveyed;
    characterised in that the variability in the quantity of the given type of emotion is obtained by subjecting, within a predetermined control range, at least one parameter (Pi) used to set a type of emotion to an excursion from its initial standard value (Ei).
  2. Method according to claim 1, wherein at least one variable parameter (VPi) is made variable according to a local model over the control range, the model relating a quantity of emotion control variable (δ) to the variable parameter (VPi), the quantity of emotion control variable being used to variably establish a value of the variable parameter.
  3. Method according to claim 2, wherein the local model is a locally linear model for the control range and for a given type of emotion, the variable parameter (VPi) being made to vary linearly over the control range by means of the quantity of emotion control variable (δ).
  4. Method according to any one of claims 1 to 3, wherein the quantity of emotion is determined by a control variable (δ) which modifies the variable parameter (VPi) in accordance with a relation given by the formula: VPi = A + δB
    where:
    VPi is the value of the variable parameter considered,
    A and B are values allowed for the control range, and
    δ is the quantity of emotion control variable.
  5. Method according to claim 4, wherein A is a value inside the control range, the quantity of emotion control variable (δ) being variable in an interval which contains the value zero.
  6. Method according to claim 5, wherein A is substantially the mid value (Emr) of the control range, and the quantity of emotion control variable (δ) is variable in an interval whose mid value is zero.
  7. Method according to claim 6, wherein the quantity of emotion control variable (δ) is variable in an interval of from -1 to +1.
  8. Method according to any one of claims 4 to 7, wherein B is determined by: B = Eimax - A, or by B = Eimin + A
    where:
    Eimax is the value of the input parameter producing the maximum quantity of the type of emotion to be conveyed in the control range, and
    Eimin is the value of the parameter producing the minimum quantity of the type of emotion to be conveyed in the control range.
  9. Method according to any one of claims 4 to 8, wherein A is equal to the standard parameter value (Ei) originally specified to set a type of emotion to be conveyed.
  10. Method according to claim 8 or 9, wherein the value Eimax or Eimin is determined experimentally, by excursion from the standard parameter value (Ei) originally specified to set a type of emotion to be conveyed, and by determining a maximum excursion in an increasing or decreasing direction yielding a desired limit to the quantity of emotion to be conferred by the control range.
  11. Method according to any one of claims 1 to 10, wherein a same quantity of emotion control variable (δ) is used to collectively establish a plurality of variable parameters (VP1-VPN) of the emotion synthesising device (2, 12).
  12. Apparatus (10) for controlling the operation of a system (2, 12) for synthesising an emotion conveyed in a sound, the system having at least one input for a parameter (Pi) whose value (Ei) is used to set a type of emotion to be conveyed,
    the apparatus comprising:
    - means for programming the input(s) with a parameterisation to produce a given type of emotion (E), and
    - variation means (14, 16, 18) for conferring a variability in the quantity of the type of emotion to be conveyed;
    characterised in that the variation means (14, 16, 18) are capable of subjecting, within a predetermined control range, at least one parameter (Pi) used to set a type of emotion to an excursion from its initial standard value (Ei).
  13. Apparatus according to claim 12, wherein the variation means (14, 16, 20) are accessible to cause at least one variable parameter (VPi) to vary in response to a quantity of emotion control variable (δ), which is accessible to variably establish a value of the variable parameter.
  14. Apparatus according to claim 13, wherein the variation means (14, 16, 18) cause the variable parameter (VPi) to vary linearly, according to a locally linear model, with a variation in the quantity of emotion control variable (δ).
  15. Apparatus according to either of claims 13 and 14, wherein the quantity of emotion control variable (δ) is variable in an interval which contains the value zero.
  16. Apparatus according to claim 15, wherein the quantity of emotion control variable (δ) is variable in an interval of from -1 to +1.
  17. Apparatus according to any one of claims 12 to 16, wherein the variation means (14, 16, 20) cause at least one variable parameter (VPi) to vary, in response to a quantity of emotion control variable (δ), according to one of the following formulas: VPi = Emr + δ(Eimax - Emr), or VPi = Emr + δ(Eimin + Emr)
    where:
    δ is the value of the quantity of emotion control variable,
    Emr is substantially the mid value of the control range, preferably equal to the standard parameter value (Ei) originally specified to set a type of emotion to be conveyed,
    Eimax is the value of the parameter producing the maximum amount of the type of emotion to be conveyed in the control range, and
    Eimin is the value of the parameter producing the minimum amount of the type of emotion to be conveyed in the control range.
  18. Apparatus according to any one of claims 12 to 17, capable of collectively establishing, with the same quantity of emotion control variable (δ), a plurality of variable parameters (VP1-VPN) of the emotion synthesising system (2, 12), so as to variably establish a value of each variable parameter.
  19. System comprising an emotion synthesising device (2, 12) having at least one input for receiving at least one parameter (Pi) whose value (Ei) is used to set a type of emotion to be conveyed, and an apparatus (10) according to any one of claims 13 to 19, operatively connected to deliver a variable (VPi) to the at least one input, thereby to confer a variability in an amount of a type of emotion to be conveyed.
  20. Computer program providing computer-executable instructions which, when run on a data processor, cause the data processor to carry out all the method steps according to any one of claims 1 to 11.
EP20010402176 2001-05-11 2001-08-14 Method and apparatus for controlling the operation of an emotion synthesising device Expired - Lifetime EP1256933B1 (de)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20010402176 EP1256933B1 (de) 2001-05-11 2001-08-14 Method and apparatus for controlling the operation of an emotion synthesising device
JP2002206013A JP2003177772A (ja) 2001-07-13 2002-07-15 Method and apparatus for controlling the processing of an emotion synthesising device
US10/217,002 US7457752B2 (en) 2001-08-14 2002-08-12 Method and apparatus for controlling the operation of an emotion synthesizing device

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
EP01401203 2001-05-11
EP01401203A EP1256931A1 (de) 2001-05-11 2001-05-11 Method and apparatus for speech synthesis, and robot
EP01401880 2001-07-13
EP20010401880 EP1256932B1 (de) 2001-05-11 2001-07-13 Method and apparatus for synthesising an emotion conveyed on a sound
EP20010402176 EP1256933B1 (de) 2001-05-11 2001-08-14 Method and apparatus for controlling the operation of an emotion synthesising device

Publications (3)

Publication Number Publication Date
EP1256933A2 EP1256933A2 (de) 2002-11-13
EP1256933A3 EP1256933A3 (de) 2004-10-13
EP1256933B1 true EP1256933B1 (de) 2007-11-21

Family

ID=27224389

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20010402176 Expired - Lifetime EP1256933B1 (de) 2001-05-11 2001-08-14 Method and apparatus for controlling the operation of an emotion synthesising device

Country Status (1)

Country Link
EP (1) EP1256933B1 (de)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4130190B2 (ja) 2003-04-28 2008-08-06 Fujitsu Ltd Speech synthesis system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04199098A (ja) * 1990-11-29 1992-07-20 Meidensha Corp Rule-based speech synthesiser
US5860064A (en) * 1993-05-13 1999-01-12 Apple Computer, Inc. Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system
US5732232A (en) * 1996-09-17 1998-03-24 International Business Machines Corp. Method and apparatus for directing the expression of emotion for a graphical user interface

Also Published As

Publication number Publication date
EP1256933A3 (de) 2004-10-13
EP1256933A2 (de) 2002-11-13

Similar Documents

Publication Publication Date Title
Toda et al. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
US20030093280A1 (en) Method and apparatus for synthesising an emotion conveyed on a sound
KR101542005B1 (ko) Speech synthesis information editing apparatus
US20100066742A1 (en) Stylized prosody for speech synthesis-based applications
JPH04331997A (ja) Accent component control system for speech synthesiser
US7457752B2 (en) Method and apparatus for controlling the operation of an emotion synthesizing device
CN112599113A (zh) Dialect speech synthesis method and apparatus, electronic device and readable storage medium
JP3728173B2 (ja) Speech synthesis method, apparatus and storage medium
JP2001265375A (ja) Rule-based speech synthesiser
EP1256933B1 (de) Method and apparatus for controlling the operation of an emotion synthesising device
JP2001242882A (ja) Speech synthesis method and speech synthesis apparatus
JPH09319391A (ja) Speech synthesis method
van Rijnsoever A multilingual text-to-speech system
DE60131521T2 (de) Method and apparatus for controlling the operation of a device or a system, system comprising such an apparatus, and computer program for carrying out the method
JP2008191477A (ja) Hybrid speech synthesis method, and apparatus, program and storage medium therefor
JPH09179576A (ja) Speech synthesis method
JP2003177772A (ja) Method and apparatus for controlling the processing of an emotion synthesising device
JPH07244496A (ja) Text reading-out apparatus
JPH11249676A (ja) Speech synthesiser
JPH05224689A (ja) Speech synthesiser
JP2000310996A (ja) Speech synthesiser and method of controlling phoneme duration
JPH05224688A (ja) Text-to-speech synthesiser
Liberman Computer speech synthesis: its status and prospects.
JPH05108084A (ja) Speech synthesiser
JPH09292897A (ja) Speech synthesiser

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 10L 13/02 B

Ipc: 7G 10L 13/08 A

17P Request for examination filed

Effective date: 20050317

17Q First examination report despatched

Effective date: 20050509

AKX Designation fees paid

Designated state(s): DE FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60131521

Country of ref document: DE

Date of ref document: 20080103

Kind code of ref document: P

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20080822

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20140821

Year of fee payment: 14

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20140820

Year of fee payment: 14

Ref country code: FR

Payment date: 20140821

Year of fee payment: 14

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 60131521

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20150814

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20160429

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160301

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20150814

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20150831