JP2003202885A5

Info

Publication number: JP2003202885A5
Application number: JP2001401424A
Authority: JP
Filing date: 2001-12-28
Publication date: 2004-11-11
Anticipated expiration: 2021-12-28

Claims

An information processing apparatus comprising:
calling means for making a voice call;
generating means for generating voice feature data from the voice of the other party obtained through the calling means;
storage means for storing the voice feature data generated by the generating means in association with the other party;
receiving means for receiving a message including text data;
acquisition means for acquiring, from the storage means, the voice feature data of the other party corresponding to the sender of the message received by the receiving means; and
synthesizing means for generating synthesized voice data for the text data included in the message, using the voice feature data acquired by the acquisition means.
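
The flow recited in this claim can be illustrated with a minimal sketch. It assumes that voice feature data is a pair of numbers (mean pitch and speaking rate) and that synthesis is a placeholder returning parameters rather than audio; the claims specify neither. All names (`VoiceFeatures`, `VoiceFeatureStore`, `generate_features`, `synthesize`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    """Hypothetical stand-in for 'voice feature data' (not defined in the claims)."""
    mean_pitch_hz: float
    speaking_rate: float  # syllables per second


class VoiceFeatureStore:
    """Storage means: voice feature data keyed by the other party of the call."""

    def __init__(self):
        self._by_party = {}

    def store(self, party_id, features):
        self._by_party[party_id] = features

    def acquire(self, sender_id):
        # Returns None when no call with this sender has been recorded.
        return self._by_party.get(sender_id)


def generate_features(pitch_samples_hz, syllable_count, duration_s):
    """Generating means: derive features from measurements taken during a call."""
    mean_pitch = sum(pitch_samples_hz) / len(pitch_samples_hz)
    return VoiceFeatures(mean_pitch, syllable_count / duration_s)


def synthesize(text, features):
    """Synthesizing means (placeholder): returns synthesis parameters, not audio."""
    return {"text": text, "pitch_hz": features.mean_pitch_hz, "rate": features.speaking_rate}


# During a call: generate and store features for the other party.
store = VoiceFeatureStore()
store.store("alice", generate_features([200.0, 220.0, 210.0], syllable_count=12, duration_s=3.0))

# On receiving a text message: look up the sender and synthesize in that voice.
message = {"sender": "alice", "text": "See you at noon."}
features = store.acquire(message["sender"])
request = synthesize(message["text"], features)
```

A real implementation would pass the acquired features to a speech synthesizer; the dictionary here only shows which data reaches that stage.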

The information processing apparatus according to claim 1, further comprising updating means for updating the voice feature data corresponding to the other party of the call that is already stored in the storage means, using the voice feature data generated by the generating means during the call made by the calling means.
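
The claim does not say how stored data is updated. One possible rule, shown here purely as an assumption, is a weighted blend of the stored and newly observed values (an exponential moving average). The function name `update_features` and the `weight` parameter are hypothetical.

```python
def update_features(stored, observed, weight=0.2):
    """Blend newly observed feature values into the stored ones.

    weight is the share given to the new observation (assumed rule,
    not taken from the claims).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return {
        name: (1.0 - weight) * stored[name] + weight * observed[name]
        for name in stored
    }


stored = {"mean_pitch_hz": 200.0, "speaking_rate": 4.0}
observed = {"mean_pitch_hz": 220.0, "speaking_rate": 5.0}
updated = update_features(stored, observed, weight=0.5)
```

A small weight keeps the stored profile stable across calls, while a large weight lets it follow the most recent call closely.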

The information processing apparatus according to claim 1, further comprising adjustment means for manually adjusting the voice feature data stored in the storage means.

The information processing apparatus according to claim 1, further comprising classification means for classifying the voice of the other party obtained through the calling means into one of a plurality of emotion classification items,
wherein the generating means acquires voice feature data for each emotion classification item classified by the classification means, and the storage means stores the voice feature data for each emotion classification item.

The information processing apparatus according to claim 4, wherein the classification means performs emotion classification based on prosodic information, such as pitch and accent, detected from the voice.

The information processing apparatus according to claim 4, further comprising determination means for determining to which of the plurality of emotion classification items the text data included in the message belongs,
wherein the acquisition means acquires, from the storage means, the voice feature data of the other party corresponding to the sender of the message received by the receiving means, the voice feature data being that which corresponds to the emotion classification item determined by the determination means.
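
The per-emotion storage and lookup described in this claim and the claim it depends on might be sketched as follows. The emotion labels, the keyword table used to classify text, and the fallback to a neutral entry are all assumptions made for illustration; the claims do not specify how text is classified or what happens when no matching entry exists.

```python
# Hypothetical keyword table; the claims do not specify how text is classified.
KEYWORDS = {
    "joy": ("great", "congratulations", "thanks"),
    "anger": ("unacceptable", "furious"),
}


def determine_emotion(text):
    """Determination means: pick the emotion item the message text belongs to."""
    lowered = text.lower()
    for emotion, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return emotion
    return "neutral"


class EmotionFeatureStore:
    """Storage means holding voice feature data per party and per emotion item."""

    def __init__(self):
        self._data = {}

    def store(self, party_id, emotion, features):
        self._data.setdefault(party_id, {})[emotion] = features

    def acquire(self, party_id, emotion):
        per_emotion = self._data.get(party_id, {})
        # Assumed fallback: use the neutral entry if this emotion was never observed.
        return per_emotion.get(emotion, per_emotion.get("neutral"))


store = EmotionFeatureStore()
store.store("alice", "neutral", {"mean_pitch_hz": 210.0})
store.store("alice", "joy", {"mean_pitch_hz": 240.0})

text = "Congratulations on the new job!"
emotion = determine_emotion(text)
features = store.acquire("alice", emotion)
```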

The information processing apparatus according to claim 4, further comprising updating means for, when voice feature data corresponding to one emotion classification item is generated by the generating means, updating the voice feature data corresponding to another emotion classification item using the generated voice feature data.

An information processing apparatus comprising:
input means for inputting voice;
voice recognition means for generating a text string from the voice input by the input means;
detecting means for detecting a change in the utterance state of the voice input by the input means;
adding means for adding additional data to the text string generated by the voice recognition means, based on the change in the utterance state detected by the detecting means; and
generating means for generating a transmission message including the text string to which the additional data has been added by the adding means.

The information processing apparatus according to claim 8, wherein the additional data is an attribute of the text string.

The information processing apparatus according to claim 8, wherein the detecting means detects a change in the volume and/or speed of the input voice.
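
As a rough illustration of the sender side, the sketch below marks words whose volume or speed rises well above the utterance average and serializes the result as a message. Comparing against the average is only one reading of "a change in the utterance state"; the thresholds, the attribute names, and the `<span ...>` markup are hypothetical, and speech recognition is assumed to have already produced per-word measurements.

```python
def annotate(words, volumes, rates, loud_ratio=1.5, fast_ratio=1.3):
    """Attach attributes to words whose volume or speed exceeds the average.

    words: recognized text, one entry per word
    volumes, rates: per-word loudness and speaking-rate measurements
    """
    base_volume = sum(volumes) / len(volumes)
    base_rate = sum(rates) / len(rates)
    annotated = []
    for word, volume, rate in zip(words, volumes, rates):
        attrs = {}
        if volume > loud_ratio * base_volume:
            attrs["volume"] = "loud"
        if rate > fast_ratio * base_rate:
            attrs["speed"] = "fast"
        annotated.append((word, attrs))
    return annotated


def build_message(annotated):
    """Serialize text plus attributes into a transmission message (assumed markup)."""
    parts = []
    for word, attrs in annotated:
        if attrs:
            attr_text = " ".join(f'{key}="{value}"' for key, value in sorted(attrs.items()))
            parts.append(f"<span {attr_text}>{word}</span>")
        else:
            parts.append(word)
    return " ".join(parts)


words = ["come", "here", "now"]
volumes = [1.0, 1.0, 4.0]   # the last word is spoken much louder
rates = [3.0, 3.0, 3.0]     # constant speaking rate
message = build_message(annotate(words, volumes, rates))
```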

The information processing apparatus according to claim 8, further comprising acquisition means for acquiring voice feature data from the input voice,
wherein the transmission message generated by the generating means includes the voice feature data.

An information processing apparatus comprising:
input means for inputting voice;
voice recognition means for generating a text string from the voice input by the input means;
classification means for classifying the input voice into one of a plurality of emotion classification items;
adding means for adding additional data corresponding to the classification item to the text string generated by the voice recognition means, based on the classification result of the classification means; and
generating means for generating a transmission message including the text string to which the additional data has been added by the adding means.

The information processing apparatus according to claim 12, wherein the classification means classifies each phrase of the input voice into one of the plurality of emotion classification items.

The information processing apparatus according to claim 12, further comprising acquisition means for acquiring voice feature data for each emotion classification item, based on the voice classified by the classification means,
wherein the transmission message generated by the generating means includes the voice feature data for each of the emotion classification items.

An information processing apparatus comprising:
receiving means for receiving a message including a text string to which additional data indicating an utterance state has been added;
voice synthesis means for generating voice data based on the text string of the message received by the receiving means; and
changing means for acquiring the additional data from the received text string and changing the utterance state of the voice data based on the additional data.

The information processing apparatus according to claim 15, wherein the utterance state includes a volume and a speed of the utterance.
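
On the receiving side, the additional data has to be separated from the text and turned into changes of volume and speed. The sketch below assumes a hypothetical inline markup of the form `<span volume="loud" speed="fast">text</span>` and illustrative scaling factors; neither comes from the claims. It produces a per-segment synthesis plan rather than audio.

```python
import re

# Hypothetical inline markup: <span volume="loud" speed="fast">text</span>
SPAN = re.compile(r'<span([^>]*)>(.*?)</span>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_message(message):
    """Split a message into (text, attributes) segments."""
    segments = []
    pos = 0
    for match in SPAN.finditer(message):
        plain = message[pos:match.start()].strip()
        if plain:
            segments.append((plain, {}))
        segments.append((match.group(2), dict(ATTR.findall(match.group(1)))))
        pos = match.end()
    tail = message[pos:].strip()
    if tail:
        segments.append((tail, {}))
    return segments


def apply_utterance_state(segments, base_gain=1.0, base_rate=1.0):
    """Changing means: scale volume and speed per segment (factors are assumed)."""
    plan = []
    for text, attrs in segments:
        gain = base_gain * (2.0 if attrs.get("volume") == "loud" else 1.0)
        rate = base_rate * (1.5 if attrs.get("speed") == "fast" else 1.0)
        plan.append({"text": text, "gain": gain, "rate": rate})
    return plan


segments = parse_message('come here <span volume="loud">now</span>')
plan = apply_utterance_state(segments)
```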

The information processing apparatus according to claim 15, wherein the message includes voice feature data, and the voice synthesis means generates the voice data for the text string using the voice feature data included in the message.

An information processing apparatus comprising:
receiving means for receiving a message that includes a text string, to which additional data indicating to which of a plurality of emotion classification items the text string belongs has been added, and voice feature data corresponding to each of the plurality of emotion classification items;
acquisition means for acquiring, from the message, the voice feature data corresponding to the emotion classification item to which the text string belongs, based on the additional data of the message received by the receiving means; and
voice synthesis means for generating voice data for the text string using the voice feature data acquired by the acquisition means.
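
Here the message itself carries both the emotion tag and the per-emotion voice feature data, so the receiver needs no local store. A minimal sketch, assuming a JSON message layout with hypothetical field names (`text`, `emotion`, `voice_features`) and a placeholder in place of actual synthesis:

```python
import json


def acquire_features(message):
    """Pick the feature set matching the emotion tag carried in the message."""
    return message["voice_features"][message["emotion"]]


# Example message as it might arrive over the network (layout is assumed).
raw = json.dumps({
    "text": "I passed the exam!",
    "emotion": "joy",
    "voice_features": {
        "neutral": {"mean_pitch_hz": 210.0, "rate": 4.0},
        "joy": {"mean_pitch_hz": 240.0, "rate": 4.5},
    },
})

message = json.loads(raw)
features = acquire_features(message)
# Placeholder for synthesis: the parameters a synthesizer would receive.
request = {"text": message["text"], **features}
```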

An information processing method comprising:
a calling step of making a voice call;
a generating step of generating voice feature data from the voice of the other party obtained in the calling step;
a storing step of storing the voice feature data generated in the generating step in a storage unit in association with the other party;
a receiving step of receiving a message including text data;
an acquiring step of acquiring, from the storage unit, the voice feature data of the other party corresponding to the sender of the message received in the receiving step; and
a synthesizing step of generating synthesized voice data for the text data included in the message, using the voice feature data acquired in the acquiring step.

The information processing method according to claim 19, further comprising an updating step of updating the voice feature data corresponding to the other party of the call that was already stored in the storing step, using the voice feature data generated in the generating step during the call made in the calling step.

The information processing method according to claim 19, further comprising an adjusting step of manually adjusting the voice feature data stored in the storing step.

The information processing method according to claim 19, further comprising a classification step of classifying the voice of the other party obtained in the calling step into one of a plurality of emotion classification items,
wherein the generating step acquires voice feature data for each emotion classification item classified in the classification step, and the storing step stores the voice feature data in the storage unit for each emotion classification item.

The information processing method according to claim 22, wherein, in the classification step, emotion classification is performed based on prosodic information, such as pitch and accent, detected from the voice.

The information processing method according to claim 22, further comprising a determining step of determining to which of the plurality of emotion classification items the text data included in the message belongs,
wherein the acquiring step acquires, from the storage unit, the voice feature data of the other party corresponding to the sender of the message received in the receiving step, the voice feature data being that which corresponds to the emotion classification item determined in the determining step.

The information processing method according to claim 22, further comprising an updating step of, when voice feature data corresponding to one emotion classification item is generated in the generating step, updating the voice feature data corresponding to another emotion classification item using the generated voice feature data.

An information processing method comprising:
an input step of inputting voice;
a voice recognition step of generating a text string from the voice input in the input step;
a detection step of detecting a change in the utterance state of the voice input in the input step;
an adding step of adding additional data to the text string generated in the voice recognition step, based on the change in the utterance state detected in the detection step; and
a generation step of generating a transmission message including the text string to which the additional data has been added in the adding step.

The information processing method according to claim 26, wherein the additional data is an attribute of the text string.

The information processing method according to claim 26, wherein the detection step detects a change in the volume and/or speed of the input voice.

The information processing method according to claim 26, further comprising an acquisition step of acquiring voice feature data from the input voice,
wherein the transmission message generated in the generation step includes the voice feature data.

An information processing method comprising:
an input step of inputting voice;
a voice recognition step of generating a text string from the voice input in the input step;
a classification step of classifying the input voice into one of a plurality of emotion classification items;
an adding step of adding additional data corresponding to the classification item to the text string generated in the voice recognition step, based on the classification result of the classification step; and
a generation step of generating a transmission message including the text string to which the additional data has been added in the adding step.

The information processing method according to claim 30, wherein the classification step classifies each phrase of the input voice into one of the plurality of emotion classification items.

The information processing method according to claim 30, further comprising an acquisition step of acquiring voice feature data for each emotion classification item, based on the voice classified in the classification step,
wherein the transmission message generated in the generation step includes the voice feature data for each of the emotion classification items.

An information processing method comprising:
a receiving step of receiving a message including a text string to which additional data indicating an utterance state has been added;
a voice synthesis step of generating voice data based on the text string of the message received in the receiving step; and
a changing step of acquiring the additional data from the received text string and changing the utterance state of the voice data based on the additional data.

The information processing method according to claim 33, wherein the utterance state includes a volume and a speed of the utterance.

The information processing method according to claim 33, wherein the message includes voice feature data, and the voice synthesis step generates the voice data for the text string using the voice feature data included in the message.

An information processing method comprising:
a receiving step of receiving a message that includes a text string, to which additional data indicating to which of a plurality of emotion classification items the text string belongs has been added, and voice feature data corresponding to each of the plurality of emotion classification items;
an acquiring step of acquiring, from the message, the voice feature data corresponding to the emotion classification item to which the text string belongs, based on the additional data of the message received in the receiving step; and
a voice synthesis step of generating voice data for the text string using the voice feature data acquired in the acquiring step.

A storage medium for storing a computer program for causing a computer to execute the information processing method according to any one of claims 19 to 36.