WO2018050212A1 - Telecommunication terminal with voice conversion
- Publication number
- WO2018050212A1 (application PCT/EP2016/071595)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- voice signal
- terminal device
- data sets
- telecommunication terminal
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/60—Substation equipment, e.g. for use by subscribers including speech amplifiers
- H04M1/6008—Substation equipment, e.g. for use by subscribers including speech amplifiers in the transmitter circuit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
Definitions
- The present invention relates generally to the field of voice telecommunication terminal devices.
- Known voice telecommunication terminal devices convert the voice of a user into an electrical voice signal and transmit the voice signal via a telecommunication network to a recipient.
- The object is achieved by a voice telecommunication terminal device configured for being used for voice communication, wherein the voice telecommunication terminal device comprises: a microphone configured for converting a voice of a user into an electrical voice signal; a voice modifier configured for modifying one or more vocal parameters of at least one portion of the voice signal in order to produce a vocally modified voice signal; and a telecommunication interface configured for connecting the voice telecommunication terminal device to a telecommunication network and for transmitting the vocally modified voice signal to the telecommunication network.
- The voice telecommunication terminal device may be any device used directly by an end-user for real-time voice communication (as in telephony) and/or for time-delayed voice communication (as in voice mail systems).
- The voice telecommunication terminal device may be a landline telephone, a mobile telephone, a smartphone, a tablet or laptop computer equipped with a landline or mobile telecommunication interface, or any other device configured for voice communication.
- The microphone may be any transducer that converts sound, in particular voice, into an electrical signal.
- The user may be the end-user of the voice telecommunication terminal device.
- The voice modifier may be any digital or analog sound processing device that is capable of modifying one or more vocal parameters of an electrical voice signal produced by the microphone in order to produce a vocally modified voice signal.
- The voice modifier can be a stand-alone module of the voice communication terminal device, or it can be a module integrated in a voice messaging module, a video conference module, a voice over IP module or any other voice communication module of the voice communication terminal device.
- The telecommunication interface may be any digital or analog telecommunication interface that is capable of transmitting the vocally modified voice signal to a telecommunication network when connected to that network.
- The telecommunication interface may be configured for transmitting the vocally modified voice signal via cable or wirelessly.
- The telecommunication network may be any digital or analog network used for voice telecommunication, such as a landline telephone network, a mobile telephone network, a local area network or a wide area network.
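The chain of microphone, voice modifier and telecommunication interface described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the voice signal is represented as a list of samples, and `run_terminal`, `capture`, `modify` and `transmit` are hypothetical names that do not appear in the application.

```python
from typing import Callable, List

Signal = List[float]  # assumed representation: a list of audio samples


def run_terminal(
    capture: Callable[[], Signal],
    modify: Callable[[Signal], Signal],
    transmit: Callable[[Signal], None],
) -> Signal:
    """Chain the three components: microphone -> voice modifier -> interface."""
    voice_signal = capture()          # electrical voice signal from the microphone
    modified = modify(voice_signal)   # vocally modified voice signal
    transmit(modified)                # hand the modified signal to the network
    return modified


# Stand-in components, for illustration only.
sent: List[Signal] = []
result = run_terminal(
    capture=lambda: [0.1, -0.2, 0.3],
    modify=lambda vs: [2.0 * s for s in vs],  # toy modification: double the amplitude
    transmit=sent.append,
)
```

Passing the components as callables mirrors the statement that each of them "may be any" suitable device: the chain stays the same while the concrete microphone, modifier or interface is swapped.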
- In general, voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc.
- Voice in this sense has two main components: verbal components, which refer to the words expressed by voice, and vocal components, which refer to the intonation of the voice.
- The importance of vocal components in voice may be indicated by the fact that children can understand intonations in voice before they can understand words.
- Vocal components are generated by changes over time in the pitch, duration and amplitude of voice segments as well as by voice quality (e.g. nasal, breathy or hoarse).
- Voice quality in general is intrasegmental, depending on the individual vocal tract of a human.
- Vocal parameters of the voice signal are all parameters referring to pitch, timing, vocal quality and/or loudness of the voice signal or a portion thereof.
- The invention gives the end-user the possibility to modify the original vocal characteristics of the voice signal so that the modified voice signal has vocal characteristics not necessarily present in the original voice signal.
- The invention allows automatically changing the vocal characteristics of a voice signal in a voice telecommunication terminal device before the voice signal is sent as a modified voice signal via a telecommunication network to a contact person.
- The invention allows automatically modifying a voice signal in a voice telecommunication terminal device without having to use external software to produce the modifications and without re-recording the voice signal.
- The one or more vocal parameters comprise at least one pitch parameter, which refers to a pitch envelope of the at least one portion of the voice signal, at least one timing parameter, which refers to a time scale of the at least one portion of the voice signal, at least one quality parameter, which refers to a voice quality of the at least one portion of the voice signal, and/or at least one loudness parameter, which refers to a loudness envelope of the at least one portion of the voice signal.
- The use of a pitch parameter, a timing parameter, a quality parameter and/or a loudness parameter allows various modifications of the vocal characteristics of the voice signal, especially if said parameters are used in combination.
- Pitch in general is a characteristic of sound that may be quantified as frequencies of the sound. Examples of pitch parameters are: fundamental frequency, pitch range, shape and timing of the pitch contour.
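As a rough illustration of two of the pitch parameters named above, the sketch below computes a pitch range and rescales a pitch contour. It assumes the contour is given as one fundamental-frequency value in Hz per analysis frame, with 0.0 marking unvoiced frames; that representation and the function names are assumptions made for this example, not part of the application.

```python
from typing import List, Tuple


def pitch_range(f0_contour: List[float]) -> Tuple[float, float]:
    """Return (lowest, highest) fundamental frequency over the voiced frames."""
    voiced = [f for f in f0_contour if f > 0.0]
    if not voiced:
        raise ValueError("contour contains no voiced frames")
    return min(voiced), max(voiced)


def scale_pitch(f0_contour: List[float], factor: float) -> List[float]:
    """Multiply every frequency by `factor`; unvoiced frames (0.0) stay 0.0."""
    return [f * factor for f in f0_contour]


contour = [0.0, 100.0, 150.0, 0.0, 120.0]
low, high = pitch_range(contour)
raised = scale_pitch(contour, 1.5)
```

This only manipulates the contour values. Resynthesising audio with the new contour is a separate signal processing step that the sketch does not cover.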
- Timing parameters in general refer to the tempo of a voice.
- Examples of timing parameters are: basic speed, speed range, timing of segments and pauses.
- Voice quality is a speaker-dependent characteristic which gives a voice its particular identity and by which speakers are most quickly identified. Such factors as age, sex, regional background, stature, state of health, and the overall speaking situation will affect voice quality. Examples of quality parameters are percentage of overtones and percentage of noise.
- Loudness in general is a characteristic of sound that primarily correlates with physical strength (amplitude) of the sound.
- Examples of loudness parameters are: dynamic range and shape and timing of the intensity.
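Because loudness is tied to amplitude, the simplest loudness modification is a gain applied to the samples. The following sketch assumes samples normalised to the range [-1.0, 1.0] and a gain expressed in decibels; both choices, and the function names, are assumptions for illustration and are not specified in the application.

```python
from typing import List


def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale each sample and clip to [-1.0, 1.0] to avoid overflow."""
    factor = db_to_amplitude(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


unchanged = apply_gain([0.25, -0.5], 0.0)  # 0 dB leaves the signal as it is
clipped = apply_gain([0.5], 20.0)          # +20 dB is a factor of 10, so 0.5 clips
```

A constant gain changes only the overall level. Changing the dynamic range or the shape of the intensity over time would require a gain that varies from segment to segment.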
- The voice modifier is configured for modifying the at least one portion of the voice signal based on a selected data set being selectable from a plurality of data sets stored at the voice telecommunication terminal device, wherein each data set of the plurality of data sets comprises a predefined value for each vocal parameter of the one or more vocal parameters. Selecting a predefined data set from a plurality of predefined data sets allows selecting an overall vocal characteristic for the vocally modified voice signal in an easy way. The selection can be done manually by the user or, in other cases, even automatically.
- Each data set comprises one predefined value for each vocal parameter to be changed.
- Some data sets may comprise predefined values for only a part of the vocal parameters, so that only a part of the vocal parameters will be modified, whereas other data sets may comprise predefined values for all vocal parameters which the voice telecommunication terminal device is configured to modify, so that all available vocal parameters will be modified.
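The difference between a data set that covers all vocal parameters and one that covers only some of them can be sketched with plain dictionaries. The parameter names and numeric values below are invented purely for illustration; the application does not give concrete values.

```python
from typing import Dict

Params = Dict[str, float]

# Hypothetical current parameter values of the unmodified voice signal.
current: Params = {"pitch": 1.0, "timing": 1.0, "quality": 0.0, "loudness": 1.0}

# Hypothetical data sets: one covers every parameter, one only the pitch.
full_set: Params = {"pitch": 1.2, "timing": 0.9, "quality": 0.1, "loudness": 1.5}
partial_set: Params = {"pitch": 0.8}


def apply_data_set(params: Params, data_set: Params) -> Params:
    """Return new parameters; only keys present in the data set are changed."""
    unknown = set(data_set) - set(params)
    if unknown:
        raise KeyError(f"unsupported vocal parameters: {sorted(unknown)}")
    return {**params, **data_set}


only_pitch_changed = apply_data_set(current, partial_set)
everything_changed = apply_data_set(current, full_set)
```

Returning a new dictionary instead of updating `current` in place keeps the original parameters available, which is convenient if the user switches to another data set.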
- the predefined values of one or more data sets of the plurality of data sets are predefined in such way that the modified voice signal conveys an emotion.
- Emotion is any relatively brief conscious experience characterized by intense mental activity and a high degree of pleasure or displeasure. It is known that emotions may be conveyed by vocal characteristics of voice. By providing a data set comprising vocal parameters suitable for conveying a specific emotion, it is possible to convey an emotion, which was not conveyed by the original voice signal.
- the emotion conveyed by a first data set of the plurality of data sets is related to being angry, wherein the emotion conveyed by a second data set of the plurality of data sets is related to being scared, wherein the emotion conveyed by a third data set of the plurality of data sets is related to being tender, wherein the emotion conveyed by a fourth data set of the plurality of data sets is related to being excited, wherein the emotion conveyed by a fifth data set of the plurality of data sets is related to being happy, and/or wherein the emotion conveyed by a sixth data set of the plurality of data sets is related to being sad.
- Being angry includes emotions like “being irritated”, “being resentful”, “being miffed”, “being upset”, “being mad”, “being furious” and “being raging”.
- Being scared comprises emotions like “being tense”, “being nervous”, “being anxious”, “being jittery”, “being frightened”, “being panic-stricken” and “being ashamed”.
- Being tender includes emotions like “being intimate”, “being loving”, “being warm-hearted”, “being sympathetic”, “being touched”, “being kind” and “being soft”.
- Being excited comprises emotions like “being ecstatic”, “being energetic”, “being aroused”, “being bouncy”, “being nervous”, “being perky” and “being antsy”.
- Being happy includes emotions like “being fulfilled”, “being contented”, “being glad”, “being complete”, “being satisfied”, “being optimistic” and “being pleased”.
- Being sad comprises emotions like “being down”, “being blue”, “being mopey”, “being grieved”, “being dejected”, “being depressed” and “being heartbroken”.
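The six emotion families listed above lend themselves to a simple lookup from a fine-grained label to the data set category it belongs to. The sketch below uses the labels from the lists with the leading "being" dropped; the names `EMOTION_FAMILIES` and `categories_for` are hypothetical. Note that "nervous" is listed under both "scared" and "excited", so the lookup returns a list rather than a single category.

```python
from typing import Dict, List

# Sub-labels per data set category, taken from the lists in the text.
EMOTION_FAMILIES: Dict[str, List[str]] = {
    "angry": ["irritated", "resentful", "miffed", "upset", "mad", "furious", "raging"],
    "scared": ["tense", "nervous", "anxious", "jittery", "frightened",
               "panic-stricken", "ashamed"],
    "tender": ["intimate", "loving", "warm-hearted", "sympathetic", "touched",
               "kind", "soft"],
    "excited": ["ecstatic", "energetic", "aroused", "bouncy", "nervous", "perky", "antsy"],
    "happy": ["fulfilled", "contented", "glad", "complete", "satisfied",
              "optimistic", "pleased"],
    "sad": ["down", "blue", "mopey", "grieved", "dejected", "depressed", "heartbroken"],
}


def categories_for(label: str) -> List[str]:
    """Return every category whose family contains the label, sorted by name."""
    wanted = label.strip().lower()
    return sorted(cat for cat, members in EMOTION_FAMILIES.items() if wanted in members)


furious = categories_for("furious")
nervous = categories_for("Nervous")
unknown = categories_for("bored")
```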
- The predefined values of one or more data sets of the plurality of data sets are predefined in such way that the vocal parameters of the modified voice signal correspond to vocal parameters of a voice of a person being different from the user.
- the predefined values of one or more data sets of the plurality of data sets are predefined in such way that the vocal parameters of the modified voice signal correspond to vocal parameters preferably used in a class in society.
- Social class may be defined as “people having the same social, economic, or educational status”, e.g. “the working class”. These features make the vocally modified voice signal sound like the voice of a member of a social class when it is reproduced as sound.
- the predefined values of one or more data sets of the plurality of data sets are predefined in such way that the vocal parameters of the modified voice signal correspond to vocal parameters preferably used in a geographical region.
- the predefined values of one or more data sets of the plurality of data sets are predefined in such way that the vocal parameters of the modified voice signal correspond to vocal parameters preferably used in a period in history.
- These features allow making the vocally modified voice signal sound like a record of a voice from an antecedent era, when the vocally modified voice signal is reproduced as sound.
- The voice modifier is configured for modifying the one or more vocal parameters in real-time.
- The voice telecommunication terminal device comprises a memory device configured for storing the at least one portion of the voice signal and/or for storing at least one portion of the vocally modified voice signal.
- The problem is solved by a method for operating a voice telecommunication terminal device configured for being used for voice communication, wherein the method comprises the following steps: converting a voice of a user into an electrical voice signal by using a microphone; modifying one or more vocal parameters of at least one portion of the voice signal in order to produce a vocally modified voice signal by using a voice modifier; connecting the voice telecommunication terminal device to a telecommunication network by using an interface; and transmitting the vocally modified voice signal to the telecommunication network by using the interface.
- Figure 1 illustrates a first embodiment of a voice telecommunication terminal device according to the invention in a schematic view.
- Figure 2 illustrates a second embodiment of a voice telecommunication terminal device according to the invention in a schematic view.
- Figure 1 illustrates a first embodiment of a voice telecommunication terminal device 1 according to the invention in a schematic view.
- The voice telecommunication terminal device 1 comprises: a microphone 2 configured for converting a voice VO of a user into an electrical voice signal VS; a voice modifier 3 configured for modifying one or more vocal parameters of at least one portion of the voice signal VS in order to produce a vocally modified voice signal MVS; and a telecommunication interface 4 configured for connecting the voice telecommunication terminal device 1 to a telecommunication network TN and for transmitting the vocally modified voice signal MVS to the telecommunication network TN.
- The invention may refer to the use of signal processing techniques and algorithms for the manipulation of a voice signal VS recorded or streamed within a messaging app, with the goal of conveying emotions in the modified voice signal MVS to be transmitted with the messaging app.
- The invention may be used for real-time processing.
- The invention may refer to the processing of a voice signal VS in real-time in the context of messaging and videoconference applications. It may also apply to applications such as a telephony application or a Voice over IP (VOIP) application, where the processing of the voice signal VS happens in real-time.
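Real-time processing is commonly organised frame by frame, so that each short block of samples is modified and passed on before the next one arrives. The application does not prescribe this; the generator below is a minimal sketch under that assumption, with hypothetical names throughout.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # assumed: a short block of consecutive samples


def stream_modified(
    frames: Iterable[Frame],
    modify_frame: Callable[[Frame], Frame],
) -> Iterator[Frame]:
    """Yield each modified frame as soon as its input frame is available."""
    for frame in frames:
        yield modify_frame(frame)


# Toy input: three frames, modified by inverting the sign of every sample.
incoming = iter([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
outgoing = stream_modified(incoming, lambda frame: [-s for s in frame])
first = next(outgoing)   # only the first frame has been consumed so far
rest = list(outgoing)    # the remaining frames are processed on demand
```

Because the generator is lazy, the delay added by the modifier is bounded by the time needed for one frame, which is what makes the approach suitable for telephony and VOIP use.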
- the user may select a modification to be applied to the voice signal VS, which may be transmitted in real-time as the modified voice signal MVS using the messaging or videoconference application, the telephony application or the Voice over IP (VOIP) application.
- the one or more vocal parameters comprise at least one pitch parameter, which refers to a pitch envelope of the at least one portion of the voice signal VS, at least one timing parameter, which refers to a time scale of the at least one portion of the voice signal VS, at least one quality parameter, which refers to a voice quality of the at least one portion of the voice signal VS, and/or at least one loudness parameter, which refers to a loudness envelope of the at least one portion of the voice signal VS.
- The invention provides a method for operating a voice telecommunication terminal device 1 configured for being used for voice communication, the method comprising the following steps: converting a voice VO of a user into an electrical voice signal VS by using a microphone 2; modifying one or more vocal parameters of at least one portion of the voice signal VS in order to produce a vocally modified voice signal MVS by using a voice modifier 3; connecting the voice telecommunication terminal device 1 to a telecommunication network TN by using a telecommunication interface 4; and transmitting the vocally modified voice signal MVS to the telecommunication network TN by using the telecommunication interface 4.
- The invention provides a computer program for executing the inventive method when running on a processor.
- Figure 2 illustrates a second embodiment of a voice telecommunication terminal device 1 according to the invention in a schematic view.
- the second embodiment shown in Figure 2 is based on the first embodiment shown in Figure 1.
- The voice modifier 3 is configured for modifying the at least one portion of the voice signal VS based on a selected data set 5 being selectable from a plurality of data sets 6 stored at the voice telecommunication terminal device 1, wherein each data set 6 of the plurality of data sets 6 comprises a predefined value for each vocal parameter of the one or more vocal parameters.
- The voice modifier 3 is configured for modifying the one or more vocal parameters in real-time.
- The plurality of data sets 6 consists, as an example, of four data sets 6.1, 6.2, 6.3 and 6.4.
- Data set 6.1 is chosen as the selected data set 5.
- The predefined values of data set 6.1 are used by the voice modifier 3 for modifying the voice signal VS in order to produce the vocally modified voice signal MVS.
- Another data set, e.g. 6.2, 6.3 or 6.4, could be used for that purpose instead.
- the predefined values of one or more data sets 6 of the plurality of data sets 6 are predefined in such way that the modified voice signal MVS conveys an emotion.
- the emotion conveyed by a first data set 6 of the plurality of data sets 6 is related to being angry, wherein the emotion conveyed by a second data set 6 of the plurality of data sets 6 is related to being scared, wherein the emotion conveyed by a third data set 6 of the plurality of data sets 6 is related to being tender, wherein the emotion conveyed by a fourth data set 6 of the plurality of data sets 6 is related to being excited, wherein the emotion conveyed by a fifth data set 6 of the plurality of data sets 6 is related to being happy, and/or wherein the emotion conveyed by a sixth data set 6 of the plurality of data sets 6 is related to being sad.
- the predefined values of one or more data sets 6 of the plurality of data sets 6 are predefined in such way that the vocal parameters of the modified voice signal MVS correspond to vocal parameters of a voice of a person being different from the user.
- the predefined values of one or more data sets 6 of the plurality of data sets 6 are predefined in such way that the vocal parameters of the modified voice signal MVS correspond to vocal parameters preferably used in a class in society.
- the predefined values of one or more data sets 6 of the plurality of data sets 6 are predefined in such way that the vocal parameters of the modified voice signal MVS correspond to vocal parameters preferably used in a geographical region.
- the predefined values of one or more data sets 6 of the plurality of data sets 6 are predefined in such way that the vocal parameters of the modified voice signal MVS correspond to vocal parameters preferably used in a period in history.
- the voice telecommunication terminal device 1 comprises a memory device 7 configured for storing the at least one portion of the voice signal VS and/or for storing at least one portion of the vocally modified voice signal MVS.
- The second embodiment may also be used for off-line processing, for example in the context of a voice messaging application.
- The voice signal VS may be stored as a voice file in the memory 7.
- The voice file containing the voice signal VS may then be read out of the memory 7 at a later stage.
- The voice modifier 3 will modify the read-out voice signal VS contained in the voice file before it is transmitted as the modified voice signal MVS to other persons.
- Alternatively, the voice modifier 3 modifies the vocal parameters of the voice signal VS instantly, so that the modified voice signal MVS is stored as a voice file in the memory 7. In both cases the modified voice signal MVS may be transmitted using the messaging application.
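The two off-line variants differ only in what ends up in the memory: the original voice signal, which is modified when it is read out, or the already modified signal. A minimal sketch, assuming the memory behaves like a dictionary of named voice files (the function names and the toy modification are hypothetical):

```python
from typing import Callable, Dict, List

Signal = List[float]
Memory = Dict[str, Signal]


def store_raw_then_modify(
    name: str, vs: Signal, modify: Callable[[Signal], Signal], memory: Memory
) -> Signal:
    """Variant 1: store the original signal, modify it when it is read out."""
    memory[name] = list(vs)       # voice file holds the unmodified signal
    return modify(memory[name])   # read out later and modify before sending


def modify_then_store(
    name: str, vs: Signal, modify: Callable[[Signal], Signal], memory: Memory
) -> Signal:
    """Variant 2: modify immediately and store the modified signal."""
    memory[name] = modify(vs)     # voice file holds the modified signal
    return memory[name]


def halve(vs: Signal) -> Signal:
    return [0.5 * s for s in vs]  # toy modification


memory_a: Memory = {}
memory_b: Memory = {}
sent_a = store_raw_then_modify("msg", [1.0, -1.0], halve, memory_a)
sent_b = modify_then_store("msg", [1.0, -1.0], halve, memory_b)
```

Both variants transmit the same modified signal. Keeping the original in memory, as in the first variant, leaves room to apply a different data set later without recording again.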
- Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
- Embodiments of the invention can be implemented in hardware and/or in software.
- The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
- Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
- Embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
- The program code may, for example, be stored on a machine readable carrier.
- Other embodiments comprise the computer program for performing one of the methods described herein, which is stored on a machine readable carrier or a non-transitory storage medium.
- An embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
- A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
- A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
- The data stream or the sequence of signals may be configured, for example, to be transferred via a data communication connection, for example via the Internet.
- A further embodiment comprises a processing means, for example a computer, or a programmable logic device, in particular a processor comprising hardware, configured or adapted to perform one of the methods described herein.
- A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
- A field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
- The methods are advantageously performed by any hardware apparatus. While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Abstract
The invention relates to a voice telecommunication terminal device (1) configured for being used for voice communication, the voice telecommunication terminal device (1) comprising: a microphone (2) configured for converting a voice (VO) of a user into an electrical voice signal (VS); a voice modifier (3) configured for modifying one or more vocal parameters of at least one portion of the voice signal (VS) in order to produce a vocally modified voice signal (MVS); and a telecommunication interface (4) configured for connecting the voice telecommunication terminal device (1) to a telecommunication network (TN) and for transmitting the vocally modified voice signal (MVS) to the telecommunication network (TN).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2016/071595 WO2018050212A1 | 2016-09-13 | 2016-09-13 | Telecommunication terminal with voice conversion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2016/071595 WO2018050212A1 | 2016-09-13 | 2016-09-13 | Telecommunication terminal with voice conversion |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018050212A1 | 2018-03-22 |
Family
ID=56943512
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2016/071595 WO2018050212A1 | Telecommunication terminal with voice conversion | 2016-09-13 | 2016-09-13 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018050212A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5860064A (en) | 1993-05-13 | 1999-01-12 | Apple Computer, Inc. | Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system |
US20030014246A1 (en) * | 2001-07-12 | 2003-01-16 | Lg Electronics Inc. | Apparatus and method for voice modulation in mobile terminal |
EP2224703A1 (en) * | 2009-02-26 | 2010-09-01 | Research In Motion Limited | Mobile wireless communications device with novelty voice alteration and related methods |
EP2928164A1 (en) * | 2012-12-27 | 2015-10-07 | ZTE Corporation | Transmission method and device for voice data |
Non-Patent Citations (7)
Title |
---|
KIM, YOUNGMOO E. ET AL.: "Music Emotion Recognition: A State of the Art Review", PROCEEDINGS OF THE 11TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL CONFERENCE (ISMIR 2010), 2010 |
KWON, OH-WOOK ET AL.: "Emotion Recognition by Speech Signals", 2003, INSTITUTE FOR NEURAL COMPUTATION, UNIVERSITY OF CALIFORNIA |
MURRAY, IAIN R. ET AL.: "Implementation and Testing of a System for Producing Emotion-By-Rule in Synthetic Speech", SPEECH COMMUNICATION, vol. 16, 1995, pages 369 - 390, XP004008594, DOI: 10.1016/0167-6393(95)00005-9 |
OUDEYER, PIERRE-YVES: "The Production and Recognition of Emotions in Speech: Features and Algorithms", INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES, vol. 59, 2003, pages 157 - 183, XP055043379, DOI: 10.1016/S1071-5819(02)00141-6 |
RAO, K. SREENIVASA ET AL.: "Emotion Recognition from Speech", INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES, vol. 3, no. 2, 2012, pages 3603 - 3607 |
SCHRÖDER, MARC: "Emotional Speech Synthesis: A Review", PROCEEDINGS OF THE 7TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY (EUROSPEECH'01), 2001, pages 561 - 564 |
SONG, YADING ET AL.: "Evaluation of Musical Features for Emotion Classification", PROCEEDINGS OF THE 13TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL CONFERENCE (ISMIR 2012), 2012 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16766918; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 16766918; Country of ref document: EP; Kind code of ref document: A1 |