WO2015043442A1

WO2015043442A1 - Method, device and mobile terminal for text-to-speech processing

Info

Publication number: WO2015043442A1
Application number: PCT/CN2014/087137
Authority: WO
Inventors: Hui Tang
Original assignee: Tencent Technology (Shenzhen) Company Limited
Priority date: 2013-09-25
Filing date: 2014-09-23
Publication date: 2015-04-02
Also published as: CN104142778B; CN104142778A

Abstract

The present disclosure provides a method, a device and a mobile terminal for text-to-speech processing. The method comprises: receiving, by a device having a processor and a speaker, a voice play request on an interactive interface of a social application, the voice play request comprising selected text; converting, by the device, the selected text into speech according to the voice play request; and outputting, by the speaker of the device, the speech.

Description

METHOD, DEVICE AND MOBILE TERMINAL FOR TEXT-TO-SPEECH PROCESSING

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 201310442687.X, filed on September 25, 2013, which is incorporated by reference in its entirety.

FIELD OF THE TECHNOLOGY

The present disclosure relates to the field of computer and mobile communication technologies, and more particularly to a method, device and mobile terminal for text-to-speech processing.

BACKGROUND

In the information age, mobile phones, tablet PCs (Personal Computers) and other mobile terminals have become an indispensable part of our lives. These terminals are used not only to communicate with others but also to send text messages, take photos, play games and so on.

A user can easily read text, such as novels, prose and the like, in a browser on a mobile terminal. However, because of the small size of the characters displayed on a mobile terminal screen and other adverse effects on the display, such as strong ambient sunlight or rain, browsing text by sight has many limitations. In addition, limiting the user to browsing by sight adversely affects the user experience. Accordingly, it would be advantageous to provide a method for browsing text by sound.

SUMMARY OF THE DISCLOSURE

The present disclosure provides a method, a device and a mobile terminal for text-to-speech processing, which offer more ways to browse text and reduce the impact of adverse display conditions on text browsing.

In a first aspect, a method for text-to-speech processing comprises: receiving, by a device having a processor and a speaker, a voice play request on an interactive interface of a social application, the voice play request comprising selected text; converting, by the device, the selected text into speech according to the voice play request; and outputting, by the speaker of the device, the speech.

In a second aspect, a device for text-to-speech processing comprises a processor and a non-transitory storage medium. The non-transitory storage medium comprises: a request acquisition module, configured to obtain a voice play request on an interactive interface of a social application, wherein the voice play request comprises selected text; a transforming module, configured to transform the selected text to speech according to the voice play request; and a play control module, configured to play the speech on the interactive interface.

In a third aspect, a mobile terminal comprises a device for text-to-speech processing. The device comprises a processor and a non-transitory storage medium. The non-transitory storage medium comprises: a request acquisition module, configured to obtain a voice play request on an interactive interface of a social application, wherein the voice play request comprises selected text; a transforming module, configured to transform the selected text to speech according to the voice play request; and a play control module, configured to play the speech on the interactive interface.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the embodiments of this disclosure and the technical schemes of the existing technology more clearly, the accompanying drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of this disclosure; a person of skill in the art may obtain other drawings from them without creative labor.

Fig. 1 is a flow diagram of a text-to-speech processing method in one embodiment of this disclosure.

Fig. 2 is a flow diagram of another text-to-speech processing method in one embodiment of this disclosure.

Fig. 3 is a flow diagram of another text-to-speech processing method in one embodiment of this disclosure.

Fig. 4 is a structural diagram of a device for text-to-speech processing in one embodiment of this disclosure.

Fig. 5 is a structural diagram of a transforming module in one embodiment of this disclosure.

Fig. 6 is a structural diagram of another device for text-to-speech processing in one embodiment of this disclosure.

Fig. 7 is a structural diagram of another device for text-to-speech processing in one embodiment of this disclosure.

Fig. 8 is a diagram of an application scenario provided in one embodiment of this disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

For a better understanding of the technical scheme thereof, the present disclosure is described in further detail in connection with the accompanying drawings as follows.

The technical scheme in the embodiments of this disclosure is described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, rather than all, of the possible embodiments. Any other embodiment obtained by a person of skill in the art on the basis of the embodiments of this disclosure without creative labor is within the scope of this invention.

An interactive interface is the channel through which messages are exchanged between a user and a mobile terminal. Users use the interactive interface to input messages into the mobile terminal and to execute operations, while the mobile terminal uses it to present messages for the user to read, analyze and judge. Interactive interfaces in the embodiments of this disclosure include the home pages of software or applications, as well as various function buttons. Interfaces at different levels, or appearing after different triggering events, are different interactive interfaces; interfaces at the same level, or appearing after the same triggering event, are the same interactive interface, regardless of the content they show. In other words, the difference between interactive interfaces depends only on the level of the interface and not on its content. Social applications include instant messaging, text chat, internet forums, blogs, social network services, and any other social software.

In embodiments of this disclosure, the mobile terminal may be a mobile phone, a smart phone, a tablet PC or another terminal device. The device for text-to-speech processing may be a hardware device contained in the mobile terminal, or a text application in the mobile terminal, e.g. a browser application or the like. The text-to-speech processing device can display web pages using WebView, a component for displaying web pages based on the WebKit engine.

The text-to-speech processing method provided in the embodiments of this disclosure may be applied when it is inconvenient to read text. For example, when a user is reading an article on a bus, the text-to-speech processing device can transform the text to speech; or when a user is reading a web page on a trip, the text-to-speech processing device can transform the text to speech, or the like. The text-to-speech processing device obtains the text selected for voice play on an interactive interface of a social application, transforms the selected text, generates speech corresponding to the selected text and plays the speech on the interactive interface.

The method for text-to-speech processing provided in the embodiments of this disclosure is described in detail below with reference to Figs. 1-3.

Fig. 1 is a flow diagram of a text-to-speech processing method according to an embodiment of this disclosure. As shown in Fig. 1, the method comprises steps S101-S103 below.

S101: obtaining a voice play request on the interactive interface of a social application, in which the voice play request comprises a selected text.

Specifically, when the text-to-speech processing device detects a voice play request on the interactive interface of the social application, the text-to-speech processing device obtains the selected text on the interactive interface.

It should be noted that the voice play request comprises the selected text, which may contain Chinese characters, English characters, and so on. The selected text may be the text selected on the interactive interface by a user. Preferably, the text-to-speech processing device can parse the voice play request and obtain the text from it.

S102: transforming the selected text into speech according to the voice play request.

Specifically, the text-to-speech processing device may apply a voice application to obtain speech corresponding to the selected text according to the voice play request. Preferably, the voice application obtains the speech corresponding to the text according to the correspondence between the text and the speech. This correspondence is defined as follows: each text corresponds to exactly one piece of coding information, and each piece of coding information corresponds to exactly one speech.

It should be noted that the voice application may be a voice application built into the system or a voice module of an instant messaging application, and the system may be the operating system installed on the mobile terminal, e.g. Android, iOS, and so on.

S103: playing the speech on the interactive interface.

Specifically, the text-to-speech processing device can control the voice application to play the speech on the interactive interface.

In this embodiment, the device transforms the selected text on an interactive interface of a social application to speech and plays the speech, which provides more ways to browse text. The method reduces the effects of character size, environmental factors and so on on text browsing, and makes the mobile terminal more intelligent.
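As a minimal sketch of steps S101-S103, the flow can be written as three small functions chained together. All names here (`VoicePlayRequest`, `transform_to_speech`, `play`) are hypothetical, and the speech conversion and speaker output are stand-ins rather than a real text-to-speech engine.

```python
from dataclasses import dataclass


@dataclass
class VoicePlayRequest:
    """Hypothetical container for a voice play request carrying the selected text."""
    selected_text: str


def obtain_selected_text(request: VoicePlayRequest) -> str:
    # S101: read the selected text out of the voice play request.
    return request.selected_text


def transform_to_speech(text: str) -> bytes:
    # S102: stand-in for the voice application. A real device would call
    # a text-to-speech engine here; this only tags the text as "speech".
    return ("<speech:" + text + ">").encode("utf-8")


def play(speech: bytes, played: list) -> None:
    # S103: stand-in for speaker output. Records what would be played.
    played.append(speech)


def handle_voice_play_request(request: VoicePlayRequest, played: list) -> bytes:
    # Chain the three steps in the order given in Fig. 1.
    text = obtain_selected_text(request)
    speech = transform_to_speech(text)
    play(speech, played)
    return speech
```

Keeping each step in its own function mirrors the separation between obtaining the request, transforming the text, and playing the speech, so any one step could be replaced without touching the others.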

Fig. 2 is a flow diagram of another method for text-to-speech processing according to an embodiment of this disclosure. As shown in Fig. 2, the method comprises steps S201-S205 below.

S201: obtaining the text selected by a user on an interactive interface, when a request to select text on the interactive interface of a social application is received.

Specifically, when the text-to-speech processing device detects the user's request to select text on the interactive interface of the social application, the text-to-speech processing device obtains the text selected by the user on the interactive interface.

Preferably, the user may issue the request by long-pressing on the interactive interface, whereupon the text-to-speech processing device displays a pull-selection cursor on the interactive interface. The user can select the text for voice play by sliding the pull-selection cursor.
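One way to picture the pull-selection cursor is as a pair of character offsets into the displayed text. The sketch below assumes that model, which the description does not specify; the function name `select_text` and the offset parameters are hypothetical.

```python
def select_text(displayed_text: str, anchor: int, cursor: int) -> str:
    """Return the text between two cursor offsets (hypothetical helper)."""
    # Clamp both offsets to the displayed text so that sliding past
    # either end of the text is harmless.
    n = len(displayed_text)
    a = max(0, min(anchor, n))
    b = max(0, min(cursor, n))
    # The user may slide the cursor backwards, so order the offsets.
    start, end = sorted((a, b))
    return displayed_text[start:end]
```

For instance, `select_text("hello world", 6, 11)` and `select_text("hello world", 11, 6)` both yield the word `world`, since the direction of the slide does not matter in this sketch.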

S202: generating a voice play request according to the text.

Specifically, the text-to-speech processing device packages the text of the user’s selection on the interactive interface, and generates a voice play request.

S203: obtaining the voice play request on the interactive interface of a social application, in which the voice play request comprises the selected text.

Specifically, when the text-to-speech processing device detects a voice play request on the interactive interface of a social application, the text-to-speech processing device obtains the selected text on the interactive interface.

The voice play request comprises the selected text, which may contain Chinese characters, English characters, and so on. The selected text may be the text selected on the interactive interface by a user. Preferably, the text-to-speech processing device can parse the voice play request and obtain the text from it.

S204: applying a voice application to obtain speech corresponding to the selected text, according to correspondence between the text and the speech.

Specifically, the correspondence between the text and the speech is defined as follows: each text corresponds to exactly one piece of coding information, and each piece of coding information corresponds to exactly one speech. The text-to-speech processing device applies a voice application to obtain the coding information corresponding to the selected text, and then controls the voice application to search for the speech corresponding to the coding information.

For example, supposing that text A corresponds to coding information 66 and coding information 66 corresponds to the pronunciation of A, the text-to-speech processing device applies the voice application to obtain the coding information 66 of A, and controls the voice application to search for the pronunciation of A corresponding to coding information 66.
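The two-stage lookup in this example can be sketched with two tables, one from text to coding information and one from coding information to speech. The table names and the string standing in for the speech are hypothetical; the value 66 is simply the illustrative number used above, not a claim about any particular character encoding.

```python
# Illustrative tables only; 66 is the example value from the description.
TEXT_TO_CODE = {"A": 66}
CODE_TO_SPEECH = {66: "pronunciation of A"}


def lookup_speech(text, text_to_code=TEXT_TO_CODE, code_to_speech=CODE_TO_SPEECH):
    # Stage 1: obtain the coding information corresponding to the text.
    code = text_to_code[text]
    # Stage 2: search for the speech corresponding to that coding information.
    return code_to_speech[code]


def is_one_to_one(text_to_code):
    # A dict already maps each text to exactly one code; this checks the
    # reverse direction, i.e. that no two texts share a code.
    codes = list(text_to_code.values())
    return len(codes) == len(set(codes))
```

Because a dictionary key maps to a single value, each text yields one code and each code yields one speech. The `is_one_to_one` check covers the remaining condition that a code is not shared by two different texts.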

The voice application may be a voice application built into the system or a voice module of an instant messaging application, and the system may be the operating system installed on the mobile terminal, e.g. Android, iOS, and so on.

S205: playing the speech on the interactive interface.

In this embodiment, the device transforms the text selected on an interactive interface of a social application to speech and plays the speech, which provides more ways to browse text. The method can reduce the effects of character size, environmental factors and so on on text browsing. Each piece of coding information corresponds to exactly one text and exactly one speech, which increases the accuracy of the transformation from text to speech. The mobile terminal is thereby made more intelligent.

Fig. 3 is a flow diagram of another method for text-to-speech processing according to an embodiment of this disclosure. As shown in Fig. 3, the method comprises steps S301-S307 below.

S301: obtaining the text selected by a user on an interactive interface, when a request to select text on the interactive interface of a social application is received.

Specifically, when the text-to-speech processing device detects a user's request to select text on the interactive interface of a social application, the text-to-speech processing device obtains the text selected by the user on the interactive interface.

Preferably, the user may issue the request by long-pressing on the interactive interface, whereupon the text-to-speech processing device displays a pull-selection cursor on the interactive interface, and the user can select the text for voice play by sliding the pull-selection cursor.

S302: displaying a prompt button of voice play.

Specifically, after the text-to-speech processing device obtains the text selected by the user on the interactive interface, the text-to-speech processing device displays a prompt button of voice play on the interactive interface, which the user can click. Clicking the prompt button of voice play triggers it.

The prompt button of voice play may be displayed when the interactive interface appears. The prompt button of voice play may also take the form of a prompt message of voice play, asking the user whether to play the selected text. Displaying the prompt button or prompt message of voice play prevents text from being played automatically as a result of an accidental operation while the user is browsing the interactive interface.

S303: executing the step of generating a voice play request according to the selected text when the device detects that the prompt button of voice play is triggered.

Specifically, the step of S304 is executed when the text-to-speech processing device detects that the prompt button of voice play is triggered.

S304: generating a voice play request according to the text.

Specifically, the text-to-speech processing device packages the text of the user’s selection on the interactive interface and generates a voice play request.

S305: obtaining a voice play request on the interactive interface of a social application, in which the voice play request comprises the selected text.

Specifically, when the text-to-speech processing device detects a voice play request on the interactive interface of a social application, the text-to-speech processing device obtains the selected text on the interactive interface.

S306: applying a voice application to obtain the speech corresponding to the selected text, according to the correspondence between the text and the speech.

Specifically, the correspondence between the text and the speech is defined as follows: each text corresponds to exactly one piece of coding information, and each piece of coding information corresponds to exactly one speech. The text-to-speech processing device may apply a voice application to obtain the coding information corresponding to the text, and then control the voice application to search for the speech corresponding to the coding information.

For example, supposing that text A corresponds to coding information 66 and coding information 66 corresponds to the pronunciation of A, the text-to-speech processing device can apply the voice application to obtain the coding information 66 of A, and control the voice application to search for the pronunciation of A corresponding to coding information 66.

Preferably, the voice application saves the speech, correspondence between the coding information and the text, and correspondence between the coding information and the speech beforehand.

S307: playing the speech on the interactive interface.

In this embodiment, the device can transform the text selected on the interactive interface of a social application to speech and play the speech, which provides more ways to browse text. In addition, the process can reduce the effects of character size, environmental factors and so on on text browsing. Each piece of coding information corresponds to exactly one text and exactly one speech, which increases the accuracy of the transformation from text to speech. Furthermore, displaying the prompt button or prompt message of voice play prevents text from being played automatically as a result of an accidental operation while the user is browsing the interactive interface. The mobile terminal is thereby made more intelligent.
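The gating role of the prompt button in steps S302-S304 can be sketched as a small piece of state: a voice play request is generated only after text has been selected and the prompt has been triggered. The class name, its attributes, the dictionary used as the request, and the choice to hide the prompt after one trigger are all assumptions made for illustration.

```python
class SelectionSession:
    """Hypothetical state for one text selection on an interactive interface."""

    def __init__(self):
        self.selected_text = None
        self.prompt_visible = False

    def on_text_selected(self, text):
        # S301 and S302: remember the selection and show the prompt button.
        self.selected_text = text
        self.prompt_visible = True

    def on_prompt_triggered(self):
        # S303 and S304: only a visible, triggered prompt with a non-empty
        # selection produces a voice play request.
        if not self.prompt_visible or not self.selected_text:
            return None
        # Assumption: the prompt is hidden once it has been used.
        self.prompt_visible = False
        return {"selected_text": self.selected_text}
```

Because no request is produced until `on_prompt_triggered` is called on a visible prompt, merely selecting text while browsing never starts playback in this sketch.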

The text-to-speech processing device provided in the embodiments of this disclosure is described in detail below with reference to Figs. 4-7. It should be noted that the text-to-speech processing device shown in Figs. 4-7 is used to execute the method shown in Figs. 1-3. For ease of description, only the parts related to the embodiments of this disclosure are shown. For specific technical details not disclosed here, reference may be made to the embodiments of Figs. 1-3.

Fig. 4 is a structural diagram of a device for text-to-speech processing provided by the embodiments of this disclosure. As shown in Fig. 4, the device for text-to-speech processing 1 comprises a request acquisition module 11, a transforming module 12 and a play control module 13.

The request acquisition module 11 is used to obtain a voice play request on the interactive interface of a social application, in which the voice play request comprises the selected text.

Specifically, when the device for text-to-speech processing 1 detects a voice play request on the interactive interface of a social application, the request acquisition module 11 obtains the selected text on the interactive interface.

The voice play request comprises the selected text, which may contain Chinese characters, English characters, and so on. The selected text may be the text selected on the interactive interface by a user. Preferably, the request acquisition module 11 can parse the voice play request and obtain the text from it.

The transforming module 12 is used to transform the text into speech according to the voice play request.

Specifically, the transforming module 12 applies a voice application to obtain speech corresponding to the selected text, according to the voice play request. Preferably, the voice application obtains the speech corresponding to the text according to the correspondence between the text and the speech. This correspondence is defined as follows: each text corresponds to exactly one piece of coding information, and each piece of coding information corresponds to exactly one speech.

Preferably, the voice application saves, in advance, the correspondence between the coding information and the text and the correspondence between the coding information and the speech.

The voice application may be a voice application built into the system or a voice module of an instant messaging application, and the system may be the operating system installed on the mobile terminal, e.g. Android, iOS, and so on.

Further reference is made to Fig. 5, which is a structural diagram of the transforming module provided by the embodiments of this disclosure. As shown in Fig. 5, the transforming module 12 contains a code acquisition unit 121 and a speech search unit 122.

The code acquisition unit 121 is used to apply a voice application to obtain the coding information corresponding to the text.

Specifically, the code acquisition unit 121 applies the voice application to obtain the coding information corresponding to the text. For example, supposing that text A corresponds to coding information 66 and coding information 66 corresponds to the pronunciation of A, the code acquisition unit 121 applies the voice application to obtain the coding information 66 of A.

The speech search unit 122 is used to control the voice application to search for the speech corresponding to the coding information.

Specifically, the speech search unit 122 controls the voice application to search for the speech corresponding to the coding information. For example, supposing that text A corresponds to coding information 66 and coding information 66 corresponds to the pronunciation of A, when the code acquisition unit 121 calls the voice application to get the coding information 66 of A, the speech search unit 122 controls the voice application to search for the pronunciation of A corresponding to coding information 66.

The play control module 13 is used to play the speech on the interactive interface.

Specifically, the play control module 13 controls the voice application to play the speech on the interactive interface.

In this embodiment, the device transforms the text selected on an interactive interface of a social application into speech and plays the speech, which provides more ways to browse text. In addition, the method reduces the effects of character size, environmental factors and so on on text browsing. Each piece of coding information corresponds to exactly one text and exactly one speech, which increases the accuracy of the transformation from text into speech. The mobile terminal is thereby made more intelligent.
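The module structure of Figs. 4 and 5 can be sketched as plain classes composed into one device object. The class and method names, the dictionary used as the voice play request, and the list standing in for speaker output are all hypothetical; the reference numerals in the comments map each class to the module or unit it illustrates.

```python
class CodeAcquisitionUnit:  # unit 121
    def __init__(self, text_to_code):
        self._text_to_code = text_to_code

    def get_code(self, text):
        # Obtain the coding information corresponding to the text.
        return self._text_to_code[text]


class SpeechSearchUnit:  # unit 122
    def __init__(self, code_to_speech):
        self._code_to_speech = code_to_speech

    def find_speech(self, code):
        # Search for the speech corresponding to the coding information.
        return self._code_to_speech[code]


class TransformingModule:  # module 12, composed of units 121 and 122
    def __init__(self, code_unit, search_unit):
        self.code_unit = code_unit
        self.search_unit = search_unit

    def transform(self, text):
        return self.search_unit.find_speech(self.code_unit.get_code(text))


class RequestAcquisitionModule:  # module 11
    def obtain_text(self, request):
        # The request is modelled as a dict carrying the selected text.
        return request["selected_text"]


class PlayControlModule:  # module 13
    def __init__(self):
        self.played = []  # stand-in for speaker output

    def play(self, speech):
        self.played.append(speech)


class TextToSpeechDevice:  # device 1
    def __init__(self, acquisition, transforming, play_control):
        self.acquisition = acquisition
        self.transforming = transforming
        self.play_control = play_control

    def handle(self, request):
        text = self.acquisition.obtain_text(request)
        speech = self.transforming.transform(text)
        self.play_control.play(speech)
        return speech
```

A device built from the tables `{"A": 66}` and `{66: "pronunciation of A"}` would, on receiving `{"selected_text": "A"}`, pass the text through units 121 and 122 and hand the result to the play control module.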

Reference is made to Fig. 6, which is a structural diagram of another device for text-to-speech processing provided by the embodiments of this disclosure. As shown in Fig. 6, the device for text-to-speech processing 1 may comprise a request acquisition module 11, a transforming module 12, a play control module 13, an acquisition module 14 and a generating module 15. The request acquisition module 11, the transforming module 12 and the play control module 13 have been described in the embodiment corresponding to Fig. 4 and are not described in detail again.

The acquisition module 14 is used to obtain the text selected by a user on the interactive interface, when a request to select text on the interactive interface of a social application is received.

Specifically, when the text-to-speech processing device 1 detects a user's request to select text on the interactive interface of a social application, the acquisition module 14 obtains the selected text on the interactive interface.

Preferably, the user may issue the request by long-pressing on the interactive interface, whereupon the text-to-speech processing device 1 displays a pull-selection cursor on the interactive interface, and the user can select the text for voice play by sliding the pull-selection cursor.

The generating module 15 is used to generate a voice play request according to the text.

Specifically, the generating module 15 packages the text selected by the user on the interactive interface and generates a voice play request.

In this embodiment, the device can transform the text selected for voice play on the interactive interface of a social application to speech and play the speech, which avoids the difficulty of reading small characters and makes reading more convenient and comfortable for the user. In addition, because each piece of coding information corresponds to exactly one text and exactly one speech, the device increases the accuracy of the transformation from text to speech, and the mobile terminal is made more intelligent.

Reference is made to Fig. 7, which is a structural diagram of another device for text-to-speech processing provided by the embodiments of this disclosure. As shown in Fig. 7, the device for text-to-speech processing 1 may comprise a request acquisition module 11, a transforming module 12, a play control module 13, an acquisition module 14, a generating module 15, a display module 16 and a notify module 17. The request acquisition module 11, the transforming module 12 and the play control module 13 have been described in the embodiment corresponding to Fig. 4, and the acquisition module 14 and the generating module 15 have been described in the embodiment corresponding to Fig. 6; they are not described in detail again.

The display module 16 is used to display a prompt button of voice play.

Specifically, after the acquisition module 14 obtains the text selected by a user on the interactive interface, the display module 16 displays a prompt button of voice play on the interactive interface. The user can click the prompt button of voice play, and clicking it triggers it.

The prompt button of voice play may be displayed when the interactive interface appears. The prompt button of voice play may also take the form of a prompt message of voice play, asking the user whether to play the selected text. Displaying the prompt button or prompt message of voice play prevents text from being played automatically as a result of an accidental operation while the user is browsing the interactive interface.

The notify module 17 is used to notify the generating module 15 to execute the step of generating a voice play request according to the text, when it is detected that the prompt button of voice play is triggered.

Specifically, the notify module 17 notifies the generating module 15 to execute the step of generating a voice play request according to the text when the device detects that the prompt button of voice play is triggered.

In this embodiment, the device transforms the text selected on the interactive interface of a social application to speech, which provides more ways to browse text. In addition, the method reduces the effects of character size, environmental factors and so on on text browsing. Each piece of coding information corresponds to exactly one text and exactly one speech, which increases the accuracy of the transformation from text into speech. Furthermore, displaying the prompt button or prompt message of voice play prevents text from being played automatically as a result of an accidental operation while the user is browsing the interactive interface. The mobile terminal is thereby made more intelligent.

An embodiment of this disclosure also provides a mobile terminal comprising the text-to-speech processing device shown in the embodiments corresponding to Figs. 4-7, as well as the applications in the embodiments above. The mobile terminal in the embodiment of this disclosure can be used to carry out the above methods.

Reference is made to Fig. 8, which is a diagram of an application scenario provided by the embodiments of this disclosure. As shown in Fig. 8, after receiving the user's request to select text on the interactive interface of a social application, the text-to-speech processing device displays a pull-selection cursor on the interactive interface, and the user can select the text for voice play by sliding the pull-selection cursor.

The shaded part in Fig. 8 is the text selected by the user on the interactive interface of a social application, and the text-to-speech processing device displays a prompt button of voice play (the button “PLAY” in Fig. 8). If the prompt button of voice play is triggered, the text-to-speech processing device applies the voice application to transform the text into speech, and controls the voice application to play the speech on the interactive interface.

The voice application may be a voice application built into the system or a voice module of an instant messaging application, and the system may be the operating system installed on the mobile terminal, e.g. Android, iOS, and so on.

In this embodiment, the device transforms the text selected on the interactive interface of a social application into speech and plays the speech, which provides more ways to browse text. In addition, the method reduces the effects of character size, environmental factors and so on on text browsing. Each piece of coding information corresponds to exactly one text and exactly one speech, which increases the accuracy of the transformation from text to speech. Moreover, displaying the prompt button or prompt message of voice play prevents text from being played automatically as a result of an accidental operation while the user is browsing the interactive interface. The mobile terminal is thereby made more intelligent.

A person of skill in the art will appreciate that all or part of the processes in the above embodiments may be implemented by hardware under the control of a computer program, which may be stored in a storage medium. When executed, the program may carry out the processes of the above embodiments. The storage medium may be a diskette, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosures are only preferred embodiments and are not intended to limit the scope of the invention. Therefore, any equivalent change made according to the claims of the invention falls within the scope of this invention.

It must be noted that the smart terminal of the present disclosure is not limited to smart phones, the server device is not limited to personal computer, and the disclosed method is also suitable for operating systems other than Android systems. The server device may be a computer, a tablet, a smart phone, or any computing devices. The disclosed methods in the above embodiments may be combined with each other.

The above are only embodiments of the present disclosure and are not intended to limit its scope. Any equivalent variation made on the basis of the specification and accompanying drawings of the present disclosure, and any direct or indirect use thereof in other related fields, shall fall within the scope of the present disclosure.

Claims

1. A method for text-to-speech processing, comprising:

receiving, by a device having a processor and a speaker, a voice play request on an interactive interface of a social application, the voice play request comprising selected text;

converting, by the device, the selected text into speech according to the voice play request; and

outputting, by the speaker of the device, the speech.

2. The method of claim 1, wherein converting the selected text into the speech according to the voice play request comprises:

applying a voice application to obtain the speech corresponding to the selected text, according to the correspondence between text and speech.

3. The method of claim 2, wherein the correspondence between the text and the speech comprises:

the text that corresponds to only one coding information; and

the coding information that corresponds to only one speech.

4. The method of claim 3, wherein applying a voice application to obtain the speech comprises:

applying the voice application to obtain the coding information corresponding to the selected text; and

controlling the voice application to search speech corresponding to the coding information.

5. The method of claim 1, further comprising:

obtaining the text selected by a user on an interactive interface; and

generating the voice play request according to the selected text.

6. The method of claim 5, further comprising:

displaying a prompt button of voice play; and

generating a voice play request according to the selected text, when the prompt button of voice play is triggered.

7. The method of claim 2 or claim 4, wherein the voice application comprises at least one of a built-in voice application and a voice module of instant messaging application.

8. A device for text-to-speech processing, comprising a processor and a non-transitory storage medium, the non-transitory storage medium comprising:

a request acquisition module, configured to obtain a voice play request on an interactive interface of a social application, wherein the voice play request comprises selected text;

a transforming module, configured to transform the selected text to speech according to the voice play request; and

a play control module, configured to play the speech on the interactive interface.

9. The device of claim 8, wherein the transforming module is configured to apply a voice application to obtain speech corresponding to the selected text according to the correspondence between the text and the speech.

10. The device of claim 9, wherein the correspondence between the text and the speech comprises:

the text that corresponds to only one coding information; and

the coding information that corresponds to only one speech.

11. The device of claim 10, wherein the transforming module comprises:

a code acquisition unit, configured to apply a voice application to obtain the coding information corresponding to the text; and

a speech search unit, configured to control the voice application to search speech corresponding to the coding information.

12. The device of claim 8, further comprising:

an acquisition module, configured to obtain the text selected by a user on the interactive interface, when a request of selecting text on the interactive interface of the social application is received, and

a generating module, configured to generate a voice play request according to the text.

13. The device of claim 12, further comprising:

a display module, configured to display a prompt button of voice play; and

a notify module, configured to notify the generating module to execute the step of generating a voice play request according to the text, when the prompt button of voice play is triggered.

14. The device of claim 9 or claim 11, wherein the voice application comprises at least one of a built-in voice application and a voice module of instant messaging application.

15. A mobile terminal, comprising a device for text-to-speech processing, wherein the device comprises a processor and a non-transitory storage medium, the non-transitory storage medium comprising:

a request acquisition module, configured to obtain a voice play request on an interactive interface of a social application, wherein the voice play request comprises selected text;

a transforming module, configured to transform the selected text to speech according to the voice play request; and

a play control module, configured to play the speech on the interactive interface.