CN110797001B

CN110797001B - Method and device for generating voice audio of electronic book and readable storage medium

Info

Publication number: CN110797001B
Application number: CN201810783023.2A
Authority: CN
Inventors: 苏云琳
Original assignee: 阿里巴巴（中国）有限公司
Current assignee: Alibaba China Co Ltd
Priority date: 2018-07-17
Filing date: 2018-07-17
Publication date: 2022-04-12
Anticipated expiration: 2038-07-17
Also published as: CN110797001A

Abstract

The invention provides a method, a device, and a readable storage medium for generating the voice audio of an electronic book. Audio clips uploaded by users are received, and the text paragraph of the electronic book corresponding to each audio clip is determined, where each audio clip is generated by a user reading the text paragraph. A preferred audio clip is selected from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip. The preferred audio clips are then integrated according to the paragraph order of the text paragraphs to generate the e-book voice audio. Users can thus upload audio clips that they have read and recorded themselves, and integrating the selected preferred clips yields e-book voice audio that is vivid in language, conveys the emotional color of the electronic book, suits the listening tastes of a general audience, and improves the user experience.

Description

Method and device for generating voice audio of electronic book and readable storage medium

Technical Field

The present invention relates to the field of electronic books, and in particular, to a method and an apparatus for generating a voice audio of an electronic book, and a readable storage medium.

Background

With the wider application of internet technology, the traditional paper reading mode has gradually been replaced by electronic books, and electronic books with an audio function have emerged so that users are not hindered by physical conditions or poor lighting while reading.

The voice audio in the audio function of the existing electronic book is generally obtained through a voice synthesis system. The speech synthesis system can convert the text content of the electronic book into mechanical speech corresponding to the text content according to the text content of the electronic book so as to generate speech audio corresponding to the text of the electronic book.

However, the voice audio of an electronic book obtained by the existing method sounds stiff and cannot convey the emotional color of the e-book text, which gives the user a poor listening experience.

Disclosure of Invention

In view of the above technical problem that the voice audio of electronic books in the prior art is stiff and cannot convey the emotional color of the e-book text, the present invention provides a method and an apparatus for generating voice audio of an electronic book, and a readable storage medium.

In one aspect, the present invention provides a method for generating a voice audio of an electronic book, including:

receiving audio clips uploaded by each user, and determining text paragraphs of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;

selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip;

and integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.

In an optional implementation manner, before the receiving of audio segments uploaded by users and the determining of the text paragraph of the electronic book corresponding to each audio segment, the method further includes:

splitting a text of the electronic book to obtain at least one text paragraph;

setting a corresponding audio uploading port for each text paragraph in the electronic book;

correspondingly, the receiving the audio segments uploaded by the users and determining the text paragraphs of the electronic book corresponding to each audio segment includes:

receiving audio clips uploaded by each user at an audio uploading port;

and determining text paragraphs of the electronic book according to the audio uploading ports corresponding to the audio clips.

In an optional implementation manner, the receiving audio segments uploaded by users and determining a text paragraph of an electronic book corresponding to each audio segment includes:

performing voice recognition on the audio clip to obtain text information corresponding to the audio clip;

according to the text information, text paragraphs corresponding to the text information are determined in the electronic book.

In an optional embodiment, the audition feedback information includes rating information, and/or comment information, and/or audition popularity;

correspondingly, the selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip includes:

and sequencing the audio clips corresponding to each text paragraph according to the grading information, and/or the comment information, and/or the audition popularity to determine a preferred audio file.

In an optional implementation manner, the integrating, according to the paragraph order of each text paragraph, each preferred audio segment to generate an electronic book voice audio includes:

sequencing the preferred audio segments according to the sequence of the text segments in the electronic book;

and editing and integrating the sequenced preferred audio segments to generate the speech audio of the electronic book.

In another aspect, the present invention further provides an apparatus for generating speech audio of an electronic book, including:

the communication module is used for receiving the audio clips uploaded by the users;

a text paragraph identification module for determining a text paragraph of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;

the voice audio generation module is used for selecting a preferred audio fragment from the plurality of audio fragments corresponding to each text paragraph according to the received audition feedback information of each audio fragment; and is further used for integrating the preferred audio fragments according to the paragraph sequence of each text paragraph to generate the electronic book voice audio.

In an optional embodiment, the communication module further comprises an audio upload port corresponding to each text paragraph in the electronic book;

the text paragraph identification module is further used for splitting a text of the electronic book to obtain at least one text paragraph before receiving the audio segments uploaded by each user and determining the text paragraph of the electronic book corresponding to each audio segment;

after the audio uploading port receives the audio clips uploaded by the users, the text passage identification module is further configured to determine text passages of the electronic book according to the audio uploading port corresponding to the audio clips.

In an optional implementation manner, the text passage identification module is specifically configured to perform speech recognition on an audio clip to obtain text information corresponding to the audio clip, and determine a text passage corresponding to the text information in the electronic book according to the text information.

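In an optional embodiment, the audition feedback information includes scoring information, and/or comment information, and/or audition popularity; correspondingly, the voice audio generation module is specifically configured to sort the audio segments corresponding to each text paragraph according to the scoring information, and/or the comment information, and/or the audition popularity to determine a preferred audio segment.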

In an optional implementation manner, the voice audio generation module is specifically configured to sequence the preferred audio segments according to a paragraph sequence of each text segment in the electronic book, and clip and integrate the sequenced preferred audio segments to generate the electronic book voice audio.

In another aspect, the present invention provides an apparatus for generating speech audio of an electronic book, including: a memory, a processor, and a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of the preceding claims.

In a final aspect, the invention provides a readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method described in any of the preceding items.

Drawings

Fig. 1 is a schematic flowchart illustrating a method for generating a voice audio of an electronic book according to a first embodiment of the present invention;

fig. 2 is a schematic flowchart of a method for generating a voice audio of an electronic book according to a second embodiment of the present invention;

fig. 3 is a schematic flowchart of a method for generating a voice audio of an electronic book according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of an apparatus for generating speech and audio of an electronic book according to a fourth embodiment of the present invention;

fig. 5 is a schematic structural diagram of an apparatus for generating speech and audio of an electronic book according to a fifth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

Fig. 1 is a flowchart illustrating a method for generating a voice audio of an electronic book according to a first embodiment of the present invention.

As shown in fig. 1, the generation method includes:

step 101, receiving audio clips uploaded by each user, and determining text paragraphs of the electronic book corresponding to each audio clip;

wherein the audio clip is generated by the user reading the text paragraph.

Step 102, selecting a preferred audio fragment from the plurality of audio fragments corresponding to each text paragraph according to the received audition feedback information of each audio fragment.

Step 103, integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.
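Steps 101 to 103 can be sketched as a small pipeline. The following is a minimal illustration only, not the patented implementation: the `Clip` record, its `score` field (standing in for aggregated audition feedback), and the function name are assumptions introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    paragraph_index: int  # text paragraph the clip was read from (step 101)
    score: float          # aggregated audition feedback (hypothetical scale)


def generate_ebook_audio_order(clips):
    """Return the ids of the preferred clips, ordered by paragraph."""
    # Step 101: group the uploaded clips by their text paragraph.
    by_paragraph = defaultdict(list)
    for clip in clips:
        by_paragraph[clip.paragraph_index].append(clip)

    # Step 102: keep the best-rated clip for each paragraph.
    preferred = {
        index: max(candidates, key=lambda c: c.score)
        for index, candidates in by_paragraph.items()
    }

    # Step 103: arrange the preferred clips in paragraph order.
    return [preferred[index].clip_id for index in sorted(preferred)]


clips = [
    Clip("a", 1, 4.5),
    Clip("b", 0, 3.0),
    Clip("c", 1, 4.9),
    Clip("d", 0, 3.8),
]
order = generate_ebook_audio_order(clips)
```

Here `order` lists one clip id per paragraph, in reading order; the actual audio would then be joined in that order.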

It should be noted that the method for generating the e-book voice audio provided by the present invention may be executed by a device for generating the e-book voice audio, and the generating device may be implemented by hardware and/or software. The server hosting the generating device and the data server may be the same server, or different servers belonging to the same server cluster, which is not limited in the present invention.

In this embodiment, a user can listen to the voice audio of the electronic book through smart devices such as a smart phone, a tablet computer, and an electronic reader, and can also read the electronic book text corresponding to the voice audio of the electronic book of the present invention. The voice audio can be synchronously played according to the text reading progress of the user, namely, the voice audio corresponding to the text content is played when the user reads the text content; the voice audio can also be played independently of the reading behavior of the user, i.e. the voice audio can start playing once the user clicks or triggers a playing instruction of the voice audio, regardless of whether the user is in a state of reading the text or not.
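One way to support synchronous playback is to record where each paragraph starts inside the integrated audio. The sketch below assumes the duration of each preferred clip is known in seconds; the function names are hypothetical and the description does not prescribe this mechanism.

```python
from itertools import accumulate


def paragraph_start_offsets(durations):
    """Start time (seconds) of each paragraph within the integrated audio."""
    if not durations:
        return []
    # Each paragraph starts where the previous paragraphs end.
    return [0.0] + list(accumulate(durations))[:-1]


def seek_position(durations, paragraph_index):
    """Playback position to seek to when the reader reaches a paragraph."""
    return paragraph_start_offsets(durations)[paragraph_index]


durations = [12.0, 30.5, 8.0]
offsets = paragraph_start_offsets(durations)
position = seek_position(durations, 2)
```

When the reader scrolls to the third paragraph, the player would seek to `position`; independent playback simply ignores the offsets and plays from the current position.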

In this embodiment, the device for generating the speech and audio of the electronic book may receive audio segments uploaded by the user, where the audio segments include sound information of the user reading the electronic book. The audio clips can be obtained by direct recording, namely when the user selects to upload the audio clips, the recording function is started by the electronic book voice audio generating device so as to collect and receive sound information of the user reading the electronic book; or the user may pre-record and upload the voice audio, that is, the user transmits the pre-recorded voice audio to the electronic book voice audio generating device through wireless network transmission, near field transmission, wired transmission, or other transmission methods, so as to process the voice audio. After the device for generating the speech audio of the electronic book receives a plurality of audio segments uploaded by each user, it is further required to determine which text paragraph of the electronic book is read by the user in each audio segment.

Subsequently, an audition port can be provided for each audio clip so that users can listen to it. Then, the audition feedback information that users input or trigger for each audio clip is received, and a preferred audio clip is selected for each text paragraph from the plurality of audio clips corresponding to that paragraph by using the audition feedback information. To make the resulting e-book voice audio more vivid and better suited to the listening habits and tastes of a general audience, the preferred audio clip may be the clip that receives the best feedback from users after audition.

Further, the audition feedback information comprises scoring information, and/or comment information, and/or audition popularity. It should be noted that the scoring information and the comment information are input or triggered by users, while the audition popularity is obtained from statistics of users' audition behavior, such as the number of audition plays and the current number of listeners. Correspondingly, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip includes: sorting the audio clips corresponding to each text paragraph according to the scoring information, and/or the comment information, and/or the audition popularity to determine a preferred audio clip.
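The sorting can be illustrated with a weighted score. This is a sketch under assumptions: the description does not say how the three signals are combined, so the `Feedback` fields, the 0 to 5 rating scale, the cap of 50 comments, and the 0.6/0.2/0.2 weights are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    clip_id: str
    average_rating: float    # scoring information, assumed 0-5
    positive_comments: int   # comment information, reduced to a count
    play_count: int          # audition popularity


def preference_score(fb, max_plays):
    """Weighted blend of the three signals; the weights are illustrative."""
    rating = fb.average_rating / 5.0
    comments = min(fb.positive_comments, 50) / 50.0
    popularity = fb.play_count / max_plays if max_plays else 0.0
    return 0.6 * rating + 0.2 * comments + 0.2 * popularity


def rank_clips(feedback):
    """Sort the clips of one text paragraph, best first."""
    max_plays = max((fb.play_count for fb in feedback), default=0)
    return sorted(
        feedback,
        key=lambda fb: preference_score(fb, max_plays),
        reverse=True,
    )


candidates = [
    Feedback("clip-1", average_rating=4.0, positive_comments=10, play_count=100),
    Feedback("clip-2", average_rating=4.5, positive_comments=25, play_count=400),
    Feedback("clip-3", average_rating=3.0, positive_comments=5, play_count=50),
]
ranked = rank_clips(candidates)
preferred = ranked[0]
```

Because the feedback types are joined by "and/or", any subset of the three signals could be used; dropping a signal amounts to setting its weight to zero.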

Finally, the preferred audio segments are integrated according to the paragraph sequence of each text paragraph to generate the e-book voice audio. Since the preferred audio segment corresponding to each text paragraph has already been obtained, the preferred audio segments can be spliced and integrated in the paragraph order of the text paragraphs to obtain the e-book voice audio.
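As a minimal sketch of the splicing step, the example below joins WAV clips with Python's standard `wave` module. It assumes the preferred clips are already sorted by paragraph and share one audio format (channels, sample width, sample rate); `make_wav` only fabricates tiny test clips and is not part of the method.

```python
import io
import wave


def make_wav(frames, sample_rate=8000):
    """Build a mono 16-bit WAV file in memory (stand-in for an upload)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(frames)
    return buffer.getvalue()


def concatenate_wavs(wav_files):
    """Join WAV files that share the same format, in the given order."""
    if not wav_files:
        raise ValueError("at least one clip is required")
    output = io.BytesIO()
    expected = None
    with wave.open(output, "wb") as writer:
        for data in wav_files:
            with wave.open(io.BytesIO(data), "rb") as reader:
                fmt = (
                    reader.getnchannels(),
                    reader.getsampwidth(),
                    reader.getframerate(),
                )
                if expected is None:
                    # The first clip defines the output format.
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("clips must share one audio format")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return output.getvalue()


preferred_in_order = [make_wav(b"\x01\x00" * 4), make_wav(b"\x02\x00" * 6)]
book_audio = concatenate_wavs(preferred_in_order)
with wave.open(io.BytesIO(book_audio), "rb") as reader:
    total_frames = reader.getnframes()
```

Clips recorded at different sample rates would need resampling before this step; the sketch rejects them instead.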

In the method for generating the voice audio of the electronic book provided by the first embodiment of the invention, audio segments uploaded by users are received, and the text paragraph of the electronic book corresponding to each audio segment is determined, where each audio segment is generated by a user reading the text paragraph. A preferred audio segment is selected from the plurality of audio segments corresponding to each text paragraph according to the received audition feedback information of each audio segment, and the preferred audio segments are integrated according to the paragraph sequence of each text paragraph to generate the e-book voice audio. Users can thus upload audio segments that they have read and recorded themselves, and integrating the selected preferred segments yields e-book voice audio that is vivid in language, conveys the emotional color of the electronic book, suits the listening tastes of a general audience, and improves the user experience.

On the basis of the first embodiment, fig. 2 is a schematic flow chart of a method for generating a voice audio of an electronic book according to a second embodiment of the present invention. As shown in fig. 2, the generation method includes:

step 201, splitting a text of an electronic book to obtain at least one text paragraph, and setting a corresponding audio uploading port for each text paragraph in the electronic book.

Step 202, receiving the audio segments uploaded by the users at the audio uploading ports, and determining text paragraphs of the electronic book corresponding to each audio segment according to the audio uploading ports corresponding to the audio segments.

Step 203, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip.

Step 204, sequencing the preferred audio segments according to the sequence of the text segments in the electronic book;

Step 205, clipping and integrating the sequenced preferred audio segments to generate the speech audio of the electronic book.

In the second embodiment, similar to the first embodiment, a user can listen to the voice audio of the electronic book through a smart device such as a smart phone, a tablet computer, an electronic reader, and the like, and can also read the text of the electronic book corresponding to the voice audio of the electronic book of the present invention. The voice audio can be synchronously played according to the text reading progress of the user, namely, the voice audio corresponding to the text content is played when the user reads the text content; the voice audio can also be played independently of the reading behavior of the user, i.e. the voice audio can start playing once the user clicks or triggers a playing instruction of the voice audio, regardless of whether the user is in a state of reading the text or not.

The device for generating the speech and audio of the electronic book can receive audio segments uploaded by a user, wherein the audio segments comprise sound information when the user reads the electronic book. The audio clips can be obtained by direct recording, namely when the user selects to upload the audio clips, the recording function is started by the electronic book voice audio generating device so as to collect and receive sound information of the user reading the electronic book; or the user may pre-record and upload the voice audio, that is, the user transmits the pre-recorded voice audio to the electronic book voice audio generating device through wireless network transmission, near field transmission, wired transmission, or other transmission methods, so as to process the voice audio.

Different from the first embodiment, in the second embodiment, the text of the electronic book may be firstly split to obtain at least one text paragraph, and a corresponding audio upload port is set for each text paragraph in the electronic book. The text can be split according to chapters or paragraphs of the text, for example, the text can be split into text paragraphs such as "first chapter first section" and "tenth chapter second section". Then, a corresponding audio uploading port is set for each text paragraph, so that the user can upload the audio segments recorded by the user and consistent with the text paragraph to the electronic book voice audio generation device through the corresponding audio uploading port. It should be noted that the audio upload ports may be disposed at the start positions of the corresponding text paragraphs, or may be uniformly disposed in the audio upload area, which is not limited in the embodiments of the present invention. Subsequently, after receiving the plurality of audio segments uploaded by each user, the device for generating e-book speech audio may determine a text paragraph corresponding to the audio upload port directly according to the audio upload port used by the user.
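The port-based mapping can be sketched as follows. This is an illustration under assumptions: splitting on blank lines, the `UploadPorts` class, and the `port-<index>` identifier format are hypothetical; the description only requires one upload port per text paragraph.

```python
import re


def split_into_paragraphs(text):
    """Split e-book text on blank lines; other split rules are possible."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


class UploadPorts:
    """One upload port per text paragraph, identified by a port id."""

    def __init__(self, paragraphs):
        self.paragraphs = paragraphs
        # Hypothetical port id format: "port-<paragraph index>".
        self.port_to_paragraph = {
            f"port-{i}": i for i in range(len(paragraphs))
        }
        self.uploads = {i: [] for i in range(len(paragraphs))}

    def upload(self, port_id, clip_id):
        """Store a clip and return the paragraph index its port belongs to."""
        index = self.port_to_paragraph[port_id]
        self.uploads[index].append(clip_id)
        return index


book = "Chapter one, section one.\n\nChapter one, section two.\n\nChapter two."
ports = UploadPorts(split_into_paragraphs(book))
matched = ports.upload("port-1", "clip-42")
```

Because the paragraph is known from the port alone, no analysis of the audio content is needed to associate a clip with its text.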

Then, similar to the first embodiment, an audition port may be provided for each audio segment so that users can listen to it. Then, the audition feedback information that users input or trigger for each audio segment is received, and a preferred audio segment is selected for each text paragraph from the plurality of audio segments corresponding to that paragraph by using the audition feedback information. To make the resulting e-book voice audio more vivid and better suited to the listening habits and tastes of a general audience, the preferred audio segment may be the segment that receives the best feedback from users after audition.

Finally, different from the first embodiment, integrating the preferred audio segments according to the paragraph order of each text paragraph to generate the e-book voice audio may specifically include: sequencing the preferred audio segments according to the order of the text paragraphs in the electronic book; and clipping and integrating the sequenced preferred audio segments to generate the voice audio of the electronic book. In this embodiment, after the preferred audio segments are sequenced in paragraph order, audio styles such as the speech rate and intonation style and the background music style of the preferred audio segments may be unified to improve the stylistic consistency of the generated e-book voice audio, and the preferred audio segments may be clipped to finally obtain the e-book voice audio. The unification can be realized in multiple ways. For example, voice feature extraction may be performed on each preferred audio segment to determine its voice style, and the voice style of each preferred audio segment may then be adjusted with reference to the extracted voice styles so that the segments become relatively uniform.
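The idea of extracting a feature from every segment and adjusting each segment toward a common value can be illustrated with loudness. Note the substitution: the description names speech rate, intonation, and background music, whereas this sketch uses RMS loudness as a simpler stand-in feature, and it assumes each segment is a list of float samples.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def unify_loudness(clips):
    """Scale every clip so its RMS level equals the average RMS of all clips."""
    if not clips:
        return []
    # Feature extraction: one loudness value per clip.
    levels = [rms(clip) for clip in clips]
    target = sum(levels) / len(levels)
    unified = []
    for clip, level in zip(clips, levels):
        # Silent clips are left unchanged to avoid dividing by zero.
        gain = target / level if level > 0 else 1.0
        unified.append([s * gain for s in clip])
    return unified


quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.5, -0.5, 0.5, -0.5]
unified = unify_loudness([quiet, loud])
```

After the call, both clips sit at the same level, so the listener does not hear a jump in volume at paragraph boundaries. Unifying speech rate or intonation would follow the same extract-then-adjust pattern with different features.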

According to the method for generating the e-book voice audio provided by the second embodiment of the invention, on the basis of the first embodiment, an audio uploading port is set for each text paragraph so that the text paragraph of the e-book corresponding to each audio segment can be determined. The resulting e-book voice audio is vivid in language, conveys the emotional color of the e-book, suits the listening tastes of a general audience, and improves the user experience.

On the basis of the first embodiment, fig. 3 is a schematic flow chart of a method for generating a voice audio of an electronic book according to a third embodiment of the present invention. As shown in fig. 3, the generation method includes:

step 301, performing voice recognition on the audio clip to obtain text information corresponding to the audio clip.

Step 302, according to the text information, determining a text paragraph corresponding to the text information in the electronic book.

Step 303, selecting a preferred audio segment from the plurality of audio segments corresponding to each text paragraph according to the received audition feedback information of each audio segment.

Step 304, sequencing the preferred audio segments according to the sequence of the text segments in the electronic book;

Step 305, clipping and integrating the sequenced preferred audio segments to generate the speech audio of the electronic book.

In the third embodiment, similar to the first embodiment, a user can listen to the voice audio of the electronic book through a smart device such as a smart phone, a tablet computer, an electronic reader, and the like, and can also read the electronic book text corresponding to the voice audio of the electronic book of the present invention. The voice audio can be synchronously played according to the text reading progress of the user, namely, the voice audio corresponding to the text content is played when the user reads the text content; the voice audio can also be played independently of the reading behavior of the user, i.e. the voice audio can start playing once the user clicks or triggers a playing instruction of the voice audio, regardless of whether the user is in a state of reading the text or not.

Different from the first embodiment, in the third embodiment, after the audio clip uploaded by the user is received, speech recognition may be performed on the audio clip to obtain text information corresponding to the audio clip. The method of speech recognition may be any one of the prior art, and the present invention is not limited thereto. Through voice recognition, the text information corresponding to the audio segments can be recognized and acquired, and then the matched text paragraphs are determined in the electronic book according to the text information obtained through recognition.
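The matching half of this embodiment can be sketched with fuzzy string comparison. The example assumes the speech recognizer has already produced `recognized` (no recognition engine is called here), and it uses `difflib.SequenceMatcher` with an illustrative similarity threshold of 0.6; neither choice comes from the description.

```python
from difflib import SequenceMatcher


def match_paragraph(recognized_text, paragraphs, threshold=0.6):
    """Return the index of the best-matching paragraph, or None."""
    best_index, best_ratio = None, 0.0
    for index, paragraph in enumerate(paragraphs):
        ratio = SequenceMatcher(
            None, recognized_text.lower(), paragraph.lower()
        ).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    # Reject weak matches so unrelated audio is not assigned to a paragraph.
    return best_index if best_ratio >= threshold else None


paragraphs = [
    "The rain stopped just before dawn.",
    "She opened the window and listened to the street below.",
]
# Hypothetical recognizer output containing a small recognition error.
recognized = "she opened the window and listen to the street below"
index = match_paragraph(recognized, paragraphs)
```

Fuzzy matching matters because recognized text rarely equals the source text exactly; an exact string comparison would fail on the small error in this example.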

According to the method for generating the voice audio of the electronic book provided by the third embodiment of the invention, on the basis of the first embodiment, the text paragraph of the electronic book corresponding to each audio segment is determined by speech recognition. The resulting e-book voice audio is vivid in language, conveys the emotional color of the electronic book, suits the listening tastes of a general audience, and improves the user experience.

Fig. 4 is a schematic structural diagram of a device for generating voice audio of an electronic book according to a fourth embodiment of the present invention. As shown in fig. 4, the device includes:

the communication module 10 is used for receiving audio clips uploaded by users;

a text passage identification module 20, configured to determine a text passage of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;

the voice audio generation module 30 is configured to select a preferred audio segment from the multiple audio segments corresponding to each text paragraph according to the received audition feedback information of each audio segment; and is further configured to integrate the preferred audio segments according to the paragraph sequence of each text paragraph to generate the electronic book voice audio.

In an alternative embodiment, the communication module 10 further includes an audio upload port corresponding to each text paragraph in the electronic book;

the text passage identification module 20 is further configured to split a text of the electronic book to obtain at least one text passage before receiving the audio passages uploaded by each user and determining the text passage of the electronic book corresponding to each audio passage;

after the audio upload port receives the audio segments uploaded by each user, the text passage identification module 20 is further configured to determine text passages of the electronic book according to the audio upload port corresponding to the audio segment.

In an optional implementation manner, the text passage identification module 20 is specifically configured to perform speech recognition on an audio clip, obtain text information corresponding to the audio clip, and determine a text passage corresponding to the text information in the electronic book according to the text information.

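In an optional embodiment, the audition feedback information includes scoring information, and/or comment information, and/or audition popularity; correspondingly, the voice audio generation module 30 is specifically configured to sort the audio segments corresponding to each text paragraph according to the scoring information, and/or the comment information, and/or the audition popularity to determine a preferred audio segment.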

In an optional implementation manner, the voice audio generation module 30 is specifically configured to sequence the preferred audio segments according to the paragraph order of each text paragraph in the electronic book, and to clip and integrate the sequenced preferred audio segments to generate the electronic book voice audio.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process and corresponding beneficial effects of the device described above may refer to the corresponding process in the foregoing method embodiments, and details are not described here again.

The device for generating the voice audio of the electronic book provided by the fourth embodiment of the invention receives the audio segments uploaded by users and determines the text paragraph of the electronic book corresponding to each audio segment, where each audio segment is generated by a user reading the text paragraph. A preferred audio segment is selected from the plurality of audio segments corresponding to each text paragraph according to the received audition feedback information of each audio segment, and the preferred audio segments are integrated according to the paragraph sequence of each text paragraph to generate the e-book voice audio. Users can thus upload audio segments that they have read and recorded themselves, and integrating the selected preferred segments yields e-book voice audio that is vivid in language, conveys the emotional color of the electronic book, suits the listening tastes of a general audience, and improves the user experience.
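The division of work among modules 10, 20, and 30 can be sketched as three cooperating classes. This is only an illustration of the module boundaries: the class and method names, the dictionary layout of an upload, and the numeric `score` standing in for audition feedback are assumptions, and real audio handling is omitted.

```python
class CommunicationModule:
    """Module 10: receives the audio clips uploaded by users."""

    def __init__(self):
        self.uploads = []

    def receive(self, clip_id, port_index, score):
        # `port_index` and `score` are hypothetical fields for this sketch.
        self.uploads.append(
            {"clip": clip_id, "port": port_index, "score": score}
        )


class TextParagraphIdentificationModule:
    """Module 20: determines the text paragraph of each clip."""

    def identify(self, upload):
        # Port-based identification; speech recognition could be used instead.
        return upload["port"]


class VoiceAudioGenerationModule:
    """Module 30: selects preferred clips and orders them by paragraph."""

    def generate(self, uploads, identifier):
        best = {}
        for upload in uploads:
            index = identifier.identify(upload)
            if index not in best or upload["score"] > best[index]["score"]:
                best[index] = upload
        return [best[index]["clip"] for index in sorted(best)]


communication = CommunicationModule()
communication.receive("clip-a", port_index=1, score=4.2)
communication.receive("clip-b", port_index=0, score=3.9)
communication.receive("clip-c", port_index=1, score=4.8)

playlist = VoiceAudioGenerationModule().generate(
    communication.uploads, TextParagraphIdentificationModule()
)
```

Keeping identification behind its own module means the port-based and recognition-based variants can be swapped without touching selection or integration.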

Fig. 5 is a schematic structural diagram of an electronic book voice audio apparatus according to a fifth embodiment of the present invention. As shown in fig. 5, the electronic book voice audio apparatus includes: a memory 41, a processor 42 and a computer program stored on the memory 41 and executable on the processor 42, the processor 42 executing the method of any of the above embodiments when executing the computer program.

The present invention also provides a readable storage medium comprising a program which, when run on a terminal, causes the terminal to perform the method of any of the above embodiments.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for generating speech audio of an electronic book is characterized by comprising the following steps:

splitting a text of the electronic book to obtain at least one text paragraph, and setting a corresponding audio uploading port for each text paragraph in the electronic book;

receiving audio clips uploaded by each user at an audio uploading port, and determining text paragraphs of the electronic book according to the audio uploading ports corresponding to the audio clips, wherein the audio clips are generated by reading the text paragraphs by the users;

2. The method for generating e-book voice audio according to claim 1, wherein the receiving audio segments uploaded by users and determining a text paragraph of the e-book corresponding to each audio segment includes:

3. The method for generating e-book voice audio according to claim 1,

the audition feedback information comprises grading information, and/or comment information, and/or audition popularity;

4. The method for generating e-book voice audio according to claim 1, wherein the integrating the preferred audio segments according to the paragraph order of the text paragraphs to generate e-book voice audio comprises:

5. An apparatus for generating speech audio of an electronic book, comprising:

the communication module is used for receiving the audio clips uploaded by the users and comprises an audio uploading port corresponding to each text paragraph in the electronic book;

the text paragraph identification module is used for splitting the text of the electronic book to obtain at least one text paragraph, and determining the text paragraph of the electronic book according to the audio uploading port corresponding to the audio fragment after the audio uploading port receives the audio fragment uploaded by each user; the audio clip is generated by the user reading the text paragraph;

the voice audio generation module is used for selecting a preferred audio fragment from the plurality of audio fragments corresponding to each text paragraph according to the received audition feedback information of each audio fragment; and integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.

6. The apparatus of claim 5, wherein the text passage recognition module is specifically configured to perform speech recognition on an audio segment to obtain text information corresponding to the audio segment, and determine a text passage corresponding to the text information in the electronic book according to the text information.

7. The apparatus for generating electronic book voice audio according to claim 5,

8. The apparatus of claim 5, wherein the speech audio generation module is specifically configured to sort the preferred audio segments according to a paragraph sequence of each text segment in the electronic book, and clip and integrate the sorted preferred audio segments to generate the electronic book speech audio.

9. An apparatus for generating speech audio of an electronic book, comprising: a memory, a processor, and a computer program;

wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-4.

10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-4.