CN110797001B - Method and device for generating voice audio of electronic book and readable storage medium - Google Patents

Info

Publication number
CN110797001B
Authority
CN
China
Prior art keywords
audio
text
electronic book
paragraph
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810783023.2A
Other languages
Chinese (zh)
Other versions
CN110797001A (en)
Inventor
苏云琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司
Priority to CN201810783023.2A
Publication of CN110797001A
Application granted
Publication of CN110797001B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

The invention provides a method, a device, and a readable storage medium for generating voice audio of an electronic book. Audio clips uploaded by users are received, and the text paragraph of the electronic book corresponding to each audio clip is determined, each audio clip being generated by a user reading the text paragraph aloud. For each text paragraph, a preferred audio clip is selected from the plurality of corresponding audio clips according to the received audition feedback information. The preferred audio clips are then integrated in the paragraph order of the text paragraphs to generate the electronic book voice audio. Because users can upload clips that they have read and recorded themselves, and the selected preferred clips are integrated, the resulting voice audio is vivid, conveys the emotional tone of the electronic book, and suits popular listening tastes, which improves the user experience.

Description

Method and device for generating voice audio of electronic book and readable storage medium
Technical Field
The present invention relates to the field of electronic books, and in particular, to a method and an apparatus for generating a voice audio of an electronic book, and a readable storage medium.
Background
With the wide adoption of internet technology, electronic books have gradually replaced traditional paper reading, and electronic books with an audio function have emerged so that reading is not hindered by a user's physical condition or by poor lighting.
The voice audio in the audio function of existing electronic books is generally obtained through a speech synthesis system, which converts the text content of the electronic book into corresponding mechanical speech to generate voice audio for the text.
However, voice audio obtained in this way sounds stiff and cannot convey the emotional tone of the text, so the listening experience is poor.
Disclosure of Invention
To address the technical problem that the voice audio of electronic books in the prior art is stiff and cannot convey the emotional tone of the text, the present invention provides a method and an apparatus for generating voice audio of an electronic book, and a readable storage medium.
In one aspect, the present invention provides a method for generating a voice audio of an electronic book, including:
receiving audio clips uploaded by each user, and determining text paragraphs of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;
selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip;
and integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.
In an optional implementation, before the receiving of the audio clips uploaded by users and the determining of the text paragraph of the electronic book corresponding to each audio clip, the method further includes:
splitting a text of the electronic book to obtain at least one text paragraph;
setting a corresponding audio uploading port for each text paragraph in the electronic book;
correspondingly, the receiving the audio segments uploaded by the users and determining the text paragraphs of the electronic book corresponding to each audio segment includes:
receiving audio clips uploaded by each user at an audio uploading port;
and determining text paragraphs of the electronic book according to the audio uploading ports corresponding to the audio clips.
In an optional implementation manner, the receiving audio segments uploaded by users and determining a text paragraph of an electronic book corresponding to each audio segment includes:
performing voice recognition on the audio clip to obtain text information corresponding to the audio clip;
and determining, according to the text information, the text paragraph corresponding to the text information in the electronic book.
In an optional embodiment, the audition feedback information includes rating information, and/or comment information, and/or audition popularity;
correspondingly, the selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip includes:
ranking the audio clips corresponding to each text paragraph according to the rating information, and/or the comment information, and/or the audition popularity, to determine the preferred audio clip.
In an optional implementation manner, the integrating, according to the paragraph order of each text paragraph, each preferred audio segment to generate an electronic book voice audio includes:
ordering the preferred audio clips according to the order of the text paragraphs in the electronic book;
and clipping and integrating the ordered preferred audio clips to generate the voice audio of the electronic book.
In another aspect, the present invention further provides an apparatus for generating speech audio of an electronic book, including:
the communication module is used for receiving the audio clips uploaded by the users;
a text paragraph identification module for determining a text paragraph of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;
a voice audio generation module for selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip, and further for integrating the preferred audio clips according to the paragraph order of the text paragraphs to generate the electronic book voice audio.
In an optional embodiment, the communication module further comprises an audio upload port corresponding to each text paragraph in the electronic book;
the text paragraph identification module is further used for splitting a text of the electronic book to obtain at least one text paragraph before receiving the audio segments uploaded by each user and determining the text paragraph of the electronic book corresponding to each audio segment;
after the audio uploading port receives the audio clips uploaded by the users, the text passage identification module is further configured to determine text passages of the electronic book according to the audio uploading port corresponding to the audio clips.
In an optional implementation manner, the text passage identification module is specifically configured to perform speech recognition on an audio clip to obtain text information corresponding to the audio clip, and determine a text passage corresponding to the text information in the electronic book according to the text information.
In an optional embodiment, the audition feedback information includes rating information, and/or comment information, and/or audition popularity;
the voice audio generation module is specifically configured to rank the audio clips corresponding to each text paragraph according to the rating information, and/or the comment information, and/or the audition popularity, to determine the preferred audio clip.
In an optional implementation, the voice audio generation module is specifically configured to order the preferred audio clips according to the paragraph order of the text paragraphs in the electronic book, and to clip and integrate the ordered preferred audio clips to generate the electronic book voice audio.
In another aspect, the present invention provides an apparatus for generating speech audio of an electronic book, including: a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of the foregoing aspects.
In a final aspect, the invention provides a readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method of any of the foregoing aspects.
In the method, the device, and the readable storage medium for generating voice audio of an electronic book provided by the invention, audio clips uploaded by users are received, and the text paragraph of the electronic book corresponding to each audio clip is determined, each audio clip being generated by a user reading the text paragraph aloud. For each text paragraph, a preferred audio clip is selected from the plurality of corresponding audio clips according to the received audition feedback information. The preferred audio clips are then integrated in the paragraph order of the text paragraphs to generate the electronic book voice audio. Because users can upload clips that they have read and recorded themselves, and the selected preferred clips are integrated, the resulting voice audio is vivid, conveys the emotional tone of the electronic book, and suits popular listening tastes, which improves the user experience.
Drawings
Fig. 1 is a schematic flowchart of a method for generating voice audio of an electronic book according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for generating voice audio of an electronic book according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method for generating voice audio of an electronic book according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for generating voice audio of an electronic book according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an apparatus for generating voice audio of an electronic book according to a fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Fig. 1 is a schematic flowchart of a method for generating voice audio of an electronic book according to a first embodiment of the present invention.
As shown in Fig. 1, the generation method includes:
Step 101, receiving audio clips uploaded by users, and determining the text paragraph of the electronic book corresponding to each audio clip, wherein each audio clip is generated by a user reading the text paragraph aloud.
Step 102, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip.
Step 103, integrating the preferred audio clips according to the paragraph order of the text paragraphs to generate the electronic book voice audio.
It should be noted that the method for generating electronic book voice audio provided by the present invention may be executed by a device for generating electronic book voice audio, and the generating device may be implemented in hardware and/or software. The server hosting the generating device and the data server may be the same server or different servers belonging to the same server cluster; the present invention does not limit this.
In this embodiment, a user can listen to the electronic book voice audio on a smart device such as a smartphone, a tablet computer, or an e-reader, and can also read the electronic book text corresponding to the voice audio. The voice audio may be played in sync with the user's reading progress, so that the audio for a passage plays while the user reads that passage; or it may be played independently of the user's reading, starting as soon as the user clicks or otherwise triggers a play instruction, whether or not the user is reading the text.
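The synchronized playback mode described above can be sketched as a lookup from paragraph index to a start offset in the merged audio. This is a minimal illustration, not the patent's implementation: it assumes the duration of each paragraph's clip is known, and the function names `paragraph_offsets` and `seek_position` are hypothetical.

```python
from itertools import accumulate

def paragraph_offsets(durations_s):
    """Start time (seconds) of each paragraph's clip within the merged audio."""
    if not durations_s:
        return []
    # Running totals give each clip's end time; shift by one to get start times.
    ends = list(accumulate(durations_s))
    return [0.0] + ends[:-1]

def seek_position(durations_s, paragraph_index):
    """Playback position to seek to when the reader reaches the given paragraph."""
    return paragraph_offsets(durations_s)[paragraph_index]
```

A player could call `seek_position` whenever the reading position crosses a paragraph boundary; independent playback would simply start at offset zero and ignore the reading position.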
In this embodiment, the generating device receives audio clips uploaded by users, each containing the sound of a user reading the electronic book aloud. A clip may be recorded directly: when the user chooses to upload a clip, the generating device starts its recording function to capture the user's reading. Alternatively, the user may record the clip in advance and transmit it to the generating device over a wireless network, near-field transmission, a wired connection, or another transmission method. After receiving the audio clips uploaded by the users, the generating device must also determine which text paragraph of the electronic book is read in each clip.
Subsequently, an audition port may be provided for each audio clip so that users can listen to it. Audition feedback information that users input or trigger for each clip is then received and used to select, from the plurality of audio clips corresponding to each text paragraph, the preferred audio clip for that paragraph. So that the resulting voice audio is more vivid and suits popular listening habits and tastes, the preferred audio clip may be the clip with the best feedback after audition by users.
Further, the audition feedback information includes rating information, and/or comment information, and/or audition popularity. The rating information and the comment information are input or triggered by users, whereas the audition popularity is derived from statistics on users' audition behavior, such as the audition play count or the number of current listeners. Accordingly, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information includes: ranking the audio clips corresponding to each text paragraph according to the rating information, and/or the comment information, and/or the audition popularity, to determine the preferred audio clip.
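One way to rank clips on the three kinds of feedback is a weighted score. The sketch below is an assumption throughout: the patent does not specify a formula, and the `Clip` fields, the 0 to 5 rating scale, the share of positive comments as the comment signal, and the weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    paragraph_id: int
    avg_rating: float        # assumed mean user rating on a 0-5 scale
    positive_comments: int   # assumed count of comments judged positive
    total_comments: int
    play_count: int          # audition popularity, here the play count

def feedback_score(clip, max_plays, w_rating=0.5, w_comment=0.3, w_pop=0.2):
    """Combine the three feedback signals into one score in [0, 1]."""
    rating = clip.avg_rating / 5.0
    comment = clip.positive_comments / clip.total_comments if clip.total_comments else 0.0
    popularity = clip.play_count / max_plays if max_plays else 0.0
    return w_rating * rating + w_comment * comment + w_pop * popularity

def select_preferred(clips):
    """Return {paragraph_id: best Clip}, ranking clips within each paragraph."""
    by_paragraph = {}
    for clip in clips:
        by_paragraph.setdefault(clip.paragraph_id, []).append(clip)
    preferred = {}
    for paragraph_id, group in by_paragraph.items():
        # Popularity is normalized against the most-played clip of the same paragraph.
        max_plays = max(c.play_count for c in group)
        ranked = sorted(group, key=lambda c: feedback_score(c, max_plays), reverse=True)
        preferred[paragraph_id] = ranked[0]
    return preferred
```

Because the text allows any combination of the signals ("and/or"), setting a weight to zero drops that signal from the ranking.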
Finally, the preferred audio clips are integrated in the paragraph order of the text paragraphs to generate the electronic book voice audio. Since a preferred audio clip has been obtained for each text paragraph, the preferred clips can be spliced together in paragraph order to obtain the electronic book voice audio.
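Splicing in paragraph order can be sketched with Python's standard `wave` module. The sketch assumes every preferred clip is an uncompressed WAV file with the same channel count, sample width, and sample rate; the patent does not name an audio format, and `concatenate_wavs` and `merge_preferred` are hypothetical helper names.

```python
import wave

def concatenate_wavs(paths_in_order, out_path):
    """Append WAV files end to end; all inputs must share one audio format."""
    if not paths_in_order:
        raise ValueError("no clips to merge")
    expected = None
    with wave.open(out_path, "wb") as out:
        for path in paths_in_order:
            with wave.open(path, "rb") as clip:
                fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                if expected is None:
                    # The first clip defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(clip.readframes(clip.getnframes()))

def merge_preferred(preferred_paths, out_path):
    """preferred_paths maps paragraph index -> path of that paragraph's preferred clip."""
    ordered = [preferred_paths[i] for i in sorted(preferred_paths)]
    concatenate_wavs(ordered, out_path)
```

Clips in mixed formats would need to be resampled or converted before this step, which is outside the scope of the sketch.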
In the method for generating voice audio of an electronic book provided by the first embodiment of the invention, audio clips uploaded by users are received, and the text paragraph of the electronic book corresponding to each audio clip is determined, each audio clip being generated by a user reading the text paragraph aloud. For each text paragraph, a preferred audio clip is selected from the plurality of corresponding audio clips according to the received audition feedback information. The preferred audio clips are then integrated in the paragraph order of the text paragraphs to generate the electronic book voice audio. Because users can upload clips that they have read and recorded themselves, and the selected preferred clips are integrated, the resulting voice audio is vivid, conveys the emotional tone of the electronic book, and suits popular listening tastes, which improves the user experience.
On the basis of the first embodiment, Fig. 2 is a schematic flowchart of a method for generating voice audio of an electronic book according to a second embodiment of the present invention. As shown in Fig. 2, the generation method includes:
Step 201, splitting the text of the electronic book to obtain at least one text paragraph, and setting a corresponding audio upload port for each text paragraph in the electronic book.
Step 202, receiving the audio clips uploaded by users at the audio upload ports, and determining the text paragraph of the electronic book corresponding to each audio clip according to the audio upload port at which the clip was received.
Step 203, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip.
Step 204, ordering the preferred audio clips according to the order of the text paragraphs in the electronic book.
Step 205, clipping and integrating the ordered preferred audio clips to generate the voice audio of the electronic book.
In the second embodiment, as in the first embodiment, a user can listen to the electronic book voice audio on a smart device such as a smartphone, a tablet computer, or an e-reader, and can also read the electronic book text corresponding to the voice audio. The voice audio may be played in sync with the user's reading progress, so that the audio for a passage plays while the user reads that passage; or it may be played independently of the user's reading, starting as soon as the user clicks or otherwise triggers a play instruction, whether or not the user is reading the text.
The generating device receives audio clips uploaded by users, each containing the sound of a user reading the electronic book aloud. A clip may be recorded directly: when the user chooses to upload a clip, the generating device starts its recording function to capture the user's reading. Alternatively, the user may record the clip in advance and transmit it to the generating device over a wireless network, near-field transmission, a wired connection, or another transmission method.
Unlike the first embodiment, in the second embodiment the text of the electronic book is first split into at least one text paragraph, and a corresponding audio upload port is set for each text paragraph. The text may be split by chapter or by paragraph, for example into text paragraphs such as "Chapter 1, Section 1" and "Chapter 10, Section 2". A corresponding audio upload port is then set for each text paragraph, so that a user can upload a self-recorded audio clip matching that text paragraph to the generating device through the port. The audio upload ports may be placed at the start of their corresponding text paragraphs or grouped together in an audio upload area; the embodiments of the present invention do not limit this. After receiving the audio clips uploaded by users, the generating device can determine the text paragraph for each clip directly from the audio upload port that the user used.
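The split-then-assign-ports flow can be sketched as follows. The sketch assumes paragraphs are separated by blank lines and models each "audio upload port" as a URL-style path; both are illustrative assumptions, since the patent leaves the splitting rule and the form of the port open, and all function names are hypothetical.

```python
import re

def split_into_paragraphs(text):
    """Split on blank lines; each non-empty block becomes one text paragraph."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]

def assign_upload_ports(book_id, paragraphs):
    """Map one hypothetical upload path to each paragraph index."""
    return {
        f"/books/{book_id}/paragraphs/{index}/audio": index
        for index in range(len(paragraphs))
    }

def paragraph_for_upload(ports, port):
    """The port a clip arrived on identifies its paragraph; no audio analysis is needed."""
    return ports[port]
```

The design point is that the paragraph is known at upload time, so the mapping is a plain dictionary lookup rather than a recognition step.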
Then, as in the first embodiment, an audition port may be provided for each audio clip so that users can listen to it. Audition feedback information that users input or trigger for each clip is then received and used to select, from the plurality of audio clips corresponding to each text paragraph, the preferred audio clip for that paragraph. So that the resulting voice audio is more vivid and suits popular listening habits and tastes, the preferred audio clip may be the clip with the best feedback after audition by users.
Further, the audition feedback information includes rating information, and/or comment information, and/or audition popularity. The rating information and the comment information are input or triggered by users, whereas the audition popularity is derived from statistics on users' audition behavior, such as the audition play count or the number of current listeners. Accordingly, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information includes: ranking the audio clips corresponding to each text paragraph according to the rating information, and/or the comment information, and/or the audition popularity, to determine the preferred audio clip.
Finally, unlike the first embodiment, integrating the preferred audio clips according to the paragraph order of the text paragraphs to generate the electronic book voice audio may specifically include: ordering the preferred audio clips according to the order of the text paragraphs in the electronic book; and clipping and integrating the ordered preferred audio clips to generate the voice audio of the electronic book. In this embodiment, after the preferred audio clips are ordered by paragraph, their audio styles, such as speech rate and intonation style and background music style, may be unified to improve the stylistic consistency of the generated voice audio, and the clips may be clipped as needed to obtain the final electronic book voice audio. The audio styles may be unified in various ways. For example, voice features may be extracted from each preferred audio clip to derive its voice style, and the voice style of each preferred audio clip may then be adjusted with reference to the derived styles so that the clips become relatively uniform.
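As one narrow illustration of style unification, the sketch below handles speech rate only. It is an assumption, not the patent's feature-extraction method: it estimates each clip's rate as paragraph characters per second, takes the median rate as the common target, and computes a playback-speed factor per clip, clamped so that no clip is stretched far enough to sound unnatural. The names and the clamp range are hypothetical.

```python
from statistics import median

def speech_rates(char_counts, durations_s):
    """Characters read per second for each preferred clip."""
    return [n / d for n, d in zip(char_counts, durations_s)]

def tempo_factors(char_counts, durations_s, lo=0.9, hi=1.1):
    """Speed multiplier per clip that moves its rate toward the median rate."""
    rates = speech_rates(char_counts, durations_s)
    target = median(rates)
    # factor > 1 speeds a slow reader up; factor < 1 slows a fast reader down.
    return [min(hi, max(lo, target / rate)) for rate in rates]
```

The resulting factors would be passed to a time-stretching tool of the implementer's choice; intonation and background music would need separate treatment.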
The method for generating electronic book voice audio provided by the second embodiment of the invention builds on the first embodiment by providing an audio upload port for each text paragraph, from which the text paragraph corresponding to each audio clip is determined. The resulting voice audio is vivid, conveys the emotional tone of the electronic book, and suits popular listening tastes, which improves the user experience.
On the basis of the first embodiment, Fig. 3 is a schematic flowchart of a method for generating voice audio of an electronic book according to a third embodiment of the present invention. As shown in Fig. 3, the generation method includes:
Step 301, performing speech recognition on the audio clip to obtain text information corresponding to the audio clip.
Step 302, determining, according to the text information, the text paragraph corresponding to the text information in the electronic book.
Step 303, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip.
Step 304, ordering the preferred audio clips according to the order of the text paragraphs in the electronic book.
Step 305, clipping and integrating the ordered preferred audio clips to generate the voice audio of the electronic book.
In the third embodiment, as in the first embodiment, a user can listen to the electronic book voice audio on a smart device such as a smartphone, a tablet computer, or an e-reader, and can also read the electronic book text corresponding to the voice audio. The voice audio may be played in sync with the user's reading progress, so that the audio for a passage plays while the user reads that passage; or it may be played independently of the user's reading, starting as soon as the user clicks or otherwise triggers a play instruction, whether or not the user is reading the text.
The generating device receives audio clips uploaded by users, each containing the sound of a user reading the electronic book aloud. A clip may be recorded directly: when the user chooses to upload a clip, the generating device starts its recording function to capture the user's reading. Alternatively, the user may record the clip in advance and transmit it to the generating device over a wireless network, near-field transmission, a wired connection, or another transmission method.
Unlike the first embodiment, in the third embodiment, after an audio clip uploaded by a user is received, speech recognition is performed on the clip to obtain the corresponding text information. Any existing speech recognition method may be used; the present invention does not limit this. The matching text paragraph is then determined in the electronic book according to the recognized text information.
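The matching step that follows speech recognition can be sketched with fuzzy string comparison. This is a minimal sketch under stated assumptions: the recognized text is taken as given (no particular recognizer is implied), similarity is measured with the standard library's `difflib.SequenceMatcher`, and the function name `match_paragraph` and the `min_ratio` threshold are hypothetical.

```python
from difflib import SequenceMatcher

def match_paragraph(recognized_text, paragraphs, min_ratio=0.6):
    """Return the index of the paragraph most similar to the recognized text,
    or None when no paragraph is similar enough to be a credible match."""
    best_index, best_ratio = None, 0.0
    for index, paragraph in enumerate(paragraphs):
        # Case-insensitive comparison; recognizer output often lacks punctuation.
        ratio = SequenceMatcher(None, recognized_text.lower(), paragraph.lower()).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index if best_ratio >= min_ratio else None
```

A threshold is useful because recognition output is imperfect: a clip that matches nothing well can be rejected or flagged instead of being attached to the wrong paragraph. Comparing against every paragraph is linear in book length, so a long book would likely need an index or a coarse pre-filter.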
Then, similar to the embodiment, an audition port may be provided for each audio piece for the user to audite the audio pieces. Then, audition feedback information of each audio segment input or triggered by each user is received, and a preferred audio segment of each text segment is selected from a plurality of audio segments corresponding to each text segment by using the audition feedback information. In order to make the obtained speech audio of the electronic book more vivid and meet the public audio-visual habit and aesthetic requirement, the preferable audio clip can be an audio clip with a better feedback result after audition of each user.
Further, the audition feedback information includes scoring information, and/or comment information, and/or audition popularity. The scoring information and the comment information are input or triggered by users, whereas the audition popularity is obtained from statistics on users' audition behavior, such as the audition play count and the number of users currently auditioning. Correspondingly, selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip includes: sorting the audio clips corresponding to each text paragraph according to the scoring information, and/or the comment information, and/or the audition popularity to determine the preferred audio clip.
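One possible way to sort clips by this feedback is a weighted combination of the three signals. The sketch below is only an illustration: the `ClipFeedback` fields, the 0 to 5 score scale, the reduction of comments to a count of positive comments, and the weights are all assumptions, since the patent does not fix a ranking formula.

```python
from dataclasses import dataclass


@dataclass
class ClipFeedback:
    clip_id: str
    mean_score: float       # scoring information, assumed to be on a 0-5 scale
    positive_comments: int  # comment information, reduced here to a simple count
    play_count: int         # audition popularity, measured as audition play count


def rank_clips(clips, w_score=0.6, w_comments=0.2, w_plays=0.2):
    """Sort the clips of one text paragraph from best to worst feedback."""
    # Normalize counts by the per-paragraph maximum; "or 1" avoids division by zero.
    max_comments = max((c.positive_comments for c in clips), default=0) or 1
    max_plays = max((c.play_count for c in clips), default=0) or 1

    def combined(c):
        return (w_score * c.mean_score / 5.0
                + w_comments * c.positive_comments / max_comments
                + w_plays * c.play_count / max_plays)

    return sorted(clips, key=combined, reverse=True)


clips = [
    ClipFeedback("a", mean_score=4.5, positive_comments=10, play_count=200),
    ClipFeedback("b", mean_score=4.8, positive_comments=2, play_count=50),
    ClipFeedback("c", mean_score=3.0, positive_comments=12, play_count=400),
]
ranked = rank_clips(clips)
preferred = ranked[0]  # the first clip after sorting is taken as the preferred clip
print([c.clip_id for c in ranked])
```

Because the text allows any "and/or" combination of the three signals, setting a weight to zero reproduces ranking by a subset of them.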
Finally, unlike the first embodiment, integrating the preferred audio clips according to the paragraph order of the text paragraphs to generate the e-book speech audio may specifically include: sorting the preferred audio clips according to the order of the text paragraphs in the electronic book; and clipping and integrating the sorted preferred audio clips to generate the e-book speech audio. In this embodiment, after the preferred audio clips are sorted by paragraph order, their audio styles, such as speech rate and intonation style and background music style, may be integrated to improve the stylistic consistency of the generated e-book speech audio, and the clips may be clipped as needed to obtain the final e-book speech audio. The audio styles may be integrated in several ways. For example, speech features may be extracted from each preferred audio clip to obtain the voice style of that clip, and the voice style of each preferred audio clip may then be adjusted with reference to the voice styles of the other clips so that the styles become relatively unified.
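The ordering and style-unification steps can be sketched as follows. This is a simplified illustration under stated assumptions: the only voice-style feature considered is a pre-measured speech rate per clip, the target style is taken as the median rate, and the function `order_and_unify` is hypothetical. It only computes a playback-rate adjustment plan and does not perform the audio processing itself.

```python
from statistics import median


def order_and_unify(preferred):
    """Order preferred clips by paragraph and plan a speech-rate adjustment.

    preferred maps paragraph index -> (clip_id, speech_rate in syllables per second).
    Returns a list of (paragraph_index, clip_id, rate_factor) in paragraph order.
    """
    ordered = [(idx, *preferred[idx]) for idx in sorted(preferred)]
    # Assumed target style: the median speech rate across all preferred clips.
    target_rate = median(rate for _, _, rate in ordered)
    plan = []
    for idx, clip_id, rate in ordered:
        # A factor above 1 means the clip should be sped up to reach the target rate.
        plan.append((idx, clip_id, round(target_rate / rate, 3)))
    return plan


plan = order_and_unify({2: ("c", 5.0), 0: ("a", 4.0), 1: ("b", 3.2)})
print(plan)
```

Using the median as the target keeps one unusually fast or slow reader from pulling the whole book toward their pace; other style features, such as intonation or background music, would need their own measurements.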
The method for generating e-book speech audio provided by the third embodiment of the present invention builds on the first embodiment by determining, through speech recognition, the text paragraph of the electronic book corresponding to each audio clip. The resulting e-book speech audio is vivid in language, conveys the emotional tone of the electronic book, and matches the listening preferences of the general public, which improves the user experience.
Fig. 4 is a schematic structural diagram of a device for generating e-book speech audio according to a fourth embodiment of the present invention. The device includes:
the communication module 10 is used for receiving audio clips uploaded by users;
a text passage identification module 20, configured to determine a text passage of the electronic book corresponding to each audio clip; the audio clip is generated by the user reading the text paragraph;
the speech audio generation module 30 is configured to select a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip, and is further configured to integrate the preferred audio clips according to the paragraph order of the text paragraphs to generate the e-book speech audio.
In an alternative embodiment, the communication module 10 further includes an audio upload port corresponding to each text paragraph in the electronic book;
the text passage identification module 20 is further configured to split the text of the electronic book to obtain at least one text paragraph before the audio clips uploaded by users are received and the text paragraph of the electronic book corresponding to each audio clip is determined;
after the audio upload port receives the audio segments uploaded by each user, the text passage identification module 20 is further configured to determine text passages of the electronic book according to the audio upload port corresponding to the audio segment.
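The per-paragraph upload port can be sketched as a simple mapping from port to paragraph. In this illustration the class `UploadPorts`, the blank-line splitting rule, and the use of the paragraph index as the port identifier are all assumptions; the patent does not prescribe how the text is split or how ports are identified.

```python
import re
from collections import defaultdict


def split_into_paragraphs(text):
    """Split the e-book text on blank lines (the splitting rule is an assumption)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


class UploadPorts:
    def __init__(self, text):
        self.paragraphs = split_into_paragraphs(text)
        # One upload port per paragraph; the port id is simply the paragraph index.
        self.clips_by_port = defaultdict(list)

    def upload(self, port_id, user_id, audio_bytes):
        """Store a clip and return the text paragraph determined by the port used."""
        if not 0 <= port_id < len(self.paragraphs):
            raise ValueError(f"unknown upload port: {port_id}")
        self.clips_by_port[port_id].append((user_id, audio_bytes))
        return self.paragraphs[port_id]


ports = UploadPorts("First paragraph.\n\nSecond paragraph.")
paragraph = ports.upload(1, "user-1", b"\x00\x01")
print(paragraph)
```

With this design no speech recognition is needed to find the paragraph, because the port through which a clip arrives already identifies it.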
In an optional implementation manner, the text passage identification module 20 is specifically configured to perform speech recognition on an audio clip, obtain text information corresponding to the audio clip, and determine a text passage corresponding to the text information in the electronic book according to the text information.
In an optional embodiment, the audition feedback information includes rating information, and/or comment information, and/or audition popularity;
the speech audio generation module 30 is specifically configured to sort the audio clips corresponding to each text paragraph according to the scoring information, and/or the comment information, and/or the audition popularity to determine a preferred audio clip.
In an optional implementation, the speech audio generation module 30 is specifically configured to sort the preferred audio clips according to the order of the text paragraphs in the electronic book, and to clip and integrate the sorted preferred audio clips to generate the e-book speech audio.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and corresponding beneficial effects of the device described above may refer to the corresponding process in the foregoing method embodiments, and details are not repeated here.
The device for generating e-book speech audio provided by the fourth embodiment of the present invention receives the audio clips uploaded by users and determines the text paragraph of the electronic book corresponding to each audio clip, each audio clip being generated by a user reading that text paragraph aloud. It then selects a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip, and integrates the preferred audio clips according to the paragraph order of the text paragraphs to generate the e-book speech audio. Users can therefore upload audio clips that they have read and recorded themselves, and integrating the selected preferred audio clips yields e-book speech audio that is vivid in language, conveys the emotional tone of the electronic book, and matches the listening preferences of the general public, which improves the user experience.
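How the modules cooperate can be sketched as a small class that wires three pluggable steps together. The class name `EbookAudioGenerator`, the dictionary layout of a clip, and the toy callables in the usage part are hypothetical and serve only to show the data flow from upload to integrated audio.

```python
class EbookAudioGenerator:
    """Wires together the steps attributed to modules 10, 20 and 30."""

    def __init__(self, identify_paragraph, select_preferred, integrate):
        self.identify_paragraph = identify_paragraph  # text paragraph identification
        self.select_preferred = select_preferred      # choice based on audition feedback
        self.integrate = integrate                    # integration in paragraph order
        self.clips = {}                               # paragraph index -> list of clips

    def receive(self, clip):
        """Communication step: accept an uploaded clip and file it by paragraph."""
        index = self.identify_paragraph(clip)
        self.clips.setdefault(index, []).append(clip)

    def generate(self, feedback):
        """Pick one preferred clip per paragraph, then integrate in paragraph order."""
        preferred = [self.select_preferred(self.clips[index], feedback)
                     for index in sorted(self.clips)]
        return self.integrate(preferred)


# Toy callables, for illustration only.
generator = EbookAudioGenerator(
    identify_paragraph=lambda clip: clip["paragraph"],
    select_preferred=lambda clips, fb: max(clips, key=lambda c: fb[c["id"]]),
    integrate=lambda clips: b"".join(c["audio"] for c in clips),
)
for clip in [
    {"id": "x", "paragraph": 1, "audio": b"B1"},
    {"id": "y", "paragraph": 0, "audio": b"A1"},
    {"id": "z", "paragraph": 1, "audio": b"B2"},
]:
    generator.receive(clip)
book_audio = generator.generate({"x": 3.5, "y": 4.0, "z": 4.6})
print(book_audio)
```

Passing the three steps in as callables keeps the sketch neutral about whether paragraphs are identified by upload port or by speech recognition.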
Fig. 5 is a schematic structural diagram of an electronic book voice audio apparatus according to a fifth embodiment of the present invention. As shown in fig. 5, the electronic book voice audio apparatus includes: a memory 41, a processor 42 and a computer program stored on the memory 41 and executable on the processor 42, the processor 42 executing the method of any of the above embodiments when executing the computer program.
The present invention also provides a readable storage medium comprising a program which, when run on a terminal, causes the terminal to perform the method of any of the above embodiments.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating speech audio of an electronic book is characterized by comprising the following steps:
splitting a text of the electronic book to obtain at least one text paragraph, and setting a corresponding audio uploading port for each text paragraph in the electronic book;
receiving audio clips uploaded by each user at an audio uploading port, and determining text paragraphs of the electronic book according to the audio uploading ports corresponding to the audio clips, wherein the audio clips are generated by reading the text paragraphs by the users;
selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip;
and integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.
2. The method for generating e-book voice audio according to claim 1, wherein the receiving audio segments uploaded by users and determining a text paragraph of the e-book corresponding to each audio segment includes:
performing voice recognition on the audio clip to obtain text information corresponding to the audio clip;
and determining, according to the text information, the text paragraph corresponding to the text information in the electronic book.
3. The method for generating e-book voice audio according to claim 1,
the audition feedback information comprises grading information, and/or comment information, and/or audition popularity;
correspondingly, the selecting a preferred audio clip from the plurality of audio clips corresponding to each text paragraph according to the received audition feedback information of each audio clip includes:
and sequencing the audio clips corresponding to each text paragraph according to the grading information, and/or the comment information, and/or the audition popularity to determine a preferred audio file.
4. The method for generating e-book voice audio according to claim 1, wherein the integrating the preferred audio segments according to the paragraph order of the text paragraphs to generate e-book voice audio comprises:
sequencing the preferred audio segments according to the sequence of the text segments in the electronic book;
and editing and integrating the sequenced preferred audio segments to generate the speech audio of the electronic book.
5. An apparatus for generating speech audio of an electronic book, comprising:
the communication module is used for receiving the audio clips uploaded by the users and comprises an audio uploading port corresponding to each text paragraph in the electronic book;
the text paragraph identification module is used for splitting the text of the electronic book to obtain at least one text paragraph, and determining the text paragraph of the electronic book according to the audio uploading port corresponding to the audio fragment after the audio uploading port receives the audio fragment uploaded by each user; the audio clip is generated by the user reading the text paragraph;
the voice audio generation module is used for selecting a preferred audio fragment from the plurality of audio fragments corresponding to each text paragraph according to the received audition feedback information of each audio fragment; and integrating the preferred audio segments according to the paragraph sequence of each text paragraph to generate the e-book voice audio.
6. The apparatus of claim 5, wherein the text passage recognition module is specifically configured to perform speech recognition on an audio segment to obtain text information corresponding to the audio segment, and determine a text passage corresponding to the text information in the electronic book according to the text information.
7. The apparatus for generating electronic book voice audio according to claim 5,
the audition feedback information comprises grading information, and/or comment information, and/or audition popularity;
the voice audio generation module is specifically configured to sort the audio segments corresponding to each text paragraph according to the grading information, and/or the comment information, and/or the audition popularity to determine a preferred audio file.
8. The apparatus of claim 5, wherein the speech audio generation module is specifically configured to sort the preferred audio segments according to a paragraph sequence of each text segment in the electronic book, and clip and integrate the sorted preferred audio segments to generate the electronic book speech audio.
9. An apparatus for generating speech audio of an electronic book, comprising: a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-4.
10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810783023.2A CN110797001B (en) 2018-07-17 2018-07-17 Method and device for generating voice audio of electronic book and readable storage medium

Publications (2)

Publication Number Publication Date
CN110797001A CN110797001A (en) 2020-02-14
CN110797001B true CN110797001B (en) 2022-04-12

Family

ID=69425001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810783023.2A Active CN110797001B (en) 2018-07-17 2018-07-17 Method and device for generating voice audio of electronic book and readable storage medium

Country Status (1)

Country Link
CN (1) CN110797001B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541323A (en) * 2020-12-21 2021-03-23 广州优谷信息技术有限公司 Method and device for processing reading materials
CN112732216B (en) * 2020-12-31 2022-05-10 南京南机智农农机科技研究院有限公司 Interaction method and interaction system for parallel reading voice
CN113096635B (en) * 2021-03-31 2024-01-09 抖音视界有限公司 Audio and text synchronization method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354840A (en) * 2008-09-08 2009-01-28 众智瑞德科技(北京)有限公司 Method and apparatus for performing voice reading control of electronic book
CN102779508A (en) * 2012-03-31 2012-11-14 安徽科大讯飞信息科技股份有限公司 Speech corpus generating device and method, speech synthesizing system and method
CN103492996A (en) * 2011-02-24 2014-01-01 谷歌公司 Electronic book interface system and method
CN105869446A (en) * 2016-03-29 2016-08-17 广州阿里巴巴文学信息技术有限公司 Electronic reading apparatus and voice reading loading method
CN106326277A (en) * 2015-06-30 2017-01-11 上海证大喜马拉雅网络科技有限公司 User behavior-based personalized audio recommendation method and system
CN107342079A (en) * 2017-07-05 2017-11-10 谌勋 A kind of acquisition system of the true voice based on internet
CN107369462A (en) * 2017-07-21 2017-11-21 广州阿里巴巴文学信息技术有限公司 E-book speech playing method, device and terminal device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020099552A1 (en) * 2001-01-25 2002-07-25 Darryl Rubin Annotating electronic information with audio clips
US9548052B2 (en) * 2013-12-17 2017-01-17 Google Inc. Ebook interaction using speech recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A prototype Haptic E-Book system to support immersive remote reading in a smart space; Abu Saleh Md Mahfujur Rahman, et al.; 《2011 IEEE International Workshop on Haptic Audio Visual Environments and Games》; IEEE; 20111201; full text *
Reading on a mobile phone using only my ears: the iFlytek voice e-book; 文惠子; 《电脑爱好者(普及版)》; CNKI; 20100201 (No. 2); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200417
Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province
Applicant after: Alibaba (China) Co.,Ltd.
Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 13 layer self unit 03
Applicant before: GUANGZHOU ALIBABA LITERATURE INFORMATION TECHNOLOGY Co.,Ltd.
GR01 Patent grant