CN107886939B - Pause-continue type text voice playing method and device at client

Info

Publication number: CN107886939B
Application number: CN201610871990.5A
Authority: CN
Inventors: 熊健南; 莫文
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2016-09-30
Filing date: 2016-09-30
Publication date: 2021-03-30
Anticipated expiration: 2036-09-30
Also published as: CN107886939A

Abstract

The invention provides a pause-continue text voice playing method and device at a client, which address slow loading when a digital document is played as speech, shorten the user's waiting time and improve the user experience. The pause-continue text voice playing method at the client comprises the following steps: receiving a user's voice playing command for a text; acquiring the text from the corresponding digital document on a server while simultaneously playing a pause-point voice file; after acquisition of the text is complete, checking whether playback of the pause-point voice file has finished and, once it has finished, starting voice generation and playback from the position in the text corresponding to the end of the pause-point voice file; and, when the user issues a command to pause voice playback of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of a set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

Description

Pause-continue type text voice playing method and device at client

Technical Field

The present invention relates to the field of computer and software technologies thereof, and in particular, to a pause-continue text-to-speech playing method and apparatus at a client.

Background

With the development of the mobile internet, voice technology is used more and more widely, and having digital documents read aloud is becoming increasingly popular. In many scenarios, such as when driving a car or riding in a crowded vehicle, reading visually is inconvenient. Quickly loading and parsing a text file on a mobile device and reading it aloud is therefore a popular application.

At present, digital documents are read aloud mainly by reading and parsing the digital document file, extracting the text content from it, and finally calling a voice module to read the text. As shown in fig. 1, the existing overall process of reading a digital document mainly includes:

S11: reading the digital document under a specific path and loading it into memory;

S12: parsing the structure of the digital document file loaded into memory to obtain its internal information;

For a PDF document, each page and the objects related to the pages (the objects contain text information) are mainly parsed; for an ePub file, the file list and the corresponding chapter sequence file are mainly parsed to obtain each chapter file (HTML file); and for a text-type file (txt), the text is obtained directly.

S13: extracting the text content of the digital document;

For a PDF document, text-type objects are extracted from the content objects of each page; for an ePub file, the chapter files are parsed to obtain each paragraph, and only the text in the paragraphs is kept; for a text-type file, the result of the previous step (step S12) is used directly.

S14: submitting the extracted text to a voice reading module for reading.

This scheme has a notable drawback: document parsing is slow, and reading (playback) can start only after the document has been parsed and the text extracted, so the user waits too long and the user experience suffers.

Disclosure of Invention

In view of the above, the present invention provides a pause-continue text-to-speech playing method and device at a client, which can address slow loading when a digital document is played as speech, shorten the user's waiting time, and improve the user experience.

To achieve the above object, according to one aspect of the present invention, there is provided a pause-continue text speech playing method at a client.

A pause-continue text-to-speech playing method at a client, wherein the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; when the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The method comprises: receiving a user's voice playing command for the text; acquiring the text from the corresponding digital document on a server while simultaneously playing the pause-point voice file; after acquisition of the text is complete, checking whether playback of the pause-point voice file has finished and, when it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text corresponding to the end of the pause-point voice file; and, when the user issues a command to pause voice playback of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

Optionally, the step of acquiring the text includes: reading the digital document and loading the digital document into a local memory; parsing the digital document according to a format of the digital document to identify textual content therein; and extracting text content in the digital document and forming the text.

Optionally, acquiring the text further includes timing the acquisition with a timer to determine the duration required to acquire the text, and determining from that duration the length of the text segment corresponding to the pause-point voice file as the set length, so that the pause-point voice file takes longer to play than the duration.

Optionally, the step of generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text includes: extracting the text segment of the set length before and after the current pause point of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment by using the voice synthesizer.

Optionally, the set rule includes: extracting the text segment of the set length in a given ratio before and after the current pause point.

Optionally, the types of formats of the digital document include PDF, ePub, txt.

According to another aspect of the present invention, there is provided a pause-continue text-to-speech playing apparatus at a client.

A pause-continue text-to-speech playing apparatus at a client, wherein the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; when the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The apparatus comprises a command receiving module, a text acquisition module, a voice playing module and a file generation module, wherein: the command receiving module is used for receiving a user's voice playing command for the text; the text acquisition module is used for acquiring the text from the corresponding digital document on the server while the voice playing module simultaneously plays the pause-point voice file; the voice playing module is used for checking, after the text acquisition module has finished acquiring the text, whether playback of the pause-point voice file has finished and, when it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text corresponding to the end of the pause-point voice file; and the file generation module is used for, when the user issues a command to pause voice playback of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

Optionally, the text obtaining module is further configured to: reading the digital document and loading the digital document into a local memory; parsing the digital document according to a format of the digital document to identify textual content therein; and extracting text content in the digital document and forming the text.

Optionally, the text acquisition module is further configured to: time the acquisition of the text with a timer to determine the duration required to acquire the text, and determine from that duration the length of the text segment corresponding to the pause-point voice file as the set length, so that the pause-point voice file takes longer to play than the duration.

Optionally, the file generation module is further configured to: extract the text segment of the set length before and after the current pause point of the text according to a set rule; record the end position of the text segment; and generate the voice file from the text segment by using the voice synthesizer.

According to yet another aspect of the present invention, an electronic device is provided.

An electronic device, comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a pause-continue text-to-speech method at a client.

According to yet another aspect of the invention, a computer-readable medium is provided.

A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, implements a pause-and-continue text-to-speech playing method for a client.

According to the technical scheme of the invention, a user's voice playing command for a text is received; the text is acquired from the corresponding digital document on a server while the stored pause-point voice file, which corresponds to the point in the text at which the previous voice playback was paused, is played; after acquisition of the text is complete, it is checked whether playback of the pause-point voice file has finished and, once it has finished, voice generation and playback start from the position in the text corresponding to the end of the pause-point voice file; when the user issues a command to pause voice playback of the text, the position in the text at which playback is currently paused is recorded, the pause point is updated with that position, a voice file corresponding to a text segment of a set length before and after the current pause point in the text is generated, and the pause-point voice file is replaced with the generated voice file. With this technical scheme, slow loading during voice playback of a digital document can be addressed, the user's waiting time is shortened, and the user experience is improved.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a general flow diagram of a prior art digital document reading;

FIG. 2 is a schematic diagram of the main steps of a pause-continue text-to-speech playing method at a client according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of a pause-continue text voice playing method at a client according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the main modules of a pause-continue text-to-speech playing apparatus at a client according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 2 is a schematic diagram of the main steps of a pause-continue text-to-speech playing method at a client according to an embodiment of the present invention.

As shown in fig. 2, the pause-continue text-to-speech playing method at the client according to the embodiment of the present invention mainly includes the following steps S21 to S24.

The text in this embodiment is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused. The pause point corresponds to a pause-point voice file stored at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text. When the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The client of the embodiment of the invention can be a mobile device, for example an embedded device such as a mobile phone, a tablet or an e-book reader, and can also be a fixed device such as a desktop computer.

Step S21: receiving a user's voice playing command for the text.

Step S22: acquiring the text from the corresponding digital document on the server while simultaneously playing the pause-point voice file.

The types of formats of the digital document can include PDF, ePub, txt, and other types of digital documents.

The step of acquiring the text specifically comprises: reading a digital document and loading the digital document into a local memory; parsing the digital document according to the format of the digital document to identify textual content therein; text content in the digital document is extracted and text is formed.
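The parse-and-extract part of this step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `extract_text`, the format labels `"txt"` and `"epub-chapter"`, and the UTF-8 encoding are assumptions made for illustration. Only the txt case and a single ePub chapter file (HTML) are covered; PDF parsing and ePub container handling are omitted.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements of an HTML chapter file."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open <p> tags
        self.paragraphs = []  # completed paragraph strings
        self._buffer = []     # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        # Keep only text that appears inside a paragraph.
        if self.depth > 0:
            self._buffer.append(data)


def extract_text(raw: bytes, fmt: str) -> str:
    """Parse a document already loaded into memory and return its plain text."""
    if fmt == "txt":
        # A text-type file needs no parsing: the content is the text.
        return raw.decode("utf-8")
    if fmt == "epub-chapter":
        parser = ParagraphText()
        parser.feed(raw.decode("utf-8"))
        parser.close()
        return "\n".join(parser.paragraphs)
    raise ValueError(f"unsupported format: {fmt}")
```

For example, `extract_text(b"<h1>T</h1><p>One <b>two</b></p>", "epub-chapter")` keeps the paragraph text and drops the heading.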

The step of acquiring the text further comprises timing the acquisition with a timer to determine the duration required to acquire the text, and determining from that duration the length of the text segment corresponding to the pause-point voice file as the set length, so that the pause-point voice file takes longer to play than the duration.
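The timing of text acquisition could be sketched as follows. The helper name `timed` is hypothetical, and the use of Python's monotonic `time.perf_counter` clock as the timer is an assumption; the description only requires that some timer measures the acquisition.

```python
import time


def timed(acquire, *args):
    """Run acquire(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()           # monotonic clock, suited to intervals
    result = acquire(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

The measured duration is what the set length is later derived from.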

Step S23: after acquisition of the text is complete, checking whether playback of the pause-point voice file has finished and, when it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text corresponding to the end of the pause-point voice file.

Step S24: when the user issues a command to pause voice playback of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

Generating the voice file corresponding to the text segment of the set length before and after the current pause point mainly comprises: extracting the text segment of the set length before and after the current pause point of the text according to a set rule; recording the end position of the text segment; and generating a voice file from the text segment by using a voice synthesizer.

The set rule may include: extracting the text segment of the set length in a given ratio before and after the current pause point.

Fig. 3 is a schematic flow chart illustrating a preferred pause-continue text voice playing method at a client according to an embodiment of the present invention. Wherein:

After the client receives a user's voice playing command for a text, it checks whether a pause-point voice file exists in the local cache. If so, the pause-point voice file in the cache is played by the voice playing module; if not, a predetermined voice prompt is played (not shown in the figure). The pause point is the position in the text at which the previous voice playback was paused: when the user paused the previous playback, text segments of a set length before and after the pause point were saved, a voice file was generated from them, and the voice file was stored in the local cache of the client. If the text is being played for the first time, a predetermined voice prompt, such as "the current document is loading", is played, and the prompt may be played in a loop.
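The cache check described above might look like the following minimal sketch. The cache layout (one file named `<doc_id>.pause.wav` per document), the function name `choose_startup_audio` and the boolean loop flag are assumptions for illustration and do not come from the source.

```python
from pathlib import Path


def choose_startup_audio(cache_dir: Path, doc_id: str, prompt: Path) -> tuple[Path, bool]:
    """Return (audio_file, loop) to play while the text is being acquired.

    If a cached pause-point voice file exists for the document, play it once.
    Otherwise (first playback) fall back to the predetermined prompt, which
    may be looped until the text is ready.
    """
    cached = cache_dir / f"{doc_id}.pause.wav"  # hypothetical naming scheme
    if cached.is_file():
        return cached, False
    return prompt, True
```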

While the voice playing module plays the pause-point voice file or the predetermined voice prompt, the client acquires the text from the corresponding digital document on the server. The specific process is as follows. First, the digital document is read: it is read from the server via its storage path and loaded into local memory. Second, the digital document is parsed: for example, for a PDF document, each page and the objects related to the pages (the objects contain text information) are mainly parsed; for an ePub file, the file list and the corresponding chapter sequence file in the ePub file are mainly parsed to obtain each chapter file (HTML file); and for a text-type file (txt file), the text is obtained directly. Third, the text content of the digital document is extracted to form the text: for a PDF document, text-type objects are mainly extracted from the content objects of each page; for an ePub file, the chapter files are mainly parsed to obtain each paragraph, and only the text in the paragraphs is kept; and for a text-type file (txt file), the text obtained in the parsing step is used directly.

Finally, the duration required to acquire the text is recorded. The length of the text segment can be determined from this duration and a preset playback speed. For example, with a playback speed of 120 words/minute (2 words/second) and an acquisition duration of 5 seconds, the product of the two is multiplied by a preset coefficient a to obtain the length of the text segment. The coefficient a is configurable; if it is set to 12, then duration × playback speed × a = 5 seconds × 2 words/second × 12 = 120 words. Thus, at a playback speed of 120 words/minute, the pause-point voice file generated from a text segment of this length takes 1 minute to play.

In theory, the duration required to acquire the text is the same for every playback, so with the same playback speed and coefficient value the pause-point voice file also takes the same time to play on every playback. In practice, factors such as the CPU and memory of the client may cause the acquisition duration to differ from one playback to the next. The value of the coefficient a should therefore be chosen so that the pause-point voice file takes longer to play than the text normally takes to acquire (that is, the duration without the influence of factors such as the client's CPU and memory). For example, if acquiring the text normally takes 5 seconds, the text segment length computed as duration × playback speed × a should yield a voice file that plays for more than 5 seconds; with a set to 12, the pause-point voice file plays for 1 minute at 120 words/minute. This avoids the situation in which, at the next playback, the pause-point voice file finishes playing before the text has been completely acquired because of factors such as the client's CPU and memory.
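The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical; the formula (duration × playback speed × coefficient a) and the numbers are taken from the example above, with the playback speed converted from words/minute to words/second.

```python
def segment_length_words(acquire_seconds: float, words_per_minute: float,
                         coefficient: float) -> int:
    """Set length = acquisition duration x playback speed x coefficient a."""
    words_per_second = words_per_minute / 60
    return round(acquire_seconds * words_per_second * coefficient)


def playback_seconds(length_words: int, words_per_minute: float) -> float:
    """Time needed to play a segment of the given length."""
    return length_words / words_per_minute * 60


length = segment_length_words(5, 120, 12)
print(length)                         # 120 (words)
print(playback_seconds(length, 120))  # 60.0 (seconds, i.e. 1 minute)
```

Because the playback time works out to the acquisition duration multiplied by a, any coefficient greater than 1 makes the pause-point voice file outlast a normal acquisition; a larger value such as 12 leaves headroom for slow runs.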

After the client finishes acquiring the text, the voice playing module checks whether playback of the pause-point voice file has finished and, when it has finished, voice generation and playback start from the position in the text corresponding to the end of the pause-point voice file. The voice playing module may specifically be a voice reading SDK.
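The overlap between text acquisition and playback of the cached audio can be sketched with two worker threads. This is an illustrative sketch only: `run_session` and its three callables are hypothetical stand-ins for the text acquisition module, the audio player and the voice synthesizer, and a thread pool is just one possible way to run the two tasks at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def run_session(acquire_text, play_pause_point_audio, synthesize_and_play,
                resume_position):
    """Acquire the text while the cached audio plays, then continue by synthesis.

    acquire_text():            returns the full text (slow: read, parse, extract)
    play_pause_point_audio():  blocks until the cached voice file has played
    synthesize_and_play(t, p): generates and plays speech for text t from p
    resume_position:           recorded end position of the cached text segment
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(acquire_text)
        audio_future = pool.submit(play_pause_point_audio)
        text = text_future.result()   # text acquisition is complete
        audio_future.result()         # cached voice file has finished playing
    synthesize_and_play(text, resume_position)
```

Synthesis starts only after both tasks have completed, which mirrors the check described above: the text must be available and the pause-point voice file must have finished playing.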

When a command from the user to pause voice playback of the text is received, the position in the text at which playback is currently paused is recorded, a voice file corresponding to a text segment of the set length before and after that position is generated, and the generated voice file is stored in the local cache, replacing the pause-point voice file currently cached, so that at the next playback the newly generated pause-point voice file is played while the text is being acquired. The text segment of the set length may be extracted according to a set rule; specifically, it may be split in a given ratio before and after the current pause point. For example, with a ratio of 1:3 and a set length of duration × playback speed × a = 120 words, 120 words × 1/4 = 30 words are taken before the position at which playback is paused and 120 words × 3/4 = 90 words are taken after it. The end position of the extracted text segment is then recorded, for example the chapter, paragraph and character at which it ends, and a voice file is generated from the text segment by using a voice synthesizer.
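The 1:3 split in this example can be reproduced with a small Python sketch. The function name `cut_segment` is hypothetical, and representing the text as a list of words indexed by position is an assumption made to match the word counts in the example; clamping at the start and end of the text is likewise an added assumption.

```python
def cut_segment(words, pause_index, length, before_parts=1, after_parts=3):
    """Cut `length` words around pause_index in the ratio before:after.

    Returns the segment and the index just past its end, which is the position
    from which synthesis continues at the next playback.
    """
    total_parts = before_parts + after_parts
    n_before = length * before_parts // total_parts
    n_after = length - n_before
    start = max(0, pause_index - n_before)        # clamp at start of text
    end = min(len(words), pause_index + n_after)  # clamp at end of text
    return words[start:end], end


words = [f"w{i}" for i in range(1000)]
segment, end = cut_segment(words, pause_index=500, length=120)
print(len(segment), segment[0], end)  # 120 w470 590
```

With a set length of 120 words, 30 words come from before the pause position and 90 from after it, as in the example above.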

The generated voice file is stored in the local cache and serves as the pause-point voice file at the next voice playback.
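One possible way to keep the recorded positions alongside the cached voice file is a small serializable record. The field names, the JSON encoding and the helper functions below are assumptions for illustration; the description states only that information such as the chapter, paragraph and character of the end position is recorded.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Position:
    chapter: int
    paragraph: int
    character: int


@dataclass(frozen=True)
class PauseRecord:
    pause_point: Position  # where playback was paused
    segment_end: Position  # where the cached text segment ends
    audio_file: str        # cached pause-point voice file (hypothetical path)


def dump_record(record: PauseRecord) -> str:
    """Serialize the record so it can be stored next to the voice file."""
    return json.dumps(asdict(record))


def load_record(payload: str) -> PauseRecord:
    """Rebuild the record at the start of the next playback."""
    data = json.loads(payload)
    return PauseRecord(
        pause_point=Position(**data["pause_point"]),
        segment_end=Position(**data["segment_end"]),
        audio_file=data["audio_file"],
    )
```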

Fig. 4 is a schematic diagram of the main modules of a pause-continue text-to-speech playing apparatus at a client according to an embodiment of the present invention. The text of the embodiment of the present invention is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused. The pause point corresponds to a pause-point voice file stored at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text. When the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt.

The pause-continue text voice playing device 40 at the client according to the embodiment of the present invention mainly comprises: a command receiving module 41, a text acquiring module 42, a voice playing module 43 and a file generating module 44.

The command receiving module 41 is configured to receive a user's voice playing command for a text. The text acquisition module 42 is configured to acquire the text from the corresponding digital document on the server while the voice playing module 43 simultaneously plays the pause-point voice file. The voice playing module 43 is configured to check, after the text acquisition module 42 has finished acquiring the text, whether playback of the pause-point voice file has finished and, when it has finished, to call a corresponding voice synthesizer to start voice generation and playback from the position in the text corresponding to the end of the pause-point voice file. The file generation module 44 is configured to, when the user issues a command to pause voice playback of the text, record the position in the text at which playback is currently paused, update the pause point with that position, generate a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replace the pause-point voice file with the generated voice file.

The text acquisition module 42 may also be used to read digital documents and load them into local memory; parsing the digital document according to the format of the digital document to identify textual content therein; text content in the digital document is extracted and text is formed.

In addition, the text acquisition module 42 may be further configured to: time the acquisition of the text with a timer to determine the duration required to acquire the text, and determine from that duration the length of the text segment corresponding to the pause-point voice file as the set length, so that the pause-point voice file takes longer to play than the duration.

The file generation module 44 may also be configured to: extract the text segment of the set length before and after the current pause point of the text according to a set rule; record the end position of the text segment; and generate a voice file from the text segment by using a voice synthesizer. The set rule may specifically include: extracting the text segment of the set length in a given ratio before and after the current pause point.

Types of formats of digital documents include, but are not limited to, PDF, ePub, txt.

According to the technical scheme of the embodiment of the invention, a user's voice playing command for a text is received; the text is acquired from the corresponding digital document on a server while the stored pause-point voice file, which corresponds to the point in the text at which the previous voice playback was paused, is played; after acquisition of the text is complete, it is checked whether playback of the pause-point voice file has finished and, once it has finished, voice generation and playback start from the position in the text corresponding to the end of the pause-point voice file; when the user issues a command to pause voice playback of the text, the position in the text at which playback is currently paused is recorded, the pause point is updated with that position, a voice file corresponding to a text segment of a set length before and after the current pause point in the text is generated, and the pause-point voice file is replaced with the generated voice file. With the technical scheme of the embodiment of the invention, slow loading during voice playback of a digital document can be addressed, the user's waiting time is shortened, and the user experience is improved.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A pause-continue type text-to-speech playing method at a client, wherein the text is associated with a pause point, the pause point is the position in the text at which the previous voice playback was paused, the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text, wherein when the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt, the method comprising:

receiving a user's voice playing command for the text;

acquiring the text from the corresponding digital document on a server while simultaneously playing the pause-point voice file;

after acquisition of the text is complete, checking whether playback of the pause-point voice file has finished, and, when playback of the pause-point voice file has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text corresponding to the end of the pause-point voice file;

when the user issues a command to pause voice playback of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

2. The method of claim 1, wherein the step of obtaining the text comprises:

reading the digital document and loading the digital document into a local memory;

parsing the digital document according to a format of the digital document to identify textual content therein;

and extracting text content in the digital document and forming the text.

3. The method of claim 1, wherein obtaining the text further comprises timing the time for obtaining the text by using a timer to determine a duration required for obtaining the text, and accordingly determining a length of a text segment corresponding to the suspension point voice file as the set length, so that the time required for completing the playing of the suspension point voice file is longer than the duration.

4. The method according to claim 1, wherein the step of generating the voice file corresponding to the text segment with the set length before and after the current pause point in the text comprises:

intercepting, according to a set rule, the text segment of the set length before and after the current pause point in the text;

recording the end position of the text segment; and

generating the voice file from the text segment by using the voice synthesizer.

5. The method of claim 4, wherein the set rule comprises:

intercepting the text segment of the set length according to a given proportion before and after the current pause point.
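
The interception in claims 4 and 5 could be sketched as follows, assuming the "given proportion" is the fraction of the set length taken before the pause point. The function name `cut_segment`, the parameter `before_ratio`, and its default of 0.2 are hypothetical.

```python
def cut_segment(text: str, pause_point: int, set_length: int,
                before_ratio: float = 0.2):
    """Return (segment, end_position) around pause_point.

    before_ratio is the assumed share of set_length placed before the pause
    point; the remainder follows it. Near the start of the text the window
    is shifted forward so that it keeps its full length where possible.
    """
    before = int(set_length * before_ratio)
    start = max(0, pause_point - before)
    end = min(len(text), start + set_length)
    # end is recorded so that live synthesis can resume from it later.
    return text[start:end], end
```

The returned end position is the value that claim 1 uses as the starting point for live voice generation once the cached clip has finished playing.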

6. The method of claim 1, wherein the format types of the digital document comprise PDF, ePub, and txt.

7. A pause-continue text-to-speech playing apparatus at a client, wherein the text is associated with a pause point, the pause point is the position in the text at which the previous voice playing was paused, and the pause point corresponds to a pause point voice file saved at the client, the pause point voice file corresponding to a text segment of a set length before and after the pause point in the text, wherein when the current playing is the first playing of the text, the pause point is the start point of the text and the pause point voice file contains a predetermined voice prompt, the apparatus comprising a command receiving module, a text acquisition module, a voice playing module, and a file generation module, wherein:

the command receiving module is configured to receive a voice playing command from a user for the text;

the text acquisition module is configured to acquire the text from the corresponding digital document on the server while the voice playing module simultaneously plays the pause point voice file;

the voice playing module is configured to check, after the text acquisition module finishes acquiring the text, whether the playing of the pause point voice file is finished, and, when the playing of the pause point voice file is finished, to call a corresponding voice synthesizer to start voice generation and playing from the position in the text corresponding to the end of the pause point voice file;

and the file generation module is configured to, when the user sends a command to stop the voice playing of the text, record the position in the text at which the current playing stops, update the pause point with that position, generate a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replace the pause point voice file with the generated voice file.

8. The apparatus of claim 7, wherein the text acquisition module is further configured to:

and extracting text content in the digital document and forming the text.

9. The apparatus of claim 7, wherein the text acquisition module is further configured to:

time the acquisition of the text with a timer to determine the duration required to acquire the text, and determine the length of the text segment corresponding to the pause point voice file accordingly, as the set length, such that the time required to finish playing the pause point voice file is longer than the duration.

10. The apparatus of claim 7, wherein the file generation module is further configured to:

and recording the end position of the text segment; and

11. The apparatus of claim 10, wherein the set rule comprises:

12. The apparatus of claim 7, wherein the format types of the digital document comprise PDF, ePub, and txt.

13. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.