CN107886939B - Pause-continue type text voice playing method and device at client - Google Patents

Info

Publication number
CN107886939B
Authority
CN
China
Prior art keywords
text
playing
point
voice
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610871990.5A
Other languages
Chinese (zh)
Other versions
CN107886939A (en)
Inventor
熊健南
莫文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610871990.5A
Publication of CN107886939A
Application granted
Publication of CN107886939B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L2013/083 Special characters, e.g. punctuation marks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a pause-continue text voice playing method and device at a client, which can solve the problem of slow loading when a digital document is played as speech, shorten the user's waiting time, and improve the user experience. The pause-continue text voice playing method at the client comprises the following steps: receiving a user's voice playing command for a text; acquiring the text from the corresponding digital document on a server while simultaneously playing a pause-point voice file; after the text has been acquired, checking whether playback of the pause-point voice file has finished and, once it has finished, starting voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file; and, when the user issues a command to pause the voice playing of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of a set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.

Description

Pause-continue type text voice playing method and device at client
Technical Field
The present invention relates to the field of computers and computer software, and in particular to a pause-continue text-to-speech playing method and apparatus at a client.
Background
With the development of the mobile internet, voice technology is used more and more widely, and the voice reading of digital documents is becoming increasingly popular. In many scenarios, such as when driving or riding in a crowded vehicle, reading visually is inconvenient. Quickly loading and parsing a text file on a mobile device and reading it aloud is therefore a popular application.
At present, the usual scheme for reading a digital document aloud is to read and parse the digital document file, extract the text content from it, and finally call a voice module to read that content. As shown in fig. 1, the existing overall process of reading digital documents mainly includes:
S11: reading the digital document under the specified path and loading it into memory;
S12: parsing the structure of the digital document file loaded into memory to obtain its internal information;
for a PDF document, mainly parsing each page of the document and the objects related to the pages (the objects contain the text information); for an ePub file, mainly parsing the file list and the corresponding chapter sequence file to obtain each chapter file (an HTML file); and for a text file (txt), obtaining the text directly.
S13: extracting the text content in the digital document;
for a PDF document, extracting the objects of text type from the content objects of each page; for an ePub file, parsing each chapter file to obtain its paragraphs and then taking only the text in the paragraphs; for a text file, using the result of the previous step (step S12) directly.
S14: submitting the extracted content to a voice reading module for reading.
This scheme has a drawback: document parsing is not fast enough, and reading (playback) can only start once the document has been parsed and the text extracted, so the user waits too long and the user experience suffers.
Disclosure of Invention
In view of the above, the present invention provides a pause-continue text-to-speech playing method and device at a client, which can solve the problem of slow loading when a digital document is played as speech, shorten the user's waiting time, and improve the user experience.
To achieve the above object, according to one aspect of the present invention, there is provided a pause-continue text speech playing method at a client.
A pause-continue text-to-speech playing method at a client, wherein the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; when the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The method comprises: receiving a user's voice playing command for the text; acquiring the text from the corresponding digital document on a server while simultaneously playing the pause-point voice file; after the text has been acquired, checking whether playback of the pause-point voice file has finished and, once it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file; and, when the user issues a command to pause the voice playing of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.
Optionally, the step of acquiring the text includes: reading the digital document and loading the digital document into a local memory; parsing the digital document according to a format of the digital document to identify textual content therein; and extracting text content in the digital document and forming the text.
Optionally, acquiring the text further includes timing the acquisition with a timer to determine the duration required to acquire the text, and determining from that duration the length of the text segment corresponding to the pause-point voice file, as the set length, so that the time required to finish playing the pause-point voice file is longer than that duration.
Optionally, the step of generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text includes: intercepting the text segment of the set length before and after the current pause point of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment by using the voice synthesizer.
Optionally, the set rule includes: intercepting the text segment of the set length before and after the current pause point according to a given ratio.
Optionally, the types of formats of the digital document include PDF, ePub, txt.
According to another aspect of the present invention, there is provided a pause-continue text speech playing apparatus at a client.
A pause-continue text voice playing apparatus at a client, wherein the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback was paused; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; when the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The apparatus comprises a command receiving module, a text acquisition module, a voice playing module and a file generation module, wherein: the command receiving module is used for receiving a user's voice playing command for the text; the text acquisition module is used for acquiring the text from the corresponding digital document on the server while the voice playing module simultaneously plays the pause-point voice file; the voice playing module is used for checking, after the text acquisition module has finished acquiring the text, whether playback of the pause-point voice file has finished and, once it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file; and the file generation module is used for, when the user issues a command to pause the voice playing of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.
Optionally, the text acquisition module is further configured to: read the digital document and load it into local memory; parse the digital document according to its format to identify the textual content therein; and extract the text content of the digital document to form the text.
Optionally, the text acquisition module is further configured to: time the acquisition with a timer to determine the duration required to acquire the text, and determine from that duration the length of the text segment corresponding to the pause-point voice file, as the set length, so that the time required to finish playing the pause-point voice file is longer than that duration.
Optionally, the file generation module is further configured to: intercept the text segment of the set length before and after the current pause point of the text according to a set rule; record the end position of the text segment; and generate the voice file from the text segment by using the voice synthesizer.
Optionally, the set rule includes: intercepting the text segment of the set length before and after the current pause point according to a given ratio.
Optionally, the types of formats of the digital document include PDF, ePub, txt.
According to yet another aspect of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a pause-continue text-to-speech method at a client.
According to yet another aspect of the invention, a computer-readable medium is provided.
A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, implements a pause-and-continue text-to-speech playing method for a client.
According to the technical scheme of the invention, a user's voice playing command for a text is received; the text is acquired from the corresponding digital document on a server while the stored pause-point voice file, which corresponds to the point in the text at which the previous voice playback was paused, is played; after the text has been acquired, it is checked whether playback of the pause-point voice file has finished and, once it has finished, voice generation and playback start from the position in the text that corresponds to the end of the pause-point voice file; when the user issues a command to pause the voice playing of the text, the position in the text at which playback is currently paused is recorded, the pause point is updated with that position, a voice file corresponding to a text segment of a set length before and after the current pause point in the text is generated, and the pause-point voice file is replaced with the generated voice file. With this technical scheme, the problem of slow loading when a digital document is played as speech can be solved, the user's waiting time is shortened, and the user experience is improved.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a general flow diagram of a prior art digital document reading;
FIG. 2 is a schematic diagram of the main steps of a pause-continue text-to-speech playing method at a client according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a pause-continue text voice playing method at a client according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main modules of a pause-continue text speech playing apparatus at a client according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 2 is a schematic diagram of the main steps of a pause-continue text-to-speech playing method at a client according to an embodiment of the present invention.
As shown in fig. 2, the pause-continue text speech playing method at the client according to the embodiment of the present invention mainly includes the following steps S21 to S24.
The text in this embodiment is associated with a pause point, which is the position in the text at which the previous voice playback was paused. The pause point corresponds to a pause-point voice file stored at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text. When the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt. The client of the embodiment of the invention can be a mobile device, such as a mobile phone, a tablet, an e-book reader or another embedded device, or a fixed device such as a desktop computer.
Step S21: receiving a user's voice playing command for the text.
Step S22: acquiring the text from the corresponding digital document on the server while simultaneously playing the pause-point voice file.
The formats of the digital document can include PDF, ePub, txt, and other types of digital documents.
The step of acquiring the text specifically comprises: reading the digital document and loading it into local memory; parsing the digital document according to its format to identify the textual content therein; and extracting the text content of the digital document to form the text.
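As an illustration of the read, parse and extract steps for the ePub case, the following is a minimal sketch using only the Python standard library. It assumes a standard ePub container layout (a META-INF/container.xml that points to a package file with a manifest and a spine) and UTF-8 chapter files; the names extract_epub_text and _ParagraphText are hypothetical and do not come from the patent.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements of one chapter file."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a <p> element
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.paragraphs[-1] += data


def extract_epub_text(path):
    """Return the paragraphs of an ePub file in reading (spine) order."""
    with zipfile.ZipFile(path) as archive:
        # Read: locate the package file through the container descriptor.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//{*}rootfile").attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))
        # Parse: the manifest is the file list (id -> href).
        hrefs = {item.attrib["id"]: item.attrib["href"]
                 for item in package.findall(".//{*}manifest/{*}item")}
        base = posixpath.dirname(opf_path)
        paragraphs = []
        # Parse: the spine is the chapter sequence, listing manifest ids in order.
        for ref in package.findall(".//{*}spine/{*}itemref"):
            chapter = posixpath.join(base, hrefs[ref.attrib["idref"]])
            parser = _ParagraphText()
            parser.feed(archive.read(chapter).decode("utf-8"))
            # Extract: keep only the non-empty paragraph text.
            paragraphs.extend(p.strip() for p in parser.paragraphs if p.strip())
    return paragraphs
```

PDF extraction would need a third-party parser and is left out of this sketch; a txt file can be read directly.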
The step of acquiring the text further comprises timing the acquisition with a timer to determine the duration required to acquire the text, and determining from that duration the length of the text segment corresponding to the pause-point voice file, as the set length, so that the time required to finish playing the pause-point voice file is longer than that duration.
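The timing step can be sketched as a small wrapper around whatever routine acquires the text. This is only an illustration: timed_acquire and the acquire callable are hypothetical names, and the patent does not specify which timer is used; a monotonic clock is assumed here.

```python
import time


def timed_acquire(acquire):
    """Run acquire() and return (text, elapsed seconds)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    text = acquire()                     # hypothetical callable that fetches and extracts the text
    elapsed = time.perf_counter() - start
    return text, elapsed
```

The measured duration would then be stored so that it can be used when the next pause-point voice file is generated.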
Step S23: after the text has been acquired, checking whether playback of the pause-point voice file has finished and, once it has finished, calling a corresponding voice synthesizer to start voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file.
Step S24: when the user issues a command to pause the voice playing of the text, recording the position in the text at which playback is currently paused, updating the pause point with that position, generating a voice file corresponding to a text segment of a set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.
Generating the voice file corresponding to a text segment of the set length before and after the current pause point mainly comprises: intercepting the text segment of the set length before and after the current pause point of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment by using a voice synthesizer.
The set rule may include: intercepting the text segment of the set length before and after the current pause point according to a given ratio.
Fig. 3 is a schematic flow chart illustrating a preferred pause-continue text voice playing method at a client according to an embodiment of the present invention. Wherein:
After the client receives a user's voice playing command for a text, it checks whether a pause-point voice file exists in the local cache. If so, the voice playing module plays the cached pause-point voice file; if not, a predetermined voice prompt is played (not shown in the figure). The pause point is the position in the text at which the previous voice playback was paused: when the user paused the previous playback, a text segment of a set length before and after the pause point was intercepted, a voice file was generated from that segment, and the voice file was stored in the client's local cache. If the text is being played for the first time, a predetermined voice prompt, such as "the current document is loading", is played, and the prompt may be played in a loop.
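The cache check described above can be sketched as follows. The cache file naming scheme, the function name choose_startup_audio and the doc_id parameter are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Tuple


def choose_startup_audio(cache_dir: Path, doc_id: str, prompt: Path) -> Tuple[Path, bool]:
    """Return (audio file to play, whether to loop it) while the text is loading."""
    cached = cache_dir / f"{doc_id}.pause.wav"   # hypothetical cache naming scheme
    if cached.is_file():
        return cached, False    # resumed playback: play the cached pause-point voice file once
    return prompt, True         # first playback: loop the predetermined voice prompt
```

Returning a loop flag alongside the file keeps the decision in one place, so the playing module does not need to know whether this is a first playback.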
While the voice playing module plays the pause-point voice file or the predetermined voice prompt, the client acquires the text from the corresponding digital document on the server. The specific process is as follows.
First, reading the digital document: the digital document is read from the server via its storage path and loaded into local memory.
Second, parsing the digital document: for a PDF document, each page of the document and the objects related to the pages (the objects contain the text information) are parsed; for an ePub file, the file list and the corresponding chapter sequence file in the ePub file are parsed to obtain each chapter file (an HTML file); for a text file (txt), the text is obtained directly.
Third, extracting the text content of the digital document and forming the text: for a PDF document, the objects of text type are extracted from the content objects of each page; for an ePub file, each chapter file is parsed to obtain its paragraphs and only the text in the paragraphs is extracted; for a text file (txt), the text obtained by parsing is used directly.
Finally, recording the duration required to acquire the text. The length of the text segment can be determined from this duration and a preset playback speed. For example, with a playback speed of 120 words/minute and an acquisition duration of 5 seconds, the product of the two is multiplied by a preset coefficient a to give the length of the text segment. The coefficient a can be chosen freely; if it is set to 12, then duration × playback speed × preset coefficient a = 5 seconds × 120 words/minute × 12 = 120 words.
Thus, at a playback speed of 120 words/minute, the pause-point voice file generated from a text segment of that length takes 1 minute to play. In theory, the duration required to acquire the text is the same for every playback, so with the same playback speed and the same preset coefficient the playing time of the pause-point voice file is also the same every time. In practice, however, factors such as the client's CPU and memory load may make the acquisition duration differ from one playback to the next. The value of the preset coefficient a should therefore be chosen so that the playing time of the pause-point voice file generated from the calculated text segment is longer than the duration normally required to acquire the text (that is, the duration without the influence of factors such as the client's CPU and memory). For example, if acquiring the text normally takes 5 seconds, the voice file generated from a segment of length duration × playback speed × preset coefficient a should take longer than 5 seconds to play; with the coefficient a set to 12, the pause-point voice file takes 1 minute to play at 120 words/minute. This avoids the situation in which, at the next playback, the pause-point voice file finishes playing before the text has been completely acquired because of factors such as the client's CPU and memory.
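The arithmetic above can be checked with a short sketch. The function names are hypothetical; the formula is the one stated in the text (duration × playback speed × preset coefficient a), with the duration converted from seconds to minutes.

```python
def segment_length_words(fetch_seconds, words_per_minute, coefficient):
    """Set length of the pause-point text segment, in words."""
    return round(fetch_seconds * words_per_minute * coefficient / 60)


def playback_seconds(length_words, words_per_minute):
    """Time needed to play a segment of the given length."""
    return length_words / words_per_minute * 60


length = segment_length_words(5, 120, 12)
print(length)                          # 120
print(playback_seconds(length, 120))   # 60.0
```

With these values the pause-point voice file plays for 60 seconds, twelve times the 5-second acquisition duration, which is the safety margin the coefficient a is meant to provide.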
After the client has finished acquiring the text, the voice playing module checks whether playback of the pause-point voice file has finished and, once it has finished, starts voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file. The voice playing module can specifically be a voice reading SDK.
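The overlap between text acquisition and playback of the cached file can be sketched with two worker threads. This is a minimal illustration, not the patent's implementation: resume_playback and its four parameters are hypothetical, and the three callables stand in for the text acquisition routine, the audio player and the voice synthesizer.

```python
from concurrent.futures import ThreadPoolExecutor


def resume_playback(fetch_text, play_pause_point_file, speak_from, segment_end):
    """Fetch the text while the cached file plays, then continue from segment_end."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_job = pool.submit(fetch_text)               # acquire the text in the background
        audio_job = pool.submit(play_pause_point_file)   # play the cached file at the same time
        text = text_job.result()     # text acquisition has finished
        audio_job.result()           # wait until the cached file has finished playing
    speak_from(text, segment_end)    # synthesize and play from the recorded end position
    return text
```

Waiting on both jobs before calling speak_from mirrors the check in the text: synthesis only starts once the text is available and the pause-point voice file has played to its end.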
When a command from the user to pause the voice playing of the text is received, the position in the text at which playback is currently paused is recorded, a voice file corresponding to a text segment of a set length before and after that position is generated, and the generated voice file is stored in the local cache in place of the pause-point voice file currently cached, so that at the next voice playback the newly generated pause-point voice file is played while the text is being acquired. The text segment of the set length may be intercepted according to a set rule; specifically, it may be intercepted before and after the current pause point according to a given ratio. For example, the ratio may be set to 1:3. Assuming that duration × playback speed × preset coefficient a = 120 words, the segment then consists of 120 words × 1/4 = 30 words before the position at which playback is currently paused and 120 words × 3/4 = 90 words after it. The end position of the intercepted text segment is then recorded, for example the chapter, paragraph and character at which it ends, and a voice file is generated from the text segment by using a voice synthesizer.
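The ratio-based interception can be sketched as follows, treating the text as a list of words. The function name and the list-of-words representation are assumptions for illustration; the patent records the end position as chapter, paragraph and character rather than as a list index.

```python
def intercept_segment(words, pause_index, set_length, before=1, after=3):
    """Cut set_length words around pause_index, split before:after; return (segment, end index)."""
    n_before = set_length * before // (before + after)   # e.g. 120 * 1/4 = 30
    n_after = set_length - n_before                      # e.g. 120 * 3/4 = 90
    start = max(0, pause_index - n_before)               # clamp at the start of the text
    end = min(len(words), pause_index + n_after)         # clamp at the end of the text
    return words[start:end], end


words = [f"w{i}" for i in range(1000)]
segment, end = intercept_segment(words, 500, 120)
print(len(segment), segment[0], end)   # 120 w470 590
```

The returned end index is the position from which synthesis would resume once the pause-point voice file has finished playing.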
The generated voice file is stored in the local cache and serves as the pause-point voice file for the next voice playback.
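One way to replace the cached file safely is to write the new audio to a temporary file and then swap it in with a single rename. This is a design choice of the sketch, not something the patent specifies, and replace_pause_point_file is a hypothetical name.

```python
import os
import tempfile


def replace_pause_point_file(cache_path, audio_bytes):
    """Replace the cached pause-point voice file without leaving a half-written file."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(audio_bytes)
        os.replace(tmp_path, cache_path)   # swaps the old file for the new one in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)            # clean up the temporary file on failure
        raise
```

If the client is interrupted while writing, the previous pause-point voice file stays intact and can still be played at the next start.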
Fig. 4 is a schematic diagram of the main modules of a pause-continue text speech playing apparatus at a client according to an embodiment of the present invention. The text of the embodiment of the present invention is associated with a pause point, which is the position in the text at which the previous voice playback was paused. The pause point corresponds to a pause-point voice file stored at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text. When the current playback is the first playback of the text, the pause point is the start of the text and the pause-point voice file contains a predetermined voice prompt.
The pause-continue text voice playing device 40 at the client according to the embodiment of the present invention mainly comprises: a command receiving module 41, a text acquisition module 42, a voice playing module 43 and a file generation module 44.
Wherein: the command receiving module 41 is configured to receive a user's voice playing command for a text; the text acquisition module 42 is configured to acquire the text from the corresponding digital document on the server while the voice playing module 43 simultaneously plays the pause-point voice file; the voice playing module 43 is configured to check, after the text acquisition module 42 has finished acquiring the text, whether playback of the pause-point voice file has finished and, once it has finished, to call a corresponding voice synthesizer to start voice generation and playback from the position in the text that corresponds to the end of the pause-point voice file; the file generation module 44 is configured to, when the user issues a command to pause the voice playing of the text, record the position in the text at which playback is currently paused, update the pause point with that position, generate a voice file corresponding to a text segment of a set length before and after the current pause point in the text, and replace the pause-point voice file with the generated voice file.
The text acquisition module 42 may also be used to read digital documents and load them into local memory; parsing the digital document according to the format of the digital document to identify textual content therein; text content in the digital document is extracted and text is formed.
In addition, the text acquisition module 42 may be further configured to: and timing the time for acquiring the text by using a timer to determine the time length required by acquiring the text, and determining the length of the text segment corresponding to the middle stop point voice file as a set length according to the time length so that the time required by the completion of the playing of the stop point voice file is longer than the time length.
The file generating module 44 may also be configured to: intercept a text segment of the set length before and after the current pause point in the text according to a set rule; record the end position of the text segment; and generate a voice file from the text segment using a voice synthesizer. The set rule may specifically include: intercepting the text segment of the set length according to a given proportion before and after the current pause point.
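One possible reading of the proportional interception rule is sketched below. The function name `intercept_segment` and the default `before_ratio` of 0.25 are illustrative assumptions; the text only states that a given proportion is used before and after the pause point.

```python
from typing import Tuple


def intercept_segment(text: str, pause: int, set_length: int,
                      before_ratio: float = 0.25) -> Tuple[str, int]:
    """Return (segment, end_position) around the pause point.

    before_ratio is the assumed share of the set length taken from before
    the pause point; the remainder is taken from after it.
    """
    pause = max(0, min(pause, len(text)))        # clamp to the text bounds
    before = int(set_length * before_ratio)
    start = max(0, pause - before)
    end = min(len(text), start + set_length)     # recorded end position
    return text[start:end], end
```

For example, `intercept_segment("abcdefghijklmnopqrst", 10, 8)` returns `("ijklmnop", 16)`: two characters before the pause point, six after it, and 16 as the position from which synthesis would later continue.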
Formats of the digital document include, but are not limited to, PDF, ePub and txt.
According to the technical scheme of the embodiment of the invention, a voice playing command of a user for a text is received; the text is acquired from the corresponding digital document of the server while the stored pause point voice file, which corresponds to the pause point of the previous voice playing in the text, is played; after the text acquisition is finished, whether the playing of the pause point voice file is finished is checked, and once it is finished, voice generation and playing start from the position in the text corresponding to the end of the pause point voice file; when the user issues a command to stop the voice playing of the text, the position in the text at which playing currently stops is recorded, the pause point is updated with that position, a voice file corresponding to a text segment of a set length before and after the current pause point in the text is generated, and the pause point voice file is replaced with the generated voice file. With the technical scheme of the embodiment of the invention, the problem of slow loading when a digital document is played as voice can be solved, the waiting time of the user is shortened, and the user experience is improved.
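The overlap between text acquisition and playback of the saved clip can be sketched with a background thread. This is a simplified illustration under stated assumptions, not the patented implementation: `resume_playback` and its callback parameters are hypothetical names, `play_cached_clip` is assumed to block until the clip has finished, and error handling for a failed download is omitted.

```python
import threading
from typing import Callable, List


def resume_playback(fetch_text: Callable[[], str],
                    play_cached_clip: Callable[[], None],
                    synthesize_and_play: Callable[[str], None],
                    clip_end: int) -> None:
    """Play the saved pause point clip while the text loads, then continue.

    clip_end is the recorded position in the text where the clip ends.
    """
    result: List[str] = []
    fetcher = threading.Thread(target=lambda: result.append(fetch_text()))
    fetcher.start()        # text acquisition runs in the background
    play_cached_clip()     # the user hears audio immediately (assumed blocking)
    fetcher.join()         # clip finished; wait if the text is still loading
    # Continue synthesis from the position where the cached clip ended.
    synthesize_and_play(result[0][clip_end:])
```

If the clip is long enough to cover the acquisition time, the `join()` call returns immediately and the listener hears no gap between the cached clip and the newly synthesized speech.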
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A pause-continue type text voice playing method at a client, wherein the text is associated with a pause point, the pause point is the position in the text at which the previous voice playing was paused, the pause point corresponds to a pause point voice file saved at the client, and the pause point voice file corresponds to a text segment of a set length before and after the pause point in the text, wherein when the current playing is the first playing of the text, the pause point is the start point of the text and the pause point voice file contains a predetermined voice prompt, the method comprising:
receiving a voice playing command of a user for the text;
acquiring the text from a corresponding digital document of a server, and simultaneously playing the pause point voice file;
after the acquisition of the text is finished, checking whether the playing of the pause point voice file is finished, and, when the playing of the pause point voice file is finished, calling a corresponding voice synthesizer to start voice generation and playing from the position in the text corresponding to the end of the pause point voice file; and
when the user issues a command to stop the voice playing of the text, recording the position in the text at which playing currently stops, updating the pause point with the position, generating a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replacing the pause point voice file with the generated voice file.
2. The method of claim 1, wherein the step of obtaining the text comprises:
reading the digital document and loading the digital document into a local memory;
parsing the digital document according to a format of the digital document to identify textual content therein;
and extracting the text content in the digital document to form the text.
3. The method of claim 1, wherein the step of obtaining the text further comprises timing the acquisition of the text with a timer to determine the duration required for obtaining the text, and accordingly determining the length of the text segment corresponding to the pause point voice file as the set length, so that the time required to finish playing the pause point voice file is longer than the duration.
4. The method according to claim 1, wherein the step of generating the voice file corresponding to the text segment of the set length before and after the current pause point in the text comprises:
intercepting the text segment of the set length before and after the current pause point in the text according to a set rule;
recording the end position of the text segment; and
generating the voice file from the text segment using the voice synthesizer.
5. The method of claim 4, wherein the set rule comprises:
intercepting the text segment of the set length according to a given proportion before and after the current pause point.
6. The method of claim 1, wherein the type of format of the digital document comprises PDF, ePub, txt.
7. A pause-continue type text voice playing apparatus at a client, wherein the text is associated with a pause point, the pause point is the position in the text at which the previous voice playing was paused, the pause point corresponds to a pause point voice file saved at the client, and the pause point voice file corresponds to a text segment of a set length before and after the pause point in the text, wherein when the current playing is the first playing of the text, the pause point is the start point of the text and the pause point voice file contains a predetermined voice prompt, the apparatus comprising a command receiving module, a text acquisition module, a voice playing module, and a file generating module, wherein:
the command receiving module is configured to receive a voice playing command of a user for the text;
the text acquisition module is configured to acquire the text from the corresponding digital document of the server while the voice playing module plays the pause point voice file;
the voice playing module is configured to check whether the playing of the pause point voice file is completed after the text acquisition module finishes acquiring the text, and, when the playing of the pause point voice file is completed, to call a corresponding voice synthesizer to start voice generation and playing from the position in the text corresponding to the end of the pause point voice file; and
the file generating module is configured to, when the user issues a command to stop the voice playing of the text, record the position in the text at which playing currently stops, update the pause point with the position, generate a voice file corresponding to a text segment of the set length before and after the current pause point in the text, and replace the pause point voice file with the generated voice file.
8. The apparatus of claim 7, wherein the text acquisition module is further configured to:
reading the digital document and loading the digital document into a local memory;
parsing the digital document according to a format of the digital document to identify textual content therein;
and extracting the text content in the digital document to form the text.
9. The apparatus of claim 7, wherein the text acquisition module is further configured to:
timing the acquisition of the text with a timer to determine the duration required for acquiring the text, and accordingly determining the length of the text segment corresponding to the pause point voice file as the set length, so that the time required to finish playing the pause point voice file is longer than the duration.
10. The apparatus of claim 7, wherein the file generating module is further configured to:
intercepting the text segment of the set length before and after the current pause point in the text according to a set rule;
recording the end position of the text segment; and
generating the voice file from the text segment using the voice synthesizer.
11. The apparatus of claim 10, wherein the set rule comprises:
intercepting the text segment of the set length according to a given proportion before and after the current pause point.
12. The apparatus of claim 7, wherein the type of format of the digital document comprises PDF, ePub, txt.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201610871990.5A 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client Active CN107886939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610871990.5A CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610871990.5A CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Publications (2)

Publication Number Publication Date
CN107886939A CN107886939A (en) 2018-04-06
CN107886939B true CN107886939B (en) 2021-03-30

Family

ID=61768922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610871990.5A Active CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Country Status (1)

Country Link
CN (1) CN107886939B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1916907A (en) * 2005-08-17 2007-02-21 株式会社东芝 Information processing apparatus, information processing method
CN1956530A (en) * 2005-10-24 2007-05-02 三星电子株式会社 Method and apparatus for generating moving picture clip and/or displaying content file list
CN101127870A (en) * 2007-09-13 2008-02-20 深圳市融合视讯科技有限公司 A creation and use method for video stream media bookmark
CN101867780A (en) * 2010-04-30 2010-10-20 中山大学 Break-point continuous playing method for digital television and digital television
CN102196313A (en) * 2010-03-08 2011-09-21 华为技术有限公司 Method and device for continuous playing of cross-platform breakpoint as well as method and device for continuous playing of breakpoint
CN102724566A (en) * 2011-02-11 2012-10-10 索尼公司 Method and apparatus for content playback using multiple IPTV devices
CN103167358A (en) * 2011-12-09 2013-06-19 深圳市快播科技有限公司 Set top box, media playing processing method and media resuming playing method
CN104038827A (en) * 2014-06-06 2014-09-10 小米科技有限责任公司 Multimedia playing method and device
US8978076B2 (en) * 2012-11-05 2015-03-10 Comcast Cable Communications, Llc Methods and systems for content control
CN104954866A (en) * 2015-06-19 2015-09-30 杭州施强网络科技有限公司 Dynamic control method for playing point in live broadcast of streaming media data
CN105100912A (en) * 2014-05-12 2015-11-25 联想(北京)有限公司 Streaming media processing method and streaming media processing apparatus
CN105095321A (en) * 2014-05-22 2015-11-25 中兴通讯股份有限公司 Electronic bookmark implementation method and apparatus as well as electronic device
CN105530547A (en) * 2014-09-30 2016-04-27 中兴通讯股份有限公司 Bookmark display method and device for internet television on-demand content, and set top box
CN105704512A (en) * 2014-10-06 2016-06-22 财团法人资讯工业策进会 Video capturing system and video capturing method thereof
CN105828192A (en) * 2016-03-22 2016-08-03 乐视网信息技术(北京)股份有限公司 Multi-terminal video continuous playing method and device
CN105898583A (en) * 2015-01-26 2016-08-24 北京搜狗科技发展有限公司 Image recommendation method and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7124082B2 (en) * 2002-10-11 2006-10-17 Twisted Innovations Phonetic speech-to-text-to-speech system and method
US20040260551A1 (en) * 2003-06-19 2004-12-23 International Business Machines Corporation System and method for configuring voice readers using semantic analysis
US20050177369A1 (en) * 2004-02-11 2005-08-11 Kirill Stoimenov Method and system for intuitive text-to-speech synthesis customization
US20060106618A1 (en) * 2004-10-29 2006-05-18 Microsoft Corporation System and method for converting text to speech
WO2007023436A1 (en) * 2005-08-26 2007-03-01 Koninklijke Philips Electronics N.V. System and method for synchronizing sound and manually transcribed text
US8000969B2 (en) * 2006-12-19 2011-08-16 Nuance Communications, Inc. Inferring switching conditions for switching between modalities in a speech application environment extended for interactive text exchanges
US8392192B2 (en) * 2007-09-18 2013-03-05 Samuel Seungmin Cho Method and apparatus for improving transaction success rates for voice reminder applications in E-commerce
US20090313020A1 (en) * 2008-06-12 2009-12-17 Nokia Corporation Text-to-speech user interface control
CN102543068A (en) * 2010-12-31 2012-07-04 北大方正集团有限公司 Method and device for speech broadcast of text information
US9368114B2 (en) * 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
CN105609096A (en) * 2015-12-30 2016-05-25 小米科技有限责任公司 Text data output method and device

Also Published As

Publication number Publication date
CN107886939A (en) 2018-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant