CN107886939A - Stop-and-resume text-to-speech playback method and device on a client - Google Patents

Stop-and-resume text-to-speech playback method and device on a client

Info

Publication number
CN107886939A
Authority
CN
China
Prior art keywords
text
voice
document
middle stop
stop
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610871990.5A
Other languages
Chinese (zh)
Other versions
CN107886939B (en)
Inventor
熊健南
莫文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610871990.5A
Publication of CN107886939A
Application granted
Publication of CN107886939B
Legal status: Active

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L13/00 Speech synthesis; Text to speech systems
                    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
                        • G10L2013/083 Special characters, e.g. punctuation marks
                • G10L15/00 Speech recognition
                    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 Execution procedure of a spoken command
                    • G10L15/26 Speech to text systems
                    • G10L15/28 Constructional details of speech recognition systems
                        • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/10 File systems; File servers

Abstract

The present invention provides a stop-and-resume text-to-speech playback method and device on a client, which address the problem of slow loading during voice playback of digital documents, shorten user waiting time, and improve the user experience. The method includes: receiving a user's voice playback command for a text; obtaining the text from the corresponding digital document on a server while playing a stop-point voice file; after the text has been obtained, checking whether the stop-point voice file has finished playing and, if so, generating and playing speech starting from the position in the text that corresponds to the end of the stop-point voice file; and, when the user issues a command to stop voice playback of the text, recording the position in the text at which playback currently stops, updating the stop point with that position, generating a voice file for the text segment of a set length before and after the current stop point, and replacing the stop-point voice file with it.

Description

Stop-and-resume text-to-speech playback method and device on a client
Technical field
The present invention relates to the technical field of computers and computer software, and in particular to a stop-and-resume text-to-speech playback method and device on a client.
Background
With the development of the mobile Internet, voice technology is used more and more, and reading digital documents aloud is increasingly common. In many situations, such as while driving or on crowded public transport, reading by eye is inconvenient. Loading and parsing text quickly on a mobile device and reading it aloud has therefore become a popular application.
Current schemes for reading a digital document aloud first read and parse the digital document file, then extract the text content of the document, and finally call a voice module to read it aloud. The flow is shown in Fig. 1. According to Fig. 1, the overall flow of existing digital document reading mainly includes:
S11: Read the digital document at a specified path and load it into memory;
S12: Parse the structure of the digital document file loaded into memory to obtain the information it contains;
For a PDF document, this mainly means parsing each page and the objects associated with the pages (these objects contain the text information); for an ePub file, it mainly means parsing the file list and the corresponding chapter order file to obtain each chapter file (an html file); for a plain-text (txt) file, the text is obtained directly.
S13: Extract the text content of the digital document;
For a PDF document, the objects of text type are taken from the content objects of each page; for an ePub file, the chapter files are parsed to obtain each paragraph, and only the paragraph text is kept; for a plain-text file, the result of the previous step (S12) is used directly.
S14: Submit the text to the voice reading module to be read aloud.
The above scheme has certain drawbacks: parsing the document is not fast enough, and reading aloud (playback) can start only after the document has been parsed and the text extracted. As a result, the user waits too long, which harms the user experience.
Summary of the invention
In view of this, the present invention provides a stop-and-resume text-to-speech playback method and device on a client, which can address the problem of slow loading during voice playback of digital documents, shorten user waiting time, and improve the user experience.
To achieve the above object, according to one aspect of the present invention, a stop-and-resume text-to-speech playback method on a client is provided.
A stop-and-resume text-to-speech playback method on a client, in which the text is associated with a stop point, the stop point being the position in the text at which the previous voice playback stopped. The stop point corresponds to a stop-point voice file saved on the client, and the stop-point voice file corresponds to the text segment of a set length before and after the stop point in the text. When the current playback is the first playback of the text, the stop point is the starting point of the text and the stop-point voice file contains a predetermined voice prompt. The method includes: receiving a user's voice playback command for the text; obtaining the text from the corresponding digital document on a server while playing the stop-point voice file; after the text has been obtained, checking whether the stop-point voice file has finished playing and, when it has, calling a corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the stop-point voice file; and, when the user issues a command to stop voice playback of the text, recording the position in the text at which playback currently stops, updating the stop point with that position, generating a voice file for the text segment of the set length before and after the current stop point in the text, and replacing the stop-point voice file with the generated voice file.
Optionally, the step of obtaining the text includes: reading the digital document and loading it into local storage; parsing the digital document according to its format to identify the text content in it; and extracting the text content from the digital document to form the text.
Optionally, obtaining the text further includes using a timer to time the acquisition of the text so as to determine the duration required to obtain the text, and determining from that duration the length of the text segment corresponding to the stop-point voice file, which is used as the set length, so that the time needed to play the stop-point voice file to completion is longer than the duration.
Optionally, the step of generating the voice file for the text segment of the set length before and after the current stop point in the text includes: extracting the text segment of the set length before and after the current stop point of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment using the speech synthesizer.
Optionally, the set rule includes: extracting the text segment of the set length before and after the current stop point according to a given ratio.
Optionally, the formats of the digital document include PDF, ePub, and txt.
According to another aspect of the present invention, a stop-and-resume text-to-speech playback device on a client is provided.
A stop-and-resume text-to-speech playback device on a client, in which the text is associated with a stop point, the stop point being the position in the text at which the previous voice playback stopped. The stop point corresponds to a stop-point voice file saved on the client, and the stop-point voice file corresponds to the text segment of a set length before and after the stop point in the text. When the current playback is the first playback of the text, the stop point is the starting point of the text and the stop-point voice file contains a predetermined voice prompt. The device includes a command receiving module, a text acquisition module, a voice playback module, and a file generation module, wherein: the command receiving module is used to receive a user's voice playback command for the text; the text acquisition module is used to obtain the text from the corresponding digital document on a server while the voice playback module plays the stop-point voice file; the voice playback module is used to check, after the text acquisition module has obtained the text, whether the stop-point voice file has finished playing and, when it has, to call a corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the stop-point voice file; and the file generation module is used, when the user issues a command to stop voice playback of the text, to record the position in the text at which playback currently stops, update the stop point with that position, generate a voice file for the text segment of the set length before and after the current stop point in the text, and replace the stop-point voice file with the generated voice file.
Optionally, the text acquisition module is further used to: read the digital document and load it into local storage; parse the digital document according to its format to identify the text content in it; and extract the text content from the digital document to form the text.
Optionally, the text acquisition module is further used to: time the acquisition of the text with a timer so as to determine the duration required to obtain the text, and determine from that duration the length of the text segment corresponding to the stop-point voice file, which is used as the set length, so that the time needed to play the stop-point voice file to completion is longer than the duration.
Optionally, the file generation module is further used to: extract the text segment of the set length before and after the current stop point of the text according to a set rule; record the end position of the text segment; and generate the voice file from the text segment using the speech synthesizer.
Optionally, the set rule includes: extracting the text segment of the set length before and after the current stop point according to a given ratio.
Optionally, the formats of the digital document include PDF, ePub, and txt.
According to the technical solution of the present invention, a user's voice playback command for a text is received; the text is obtained from the corresponding digital document on a server while the saved stop-point voice file, which corresponds to the point in the text at which the previous voice playback stopped, is played; after the text has been obtained, it is checked whether the stop-point voice file has finished playing and, if so, speech is generated and played starting from the position in the text that corresponds to the end of the stop-point voice file; when the user issues a command to stop voice playback of the text, the position in the text at which playback currently stops is recorded and used to update the stop point, a voice file is generated for the text segment of the set length before and after the current stop point in the text, and the stop-point voice file is replaced with the generated voice file. The technical solution addresses the problem of slow loading during voice playback of digital documents, shortens user waiting time, and improves the user experience.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the overall flow of reading a digital document aloud in the prior art;
Fig. 2 is a schematic diagram of the main steps of the stop-and-resume text-to-speech playback method on a client according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a preferred flow of the stop-and-resume text-to-speech playback method on a client according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the main modules of the stop-and-resume text-to-speech playback device on a client according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are explained below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Fig. 2 is a schematic diagram of the main steps of the stop-and-resume text-to-speech playback method on a client according to an embodiment of the present invention.
As shown in Fig. 2, the stop-and-resume text-to-speech playback method on a client of the embodiment mainly includes steps S21 to S24.
The text in this embodiment is associated with a stop point, which is the position in the text at which the previous voice playback stopped. The stop point corresponds to a stop-point voice file saved on the client, and the stop-point voice file corresponds to the text segment of a set length before and after the stop point in the text. When the current playback is the first playback of the text, the stop point is the starting point of the text and the stop-point voice file contains a predetermined voice prompt. The client of the embodiment may be a mobile device, such as a mobile phone, a tablet, or an embedded e-book device, or a fixed device such as a desktop computer.
Step S21: Receive a user's voice playback command for the text.
Step S22: Obtain the text from the corresponding digital document on the server while playing the stop-point voice file.
The format of the digital document may be PDF, ePub, txt, or another type of digital document.
The step of obtaining the text specifically includes: reading the digital document and loading it into local storage; parsing the digital document according to its format to identify the text content in it; and extracting the text content from the digital document to form the text.
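As a rough illustration of the parse-and-extract steps, the following Python sketch handles only two of the cases mentioned: a txt file, whose content is used directly, and a single ePub chapter (an html file), from which only the paragraph text is kept. The names ParagraphText, chapter_to_text, and extract_text and the format label "epub-chapter" are hypothetical and do not come from the patent; PDF parsing is left out because it would need a third-party library.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements of one chapter file."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth of currently open <p> tags
        self.paragraphs = []  # finished paragraph strings
        self._buf = []        # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self._buf).strip()
                if text:
                    self.paragraphs.append(text)
                self._buf = []

    def handle_data(self, data):
        # Keep text only while inside a paragraph; headings etc. are skipped.
        if self.depth:
            self._buf.append(data)


def chapter_to_text(html: str) -> str:
    """Return the paragraph text of one chapter, one paragraph per line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def extract_text(fmt: str, payload: str) -> str:
    """Dispatch on a (hypothetical) format label."""
    fmt = fmt.lower()
    if fmt == "txt":
        return payload  # plain text needs no further extraction
    if fmt == "epub-chapter":
        return chapter_to_text(payload)
    raise NotImplementedError(f"no extractor sketched for format {fmt!r}")
```

A real client would also have to unpack the ePub container and follow its chapter order file before calling an extractor like this on each chapter.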
While the text is being obtained, a timer is also used to time the acquisition so as to determine the duration required to obtain the text. From that duration, the length of the text segment corresponding to the stop-point voice file is determined and used as the set length, so that the time needed to play the stop-point voice file to completion is longer than the duration.
Step S23: After the text has been obtained, check whether the stop-point voice file has finished playing and, when it has, call a corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the stop-point voice file.
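The overlap between steps S22 and S23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three callables are placeholders for the client's own fetch, playback, and synthesis routines, play_cached is assumed to block until the cached audio has finished, and segment_end is assumed to be a character offset into the text.

```python
from concurrent.futures import ThreadPoolExecutor


def play_with_resume(fetch_text, play_cached, synthesize_and_play, segment_end):
    """Fetch the text in the background while the cached audio plays.

    fetch_text          -- callable returning the full text (placeholder)
    play_cached         -- callable that plays the stop-point voice file and
                           returns when playback has finished (placeholder)
    synthesize_and_play -- callable that speaks the given text (placeholder)
    segment_end         -- offset in the text where the cached segment ends
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_text)  # S22: start obtaining the text
        play_cached()                      # S22: meanwhile play the cached file
        text = pending.result()            # S23: wait until the text is there
    # Both conditions of S23 now hold: the text is available and the cached
    # file has finished, so synthesis continues from the end of the segment.
    synthesize_and_play(text[segment_end:])
    return text
```

If the fetch is slower than the cached audio, this sketch simply waits in silence; the sizing of the text segment described in the text is what is meant to make that case rare.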
Step S24: When the user issues a command to stop voice playback of the text, record the position in the text at which playback currently stops, update the stop point with that position, generate a voice file for the text segment of the set length before and after the current stop point in the text, and replace the stop-point voice file with the generated voice file.
Generating the voice file for the text segment of the set length before and after the current stop point mainly involves: extracting the text segment of the set length before and after the current stop point of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment using the speech synthesizer.
The set rule may include: extracting the text segment of the set length before and after the current stop point according to a given ratio.
Fig. 3 shows a preferred flow of the stop-and-resume text-to-speech playback method on a client of the embodiment of the present invention, as follows.
After the client receives the user's voice playback command for the text, it first checks whether the local cache contains a stop-point voice file. If so, the voice playback module plays the cached stop-point voice file; otherwise, a predetermined voice prompt is played (not shown). The stop point is the position at which the last voice playback of the text stopped. When the user last stopped voice playback of the text, the text segment of the set length before and after the stop point was saved, and a voice file was generated from that segment and stored in the client's local cache. If the text is being played for the first time, the predetermined voice prompt is played, for example a message such as "the current document is loading", and the prompt is played in a loop.
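The cache check at the start of this flow might look like the sketch below. The local cache is modelled as a plain dictionary from a document identifier to an audio file name; that model, the function name pick_waiting_audio, and the file name prompt_loading.wav are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical file name for the predetermined "document is loading" prompt.
PROMPT_AUDIO = "prompt_loading.wav"


def pick_waiting_audio(cache: Dict[str, str], doc_id: str) -> Tuple[str, bool]:
    """Return (audio_to_play, loop) for the time the text is being obtained."""
    cached = cache.get(doc_id)
    if cached is not None:
        # A stop-point voice file was saved when playback last stopped.
        return cached, False
    # First playback of this text: loop the predetermined prompt instead.
    return PROMPT_AUDIO, True
```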
While the voice playback module plays the stop-point voice file or the predetermined voice prompt, the client obtains the text from the corresponding digital document on the server. The process is as follows:
1. Read the digital document: read the digital document from the server by its storage path and load it into local storage.
2. Parse the digital document: parse the structure of the digital document loaded into local storage according to its format, to identify the text content in it. For a PDF document, this mainly means parsing each page and the objects associated with the pages (these objects contain the text information); for an ePub file, it mainly means parsing the file list and the corresponding chapter order file to obtain each chapter file (an html file); for a plain-text file (txt file), the text is obtained directly.
3. Extract the text content of the digital document and form the text: for a PDF document, the objects of text type are extracted from the content objects of each page; for an ePub file, the chapter files are parsed to obtain each paragraph, and only the text in the paragraphs is extracted; for a plain-text file (txt file), the text obtained by parsing is used directly.
4. Record the duration required to obtain the text: a timer is used to time the acquisition of the text so as to determine the duration required to obtain it. From this duration, the length of the text segment used to generate a voice file when playback stops is determined. That text segment is the segment of the set length before and after the stop point, and the generated voice file is stored in the client's local cache as the stop-point voice file.
Specifically, the set length can be computed from the duration and a preset playback speed. For example, if the playback speed is 120 words per minute (2 words per second) and obtaining the text takes 5 seconds, multiplying the two and then multiplying by a preset coefficient A gives the length of the text segment. The coefficient A can be chosen freely; with A = 12, duration × playback speed × coefficient A = 5 s × 2 words/s × 12 = 120 words. At a playback speed of 120 words per minute, the stop-point voice file generated from a text segment of this length takes 1 minute to play.
In theory, the duration required to obtain the text is the same for every playback, so with the same playback speed and coefficient the playing time of the stop-point voice file is also the same every time. In practice, however, factors such as the CPU and memory of the client may cause the duration to vary from one playback to the next. The value of A should therefore be chosen so that the playing time of the stop-point voice file generated from the computed text segment is longer than the usual duration for obtaining the text (the duration when the influence of the client's CPU, memory, and similar factors is not considered). That is, if obtaining the text usually takes 5 seconds, the text segment length determined by duration × playback speed × coefficient A should yield a voice file whose playing time is longer than 5 seconds; for example, with A = 12 and a playback speed of 120 words per minute, the stop-point voice file plays for 1 minute. This avoids the situation where, at the next playback, the stop-point voice file finishes playing before the text has been obtained because of the client's CPU, memory, or similar factors.
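The length calculation can be written out as a short Python sketch. The function and parameter names are hypothetical; the numbers in the usage lines are the ones used in the example above (5 seconds, 120 words per minute, coefficient 12).

```python
import math


def segment_length_words(fetch_seconds: float,
                         words_per_minute: float,
                         coefficient: float) -> int:
    """Set length = duration x playback speed x coefficient A, in words."""
    words_per_second = words_per_minute / 60.0
    # Round up so the cached audio is never shorter than intended.
    return math.ceil(fetch_seconds * words_per_second * coefficient)


def playback_seconds(length_words: int, words_per_minute: float) -> float:
    """Time needed to speak a segment of the given length."""
    return length_words / (words_per_minute / 60.0)


length = segment_length_words(5, 120, 12)  # 120 words
seconds = playback_seconds(length, 120)    # 60.0 seconds
```

Because the playing time works out to the fetch duration multiplied by the coefficient, any coefficient above 1 makes the cached audio outlast a fetch of the measured duration; a larger value such as 12 leaves headroom for slower fetches.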
After the client has obtained the text, the voice playback module checks whether the stop-point voice file has finished playing and, when it has, generates and plays speech starting from the position in the text that corresponds to the end of the stop-point voice file. The voice playback module may specifically be a voice reading SDK.
When a command from the user to stop voice playback of the text is received, the position in the text at which playback currently stops is recorded, a voice file is generated for the text segment of the set length before and after that position, and the generated voice file is saved in the local cache in place of the currently cached stop-point voice file, so that the newly generated stop-point voice file can be played while the text is being obtained at the next voice playback. The text segment of the set length can be extracted according to a set rule; specifically, it can be taken from before and after the current stop point according to a given ratio. For example, the ratio may be set to 1:3. Assuming that the duration required to obtain the text × playback speed × coefficient A = 120 words, then 120 words × 1/4 = 30 words are taken before the position at which playback currently stops, and 120 words × 3/4 = 90 words are taken after it. The end position of the extracted text segment is then recorded, for example by recording information such as the chapter, paragraph, and character at which the end position lies, and the speech synthesizer is used to generate a voice file from the text segment.
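The ratio-based extraction can be sketched as below. This is an illustration only: positions are treated as indexes into a sequence of words, the function names are hypothetical, and clamping the window to the bounds of the text is an added assumption that the patent does not spell out.

```python
def split_window(total_len: int, before_ratio: int = 1, after_ratio: int = 3):
    """Split the set length into the parts before and after the stop point."""
    before = total_len * before_ratio // (before_ratio + after_ratio)
    return before, total_len - before


def segment_around(text_len: int, stop_index: int, total_len: int,
                   before_ratio: int = 1, after_ratio: int = 3):
    """Return (start, end) indexes of the segment around the stop point.

    The end index is the position that would be recorded so that synthesis
    can continue from there at the next playback.
    """
    before, after = split_window(total_len, before_ratio, after_ratio)
    start = max(0, stop_index - before)       # do not run past the beginning
    end = min(text_len, stop_index + after)   # do not run past the end
    return start, end


before, after = split_window(120)             # (30, 90) for the 1:3 ratio
start, end = segment_around(1000, 500, 120)   # (470, 590)
```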
The generated voice file is stored in the local cache and serves as the stop-point voice file for the next voice playback.
Fig. 4 is a schematic diagram of the main modules of the stop-and-resume text-to-speech playback device on a client according to an embodiment of the present invention. The text of the embodiment is associated with a stop point, which is the position in the text at which the previous voice playback stopped. The stop point corresponds to a stop-point voice file saved on the client, and the stop-point voice file corresponds to the text segment of a set length before and after the stop point in the text. When the current playback is the first playback of the text, the stop point is the starting point of the text and the stop-point voice file contains a predetermined voice prompt.
The stop-and-resume text-to-speech playback device 40 on a client according to the embodiment of the present invention mainly includes: a command receiving module 41, a text acquisition module 42, a voice playback module 43, and a file generation module 44.
The command receiving module 41 is used to receive a user's voice playback command for the text. The text acquisition module 42 is used to obtain the text from the corresponding digital document on the server while the voice playback module 43 plays the stop-point voice file. The voice playback module 43 is used to check, after the text acquisition module 42 has obtained the text, whether the stop-point voice file has finished playing and, when it has, to call a corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the stop-point voice file. The file generation module 44 is used, when the user issues a command to stop voice playback of the text, to record the position in the text at which playback currently stops, update the stop point with that position, generate a voice file for the text segment of the set length before and after the current stop point in the text, and replace the stop-point voice file with the generated voice file.
The text acquisition module 42 may also be used to: read the digital document and load it into local storage; parse the digital document according to its format to identify the text content in it; and extract the text content from the digital document to form the text.
In addition, the text acquisition module 42 may also be used to: time the acquisition of the text with a timer so as to determine the duration required to obtain the text, and determine from that duration the length of the text segment corresponding to the stop-point voice file, which is used as the set length, so that the time needed to play the stop-point voice file to completion is longer than the duration.
The file generation module 44 may also be used to: extract the text segment of the set length before and after the current stop point of the text according to a set rule; record the end position of the text segment; and generate a voice file from the text segment using the speech synthesizer. The set rule may specifically include: extracting the text segment of the set length before and after the current stop point according to a given ratio.
The formats of the digital document include, but are not limited to, PDF, ePub, and txt.
According to the technical solution of the embodiment of the present invention, a user's voice playback command for a text is received; the text is obtained from the corresponding digital document on the server while the saved stop-point voice file, which corresponds to the point in the text at which the previous voice playback stopped, is played; after the text has been obtained, it is checked whether the stop-point voice file has finished playing and, if so, speech is generated and played starting from the position in the text that corresponds to the end of the stop-point voice file; when the user issues a command to stop voice playback of the text, the position in the text at which playback currently stops is recorded and used to update the stop point, a voice file is generated for the text segment of the set length before and after the current stop point in the text, and the stop-point voice file is replaced with the generated voice file. The technical solution of the embodiment addresses the problem of slow loading during voice playback of digital documents, shortens user waiting time, and improves the user experience.
The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (12)

1. A pause-continue text voice playback method at a client, characterized in that the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback stopped; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; wherein, when the current playback is the first playback of the text, the pause point is the starting point of the text and the pause-point voice file contains a predetermined voice prompt; the method comprising:
receiving a voice playback command from a user for the text;
obtaining the text from the corresponding digital document on a server while playing the pause-point voice file;
after the text has been obtained, checking whether the pause-point voice file has finished playing, and, when it has finished playing, calling the corresponding speech synthesizer to start speech generation and playback from the position in the text corresponding to the end of the pause-point voice file; and
when the user issues a command to stop the voice playback of the text, recording the position in the text at which playback currently stops and updating the pause point with that position, generating a voice file corresponding to the text segment of the set length before and after the current pause point in the text, and replacing the pause-point voice file with the generated voice file.
2. The method according to claim 1, characterized in that the step of obtaining the text comprises:
reading the digital document and loading it into local memory;
parsing the digital document according to its format to identify the text content therein; and
extracting the text content from the digital document to form the text.
3. The method according to claim 1, characterized in that obtaining the text further comprises timing the acquisition of the text with a timer to determine the duration required to obtain the text, and determining accordingly the length of the text segment corresponding to the pause-point voice file as the set length, such that the time required to finish playing the pause-point voice file is greater than that duration.
4. The method according to claim 1, characterized in that the step of generating the voice file corresponding to the text segment of the set length before and after the current pause point in the text comprises:
extracting, according to a set rule, the text segment of the set length before and after the current pause point of the text;
recording the end position of the text segment; and
generating the voice file from the text segment using the speech synthesizer.
5. The method according to claim 4, characterized in that the set rule comprises:
extracting the text segment of the set length before and after the current pause point according to a given ratio.
6. The method according to claim 1, characterized in that the formats of the digital document include PDF, ePub, and txt.
7. A pause-continue text voice playback device at a client, characterized in that the text is associated with a pause point, the pause point being the position in the text at which the previous voice playback stopped; the pause point corresponds to a pause-point voice file saved at the client, and the pause-point voice file corresponds to a text segment of a set length before and after the pause point in the text; wherein, when the current playback is the first playback of the text, the pause point is the starting point of the text and the pause-point voice file contains a predetermined voice prompt; the device comprising a command receiving module, a text acquisition module, a voice playback module, and a file generating module, wherein:
the command receiving module is configured to receive a voice playback command from a user for the text;
the text acquisition module is configured to obtain the text from the corresponding digital document on a server while the voice playback module plays the pause-point voice file;
the voice playback module is configured to check, after the text acquisition module has obtained the text, whether the pause-point voice file has finished playing, and, when it has finished playing, to call the corresponding speech synthesizer to start speech generation and playback from the position in the text corresponding to the end of the pause-point voice file; and
the file generating module is configured to, when the user issues a command to stop the voice playback of the text, record the position in the text at which playback currently stops and update the pause point with that position, generate a voice file corresponding to the text segment of the set length before and after the current pause point in the text, and replace the pause-point voice file with the generated voice file.
8. The device according to claim 7, characterized in that the text acquisition module is further configured to:
read the digital document and load it into local memory;
parse the digital document according to its format to identify the text content therein; and
extract the text content from the digital document to form the text.
9. The device according to claim 7, characterized in that the text acquisition module is further configured to:
time the acquisition of the text with a timer to determine the duration required to obtain the text, and determine accordingly the length of the text segment corresponding to the pause-point voice file as the set length, such that the time required to finish playing the pause-point voice file is greater than that duration.
10. The device according to claim 7, characterized in that the file generating module is further configured to:
extract, according to a set rule, the text segment of the set length before and after the current pause point of the text;
record the end position of the text segment; and
generate the voice file from the text segment using the speech synthesizer.
11. The device according to claim 10, characterized in that the set rule comprises:
extracting the text segment of the set length before and after the current pause point according to a given ratio.
12. The device according to claim 7, characterized in that the formats of the digital document include PDF, ePub, and txt.
CN201610871990.5A 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client Active CN107886939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610871990.5A CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610871990.5A CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Publications (2)

Publication Number Publication Date
CN107886939A true CN107886939A (en) 2018-04-06
CN107886939B CN107886939B (en) 2021-03-30

Family

ID=61768922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610871990.5A Active CN107886939B (en) 2016-09-30 2016-09-30 Pause-continue type text voice playing method and device at client

Country Status (1)

Country Link
CN (1) CN107886939B (en)

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040260551A1 (en) * 2003-06-19 2004-12-23 International Business Machines Corporation System and method for configuring voice readers using semantic analysis
US20050177369A1 (en) * 2004-02-11 2005-08-11 Kirill Stoimenov Method and system for intuitive text-to-speech synthesis customization
CN1783212A (en) * 2004-10-29 2006-06-07 微软公司 System and method for converting text to speech
US7124082B2 (en) * 2002-10-11 2006-10-17 Twisted Innovations Phonetic speech-to-text-to-speech system and method
CN1916907A (en) * 2005-08-17 2007-02-21 株式会社东芝 Information processing apparatus, information processing method
CN1956530A (en) * 2005-10-24 2007-05-02 三星电子株式会社 Method and apparatus for generating moving picture clip and/or displaying content file list
CN101127870A (en) * 2007-09-13 2008-02-20 深圳市融合视讯科技有限公司 A creation and use method for video stream media bookmark
CN101207655A (en) * 2006-12-19 2008-06-25 国际商业机器公司 Method and system switching between voice and text exchanging forms in a communication conversation
CN101253549A (en) * 2005-08-26 2008-08-27 皇家飞利浦电子股份有限公司 System and method for synchronizing sound and manually transcribed text
US20090313020A1 (en) * 2008-06-12 2009-12-17 Nokia Corporation Text-to-speech user interface control
US20100070282A1 (en) * 2007-09-18 2010-03-18 Samuel Cho Method and apparatus for improving transaction success rates for voice reminder applications in e-commerce
CN101867780A (en) * 2010-04-30 2010-10-20 中山大学 Break-point continuous playing method for digital television and digital television
CN102196313A (en) * 2010-03-08 2011-09-21 华为技术有限公司 Method and device for continuous playing of cross-platform breakpoint as well as method and device for continuous playing of breakpoint
CN102543068A (en) * 2010-12-31 2012-07-04 北大方正集团有限公司 Method and device for speech broadcast of text information
CN102724566A (en) * 2011-02-11 2012-10-10 索尼公司 Method and apparatus for content playback using multiple IPTV devices
CN103167358A (en) * 2011-12-09 2013-06-19 深圳市快播科技有限公司 Set top box, media playing processing method and media resuming playing method
CN104038827A (en) * 2014-06-06 2014-09-10 小米科技有限责任公司 Multimedia playing method and device
US8978076B2 (en) * 2012-11-05 2015-03-10 Comcast Cable Communications, Llc Methods and systems for content control
CN104954866A (en) * 2015-06-19 2015-09-30 杭州施强网络科技有限公司 Dynamic control method for playing point in live broadcast of streaming media data
CN105027195A (en) * 2013-03-14 2015-11-04 苹果公司 Context-sensitive handling of interruptions
CN105100912A (en) * 2014-05-12 2015-11-25 联想(北京)有限公司 Streaming media processing method and streaming media processing apparatus
CN105095321A (en) * 2014-05-22 2015-11-25 中兴通讯股份有限公司 Electronic bookmark implementation method and apparatus as well as electronic device
CN105530547A (en) * 2014-09-30 2016-04-27 中兴通讯股份有限公司 Bookmark display method and device for internet television on-demand content, and set top box
CN105609096A (en) * 2015-12-30 2016-05-25 小米科技有限责任公司 Text data output method and device
CN105704512A (en) * 2014-10-06 2016-06-22 财团法人资讯工业策进会 Video capturing system and video capturing method thereof
CN105828192A (en) * 2016-03-22 2016-08-03 乐视网信息技术(北京)股份有限公司 Multi-terminal video continuous playing method and device
CN105898583A (en) * 2015-01-26 2016-08-24 北京搜狗科技发展有限公司 Image recommendation method and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
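李梦瑶 (Li Mengyao): "基于语音识别技术的移动全能秘书平台设计" [Design of a mobile all-purpose secretary platform based on speech recognition technology], 《软件导刊》 (Software Guide) *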

Also Published As

Publication number Publication date
CN107886939B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant