CN107886939A

CN107886939A - Termination-splice text voice playing method and device in a client

Info

Publication number: CN107886939A
Application number: CN201610871990.5A
Authority: CN
Inventors: 熊健南; 莫文
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2016-09-30
Filing date: 2016-09-30
Publication date: 2018-04-06
Anticipated expiration: 2036-09-30
Also published as: CN107886939B

Abstract

The present invention provides a termination-splice text voice playing method and device in a client, which can solve the problem of slow loading when a digital document is played as speech, shorten the user's waiting time, and improve the user experience. The termination-splice text voice playing method in a client of the present invention includes: receiving a user's voice play command for a text; obtaining the text from the corresponding digital document on a server while playing a middle stop voice file; after the text has been obtained, checking whether the middle stop voice file has finished playing and, if so, generating and playing speech starting from the position in the text that corresponds to the end of the middle stop voice file; and, when the user issues a command to stop the voice playback of the text, recording the position in the text at which playback currently stops, updating the middle stop with that position, generating the voice file corresponding to the text segment of a set length before and after the current middle stop in the text, and replacing the middle stop voice file with the generated voice file.

Description

Termination-splice text voice playing method and device in a client

Technical field

The present invention relates to the field of computers and computer software, and in particular to a termination-splice text voice playing method and device in a client.

Background technology

With the development of the mobile Internet, voice technology is used more and more, and reading digital documents aloud is increasingly popular. In many scenarios, such as when driving or on crowded public transport, reading visually is inconvenient. Loading and parsing text quickly on a mobile device and reading it aloud has therefore become a popular application.

Current schemes for reading a digital document aloud first read and parse the digital document file, then extract the text content of the digital document, and finally call a voice module to read it aloud. The specific flow is shown in Fig. 1. According to Fig. 1, the overall flow of existing digital document reading mainly includes:

S11: Read the digital document under a particular path and load it into memory;

S12: Parse the structure of the digital document file loaded into memory to obtain the information inside it;

For a PDF document, this mainly means parsing each page and the objects associated with those pages (these objects contain the text information); for an ePub file, it mainly means parsing the document list and the corresponding chapter-order file to obtain each chapter file (an HTML file); for a plain-text (txt) file, the text is obtained directly.

S13: Extract the text content of the digital document;

For a PDF document, the objects of text type are taken from the content objects of each page; for an ePub file, the chapter files are parsed to obtain each paragraph, and only the paragraph text is kept; for a plain-text file, the result of the previous step (S12) is used directly.

S14: Submit the document to the voice reading module to be read aloud.

The above scheme has certain defects, mainly that the document is not parsed quickly enough and that reading aloud (playback) can start only after the document has been parsed and the text extracted. The user therefore waits too long, which harms the user experience.

Summary of the invention

In view of this, the present invention provides a termination-splice text voice playing method and device in a client, which can solve the problem of slow loading during voice playback of a digital document, shorten the user's waiting time, and improve the user experience.

To achieve the above object, according to one aspect of the present invention, a termination-splice text voice playing method in a client is provided.

A termination-splice text voice playing method in a client, wherein the text is associated with a middle stop, the middle stop being the position in the text at which the previous voice playback was stopped; the middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text; when the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt. The method includes: receiving a user's voice play command for the text; obtaining the text from the corresponding digital document on a server while playing the middle stop voice file; after the text has been obtained, checking whether the middle stop voice file has finished playing and, when it has, calling the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file; and, when the user issues a command to stop the voice playback of the text, recording the position in the text at which playback currently stops, updating the middle stop with that position, generating the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replacing the middle stop voice file with the generated voice file.

Optionally, the step of obtaining the text includes: reading the digital document and loading it into local storage; parsing the digital document according to its format to identify the text content in it; and extracting the text content from the digital document to form the text.
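The parse-and-extract step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the format keys `"txt"` and `"epub-chapter"`, and the choice to treat an ePub chapter as a single HTML string are assumptions made for illustration. PDF extraction is left unimplemented because it needs a PDF parser that the source does not name.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements of one chapter file."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._depth = 0       # greater than 0 while inside a <p> element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_chapter_text(html: str) -> str:
    """Keep only paragraph text from an ePub chapter (an HTML file)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(p for p in parser.paragraphs if p)


def extract_text(fmt: str, payload: bytes) -> str:
    """Dispatch on the document format (hypothetical format keys)."""
    if fmt == "txt":
        # Plain text needs no structural parsing.
        return payload.decode("utf-8")
    if fmt == "epub-chapter":
        return extract_chapter_text(payload.decode("utf-8"))
    # PDF would require walking page content objects with a PDF library.
    raise NotImplementedError(f"no extractor sketched for format {fmt!r}")
```

A real ePub reader would first open the container and follow the chapter-order file to find each chapter; the sketch starts from a chapter that has already been located.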

Optionally, obtaining the text further includes timing the acquisition with a timer to determine how long obtaining the text takes, and determining from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than the duration.
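Timing the acquisition can be as simple as wrapping the fetch call with a monotonic clock. The sketch below is an assumption about how this might look in Python; `timed_fetch` and the `fetch` callable are hypothetical names, and the source does not say which timer is used.

```python
import time
from typing import Callable, Tuple


def timed_fetch(fetch: Callable[[], str]) -> Tuple[str, float]:
    """Run the text acquisition and return (text, elapsed seconds)."""
    start = time.perf_counter()           # monotonic, suited to measuring intervals
    text = fetch()                        # read, parse and extract the document
    elapsed = time.perf_counter() - start
    return text, elapsed
```

The measured duration is the input from which the set length of the cached text segment is later derived.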

Optionally, the step of generating the voice file corresponding to the text segment of the set length before and after the current middle stop in the text includes: cutting out the text segment of the set length before and after the current middle stop of the text according to a set rule; recording the end position of the text segment; and generating the voice file from the text segment using the speech synthesizer.

Optionally, the set rule includes: cutting out the text segment of the set length before and after the current middle stop according to a given ratio.

Optionally, the formats of the digital document include PDF, ePub, and txt.

According to another aspect of the present invention, a termination-splice text voice playing device in a client is provided.

A termination-splice text voice playing device in a client, wherein the text is associated with a middle stop, the middle stop being the position in the text at which the previous voice playback was stopped; the middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text; when the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt. The device includes a command receiving module, a text acquisition module, a voice playing module, and a file generating module, wherein: the command receiving module is used to receive a user's voice play command for the text; the text acquisition module is used to obtain the text from the corresponding digital document on a server while the voice playing module plays the middle stop voice file; the voice playing module is used to check, after the text acquisition module has obtained the text, whether the middle stop voice file has finished playing and, when it has, to call the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file; and the file generating module is used, when the user issues a command to stop the voice playback of the text, to record the position in the text at which playback currently stops, update the middle stop with that position, generate the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replace the middle stop voice file with the generated voice file.

Optionally, the text acquisition module is further used to: read the digital document and load it into local storage; parse the digital document according to its format to identify the text content in it; and extract the text content from the digital document to form the text.

Optionally, the text acquisition module is further used to: time the acquisition of the text with a timer to determine how long obtaining the text takes, and determine from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than the duration.

Optionally, the file generating module is further used to: cut out the text segment of the set length before and after the current middle stop of the text according to a set rule; record the end position of the text segment; and generate the voice file from the text segment using the speech synthesizer.

According to the technical solution of the present invention, a user's voice play command for a text is received; the text is obtained from the corresponding digital document on a server while the saved middle stop voice file, which corresponds to the middle stop of the previous voice playback of the text, is played; after the text has been obtained, it is checked whether the middle stop voice file has finished playing and, if so, speech is generated and played starting from the position in the text that corresponds to the end of the middle stop voice file; when the user issues a command to stop the voice playback of the text, the position in the text at which playback currently stops is recorded, the middle stop is updated with that position, the voice file corresponding to the text segment of the set length before and after the current middle stop in the text is generated, and the middle stop voice file is replaced with the generated voice file. The technical solution of the present invention can solve the problem of slow loading during voice playback of a digital document, shorten the user's waiting time, and improve the user experience.

Brief description of the drawings

The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a schematic diagram of the overall flow of reading a digital document aloud in the prior art;

Fig. 2 is a schematic diagram of the main steps of the termination-splice text voice playing method in a client according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a preferred flow of the termination-splice text voice playing method in a client according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the main modules of the termination-splice text voice playing device in a client according to an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

Fig. 2 is a schematic diagram of the main steps of the termination-splice text voice playing method in a client according to an embodiment of the present invention.

As shown in Fig. 2, the termination-splice text voice playing method in a client of the embodiment of the present invention mainly includes steps S21 to S24.

The text in this embodiment is associated with a middle stop, which is the position in the text at which the previous voice playback was stopped. The middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text. When the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt. The client of the embodiment of the present invention may be a mobile device, such as a mobile phone, a tablet (Pad), or an embedded e-book device, or a fixed device such as a desktop computer.

Step S21: Receive a user's voice play command for the text.

Step S22: Obtain the text from the corresponding digital document on the server while playing the middle stop voice file.

The formats of the digital document can include PDF, ePub, and txt, as well as other types of digital document.

The step of obtaining the text specifically includes: reading the digital document and loading it into local storage; parsing the digital document according to its format to identify the text content in it; and extracting the text content from the digital document to form the text.

Obtaining the text further includes timing the acquisition with a timer to determine how long obtaining the text takes, and determining from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than that duration.

Step S23: After the text has been obtained, check whether the middle stop voice file has finished playing and, when it has, call the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file.

Step S24: When the user issues a command to stop the voice playback of the text, record the position in the text at which playback currently stops, update the middle stop with that position, generate the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replace the middle stop voice file with the generated voice file.

Generating the voice file corresponding to the text segment of the set length before and after the current middle stop mainly involves cutting out the text segment of the set length before and after the current middle stop of the text according to a set rule, recording the end position of the text segment, and generating the voice file from the text segment using the speech synthesizer.

The set rule can include: cutting out the text segment of the set length before and after the current middle stop according to a given ratio.
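Step S24 can be sketched as a small function that builds the record replacing the cached middle stop voice file. This is an illustrative sketch only: `MiddleStopRecord`, `on_stop`, and the `synthesize` callable are hypothetical names, positions are assumed to be character indices into the text, and the clamping at the text boundaries is an assumption the source does not discuss.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MiddleStopRecord:
    stop_pos: int   # index in the text where playback was stopped
    end_pos: int    # index just past the end of the cached segment
    audio: bytes    # synthesized speech for the cached segment


def on_stop(text: str, stop_pos: int, before: int, after: int,
            synthesize: Callable[[str], bytes]) -> MiddleStopRecord:
    """Build the record that replaces the cached middle stop voice file."""
    start = max(0, stop_pos - before)        # clamp at the start of the text
    end = min(len(text), stop_pos + after)   # clamp at the end of the text
    segment = text[start:end]
    # The end position is kept so that the next playback can continue
    # from the point where the cached audio ends.
    return MiddleStopRecord(stop_pos, end, synthesize(segment))
```

Keeping `end_pos` next to the audio lets the next session resume synthesis exactly where the cached clip leaves off, which is what step S23 relies on.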

Fig. 3 shows a preferred flow of the termination-splice text voice playing method in a client of the embodiment of the present invention, in which:

After the client receives the user's voice play command for the text, it first checks whether a middle stop voice file exists in the local cache. If so, the voice playing module plays the cached middle stop voice file; otherwise, a predetermined voice prompt is played (not shown in the figure). The middle stop is the position at which the last voice playback of the text was stopped. When the user last stopped the voice playback of the text, the text segment of the set length before and after the middle stop was saved, and a voice file was generated from that text segment and stored in the client's local cache. If this is the first voice playback of the text, the predetermined voice prompt, such as "the current document is loading", is played, and the prompt is played in a loop.
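The cache check at the start of playback can be sketched as follows. The in-memory dictionary standing in for the local cache, the `doc_id` key, and the placeholder prompt bytes are all assumptions made for illustration; a real client would store audio files on disk.

```python
from typing import Dict, Tuple

# Placeholder for the predetermined voice prompt (hypothetical content).
PROMPT_CLIP = b"<audio: the current document is loading>"


def choose_startup_clip(cache: Dict[str, bytes], doc_id: str) -> Tuple[bytes, bool]:
    """Return (clip, loop) for the audio played while the text is fetched."""
    clip = cache.get(doc_id)
    if clip is not None:
        # A middle stop voice file from the previous session: play it once.
        return clip, False
    # First playback of this text: loop the predetermined prompt.
    return PROMPT_CLIP, True
```

The `loop` flag reflects the description above, where only the predetermined prompt is played repeatedly.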

While the voice playing module plays the middle stop voice file or the predetermined voice prompt, the client obtains the text from the corresponding digital document on the server. The detailed process includes:

First, read the digital document: read the digital document from the server by its storage path and load it into local storage.

Second, parse the digital document: parse the structure of the digital document loaded into local storage according to its format, to identify the text content in it. For example, for a PDF document, this mainly means parsing each page and the objects associated with those pages (these objects contain the text information); for an ePub file, it mainly means parsing the document list and the corresponding chapter-order file to obtain each chapter file (an HTML file); for a plain-text file (txt file), the text is obtained directly.

Third, extract the text content of the digital document and form the text: for a PDF document, the objects of text type are extracted from the content objects of each page; for an ePub file, the chapter files are parsed to obtain each paragraph, and only the text in the paragraphs is extracted; for a plain-text file (txt file), the text obtained by parsing is used directly.

Fourth, record the duration needed to obtain the text: the acquisition is timed with a timer to determine how long obtaining the text takes, and the length of the text segment used to generate the voice file when playback is stopped is determined from that duration. This text segment is the segment of the set length before and after the middle stop of the playback, and the voice file generated from it is stored in the client's local cache as the middle stop voice file.

Specifically, the value of the set length can be computed from the duration and a preset playback speech rate. For example, if the playback speech rate is 120 words per minute (2 words per second) and obtaining the text takes 5 seconds, the product of the two multiplied by a preset coefficient A gives the length of the text segment. The coefficient A can be chosen freely; for example, with A = 12, duration × playback speech rate × A = 5 s × 2 words/s × 12 = 120 words. At a playback speech rate of 120 words per minute, the middle stop voice file generated from a text segment of this length takes 1 minute to play.

In theory, obtaining the text takes the same time for every voice playback, so with the same playback speech rate and coefficient the playing time of the middle stop voice file is also the same every time. In practice, factors such as the CPU and memory of the client may cause the time needed to obtain the text to differ from one playback to the next. The value of the coefficient A should therefore be set so that the playing time of the middle stop voice file generated from the computed text segment is longer than the time normally needed to obtain the text (that is, the duration when the influence of the client's CPU, memory, and similar factors is not considered). Suppose that obtaining the text normally takes 5 seconds; then the voice file generated from the text segment whose length is duration × playback speech rate × A should take longer than 5 seconds to play. For example, with A = 12 and a playback speech rate of 120 words per minute, the middle stop voice file takes 1 minute to play. This avoids the situation in which, at the next voice playback, the middle stop voice file finishes playing before the text has been obtained because of the client's CPU, memory, or similar factors.
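The length calculation in this passage can be checked with a short Python sketch. The function names are hypothetical, and rounding the length up to a whole number of words is an assumption; the formula itself (duration × playback speech rate × coefficient A) is the one given in the description.

```python
import math


def segment_length(fetch_seconds: float, words_per_minute: float,
                   coefficient: float) -> int:
    """Set length in words: duration x speech rate x coefficient A."""
    words_per_second = words_per_minute / 60.0
    return math.ceil(fetch_seconds * words_per_second * coefficient)


def playback_seconds(length_words: int, words_per_minute: float) -> float:
    """Time needed to play a segment of the given length."""
    return length_words / (words_per_minute / 60.0)


# Values from the description: 5 s fetch, 120 words per minute, A = 12.
length = segment_length(5, 120, 12)       # 120 words
duration = playback_seconds(length, 120)  # 60.0 seconds
```

Because the playing time equals the fetch duration multiplied by A, any coefficient greater than 1 makes the cached clip outlast a fetch of the measured duration, and a larger A leaves more margin for slow fetches.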

After the client has obtained the text, the voice playing module checks whether the middle stop voice file has finished playing and, when it has, generates and plays speech starting from the position in the text that corresponds to the end of the middle stop voice file. The voice playing module may specifically be a voice reading SDK.
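The overlap between playback and text acquisition can be sketched with a background thread. This is one possible arrangement, not the patent's implementation: the four callables are hypothetical stand-ins for the real player, fetcher, and synthesizer, and waiting on an event is an assumed way of realizing the "check whether playback has finished" step.

```python
import threading


def play_with_prefetch(fetch_text, play_clip, speak_from, clip_end):
    """Fetch the text while the cached clip plays, then continue from clip_end."""
    clip_done = threading.Event()

    def _play():
        play_clip()       # blocks until the cached clip has finished
        clip_done.set()

    player = threading.Thread(target=_play)
    player.start()        # playback starts immediately
    text = fetch_text()   # text acquisition runs in parallel with playback
    clip_done.wait()      # continue only once the clip has finished
    player.join()
    # Synthesis resumes at the recorded end position of the cached segment.
    return speak_from(text, clip_end)
```

If the set length was chosen as described, the fetch normally completes before the clip ends, so the wait covers only the remaining playing time and the listener hears no gap.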

When a command to stop the voice playback of the text is received from the user, the position in the text at which playback currently stops is recorded, the voice file corresponding to the text segment of the set length before and after that position is generated, and the generated voice file is saved in the local cache to replace the middle stop voice file currently cached, so that at the next voice playback the newly generated middle stop voice file is played while the text is being obtained. The text segment of the set length can be cut out according to a set rule; specifically, it can be cut out before and after the current middle stop according to a given ratio. For example, the ratio can be set to 1:3. Assuming that the duration needed to obtain the text × playback speech rate × preset coefficient A = 120 words, then 120 words × 1/4 = 30 words are cut out before the position at which playback currently stops, and 120 words × 3/4 = 90 words are cut out after it. The end position of the cut text segment is then recorded, for example the chapter, paragraph, and character at which the segment ends, and the voice file is generated from the text segment using the speech synthesizer.
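The ratio rule reduces to a small integer calculation. In the sketch below, `split_by_ratio` is a hypothetical helper, and giving any rounding remainder to the part after the stop is an assumption; the source only states the 1:3 example.

```python
from typing import Tuple


def split_by_ratio(total: int, before_part: int, after_part: int) -> Tuple[int, int]:
    """Split the set length into (words before the stop, words after the stop)."""
    before = total * before_part // (before_part + after_part)
    return before, total - before   # any remainder goes to the part after the stop


# Values from the description: 120 words split 1:3.
before, after = split_by_ratio(120, 1, 3)   # 30 words before, 90 words after
```

A ratio weighted toward the text after the stop replays a little context and spends most of the cached clip on new material.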

The generated voice file is stored in the local cache and serves as the middle stop voice file for the next voice playback.

Fig. 4 is a schematic diagram of the main modules of the termination-splice text voice playing device in a client according to an embodiment of the present invention. The text of the embodiment of the present invention is associated with a middle stop, which is the position in the text at which the previous voice playback was stopped. The middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text. When the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt.

The termination-splice text voice playing device 40 in a client according to the embodiment of the present invention mainly includes a command receiving module 41, a text acquisition module 42, a voice playing module 43, and a file generating module 44.

The command receiving module 41 is used to receive a user's voice play command for the text. The text acquisition module 42 is used to obtain the text from the corresponding digital document on the server while the voice playing module 43 plays the middle stop voice file. The voice playing module 43 is used to check, after the text acquisition module 42 has obtained the text, whether the middle stop voice file has finished playing and, when it has, to call the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file. The file generating module 44 is used, when the user issues a command to stop the voice playback of the text, to record the position in the text at which playback currently stops, update the middle stop with that position, generate the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replace the middle stop voice file with the generated voice file.

The text acquisition module 42 can also be used to read the digital document and load it into local storage, parse the digital document according to its format to identify the text content in it, and extract the text content from the digital document to form the text.

In addition, the text acquisition module 42 can also be used to time the acquisition of the text with a timer to determine how long obtaining the text takes, and to determine from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than the duration.

The file generating module 44 can also be used to cut out the text segment of the set length before and after the current middle stop of the text according to a set rule, record the end position of the text segment, and generate the voice file from the text segment using the speech synthesizer. The set rule can specifically include: cutting out the text segment of the set length before and after the current middle stop according to a given ratio.

The formats of the digital document include but are not limited to PDF, ePub, and txt.

According to the technical solution of the embodiment of the present invention, a user's voice play command for a text is received; the text is obtained from the corresponding digital document on a server while the saved middle stop voice file, which corresponds to the middle stop of the previous voice playback of the text, is played; after the text has been obtained, it is checked whether the middle stop voice file has finished playing and, if so, speech is generated and played starting from the position in the text that corresponds to the end of the middle stop voice file; when the user issues a command to stop the voice playback of the text, the position in the text at which playback currently stops is recorded, the middle stop is updated with that position, the voice file corresponding to the text segment of the set length before and after the current middle stop in the text is generated, and the middle stop voice file is replaced with the generated voice file. The technical solution of the embodiment of the present invention can solve the problem of slow loading during voice playback of a digital document, shorten the user's waiting time, and improve the user experience.

The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A termination-splice text voice playing method in a client, characterized in that the text is associated with a middle stop, the middle stop being the position in the text at which the previous voice playback was stopped; the middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text; when the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt; the method includes:

receiving a user's voice play command for the text;

obtaining the text from the corresponding digital document on a server while playing the middle stop voice file;

after the text has been obtained, checking whether the middle stop voice file has finished playing and, when the middle stop voice file has finished playing, calling the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file;

when the user issues a command to stop the voice playback of the text, recording the position in the text at which playback currently stops, updating the middle stop with that position, generating the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replacing the middle stop voice file with the generated voice file.

2. The method according to claim 1, characterized in that the step of obtaining the text includes:

reading the digital document and loading it into local storage;

parsing the digital document according to its format to identify the text content in it;

extracting the text content from the digital document to form the text.

3. The method according to claim 1, characterized in that obtaining the text further includes timing the acquisition of the text with a timer to determine how long obtaining the text takes, and determining from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than the duration.

4. The method according to claim 1, characterized in that the step of generating the voice file corresponding to the text segment of the set length before and after the current middle stop in the text includes:

cutting out the text segment of the set length before and after the current middle stop of the text according to a set rule;

recording the end position of the text segment; and

generating the voice file from the text segment using the speech synthesizer.

5. The method according to claim 4, characterized in that the set rule includes:

cutting out the text segment of the set length before and after the current middle stop according to a given ratio.

6. The method according to claim 1, characterized in that the formats of the digital document include PDF, ePub, and txt.

7. A termination-splice text voice playing device in a client, characterized in that the text is associated with a middle stop, the middle stop being the position in the text at which the previous voice playback was stopped; the middle stop corresponds to a middle stop voice file saved on the client, and the middle stop voice file corresponds to the text segment of a set length before and after the middle stop in the text; when the current playback is the first playback of the text, the middle stop is the starting point of the text and the middle stop voice file contains a predetermined voice prompt; the device includes a command receiving module, a text acquisition module, a voice playing module, and a file generating module, wherein:

the command receiving module is used to receive a user's voice play command for the text;

the text acquisition module is used to obtain the text from the corresponding digital document on a server while the voice playing module plays the middle stop voice file;

the voice playing module is used to check, after the text acquisition module has obtained the text, whether the middle stop voice file has finished playing and, when the middle stop voice file has finished playing, to call the corresponding speech synthesizer to generate and play speech starting from the position in the text that corresponds to the end of the middle stop voice file;

the file generating module is used, when the user issues a command to stop the voice playback of the text, to record the position in the text at which playback currently stops, update the middle stop with that position, generate the voice file corresponding to the text segment of the set length before and after the current middle stop in the text, and replace the middle stop voice file with the generated voice file.

8. The device according to claim 7, characterized in that the text acquisition module is further used to:

read the digital document and load it into local storage;

parse the digital document according to its format to identify the text content in it;

extract the text content from the digital document to form the text.

9. The device according to claim 7, characterized in that the text acquisition module is further used to:

time the acquisition of the text with a timer to determine how long obtaining the text takes, and determine from that duration the length of the text segment corresponding to the middle stop voice file, which is used as the set length, so that the time needed to play the middle stop voice file to completion is longer than the duration.

10. The device according to claim 7, characterized in that the file generating module is further used to:

cut out the text segment of the set length before and after the current middle stop of the text according to a set rule;

record the end position of the text segment; and

generate the voice file from the text segment using the speech synthesizer.

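11. The device according to claim 10, characterized in that the set rule includes:

cutting out the text segment of the set length before and after the current middle stop according to a given ratio.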

12. The device according to claim 7, characterized in that the formats of the digital document include PDF, ePub, and txt.