CN106228972B

CN106228972B - Mixed multi-language text read-aloud method and system for an intelligent robot system

Info

Publication number: CN106228972B
Application number: CN201610537801.0A
Authority: CN
Inventors: 王合心
Original assignee: Beijing Guangnian Wuxian Technology Co Ltd
Current assignee: Beijing Guangnian Wuxian Technology Co Ltd
Priority date: 2016-07-08
Filing date: 2016-07-08
Publication date: 2019-09-27
Anticipated expiration: 2036-07-08
Also published as: CN106228972A

Abstract

The invention discloses a mixed multi-language text read-aloud method and system for an intelligent robot system. In the method, the intelligent robot side sends the acquired multi-language text that is to be read aloud in mixed form to a cloud server; the cloud server labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side; and the intelligent robot side calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud. The method addresses the low flexibility, high cost, and low accuracy of mixed reading in the prior art and improves the user experience.

Description

Mixed multi-language text read-aloud method and system for an intelligent robot system

Technical field

The invention belongs to the field of intelligent robots, and in particular relates to a mixed multi-language text read-aloud method and system for an intelligent robot system.

Background technique

With the widespread use of intelligent robots, there are more and more applications that read aloud text in which multiple languages are mixed and interleaved.

The voice output of an intelligent robot is mainly realized through text-to-speech (Text To Speech, TTS) technology. Existing read-aloud of interleaved multi-language text is mostly implemented with a single TTS engine, as in the common case of mixed Chinese-English reading.

The problem with this scheme is that, to read mixed Chinese and English aloud, a TTS engine that supports both Chinese and English must be selected, and engines that support mixed multi-language reading often make cross-language reading errors, so the scheme lacks flexibility. In addition, few language combinations are supported for mixed reading: mixed Chinese-English TTS engines are common, but mixed Chinese-Russian, Chinese-Japanese, and similar engines are rare. TTS engines that support mixed reading also cost more.

Summary of the invention

The first technical problem to be solved by the present invention is to provide a method for reading multi-language text aloud in mixed form.

To solve the above technical problem, an embodiment of the present application first provides a mixed multi-language text read-aloud method for an intelligent robot system, comprising: the intelligent robot side sends the acquired multi-language text to be read aloud in mixed form to a cloud server; the cloud server labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side; and the intelligent robot side calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

Preferably, the cloud server labeling the types of different speech synthesis engines according to the language types of the multi-language text comprises: dividing the text into at least one text segment according to the language types of the multi-language text; and, based on the language type of each text segment, labeling the type of speech synthesis engine corresponding to that segment.

Preferably, the speech synthesis engine is a single-language speech synthesis engine.

Preferably, returning the labeling result to the intelligent robot side comprises: packaging each text segment and the type of speech synthesis engine corresponding to that segment into an array, wherein each text segment corresponds to one array element; and returning the array to the intelligent robot side.

Preferably, the intelligent robot side calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud comprises: reading each array element of the array in sequence and parsing the array element; calling the corresponding speech synthesis engine according to the engine type labeled in the parsing result; and reading the multi-language text aloud using the called speech synthesis engine.

An embodiment of the present application further provides a mixed multi-language text read-aloud system for an intelligent robot system, comprising: a transmission module, located at the intelligent robot side, which sends the acquired multi-language text to be read aloud in mixed form to a cloud server; a labeling feedback module, located at the cloud server, which labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side; and a read-aloud output module, located at the intelligent robot side, which calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

Preferably, when labeling the types of different speech synthesis engines according to the language types of the multi-language text, the labeling feedback module divides the text into at least one text segment according to the language types of the multi-language text and, based on the language type of each text segment, labels the type of speech synthesis engine corresponding to that segment.

Preferably, when returning the labeling result to the intelligent robot side, the labeling feedback module packages each text segment and the type of speech synthesis engine corresponding to that segment into an array, wherein each text segment corresponds to one array element, and returns the array to the intelligent robot side.

Preferably, when calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud, the read-aloud output module reads each array element of the array in sequence and parses the array element; calls the corresponding speech synthesis engine according to the engine type labeled in the parsing result; and reads the multi-language text aloud using the called speech synthesis engine.

Compared with the prior art, one or more embodiments of the above scheme can have the following advantages or beneficial effects:

The multi-language text to be read aloud is segmented by language type, and a different single-language speech synthesis engine is called for each resulting text segment to complete the mixed multi-language reading. This addresses the low flexibility, high cost, and low accuracy of mixed reading in the prior art and improves the user experience.

Other advantages, objects, and features of the invention will be set forth to some extent in the description that follows, and to some extent will be apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention can be realized and obtained by the structure particularly pointed out in the following description, the claims, and the accompanying drawings.

Brief description of the drawings

The drawings are provided for a further understanding of the technical solution of the present application or of the prior art, and constitute a part of the specification. The drawings that illustrate the embodiments of the application serve, together with the embodiments, to explain the technical solution of the application, but do not limit it.

Fig. 1 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the first embodiment of the invention;

Fig. 2 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the second embodiment of the invention;

Fig. 3 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the third embodiment of the invention;

Fig. 4 is a structural diagram of the mixed multi-language text read-aloud system for an intelligent robot system according to the fourth embodiment of the invention.

Detailed description of the embodiments

Embodiments of the invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve the technical problem and achieve the corresponding technical effect can be fully understood and implemented. The embodiments of the application and the features within them can be combined with one another provided they do not conflict, and the resulting technical solutions all fall within the scope of protection of the invention.

First embodiment:

Fig. 1 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the first embodiment of the invention. As shown in the figure, the method comprises:

Step S110: the intelligent robot side sends the acquired multi-language text to be read aloud in mixed form to the cloud server.

Step S120: the cloud server labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side.

Step S130: the intelligent robot side calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

In step S110, the intelligent robot side obtains the multi-language text to be read aloud in mixed form, either from an internal database or as user input through a receiving device on the intelligent robot side. In embodiments of the present invention the multi-language text is processed on the cloud server, so the intelligent robot side sends the text to the cloud server.

In step S120, the cloud server processes the received multi-language text. It analyzes the text to obtain the language types it contains and labels the types of different speech synthesis engines corresponding to those language types.

Suppose the multi-language text of this embodiment contains text in at least two languages. In the prior art, such text is generally read by calling a mixed TTS synthesis engine that supports the corresponding languages. In embodiments of the present invention, by contrast, the content of the multi-language text is analyzed before any TTS synthesis engine is called to read it.

Specifically, the text is divided into at least one text segment according to the language types of the multi-language text, and the type of TTS speech synthesis engine corresponding to each segment is then labeled based on that segment's language type. After this division, each text segment contains only a single language type, so each segment can be read by calling only a single-language TTS speech synthesis engine. Because a multi-language speech synthesis engine no longer needs to be called, this helps improve the accuracy of TTS speech synthesis and the quality of the reading.
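The division and labeling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the per-character Unicode-range heuristic, the engine identifiers (`tts_zh`, `tts_en`, `tts_ru`), the fallback engine, and the function names are all assumptions made for the example, and a range check cannot tell Chinese from Japanese kanji.

```python
# Hypothetical engine identifiers; the patent does not name concrete engines.
ENGINE_FOR_LANGUAGE = {"zh": "tts_zh", "en": "tts_en", "ru": "tts_ru"}
DEFAULT_ENGINE = "tts_zh"  # assumed fallback for text with no letters at all


def char_language(ch):
    """Guess a character's language from its Unicode range (a rough heuristic)."""
    if "\u4e00" <= ch <= "\u9fff":      # CJK Unified Ideographs
        return "zh"
    if "\u0400" <= ch <= "\u04ff":      # Cyrillic
        return "ru"
    if ch.isascii() and ch.isalpha():   # basic Latin letters
        return "en"
    return None                         # digits, spaces, punctuation


def split_by_language(text):
    """Split text into (language, segment) runs that each hold one language."""
    segments, current_lang, buffer = [], None, ""
    for ch in text:
        lang = char_language(ch)
        if lang and current_lang and lang != current_lang:
            segments.append((current_lang, buffer))  # language changed: close the run
            buffer = ""
        if lang:
            current_lang = lang
        buffer += ch  # neutral characters stay with the run in progress
    if buffer:
        segments.append((current_lang, buffer))
    return segments


def label_engines(text):
    """Attach an engine type to every single-language segment."""
    return [
        (segment, ENGINE_FOR_LANGUAGE.get(lang, DEFAULT_ENGINE))
        for lang, segment in split_by_language(text)
    ]
```

For example, `label_engines("你好world")` yields one Chinese segment labeled `tts_zh` and one English segment labeled `tts_en`. Punctuation and digits are kept with the run they follow, so the segments concatenate back to the original text.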

Further, in step S120, after the cloud server has divided the multi-language text into text segments and labeled the speech synthesis engines, it returns the labeling result to the intelligent robot side. Specifically, the cloud server packages each text segment obtained by the division and the type of speech synthesis engine corresponding to that segment into an array, in which each text segment corresponds to one array element. The array, composed of all the text segments of the multi-language text and their corresponding engine types, is then returned to the intelligent robot side.

In embodiments of the present invention, returning the text segments and the corresponding engine type information as an array makes the subsequent lookup and reading convenient and helps improve the efficiency of TTS speech synthesis.
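One possible shape for the returned array is sketched below. The patent specifies only that each text segment and its engine type form one array element; the JSON encoding, the field names `text` and `engine`, and the engine identifiers are assumptions chosen for illustration.

```python
import json


def pack_labeling_result(labeled_segments):
    """Serialize (segment, engine_type) pairs as a JSON array, one element each."""
    array = [{"text": text, "engine": engine} for text, engine in labeled_segments]
    # ensure_ascii=False keeps non-Latin text readable in the payload
    return json.dumps(array, ensure_ascii=False)


payload = pack_labeling_result(
    [("今天天气很好, ", "tts_zh"), ("hello world", "tts_en")]
)
```

Keeping the elements in document order means the receiving side can read the text aloud simply by walking the array from the first element to the last.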

In step S130, the intelligent robot side receives the array returned by the cloud server and parses it to read the multi-language text aloud. Specifically, each array element of the returned array is read in sequence and parsed. Parsing an array element yields the labeled speech synthesis engine type, and the corresponding speech synthesis engine is then called according to that type to read the multi-language text aloud.

Because the multi-language text has been segmented and each text segment contains only a single language type, the speech synthesis engines called according to the labeling information are single-language engines. Single-language speech synthesis engines are more stable than mixed speech synthesis engines and cost less. The embodiment therefore reduces the cost of TTS speech synthesis while helping to increase synthesis accuracy, improve reading quality, and improve the user experience.
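The robot-side handling in step S130 might look like the following sketch. The engines here are stand-in functions that return a tagged string instead of audio, and the JSON payload format, field names, and engine identifiers are assumptions for the example rather than anything the patent prescribes.

```python
import json


def read_aloud(payload, engines):
    """Read each array element in order and call the engine it is labeled with."""
    outputs = []
    for element in json.loads(payload):
        synthesize = engines[element["engine"]]   # look up the labeled engine type
        outputs.append(synthesize(element["text"]))
    return outputs


# Stand-in single-language engines; real ones would produce audio.
engines = {
    "tts_zh": lambda text: "zh:" + text,
    "tts_en": lambda text: "en:" + text,
}
payload = '[{"text": "你好", "engine": "tts_zh"}, {"text": "hello", "engine": "tts_en"}]'
result = read_aloud(payload, engines)
```

A dictionary keyed by engine type keeps the dispatch independent of how many single-language engines are installed, so supporting another language would only require registering one more entry.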

Second embodiment:

Fig. 2 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the second embodiment of the invention. This embodiment further explains how the multi-language text is divided into different text segments. Only the differences from the first embodiment are described in detail below.

In step S210, the intelligent robot side sends the original text to be read aloud in mixed form to the cloud server. This step performs the same operation as step S110 of the first embodiment and is not described again.

Next, the cloud server segments the multi-language text according to its language types. Specifically, text segments are divided according to the natural paragraphs of the multi-language text.

Each natural paragraph of the multi-language text is read in sequence and analyzed to determine whether it contains a single language type, as shown in step S220 of Fig. 2. If the natural paragraph contains only one language type, the type of the corresponding speech synthesis engine is labeled according to that language type, as shown in step S230. If the natural paragraph contains two or more language types, the paragraph is divided further, as shown in step S240.

When a paragraph containing two or more language types is divided further, the text can be divided into sections of fixed length. Specifically, in step S240 a new paragraph whose length equals a preset paragraph length is first split off from the paragraph, so that the original paragraph is divided into two paragraphs. The two paragraphs are then examined in turn to determine whether each contains a single language type, as shown in step S250 of the figure.

If the paragraph obtained by dividing at the preset paragraph length contains only one language type, step S230 is performed and the type of the corresponding speech synthesis engine is labeled according to that language type.

If the paragraph obtained by dividing at the preset paragraph length still contains two or more language types, the process returns to step S240 and the paragraph is divided again at a preset paragraph length. Note that the preset paragraph length used for the re-division is smaller than the one used for the preceding division; the two are denoted the first preset paragraph length and the second preset paragraph length, respectively. If further divisions are needed, a third preset paragraph length, a fourth preset paragraph length, and so on denote the length used for each division.

The paragraph remaining after the paragraph of preset length has been split off is then examined in the same way, which is not described again.

Note that when the end of a paragraph is reached during division and the length of the remaining undivided part is less than the preset paragraph length, the remaining content is taken as one text segment, and each new paragraph is then judged as to whether it contains a single language type.
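The fixed-length division with successively smaller preset lengths can be sketched as below. This is an illustrative reading of steps S240 and S250, not the patent's code: the language check covers only Chinese and English via a Unicode-range heuristic, and the preset lengths `[4, 2, 1]` and the function names are assumptions. Ending the list of lengths with 1 guarantees that every final piece holds a single language.

```python
def languages_in(text):
    """Return the set of languages found in text (Chinese and English only)."""
    found = set()
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":          # CJK Unified Ideographs
            found.add("zh")
        elif ch.isascii() and ch.isalpha():     # basic Latin letters
            found.add("en")
    return found


def divide(paragraph, preset_lengths):
    """Cut a mixed paragraph into fixed-length pieces, re-cutting any piece that
    is still mixed with the next, smaller preset length."""
    if len(languages_in(paragraph)) <= 1 or not preset_lengths:
        return [paragraph]                      # single language, or no lengths left
    length, smaller_lengths = preset_lengths[0], preset_lengths[1:]
    pieces = []
    for start in range(0, len(paragraph), length):
        piece = paragraph[start:start + length]  # the last piece may be shorter
        pieces.extend(divide(piece, smaller_lengths))
    return pieces


pieces = divide("你好hello世界", [4, 2, 1])
```

With these lengths the word "hello" ends up split across several pieces, which shows why fixed-length division alone can produce more labeled items than necessary; the third embodiment's merging of adjacent paragraphs of the same language reduces them.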

In step S260, after the speech synthesis engine type of a paragraph has been labeled, it is determined whether that paragraph is the last one. If not, the process returns to step S220 to examine the next paragraph; if it is, the labeling process ends.

After the speech synthesis engine type of the last paragraph of the multi-language text has been labeled, the cloud server returns the labeling result to the intelligent robot side. The intelligent robot side calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

The method of this embodiment divides the multi-language text with reference to its natural paragraph information. The boundaries of natural paragraphs are easy to determine, and a natural paragraph generally contains only one language type, so the method helps speed up segmentation while reducing the complexity of segmenting the text and labeling speech synthesis engine types.

This embodiment can be used to divide multi-language text in which the different language types are distributed in a relatively complex way.

3rd embodiment:

Fig. 3 is a flow diagram of the mixed multi-language text read-aloud method for an intelligent robot system according to the third embodiment of the invention. This embodiment segments the multi-language text differently from the second embodiment; only the differences from the second embodiment are described in detail below.

As shown in Fig. 3, if the natural paragraph under examination is judged to contain a single language type, the process proceeds to step S330. In step S330, it is further determined whether the language type of this paragraph is the same as that of the preceding paragraph. If it is, the two paragraphs are merged into one, and the merged paragraph uses the same speech synthesis engine that was labeled for the preceding paragraph. If it is not, the corresponding speech synthesis engine is labeled according to the language type of this paragraph.

After the paragraphs are merged or the speech synthesis engine is labeled, it is determined whether the paragraph is the last one. If not, the process returns to step S320 to examine the next paragraph; if it is, the labeling process ends.

As also shown in Fig. 3, when the paragraph under examination is judged to contain more than one language type, the same operation as step S240 of the second embodiment is performed: a new paragraph whose length equals the preset paragraph length is split off, and the newly obtained paragraph becomes the paragraph under examination. This continues until the paragraph under examination is judged to contain only one language type, at which point the process goes to step S330, that is, enters the loop of the right-hand branch in Fig. 3.

With the above operations, text segments can be divided along the true boundaries between different language types in the multi-language text, which helps reduce the number of labeled items. In particular, when the multi-language text is large, contains few language types, and those types are concentrated in contiguous regions, the method of this embodiment can greatly reduce the number of items finally labeled.
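The merging of step S330 can be illustrated with a short sketch. It assumes the paragraphs have already been reduced to single-language pieces and are given as `(language, text)` pairs; that representation and the function name are choices made for this example.

```python
def merge_adjacent(labeled_paragraphs):
    """Merge each paragraph into the preceding one when both share a language."""
    merged = []
    for language, text in labeled_paragraphs:
        if merged and merged[-1][0] == language:
            # Same language as the preceding paragraph: extend it and keep its label.
            merged[-1] = (language, merged[-1][1] + text)
        else:
            merged.append((language, text))
    return merged


fragments = [
    ("zh", "你好"), ("en", "he"), ("en", "ll"),
    ("en", "o"), ("zh", "世"), ("zh", "界"),
]
merged = merge_adjacent(fragments)
```

Here six fragments collapse into three segments whose boundaries coincide with the actual language changes, which is the reduction in labeled items the embodiment describes.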

In this embodiment, the speech synthesis engine labeling results are likewise returned as an array. When there are fewer labeled items, there are correspondingly fewer array elements, which simplifies the returned result and eases data transmission. In addition, once the array recording the labeling results is simplified, the operations of obtaining the corresponding speech synthesis engine from each array element and reading the text segment aloud are simplified as well. This helps improve reading efficiency, reduces pauses and discontinuities during reading, and improves the user experience.

The methods of dividing text segments in the second and third embodiments above merely illustrate that segmenting multi-language text is practicable and do not limit the invention. Any method that divides multi-language text, which would otherwise require a mixed speech synthesis engine, into multiple text segments that can each be read aloud by a single-language speech synthesis engine falls within the scope of the invention.

Fourth embodiment:

Fig. 4 is a structural diagram of the mixed multi-language text read-aloud system for an intelligent robot system according to the fourth embodiment of the invention. As shown in the figure, the system includes:

A transmission module 41, located at the intelligent robot side, which sends the acquired multi-language text to be read aloud in mixed form to the cloud server.

A labeling feedback module 42, located at the cloud server, which labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side.

A read-aloud output module 43, located at the intelligent robot side, which calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

Specifically, when labeling the types of different speech synthesis engines according to the language types of the multi-language text, the labeling feedback module 42 divides the text into at least one text segment according to the language types of the multi-language text and, based on the language type of each text segment, labels the type of speech synthesis engine corresponding to that segment.

When returning the labeling result to the intelligent robot side, the labeling feedback module 42 packages each text segment and the type of speech synthesis engine corresponding to that segment into an array, in which each text segment corresponds to one array element, and returns the resulting array to the intelligent robot side.

When calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud, the read-aloud output module 43 reads each array element of the array in sequence, parses the array element, calls the corresponding speech synthesis engine according to the engine type labeled in the parsing result, and reads the multi-language text aloud using the called speech synthesis engine.

Further, the labeling feedback module 42 can also segment the multi-language text using the different segmentation methods proposed in the second and third embodiments, which are not described again here.

The mixed multi-language text read-aloud system of this embodiment addresses the inflexibility of reading multi-language text aloud in the prior art. The system only needs to call several single-language speech synthesis engines to complete mixed multi-language reading; its structure is simple and its cost is significantly lower.

Because single-language speech synthesis engines are relatively mature and available for a rich variety of languages, the mixed multi-language text read-aloud system of this embodiment can read aloud text that the prior art could not handle for lack of a suitable mixed speech synthesis engine, and is therefore more widely applicable.

Those skilled in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. The invention is thus not limited to any specific combination of hardware and software.

Although embodiments are disclosed above, the content is only an implementation adopted to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be as defined by the appended claims.

Claims

1. A mixed multi-language text read-aloud method for an intelligent robot system, comprising:

the intelligent robot side sending the acquired multi-language text to be read aloud in mixed form to a cloud server;

the cloud server labeling the types of different speech synthesis engines according to the language types of the multi-language text and returning the labeling result to the intelligent robot side, which includes:

reading each natural paragraph of the multi-language text in sequence and determining whether the paragraph contains a single language type, which includes:

if the natural paragraph contains only one language type, step 1: determining whether the language of the paragraph is the same as that of the preceding paragraph; if so, merging the paragraph with the preceding paragraph into one paragraph, the merged paragraph using the same speech synthesis engine type labeled for the preceding paragraph; if not, labeling the corresponding speech synthesis engine type according to the language type of the paragraph;

if the natural paragraph contains at least two language types, step 2: dividing the paragraph according to a first preset paragraph length and determining in turn whether each resulting paragraph contains a single language type; if a paragraph obtained by dividing according to the first preset paragraph length contains only one language type, returning to step 1; otherwise returning the paragraph to step 2 to be divided again according to a preset paragraph length, wherein the paragraph length used in the re-division is smaller than the first preset paragraph length used in the preceding division;

the intelligent robot side calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

2. The method according to claim 1, wherein the speech synthesis engine is a single-language speech synthesis engine.

3. The method according to claim 2, wherein returning the labeling result to the intelligent robot side comprises:

packaging each text segment and the type of speech synthesis engine corresponding to that segment into an array, wherein each text segment corresponds to one array element;

returning the array to the intelligent robot side.

4. The method according to claim 3, wherein the intelligent robot side calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud comprises:

reading each array element of the array in sequence and parsing the array element;

calling the corresponding speech synthesis engine according to the speech synthesis engine type labeled in the parsing result;

reading the multi-language text aloud using the called speech synthesis engine.

5. A mixed multi-language text read-aloud system for an intelligent robot system, comprising:

a transmission module, located at the intelligent robot side, which sends the acquired multi-language text to be read aloud in mixed form to a cloud server;

a labeling feedback module, located at the cloud server, which labels the types of different speech synthesis engines according to the language types of the multi-language text and returns the labeling result to the intelligent robot side, wherein the labeling feedback module comprises:

a read-aloud output module, located at the intelligent robot side, which calls the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud.

6. The system according to claim 5, wherein the speech synthesis engine is a single-language speech synthesis engine.

7. The system according to claim 6, wherein, when returning the labeling result to the intelligent robot side, the labeling feedback module packages each text segment and the type of speech synthesis engine corresponding to that segment into an array, wherein each text segment corresponds to one array element, and returns the array to the intelligent robot side.

8. The system according to claim 7, wherein, when calling the corresponding speech synthesis engines according to the returned information to read the multi-language text aloud, the read-aloud output module reads each array element of the array in sequence and parses the array element; calls the corresponding speech synthesis engine according to the speech synthesis engine type labeled in the parsing result; and reads the multi-language text aloud using the called speech synthesis engine.