CN109086026A

CN109086026A - Method, apparatus and device for determining broadcast speech

Info

Publication number: CN109086026A
Application number: CN201810781624.XA
Authority: CN
Inventors: 韩喆; 陈力; 姚四海; 杨磊; 吴军
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-07-17
Filing date: 2018-07-17
Publication date: 2018-12-25
Anticipated expiration: 2038-07-17
Also published as: CN109086026B; TWI711967B; WO2020015479A1; TW202006532A

Abstract

This specification provides a method, apparatus and device for determining broadcast speech. The method includes: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string; obtaining audio data of the trunk syllable of each character and audio data of the linking syllables between adjacent characters, where a linking syllable connects the trunk syllables of adjacent characters; and splicing, in a preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters to obtain audio data of the target number sequence. Because the audio data of the linking syllables between adjacent characters is obtained and used to join the audio data of the trunk syllables of the corresponding characters, the resulting speech audio data has more reasonable transitions. Numeric content broadcast from this audio data sounds more natural and fluent, which improves the user experience.

Description

Method, apparatus and device for determining broadcast speech

Technical field

The technology in this specification belongs to the field of speech synthesis, and in particular relates to a method, apparatus and device for determining broadcast speech.

Background

In daily life and work, there are many situations in which numeric content needs to be broadcast by voice. For example, in a transaction, a merchant commonly uses a plug-in built into mobile payment software to automatically broadcast the amount of a payment received in the merchant's account.

At present, when broadcasting numeric content, most existing methods for determining broadcast speech obtain and splice the audio data of the main part of the syllable of each character (including characters corresponding to digits, units and so on). For example, to broadcast a specific number, the audio data of the main part of the syllable of each character in the number is extracted and spliced to obtain the audio data for broadcasting. When audio data obtained by directly splicing the main parts of the character syllables is played, the transitions between syllables are often not smooth or natural enough. Listeners find the speech abrupt and inconsistent with human speaking habits, which can even impair their understanding of the broadcast numeric content, so the user experience is relatively poor. A method for determining broadcast speech that can broadcast numeric content naturally and fluently is therefore urgently needed.

Summary of the invention

The purpose of this specification is to provide a method, apparatus and device for determining broadcast speech, so as to solve the problems of unnatural number broadcasting and poor user experience in existing methods, and to broadcast numeric content efficiently and fluently while keeping computation cost in check.

The method, apparatus and device for determining broadcast speech provided by this specification are implemented as follows:

A method for determining broadcast speech, comprising: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string includes a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.

An apparatus for determining broadcast speech, comprising: a first obtaining module, configured to obtain a target number sequence to be broadcast; a conversion module, configured to convert the target number sequence into a character string, wherein the character string includes a plurality of characters arranged in a preset order; a second obtaining module, configured to obtain audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and a splicing module, configured to splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.

A method for determining broadcast speech, comprising: obtaining a character string to be played, wherein the character string includes a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the character string to be played.

A device for determining broadcast speech, comprising a processor and a memory for storing instructions executable by the processor, wherein the processor, when executing the instructions, implements: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string includes a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.

A computer-readable storage medium storing computer instructions that, when executed, implement: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string includes a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.

In the method, apparatus and device for determining broadcast speech provided by this specification, the audio data of the linking syllables between adjacent characters is obtained and used to join the audio data of the trunk syllables of the corresponding characters, yielding speech audio data with more natural transitions for voice broadcast. This solves the problems of unnatural number broadcasting and poor user experience in existing methods, and allows numeric content to be broadcast efficiently and fluently while keeping computation cost in check.

Brief description of the drawings

To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast speech provided by the embodiments of this specification is used to broadcast the amount received in an account;

Fig. 2 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast speech provided by the embodiments of this specification is used to splice the audio data of a target number sequence;

Fig. 3 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast speech provided by the embodiments of this specification is used to obtain the speech audio data for broadcasting the amount received in an account;

Fig. 4 is a schematic diagram of an embodiment of labeling audio data in an example scenario;

Fig. 5 is a schematic diagram of an embodiment of cutting out, in an example scenario, the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;

Fig. 6 is a schematic flowchart of an embodiment of the method for determining broadcast speech provided by an embodiment of this specification;

Fig. 7 is a schematic diagram of an embodiment of determining the location points in a specified region in the method for determining broadcast speech provided by an embodiment of this specification;

Fig. 8 is a schematic flowchart of an embodiment of the method for determining broadcast speech provided by an embodiment of this specification;

Fig. 9 is a schematic structural diagram of an embodiment of the device for determining broadcast speech provided by an embodiment of this specification;

Fig. 10 is a schematic structural diagram of an embodiment of the apparatus for determining broadcast speech provided by an embodiment of this specification.

Detailed description

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.

Existing methods for determining broadcast speech often do not analyze in depth the speaking habits and speech characteristics of people when they talk normally. For example, when a person says the number "sixteen" (spoken as the characters "ten" and "six"), after uttering the character syllable "ten" and before uttering the character syllable "six", the speaker usually also produces a linking syllable that connects the two character syllables. The linking syllables between different character syllables also tend to differ. For example, the linking syllable between "five" and "ten" in "fifty" is not the same as the linking syllable between "ten" and "five" in "fifteen". A linking syllable does not itself correspond to any specific character and carries no specific content or meaning. It acts like a connecting particle that joins adjacent character syllables naturally and fluently in normal human speech, so that the listener can better receive and understand what the speaker is saying. Because existing methods do not take these human speaking habits and speech characteristics into account, when synthesizing speech audio data for a target number sequence to be broadcast they usually cut the audio data of the main part of the syllable of each corresponding numeric character out of the sample data and splice those pieces directly. Since there is no natural transition of the kind found in human speech between adjacent character syllables, the speech audio data generated in this way often does not sound as natural and fluent as a number spoken by a person, and may even impair people's understanding of the broadcast numeric content and cause inconvenience in use. In specific implementations, existing methods therefore often suffer from unnatural number broadcasting and poor user experience.

Addressing the root cause of the above problem, this specification analyzes in depth the speaking habits and speech characteristics of people when they talk normally, and takes into account the existence and role of the linking syllables between adjacent character syllables. When the preset audio database is built, not only is the audio data of the trunk syllable of each character syllable cut out and saved, but the audio data of the linking syllables between adjacent character syllables is deliberately cut out and saved as well. When the speech audio data of a specific number is generated, both the audio data of the trunk syllable of each of the characters corresponding to the number and the audio data of the linking syllables between adjacent characters can be obtained. The audio data of the linking syllable between two adjacent characters is then used to join the audio data of the trunk syllables of those two characters, so that the transitions between adjacent character syllables in the generated speech audio data are more natural and fluent. This solves the problems of unnatural number broadcasting and poor user experience in existing methods, and allows numbers to be broadcast efficiently and fluently while keeping computation cost in check.

For these reasons, the embodiments of this specification provide a device for determining broadcast speech that can broadcast numbers by voice efficiently and naturally. The device can implement the following functions: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string includes a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence; and playing the audio data of the target number sequence.

In this embodiment, the device for determining broadcast speech may be a relatively simple electronic device used on the user side. Specifically, it may be an electronic device with data computation, voice playback and network interaction functions, or a software application that runs on such an electronic device and supports data processing, voice playback, network interaction and the like.

For example, the device for determining broadcast speech may be a desktop computer, tablet computer, laptop, smartphone, digital assistant, smart wearable device, shopping-guide terminal or the like. Alternatively, it may be a software application that can run on such an electronic device, for example an XX Bao app running on a smartphone.

In an example scenario, a device for determining broadcast speech that applies the method provided by the embodiments of this specification can automatically broadcast to merchant A, in real time, the amount of each payment received in merchant A's account.

In this embodiment, merchant A can use his or her own mobile phone as the device for determining broadcast speech. Before implementation, merchant A can first use the phone's settings to associate the phone number with merchant A's account on a payment platform. Referring to Fig. 1, after a consumer makes a purchase in merchant A's shop, the consumer can usually settle the bill online directly through the payment platform's payment software on a mobile phone, without paying the merchant face to face offline. Specifically, the consumer can use the phone to communicate with the server of the payment platform and transfer the amount due to merchant A into merchant A's account through the payment platform, completing the payment. After confirming that merchant A's account has received the consumer's online transfer, the server of the payment platform can send an arrival prompt message to merchant A's phone (for example, an SMS notification, or a prompt dialog pushed in the payment app on merchant A's phone) to inform merchant A that the consumer has paid online. The prompt message also states the specific amount received in merchant A's account, so that merchant A can further confirm whether the amount paid online is correct. For example, when the server of the payment platform confirms that merchant A's account has received a transfer of 54 yuan from the consumer, it can send the phone associated with merchant A's account a prompt message with the following content: "Account received 54 yuan".

During business hours a merchant is usually busy and often has no time to look through and read such prompt messages promptly, so it is inconvenient to confirm in time whether a consumer has paid online and whether the amount paid is correct. The merchant therefore wants the phone to broadcast by voice, in real time, the specific amount received in the account. Then, even when the merchant is too busy to look through and confirm the prompt messages sent by the payment platform's server, the merchant can still learn in time the details of each payment a consumer makes through the payment platform.

After receiving the prompt message sent by the payment platform, the phone can first parse it and extract the amount "54" from the message as the target number sequence to be broadcast, so that the audio data corresponding to this number sequence can subsequently be determined and broadcast by voice.

In this embodiment, prompt messages are usually generated according to fixed rules and therefore have a relatively uniform format. In this example scenario, the prompt message has the following structure: a leading phrase part ("Account received") + a number part (the specific amount, "54") + a unit part ("yuan"). Therefore, to obtain the target number sequence to be broadcast, that is, the content of the number part of the prompt message, the phone can parse and split the prompt message using parsing rules that correspond to the fixed rules used to generate it, and extract the number to be broadcast, that is, the target number sequence, from the number part.
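The parsing step described above can be sketched as follows. This is a minimal illustration, not the parsing rule of the specification: the regular expression, the function name `split_prompt` and the assumption that the amount is a whole number are all hypothetical.

```python
import re

# Assumed layout: leading phrase, then digits, then a unit word.
PROMPT_PATTERN = re.compile(r"^(?P<prefix>\D+?)\s*(?P<amount>\d+)\s*(?P<unit>\D+)$")


def split_prompt(prompt: str) -> tuple[str, str, str]:
    """Split a fixed-format prompt into (leading phrase, amount, unit)."""
    match = PROMPT_PATTERN.match(prompt.strip())
    if match is None:
        raise ValueError(f"unrecognized prompt format: {prompt!r}")
    return match.group("prefix"), match.group("amount"), match.group("unit")


prefix, target_number, unit = split_prompt("Account received 54 yuan")
print(target_number)
```

Only the middle group becomes the target number sequence. Amounts with a decimal part would need a wider pattern, which this sketch does not attempt.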

It should be noted that, across different prompt messages, the leading phrase part and the unit part are usually identical, and only the number part varies from message to message. The audio data of the leading phrase part and of the unit part can therefore be generated and stored in advance. When a prompt message is broadcast, only the audio data of its number part needs to be generated and then spliced with the pre-stored audio data of the leading phrase part and of the unit part to obtain the complete speech audio data of the prompt message.

After obtaining the target number sequence to be broadcast, the phone can first convert it into a corresponding character string. The character string can be understood as a string of characters that represent the character syllables of the target number sequence, arranged in an order corresponding to the target number sequence (the preset order). Each character in the string corresponds to one character syllable of the target number sequence.

For example, the character string obtained by converting the target number sequence "54" can be expressed as "five ten four" (五十四), that is, the string of characters representing the syllables spoken when "54" is read aloud. The characters "five" and "ten" correspond to the digit "5" in the tens place of the target number sequence, and the character "four" corresponds to the digit "4" in the ones place. The characters are arranged in a preset order that follows the order of the digits in "54" (first "5", then "4"): the characters "five" and "ten" for the "5" in the tens place come first, followed by the character "four" for the "4" in the ones place. The character string and preset order given here are only meant to better illustrate the embodiments of this specification. In a specific implementation, other forms of character string and other preset rules may be chosen according to the scenario, or the target number sequence may be identified and spliced directly without conversion. This specification places no limitation on this.
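As a rough illustration of this conversion, the sketch below turns a digit sequence into the list of characters that are spoken. It assumes Chinese numeral characters, as the "five", "ten", "four" example implies, and it only covers 0 to 99; the function name `to_character_string` is hypothetical, and larger numbers need additional unit characters and zero-handling rules that are not shown.

```python
DIGITS = "零一二三四五六七八九"  # characters for 0..9
TEN = "十"


def to_character_string(number: str) -> list[str]:
    """Convert a digit sequence (0-99 only in this sketch) to spoken characters."""
    value = int(number)
    if not 0 <= value <= 99:
        raise ValueError("this sketch only covers 0-99")
    tens, ones = divmod(value, 10)
    chars = []
    if tens > 1:
        chars.append(DIGITS[tens])  # "5" in 54 -> "五"; omitted for 10-19
    if tens >= 1:
        chars.append(TEN)           # tens-place unit character
    if ones or not tens:
        chars.append(DIGITS[ones])  # ones digit, or "零" for 0 itself
    return chars


print(to_character_string("54"))
```

The returned list keeps the preset order: the characters for the tens place come first, then the character for the ones place.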

After obtaining the character string corresponding to the target number sequence, the phone can identify each character arranged in order in the string and the connection relationships between adjacent characters. A connection relationship between adjacent characters can be understood as identification information that records the order of two adjacent characters. For example, in the character string for "54" ("five ten four"), the characters "five" and "ten" are adjacent, and their connection relationship can be expressed as: character "five" connected to "ten". This representation is only illustrative; in a specific implementation, connection relationships between adjacent characters may be identified in other ways. This specification places no limitation on this.

In this embodiment, through character recognition, the phone can determine that the characters in the string are, in order, "five", "ten" and "four", and that the connection relationships between adjacent characters are, in order: "five" connected to "ten", and "ten" connected to "four".
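One simple way to represent these connection relationships is as ordered pairs of adjacent characters. The sketch below is an assumption about representation only (the specification allows other identification schemes), and `connection_relations` is a hypothetical name.

```python
def connection_relations(chars: list[str]) -> list[tuple[str, str]]:
    """Return the ordered (left, right) pair for every two adjacent characters."""
    return list(zip(chars, chars[1:]))


relations = connection_relations(["五", "十", "四"])
print(relations)
```

Because the pairs are ordered, "five" connected to "ten" and "ten" connected to "five" are distinct keys, which matters since their linking syllables differ.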

Further, according to the identified characters and the connection relationships between adjacent characters, the phone can search a preset audio database for the corresponding audio data, that is, the audio data of the trunk syllable of each character in the string and the audio data of the linking syllable between each pair of adjacent characters.

The trunk syllable of a character can be understood as the main part of the character's syllable. This part is usually highly recognizable: audio features such as the fundamental frequency and intensity of the trunk syllable of a given character syllable are fairly consistent and approximately the same, so the trunk syllable can be extracted to distinguish the character syllable from others. For example, when a person utters the speech corresponding to the character "five", the middle portion is the main part of the syllable, that is, the trunk syllable. Although different people utter "five" differently, the trunk syllable portion is mostly consistent.

The linking syllable between adjacent characters can be understood as the syllable of the connecting portion that joins the trunk syllables of adjacent characters. For example, when a person says "fifty" (spoken as the characters "five" and "ten"), the speech that connects the trunk syllable of "five" to the trunk syllable of "ten" is the linking syllable between "five" and "ten". Unlike a trunk syllable, this portion has no concrete meaning of its own and does not correspond to or represent any specific character, yet its waveform data in the audio is not zero. In human speech it usually appears between the trunk syllables of adjacent characters and serves as a bridge or transition. As a result, human speech differs from machine pronunciation: it does not mechanically and stiffly join the trunk syllables of the characters one after another, but moves naturally and fluently from one character syllable to the next. Numbers broadcast in this way better match human listening habits, are easier to take in and understand, and are more comfortable to listen to. It should be added that the linking syllables between different pairs of adjacent characters (whether the characters differ, or the same characters appear in a different order) are usually not the same. For example, the audio waveforms of the linking syllable between "five" and "ten", the linking syllable between "five" and "hundred", and the linking syllable between "ten" and "five" all differ from one another. Therefore, in this embodiment, the connection relationship between adjacent characters is used to retrieve the audio data of the correct linking syllable.

The preset audio database may be a database established in advance by the platform server and stored on the server or on the device for determining broadcast speech. It may contain the audio data of the trunk syllable of each character and the audio data of the linking syllable between each pair of adjacent characters.

Specifically, according to the identified characters and connection relationships, the phone can search the preset audio database to obtain audio data A of the trunk syllable of "five", audio data B of the trunk syllable of "ten", audio data C of the trunk syllable of "four", audio data f of the linking syllable for "five" connected to "ten", and audio data r of the linking syllable for "ten" connected to "four".
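The retrieval just described can be pictured with two in-memory tables standing in for the preset audio database. The table names, the `retrieve` function and the one-letter strings used in place of real audio clips are all illustrative assumptions; the specification does not prescribe a storage layout.

```python
# Stand-in database: trunk clips keyed by character, linking clips keyed by ordered pair.
TRUNK_AUDIO = {"五": "A", "十": "B", "四": "C"}
LINK_AUDIO = {("五", "十"): "f", ("十", "四"): "r"}


def retrieve(chars: list[str]) -> tuple[list[str], list[str]]:
    """Look up the trunk clip of each character and the linking clip of each adjacent pair."""
    trunks = [TRUNK_AUDIO[c] for c in chars]
    links = [LINK_AUDIO[pair] for pair in zip(chars, chars[1:])]
    return trunks, links


trunks, links = retrieve(["五", "十", "四"])
print(trunks, links)
```

A missing character or pair raises `KeyError` here; a real implementation would need some fallback, which the source does not describe.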

The phone can then splice the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters according to the order of the characters in the character string (the preset order) to obtain the audio data of the target number sequence. Specifically, the audio data of the trunk syllables of the characters can be arranged in the preset order (the order of the characters in the character string of the target number sequence), and the audio data of the linking syllable between each pair of adjacent characters can then be used to connect the audio data of the trunk syllables of those adjacent characters.

For example, referring to Fig. 2, following the order of the characters in the character string for "54", audio data A of the trunk syllable of "five" is placed first, then audio data B of the trunk syllable of "ten", and finally audio data C of the trunk syllable of "four". Once the audio data of the trunk syllables is in order, audio data f of the linking syllable for "five" connected to "ten" is used to connect audio data A and audio data B, and audio data r of the linking syllable for "ten" connected to "four" is used to connect audio data B and audio data C. The spliced audio data for the target number sequence "54" can be expressed as "A-f-B-r-C". This yields audio data for the target number sequence with more natural transitions.
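The interleaving that produces "A-f-B-r-C" can be sketched as below. The clips are modeled as raw byte strings and joined by plain concatenation, which assumes every clip shares the same sample format; the function name `splice` is hypothetical.

```python
def splice(trunks: list[bytes], links: list[bytes]) -> bytes:
    """Interleave trunk clips with linking clips: trunk, link, trunk, link, ..., trunk."""
    if not trunks:
        return b""
    if len(links) != len(trunks) - 1:
        raise ValueError("need exactly one linking clip per adjacent pair")
    pieces = [trunks[0]]
    for link, trunk in zip(links, trunks[1:]):
        pieces.append(link)   # transition out of the previous character
        pieces.append(trunk)  # trunk syllable of the next character
    return b"".join(pieces)


print(splice([b"A", b"B", b"C"], [b"f", b"r"]))
```

With n characters there are n - 1 linking clips, so a single-character string is returned as its trunk clip alone.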

After the audio data of the target number sequence is obtained, the front audio data that was set in advance and stored on the phone or the server and that indicates the data object represented by the target number sequence (for example, the audio data of the leading phrase part and the audio data of the unit part) can be spliced with the audio data of the target number sequence to obtain the speech audio data to be played. The phone then plays the corresponding content according to this speech audio data.

In this embodiment, referring to Fig. 3, the mobile phone of merchant A can obtain the preset front audio data stored locally on the phone, namely the preset audio data Y stating "account credited" and the audio data Z stating "yuan". It then splices this front audio data with the generated audio data of the target number sequence "54" to obtain the complete voice audio data to be played, which can be expressed as "Y-A-f-B-r-C-Z", and plays that voice audio data. Merchant A thus hears a clear, natural and fluent voice broadcast that better matches normal human listening habits, which avoids the negative effect of machine-sounding speech on the merchant's listening experience.

As can be seen from the above, the method for determining a broadcast voice provided by the embodiments of this specification obtains the audio data of the linking syllables between adjacent characters and uses it to splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with more natural transitions for voice broadcast. This solves the problems of unnatural number broadcasts and poor user experience in existing methods, so that number-related voice broadcasts can be carried out efficiently and fluently while keeping computational cost in check.

In another example scenario, the server of the payment platform can build the preset audio database in advance and send it to the broadcast voice determination device. After receiving the preset audio database, the device can store it locally, so that it can retrieve from the database the audio data of the trunk syllable of each character in the character string of the target number sequence and the audio data of the linking syllables between adjacent characters in that character string. Alternatively, after building the preset audio database, the server of the payment platform may keep it on the server side instead of sending it to the broadcast voice determination device. In that case, when generating the audio data of the target number sequence, the device can call the preset audio database stored on the server side to obtain the audio data of the trunk syllable of each character in the character string of the target number sequence and the audio data of the linking syllables between adjacent characters in that character string.

In this embodiment, in a specific implementation, the server can obtain audio data containing numbers as sample data. After the sample data is annotated, the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters can each be extracted from the annotated sample data according to certain rules, and the preset audio database is then built from this audio data.

Specifically, obtaining audio data containing numbers as sample data may include: extracting, from an announcer's broadcast audio data, the audio data of number-related broadcast content as the sample data. Voice data of people reading a preset text aloud can also be collected as the sample data, where the preset text can be pre-arranged text content containing a variety of number combinations.

After the sample data is obtained, it can first be annotated. Specifically, referring to Fig. 4, character syllable identifiers can be used in the acquired sample data to mark the range occupied by the audio data of each character syllable. For example, for the audio data "56" in the sample data, "5", "10" and "6" can be used as the character syllable identifiers of the character syllables "five", "ten" and "six" respectively, marking the ranges of "five", "ten" and "six" in that audio data. Of course, the character syllable identifiers listed above are only illustrative and should not be construed as unduly limiting this specification.

Further, extracting the audio data of the trunk syllable of a character from the sample data may specifically include: retrieving the character syllable identifiers in the sample data; and, according to a character syllable identifier, extracting the audio data within the specified region of the range marked by that identifier in the sample data as the audio data of the trunk syllable of the character.

Specifically, the character syllable identifiers of the audio data in the sample data can be retrieved, and the range of each character syllable in the audio data (i.e., the range marked by the character syllable identifier in the sample data) can then be determined from them. Within the range of a character syllable in the audio data, the audio data in the specified region is then extracted according to a preset rule as the audio data of the trunk syllable of the character. For example, for the audio data "56" in the sample data, the character syllable identifiers "5", "10" and "6" in the audio data can first be retrieved. The range of the character syllable "five" in the audio data can then be determined from the identifier "5", the range of the character syllable "ten" from the identifier "10", and the range of the character syllable "six" from the identifier "6". Finally, the audio data in the specified region of the range of "five" is extracted as the audio data of the trunk syllable of the character "five", the audio data in the specified region of the range of "ten" as the audio data of the trunk syllable of the character "ten", and the audio data in the specified region of the range of "six" as the audio data of the trunk syllable of the character "six".

When extracting the audio data of the trunk syllable of a character, it is considered that when people say a specific number, the audio data of the middle part of the syllable of each digit or unit in the number is largely consistent. That is, the middle parts of the audio data of the same character syllable differ relatively little, whereas the middle parts of the audio data of different character syllables differ relatively much. For example, when people say the two numbers "56" and "65", the middle part of the audio data of the syllable of the character "five" in "56" is often the same as the middle part of the audio data of the syllable of the character "five" in "65". Therefore, the audio data of the middle part of a character syllable's audio data can be extracted as the audio data of the specified region, giving the audio data of the trunk syllable of that character syllable. Based on this property, in a specific implementation, the specified region can be the region, within the range marked by the character syllable identifier, that is centered on the midpoint of that range and whose length is a preset ratio of the length of that range.

For example, referring to Fig. 5, the midpoint O of the range marked by the character syllable identifier "5" can be taken as the center of symmetry, and the regions on the two sides of O can be extracted and combined as the specified region, such that the specified region accounts for 1/2 of the range marked by the identifier "5". The audio data in this specified region is determined as the audio data of the trunk syllable of the character "five". In the same way, the audio data of the trunk syllable of the character "ten" and the audio data of the trunk syllable of the character "six" can be extracted. Of course, the preset ratio listed above is only intended to better illustrate the embodiments of this specification. In a specific implementation, other values can be chosen as the preset ratio according to the specific scenario in order to determine the specified region and then extract the audio data of the trunk syllable of the corresponding character.
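The center-symmetric specified region can be computed with a few lines of arithmetic. The Python sketch below is only illustrative: the function name center_region and the use of seconds as the unit are assumptions, and the default ratio of 0.5 corresponds to the 1/2 example above.

```python
def center_region(start, end, ratio=0.5):
    """Return the (start, end) of the region centered on the midpoint of
    the annotated range [start, end] whose length is ratio * (end - start).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    midpoint = (start + end) / 2
    half_length = (end - start) * ratio / 2
    return midpoint - half_length, midpoint + half_length


# Annotated range of the character syllable "five", in seconds (made-up values).
trunk_five = center_region(0.0, 1.0, ratio=0.5)
```

With a ratio of 0.5, a quarter of the annotated range is left on each side of the trunk syllable; those margins are what later become part of the linking syllables.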

After the audio data of the trunk syllables of the characters has been extracted, the audio data in the region between the audio data of the trunk syllables of adjacent characters in the audio data of the sample data can be extracted as the audio data of the linking syllable between those adjacent characters.

For example, referring to Fig. 5, the audio data in the region between the audio data of the trunk syllable of the character "five" and the audio data of the trunk syllable of the adjacent character "ten" in the audio data of the sample data can be extracted as the audio data of the linking syllable between the characters "five" and "ten", i.e., the audio data of the linking syllable between adjacent characters. In the same way, the audio data of the linking syllable between the adjacent characters "ten" and "six" can be extracted.
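Given the trunk-syllable regions of consecutive characters, each linking-syllable region is simply the gap between one trunk region and the next. A minimal Python sketch follows; the function name linking_regions and the example time values are hypothetical.

```python
def linking_regions(trunk_regions):
    """Return the gaps between consecutive trunk-syllable regions.

    trunk_regions: list of (start, end) pairs in time order.
    The i-th result is the region between trunk i and trunk i + 1.
    """
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(trunk_regions, trunk_regions[1:])
    ]


# Trunk regions of "five", "ten", "six" in a sample of "56" (made-up values).
trunks = [(0.25, 0.75), (1.25, 1.75), (2.25, 2.75)]
links = linking_regions(trunks)
```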

In this example scenario, it is considered that if the sample data is rich, multiple pieces of audio data representing the linking syllable between the same adjacent characters may be extracted. For example, from both the audio data "56" and the audio data "54" in the sample data, audio data of the linking syllable between the same characters "five" and "ten" can be extracted. In addition, the sample data may include audio data of "56" spoken by different people, so multiple pieces of audio data of the linking syllable between the characters "five" and "ten" can be obtained from the audio data of different people.

Therefore, when the extracted audio data of linking syllables between adjacent characters includes multiple pieces of audio data for the linking syllable between the same adjacent characters, a piece with good quality should be chosen as the audio data of that linking syllable, so that it sounds more natural and fluent when later used to connect the audio data of the trunk syllables of the corresponding characters. To this end, the multiple pieces of audio data of the linking syllable between the same adjacent characters can be divided into several types, the frequency of occurrence of each type of audio data in the sample data can be counted, and the audio data of the most frequent type can be selected as the audio data of the linking syllable between those adjacent characters and stored in the preset audio database. Of course, besides selecting by frequency of occurrence as described above, other suitable methods can be used to select good-quality audio data from the multiple pieces of audio data of the linking syllable between the same adjacent characters. For example, the MOS value (Mean Opinion Score) of each piece of audio data of the linking syllable between the same adjacent characters can be calculated, and the piece with the highest MOS value can be selected as the audio data of the linking syllable between those adjacent characters. The MOS value can be used to evaluate the naturalness and fluency of audio data more accurately and objectively.
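Both selection strategies can be sketched in Python. The sketch assumes that each candidate has already been given a type label or a MOS value by some earlier step; how candidates are classified into types or scored is not specified here, and the names pick_by_frequency and pick_by_mos are hypothetical.

```python
from collections import Counter


def pick_by_frequency(candidates):
    """candidates: non-empty list of (type_label, audio) pairs.
    Returns the first audio whose type label occurs most often."""
    if not candidates:
        raise ValueError("no candidates")
    counts = Counter(label for label, _ in candidates)
    best_label, _ = counts.most_common(1)[0]
    for label, audio in candidates:
        if label == best_label:
            return audio


def pick_by_mos(candidates):
    """candidates: non-empty list of (mos_value, audio) pairs.
    Returns the audio with the highest MOS value."""
    if not candidates:
        raise ValueError("no candidates")
    return max(candidates, key=lambda item: item[0])[1]


# Made-up candidates for the linking syllable between "five" and "ten".
by_type = [("type1", b"x"), ("type2", b"y"), ("type2", b"z")]
by_score = [(3.9, b"x"), (4.4, b"y"), (4.1, b"z")]
chosen_by_frequency = pick_by_frequency(by_type)
chosen_by_mos = pick_by_mos(by_score)
```

The same two helpers apply unchanged to candidate trunk syllables of a single character.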

Similarly, when multiple pieces of audio data representing the trunk syllable of the same character are extracted, the frequency of occurrence of each type of trunk-syllable audio data among them can be counted, and the most frequent audio data can be selected from the different types of trunk-syllable audio data of that character as the audio data of the trunk syllable of the character and saved in the preset audio database. Alternatively, the MOS value of each piece of trunk-syllable audio data of the same character can be determined, and the piece with the highest MOS value can be selected as the audio data of the trunk syllable of the character and saved in the preset audio database.

As can be seen from the above, the method for determining a broadcast voice provided by the embodiments of this specification obtains the audio data of the linking syllables between adjacent characters and uses it to splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with more natural transitions for voice broadcast. This solves the problems of unnatural number broadcasts and poor user experience in existing methods, so that number-related voice broadcasts can be carried out efficiently and fluently while keeping computational cost in check. In addition, sample data containing numbers is obtained, the audio data in the specified region is extracted from the sample data as the audio data of the trunk syllable of a character, and the audio data between the trunk-syllable audio data of adjacent characters is extracted as the audio data of the linking syllable between those characters. An accurate preset audio database can thus be built, and more natural and fluent audio data of the target number sequence can be generated by retrieving from it.

Referring to Fig. 6, this specification provides a method for determining a broadcast voice, where the method is applied on the side of the broadcast voice determination device (or user terminal). In a specific implementation, the method may include the following.

S601: obtain the target number sequence to be broadcast.

In this embodiment, the target number sequence to be broadcast can be the amount of money credited to an account, such as the 54 in "54 yuan"; it can be the distance a vehicle has traveled, such as the 80 in "80 kilometers"; or it can be the real-time price of a stock, such as the 20.9 in "20.9 yuan per share". Of course, the data objects represented by the target number sequences listed above are only intended to better illustrate this embodiment. In a specific implementation, depending on the application scenario, the target number sequence to be broadcast can also be a number representing other data objects. This specification places no limitation on this.

In this embodiment, obtaining the target number sequence to be broadcast can be understood as obtaining the data to be broadcast, parsing it, and extracting the number in it as the target number sequence to be broadcast. For example, when the server of the payment platform confirms that 54 yuan has been credited to a user's account, it sends the credit notification "account credited 54 yuan" to the broadcast voice determination device associated with that user's account (for example, the user's mobile phone). After receiving the notification, the device can parse it and extract the number "54" from it as the target number sequence to be broadcast. Of course, the way of obtaining the target number sequence to be broadcast described above is only illustrative, and this specification places no limitation on this.
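As one possible illustration of the parsing step, the number can be pulled out of the notification text with a regular expression. The sketch below is written in Python and assumes that the notification is plain text containing at most one number written in digits, optionally with a decimal part; the function name extract_target_number is hypothetical.

```python
import re

# One run of digits, optionally followed by a decimal part.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_target_number(message):
    """Return the first number found in message as a string, or None."""
    match = _NUMBER.search(message)
    return match.group(0) if match else None


target = extract_target_number("account credited 54 yuan")
```

A structured message format would make this step unnecessary; the regular expression is only needed when the number has to be recovered from free text.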

S603: convert the target number sequence into a character string, where the character string includes multiple characters arranged in a preset order.

In this embodiment, the character string can be understood as a string of characters that represent the character syllables of the target number sequence and are arranged in the order corresponding to the target number sequence (i.e., the preset order). Each character in the character string corresponds to one character syllable of the target number sequence. For example, the character string of the target number sequence "67" can be expressed as "six", "ten", "seven", where the characters "six", "ten" and "seven" each correspond to one character syllable of the target number sequence and are arranged in the preset order corresponding to the target number sequence. Of course, the character string listed above is only intended to better illustrate this embodiment. In a specific implementation, other types of character strings can also be used as appropriate. This specification places no limitation on this.

In this embodiment, converting the target number sequence into a character string can be understood as converting the target number sequence, according to a preset mapping rule, into the corresponding string of characters that represent the character syllables of the target number sequence. For example, according to the preset mapping rule, the digit "6" in the tens place of the target number sequence "67" can be converted into the corresponding characters "six" and "ten", and the digit "7" in the ones place can be converted into the corresponding character "seven". The resulting characters are then arranged in the preset order corresponding to the target number sequence "67", giving the character string "six", "ten", "seven". Of course, the way of converting the target number sequence into a character string described above is only illustrative. In a specific implementation, other methods can also be used as appropriate to convert the target number sequence into the corresponding character string. This specification places no limitation on this.
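A mapping rule of this kind can be sketched in Python for whole numbers from 1 to 99. The sketch is a simplified assumption rather than the mapping rule of this specification: English words stand in for the characters, larger numbers and decimals are not handled, and numbers from 10 to 19 are assumed to be read as "ten" followed by the ones digit. The name to_characters is hypothetical.

```python
# English words stand in for the characters of the character string.
DIGIT_CHARS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def to_characters(number):
    """Convert a decimal string for 1..99 into a list of characters."""
    if not number.isdecimal() or not 1 <= int(number) <= 99:
        raise ValueError("only whole numbers from 1 to 99 are supported")
    tens, ones = divmod(int(number), 10)
    chars = []
    if tens:
        if tens > 1:                      # assumption: 14 is read "ten four"
            chars.append(DIGIT_CHARS[tens])
        chars.append("ten")               # unit character for the tens place
    if ones:
        chars.append(DIGIT_CHARS[ones])
    return chars


characters = to_characters("67")
```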

S605: obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters.

In this embodiment, the trunk syllable of a character can be understood as the main part of a character syllable (for example, the middle part of the character syllable). This part is usually highly recognizable, and audio features such as the fundamental frequency and sound intensity of the trunk syllable of the same character syllable are largely consistent and approximately the same. The trunk syllable of a character syllable can therefore be extracted to distinguish it from other character syllables.

In this embodiment, the linking syllable between adjacent characters can be understood as the syllable of the part that connects the trunk syllables of adjacent characters. Unlike a trunk syllable, this part usually has no specific meaning of its own and does not correspond to any particular character, but its waveform data in the audio data is not 0. In human speech habits, it usually appears between the trunk syllables of adjacent characters and plays a connecting, transitional role. This is what makes human speech differ from machine pronunciation: the trunk syllables of the characters are not joined together in a monotonous, stiff and direct way, but flow naturally and fluently from one character syllable to the next. For example, when a person says "50", the sound of the part connecting the trunk syllable of the character "five" and the trunk syllable of the character "ten" is the linking syllable between the characters "five" and "ten".

In this embodiment, obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string may specifically include: searching the preset audio database according to the specific characters in the character string of the target number sequence, to obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string.

The preset audio database can be a database that is built in advance and stored on the server or on the broadcast voice determination device. Specifically, the preset audio database can contain the audio data of the trunk syllable of each character and the audio data of the linking syllable between each pair of adjacent characters.

S607: splice the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters according to the preset order, to obtain the audio data of the target number sequence.

In this embodiment, the audio data of the target number sequence can be understood as the audio data used for the voice broadcast of the target number sequence.

In this embodiment, splicing the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters according to the preset order may, in a specific implementation, include: arranging the audio data of the trunk syllable of each character in the preset order (i.e., the order of the characters in the character string of the target number sequence); and then using the audio data of the linking syllables between adjacent characters to connect the audio data of the trunk syllables of the adjacent characters.

In this embodiment, it should be noted that the broadcast voice determination devices used by users are mostly embedded device systems. Limited by their own structure, such systems tend to have relatively weak computing and data-processing capabilities, so synthesizing the audio data of a number sequence directly with a speech synthesis model is relatively costly and relatively inefficient. The method for determining a broadcast voice provided by the embodiments of this specification avoids generating the audio data with a resource-intensive speech synthesis model. Instead, it simply retrieves the audio data of the trunk syllables of the corresponding characters and the audio data of the linking syllables between adjacent characters from the preset audio database and splices them together to obtain audio data of the target number sequence with high accuracy. This reduces resource usage and improves processing efficiency, making the method better suited to embedded device systems.

In one embodiment, obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string may, in a specific implementation, include the following.

S1: identify each character in the character string, and determine the connection relationships between adjacent characters in the character string, where a connection relationship between adjacent characters in the character string indicates the order in which the adjacent characters are connected;

S2: according to each character in the character string, retrieve the audio data of the trunk syllable of each character from the preset audio database, where the preset audio database stores the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;

S3: according to the connection relationships between adjacent characters in the character string, retrieve the audio data of the linking syllables between the adjacent characters in the character string from the preset audio database.

In this embodiment, the connection relationship between adjacent characters can be understood as identification information describing the order of two adjacent characters. For example, in the character string of "54", the characters "five" and "ten" are two adjacent characters, and the connection relationship between them can be expressed as the character "five" joined to the character "ten", i.e., "five-ten". Of course, the connection relationship between adjacent characters described above is only illustrative. In a specific implementation, other forms of identification can also be used to represent the connection relationship between adjacent characters. This specification places no limitation on this.

In this embodiment, in a specific implementation, the identified characters and the identified connection relationships between adjacent characters can be used as identifiers for searching the preset audio database. The audio data that matches an identifier is extracted from the preset audio database as the audio data of the trunk syllable of the corresponding character or as the audio data of the linking syllable between the corresponding adjacent characters.
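Steps S1 to S3 can be sketched in Python as follows. The sketch assumes, purely for illustration, that the preset audio database is an in-memory dictionary with a "trunk" table keyed by character and a "link" table keyed by an ordered pair of adjacent characters; the layout, the name fetch_segments and the placeholder byte strings are hypothetical.

```python
def fetch_segments(chars, database):
    """Look up trunk and linking audio for a list of characters.

    chars:    characters of the character string, in the preset order.
    database: {"trunk": {char: audio}, "link": {(char, next_char): audio}}
    Returns (trunks, links), where links[i] joins trunks[i] and trunks[i + 1].
    """
    # S1: the ordered pairs of adjacent characters are the connection relationships.
    pairs = list(zip(chars, chars[1:]))
    # S2: one trunk-syllable segment per character.
    trunks = [database["trunk"][char] for char in chars]
    # S3: one linking-syllable segment per connection relationship.
    links = [database["link"][pair] for pair in pairs]
    return trunks, links


# Tiny made-up database; the byte strings stand in for real audio.
database = {
    "trunk": {"five": b"A", "ten": b"B", "four": b"C"},
    "link": {("five", "ten"): b"f", ("ten", "four"): b"r"},
}
trunks, links = fetch_segments(["five", "ten", "four"], database)
```

Keying the linking table by an ordered pair keeps "five-ten" and "ten-five" distinct, which matches the requirement that the connection relationship record the order of the two characters.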

In one embodiment, the preset audio database can specifically be established in the following way.

S1: obtain sample data, where the sample data is audio data containing the character strings corresponding to number sequences;

S2: extract the audio data of the trunk syllables of characters from the sample data;

S3: extract the audio data of the linking syllables between adjacent characters from the sample data;

S4: build the preset audio database from the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.
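The four steps above can be sketched end to end in Python. The sketch makes several assumptions that are not part of this specification: each sample is a sequence of audio samples together with annotations of the form (character, start index, end index) in time order, the trunk syllable is taken as the center-symmetric region with a preset ratio, and every extracted candidate is kept so that a later step can select the best one. The name build_database is hypothetical.

```python
from collections import defaultdict


def build_database(samples, ratio=0.5):
    """samples: iterable of (audio, marks); audio is a sliceable sequence and
    marks is a time-ordered list of (char, start, end) index ranges.
    Returns (trunks, links): candidate segments per character and per
    ordered pair of adjacent characters."""
    trunks = defaultdict(list)
    links = defaultdict(list)
    for audio, marks in samples:
        regions = []
        for char, start, end in marks:
            midpoint = (start + end) // 2
            half = int((end - start) * ratio) // 2
            low, high = midpoint - half, midpoint + half
            trunks[char].append(audio[low:high])          # S2: trunk syllable
            regions.append((char, low, high))
        for (a, _, a_high), (b, b_low, _) in zip(regions, regions[1:]):
            links[(a, b)].append(audio[a_high:b_low])     # S3: linking syllable
    return dict(trunks), dict(links)                      # S4: the database


# S1: one made-up sample of "56"; integers stand in for audio samples.
audio = list(range(30))
marks = [("five", 0, 10), ("ten", 10, 20), ("six", 20, 30)]
trunk_table, link_table = build_database([(audio, marks)])
```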

In this embodiment, obtaining audio data containing numbers as sample data may, in a specific implementation, include: extracting, from an announcer's broadcast audio data, the audio data of number-related broadcast content as the sample data. Voice data of people reading a preset text aloud can also be collected as the sample data, where the preset text can be pre-arranged text content containing a variety of number combinations. Of course, the ways of obtaining audio data containing numbers as sample data described above are only illustrative. In a specific implementation, other ways of obtaining audio data containing numbers as sample data can also be chosen as appropriate. This specification places no limitation on this.

In this embodiment, after the sample data is obtained, it can also be annotated. Specifically, corresponding character syllable identifiers can be used in the acquired sample data to mark the range occupied by the audio data of each character syllable.

Correspondingly, extracting the audio data of the trunk syllable of a character from the sample data may specifically include: retrieving the character syllable identifiers in the sample data; and, according to a character syllable identifier, extracting the audio data within the specified region of the range marked by that identifier in the sample data as the audio data of the trunk syllable of the character.

In this embodiment, the specified region can be understood as the region, within the range marked by the character syllable identifier, that is centered on the midpoint of that range and whose length is a preset ratio of the length of that range.

For example, the midpoint O of the range marked by the character syllable identifier "5" can be taken as the center of symmetry, and the regions on the two sides of O can be extracted and combined as the specified region, such that the specified region accounts for 1/2 of the range marked by the identifier "5". The audio data in this specified region is determined as the audio data of the trunk syllable of the character "five". Of course, the specified region listed above and the way of determining it are only intended to better illustrate the embodiments of this specification. In a specific implementation, other regions can also be chosen as the specified region as appropriate, with the specified region then determined in a corresponding way.

For example, the region within the range marked by the character syllable identifier in which the sound intensity amplitude is greater than an intensity threshold can also be used as the specified region. Correspondingly, in a specific implementation, the audio data of the region whose sound intensity amplitude is greater than the intensity threshold can be extracted, according to sound intensity, from the range marked by the character syllable identifier as the audio data of the trunk syllable of the character.

In a specific implementation, referring to Fig. 7, the region between the following two positions within the range marked by the character syllable identifier can be selected as the specified region: the position where the sound intensity is 0 in the first period whose sound intensity amplitude is greater than the intensity threshold, and the position where the sound intensity is 0 in the first period whose sound intensity amplitude is less than the intensity threshold. The audio data in this specified region can then be extracted as the audio data of the trunk syllable of the character.

It should be noted that the specific value of the intensity threshold can be determined according to the phoneme of the character syllable. Specifically, if the phoneme of the character syllable is a vowel, the intensity threshold can be set relatively high, for example to 0.1. If the phoneme of the character syllable is a consonant, the intensity threshold can be set relatively low, for example to 0.03. For example, for a character whose character syllable begins with a vowel and ends with a consonant, the specified region can be the region, within the range marked by the character syllable identifier of the character, between the position where the sound intensity is 0 in the first period whose sound intensity amplitude is greater than 0.1 and the position where the sound intensity is 0 in the first period whose sound intensity amplitude is less than 0.03. The audio data in this specified region can then be obtained as the audio data of the trunk syllable of the character.
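The threshold-based region can be approximated with the Python sketch below. It is a simplification under stated assumptions: the waveform is a list of amplitudes normalized to the range -1 to 1, a "period" is approximated by a half cycle between two sign changes, and the zero-intensity positions are taken to be the boundaries of those half cycles. The names half_cycles and trunk_by_intensity are hypothetical.

```python
def half_cycles(samples):
    """Yield (start, end) index ranges split where the waveform changes sign."""
    start = 0
    for i in range(1, len(samples)):
        if (samples[i - 1] < 0) != (samples[i] < 0):
            yield start, i
            start = i
    if samples:
        yield start, len(samples)


def trunk_by_intensity(samples, start_threshold, end_threshold):
    """Return (begin, end) indices of the trunk region, or None if the
    waveform never exceeds start_threshold."""
    begin = None
    for low, high in half_cycles(samples):
        peak = max(abs(value) for value in samples[low:high])
        if begin is None:
            if peak > start_threshold:    # first loud half cycle starts the trunk
                begin = low
        elif peak < end_threshold:        # first quiet half cycle ends the trunk
            return begin, low
    return (begin, len(samples)) if begin is not None else None


# Made-up waveform: quiet onset, loud middle, quiet tail.
wave = [0.01, -0.02, 0.2, 0.3, -0.25, -0.2, 0.15, 0.1, -0.02, -0.01, 0.01]
region = trunk_by_intensity(wave, start_threshold=0.1, end_threshold=0.03)
```

Using a higher threshold for the start than for the end mirrors the example of a syllable that begins with a vowel and ends with a consonant.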

In addition, the specific value of the threshold intensity may also be determined according to the strength of the background sound in the audio data. Specifically, if the background sound in the audio data is strong, the threshold intensity may be set relatively high, for example to 0.16. If the background sound in the audio data is weak, the threshold intensity may be set relatively low, for example to 0.047. Of course, the ways of determining the threshold intensity listed above are given only to better illustrate this embodiment. In a specific implementation, other suitable ways of determining the threshold intensity may be chosen according to the specific application scenario. This specification is not limited in this respect.

After the audio data of the trunk syllables of the characters has been intercepted from the sample data, the audio data of the linking syllables between adjacent characters may correspondingly be intercepted from the sample data. In a specific implementation, this may include: intercepting the audio data in the region between the audio data of the trunk syllables of adjacent characters in the sample data as the audio data of the linking syllable between those adjacent characters.

In this embodiment, it is further considered that, according to human speech habits, when the syllable of the first character in the voice data of a target number sequence is uttered, there is also audio data of a linking syllable, which plays a connecting role, between the point where the sound intensity is 0 and the audio data of the trunk syllable of the first character. Therefore, in a specific implementation, the audio data between the start position of the audio data in the sample data and the audio data of the trunk syllable of the first character may also be intercepted as audio data of a linking syllable, so that the audio data of the opening characters of the target number sequence can subsequently be spliced with a better, more natural and smoother effect.

In this embodiment, in a specific implementation, the audio data in the region between two adjacent specified regions in the audio data of the sample data may be intercepted as the audio data of the linking syllable between the corresponding adjacent characters.
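As a rough illustration of this interception, the following Python sketch cuts the linking-syllable audio out of the gaps between consecutive specified regions, and optionally the leading segment before the first trunk syllable. The function name `cut_linking_segments` and the representation of specified regions as `(start, end)` index pairs are assumptions of the sketch.

```python
from typing import List, Sequence, Tuple


def cut_linking_segments(
    samples: Sequence[float],
    trunk_regions: List[Tuple[int, int]],
    include_leading: bool = True,
) -> List[Sequence[float]]:
    """Cut linking-syllable audio from the gaps between trunk regions.

    trunk_regions holds (start, end) index pairs of the specified regions,
    ordered as the characters appear in the recording. When include_leading
    is True, the audio before the first trunk region is returned first.
    """
    links: List[Sequence[float]] = []
    if include_leading and trunk_regions:
        links.append(samples[0:trunk_regions[0][0]])
    for (_, prev_end), (next_start, _) in zip(trunk_regions, trunk_regions[1:]):
        links.append(samples[prev_end:next_start])
    return links
```

For a recording with two specified regions, the function returns the leading segment followed by the single gap between the two regions.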

In this embodiment, in a specific implementation, each piece of audio data in the sample data may be intercepted in the manner described above to obtain the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters. The acquired audio data may then be saved, and the preset audio database may be established from the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.

In one embodiment, intercepting the audio data of the trunk syllable of a character from the sample data may, in a specific implementation, include the following: retrieving the character syllable identifiers in the sample data; and, according to a character syllable identifier, intercepting the audio data in the specified region within the range marked by that character syllable identifier in the sample data as the audio data of the trunk syllable of the character.

In one embodiment, the specified region may be understood as a region within the range marked by the character syllable identifier that is symmetric about the midpoint of that range, and whose interval length has a ratio to the interval length of the marked range equal to a preset ratio.
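The centered region defined above reduces to simple arithmetic. The sketch below assumes the marked range is given as sample indices and that the preset ratio lies in (0, 1]; the function name `centered_region` and the rounding to whole sample indices are illustrative assumptions.

```python
from typing import Tuple


def centered_region(mark_start: int, mark_end: int, ratio: float) -> Tuple[int, int]:
    """Return the region centered on the midpoint of [mark_start, mark_end).

    The region length equals ratio times the length of the marked range.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if mark_end <= mark_start:
        raise ValueError("marked range must not be empty")
    length = mark_end - mark_start
    midpoint = (mark_start + mark_end) / 2
    half = length * ratio / 2
    return round(midpoint - half), round(midpoint + half)
```

For example, a marked range of samples 100 to 200 with a preset ratio of 0.5 gives a region of length 50 centered on sample 150.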

In one embodiment, intercepting the audio data of the linking syllables between adjacent characters from the sample data may, in a specific implementation, include the following: intercepting the audio data in the region between the audio data of the trunk syllables of adjacent characters in the sample data as the audio data of the linking syllable between those adjacent characters.

In one embodiment, after the audio data of the linking syllables between adjacent characters has been intercepted from the sample data, in order to find and save the audio data of linking syllables whose linking effect is better, more natural and smoother, the method may, in a specific implementation, further include the following:

S1: detecting whether the audio data of the linking syllables between the adjacent characters includes audio data of multiple linking syllables between the same adjacent characters;

S2: when it is determined that the audio data of the linking syllables between the adjacent characters includes audio data of multiple linking syllables between the same adjacent characters, counting the frequency of occurrence of each type of linking-syllable audio data among those multiple linking syllables, and determining the audio data of the linking syllable of the most frequent type as the audio data of the linking syllable between the adjacent characters.

In this embodiment, since the sample data is mostly audio data of human-uttered speech containing numbers, among the audio data of multiple linking syllables between the same adjacent characters, a type that occurs more frequently is used more often in normal human speech and better matches common human speech habits. Therefore, the audio data of the linking syllable of the most frequent type may be stored in the preset audio database as the better, more natural audio data, so as to improve the accuracy of the audio database.

Specifically, the audio data of the multiple linking syllables between the same adjacent characters may be divided into multiple types, the frequency of occurrence of each type of audio data in the sample data may be counted, and the audio data of the most frequent type may be selected as the audio data of the linking syllable between those adjacent characters and stored in the preset audio database. Of course, besides selecting the better audio data by frequency of occurrence as described above, other suitable ways may be used to select the better audio data from the audio data of the multiple linking syllables between the same adjacent characters. For example, the MOS value (Mean Opinion Score) of the audio data of each of the multiple linking syllables between the same adjacent characters may be calculated, and the audio data of the linking syllable with the highest MOS value may be selected as the audio data of the linking syllable between the adjacent characters. The MOS value can be used to evaluate the naturalness and fluency of audio data relatively accurately and objectively.
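The two selection strategies described above may be sketched as follows. How linking-syllable audio is classified into types and how MOS values are obtained are not specified here, so the sketch simply assumes that each candidate arrives already paired with a type label or a MOS value; the function names `pick_most_frequent` and `pick_highest_mos` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def pick_most_frequent(candidates: List[Tuple[str, bytes]]) -> bytes:
    """Return the first audio clip belonging to the most frequent type.

    candidates holds (type_label, audio) pairs for one pair of adjacent
    characters. The type labels are assumed to come from an earlier
    classification step that is outside this sketch.
    """
    if not candidates:
        raise ValueError("no candidates")
    counts = Counter(label for label, _ in candidates)
    best_label, _ = counts.most_common(1)[0]
    for label, audio in candidates:
        if label == best_label:
            return audio
    raise AssertionError("unreachable: best_label comes from candidates")


def pick_highest_mos(candidates: List[Tuple[float, bytes]]) -> bytes:
    """Return the audio clip with the highest MOS value.

    candidates holds (mos_value, audio) pairs; the MOS values are assumed
    to have been collected beforehand.
    """
    if not candidates:
        raise ValueError("no candidates")
    return max(candidates, key=lambda item: item[0])[1]
```

Either function yields the single clip that would be stored in the preset audio database for that pair of adjacent characters.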

In one embodiment, in order to obtain more complete voice audio data containing the target number sequence for voice broadcast, after the audio data of the target number sequence is obtained, the method may, in a specific implementation, further include the following:

S1: obtaining preset front audio data, wherein the preset front audio data is used to indicate the data object characterized by the target number sequence;

S2: splicing the preset front audio data with the audio data of the target number sequence to obtain voice audio data to be played;

S3: playing the voice audio data to be played.

In this embodiment, the preset front audio data may be audio data used to indicate content such as the data object characterized by the target number sequence. For example, for broadcasting an amount credited to an account, the preset front audio data may include the voice audio data "credited to the account" placed before the amount number and the voice audio data "yuan" placed after the amount number. For broadcasting a stock price, the preset front audio data may include the voice audio data "the latest unit price of XX stock is" placed before the price number and the voice audio data "yuan per share" placed after the price number. Of course, the preset front audio data listed above is only a schematic illustration. In a specific implementation, other audio data may also be set as the preset front audio data according to the specific application scenario. This specification is not limited in this respect.

In this embodiment, it should be noted that in the voice data that is usually broadcast, the front audio data tends to be fixed, and what varies is the target number sequence to be broadcast. Taking the broadcast of an amount credited to an account as an example, the front audio data is the same in the voice broadcast data for different credited amounts. For example, in "the amount credited to the account is 54 yuan" and "the amount credited to the account is 79 yuan", the front audio data is the same in both cases, namely "the amount credited to the account is" and "yuan"; only the amount number to be broadcast differs. Therefore, in a specific implementation, in order to improve processing efficiency, the corresponding front audio data may be saved in advance. After the audio data of the target number sequence has been generated, the preset front audio data and the generated audio data of the target number sequence can be spliced and combined directly to obtain the voice audio data to be played, which is then played. This avoids repeatedly synthesizing front audio data with identical content and improves processing efficiency, so that the method for determining broadcast voice provided by this specification is better suited to embedded systems with limited data processing capability, such as a device for determining broadcast voice in the form of a mobile phone.

Specifically, for example, after the audio data of the target number sequence "54" has been obtained, the preset front audio data "the amount credited to the account is" and "yuan" may first be retrieved, and the audio data of the target number sequence "54" may then be spliced and combined with the preset front audio data in a certain order. Specifically, the audio data of the target number sequence "54" may be connected after the audio data of "the amount credited to the account is", and "yuan" may be connected after the audio data of the target number sequence "54", thereby obtaining more complete voice broadcast data of the credited amount that contains the target number sequence.
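A minimal Python sketch of this splicing step is given below. Audio is represented as raw bytes purely for illustration, and the scenario key `"account_amount"`, the table `PRESET_FRONT_AUDIO`, the placeholder byte strings and the function name `build_announcement` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical store of pre-saved front audio: scenario -> (audio placed
# before the number, audio placed after the number). The byte strings are
# placeholders standing in for real recorded audio.
PRESET_FRONT_AUDIO: Dict[str, Tuple[bytes, bytes]] = {
    "account_amount": (b"<amount-credited-is>", b"<yuan>"),
}


def build_announcement(scenario: str, number_audio: bytes) -> bytes:
    """Splice the saved front audio around the generated number audio."""
    before, after = PRESET_FRONT_AUDIO[scenario]
    return before + number_audio + after
```

Because the front audio is looked up rather than synthesized, only the audio data of the target number sequence has to be generated for each broadcast.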

In one embodiment, the preset front audio data may include at least one of the following: audio data of the front terms for broadcasting an account amount, audio data of the front terms for broadcasting a travelled mileage, audio data of the front terms for broadcasting a stock price, and so on. Of course, the preset front audio data listed above is given only to better illustrate this embodiment. In a specific implementation, other preset audio data may also be chosen as the preset front audio data according to the specific application scenario and requirements. This specification is not limited in this respect.

Referring to Fig. 8, this specification provides a method for determining broadcast voice, which is applied on the side of the device for determining broadcast voice. In a specific implementation, the method may include the following.

S801: obtaining a character string to be played, wherein the character string includes multiple characters arranged in a preset order;

S803: obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters;

S805: splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the character string to be played.
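Step S805 can be pictured with the following Python sketch, which interleaves trunk-syllable audio and linking-syllable audio in character order. The function name `interleave_and_splice` and the use of raw bytes for audio are assumptions; real audio would also need matching sample formats, which the sketch ignores.

```python
from typing import List


def interleave_and_splice(trunks: List[bytes], links: List[bytes]) -> bytes:
    """Splice trunk audio and linking audio in the preset character order.

    trunks[i] is the trunk-syllable audio of the i-th character and
    links[i] is the linking-syllable audio between characters i and i + 1,
    so links must be exactly one element shorter than trunks.
    """
    if len(links) != max(len(trunks) - 1, 0):
        raise ValueError("expected one linking clip per adjacent character pair")
    pieces: List[bytes] = []
    for index, trunk in enumerate(trunks):
        if index > 0:
            pieces.append(links[index - 1])
        pieces.append(trunk)
    return b"".join(pieces)
```

For three characters, the result is trunk, link, trunk, link, trunk, in that order.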

In this embodiment, the character string to be played may be the character string of a number sequence to be played, or the character string of text information to be played. In a specific implementation, a character string with the corresponding content may be chosen as the character string to be played according to the specific application scenario and implementation requirements. This specification does not limit the specific content characterized by the character string to be played.

The embodiments of this specification further provide a device for determining broadcast voice, including a processor and a memory for storing processor-executable instructions. In a specific implementation, the processor may perform the following steps according to the instructions: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string includes multiple characters arranged in a preset order; obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.

To carry out the above instructions more accurately, and referring to Fig. 9, this specification further provides another specific device for determining broadcast voice. The device includes an input interface 901, a processor 902 and a memory 903, which are connected by internal cables so that each structure can carry out specific data interaction.

The input interface 901 may be used to input the target number sequence to be broadcast.

The processor 902 may be used to: convert the target number sequence into a character string, wherein the character string includes multiple characters arranged in a preset order; obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.

The memory 903 may be used to store the target number sequence to be broadcast that is input through the input interface 901, the preset audio database, and the corresponding instruction programs.

In this embodiment, the input interface 901 may be a unit or module that supports the device for determining broadcast voice in acquiring information data and extracting the target number sequence to be broadcast from the acquired information data.

In this embodiment, the processor 902 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and so on. This specification is not limited in this respect.

In this embodiment, the memory 903 may span multiple levels. In a digital system, anything that can store binary data may be a memory; in an integrated circuit, a circuit with a storage function but without a physical form is also called a memory, such as RAM or a FIFO; in a system, a storage device with a physical form is also called a memory, such as a memory module or a TF card.

The embodiments of this specification further provide a computer storage medium based on the above method. The computer storage medium stores computer program instructions which, when executed, implement: converting the target number sequence into a character string, wherein the character string includes multiple characters arranged in a preset order; obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.

In this embodiment, the storage medium includes, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a cache (Cache), a hard disk drive (Hard Disk Drive, HDD) or a memory card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface that is set up according to a standard specified by a communication protocol and used for network connection and communication.

In this embodiment, the functions and effects achieved by the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not repeated here.

Referring to Fig. 10, at the software level, the embodiments of this specification further provide an apparatus for determining broadcast voice, which may include the following structural modules:

A first acquisition module 1001, which may be used to obtain a target number sequence to be broadcast;

A conversion module 1002, which may be used to convert the target number sequence into a character string, wherein the character string includes multiple characters arranged in a preset order;

A second acquisition module 1003, which may be used to obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters;

A splicing module 1004, which may be used to splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.
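As one possible illustration of what the conversion module 1002 might do, the following Python sketch converts a non-negative integer below 10000 into the Chinese numeral characters that would be read aloud, for example 54 into "五十四". The conversion rules (one "零" for a run of interior zeros, "十" rather than "一十" for 10 to 19) and the function name `number_to_chars` are assumptions of the sketch; this specification does not fix the conversion rules.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def number_to_chars(number: int) -> str:
    """Convert 0 <= number < 10000 into spoken Chinese numeral characters."""
    if not 0 <= number < 10000:
        raise ValueError("only 0..9999 is supported in this sketch")
    if number == 0:
        return DIGITS[0]
    digits = [int(d) for d in str(number)]
    parts = []
    pending_zero = False
    for position, digit in enumerate(digits):
        unit = UNITS[len(digits) - 1 - position]
        if digit == 0:
            # Remember the zero; it is only spoken if a non-zero digit follows.
            pending_zero = True
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[digit] + unit)
    text = "".join(parts)
    if 10 <= number < 20:
        # Assumed speech habit: 10..19 are read without the leading "一".
        text = text[1:]
    return text
```

The resulting string is the ordered character sequence whose trunk-syllable and linking-syllable audio data is then retrieved and spliced.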

In one embodiment, the second acquisition module 1003 may include the following structural units:

A recognition unit, which may be used to identify each character in the character string and determine the connection relationships between adjacent characters in the character string, wherein a connection relationship between adjacent characters in the character string indicates the order in which the adjacent characters are connected;

A first acquisition unit, which may be used to retrieve and obtain, according to each character in the character string, the audio data of the trunk syllable of each character from a preset audio database, wherein the preset audio database stores the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;

A second acquisition unit, which may be used to retrieve and obtain, according to the connection relationships between the adjacent characters in the character string, the audio data of the linking syllables between the adjacent characters in the character string from the preset audio database.
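The cooperation of these three units can be sketched in Python as follows. The preset audio database is modelled as two dictionaries, one keyed by character and one keyed by an ordered pair of adjacent characters; this layout and the names `adjacent_pairs` and `retrieve_audio` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def adjacent_pairs(chars: str) -> List[Tuple[str, str]]:
    """List the ordered pairs of adjacent characters (connection relationships)."""
    return list(zip(chars, chars[1:]))


def retrieve_audio(
    chars: str,
    trunk_db: Dict[str, bytes],
    link_db: Dict[Tuple[str, str], bytes],
) -> Tuple[List[bytes], List[bytes]]:
    """Look up trunk audio per character and linking audio per adjacent pair.

    Raises KeyError if the assumed database lacks an entry.
    """
    trunks = [trunk_db[char] for char in chars]
    links = [link_db[pair] for pair in adjacent_pairs(chars)]
    return trunks, links
```

Keying the linking-syllable audio by an ordered pair reflects that the connection from one character to the next is directional, so the pair ("五", "十") and the pair ("十", "五") would be stored separately.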

In one embodiment, in order to prepare in advance the preset audio database that will be needed, the apparatus may, in a specific implementation, further include an establishing module, which may be used to establish the preset audio database.

In one embodiment, the establishing module may, in a specific implementation, include the following structural units:

A third acquisition unit, which may be used to obtain audio data containing numbers as sample data;

A first interception unit, which may be used to intercept the audio data of the trunk syllables of characters from the sample data;

A second interception unit, which may be used to intercept the audio data of the linking syllables between adjacent characters from the sample data;

An establishing unit, which may be used to establish the preset audio database according to the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.
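A simplified view of how these units could populate the preset audio database is sketched below in Python. Each sample is assumed to be a dictionary holding raw audio bytes under the key `"audio"` and, under the key `"trunks"`, a list of `(character, start, end)` entries for the specified regions; these keys and the function name `build_audio_database` are hypothetical. The sketch keeps every linking-syllable candidate per adjacent pair, so that a separate selection step can pick one of them later.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (character, start index, end index)


def build_audio_database(
    samples: List[dict],
) -> Tuple[Dict[str, bytes], Dict[Tuple[str, str], List[bytes]]]:
    """Collect trunk audio per character and linking candidates per pair."""
    trunk_db: Dict[str, bytes] = {}
    link_candidates: Dict[Tuple[str, str], List[bytes]] = {}
    for sample in samples:
        audio: bytes = sample["audio"]
        regions: List[Region] = sample["trunks"]
        for char, start, end in regions:
            # Keep the first trunk clip seen for each character.
            trunk_db.setdefault(char, audio[start:end])
        for (left, _, left_end), (right, right_start, _) in zip(regions, regions[1:]):
            # The gap between two adjacent trunk regions is the linking audio.
            link_candidates.setdefault((left, right), []).append(
                audio[left_end:right_start]
            )
    return trunk_db, link_candidates
```

Keeping only the first trunk clip per character is a simplification of this sketch rather than a requirement stated in this specification.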

In one embodiment, the apparatus may, in a specific implementation, further include a playing module, which may be used to: obtain preset front audio data, wherein the preset front audio data is used to indicate the data object characterized by the target number sequence; splice the preset front audio data with the audio data of the target number sequence to obtain voice audio data to be played; and play the voice audio data to be played.

In one embodiment, the preset front audio data may include at least one of the following: audio data of the front terms for broadcasting an account amount, audio data of the front terms for broadcasting a travelled mileage, audio data of the front terms for broadcasting a stock change value, and so on. Of course, the front audio data listed above is only a schematic illustration. In a specific implementation, other suitable audio data may also be chosen or obtained as the preset front audio data according to the specific application scenario and requirements. This specification is not limited in this respect.

It should be noted that the units, apparatuses or modules described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when this specification is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module that implements a given function may also be implemented by a combination of multiple submodules or subunits. The apparatus embodiments described above are only schematic. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical or in other forms.

As can be seen from the above, in the apparatus for determining broadcast voice provided by the embodiments of this specification, the second acquisition module obtains the audio data of the linking syllables between adjacent characters, and the splicing module uses that audio data to splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with more natural transitions for voice broadcast. This solves the problem in existing methods that number broadcasts sound unnatural and give a poor user experience, and makes it possible to broadcast number-related voice efficiently and smoothly while keeping operating cost in check. In addition, the establishing module obtains sample data containing numbers, intercepts the audio data in the specified regions of the sample data as the audio data of the trunk syllables of the characters, and then intercepts the audio data between the audio data of the trunk syllables as the audio data of the linking syllables between adjacent characters. A relatively accurate preset audio database can thus be established, and audio data of a more natural and smoother target number sequence can be generated by retrieving from that database.

Although this specification provides method operation steps as described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual apparatus or client product executes, the steps may be executed sequentially or in parallel according to the methods shown in the embodiments or drawings (for example, in a parallel-processor or multithreaded environment, or even a distributed data processing environment). The terms "include", "comprise" or any other variant are intended to cover non-exclusive inclusion, so that a process, method, product or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product or device. Without further limitation, the presence of other identical or equivalent elements in the process, method, product or device that includes the element is not excluded. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may also be regarded as structures within the hardware component. Or, the means for realizing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes and the like that perform specific tasks or implement specific abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

From the description of the above embodiments, those skilled in the art can clearly understand that this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.

The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. This specification can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on.

Although this specification has been described by way of embodiments, those of ordinary skill in the art will appreciate that many variations and changes can be made to this specification without departing from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of this specification.

Claims

1. A method for determining broadcast voice, the method comprising:

obtaining a target number sequence to be broadcast;

converting the target number sequence into a character string, wherein the character string includes multiple characters arranged in a preset order;

obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters;

splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.

2. The method according to claim 1, wherein obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string comprises:

identifying each character in the character string and determining the connection relationships between adjacent characters in the character string, wherein a connection relationship between adjacent characters in the character string indicates the order in which the adjacent characters are connected;

retrieving and obtaining, according to each character in the character string, the audio data of the trunk syllable of each character from a preset audio database, wherein the preset audio database stores the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;

retrieving and obtaining, according to the connection relationships between the adjacent characters in the character string, the audio data of the linking syllables between the adjacent characters in the character string from the preset audio database.

3. The method according to claim 2, wherein the preset audio database is established by:

obtaining sample data;

intercepting audio data of trunk syllables of characters from the sample data;

intercepting audio data of linking syllables between adjacent characters from the sample data; and

establishing the preset audio database according to the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.

4. The method according to claim 3, wherein intercepting the audio data of the trunk syllable of a character from the sample data comprises:

retrieving a character syllable mark in the sample data; and

intercepting, according to the character syllable mark, audio data in a specified region within the range identified by the character syllable mark in the sample data, as the audio data of the trunk syllable of the character.

5. The method according to claim 4, wherein the specified region is a region that lies within the range identified by the character syllable mark, is symmetric about the midpoint of that range, and has an interval length whose ratio to the interval length of that range is equal to a preset ratio.

6. The method according to claim 4, wherein intercepting the audio data of the linking syllable between adjacent characters from the sample data comprises:

intercepting audio data in the region between the audio data of the trunk syllables of adjacent characters in the sample data, as the audio data of the linking syllable between the adjacent characters.
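
By way of non-limiting illustration, the regions recited in claims 5 and 6 might be computed as in the sketch below. It assumes that each character syllable mark is represented as a `(start, end)` pair of time offsets in the sample data; the function names and the `ratio` parameter are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def trunk_region(start: float, end: float, ratio: float) -> Interval:
    """Sub-interval centred on the midpoint of [start, end] with length ratio * (end - start)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    mid = (start + end) / 2.0
    half = (end - start) * ratio / 2.0
    return (mid - half, mid + half)


def linking_region(first_mark: Interval, second_mark: Interval, ratio: float) -> Interval:
    """Region between the trunk regions of two adjacent marked syllables."""
    _, first_trunk_end = trunk_region(first_mark[0], first_mark[1], ratio)
    second_trunk_start, _ = trunk_region(second_mark[0], second_mark[1], ratio)
    return (first_trunk_end, second_trunk_start)
```

For example, with marked ranges of 0.0 to 1.0 and 1.0 to 2.0 and a ratio of 0.5, the trunk regions are 0.25 to 0.75 and 1.25 to 1.75, and the linking region is 0.75 to 1.25.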

7. The method according to claim 3, wherein after the audio data of the linking syllables between adjacent characters is intercepted from the sample data, the method further comprises:

detecting whether the audio data of the linking syllables between the adjacent characters includes audio data of a plurality of linking syllables between the same adjacent characters; and

when it is determined that the audio data of the linking syllables between the adjacent characters includes audio data of a plurality of linking syllables between the same adjacent characters, counting the frequency of occurrence of the audio data of each type of linking syllable among the audio data of the plurality of linking syllables between the same adjacent characters, and determining the audio data of the linking syllable with the highest frequency of occurrence as the audio data of the linking syllable between the adjacent characters.
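
By way of non-limiting illustration, the frequency-based selection of claim 7 might be sketched as follows. The sketch assumes that two linking syllables are of the same type when their audio bytes are identical, which is a simplification made only for illustration; the claims do not specify how types are distinguished. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def select_linking_audio(candidates: List[Tuple[Pair, bytes]]) -> Dict[Pair, bytes]:
    """For each ordered character pair, keep the most frequently occurring linking audio."""
    counts: Dict[Pair, Counter] = {}
    for pair, audio in candidates:
        # Assumption: identical bytes mean the same type of linking syllable.
        counts.setdefault(pair, Counter())[audio] += 1
    return {pair: counter.most_common(1)[0][0] for pair, counter in counts.items()}
```

A pair with a single candidate simply keeps that candidate, so the same function covers both branches of the detection step.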

8. The method according to claim 1, wherein after the audio data of the target number sequence is obtained, the method further comprises:

obtaining preset front audio data, wherein the preset front audio data indicates the data object characterized by the target number sequence;

splicing the preset front audio data and the audio data of the target number sequence to obtain voice audio data to be played; and

playing the voice audio data to be played.

9. The method according to claim 8, wherein the preset front audio data comprises at least one of: audio data of a prefix term for broadcasting an amount credited to an account, audio data of a prefix term for broadcasting a mileage travelled, and audio data of a prefix term for broadcasting a stock price.
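
By way of non-limiting illustration, the splicing of the preset front audio data recited in claims 8 and 9 might be sketched as follows. The scene keys (`"account_credit"`, `"mileage"`, `"stock_price"`) and the dictionary `front_audio` are hypothetical, audio data is assumed to be raw bytes, and playback is left out of the sketch.

```python
from typing import Dict


def build_announcement(
    scene: str,
    number_audio: bytes,
    front_audio: Dict[str, bytes],
) -> bytes:
    """Place the front audio for `scene` before the audio of the number sequence."""
    if scene not in front_audio:
        raise KeyError(f"no preset front audio for scene: {scene}")
    return front_audio[scene] + number_audio
```

The result corresponds to the voice audio data to be played; how it is played depends on the device and is not shown.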

10. An apparatus for determining a broadcast voice, the apparatus comprising:

a first obtaining module, configured to obtain a target number sequence to be broadcast;

a conversion module, configured to convert the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order;

a second obtaining module, configured to obtain audio data of a trunk syllable of each character in the character string and audio data of a linking syllable between adjacent characters in the character string, wherein the linking syllable is used to connect the trunk syllables of the adjacent characters; and

a splicing module, configured to splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters, to obtain audio data of the target number sequence.

11. The apparatus according to claim 10, wherein the second obtaining module comprises:

an identification unit, configured to identify each character in the character string and determine a connection relationship between adjacent characters in the character string, wherein the connection relationship indicates the order in which the adjacent characters in the character string are connected;

a first obtaining unit, configured to retrieve and obtain, according to each character in the character string, the audio data of the trunk syllable of each character from a preset audio database, wherein the preset audio database stores audio data of trunk syllables of characters and audio data of linking syllables between adjacent characters; and

a second obtaining unit, configured to retrieve and obtain, according to the connection relationship between the adjacent characters in the character string, the audio data of the linking syllable between the adjacent characters in the character string from the preset audio database.

12. The apparatus according to claim 10, further comprising an establishing module configured to establish a preset audio database.

13. The apparatus according to claim 12, wherein the establishing module comprises:

a third obtaining unit, configured to obtain sample data;

a first intercepting unit, configured to intercept audio data of trunk syllables of characters from the sample data;

a second intercepting unit, configured to intercept audio data of linking syllables between adjacent characters from the sample data; and

an establishing unit, configured to establish the preset audio database according to the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.

14. The apparatus according to claim 10, further comprising a playing module configured to: obtain preset front audio data, wherein the preset front audio data indicates the data object characterized by the target number sequence; splice the preset front audio data and the audio data of the target number sequence to obtain voice audio data to be played; and play the voice audio data to be played.

15. The apparatus according to claim 14, wherein the preset front audio data comprises at least one of: audio data of a prefix term for broadcasting an amount credited to an account, audio data of a prefix term for broadcasting a mileage travelled, and audio data of a prefix term for broadcasting a stock price.

16. A method for determining a broadcast voice, the method comprising:

obtaining a character string to be played, wherein the character string comprises a plurality of characters arranged in a preset order; and

splicing, in the preset order, audio data of trunk syllables of the characters and audio data of linking syllables between adjacent characters, to obtain audio data of the character string to be played.

17. A device for determining a broadcast voice, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the steps of the method according to any one of claims 1 to 9.

18. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed, implement the steps of the method according to any one of claims 1 to 9.