Method, apparatus and device for determining broadcast voice
Technical field
The technology in this specification belongs to the field of speech synthesis, and more particularly relates to a method, an apparatus and a device for determining broadcast voice.
Background
In daily life and work, there are many situations in which numeric content needs to be announced by voice. For example, in business transactions, a merchant often uses a plug-in built into mobile payment software to automatically announce the amount of each payment received in the merchant's account.
At present, when numeric content is broadcast, most existing methods for determining broadcast voice obtain and splice the audio data of the main part of the syllable of each character (including the characters corresponding to digits, units and so on). For example, to broadcast a specific number, the audio data of the main part of the syllable of each character in the number is extracted and spliced to obtain the audio data for broadcast, which is then played. When audio data obtained by directly splicing only the main parts of the character syllables is played, the transitions between character syllables are often not smooth or natural enough. The played voice sounds abrupt to listeners and does not match human speech habits; it may even hinder the listener's understanding of the broadcast numeric content, so the user experience is relatively poor. Therefore, there is an urgent need for a method for determining broadcast voice that can broadcast numeric content naturally and fluently.
Summary of the invention
The purpose of this specification is to provide a method, an apparatus and a device for determining broadcast voice, so as to solve the problems of unnatural number broadcast and poor user experience in existing methods, and to broadcast numeric content efficiently and fluently while keeping computational cost in check.
The method, apparatus and device for determining broadcast voice provided in this specification are implemented as follows:
A method for determining broadcast voice, comprising: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.
An apparatus for determining broadcast voice, comprising: a first obtaining module, configured to obtain a target number sequence to be broadcast; a conversion module, configured to convert the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order; a second obtaining module, configured to obtain audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and a splicing module, configured to splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.
A method for determining broadcast voice, comprising: obtaining a character string to be played, wherein the character string comprises a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the character string to be played.
A device for determining broadcast voice, comprising a processor and a memory for storing instructions executable by the processor, wherein the processor, when executing the instructions, implements: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.
A computer-readable storage medium storing computer instructions which, when executed, implement: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence.
The method, apparatus and device for determining broadcast voice provided in this specification obtain the audio data of the linking syllables between adjacent characters and use it to splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with more natural transitions for voice broadcast. This solves the problems of unnatural number broadcast and poor user experience in existing methods, and allows numeric content to be broadcast efficiently and fluently while keeping computational cost in check.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast voice provided by an embodiment of this specification is used to broadcast the amount received in an account;
Fig. 2 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast voice provided by an embodiment of this specification is used to splice the audio data of a target number sequence;
Fig. 3 is a schematic diagram of an embodiment in which, in an example scenario, the method for determining broadcast voice provided by an embodiment of this specification is used to obtain the voice audio data for broadcasting the amount received in an account;
Fig. 4 is a schematic diagram of an embodiment of annotating audio data in an example scenario;
Fig. 5 is a schematic diagram of an embodiment of extracting, in an example scenario, the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;
Fig. 6 is a schematic flowchart of an embodiment of the method for determining broadcast voice provided by an embodiment of this specification;
Fig. 7 is a schematic diagram of an embodiment of determining the position points of a specified region in the method for determining broadcast voice provided by an embodiment of this specification;
Fig. 8 is a schematic flowchart of an embodiment of the method for determining broadcast voice provided by an embodiment of this specification;
Fig. 9 is a schematic structural diagram of an embodiment of the device for determining broadcast voice provided by an embodiment of this specification;
Fig. 10 is a schematic structural diagram of an embodiment of the apparatus for determining broadcast voice provided by an embodiment of this specification.
Detailed description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
Existing methods for determining broadcast voice often do not analyze in depth the speech habits and voice characteristics of people speaking normally. For example, when a person says the number "16" (read character by character as "ten", "six"), after uttering the character syllable "ten" and before uttering the character syllable "six", the person usually also utters a linking syllable that connects these two character syllables. Moreover, the linking syllables between different character syllables tend to differ. For example, the linking syllable between the character syllables "five" and "ten" in "50" is not the same as the linking syllable between the character syllables "ten" and "five" in "15". A linking syllable does not itself correspond to any specific character and does not represent any specific content or meaning. Rather, like a connecting particle, it joins adjacent character syllables naturally and fluently in normal human speech, so that the listener can better receive and understand the information and content in the speaker's words. Because existing methods do not take these speech habits and voice characteristics into account, when synthesizing the voice audio data of a target number sequence to be broadcast they usually extract the audio data of the main part of the character syllable of each corresponding numeric character from sample data and splice it directly. Since there is no natural transition of the kind found in human speech between adjacent character syllables, the voice audio data generated in this way often does not sound as natural and fluent as a number spoken by a person, and may even hinder people's understanding of the broadcast numeric content, which is inconvenient in use. Therefore, in practice, existing methods often suffer from unnatural number broadcast and poor user experience.
To address the root cause of the above problems, this specification analyzes in depth and comprehensively the speech habits and voice characteristics of people speaking normally, and takes into account the existence and function of the linking syllables between adjacent character syllables. When the preset audio database is established, not only is the audio data of the trunk syllables of the character syllables extracted and saved, but the audio data of the linking syllables between adjacent character syllables is also deliberately extracted and saved. Then, when the voice audio data of a specific number is generated, the audio data of the trunk syllable of each of the characters corresponding to the number and the audio data of the linking syllables between adjacent characters can both be obtained, and the audio data of the linking syllable between two adjacent characters is used to splice the audio data of the trunk syllables of those two characters. As a result, the transitions between adjacent character syllables in the generated voice audio data are more natural and fluent. This solves the problems of unnatural number broadcast and poor user experience in existing methods, and allows numbers to be broadcast efficiently and fluently while keeping computational cost in check.
Based on the above, an embodiment of this specification provides a device for determining broadcast voice that can broadcast numbers efficiently and naturally. The device can implement the following functions: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, wherein the character string comprises a plurality of characters arranged in a preset order; obtaining audio data of the trunk syllable of each character in the character string and audio data of the linking syllables between adjacent characters in the character string, wherein a linking syllable is used to connect the trunk syllables of adjacent characters; splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain audio data of the target number sequence; and playing the audio data of the target number sequence.
In this embodiment, the device for determining broadcast voice may be a relatively simple electronic device used on the user side. Specifically, it may be an electronic device with data computation, voice playback and network interaction functions, or a software application that runs on such an electronic device and supports data processing, voice playback, network interaction and the like.
Specifically, the device for determining broadcast voice may be, for example, a desktop computer, a tablet computer, a laptop, a smartphone, a digital assistant, a smart wearable device or a shopping guide terminal. Alternatively, the device may be a software application that can run on such an electronic device, for example the XX Bao APP running on a smartphone.
In an example scenario, a device for determining broadcast voice that applies the method provided by an embodiment of this specification can automatically broadcast to merchant A, in real time, the amount of each payment received in merchant A's account.
In this embodiment, merchant A can use his or her own mobile phone as the device for determining broadcast voice. Before implementation, merchant A can first associate the phone number with merchant A's account on a payment platform through settings on the mobile phone. Referring to Fig. 1, after spending in merchant A's shop, a consumer can usually settle and pay online directly through the payment software of the payment platform on the consumer's mobile phone, without paying merchant A face to face offline. Specifically, the consumer can use the mobile phone to communicate with the server of the payment platform, and the payment platform transfers the amount payable to merchant A into merchant A's account, completing the payment. After confirming that merchant A's account has received the funds transferred online by the consumer, the server of the payment platform can send an arrival prompt to merchant A's mobile phone (for example, an arrival SMS, or an arrival prompt dialog pushed in the payment APP on merchant A's mobile phone) to notify merchant A that the consumer has paid online. The prompt can also state the specific amount received in merchant A's account, so that merchant A can further confirm whether the amount paid online by the consumer is correct. For example, upon confirming that merchant A's account has received 54 yuan transferred online by a consumer, the server of the payment platform can send to the mobile phone associated with merchant A's account a prompt with the following content: "Account received 54 yuan".
Merchants are usually busy during business hours and often have no time to look through and read such prompts promptly, so it is inconvenient for them to confirm in time whether a consumer has paid online and whether the amount paid is correct. In this case, a merchant would like the mobile phone to announce by voice, in real time, the specific amount received in the account. Then, even when busy and unable to check the prompt sent by the server of the payment platform, the merchant can still learn in time the details of the consumer's payment through the payment platform.
After receiving the prompt sent by the payment platform, the mobile phone can first parse the prompt and extract the amount "54" in it as the target number sequence to be broadcast, so that the audio data corresponding to this number sequence can subsequently be determined for voice broadcast.
In this embodiment, the prompt is usually generated according to a fixed rule and therefore has a relatively uniform format. For example, in this example scenario, the prompt can be composed in the following format: leading phrase (that is, "Account received") + numeric part (that is, the specific amount "54") + unit part (that is, "yuan"). Therefore, to obtain the target number sequence to be broadcast, that is, the content of the numeric part of the prompt, the prompt can be parsed and split according to the parsing rule that corresponds to the fixed rule used to generate it, and the number to be broadcast, that is, the target number sequence, can be extracted from the numeric part of the prompt.
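The parsing step described above can be sketched as follows. This is only a minimal illustration, assuming the prompt follows the fixed layout of leading phrase, digits and unit; the regular expression, the function name `extract_target_number` and the English sample text are hypothetical and are not taken from this specification.

```python
import re

# Assumed fixed layout: leading phrase + amount digits + unit.
# The pattern and all names here are illustrative only.
PROMPT_PATTERN = re.compile(r"^(?P<lead>\D+?)\s*(?P<amount>\d+)\s*(?P<unit>\D+)$")


def extract_target_number(prompt: str) -> str:
    """Return the numeric part of a fixed-format prompt as a digit string."""
    match = PROMPT_PATTERN.match(prompt.strip())
    if match is None:
        raise ValueError(f"prompt does not follow the expected format: {prompt!r}")
    return match.group("amount")


print(extract_target_number("Account received 54 yuan"))  # prints 54
```

A real parsing rule would mirror whatever rule the payment platform actually uses to generate its prompts; the pattern here only covers whole-number amounts.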
It should be noted that, in this embodiment, the content of the leading phrase and of the unit part is usually the same across different prompts, and only the content of the numeric part varies from prompt to prompt. Therefore, the audio data of the common leading phrase and the audio data of the unit part can be generated and stored in advance. When a prompt is broadcast, only the audio data of the numeric part needs to be generated and then spliced with the pre-stored audio data of the leading phrase and of the unit part to obtain the complete voice audio data of the prompt.
After obtaining the target number sequence to be broadcast, the mobile phone can first convert it into the corresponding character string. The character string can be understood as a string of characters that represents the character syllables of the target number sequence and is arranged in an order (the preset order) corresponding to the target number sequence, with each character in the string corresponding to one character syllable of the target number sequence.
For example, the character string obtained by converting the target number sequence "54" can be expressed as "five ten four". This character string represents the character syllables of the target number sequence "54": the characters "five" and "ten" correspond to the digit "5" in the tens place of the target number sequence, and the character "four" corresponds to the digit "4" in the ones place. The characters in the string are arranged in the preset order that corresponds to the order of the digits in "54" (first "5", then "4"); that is, the characters "five" and "ten" for the "5" in the tens place come first, followed by the character "four" for the "4" in the ones place. Of course, the character string and preset order given above are only intended to better illustrate the embodiments of this specification. In practice, other forms of character string and preset rule can be chosen according to the specific scenario, or the target number sequence can be recognized and spliced directly without conversion. This specification places no limitation on this.
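The conversion above can be illustrated with a short sketch. It is restricted to the range 1 to 99 as a simplifying assumption, uses English words as stand-ins for the numeral characters, and follows the readings given in this specification ("16" as "ten", "six"; "50" as "five", "ten"). The name `to_character_string` and the mapping table are hypothetical.

```python
from typing import List

# English stand-ins for the numeral characters; the mapping is illustrative.
DIGIT_CHARS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine",
}


def to_character_string(number: str) -> List[str]:
    """Convert a digit string in the range 1..99 into its ordered characters."""
    value = int(number)
    if not 1 <= value <= 99:
        raise ValueError("this sketch only covers 1 to 99")
    tens, ones = divmod(value, 10)
    chars: List[str] = []
    if tens > 1:
        chars.append(DIGIT_CHARS[tens])  # e.g. "five" for the 5 in 54
    if tens >= 1:
        chars.append("ten")              # the tens-place character
    if ones:
        chars.append(DIGIT_CHARS[ones])  # e.g. "four" for the 4 in 54
    return chars


print(to_character_string("54"))  # prints ['five', 'ten', 'four']
```

Larger numbers would need further place characters such as "hundred" and rules for zeros, which are left out here.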
After obtaining the character string corresponding to the target number sequence, the mobile phone can identify and determine each of the characters arranged in order in the string and the connection relationships between adjacent characters. A connection relationship between adjacent characters can be understood as identification information indicating the order of two adjacent characters. For example, in the character string "five ten four", "five" and "ten" are two adjacent characters, and the connection relationship between them can be expressed as: character "five" connected to character "ten". Of course, this way of expressing the connection relationship is only illustrative. In practice, the connection relationship between adjacent characters can also be expressed by other forms of identification. This specification places no limitation on this.
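One possible way to represent such connection relationships is as ordered pairs of adjacent characters. The sketch below assumes the characters are already available as an ordered list; the function name `connection_relations` and the tuple representation are illustrative choices, not something this specification prescribes.

```python
from typing import List, Tuple


def connection_relations(chars: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered (previous, next) pair for every two adjacent characters."""
    # zip pairs each character with its successor, so the order is preserved.
    return list(zip(chars, chars[1:]))


print(connection_relations(["five", "ten", "four"]))
# prints [('five', 'ten'), ('ten', 'four')]
```

Because the pairs are ordered, ("five", "ten") and ("ten", "five") are distinct keys, which matters when the same two characters in a different order have different linking syllables.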
In this embodiment, through character recognition the mobile phone can determine that the characters in the string are, in order, "five", "ten" and "four", and that the corresponding connection relationships between adjacent characters are, in order: character "five" connected to character "ten", and character "ten" connected to character "four".
Further, according to the identified characters and connection relationships, the mobile phone can search the preset audio database to obtain the audio data corresponding to each character and to each connection relationship between adjacent characters, that is, the audio data of the trunk syllable of each character in the string and the audio data of the linking syllables between adjacent characters.
The trunk syllable of a character can be understood as the main part of the character syllable. This part is usually highly recognizable: audio features such as the fundamental frequency and the sound intensity of the trunk syllable of the same character syllable are relatively consistent and approximately the same, so the trunk syllable can be extracted to distinguish one character syllable from the others. For example, when a person utters the sound corresponding to the character "five", the sound of the middle part is the main part of the character syllable, that is, the trunk syllable. Although different people utter the sound of "five" somewhat differently, the trunk syllable part is mostly consistent.
The linking syllable between adjacent characters can be understood as the connecting part of speech that joins the trunk syllables of the adjacent characters. For example, when a person says "50", the connecting sound between the trunk syllable of the character "five" and the trunk syllable of the character "ten" is the linking syllable between "five" and "ten". Unlike a trunk syllable, this part has no specific meaning of its own and does not represent any specific character, yet its waveform data in the audio data is not zero. In human speech it usually appears between the trunk syllables of adjacent characters and serves as a bridge and transition. This is what distinguishes human speech from machine pronunciation: the trunk syllables of the characters are not simply joined one after another in a flat, stiff way, but one character syllable passes naturally and fluently into the next. Numbers broadcast in this way better match human listening habits and are easier to receive and understand, and the listener also feels more comfortable and has a better experience. It should be added that the linking syllables between different pairs of adjacent characters (whether the characters differ or the same characters appear in a different order) are usually not the same either. For example, the waveforms of the audio data of the linking syllable between "five" and "ten", of the linking syllable between "five" and "hundred", and of the linking syllable between "ten" and "five" all differ from one another. Therefore, in this embodiment, the connection relationship between adjacent characters is needed to obtain exactly the audio data of the corresponding linking syllable.
The preset audio database may be a database established in advance by the platform server and stored on the server or on the device for determining broadcast voice. It may contain the audio data of the trunk syllable of each character and the audio data of the linking syllable between each pair of adjacent characters.
Specifically, according to the identified characters and connection relationships, the mobile phone can search the preset audio database to obtain the audio data A of the trunk syllable of the character "five", the audio data B of the trunk syllable of "ten" and the audio data C of the trunk syllable of "four", as well as the audio data f of the linking syllable for character "five" connected to "ten" and the audio data r of the linking syllable for character "ten" connected to "four".
The mobile phone can then splice the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters in the order of the characters in the string (that is, the preset order) to obtain the audio data of the target number sequence. Specifically, the audio data of the trunk syllables of the characters can first be arranged in the preset order (that is, the order of the characters in the character string of the target number sequence), and the audio data of the linking syllable between each pair of adjacent characters can then be used to connect the audio data of the trunk syllables of those characters.
For example, referring to Fig. 2, following the order of the characters in the string (that is, "five ten four"), the audio data A of the trunk syllable of "five" is placed first, then the audio data B of the trunk syllable of "ten", and finally the audio data C of the trunk syllable of "four". After the audio data of the trunk syllables has been ordered, the audio data f of the linking syllable between "five" and "ten" can be used to connect audio data A and audio data B, and the audio data r of the linking syllable between "ten" and "four" can be used to connect audio data B and audio data C. The spliced audio data for the target number sequence "54" can be expressed as "A-f-B-r-C". In this way, audio data of the target number sequence with more natural transitions is obtained.
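The lookup and splicing for "54" can be sketched as follows. The two dictionaries stand in for the preset audio database, and the labels "A", "B", "C", "f" and "r" stand in for the actual audio segments; the dictionary layout and the function name `splice_number_audio` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def splice_number_audio(
    chars: List[str],
    trunk_db: Dict[str, str],
    link_db: Dict[Tuple[str, str], str],
) -> List[str]:
    """Interleave trunk-syllable segments with the linking segment of each adjacent pair."""
    segments: List[str] = []
    for index, char in enumerate(chars):
        if index > 0:
            # The linking syllable is keyed by the ordered pair (previous, current).
            segments.append(link_db[(chars[index - 1], char)])
        segments.append(trunk_db[char])
    return segments


# Stand-in database entries; labels replace real audio data.
trunk_db = {"five": "A", "ten": "B", "four": "C"}
link_db = {("five", "ten"): "f", ("ten", "four"): "r"}

print("-".join(splice_number_audio(["five", "ten", "four"], trunk_db, link_db)))
# prints A-f-B-r-C
```

With real audio, each label would be replaced by the corresponding sample data, and the returned segments would be concatenated in the same order.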
After the audio data of the target number sequence is obtained, the preset front audio data stored in advance on the mobile phone or on the server, which indicates the data object represented by the target number sequence (for example, the audio data of the leading phrase and the audio data of the unit part), can be spliced with the audio data of the target number sequence to obtain the voice audio data to be played. The mobile phone then plays the corresponding content according to this voice audio data.
In this embodiment, referring to Fig. 3, merchant A's mobile phone can obtain the preset front audio data stored locally on the phone, that is, the preset audio data Y expressing "Account received" and the preset audio data Z expressing "yuan", and splice it with the generated audio data of the target number sequence "54" to obtain the complete voice audio data to be played, which can be expressed as "Y-A-f-B-r-C-Z". The phone then plays this voice audio data, so that merchant A hears a clear, natural and fluent voice broadcast that better matches normal human listening habits, avoiding the adverse effect of machine-sounding speech on the merchant's listening experience.
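Assembling the complete prompt can be sketched as a simple concatenation. The sketch assumes every segment is raw audio in the same format (same sample rate, sample width and channel count), so that byte-level concatenation is valid; single-byte placeholders replace real audio, and the name `build_prompt_audio` is hypothetical.

```python
from typing import Iterable


def build_prompt_audio(lead: bytes, number_segments: Iterable[bytes], unit: bytes) -> bytes:
    """Concatenate leading-phrase audio, the number's segments and the unit audio."""
    # Assumes all inputs share one raw audio format, so plain joining is meaningful.
    return b"".join([lead, *number_segments, unit])


# Placeholders: Y = leading phrase, A/f/B/r/C = number segments, Z = unit.
audio = build_prompt_audio(b"Y", [b"A", b"f", b"B", b"r", b"C"], b"Z")
print(audio)  # prints b'YAfBrCZ'
```

Because the leading phrase and the unit are the same for every prompt, only the middle segments have to be looked up per broadcast.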
It can thus be seen that the method for determining broadcast voice provided by the embodiments of this specification obtains the audio data of the linking syllables between adjacent characters and uses it to splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with more natural transitions for voice broadcast. This solves the problems of unnatural number broadcast and poor user experience in existing methods, and allows numbers to be broadcast efficiently and fluently while keeping computational cost in check.
In another example scenario, the server of the payment platform may build the preset audio database in
advance and send it to the broadcast voice determining device. After receiving the preset audio database,
the device may store it locally, so that it can retrieve the database to obtain the audio data of the trunk
syllable of each character in the character string of the target number sequence and the audio data of the
linking syllables between adjacent characters in that character string. Alternatively, after building the
preset audio database, the server may keep it on the server side instead of sending it to the device. In
that case, when generating the audio data of the target number sequence, the device calls the database
stored on the server side to obtain the same trunk-syllable and linking-syllable audio data.
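The two deployment options above can be sketched as a single lookup that prefers a local copy of the
database and otherwise asks the server. This is a minimal illustration only: the names make_lookup,
local_db, and fetch_remote are hypothetical, and the source does not specify how the server is called.

```python
from typing import Callable, Dict, Optional

AudioDb = Dict[str, bytes]  # assumed layout: key -> raw audio bytes


def make_lookup(local_db: Optional[AudioDb],
                fetch_remote: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Return a lookup that uses the local database when one is stored on the
    device, and otherwise delegates to a (hypothetical) server-side call."""
    def lookup(key: str) -> bytes:
        if local_db is not None:
            return local_db[key]
        return fetch_remote(key)
    return lookup


# Stand-in for the server-side database; a real device would issue a network request.
server_db: AudioDb = {"five": b"<trunk:five>"}
lookup_via_server = make_lookup(None, server_db.__getitem__)
lookup_local = make_lookup({"five": b"<local trunk:five>"}, server_db.__getitem__)
```

Keeping the choice behind one function means the splicing logic does not need to know where the database
lives.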
In this embodiment, the server may obtain audio data containing numbers as sample data. It may then,
according to certain rules, extract from the labeled sample data the audio data of the trunk syllables of
characters and the audio data of the linking syllables between adjacent characters, and build the preset
audio database from these two kinds of audio data.
Specifically, obtaining audio data containing numbers as sample data may include extracting, from an
announcer's broadcast audio, the audio data whose content relates to numbers. Alternatively, voice data of
a person reading a preset text may be collected as the sample data, where the preset text is text content
prepared in advance that contains a variety of number combinations.
After the sample data is obtained, it may first be labeled. Specifically, referring to Fig. 4, character
syllable marks may be used in the obtained sample data to mark the range occupied by the audio data of each
character syllable. For example, for the audio data "56" in the sample data (spoken as the character
syllables "five", "ten", "six"), "5", "10", and "6" may be used as the character syllable marks of the
character syllables "five", "ten", and "six" respectively, marking the range that each of these syllables
occupies in the audio data. Of course, the character syllable marks listed above are only illustrative and
should not be construed as unduly limiting this specification.
Further, extracting the audio data of the trunk syllable of a character from the sample data may include:
retrieving the character syllable marks in the sample data; and, according to a character syllable mark,
extracting the audio data in a specified region within the range marked by that character syllable mark as
the audio data of the trunk syllable of the character.
Specifically, the character syllable marks of the audio data in the sample data may be retrieved, and the
range of each character syllable within the audio data, that is, the range marked by its character syllable
mark, may be determined from them. The audio data in a specified region within that range is then extracted
according to a preset rule as the audio data of the trunk syllable of the character. For example, for the
audio data "56" in the sample data, the character syllable marks "5", "10", and "6" are retrieved first.
The range of the character syllable "five" is then determined from the mark "5", the range of "ten" from
the mark "10", and the range of "six" from the mark "6". Finally, the audio data in the specified region
within the range of "five" is extracted as the trunk-syllable audio data of the character "five", and the
trunk-syllable audio data of the characters "ten" and "six" is extracted from their ranges in the same way.
When extracting the trunk-syllable audio data of a character, it is taken into account that, when people
say a specific number, the middle part of the syllable of each digit or unit in that number is mostly
consistent. That is, the middle parts of the audio data of the same character syllable differ relatively
little, whereas the middle parts of the audio data of different character syllables differ relatively
much. For example, when a person says the two numbers "56" and "65", the middle part of the audio data of
the syllable of the character "five" in "56" is usually the same as that of the character "five" in "65".
Therefore, the middle part of the audio data of a character syllable may be extracted as the audio data of
the specified region to obtain the trunk-syllable audio data of that character syllable. Based on this
property, in a specific implementation the specified region may be the region that lies within the range
marked by the character syllable mark, is symmetric about the midpoint of that range, and whose interval
length is in a preset ratio to the interval length of the marked range.
For example, referring to Fig. 5, the midpoint O of the range marked by the character syllable mark "5" may
be taken as the center of symmetry. Half of the region on each side of O is extracted, and the two parts
are combined into the specified region; the audio data in this specified region is determined to be the
trunk-syllable audio data of the character "five". The specified region thus accounts for 1/2 of the range
marked by the character syllable mark "5". The trunk-syllable audio data of the characters "ten" and "six"
can be extracted in the same way. Of course, the preset ratio given above is only intended to better
illustrate the embodiments of this specification. In a specific implementation, other values may be chosen
as the preset ratio according to the specific scenario to determine the specified region and then extract
the trunk-syllable audio data of the corresponding character.
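The midpoint-and-ratio rule can be written down in a few lines. The sketch below is illustrative only: it
assumes the marked range is given as sample indices [start, end), and the function names central_region
and extract_trunk are hypothetical.

```python
from typing import Sequence, Tuple


def central_region(start: int, end: int, ratio: float = 0.5) -> Tuple[int, int]:
    """Return the sub-range of [start, end) that is symmetric about the midpoint
    and whose length is `ratio` times the length of the marked range."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    length = end - start
    mid = start + length / 2.0
    half = length * ratio / 2.0
    return int(round(mid - half)), int(round(mid + half))


def extract_trunk(samples: Sequence[float], mark: Tuple[int, int],
                  ratio: float = 0.5) -> Sequence[float]:
    """Cut the specified region out of the range marked for one character syllable."""
    lo, hi = central_region(mark[0], mark[1], ratio)
    return samples[lo:hi]
```

With a marked range of samples 1000 to 3000 and the ratio 1/2, the specified region is samples 1500 to
2500, that is, 500 samples on either side of the midpoint.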
After the trunk-syllable audio data of the characters is extracted, the audio data in the region between
the trunk-syllable audio data of adjacent characters in the audio data of the sample data may be extracted
as the audio data of the linking syllable between those adjacent characters.
For example, referring to Fig. 5, the audio data in the region between the trunk-syllable audio data of the
character "five" and the trunk-syllable audio data of the adjacent character "ten" may be extracted as the
audio data of the linking syllable between "five" and "ten", that is, the audio data of the linking
syllable between adjacent characters. The audio data of the linking syllable between the adjacent
characters "ten" and "six" can be extracted in the same way.
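Once the specified (trunk) regions are known, the linking regions follow directly as the gaps between
consecutive trunk regions. A minimal sketch, assuming regions are (start, end) sample-index pairs listed in
spoken order; the name linking_regions is hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]


def linking_regions(trunk_regions: Sequence[Range]) -> List[Range]:
    """Return the region between each pair of consecutive trunk regions:
    from the end of one trunk to the start of the next."""
    return [(left[1], right[0])
            for left, right in zip(trunk_regions, trunk_regions[1:])]


# Assumed trunk regions for "five", "ten", "six" in one recording of "56".
trunks = [(1500, 2500), (3500, 4500), (5500, 6500)]
links = linking_regions(trunks)  # gaps for "five"-"ten" and "ten"-"six"
```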
In this example scenario, if the sample data is rich, multiple pieces of audio data representing the
linking syllable between the same pair of adjacent characters may be extracted. For example, audio data of
the linking syllable between the same characters "five" and "ten" can be extracted from both the audio data
"56" and the audio data "54" in the sample data. In addition, the sample data may contain audio data of
"56" spoken by different people, from which multiple pieces of audio data of the linking syllable between
"five" and "ten" can likewise be obtained.
Therefore, when the extracted linking-syllable audio data contains multiple pieces for the same pair of
adjacent characters, a better-sounding piece should be selected so that it later links the trunk-syllable
audio data of the corresponding characters more naturally and fluently. To this end, the multiple pieces of
linking-syllable audio data for the same pair of adjacent characters may be divided into several types, the
frequency of occurrence of each type in the sample data may be counted, and the audio data of the most
frequent type may be selected as the audio data of the linking syllable between those adjacent characters
and stored in the preset audio database. Other suitable approaches may also be used to select the
better-sounding audio data. For example, the MOS value (Mean Opinion Score) of each piece of
linking-syllable audio data for the same pair of adjacent characters may be determined, and the piece with
the highest MOS value selected as the audio data of the linking syllable between those adjacent characters.
The MOS value can be used to evaluate the naturalness and fluency of audio data relatively accurately and
objectively.
Similarly, when multiple pieces of trunk-syllable audio data representing the same character are extracted,
the frequency of occurrence of the different types of trunk-syllable audio data among them may be counted,
and the audio data of the most frequent type may be selected as the trunk-syllable audio data of that
character and saved to the preset audio database. Alternatively, the MOS value of each piece of
trunk-syllable audio data of the same character may be determined, and the piece with the highest MOS value
selected as the trunk-syllable audio data of that character and saved to the preset audio database.
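Both selection strategies can be sketched briefly. The source does not say how candidate audio is divided
into types or how MOS values are collected, so the sketch below simply assumes each candidate already
carries a type label or a MOS value; choosing the first candidate of the most frequent type is likewise an
assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def pick_most_frequent(candidates: Sequence[Tuple[str, bytes]]) -> bytes:
    """candidates: (type_label, audio) pairs for one character or one adjacent pair.
    Returns the first audio clip belonging to the most frequent type."""
    counts = Counter(label for label, _ in candidates)
    best_label, _ = counts.most_common(1)[0]
    return next(audio for label, audio in candidates if label == best_label)


def pick_highest_mos(candidates: Sequence[Tuple[float, bytes]]) -> bytes:
    """candidates: (mos_value, audio) pairs. Returns the clip with the highest MOS."""
    return max(candidates, key=lambda c: c[0])[1]


by_type = [("type-1", b"clip-a"), ("type-2", b"clip-b"), ("type-1", b"clip-c")]
by_mos = [(3.1, b"clip-a"), (4.2, b"clip-b"), (3.8, b"clip-c")]
```

Either function can be applied per character for trunk syllables and per adjacent pair for linking
syllables before the chosen clip is written to the database.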
As can be seen from the above, the method for determining broadcast voice provided by the embodiments of this
specification obtains the audio data of the linking syllables between adjacent characters and uses it to
splice the audio data of the trunk syllables of the corresponding characters, yielding voice audio data with
more natural transitions for voice broadcast. This solves the problem of unnatural number broadcasting and
poor user experience in existing methods, and allows numbers to be broadcast efficiently and fluently while
keeping operating cost in check. In addition, sample data containing numbers is obtained, the audio data in
the specified regions of the sample data is extracted as the trunk-syllable audio data of the characters,
and the audio data between the trunk-syllable audio data of adjacent characters is extracted as the
linking-syllable audio data. An accurate preset audio database can thus be built, and more natural and
fluent audio data of the target number sequence can be generated by retrieving that database.
Referring to Fig. 6, this specification provides a method for determining broadcast voice, which is applied
on the side of the broadcast voice determining device (or user terminal). In a specific implementation, the
method may include the following.
S601: Obtain the target number sequence to be broadcast.
In this embodiment, the target number sequence to be broadcast may be the amount received in an account,
such as the 54 in "54 yuan"; the distance travelled by a vehicle, such as the 80 in "80 kilometers"; or the
real-time price of a stock, such as the 20.9 in "20.9 yuan per share". Of course, the data objects
represented by the target number sequences listed above are only intended to better illustrate this
embodiment. In a specific implementation, depending on the application scenario, the target number sequence
to be broadcast may also be a number representing other data objects. This specification places no
limitation on this.
In this embodiment, obtaining the target number sequence to be broadcast may be understood as obtaining the
data to be broadcast, parsing it, and extracting the number in it as the target number sequence. For
example, when the server of the payment platform confirms that 54 yuan has arrived in a user's account, it
sends the arrival prompt "account received 54 yuan" to the broadcast voice determining device associated
with that account (such as the user's mobile phone). After receiving the prompt, the device may parse it
and extract the number "54" as the target number sequence to be broadcast. Of course, the way of obtaining
the target number sequence described above is only illustrative. This specification places no limitation
on this.
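Extracting the number from a prompt message can be sketched with a regular expression. This is only an
illustration: the English message text and the function name extract_target_number are assumptions, and a
real prompt format may require different parsing.

```python
import re
from typing import Optional

# One run of digits, optionally followed by a decimal part (e.g. "54" or "20.9").
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_target_number(message: str) -> Optional[str]:
    """Return the first number found in the prompt text, or None if there is none."""
    match = _NUMBER.search(message)
    return match.group(0) if match else None
```

For the prompt "account received 54 yuan" this yields the string "54", which is then passed on to the
conversion step.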
S603: Convert the target number sequence into a character string, where the character string includes
multiple characters arranged in a preset order.
In this embodiment, the character string may be understood as a string of characters that represent the
character syllables of the target number sequence and are arranged in the order corresponding to the target
number sequence (that is, the preset order). Each character in the character string corresponds to one
character syllable of the target number sequence. For example, the character string of the target number
sequence "67" can be expressed as "six-ten-seven", where the characters "six", "ten", and "seven" each
correspond to one character syllable of the target number sequence and are arranged in the preset order
corresponding to the target number sequence. Of course, the character string described above is only
intended to better illustrate this embodiment. In a specific implementation, other types of character
strings may be chosen as the case requires. This specification places no limitation on this.
In this embodiment, converting the target number sequence into a character string may be understood as
converting the target number sequence, according to a preset mapping rule, into the string of characters
that represent its character syllables. For example, according to the preset mapping rule, the digit "6"
in the tens place of the target number sequence "67" is converted into the corresponding characters "six"
and "ten", and the digit "7" in the ones place is converted into the corresponding character "seven". The
resulting characters are then arranged in the preset order corresponding to "67" to obtain the character
string "six-ten-seven". Of course, the way of converting the target number sequence into a character string
described above is only illustrative. In a specific implementation, other ways of converting the target
number sequence into the corresponding character string may be used as the case requires. This
specification places no limitation on this.
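A mapping rule of this kind can be sketched for two-digit integers. The sketch below is a deliberately
narrow assumption: it covers only 1 to 99, always emits the tens digit before "ten", and ignores larger
units, zeros, and decimals, which a real mapping rule would have to handle. The names DIGITS and
to_characters are hypothetical.

```python
from typing import List

DIGITS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
          6: "six", 7: "seven", 8: "eight", 9: "nine"}


def to_characters(number: int) -> List[str]:
    """Convert an integer in 1..99 to its character syllables in spoken order:
    tens digit, the unit character "ten", then the ones digit."""
    if not 1 <= number <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, ones = divmod(number, 10)
    chars: List[str] = []
    if tens:
        chars += [DIGITS[tens], "ten"]
    if ones:
        chars.append(DIGITS[ones])
    return chars
```

For example, to_characters(67) gives the characters "six", "ten", "seven", matching the character string in
the example above.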
S605: Obtain the audio data of the trunk syllable of each character in the character string and the audio
data of the linking syllables between adjacent characters in the character string, where a linking syllable
is used to connect the trunk syllables of adjacent characters.
In this embodiment, the trunk syllable of a character may be understood as the main part of a character
syllable (for example, the middle part of the character syllable). This part is usually highly
recognizable as a syllable: audio features such as the fundamental frequency and intensity of the trunk
syllable of the same character syllable are largely consistent and approximately identical, so the trunk
syllable of a character syllable can be extracted to distinguish it from other character syllables.
In this embodiment, the linking syllable between adjacent characters may be understood as the syllable of
the part that connects the trunk syllables of adjacent characters. Unlike a trunk syllable, this part
usually carries no specific meaning and does not correspond to any particular character, yet its waveform
data in the audio data is not 0. In human speech it usually appears between the trunk syllables of adjacent
characters and provides a transition, so that human speech differs from machine pronunciation: the trunk
syllables of the characters are not strung together in a flat and rigid way, but pass naturally and
fluently from one character syllable to the next. For example, when a person says "50" (spoken as "five",
"ten"), the sound of the part connecting the trunk syllable of the character "five" and the trunk syllable
of the character "ten" is the linking syllable between the characters "five" and "ten".
In this embodiment, obtaining the audio data of the trunk syllable of each character in the character
string and the audio data of the linking syllables between adjacent characters in the character string may
include: retrieving the preset audio database according to the specific characters in the character string
of the target number sequence to obtain both kinds of audio data.
The preset audio database may be a database that is built in advance and stored on the server or on the
broadcast voice determining device. Specifically, the preset audio database may contain the audio data of
the trunk syllable of each character and the audio data of the linking syllable between each pair of
adjacent characters.
S607: Splice the audio data of the trunk syllables of the characters and the audio data of the linking
syllables between the adjacent characters in the preset order to obtain the audio data of the target number
sequence.
In this embodiment, the audio data of the target number sequence may be understood as the audio data used
to broadcast the target number sequence by voice.
In this embodiment, splicing the audio data of the trunk syllables of the characters and the audio data of
the linking syllables between the adjacent characters in the preset order may include: arranging the
trunk-syllable audio data of each character in the preset order (that is, the order of the characters in
the character string of the target number sequence), and then connecting the trunk-syllable audio data of
adjacent characters with the audio data of the linking syllable between those characters.
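The splicing step amounts to interleaving trunk clips with linking clips. The sketch below assumes every
clip is raw, headerless audio in the same sample format, so that byte concatenation is valid; audio stored
in a container format would need proper decoding first. The function name splice is hypothetical.

```python
from typing import List, Sequence


def splice(trunks: Sequence[bytes], links: Sequence[bytes]) -> bytes:
    """Interleave trunk clips with linking clips: trunk, link, trunk, link, ..., trunk.
    links[i] must be the linking clip between trunks[i] and trunks[i + 1]."""
    if len(links) != max(len(trunks) - 1, 0):
        raise ValueError("need exactly one linking clip per adjacent pair")
    parts: List[bytes] = []
    for i, trunk in enumerate(trunks):
        parts.append(trunk)
        if i < len(links):
            parts.append(links[i])
    return b"".join(parts)


# Placeholder clips for the characters "five", "ten", "four" and the two links between them.
audio = splice([b"[five]", b"[ten]", b"[four]"], [b"[five-ten]", b"[ten-four]"])
```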
In this embodiment, it should be noted that the broadcast voice determining devices commonly used by users
are mostly embedded device systems. Limited by their own structure, such systems tend to have relatively
weak computing and data processing capability, so synthesizing the audio data of a number sequence directly
with a speech synthesis model is relatively costly and inefficient. With the method for determining
broadcast voice provided by the embodiments of this specification, the resource-intensive speech synthesis
model can be avoided. Instead, the trunk-syllable audio data of the corresponding characters and the
linking-syllable audio data between adjacent characters are simply retrieved from the preset audio database
and spliced together to obtain highly accurate audio data of the target number sequence. This reduces
resource usage, improves processing efficiency, and is better suited to embedded device systems.
In one embodiment, obtaining the audio data of the trunk syllable of each character in the character string
and the audio data of the linking syllables between adjacent characters in the character string may include
the following.
S1: Identify each character in the character string and determine the connection relationships between
adjacent characters in the character string, where a connection relationship indicates the order in which
two adjacent characters in the character string are connected;
S2: According to each character in the character string, retrieve the preset audio database to obtain the
audio data of the trunk syllable of each character, where the preset audio database stores the audio data
of the trunk syllables of characters and the audio data of the linking syllables between adjacent
characters;
S3: According to the connection relationships between adjacent characters in the character string, retrieve
the preset audio database to obtain the audio data of the linking syllables between the adjacent characters
in the character string.
In this embodiment, the connection relationship between adjacent characters may be understood as a piece of
identification information that indicates the order of two adjacent characters. For example, the characters
"five" and "ten" are two adjacent characters in the character string "five-ten-four", and the connection
relationship between them can be expressed as "five"-"ten". Of course, the connection relationship described
above is only illustrative. In a specific implementation, other means of identification may be used to
indicate the connection relationship between adjacent characters. This specification places no limitation
on this.
In this embodiment, the identified characters and the identified connection relationships between adjacent
characters may be used as keys to search the preset audio database, and the audio data in the database that
matches a key is extracted as the trunk-syllable audio data of the corresponding character or as the
linking-syllable audio data between the corresponding adjacent characters.
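Steps S1 to S3 can be sketched with two dictionaries, one keyed by character and one keyed by an ordered
pair of adjacent characters. The dictionary layout and the name collect_segments are assumptions made for
illustration; the source does not prescribe how the database is organized.

```python
from typing import Dict, List, Sequence, Tuple

TrunkDb = Dict[str, bytes]
LinkDb = Dict[Tuple[str, str], bytes]


def collect_segments(chars: Sequence[str], trunk_db: TrunkDb,
                     link_db: LinkDb) -> Tuple[List[bytes], List[bytes]]:
    """S1: walk the characters and their ordered adjacent pairs.
    S2: look up one trunk clip per character.
    S3: look up one linking clip per ordered adjacent pair."""
    trunks = [trunk_db[c] for c in chars]
    links = [link_db[(a, b)] for a, b in zip(chars, chars[1:])]
    return trunks, links


trunk_db: TrunkDb = {"five": b"A", "ten": b"B", "four": b"C"}
link_db: LinkDb = {("five", "ten"): b"f", ("ten", "four"): b"r"}
trunks, links = collect_segments(["five", "ten", "four"], trunk_db, link_db)
```

Using an ordered pair as the key captures the connection relationship: the entry for "five" followed by
"ten" is distinct from any entry for "ten" followed by "five".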
In one embodiment, the preset audio database may be built as follows.
S1: Obtain sample data, where the sample data is audio data containing the character strings corresponding
to number sequences;
S2: Extract the audio data of the trunk syllables of characters from the sample data;
S3: Extract the audio data of the linking syllables between adjacent characters from the sample data;
S4: Build the preset audio database from the audio data of the trunk syllables of the characters and the
audio data of the linking syllables between the adjacent characters.
In this embodiment, obtaining audio data containing numbers as sample data may include extracting, from an
announcer's broadcast audio, the audio data whose content relates to numbers. Alternatively, voice data of
a person reading a preset text may be collected as the sample data, where the preset text is text content
prepared in advance that contains a variety of number combinations. Of course, the ways of obtaining sample
data described above are only illustrative. In a specific implementation, audio data containing numbers
may also be obtained as sample data by other means as the case requires. This specification places no
limitation on this.
In this embodiment, after the sample data is obtained, it may also be labeled. Specifically, the
corresponding character syllable marks may be used in the obtained sample data to mark the range occupied
by the audio data of each character syllable.
Correspondingly, extracting the audio data of the trunk syllable of a character from the sample data may
include: retrieving the character syllable marks in the sample data; and, according to a character syllable
mark, extracting the audio data in a specified region within the range marked by that character syllable
mark as the audio data of the trunk syllable of the character.
In this embodiment, the specified region may be understood as the region that lies within the range marked
by the character syllable mark, is symmetric about the midpoint of that range, and whose interval length is
in a preset ratio to the interval length of the marked range.
For example, the midpoint O of the range marked by the character syllable mark "5" may be taken as the
center of symmetry. Half of the region on each side of O is extracted, and the two parts are combined into
the specified region; the audio data in this specified region is determined to be the trunk-syllable audio
data of the character "five". The specified region thus accounts for 1/2 of the range marked by the
character syllable mark "5". Of course, the specified region and the way of determining it described above
are only intended to better illustrate the embodiments of this specification. In a specific implementation,
another region may be chosen as the specified region as the case requires and determined in a
corresponding way.
For example, the region within the range marked by the character syllable mark in which the intensity
amplitude is greater than a threshold intensity may also be used as the specified region. Correspondingly,
the audio data in the region whose intensity amplitude is greater than the threshold intensity may be
extracted, according to intensity, from the range marked by the character syllable mark as the
trunk-syllable audio data of the character.
In a specific implementation, referring to Fig. 7, the specified region may be selected within the range
marked by the character syllable mark as the region between two position points: the point where the
intensity value is 0 in the first cycle whose intensity amplitude is greater than the threshold intensity,
and the point where the intensity value is 0 in the first cycle whose intensity amplitude is less than the
threshold intensity. The audio data in this specified region can then be extracted as the trunk-syllable
audio data of the character.
It should be noted that the specific value of the threshold intensity may be determined according to the
phoneme of the character syllable. Specifically, if the phoneme of the character syllable is a vowel, the
threshold intensity may be set relatively high, for example to 0.1. If the phoneme of the character
syllable is a consonant, the threshold intensity may be set relatively low, for example to 0.03. For
example, for a character whose character syllable begins with a vowel and ends with a consonant, the
specified region may be the region, within the range marked by the character syllable mark of that
character, between the point where the intensity value is 0 in the first cycle whose intensity amplitude is
greater than 0.1 and the point where the intensity value is 0 in the first cycle whose intensity amplitude
is less than 0.03. The audio data in this specified region can then be taken as the trunk-syllable audio
data of the character.
In addition, the specific value of the threshold intensity may also be determined according to the strength
of the background sound in the audio data. Specifically, if the background sound in the audio data is
strong, the threshold intensity may be set relatively high, for example to 0.16. If the background sound
in the audio data is weak, the threshold intensity may be set relatively low, for example to 0.047. Of
course, the ways of determining the threshold intensity described above are only intended to better
illustrate this embodiment. In a specific implementation, other suitable ways of determining the threshold
intensity may be chosen according to the application scenario. This specification places no limitation on
this.
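The threshold-based selection of the specified region can be sketched as follows. This is one possible
reading of the rule, under stated assumptions: samples are normalized floats, a "cycle" is approximated by
the stretch between two consecutive sign changes of the waveform, and the function name threshold_region
and its default thresholds (0.1 and 0.03, taken from the vowel and consonant example) are illustrative.

```python
from typing import List, Sequence, Tuple


def threshold_region(samples: Sequence[float],
                     start_threshold: float = 0.1,
                     end_threshold: float = 0.03) -> Tuple[int, int]:
    """Return (start, end) sample indices of the specified region.
    start: zero point opening the first stretch whose peak exceeds start_threshold.
    end:   zero point opening the first later stretch whose peak is below end_threshold
           (or the end of the data if there is no such stretch)."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty audio data")
    # Split the waveform at sign changes, which approximate its zero points.
    bounds: List[int] = [0]
    for i in range(1, n):
        if (samples[i - 1] < 0) != (samples[i] < 0):
            bounds.append(i)
    bounds.append(n)
    segments = list(zip(bounds, bounds[1:]))
    peaks = [max(abs(x) for x in samples[a:b]) for a, b in segments]

    start_idx = next((k for k, p in enumerate(peaks) if p > start_threshold), None)
    if start_idx is None:
        raise ValueError("no stretch exceeds the start threshold")
    end_idx = next((k for k in range(start_idx + 1, len(peaks))
                    if peaks[k] < end_threshold), None)
    start = segments[start_idx][0]
    end = segments[end_idx][0] if end_idx is not None else n
    return start, end


wave = [0.01, -0.02, 0.2, 0.3, -0.25, -0.2, 0.15, -0.02, 0.01, -0.01]
region = threshold_region(wave)
```

In the short example waveform, the loud part starts at index 2 and the first quiet stretch after it starts
at index 7, so the specified region is samples 2 up to (but not including) 7.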
After the trunk-syllable audio data of the characters is extracted from the sample data, extracting the
audio data of the linking syllables between adjacent characters from the sample data may correspondingly
include: extracting the audio data in the region between the trunk-syllable audio data of adjacent
characters in the sample data as the audio data of the linking syllable between those adjacent characters.
In this embodiment, it is further considered that, according to human speech habits, when the syllable of the first character in the voice data for a target number sequence is uttered, there is also audio data of a linking syllable, serving a connecting role, between the point where the sound intensity is 0 and the audio data of the trunk syllable of the first character. Therefore, in a specific implementation, the audio data between the starting position of the audio data in the sample data and the audio data of the trunk syllable of the first character can also be intercepted as the audio data of a linking syllable, so that the beginning of the audio data of the target number sequence can later be spliced with a better, more natural and fluent effect.
In this embodiment, in a specific implementation, the audio data in the region between two adjacent specified regions of the audio data in the sample data can be intercepted as the audio data of the linking syllable between the corresponding adjacent characters.
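As a rough illustration of this interception step, the following Python sketch derives the linking regions from a list of trunk regions. The function name `linking_regions` and the representation of a region as a `(start, end)` pair of sample indices are assumptions made for the example; the optional leading region reflects the linking syllable before the first character discussed above.

```python
def linking_regions(trunk_regions, include_leading=True):
    """Return the (start, end) gaps around a sorted list of trunk regions.

    trunk_regions: list of (start, end) index pairs in playback order.
    include_leading: also return the gap between the start of the
    recording and the first trunk region, if that gap is non-empty.
    """
    regions = []
    if include_leading and trunk_regions and trunk_regions[0][0] > 0:
        regions.append((0, trunk_regions[0][0]))
    # Each gap runs from the end of one trunk region to the start of the next.
    for (_, prev_end), (next_start, _) in zip(trunk_regions, trunk_regions[1:]):
        regions.append((prev_end, next_start))
    return regions
```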
In this embodiment, each piece of audio data in the sample data can be intercepted in the manner described above to obtain the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters. The obtained audio data can then be saved, and the preset audio database can be established from the audio data of the trunk syllables of the characters and the audio data of the linking syllables between adjacent characters.
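One way to picture the resulting database is as two lookup tables, one keyed by character and one keyed by a pair of adjacent characters. The sketch below is an assumption-laden illustration, not the database layout of this specification: `build_audio_database`, the `"trunk"` and `"link"` keys, and the input format (samples plus labelled trunk regions) are all hypothetical.

```python
from collections import defaultdict


def build_audio_database(utterances):
    """Collect trunk and linking audio from labelled sample recordings.

    utterances: iterable of (samples, regions), where regions is a list of
    (char, start, end) trunk regions in playback order.
    Returns {"trunk": {char: [audio, ...]}, "link": {(a, b): [audio, ...]}}.
    """
    trunk = defaultdict(list)
    link = defaultdict(list)
    for samples, regions in utterances:
        for char, start, end in regions:
            trunk[char].append(samples[start:end])
        # The audio between two adjacent trunk regions is the linking syllable.
        for (a, _, a_end), (b, b_start, _) in zip(regions, regions[1:]):
            link[(a, b)].append(samples[a_end:b_start])
    return {"trunk": dict(trunk), "link": dict(link)}
```

Each entry holds a list because the same character or character pair may occur many times in the sample data; choosing one candidate per entry is a separate step.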
In one embodiment, intercepting the audio data of the trunk syllable of a character from the sample data may, in a specific implementation, include the following: retrieving the character syllable identifiers in the sample data; and, according to a character syllable identifier, intercepting the audio data in the specified region within the range identified by that identifier in the sample data as the audio data of the trunk syllable of the character.
In one embodiment, the specified region can be understood as a region that lies within the range identified by the character syllable identifier, is symmetric about the midpoint of that range, and whose length has a preset ratio to the length of the identified range.
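The symmetric region described here reduces to simple arithmetic on the two ends of the identified range. The following Python sketch shows that calculation; the function name `centered_region` and the use of sample indices for the range are assumptions for illustration only.

```python
def centered_region(mark_start, mark_end, ratio):
    """Return the (start, end) of a region centred in [mark_start, mark_end].

    ratio is the preset ratio between the region length and the length of
    the identified range; it must lie in (0, 1].
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    half = (mark_end - mark_start) * ratio / 2
    mid = (mark_start + mark_end) / 2
    return int(round(mid - half)), int(round(mid + half))
```

For instance, a range from sample 100 to sample 200 with a preset ratio of 0.5 gives a region of length 50 centred on sample 150.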
In one embodiment, intercepting the audio data of the linking syllables between adjacent characters from the sample data may, in a specific implementation, include the following: intercepting the audio data in the region between the audio data of the trunk syllables of adjacent characters in the sample data as the audio data of the linking syllable between those adjacent characters.
In one embodiment, after the audio data of the linking syllables between adjacent characters has been intercepted from the sample data, in order to find and save the audio data of the linking syllable with the better, more natural and fluent linking effect, the method may, in a specific implementation, further include the following:
S1: detecting whether the audio data of the linking syllables between adjacent characters includes audio data of multiple linking syllables between the same pair of adjacent characters;
S2: when it is determined that the audio data of the linking syllables between adjacent characters includes audio data of multiple linking syllables between the same pair of adjacent characters, counting the frequency of occurrence of each type of linking-syllable audio data among them, and determining the audio data of the type with the highest frequency of occurrence as the audio data of the linking syllable between those adjacent characters.
In this embodiment, the sample data is mostly audio data of number-containing speech uttered by humans. For the audio data of multiple linking syllables between the same pair of adjacent characters, a higher frequency of occurrence means that the linking syllable is used more often and more widely in normal human speech, and therefore matches human speech habits better. The audio data of the linking syllable of the type with the highest frequency of occurrence can therefore be stored in the preset audio database as the better, more natural audio data, improving the accuracy of the audio database.
Specifically, the audio data of the multiple linking syllables between the same pair of adjacent characters can be divided into multiple types, the frequency of occurrence of each type in the sample data can be counted, and the audio data of the type with the highest frequency can be selected as the audio data of the linking syllable between those adjacent characters and stored in the preset audio database. Of course, besides selection by frequency of occurrence, other suitable ways can be used to select the better audio data from the audio data of multiple linking syllables between the same pair of adjacent characters. For example, the MOS value (Mean Opinion Score) of the audio data of each of the linking syllables can be calculated, and the audio data with the highest MOS value can be selected as the audio data of the linking syllable between the adjacent characters. The MOS value can be used to evaluate how natural and fluent the audio data is.
Similarly, when multiple pieces of trunk-syllable audio data representing the same character are intercepted, the frequency of occurrence of each type of trunk-syllable audio data for that character can be counted, and the audio data with the highest frequency can be selected as the audio data of the trunk syllable of that character and saved in the preset audio database. Alternatively, the MOS value of each piece of trunk-syllable audio data for the same character can be determined, and the audio data with the highest MOS value can be selected as the audio data of the trunk syllable of that character and saved in the preset audio database.
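The two selection strategies above, by frequency of occurrence and by MOS value, can be sketched in Python as follows. This is a hedged illustration: the text does not say how candidates are grouped into types or how MOS values are obtained, so the sketch assumes each candidate already carries a type label or a score, and the names `select_most_frequent` and `select_highest_mos` are hypothetical.

```python
from collections import Counter


def select_most_frequent(candidates):
    """Pick one audio clip of the most frequent type.

    candidates: non-empty list of (type_label, audio). The type labels are
    assumed to come from an earlier grouping step not shown here.
    Returns the first clip whose label is the most common one.
    """
    counts = Counter(label for label, _ in candidates)
    best_label, _ = counts.most_common(1)[0]
    for label, audio in candidates:
        if label == best_label:
            return audio


def select_highest_mos(candidates):
    """Pick the audio clip with the highest MOS value.

    candidates: non-empty list of (mos_value, audio).
    """
    return max(candidates, key=lambda item: item[0])[1]
```

The same two helpers apply equally to the linking syllables of a character pair and to the trunk syllables of a single character, since both cases reduce to choosing one clip from several candidates.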
In one embodiment, in order to obtain more complete speech audio data containing the target number sequence for voice broadcast, after the audio data of the target number sequence is obtained, the method may, in a specific implementation, further include the following:
S1: obtaining preset front audio data, where the preset front audio data is used to indicate the data object represented by the target number sequence;
S2: splicing the preset front audio data with the audio data of the target number sequence to obtain the speech audio data to be played;
S3: playing the speech audio data to be played.
In this embodiment, the preset front audio data may specifically be audio data indicating content such as the data object represented by the target number sequence. For example, for broadcasting an account amount, the preset front audio data may include the speech audio data "the account has received" placed before the amount, and the speech audio data "yuan" placed after the amount. For broadcasting a stock price, the preset front audio data may include the speech audio data "the latest unit price of XX stock is" placed before the price, and the speech audio data "yuan per share" placed after the price. Of course, the preset front audio data listed above is only illustrative. In a specific implementation, other audio data can also be set as the preset front audio data according to the application scenario. This specification places no limitation on this.
In this embodiment, it should be noted that the front audio data in commonly broadcast voice data is usually relatively fixed; what changes is the target number sequence to be broadcast. Taking the broadcast of an account amount as an example, the front audio data is the same in the voice broadcast data for different amounts. For example, in "the amount received in the account is 54 yuan" and "the amount received in the account is 79 yuan", the front audio data is the same, namely "the amount received in the account is" and "yuan"; the only difference is the amount to be broadcast. Therefore, in a specific implementation, to improve processing efficiency, the corresponding front audio data can be set and saved in advance. After the audio data of the target number sequence is generated, the preset front audio data can be spliced directly with it to obtain the speech audio data to be played, which is then played. This avoids repeatedly synthesizing front audio data whose content is identical and improves processing efficiency, making the method for determining the broadcast voice provided in this specification better suited to embedded systems with limited data processing capability, such as a mobile phone used as the device for determining the broadcast voice.
Specifically, for example, after the audio data of the target number sequence "54" is obtained, the preset front audio data "the amount received in the account is" and "yuan" can be retrieved first, and the audio data of the target number sequence "54" can then be spliced with the preset front audio data in a certain order. Specifically, the audio data of the target number sequence "54" can be appended after the audio data of "the amount received in the account is", and "yuan" can be appended after the audio data of "54", yielding more complete voice broadcast data for the received amount that contains the target number sequence.
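The splicing of preset front audio data around the number audio can be sketched in a few lines of Python. The table `PRESET_FRONT_AUDIO`, its scenario keys, and the function `assemble_broadcast` are hypothetical names, and the byte strings are placeholders standing in for real audio clips, which would all need to share one sample format for plain concatenation to be valid.

```python
# Hypothetical table of pre-saved front audio: scenario -> (before, after).
# The byte strings are placeholders, not real audio data.
PRESET_FRONT_AUDIO = {
    "account_amount": (b"<the-amount-received-is>", b"<yuan>"),
    "stock_price": (b"<latest-unit-price-of-XX-is>", b"<yuan-per-share>"),
}


def assemble_broadcast(scenario, number_audio):
    """Splice the saved front audio around freshly generated number audio."""
    before, after = PRESET_FRONT_AUDIO[scenario]
    return before + number_audio + after
```

Because the front audio is looked up rather than synthesized, only the number audio has to be generated for each broadcast, which is the efficiency gain the text describes.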
In one embodiment, the preset front audio data may specifically include at least one of the following: audio data of front phrases for broadcasting an account amount, audio data of front phrases for broadcasting mileage travelled, audio data of front phrases for broadcasting a stock price, and so on. Of course, the preset front audio data listed above is given only to better illustrate this embodiment. In a specific implementation, other preset audio data can also be selected as the preset front audio data according to the application scenario and requirements. This specification places no limitation on this.
As can be seen from the above, the method for determining the broadcast voice provided by the embodiments of this specification obtains the audio data of the linking syllables between adjacent characters and uses it to splice the audio data of the trunk syllables of the corresponding characters, obtaining speech audio data with more natural transitions for voice broadcast. This solves the problem of unnatural number broadcasts and poor user experience in existing methods, and achieves efficient and fluent voice broadcast of numbers while keeping computational cost in check. In addition, by obtaining sample data containing numbers, intercepting the audio data in the specified regions of the sample data as the audio data of the trunk syllables of characters, and intercepting the audio data between the trunk syllables as the audio data of the linking syllables between adjacent characters, an accurate preset audio database can be established, so that the audio data of a more natural and fluent target number sequence can be generated by retrieving from this database.
Referring to Fig. 8, this specification provides a method for determining the broadcast voice, which is applied on the side of the device for determining the broadcast voice. In a specific implementation, the method may include the following.
S801: obtaining a character string to be played, where the character string includes multiple characters arranged in a preset order;
S803: obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters;
S805: splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the character string to be played.
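Steps S803 and S805 can be sketched together in Python as a lookup followed by an interleaved join. This is a minimal sketch under stated assumptions: the database is taken to be a dictionary with hypothetical `"trunk"` and `"link"` tables holding one already selected clip per entry, audio clips are represented as placeholder byte strings, and a missing linking entry falls back to an empty clip, which is a choice made for the example rather than something the text specifies.

```python
def synthesize(chars, db):
    """Splice trunk and linking audio for a sequence of characters.

    chars: list of characters in the preset order.
    db: {"trunk": {char: audio}, "link": {(prev, char): audio}}.
    """
    pieces = []
    for index, char in enumerate(chars):
        if index > 0:
            # Linking syllable between the previous character and this one;
            # an empty clip is used when the pair is not in the database.
            pieces.append(db["link"].get((chars[index - 1], char), b""))
        pieces.append(db["trunk"][char])
    return b"".join(pieces)
```

A usage example with placeholder clips: for the characters `["wu", "shi", "si"]`, the result is the trunk clip of "wu", the link clip for ("wu", "shi"), the trunk clip of "shi", the link clip for ("shi", "si"), and the trunk clip of "si", in that order.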
In this embodiment, the character string to be played may be a character string of a number sequence to be played, or a character string of text information to be played. In a specific implementation, a character string with the corresponding content can be selected as the character string to be played according to the application scenario and implementation requirements. This specification places no limitation on the specific content represented by the character string to be played.
The embodiments of this specification further provide a device for determining the broadcast voice, including a processor and a memory for storing processor-executable instructions. In a specific implementation, the processor may perform the following steps according to the instructions: obtaining a target number sequence to be broadcast; converting the target number sequence into a character string, where the character string includes multiple characters arranged in a preset order; obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.
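The conversion of a target number sequence into a character string is not detailed in this passage. As a hedged illustration only, the Python sketch below converts an integer from 0 to 99 into the sequence of Mandarin syllables used when reading it aloud, so that 54 becomes the three characters read "wu", "shi", "si". The function name `number_to_chars`, the pinyin labels and the restriction to 0..99 are all assumptions of the example; a real conversion would also need larger units, decimals and the rules for reading zeros.

```python
# Pinyin labels for the digits 0-9 (tone marks omitted for simplicity).
DIGITS = ["ling", "yi", "er", "san", "si", "wu", "liu", "qi", "ba", "jiu"]


def number_to_chars(n):
    """Return the spoken characters for an integer in 0..99 as pinyin labels."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 10:
        return [DIGITS[n]]
    tens, ones = divmod(n, 10)
    # 10-19 are read "shi", "shi yi", ... without a leading "yi".
    chars = [] if tens == 1 else [DIGITS[tens]]
    chars.append("shi")
    if ones:
        chars.append(DIGITS[ones])
    return chars
```

The resulting list preserves the reading order, which is the preset order in which trunk and linking audio would later be spliced.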
In order to carry out the above instructions more accurately, referring to Fig. 9, this specification further provides another specific device for determining the broadcast voice, which includes an input interface 901, a processor 902 and a memory 903. These components are connected by internal cables so that they can exchange data with one another.
The input interface 901 may be used to input the target number sequence to be broadcast.
The processor 902 may be used to convert the target number sequence into a character string, where the character string includes multiple characters arranged in a preset order; obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters; and splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.
The memory 903 may be used to store the target number sequence to be broadcast that is input through the input interface 901, the preset audio database, and the corresponding instruction program.
In this embodiment, the input interface 901 may be a unit or module that enables the device for determining the broadcast voice to obtain information data and to extract the target number sequence to be broadcast from the obtained information data.
In this embodiment, the processor 902 can be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and so on. This specification places no limitation on this.
In this embodiment, the memory 903 may exist at several levels. In a digital system, anything that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but no physical form of its own is also called a memory, such as a RAM or FIFO; in a system, a storage device with a physical form is also called a memory, such as a memory module or a TF card.
The embodiments of this specification further provide a computer storage medium based on the method described above. The computer storage medium stores computer program instructions that, when executed, implement: converting the target number sequence into a character string, where the character string includes multiple characters arranged in a preset order; obtaining the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters; and splicing, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.
In this embodiment, the storage medium includes, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), cache (Cache), hard disk drive (Hard Disk Drive, HDD) or memory card (Memory Card). The memory can be used to store computer program instructions. The network communication unit may be an interface that is set up according to a standard specified by a communication protocol and is used for network connection and communication.
In this embodiment, the functions and effects achieved by the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not repeated here.
Referring to Fig. 10, at the software level, the embodiments of this specification further provide an apparatus for determining the broadcast voice, which may include the following structural modules:
a first obtaining module 1001, which may be used to obtain the target number sequence to be broadcast;
a conversion module 1002, which may be used to convert the target number sequence into a character string, where the character string includes multiple characters arranged in a preset order;
a second obtaining module 1003, which may be used to obtain the audio data of the trunk syllable of each character in the character string and the audio data of the linking syllables between adjacent characters in the character string, where a linking syllable is used to connect the trunk syllables of adjacent characters;
a splicing module 1004, which may be used to splice, in the preset order, the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters to obtain the audio data of the target number sequence.
In one embodiment, the second obtaining module 1003 may specifically include the following structural units:
a recognition unit, which may be used to identify each character in the character string and determine the connection relationship between adjacent characters in the character string, where the connection relationship indicates the order in which adjacent characters in the character string are connected;
a first obtaining unit, which may be used to retrieve, according to each character in the character string, the audio data of the trunk syllable of each character from the preset audio database, where the preset audio database stores the audio data of the trunk syllables of characters and the audio data of the linking syllables between adjacent characters;
a second obtaining unit, which may be used to retrieve, according to the connection relationship between adjacent characters in the character string, the audio data of the linking syllables between the adjacent characters in the character string from the preset audio database.
In one embodiment, in order to prepare in advance the preset audio database that will be needed, the apparatus may, in a specific implementation, further include an establishing module, which may be used to establish the preset audio database.
In one embodiment, the establishing module may, in a specific implementation, include the following structural units:
a third obtaining unit, which may be used to obtain audio data containing numbers as sample data;
a first interception unit, which may be used to intercept the audio data of the trunk syllables of characters from the sample data;
a second interception unit, which may be used to intercept the audio data of the linking syllables between adjacent characters from the sample data;
an establishing unit, which may be used to establish the preset audio database according to the audio data of the trunk syllables of the characters and the audio data of the linking syllables between the adjacent characters.
In one embodiment, the apparatus may, in a specific implementation, further include a playing module, which may be used to obtain preset front audio data, where the preset front audio data is used to indicate the data object represented by the target number sequence; splice the preset front audio data with the audio data of the target number sequence to obtain the speech audio data to be played; and play the speech audio data to be played.
In one embodiment, the preset front audio data may specifically include at least one of the following: audio data of front phrases for broadcasting an account amount, audio data of front phrases for broadcasting mileage travelled, audio data of front phrases for broadcasting stock change values, and so on. Of course, the front audio data listed above is only illustrative. In a specific implementation, other suitable audio data can also be selected or obtained as the preset front audio data according to the application scenario and requirements. This specification places no limitation on this.
It should be noted that the units, apparatuses or modules described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. For convenience of description, the above apparatus is described as divided into modules by function. Of course, when implementing this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing a given function may also be implemented by a combination of several sub-modules or sub-units. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical or in other forms.
As can be seen from the above, the apparatus for determining the broadcast voice provided by the embodiments of this specification obtains, through the second obtaining module, the audio data of the linking syllables between adjacent characters, and uses it, through the splicing module, to splice the audio data of the trunk syllables of the corresponding characters, obtaining speech audio data with more natural transitions for voice broadcast. This solves the problem of unnatural number broadcasts and poor user experience in existing methods, and achieves efficient and fluent voice broadcast of numbers while keeping computational cost in check. In addition, the establishing module obtains sample data containing numbers, intercepts the audio data in the specified regions of the sample data as the audio data of the trunk syllables of characters, and intercepts the audio data between the trunk syllables as the audio data of the linking syllables between adjacent characters, so that an accurate preset audio database can be established and the audio data of a more natural and fluent target number sequence can be generated by retrieving from this database.
Although this specification provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included on the basis of conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or client product executes, the steps may be executed sequentially or in parallel according to the methods shown in the embodiments or drawings (for example in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "include", "comprise" or any other variant are intended to cover non-exclusive inclusion, so that a process, method, product or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product or device that includes the element is not excluded. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes and so on that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
From the description of the above embodiments, those skilled in the art can clearly understand that this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, mobile terminal, server, network device or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. This specification can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.
Although this specification has been described by way of embodiments, those of ordinary skill in the art will appreciate that many variations and changes are possible without departing from its spirit, and it is intended that the appended claims cover such variations and changes without departing from the spirit of this specification.