CN109036373A - Speech processing method and electronic device - Google Patents
Speech processing method and electronic device
- Publication number
- CN109036373A (application number CN201810857848.4A)
- Authority
- CN
- China
- Prior art keywords
- voice
- broadcast
- audio
- speech samples
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
This application discloses a speech processing method and an electronic device. The method comprises: performing semantic recognition on the content of acquired pre-broadcast information; generating, according to the recognition result, background audio and voice information corresponding to the pre-broadcast information; and synthesizing the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information. Because the method synthesizes background audio with the voice information, background audio can be played while the voice information is being broadcast, which enhances the user experience.
Description
Technical field
This application relates to the field of data processing, and in particular to a speech processing method and an electronic device.
Background art
With the continuous development of informatization, more and more users read information on electronic devices, and sometimes use a device's voice broadcast function while reading to meet related needs. However, in current voice broadcasting, the mechanized voice that is produced sounds very stiff and lacks speech features such as an appropriate tone and rhythm; moreover, the broadcast content is monotonous and is not combined with a matching background sound, so the broadcast is not lively and the user experience suffers.
Summary of the invention
Embodiments of the present application aim to provide a speech processing method and an electronic device. The method can synthesize background audio with voice information, so that background audio can also be played while the voice information is being broadcast.
To solve the above technical problem, the embodiments of the present application adopt the following technical solution: a speech processing method, comprising:
performing semantic recognition on the content of acquired pre-broadcast information;
generating, according to the recognition result, background audio and voice information corresponding to the pre-broadcast information;
synthesizing the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information.
Preferably, after generating the broadcast audio corresponding to the pre-broadcast information, the method further includes:
playing the broadcast audio based on a selected voice bank, wherein the selected voice bank is one of the voice banks in a preset sound database.
Preferably, the method further includes:
receiving speech samples, preprocessing the speech samples to establish a customized voice bank, and adding the voice bank to the preset sound database.
Preferably, preprocessing the speech samples to establish a customized voice bank includes:
removing individual pronunciation differences and/or background noise from the speech samples, and removing redundant information from the speech samples so as to retain the key information;
determining the voice units in the speech samples;
identifying the voice units according to the grammar corresponding to the key information, so as to obtain the language pattern of the speech samples;
constructing the customized voice bank based on the language pattern.
Preferably, performing semantic recognition on the content of the acquired pre-broadcast information includes:
identifying the context relation data corresponding to the content of the pre-broadcast information, and determining the context of the pre-broadcast information according to the context relation data.
Preferably, synthesizing the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information includes:
dividing the voice information and the background audio according to the context of the pre-broadcast information;
synthesizing the voice information and background audio that share the same context.
Embodiments of the present application also provide an electronic device comprising an identification module, a processing module, and a synthesis module, wherein:
the identification module is configured to perform semantic recognition on the content of acquired pre-broadcast information;
the processing module is configured to generate, according to the recognition result, background audio and voice information corresponding to the pre-broadcast information;
the synthesis module is configured to synthesize the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information.
Preferably, the electronic device further includes a playing module configured to play the broadcast audio based on a selected voice bank, wherein the selected voice bank is one of the voice banks in a preset sound database.
Preferably, the electronic device further includes a building module configured to:
receive speech samples, preprocess the speech samples to establish a customized voice bank, and add the voice bank to the preset sound database.
Preferably, the building module is further configured to:
remove individual pronunciation differences and/or background noise from the speech samples, and remove redundant information from the speech samples so as to retain the key information;
determine the voice units in the speech samples;
identify the voice units according to the grammar corresponding to the key information, so as to obtain the language pattern of the speech samples;
construct the customized voice bank based on the language pattern.
The beneficial effect of the embodiments of the present application is that the method can synthesize background audio with voice information, so that background audio can also be played while the voice information is being broadcast, which enhances the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of the speech processing method of an embodiment of the present application;
Fig. 2 is a flowchart of a specific embodiment of the speech processing method of an embodiment of the present application;
Fig. 3 is a structural schematic diagram of the electronic device of an embodiment of the present application.
Detailed description of embodiments
Various aspects and features of the present application are described herein with reference to the accompanying drawings.
It should be understood that various modifications may be made to the embodiments described herein. Therefore, the above description should not be regarded as limiting, but merely as exemplary of the embodiments. Those skilled in the art will envisage other modifications within the scope and spirit of the present application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and, together with the general description given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the application will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the accompanying drawings.
It should also be understood that, although the application has been described with reference to some specific examples, those skilled in the art can certainly realize many other equivalent forms of the application, which have the features set forth in the claims and therefore all fall within the scope of protection defined thereby.
The above and other aspects, features, and advantages of the present application will become more readily apparent in view of the following detailed description when read in conjunction with the accompanying drawings.
Specific embodiments of the application are described hereinafter with reference to the accompanying drawings; it should be understood, however, that the disclosed embodiments are merely examples of the application, which may be embodied in various forms. Well-known and/or repeated functions and structures are not described in detail, to avoid obscuring the application with unnecessary or superfluous detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching those skilled in the art to variously employ the application in virtually any appropriately detailed structure.
This specification may use the phrases "in one embodiment", "in another embodiment", "in yet another embodiment", or "in other embodiments", each of which may refer to one or more of the same or different embodiments in accordance with the application.
The speech processing method of an embodiment of the present application can identify and process text or audio and perform the corresponding voice broadcast. Fig. 1 is a flowchart of the speech processing method of the embodiment; as shown in Fig. 1, the method includes the following steps:
S1: performing semantic recognition on the content of acquired pre-broadcast information. The pre-broadcast information includes multiple types of information such as text information, audio information, or pre-stored information, and its content has a corresponding semantics. Semantics is the meaning of data, that is, the meanings of the concepts represented by the real-world things the data corresponds to, together with the relationships between those meanings; it is the interpretation and logical representation of the data in some field. In one embodiment, a pre-stored knowledge base may be used when recognizing the semantics: corresponding domain data can be obtained from the knowledge base according to the identified information, and semantic recognition of the content of the pre-broadcast information can then be further performed in combination with that domain data, so that the recognition result is closer to the actual situation.
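As an illustration only (the patent does not specify an algorithm), step S1's knowledge-base lookup can be sketched as a minimal keyword-overlap classifier; the domains and keyword sets below are invented for the example.

```python
# Toy "knowledge base": each domain maps to a set of characteristic keywords.
# Domain names and keywords are hypothetical, not taken from the patent.
KNOWLEDGE_BASE = {
    "history": {"dynasty", "emperor", "war", "ancient"},
    "lyric_prose": {"moonlight", "gentle", "longing", "autumn"},
    "horror": {"ghost", "scream", "midnight", "shadow"},
}

def recognize_scene(text: str) -> str:
    """Return the domain whose keywords overlap the text the most (S1)."""
    words = set(text.lower().split())
    best, best_score = "general", 0
    for domain, keywords in KNOWLEDGE_BASE.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = domain, score
    return best

print(recognize_scene("A ghost appeared at midnight casting a long shadow"))
```

A production system would of course use a trained semantic model rather than keyword overlap; the sketch only shows how domain data from a knowledge base can steer the recognition result.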
S2: generating, according to the recognition result, background audio and voice information corresponding to the pre-broadcast information. After semantic recognition of the content of the pre-broadcast information, the usage scene corresponding to the pre-broadcast information is known; for example, the pre-broadcast information may be information related to a historical event, information related to a literary work (such as a contemporary lyric prose or a horror story), information related to a real-life scene (such as a cocktail party or a sports meeting), or information in which one or more kinds of scenes are mixed. In this embodiment, background audio corresponding to the pre-broadcast information can be generated according to the recognition result; the background audio may be adapted to the entire content of the pre-broadcast information, or to its local key content. For example, when the pre-broadcast information is a lyric prose, a soothing first background sound corresponding to the lyric prose can be generated; when the pre-broadcast information is a horror story, a second background sound corresponding to the horror story can be generated, and the second background sound can dub the thrilling points of the horror story emphatically, thereby increasing the terrifying effect when the horror story is broadcast. In addition, in this embodiment, voice information corresponding to the pre-broadcast information also needs to be generated according to the recognition result. Because the semantics of the content of the pre-broadcast information is already known, the generated voice information can be closer to what the original pre-broadcast information intends to express; for example, when the pre-broadcast information is a lyric prose, the corresponding voice information can better satisfy requirements such as the tone, intonation, rhythm, and stress of the lyric prose, so as to be closer to a real human voice.
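One way to picture step S2 is as a table lookup from the recognized scene to a background-audio choice plus prosody settings for the synthesized voice. The file names and parameter values below are invented for illustration; the patent only requires that the generated audio match the scene.

```python
# Hypothetical scene profiles: background track plus voice prosody hints.
SCENE_PROFILES = {
    "lyric_prose": {"background": "soothing_strings.wav", "tempo": 0.9, "pitch_shift": 0},
    "horror":      {"background": "low_drone.wav",        "tempo": 0.8, "pitch_shift": -2},
    "party":       {"background": "upbeat_jazz.wav",      "tempo": 1.1, "pitch_shift": 0},
}

def plan_generation(scene: str) -> dict:
    """Pick background audio and voice prosody for a recognized scene (S2)."""
    default = {"background": "neutral_pad.wav", "tempo": 1.0, "pitch_shift": 0}
    return SCENE_PROFILES.get(scene, default)

print(plan_generation("horror")["background"])
```

In a real implementation the profile would feed a TTS engine's prosody controls; here it merely records the decisions the recognition result drives.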
S3: synthesizing the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information. In one embodiment, the synthesis makes the entire content of the background audio correspond to the entire content of the voice information, so that in the generated broadcast audio the user can hear background audio matching the voice broadcast while hearing the voice broadcast itself; this enriches the content of the broadcast audio, meets user demand, and improves the user experience. In another embodiment, the synthesis aligns at least one key audio point in the background audio with the corresponding key voice point in the voice information; that is, in the generated broadcast audio, the key audio point is played at the same time as the key voice point, so that the content at the key voice point can be expressed more emphatically, enriching the expressive effect.
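The key-point alignment of step S3 can be sketched with a toy mixer that overlays a background track on a voice track and boosts the background gain near declared key points. Plain Python lists stand in for audio buffers; the boost factor and radius are assumptions for the example.

```python
def mix(voice, background, key_points, boost=2.0, radius=1):
    """Sum the two tracks sample-wise; amplify the background near each
    key-point index so those moments are dubbed emphatically (S3)."""
    out = []
    for i, (v, b) in enumerate(zip(voice, background)):
        gain = boost if any(abs(i - k) <= radius for k in key_points) else 1.0
        out.append(v + gain * b)
    return out

voice = [1.0] * 8        # constant "speech" level
background = [0.1] * 8   # quiet background bed
print(mix(voice, background, key_points=[4]))
```

Real audio mixing would also handle resampling, clipping, and cross-fades; the sketch isolates only the alignment-and-boost idea.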
In one embodiment of the application, the following step is included after the step of generating the broadcast audio corresponding to the pre-broadcast information:
S4: playing the broadcast audio based on a selected voice bank, wherein the selected voice bank is one of the voice banks in a preset sound database. The preset sound database includes at least one voice bank, and different voice banks have different sound characteristics. For example, the preset sound database may include: a male voice bank, which expresses the sound using a standard male voice (e.g., a male voice meeting preset rules); a female voice bank, which expresses the sound using a standard female voice (e.g., a female voice meeting preset rules); a mixed male-and-female bank, which expresses the sound using male and female voices simultaneously or alternately according to the specific content or requirements of the broadcast audio; and a dialect bank, which expresses the sound using a preset dialect. There can additionally be multiple other types of voice banks, such as a fast-speech bank, a slow-speech bank, and a child-voice bank. One or more of these voice banks can be established according to user needs and stored, so that they can be quickly called when used.
When the broadcast audio is played, the voice information is played based on the selected voice bank. Specifically, if the user selects the male voice bank, a male voice can be used to broadcast the content of the voice information in the broadcast audio (such as reading a lyric prose aloud), while the background audio in the broadcast audio is played at the same time. Of course, a corresponding voice bank can also be selected according to the specific voice information to be played; for example, for a joke, the dialect bank can be chosen to play the broadcast audio, improving the expressive effect of the sound.
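The voice-bank lookup of step S4 amounts to fetching playback parameters from the preset sound database. The bank names and attributes below are invented examples of the kinds of banks the embodiment lists (male, female, dialect, fast-speech).

```python
# Hypothetical preset sound database keyed by voice-bank name.
PRESET_SOUND_DB = {
    "male_standard":   {"gender": "male",   "rate": 1.0},
    "female_standard": {"gender": "female", "rate": 1.0},
    "dialect":         {"gender": "male",   "rate": 1.0, "dialect": "regional"},
    "fast":            {"gender": "female", "rate": 1.4},
}

def select_bank(name: str) -> dict:
    """Return the playback profile of the selected voice bank (S4)."""
    if name not in PRESET_SOUND_DB:
        raise KeyError(f"voice bank {name!r} is not in the preset database")
    return PRESET_SOUND_DB[name]

print(select_bank("fast")["rate"])
```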
In one embodiment of the application, the method further includes the following step: S5, receiving speech samples, preprocessing the speech samples to establish a customized voice bank, and adding the voice bank to the preset sound database. Each user has his or her own preferences: some people like broadcasts in a sweet female voice, some like broadcasts in a resonant male voice, some like the broadcast audio to be read in a child's voice, and some like broadcasts in the voice of a specific person (such as one of their own relatives or friends). Establishing a customized voice bank is the basis for realizing this purpose. The received speech samples are the samples the user is inclined to use; that is, the user wishes the broadcast audio to be played in a voice identical to the speech samples, and the customized voice bank can be established based on those samples.
Preferably, Fig. 2 is a flowchart of a specific embodiment of the speech processing method of an embodiment of the present application. As shown in Fig. 2, establishing the voice bank further includes the following steps:
S51: removing individual pronunciation differences and/or background noise from the speech samples, and removing redundant information from the speech samples so as to retain the key information. The preprocessing that removes individual pronunciation differences and/or background noise makes the speech samples clearer and their semantic expression more accurate, laying the foundation for further processing of the samples. Spoken language may contain repeated or useless information, such as pet phrases, interjections, or words repeated many times; in this embodiment such redundant information needs to be removed and the key information retained, so that subsequent recognition accuracy can be improved by making better use of the key information.
S52: determining the voice units in the speech samples. Different spoken and written languages have different voice units; for example, in English each word can serve as a voice unit, while in Chinese each independent character or independent word can serve as a voice unit. The voice units in the speech samples can be determined according to the actual situation of the samples, for example according to the language of the samples (whether Chinese or English) or the speaking habits in the samples.
S53: identifying the voice units according to the grammar corresponding to the key information, so as to obtain the language pattern of the speech samples. Different speech samples correspond to different grammars; for example, Chinese has Chinese grammar and English has English grammar. Because the key information, after processing, has had the hard-to-identify information removed and what remains is more regular and better formed, identifying the voice units according to the grammar corresponding to the key information enables the voice units to be identified accurately, and thus the language pattern of the speech samples to be obtained. Different speech samples have their own language patterns; a language pattern contains the various features of its speech samples, and based on these features other voice information can then be expressed in the same form of language expression.
S54: constructing the customized voice bank based on the language pattern. The language pattern corresponds to the speech samples the user provided, so the customized voice bank constructed from it can embody the same features as the speech samples, such as the same sound characteristics and speaking habits, enabling the customized voice bank to better meet the needs of the user.
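The four sub-steps S51 to S54 can be sketched as a text-side pipeline over a transcript of the user's speech samples: strip filler words (redundancy removal), split into units, tag units against a grammar, and build the language pattern that seeds the custom voice bank. The filler list and the toy grammar are illustrative assumptions, not part of the patent.

```python
FILLERS = {"um", "uh", "like", "you", "know"}  # hypothetical pet phrases

def preprocess(sample: str) -> list[str]:
    """S51: drop filler words, keeping the key information."""
    return [w for w in sample.lower().split() if w not in FILLERS]

def to_units(words: list[str]) -> list[str]:
    """S52: in English each word is treated as one voice unit."""
    return words

def tag_units(units: list[str]) -> list[tuple[str, str]]:
    """S53: label each unit with a coarse grammatical category."""
    nouns = {"story", "night", "voice"}  # toy grammar lexicon
    return [(u, "NOUN" if u in nouns else "OTHER") for u in units]

def build_bank(sample: str) -> dict:
    """S54: the language pattern becomes the seed of a custom voice bank."""
    pattern = tag_units(to_units(preprocess(sample)))
    return {"language_pattern": pattern, "unit_count": len(pattern)}

bank = build_bank("um I like read the story at night you know")
print(bank["unit_count"])
```

The acoustic side (voice timbre modelling) is out of scope for this sketch; it shows only how the language pattern is distilled from preprocessed samples.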
In one embodiment of the application, the step of performing semantic recognition on the content of the acquired pre-broadcast information includes: identifying the context relation data corresponding to the content of the pre-broadcast information, and determining the context of the pre-broadcast information according to the context relation data. Specifically, the context of the pre-broadcast information can be known more accurately from its entire content, avoiding the deviation that arises when a result obtained only from local content differs from the actual pre-broadcast information. In this embodiment, taking the entire content into account, the context of the pre-broadcast information can be determined from the context relations corresponding to its content, for example by identifying the relationship between a preceding sentence and a following sentence, or between a preceding paragraph and a following paragraph, so as to determine the context more accurately.
In one embodiment of the application, the step of synthesizing the background audio and the voice information to generate the broadcast audio corresponding to the pre-broadcast information includes: dividing the voice information and the background audio according to the context of the pre-broadcast information; and synthesizing the voice information and background audio that share the same context. Specifically, the voice information can be divided into multiple parts according to differences in context, such as a content low-tide part, a content climax part, and a content turning part; likewise, the background audio can be divided into multiple parts according to context, such as a sound low-tide part, a sound climax part, and a sound turning part. Preferably, voice information and background audio with the same context can be synthesized so that content and sound are matched, for example by synthesizing the content low-tide part with the sound low-tide part and the content climax part with the sound climax part, so that the generated broadcast audio is more lively. In addition, in this embodiment the synthesis can also be performed according to the content at one or more key points in the voice information; for example, after synthesis, when a horror story is broadcast, the background audio played at a thrilling point is stressed or deep bass, so that the terrifying effect added at the thrilling point gives the user a feeling of being present at the scene. Of course, the synthesis can also be performed according to the entire content of the voice information, so that the background audio and the voice information are matched over the entire content; for example, when a lyric prose is broadcast, a soothing piece of music can be played throughout the whole broadcast period.
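The context-matched synthesis described above can be sketched as pairing labelled segments: both tracks are divided into segments carrying a context label (low tide, climax, and so on), and only segments sharing a label are merged. Segment contents are placeholder strings standing in for audio.

```python
def synthesize(voice_segments, background_segments):
    """Pair voice and background segments that share a context label."""
    bg_by_context = {ctx: audio for ctx, audio in background_segments}
    merged = []
    for ctx, speech in voice_segments:
        # Fall back to silence when no background shares this context.
        merged.append((ctx, speech, bg_by_context.get(ctx, "silence")))
    return merged

voice = [("low_tide", "quiet narration"), ("climax", "excited narration")]
background = [("climax", "swelling music"), ("low_tide", "soft pad")]
print(synthesize(voice, background))
```

Note that the pairing is by label, not by position, which is why the background segments may arrive in any order.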
Embodiments of the present application also provide an electronic device. Fig. 3 is a structural schematic diagram of the electronic device of an embodiment of the present application; as shown in Fig. 3, the device includes an identification module, a processing module, and a synthesis module.
The identification module is configured to perform semantic recognition on the content of acquired pre-broadcast information. The pre-broadcast information includes multiple types of information such as text information, audio information, or pre-stored information, and its content has a corresponding semantics. Semantics is the meaning of data, that is, the meanings of the concepts represented by the real-world things the data corresponds to, together with the relationships between those meanings; it is the interpretation and logical representation of the data in some field. In one embodiment, the identification module may use a pre-stored knowledge base when recognizing the semantics: corresponding domain data can be obtained from the knowledge base according to the identified information, and semantic recognition of the content of the pre-broadcast information can then be further performed in combination with that domain data, so that the recognition result is closer to the actual situation.
The processing module is configured to generate, according to the recognition result, background audio and voice information corresponding to the pre-broadcast information. After semantic recognition of the content of the pre-broadcast information, the corresponding usage scene is known; for example, the pre-broadcast information may be information related to a historical event, information related to a literary work (such as a contemporary lyric prose or a horror story), information related to a real-life scene (such as a cocktail party or a sports meeting), or information in which one or more kinds of scenes are mixed. In this embodiment, the processing module can generate background audio corresponding to the pre-broadcast information according to the recognition result; the background audio may be adapted to the entire content of the pre-broadcast information or to its local key content. For example, when the pre-broadcast information is a lyric prose, the processing module can generate a soothing first background sound corresponding to the lyric prose; when the pre-broadcast information is a horror story, the processing module can generate a second background sound corresponding to the horror story, and the second background sound can dub the thrilling points of the horror story emphatically, thereby increasing the terrifying effect when the horror story is broadcast. In addition, in this embodiment the processing module also needs to generate voice information corresponding to the pre-broadcast information according to the recognition result. Because the semantics of the content of the pre-broadcast information is already known, the generated voice information can be closer to what the original pre-broadcast information intends to express; for example, when the pre-broadcast information is a lyric prose, the corresponding voice information can better satisfy requirements such as the tone, intonation, rhythm, and stress of the lyric prose, so as to be closer to a real human voice.
The synthesis module is configured to synthesize the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information. In one embodiment, the synthesis module makes the entire content of the background audio correspond to the entire content of the voice information, so that in the generated broadcast audio the user can hear background audio matching the voice broadcast content while hearing the voice broadcast; this enriches the content of the broadcast audio, meets user demand, and provides a good user experience. In another embodiment, when performing the synthesis the synthesis module aligns at least one key audio point in the background audio with the corresponding key voice point in the voice information; that is, in the generated broadcast audio the key audio point is played at the same time as the key voice point, so that the content at the key voice point can be expressed more emphatically, enriching the expressive effect.
In one embodiment of the application, the electronic device further includes a playback module configured to play the broadcast audio based on a selected voice bank, where the selected voice bank is one voice bank in a preset sound database. The preset sound database contains at least one voice bank, and different voice banks have different sound characteristics. For example, the voice banks established by the building module may include: a male voice bank, which expresses sound using a standard male voice (for example, a male voice meeting a preset standard); a female voice bank, which expresses sound using a standard female voice (for example, a female voice meeting a preset standard); a mixed male-and-female voice bank, which, according to the specific content or requirements of the broadcast audio, expresses sound using male and female voices simultaneously or in alternation; and a dialect bank, which expresses sound using a preset dialect. There may additionally be various other voice banks, such as a fast-speech bank, a slow-speech bank, or a child-voice bank. One or more voice banks can be established and stored according to user needs, so that they can be quickly invoked when used.
When the broadcast audio is played, the voice information is played using the selected voice bank. For example, if the user selects the male voice bank, a male voice is used to play the voice information in the broadcast audio, that is, to broadcast the content of the voice information (for example, to read a lyrical prose piece aloud), while the background audio in the broadcast audio is played at the same time. A voice bank may of course also be selected according to the specific voice information being played; for example, when the voice information is a joke, the dialect bank may be chosen to play the broadcast audio, improving the expressive effect of the sound.
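As a rough illustration of selecting a voice bank, the following sketch assumes a preset sound database keyed by bank name; the bank names, attributes, and fallback default are invented for the example and are not taken from the patent.

```python
# Illustrative only: choosing a voice bank from a preset sound database.
from typing import Optional

PRESET_SOUND_DB = {
    "male": {"voice": "standard male"},
    "female": {"voice": "standard female"},
    "mixed": {"voice": "male and female, simultaneous or alternating"},
    "dialect": {"voice": "preset dialect"},
    "child": {"voice": "child"},
}

def select_voice_bank(requested: Optional[str], content_type: str) -> str:
    """Prefer the user's explicit choice; otherwise pick a bank suited to
    the content, e.g. the dialect bank for a joke."""
    if requested in PRESET_SOUND_DB:
        return requested
    if content_type == "joke":
        return "dialect"
    return "female"  # assumed default bank
```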
In one embodiment of the application, the electronic device further includes a building module configured to: receive a speech sample, preprocess the speech sample to establish a custom voice bank, and add the custom voice bank to the preset sound database. Each user has individual preferences: some like broadcasts in a sweet female voice, some prefer a resonant male voice, some prefer a child's voice, and some even prefer the voice of a specific person (for example, their own relatives or friends). Establishing a custom voice bank is the basis for realizing this goal. The received speech sample is a sample of the voice the user prefers; in other words, the user wishes to have the broadcast audio played in a voice identical to that of the speech sample, and the building module can establish the custom voice bank based on that sample.
Preferably, the building module is further configured to: remove individual pronunciation differences and/or environmental noise from the speech sample; remove redundant information from the speech sample so as to retain key information; determine the voice units in the speech sample; recognize the voice units according to the grammar corresponding to the key information, thereby obtaining the language mode of the speech sample; and build the custom voice bank based on that language mode. The preprocessing that removes individual pronunciation differences and/or environmental noise makes the speech sample clearer and its semantics more accurate, laying the foundation for further processing. Spoken language may contain repeated or useless information, such as verbal fillers, interjections, or words repeated many times; in this embodiment the building module removes this redundant information and retains the key information, so that recognition accuracy can be improved by making better use of the key information. Different written and spoken languages have different voice units: in English, each word may serve as a voice unit, while in Chinese, each independent character or independent word may serve as a voice unit. When determining the voice units in the speech sample, the building module can decide according to the actual situation of the sample, for example according to the language of the sample (Chinese or English) or the expressive habits in the sample. Different speech samples correspond to different grammars; for example, Chinese follows Chinese grammar and English follows English grammar. Because the key information is what remains after hard-to-recognize information has been removed, it is relatively regular and well-formed, so the building module can accurately recognize the voice units according to the grammar corresponding to the key information and thereby obtain the language mode of the speech sample. Different speech samples have their own language modes; a language mode contains the various features of its speech sample, and based on those features other voice information can be expressed in the same form. Because the language mode corresponds to the speech sample provided by the user, the custom voice bank built from it exhibits the same features as that sample, such as the same sound characteristics and expressive habits, so that the custom voice bank better meets the user's needs.
In one embodiment of the application, the identification module is further configured to: identify context relation data corresponding to the content of the pre-broadcast information, and determine the context of the pre-broadcast information according to the context relation data. Specifically, the context of the pre-broadcast information can be known more accurately from its entire content, avoiding the deviation that can arise when a result obtained only from local content differs from the actual pre-broadcast information. In this embodiment, considering the entire content, the context of the pre-broadcast information can be determined from the context relations within that content; for example, the identification module identifies the relation between a preceding sentence and a following sentence, or between a preceding paragraph and a following paragraph, to determine the context of the pre-broadcast information more accurately.
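A toy stand-in for the context relation between adjacent sentences is a word-overlap score; the real identification module could use any semantic measure, so the following is purely illustrative.

```python
# Illustrative only: a word-overlap heuristic standing in for the "context
# relation data" between adjacent sentences or paragraphs.
def sentence_relation(prev: str, nxt: str) -> float:
    """Return an overlap score in [0, 1] between two adjacent sentences
    (Jaccard similarity over lowercase word sets)."""
    a, b = set(prev.lower().split()), set(nxt.lower().split())
    return len(a & b) / max(len(a | b), 1)
```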
In one embodiment of the application, the synthesis module is further configured to: divide the voice information and the background audio according to the context of the pre-broadcast information, and perform the synthesis operation on voice information and background audio having the same context. Specifically, the voice information can be divided into multiple parts according to differences in context, such as a content low-ebb part, a content climax part, and a content resolution part; likewise, the background audio can be divided into multiple parts according to context, such as a sound low-ebb part, a sound climax part, and a sound resolution part. Preferably, the synthesis module synthesizes voice information and background audio of the same context so that content and sound match: for example, the content low-ebb part is synthesized with the sound low-ebb part, and the content climax part with the sound climax part, making the generated broadcast audio more vivid. In addition, in this embodiment the synthesis module can also perform the synthesis operation according to the content at at least one key point in the voice information: for example, when a horror story is synthesized and broadcast, the background audio played at a thrilling point is an accented or deep-bass sound, adding a terrifying effect at that point and giving the user an immersive feeling. The synthesis module can of course also perform the synthesis operation according to the entire content of the voice information, so that the background audio matches the voice information throughout; for example, when a lyrical prose piece is broadcast, a piece of soothing music can be played for the entire duration of the broadcast.
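The same-context synthesis step reduces to pairing each voice segment with the background segment that carries the same context label. The labels below ("low", "climax") echo the examples in the description, while the data layout is an assumption made for the sketch.

```python
# Minimal pairing sketch for "same-context" synthesis: each voice segment is
# matched with the background segment sharing its context label before mixing.
from typing import Dict, List, Tuple

def pair_by_context(voice_parts: Dict[str, str],
                    bg_parts: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each voice segment with the background segment of the same context;
    segments without a matching context are skipped."""
    return [(voice_parts[c], bg_parts[c]) for c in voice_parts if c in bg_parts]
```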
The above embodiments are only exemplary embodiments of the application and are not intended to limit it; the protection scope of the application is defined by the claims. Those skilled in the art may make various modifications or equivalent substitutions within the spirit and protection scope of the application, and such modifications or equivalent substitutions shall also be regarded as falling within the protection scope of the application.
Claims (10)
1. A speech processing method, comprising:
performing semantic recognition on the content of acquired pre-broadcast information;
generating background audio and voice information corresponding to the pre-broadcast information according to the recognition result; and
performing a synthesis operation on the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information.
2. The method according to claim 1, further comprising, after generating the broadcast audio corresponding to the pre-broadcast information:
playing the broadcast audio based on a selected voice bank, wherein the selected voice bank is one voice bank in a preset sound database.
3. The method according to claim 2, further comprising:
receiving a speech sample, preprocessing the speech sample to establish a custom voice bank, and adding the custom voice bank to the preset sound database.
4. The method according to claim 3, wherein preprocessing the speech sample to establish the custom voice bank comprises:
removing individual pronunciation differences and/or environmental noise from the speech sample, and removing redundant information from the speech sample so as to retain key information;
determining voice units in the speech sample;
recognizing the voice units according to a grammar corresponding to the key information, to obtain a language mode of the speech sample; and
building the custom voice bank based on the language mode.
5. The method according to claim 1, wherein performing semantic recognition on the content of the acquired pre-broadcast information comprises:
identifying context relation data corresponding to the content of the pre-broadcast information, and determining the context of the pre-broadcast information according to the context relation data.
6. The method according to claim 5, wherein performing the synthesis operation on the background audio and the voice information to generate the broadcast audio corresponding to the pre-broadcast information comprises:
dividing the voice information and the background audio according to the context of the pre-broadcast information; and
performing the synthesis operation on voice information and background audio having the same context.
7. An electronic device, comprising an identification module, a processing module, and a synthesis module, wherein:
the identification module is configured to perform semantic recognition on the content of acquired pre-broadcast information;
the processing module is configured to generate background audio and voice information corresponding to the pre-broadcast information according to the recognition result; and
the synthesis module is configured to perform a synthesis operation on the background audio and the voice information to generate broadcast audio corresponding to the pre-broadcast information.
8. The electronic device according to claim 7, further comprising a playback module configured to play the broadcast audio based on a selected voice bank, wherein the selected voice bank is one voice bank in a preset sound database.
9. The electronic device according to claim 8, further comprising a building module configured to:
receive a speech sample, preprocess the speech sample to establish a custom voice bank, and add the custom voice bank to the preset sound database.
10. The electronic device according to claim 9, wherein the building module is further configured to:
remove individual pronunciation differences and/or environmental noise from the speech sample, and remove redundant information from the speech sample so as to retain key information;
determine voice units in the speech sample;
recognize the voice units according to a grammar corresponding to the key information, to obtain a language mode of the speech sample; and
build the custom voice bank based on the language mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810857848.4A CN109036373A (en) | 2018-07-31 | 2018-07-31 | A kind of method of speech processing and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109036373A true CN109036373A (en) | 2018-12-18 |
Family
ID=64647187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810857848.4A Pending CN109036373A (en) | 2018-07-31 | 2018-07-31 | A kind of method of speech processing and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109036373A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101000767A (en) * | 2006-01-09 | 2007-07-18 | 杭州世导科技有限公司 | Speech recognition equipment and method |
CN101567186A (en) * | 2008-04-23 | 2009-10-28 | 索尼爱立信移动通信日本株式会社 | Speech synthesis apparatus, method, program, system, and portable information terminal |
CN103680492A (en) * | 2012-09-24 | 2014-03-26 | Lg电子株式会社 | Mobile terminal and controlling method thereof |
US20150066518A1 (en) * | 2013-09-05 | 2015-03-05 | Electronics And Telecommunications Research Institute | Audio encoding apparatus and method, audio decoding apparatus and method, and audio reproducing apparatus |
CN105096932A (en) * | 2015-07-14 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Voice synthesis method and apparatus of talking book |
CN107464555A (en) * | 2016-06-03 | 2017-12-12 | 索尼移动通讯有限公司 | Background sound is added to the voice data comprising voice |
CN108010512A (en) * | 2017-12-05 | 2018-05-08 | 广东小天才科技有限公司 | The acquisition methods and recording terminal of a kind of audio |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109981448A (en) * | 2019-03-28 | 2019-07-05 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN109981448B (en) * | 2019-03-28 | 2022-03-25 | 联想(北京)有限公司 | Information processing method and electronic device |
CN112107773A (en) * | 2020-09-08 | 2020-12-22 | 杭州趣安科技有限公司 | Audio processing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lindsey | English after RP: Standard British pronunciation today | |
Elbow | Voice in writing again: Embracing contraries | |
Peterson | The art of language invention: From horse-lords to Dark Elves to sand worms, the words behind world-building | |
Foley | The singer of tales in performance | |
US6847931B2 (en) | Expressive parsing in computerized conversion of text to speech | |
Mithen | The singing Neanderthals: The origins of music, language, mind, and body | |
Bird | How I stopped dreading and learned to love transcription | |
Pieraccini | The voice in the machine: building computers that understand speech | |
US6865533B2 (en) | Text to speech | |
Jeffries | Discovering language: The structure of modern English | |
Yasar | Electrified Voices: How the Telephone, Phonograph, and Radio Shaped Modern Japan, 1868–1945 | |
Setter | Your voice speaks volumes: it's not what you say, but how you say it | |
Oestreich | Performance Criticism of the Pauline Letters | |
Snaith | Sound and literature | |
Clark | Staging language: Place and identity in the enactment, performance and representation of regional dialects | |
CN108986785B (en) | Text recomposition method and device | |
CN109036373A (en) | A kind of method of speech processing and electronic equipment | |
Werner et al. | Optionality and variability of speech pauses in read speech across languages and rates | |
Nilsenová et al. | Prosodic adaptation in language learning | |
Aaron et al. | Conversational computers | |
Poss | Hmong music and language cognition: An interdisciplinary investigation | |
Shankar | Speaking on the record: A theory of composition | |
Catlin | Puzzling the text: Thought-songs, secret languages, and archaic tones in Hmong music | |
Sandoval et al. | Thinking like a linguist: an introduction to the science of language | |
Osuolale-Ajayi | Discourse and Humour Strategies in Two-Person Stand-up Art in Nigeria |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||