CN106384593B - Voice information conversion method, information generation method, and device - Google Patents
- Publication number
- CN106384593B CN106384593B CN201610801720.7A CN201610801720A CN106384593B CN 106384593 B CN106384593 B CN 106384593B CN 201610801720 A CN201610801720 A CN 201610801720A CN 106384593 B CN106384593 B CN 106384593B
- Authority
- CN
- China
- Prior art keywords
- information
- target voice
- text conversion
- client
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/06—Message adaptation to terminal or network requirements
- H04L51/066—Format adaptation, e.g. format conversion or compression
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The embodiments of the present application disclose a voice information conversion method, an information generation method, and corresponding devices, relating to the field of computer technology and applied to an electronic device. The voice information conversion method includes: receiving target voice information; and, when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information. With the solutions provided by the embodiments of the present application, speech can be converted into text.
Description
Technical field
The present application relates to the field of computer technology, and in particular to a voice information conversion method, an information generation method, and corresponding devices.
Background technique
With the rapid development of computer technology, various communication clients for user terminals have emerged. Users can communicate with friends through these clients.
When communicating through such a client, a user may exchange text messages or voice messages, which is very convenient. However, in some situations (for example, while attending a meeting, in a noisy environment, or when the user does not want others to overhear), it is inconvenient to play back a received voice message. In such cases it is desirable to convert the received voice message into text and display it to the user.
In view of the above, a voice information conversion method is needed to convert voice information into text information.
Summary of the invention
The embodiments of the present application disclose a voice information conversion method, an information generation method, and corresponding devices, so as to convert voice information into text information.
To achieve the above objective, an embodiment of the present application discloses a voice information conversion method applied to an electronic device. The method includes:
receiving target voice information;
when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
To achieve the above objective, an embodiment of the present application discloses an information generation method. The method includes:
receiving target voice information sent by a source client;
performing speech recognition on the target voice information according to a first user voice library of a first user corresponding to the target voice information;
sending the recognition result to the source client;
receiving correction information for the recognition result sent by the source client;
updating the first user voice library according to the correction information.
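The receive-recognize-correct-update loop above can be sketched minimally as follows. This is an illustrative reduction, not the patent's implementation: the user voice library is stood in for by a plain dict keyed on a pinyin-like pronunciation string, and the function names (`handle_voice`, `apply_correction`) are hypothetical.

```python
def handle_voice(user_library, pronunciation):
    # Server side: recognize the incoming voice using the sender's personal
    # library and return the result to the source client.
    return user_library.get(pronunciation, "<unknown>")

def apply_correction(user_library, pronunciation, corrected_text):
    # The correction information sent back by the source client overwrites
    # (or creates) the entry, so the user's unique pronunciation maps to the
    # right text on subsequent messages.
    user_library[pronunciation] = corrected_text

lib = {"wa2 er": "蛙儿"}                  # an initially wrong entry
first = handle_voice(lib, "wa2 er")       # recognition result sent to the client
apply_correction(lib, "wa2 er", "孩子")   # client's correction is folded back in
second = handle_voice(lib, "wa2 er")
```

After the correction is applied, the same pronunciation resolves to the corrected text, which is the training effect the method describes.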
To achieve the above objective, an embodiment of the present application discloses a voice information conversion device applied to an electronic device. The device includes:
an information receiving module, configured to receive target voice information;
a speech recognition module, configured to perform speech recognition on the target voice information when an information conversion condition is met, obtaining text conversion information so that a client displays the text conversion information based on the display position of the target voice information.
To achieve the above objective, an embodiment of the present application discloses an information generation device. The device includes:
an information receiving module, configured to receive target voice information sent by a source client;
a speech recognition module, configured to perform speech recognition on the target voice information according to a first user voice library of a first user corresponding to the target voice information;
a result sending module, configured to send the recognition result to the source client;
a correction information receiving module, configured to receive correction information for the recognition result sent by the source client;
a voice library update module, configured to update the first user voice library according to the correction information.
As can be seen from the above, in the solutions provided by the embodiments of the present application, after target voice information is received, speech recognition is performed on it to obtain text conversion information when the information conversion condition is met, and the client can then display the text conversion information based on the display position of the target voice information. Thus, with the solutions provided by the embodiments of the present application, voice information can be converted into text information.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a first voice information conversion method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of a second voice information conversion method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a third voice information conversion method provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of a fourth voice information conversion method provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of a fifth voice information conversion method provided by an embodiment of the present application;
Fig. 6 is a schematic flowchart of a sixth voice information conversion method provided by an embodiment of the present application;
Fig. 7 is a schematic flowchart of an information generation method provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a voice information conversion device provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an information generation device provided by an embodiment of the present application;
Fig. 10a is a schematic diagram of a first voice information conversion effect provided by an embodiment of the present application;
Fig. 10b is a schematic diagram of a second voice information conversion effect provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Concepts involved in the embodiments of the present application are first introduced below.
1. Voice library
A voice library stores correspondences between voice information and text information, and contains at least one such correspondence. The text information corresponding to a piece of voice information may be in a single language or in multiple languages; the present application does not limit this.
In addition, a voice library may be one built into the system, or one trained during use.
For example, a piece of voice information with the pronunciation "hai zi" can correspond to the word "child" (孩子), while in the Sichuan dialect the same pronunciation can correspond to "shoes" (鞋子).
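As a rough sketch of the correspondence store described above, assuming purely for illustration that pronunciations can be keyed as pinyin-like strings (a real system would match acoustic features, and the class and method names here are invented):

```python
class VoiceLibrary:
    """A store of pronunciation -> text correspondences (illustrative only)."""

    def __init__(self, name, pairs=None):
        self.name = name
        self.pairs = dict(pairs or {})

    def add_pair(self, pronunciation, text):
        # A trained library grows by adding correspondences during use.
        self.pairs[pronunciation] = text

    def lookup(self, pronunciation):
        # Returns None when the library holds no correspondence for this sound.
        return self.pairs.get(pronunciation)

# The same sound can map to different text in different libraries:
mandarin = VoiceLibrary("Mandarin", {"hai2 zi5": "孩子"})   # "child"
sichuan = VoiceLibrary("Sichuan", {"hai2 zi5": "鞋子"})     # "shoes" in dialect
```

The two instances illustrate why per-dialect and per-user libraries matter: one pronunciation key resolves differently depending on which library is consulted.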
2. Voice library set
A voice library set contains at least one voice library. These libraries can be divided into standard voice libraries, common voice libraries, categorized voice libraries, and so on.
A standard voice library is a library for a standard form of a language, for example a library for Mandarin Chinese, a library for British English, or a library for American English.
A common voice library is a library selected according to the user's own situation; it may be a library chosen by the client according to the geographical location of the current device, or a library chosen by the client according to the current language setting.
A categorized voice library is a library obtained according to some classification basis.
For example, by language type, libraries can be divided into a Mandarin Chinese library, a Sichuan-dialect library, a Cantonese library, an English library, a German library, a French library, a Russian library, and so on; by professional field, libraries can be divided into a legal library, a computer library, an economics library, and so on.
In addition, existing methods for converting speech to text usually support only the recognition of Mandarin. In practice, however, many users speak with an accent, have their own distinctive pronunciations of certain words, or frequently mispronounce some words. Communication between users of different countries in different languages is also becoming more frequent, and such communication usually takes place over wireless networks, where bandwidth and network conditions are complex. As a result, existing speech-to-text methods cannot effectively recognize all speech and convert it smoothly into text, and cannot meet the needs of most users.
Since speech has personal characteristics, that is, each user differs in accent, pronunciation, clarity, speaking rate, and tone, a voice library can be established for each user, referred to as a user voice library, and the voice library set may contain the user voice library of each user.
Specifically, when establishing a voice library for a user, an initial library can first be set. Then, during use, the initial library is continually corrected according to the user's voice information, recognition results, and correction information, to obtain the user voice library for that user, so that each user's unique pronunciations can be mapped to the correct Chinese text or English words.
The initial library can be selected manually. For example, if the user normally speaks Mandarin, the initial library can be the Mandarin Chinese library; if the user normally speaks the Sichuan dialect, the initial library can be the Sichuan-dialect library. The initial library may also be a default voice library; the present application does not limit this.
Fig. 1 is a schematic flowchart of a first voice information conversion method provided by an embodiment of the present application. The method is applied to an electronic device.
Specifically, the electronic device serving as the execution subject of this embodiment may be a server or a user terminal. Further, when the electronic device is a user terminal, various clients are usually installed on the terminal; in that case, the execution subject of the solution may also be understood as a client.
Specifically, the method includes:
S101: receiving target voice information.
S102: when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
The text conversion information includes text in at least one form. For example, it may contain only Chinese, or both Chinese and English, or multiple forms such as Chinese, English, French, and German.
Specifically, when performing speech recognition on the target voice information to obtain text conversion information once the information conversion condition is met, the recognition may be carried out directly after the target voice information is received, or only after an information conversion instruction for the target voice information is detected.
That is, receiving the target voice information itself may be considered to meet the information conversion condition, or the condition may be considered met only after an information conversion instruction is received.
In addition, the text information obtained by performing speech recognition on the target voice information may contain consecutively repeated words, or words that obviously do not match normal speech habits. Therefore, after the text information is obtained, it can first be corrected according to preset correction rules, and the text conversion information of the target voice information is then obtained from the corrected text. Of course, the text information may also be used directly as the text conversion information of the target voice information; the present application takes the former as an example and does not limit this.
Specifically, the preset correction rules may include filtering out repeated words and replacing words that do not match speech habits with words that do. The present application does not limit the specific content of the preset correction rules.
For example, "I I like" is corrected to "I like";
"Your kiddo is very lovely" is corrected to "Your child is very lovely".
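A minimal sketch of such correction rules, assuming whitespace-delimited text for simplicity (the function name and the habit-word table are illustrative, not from the patent):

```python
def apply_correction_rules(text, habit_fixes=None):
    # Rule 1: collapse immediately repeated words ("I I like" -> "I like").
    deduped = []
    for word in text.split():
        if not deduped or deduped[-1] != word:
            deduped.append(word)
    result = " ".join(deduped)
    # Rule 2: replace words that do not match normal speech habits with
    # their standard counterparts, using a preset lookup table.
    for habit, standard in (habit_fixes or {}).items():
        result = result.replace(habit, standard)
    return result

print(apply_correction_rules("I I like it"))  # -> I like it
print(apply_correction_rules("your kiddo is very lovely",
                             {"kiddo": "child"}))  # -> your child is very lovely
```

Chinese text would need character-level rather than whitespace handling, but the two rules (deduplication, then habit-word substitution) follow the examples above directly.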
The following describes different cases according to whether the execution subject is a server or a client.
Assume that a first client and a second client are communicating.
Case one: the execution subject is a server.
The first client sends the target voice information to the server; after receiving it, the server forwards the target voice information to the second client, and the second client displays it on its display interface.
In one situation, the server starts performing speech recognition on the target voice information to obtain text conversion information as soon as it receives the information. When the server later receives an information conversion instruction from the second client, it sends the text conversion information to the second client, which displays it based on the display position of the target voice information. Because recognition starts as soon as the voice information is received, the server can respond quickly once the second client's instruction arrives; the delay in delivering the text conversion information is small, and the user experience is good.
In another situation, the server does not start recognition immediately after receiving the target voice information, but only after receiving the information conversion instruction from the second client. In this way, the server performs recognition only when the client actually needs the conversion, which effectively saves server resources.
Case two: the execution subject is a server.
The first client sends the target voice information directly to the second client, while the first client or the second client also sends the target voice information to the server. After receiving the target voice information, the second client displays it on its display interface.
As in case one, the server may either start speech recognition as soon as it receives the target voice information, so that it can respond quickly with the text conversion information once the second client's information conversion instruction arrives, or defer recognition until the instruction is received, so that recognition is performed only when the client actually needs the conversion, saving server resources.
Case three: the execution subject is a server.
The first client sends the target voice information to the second client, and the second client displays it on its display interface.
When the second client needs voice-to-text conversion, it sends an information conversion instruction carrying the target voice information to the server. After receiving the instruction, the server parses the target voice information out of it, performs speech recognition to obtain the text conversion information, and sends the result to the second client, which displays it.
Case four: the execution subject is a client.
The first client sends the target voice information directly to the second client, which displays it on its display interface. In this case the voice communication does not pass through a server, so other devices cannot obtain the target voice information, which improves the security of the communication to a certain extent.
In one situation, the second client starts performing speech recognition to obtain the text conversion information as soon as it receives the target voice information; after receiving an information conversion instruction, it directly displays the text conversion information based on the display position of the target voice information.
In another situation, after receiving the target voice information the second client only displays it, without starting recognition; only after receiving an information conversion instruction does it perform speech recognition to obtain the text conversion information of the target voice information, and then display it based on the display position of the target voice information.
In cases one, two, and three, the second client may send the information conversion instruction to the server after receiving a long-press on the target voice information, or after receiving the user's selection of a voice conversion option. The present application does not limit the condition that triggers the second client to send the information conversion instruction to the server.
In case four, a long-press on the target voice information, or the user's selection of a voice conversion option, may itself be regarded as the second client having received an information conversion instruction.
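The trigger logic discussed above amounts to a small predicate over two policies. A sketch, with hypothetical event names standing in for the client's actual UI events:

```python
def meets_switch_condition(mode, event=None):
    # "eager": merely receiving the voice message meets the condition,
    # so recognition can start immediately on receipt.
    if mode == "eager":
        return True
    # "on-demand": only an explicit conversion instruction meets it,
    # e.g. a long-press on the message bubble or a menu selection.
    if mode == "on-demand":
        return event in ("long_press", "convert_menu_selected")
    return False
```

The eager policy trades server or client compute for low display latency; the on-demand policy does the reverse, matching the two situations described for each case above.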
In addition, when the client displays the text conversion information based on the display position of the target voice information, it may display the text in a preset direction relative to that position within the page that displays the target voice information, for example below or above it. The distance between the text conversion information and the target voice information can be set according to actual needs; the present application does not limit the specific display position, which can be determined according to the actual situation. Fig. 10a and Fig. 10b are schematic diagrams showing the text conversion information displayed below and above the display position of the target voice information, respectively.
As can be seen from the above, in the solution provided by this embodiment, after the target voice information is received, speech recognition is performed on it to obtain text conversion information when the information conversion condition is met, and the client can then display the text conversion information based on the display position of the target voice information. Thus, with the solution provided by this embodiment, voice information can be converted into text information.
In a specific implementation of the present application, referring to Fig. 2, a schematic flowchart of a second voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, performing speech recognition on the target voice information to obtain text conversion information when the information conversion condition is met, so that the client displays the text conversion information based on the display position of the target voice information (S102), includes:
S102A: when the information conversion condition is met, performing voice segmentation on the target voice information to obtain the voice segments contained in it.
Specifically, voice segmentation may be performed using amplitude information of the sound in the voice information, for example by detecting pauses indicated by the amplitude, or according to a fixed data length. The present application does not limit the specific segmentation method.
In short, segmenting voice information can be understood simply as cutting it into pieces, each resulting segment corresponding to a small piece of voice information. That is, a voice segment is a piece of voice information whose data length is smaller than that of the target voice information.
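One way to sketch the pause-based variant of segmentation, assuming the voice information is available as a sequence of amplitude samples (the threshold and gap length below are arbitrary illustrative values, not from the patent):

```python
def segment_by_pauses(samples, threshold=0.05, min_gap=3):
    # Cut the amplitude sequence wherever `min_gap` consecutive samples stay
    # below `threshold` (a pause); each resulting piece is one "voice segment".
    segments, current, quiet = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet += 1
            if quiet >= min_gap and current:
                segments.append(current)
                current = []
        else:
            current.append(s)
            quiet = 0
    if current:
        segments.append(current)
    return segments

pieces = segment_by_pauses([0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.8])
# two segments: [0.5, 0.6] and [0.7, 0.8]
```

The fixed-data-length alternative mentioned above would simply slice the sample list into equal chunks instead of looking for quiet runs.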
S102B: for each voice segment obtained, selecting a voice library from the voice library set according to a preset library selection rule, and performing speech recognition on the segment with the selected library to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
The preset library selection rule may be a rule specifying the order in which voice libraries are tried. During recognition, a library is first selected according to the order specified in the rule and used to recognize the voice segment; if the recognition result shows a low recognition rate, the next library in the order is selected and the segment is recognized again, until the recognition rate meets a preset requirement.
For example, suppose the preset library selection rule is: first select the user voice library, and select the standard voice library if the result from the user voice library does not meet the preset requirement. Then, when recognizing a voice segment, recognition is first performed with the current user's user voice library. If the result shows a recognition rate above a preset threshold, that result is taken as the speech recognition result of the segment; otherwise, the segment is recognized with the standard voice library, and that result is taken as the speech recognition result of the segment.
Of course, the preset rule may also be the reverse: first select the standard voice library, and select the user voice library if the result from the standard library does not meet the preset requirement.
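The try-in-order-with-threshold behaviour described above can be sketched like this, with each library modelled as a callable returning a (text, confidence) pair; all names and the toy confidence values are illustrative assumptions:

```python
def recognize_with_fallback(segment, ordered_libraries, min_confidence=0.8):
    # Try each library in the order given by the selection rule; accept the
    # first result that clears the threshold, else fall back to the best seen.
    best_text, best_conf = "", 0.0
    for recognize in ordered_libraries:
        text, confidence = recognize(segment)
        if confidence >= min_confidence:
            return text
        if confidence > best_conf:
            best_text, best_conf = text, confidence
    return best_text

def make_library(table):
    # Toy recognizer: high confidence when the segment's key is in the table.
    def recognize(segment):
        return (table.get(segment, ""), 0.9 if segment in table else 0.1)
    return recognize

user_lib = make_library({"wa2 er": "孩子"})       # personal pronunciation
standard_lib = make_library({"hai2 zi5": "孩子"})  # standard pronunciation
```

With the user library first, a personal pronunciation is resolved immediately; a standard pronunciation falls through to the standard library, which is exactly the ordering behaviour the rule specifies.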
Specifically, the preset library selection rule may be determined according to at least one of the following:
- the category of the target group to which the first user who sent the target voice information was speaking: for example, if the group's category is law, the topics discussed there are likely to be law-related, so a legal voice library is preferred; if the category is IT, an IT voice library is preferred;
- the language of the target group's name: for example, if the name is in Chinese, the Mandarin Chinese library is preferred; if in English, the English library is preferred;
- attribute information of the first user, which may include gender, age, and so on: for example, if the first user's attributes are gender female and age 5, a female-voice library and a child-voice library are preferred;
- the geographical location of the source client that sent the target voice information, which to a certain extent reflects the language the user is likely to use: for example, if the source client is in Beijing, the user is likely to speak Mandarin, so the standard library can be preferred; if in Britain, the user is likely to speak English, so the English library can be preferred. The location can be obtained from the source client's IP address, mobile signal, GPS information, and so on;
- among the voice segments corresponding to the first user's text information and/or voice information stored before the target voice information was received, the voice libraries to which the first preset number of most frequently occurring voice segments belong;
- among the voice segments corresponding to the target group's text information and/or voice information stored before the target voice information was received, the voice libraries to which the second preset number of most frequently occurring voice segments belong;
- a library selection order set by the user.
It should be noted that the above is merely illustrative; in practical applications, the specific content of the preset speech library selection rule is not limited thereto.
In one implementation of the present application, the preset speech library selection rule determined in the above manner can be combined with user speech libraries when performing speech recognition. Specifically, a group generally contains multiple users, and the speech library set may include a user speech library for each user. When performing speech recognition, the user speech library of the user who sent the voice information may be used first; if the recognition result does not meet the requirement, the speech library determined from the group's information is selected next for recognition; if the recognition result still does not meet the requirement, a common speech library, a standard speech library, or the like may be selected for speech recognition. For example, when performing speech recognition on voice information sent by user A in a law-related group, the user speech library of user A may be used first, and if the recognition result does not meet the requirement, the law-category speech library is selected for speech recognition.
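The fallback order just described (user library first, then the group-determined library, then common/standard libraries) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `SpeechLibrary` and its confidence scores are hypothetical stand-ins for a real recognition engine.

```python
class SpeechLibrary:
    """Hypothetical stand-in for a speech library; `scores` maps a
    voice segment to a (text, confidence) recognition result."""
    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

    def recognize(self, segment):
        return self.scores.get(segment, ("", 0.0))


def recognize_with_fallback(segment, libraries, min_confidence=0.8):
    """Try each library in priority order (user library first, then the
    group-derived library, then common/standard libraries); return the
    first result that meets the requirement, else the best result seen."""
    best_text, best_conf, best_lib = "", -1.0, None
    for lib in libraries:
        text, conf = lib.recognize(segment)
        if conf >= min_confidence:
            return text, lib.name
        if conf > best_conf:
            best_text, best_conf, best_lib = text, conf, lib.name
    return best_text, best_lib
```

In the user-A example above, the user library would be consulted first and the law-category library would only be tried when the user library's result falls below the requirement.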
In addition, in this step, selecting a speech library from the speech library set and performing speech recognition on a voice word segment with the selected library is carried out for each individual voice word segment obtained: the above process is repeated for each of them.
As can be seen from the above, in the solution provided by this embodiment, voice word segmentation is first performed on the target voice information, and a speech library is then selected for each voice word segment to perform speech recognition. This improves the accuracy of speech recognition, especially when the target voice information contains voice information in different languages or from different users.
It should be understood that even for the same user, the accent may vary: sometimes the user communicates in Mandarin, and sometimes in dialect. In view of this, in a specific implementation of the present application, referring to Fig. 3, a flow diagram of a third voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, performing speech recognition on the target voice information to obtain text conversion information when the information conversion condition is met, so that the client displays the text conversion information based on the display position of the target voice information (S102), comprises:
S102C: when the information conversion condition is met, obtaining the first target voice segment of the target voice information according to a preset voice segment determination rule.
Specifically, the preset voice segment determination rule may be: detect the first pause in the target voice information, and determine the voice information from the start position of the target voice information to the detected pause position as the first target voice segment of the target voice information.
When detecting a pause in the target voice information, the amplitude information of the voice may be referred to, and of course other information may also be referred to. Methods for detecting pauses in voice information belong to the prior art and are not described in detail here.
In addition, the preset voice segment determination rule may also be: starting from the start position of the target voice information, select voice information of a preset length as the first target voice segment of the target voice information.
The preset length may be a preset fixed value, or a value determined according to the length of the target voice information.
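The first of the two segment-determination rules (cut the message at the first pause) can be sketched as follows, assuming the pause is detected from amplitude alone. The silence threshold and minimum pause length are illustrative values, not taken from the specification.

```python
def first_voice_segment(samples, silence_threshold=0.02, min_pause=3):
    """Return the samples from the start of the message up to the first
    pause, where a pause is `min_pause` consecutive samples whose
    absolute amplitude falls below `silence_threshold`."""
    run = 0
    for i, s in enumerate(samples):
        if abs(s) < silence_threshold:
            run += 1
            if run >= min_pause:
                # Cut just before the pause began.
                return samples[:i - run + 1]
        else:
            run = 0
    return samples  # no pause found: the whole message is one segment
```

A real detector would work on frames rather than raw samples and might also consult spectral features, as the text notes.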
S102D: using each speech library in the speech library set in turn, performing speech recognition on the target voice segment.
S102E: determining the speech library with the highest recognition rate as the target speech library.
S102F: performing speech recognition, with the target speech library, on the part of the target voice information other than the target voice segment, to obtain a first recognition result.
S102G: obtaining the text conversion information of the target voice information according to the first recognition result and a second recognition result, so that the client displays the text conversion information based on the display position of the target voice information.
Here, the second recognition result is the result of performing speech recognition on the target voice segment using the target speech library.
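Steps S102D and S102E amount to a trial pass of every library over the first segment, keeping the best scorer. A minimal sketch, where each recognizer is a hypothetical function returning a recognition rate in [0, 1]:

```python
def select_target_library(first_segment, recognizers):
    """`recognizers` maps library name -> function(segment) -> recognition
    rate in [0, 1] (S102D); the library scoring highest on the first
    target voice segment becomes the target library (S102E) used for the
    rest of the message (S102F)."""
    rates = {name: fn(first_segment) for name, fn in recognizers.items()}
    return max(rates, key=rates.get)
```

The library names and rates below are invented purely for illustration.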
Those skilled in the art will understand that determining the target voice segment may involve error, and using each speech library in the speech library set to perform speech recognition on the target voice segment may also involve error; therefore, the target speech library finally selected may not be the optimal speech library. In view of this, in an optional implementation of the present application, when obtaining the text conversion information of the target voice information according to the first recognition result and the second recognition result, a first recognition rate for the target speech library may be obtained according to the first recognition result and the second recognition result, and it is then judged whether the first recognition rate is less than a preset recognition rate threshold. If so, speech recognition is performed on the target voice information using a default speech library, and the text conversion information of the target voice information is obtained according to the recognition result for the default speech library.
Specifically, the default speech library may include one speech library or multiple speech libraries. For example, the default speech library may include a standard speech library, an English speech library, a user speech library, and so on. In addition, usage priority information may be specified for the speech libraries included in the default speech library. For example, it may be specified that the priority of the user speech library is higher than that of the standard speech library, and the priority of the standard speech library is higher than that of the English speech library. That is, the user speech library is selected first; if the recognition result is unsatisfactory, the standard speech library is selected for speech recognition; if that recognition result is also unsatisfactory, the English speech library is selected instead. If this recognition result meets the requirement, it is taken as the final recognition result; if it is still unsatisfactory, the best of the recognition results corresponding to the target speech library, the standard speech library, and the English speech library is taken as the final recognition result. The present application is only illustrated by the above example; in practical applications, users may set the priority of each speech library in the default speech library according to their own needs, which is not limited here, so that each user can have a personalized speech library recognition order.
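The threshold check and the priority walk through the default libraries described above might look like the following sketch. The threshold value and the library functions (each returning a text plus a recognition rate) are assumptions for illustration only.

```python
def recognize_with_default_fallback(message, target_lib, default_libs,
                                    threshold=0.7):
    """If the target library's recognition rate falls below `threshold`,
    walk the default libraries in their priority order (e.g. user ->
    standard -> English); if none meets the threshold, keep the best
    result seen across all libraries tried."""
    text, rate = target_lib(message)
    if rate >= threshold:
        return text
    candidates = [(rate, text)]
    for lib in default_libs:          # priority order
        text, rate = lib(message)
        if rate >= threshold:
            return text
        candidates.append((rate, text))
    return max(candidates)[1]         # best result among all tried
```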
As can be seen from the above, in the solution provided by this embodiment, the speech library used for speech recognition is determined according to the first target voice segment of the target voice information, rather than a fixed speech library being used, which helps to improve the accuracy of speech recognition.
The above voice information conversion method is further described in detail below through a specific example.
Suppose a Chinese user from Sichuan is having a voice conversation with a Japanese user. The Sichuan user sends, in a Sichuan accent, a voice message saying roughly "the shoes feel quite good to wear and fit physical engineering, very good" (in the Sichuan accent, the word for "shoes" sounds like the Mandarin word for "child"). After receiving the voice message, the Japanese user selects voice conversion, and the default language of the terminal used by the Japanese user is Japanese.
When recognizing "the shoes feel quite good to wear and fit physical engineering", it is found that the recognition rate is highest when the user speech library (Sichuan accent) is used, so this part of the voice is considered to be Chinese, and the Chinese recognition result obtained is "the shoes feel quite good to wear and fit physical engineering". Considering that the Japanese user may not understand it, English and Japanese recognition results may be provided as well. Specifically, the Chinese recognition result may first be converted into an English recognition result, and the English recognition result then converted into a Japanese recognition result; of course, the English and Japanese recognition results may also be obtained directly from the Chinese recognition result.
When recognizing "very good", it is found that the recognition rate is highest when the English speech library is used, so this part of the voice is considered to be English. The English recognition result "very good" can be obtained directly, and the Chinese recognition result ("fine") and the Japanese recognition result are then obtained from the English recognition result.
Finally, the Japanese recognition result, the English recognition result, and the Chinese recognition result are provided to the Japanese user together.
In addition, when the voice information contains Japanese speech, the recognition rate is higher when the Japanese speech library is used, so it can be determined that this part of the voice corresponds to Japanese; the Japanese recognition result is obtained directly and provided to the Japanese user, without conversion between other languages.
In a specific implementation of the present application, referring to Fig. 4, a flow diagram of a fourth voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, the voice information conversion method further comprises:
S103: sending the text conversion information to the first client, wherein the first client is the client that sent the target voice information.
S104: receiving the correction information for the text conversion information sent by the first client, and updating the text conversion information according to the correction information.
As described above, the execution subject of this embodiment may be a server or a client.
For a server, updating the text conversion information according to the correction information sent by the first client helps other clients that request information conversion to obtain a correct conversion result. In addition, when the server's speech library set contains user speech libraries, the server may also update the user speech library according to the correction information, which helps to improve the accuracy of subsequent speech recognition.
It is worth noting that, to improve the security of users' communication information, the server may store the voice information, text information, and so on exchanged between the first client and the second client only for a period of time, and delete them once the set time is reached. As for the user speech library, it stores only the word segment information determined from the user's voice information, not the user's complete voice information, so the security of the user's communication information can be well guaranteed.
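The time-limited storage policy can be sketched as a small store that purges messages older than a set TTL. The class and its interface are illustrative, not part of the specification; real deployments would persist and purge on the server side.

```python
import time

class MessageStore:
    """Messages between two clients are kept only for `ttl` seconds;
    after that they are deleted. (The user speech library would store
    only word-segment info, never the full audio.)"""
    def __init__(self, ttl):
        self.ttl = ttl
        self.messages = []            # list of (timestamp, payload)

    def add(self, payload, now=None):
        self.messages.append((now if now is not None else time.time(),
                              payload))

    def purge(self, now=None):
        """Drop every message older than the TTL."""
        now = now if now is not None else time.time()
        self.messages = [(t, p) for t, p in self.messages
                         if now - t < self.ttl]
```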
For a client, obtaining the correction information from the first client allows the client's recognition result to be corrected, so that the user sees an accurate recognition result, which helps to improve the user experience. In addition, when the client's speech library set also contains a user speech library, the client may update the user speech library according to the correction information, which likewise helps to improve the accuracy of subsequent speech recognition.
In a specific implementation of the present application, in the case where the electronic device is a server, the voice information conversion method may further comprise:
judging whether the text conversion information before updating has been sent to other clients; if so, sending a correction prompt to the second client that has received the text conversion information, and determining, according to the second client's feedback on the correction prompt, whether to send the updated text conversion information to the second client.
Specifically, if the second client's feedback indicates that no update according to the correction information is needed, the server does not need to send the updated text conversion information to the second client; if the second client does need the update, the server can directly send the updated text conversion information to the second client, and the second client displays the updated text conversion information to the user.
As described above, speech recognition can be performed using a user speech library. Thus, when performing speech recognition directly on the target voice information, a first user speech library corresponding to the first user may be used to perform speech recognition on the target voice information directly, wherein the first user is the user who sent the target voice information;
alternatively, when performing speech recognition on the target voice information, the first user speech library is used to perform the speech recognition.
Since the speech recognition uses the user speech library corresponding to the first user, and the user speech library may contain errors, to ensure high accuracy when the user speech library is subsequently reused for speech recognition, the first user speech library may also be further updated according to the correction information after the correction information is received.
As can be seen from the above, in the solution provided by this embodiment, after the text conversion information is obtained, it is sent to the first client that sent the target voice information, the correction information sent by the first client is received, and the text conversion information is updated according to the correction information, which makes the text conversion information of the target voice information more accurate.
In a specific implementation of the present application, referring to Fig. 5, a flow diagram of a fifth voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, performing speech recognition on the target voice information to obtain text conversion information when the information conversion condition is met, so that the client displays the text conversion information based on the display position of the target voice information, comprises:
S102H: when the information conversion condition is met, obtaining the frequencies of the audio frames contained in the target voice information.
S102I: dividing the target voice information into at least one audio section according to the obtained frequencies.
It should be understood that the target voice information may contain voice information from multiple users, and the audio frequencies of different users generally differ. When performing speech recognition on the target voice information, the target voice information can therefore first be divided into multiple audio sections according to the frequencies of the audio frames. Specifically, audio frames whose frequencies fall within a certain frequency range can be grouped into one audio section; one frequency range can be understood as corresponding to one user, and different frequency ranges correspond to different users.
Specifically, adjacent audio frames whose frequencies fall within one frequency range can be grouped into one audio section, and adjacent audio frames whose frequencies fall within another frequency range grouped into another audio section.
S102J: based on the frequency ranges of the divided audio sections, selecting a corresponding speech library for each frequency range from the speech library set, thereby determining the speech library corresponding to each audio section.
Suppose one frequency range corresponds to multiple audio sections; then a speech library can be selected from the speech library set for just one of those audio sections, and the selected speech library then used as the corresponding speech library of each of those audio sections when performing speech recognition.
S102K: performing speech recognition on each audio section using the speech library corresponding to that audio section.
Specifically, for each audio section, if the recognition result obtained with its corresponding speech library does not meet the requirement, other speech libraries can be further selected for recognition, for example, the speech library determined from the group information, a common speech library, a standard speech library, and so on, which is not limited in the present application.
S102L: obtaining the text conversion information of the target voice information according to the recognition results of the audio sections.
As can be seen from the above, in the solution provided by this embodiment, the target voice information is divided into different audio sections according to frequency, and a speech library is then selected for each audio section to perform speech recognition. In this way the speech library can be determined according to the specifics of each audio section, thereby obtaining a better speech recognition result.
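Step S102I's grouping of adjacent frames by frequency range can be sketched as follows, assuming each audio frame has already been reduced to a single dominant-frequency value in Hz. The band boundaries are illustrative.

```python
def split_by_frequency(frames, bands):
    """Group consecutive audio frames whose dominant frequency falls in
    the same band (S102I); each band is assumed to correspond to one
    speaker. `frames` is a list of dominant frequencies in Hz, `bands`
    a list of (low, high) ranges. Returns [band_index, [frequencies]]
    sections in order."""
    def band_of(freq):
        for i, (lo, hi) in enumerate(bands):
            if lo <= freq < hi:
                return i
        return None  # frame outside every known band

    sections = []
    for freq in frames:
        b = band_of(freq)
        if sections and sections[-1][0] == b:
            sections[-1][1].append(freq)   # same band as previous frame
        else:
            sections.append([b, [freq]])   # new section starts
    return sections
```

Note how a speaker returning later (the final 125 Hz frame in the test) opens a new section, matching the "adjacent frames" wording of the text.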
In a specific implementation of the present application, referring to Fig. 6, a flow diagram of a sixth voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, the voice information conversion method further comprises:
S105: receiving a meeting summary generation instruction.
S106: obtaining the text information, and the text conversion information corresponding to the voice information, for generating the meeting summary.
It should be noted that the speech recognition for the voice information may already have been completed before the corresponding text conversion information is obtained, in which case the text conversion information obtained after speech recognition can be obtained directly; if the speech recognition for the voice information has not yet been completed, speech recognition can be performed first, and the text conversion information then obtained from the recognition result.
S107: generating the meeting summary according to the obtained text information and the obtained text conversion information, in a preset meeting summary format.
The preset meeting summary format may include information such as the meeting time, the meeting duration, the participants, the speakers, the meeting minutes, and the meeting keywords.
Specifically, the meeting time and the meeting duration can be determined from the earliest and latest sending times of the text information and voice information used to generate the meeting summary.
The participants can be determined from information such as the users included in the group.
The meeting keywords can be determined by performing keyword extraction on the obtained text information and the obtained text conversion information. The extracted keywords may be numerous; a certain number of keywords can be selected as the final meeting keywords according to rules such as sorting by frequency of occurrence, and the extracted keywords can also be filtered according to a preset filtering rule, for example, filtering out function words such as "的" and "了", with the filtered keywords determined as the final meeting keywords.
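The keyword extraction just described (a frequency sort plus a stop-word filter) can be sketched as follows. The English stop list stands in for the function-word filter in the text, and the sort order is assumed to be most-frequent-first.

```python
from collections import Counter

# Analogue of filtering out function words such as "的" / "了".
STOPWORDS = {"the", "of", "and", "a", "to"}

def meeting_keywords(texts, count=3, stopwords=STOPWORDS):
    """Extract candidate keywords from the meeting's text information and
    text conversion information, filter them against a preset stop list,
    and keep the `count` most frequent as the final meeting keywords."""
    words = Counter()
    for text in texts:
        for w in text.lower().split():
            if w not in stopwords:
                words[w] += 1
    return [w for w, _ in words.most_common(count)]
```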
Specifically, generating the meeting summary according to the obtained text information and the obtained text conversion information, in the preset meeting summary format, comprises:
determining the user names and IP addresses of the users who sent the obtained text information and the obtained text conversion information;
determining the speakers according to the determined user names and IP addresses, and the frequencies of the audio frames in the voice information used to generate the meeting summary;
generating the meeting summary according to the obtained text information, the obtained text conversion information, and the speakers, in the preset meeting summary format.
The meeting summary may be in pure-text form or in multimedia form, that is, it may contain information such as voice information, pictures, video, and text.
Specifically, in one implementation of the present application, in addition to generating the meeting summary, a voice backup and/or a text backup may be generated along with the meeting summary, to facilitate later proofreading by staff.
Furthermore, correction information for the meeting summary may also be received, and the meeting summary updated according to the correction information.
As can be seen from the above, in the solution provided by this embodiment, there is no need for staff to manually edit and generate the meeting summary after the meeting ends, which relieves the workload of staff and improves work efficiency.
Fig. 7 is a flow diagram of an information generating method provided by an embodiment of the present application. The method comprises:
S701: receiving the target voice information sent by the source client.
S702: performing speech recognition on the target voice information according to a first user speech library corresponding to a first user of the target voice information.
S703: sending the recognition result to the source client.
S704: receiving the correction information for the recognition result sent by the source client.
S705: updating the first user speech library according to the correction information.
Specifically, the initial speech library of the first user speech library may be a preset standard speech library.
As can be seen from the above, in the solution provided by this embodiment, the target voice information sent by the source client is received and speech recognition is performed on it; the recognition result is then sent back to the source client, the source client corrects the recognition result, and the speech library is updated according to the correction information sent by the source client. In this way a personalized speech library can be generated for the user, which helps to obtain accurate recognition results when the user speech library is later used to perform speech recognition on the user's voice information.
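The S701–S705 loop, in which corrections from the source client refine a per-user library seeded from a standard library, can be sketched as follows. The mapping-based `UserSpeechLibrary` is a deliberately simplified stand-in for a real acoustic/language model.

```python
class UserSpeechLibrary:
    """Hypothetical per-user library: maps audio segments to text and is
    refined from the corrections the source client sends back
    (S704/S705). Seeded from a preset standard mapping, per the text."""
    def __init__(self, standard):
        self.mapping = dict(standard)     # start from the standard library

    def recognize(self, segment):
        return self.mapping.get(segment, "<unknown>")

    def apply_correction(self, segment, corrected_text):
        # S705: the user's correction overrides the seeded entry.
        self.mapping[segment] = corrected_text
```

After a few correction rounds, the library reflects the user's own pronunciation (e.g. the Sichuan-accented "shoes" example earlier).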
Corresponding to the above voice information conversion method, an embodiment of the present application provides a voice information conversion device. Fig. 8 is a structural schematic diagram of a voice information conversion device provided by an embodiment of the present application. The device is applied to an electronic device and comprises:
an information receiving module 801, configured to receive target voice information;
a speech recognition module 802, configured to perform speech recognition on the target voice information to obtain text conversion information when the information conversion condition is met, so that the client displays the text conversion information based on the display position of the target voice information.
Specifically, the speech recognition module 802 may be specifically configured to perform voice word segmentation on the target voice information to obtain the voice word segments contained in the target voice information, and, for each obtained voice word segment, select a speech library from the speech library set according to the preset speech library selection rule and perform speech recognition on the voice word segment with the selected speech library to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
Specifically, the preset speech library selection rule may be determined according to at least one of the following pieces of information:
the category of the target group to which the first user who sent the target voice information belongs;
the language type of the target group name;
the user attribute information of the first user;
the geographical location of the source client that sent the target voice information;
among the voice word segments corresponding to the text information and/or voice information of the first user stored before the target voice information was received, the speech libraries to which the first preset number of voice word segments ranked highest by frequency of occurrence belong;
among the voice word segments corresponding to the text information and/or voice information of the target group stored before the target voice information was received, the speech libraries to which the second preset number of voice word segments ranked highest by frequency of occurrence belong;
a speech library selection order set by the user.
Specifically, the speech recognition module 802 comprises:
a voice segment obtaining submodule, configured to obtain the first target voice segment of the target voice information according to a preset voice segment determination rule;
a first speech recognition submodule, configured to perform speech recognition on the target voice segment using each speech library in the speech library set in turn;
a speech library determination submodule, configured to determine the speech library with the highest recognition rate as the target speech library;
a second speech recognition submodule, configured to perform speech recognition, using the target speech library, on the part of the target voice information other than the target voice segment, to obtain a first recognition result;
a first information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the first recognition result and a second recognition result, wherein the second recognition result is the result of performing speech recognition on the target voice segment using the target speech library.
Specifically, the information obtaining submodule may comprise:
a recognition rate calculation unit, configured to obtain a first recognition rate for the target speech library according to the first recognition result and the second recognition result;
a recognition rate judging unit, configured to judge whether the first recognition rate is less than a preset recognition rate threshold;
an information obtaining unit, configured to, when the judgment result of the recognition rate judging unit is yes, perform speech recognition on the target voice information using a default speech library, and obtain the text conversion information of the target voice information according to the recognition result for the default speech library.
Specifically, the speech recognition module 802 is specifically configured to, after the target voice information is received, directly perform speech recognition on the target voice information to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information; or
is specifically configured to monitor whether an information conversion instruction for the target voice information is received, and if so, perform speech recognition on the target voice information to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
Specifically, the device may further comprise:
a result sending module, configured to send the text conversion information to the first client, wherein the first client is the client that sent the target voice information;
a result updating module, configured to receive the correction information for the text conversion information sent by the first client, and update the text conversion information according to the correction information.
Specifically, in the case where the electronic device is a server, the device may further comprise:
a result judging module, configured to judge whether the text conversion information before updating has been sent to other clients;
a prompt information sending module, configured to, when the judgment result of the result judging module is yes, send a correction prompt to the second client that has received the text conversion information, and determine, according to the second client's feedback on the correction prompt, whether to send the updated text conversion information to the second client.
Specifically, the speech recognition module 802 is specifically configured to directly perform speech recognition on the target voice information using a first user speech library corresponding to a first user to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information, wherein the first user is the user who sent the target voice information; or
the speech recognition module 802 is specifically configured to perform speech recognition on the target voice information using the first user speech library to obtain text conversion information, so that the client displays the text conversion information based on the display position of the target voice information;
the device may further comprise:
a speech library updating module, configured to update the first user speech library according to the correction information.
Specifically, the speech recognition module 802 may comprise:
a frequency obtaining submodule, configured to obtain, when the information conversion condition is met, the frequencies of the audio frames contained in the target voice information;
an audio section dividing submodule, configured to divide the target voice information into at least one audio section according to the obtained frequencies;
a speech library selection submodule, configured to select, from the speech library set, a corresponding speech library for each frequency range based on the frequency ranges of the divided audio sections, thereby determining the speech library corresponding to each audio section;
a third speech recognition submodule, configured to perform speech recognition on each audio section using the speech library corresponding to that audio section;
a second information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the recognition results of the audio sections.
Specifically, the voice information conversion device may further include:
an instruction receiving module, configured to receive a meeting summary generation instruction;
an information obtaining module, configured to obtain the text information and the text conversion information corresponding to the voice information used for generating a meeting summary;
a summary generation module, configured to generate the meeting summary from the obtained text information and text conversion information according to a preset meeting summary format.
Specifically, the summary generation module may include:
an information determining submodule, configured to determine the user names and IP addresses of the users who sent the obtained text information and text conversion information;
a speaker determining submodule, configured to determine the speakers according to the determined user names and IP addresses and the frequency of each audio frame in the voice information used for generating the meeting summary;
a summary generating submodule, configured to generate the meeting summary from the obtained text information, the obtained text conversion information, and the determined speakers, according to the preset meeting summary format.
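The summary assembly described above can be sketched minimally as follows. The summary format, the speaker identifiers (user name plus IP address), and the entry structure are assumptions; the patent leaves the preset meeting summary format unspecified.

```python
# Hedged sketch: assemble a meeting summary from chronological entries, where
# each entry pairs a determined speaker with typed text or voice-derived text.
# The "name@ip" speaker format and the layout are illustrative only.

def generate_summary(entries, title="Meeting Summary"):
    """entries: list of (speaker, text) tuples in chronological order."""
    lines = [title, "=" * len(title)]
    for speaker, text in entries:
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)

entries = [("alice@192.168.1.2", "Let's review Q3."),
           ("bob@192.168.1.7", "Revenue grew 12%.")]
print(generate_summary(entries))
```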
As can be seen from the above, in the solutions provided by the above embodiments, after the target voice information is received and the information conversion condition is met, speech recognition is performed on the target voice information to obtain text conversion information, so that the client can display the text conversion information based on the display location of the target voice information. The solutions provided by the embodiments of the present application can therefore convert voice information into text information.
Corresponding to the information generation method described above, an embodiment of the present application further provides an information generation device.
Fig. 9 is a structural schematic diagram of an information generation device provided by an embodiment of the present application. The device includes:
an information receiving module 901, configured to receive target voice information sent by a source client;
a speech recognition module 902, configured to perform speech recognition on the target voice information according to the first user speech library of the first user corresponding to the target voice information;
a result sending module 903, configured to send the recognition result to the source client;
a correction information receiving module 904, configured to receive correction information for the recognition result sent by the source client;
a speech library update module 905, configured to update the first user speech library according to the correction information.
Specifically, the initial speech library of the first user speech library is a preset standard speech library.
As can be seen from the above, in the solution provided by this embodiment, the target voice information sent by the source client is received and speech recognition is performed on it; the recognition result is then sent to the source client, which corrects it; the speech library is then updated according to the correction information sent by the source client. In this way a personalized speech library can be generated for each user, which helps obtain accurate recognition results when that user's voice information is later recognized with this user speech library.
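The per-user feedback loop just described can be sketched as follows. The "library" here is deliberately simplified to a word-substitution map seeded from a shared standard library; a real system would adapt acoustic or language models instead, and all names below are hypothetical.

```python
# Minimal sketch of the personalized speech-library feedback loop: recognize,
# send the result to the source client, receive a correction, update the
# per-user library so future recognitions improve.

STANDARD_LIBRARY = {}  # preset standard library shared by all new users

class UserSpeechLibrary:
    def __init__(self):
        self.corrections = dict(STANDARD_LIBRARY)

    def recognize(self, raw_transcript):
        # Apply learned per-user word corrections to a baseline transcript.
        return " ".join(self.corrections.get(w, w) for w in raw_transcript.split())

    def update(self, recognized, corrected):
        # Learn word-level fixes from the client's correction information.
        for a, b in zip(recognized.split(), corrected.split()):
            if a != b:
                self.corrections[a] = b

lib = UserSpeechLibrary()
first = lib.recognize("recognise the speach")   # result sent to source client
lib.update(first, "recognise the speech")        # client returns a correction
second = lib.recognize("check the speach")       # learned fix now applied
```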
Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for related details, refer to the corresponding parts of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
Those of ordinary skill in the art will appreciate that all or part of the steps in the above method embodiments can be implemented by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its protection scope.
Claims (17)
1. A voice information conversion method, applied to an electronic device, characterized in that the method includes:
receiving target voice information;
when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display location of the target voice information;
wherein the step of performing speech recognition on the target voice information to obtain the text conversion information includes:
obtaining a first target voice segment of the target voice information according to a preset voice segment determination rule; performing speech recognition on the target voice segment using each speech library in a speech library set; determining the speech library with the highest recognition rate as a target speech library; performing speech recognition on the part of the target voice information other than the target voice segment using the target speech library to obtain a first recognition result; and obtaining the text conversion information of the target voice information according to the first recognition result and a second recognition result, wherein the second recognition result is the result of performing speech recognition on the target voice segment using the target speech library.
2. The method according to claim 1, characterized in that obtaining the text conversion information of the target voice information according to the first recognition result and the second recognition result includes:
obtaining a first recognition rate for the target speech library according to the first recognition result and the second recognition result;
judging whether the first recognition rate is less than a preset recognition rate threshold;
if so, performing speech recognition on the target voice information using a preset default speech library, and obtaining the text conversion information of the target voice information according to the recognition result of the preset default speech library.
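Claim 2's fallback check reduces to a simple threshold comparison. The threshold value 0.8 and the `default_recognize` stub below are illustrative assumptions; the patent only says the threshold is preset.

```python
# Hedged sketch of claim 2: when the chosen target library's recognition rate
# falls below a preset threshold, discard its result and re-recognize the
# audio with the preset default speech library.

def final_text(target_rate, target_text, default_recognize, audio, threshold=0.8):
    if target_rate < threshold:
        return default_recognize(audio)  # fall back to the default library
    return target_text

kept = final_text(0.9, "target result", lambda a: "default result", "audio")
replaced = final_text(0.5, "target result", lambda a: "default result", "audio")
```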
3. The method according to claim 1, characterized in that performing speech recognition on the target voice information to obtain the text conversion information when the information conversion condition is met includes:
after the target voice information is received, directly performing speech recognition on the target voice information to obtain the text conversion information; or
monitoring whether an information conversion instruction for the target voice information is received, and if so, performing speech recognition on the target voice information to obtain the text conversion information.
4. The method according to claim 3, characterized in that the method further includes:
sending the text conversion information to a first client, wherein the first client is the client that sent the target voice information;
receiving correction information for the text conversion information sent by the first client, and updating the text conversion information according to the correction information.
5. The method according to claim 4, characterized in that, when the electronic device is a server, the method further includes:
judging whether the text conversion information before the update has been sent to other clients;
if so, sending a correction prompt to each second client that has received the text conversion information, and determining, according to the second client's feedback on the correction prompt, whether to send the updated text conversion information to the second client.
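The server-side propagation in claim 5 can be sketched as follows. The client identifiers and the `ask_client` feedback function are hypothetical stand-ins for the correction prompt and its feedback result.

```python
# Sketch of claim 5: for every client that already received the pre-update
# text, send a correction prompt and, depending on that client's feedback,
# decide whether to deliver the updated text conversion information.

def propagate_correction(sent_to, updated_text, ask_client):
    """sent_to: clients that already received the old text.
    ask_client(client) -> True if the client wants the corrected text."""
    delivered = []
    for client in sent_to:
        if ask_client(client):  # prompt the client and read its feedback
            delivered.append((client, updated_text))
    return delivered

result = propagate_correction(["c1", "c2"], "fixed text",
                              ask_client=lambda c: c == "c1")
# Only c1 opted in, so only c1 receives the updated text.
```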
6. The method according to claim 4, characterized in that:
directly performing speech recognition on the target voice information to obtain the text conversion information includes: directly performing speech recognition on the target voice information using a first user speech library corresponding to a first user to obtain the text conversion information, wherein the first user is the user who sent the target voice information;
performing speech recognition on the target voice information to obtain the text conversion information includes: performing speech recognition on the target voice information using the first user speech library to obtain the text conversion information;
and the method further includes: updating the first user speech library according to the correction information.
7. The method according to any one of claims 1-6, characterized in that the method further includes:
receiving a meeting summary generation instruction;
obtaining the text information and the text conversion information corresponding to the voice information used for generating a meeting summary;
generating the meeting summary from the obtained text information and text conversion information according to a preset meeting summary format.
8. The method according to claim 7, characterized in that generating the meeting summary from the obtained text information and text conversion information according to the preset meeting summary format includes:
determining the user names and IP addresses of the users who sent the obtained text information and text conversion information;
determining speakers according to the determined user names and IP addresses and the frequency of each audio frame in the voice information used for generating the meeting summary;
generating the meeting summary from the obtained text information, the obtained text conversion information, and the speakers, according to the preset meeting summary format.
9. The method according to any one of claims 1-6, characterized in that the client displaying the text conversion information based on the display location of the target voice information includes:
the client displaying the text conversion information in a preset direction relative to the display location of the target voice information, in the page displaying the target voice information.
10. A voice information conversion device, applied to an electronic device, characterized in that the device includes:
an information receiving module, configured to receive target voice information;
a speech recognition module, configured to perform speech recognition on the target voice information to obtain text conversion information when an information conversion condition is met, so that a client displays the text conversion information based on the display location of the target voice information;
wherein the speech recognition module includes: a voice segment obtaining submodule, configured to obtain a first target voice segment of the target voice information according to a preset voice segment determination rule; a first speech recognition submodule, configured to perform speech recognition on the target voice segment using each speech library in a speech library set; a speech library determining submodule, configured to determine the speech library with the highest recognition rate as a target speech library; a second speech recognition submodule, configured to perform speech recognition on the part of the target voice information other than the target voice segment using the target speech library to obtain a first recognition result; and a first information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the first recognition result and a second recognition result, wherein the second recognition result is the result of performing speech recognition on the target voice segment using the target speech library.
11. The device according to claim 10, characterized in that the first information obtaining submodule includes:
a recognition rate calculating unit, configured to obtain a first recognition rate for the target speech library according to the first recognition result and the second recognition result;
a recognition rate judging unit, configured to judge whether the first recognition rate is less than a preset recognition rate threshold;
an information obtaining unit, configured to, when the judgment result of the recognition rate judging unit is yes, perform speech recognition on the target voice information using a preset default speech library, and obtain the text conversion information of the target voice information according to the recognition result of the preset default speech library.
12. The device according to claim 10, characterized in that:
the speech recognition module is specifically configured to, after the target voice information is received, directly perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display location of the target voice information;
or is specifically configured to monitor whether an information conversion instruction for the target voice information is received, and if so, perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display location of the target voice information.
13. The device according to claim 12, characterized in that the device further includes:
a result sending module, configured to send the text conversion information to a first client, wherein the first client is the client that sent the target voice information;
a result update module, configured to receive correction information for the text conversion information sent by the first client, and update the text conversion information according to the correction information.
14. The device according to claim 13, characterized in that, when the electronic device is a server, the device further includes:
a result judgment module, configured to judge whether the text conversion information before the update has been sent to other clients;
a prompt sending module, configured to, when the judgment result of the result judgment module is yes, send a correction prompt to each second client that has received the text conversion information, and determine, according to the second client's feedback on the correction prompt, whether to send the updated text conversion information to the second client.
15. The device according to claim 13, characterized in that:
the speech recognition module is specifically configured to directly perform speech recognition on the target voice information using a first user speech library corresponding to a first user to obtain the text conversion information, so that the client displays the text conversion information based on the display location of the target voice information, wherein the first user is the user who sent the target voice information; or
the speech recognition module is specifically configured to perform speech recognition on the target voice information using the first user speech library to obtain the text conversion information, so that the client displays the text conversion information based on the display location of the target voice information;
and the device further includes: a speech library update module, configured to update the first user speech library according to the correction information.
16. The device according to any one of claims 10-15, characterized in that the device further includes:
an instruction receiving module, configured to receive a meeting summary generation instruction;
an information obtaining module, configured to obtain the text information and the text conversion information corresponding to the voice information used for generating a meeting summary;
a summary generation module, configured to generate the meeting summary from the obtained text information and text conversion information according to a preset meeting summary format.
17. The device according to claim 16, characterized in that the summary generation module includes:
an information determining submodule, configured to determine the user names and IP addresses of the users who sent the obtained text information and text conversion information;
a speaker determining submodule, configured to determine speakers according to the determined user names and IP addresses and the frequency of each audio frame in the voice information used for generating the meeting summary;
a summary generating submodule, configured to generate the meeting summary from the obtained text information, the obtained text conversion information, and the speakers, according to the preset meeting summary format.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610801720.7A CN106384593B (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
CN201910311351.7A CN110060687A (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610801720.7A CN106384593B (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910311351.7A Division CN110060687A (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106384593A CN106384593A (en) | 2017-02-08 |
CN106384593B true CN106384593B (en) | 2019-11-01 |
Family
ID=57938973
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610801720.7A Active CN106384593B (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
CN201910311351.7A Pending CN110060687A (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910311351.7A Pending CN110060687A (en) | 2016-09-05 | 2016-09-05 | A kind of conversion of voice messaging, information generating method and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN106384593B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147564A (en) * | 2017-05-09 | 2017-09-08 | 胡巨鹏 | Real-time speech recognition error correction system and identification error correction method based on cloud server |
CN107316637A (en) * | 2017-05-31 | 2017-11-03 | 广东欧珀移动通信有限公司 | Audio recognition method and Related product |
CN106991961A (en) * | 2017-06-08 | 2017-07-28 | 无锡职业技术学院 | A kind of artificial intelligence LED dot matrix display screens control device and its control method |
CN107342088B (en) * | 2017-06-19 | 2021-05-18 | 联想(北京)有限公司 | Method, device and equipment for converting voice information |
CN107436748B (en) * | 2017-07-13 | 2020-06-30 | 普联技术有限公司 | Method and device for processing third-party application message, terminal equipment and readable medium |
CN107704447A (en) * | 2017-08-23 | 2018-02-16 | 海信集团有限公司 | A kind of Chinese word cutting method, Chinese word segmentation device and terminal |
CN107465834A (en) * | 2017-09-07 | 2017-12-12 | 深圳支点电子智能科技有限公司 | Mobile terminal and Related product with adaptive identifying ability |
CN107731229B (en) * | 2017-09-29 | 2021-06-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recognizing speech |
CN109587429A (en) * | 2017-09-29 | 2019-04-05 | 北京国双科技有限公司 | Audio-frequency processing method and device |
CN107995101B (en) * | 2017-11-30 | 2021-03-23 | 上海掌门科技有限公司 | Method and equipment for converting voice message into text message |
CN108428446B (en) * | 2018-03-06 | 2020-12-25 | 北京百度网讯科技有限公司 | Speech recognition method and device |
CN108831473B (en) * | 2018-03-30 | 2021-08-17 | 联想(北京)有限公司 | Audio processing method and device |
CN110392158A (en) * | 2018-04-19 | 2019-10-29 | 成都野望数码科技有限公司 | A kind of message treatment method, device and terminal device |
CN108647267A (en) * | 2018-04-28 | 2018-10-12 | 广东金贝贝智能机器人研究院有限公司 | One kind being based on internet big data robot Internet of things system |
CN108682420B (en) * | 2018-05-14 | 2023-07-07 | 平安科技(深圳)有限公司 | Audio and video call dialect recognition method and terminal equipment |
CN109039509A (en) * | 2018-07-16 | 2018-12-18 | 广州辉群智能科技有限公司 | A kind of method and broadcasting equipment of voice control broadcasting equipment |
CN109237740A (en) * | 2018-07-31 | 2019-01-18 | 珠海格力电器股份有限公司 | A kind of control method of electric appliance, device, storage medium and electric appliance |
CN109036406A (en) * | 2018-08-01 | 2018-12-18 | 深圳创维-Rgb电子有限公司 | A kind of processing method of voice messaging, device, equipment and storage medium |
CN109036410A (en) * | 2018-08-30 | 2018-12-18 | Oppo广东移动通信有限公司 | Audio recognition method, device, storage medium and terminal |
CN109036424A (en) * | 2018-08-30 | 2018-12-18 | 出门问问信息科技有限公司 | Audio recognition method, device, electronic equipment and computer readable storage medium |
CN109600299B (en) * | 2018-11-19 | 2021-06-25 | 维沃移动通信有限公司 | Message sending method and terminal |
CN111277589A (en) * | 2020-01-19 | 2020-06-12 | 腾讯云计算(北京)有限责任公司 | Conference document generation method and device |
CN111756930A (en) * | 2020-06-28 | 2020-10-09 | 维沃移动通信有限公司 | Communication control method, communication control device, electronic apparatus, and readable storage medium |
CN111816183A (en) * | 2020-07-15 | 2020-10-23 | 前海人寿保险股份有限公司 | Voice recognition method, device and equipment based on audio and video recording and storage medium |
CN112417095A (en) * | 2020-11-17 | 2021-02-26 | 维沃软件技术有限公司 | Voice message processing method and device |
CN112672213A (en) * | 2020-12-18 | 2021-04-16 | 努比亚技术有限公司 | Video information processing method and device and computer readable storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5555320A (en) * | 1992-11-27 | 1996-09-10 | Kabushiki Kaisha Toshiba | Pattern recognition system with improved recognition rate using nonlinear transformation |
CN1737902A (en) * | 2005-09-12 | 2006-02-22 | 周运南 | Text-to-speech interchanging device |
CN101377726A (en) * | 2007-08-31 | 2009-03-04 | 西门子(中国)有限公司 | Input method combining speech recognition with stroke recognition and terminal thereof |
CN101453499A (en) * | 2008-11-07 | 2009-06-10 | 康佳集团股份有限公司 | Mobile phone syllable conversion device and method thereof |
CN101923858A (en) * | 2009-06-17 | 2010-12-22 | 劳英杰 | Real-time and synchronous mutual translation voice terminal |
CN202026434U (en) * | 2011-04-29 | 2011-11-02 | 广东九联科技股份有限公司 | Voice conversion STB (set top box) |
CN102436812A (en) * | 2011-11-01 | 2012-05-02 | 展讯通信(上海)有限公司 | Conference recording device and conference recording method using same |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923854B (en) * | 2010-08-31 | 2012-03-28 | 中国科学院计算技术研究所 | Interactive speech recognition system and method |
CN102572372B (en) * | 2011-12-28 | 2018-10-16 | 中兴通讯股份有限公司 | The extracting method and device of meeting summary |
CN102750949B (en) * | 2012-07-16 | 2015-04-01 | 深圳市车音网科技有限公司 | Voice recognition method and device |
CN103680498A (en) * | 2012-09-26 | 2014-03-26 | 华为技术有限公司 | Speech recognition method and speech recognition equipment |
CN103903611B (en) * | 2012-12-24 | 2018-07-03 | 联想(北京)有限公司 | A kind of recognition methods of voice messaging and equipment |
CN103903621A (en) * | 2012-12-26 | 2014-07-02 | 联想(北京)有限公司 | Method for voice recognition and electronic equipment |
CN103106900B (en) * | 2013-02-28 | 2016-05-04 | 用友网络科技股份有限公司 | Speech recognition equipment and audio recognition method |
CN103489444A (en) * | 2013-09-30 | 2014-01-01 | 乐视致新电子科技(天津)有限公司 | Speech recognition method and device |
CN104916283A (en) * | 2015-06-11 | 2015-09-16 | 百度在线网络技术(北京)有限公司 | Voice recognition method and device |
CN105096940B (en) * | 2015-06-30 | 2019-03-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for carrying out speech recognition |
CN105244026B (en) * | 2015-08-24 | 2019-09-20 | 北京意匠文枢科技有限公司 | A kind of method of speech processing and device |
CN105913845A (en) * | 2016-04-26 | 2016-08-31 | 惠州Tcl移动通信有限公司 | Mobile terminal voice recognition and subtitle generation method and system and mobile terminal |
- 2016
  - 2016-09-05 CN CN201610801720.7A patent/CN106384593B/en active Active
  - 2016-09-05 CN CN201910311351.7A patent/CN110060687A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN106384593A (en) | 2017-02-08 |
CN110060687A (en) | 2019-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106384593B (en) | A kind of conversion of voice messaging, information generating method and device | |
US11315546B2 (en) | Computerized system and method for formatted transcription of multimedia content | |
US9070369B2 (en) | Real time generation of audio content summaries | |
CN106331893B (en) | Real-time caption presentation method and system | |
CN105895103B (en) | Voice recognition method and device | |
CN105719649B (en) | Audio recognition method and device | |
US8515728B2 (en) | Language translation of visual and audio input | |
US20180307667A1 (en) | Travel guide generating method and system | |
WO2021129439A1 (en) | Voice recognition method and related product | |
US20160189713A1 (en) | Apparatus and method for automatically creating and recording minutes of meeting | |
CN107102990A (en) | The method and apparatus translated to voice | |
CN111128183B (en) | Speech recognition method, apparatus and medium | |
US20160189107A1 (en) | Apparatus and method for automatically creating and recording minutes of meeting | |
CN112399269B (en) | Video segmentation method, device, equipment and storage medium | |
US20160189103A1 (en) | Apparatus and method for automatically creating and recording minutes of meeting | |
US10089898B2 (en) | Information processing device, control method therefor, and computer program | |
JP2015212732A (en) | Sound metaphor recognition device and program | |
CN112738557A (en) | Video processing method and device | |
US20190213998A1 (en) | Method and device for processing data visualization information | |
CN110781346A (en) | News production method, system, device and storage medium based on virtual image | |
JP7107229B2 (en) | Information processing device, information processing method, and program | |
CN105161112B (en) | Audio recognition method and device | |
CN113409791A (en) | Voice recognition processing method and device, electronic equipment and storage medium | |
CN116958342A (en) | Method for generating actions of virtual image, method and device for constructing action library | |
CN113539234B (en) | Speech synthesis method, device, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |